Logging Best Practices for AI Agents: Structured Logs for Debugging and Audit
Implement structured logging for AI agent systems with correlation IDs, log levels, sensitive data redaction, and queryable JSON output that makes debugging production agent issues fast and audit-ready.
Why Standard Logging Falls Short for Agents
A typical web application logs a request, processes it, and logs a response. An AI agent might process a single user message through five or more steps: prompt construction, memory retrieval, LLM inference, tool calls, response validation, and memory storage. Each step can fail independently, and the failure modes are fundamentally different from traditional applications — an LLM might return a valid HTTP 200 response that contains completely wrong instructions for a tool call.
Standard print() statements or unstructured log lines make it nearly impossible to reconstruct what happened during a conversation. Structured logging with correlation IDs, consistent fields, and sensitive data redaction transforms your logs from a wall of text into a queryable debugging and audit system.
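The contrast is easy to demonstrate with nothing but the standard library. A minimal sketch (the event names and IDs here are illustrative, not part of any library):

```python
import json
from datetime import datetime, timezone

# Unstructured: hard to filter, parse, or aggregate
print("tool call failed for conv a1b2 (weather_lookup): timeout")

# Structured: every field becomes queryable in a log backend
record = {
    "event": "tool_call_failed",
    "conversation_id": "a1b2",
    "tool_name": "weather_lookup",
    "error": "timeout",
    "level": "error",
    "timestamp": datetime.now(timezone.utc).isoformat(),
}
print(json.dumps(record))
```

The unstructured line can only be grepped; the structured line lets your log backend answer questions like "show me every error for this conversation."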
Setting Up Structured Logging with structlog
The structlog library produces JSON log lines with consistent fields that are easy to parse and query in log aggregation tools like Elasticsearch, Loki, or CloudWatch.
import structlog
import uuid

structlog.configure(
    processors=[
        structlog.processors.TimeStamper(fmt="iso"),
        structlog.processors.add_log_level,
        structlog.processors.StackInfoRenderer(),
        structlog.processors.format_exc_info,
        structlog.processors.JSONRenderer(),
    ],
    wrapper_class=structlog.BoundLogger,
    context_class=dict,
    logger_factory=structlog.PrintLoggerFactory(),
)

def get_logger(agent_name: str, conversation_id: str | None = None):
    """Create a logger bound with agent context."""
    if conversation_id is None:
        conversation_id = str(uuid.uuid4())
    return structlog.get_logger().bind(
        agent_name=agent_name,
        conversation_id=conversation_id,
    )
Every log line produced by this logger automatically includes the agent name, conversation ID, timestamp, and log level — all as structured JSON fields.
Correlation IDs Across Agent Steps
A single conversation generates logs across multiple functions and sometimes multiple services. Bind a conversation ID at the start and pass the logger through each step so every log line is linked.
async def handle_conversation(user_message: str, user_id: str):
    conversation_id = str(uuid.uuid4())
    log = get_logger("support-agent", conversation_id).bind(user_id=user_id)
    log.info("conversation_started", message_length=len(user_message))

    # Memory retrieval
    log.info("memory_retrieval_started")
    memories = await retrieve_memories(user_message)
    log.info("memory_retrieval_completed", results_count=len(memories))

    # LLM call
    log.info("llm_call_started", model="gpt-4o")
    response = await call_llm(user_message, memories)
    log.info(
        "llm_call_completed",
        model="gpt-4o",
        prompt_tokens=response.usage.prompt_tokens,
        completion_tokens=response.usage.completion_tokens,
        finish_reason=response.choices[0].finish_reason,
    )

    # Tool execution
    message = response.choices[0].message
    if message.tool_calls:
        for tool_call in message.tool_calls:
            log.info(
                "tool_call_started",
                tool_name=tool_call.function.name,
            )
            try:
                result = await execute_tool(tool_call)
                log.info("tool_call_completed", tool_name=tool_call.function.name)
            except Exception as e:
                log.error(
                    "tool_call_failed",
                    tool_name=tool_call.function.name,
                    error=str(e),
                )
                raise

    log.info("conversation_completed")
    return message.content
The resulting log output looks like this — every line shares the same conversation_id, making it trivial to filter in your log aggregation tool:
{"event": "conversation_started", "agent_name": "support-agent", "conversation_id": "a1b2c3d4...", "user_id": "user_789", "message_length": 142, "level": "info", "timestamp": "2026-03-17T10:30:00Z"}
{"event": "llm_call_completed", "agent_name": "support-agent", "conversation_id": "a1b2c3d4...", "model": "gpt-4o", "prompt_tokens": 1250, "completion_tokens": 340, "level": "info", "timestamp": "2026-03-17T10:30:02Z"}
Redacting Sensitive Data
Agent logs often contain user messages, PII, or API keys embedded in tool call arguments. Build a redaction processor that strips sensitive fields before they hit your log backend.
import re

SENSITIVE_PATTERNS = {
    "email": re.compile(r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}"),
    "phone": re.compile(r"\b\d{3}[-.]?\d{3}[-.]?\d{4}\b"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "api_key": re.compile(r"(sk-|pk-|api[_-]?key[=:]\s*)[a-zA-Z0-9]{20,}"),
}

def redact_sensitive_data(logger, method_name, event_dict):
    """structlog processor that redacts PII from log values."""
    for key, value in event_dict.items():
        if isinstance(value, str):
            for pattern_name, pattern in SENSITIVE_PATTERNS.items():
                value = pattern.sub(f"[REDACTED_{pattern_name.upper()}]", value)
            event_dict[key] = value
    return event_dict

# Add to the structlog processors list before JSONRenderer
structlog.configure(
    processors=[
        structlog.processors.TimeStamper(fmt="iso"),
        structlog.processors.add_log_level,
        redact_sensitive_data,  # Runs before serialization
        structlog.processors.JSONRenderer(),
    ],
)
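Because the processor is just a function over the event dict, you can exercise it without structlog at all. A trimmed-down check using two of the patterns above (the event contents are illustrative):

```python
import re

SENSITIVE_PATTERNS = {
    "email": re.compile(r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def redact_sensitive_data(logger, method_name, event_dict):
    """Redact matching patterns from every string value in the event."""
    for key, value in event_dict.items():
        if isinstance(value, str):
            for pattern_name, pattern in SENSITIVE_PATTERNS.items():
                value = pattern.sub(f"[REDACTED_{pattern_name.upper()}]", value)
            event_dict[key] = value
    return event_dict

event = {"event": "tool_call_started", "args": "lookup jane@example.com, SSN 123-45-6789"}
print(redact_sensitive_data(None, "info", event))
# → {'event': 'tool_call_started', 'args': 'lookup [REDACTED_EMAIL], SSN [REDACTED_SSN]'}
```

Putting a test like this in your suite catches regressions when someone adds a new pattern or reorders the processors.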
Choosing Log Levels for Agent Events
Use consistent log levels across your agent codebase. A clear convention prevents important signals from being buried in noise.
| Level | When to Use |
|---|---|
| DEBUG | Prompt contents, full LLM responses, tool arguments |
| INFO | Step start/completion, token counts, conversation lifecycle |
| WARNING | Retries, fallback model usage, slow LLM responses |
| ERROR | Tool failures, LLM errors, validation failures |
| CRITICAL | Agent loop crashes, data corruption, auth failures |
In production, set the level to INFO and enable DEBUG only when actively investigating an issue. This keeps log volume manageable while preserving enough context for post-incident analysis.
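The threshold itself reduces to a small pure function over the conventional level names. In a real structlog pipeline you would typically use structlog's built-in filtering (for example make_filtering_bound_logger), but the idea is the same; a sketch with illustrative event names:

```python
# Numeric ranks mirror the standard library's logging levels
LEVELS = {"debug": 10, "info": 20, "warning": 30, "error": 40, "critical": 50}

def allowed(event_dict: dict, threshold: str = "info") -> bool:
    """True if the event's level clears the configured threshold."""
    level = event_dict.get("level", "info")
    return LEVELS.get(level, 20) >= LEVELS[threshold]

events = [
    {"event": "prompt_built", "level": "debug"},
    {"event": "llm_call_completed", "level": "info"},
    {"event": "tool_call_failed", "level": "error"},
]
print([e["event"] for e in events if allowed(e)])
# → ['llm_call_completed', 'tool_call_failed']
```

Flipping the threshold to "debug" during an investigation surfaces the prompt-level events without any code changes to the call sites.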
FAQ
Should I log the full LLM prompt and response?
Log full prompts and responses at DEBUG level only. At INFO level, log metadata like token counts, model name, and finish reason. Full prompts can contain PII and consume significant storage — a single conversation might generate megabytes of prompt text. For audit scenarios, consider writing full prompts to a separate, access-controlled store with shorter retention.
How do I correlate logs across multiple agents in a multi-agent system?
Use two IDs: a conversation_id that is unique per user conversation and a trace_id that follows the request across agent handoffs. When your triage agent calls a specialist agent, pass both IDs in the request. This lets you filter by conversation to see the full user interaction or by trace to see the technical execution path.
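A minimal sketch of that handoff, with illustrative helper names and payload shape:

```python
import uuid

def new_ids() -> dict:
    """Mint both correlation IDs at the edge of the system."""
    return {
        "conversation_id": str(uuid.uuid4()),  # stable for the whole user conversation
        "trace_id": str(uuid.uuid4()),         # minted per request, follows agent handoffs
    }

def hand_off(payload: dict, ids: dict) -> dict:
    """Attach both IDs to a request sent to another agent."""
    return {**payload, **ids}

ids = new_ids()
specialist_request = hand_off({"task": "refund_lookup"}, ids)
print(sorted(specialist_request))
# → ['conversation_id', 'task', 'trace_id']
```

The specialist agent binds both IDs into its own logger on receipt, so its log lines join cleanly with the triage agent's.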
What log aggregation tools work best for agent logs?
Any tool that supports structured JSON logs works well. Grafana Loki is lightweight and integrates directly with Grafana dashboards. Elasticsearch with Kibana provides powerful full-text search across log fields. For cloud-native setups, AWS CloudWatch Logs Insights or Google Cloud Logging both support JSON field queries natively.
#Logging #StructuredLogging #Debugging #Audit #AIAgents #AgenticAI #LearnAI #AIEngineering