
Sensitive Data Handling in Agent Traces

Learn how to control sensitive data in OpenAI Agents SDK traces using trace_include_sensitive_data, environment variables, and GDPR-compliant tracing strategies for production AI systems.

The Privacy Problem with Agent Traces

Agent traces are invaluable for debugging and monitoring, but they create a serious privacy challenge. A typical trace captures the full text of every LLM input and output, every tool argument and return value, and every piece of context passed between agents. In a healthcare chatbot, that means patient symptoms and medical history flowing into your trace storage. In a financial advisor agent, that means account numbers and transaction details. In a customer support agent, that means email addresses, phone numbers, and complaint details.

Storing this data in trace backends — especially third-party platforms — can violate GDPR, HIPAA, CCPA, and SOC 2 requirements. The OpenAI Agents SDK provides built-in controls to manage what sensitive data appears in traces, but using these controls effectively requires understanding the full picture.

The trace_include_sensitive_data Flag

The primary control mechanism is the trace_include_sensitive_data setting on RunConfig. When set to False, the SDK strips LLM inputs, LLM outputs, and tool arguments from trace spans. The structural data — span names, types, durations, and hierarchy — remains intact.

from agents import Agent, Runner, RunConfig

agent = Agent(
    name="Medical Triage Agent",
    instructions="You help patients describe symptoms and recommend next steps.",
)

# Traces will NOT include message content or tool arguments
result = await Runner.run(
    agent,
    "I have been experiencing chest pain and shortness of breath for 3 days",
    run_config=RunConfig(trace_include_sensitive_data=False),
)

With this flag disabled, the trace still shows:

  • An agent_span for "Medical Triage Agent" with its duration
  • Generation spans showing the model name, token counts, and latency
  • Function spans showing tool names and durations

But the actual patient message, the model's response, and any tool arguments containing patient data are omitted.
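To make that concrete, here is an illustrative sketch of what a content-stripped generation span might export. The field names here are assumptions for illustration, not the SDK's exact schema:

```python
# Illustrative shape of a generation span exported with
# trace_include_sensitive_data=False: structure and metrics survive,
# message content does not. Field names are assumptions.
span_export = {
    "type": "generation",
    "model": "gpt-4o",
    "usage": {"input_tokens": 42, "output_tokens": 187},
    "duration_ms": 1240,
    # "input" and "output" keys are omitted entirely
}

assert "input" not in span_export and "output" not in span_export
```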

Environment Variable Control

For production deployments, you typically want a global default rather than per-request configuration. The SDK respects an environment variable that sets the default behavior:

# In your deployment configuration (Dockerfile, k8s manifest, .env)
export OPENAI_AGENTS_DISABLE_TRACING_SENSITIVE_DATA=1

When this variable is set, all traces default to excluding sensitive data. Individual runs can still override:

# This specific run WILL include sensitive data despite the env var
result = await Runner.run(
    agent,
    query,
    run_config=RunConfig(trace_include_sensitive_data=True),
)

This pattern is useful for debugging: keep sensitive data disabled globally, but enable it temporarily for specific traces you need to inspect during an incident.

Selective Data Redaction

The binary flag is a blunt instrument. Sometimes you need traces that include some data but redact specific fields. The SDK does not provide field-level redaction natively, but you can implement it with a trace processor:

import re
from agents.tracing import TracingProcessor, Span

REDACTION_PATTERNS = [
    (re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"), "[EMAIL_REDACTED]"),
    (re.compile(r"\b\d{3}[-.]?\d{3}[-.]?\d{4}\b"), "[PHONE_REDACTED]"),
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "[SSN_REDACTED]"),
    (re.compile(r"\b\d{13,19}\b"), "[CARD_REDACTED]"),
    (re.compile(r"\b[A-Z]{2}\d{2}[A-Z0-9]{11,30}\b"), "[IBAN_REDACTED]"),
]

class RedactingTraceProcessor(TracingProcessor):
    def _redact(self, text: str) -> str:
        if not isinstance(text, str):
            return text
        for pattern, replacement in REDACTION_PATTERNS:
            text = pattern.sub(replacement, text)
        return text

    def _redact_dict(self, data: dict) -> dict:
        redacted = {}
        for key, value in data.items():
            if isinstance(value, str):
                redacted[key] = self._redact(value)
            elif isinstance(value, dict):
                redacted[key] = self._redact_dict(value)
            elif isinstance(value, list):
                redacted[key] = [
                    self._redact(item) if isinstance(item, str) else item
                    for item in value
                ]
            else:
                redacted[key] = value
        return redacted

    def on_trace_start(self, trace) -> None:
        pass

    def on_trace_end(self, trace) -> None:
        pass

    def on_span_start(self, span: Span) -> None:
        pass

    def on_span_end(self, span: Span) -> None:
        if span.data:
            span.data = self._redact_dict(span.data)

    def shutdown(self) -> None:
        pass

    def force_flush(self) -> None:
        pass

Register this processor, and every span's data payload is scrubbed of emails, phone numbers, SSNs, and card numbers before being sent to any downstream exporter. The redaction happens in-process before data leaves your infrastructure.
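It is worth sanity-checking the patterns in isolation before wiring them into a processor. This standalone sketch duplicates two of the patterns above and runs them against a made-up payload:

```python
import re

# Two of the redaction patterns, duplicated so the snippet stands alone.
PATTERNS = [
    (re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"), "[EMAIL_REDACTED]"),
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "[SSN_REDACTED]"),
]

def scrub(text: str) -> str:
    """Apply every pattern in order, replacing matches with their tokens."""
    for pattern, replacement in PATTERNS:
        text = pattern.sub(replacement, text)
    return text

print(scrub("Reach me at jane@example.com, SSN 123-45-6789"))
# -> Reach me at [EMAIL_REDACTED], SSN [SSN_REDACTED]
```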

GDPR Compliance Strategy

GDPR imposes specific requirements that affect how you design agent tracing:

Right to Erasure (Article 17) — Users can request deletion of their personal data. If your traces contain personal data, you must be able to find and delete traces associated with a specific user.

Data Minimization (Article 5) — You should only collect data that is necessary for the stated purpose. Storing full conversation transcripts in traces when you only need latency metrics violates this principle.


Purpose Limitation (Article 5) — Data collected for debugging cannot be repurposed for training or analytics without explicit consent.

Here is a GDPR-compliant tracing architecture:

import hashlib
from agents.tracing import TracingProcessor, Trace, Span

class GDPRCompliantProcessor(TracingProcessor):
    def __init__(self, metrics_backend, audit_backend):
        self.metrics = metrics_backend
        self.audit = audit_backend

    def _pseudonymize_user(self, user_id: str) -> str:
        """One-way hash for trace correlation without storing real IDs."""
        return hashlib.sha256(
            f"{user_id}:trace-salt-rotate-monthly".encode()
        ).hexdigest()[:16]

    def on_trace_start(self, trace: Trace) -> None:
        user_id = (trace.metadata or {}).get("user_id")
        if user_id:
            trace.metadata["user_id"] = self._pseudonymize_user(user_id)

    def on_span_end(self, span: Span) -> None:
        # Only export structural and performance data
        self.metrics.record({
            "trace_id": span.trace_id,
            "span_type": span.span_type,
            "span_name": span.name,
            "duration_ms": (span.end_time - span.start_time).total_seconds() * 1000,
            "tokens": span.data.get("total_tokens", 0) if span.data else 0,
        })

    def on_trace_end(self, trace: Trace) -> None:
        # Audit log records that a trace happened, not what it contained
        self.audit.log({
            "trace_id": trace.trace_id,
            "workflow": trace.name,
            "timestamp": trace.end_time.isoformat(),
            "pseudonymized_user": (trace.metadata or {}).get("user_id"),
        })

    def on_span_start(self, span: Span) -> None:
        pass

    def shutdown(self) -> None:
        pass

    def force_flush(self) -> None:
        pass

This processor exports performance metrics and audit records without any personal data. The pseudonymized user ID allows you to correlate traces for a single user (for debugging patterns) without being able to identify who the user is.
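The two properties that matter — stability for correlation, irreversibility for privacy — are easy to verify in isolation. This sketch uses the same salted-hash approach (the salt value is illustrative, as above):

```python
import hashlib

SALT = "trace-salt-rotate-monthly"  # illustrative; store and rotate securely

def pseudonymize(user_id: str) -> str:
    """Salted one-way hash, truncated for compact trace metadata."""
    return hashlib.sha256(f"{user_id}:{SALT}".encode()).hexdigest()[:16]

# Stable: the same user always yields the same token, so traces correlate.
assert pseudonymize("user-42") == pseudonymize("user-42")
# Distinct users yield distinct tokens, and the real ID is not recoverable.
assert pseudonymize("user-42") != pseudonymize("user-43")
```

Note that rotating the salt (as the comment in the processor suggests) deliberately breaks correlation across rotation periods, which bounds how long any pseudonym remains linkable.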

Implementing Data Retention Policies

Traces should not live forever. Implement retention policies that automatically purge old trace data:

from datetime import datetime, timedelta

from agents.tracing import TracingProcessor, Trace

class RetentionAwareProcessor(TracingProcessor):
    def __init__(self, storage_backend, retention_days: int = 30):
        self.storage = storage_backend
        self.retention_days = retention_days

    def on_trace_end(self, trace: Trace) -> None:
        self.storage.store(
            trace_id=trace.trace_id,
            data=self._extract_safe_data(trace),
            ttl=timedelta(days=self.retention_days),
        )

    def _extract_safe_data(self, trace: Trace) -> dict:
        return {
            "trace_id": trace.trace_id,
            "workflow": trace.name,
            "duration_ms": trace.duration_ms,
            "span_count": trace.span_count,
            "created_at": datetime.utcnow().isoformat(),
            "expires_at": (
                datetime.utcnow() + timedelta(days=self.retention_days)
            ).isoformat(),
        }

    def on_trace_start(self, trace: Trace) -> None:
        pass

    def on_span_start(self, span) -> None:
        pass

    def on_span_end(self, span) -> None:
        pass

    def shutdown(self) -> None:
        pass

    def force_flush(self) -> None:
        pass

Layered Privacy Architecture

The most robust approach combines multiple strategies:

from agents import add_trace_processor

# Layer 1: Global sensitive data exclusion
# Set OPENAI_AGENTS_DISABLE_TRACING_SENSITIVE_DATA=1 in environment

# Layer 2: Pattern-based redaction for any data that slips through
add_trace_processor(RedactingTraceProcessor())

# Layer 3: GDPR-compliant export with pseudonymization
add_trace_processor(GDPRCompliantProcessor(metrics_db, audit_log))

# Layer 4: Time-bounded retention
add_trace_processor(RetentionAwareProcessor(trace_store, retention_days=30))

Each layer catches what the previous one might miss. The environment variable prevents sensitive data from entering traces at the SDK level. The redacting processor catches any PII that enters through custom spans. The GDPR processor pseudonymizes identifiers and strips content. The retention processor ensures nothing persists beyond its useful life.

Testing Your Privacy Controls

Privacy controls are only effective if they are verified. Write tests that confirm redaction works:

import pytest
from unittest.mock import MagicMock

from agents import Runner, RunConfig

def test_email_redaction():
    processor = RedactingTraceProcessor()
    span = MagicMock()
    span.data = {"input": "Contact me at user@example.com for details"}
    processor.on_span_end(span)
    assert "user@example.com" not in str(span.data)
    assert "[EMAIL_REDACTED]" in span.data["input"]

def test_sensitive_data_flag_excludes_content():
    # Assumes test fixtures: `agent` is the Agent under test and
    # `collected_spans` holds spans captured by a test trace processor.
    result = Runner.run_sync(
        agent,
        "My SSN is 123-45-6789",
        run_config=RunConfig(trace_include_sensitive_data=False),
    )
    # Verify trace spans do not contain the input
    for span in collected_spans:
        assert "123-45-6789" not in str(span.data)

Sensitive data handling is not an afterthought — it is a prerequisite for production tracing. Build your privacy controls before you deploy, not after a compliance audit discovers patient data in your Langfuse dashboard.
