Data Loss Prevention for AI Agents: Preventing Sensitive Data Leakage
Implement data loss prevention policies for AI agents that detect and block sensitive data in prompts and responses. Covers DLP policy engines, content scanning with regex and NER, blocking rules, and exception handling workflows.
The Unique DLP Challenge with AI Agents
Traditional DLP systems monitor file transfers, email attachments, and database exports. AI agents create a new exfiltration vector that sits outside all of these controls. An employee can paste a customer list into an agent prompt, ask the agent to summarize financial data from a confidential document, or instruct it to email internal metrics to an external address.
The risk is bidirectional. Sensitive data can leak into the agent (through prompts and tool inputs) and out of the agent (through responses, tool calls, and downstream API calls). A comprehensive DLP strategy must scan both directions.
Building a DLP Scanner
The scanner inspects text for patterns that match sensitive data categories: personally identifiable information, financial data, health records, credentials, and proprietary business data.
flowchart LR
    REQ(["Inbound request"])
    PII["PII detection<br/>regex plus NER"]
    POL{"Policy engine<br/>OPA or rules"}
    REDACT["Redact or mask"]
    LLM["LLM call"]
    OUT["Response"]
    AUDIT[("Append only<br/>audit log")]
    BLOCK(["Block plus<br/>notify DPO"])
    REQ --> PII --> POL
    POL -->|Allow| REDACT --> LLM --> OUT --> AUDIT
    POL -->|Deny| BLOCK
    style POL fill:#4f46e5,stroke:#4338ca,color:#fff
    style AUDIT fill:#ede9fe,stroke:#7c3aed,color:#1e1b4b
    style BLOCK fill:#dc2626,stroke:#b91c1c,color:#fff
    style OUT fill:#059669,stroke:#047857,color:#fff
import re
from dataclasses import dataclass
from enum import Enum


class Sensitivity(str, Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"
    CRITICAL = "critical"


class Action(str, Enum):
    ALLOW = "allow"
    WARN = "warn"
    REDACT = "redact"
    BLOCK = "block"


@dataclass
class DLPRule:
    name: str
    pattern: re.Pattern
    sensitivity: Sensitivity
    action: Action
    description: str
DLP_RULES = [
    DLPRule(
        name="ssn",
        # \d matches a digit; a bare "d" would only match the letter d.
        pattern=re.compile(r"\d{3}-\d{2}-\d{4}"),
        sensitivity=Sensitivity.CRITICAL,
        action=Action.BLOCK,
        description="US Social Security Number",
    ),
    DLPRule(
        name="credit_card",
        # Four groups of four digits, optionally separated by a dash or space.
        pattern=re.compile(r"(?:\d{4}[- ]?){3}\d{4}"),
        sensitivity=Sensitivity.CRITICAL,
        action=Action.BLOCK,
        description="Credit card number",
    ),
    DLPRule(
        name="email_address",
        # The dot before the TLD is escaped, and the TLD class is [A-Za-z]
        # (a "|" inside a character class would match a literal pipe).
        pattern=re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
        sensitivity=Sensitivity.MEDIUM,
        action=Action.WARN,
        description="Email address",
    ),
    DLPRule(
        name="api_key",
        pattern=re.compile(r"(?:sk|pk|api)[_-][A-Za-z0-9]{20,}"),
        sensitivity=Sensitivity.CRITICAL,
        action=Action.BLOCK,
        description="API key or secret",
    ),
    DLPRule(
        name="aws_access_key",
        pattern=re.compile(r"AKIA[0-9A-Z]{16}"),
        sensitivity=Sensitivity.CRITICAL,
        action=Action.BLOCK,
        description="AWS access key ID",
    ),
]
@dataclass
class ScanResult:
    rule_name: str
    matched_text: str
    action: Action
    sensitivity: Sensitivity
    position: tuple[int, int]  # (start, end) offsets into the scanned text


class DLPScanner:
    def __init__(self, rules: list[DLPRule]):
        self.rules = rules

    def scan(self, text: str) -> list[ScanResult]:
        """Return one ScanResult per rule match found in the text."""
        findings = []
        for rule in self.rules:
            for match in rule.pattern.finditer(text):
                findings.append(ScanResult(
                    rule_name=rule.name,
                    matched_text=match.group(),
                    action=rule.action,
                    sensitivity=rule.sensitivity,
                    position=(match.start(), match.end()),
                ))
        return findings

    def redact(self, text: str) -> str:
        """Replace every REDACT or BLOCK match with a placeholder."""
        # Work from the end of the text backwards so earlier offsets stay valid.
        findings = sorted(self.scan(text), key=lambda f: f.position[0], reverse=True)
        for finding in findings:
            if finding.action in (Action.REDACT, Action.BLOCK):
                start, end = finding.position
                placeholder = f"[{finding.rule_name.upper()}_REDACTED]"
                text = text[:start] + placeholder + text[end:]
        return text
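The credit_card pattern above matches any run of 16 digits, so order numbers and internal IDs can trigger it. One common way to reduce those false positives, which is not part of the scanner shown here, is to run the Luhn checksum on each candidate before treating it as a card number. The following is a minimal sketch under that assumption; `luhn_valid` and `find_card_numbers` are hypothetical helper names, and the pattern adds word boundaries to the rule's digit grouping.

```python
import re

# Same digit grouping as the credit_card rule, with word boundaries added.
CARD_PATTERN = re.compile(r"\b(?:\d{4}[- ]?){3}\d{4}\b")


def luhn_valid(candidate: str) -> bool:
    """Return True if the digits in `candidate` pass the Luhn checksum."""
    digits = [int(ch) for ch in candidate if ch.isdigit()]
    if len(digits) < 13:
        return False
    total = 0
    # Walk from the rightmost digit and double every second digit.
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


def find_card_numbers(text: str) -> list[str]:
    """Return regex matches that also pass the Luhn check."""
    return [m.group() for m in CARD_PATTERN.finditer(text) if luhn_valid(m.group())]
```

A checksum pass does not prove a number is a live card, only that it is well formed, so it works best as a filter in front of the blocking rule rather than as a replacement for it.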
Integrating DLP Into the Agent Pipeline
The scanner runs at two points: when the user submits a prompt (inbound DLP) and when the agent generates a response or invokes a tool (outbound DLP). The gateway from the previous post is the ideal integration point.
from fastapi import HTTPException


class DLPMiddleware:
    def __init__(self, scanner: DLPScanner, audit_logger):
        self.scanner = scanner
        self.audit = audit_logger

    async def check_inbound(self, user_id: str, agent_id: str, text: str) -> str:
        findings = self.scanner.scan(text)
        if not findings:
            return text

        # BLOCK findings reject the request before it reaches the LLM.
        blocked = [f for f in findings if f.action == Action.BLOCK]
        if blocked:
            # Note: f.__dict__ includes matched_text, so the audit log itself
            # holds the sensitive value unless it is masked before logging.
            await self.audit.log_dlp_violation(
                user_id=user_id,
                agent_id=agent_id,
                direction="inbound",
                findings=[f.__dict__ for f in blocked],
            )
            raise HTTPException(
                status_code=422,
                detail=(
                    "Your message contains sensitive data that cannot "
                    "be processed. Please remove: "
                    + ", ".join(f.rule_name for f in blocked)
                ),
            )

        # WARN findings are logged but the text passes through unchanged.
        warnings = [f for f in findings if f.action == Action.WARN]
        if warnings:
            await self.audit.log_dlp_warning(
                user_id=user_id, agent_id=agent_id,
                direction="inbound", findings=[f.__dict__ for f in warnings],
            )

        redactable = [f for f in findings if f.action == Action.REDACT]
        if redactable:
            text = self.scanner.redact(text)
        return text

    async def check_outbound(self, agent_id: str, text: str) -> str:
        findings = self.scanner.scan(text)
        blocked = [f for f in findings if f.action == Action.BLOCK]
        if blocked:
            await self.audit.log_dlp_violation(
                user_id="system", agent_id=agent_id,
                direction="outbound",
                findings=[f.__dict__ for f in blocked],
            )
        # Outbound text is redacted rather than rejected, so the user still
        # gets a response. REDACT findings are masked here too, matching the
        # inbound path.
        if any(f.action in (Action.BLOCK, Action.REDACT) for f in findings):
            return self.scanner.redact(text)
        return text
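The middleware above checks plain text, but an outbound tool call usually carries structured arguments rather than a single string. One way to cover that case is to walk the argument payload and apply redaction to every string value. This is a sketch only: `redact_tool_args` and `redact_aws_keys` are hypothetical names, the `send_email` payload is invented for illustration, and in practice the `redact_text` callable would likely be `DLPScanner.redact`.

```python
import re
from typing import Any, Callable

# Illustrative single rule, standing in for a full scanner.
AWS_KEY = re.compile(r"AKIA[0-9A-Z]{16}")


def redact_aws_keys(text: str) -> str:
    return AWS_KEY.sub("[AWS_ACCESS_KEY_REDACTED]", text)


def redact_tool_args(value: Any, redact_text: Callable[[str], str]) -> Any:
    """Recursively apply `redact_text` to every string value in a JSON-like payload."""
    if isinstance(value, str):
        return redact_text(value)
    if isinstance(value, dict):
        # Only values are rewritten; dictionary keys are left as they are.
        return {key: redact_tool_args(item, redact_text) for key, item in value.items()}
    if isinstance(value, list):
        return [redact_tool_args(item, redact_text) for item in value]
    # Numbers, booleans, and None pass through unchanged.
    return value


# Hypothetical tool call produced by an agent.
tool_call = {
    "tool": "send_email",
    "args": {
        "to": "partner@example.com",
        "body": "Use key AKIAABCDEFGHIJKLMNOP for the upload.",
        "attachments": [],
    },
}
safe_call = redact_tool_args(tool_call, redact_aws_keys)
```

The function returns a new structure and leaves the original payload untouched, which makes it easier to log what was changed before the call is dispatched.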
Named Entity Recognition for Context-Aware DLP
Regex catches formatted patterns like SSNs and credit card numbers. But sensitive data also appears as unstructured text: "John Smith's salary is $185,000" or "the patient was diagnosed with diabetes." Use NER models to detect person names, monetary values, medical terms, and organization names, then apply policies based on the entity type and the agent's data access level.
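To illustrate how entity type and agent access level might combine into a decision, here is a rough sketch. Everything in it is an assumption made for illustration: the entity labels, the access levels "public" and "finance", the policy table, and in particular `stub_detector`, which is a fixed-phrase lookup standing in for a real NER model.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Entity:
    label: str  # e.g. "PERSON", "MONEY" (label names are assumptions)
    text: str
    start: int
    end: int


# Hypothetical policy table: (entity label, agent access level) -> action.
# Pairs that are not listed fall back to DEFAULT_ACTION.
ENTITY_POLICY = {
    ("MONEY", "public"): "redact",
    ("MONEY", "finance"): "allow",
    ("MEDICAL_CONDITION", "public"): "block",
    ("MEDICAL_CONDITION", "finance"): "block",
    ("PERSON", "public"): "warn",
    ("PERSON", "finance"): "allow",
}
DEFAULT_ACTION = "allow"


def stub_detector(text: str) -> list[Entity]:
    """Toy detector that looks up fixed phrases; swap in an NER model here."""
    known = {
        "John Smith": "PERSON",
        "$185,000": "MONEY",
        "diabetes": "MEDICAL_CONDITION",
    }
    entities = []
    for phrase, label in known.items():
        start = text.find(phrase)
        if start != -1:
            entities.append(Entity(label, phrase, start, start + len(phrase)))
    return entities


def evaluate(
    text: str,
    access_level: str,
    detect: Callable[[str], list[Entity]] = stub_detector,
) -> list[tuple[Entity, str]]:
    """Pair each detected entity with the action the policy table assigns it."""
    return [
        (entity, ENTITY_POLICY.get((entity.label, access_level), DEFAULT_ACTION))
        for entity in detect(text)
    ]


decisions = evaluate("John Smith's salary is $185,000", "public")
```

Keeping the detector behind a plain callable means the policy table can be tested without loading a model, and the model can be replaced without touching the policy.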
Exception Handling and Override Workflows
Not every match is a real violation. An agent discussing credit card processing might legitimately reference card number formats. Build an exception workflow where authorized users can request a DLP bypass for specific use cases. Each exception is logged, time-limited, and requires approval from a data steward.
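The check at the heart of such a workflow can be quite small. The sketch below assumes an exception record with hypothetical fields (`user_id`, `rule_name`, `justification`, `approved_by`, `expires_at`) and a helper named `is_exempt`. It covers approval and expiry only; logging each grant and each use is left out.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass(frozen=True)
class DLPException:
    user_id: str
    rule_name: str
    justification: str
    approved_by: Optional[str]  # data steward ID; None while approval is pending
    expires_at: datetime


def is_exempt(
    exceptions: list[DLPException],
    user_id: str,
    rule_name: str,
    now: Optional[datetime] = None,
) -> bool:
    """True if an approved, unexpired exception covers this user and rule."""
    now = now or datetime.now(timezone.utc)
    return any(
        exc.user_id == user_id
        and exc.rule_name == rule_name
        and exc.approved_by is not None
        and now < exc.expires_at
        for exc in exceptions
    )


# Example: a seven-day exception for documenting card number formats.
granted_at = datetime(2025, 1, 1, tzinfo=timezone.utc)
exceptions = [
    DLPException(
        user_id="u-17",
        rule_name="credit_card",
        justification="Writing card format documentation",
        approved_by="steward-3",
        expires_at=granted_at + timedelta(days=7),
    )
]
```

Scoping each exception to one user and one rule keeps a bypass narrow: an approved credit_card exception does nothing for an SSN match.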
FAQ
How do you handle DLP for agents that process documents and images?
For documents, extract text before scanning. For images, use OCR to extract visible text and scan the result. Also scan document metadata, which can contain author names, revision history, and internal file paths. For agents that generate images, implement a separate content moderation pipeline that checks for watermarks, logos, or embedded text containing sensitive data.
Does DLP scanning add noticeable latency to agent responses?
Regex-based scanning adds less than a millisecond for typical prompt sizes. NER-based scanning adds 10 to 50 milliseconds depending on the model and text length. This is negligible compared to LLM inference time. Run DLP scanning concurrently with other pre-processing steps to minimize any impact.
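As a rough illustration of running the scan alongside other pre-processing, the sketch below uses `asyncio.gather` with `asyncio.to_thread` (Python 3.9 or later). The single SSN rule and the `load_conversation_context` step are assumptions made for the example; the context lookup is simulated with a short sleep.

```python
import asyncio
import re

SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def scan_for_ssn(text: str) -> list[str]:
    """Synchronous DLP scan (illustrative single rule)."""
    return SSN.findall(text)


async def load_conversation_context(user_id: str) -> dict:
    """Hypothetical I/O-bound pre-processing step, simulated with a sleep."""
    await asyncio.sleep(0.01)
    return {"user_id": user_id, "history": []}


async def preprocess(user_id: str, prompt: str) -> tuple[list[str], dict]:
    # The scan runs in a worker thread while the context lookup waits on I/O.
    findings, context = await asyncio.gather(
        asyncio.to_thread(scan_for_ssn, prompt),
        load_conversation_context(user_id),
    )
    return findings, context


findings, context = asyncio.run(preprocess("u-17", "My SSN is 123-45-6789"))
```

Both results are awaited before anything is sent to the model, so the scan still gates the LLM call; only its wait time overlaps with the other step.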
How do you keep DLP rules updated as new sensitive data patterns emerge?
Maintain DLP rules in a versioned configuration store, not in application code. Platform security teams update rules through the admin dashboard. New rules take effect immediately without redeploying the gateway. Run new rules in "audit only" mode for a week before enabling blocking, so you can tune false positive rates.
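A configuration-driven rule set with an audit-only mode might look like the sketch below. The JSON layout, the `mode` field with values "enforce" and "audit", the `employee_id` rule, and the helper names are all assumptions for illustration, not a format the gateway is known to use.

```python
import json
import re
from dataclasses import dataclass

# Hypothetical config document, as it might be stored in a versioned store.
RULES_JSON = """
{
  "version": 12,
  "rules": [
    {"name": "aws_access_key", "pattern": "AKIA[0-9A-Z]{16}",
     "action": "block", "mode": "enforce"},
    {"name": "employee_id", "pattern": "EMP-[0-9]{6}",
     "action": "block", "mode": "audit"}
  ]
}
"""


@dataclass
class ConfiguredRule:
    name: str
    pattern: re.Pattern
    action: str
    mode: str  # "enforce" applies the action; "audit" only records the match


def load_rules(raw: str) -> tuple[int, list[ConfiguredRule]]:
    """Parse the config and compile each pattern once at load time."""
    config = json.loads(raw)
    rules = [
        ConfiguredRule(r["name"], re.compile(r["pattern"]), r["action"], r["mode"])
        for r in config["rules"]
    ]
    return config["version"], rules


def effective_action(rule: ConfiguredRule, text: str) -> str:
    """Return what the gateway should do for this rule and text."""
    if not rule.pattern.search(text):
        return "allow"
    # Audit-mode rules never block; they only produce a log entry.
    return rule.action if rule.mode == "enforce" else "log_only"


version, rules = load_rules(RULES_JSON)
```

Promoting a rule from audit to enforcement is then a one-field change in the config, and the version number gives audit records something to reference when rule behavior changes.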