API Error Design for AI Agent Services: Problem Details, Error Codes, and Retry Hints

Why Error Design Matters More for AI Agents

When a human encounters an API error, they read the message, understand the context, and decide what to do. An AI agent has none of that intuition. It needs structured, machine-readable error responses that tell it exactly what went wrong, whether to retry, and how long to wait. Poor error design turns every transient failure into a hard failure for autonomous agents.

A solid baseline for agent-facing errors is RFC 7807 (Problem Details for HTTP APIs, since obsoleted by RFC 9457, which keeps the same core members), augmented with agent-specific fields such as retry hints and a stable error-code taxonomy.

RFC 7807 Problem Details Format

RFC 7807 defines a standard JSON structure for API errors. It includes a type URI for machine identification, a human-readable title and detail, the HTTP status code, and an optional instance URI pointing to the specific occurrence.
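
As a rough illustration of that structure, the five standard members can be assembled as a plain dictionary and serialized to JSON. The `build_problem` helper, the example.com URL, and the request ID below are hypothetical placeholders chosen for this sketch, not part of the RFC.

```python
import json


def build_problem(type_uri, title, status, detail, instance=None):
    """Assemble the standard Problem Details members (hypothetical helper)."""
    problem = {
        "type": type_uri,      # URI identifying the problem type
        "title": title,        # short, human-readable summary
        "status": status,      # HTTP status code, repeated in the body
        "detail": detail,      # explanation specific to this occurrence
    }
    # "instance" is optional, so only include it when one is supplied.
    if instance is not None:
        problem["instance"] = instance
    return problem


example = build_problem(
    type_uri="https://api.example.com/errors/rate-limit-exceeded",
    title="Rate Limit Exceeded",
    status=429,
    detail="You have exceeded 100 requests per minute.",
    instance="/v1/chat/completions/requests/req-123",  # made-up identifier
)
print(json.dumps(example, indent=2))
```

An agent can branch on `type` (or on an extension field such as `error_code`) without parsing the human-readable `title` or `detail`.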

flowchart LR
    CLIENT(["Client SDK"])
    GW["API Gateway<br/>auth plus rate limit"]
    APP["FastAPI app<br/>handlers and DI"]
    VAL["Pydantic validation"]
    SVC["Service layer<br/>business logic"]
    DB[(Database)]
    QUEUE[(Background queue)]
    OBS[(Tracing)]
    CLIENT --> GW --> APP --> VAL --> SVC
    SVC --> DB
    SVC --> QUEUE
    SVC --> OBS
    SVC --> CLIENT
    style GW fill:#4f46e5,stroke:#4338ca,color:#fff
    style APP fill:#f59e0b,stroke:#d97706,color:#1f2937
    style DB fill:#ede9fe,stroke:#7c3aed,color:#1e1b4b

from fastapi import FastAPI, Request
from fastapi.responses import JSONResponse
from fastapi.exceptions import RequestValidationError
from pydantic import BaseModel

app = FastAPI()

class ProblemDetail(BaseModel):
    type: str
    title: str
    status: int
    detail: str
    instance: str | None = None
    # Agent-specific extensions
    error_code: str | None = None
    retryable: bool = False
    retry_after_seconds: int | None = None

def problem_response(
    status: int,
    error_type: str,
    title: str,
    detail: str,
    error_code: str | None = None,
    retryable: bool = False,
    retry_after: int | None = None,
    instance: str | None = None,
) -> JSONResponse:
    body = ProblemDetail(
        type=f"https://api.example.com/errors/{error_type}",
        title=title,
        status=status,
        detail=detail,
        instance=instance,
        error_code=error_code,
        retryable=retryable,
        retry_after_seconds=retry_after,
    )
    headers = {}
    if retry_after is not None:
        headers["Retry-After"] = str(retry_after)

    return JSONResponse(
        status_code=status,
        content=body.model_dump(exclude_none=True),
        media_type="application/problem+json",
        headers=headers,
    )

Error Taxonomy for AI Agent Services

Define a clear error taxonomy so agents can programmatically classify errors and decide on the appropriate recovery strategy.

class ErrorCodes:
    # Authentication & Authorization
    AUTH_TOKEN_EXPIRED = "auth.token_expired"
    AUTH_TOKEN_INVALID = "auth.token_invalid"
    AUTH_INSUFFICIENT_SCOPE = "auth.insufficient_scope"

    # Rate Limiting
    RATE_LIMIT_EXCEEDED = "rate_limit.exceeded"
    RATE_LIMIT_TOKENS = "rate_limit.token_budget_exceeded"

    # Model Errors
    MODEL_OVERLOADED = "model.overloaded"
    MODEL_NOT_FOUND = "model.not_found"
    MODEL_CONTEXT_LENGTH = "model.context_length_exceeded"

    # Validation
    VALIDATION_FAILED = "validation.failed"
    VALIDATION_CONTENT_FILTER = "validation.content_filter"

    # Resource Errors
    RESOURCE_NOT_FOUND = "resource.not_found"
    RESOURCE_CONFLICT = "resource.conflict"
    RESOURCE_QUOTA_EXCEEDED = "resource.quota_exceeded"

    # Internal
    INTERNAL_ERROR = "internal.error"
    INTERNAL_TIMEOUT = "internal.timeout"
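
One way an agent might use this taxonomy is to map the category prefix (the part before the dot) to a default recovery strategy, with overrides for specific codes. The sketch below assumes the dotted code format shown above; the `Recovery` enum, the `recovery_for` function, and the particular strategy assigned to each code are illustrative assumptions rather than a prescribed policy.

```python
from enum import Enum
from typing import Optional


class Recovery(Enum):
    REFRESH_CREDENTIALS = "refresh_credentials"
    BACK_OFF_AND_RETRY = "back_off_and_retry"
    FIX_REQUEST = "fix_request"
    GIVE_UP = "give_up"


# Specific codes take precedence over their category default (assumed policy).
CODE_OVERRIDES = {
    "auth.token_expired": Recovery.REFRESH_CREDENTIALS,
    "model.overloaded": Recovery.BACK_OFF_AND_RETRY,
    "model.context_length_exceeded": Recovery.FIX_REQUEST,
}

# Fallback strategy per category prefix (assumed policy).
CATEGORY_DEFAULTS = {
    "auth": Recovery.GIVE_UP,
    "rate_limit": Recovery.BACK_OFF_AND_RETRY,
    "model": Recovery.GIVE_UP,
    "validation": Recovery.FIX_REQUEST,
    "resource": Recovery.GIVE_UP,
    "internal": Recovery.BACK_OFF_AND_RETRY,
}


def recovery_for(error_code: Optional[str]) -> Recovery:
    """Pick a recovery strategy from a dotted error code such as 'auth.token_expired'."""
    if not error_code:
        return Recovery.GIVE_UP
    if error_code in CODE_OVERRIDES:
        return CODE_OVERRIDES[error_code]
    category = error_code.split(".", 1)[0]
    # Unknown categories fail closed rather than retrying blindly.
    return CATEGORY_DEFAULTS.get(category, Recovery.GIVE_UP)
```

Keeping the category in the code's prefix means an agent can still make a reasonable decision when the server introduces a new code inside an existing category.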

Applying the Error Pattern to Endpoints

Here is how these error responses look in practice across common failure scenarios in an AI agent API.

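# estimate_tokens, is_rate_limited, call_llm, and ModelOverloadedError are
# application-specific helpers assumed to be defined elsewhere.
@app.post("/v1/chat/completions")
async def chat_completions(request: dict):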
    model = request.get("model")
    messages = request.get("messages", [])

    if not model:
        return problem_response(
            status=422,
            error_type="validation-error",
            title="Validation Failed",
            detail="The 'model' field is required.",
            error_code=ErrorCodes.VALIDATION_FAILED,
        )

    token_count = estimate_tokens(messages)
    if token_count > 128000:
        return problem_response(
            status=400,
            error_type="context-length-exceeded",
            title="Context Length Exceeded",
            detail=(
                f"Request contains {token_count} tokens, "
                f"exceeding the model maximum of 128000."
            ),
            error_code=ErrorCodes.MODEL_CONTEXT_LENGTH,
        )

    if is_rate_limited(request):
        return problem_response(
            status=429,
            error_type="rate-limit-exceeded",
            title="Rate Limit Exceeded",
            detail="You have exceeded 100 requests per minute.",
            error_code=ErrorCodes.RATE_LIMIT_EXCEEDED,
            retryable=True,
            retry_after=30,
        )

    try:
        result = await call_llm(model, messages)
        return result
    except ModelOverloadedError:
        return problem_response(
            status=503,
            error_type="model-overloaded",
            title="Model Overloaded",
            detail="The model is currently at capacity. Please retry.",
            error_code=ErrorCodes.MODEL_OVERLOADED,
            retryable=True,
            retry_after=5,
        )

Global Exception Handlers

@app.exception_handler(RequestValidationError)
async def validation_exception_handler(
    request: Request, exc: RequestValidationError
):
    errors = exc.errors()
    detail_parts = []
    for err in errors:
        field = " -> ".join(str(loc) for loc in err["loc"])
        detail_parts.append(f"{field}: {err['msg']}")

    return problem_response(
        status=422,
        error_type="validation-error",
        title="Request Validation Failed",
        detail="; ".join(detail_parts),
        error_code=ErrorCodes.VALIDATION_FAILED,
    )

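import logging

@app.exception_handler(Exception)
async def generic_exception_handler(request: Request, exc: Exception):
    # Log the full exception, including its traceback, server-side only.
    # Passing exc explicitly does not rely on an active except block.
    logging.error("Unhandled exception", exc_info=exc)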

    return problem_response(
        status=500,
        error_type="internal-error",
        title="Internal Server Error",
        detail="An unexpected error occurred. Please retry or contact support.",
        error_code=ErrorCodes.INTERNAL_ERROR,
        retryable=True,
        retry_after=10,
    )

Client-Side Error Handling for Agents

On the agent side, the structured error format enables intelligent retry logic.

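import asyncio

import httpx

class AgentAPIError(Exception):
    """Raised for non-retryable errors or when retries are exhausted."""

    def __init__(self, code: str | None, detail: str | None, status: int):
        super().__init__(f"{status} {code}: {detail}")
        self.code = code
        self.detail = detail
        self.status = status

async def call_with_retry(url: str, body: dict, max_retries: int = 3):
    # Reuse one client (and its connection pool) across all attempts,
    # and make sure it is closed when we are done.
    async with httpx.AsyncClient() as client:
        for attempt in range(max_retries + 1):
            response = await client.post(url, json=body)

            if response.status_code < 400:
                return response.json()

            error = response.json()
            retryable = error.get("retryable", False)
            # Fall back to exponential backoff when the server gives no hint.
            retry_after = error.get("retry_after_seconds", 2 ** attempt)

            if not retryable or attempt == max_retries:
                raise AgentAPIError(
                    code=error.get("error_code"),
                    detail=error.get("detail"),
                    status=response.status_code,
                )

            await asyncio.sleep(retry_after)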

FAQ

Why use RFC 7807 instead of a custom error format?

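RFC 7807 is an IETF specification (since obsoleted by RFC 9457, which keeps the same core members), and many web frameworks, client libraries, and API gateways can produce or parse it. Adopting it means consumers need less custom parsing code than they would for a bespoke format. The application/problem+json media type signals to clients that the response follows a known structure. The specification explicitly allows extension members, so custom fields like retryable and error_code remain compliant, and clients that do not recognize them are expected to ignore them.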
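
To make the role of the media type concrete, here is a minimal sketch of a client-side parser that only treats a response as a problem document when the Content-Type matches, and that separates the standard members from extensions. The `parse_problem` function and its return shape are assumptions made for this example.

```python
import json

PROBLEM_MEDIA_TYPE = "application/problem+json"
STANDARD_MEMBERS = ("type", "title", "status", "detail", "instance")


def parse_problem(content_type: str, raw_body: str):
    """Return (standard, extensions) for a problem document, else None."""
    # Strip parameters such as "; charset=utf-8" before comparing.
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type != PROBLEM_MEDIA_TYPE:
        return None

    body = json.loads(raw_body)
    standard = {k: body[k] for k in STANDARD_MEMBERS if k in body}
    # Everything else is an extension member, e.g. retryable or error_code.
    extensions = {k: v for k, v in body.items() if k not in STANDARD_MEMBERS}
    return standard, extensions
```

A generic client can rely on the standard members alone, while an agent that knows about the extensions reads `retryable` and `error_code` from the second dictionary.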

How should AI agents decide whether to retry an error?

Agents should check the retryable field first. If true, use the retry_after_seconds value as the delay. If the field is absent, use HTTP status code heuristics: 429 (rate limit) and 503 (service unavailable) are generally retryable; 400, 401, 403, 404, and 422 are not. Always cap retries with a maximum attempt count and total timeout to prevent infinite retry loops.
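
The decision rule above can be sketched as a small pure function. This is one possible reading, not a definitive policy: the `retry_decision` name, the default limits, and the exponential fallback delay are assumptions, and `attempt` is taken to mean the number of retries already made.

```python
RETRYABLE_STATUSES = {429, 503}


def retry_decision(
    status: int,
    problem: dict,
    attempt: int,
    elapsed_seconds: float,
    max_retries: int = 3,
    max_total_seconds: float = 60.0,
):
    """Return (should_retry, delay_seconds) for a failed request."""
    # An explicit server hint wins; otherwise fall back to status heuristics.
    if "retryable" in problem:
        retryable = bool(problem["retryable"])
    else:
        retryable = status in RETRYABLE_STATUSES

    # Cap the number of retries.
    if not retryable or attempt >= max_retries:
        return False, 0.0

    # Prefer the server's delay; otherwise back off exponentially.
    hint = problem.get("retry_after_seconds")
    delay = float(hint) if hint is not None else float(2 ** attempt)

    # Cap total time spent so a long hint cannot stall the agent indefinitely.
    if elapsed_seconds + delay > max_total_seconds:
        return False, 0.0
    return True, delay
```

Because the function takes the elapsed time as an argument instead of reading a clock, it is straightforward to unit test and to reuse across sync and async clients.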

Should I include stack traces in error responses?

Never in production. Stack traces expose internal implementation details, file paths, library versions, and potentially sensitive data. Log the full stack trace server-side with a correlation ID, and include that correlation ID in the instance field of the Problem Details response so your support team can locate the relevant logs.
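
A minimal sketch of that pattern, using only the standard library: the traceback goes to the server log tagged with a freshly generated correlation ID, and the response body carries only the ID. The `internal_error_problem` function, the logger name, and the `urn:uuid:` form of the instance value are assumptions for illustration.

```python
import logging
import uuid

logger = logging.getLogger("api.errors")  # hypothetical logger name


def internal_error_problem(exc: BaseException) -> dict:
    """Log the exception server-side and build a sanitized problem body."""
    correlation_id = str(uuid.uuid4())

    # The full traceback stays in the logs, tagged with the correlation ID.
    logger.error(
        "Unhandled exception [correlation_id=%s]", correlation_id, exc_info=exc
    )

    # The client sees a generic message plus the ID, never the exception text.
    return {
        "type": "https://api.example.com/errors/internal-error",
        "title": "Internal Server Error",
        "status": 500,
        "detail": "An unexpected error occurred.",
        "instance": f"urn:uuid:{correlation_id}",
        "error_code": "internal.error",
    }
```

When a user or agent reports the `instance` value, support staff can search the logs for the same ID to find the matching traceback.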
