API Error Design for AI Agent Services: Problem Details, Error Codes, and Retry Hints

Design machine-readable API error responses for AI agents using RFC 7807 Problem Details, structured error codes, and retry hints. Build error responses that agents can parse and act on programmatically.

Why Error Design Matters More for AI Agents

When a human encounters an API error, they read the message, understand the context, and decide what to do. An AI agent has none of that intuition. It needs structured, machine-readable error responses that tell it exactly what went wrong, whether to retry, and how long to wait. Poor error design turns every transient failure into a hard failure for autonomous agents.

A strong error format for AI agents builds on RFC 7807 (Problem Details for HTTP APIs, since obsoleted by RFC 9457, which keeps the same core fields), augmented with agent-specific fields such as retry hints and an error taxonomy.

RFC 7807 Problem Details Format

RFC 7807 defines a standard JSON structure for API errors: a type URI that identifies the problem class for machines, a human-readable title and detail, the HTTP status code, and an instance URI that identifies the specific occurrence. All five members are optional in the spec, and type defaults to "about:blank" when omitted.
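
As an illustration, a rate-limit error in this format might serialize as shown in the sketch below. The field values, the `api.example.com` type URI, and the `/requests/abc123` instance path are made up for the example. `error_code`, `retryable`, and `retry_after_seconds` are the agent-specific extension fields this article layers on top of the standard members, and `extension_members` is a hypothetical helper, not part of any library.

```python
import json

# Hypothetical Problem Details body for a rate-limited request.
# The first five keys are the standard RFC 7807 members; the last three
# are extension members (names assumed for this article).
problem = {
    "type": "https://api.example.com/errors/rate-limit-exceeded",
    "title": "Rate Limit Exceeded",
    "status": 429,
    "detail": "You have exceeded 100 requests per minute.",
    "instance": "/requests/abc123",
    "error_code": "rate_limit.exceeded",
    "retryable": True,
    "retry_after_seconds": 30,
}

STANDARD_MEMBERS = {"type", "title", "status", "detail", "instance"}


def extension_members(body: dict) -> dict:
    """Return only the non-standard (extension) members of a problem body."""
    return {k: v for k, v in body.items() if k not in STANDARD_MEMBERS}


# On the wire this body would be sent with
# Content-Type: application/problem+json
print(json.dumps(problem, indent=2))
print(extension_members(problem))
```

Keeping the extension fields at the top level, next to the standard members, matches how the spec describes extensions and lets an agent read everything from one flat object.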

flowchart LR
    CLIENT(["Client SDK"])
    GW["API Gateway<br/>auth plus rate limit"]
    APP["FastAPI app<br/>handlers and DI"]
    VAL["Pydantic validation"]
    SVC["Service layer<br/>business logic"]
    DB[(Database)]
    QUEUE[(Background queue)]
    OBS[(Tracing)]
    CLIENT --> GW --> APP --> VAL --> SVC
    SVC --> DB
    SVC --> QUEUE
    SVC --> OBS
    SVC --> CLIENT
    style GW fill:#4f46e5,stroke:#4338ca,color:#fff
    style APP fill:#f59e0b,stroke:#d97706,color:#1f2937
    style DB fill:#ede9fe,stroke:#7c3aed,color:#1e1b4b
from fastapi import FastAPI, Request
from fastapi.responses import JSONResponse
from fastapi.exceptions import RequestValidationError
from pydantic import BaseModel

app = FastAPI()

class ProblemDetail(BaseModel):
    type: str
    title: str
    status: int
    detail: str
    instance: str | None = None
    # Agent-specific extensions
    error_code: str | None = None
    retryable: bool = False
    retry_after_seconds: int | None = None

def problem_response(
    status: int,
    error_type: str,
    title: str,
    detail: str,
    error_code: str | None = None,
    retryable: bool = False,
    retry_after: int | None = None,
    instance: str | None = None,
) -> JSONResponse:
    body = ProblemDetail(
        type=f"https://api.example.com/errors/{error_type}",
        title=title,
        status=status,
        detail=detail,
        instance=instance,
        error_code=error_code,
        retryable=retryable,
        retry_after_seconds=retry_after,
    )
    headers = {}
    if retry_after is not None:
        headers["Retry-After"] = str(retry_after)

    return JSONResponse(
        status_code=status,
        content=body.model_dump(exclude_none=True),
        media_type="application/problem+json",
        headers=headers,
    )

Error Taxonomy for AI Agent Services

Define a clear error taxonomy so agents can programmatically classify errors and decide on the appropriate recovery strategy.

class ErrorCodes:
    # Authentication & Authorization
    AUTH_TOKEN_EXPIRED = "auth.token_expired"
    AUTH_TOKEN_INVALID = "auth.token_invalid"
    AUTH_INSUFFICIENT_SCOPE = "auth.insufficient_scope"

    # Rate Limiting
    RATE_LIMIT_EXCEEDED = "rate_limit.exceeded"
    RATE_LIMIT_TOKENS = "rate_limit.token_budget_exceeded"

    # Model Errors
    MODEL_OVERLOADED = "model.overloaded"
    MODEL_NOT_FOUND = "model.not_found"
    MODEL_CONTEXT_LENGTH = "model.context_length_exceeded"

    # Validation
    VALIDATION_FAILED = "validation.failed"
    VALIDATION_CONTENT_FILTER = "validation.content_filter"

    # Resource Errors
    RESOURCE_NOT_FOUND = "resource.not_found"
    RESOURCE_CONFLICT = "resource.conflict"
    RESOURCE_QUOTA_EXCEEDED = "resource.quota_exceeded"

    # Internal
    INTERNAL_ERROR = "internal.error"
    INTERNAL_TIMEOUT = "internal.timeout"
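
One way an agent might turn these dotted codes into recovery decisions is to dispatch on the category prefix, with overrides for specific codes. The sketch below is only one possible policy: the `Recovery` enum, both mapping tables, and `recovery_for` are hypothetical names, and the choice of action per category is an assumption rather than something the taxonomy dictates.

```python
from enum import Enum
from typing import Optional


class Recovery(Enum):
    REFRESH_CREDENTIALS = "refresh_credentials"
    BACK_OFF_AND_RETRY = "back_off_and_retry"
    FIX_REQUEST = "fix_request"
    GIVE_UP = "give_up"


# Assumed policy: exact codes take precedence over their category default.
CODE_OVERRIDES = {
    "auth.token_expired": Recovery.REFRESH_CREDENTIALS,
    "model.overloaded": Recovery.BACK_OFF_AND_RETRY,
}

# Assumed policy: fallback action for each category prefix.
CATEGORY_DEFAULTS = {
    "auth": Recovery.GIVE_UP,
    "rate_limit": Recovery.BACK_OFF_AND_RETRY,
    "model": Recovery.FIX_REQUEST,
    "validation": Recovery.FIX_REQUEST,
    "resource": Recovery.GIVE_UP,
    "internal": Recovery.BACK_OFF_AND_RETRY,
}


def recovery_for(error_code: Optional[str]) -> Recovery:
    """Map an error code such as 'rate_limit.exceeded' to a recovery action."""
    if not error_code:
        return Recovery.GIVE_UP
    if error_code in CODE_OVERRIDES:
        return CODE_OVERRIDES[error_code]
    # The category is everything before the first dot.
    category = error_code.split(".", 1)[0]
    return CATEGORY_DEFAULTS.get(category, Recovery.GIVE_UP)


print(recovery_for("auth.token_expired"))
print(recovery_for("model.context_length_exceeded"))
```

Because unknown codes fall back to their category and unknown categories fall back to giving up, the server can add new codes within an existing category without breaking older agents.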

Applying the Error Pattern to Endpoints

Here is how these error responses look in practice across common failure scenarios in an AI agent API.

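# estimate_tokens, is_rate_limited, call_llm, and ModelOverloadedError are
# assumed to be defined elsewhere in the service.
@app.post("/v1/chat/completions")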
async def chat_completions(request: dict):
    model = request.get("model")
    messages = request.get("messages", [])

    if not model:
        return problem_response(
            status=422,
            error_type="validation-error",
            title="Validation Failed",
            detail="The 'model' field is required.",
            error_code=ErrorCodes.VALIDATION_FAILED,
        )

    token_count = estimate_tokens(messages)
    if token_count > 128000:
        return problem_response(
            status=400,
            error_type="context-length-exceeded",
            title="Context Length Exceeded",
            detail=(
                f"Request contains {token_count} tokens, "
                f"exceeding the model maximum of 128000."
            ),
            error_code=ErrorCodes.MODEL_CONTEXT_LENGTH,
        )

    if is_rate_limited(request):
        return problem_response(
            status=429,
            error_type="rate-limit-exceeded",
            title="Rate Limit Exceeded",
            detail="You have exceeded 100 requests per minute.",
            error_code=ErrorCodes.RATE_LIMIT_EXCEEDED,
            retryable=True,
            retry_after=30,
        )

    try:
        result = await call_llm(model, messages)
        return result
    except ModelOverloadedError:
        return problem_response(
            status=503,
            error_type="model-overloaded",
            title="Model Overloaded",
            detail="The model is currently at capacity. Please retry.",
            error_code=ErrorCodes.MODEL_OVERLOADED,
            retryable=True,
            retry_after=5,
        )

Global Exception Handlers

Register global exception handlers to ensure every error follows the Problem Details format, even unhandled exceptions.

@app.exception_handler(RequestValidationError)
async def validation_exception_handler(
    request: Request, exc: RequestValidationError
):
    errors = exc.errors()
    detail_parts = []
    for err in errors:
        field = " -> ".join(str(loc) for loc in err["loc"])
        detail_parts.append(f"{field}: {err['msg']}")

    return problem_response(
        status=422,
        error_type="validation-error",
        title="Request Validation Failed",
        detail="; ".join(detail_parts),
        error_code=ErrorCodes.VALIDATION_FAILED,
    )

@app.exception_handler(Exception)
async def generic_exception_handler(request: Request, exc: Exception):
    # Log the full exception internally
    import logging
    logging.exception("Unhandled exception")

    return problem_response(
        status=500,
        error_type="internal-error",
        title="Internal Server Error",
        detail="An unexpected error occurred. Please retry or contact support.",
        error_code=ErrorCodes.INTERNAL_ERROR,
        retryable=True,
        retry_after=10,
    )

Client-Side Error Handling for Agents

On the agent side, the structured error format enables intelligent retry logic.

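import asyncio

import httpx

class AgentAPIError(Exception):
    """Raised when an error is not retryable or the retries are exhausted."""

    def __init__(self, code: str | None, detail: str | None, status: int):
        super().__init__(f"{status} {code}: {detail}")
        self.code = code
        self.detail = detail
        self.status = status

async def call_with_retry(url: str, body: dict, max_retries: int = 3):
    # Reuse one client (and its connection pool) across all attempts,
    # and close it when the function exits
    async with httpx.AsyncClient() as client:
        for attempt in range(max_retries + 1):
            response = await client.post(url, json=body)

            if response.status_code < 400:
                return response.json()

            error = response.json()
            retryable = error.get("retryable", False)
            # Fall back to exponential backoff when the server gives no hint
            retry_after = error.get("retry_after_seconds", 2 ** attempt)

            if not retryable or attempt == max_retries:
                raise AgentAPIError(
                    code=error.get("error_code"),
                    detail=error.get("detail"),
                    status=response.status_code,
                )

            await asyncio.sleep(retry_after)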
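
The server helper earlier in this article also sets a Retry-After header, so a client could fall back to it when the body carries no retry_after_seconds. HTTP allows that header to be either a number of seconds or an HTTP date, and the sketch below handles both using only the standard library. `parse_retry_after` is a hypothetical helper name, and the injectable `now` parameter is an assumption added to make the function easy to test.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def parse_retry_after(
    value: Optional[str], now: Optional[datetime] = None
) -> Optional[float]:
    """Convert a Retry-After header value to a non-negative delay in seconds.

    Returns None when the header is missing or cannot be parsed.
    """
    if value is None:
        return None
    value = value.strip()
    # Form 1: delay in whole seconds, e.g. "30"
    if value.isdigit():
        return float(value)
    # Form 2: HTTP date, e.g. "Wed, 21 Oct 2015 07:28:00 GMT"
    try:
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return None
    if when.tzinfo is None:
        # Treat a date without an offset as UTC
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    # A date in the past means "retry now", never a negative sleep
    return max(0.0, (when - now).total_seconds())


print(parse_retry_after("30"))
```

In a retry loop, the result could be used as the delay whenever the JSON body has no hint, with exponential backoff as the last resort when both are missing.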

FAQ

Why use RFC 7807 instead of a custom error format?

RFC 7807 is an IETF specification (since updated by RFC 9457) that many web frameworks and API tools already support, so your errors are more likely to work with existing tooling than a custom format would be. The application/problem+json media type signals to clients that the response follows a known structure. You can extend it with custom fields like retryable and error_code without breaking the standard, because clients are expected to ignore extension members they do not recognize.
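
To make the media type and extension points concrete, here is a minimal client-side sketch that accepts a response only when it is labeled application/problem+json, fills in the spec's default for a missing type, and collects any unrecognized members instead of failing on them. The `Problem` dataclass and `parse_problem` function are hypothetical names for illustration, not part of any library.

```python
import json
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Problem:
    type: str = "about:blank"  # default the spec assigns when type is absent
    title: Optional[str] = None
    status: Optional[int] = None
    detail: Optional[str] = None
    instance: Optional[str] = None
    extensions: dict = field(default_factory=dict)


_STANDARD = ("type", "title", "status", "detail", "instance")


def parse_problem(content_type: str, raw: bytes) -> Optional[Problem]:
    """Parse a Problem Details body, or return None if it is not one."""
    # Drop parameters such as "; charset=utf-8" before comparing
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type != "application/problem+json":
        return None
    try:
        data = json.loads(raw)
    except ValueError:  # covers invalid JSON and undecodable bytes
        return None
    if not isinstance(data, dict):
        return None
    known = {k: data[k] for k in _STANDARD if k in data}
    # Unknown members are kept aside rather than rejected
    extras = {k: v for k, v in data.items() if k not in _STANDARD}
    return Problem(**known, extensions=extras)


p = parse_problem(
    "application/problem+json; charset=utf-8",
    b'{"title": "Rate Limit Exceeded", "status": 429, "retryable": true}',
)
print(p)
```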

How should AI agents decide whether to retry an error?

Agents should check the retryable field first. If true, use the retry_after_seconds value as the delay. If the field is absent, use HTTP status code heuristics: 429 (rate limit) and 503 (service unavailable) are generally retryable; 400, 401, 403, 404, and 422 are not. Always cap retries with a maximum attempt count and total timeout to prevent infinite retry loops.
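
The decision rule above can be sketched as a small function plus a budget object that enforces both caps. The status set mirrors the heuristics in this answer. `should_retry`, `RetryBudget`, and the default limits of 4 attempts and 60 seconds are assumptions chosen for illustration.

```python
import time

# Statuses treated as retryable when the body gives no explicit hint
RETRYABLE_STATUSES = {429, 503}


def should_retry(status: int, body: dict) -> bool:
    """Prefer the explicit 'retryable' field; fall back to status heuristics."""
    if "retryable" in body:
        return bool(body["retryable"])
    return status in RETRYABLE_STATUSES


class RetryBudget:
    """Caps both the number of attempts and the total elapsed time."""

    def __init__(self, max_attempts=4, max_total_seconds=60.0, clock=time.monotonic):
        self.max_attempts = max_attempts
        self.max_total_seconds = max_total_seconds
        self._clock = clock
        self._started = clock()
        self.attempts = 0

    def allow(self, next_delay: float) -> bool:
        """Record one finished attempt and report whether another one fits."""
        self.attempts += 1
        if self.attempts >= self.max_attempts:
            return False
        elapsed = self._clock() - self._started
        # Refuse if sleeping for next_delay would exceed the time budget
        return elapsed + next_delay <= self.max_total_seconds


print(should_retry(429, {}))
print(should_retry(503, {"retryable": False}))
```

An agent would call `allow(delay)` after each failed attempt and stop as soon as either `should_retry` or `allow` returns False.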

Should I include stack traces in error responses?

Never in production. Stack traces expose internal implementation details, file paths, library versions, and potentially sensitive data. Log the full stack trace server-side with a correlation ID, and include that correlation ID in the instance field of the Problem Details response so your support team can locate the relevant logs.
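
A minimal sketch of that pattern, using only the standard library, could look like the following. The function name `internal_error_problem`, the logger name, and the choice of a `urn:uuid:` value for the instance field are assumptions. The traceback goes to the server log, and only the correlation ID reaches the client.

```python
import logging
import uuid

logger = logging.getLogger("api.errors")


def internal_error_problem(exc: BaseException) -> dict:
    """Log the exception with a correlation ID and build a safe error body."""
    correlation_id = str(uuid.uuid4())
    # The full traceback stays in the server log, tagged with the ID
    logger.error(
        "Unhandled exception [correlation_id=%s]", correlation_id, exc_info=exc
    )
    return {
        "type": "https://api.example.com/errors/internal-error",
        "title": "Internal Server Error",
        "status": 500,
        # Generic wording only: nothing from the exception is echoed back
        "detail": "An unexpected error occurred.",
        # The urn:uuid form is one possible instance format (assumption)
        "instance": f"urn:uuid:{correlation_id}",
        "error_code": "internal.error",
    }


body = internal_error_problem(ValueError("failed reading /srv/app/secrets.yaml"))
print(body["instance"])
```

When a user reports the instance value, support can search the logs for the same ID and find the matching traceback.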

