
Secure API Gateway for AI Agents: Kong, Traefik, and Custom Gateway Patterns

Set up a secure API gateway for AI agent systems using Kong, Traefik, and custom FastAPI patterns. Covers authentication plugins, rate limiting, request transformation, and routing strategies.

Why AI Agent Platforms Need an API Gateway

An API gateway is a single entry point that sits in front of your AI agent services and handles cross-cutting concerns: authentication, rate limiting, request routing, logging, and protocol translation. Without a gateway, every agent service must independently implement these concerns, leading to inconsistency and duplicated security logic.

For AI agent platforms specifically, a gateway provides three critical capabilities: it enforces rate limits to prevent a single tenant from exhausting GPU resources, it routes requests to different agent versions for A/B testing, and it transforms requests between the public API format and the internal service format.

Gateway Architecture for Multi-Agent Systems

A typical architecture places the gateway between the public internet and your internal agent services:

flowchart LR
    CLIENT(["Client SDK"])
    GW["API Gateway<br/>TLS, auth, rate limit"]
    TRIAGE["Triage Agent"]
    RESEARCH["Research Agent"]
    TOOLS["Tool Executor"]
    CONV["Conversation Service"]
    BILL["Billing Service"]
    CLIENT --> GW
    GW --> TRIAGE --> RESEARCH
    GW --> TOOLS
    GW --> CONV
    GW --> BILL
    style GW fill:#4f46e5,stroke:#4338ca,color:#fff

The gateway handles TLS termination, authentication, rate limiting, and routing. Internal services communicate via mTLS or service tokens as discussed in previous posts.


Kong Gateway Configuration

Kong is a widely deployed API gateway with a rich plugin ecosystem. Configure it for an AI agent platform using its declarative YAML format:

# kong.yml
_format_version: "3.0"

services:
  - name: agent-api
    url: http://agent-service:8000
    routes:
      - name: agent-routes
        paths:
          - /api/agents
        strip_path: false
    plugins:
      - name: jwt
        config:
          claims_to_verify:
            - exp
          header_names:
            - Authorization
      - name: rate-limiting
        config:
          minute: 60
          hour: 1000
          policy: redis
          redis_host: redis
          redis_port: 6379
      - name: correlation-id
        config:
          # request-transformer templates cannot call functions such as
          # uuid(); the correlation-id plugin generates per-request IDs
          header_name: X-Gateway-Request-Id
          generator: uuid
          echo_downstream: true
      - name: cors
        config:
          origins:
            - "https://app.example.com"
          methods:
            - GET
            - POST
            - PUT
            - DELETE
          headers:
            - Authorization
            - Content-Type
            - X-Session-Id
          max_age: 3600
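Kong's jwt plugin verifies tokens whose iss claim matches a consumer's key. For testing against this config, a minimal HS256 token can be minted with just the standard library — a sketch, where the consumer key "agent-sdk" and the secret are hypothetical placeholders you would provision on the Kong consumer:

```python
import base64
import hashlib
import hmac
import json
import time

def b64url(data: bytes) -> str:
    # JWT uses unpadded base64url segments
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def mint_jwt(key: str, secret: str, ttl: int = 300) -> str:
    # iss must equal the Kong consumer's jwt credential key
    header = b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = b64url(
        json.dumps({"iss": key, "exp": int(time.time()) + ttl}).encode()
    )
    signing_input = f"{header}.{payload}".encode()
    sig = b64url(
        hmac.new(secret.encode(), signing_input, hashlib.sha256).digest()
    )
    return f"{header}.{payload}.{sig}"

token = mint_jwt("agent-sdk", "s3cret")
```

Send the result as `Authorization: Bearer <token>`; the claims_to_verify entry above ensures expired tokens are rejected at the gateway.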

Traefik Configuration for Kubernetes

Traefik integrates natively with Kubernetes through IngressRoute custom resources, making it a natural choice for agent platforms running on K8s:

# traefik-ingress.yaml
apiVersion: traefik.io/v1alpha1
kind: IngressRoute
metadata:
  name: agent-api
  namespace: ai-agents
spec:
  entryPoints:
    - websecure
  routes:
    - match: Host(`api.agents.example.com`) && PathPrefix(`/api/agents`)
      kind: Rule
      services:
        - name: agent-service
          port: 8000
      middlewares:
        - name: agent-auth
        - name: agent-rate-limit
        - name: agent-headers
  tls:
    certResolver: letsencrypt
---
apiVersion: traefik.io/v1alpha1
kind: Middleware
metadata:
  name: agent-rate-limit
  namespace: ai-agents
spec:
  rateLimit:
    average: 60
    burst: 20
    period: 1m
    sourceCriterion:
      requestHeaderName: X-API-Key
---
apiVersion: traefik.io/v1alpha1
kind: Middleware
metadata:
  name: agent-headers
  namespace: ai-agents
spec:
  headers:
    customRequestHeaders:
      X-Gateway: "traefik"
    customResponseHeaders:
      X-Content-Type-Options: "nosniff"
      X-Frame-Options: "DENY"
      Strict-Transport-Security: "max-age=31536000; includeSubDomains"
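The route above also references an agent-auth middleware that is not shown. A common way to implement it is Traefik's forwardAuth middleware, which delegates each request to a verification service — a sketch, where auth-service is a hypothetical internal verifier in the same namespace:

```yaml
apiVersion: traefik.io/v1alpha1
kind: Middleware
metadata:
  name: agent-auth
  namespace: ai-agents
spec:
  forwardAuth:
    # auth-service returns 2xx to allow the request; the listed
    # response headers are copied onto the forwarded request
    address: http://auth-service.ai-agents.svc:9000/verify
    authResponseHeaders:
      - X-Org-Id
      - X-User-Id
```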

Building a Custom FastAPI Gateway

For full control, build a lightweight gateway directly in FastAPI. This is ideal when your routing logic depends on request content (like routing to different agent versions based on the model parameter):

# gateway/main.py
import time
import uuid
import httpx
from fastapi import FastAPI, Request, HTTPException, Depends
from fastapi.responses import StreamingResponse

app = FastAPI(title="Agent API Gateway")

# Service registry
SERVICES = {
    "agents": "http://agent-service:8000",
    "tools": "http://tool-service:8001",
    "conversations": "http://conversation-service:8002",
}

@app.middleware("http")
async def gateway_middleware(request: Request, call_next):
    # Generate a request ID and stash it on request.state so route
    # handlers can forward it to downstream services
    request_id = str(uuid.uuid4())
    request.state.request_id = request_id
    start_time = time.time()

    response = await call_next(request)

    # Add response headers
    duration_ms = (time.time() - start_time) * 1000
    response.headers["X-Request-Id"] = request_id
    response.headers["X-Response-Time-Ms"] = f"{duration_ms:.2f}"
    return response

Content-Based Routing

Route requests to different backend services based on the request body. This is useful for directing agent execution requests to specialized model servers:

@app.post("/api/agents/execute")
async def route_agent_execution(
    request: Request,
    user: TokenPayload = Depends(get_current_user),
):
    body = await request.json()
    model = body.get("model", "default")

    # Route to different backends based on model
    routing_table = {
        "gpt-4": "http://openai-agent-service:8000",
        "claude-3": "http://anthropic-agent-service:8000",
        "local-llama": "http://local-agent-service:8000",
        "default": SERVICES["agents"],
    }

    target_url = routing_table.get(model, routing_table["default"])

    # Build forwarded headers; omit Authorization when absent, since
    # httpx rejects None header values
    headers = {"X-Org-Id": user.org_id, "X-User-Id": user.sub}
    if auth := request.headers.get("Authorization"):
        headers["Authorization"] = auth

    async with httpx.AsyncClient() as client:
        response = await client.post(
            f"{target_url}/api/agents/execute",
            json=body,
            headers=headers,
            timeout=120.0,
        )

    return response.json()

Gateway-Level Rate Limiting with Redis

Implement tiered rate limiting based on the user's subscription plan:

import redis.asyncio as redis

redis_client = redis.from_url("redis://redis:6379/0")

PLAN_LIMITS = {
    "free": {"rpm": 10, "rpd": 100},
    "pro": {"rpm": 60, "rpd": 5000},
    "enterprise": {"rpm": 300, "rpd": 50000},
}

async def check_rate_limit(user: TokenPayload = Depends(get_current_user)):
    plan = await get_user_plan(user.sub)
    limits = PLAN_LIMITS.get(plan, PLAN_LIMITS["free"])

    minute_key = f"rl:{user.sub}:minute:{int(time.time()) // 60}"
    day_key = f"rl:{user.sub}:day:{int(time.time()) // 86400}"

    pipe = redis_client.pipeline()
    pipe.incr(minute_key)
    pipe.expire(minute_key, 60)
    pipe.incr(day_key)
    pipe.expire(day_key, 86400)
    results = await pipe.execute()

    minute_count = results[0]
    day_count = results[2]

    if minute_count > limits["rpm"]:
        raise HTTPException(
            status_code=429,
            detail="Rate limit exceeded (per minute)",
            headers={"Retry-After": "60"},
        )
    if day_count > limits["rpd"]:
        raise HTTPException(
            status_code=429,
            detail="Daily rate limit exceeded",
            headers={"Retry-After": "3600"},
        )
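The keys above implement fixed-window counting: integer division buckets every timestamp in the same minute (or day) onto one counter. The arithmetic in isolation:

```python
def window_keys(user_id: str, ts: int) -> tuple[str, str]:
    # Same key shapes as check_rate_limit: one bucket per minute,
    # one per day, derived by integer division of the unix timestamp
    return (
        f"rl:{user_id}:minute:{ts // 60}",
        f"rl:{user_id}:day:{ts // 86400}",
    )
```

Note that a fixed window permits up to twice the per-minute limit across a window boundary; a sliding-window variant avoids that burst at the cost of more Redis bookkeeping.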

FAQ

Should I use Kong, Traefik, or a custom gateway?

Use Kong if you need a mature plugin ecosystem with built-in support for JWT, OAuth2, OIDC, and advanced rate limiting out of the box. Use Traefik if you are on Kubernetes and want auto-discovery of services through ingress annotations. Build a custom FastAPI gateway when you need content-based routing, complex request transformation, or business logic in the gateway layer. Many teams start with Traefik for basic routing and add a thin FastAPI gateway behind it for application-specific logic.

How do I handle streaming responses through a gateway?

AI agent responses often stream via SSE (Server-Sent Events). Your gateway must proxy the response as a stream without buffering the entire body. In a custom FastAPI gateway, use httpx.AsyncClient.stream() and return a StreamingResponse. In Kong and Traefik, disable response buffering for streaming endpoints. Test latency carefully — gateways that buffer before forwarding add significant time-to-first-token latency.

How should I version my AI agent API through the gateway?

Use URL path versioning (/v1/agents, /v2/agents) routed to different backend services. The gateway maintains a routing table that maps version prefixes to the appropriate service version. Support a Sunset response header on deprecated versions to give clients advance notice. Allow enterprise customers to pin to specific versions while gradually migrating the default version for new users.
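The version routing table described above reduces to a prefix lookup. A sketch with hypothetical v1/v2 backends:

```python
VERSION_BACKENDS = {
    "/v1/agents": "http://agent-service-v1:8000",
    "/v2/agents": "http://agent-service-v2:8000",
}
DEFAULT_VERSION = "/v2/agents"

def route_version(path: str) -> str:
    # Versioned paths go to their pinned backend; unversioned
    # paths fall through to the current default version
    for prefix, backend in VERSION_BACKENDS.items():
        if path.startswith(prefix):
            return backend
    return VERSION_BACKENDS[DEFAULT_VERSION]
```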


#APIGateway #Kong #Traefik #FastAPI #AIAgents #RateLimiting #AgenticAI #LearnAI #AIEngineering
