
Building an AI Agent SaaS Platform: Architecture Patterns

Design and build a multi-tenant AI agent SaaS platform with user isolation, API key management, token metering, billing integration, and scalable infrastructure using the OpenAI Agents SDK.

The AI Agent Platform Opportunity

As agentic AI matures, many teams are building platforms that let end users create and run AI agents without writing code. These platforms face unique architectural challenges: multi-tenancy, usage-based billing, token metering, agent isolation, and the need to support hundreds of concurrent agent runs without one tenant's workload degrading another's.

This post covers the architecture patterns for building a production AI agent SaaS platform on the OpenAI Agents SDK.

System Architecture Overview

A multi-tenant agent platform has five core layers:

flowchart LR
    INPUT(["User intent"])
    PARSE["Parse plus<br/>classify"]
    PLAN["Plan and tool<br/>selection"]
    AGENT["Agent loop<br/>LLM plus tools"]
    GUARD{"Guardrails<br/>and policy"}
    EXEC["Execute and<br/>verify result"]
    OBS[("Trace and metrics")]
    OUT(["Outcome plus<br/>next action"])
    INPUT --> PARSE --> PLAN --> AGENT --> GUARD
    GUARD -->|Pass| EXEC --> OUT
    GUARD -->|Fail| AGENT
    AGENT --> OBS
    style AGENT fill:#4f46e5,stroke:#4338ca,color:#fff
    style GUARD fill:#f59e0b,stroke:#d97706,color:#1f2937
    style OBS fill:#ede9fe,stroke:#7c3aed,color:#1e1b4b
    style OUT fill:#059669,stroke:#047857,color:#fff
  1. API Gateway — Authentication, rate limiting, request routing
  2. Tenant Management — User accounts, API keys, configuration
  3. Agent Runtime — Executes agent workflows with tenant isolation
  4. Metering and Billing — Tracks token usage, enforces limits, bills customers
  5. Storage — Agent definitions, conversation history, tool configurations
Client -> API Gateway -> Agent Runtime -> OpenAI API
                |              |
          Tenant DB      Metering Service
                |              |
          Agent Store    Billing System
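
The gateway layer above lists rate limiting as one of its jobs. As a minimal sketch of what per-tenant rate limiting could look like, here is an in-memory token bucket keyed by tenant ID. The class names, the `rate` and `capacity` parameters, and the single-process storage are assumptions for illustration; a multi-instance gateway would need a shared store instead.

```python
import time
from dataclasses import dataclass, field


@dataclass
class TokenBucket:
    """Allows `rate` requests per second, with bursts up to `capacity`."""
    rate: float
    capacity: float
    tokens: float = field(init=False)
    updated_at: float = field(init=False)

    def __post_init__(self) -> None:
        # Start full so a new tenant can burst immediately.
        self.tokens = self.capacity
        self.updated_at = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        elapsed = max(0.0, now - self.updated_at)
        # Refill in proportion to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated_at = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


class TenantRateLimiter:
    """Hypothetical helper: one bucket per tenant, so tenants do not share a budget."""

    def __init__(self, rate: float, capacity: float) -> None:
        self.rate = rate
        self.capacity = capacity
        self._buckets: dict[str, TokenBucket] = {}

    def allow(self, tenant_id: str) -> bool:
        bucket = self._buckets.get(tenant_id)
        if bucket is None:
            bucket = self._buckets[tenant_id] = TokenBucket(self.rate, self.capacity)
        return bucket.allow()
```

Because each tenant has its own bucket, a tenant that exhausts its burst does not affect requests from any other tenant.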

Database Schema

The core schema captures tenants, agents, API keys, and usage:

# models.py
from sqlalchemy import Column, String, Integer, Float, Boolean, DateTime, ForeignKey, Text, JSON
from sqlalchemy.orm import relationship, DeclarativeBase
from sqlalchemy.dialects.postgresql import UUID
import uuid
from datetime import datetime

class Base(DeclarativeBase):
    pass

class Tenant(Base):
    __tablename__ = "tenants"

    id = Column(UUID(as_uuid=True), primary_key=True, default=uuid.uuid4)
    name = Column(String(255), nullable=False)
    email = Column(String(255), unique=True, nullable=False)
    plan = Column(String(50), default="free")  # free, pro, enterprise
    monthly_token_limit = Column(Integer, default=1_000_000)
    is_active = Column(Boolean, default=True)
    created_at = Column(DateTime, default=datetime.utcnow)

    api_keys = relationship("APIKey", back_populates="tenant")
    agents = relationship("AgentConfig", back_populates="tenant")
    usage_records = relationship("UsageRecord", back_populates="tenant")

class APIKey(Base):
    __tablename__ = "api_keys"

    id = Column(UUID(as_uuid=True), primary_key=True, default=uuid.uuid4)
    tenant_id = Column(UUID(as_uuid=True), ForeignKey("tenants.id"), nullable=False)
    key_hash = Column(String(64), unique=True, nullable=False)
    key_prefix = Column(String(8), nullable=False)  # First 8 chars for identification
    name = Column(String(255), nullable=False)
    is_active = Column(Boolean, default=True)
    last_used_at = Column(DateTime, nullable=True)
    created_at = Column(DateTime, default=datetime.utcnow)

    tenant = relationship("Tenant", back_populates="api_keys")

class AgentConfig(Base):
    __tablename__ = "agent_configs"

    id = Column(UUID(as_uuid=True), primary_key=True, default=uuid.uuid4)
    tenant_id = Column(UUID(as_uuid=True), ForeignKey("tenants.id"), nullable=False)
    name = Column(String(255), nullable=False)
    model = Column(String(50), default="gpt-4.1")
    instructions = Column(Text, nullable=False)
    # Use a callable default so rows do not share one mutable list object
    tools = Column(JSON, default=list)
    handoffs = Column(JSON, default=list)
    is_published = Column(Boolean, default=False)
    created_at = Column(DateTime, default=datetime.utcnow)
    updated_at = Column(DateTime, default=datetime.utcnow, onupdate=datetime.utcnow)

    tenant = relationship("Tenant", back_populates="agents")

class UsageRecord(Base):
    __tablename__ = "usage_records"

    id = Column(UUID(as_uuid=True), primary_key=True, default=uuid.uuid4)
    tenant_id = Column(UUID(as_uuid=True), ForeignKey("tenants.id"), nullable=False)
    agent_id = Column(UUID(as_uuid=True), ForeignKey("agent_configs.id"), nullable=True)
    model = Column(String(50), nullable=False)
    input_tokens = Column(Integer, default=0)
    output_tokens = Column(Integer, default=0)
    total_tokens = Column(Integer, default=0)
    cost_usd = Column(Float, default=0.0)
    created_at = Column(DateTime, default=datetime.utcnow)

    tenant = relationship("Tenant", back_populates="usage_records")

API Key Authentication

Authenticate tenants using hashed API keys:

# auth.py
from datetime import datetime
import hashlib

from fastapi import Security, HTTPException
from fastapi.security import APIKeyHeader
from sqlalchemy import select
from sqlalchemy.orm import selectinload
from models import APIKey, Tenant
from database import get_session

api_key_header = APIKeyHeader(name="X-API-Key")

def hash_api_key(key: str) -> str:
    return hashlib.sha256(key.encode()).hexdigest()

async def get_current_tenant(api_key: str = Security(api_key_header)) -> Tenant:
    """Authenticate and return the tenant for the given API key."""
    key_hash = hash_api_key(api_key)

    async with get_session() as session:
        result = await session.execute(
            select(APIKey)
            .where(APIKey.key_hash == key_hash, APIKey.is_active == True)
            .options(selectinload(APIKey.tenant))
        )
        api_key_record = result.scalar_one_or_none()

        if not api_key_record:
            raise HTTPException(status_code=401, detail="Invalid API key")

        tenant = api_key_record.tenant
        if not tenant.is_active:
            raise HTTPException(status_code=403, detail="Account suspended")

        # Update last used timestamp
        api_key_record.last_used_at = datetime.utcnow()
        await session.commit()

        return tenant
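
The schema stores only `key_hash` and `key_prefix`, so the raw key has to be generated, shown to the tenant once, and then discarded. A minimal sketch of that step, using only the standard library, might look like this. The `generate_api_key` name and the `sk_` prefix are assumptions, not something the platform code above defines.

```python
import hashlib
import secrets


def generate_api_key(prefix_len: int = 8) -> tuple[str, str, str]:
    """Return (raw_key, key_prefix, key_hash) for a new API key.

    Only key_prefix and key_hash should be persisted; raw_key is
    returned to the tenant once and never stored.
    """
    # "sk_" is an illustrative prefix; token_urlsafe(32) draws 32 random bytes.
    raw_key = "sk_" + secrets.token_urlsafe(32)
    key_prefix = raw_key[:prefix_len]
    # SHA-256 hex digest is 64 characters, matching the String(64) column.
    key_hash = hashlib.sha256(raw_key.encode()).hexdigest()
    return raw_key, key_prefix, key_hash
```

The hash is computed the same way as `hash_api_key`, so a key generated here would authenticate through `get_current_tenant` once its `APIKey` row is saved.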

Token Metering Service

Track every token consumed by every tenant in real time:

# metering.py
from datetime import datetime, timedelta
from sqlalchemy import select, func
from models import UsageRecord, Tenant
from database import get_session

# USD per 1M tokens. Illustrative values; check them against current provider pricing.
MODEL_PRICING = {
    "gpt-5": {"input": 10.00, "output": 30.00},
    "gpt-4.1": {"input": 2.00, "output": 8.00},
    "gpt-4.1-mini": {"input": 0.40, "output": 1.60},
}

class MeteringService:
    async def record_usage(
        self,
        tenant_id: str,
        agent_id: str | None,
        model: str,
        input_tokens: int,
        output_tokens: int,
    ) -> UsageRecord:
        """Record token usage for a tenant."""
        pricing = MODEL_PRICING.get(model, MODEL_PRICING["gpt-4.1"])
        cost = (
            (input_tokens / 1_000_000) * pricing["input"] +
            (output_tokens / 1_000_000) * pricing["output"]
        )

        async with get_session() as session:
            record = UsageRecord(
                tenant_id=tenant_id,
                agent_id=agent_id,
                model=model,
                input_tokens=input_tokens,
                output_tokens=output_tokens,
                total_tokens=input_tokens + output_tokens,
                cost_usd=cost,
            )
            session.add(record)
            await session.commit()
            return record

    async def get_monthly_usage(self, tenant_id: str) -> dict:
        """Get the tenant's usage for the current billing period."""
        # Zero the microseconds too, so the period starts exactly at midnight on the 1st
        month_start = datetime.utcnow().replace(
            day=1, hour=0, minute=0, second=0, microsecond=0
        )

        async with get_session() as session:
            result = await session.execute(
                select(
                    func.sum(UsageRecord.total_tokens).label("total_tokens"),
                    func.sum(UsageRecord.cost_usd).label("total_cost"),
                    func.count(UsageRecord.id).label("request_count"),
                )
                .where(
                    UsageRecord.tenant_id == tenant_id,
                    UsageRecord.created_at >= month_start,
                )
            )
            row = result.one()
            return {
                "total_tokens": row.total_tokens or 0,
                "total_cost": round(row.total_cost or 0.0, 4),
                "request_count": row.request_count or 0,
            }

    async def check_quota(self, tenant_id: str) -> bool:
        """Check if the tenant has remaining quota."""
        async with get_session() as session:
            tenant = await session.get(Tenant, tenant_id)
            if tenant is None:
                # Unknown tenant: deny the run instead of failing on attribute access
                return False
            usage = await self.get_monthly_usage(tenant_id)
            return usage["total_tokens"] < tenant.monthly_token_limit
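
To make the cost formula in `record_usage` concrete, here is the same arithmetic as a standalone function, applied to one run. The `token_cost` name and the token counts are made up for the example; the prices are the `gpt-4.1` row from `MODEL_PRICING`.

```python
# Prices in USD per 1M tokens, copied from the gpt-4.1 entry in MODEL_PRICING.
GPT_41_PRICING = {"input": 2.00, "output": 8.00}


def token_cost(input_tokens: int, output_tokens: int, pricing: dict) -> float:
    """Same formula as MeteringService.record_usage, without the database write."""
    return (
        (input_tokens / 1_000_000) * pricing["input"]
        + (output_tokens / 1_000_000) * pricing["output"]
    )


# 12,000 input tokens -> 0.012 * 2.00 = 0.024
#  3,000 output tokens -> 0.003 * 8.00 = 0.024
cost = token_cost(12_000, 3_000, GPT_41_PRICING)
print(f"{cost:.4f}")  # 0.0480
```

Output tokens cost four times as much as input tokens at these prices, which is why the metering service records the two counts separately rather than only the total.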

Tenant-Isolated Agent Runtime

Each agent run must be isolated to its tenant. The runtime builds agents dynamically from the tenant's configuration:


# runtime.py
from agents import Agent, Runner, function_tool
from models import AgentConfig, Tenant
from metering import MeteringService

metering = MeteringService()

class TenantAgentRuntime:
    """Runs agents in a tenant-isolated context."""

    def __init__(self, tenant: Tenant):
        self.tenant = tenant

    def build_agent(self, config: AgentConfig) -> Agent:
        """Build an SDK Agent from a tenant's agent configuration."""
        tools = self._build_tools(config.tools)

        return Agent(
            name=config.name,
            model=config.model,
            instructions=config.instructions,
            tools=tools,
        )

    async def run(self, config: AgentConfig, user_input: str) -> dict:
        """Execute an agent run with metering and quota enforcement."""
        # Check quota before running
        has_quota = await metering.check_quota(str(self.tenant.id))
        if not has_quota:
            return {"error": "Monthly token quota exceeded", "status": 429}

        agent = self.build_agent(config)

        result = await Runner.run(agent, input=user_input)

        # Record token usage
        total_input = 0
        total_output = 0
        for response in result.raw_responses:
            if response.usage:
                total_input += response.usage.input_tokens
                total_output += response.usage.output_tokens

        await metering.record_usage(
            tenant_id=str(self.tenant.id),
            agent_id=str(config.id),
            model=config.model,
            input_tokens=total_input,
            output_tokens=total_output,
        )

        return {
            "response": result.final_output,
            "tokens_used": total_input + total_output,
            "model": config.model,
        }

    def _build_tools(self, tool_configs: list[dict]) -> list:
        """Build tool functions from configuration."""
        tools = []
        for tool_config in tool_configs:
            if tool_config["type"] == "webhook":
                tools.append(self._create_webhook_tool(tool_config))
        return tools

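    def _create_webhook_tool(self, config: dict):
        """Create a function tool that calls a tenant's webhook."""
        import json
        import httpx
        from agents import FunctionTool

        async def invoke_webhook(ctx, args_json: str) -> str:
            # The SDK hands the model's arguments to the tool as a JSON string
            payload = json.loads(args_json) if args_json else {}
            async with httpx.AsyncClient(timeout=30.0) as client:
                resp = await client.post(
                    config["url"],
                    json=payload,
                    headers={"Authorization": f"Bearer {config.get('token', '')}"},
                )
                return resp.text

        # function_tool infers the schema from the Python signature, which cannot
        # express tenant-defined parameters for a **kwargs function. FunctionTool
        # accepts an explicit JSON schema instead. The "parameters" config key is
        # an assumed addition to the tenant's tool configuration.
        return FunctionTool(
            name=config["name"],
            description=config["description"],
            params_json_schema=config.get("parameters", {"type": "object", "properties": {}}),
            on_invoke_tool=invoke_webhook,
            strict_json_schema=False,
        )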
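
The runtime above isolates configuration and metering per tenant, but it does not by itself stop one tenant from occupying every worker with concurrent runs. One way to sketch that missing piece is a per-tenant semaphore around each run. The `TenantConcurrencyLimiter` class and its parameter are hypothetical, and the sketch assumes a single process; it is not part of the OpenAI Agents SDK.

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


class TenantConcurrencyLimiter:
    """Caps how many agent runs a single tenant can have in flight at once."""

    def __init__(self, max_concurrent_per_tenant: int) -> None:
        self._max = max_concurrent_per_tenant
        self._semaphores: dict[str, asyncio.Semaphore] = {}

    def _semaphore_for(self, tenant_id: str) -> asyncio.Semaphore:
        # Create the tenant's semaphore lazily on first use.
        sem = self._semaphores.get(tenant_id)
        if sem is None:
            sem = self._semaphores[tenant_id] = asyncio.Semaphore(self._max)
        return sem

    async def run(self, tenant_id: str, job: Callable[[], Awaitable[T]]) -> T:
        # Extra runs for the same tenant wait here; other tenants are unaffected.
        async with self._semaphore_for(tenant_id):
            return await job()
```

In the route handler this could wrap the existing call, for example `await limiter.run(str(tenant.id), lambda: runtime.run(config, request.message))`, with `limiter` being a module-level instance.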

API Endpoints

The platform exposes a clean REST API for tenants to manage and run their agents:

# routes.py
from uuid import UUID

from fastapi import APIRouter, Depends, HTTPException
from pydantic import BaseModel
from auth import get_current_tenant
from runtime import TenantAgentRuntime
from models import Tenant, AgentConfig
from database import get_session

router = APIRouter(prefix="/v1")

class RunRequest(BaseModel):
    # Typed as UUID so malformed IDs are rejected with a 422 before the database lookup
    agent_id: UUID
    message: str

class RunResponse(BaseModel):
    response: str
    tokens_used: int
    model: str

@router.post("/run", response_model=RunResponse)
async def run_agent(request: RunRequest, tenant: Tenant = Depends(get_current_tenant)):
    async with get_session() as session:
        config = await session.get(AgentConfig, request.agent_id)
        if not config or str(config.tenant_id) != str(tenant.id):
            raise HTTPException(status_code=404, detail="Agent not found")

    runtime = TenantAgentRuntime(tenant)
    result = await runtime.run(config, request.message)

    if "error" in result:
        raise HTTPException(status_code=result["status"], detail=result["error"])

    return RunResponse(**result)

@router.get("/usage")
async def get_usage(tenant: Tenant = Depends(get_current_tenant)):
    from metering import MeteringService
    metering = MeteringService()
    usage = await metering.get_monthly_usage(str(tenant.id))
    return {
        "monthly_usage": usage,
        "monthly_limit": tenant.monthly_token_limit,
        "plan": tenant.plan,
    }
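
From the tenant's side, calling the `/v1/run` endpoint means sending the API key in the `X-API-Key` header along with a JSON body matching `RunRequest`. The sketch below builds such a request with the standard library only. The `build_run_request` helper and the `https://api.example.com` base URL are placeholders, not real endpoints.

```python
import json
import urllib.request


def build_run_request(
    base_url: str, api_key: str, agent_id: str, message: str
) -> urllib.request.Request:
    """Build (but do not send) a POST request for the /v1/run endpoint."""
    body = json.dumps({"agent_id": agent_id, "message": message}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/v1/run",
        data=body,
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


# Placeholder values for illustration only.
req = build_run_request(
    "https://api.example.com",
    "sk_example",
    "00000000-0000-0000-0000-000000000000",
    "Summarize my open tickets",
)
```

Passing `req` to `urllib.request.urlopen` would send it. Based on the handlers above, a 401 means the key was not recognized, a 404 means the agent does not belong to the tenant, and a 429 means the monthly quota is exhausted.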

Billing Integration

Connect metering data to a billing system like Stripe for usage-based pricing:

# billing.py
import stripe
from metering import MeteringService

# Placeholder: load the real key from an environment variable or secret store, never from source
stripe.api_key = "sk_..."

PLAN_PRICING = {
    "free": {"base_price": 0, "included_tokens": 1_000_000, "overage_per_1m": 0},
    "pro": {"base_price": 49, "included_tokens": 10_000_000, "overage_per_1m": 5.00},
    "enterprise": {"base_price": 299, "included_tokens": 100_000_000, "overage_per_1m": 3.00},
}

async def calculate_invoice(tenant_id: str, plan: str) -> dict:
    metering = MeteringService()
    usage = await metering.get_monthly_usage(tenant_id)
    pricing = PLAN_PRICING[plan]

    base = pricing["base_price"]
    overage_tokens = max(0, usage["total_tokens"] - pricing["included_tokens"])
    overage_cost = (overage_tokens / 1_000_000) * pricing["overage_per_1m"]

    return {
        "base_price": base,
        "included_tokens": pricing["included_tokens"],
        "tokens_used": usage["total_tokens"],
        "overage_tokens": overage_tokens,
        "overage_cost": round(overage_cost, 2),
        "total": round(base + overage_cost, 2),
    }
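
As a worked example of the overage arithmetic in `calculate_invoice`, the sketch below applies the same steps to a fixed usage figure instead of querying the metering service. The `invoice_for` name and the 12.5M token figure are invented for illustration; the plan values are the `pro` row from `PLAN_PRICING`.

```python
# The "pro" entry from PLAN_PRICING.
PRO_PLAN = {"base_price": 49, "included_tokens": 10_000_000, "overage_per_1m": 5.00}


def invoice_for(tokens_used: int, pricing: dict) -> dict:
    """Same arithmetic as calculate_invoice, with usage passed in directly."""
    overage_tokens = max(0, tokens_used - pricing["included_tokens"])
    overage_cost = (overage_tokens / 1_000_000) * pricing["overage_per_1m"]
    return {
        "overage_tokens": overage_tokens,
        "overage_cost": round(overage_cost, 2),
        "total": round(pricing["base_price"] + overage_cost, 2),
    }


# 12.5M tokens used, 10M included -> 2.5M overage
# 2.5 * $5.00 = $12.50 overage, plus the $49 base = $61.50
invoice = invoice_for(12_500_000, PRO_PLAN)
print(invoice["total"])  # 61.5
```

A tenant that stays under the included allowance pays only the base price, because `max(0, ...)` clamps the overage to zero.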

Building an AI agent SaaS platform requires careful attention to isolation, metering, and scalability. The patterns above — hashed API keys, per-tenant agent runtimes, real-time token metering, and usage-based billing — provide a solid foundation. Start with a single-tenant deployment to validate your agent framework, then add multi-tenancy once the core agent logic is proven.
