Learn Agentic AI

Building a Multi-Tenant AI Agent Platform: Isolating Customers in Shared Infrastructure

Design and build a multi-tenant AI agent platform with proper tenant isolation, resource quotas, data segregation, per-tenant billing, and shared infrastructure that scales efficiently without cross-tenant data leakage.

Why Multi-Tenancy Is Hard for AI Agents

Multi-tenant AI agent platforms share infrastructure across customers to reduce costs, but AI agents introduce unique isolation challenges. An agent's system prompt contains business-specific knowledge. Conversation histories contain customer PII. Tool configurations expose internal APIs. A cross-tenant data leak in an AI agent is not just a privacy violation — it could expose one customer's business logic and customer data to another.

The three pillars of AI agent multi-tenancy are data isolation (no tenant can read another tenant's data), resource isolation (one tenant's usage spike does not degrade another's experience), and configuration isolation (each tenant's agent behaves according to their specific settings).
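All three pillars depend on the same first step: resolving which tenant an incoming request belongs to before any agent code runs. A minimal sketch of API-key-to-tenant resolution (the in-memory key store and names here are illustrative; a real platform would look the hash up in the tenants table):

```python
import hashlib

# Illustrative in-memory key store; production stores SHA-256
# hashes of API keys alongside the tenant row in the database.
API_KEY_HASHES = {
    hashlib.sha256(b"demo-key-tenant-a").hexdigest(): "tenant-a",
}

def resolve_tenant(api_key: str) -> str:
    """Map an inbound API key to a tenant_id, failing closed."""
    digest = hashlib.sha256(api_key.encode()).hexdigest()
    tenant_id = API_KEY_HASHES.get(digest)
    if tenant_id is None:
        raise PermissionError("Unknown API key")
    return tenant_id
```

Every downstream call (database, quotas, config) then takes this tenant_id explicitly, so there is no ambient global to get wrong.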

Data Isolation with Row-Level Security

The most practical approach for most platforms is a shared database with row-level security (RLS). Every table includes a tenant_id column, and PostgreSQL enforces that queries only return rows matching the current tenant:

# Database schema with tenant isolation
SCHEMA = """
CREATE TABLE tenants (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    name TEXT NOT NULL,
    plan TEXT NOT NULL DEFAULT 'free',
    created_at TIMESTAMPTZ DEFAULT NOW()
);

CREATE TABLE conversations (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    tenant_id UUID NOT NULL REFERENCES tenants(id),
    user_id TEXT NOT NULL,
    created_at TIMESTAMPTZ DEFAULT NOW()
);

CREATE TABLE messages (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    conversation_id UUID NOT NULL REFERENCES conversations(id),
    tenant_id UUID NOT NULL REFERENCES tenants(id),
    role TEXT NOT NULL,
    content TEXT NOT NULL,
    tokens_used INTEGER DEFAULT 0,
    created_at TIMESTAMPTZ DEFAULT NOW()
);

-- Row-Level Security
ALTER TABLE conversations ENABLE ROW LEVEL SECURITY;
ALTER TABLE conversations FORCE ROW LEVEL SECURITY;
ALTER TABLE messages ENABLE ROW LEVEL SECURITY;
ALTER TABLE messages FORCE ROW LEVEL SECURITY;

-- USING filters reads; WITH CHECK rejects writes that would
-- insert rows for another tenant. FORCE applies the policy
-- even to the table owner, which otherwise bypasses RLS.
CREATE POLICY tenant_isolation_conversations ON conversations
    USING (tenant_id = current_setting('app.current_tenant')::UUID)
    WITH CHECK (tenant_id = current_setting('app.current_tenant')::UUID);

CREATE POLICY tenant_isolation_messages ON messages
    USING (tenant_id = current_setting('app.current_tenant')::UUID)
    WITH CHECK (tenant_id = current_setting('app.current_tenant')::UUID);

-- Index for tenant-scoped queries
CREATE INDEX idx_messages_tenant_conv
    ON messages (tenant_id, conversation_id, created_at);
"""

Set the tenant context on every database connection before executing queries:

from contextlib import asynccontextmanager

@asynccontextmanager
async def tenant_connection(tenant_id: str):
    conn = await db_pool.acquire()
    try:
        # Parameterize via set_config -- interpolating tenant_id
        # into the SQL string would be an injection vector.
        await conn.execute(
            "SELECT set_config('app.current_tenant', $1, false)",
            tenant_id,
        )
        yield conn
    finally:
        await conn.execute("RESET app.current_tenant")
        await db_pool.release(conn)

# Usage
async def get_conversation_history(
    tenant_id: str, conversation_id: str
) -> list:
    async with tenant_connection(tenant_id) as conn:
        # RLS automatically filters to this tenant
        rows = await conn.fetch(
            "SELECT role, content FROM messages "
            "WHERE conversation_id = $1 ORDER BY created_at",
            conversation_id,
        )
        return [dict(r) for r in rows]

Even if a bug in your application code accidentally passes the wrong conversation ID, RLS ensures the query returns zero rows rather than another tenant's data.

Resource Quotas and Rate Limiting

Each tenant needs resource limits to prevent one customer from consuming all capacity. Implement tiered quotas based on the customer's plan:

from dataclasses import dataclass

@dataclass
class TenantQuota:
    messages_per_minute: int
    messages_per_day: int
    max_tokens_per_message: int
    max_concurrent_sessions: int
    monthly_token_budget: int

PLAN_QUOTAS = {
    "free": TenantQuota(
        messages_per_minute=10,
        messages_per_day=100,
        max_tokens_per_message=2000,
        max_concurrent_sessions=5,
        monthly_token_budget=500_000,
    ),
    "pro": TenantQuota(
        messages_per_minute=60,
        messages_per_day=5000,
        max_tokens_per_message=8000,
        max_concurrent_sessions=50,
        monthly_token_budget=10_000_000,
    ),
    "enterprise": TenantQuota(
        messages_per_minute=300,
        messages_per_day=50000,
        max_tokens_per_message=16000,
        max_concurrent_sessions=500,
        monthly_token_budget=100_000_000,
    ),
}

from datetime import datetime, timezone

class QuotaEnforcer:
    def __init__(self, redis_client):
        self.redis = redis_client

    async def check_quota(self, tenant_id: str, plan: str) -> bool:
        quota = PLAN_QUOTAS[plan]

        # Per-minute rate limit (fixed window: the counter resets
        # 60s after the first request in the window)
        minute_key = f"rate:{tenant_id}:minute"
        current = await self.redis.incr(minute_key)
        if current == 1:
            await self.redis.expire(minute_key, 60)
        if current > quota.messages_per_minute:
            return False

        # Daily limit, keyed by UTC date so the window is unambiguous
        today = datetime.now(timezone.utc).date().isoformat()
        day_key = f"rate:{tenant_id}:day:{today}"
        daily = await self.redis.incr(day_key)
        if daily == 1:
            await self.redis.expire(day_key, 86400)
        if daily > quota.messages_per_day:
            return False

        return True
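Note that the per-minute counter above is a fixed window: a tenant can burst up to twice the limit across a window boundary. If that matters for your capacity planning, a sliding window smooths it out. An in-memory sketch of the idea (illustration only; a production version would use a Redis sorted set per tenant so state survives restarts and is shared across workers):

```python
import time
from collections import defaultdict, deque
from typing import Deque, Dict, Optional

class SlidingWindowLimiter:
    """Sliding-window limiter: one deque of request timestamps
    per tenant, evicted as they age out of the window."""

    def __init__(self, limit: int, window_seconds: float):
        self.limit = limit
        self.window = window_seconds
        self._events: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, tenant_id: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        events = self._events[tenant_id]
        # Evict timestamps older than the window
        while events and now - events[0] >= self.window:
            events.popleft()
        if len(events) >= self.limit:
            return False
        events.append(now)
        return True
```

Each tenant gets its own deque, so one tenant exhausting its limit never affects another's counter.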

Tenant-Specific Agent Configuration

Each tenant configures their agent differently — custom system prompts, enabled tools, model preferences, branding. Store this configuration separately and load it per request:

import json

class TenantAgentConfig:
    def __init__(self, redis_client, db_pool):
        self.redis = redis_client
        self.db = db_pool

    async def get_config(self, tenant_id: str) -> dict:
        cache_key = f"tenant:config:{tenant_id}"
        cached = await self.redis.get(cache_key)
        if cached:
            return json.loads(cached)

        async with tenant_connection(tenant_id) as conn:
            config = await conn.fetchrow(
                "SELECT system_prompt, model, enabled_tools, "
                "temperature, max_turns FROM agent_configs "
                "WHERE tenant_id = $1 AND active = true",
                tenant_id,
            )

        if config is None:
            raise LookupError(f"No active agent config for tenant {tenant_id}")

        config_dict = dict(config)
        # Cache for 5 minutes; also invalidate explicitly on config updates
        await self.redis.setex(cache_key, 300, json.dumps(config_dict))
        return config_dict
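The enabled_tools field is what keeps one tenant's agent from seeing another tenant's tools: build the registry per request from that list, never from a shared global. A sketch (the tool names and implementations are placeholders, not a real catalog):

```python
from typing import Callable, Dict, List

# Everything the platform can do; each tenant only gets the
# subset named in their config (these tools are illustrative).
AVAILABLE_TOOLS: Dict[str, Callable[[str], str]] = {
    "search_kb": lambda q: f"kb results for {q!r}",
    "create_ticket": lambda title: f"ticket created: {title}",
    "check_order": lambda oid: f"status for order {oid}",
}

def build_tool_registry(enabled_tools: List[str]) -> Dict[str, Callable]:
    """Return only this tenant's tools; unknown names fail
    loudly instead of being silently dropped."""
    registry = {}
    for name in enabled_tools:
        if name not in AVAILABLE_TOOLS:
            raise ValueError(f"Unknown tool: {name}")
        registry[name] = AVAILABLE_TOOLS[name]
    return registry
```

Failing on unknown names catches stale configs at request time rather than letting an agent quietly lose a capability.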

Per-Tenant Billing with Token Tracking

Track every LLM API call with the tenant ID to enable accurate billing:


class UsageMeter:
    def __init__(self, db_pool):
        self.db = db_pool

    async def record_usage(
        self,
        tenant_id: str,
        model: str,
        input_tokens: int,
        output_tokens: int,
        conversation_id: str,
    ):
        async with self.db.acquire() as conn:
            await conn.execute(
                "INSERT INTO usage_records "
                "(tenant_id, model, input_tokens, output_tokens, "
                "conversation_id, cost_cents, recorded_at) "
                "VALUES ($1, $2, $3, $4, $5, $6, NOW())",
                tenant_id,
                model,
                input_tokens,
                output_tokens,
                conversation_id,
                self._calculate_cost(model, input_tokens, output_tokens),
            )

    def _calculate_cost(
        self, model: str, input_tokens: int, output_tokens: int
    ) -> float:
        # Rates are cents per 100K tokens, so the result matches
        # the cost_cents column (update as provider pricing changes)
        rates = {
            "gpt-4o-mini": (1.5, 6.0),
            "gpt-4o": (25.0, 100.0),
        }
        input_rate, output_rate = rates.get(model, (25.0, 100.0))
        return (
            (input_tokens / 100_000) * input_rate
            + (output_tokens / 100_000) * output_rate
        )
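At invoicing time the usage_records rows roll up into per-model line items. A sketch over already-fetched rows (in practice you would aggregate in SQL with GROUP BY tenant_id, model; the dict shapes here mirror the columns above):

```python
from collections import defaultdict
from typing import Dict, List

def summarize_usage(records: List[dict]) -> Dict[str, dict]:
    """Roll usage rows up into one invoice line per model."""
    lines: Dict[str, dict] = defaultdict(
        lambda: {"input_tokens": 0, "output_tokens": 0, "cost_cents": 0.0}
    )
    for r in records:
        line = lines[r["model"]]
        line["input_tokens"] += r["input_tokens"]
        line["output_tokens"] += r["output_tokens"]
        line["cost_cents"] += r["cost_cents"]
    return dict(lines)
```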

FAQ

Should I use a shared database or separate databases per tenant?

Use a shared database with row-level security for most cases. It is simpler to manage, migrate, and back up. Use separate databases only for enterprise customers with strict compliance requirements (healthcare, finance) or when a single tenant's data volume justifies dedicated infrastructure.

How do I prevent one tenant's agent from accidentally accessing another tenant's tools?

Load the tool configuration per-tenant at request time and only register the tools that tenant has enabled. Never use a global tool registry shared across tenants. If tools access external APIs, use tenant-specific API keys stored encrypted in the database.

What happens when a tenant exceeds their quota?

Return a 429 status code with a Retry-After header indicating when they can resume. For soft limits (approaching the monthly budget), send a notification to the tenant admin and optionally downgrade to a cheaper model rather than hard-blocking. For hard limits (daily rate limits), block immediately to protect infrastructure.
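The 429 shape from the answer above, framework-agnostic (the response field names are illustrative, not any particular framework's API):

```python
import math

def quota_exceeded_response(window_reset_epoch: float, now: float) -> dict:
    """Build a 429 payload with Retry-After in seconds,
    rounded up so clients never retry too early."""
    retry_after = max(0, math.ceil(window_reset_epoch - now))
    return {
        "status": 429,
        "headers": {"Retry-After": str(retry_after)},
        "body": {"error": "rate_limit_exceeded"},
    }
```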


#MultiTenant #AIAgents #PlatformEngineering #TenantIsolation #SaaS #DataSegregation #AgenticAI #LearnAI #AIEngineering
