Learn Agentic AI

Building a Multi-Tenant AI Agent Platform: Isolating Customers in Shared Infrastructure

Design and build a multi-tenant AI agent platform with proper tenant isolation, resource quotas, data segregation, per-tenant billing, and shared infrastructure that scales efficiently without cross-tenant data leakage.

Why Multi-Tenancy Is Hard for AI Agents

Multi-tenant AI agent platforms share infrastructure across customers to reduce costs, but AI agents introduce unique isolation challenges. An agent's system prompt contains business-specific knowledge. Conversation histories contain customer PII. Tool configurations expose internal APIs. A cross-tenant data leak in an AI agent is not just a privacy violation — it could expose one customer's business logic and customer data to another.

The three pillars of AI agent multi-tenancy are data isolation (no tenant can read another tenant's data), resource isolation (one tenant's usage spike does not degrade another's experience), and configuration isolation (each tenant's agent behaves according to their specific settings).
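All three pillars depend on the same first step: resolving which tenant an incoming request belongs to before any agent code runs. A minimal sketch of API-key-to-tenant resolution (the in-memory key store and names here are illustrative; a real platform would look the hash up in the tenants table):

```python
import hashlib

# Illustrative in-memory key store; production stores SHA-256
# hashes of API keys alongside the tenant row in the database.
API_KEY_HASHES = {
    hashlib.sha256(b"demo-key-tenant-a").hexdigest(): "tenant-a",
}

def resolve_tenant(api_key: str) -> str:
    """Map an inbound API key to a tenant_id, failing closed."""
    digest = hashlib.sha256(api_key.encode()).hexdigest()
    tenant_id = API_KEY_HASHES.get(digest)
    if tenant_id is None:
        raise PermissionError("Unknown API key")
    return tenant_id
```

Every downstream call (database, quotas, config) then takes this tenant_id explicitly, so there is no ambient global to get wrong.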

Data Isolation with Row-Level Security

The most practical approach for most platforms is a shared database with row-level security (RLS). Every table includes a tenant_id column, and PostgreSQL enforces that queries only return rows matching the current tenant:

# Database schema with tenant isolation
SCHEMA = """
CREATE TABLE tenants (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    name TEXT NOT NULL,
    plan TEXT NOT NULL DEFAULT 'free',
    created_at TIMESTAMPTZ DEFAULT NOW()
);

CREATE TABLE conversations (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    tenant_id UUID NOT NULL REFERENCES tenants(id),
    user_id TEXT NOT NULL,
    created_at TIMESTAMPTZ DEFAULT NOW()
);

CREATE TABLE messages (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    conversation_id UUID NOT NULL REFERENCES conversations(id),
    tenant_id UUID NOT NULL REFERENCES tenants(id),
    role TEXT NOT NULL,
    content TEXT NOT NULL,
    tokens_used INTEGER DEFAULT 0,
    created_at TIMESTAMPTZ DEFAULT NOW()
);

-- Row-Level Security
ALTER TABLE conversations ENABLE ROW LEVEL SECURITY;
ALTER TABLE conversations FORCE ROW LEVEL SECURITY;
ALTER TABLE messages ENABLE ROW LEVEL SECURITY;
ALTER TABLE messages FORCE ROW LEVEL SECURITY;

-- USING filters reads; WITH CHECK rejects writes that would
-- insert rows for another tenant. FORCE applies the policy
-- even to the table owner, which otherwise bypasses RLS.
CREATE POLICY tenant_isolation_conversations ON conversations
    USING (tenant_id = current_setting('app.current_tenant')::UUID)
    WITH CHECK (tenant_id = current_setting('app.current_tenant')::UUID);

CREATE POLICY tenant_isolation_messages ON messages
    USING (tenant_id = current_setting('app.current_tenant')::UUID)
    WITH CHECK (tenant_id = current_setting('app.current_tenant')::UUID);

-- Index for tenant-scoped queries
CREATE INDEX idx_messages_tenant_conv
    ON messages (tenant_id, conversation_id, created_at);
"""

Set the tenant context on every database connection before executing queries:

from contextlib import asynccontextmanager

@asynccontextmanager
async def tenant_connection(tenant_id: str):
    conn = await db_pool.acquire()
    try:
        # Parameterize via set_config -- interpolating tenant_id
        # into the SQL string would be an injection vector.
        await conn.execute(
            "SELECT set_config('app.current_tenant', $1, false)",
            tenant_id,
        )
        yield conn
    finally:
        await conn.execute("RESET app.current_tenant")
        await db_pool.release(conn)

# Usage
async def get_conversation_history(
    tenant_id: str, conversation_id: str
) -> list:
    async with tenant_connection(tenant_id) as conn:
        # RLS automatically filters to this tenant
        rows = await conn.fetch(
            "SELECT role, content FROM messages "
            "WHERE conversation_id = $1 ORDER BY created_at",
            conversation_id,
        )
        return [dict(r) for r in rows]

Even if a bug in your application code accidentally passes the wrong conversation ID, RLS ensures the query returns zero rows rather than another tenant's data.

Resource Quotas and Rate Limiting

Each tenant needs resource limits to prevent one customer from consuming all capacity. Implement tiered quotas based on the customer's plan:

from dataclasses import dataclass

@dataclass
class TenantQuota:
    messages_per_minute: int
    messages_per_day: int
    max_tokens_per_message: int
    max_concurrent_sessions: int
    monthly_token_budget: int

PLAN_QUOTAS = {
    "free": TenantQuota(
        messages_per_minute=10,
        messages_per_day=100,
        max_tokens_per_message=2000,
        max_concurrent_sessions=5,
        monthly_token_budget=500_000,
    ),
    "pro": TenantQuota(
        messages_per_minute=60,
        messages_per_day=5000,
        max_tokens_per_message=8000,
        max_concurrent_sessions=50,
        monthly_token_budget=10_000_000,
    ),
    "enterprise": TenantQuota(
        messages_per_minute=300,
        messages_per_day=50000,
        max_tokens_per_message=16000,
        max_concurrent_sessions=500,
        monthly_token_budget=100_000_000,
    ),
}

from datetime import datetime, timezone

class QuotaEnforcer:
    def __init__(self, redis_client):
        self.redis = redis_client

    async def check_quota(self, tenant_id: str, plan: str) -> bool:
        quota = PLAN_QUOTAS[plan]

        # Per-minute rate limit (fixed window: the counter resets
        # 60s after the first request in the window)
        minute_key = f"rate:{tenant_id}:minute"
        current = await self.redis.incr(minute_key)
        if current == 1:
            await self.redis.expire(minute_key, 60)
        if current > quota.messages_per_minute:
            return False

        # Daily limit, keyed by UTC date so the window is unambiguous
        today = datetime.now(timezone.utc).date().isoformat()
        day_key = f"rate:{tenant_id}:day:{today}"
        daily = await self.redis.incr(day_key)
        if daily == 1:
            await self.redis.expire(day_key, 86400)
        if daily > quota.messages_per_day:
            return False

        return True
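Note that the per-minute counter above is a fixed window: a tenant can burst up to twice the limit across a window boundary. If that matters for your capacity planning, a sliding window smooths it out. An in-memory sketch of the idea (illustration only; a production version would use a Redis sorted set per tenant so state survives restarts and is shared across workers):

```python
import time
from collections import defaultdict, deque
from typing import Deque, Dict, Optional

class SlidingWindowLimiter:
    """Sliding-window limiter: one deque of request timestamps
    per tenant, evicted as they age out of the window."""

    def __init__(self, limit: int, window_seconds: float):
        self.limit = limit
        self.window = window_seconds
        self._events: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, tenant_id: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        events = self._events[tenant_id]
        # Evict timestamps older than the window
        while events and now - events[0] >= self.window:
            events.popleft()
        if len(events) >= self.limit:
            return False
        events.append(now)
        return True
```

Each tenant gets its own deque, so one tenant exhausting its limit never affects another's counter.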

Tenant-Specific Agent Configuration

Each tenant configures their agent differently — custom system prompts, enabled tools, model preferences, branding. Store this configuration separately and load it per request:

import json

class TenantAgentConfig:
    def __init__(self, redis_client, db_pool):
        self.redis = redis_client
        self.db = db_pool

    async def get_config(self, tenant_id: str) -> dict:
        cache_key = f"tenant:config:{tenant_id}"
        cached = await self.redis.get(cache_key)
        if cached:
            return json.loads(cached)

        async with tenant_connection(tenant_id) as conn:
            config = await conn.fetchrow(
                "SELECT system_prompt, model, enabled_tools, "
                "temperature, max_turns FROM agent_configs "
                "WHERE tenant_id = $1 AND active = true",
                tenant_id,
            )

        if config is None:
            raise LookupError(f"No active agent config for tenant {tenant_id}")

        config_dict = dict(config)
        # Cache for 5 minutes; also invalidate explicitly on config updates
        await self.redis.setex(cache_key, 300, json.dumps(config_dict))
        return config_dict
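The enabled_tools field is what keeps one tenant's agent from seeing another tenant's tools: build the registry per request from that list, never from a shared global. A sketch (the tool names and implementations are placeholders, not a real catalog):

```python
from typing import Callable, Dict, List

# Everything the platform can do; each tenant only gets the
# subset named in their config (these tools are illustrative).
AVAILABLE_TOOLS: Dict[str, Callable[[str], str]] = {
    "search_kb": lambda q: f"kb results for {q!r}",
    "create_ticket": lambda title: f"ticket created: {title}",
    "check_order": lambda oid: f"status for order {oid}",
}

def build_tool_registry(enabled_tools: List[str]) -> Dict[str, Callable]:
    """Return only this tenant's tools; unknown names fail
    loudly instead of being silently dropped."""
    registry = {}
    for name in enabled_tools:
        if name not in AVAILABLE_TOOLS:
            raise ValueError(f"Unknown tool: {name}")
        registry[name] = AVAILABLE_TOOLS[name]
    return registry
```

Failing on unknown names catches stale configs at request time rather than letting an agent quietly lose a capability.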

Per-Tenant Billing with Token Tracking

Track every LLM API call with the tenant ID to enable accurate billing:


class UsageMeter:
    def __init__(self, db_pool):
        self.db = db_pool

    async def record_usage(
        self,
        tenant_id: str,
        model: str,
        input_tokens: int,
        output_tokens: int,
        conversation_id: str,
    ):
        async with self.db.acquire() as conn:
            await conn.execute(
                "INSERT INTO usage_records "
                "(tenant_id, model, input_tokens, output_tokens, "
                "conversation_id, cost_cents, recorded_at) "
                "VALUES ($1, $2, $3, $4, $5, $6, NOW())",
                tenant_id,
                model,
                input_tokens,
                output_tokens,
                conversation_id,
                self._calculate_cost(model, input_tokens, output_tokens),
            )

    def _calculate_cost(
        self, model: str, input_tokens: int, output_tokens: int
    ) -> float:
        # Rates are cents per 100K tokens, so the result matches
        # the cost_cents column (update as provider pricing changes)
        rates = {
            "gpt-4o-mini": (1.5, 6.0),
            "gpt-4o": (25.0, 100.0),
        }
        input_rate, output_rate = rates.get(model, (25.0, 100.0))
        return (
            (input_tokens / 100_000) * input_rate
            + (output_tokens / 100_000) * output_rate
        )
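At invoicing time the usage_records rows roll up into per-model line items. A sketch over already-fetched rows (in practice you would aggregate in SQL with GROUP BY tenant_id, model; the dict shapes here mirror the columns above):

```python
from collections import defaultdict
from typing import Dict, List

def summarize_usage(records: List[dict]) -> Dict[str, dict]:
    """Roll usage rows up into one invoice line per model."""
    lines: Dict[str, dict] = defaultdict(
        lambda: {"input_tokens": 0, "output_tokens": 0, "cost_cents": 0.0}
    )
    for r in records:
        line = lines[r["model"]]
        line["input_tokens"] += r["input_tokens"]
        line["output_tokens"] += r["output_tokens"]
        line["cost_cents"] += r["cost_cents"]
    return dict(lines)
```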

FAQ

Should I use a shared database or separate databases per tenant?

Use a shared database with row-level security for most cases. It is simpler to manage, migrate, and back up. Use separate databases only for enterprise customers with strict compliance requirements (healthcare, finance) or when a single tenant's data volume justifies dedicated infrastructure.

How do I prevent one tenant's agent from accidentally accessing another tenant's tools?

Load the tool configuration per-tenant at request time and only register the tools that tenant has enabled. Never use a global tool registry shared across tenants. If tools access external APIs, use tenant-specific API keys stored encrypted in the database.

What happens when a tenant exceeds their quota?

Return a 429 status code with a Retry-After header indicating when they can resume. For soft limits (approaching the monthly budget), send a notification to the tenant admin and optionally downgrade to a cheaper model rather than hard-blocking. For hard limits (daily rate limits), block immediately to protect infrastructure.
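The 429 shape from the answer above, framework-agnostic (the response field names are illustrative, not any particular framework's API):

```python
import math

def quota_exceeded_response(window_reset_epoch: float, now: float) -> dict:
    """Build a 429 payload with Retry-After in seconds,
    rounded up so clients never retry too early."""
    retry_after = max(0, math.ceil(window_reset_epoch - now))
    return {
        "status": 429,
        "headers": {"Retry-After": str(retry_after)},
        "body": {"error": "rate_limit_exceeded"},
    }
```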


#MultiTenant #AIAgents #PlatformEngineering #TenantIsolation #SaaS #DataSegregation #AgenticAI #LearnAI #AIEngineering
