
Cost Tracking for AI Agents: Per-User, Per-Feature Token Usage Analytics

Build a complete cost tracking system for AI agents that attributes token usage to individual users and features, sets budget alerts, and provides dashboards for controlling LLM spend in production.

Why Cost Tracking Is Critical for Production Agents

LLM costs scale with usage in ways that are easy to underestimate. A single GPT-4o call might cost fractions of a cent, but an agent that makes three LLM calls per user message — one for routing, one for the specialist, one for summarization — multiplied by thousands of daily users creates a bill that grows faster than most teams expect. Without per-user, per-feature cost attribution, you cannot answer basic questions: Which users drive the most cost? Which agent features are expensive relative to their value? Are costs growing faster than revenue?

A cost tracking system captures token usage at the call level, attributes it to users and features, stores it for analysis, and alerts when budgets are at risk.

The Token Usage Data Model

Start with a database table that records every LLM call with enough context for flexible analysis.

# SQLAlchemy model for token usage tracking
from sqlalchemy import Column, String, Integer, Float, DateTime, Index
from sqlalchemy.dialects.postgresql import UUID
from sqlalchemy.orm import declarative_base
from datetime import datetime
import uuid

Base = declarative_base()

class TokenUsage(Base):
    __tablename__ = "token_usage"

    id = Column(UUID(as_uuid=True), primary_key=True, default=uuid.uuid4)
    timestamp = Column(DateTime, nullable=False, default=datetime.utcnow)
    user_id = Column(String, nullable=False, index=True)
    conversation_id = Column(String, nullable=False, index=True)
    agent_name = Column(String, nullable=False)
    feature = Column(String, nullable=False)  # e.g., "routing", "support", "summarization"
    model = Column(String, nullable=False)
    prompt_tokens = Column(Integer, nullable=False)
    completion_tokens = Column(Integer, nullable=False)
    total_tokens = Column(Integer, nullable=False)
    cost_usd = Column(Float, nullable=False)

    __table_args__ = (
        Index("idx_usage_user_timestamp", "user_id", "timestamp"),
        Index("idx_usage_feature_timestamp", "feature", "timestamp"),
    )
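For teams that manage schema with raw SQL migrations rather than SQLAlchemy metadata, the equivalent PostgreSQL DDL looks roughly like this (a sketch; `gen_random_uuid()` requires PostgreSQL 13+ or the pgcrypto extension, and `timestamp` is quoted because it shadows a SQL keyword):

```sql
CREATE TABLE token_usage (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    "timestamp" TIMESTAMP NOT NULL DEFAULT now(),
    user_id TEXT NOT NULL,
    conversation_id TEXT NOT NULL,
    agent_name TEXT NOT NULL,
    feature TEXT NOT NULL,
    model TEXT NOT NULL,
    prompt_tokens INTEGER NOT NULL,
    completion_tokens INTEGER NOT NULL,
    total_tokens INTEGER NOT NULL,
    cost_usd DOUBLE PRECISION NOT NULL
);

-- composite indexes matching the __table_args__ above
CREATE INDEX idx_usage_user_timestamp ON token_usage (user_id, "timestamp");
CREATE INDEX idx_usage_feature_timestamp ON token_usage (feature, "timestamp");
```

The model's `index=True` flags on `user_id` and `conversation_id` would also generate single-column indexes; the composite indexes above usually cover the `user_id` lookups, so you may only need a separate one on `conversation_id`.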

Recording Token Usage from LLM Calls

Wrap your LLM client to automatically record usage after every call. Maintain a pricing table that maps models to per-token costs.

MODEL_PRICING = {
    # model: (cost_per_prompt_token, cost_per_completion_token)
    "gpt-4o": (0.0000025, 0.00001),
    "gpt-4o-mini": (0.00000015, 0.0000006),
    "claude-sonnet-4-20250514": (0.000003, 0.000015),
    "claude-haiku-35": (0.0000008, 0.000004),
}

def calculate_cost(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    # unknown models fall back to mid-tier rates so costs are never undercounted
    pricing = MODEL_PRICING.get(model, (0.000003, 0.000015))
    return (prompt_tokens * pricing[0]) + (completion_tokens * pricing[1])

async def tracked_llm_call(
    model: str,
    messages: list,
    user_id: str,
    conversation_id: str,
    feature: str,
    agent_name: str,
    db_session,
):
    # llm_client is assumed to be an AsyncOpenAI-style client initialized elsewhere
    response = await llm_client.chat.completions.create(
        model=model, messages=messages
    )

    usage = response.usage
    cost = calculate_cost(model, usage.prompt_tokens, usage.completion_tokens)

    record = TokenUsage(
        user_id=user_id,
        conversation_id=conversation_id,
        agent_name=agent_name,
        feature=feature,
        model=model,
        prompt_tokens=usage.prompt_tokens,
        completion_tokens=usage.completion_tokens,
        total_tokens=usage.total_tokens,
        cost_usd=cost,
    )
    db_session.add(record)
    await db_session.commit()

    return response
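Before wiring this into production, it is worth sanity-checking the pricing math by hand. Restated standalone (with a trimmed pricing table), a 1,200-prompt / 300-completion gpt-4o call should cost 1,200 × $0.0000025 + 300 × $0.00001:

```python
MODEL_PRICING = {
    # model: (cost_per_prompt_token, cost_per_completion_token)
    "gpt-4o": (0.0000025, 0.00001),
    "gpt-4o-mini": (0.00000015, 0.0000006),
}

def calculate_cost(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    # unknown models fall back to mid-tier rates so costs are never undercounted
    pricing = MODEL_PRICING.get(model, (0.000003, 0.000015))
    return (prompt_tokens * pricing[0]) + (completion_tokens * pricing[1])

cost = calculate_cost("gpt-4o", 1200, 300)
# 1200 * 0.0000025 + 300 * 0.00001 = 0.003 + 0.003 = 0.006
```

A check like this makes a good unit test: if a provider price change ever slips into the table wrong by a factor of 1,000 (per-token vs. per-million-token confusion is common), the test catches it immediately.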

Building Usage Analytics Queries

With usage data in PostgreSQL, you can answer the key cost questions with straightforward SQL.

from sqlalchemy import func, text
from datetime import datetime, timedelta

async def get_daily_cost_by_feature(db_session, days: int = 30):
    """Cost per feature per day for the last N days."""
    cutoff = datetime.utcnow() - timedelta(days=days)
    result = await db_session.execute(
        text("""
            SELECT
                date_trunc('day', timestamp) AS day,
                feature,
                SUM(cost_usd) AS total_cost,
                SUM(total_tokens) AS total_tokens,
                COUNT(*) AS call_count
            FROM token_usage
            WHERE timestamp >= :cutoff
            GROUP BY day, feature
            ORDER BY day DESC, total_cost DESC
        """),
        {"cutoff": cutoff},
    )
    return result.fetchall()

async def get_top_users_by_cost(db_session, limit: int = 20):
    """Top N users by total LLM cost in the current month."""
    month_start = datetime.utcnow().replace(day=1, hour=0, minute=0, second=0)
    result = await db_session.execute(
        text("""
            SELECT
                user_id,
                SUM(cost_usd) AS total_cost,
                SUM(total_tokens) AS total_tokens,
                COUNT(DISTINCT conversation_id) AS conversations
            FROM token_usage
            WHERE timestamp >= :month_start
            GROUP BY user_id
            ORDER BY total_cost DESC
            LIMIT :limit
        """),
        {"month_start": month_start, "limit": limit},
    )
    return result.fetchall()
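The daily-by-feature rollup is just a group-by-and-sum. As a mental model (and a handy fixture for unit-testing dashboard code without a database), the same aggregation over in-memory rows looks like this — the sample rows are made up:

```python
from collections import defaultdict
from datetime import date

# hypothetical rows mirroring token_usage: (day, feature, cost_usd)
rows = [
    (date(2026, 1, 5), "routing", 0.002),
    (date(2026, 1, 5), "support", 0.010),
    (date(2026, 1, 5), "routing", 0.003),
    (date(2026, 1, 6), "support", 0.004),
]

# equivalent of GROUP BY day, feature / SUM(cost_usd)
daily_cost: dict[tuple[date, str], float] = defaultdict(float)
for day, feature, cost in rows:
    daily_cost[(day, feature)] += cost

# routing on Jan 5 sums to 0.005; three (day, feature) groups in total
```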

Budget Alerts

Check user and global budgets after every LLM call. When a threshold is exceeded, send alerts and optionally throttle the user.

MONTHLY_BUDGET_USD = 5000.0
PER_USER_DAILY_LIMIT_USD = 2.0

async def check_budgets(user_id: str, db_session):
    """Check both global and per-user budgets after each call."""
    # Check per-user daily spend
    today_start = datetime.utcnow().replace(hour=0, minute=0, second=0)
    user_result = await db_session.execute(
        text("""
            SELECT COALESCE(SUM(cost_usd), 0)
            FROM token_usage
            WHERE user_id = :user_id AND timestamp >= :today_start
        """),
        {"user_id": user_id, "today_start": today_start},
    )
    user_daily_cost = user_result.scalar()

    if user_daily_cost >= PER_USER_DAILY_LIMIT_USD:
        await send_alert(
            severity="warning",
            message=f"User {user_id} exceeded daily limit: ${user_daily_cost:.2f}",
        )
        raise BudgetExceededError(f"Daily usage limit reached for user {user_id}")

    # Check global monthly spend
    month_start = datetime.utcnow().replace(day=1, hour=0, minute=0, second=0)
    global_result = await db_session.execute(
        text("SELECT COALESCE(SUM(cost_usd), 0) FROM token_usage WHERE timestamp >= :month_start"),
        {"month_start": month_start},
    )
    monthly_cost = global_result.scalar()

    if monthly_cost >= MONTHLY_BUDGET_USD * 0.8:
        await send_alert(
            severity="critical",
            message=f"Monthly budget 80% consumed: ${monthly_cost:.2f} / ${MONTHLY_BUDGET_USD}",
        )

Exposing a Cost Dashboard API

Serve the analytics data through a FastAPI endpoint so your dashboard frontend can display it.

from fastapi import APIRouter, Depends

router = APIRouter(prefix="/api/costs")

@router.get("/daily-by-feature")
async def daily_costs(days: int = 30, db=Depends(get_db)):
    rows = await get_daily_cost_by_feature(db, days)
    return [
        {"day": str(r.day.date()), "feature": r.feature,
         "cost": round(r.total_cost, 4), "tokens": r.total_tokens}
        for r in rows
    ]

@router.get("/top-users")
async def top_users(limit: int = 20, db=Depends(get_db)):
    rows = await get_top_users_by_cost(db, limit)
    return [
        {"user_id": r.user_id, "cost": round(r.total_cost, 4),
         "tokens": r.total_tokens, "conversations": r.conversations}
        for r in rows
    ]

FAQ

How accurate is token-based cost tracking compared to the actual invoice?

Token-based tracking is typically within 2-5% of the actual invoice. Discrepancies come from retries that consume tokens before failing, cached completions that some providers discount, and rounding differences. Reconcile your tracked costs against the provider invoice monthly and adjust your pricing table if needed.

Should I track costs synchronously or asynchronously?

Use asynchronous recording. Write the usage record to a queue or background task so it does not add latency to the user response. A simple approach is to use asyncio.create_task() to fire the database write without awaiting it in the request path. For high-throughput systems, batch writes via a message queue like Redis Streams or Kafka.
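A minimal sketch of the fire-and-forget pattern (the store and record shapes are illustrative). Note the task-reference bookkeeping: the event loop holds only a weak reference to tasks, so an unreferenced task can be garbage-collected before the write completes:

```python
import asyncio

USAGE_LOG: list[dict] = []                    # stand-in for the database
BACKGROUND_TASKS: set[asyncio.Task] = set()   # keeps tasks alive until done

async def record_usage(record: dict) -> None:
    await asyncio.sleep(0.01)                 # simulate a slow database write
    USAGE_LOG.append(record)

async def handle_request() -> str:
    # schedule the write off the request path; do not await it here
    task = asyncio.create_task(record_usage({"user_id": "u1", "cost_usd": 0.006}))
    BACKGROUND_TASKS.add(task)
    task.add_done_callback(BACKGROUND_TASKS.discard)
    return "reply returned before the write finished"

async def main() -> None:
    await handle_request()
    assert not USAGE_LOG                      # write still in flight
    await asyncio.gather(*BACKGROUND_TASKS)   # drain before shutdown

asyncio.run(main())
```

The final `gather` matters at shutdown: without draining pending writes, usage records from the last in-flight requests are silently dropped when the process exits.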

How do I handle cost tracking when the agent retries a failed LLM call?

Track every attempt, including retries. Each attempt consumes tokens and incurs cost, even if the response is discarded. Add a retry_attempt field to your usage table so you can analyze retry rates and their cost impact separately from successful first-attempt calls.

