
Cost Tracking for AI Agents: Per-User, Per-Feature Token Usage Analytics

Build a complete cost tracking system for AI agents that attributes token usage to individual users and features, sets budget alerts, and provides dashboards for controlling LLM spend in production.

Why Cost Tracking Is Critical for Production Agents

LLM costs scale with usage in ways that are easy to underestimate. A single GPT-4o call might cost fractions of a cent, but an agent that makes three LLM calls per user message — one for routing, one for the specialist, one for summarization — multiplied by thousands of daily users creates a bill that grows faster than most teams expect. Without per-user, per-feature cost attribution, you cannot answer basic questions: Which users drive the most cost? Which agent features are expensive relative to their value? Are costs growing faster than revenue?

A cost tracking system captures token usage at the call level, attributes it to users and features, stores it for analysis, and alerts when budgets are at risk.

The Token Usage Data Model

Start with a database table that records every LLM call with enough context for flexible analysis.

# SQLAlchemy model for token usage tracking
from sqlalchemy import Column, String, Integer, Float, DateTime, Index
from sqlalchemy.dialects.postgresql import UUID
from sqlalchemy.orm import declarative_base
from datetime import datetime
import uuid

Base = declarative_base()

class TokenUsage(Base):
    __tablename__ = "token_usage"

    id = Column(UUID(as_uuid=True), primary_key=True, default=uuid.uuid4)
    timestamp = Column(DateTime, nullable=False, default=datetime.utcnow)
    user_id = Column(String, nullable=False, index=True)
    conversation_id = Column(String, nullable=False, index=True)
    agent_name = Column(String, nullable=False)
    feature = Column(String, nullable=False)  # e.g., "routing", "support", "summarization"
    model = Column(String, nullable=False)
    prompt_tokens = Column(Integer, nullable=False)
    completion_tokens = Column(Integer, nullable=False)
    total_tokens = Column(Integer, nullable=False)
    cost_usd = Column(Float, nullable=False)

    __table_args__ = (
        Index("idx_usage_user_timestamp", "user_id", "timestamp"),
        Index("idx_usage_feature_timestamp", "feature", "timestamp"),
    )
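For teams that manage schema with raw SQL migrations rather than SQLAlchemy metadata, the equivalent PostgreSQL DDL looks roughly like this (a sketch; `gen_random_uuid()` requires PostgreSQL 13+ or the pgcrypto extension, and `timestamp` is quoted because it shadows a SQL keyword):

```sql
CREATE TABLE token_usage (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    "timestamp" TIMESTAMP NOT NULL DEFAULT now(),
    user_id TEXT NOT NULL,
    conversation_id TEXT NOT NULL,
    agent_name TEXT NOT NULL,
    feature TEXT NOT NULL,
    model TEXT NOT NULL,
    prompt_tokens INTEGER NOT NULL,
    completion_tokens INTEGER NOT NULL,
    total_tokens INTEGER NOT NULL,
    cost_usd DOUBLE PRECISION NOT NULL
);

-- composite indexes matching the __table_args__ above
CREATE INDEX idx_usage_user_timestamp ON token_usage (user_id, "timestamp");
CREATE INDEX idx_usage_feature_timestamp ON token_usage (feature, "timestamp");
```

The model's `index=True` flags on `user_id` and `conversation_id` would also generate single-column indexes; the composite indexes above usually cover the `user_id` lookups, so you may only need a separate one on `conversation_id`.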

Recording Token Usage from LLM Calls

Wrap your LLM client to automatically record usage after every call. Maintain a pricing table that maps models to per-token costs.

MODEL_PRICING = {
    # model: (cost_per_prompt_token, cost_per_completion_token)
    "gpt-4o": (0.0000025, 0.00001),
    "gpt-4o-mini": (0.00000015, 0.0000006),
    "claude-sonnet-4-20250514": (0.000003, 0.000015),
    "claude-haiku-35": (0.0000008, 0.000004),
}

def calculate_cost(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    # unknown models fall back to mid-tier rates so costs are never undercounted
    pricing = MODEL_PRICING.get(model, (0.000003, 0.000015))
    return (prompt_tokens * pricing[0]) + (completion_tokens * pricing[1])

async def tracked_llm_call(
    model: str,
    messages: list,
    user_id: str,
    conversation_id: str,
    feature: str,
    agent_name: str,
    db_session,
):
    # llm_client is assumed to be an AsyncOpenAI-style client initialized elsewhere
    response = await llm_client.chat.completions.create(
        model=model, messages=messages
    )

    usage = response.usage
    cost = calculate_cost(model, usage.prompt_tokens, usage.completion_tokens)

    record = TokenUsage(
        user_id=user_id,
        conversation_id=conversation_id,
        agent_name=agent_name,
        feature=feature,
        model=model,
        prompt_tokens=usage.prompt_tokens,
        completion_tokens=usage.completion_tokens,
        total_tokens=usage.total_tokens,
        cost_usd=cost,
    )
    db_session.add(record)
    await db_session.commit()

    return response
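Before wiring this into production, it is worth sanity-checking the pricing math by hand. Restated standalone (with a trimmed pricing table), a 1,200-prompt / 300-completion gpt-4o call should cost 1,200 × $0.0000025 + 300 × $0.00001:

```python
MODEL_PRICING = {
    # model: (cost_per_prompt_token, cost_per_completion_token)
    "gpt-4o": (0.0000025, 0.00001),
    "gpt-4o-mini": (0.00000015, 0.0000006),
}

def calculate_cost(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    # unknown models fall back to mid-tier rates so costs are never undercounted
    pricing = MODEL_PRICING.get(model, (0.000003, 0.000015))
    return (prompt_tokens * pricing[0]) + (completion_tokens * pricing[1])

cost = calculate_cost("gpt-4o", 1200, 300)
# 1200 * 0.0000025 + 300 * 0.00001 = 0.003 + 0.003 = 0.006
```

A check like this makes a good unit test: if a provider price change ever slips into the table wrong by a factor of 1,000 (per-token vs. per-million-token confusion is common), the test catches it immediately.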

Building Usage Analytics Queries

With usage data in PostgreSQL, you can answer the key cost questions with straightforward SQL.

from sqlalchemy import func, text
from datetime import datetime, timedelta

async def get_daily_cost_by_feature(db_session, days: int = 30):
    """Cost per feature per day for the last N days."""
    cutoff = datetime.utcnow() - timedelta(days=days)
    result = await db_session.execute(
        text("""
            SELECT
                date_trunc('day', timestamp) AS day,
                feature,
                SUM(cost_usd) AS total_cost,
                SUM(total_tokens) AS total_tokens,
                COUNT(*) AS call_count
            FROM token_usage
            WHERE timestamp >= :cutoff
            GROUP BY day, feature
            ORDER BY day DESC, total_cost DESC
        """),
        {"cutoff": cutoff},
    )
    return result.fetchall()

async def get_top_users_by_cost(db_session, limit: int = 20):
    """Top N users by total LLM cost in the current month."""
    month_start = datetime.utcnow().replace(day=1, hour=0, minute=0, second=0)
    result = await db_session.execute(
        text("""
            SELECT
                user_id,
                SUM(cost_usd) AS total_cost,
                SUM(total_tokens) AS total_tokens,
                COUNT(DISTINCT conversation_id) AS conversations
            FROM token_usage
            WHERE timestamp >= :month_start
            GROUP BY user_id
            ORDER BY total_cost DESC
            LIMIT :limit
        """),
        {"month_start": month_start, "limit": limit},
    )
    return result.fetchall()
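The daily-by-feature rollup is just a group-by-and-sum. As a mental model (and a handy fixture for unit-testing dashboard code without a database), the same aggregation over in-memory rows looks like this — the sample rows are made up:

```python
from collections import defaultdict
from datetime import date

# hypothetical rows mirroring token_usage: (day, feature, cost_usd)
rows = [
    (date(2026, 1, 5), "routing", 0.002),
    (date(2026, 1, 5), "support", 0.010),
    (date(2026, 1, 5), "routing", 0.003),
    (date(2026, 1, 6), "support", 0.004),
]

# equivalent of GROUP BY day, feature / SUM(cost_usd)
daily_cost: dict[tuple[date, str], float] = defaultdict(float)
for day, feature, cost in rows:
    daily_cost[(day, feature)] += cost

# routing on Jan 5 sums to 0.005; three (day, feature) groups in total
```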

Budget Alerts

Check user and global budgets after every LLM call. When a threshold is exceeded, send alerts and optionally throttle the user.

MONTHLY_BUDGET_USD = 5000.0
PER_USER_DAILY_LIMIT_USD = 2.0

async def check_budgets(user_id: str, db_session):
    """Check both global and per-user budgets after each call."""
    # Check per-user daily spend
    today_start = datetime.utcnow().replace(hour=0, minute=0, second=0)
    user_result = await db_session.execute(
        text("""
            SELECT COALESCE(SUM(cost_usd), 0)
            FROM token_usage
            WHERE user_id = :user_id AND timestamp >= :today_start
        """),
        {"user_id": user_id, "today_start": today_start},
    )
    user_daily_cost = user_result.scalar()

    if user_daily_cost >= PER_USER_DAILY_LIMIT_USD:
        await send_alert(
            severity="warning",
            message=f"User {user_id} exceeded daily limit: ${user_daily_cost:.2f}",
        )
        raise BudgetExceededError(f"Daily usage limit reached for user {user_id}")

    # Check global monthly spend
    month_start = datetime.utcnow().replace(day=1, hour=0, minute=0, second=0)
    global_result = await db_session.execute(
        text("SELECT COALESCE(SUM(cost_usd), 0) FROM token_usage WHERE timestamp >= :month_start"),
        {"month_start": month_start},
    )
    monthly_cost = global_result.scalar()

    if monthly_cost >= MONTHLY_BUDGET_USD * 0.8:
        await send_alert(
            severity="critical",
            message=f"Monthly budget 80% consumed: ${monthly_cost:.2f} / ${MONTHLY_BUDGET_USD}",
        )

Exposing a Cost Dashboard API

Serve the analytics data through a FastAPI endpoint so your dashboard frontend can display it.

from fastapi import APIRouter, Depends

router = APIRouter(prefix="/api/costs")

@router.get("/daily-by-feature")
async def daily_costs(days: int = 30, db=Depends(get_db)):
    rows = await get_daily_cost_by_feature(db, days)
    return [
        {"day": str(r.day.date()), "feature": r.feature,
         "cost": round(r.total_cost, 4), "tokens": r.total_tokens}
        for r in rows
    ]

@router.get("/top-users")
async def top_users(limit: int = 20, db=Depends(get_db)):
    rows = await get_top_users_by_cost(db, limit)
    return [
        {"user_id": r.user_id, "cost": round(r.total_cost, 4),
         "tokens": r.total_tokens, "conversations": r.conversations}
        for r in rows
    ]

FAQ

How accurate is token-based cost tracking compared to the actual invoice?

Token-based tracking is typically within 2-5% of the actual invoice. Discrepancies come from retries that consume tokens before failing, cached completions that some providers discount, and rounding differences. Reconcile your tracked costs against the provider invoice monthly and adjust your pricing table if needed.

Should I track costs synchronously or asynchronously?

Use asynchronous recording. Write the usage record to a queue or background task so it does not add latency to the user response. A simple approach is to use asyncio.create_task() to fire the database write without awaiting it in the request path. For high-throughput systems, batch writes via a message queue like Redis Streams or Kafka.
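A minimal sketch of the fire-and-forget pattern (the store and record shapes are illustrative). Note the task-reference bookkeeping: the event loop holds only a weak reference to tasks, so an unreferenced task can be garbage-collected before the write completes:

```python
import asyncio

USAGE_LOG: list[dict] = []                    # stand-in for the database
BACKGROUND_TASKS: set[asyncio.Task] = set()   # keeps tasks alive until done

async def record_usage(record: dict) -> None:
    await asyncio.sleep(0.01)                 # simulate a slow database write
    USAGE_LOG.append(record)

async def handle_request() -> str:
    # schedule the write off the request path; do not await it here
    task = asyncio.create_task(record_usage({"user_id": "u1", "cost_usd": 0.006}))
    BACKGROUND_TASKS.add(task)
    task.add_done_callback(BACKGROUND_TASKS.discard)
    return "reply returned before the write finished"

async def main() -> None:
    await handle_request()
    assert not USAGE_LOG                      # write still in flight
    await asyncio.gather(*BACKGROUND_TASKS)   # drain before shutdown

asyncio.run(main())
```

The final `gather` matters at shutdown: without draining pending writes, usage records from the last in-flight requests are silently dropped when the process exits.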

How do I handle cost tracking when the agent retries a failed LLM call?

Track every attempt, including retries. Each attempt consumes tokens and incurs cost, even if the response is discarded. Add a retry_attempt field to your usage table so you can analyze retry rates and their cost impact separately from successful first-attempt calls.

