
Post-Call Analytics with GPT-4o-mini: Sentiment, Lead Scoring, and Intent

Build a post-call analytics pipeline with GPT-4o-mini — sentiment, intent, lead scoring, satisfaction, and escalation detection.

The cheap AI that earns its keep

Running the Realtime API for live conversation is expensive. Running GPT-4o-mini over the transcript afterwards is nearly free — and it is where most of the operational insight actually comes from. Sentiment, intent, lead score, satisfaction, escalation reason: all of it falls out of one structured JSON call per transcript.

This post walks through the post-call analytics pipeline CallSphere runs in production, including the exact schema, the prompt, and the queue architecture that keeps it off the hot path.

call ends
   │
   ▼
queue.publish(post_call, {transcript, metadata})
   │
   ▼
worker pulls
   │
   ▼
GPT-4o-mini call with JSON schema
   │
   ▼
UPSERT call_analytics
   │
   ▼
trigger downstream (CRM, dashboards)

Architecture overview

┌────────────────────┐
│ Voice agent runtime│
└─────────┬──────────┘
          │ on_call_end
          ▼
┌────────────────────┐
│ Queue (SQS/Redis)  │
└─────────┬──────────┘
          ▼
┌────────────────────┐
│ Analytics worker   │
│ • GPT-4o-mini call │
│ • JSON validation  │
└─────────┬──────────┘
          ▼
┌────────────────────┐
│ call_analytics     │
└─────────┬──────────┘
          ▼
   dashboards, CRM,
   alerts, exports
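The on_call_end hook only needs to serialize and publish; everything else happens in the worker. A minimal sketch of the payload builder (the queue client itself, whether SQS or Redis, is whatever you already run; the function name here is illustrative):

```python
import json
import time

def build_post_call_event(call_id: str, transcript: list[dict], metadata: dict) -> str:
    # Serialize once at the edge of the voice runtime; the worker only ever
    # sees this JSON envelope, never the live call object.
    return json.dumps({
        "event": "post_call",
        "call_id": call_id,
        "ended_at": int(time.time()),
        "transcript": transcript,  # [{role, text}] turns
        "metadata": metadata,
    })
```

Publishing is then one line against your queue client, e.g. redis.lpush("post_call", build_post_call_event(...)).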

Prerequisites

  • A queue for background jobs.
  • Postgres (or another analytics-friendly store) for the analytics table.
  • An OpenAI key with GPT-4o-mini access.
  • The call transcript in a structured [{role, text}] format.

Step-by-step walkthrough

1. Define the output schema

ANALYTICS_SCHEMA = {
    "type": "object",
    "properties": {
        "summary": {"type": "string"},
        "sentiment": {"type": "string", "enum": ["positive", "neutral", "negative"]},
        "sentiment_score": {"type": "number", "minimum": -1, "maximum": 1},
        "intent": {"type": "string"},
        "lead_score": {"type": "integer", "minimum": 0, "maximum": 100},
        "satisfaction": {"type": "integer", "minimum": 1, "maximum": 5},
        "escalated": {"type": "boolean"},
        "escalation_reason": {"type": ["string", "null"]},
        "next_action": {"type": "string"},
        "tags": {"type": "array", "items": {"type": "string"}},
    },
    "required": ["summary", "sentiment", "intent", "lead_score", "satisfaction", "escalated", "next_action"],
}
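JSON mode guarantees syntactically valid JSON but not schema conformance, so every response should be checked. In production you would likely reach for the jsonschema package; here is a dependency-free sketch covering just the keywords this schema uses (required, enum, minimum, maximum):

```python
def validate_against_schema(obj: dict, schema: dict) -> list[str]:
    # Returns a list of human-readable errors; an empty list means valid.
    errors = []
    for key in schema.get("required", []):
        if key not in obj:
            errors.append(f"missing required field: {key}")
    for key, rules in schema.get("properties", {}).items():
        if key not in obj:
            continue
        value = obj[key]
        if "enum" in rules and value not in rules["enum"]:
            errors.append(f"{key}: {value!r} not in {rules['enum']}")
        # bool is an int subclass in Python; exclude it from range checks.
        if isinstance(value, (int, float)) and not isinstance(value, bool):
            if "minimum" in rules and value < rules["minimum"]:
                errors.append(f"{key}: {value} below minimum {rules['minimum']}")
            if "maximum" in rules and value > rules["maximum"]:
                errors.append(f"{key}: {value} above maximum {rules['maximum']}")
    return errors
```

Call it with the analytics output and ANALYTICS_SCHEMA; a non-empty return feeds the retry path described in step 5.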

2. Write the worker

import json

from openai import AsyncOpenAI

client = AsyncOpenAI()

PROMPT = """
You are an analyst reviewing a completed phone call between a customer and an AI voice agent.
Return a JSON object matching the provided schema. Be concise and accurate.
Do not invent facts. If something is unclear, say so in the summary.
"""

async def analyze(transcript: list[dict]) -> dict:
    text = "\n".join(f"{t['role']}: {t['text']}" for t in transcript)
    resp = await client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": PROMPT},
            {"role": "user", "content": text},
        ],
        response_format={"type": "json_object"},
        temperature=0.1,
    )
    return json.loads(resp.choices[0].message.content)
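The surrounding worker loop is mostly plumbing. One unit of work, sketched with injected dependencies so it can be tested without a live queue or database (analyze_fn and upsert_fn are stand-ins for the real calls; the production version would be async):

```python
import json

def process_event(event_json: str, analyze_fn, upsert_fn) -> dict:
    # Decode the queued event, run the analysis, persist the result.
    # Dependency injection keeps this testable offline.
    event = json.loads(event_json)
    result = analyze_fn(event["transcript"])
    upsert_fn(event["call_id"], result)
    return result
```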

3. Persist and index

CREATE TABLE call_analytics (
  call_id TEXT PRIMARY KEY,
  summary TEXT,
  sentiment TEXT,
  sentiment_score REAL,
  intent TEXT,
  lead_score INT,
  satisfaction INT,
  escalated BOOLEAN,
  escalation_reason TEXT,
  next_action TEXT,
  tags TEXT[],
  created_at TIMESTAMPTZ DEFAULT now()
);
CREATE INDEX ON call_analytics (sentiment, created_at);
CREATE INDEX ON call_analytics (lead_score DESC) WHERE lead_score >= 70;

4. Trigger downstream actions

async def on_analytics(result: dict, call_id: str):
    if result["lead_score"] >= 75:
        await hubspot_log_hot_lead(call_id, result)
    if result["escalated"]:
        await pager_alert(call_id, result["escalation_reason"])

5. Handle failures gracefully

Validate the JSON against the schema. On failure, retry once with a "fix your previous output" prompt. On repeated failure, park the event in a DLQ for manual review.
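Putting that together: validate, retry once with the errors fed back, then park. A sketch with injected dependencies (analyze_fn, validate_fn, and park_fn are illustrative stand-ins; the production version would be async):

```python
def analyze_with_retry(transcript, analyze_fn, validate_fn, park_fn):
    # First attempt.
    result = analyze_fn(transcript, fix_errors=None)
    errors = validate_fn(result)
    if not errors:
        return result
    # One retry, feeding the validation errors back into the prompt.
    result = analyze_fn(transcript, fix_errors=errors)
    errors = validate_fn(result)
    if not errors:
        return result
    # Still broken: park in the dead-letter queue for manual review.
    park_fn(transcript, result, errors)
    return None
```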


6. Sample and spot-check

Every day, have a human reviewer grade 10 random analytics outputs for accuracy. Drift in the base model shows up here first.
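Pulling the daily review sample is a one-liner against the analytics table (Postgres syntax; random() ordering is fine at this sample size):

```sql
SELECT * FROM call_analytics
WHERE created_at > now() - interval '1 day'
ORDER BY random()
LIMIT 10;
```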


Production considerations

  • Cost: GPT-4o-mini is ~$0.15/1M input tokens. A 5-minute call is roughly $0.001 to analyze.
  • Latency: this runs async, so latency does not affect the caller, but keep the worker under 10s to avoid backlog.
  • PII: redact credit cards and SSNs before sending the transcript to the LLM.
  • Schema evolution: version the schema and store the version alongside the row.
  • Bias monitoring: spot-check scores across demographics to avoid systematic skew.
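To sanity-check the cost bullet, here is a rough back-of-envelope helper, assuming about 4 characters per token (a common heuristic for English text) and counting input tokens only:

```python
def input_cost_usd(transcript_chars: int,
                   usd_per_million_tokens: float = 0.15,
                   chars_per_token: float = 4.0) -> float:
    # Rough estimate: derive a token count from character count,
    # then apply linear per-token pricing.
    tokens = transcript_chars / chars_per_token
    return tokens / 1_000_000 * usd_per_million_tokens
```

A 5-minute call at roughly 6,000 transcript characters comes out around $0.0002 of input, consistent with the ~$0.001 figure once output tokens and the system prompt are included.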

CallSphere's real implementation

CallSphere runs exactly this pipeline for every call across every vertical. The voice plane uses the OpenAI Realtime API with gpt-4o-realtime-preview-2025-06-03, PCM16 at 24kHz, and server VAD. When a call ends, the transcript plus metadata is published to a queue, and a worker calls GPT-4o-mini with a JSON schema almost identical to the one above, then writes the result into per-vertical Postgres.

The healthcare vertical tunes the schema for insurance and clinical intent signals (14 tools), real estate uses tighter lead-scoring and tour-booking intent (10 agents), salon optimizes for rebooking and upsell (4 agents), after-hours escalation focuses on urgency classification (7 tools), IT helpdesk combines intent with RAG-hit quality (10 tools + RAG), and the ElevenLabs sales pod tracks objection categories (5 GPT-4 specialists). All of them feed the same admin dashboard. CallSphere runs 57+ languages with analytics computed identically across them.

Common pitfalls

  • Running analytics synchronously: it blocks the next call.
  • Trusting the JSON without validation: small JSON errors blow up downstream.
  • Mixing verticals in one prompt: every vertical needs its own schema.
  • Ignoring drift: spot-check or you will miss regressions.
  • Logging raw PII: use field-level encryption for the summary column.

FAQ

Why GPT-4o-mini and not the full model?

Cost. GPT-4o-mini is accurate enough for analytics and 10-20x cheaper.

How do I keep dashboards fast as call volume grows?

Roll up nightly into a summary table; do not re-query the raw table every time.
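A nightly rollup might look like this (sketch; call_analytics_daily is a hypothetical summary table, driven by pg_cron or a plain cron job):

```sql
INSERT INTO call_analytics_daily
  (day, calls, avg_sentiment_score, avg_lead_score, escalations)
SELECT
  date_trunc('day', created_at) AS day,
  count(*),
  avg(sentiment_score),
  avg(lead_score),
  count(*) FILTER (WHERE escalated)
FROM call_analytics
WHERE created_at >= date_trunc('day', now() - interval '1 day')
  AND created_at <  date_trunc('day', now())
GROUP BY 1
ON CONFLICT (day) DO UPDATE SET
  calls = EXCLUDED.calls,
  avg_sentiment_score = EXCLUDED.avg_sentiment_score,
  avg_lead_score = EXCLUDED.avg_lead_score,
  escalations = EXCLUDED.escalations;
```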


Can I use the same output to route follow-ups?

Yes — the next_action field is designed for it.

What about multi-language calls?

GPT-4o-mini handles 50+ languages well for sentiment and intent.

How do I correlate analytics with business outcomes?

Join call_analytics.call_id to your CRM deal closure data.

Next steps

Want sentiment, intent, and lead scoring on every call? Book a demo, explore the technology page, or see pricing.

#CallSphere #PostCallAnalytics #GPT4oMini #VoiceAI #Sentiment #LeadScoring #AIVoiceAgents

