Building a Voicemail AI Agent: Transcription, Analysis, and Automated Response

Rethinking Voicemail with AI

Traditional voicemail is a black hole. Messages pile up, important calls get buried under spam, and by the time someone listens to a message, the moment has passed. An AI-powered voicemail agent transforms this experience: every message is instantly transcribed, analyzed for urgency, scored by priority, and routed to the right person with a recommended action. Critical messages trigger immediate notifications. Routine ones get batched into a daily digest.

This is not just voicemail transcription — it is an intelligent message processing pipeline.

Voicemail Detection and Greeting

The first challenge is knowing when to activate the voicemail system. This happens when a call goes unanswered or when the AI screening agent decides to take a message:

flowchart LR
    CALLER(["Caller"])
    subgraph TEL["Telephony"]
        SIP["Twilio SIP and PSTN"]
    end
    subgraph BRAIN["Business AI Agent"]
        STT["Streaming STT<br/>Deepgram or Whisper"]
        NLU{"Intent and<br/>Entity Extraction"}
        TOOLS["Tool Calls"]
        TTS["Streaming TTS<br/>ElevenLabs or Rime"]
    end
    subgraph DATA["Live Data Plane"]
        CRM[("CRM and Notes")]
        CAL[("Calendar and<br/>Schedule")]
        KB[("Knowledge Base<br/>and Policies")]
    end
    subgraph OUT["Outcomes"]
        O1(["Booking captured"])
        O2(["CRM record created"])
        O3(["Human handoff"])
    end
    CALLER --> SIP --> STT --> NLU
    NLU -->|Lookup| TOOLS
    TOOLS <--> CRM
    TOOLS <--> CAL
    TOOLS <--> KB
    NLU --> TTS --> SIP --> CALLER
    NLU -->|Resolved| O1
    NLU -->|Schedule| O2
    NLU -->|Escalate| O3
    style CALLER fill:#f1f5f9,stroke:#64748b,color:#0f172a
    style NLU fill:#4f46e5,stroke:#4338ca,color:#fff
    style O1 fill:#059669,stroke:#047857,color:#fff
    style O2 fill:#0ea5e9,stroke:#0369a1,color:#fff
    style O3 fill:#f59e0b,stroke:#d97706,color:#1f2937

from twilio.twiml.voice_response import VoiceResponse
from fastapi import FastAPI, Request
from fastapi.responses import Response

app = FastAPI()

@app.post("/voicemail-greeting")
async def voicemail_greeting(request: Request):
    """Play a personalized voicemail greeting and record."""
    form = await request.form()
    called_number = form.get("Called")
    caller_number = form.get("From")

    # Look up the mailbox owner for a personalized greeting
    owner = await get_mailbox_owner(called_number)

    response = VoiceResponse()

    if owner and owner.get("custom_greeting_url"):
        response.play(owner["custom_greeting_url"])
    else:
        name = owner.get("name", "the person you are calling") if owner else "us"
        response.say(
            f"You have reached {name}. "
            "Please leave a message after the tone and "
            "I will make sure it gets to the right person.",
            voice="Polly.Joanna",
        )

    response.pause(length=1)
    response.play("https://api.twilio.com/beep.mp3")

    # Record the voicemail
    response.record(
        action="/voicemail-complete",
        max_length=180,          # 3 minutes max
        timeout=5,               # 5 seconds of silence to stop
        transcribe=False,        # We will use our own transcription
        recording_status_callback="/recording-ready",
        play_beep=False,         # We already played our own
    )

    # Fallback if caller does not leave a message
    response.say("No message was recorded. Goodbye.")
    response.hangup()

    return Response(content=str(response), media_type="application/xml")

Message Transcription Pipeline

When the recording is ready, download and transcribe it with high accuracy:

import httpx
import os
from deepgram import DeepgramClient, PrerecordedOptions
from datetime import datetime

deepgram = DeepgramClient(os.environ["DEEPGRAM_API_KEY"])

async def transcribe_voicemail(recording_url: str) -> dict:
    """Download and transcribe a voicemail recording."""
    async with httpx.AsyncClient() as client:
        resp = await client.get(
            f"{recording_url}.wav",
            auth=(
                os.environ["TWILIO_ACCOUNT_SID"],
                os.environ["TWILIO_AUTH_TOKEN"],
            ),
        )
        resp.raise_for_status()  # Do not transcribe an HTTP error body
        audio_bytes = resp.content

    options = PrerecordedOptions(
        model="nova-2",
        smart_format=True,
        punctuate=True,
        paragraphs=True,
        detect_language=True,
        sentiment=True,
    )

    result = await deepgram.listen.asyncrest.v("1").transcribe_file(
        {"buffer": audio_bytes, "mimetype": "audio/wav"},
        options,
    )

    transcript = result.results.channels[0].alternatives[0]

    return {
        "text": transcript.transcript,
        "confidence": transcript.confidence,
        "language": result.results.channels[0].detected_language,
        "words": [
            {
                "word": w.word,
                "start": w.start,
                "end": w.end,
                "confidence": w.confidence,
            }
            for w in transcript.words
        ],
        "duration": result.metadata.duration,
    }
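
The per-word confidences returned above can point a reviewer at the parts of a message worth re-listening to. The following is a minimal sketch, assuming the `words` list has the shape built in `transcribe_voicemail`; the helper name `low_confidence_spans` and the 0.6 threshold are illustrative assumptions, not part of Deepgram's API.

```python
def low_confidence_spans(words: list[dict], threshold: float = 0.6) -> list[dict]:
    """Group consecutive low-confidence words into time spans."""
    spans: list[dict] = []
    current = None
    for w in words:
        if w["confidence"] < threshold:
            if current is None:
                # Start a new span at the first uncertain word
                current = {"start": w["start"], "end": w["end"], "text": w["word"]}
            else:
                # Extend the open span with the next uncertain word
                current["end"] = w["end"]
                current["text"] += " " + w["word"]
        elif current is not None:
            # A confident word closes the open span
            spans.append(current)
            current = None
    if current is not None:
        spans.append(current)
    return spans
```

Each span carries start and end offsets in seconds, so a notification could link straight to the uncertain part of the recording.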

AI-Powered Message Analysis

Analyze the transcribed message to extract structured information:

from openai import AsyncOpenAI

client = AsyncOpenAI()

VOICEMAIL_ANALYSIS_PROMPT = """Analyze this voicemail message and extract:
1. caller_name: if mentioned
2. callback_number: if a different number is provided
3. summary: 1-2 sentence summary
4. intent: the caller's purpose (inquiry, complaint, appointment, urgent, sales, personal, spam)
5. urgency: 1-10 score (10 = emergency, 1 = junk)
6. sentiment: positive, neutral, negative, distressed
7. action_items: specific actions requested
8. entities: names, dates, account numbers, amounts mentioned
9. is_spam: boolean — telemarketer, robocall, or solicitation
10. suggested_response: recommended reply approach

Return valid JSON."""

async def analyze_voicemail(
    transcript_text: str,
    caller_number: str,
    caller_history: dict,
) -> dict:
    """Run AI analysis on a voicemail transcript."""
    context = ""
    if caller_history:
        context = (
            f"\nCaller history: {caller_history.get('total_calls', 0)} "
            f"previous calls, last contact: "
            f"{caller_history.get('last_contact', 'never')}. "
            f"Known as: {caller_history.get('name', 'unknown')}."
        )

    response = await client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": VOICEMAIL_ANALYSIS_PROMPT},
            {
                "role": "user",
                "content": f"Transcript: {transcript_text}{context}",
            },
        ],
        response_format={"type": "json_object"},
        temperature=0.2,
    )

    import json
    return json.loads(response.choices[0].message.content)
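
JSON mode guarantees parseable JSON, not well-typed fields: the model may return urgency as a string, or leave a field null. A small normalization step before routing is one way to guard against that. This is a sketch under assumptions: the helper name `normalize_analysis`, the fallback values, and the `"unknown"` intent are hypothetical choices, not part of the original pipeline.

```python
ALLOWED_INTENTS = {
    "inquiry", "complaint", "appointment", "urgent", "sales", "personal", "spam",
}

def normalize_analysis(raw: dict) -> dict:
    """Coerce model output into the types the router relies on."""
    analysis = dict(raw)

    # Urgency: accept ints, floats, or numeric strings, then clamp to 1-10
    try:
        urgency = int(float(analysis.get("urgency", 5)))
    except (TypeError, ValueError, OverflowError):
        urgency = 5
    analysis["urgency"] = max(1, min(10, urgency))

    # is_spam: models sometimes return "true"/"false" as strings
    spam = analysis.get("is_spam", False)
    if isinstance(spam, str):
        spam = spam.strip().lower() in {"true", "yes", "1"}
    analysis["is_spam"] = bool(spam)

    # Intent: fall back to "unknown" for anything outside the prompt's list
    intent = str(analysis.get("intent") or "").lower()
    analysis["intent"] = intent if intent in ALLOWED_INTENTS else "unknown"

    # Fields that notifications read directly
    analysis["summary"] = analysis.get("summary") or "(no summary available)"
    items = analysis.get("action_items")
    analysis["action_items"] = items if isinstance(items, list) else []
    return analysis
```

Calling `normalize_analysis(json.loads(...))` on the model's reply would let the routing code compare `urgency` as an integer without extra checks.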

Priority Scoring and Smart Routing

Not all voicemails are equal. Score and route them based on the analysis:

import json
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

@dataclass
class ProcessedVoicemail:
    id: str
    caller_number: str
    recording_url: str
    transcript: str
    analysis: dict
    priority_score: int
    mailbox_owner: Optional[dict]  # Owner record returned by get_mailbox_owner()
    created_at: datetime
    callback_scheduled: Optional[datetime] = None

class VoicemailRouter:
    """Routes processed voicemails based on priority and content."""

    URGENCY_THRESHOLDS = {
        "immediate_notify": 8,   # Phone push + SMS
        "priority_notify": 5,    # Email + app notification
        "batch_digest": 1,       # Daily summary
        "spam_discard": 0,       # Auto-archive
    }

    async def route_voicemail(
        self, voicemail: ProcessedVoicemail
    ) -> str:
        """Determine notification strategy based on priority."""
        analysis = voicemail.analysis
        score = analysis.get("urgency", 5)

        if analysis.get("is_spam"):
            await self.archive_spam(voicemail)
            return "spam_archived"

        if score >= self.URGENCY_THRESHOLDS["immediate_notify"]:
            await self.send_immediate_notification(voicemail)
            await self.schedule_callback(voicemail, delay_minutes=15)
            return "immediate"

        if score >= self.URGENCY_THRESHOLDS["priority_notify"]:
            await self.send_priority_notification(voicemail)
            await self.schedule_callback(voicemail, delay_minutes=60)
            return "priority"

        await self.add_to_digest(voicemail)
        return "batched"

    async def send_immediate_notification(
        self, voicemail: ProcessedVoicemail
    ):
        """Push notification with transcript and suggested action."""
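        analysis = voicemail.analysis
        # JSON null becomes None, so use `or` rather than a .get() default
        caller = analysis.get("caller_name") or voicemail.caller_number
        action = analysis.get("suggested_response") or "Call back ASAP"
        message = (
            f"URGENT VOICEMAIL from {caller}\n"
            f"Summary: {analysis.get('summary', 'No summary available')}\n"
            f"Action: {action}"
        )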
        await self.push_notification(voicemail.mailbox_owner, message)
        await self.send_sms(voicemail.mailbox_owner, message)

    async def schedule_callback(
        self, voicemail: ProcessedVoicemail, delay_minutes: int
    ):
        """Schedule an automated callback if not handled manually."""
        from datetime import timedelta
        callback_time = datetime.utcnow() + timedelta(minutes=delay_minutes)
        callback_number = (
            voicemail.analysis.get("callback_number")
            or voicemail.caller_number
        )

        await self.db_pool.execute(
            """
            INSERT INTO scheduled_callbacks
            (voicemail_id, phone_number, scheduled_at, status, context)
            VALUES ($1, $2, $3, 'pending', $4)
            """,
            voicemail.id,
            callback_number,
            callback_time,
            json.dumps(voicemail.analysis),
        )
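
The router calls `add_to_digest` for routine messages but the digest itself is not shown. One way it might be rendered is sketched below. The function name `build_digest` and the item keys (`caller`, `urgency`, `summary`) are assumptions made for illustration.

```python
def build_digest(items: list[dict], max_summary_chars: int = 120) -> str:
    """Render batched voicemails as a plain-text digest, most urgent first."""
    if not items:
        return "No new voicemails today."

    # Highest urgency at the top so the reader sees it first
    ordered = sorted(items, key=lambda v: v.get("urgency", 1), reverse=True)

    lines = [f"Voicemail digest: {len(ordered)} message(s)"]
    for i, v in enumerate(ordered, start=1):
        summary = v.get("summary", "")
        if len(summary) > max_summary_chars:
            # Trim long summaries to keep the digest scannable
            summary = summary[: max_summary_chars - 3].rstrip() + "..."
        caller = v.get("caller", "Unknown")
        urgency = v.get("urgency", 1)
        lines.append(f"{i}. [{urgency}/10] {caller}: {summary}")
    return "\n".join(lines)
```

The output is plain text, so the same string could go into an email body or a chat message.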

Automated Callback System

For voicemails that request a callback, the AI can handle the return call:

class AutoCallbackEngine:
    """Handles automated callbacks for voicemail follow-up."""

    async def execute_callback(
        self, callback_id: str, voicemail: ProcessedVoicemail
    ):
        """Place an automated callback based on voicemail context."""
        context = voicemail.analysis

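        # Generate a personalized callback script. It is not used further in
        # this snippet; the /callback-answer webhook would need access to it.
        script = await self.generate_callback_script(context)

        # Place the call. Use `or` so a null callback_number in the analysis
        # falls back to the caller's number, as schedule_callback does.
        call = self.twilio_client.calls.create(
            to=context.get("callback_number") or voicemail.caller_number,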
            from_=os.environ["TWILIO_NUMBER"],
            url=(
                f"{self.webhook_base}/callback-answer"
                f"?callback_id={callback_id}"
            ),
            machine_detection="DetectMessageEnd",
        )

        return call.sid

    async def generate_callback_script(self, context: dict) -> str:
        """Generate a contextual callback opening."""
        response = await self.ai_client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[
                {
                    "role": "system",
                    "content": (
                        "Generate a brief, professional callback "
                        "opening based on the voicemail context. "
                        "Reference the caller's original message to "
                        "show you listened. Keep it under 3 sentences."
                    ),
                },
                {
                    "role": "user",
                    "content": (
                        f"Caller: {context.get('caller_name', 'the caller')}. "
                        f"Their message: {context['summary']}. "
                        f"They wanted: {', '.join(context.get('action_items', ['a callback']))}"
                    ),
                },
            ],
        )
        return response.choices[0].message.content
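
Something still has to pick up the rows written to `scheduled_callbacks` and hand them to `execute_callback`. A minimal sketch of the selection step follows, assuming rows are loaded as dicts with the table's `status` and `scheduled_at` columns. The function name `due_callbacks` and the `'handled'` status (set when the owner calls back manually) are hypothetical.

```python
from datetime import datetime, timedelta

def due_callbacks(rows: list[dict], now: datetime) -> list[dict]:
    """Return pending callbacks whose scheduled time has passed, oldest first."""
    due = [
        r for r in rows
        if r["status"] == "pending" and r["scheduled_at"] <= now
    ]
    return sorted(due, key=lambda r: r["scheduled_at"])

# Example: only the overdue pending row is selected
now = datetime(2025, 1, 1, 12, 0)
example_rows = [
    {"voicemail_id": "RE1", "status": "pending", "scheduled_at": now - timedelta(minutes=5)},
    {"voicemail_id": "RE2", "status": "pending", "scheduled_at": now + timedelta(minutes=30)},
    {"voicemail_id": "RE3", "status": "handled", "scheduled_at": now - timedelta(minutes=10)},
]
overdue = due_callbacks(example_rows, now)
```

A periodic worker could run this check every minute or so. In production the same filter would more likely live in the SQL query than in Python.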

The Complete Processing Pipeline

Wire everything together in an async pipeline:

async def process_voicemail_pipeline(
    recording_sid: str,
    recording_url: str,
    call_sid: str,
    caller_number: str,
    called_number: str,
):
    """End-to-end voicemail processing pipeline."""
    # Step 1: Transcribe
    transcript = await transcribe_voicemail(recording_url)

    if transcript["confidence"] < 0.3:
        # Very low confidence — store raw recording, skip analysis
        await store_raw_voicemail(recording_sid, recording_url)
        return

    # Step 2: Get caller history
    caller_history = await get_caller_history(caller_number)

    # Step 3: Analyze
    analysis = await analyze_voicemail(
        transcript["text"], caller_number, caller_history
    )

    # Step 4: Create processed voicemail record
    voicemail = ProcessedVoicemail(
        id=recording_sid,
        caller_number=caller_number,
        recording_url=recording_url,
        transcript=transcript["text"],
        analysis=analysis,
        priority_score=analysis.get("urgency", 5),
        mailbox_owner=await get_mailbox_owner(called_number),
        created_at=datetime.utcnow(),
    )

    # Step 5: Store in database
    await store_processed_voicemail(voicemail)

    # Step 6: Route based on priority
    route_result = await voicemail_router.route_voicemail(voicemail)

    print(
        f"Voicemail from {caller_number}: "
        f"urgency={analysis.get('urgency')}, "
        f"intent={analysis.get('intent')}, "
        f"routed={route_result}"
    )
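
The pipeline needs a trigger: the `/recording-ready` URL passed as `recording_status_callback` in the greeting handler. Below is a sketch of parsing that callback's form-encoded body with the standard library only. The field names (`RecordingSid`, `RecordingUrl`, `CallSid`, `RecordingStatus`) follow Twilio's recording status callback as I understand it; the helper name `parse_recording_callback` is an assumption.

```python
from urllib.parse import parse_qs

def parse_recording_callback(body: str):
    """Extract pipeline arguments from a form-encoded recording callback.

    Returns None unless the recording completed and all fields are present.
    """
    # parse_qs returns lists of values; keep the first value of each field
    form = {k: v[0] for k, v in parse_qs(body).items()}

    if form.get("RecordingStatus") != "completed":
        return None

    required = ("RecordingSid", "RecordingUrl", "CallSid")
    if any(name not in form for name in required):
        return None

    return {
        "recording_sid": form["RecordingSid"],
        "recording_url": form["RecordingUrl"],
        "call_sid": form["CallSid"],
    }
```

To my understanding this callback does not carry the caller's or the called number, so `caller_number` and `called_number` would need to be stored against the call SID when `/voicemail-greeting` runs and looked up here.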

FAQ

How do I detect if a voicemail system answered instead of a human?

When making outbound calls, set Twilio's machine_detection parameter to DetectMessageEnd. Twilio analyzes the audio at the start of the call to distinguish a live person from a voicemail greeting. If it hears a machine, it waits for the greeting to finish and then requests your webhook with an AnsweredBy value, so you can leave a message at the right moment. Detection is not perfect (expect accuracy of roughly 90%), so design your opening line to work gracefully in both scenarios.
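
The detection result reaches the callback webhook as an `AnsweredBy` form field. A small sketch of branching on it is shown here. The `AnsweredBy` values listed are the ones I know Twilio to send; the returned action labels and the function name `callback_action` are hypothetical.

```python
# Values Twilio reports when the greeting has finished (DetectMessageEnd)
MACHINE_END_VALUES = {"machine_end_beep", "machine_end_silence", "machine_end_other"}

def callback_action(answered_by: str) -> str:
    """Map an AnsweredBy value to what the callback should do next."""
    if answered_by == "human":
        return "start_conversation"
    if answered_by in MACHINE_END_VALUES:
        # The greeting is over, so a message can be left now
        return "leave_message"
    if answered_by == "fax":
        return "hang_up"
    # "unknown" or anything unexpected: detection was inconclusive, so use
    # an opening line that works for both a person and a recording
    return "start_conversation_cautiously"
```

Keeping the mapping in one pure function makes it easy to adjust when detection behaves differently than expected on a particular carrier.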

What is the best way to handle voicemails in languages other than English?

Use a transcription service with automatic language detection (Deepgram and Whisper both support this). Once the language is detected, switch your AI analysis prompt to that language or use a multilingual model. Store the detected language alongside the transcript so notifications can be formatted appropriately. For businesses serving multilingual populations, consider offering the voicemail greeting in multiple languages.
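
One lightweight way to adapt the analysis to the detected language is to append an instruction to the system prompt instead of translating it. This sketch assumes the detected language arrives as a short code such as `es` or `en-US`. The function name `localized_prompt` and the small language table are illustrative.

```python
LANGUAGE_NAMES = {"en": "English", "es": "Spanish", "fr": "French", "de": "German"}

def localized_prompt(base_prompt: str, detected_language) -> str:
    """Append a language instruction when the voicemail is not in English."""
    # Normalize codes such as "es-MX" to their primary subtag
    code = (detected_language or "en").split("-")[0].lower()
    name = LANGUAGE_NAMES.get(code)
    if name is None or code == "en":
        # Unknown or English: leave the prompt untouched
        return base_prompt
    return (
        f"{base_prompt}\n\nThe transcript is in {name}. "
        f"Write summary, action_items, and suggested_response in {name}; "
        "keep the JSON keys and the intent and sentiment values in English."
    )
```

Keeping the keys and enumerated values in English means the routing code does not need to change per language.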

How do I handle very long voicemails or callers who ramble?

Set a max_length on the recording (120-180 seconds is typical). Long messages need no special handling at the analysis step: the summary and action-item extraction distills even a rambling 3-minute message into a concise output. To discourage long messages, have your greeting ask callers to "leave a brief message", and use the timeout parameter to stop recording after a few seconds of silence.
