
SIP Trunking for AI Voice Agents: Carrier Selection and Architecture

A technical guide to SIP trunking for AI voice agents — carrier comparison, codec selection, and high-availability patterns.

Why SIP trunking still matters

Most teams starting with AI voice agents buy a Twilio number and stop thinking about telephony. That works until you need to port 300 existing DIDs, attach an AI agent to an on-prem PBX, or dial into a country where your preferred CPaaS has terrible termination rates. At that point you are in SIP trunking territory, and the decisions you make about carriers, codecs, and failover will dictate your voice quality for years.

This is a technical guide to wiring SIP trunks into an AI voice agent stack. It covers the carrier comparison I wish I had when I started, the codec tradeoffs that matter, and the high-availability patterns that keep calls flowing when one carrier goes dark.

on-prem PBX / softswitch
   │ SIP INVITE
   ▼
Primary SIP trunk (carrier A)
   │
   ▼
SBC (session border controller)
   │ PCM16
   ▼
AI voice agent edge

Architecture overview

┌──────────┐      ┌──────────┐      ┌────────────┐
│ Carrier A│──┐   │ Carrier B│──┐   │ Carrier C  │
└──────────┘  │   └──────────┘  │   └────────────┘
              ▼                 ▼           │
        ┌────────────────────────────┐      │
        │        Dual SBCs           │◄─────┘
        │ (active/active failover)   │
        └────────────┬───────────────┘
                     │ RTP / PCM16
                     ▼
        ┌────────────────────────────┐
        │ AI voice agent edge        │
        │ (FastAPI + Realtime API)   │
        └────────────────────────────┘

Prerequisites

  • Accounts with at least two SIP carriers (Twilio Elastic SIP Trunking, Bandwidth, Telnyx, or similar).
  • An SBC — cloud (Twilio, Telnyx) or self-hosted (Kamailio, OpenSIPS, FreeSWITCH).
  • A public IP or SRV record that the carriers can reach.
  • Familiarity with SIP methods (INVITE, ACK, BYE) and SDP.

Step-by-step walkthrough

1. Choose your codec strategy

For AI voice agents, stick with G.711 μ-law (ulaw, signalled as PCMU in SDP, 8 kHz) or Opus (16-48 kHz). Avoid G.729 unless you are forced into it: its compression artifacts degrade speech recognition.

flowchart LR
    CALLER(["Caller"])
    subgraph TEL["Telephony"]
        SIP["Twilio SIP and PSTN"]
    end
    subgraph BRAIN["Business AI Agent"]
        STT["Streaming STT<br/>Deepgram or Whisper"]
        NLU{"Intent and<br/>Entity Extraction"}
        TOOLS["Tool Calls"]
        TTS["Streaming TTS<br/>ElevenLabs or Rime"]
    end
    subgraph DATA["Live Data Plane"]
        CRM[("CRM and Notes")]
        CAL[("Calendar and<br/>Schedule")]
        KB[("Knowledge Base<br/>and Policies")]
    end
    subgraph OUT["Outcomes"]
        O1(["Booking captured"])
        O2(["CRM record created"])
        O3(["Human handoff"])
    end
    CALLER --> SIP --> STT --> NLU
    NLU -->|Lookup| TOOLS
    TOOLS <--> CRM
    TOOLS <--> CAL
    TOOLS <--> KB
    NLU --> TTS --> SIP --> CALLER
    NLU -->|Resolved| O1
    NLU -->|Schedule| O2
    NLU -->|Escalate| O3
    style CALLER fill:#f1f5f9,stroke:#64748b,color:#0f172a
    style NLU fill:#4f46e5,stroke:#4338ca,color:#fff
    style O1 fill:#059669,stroke:#047857,color:#fff
    style O2 fill:#0ea5e9,stroke:#0369a1,color:#fff
    style O3 fill:#f59e0b,stroke:#d97706,color:#1f2937

Codec   Bandwidth    Quality for STT   Notes
G.711   64 kbps      Good              Universal, carrier default
Opus    6-64 kbps    Excellent         Not all carriers support it end-to-end
G.729   8 kbps       Poor              Avoid for AI agents
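
Because G.711 is the carrier default, something in the path usually has to expand 8-bit μ-law bytes into 16-bit linear PCM before speech recognition sees them. The following is a minimal sketch of the standard G.711 μ-law expansion in pure Python. The names `ulaw_byte_to_pcm16` and `decode_ulaw` are illustrative and not taken from any library, and in practice the SBC or media gateway may already do this transcoding for you.

```python
import struct

ULAW_BIAS = 0x84  # bias the G.711 mu-law encoder adds before compression


def ulaw_byte_to_pcm16(code: int) -> int:
    """Expand one G.711 mu-law byte into a signed 16-bit linear sample."""
    u = ~code & 0xFF            # mu-law bytes travel bit-inverted on the wire
    sign = u & 0x80
    exponent = (u >> 4) & 0x07  # 3-bit segment number
    mantissa = u & 0x0F         # 4-bit step within the segment
    magnitude = (((mantissa << 3) + ULAW_BIAS) << exponent) - ULAW_BIAS
    return -magnitude if sign else magnitude


# Precompute all 256 values once; decoding a frame is then a table lookup.
_ULAW_TABLE = [ulaw_byte_to_pcm16(c) for c in range(256)]


def decode_ulaw(payload: bytes) -> bytes:
    """Decode a mu-law payload into little-endian PCM16 bytes."""
    samples = [_ULAW_TABLE[b] for b in payload]
    return struct.pack(f"<{len(samples)}h", *samples)


if __name__ == "__main__":
    frame = bytes([0xFF, 0x80, 0x00])  # silence, largest positive, largest negative
    print(struct.unpack("<3h", decode_ulaw(frame)))
```

A 20 ms G.711 frame is 160 bytes, so `decode_ulaw` returns 320 bytes of PCM16 for each RTP payload at the usual packetization interval.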

2. Configure carrier authentication

Most carriers support IP-based auth or SIP digest. IP-based is simpler but requires a static egress IP.

# Kamailio example: accept INVITEs from carrier A's IP range.
# is_method() comes from the textops module, which must be loaded.
if (is_method("INVITE") && src_ip == 198.51.100.0/24) {
    xlog("L_INFO", "Call from carrier A\n");
    route(FORWARD_TO_EDGE);
}
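
For the SIP digest option, the response value in the Authorization header is an MD5 chain over the credentials, the request, and the carrier's nonce. Here is a minimal sketch of that calculation as defined for HTTP digest in RFC 2617, which SIP reuses. The function name `digest_response` and the argument layout are assumptions made for illustration, and carriers that require SHA-256 digests are not covered.

```python
import hashlib
from typing import Optional


def _md5_hex(text: str) -> str:
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def digest_response(
    username: str,
    realm: str,
    password: str,
    method: str,
    uri: str,
    nonce: str,
    qop: Optional[str] = None,
    nc: Optional[str] = None,
    cnonce: Optional[str] = None,
) -> str:
    """Compute the digest 'response' value for one SIP request."""
    ha1 = _md5_hex(f"{username}:{realm}:{password}")  # who is asking
    ha2 = _md5_hex(f"{method}:{uri}")                 # what is being asked
    if qop:
        # With qop=auth the nonce count and client nonce are mixed in as well.
        return _md5_hex(f"{ha1}:{nonce}:{nc}:{cnonce}:{qop}:{ha2}")
    return _md5_hex(f"{ha1}:{nonce}:{ha2}")


if __name__ == "__main__":
    # Hypothetical trunk credentials, for illustration only.
    print(digest_response("trunk-user", "carrier.example.com", "secret",
                          "INVITE", "sip:+15551230000@carrier.example.com",
                          "abc123"))
```

The carrier supplies `realm`, `nonce`, and (optionally) `qop` in its 401 or 407 challenge; the trunk replays the request with the computed response.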

3. Bridge SIP to your edge with a media gateway

Use FreeSWITCH or a cloud SBC to terminate SIP and emit PCM16 frames over a WebSocket or RTP stream your edge can consume.

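<!-- FreeSWITCH dialplan: answer +1 numbers and fork call audio to the edge.
     audio_fork is assumed to come from an add-on module; confirm it is loaded in your build. -->
<extension name="ai_agent_bridge">
  <condition field="destination_number" expression="^\+1([0-9]{10})$">
    <action application="answer"/>
    <action application="set" data="media_webhook_url=wss://edge.yourapp.com/sip"/>
    <action application="audio_fork" data="wss://edge.yourapp.com/sip"/>
    <!-- Keep the channel up while audio streams; otherwise the call ends when the dialplan does. -->
    <action application="park"/>
  </condition>
</extension>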
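
If your edge consumes raw RTP instead of a WebSocket stream, it has to strip the RTP header before it can touch the audio. Below is a minimal sketch of parsing the 12-byte fixed RTP header (RFC 3550) and returning the payload. The `RtpPacket` dataclass and `parse_rtp` function are illustrative names, and the sketch does no jitter buffering or reordering.

```python
import struct
from dataclasses import dataclass


@dataclass
class RtpPacket:
    payload_type: int
    marker: bool
    sequence: int
    timestamp: int
    ssrc: int
    payload: bytes


def parse_rtp(packet: bytes) -> RtpPacket:
    """Parse one RTP packet and return its header fields and payload."""
    if len(packet) < 12:
        raise ValueError("RTP packet shorter than the 12-byte fixed header")
    b0, b1, seq, ts, ssrc = struct.unpack("!BBHII", packet[:12])
    if b0 >> 6 != 2:
        raise ValueError("not RTP version 2")

    offset = 12 + 4 * (b0 & 0x0F)  # skip any CSRC identifiers
    if b0 & 0x10:                  # header extension present
        if len(packet) < offset + 4:
            raise ValueError("truncated RTP header extension")
        (ext_words,) = struct.unpack("!H", packet[offset + 2:offset + 4])
        offset += 4 + 4 * ext_words

    end = len(packet)
    if b0 & 0x20:                  # padding: last byte holds the pad length
        end -= packet[-1]
    if offset > end:
        raise ValueError("RTP header longer than packet")

    return RtpPacket(
        payload_type=b1 & 0x7F,
        marker=bool(b1 & 0x80),
        sequence=seq,
        timestamp=ts,
        ssrc=ssrc,
        payload=packet[offset:end],
    )


if __name__ == "__main__":
    header = struct.pack("!BBHII", 0x80, 0, 1, 160, 0x1234ABCD)
    print(parse_rtp(header + b"\xff" * 160).payload_type)
```

Payload type 0 is PCMU (G.711 μ-law), so a packet like the one in the usage example carries 20 ms of 8 kHz audio.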

4. Consume audio on the edge

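import WebSocket, { WebSocketServer } from "ws";

// Accept forked call audio from the media gateway on port 8080 at /sip.
const server = new WebSocketServer({ port: 8080, path: "/sip" });

server.on("connection", (sock) => {
  // One Realtime API session per call leg.
  const oai = new WebSocket(
    "wss://api.openai.com/v1/realtime?model=gpt-4o-realtime-preview-2025-06-03",
    { headers: { Authorization: "Bearer " + process.env.OPENAI_API_KEY, "OpenAI-Beta": "realtime=v1" } },
  );

  // Caller audio -> model. Frames that arrive before the upstream socket opens are dropped.
  sock.on("message", (frame, isBinary) => {
    if (!isBinary || oai.readyState !== WebSocket.OPEN) return;
    oai.send(JSON.stringify({ type: "input_audio_buffer.append", audio: frame.toString("base64") }));
  });

  // Model audio -> caller.
  oai.on("message", (raw) => {
    const evt = JSON.parse(raw.toString());
    if (evt.type === "response.audio.delta" && sock.readyState === WebSocket.OPEN) {
      sock.send(Buffer.from(evt.delta, "base64"));
    }
  });

  // Tear down both legs together so neither socket leaks.
  sock.on("close", () => oai.close());
  oai.on("close", () => sock.close());
  oai.on("error", (err) => console.error("Realtime API socket error:", err.message));
});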
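
The handler above forwards frames unchanged, so the gateway's sample rate has to match what the model expects. If the gateway delivers 8 kHz PCM16 and the model is configured for 24 kHz PCM16, the edge needs to upsample by a factor of three first. The following sketch uses plain linear interpolation; the function name `upsample_8k_to_24k` is illustrative, and a production resampler would normally add a proper low-pass filter rather than interpolate naively.

```python
import struct


def upsample_8k_to_24k(pcm16: bytes) -> bytes:
    """Triple the sample rate of little-endian PCM16 by linear interpolation."""
    count = len(pcm16) // 2
    samples = struct.unpack(f"<{count}h", pcm16[: count * 2])
    out = []
    for i, cur in enumerate(samples):
        # Hold the last sample flat because there is no next value to aim for.
        nxt = samples[i + 1] if i + 1 < count else cur
        step = nxt - cur
        out.extend((cur, cur + step // 3, cur + (2 * step) // 3))
    return struct.pack(f"<{len(out)}h", *out)


if __name__ == "__main__":
    frame_8k = struct.pack("<2h", 0, 300)
    print(struct.unpack("<6h", upsample_8k_to_24k(frame_8k)))
```

Each interpolated value lies between its two neighbours, so the output can never overflow the 16-bit range.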

5. Add a second carrier for failover

Configure your SBC to route primary traffic through carrier A and automatically fall back to carrier B on SIP 5xx responses or RTP timeouts.
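
The failover rule is normally expressed in SBC configuration, but the decision logic is easy to sketch. The example below tries carriers in priority order and moves on only for SIP 5xx responses or an RTP timeout. Everything here is hypothetical: `place_call`, `RtpTimeout`, `AllCarriersFailed`, and the injected `dial` callable are stand-ins for whatever your SBC or softswitch exposes.

```python
from typing import Callable, List, Sequence, Tuple, Union


class RtpTimeout(Exception):
    """Raised by the (hypothetical) dial function when no RTP arrives in time."""


class AllCarriersFailed(Exception):
    """Raised when every configured carrier has been tried without success."""


Failure = Tuple[str, Union[int, str]]


def place_call(
    carriers: Sequence[str],
    dial: Callable[[str], int],
) -> Tuple[str, int, List[Failure]]:
    """Try carriers in order; return (carrier, final_status, failures_so_far)."""
    failures: List[Failure] = []
    for carrier in carriers:
        try:
            status = dial(carrier)
        except RtpTimeout:
            failures.append((carrier, "rtp-timeout"))
            continue
        if 500 <= status <= 599:
            # Server-side failure: the carrier, not the callee, is the problem.
            failures.append((carrier, status))
            continue
        # Any other final response (200, 486 Busy Here, ...) ends the attempt.
        return carrier, status, failures
    raise AllCarriersFailed(failures)


if __name__ == "__main__":
    outcomes = {"carrier-a": 503, "carrier-b": 200}
    print(place_call(["carrier-a", "carrier-b"], lambda name: outcomes[name]))
```

Responses such as 486 Busy Here are returned rather than retried because they describe the called party, so a second carrier would usually get the same answer.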

6. Monitor with Homer or sngrep

SIP debugging is a full-time job without a packet capture tool. Homer captures every SIP message and lets you reconstruct a call flow after the fact.
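
Reconstructing a call flow comes down to grouping captured messages by their Call-ID header and reading the request methods and response codes in order. As a rough illustration of that idea (not of how Homer or sngrep are implemented), here is a small sketch that works on raw SIP message text. The function names `summarize` and `call_flows` are made up for this example.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def summarize(message: str) -> Tuple[str, str]:
    """Return (call_id, label) where label is a method or a status code."""
    lines = message.replace("\r\n", "\n").split("\n")
    start = lines[0].split(" ", 2)
    # Responses start with the version ("SIP/2.0 200 OK"); requests with the method.
    label = start[1] if start[0].startswith("SIP/") else start[0]

    call_id = ""
    for line in lines[1:]:
        if line == "":
            break  # blank line ends the headers
        name, _, value = line.partition(":")
        if name.strip().lower() in ("call-id", "i"):  # "i" is the compact form
            call_id = value.strip()
            break
    return call_id, label


def call_flows(messages: Iterable[str]) -> Dict[str, List[str]]:
    """Group messages by Call-ID, preserving capture order."""
    flows: Dict[str, List[str]] = defaultdict(list)
    for message in messages:
        call_id, label = summarize(message)
        flows[call_id].append(label)
    return dict(flows)


if __name__ == "__main__":
    capture = [
        "INVITE sip:agent@edge.example.com SIP/2.0\r\nCall-ID: abc123\r\n\r\n",
        "SIP/2.0 100 Trying\r\nCall-ID: abc123\r\n\r\n",
        "SIP/2.0 200 OK\r\nCall-ID: abc123\r\n\r\n",
        "ACK sip:agent@edge.example.com SIP/2.0\r\nCall-ID: abc123\r\n\r\n",
    ]
    print(call_flows(capture))
```

A flow that stops at `INVITE, 100` with no final response, for example, points at the far end or the network rather than at your dialplan.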

Production considerations

  • Latency: SIP adds 20-100ms versus a direct CPaaS WebSocket. Budget for it.
  • NAT traversal: give the SBC a public IP; do not run it behind 1:1 NAT without testing media in both directions.
  • DTMF: prefer RFC 4733 telephone-events (formerly RFC 2833) over inband. Inband DTMF corrupts AI transcription.
  • RTP inactivity timeout: set to 30-60s to detect silent failures.
  • Billing reconciliation: carriers disagree with your CDRs. Keep your own call log authoritative.
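
Out-of-band DTMF arrives as small RTP payloads, not audio, which is why it never reaches the transcriber. Here is a minimal sketch of decoding one telephone-event payload as laid out in RFC 4733 (the successor to RFC 2833). The `DtmfEvent` type and `parse_telephone_event` function are illustrative names, and only the sixteen DTMF event codes are handled.

```python
import struct
from typing import NamedTuple

EVENT_TO_DIGIT = "0123456789*#ABCD"  # event codes 0-15


class DtmfEvent(NamedTuple):
    digit: str
    end: bool       # True on the packet(s) that close the key press
    volume: int     # power level in -dBm0, 0-63
    duration: int   # in RTP timestamp units


def parse_telephone_event(payload: bytes) -> DtmfEvent:
    """Decode the 4-byte telephone-event payload carried in an RTP packet."""
    if len(payload) < 4:
        raise ValueError("telephone-event payload must be at least 4 bytes")
    event, flags, duration = struct.unpack("!BBH", payload[:4])
    if event >= len(EVENT_TO_DIGIT):
        raise ValueError("not a DTMF event code")
    return DtmfEvent(
        digit=EVENT_TO_DIGIT[event],
        end=bool(flags & 0x80),  # E bit
        volume=flags & 0x3F,     # low 6 bits; bit 6 is reserved
        duration=duration,
    )


if __name__ == "__main__":
    print(parse_telephone_event(bytes([5, 0x8A, 0x03, 0x20])))
```

With an 8 kHz clock, a duration of 800 corresponds to a 100 ms key press. The payload type number for telephone-event is dynamic and negotiated in SDP (101 is a common choice), so read it from the offer instead of hard-coding it.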

CallSphere's real implementation

CallSphere primarily uses Twilio for telephony with WebRTC for in-browser testing, and for enterprise customers with existing telecom infrastructure we bridge SIP trunks to the same edge service that handles native Twilio Media Streams. The edge runs Python FastAPI and forwards PCM16 at 24kHz to the OpenAI Realtime API with gpt-4o-realtime-preview-2025-06-03 and server VAD.

The multi-agent topologies vary by vertical — 14 tools for healthcare, 10 for real estate, 4 for salon, 7 for after-hours escalation, 10 plus RAG for IT helpdesk, and an ElevenLabs + 5 GPT-4 specialist pod for sales — but they all share the same carrier-agnostic audio plane, which means a new SIP carrier is a config change, not a rewrite. CallSphere supports 57+ languages with under one second of end-to-end response time on live traffic.

Common pitfalls

  • Mixing G.729 with STT: recognition accuracy drops 10-20 points.
  • Inband DTMF: tones leak into the audio and confuse the LLM.
  • Single carrier: when they have an outage, you have an outage.
  • Skipping the SBC: you need it for topology hiding and codec negotiation.
  • Forgetting about emergency calls: if you handle 911, you need a separate E911 provider.

FAQ

Is Twilio Elastic SIP Trunking enough for production?

Yes for most teams. It handles failover, has good global coverage, and integrates cleanly with Twilio's programmable voice.

Can I use Asterisk instead of FreeSWITCH?

Yes, but FreeSWITCH has a more modern audio_fork app and better WebSocket support.

Do I need STIR/SHAKEN?

In the US and Canada, yes, for outbound calling to avoid spam labeling.

What sample rate should the SBC deliver?

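Whatever the model expects. For the Realtime API's pcm16 format, that is 24 kHz mono PCM16; the API also accepts g711_ulaw and g711_alaw at 8 kHz if you would rather skip resampling.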

How do I debug a one-way audio issue?

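Capture SIP and RTP with sngrep or Wireshark and verify the SDP offered by each side. One-way audio is almost always a problem on the RTP path: the SDP advertises a private address the far end cannot reach, or a firewall blocks the negotiated RTP ports.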
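
When checking the SDP by hand gets tedious, a few lines of code can pull out where each side says it wants audio sent. The sketch below extracts the connection address and port for every audio media line and flags RFC 1918 addresses, which a carrier on the public internet cannot reach. The names `AudioEndpoint` and `audio_endpoints` are illustrative, and only IPv4 `c=` lines are handled.

```python
import ipaddress
from typing import List, NamedTuple, Optional

RFC1918 = [ipaddress.ip_network(n)
           for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")]


class AudioEndpoint(NamedTuple):
    address: str
    port: int
    private: bool


def _is_rfc1918(address: str) -> bool:
    try:
        ip = ipaddress.ip_address(address)
    except ValueError:
        return False
    return any(ip in net for net in RFC1918)


def audio_endpoints(sdp: str) -> List[AudioEndpoint]:
    """List where the SDP asks for audio RTP to be sent."""
    session_addr = ""
    media: List[list] = []          # [port, media-level address or None]
    current: Optional[list] = None
    seen_media = False
    for raw in sdp.splitlines():
        line = raw.strip()
        if line.startswith("c=IN IP4 "):
            addr = line.split()[2].split("/")[0]
            if not seen_media:
                session_addr = addr  # session-level default
            elif current is not None:
                current[1] = addr    # media-level override
        elif line.startswith("m="):
            seen_media = True
            fields = line[2:].split()
            if fields[0] == "audio":
                current = [int(fields[1].split("/")[0]), None]
                media.append(current)
            else:
                current = None
    return [AudioEndpoint(addr or session_addr, port, _is_rfc1918(addr or session_addr))
            for port, addr in media]


if __name__ == "__main__":
    offer = "v=0\r\nc=IN IP4 10.0.0.5\r\nt=0 0\r\nm=audio 40000 RTP/AVP 0 101\r\n"
    print(audio_endpoints(offer))
```

If an endpoint comes back with `private=True` in an SDP that crossed the public internet, the sender's NAT settings (or the SBC's advertised media address) are the first thing to check.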

Next steps

Planning a telephony migration or an enterprise SIP integration? Book a demo, read the technology overview, or check the platform page.
