Build a Cloudflare Workers + Durable Objects Voice Agent

Per-call state with Durable Objects, voice transport with Cloudflare Realtime, and tools via the Agents SDK. Real Workers code that scales globally.

TL;DR — Cloudflare's Agents SDK gives you per-call Agent instances backed by Durable Objects, with WebSocket voice transport and SQLite-backed conversation history. ~30 lines of server code.

What you'll build

A Cloudflare Worker exposing a /voice endpoint. Each connecting client gets a dedicated Durable Object (one per call) running the Agents SDK's withVoice mixin. Speech-to-text and text-to-speech both run on Workers AI (the code below passes @cf/openai/whisper-large-v3-turbo for STT and Deepgram Aura for TTS), and the LLM is @cf/meta/llama-3.3-70b-instruct.

Prerequisites

  1. A Cloudflare account on the Workers Paid plan ($5/mo) for Durable Objects compute.
  2. npm create cloudflare@latest -- --template cloudflare/agents-starter.
  3. Wrangler 4+ and the Workers AI binding enabled.
  4. A simple HTML client that opens wss://your-worker/agents/voice-agent/<id>/voice.
  5. Familiarity with Durable Objects.

Architecture

```mermaid
flowchart LR
  B[Browser] -- ws --> W[Worker]
  W -- routeAgentRequest --> DO[(Durable Object: VoiceAgent)]
  DO -- Workers AI --> ST[Whisper STT]
  DO -- Workers AI --> LL[Llama 3.3 70B]
  DO -- Workers AI --> TT[Aura TTS]
```

Step 1 — wrangler.jsonc

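```jsonc
{
  "name": "callsphere-voice",
  "main": "src/index.ts",
  "compatibility_date": "2026-05-01",
  // Workers AI binding, available in code as env.AI
  "ai": { "binding": "AI" },
  // One Durable Object class; each call gets its own instance
  "durable_objects": {
    "bindings": [{ "name": "VoiceAgent", "class_name": "VoiceAgent" }]
  },
  // Enables SQLite-backed storage (this.sql) for the class
  "migrations": [{ "tag": "v1", "new_sqlite_classes": ["VoiceAgent"] }]
}
```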

Step 2 — The Agent class

```typescript
import { Agent, routeAgentRequest } from "agents";
import { withVoice, WorkersAIFluxSTT, WorkersAITTS } from "agents/voice";

// Bindings from wrangler.jsonc, plus the CRM secret used by the Step 5 tool.
type Env = { AI: Ai; VoiceAgent: DurableObjectNamespace; CRM_TOKEN: string };

// One instance of this class runs per call, inside its own Durable Object.
export class VoiceAgent extends withVoice(Agent, {
  // Speech-to-text and text-to-speech, both served by Workers AI.
  stt: new WorkersAIFluxSTT({ model: "@cf/openai/whisper-large-v3-turbo" }),
  tts: new WorkersAITTS({ model: "@cf/deepgram/aura-1" }),
}) {
  // Receives the transcribed conversation; returns the text to speak back.
  async onChatMessage(messages: { role: string; content: string }[]) {
    const res = await this.env.AI.run("@cf/meta/llama-3.3-70b-instruct", {
      messages: [
        { role: "system", content: "You are CallSphere's CF agent. Be brief." },
        ...messages,
      ],
    });
    return res.response as string;
  }
}
```
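onChatMessage above forwards the whole conversation to the LLM on every turn, so the prompt grows for the length of the call. A minimal sketch of one way to bound it: a hypothetical buildPrompt helper (not part of the Agents SDK) that prepends the system prompt and keeps only the most recent messages. The 20-message default is an arbitrary assumption.

```typescript
type ChatMessage = { role: "system" | "user" | "assistant"; content: string };

// Hypothetical helper: system prompt first, then at most `maxTurns` of the
// most recent messages. A guard is needed because slice(-0) returns everything.
export function buildPrompt(
  system: string,
  history: ChatMessage[],
  maxTurns = 20,
): ChatMessage[] {
  const recent = maxTurns > 0 ? history.slice(-maxTurns) : [];
  return [{ role: "system", content: system }, ...recent];
}
```

Inside onChatMessage, the result would replace the inline messages array passed to this.env.AI.run.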

Step 3 — Worker entry

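```typescript
// src/index.ts, the same module that exports VoiceAgent, Env and routeAgentRequest
export default {
  async fetch(req: Request, env: Env): Promise<Response> {
    // The Agents SDK handles /agents/... paths; everything else is a 404.
    return (await routeAgentRequest(req, env)) ?? new Response("not found", { status: 404 });
  },
};
```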

routeAgentRequest routes requests for /agents/voice-agent/<id>/voice to the VoiceAgent Durable Object instance named <id>, so every call ID gets its own instance.
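To make the URL-to-object mapping concrete, here is an illustrative sketch of how such a path splits into a binding name and an instance name. It is not the Agents SDK's implementation; parseAgentPath and kebabToPascal are hypothetical helpers, and the assumption is only that the kebab-case segment "voice-agent" corresponds to the VoiceAgent binding from Step 1.

```typescript
// Illustrative only: not how the Agents SDK is implemented internally.
export type AgentRoute = { binding: string; instance: string; rest: string };

// "voice-agent" -> "VoiceAgent", matching the binding name in wrangler.jsonc.
export function kebabToPascal(segment: string): string {
  return segment
    .split("-")
    .filter((part) => part.length > 0)
    .map((part) => part[0].toUpperCase() + part.slice(1))
    .join("");
}

// Splits /agents/<agent>/<instance>/<rest...>; returns null for other paths.
export function parseAgentPath(pathname: string): AgentRoute | null {
  const parts = pathname.split("/").filter((p) => p.length > 0);
  if (parts.length < 3 || parts[0] !== "agents") return null;
  return {
    binding: kebabToPascal(parts[1]),
    instance: decodeURIComponent(parts[2]),
    rest: "/" + parts.slice(3).join("/"),
  };
}
```

The instance name is what would then address a single Durable Object, for example through the namespace's idFromName, which is why the same call ID always reaches the same object.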

Step 4 — Browser client

```html

```
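A minimal TypeScript sketch of the client side, assuming the /agents/voice-agent/<id>/voice route from Step 3 and assuming the agent sends synthesized audio as binary WebSocket frames and status events as text frames. That framing is an assumption; check what withVoice actually emits. voiceSocketUrl and openVoiceSocket are hypothetical helpers, and microphone capture is left out.

```typescript
// Hypothetical helper: builds the per-call WebSocket URL.
export function voiceSocketUrl(host: string, callId: string, secure = true): string {
  const scheme = secure ? "wss" : "ws";
  return `${scheme}://${host}/agents/voice-agent/${encodeURIComponent(callId)}/voice`;
}

// Hypothetical helper: opens the socket and splits incoming frames into
// binary audio (ArrayBuffer) and text messages.
export function openVoiceSocket(
  host: string,
  callId: string,
  onAudio: (chunk: ArrayBuffer) => void,
  onText?: (text: string) => void,
): WebSocket {
  const ws = new WebSocket(voiceSocketUrl(host, callId));
  ws.binaryType = "arraybuffer"; // deliver binary frames as ArrayBuffer, not Blob
  ws.addEventListener("message", (ev) => {
    if (ev.data instanceof ArrayBuffer) onAudio(ev.data);
    else if (typeof ev.data === "string") onText?.(ev.data);
  });
  return ws;
}
```

In a browser, a fresh ID per call could come from crypto.randomUUID(), e.g. openVoiceSocket(location.host, crypto.randomUUID(), playChunk), where playChunk is your own audio playback function.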

Step 5 — Add a tool that hits your CRM

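The Agents SDK's @callable() decorator exposes a method on the agent so clients can invoke it over RPC (add callable to the import from "agents" in Step 2):

```typescript
// Inside class VoiceAgent. Assumes CRM_TOKEN is declared in Env and set as a
// secret with `wrangler secret put CRM_TOKEN`.
@callable()
async getNextAppointment(params: { customerId: string }) {
  const r = await fetch(
    `https://crm.callsphere.ai/appt/${encodeURIComponent(params.customerId)}`,
    { headers: { Authorization: `Bearer ${this.env.CRM_TOKEN}` } },
  );
  if (!r.ok) throw new Error(`CRM request failed with status ${r.status}`);
  return r.json();
}
```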

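Mentioning the method in the system prompt is not enough by itself: onChatMessage in Step 2 only returns the model's text, so it also has to send a tool definition with the LLM request and dispatch any tool call the model returns to getNextAppointment.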
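A minimal sketch of the dispatch half, assuming the model returns tool calls as a name plus a JSON arguments object. That shape, the ToolCall type and dispatchToolCall are all assumptions for illustration, not the Workers AI or Agents SDK API; check the response format of the model you use.

```typescript
// Assumed shape of a tool call in the model response.
type ToolCall = { name: string; arguments: Record<string, unknown> };
type ToolHandler = (args: Record<string, unknown>) => Promise<unknown>;

// Runs the matching handler and returns a JSON string that can be appended to
// the conversation as the tool result. Unknown tools and handler errors are
// reported back to the model instead of crashing the call.
export async function dispatchToolCall(
  call: ToolCall,
  handlers: Record<string, ToolHandler>,
): Promise<string> {
  // hasOwnProperty avoids matching inherited names such as "toString".
  if (!Object.prototype.hasOwnProperty.call(handlers, call.name)) {
    return JSON.stringify({ error: `unknown tool: ${call.name}` });
  }
  try {
    const result = await handlers[call.name](call.arguments);
    return JSON.stringify(result ?? null);
  } catch (err) {
    return JSON.stringify({ error: err instanceof Error ? err.message : String(err) });
  }
}
```

Inside the agent, the handler map could be { getNextAppointment: (a) => this.getNextAppointment(a as { customerId: string }) }.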

Step 6 — Deploy

```bash
wrangler deploy
```

Cloudflare creates one Durable Object per session ID, places it in a data center near where it was first requested, and persists conversation history in the object's SQLite-backed storage.

Common pitfalls

  • Forgetting new_sqlite_classes migration — without it, this.sql is unavailable.
  • High DO request bills — DOs charge per request; batch updates if you can.
  • Aura TTS sample rate — defaults to 24kHz; resample on the client if needed.
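  • WebSocket hibernation — idle DOs can be evicted from memory. Accept sockets through the WebSocket Hibernation API so client connections stay open across eviction, and keep per-call state in storage, since in-memory fields are lost when the object wakes up.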
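For the sample-rate pitfall, a minimal client-side sketch, assuming the TTS audio arrives as 16-bit PCM in the platform's byte order (little-endian on common hardware). Both the encoding and the helper names int16ToFloat32 and resampleLinear are assumptions. Linear interpolation is the simplest resampler; it applies no low-pass filter, so downsampling can alias.

```typescript
// Converts 16-bit PCM (byteLength must be even) to floats in [-1, 1).
export function int16ToFloat32(buf: ArrayBufferLike): Float32Array {
  return Float32Array.from(new Int16Array(buf), (s) => s / 32768);
}

// Resamples mono audio by linear interpolation between neighboring samples.
export function resampleLinear(
  input: Float32Array,
  fromRate: number,
  toRate: number,
): Float32Array {
  if (fromRate <= 0 || toRate <= 0) throw new RangeError("sample rates must be positive");
  if (fromRate === toRate || input.length === 0) return input.slice();
  const outLength = Math.max(1, Math.round((input.length * toRate) / fromRate));
  const out = new Float32Array(outLength);
  const step = fromRate / toRate; // input samples advanced per output sample
  const last = input.length - 1;
  for (let i = 0; i < outLength; i++) {
    const pos = i * step;
    const i0 = Math.min(Math.floor(pos), last);
    const i1 = Math.min(i0 + 1, last); // clamp at the end of the buffer
    const frac = pos - Math.floor(pos);
    out[i] = input[i0] + (input[i1] - input[i0]) * frac;
  }
  return out;
}
```

For example, resampleLinear(int16ToFloat32(chunk), 24000, 48000) would bring a 24kHz Aura chunk up to a 48kHz AudioContext rate.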

How CallSphere does this in production

CallSphere uses Cloudflare for edge cache + image resize, but our voice plane is Pion Go for Real Estate and FastAPI :8084 for Healthcare — both feeding the same 115-table Postgres. CF Workers is a great fit for low-volume verticals; we use it for our affiliate referral tracking.

FAQ

Cold start? ~10ms — DOs hibernate but resume nearly instantly.

SQLite limits? 10GB per DO, 1k writes/sec.

Can I bring my own LLM? Yes — proxy from onChatMessage to OpenAI or Anthropic.

Pricing for 1k calls/day? ~$8/mo CF + LLM tokens.

Voice + WebRTC? Use Cloudflare Realtime SFU; it converts Opus to PCM for your DO.
