
Build an AI Voice Agent on Hono + OpenAI Realtime in TypeScript (2026)

Wire Hono's WebSocket helpers, the OpenAI Realtime API, and Bun runtime into a sub-700ms voice agent. Real TypeScript code, deploy targets, and pitfalls.

TL;DR — Hono ships a one-file WebSocket relay between a browser and the OpenAI Realtime API. With gpt-realtime ($32/M audio-in, $64/M audio-out as of late 2025) you can hit ~600-800ms voice-to-voice on a single Bun process. Hono's edge-friendly routing means the same code runs on Cloudflare Workers, Vercel Edge, Deno Deploy, or Node 22.

What you'll build

A TypeScript backend that serves a static HTML mic page and exposes a /realtime WebSocket. Browser audio (PCM16 24kHz) is forwarded to OpenAI Realtime; model audio + transcripts are streamed back. Tool calls (e.g. book_appointment) are handled server-side and the result is fed back into the same session.

Prerequisites

  1. Bun 1.3+ or Node 22+, hono@^4.6, @hono/node-ws (Node) or built-in Bun WS.
  2. OpenAI key with Realtime access (gpt-realtime GA from Aug 2025).
  3. Browser that supports getUserMedia (Chrome 120+, Safari 17+).

Architecture

```mermaid
flowchart LR
  BR[Browser mic] -- WS PCM16 --> H[Hono /realtime]
  H -- WS gpt-realtime --> OA[OpenAI Realtime API]
  OA -- audio.delta --> H --> BR
  OA -- response.function_call --> H
  H -- tool result --> OA
```

Step 1 — Hono server scaffold

```ts
import { Hono } from "hono";
import { createBunWebSocket } from "hono/bun";

const { upgradeWebSocket, websocket } = createBunWebSocket();

const app = new Hono();
const OPENAI_WS = "wss://api.openai.com/v1/realtime?model=gpt-realtime";

app.get("/", (c) => c.html(`<script type="module" src="/client.js"></script>`));

app.get(
  "/realtime",
  upgradeWebSocket(() => {
    let oa: WebSocket; // upstream socket to OpenAI, one per client session
    return {
      onOpen: (_e, ws) => {
        // Bun's WebSocket client accepts custom headers (hence the cast).
        oa = new WebSocket(OPENAI_WS, {
          headers: {
            Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
            "OpenAI-Beta": "realtime=v1",
          },
        } as any);
        oa.onmessage = (m) => ws.send(m.data);
      },
      // Audio arriving before `oa` finishes connecting is dropped here;
      // production code should queue it until oa.onopen fires.
      onMessage: (e) =>
        oa?.readyState === WebSocket.OPEN && oa.send(e.data as string),
      onClose: () => oa?.close(),
    };
  }),
);

export default { port: 8787, fetch: app.fetch, websocket };
```

Step 2 — Configure session with VAD + tools

```ts
oa.onopen = () =>
  oa.send(
    JSON.stringify({
      type: "session.update",
      session: {
        voice: "alloy",
        input_audio_transcription: { model: "gpt-4o-mini-transcribe" },
        turn_detection: { type: "server_vad", threshold: 0.55 },
        tools: [
          {
            type: "function",
            name: "book_appointment",
            description: "Book a slot",
            parameters: {
              type: "object",
              properties: { iso: { type: "string" } },
            },
          },
        ],
      },
    }),
  );
```

Step 3 — Browser PCM capture

```ts
const ctx = new AudioContext({ sampleRate: 24000 });
const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
const src = ctx.createMediaStreamSource(stream);
await ctx.audioWorklet.addModule("/pcm-worklet.js");
const node = new AudioWorkletNode(ctx, "pcm");
src.connect(node);

const ws = new WebSocket(`ws://${location.host}/realtime`);
node.port.onmessage = (e) =>
  ws.readyState === WebSocket.OPEN &&
  ws.send(
    JSON.stringify({
      type: "input_audio_buffer.append",
      // Worklet chunks are small (128 frames), so the spread is safe here.
      audio: btoa(String.fromCharCode(...new Uint8Array(e.data))),
    }),
  );
```
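The `/pcm-worklet.js` module referenced above isn't shown. A minimal sketch, assuming a processor registered as `"pcm"`: the core is a float-to-PCM16 conversion (clamp to [-1, 1], then scale), with the worklet boilerplate shown in comments since `AudioWorkletProcessor` only exists in the browser:

```typescript
// Clamp-and-scale Float32 samples in [-1, 1] to signed 16-bit PCM.
// Asymmetric scaling (0x8000 vs 0x7fff) keeps both extremes in range.
function floatTo16(ch: Float32Array): Int16Array {
  const pcm = new Int16Array(ch.length);
  for (let i = 0; i < ch.length; i++) {
    const s = Math.max(-1, Math.min(1, ch[i]));
    pcm[i] = s < 0 ? s * 0x8000 : s * 0x7fff;
  }
  return pcm;
}

// In the worklet file itself (browser-only globals):
// class PcmProcessor extends AudioWorkletProcessor {
//   process(inputs) {
//     const ch = inputs[0]?.[0]; // 128-frame mono block from the mic
//     if (ch) {
//       const pcm = floatTo16(ch);
//       this.port.postMessage(pcm.buffer, [pcm.buffer]); // transfer, don't copy
//     }
//     return true; // keep the processor alive
//   }
// }
// registerProcessor("pcm", PcmProcessor);
```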

Step 4 — Handle function calls server-side

```ts
// Runs in the relay, where both `oa` and the browser-facing `ws` are in scope.
oa.onmessage = async (m) => {
  const evt = JSON.parse(m.data.toString());
  if (evt.type === "response.function_call_arguments.done") {
    const args = JSON.parse(evt.arguments);
    const result = await db.book(args.iso); // your own booking logic
    oa.send(
      JSON.stringify({
        type: "conversation.item.create",
        item: {
          type: "function_call_output",
          call_id: evt.call_id,
          output: JSON.stringify(result),
        },
      }),
    );
    oa.send(JSON.stringify({ type: "response.create" })); // let the model respond
  }
  ws.send(m.data); // forward all events to browser
};
```
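Since every event is forwarded to the browser, the client can play the model's voice by decoding `response.audio.delta` events, which carry base64-encoded PCM16 at 24kHz. A small helper (`decodePcm16Base64` is a hypothetical name) converts each chunk to Float32 samples ready for an `AudioBuffer`:

```typescript
// Decode one base64 PCM16 (little-endian) chunk into Float32 samples in [-1, 1).
function decodePcm16Base64(b64: string): Float32Array {
  const bytes = Uint8Array.from(atob(b64), (c) => c.charCodeAt(0));
  const pcm = new Int16Array(bytes.buffer, 0, bytes.byteLength >> 1);
  const out = new Float32Array(pcm.length);
  for (let i = 0; i < pcm.length; i++) out[i] = pcm[i] / 32768;
  return out;
}

// Browser usage sketch (ctx is the AudioContext; playAt tracks the play cursor):
// const samples = decodePcm16Base64(evt.delta);
// const buf = ctx.createBuffer(1, samples.length, 24000);
// buf.copyToChannel(samples, 0);
// const srcNode = ctx.createBufferSource();
// srcNode.buffer = buf;
// srcNode.connect(ctx.destination);
// srcNode.start(playAt);
// playAt = Math.max(playAt, ctx.currentTime) + buf.duration;
```

Scheduling each chunk at `playAt` (instead of playing immediately) keeps chunks gapless back-to-back.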

Still reading? Stop comparing — try CallSphere live.

CallSphere ships complete AI voice agents per industry — 14 tools for healthcare, 10 agents for real estate, 4 specialists for salons. See how it actually handles a call before you book a demo.

Step 5 — Deploy

bun build --target=bun src/index.ts then fly deploy or wrangler deploy (Hono's WebSocket adapter ships for both). Add fly scale memory 512 and set OPENAI_API_KEY as a secret.
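For Fly specifically, the deploy hinges on two settings: exposing port 8787 and keeping machines from auto-stopping while WebSockets are live. A minimal `fly.toml` sketch (app name and region are assumptions):

```toml
app = "voice-agent"            # assumed app name
primary_region = "iad"         # assumed region

[http_service]
  internal_port = 8787         # matches the Bun server's port
  force_https = true
  auto_stop_machines = "off"   # don't suspend machines holding live WebSockets
```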

Pitfalls

  • 24kHz vs 16kHz: Realtime expects PCM16 @ 24kHz — resampling at 16kHz produces robotic audio.
  • No commit on server VAD: Don't send input_audio_buffer.commit when turn_detection: server_vad is set; the model commits automatically.
  • Cloudflare Worker connect timeouts: WS to OpenAI sometimes exceeds 30s on idle — send a 25s keepalive ping.
  • Auth in browser: Never expose your OpenAI key client-side; the relay is the auth boundary.
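The keepalive pitfall above can be handled with a tiny helper (`startKeepalive` is a hypothetical name). With the npm `ws` client you'd pass `() => oa.ping()`; over a standard browser-style WebSocket, which has no `ping()`, sending any benign JSON event works too:

```typescript
// Invoke `ping` every intervalMs until the returned stop function is called.
function startKeepalive(ping: () => void, intervalMs = 25_000): () => void {
  const timer = setInterval(ping, intervalMs);
  return () => clearInterval(timer);
}

// Usage sketch in the relay's onOpen:
// const stop = startKeepalive(() => oa.readyState === WebSocket.OPEN && oa.ping());
// ...and call stop() in onClose so the timer doesn't outlive the session.
```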

How CallSphere does this in production

CallSphere runs 37 production agents across 6 verticals with 90+ tools and 115+ Postgres tables. The Healthcare stack (FastAPI), OneRoof real-estate (Next.js 16 + React 19), Salon (NestJS 10 + Prisma), and Sales (Node.js 20 + React 18 + Vite) all share a Hono-based realtime relay that handles 1.2M voice minutes/month with ~720ms p95 voice-to-voice. Pricing is $149/$499/$1,499 with a 14-day no-card trial and a 22% recurring affiliate.

FAQ

Why Hono over Express? Hono is ~14kb, runs on every JS runtime, and has first-class WebSocket helpers for Bun, Node, Workers, and Deno without code changes.

Can I use Node instead of Bun? Yes — swap hono/bun for @hono/node-ws. Bun is ~2x faster on cold start.

What's the cost per minute? gpt-realtime is ~$0.06/min audio in + $0.24/min audio out — call it ~$0.20/min for typical voice agent traffic.

Does WebRTC work too? Yes. For browser-direct WebRTC, mint an ephemeral key via /v1/realtime/sessions and skip the relay entirely.
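For the WebRTC path, the relay's only remaining job is minting that ephemeral key. A sketch (`mintEphemeralKey` is a hypothetical helper; the `fetchImpl` parameter just makes it testable) against OpenAI's `/v1/realtime/sessions` endpoint:

```typescript
// Mint a short-lived client secret the browser can use to connect directly.
async function mintEphemeralKey(
  apiKey: string,
  fetchImpl: typeof fetch = fetch,
): Promise<string> {
  const r = await fetchImpl("https://api.openai.com/v1/realtime/sessions", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiKey}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ model: "gpt-realtime", voice: "alloy" }),
  });
  if (!r.ok) throw new Error(`sessions request failed: ${r.status}`);
  const body = await r.json();
  return body.client_secret.value; // hand only this to the browser, never the API key
}
```

Expose it behind a Hono route (e.g. `app.get("/ephemeral", ...)`) so the real key never leaves the server.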



Try CallSphere AI Voice Agents

See how AI voice agents work for your industry. Live demo available, no signup required.