
Build a Voice Agent with Hume EVI 3: Emotionally Intelligent Voice (2026)

Hume EVI 3 is one model for STT+LLM+TTS with prosody-aware reactions. Build a customizable speech-to-speech agent — TypeScript code, voice prompting, pitfalls.

TL;DR — Hume EVI 3 is a single speech-language model that handles transcription, language, and speech in one shot — and it tracks the user's vocal emotion in real time. You can describe ANY voice in a prompt ("a warm 40-year-old British woman"), point it at Claude or Gemini, and get sub-300ms emotionally aware replies.

What you'll build

A Next.js app using Hume's TypeScript SDK to open an EVI 3 WebSocket session, render the live emotion meter, and let users design a voice via plain-English prompt — all under 250 lines.

Architecture

```mermaid
flowchart LR
  MIC[Browser mic] -- WS audio --> EV[Hume EVI 3]
  EV -- prosody + transcript --> APP[Your client]
  EV -- voice audio --> APP --> SP[Speakers]
  EV -- llm_call --> CLD[Claude 4 / Gemini 2.5]
```

Step 1 — Install

```bash
npm i hume @humeai/voice-react

# server-side only:
npm i hume jsonwebtoken
```

Step 2 — Mint an access token (server)

```ts
// app/api/hume-token/route.ts
import { fetchAccessToken } from "@humeai/voice";

export async function GET() {
  const accessToken = await fetchAccessToken({
    apiKey: process.env.HUME_API_KEY!,
    secretKey: process.env.HUME_SECRET_KEY!,
  });
  return Response.json({ accessToken });
}
```

Step 3 — Configure the EVI

In platform.hume.ai → EVI → Configs, create a config with:

  • Model: evi-3
  • Voice description (prompt): A warm, calm 35-year-old American woman who sounds like a kind nurse.
  • LLM: anthropic/claude-3-5-sonnet (or google/gemini-2.5-flash)
  • System prompt: You are Ava, a clinic concierge. Adapt tone to the caller's emotion.
  • Tools: optional (function calling works like OpenAI's)

Copy the resulting configId.
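
Prefer keeping configs in code? The same setup can be scripted. A sketch, assuming the SDK's `prompts.createPrompt` and `configs.createConfig` methods and the field names from Hume's config API; verify against the current reference before relying on them:

```ts
// create-config.ts: method and field names assumed from Hume's config API.
import { HumeClient } from "hume";

const hume = new HumeClient({ apiKey: process.env.HUME_API_KEY! });

// Prompts are created separately and referenced by id + version (assumption).
const prompt = await hume.empathicVoice.prompts.createPrompt({
  name: "ava-concierge",
  text: "You are Ava, a clinic concierge. Adapt tone to the caller's emotion.",
});

const config = await hume.empathicVoice.configs.createConfig({
  name: "clinic-concierge",
  eviVersion: "3",
  prompt: { id: prompt!.id, version: 0 },
  languageModel: {
    modelProvider: "ANTHROPIC",
    modelResource: "claude-3-5-sonnet", // exact resource string per the dashboard
  },
});

console.log(config.id); // paste into NEXT_PUBLIC_HUME_CONFIG_ID
```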

Step 4 — Client provider

```tsx
"use client";
import { useEffect, useState } from "react";
import { VoiceProvider, useVoice } from "@humeai/voice-react";

export default function Page() {
  const [token, setToken] = useState<string | null>(null);

  useEffect(() => {
    fetch("/api/hume-token")
      .then((r) => r.json())
      .then((j) => setToken(j.accessToken));
  }, []);

  if (!token) return null;

  return (
    <VoiceProvider
      auth={{ type: "accessToken", value: token }}
      configId={process.env.NEXT_PUBLIC_HUME_CONFIG_ID!}
    >
      {/* defined in Step 5 */}
      <Concierge />
    </VoiceProvider>
  );
}
```

Step 5 — Render the emotion meter

```tsx
function Concierge() {
  const { connect, disconnect, status, messages } = useVoice();
  const last = messages[messages.length - 1];
  const top3 = last?.models?.prosody?.scores
    ? Object.entries(last.models.prosody.scores)
        .sort((a, b) => (b[1] as number) - (a[1] as number))
        .slice(0, 3)
    : [];

  return (
    <>
      <button onClick={status.value === "connected" ? disconnect : connect}>
        {status.value === "connected" ? "Hang up" : "Talk"}
      </button>
      <ul>
        {top3.map(([k, v]) => (
          <li key={k}>
            {k}: {(v as number).toFixed(2)}
          </li>
        ))}
      </ul>
    </>
  );
}
```


Step 6 — Design a voice in code

```ts
// node script
import { HumeClient } from "hume";

const hume = new HumeClient({ apiKey: process.env.HUME_API_KEY! });

const voice = await hume.empathicVoice.customVoices.create({
  name: "Sunrise Ava",
  baseVoice: "ITO",
  parameterModel: "20240715-4parameter",
  parameters: { gender: 2, assertiveness: -1, buoyancy: 1, confidence: 0 },
});

console.log(voice.id);
```

Step 7 — Hook tool calls

EVI 3 tool events arrive as `{ type: "tool_call", name, parameters, tool_call_id }`; handle them in `onMessage` and reply with `{ type: "tool_response", tool_call_id, content }`.
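
A minimal handler sketch using only the message shapes above; `lookupAppointment` is a hypothetical placeholder for your own tool, and the raw-WebSocket wiring is an assumption (the React SDK surfaces the same events through its hooks):

```ts
// Shapes per the tool_call / tool_response events described above.
type ToolCall = {
  type: "tool_call";
  name: string;
  parameters: string; // JSON-encoded arguments
  tool_call_id: string;
};

// Hypothetical tool implementation; replace with your own.
declare function lookupAppointment(args: { patientId: string }): { slot: string };

function handleToolCall(socket: WebSocket, raw: string) {
  const msg = JSON.parse(raw);
  if (msg.type !== "tool_call") return;

  const call = msg as ToolCall;
  const content =
    call.name === "lookup_appointment"
      ? JSON.stringify(lookupAppointment(JSON.parse(call.parameters)))
      : JSON.stringify({ error: `unknown tool: ${call.name}` });

  // Echo tool_call_id so EVI can match the response to the call.
  socket.send(
    JSON.stringify({ type: "tool_response", tool_call_id: call.tool_call_id, content })
  );
}
```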

Pitfalls

  • WebSocket only: there is no HTTP REST surface for EVI sessions, so budget for reconnect logic (see the sketch after this list).
  • Voice description quality: Vague prompts ("nice voice") yield generic output — be specific (age, accent, energy).
  • Latency vs realism: evi-3 is ~280ms p50; switching to evi-3-fast drops to ~180ms with slightly less expressive prosody.
  • Multi-language: Excellent on EN; for 60+ languages pair EVI 3 STT with Soniox or Universal-3.
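
On the first point, a minimal reconnect sketch with exponential backoff; `openEviSocket` is a hypothetical stand-in for however you open the EVI WebSocket:

```ts
// Hypothetical helper that returns a connected EVI WebSocket.
declare function openEviSocket(): Promise<WebSocket>;

async function connectWithRetry(maxAttempts = 5): Promise<WebSocket> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      const socket = await openEviSocket();
      // Reconnect on unexpected close (code 1000 = clean shutdown).
      socket.addEventListener("close", (e) => {
        if (e.code !== 1000) void connectWithRetry(maxAttempts);
      });
      return socket;
    } catch {
      // Backoff: 0.5s, 1s, 2s, 4s, 8s, capped at 10s.
      await new Promise((r) => setTimeout(r, Math.min(500 * 2 ** attempt, 10_000)));
    }
  }
  throw new Error("EVI reconnect failed after max attempts");
}
```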

How CallSphere does this

CallSphere uses EVI 3 in its Behavioral Health vertical, where emotional adaptation is core to the UX, as part of a stack spanning 37 agents, 90+ tools, 115+ DB tables, and 6 verticals. Plans run $149/$499/$1,499 with a 14-day trial and a 22% affiliate program.

FAQ

Cost? Per-minute pricing on EVI 3 is comparable to GPT-4o Realtime at roughly $0.18/min combined, so a 1,000-minute month lands around $180.

Custom LLM? Yes — point the config at OpenAI / Anthropic / Google / Mistral via the dashboard.

Voice cloning? With 30 seconds of audio, EVI 3 captures timbre, rhythm, and tone.

Phone calls? A Twilio Media Streams bridge ships in the docs: wire the two WebSockets together and you have PSTN.
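
A relay skeleton for that bridge. The `start`/`media`/`stop` events and base64 payloads follow Twilio's Media Streams format; `openEviSocket`, the `audio_input`/`audio_output` shapes, and any mulaw-to-linear16 transcoding are assumptions to verify against Hume's guide:

```ts
// bridge.ts: relay skeleton only; audio transcoding omitted.
import { WebSocketServer, WebSocket } from "ws";

// Hypothetical helper that opens an authenticated EVI socket.
declare function openEviSocket(): Promise<WebSocket>;

const wss = new WebSocketServer({ port: 8080 });

wss.on("connection", async (twilio) => {
  const evi = await openEviSocket();
  let streamSid = "";

  twilio.on("message", (data) => {
    const msg = JSON.parse(data.toString());
    if (msg.event === "start") streamSid = msg.start.streamSid;
    if (msg.event === "media") {
      // Twilio sends 8 kHz mulaw base64 frames; transcode here if EVI's
      // session expects a different encoding (assumption).
      evi.send(JSON.stringify({ type: "audio_input", data: msg.media.payload }));
    }
    if (msg.event === "stop") evi.close();
  });

  evi.on("message", (data) => {
    const msg = JSON.parse(data.toString());
    // audio_output shape is an assumption; check EVI's WS message reference.
    if (msg.type === "audio_output") {
      twilio.send(
        JSON.stringify({ event: "media", streamSid, media: { payload: msg.data } })
      );
    }
  });
});
```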

