Build a Voice Agent with Hume EVI 3: Emotionally Intelligent Voice (2026)
Hume EVI 3 is one model for STT+LLM+TTS with prosody-aware reactions. Build a customizable speech-to-speech agent — TypeScript code, voice prompting, pitfalls.
TL;DR — Hume EVI 3 is a single speech-language model that handles transcription, language, and speech in one shot — and it tracks the user's vocal emotion in real time. You can describe ANY voice in a prompt ("a warm 40-year-old British woman"), point it at Claude or Gemini, and get sub-300ms emotionally aware replies.
What you'll build
A Next.js app using Hume's TypeScript SDK to open an EVI 3 WebSocket session, render the live emotion meter, and let users design a voice via plain-English prompt — all under 250 lines.
Architecture
flowchart LR
MIC[Browser mic] -- WS audio --> EV[Hume EVI 3]
EV -- prosody + transcript --> APP[Your client]
EV -- voice audio --> APP --> SP[Speakers]
EV -- llm_call --> CLD[Claude 4 / Gemini 2.5]
Step 1 — Install
```bash
npm i hume @humeai/voice-react

# server-side only:
npm i hume jsonwebtoken
```
Step 2 — Mint an access token (server)
```ts
// app/api/hume-token/route.ts
import { fetchAccessToken } from "@humeai/voice";

export async function GET() {
  const accessToken = await fetchAccessToken({
    apiKey: process.env.HUME_API_KEY!,
    secretKey: process.env.HUME_SECRET_KEY!,
  });
  return Response.json({ accessToken });
}
```
Hear it before you finish reading
Talk to a live CallSphere AI voice agent in your browser — 60 seconds, no signup.
Step 3 — Configure the EVI
In platform.hume.ai → EVI → Configs, create a config with:
- Model: `evi-3`
- Voice description (prompt): "A warm, calm 35-year-old American woman who sounds like a kind nurse."
- LLM: `anthropic/claude-3-5-sonnet` (or `google/gemini-2.5-flash`)
- System prompt: "You are Ava, a clinic concierge. Adapt tone to the caller's emotion."
- Tools: optional (function calls work like OpenAI's)

Copy the resulting `configId`.
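The dashboard fields above boil down to a small payload. A minimal sketch of that shape — the field names here are illustrative assumptions, not the exact Hume API schema; the dashboard handles serialization for you:

```typescript
// Illustrative shape of an EVI config. Field names are assumptions
// for explanation, not the exact Hume API schema.
interface EviConfig {
  model: string;            // which EVI version to run
  voiceDescription: string; // plain-English voice prompt
  llm: string;              // provider/model slug for the language step
  systemPrompt: string;     // persona + behavioral instructions
}

const avaConfig: EviConfig = {
  model: "evi-3",
  voiceDescription:
    "A warm, calm 35-year-old American woman who sounds like a kind nurse.",
  llm: "anthropic/claude-3-5-sonnet",
  systemPrompt:
    "You are Ava, a clinic concierge. Adapt tone to the caller's emotion.",
};

console.log(JSON.stringify(avaConfig, null, 2));
```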
Step 4 — Client provider
```tsx
"use client";
import { useEffect, useState } from "react";
import { VoiceProvider, useVoice } from "@humeai/voice-react";

export default function Page() {
  const [token, setToken] = useState<string | null>(null);

  // Fetch a short-lived access token from our server route
  useEffect(() => {
    fetch("/api/hume-token").then((r) => r.json()).then((j) => setToken(j.accessToken));
  }, []);

  if (!token) return null;
  return (
    <VoiceProvider auth={{ type: "accessToken", value: token }}
      configId={process.env.NEXT_PUBLIC_HUME_CONFIG_ID!}>
      <Concierge />
    </VoiceProvider>
  );
}
```
Step 5 — Render the emotion meter
```tsx
function Concierge() {
  const { connect, disconnect, status, messages } = useVoice();
  const last = messages[messages.length - 1];

  // Top 3 prosody (vocal-emotion) scores from the latest message
  const top3 = last?.models?.prosody?.scores
    ? Object.entries(last.models.prosody.scores)
        .sort((a, b) => (b[1] as number) - (a[1] as number))
        .slice(0, 3)
    : [];

  return (
    <>
      <button onClick={() => (status.value === "connected" ? disconnect() : connect())}>
        {status.value === "connected" ? "Hang up" : "Talk"}
      </button>
      <ul>
        {top3.map(([k, v]) => (
          <li key={k}>{k}: {(v as number).toFixed(2)}</li>
        ))}
      </ul>
    </>
  );
}
```
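The sort-and-slice can be pulled into a small pure helper. A sketch, assuming `scores` is the flat name-to-number map EVI's prosody model emits:

```typescript
// Return the top-n emotion labels from a prosody score map,
// e.g. { Joy: 0.8, Calmness: 0.6, Anger: 0.1 }, sorted descending.
function topEmotions(
  scores: Record<string, number>,
  n = 3
): [string, number][] {
  return Object.entries(scores)
    .sort((a, b) => b[1] - a[1])
    .slice(0, n);
}

const top = topEmotions({ Joy: 0.82, Calmness: 0.61, Anger: 0.07, Awe: 0.3 });
console.log(top); // → [["Joy", 0.82], ["Calmness", 0.61], ["Awe", 0.3]]
```

Keeping the ranking pure makes it trivial to unit-test away from the WebSocket session.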
Still reading? Stop comparing — try CallSphere live.
CallSphere ships complete AI voice agents per industry — 14 tools for healthcare, 10 agents for real estate, 4 specialists for salons. See how it actually handles a call before you book a demo.
Step 6 — Design a voice in code
```ts
// node script
import { HumeClient } from "hume";

const hume = new HumeClient({ apiKey: process.env.HUME_API_KEY! });

const voice = await hume.empathicVoice.customVoices.create({
  name: "Sunrise Ava",
  baseVoice: "ITO",
  parameterModel: "20240715-4parameter",
  parameters: { gender: 2, assertiveness: -1, buoyancy: 1, confidence: 0 },
});

console.log(voice.id);
```
Step 7 — Hook tool calls
EVI 3 tool events arrive as `{ type: "tool_call", name, parameters, tool_call_id }` — handle them in `onMessage` and reply with `{ type: "tool_response", tool_call_id, content }`.
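A minimal handler sketch for that round trip. The message shapes follow the event format above; `lookupAppointment` and its registry are hypothetical, and real error handling is left out:

```typescript
// Shapes from the EVI tool-call event format described above.
interface ToolCall {
  type: "tool_call";
  name: string;
  parameters: string; // JSON-encoded arguments
  tool_call_id: string;
}
interface ToolResponse {
  type: "tool_response";
  tool_call_id: string;
  content: string;
}

// Map tool names to implementations; lookupAppointment is hypothetical.
const tools: Record<string, (args: any) => string> = {
  lookupAppointment: ({ patientId }) => `Next visit for ${patientId}: Tue 10am`,
};

// Dispatch a tool_call and build the matching tool_response.
function handleToolCall(msg: ToolCall): ToolResponse {
  const impl = tools[msg.name];
  const content = impl
    ? impl(JSON.parse(msg.parameters))
    : `Unknown tool: ${msg.name}`;
  return { type: "tool_response", tool_call_id: msg.tool_call_id, content };
}
```

Echoing back the same `tool_call_id` is what lets the model pair the result with its request.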
Pitfalls
- WebSocket only: no HTTP REST surface for EVI; budget your reconnect logic.
- Voice description quality: vague prompts ("nice voice") yield generic output — be specific (age, accent, energy).
- Latency vs realism: `evi-3` is ~280ms p50; switching to `evi-3-fast` drops to ~180ms with slightly less expressive prosody.
- Multi-language: excellent on EN; for 60+ languages pair EVI 3 STT with Soniox or Universal-3.
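Since EVI is WebSocket-only, reconnects are on you. A minimal exponential-backoff sketch — the delays, cap, and attempt count here are arbitrary choices, not Hume recommendations:

```typescript
// Exponential backoff with a cap: 500ms, 1s, 2s, 4s, ... up to 10s.
function backoffDelay(attempt: number, baseMs = 500, capMs = 10_000): number {
  return Math.min(capMs, baseMs * 2 ** attempt);
}

// Retry an async connect function, sleeping between attempts.
async function connectWithRetry(
  open: () => Promise<void>, // e.g. your EVI socket connect()
  maxAttempts = 5
): Promise<void> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await open();
    } catch {
      await new Promise((r) => setTimeout(r, backoffDelay(attempt)));
    }
  }
  throw new Error("EVI reconnect failed after max attempts");
}
```

Adding jitter to `backoffDelay` is worthwhile in production so many clients don't reconnect in lockstep.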
How CallSphere does this
CallSphere uses EVI 3 in the Behavioral Health vertical where emotional adaptation is core to UX — running across 37 agents · 90+ tools · 115+ DB tables · 6 verticals. $149/$499/$1,499 · 14-day trial · 22% affiliate.
FAQ
Cost? Per-minute pricing on EVI 3 is comparable to GPT-4o Realtime — ~$0.18/min combined.
Custom LLM? Yes — point the config at OpenAI / Anthropic / Google / Mistral via the dashboard.
Voice cloning? With 30 seconds of audio, EVI 3 captures timbre, rhythm, and tone.
Phone calls? Twilio Media Streams bridge ships in the docs — wire WS-to-WS and you have PSTN.
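The gist of the Twilio bridge is relaying Media Streams frames into the EVI socket. A sketch of decoding one inbound frame — the `media` event shape is Twilio's (base64 mu-law under `media.payload`); the EVI side of the relay is omitted:

```typescript
// One frame from a Twilio Media Streams WebSocket: base64-encoded
// 8kHz mu-law audio under media.payload.
interface TwilioMediaEvent {
  event: "media";
  streamSid: string;
  media: { payload: string };
}

// Decode a raw WS message; returns audio bytes for media frames,
// or null for non-media events (start/stop/mark).
function extractAudio(raw: string): Buffer | null {
  const msg = JSON.parse(raw);
  if (msg.event !== "media") return null;
  return Buffer.from(msg.media.payload, "base64");
}
```

From there you forward the bytes to the EVI socket (transcoding mu-law if your config expects linear PCM) and do the reverse for EVI's audio output.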
Sources
- Hume Blog - Introducing EVI 3 - https://www.hume.ai/blog/introducing-evi-3
- Hume Blog - Announcing EVI 3 API - https://www.hume.ai/blog/announcing-evi-3-api
- Hume API Docs - Speech-to-Speech (EVI) - https://dev.hume.ai/docs/speech-to-speech-evi/overview
- Vercel Template - Hume Empathic Voice Starter - https://vercel.com/templates/next.js/empathic-voice-interface-starter
Try CallSphere AI Voice Agents
See how AI voice agents work for your industry. Live demo available -- no signup required.