Build a Voice Agent with Pipecat: Python Pipeline Framework (2026)
Pipecat 0.0.7x ships a frame-based pipeline for real-time voice. Wire Daily WebRTC, Deepgram, GPT-4o, and Cartesia into a working agent — code + pitfalls.
TL;DR — Pipecat is an open-source frame-based pipeline framework from Daily.co. You compose a list of processors (transport → STT → context → LLM → TTS → transport) and Pipecat handles the timing, interruption, and back-pressure between them.
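The frame-pipeline idea can be sketched in plain Python — no Pipecat APIs here, just the concept: each processor receives a frame, transforms it, and pushes the result to the next stage (class names below are illustrative, not Pipecat's):

```python
import asyncio

class Processor:
    """Toy stand-in for a frame processor in a linear pipeline."""
    def __init__(self):
        self.next = None

    async def push(self, frame):
        # Hand the frame to the downstream stage, if any.
        if self.next:
            await self.next.process(frame)

    async def process(self, frame):
        await self.push(frame)  # default behavior: pass through

class Upper(Processor):
    """Stands in for a transform stage, e.g. STT normalizing text."""
    async def process(self, frame):
        await self.push(frame.upper())

class Exclaim(Processor):
    """Stands in for a generation stage appending output."""
    async def process(self, frame):
        await self.push(frame + "!")

class Sink(Processor):
    """Stands in for the output transport; collects what arrives."""
    def __init__(self):
        super().__init__()
        self.frames = []

    async def process(self, frame):
        self.frames.append(frame)

def pipeline(stages):
    # Link stages in order, like Pipecat's Pipeline([...]) list.
    for a, b in zip(stages, stages[1:]):
        a.next = b
    return stages[0]

async def demo():
    sink = Sink()
    head = pipeline([Upper(), Exclaim(), sink])
    await head.process("hello")
    return sink.frames

# asyncio.run(demo()) -> ["HELLO!"]
```

Pipecat's real processors work on typed audio/text/control frames and add interruption handling, but the dataflow is this same linear chain.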
What you'll build
A Daily room voice agent that joins a call, listens with Deepgram, reasons with GPT-4o, and speaks back with Cartesia Sonic-3 — running locally on python bot.py and deployable to Daily Bots, Cerebrium, or Modal.
Architecture
```mermaid
flowchart LR
    RM[Daily room] --> TR[DailyTransport]
    TR --> STT[Deepgram STT]
    STT --> CTX[OpenAILLMContext]
    CTX --> LLM[OpenAI GPT-4o]
    LLM --> TTS[Cartesia Sonic-3]
    TTS --> TR --> RM
```
Step 1 — Install
```bash
python -m venv .venv && source .venv/bin/activate
pip install "pipecat-ai[daily,deepgram,openai,cartesia,silero]"
```
Step 2 — Build the pipeline
```python
import os
import asyncio

from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineTask, PipelineParams
from pipecat.transports.services.daily import DailyTransport, DailyParams
from pipecat.services.deepgram.stt import DeepgramSTTService
from pipecat.services.openai.llm import OpenAILLMService
from pipecat.services.cartesia.tts import CartesiaTTSService
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.audio.vad.silero import SileroVADAnalyzer
```
Hear it before you finish reading
Talk to a live CallSphere AI voice agent in your browser — 60 seconds, no signup.
Step 3 — Wire it up
```python
async def main(room_url: str, token: str):
    transport = DailyTransport(
        room_url,
        token,
        "Pipecat Bot",
        DailyParams(
            audio_in_enabled=True,
            audio_out_enabled=True,
            vad_analyzer=SileroVADAnalyzer(),
        ),
    )

    stt = DeepgramSTTService(api_key=os.environ["DEEPGRAM_API_KEY"])
    llm = OpenAILLMService(api_key=os.environ["OPENAI_API_KEY"], model="gpt-4o")
    tts = CartesiaTTSService(
        api_key=os.environ["CARTESIA_API_KEY"],
        voice_id="79a125e8-cd45-4c13-8a67-188112f4dd22",
        model="sonic-3",
    )

    ctx = OpenAILLMContext(
        [{"role": "system", "content": "You are a helpful clinic concierge."}]
    )
    agg = llm.create_context_aggregator(ctx)

    pipeline = Pipeline([
        transport.input(),   # audio in from the Daily room
        stt,                 # speech -> text
        agg.user(),          # aggregate user turns into the context
        llm,                 # generate the response
        tts,                 # text -> speech
        transport.output(),  # audio out to the Daily room
        agg.assistant(),     # record assistant turns into the context
    ])

    task = PipelineTask(pipeline, PipelineParams(allow_interruptions=True))
    await PipelineRunner().run(task)
```
Step 4 — Hook the join event
```python
@transport.event_handler("on_first_participant_joined")
async def on_join(transport, participant):
    await transport.capture_participant_transcription(participant["id"])
    # Seed the context with an instruction, then push a context frame
    # through the user aggregator to trigger the first LLM turn.
    ctx.add_message(
        {"role": "user", "content": "Greet the caller and ask how you can help."}
    )
    await task.queue_frames([agg.user().get_context_frame()])
```
Step 5 — Run
```bash
DEEPGRAM_API_KEY=... OPENAI_API_KEY=... CARTESIA_API_KEY=... \
  python bot.py --url https://yourorg.daily.co/agent --token ${DAILY_TOKEN}
```
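For --url and --token to reach main() from Step 3, bot.py needs a small entrypoint. A minimal sketch using stdlib argparse (the flag names match the run command above; everything else is boilerplate you may already have):

```python
import argparse

def parse_args(argv=None):
    """Parse the --url/--token flags used in the Step 5 run command."""
    p = argparse.ArgumentParser(description="Pipecat Daily voice agent")
    p.add_argument("--url", required=True, help="Daily room URL")
    p.add_argument("--token", required=True, help="Daily meeting token")
    return p.parse_args(argv)

# At the bottom of bot.py, hand the parsed values to main() from Step 3:
#   if __name__ == "__main__":
#       import asyncio
#       args = parse_args()
#       asyncio.run(main(args.url, args.token))
```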
Step 6 — Function calling
Pipecat's OpenAILLMContext supports the OpenAI tools schema directly. Add tools=[...] to the context and the LLM service emits FunctionCallInProgressFrame / FunctionCallResultFrame, which you handle by registering a handler with llm.register_function("name", handler).
Still reading? Stop comparing — try CallSphere live.
CallSphere ships complete AI voice agents per industry — 14 tools for healthcare, 10 agents for real estate, 4 specialists for salons. See how it actually handles a call before you book a demo.
Pitfalls
- VAD positioning: place SileroVADAnalyzer on the transport (in DailyParams), NOT in the pipeline — frames must be VAD-tagged before they reach the aggregator.
- Aggregator order: agg.user() goes BEFORE the LLM and agg.assistant() AFTER the TTS — reversing them loses tool messages.
- allow_interruptions: off by default in some templates; turn it on or the agent talks over the user.
- Cartesia voice IDs: region matters — pull voice IDs from your account, not the docs.
How CallSphere does this
CallSphere runs 37 agents in 6 verticals with 90+ tools and 115+ DB tables. Pipecat powers the salon and behavioral health products at a steady ~680ms p50. $149/$499/$1,499 plans, 14-day trial, 22% affiliate.
FAQ
Pipecat vs LiveKit Agents? Pipecat is lower-level — you control every frame. LiveKit Agents is higher-level with built-in dispatch.
Can I swap transports? Yes — DailyTransport, LiveKitTransport, WebsocketServerTransport, FastAPIWebsocketTransport, and Twilio all share the same interface.
Is it production-ready? NVIDIA NIM ships Pipecat as their reference voice agent blueprint and AWS published a multi-part guide pairing it with Bedrock.
How do I observe it? Pipecat emits OpenTelemetry spans for every processor — point your collector at the runner.
Sources
- Pipecat Docs - Introduction - https://docs.pipecat.ai/getting-started/introduction
- GitHub - pipecat-ai/pipecat - https://github.com/pipecat-ai/pipecat
- HackerNoon - Real-Time Voice Agent with Pipecat - https://hackernoon.com/how-to-build-a-real-time-voice-agent-with-pipecat
- AWS - Intelligent AI Voice Agents with Pipecat + Bedrock - https://aws.amazon.com/blogs/machine-learning/building-intelligent-ai-voice-agents-with-pipecat-and-amazon-bedrock-part-1/
Try CallSphere AI Voice Agents
See how AI voice agents work for your industry. Live demo available -- no signup required.