Build a Voice Agent with Pipecat: Python Pipeline Framework (2026)
Pipecat 0.0.7x ships a frame-based pipeline for real-time voice. Wire Daily WebRTC, Deepgram, GPT-4o, and Cartesia into a working agent — code + pitfalls.
TL;DR — Pipecat is an open-source frame-based pipeline framework from Daily.co. You compose a list of processors (transport → STT → context → LLM → TTS → transport) and Pipecat handles the timing, interruption, and back-pressure between them.
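The frame-pipeline idea can be sketched in plain Python — no Pipecat APIs here, just the concept: each processor receives a frame, transforms it, and pushes the result to the next stage (class names below are illustrative, not Pipecat's):

```python
import asyncio

class Processor:
    """Toy stand-in for a frame processor in a linear pipeline."""
    def __init__(self):
        self.next = None

    async def push(self, frame):
        # Hand the frame to the downstream stage, if any.
        if self.next:
            await self.next.process(frame)

    async def process(self, frame):
        await self.push(frame)  # default behavior: pass through

class Upper(Processor):
    """Stands in for a transform stage, e.g. STT normalizing text."""
    async def process(self, frame):
        await self.push(frame.upper())

class Exclaim(Processor):
    """Stands in for a generation stage appending output."""
    async def process(self, frame):
        await self.push(frame + "!")

class Sink(Processor):
    """Stands in for the output transport; collects what arrives."""
    def __init__(self):
        super().__init__()
        self.frames = []

    async def process(self, frame):
        self.frames.append(frame)

def pipeline(stages):
    # Link stages in order, like Pipecat's Pipeline([...]) list.
    for a, b in zip(stages, stages[1:]):
        a.next = b
    return stages[0]

async def demo():
    sink = Sink()
    head = pipeline([Upper(), Exclaim(), sink])
    await head.process("hello")
    return sink.frames

# asyncio.run(demo()) -> ["HELLO!"]
```

Pipecat's real processors work on typed audio/text/control frames and add interruption handling, but the dataflow is this same linear chain.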
What you'll build
A Daily room voice agent that joins a call, listens with Deepgram, reasons with GPT-4o, and speaks back with Cartesia Sonic-3 — running locally on python bot.py and deployable to Daily Bots, Cerebrium, or Modal.
Architecture
```mermaid
flowchart LR
    RM[Daily room] --> TR[DailyTransport]
    TR --> STT[Deepgram STT]
    STT --> CTX[OpenAILLMContext]
    CTX --> LLM[OpenAI GPT-4o]
    LLM --> TTS[Cartesia Sonic-3]
    TTS --> TR --> RM
```
Step 1 — Install
```bash
python -m venv .venv && source .venv/bin/activate
pip install "pipecat-ai[daily,deepgram,openai,cartesia,silero]"
```
Step 2 — Build the pipeline
```python
import os
import asyncio

from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineTask, PipelineParams
from pipecat.transports.services.daily import DailyTransport, DailyParams
from pipecat.services.deepgram.stt import DeepgramSTTService
from pipecat.services.openai.llm import OpenAILLMService
from pipecat.services.cartesia.tts import CartesiaTTSService
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.audio.vad.silero import SileroVADAnalyzer
```
Hear it before you finish reading
Talk to a live CallSphere AI voice agent in your browser — 60 seconds, no signup.
Step 3 — Wire it up
```python
async def main(room_url: str, token: str):
    transport = DailyTransport(
        room_url,
        token,
        "Pipecat Bot",
        DailyParams(
            audio_in_enabled=True,
            audio_out_enabled=True,
            vad_analyzer=SileroVADAnalyzer(),
        ),
    )

    stt = DeepgramSTTService(api_key=os.environ["DEEPGRAM_API_KEY"])
    llm = OpenAILLMService(api_key=os.environ["OPENAI_API_KEY"], model="gpt-4o")
    tts = CartesiaTTSService(
        api_key=os.environ["CARTESIA_API_KEY"],
        voice_id="79a125e8-cd45-4c13-8a67-188112f4dd22",
        model="sonic-3",
    )

    ctx = OpenAILLMContext(
        [{"role": "system", "content": "You are a helpful clinic concierge."}]
    )
    agg = llm.create_context_aggregator(ctx)

    pipeline = Pipeline([
        transport.input(),   # audio in from the Daily room
        stt,                 # speech -> text
        agg.user(),          # aggregate user turns into the context
        llm,                 # generate the response
        tts,                 # text -> speech
        transport.output(),  # audio out to the Daily room
        agg.assistant(),     # record assistant turns into the context
    ])

    task = PipelineTask(pipeline, PipelineParams(allow_interruptions=True))
    await PipelineRunner().run(task)
```
Step 4 — Hook the join event
```python
@transport.event_handler("on_first_participant_joined")
async def on_join(transport, participant):
    await transport.capture_participant_transcription(participant["id"])
    # Seed the context with an instruction, then push a context frame
    # through the user aggregator to trigger the first LLM turn.
    ctx.add_message(
        {"role": "user", "content": "Greet the caller and ask how you can help."}
    )
    await task.queue_frames([agg.user().get_context_frame()])
```
Step 5 — Run
```bash
DEEPGRAM_API_KEY=... OPENAI_API_KEY=... CARTESIA_API_KEY=... \
  python bot.py --url https://yourorg.daily.co/agent --token ${DAILY_TOKEN}
```
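For --url and --token to reach main() from Step 3, bot.py needs a small entrypoint. A minimal sketch using stdlib argparse (the flag names match the run command above; everything else is boilerplate you may already have):

```python
import argparse

def parse_args(argv=None):
    """Parse the --url/--token flags used in the Step 5 run command."""
    p = argparse.ArgumentParser(description="Pipecat Daily voice agent")
    p.add_argument("--url", required=True, help="Daily room URL")
    p.add_argument("--token", required=True, help="Daily meeting token")
    return p.parse_args(argv)

# At the bottom of bot.py, hand the parsed values to main() from Step 3:
#   if __name__ == "__main__":
#       import asyncio
#       args = parse_args()
#       asyncio.run(main(args.url, args.token))
```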
Step 6 — Function calling
Pipecat's OpenAILLMContext supports the OpenAI tools schema directly. Add tools=[...] to the context and the LLM service emits FunctionCallInProgressFrame / FunctionCallResultFrame, which you handle by registering a handler with llm.register_function("name", handler).
Still reading? Stop comparing — try CallSphere live.
CallSphere ships complete AI voice agents per industry — 14 tools for healthcare, 10 agents for real estate, 4 specialists for salons. See how it actually handles a call before you book a demo.
Pitfalls
- VAD positioning: place SileroVADAnalyzer on the transport (in DailyParams), NOT in the pipeline — frames must be VAD-tagged before they reach the aggregator.
- Aggregator order: agg.user() goes BEFORE the LLM and agg.assistant() AFTER the TTS — reversing them loses tool messages.
- allow_interruptions: off by default in some templates; turn it on or the agent talks over the user.
- Cartesia voice IDs: region matters — pull voice IDs from your account, not the docs.
How CallSphere does this
CallSphere runs 37 agents in 6 verticals with 90+ tools and 115+ DB tables. Pipecat powers the salon and behavioral health products at a steady ~680ms p50. $149/$499/$1,499 plans, 14-day trial, 22% affiliate.
FAQ
Pipecat vs LiveKit Agents? Pipecat is lower-level — you control every frame. LiveKit Agents is higher-level with built-in dispatch.
Can I swap transports? Yes — DailyTransport, LiveKitTransport, WebsocketServerTransport, FastAPIWebsocketTransport, and Twilio all share the same interface.
Is it production-ready? NVIDIA NIM ships Pipecat as their reference voice agent blueprint and AWS published a multi-part guide pairing it with Bedrock.
How do I observe it? Pipecat emits OpenTelemetry spans for every processor — point your collector at the runner.
Sources
- Pipecat Docs - Introduction - https://docs.pipecat.ai/getting-started/introduction
- GitHub - pipecat-ai/pipecat - https://github.com/pipecat-ai/pipecat
- HackerNoon - Real-Time Voice Agent with Pipecat - https://hackernoon.com/how-to-build-a-real-time-voice-agent-with-pipecat
- AWS - Intelligent AI Voice Agents with Pipecat + Bedrock - https://aws.amazon.com/blogs/machine-learning/building-intelligent-ai-voice-agents-with-pipecat-and-amazon-bedrock-part-1/
Try CallSphere AI Voice Agents
See how AI voice agents work for your industry. Live demo available -- no signup required.