
Blue/Green Voice Agent Deploys with WebSocket Sticky Sessions (2026)

Blue/green deploy an AI voice agent without dropping calls. ALB stickiness, draining timeouts tuned for WebSockets, Redis-backed session state, and a clean cutover.

TL;DR: Blue/green for voice means standing up green, draining blue with sticky sessions intact, cutting over new calls only, and keeping state in Redis so reconnects work. Stickiness duration on the LB must match your call SLO, not the ALB's 1-day default.

What you'll set up

Two parallel Deployments (voice-agent-blue and voice-agent-green) behind an Application Load Balancer with target-group stickiness, session state in Redis so a reconnecting call can resume on either color, and a cutover script that gates on "all blue calls drained".

Architecture

```mermaid
flowchart TD
  CLIENT[Caller WS] --> ALB[ALB stickiness=5m]
  ALB --> BLUE[voice-agent-blue]
  ALB --> GREEN[voice-agent-green]
  BLUE --> REDIS[(Redis call state)]
  GREEN --> REDIS
  CUT[Cutover] -->|weight 0/100| ALB
  DRAIN[Drain blue] --> BLUE
```

Step 1 — Two target groups, one ALB

```hcl
resource "aws_lb_target_group" "blue" {
  name             = "voice-blue"
  port             = 8080
  protocol         = "HTTP"
  protocol_version = "HTTP1" # WebSocket

  stickiness {
    type            = "lb_cookie"
    cookie_duration = 300 # 5 minutes, matches our max call length
    enabled         = true
  }

  deregistration_delay = 600 # let active calls finish (10 min)

  health_check {
    path     = "/healthz/realtime"
    interval = 10
    timeout  = 3
  }
}

resource "aws_lb_target_group" "green" {
  # ... identical with name = "voice-green"
}
```

cookie_duration = 300 (5 min) is the critical knob. The default is 1 day (86400 seconds), so a customer who reconnects hours later still hits the old color, which keeps blue alive long after the cutover.

Step 2 — Listener with weighted routing

```hcl
resource "aws_lb_listener_rule" "voice" {
  listener_arn = aws_lb_listener.https.arn

  action {
    type = "forward"

    forward {
      target_group {
        arn    = aws_lb_target_group.blue.arn
        weight = 100
      }

      target_group {
        arn    = aws_lb_target_group.green.arn
        weight = 0
      }

      stickiness {
        duration = 300
        enabled  = true
      }
    }
  }

  condition {
    path_pattern {
      values = ["/realtime/*"]
    }
  }
}
```

Step 3 — Move call state to Redis

The voice agent must not keep call context in process memory. Otherwise a green pod can't resume a blue pod's call. Move to Redis:

```python
import json

import redis.asyncio as redis

r = redis.from_url("redis://voice-redis:6379")


async def get_call(call_id):
    # Return the stored call state, or an empty dict if none exists.
    return json.loads(await r.get(f"call:{call_id}") or "{}")


async def set_call(call_id, state):
    # Store call state with a 30-minute TTL.
    await r.setex(f"call:{call_id}", 1800, json.dumps(state))
```

Now any pod (blue or green) can pick up state. The cookie still pins an in-progress call to its color, but a reconnect that lands on the other color can resume safely from Redis.
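To make the resume path concrete, here is a minimal sketch of a connect handler that picks up existing state regardless of color. It assumes a hypothetical `handle_connect` function and uses an in-memory stand-in (`InMemoryStore`) instead of a real Redis client so it runs without a server; the `served_by` and `turns` fields are illustrative, not part of the original design.

```python
import asyncio
import json


class InMemoryStore:
    """Stand-in for Redis so the sketch runs without a server (assumption)."""

    def __init__(self):
        self._data = {}

    async def get(self, key):
        return self._data.get(key)

    async def setex(self, key, ttl_seconds, value):
        # The TTL is accepted but not enforced in this stand-in.
        self._data[key] = value


async def get_call(store, call_id):
    raw = await store.get(f"call:{call_id}")
    return json.loads(raw) if raw else {}


async def set_call(store, call_id, state):
    await store.setex(f"call:{call_id}", 1800, json.dumps(state))


async def handle_connect(store, call_id, color):
    """Resume an existing call if state exists, otherwise start a new one."""
    state = await get_call(store, call_id)
    resumed = bool(state)
    if not resumed:
        state = {"turns": 0}
    state["served_by"] = color  # record which color holds the call now
    await set_call(store, call_id, state)
    return resumed, state


async def demo():
    store = InMemoryStore()
    # The call starts on blue and completes three turns.
    _, state = await handle_connect(store, "abc123", "blue")
    state["turns"] = 3
    await set_call(store, "abc123", state)
    # The socket drops; the reconnect lands on green and resumes.
    return await handle_connect(store, "abc123", "green")


resumed, state = asyncio.run(demo())
```

In a real agent the store would be the `redis.asyncio` client, whose `get` and `setex` calls have the same shape as the stand-in's.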

Step 4 — Deploy green alongside blue

```bash
kubectl apply -f voice-agent-green.yaml
kubectl rollout status deploy/voice-agent-green --timeout=120s
```

Green is live and registered in its target group, but weight = 0 means no traffic.

Step 5 — Cutover with a smoke test

```bash
# Promote green to 100, blue to 0.
# The JSON goes through a heredoc so ${BLUE_TG} and ${GREEN_TG} expand
# (they would stay literal inside single quotes).
ACTIONS=$(cat <<EOF
[{"Type":"forward","ForwardConfig":{"TargetGroups":[
  {"TargetGroupArn":"${BLUE_TG}","Weight":0},
  {"TargetGroupArn":"${GREEN_TG}","Weight":100}],
  "TargetGroupStickinessConfig":{"Enabled":true,"DurationSeconds":300}}}]
EOF
)
aws elbv2 modify-rule --rule-arn "$RULE" --actions "$ACTIONS"

# New calls go green; existing sticky cookies still finish on blue
```
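The same forward action can be built programmatically, which avoids shell quoting problems and makes rollback a one-argument change. This is a minimal sketch: `forward_actions` is a hypothetical helper, and the ARNs are placeholders rather than real resources.

```python
import json


def forward_actions(blue_tg_arn, green_tg_arn, green_weight, stickiness_seconds=300):
    """Build the Actions list for a weighted forward rule.

    green_weight is 0..100; blue gets the remainder.
    """
    if not 0 <= green_weight <= 100:
        raise ValueError("green_weight must be between 0 and 100")
    return [{
        "Type": "forward",
        "ForwardConfig": {
            "TargetGroups": [
                {"TargetGroupArn": blue_tg_arn, "Weight": 100 - green_weight},
                {"TargetGroupArn": green_tg_arn, "Weight": green_weight},
            ],
            "TargetGroupStickinessConfig": {
                "Enabled": True,
                "DurationSeconds": stickiness_seconds,
            },
        },
    }]


# Placeholder ARNs for illustration only.
BLUE_TG = "arn:aws:elasticloadbalancing:region:acct:targetgroup/voice-blue/EXAMPLE"
GREEN_TG = "arn:aws:elasticloadbalancing:region:acct:targetgroup/voice-green/EXAMPLE"

cutover = forward_actions(BLUE_TG, GREEN_TG, green_weight=100)
rollback = forward_actions(BLUE_TG, GREEN_TG, green_weight=0)
print(json.dumps(cutover))
```

The printed JSON has the shape that `--actions` expects. With boto3 installed, the list could instead be passed as `Actions=cutover` to the `elbv2` client's `modify_rule` call.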

Then poll until blue drains:

```bash
while [ "$(aws cloudwatch get-metric-data ... blue active_calls)" != "0" ]; do
  sleep 30
done
```


Step 6 — Decommission blue

```bash
kubectl delete deploy voice-agent-blue
```

Run the delete only after blue active_calls has stayed at 0 for at least 5 min (a buffer for stragglers). The ALB target deregistration_delay (10 min) covers connections that drain late.
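The "zero for at least 5 minutes" gate is slightly stricter than a loop that exits on the first zero reading. A minimal sketch of that gate follows; `wait_for_drain` and `read_active_calls` are hypothetical names, and the metric reader is injected so the logic can be exercised without CloudWatch.

```python
import time


def wait_for_drain(read_active_calls, quiet_seconds=300, poll_seconds=30,
                   timeout_seconds=3600, sleep=time.sleep):
    """Return True after enough consecutive zero readings to span quiet_seconds.

    Each zero reading counts as poll_seconds of quiet time.
    Returns False if timeout_seconds of polling passes first.
    """
    quiet = 0   # seconds credited to consecutive zero readings
    waited = 0  # total seconds spent sleeping between polls
    while waited < timeout_seconds:
        if read_active_calls() == 0:
            quiet += poll_seconds
            if quiet >= quiet_seconds:
                return True
        else:
            quiet = 0  # a straggler appeared; restart the quiet window
        sleep(poll_seconds)
        waited += poll_seconds
    return False


# Demo with canned readings and a fake sleep that only records the delay.
readings = iter([4, 2, 0, 1, 0, 0, 0])
slept = []
drained = wait_for_drain(lambda: next(readings), quiet_seconds=90,
                         poll_seconds=30, sleep=slept.append)
```

In the demo, the single zero at the third reading does not end the wait, because a call shows up again on the next poll; only the final run of three zeros does.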

Step 7 — Auto-rollback if green smoke fails

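```bash
# Quick smoke
if ! python smoke/realtime_ping.py --url wss://agent.example.com/realtime --color green; then
  # Roll back: same modify-rule call as Step 5, with blue=100 and green=0.
  # (--weights is shorthand for the Step 5 --actions JSON, not a real flag.)
  aws elbv2 modify-rule ... --weights blue=100,green=0
  exit 1
fi
```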
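The control flow matters more than the individual commands here: a passing smoke test must not trigger the rollback or the non-zero exit. A minimal sketch of that flow, assuming a hypothetical `cutover_with_rollback` wrapper whose three callables stand in for the promote, smoke, and rollback commands:

```python
def cutover_with_rollback(promote_green, smoke_test, restore_blue):
    """Promote green, run the smoke test, and restore blue if it fails.

    Returns True if green stays live, False if traffic was rolled back.
    """
    promote_green()
    try:
        healthy = bool(smoke_test())
    except Exception:
        healthy = False  # a crashing smoke test counts as a failure
    if not healthy:
        restore_blue()
    return healthy


# Demo with stub callables that only record what happened.
events = []
ok = cutover_with_rollback(
    promote_green=lambda: events.append("green=100"),
    smoke_test=lambda: False,  # simulate a failed smoke test
    restore_blue=lambda: events.append("blue=100"),
)
```

A deploy script could exit non-zero when the function returns False, so the pipeline marks the release as failed while traffic is already back on blue.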

Pitfalls

  • Stickiness cookie domain mismatch — set cookie domain explicitly on the listener; otherwise multi-subdomain setups lose stickiness across reconnects.
  • deregistration_delay too short: the 300s default cuts off any call still running five minutes after its target starts deregistering. Set ≥ your p99 call length.
  • TLS termination at ALB with mTLS to upstream needs the right ssl_policy. Default ELBSecurityPolicy-2016-08 lacks TLS 1.3.
  • Connection multiplexing — HTTP/2 means many calls share a connection. Stickiness on connection != stickiness on call. Test with a real client.
  • Forgetting Redis backups — call state in Redis is great for blue/green; lose Redis and lose every active call. Run with replicas + AOF.
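For the deregistration_delay pitfall, the value can be derived from observed call lengths rather than guessed. The sketch below assumes call durations are available as whole seconds; `suggested_deregistration_delay`, the 20% headroom, and the sample data are all hypothetical. The 3600-second cap is the ALB maximum for this setting.

```python
def p99(values):
    """Nearest-rank 99th percentile of a non-empty list."""
    ordered = sorted(values)
    rank = (99 * len(ordered) + 99) // 100  # integer ceil of 0.99 * n, 1-based
    return ordered[rank - 1]


def suggested_deregistration_delay(call_seconds, headroom_pct=20, max_allowed=3600):
    """p99 call length plus headroom, capped at the ALB maximum of 3600s."""
    padded = (p99(call_seconds) * (100 + headroom_pct) + 99) // 100  # integer ceil
    return min(max_allowed, padded)


# Hypothetical sample: 95 short calls and 5 long ones, in seconds.
sample = [120] * 95 + [480] * 5
delay = suggested_deregistration_delay(sample)
print(delay)
```

For this sample the p99 is 480 seconds, so the suggestion lands just under the 600 seconds used in Step 1.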

How CallSphere does this in production

CallSphere does blue/green for major voice-agent versions and canary (Argo Rollouts) for prompt changes. Stickiness is 300s; call state lives in Redis Sentinel; deregistration delay is 10 min. Across our k3s + Cloudflare Tunnel stack we cut over ~12 times a month with zero call drops on the blue/green path.

FAQ

Q: Blue/green vs canary for voice? Blue/green for infrastructure swaps (LB config, runtime version). Canary for agent changes (prompts, models). Use both.

Q: Why not WebRTC instead of WebSocket signaling? WebRTC media is point-to-point and doesn't go through the ALB. Only the signaling WebSocket needs sticky handling.

Q: Redis instead of in-memory state — what's the latency cost? ~1-2 ms per turn. Negligible vs voice round-trip.

Q: Can I do this with k8s Services only, no ALB? Yes — use sessionAffinity: ClientIP and two Services with weighted Ingress. Less elegant than ALB target weights, but works.
