TL;DR: blue/green for voice means standing up green, draining blue with sticky sessions intact, cutting over new calls only, and keeping state in Redis so reconnects work. The stickiness duration on the LB must match your call SLO, not the 1-day default.

What you'll set up

Two parallel Deployments (voice-agent-blue and voice-agent-green) behind an Application Load Balancer, with target-group stickiness so a reconnecting call lands on the same color, session state in Redis so any pod can resume a call if it doesn't, and a cutover script that gates on "all blue calls drained".

Architecture

```mermaid
flowchart TD
  CLIENT[Caller WS] --> ALB[ALB stickiness=5m]
  ALB --> BLUE[voice-agent-blue]
  ALB --> GREEN[voice-agent-green]
  BLUE --> REDIS[(Redis call state)]
  GREEN --> REDIS
  CUT[Cutover] -->|weight 0/100| ALB
  DRAIN[Drain blue] --> BLUE
```

Step 1 — Two target groups, one ALB

```hcl
resource "aws_lb_target_group" "blue" {
  name             = "voice-blue"
  port             = 8080
  protocol         = "HTTP"
  protocol_version = "HTTP1"    # WebSocket upgrades ride on HTTP/1.1
  vpc_id           = var.vpc_id # assumed variable; required for instance/ip targets

  stickiness {
    type            = "lb_cookie"
    cookie_duration = 300 # 5 minutes, matches our max call length
    enabled         = true
  }

  deregistration_delay = 600 # let active calls finish (10 min)

  health_check {
    path     = "/healthz/realtime"
    interval = 10
    timeout  = 3
  }
}

# resource "aws_lb_target_group" "green" { ... }
# Identical to blue except name = "voice-green".
```

cookie_duration = 300 (5 min) is the critical knob. The ALB default is 1 day (86,400 s), so a caller who reconnects 23 hours later still hits the old color, which keeps blue alive indefinitely.
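To keep these timing knobs consistent across environments, a small check could run in CI. The sketch below is illustrative: `VoiceLbTimeouts` and `check` are hypothetical names, the limits are the documented ALB ranges (stickiness cookie 1 s to 7 days, deregistration delay 0 to 3600 s), and "cookie no longer than the longest call" is this guide's rule of thumb, not an AWS requirement.

```python
from dataclasses import dataclass

# Documented ALB ranges, in seconds.
COOKIE_RANGE_S = (1, 604_800)       # lb_cookie stickiness: 1 s to 7 days
MAX_DEREGISTRATION_DELAY_S = 3_600  # deregistration delay: 0 to 3600 s


@dataclass
class VoiceLbTimeouts:
    """Hypothetical bundle of the timing knobs discussed in this guide (seconds)."""
    max_call_s: int              # longest call you intend to serve
    p99_call_s: int              # observed p99 call length
    cookie_duration_s: int       # stickiness duration on target group and listener
    deregistration_delay_s: int  # drain window after a target is deregistered


def check(t: VoiceLbTimeouts) -> list[str]:
    """Return a list of problems; an empty list means the settings hang together."""
    problems = []
    lo, hi = COOKIE_RANGE_S
    if not lo <= t.cookie_duration_s <= hi:
        problems.append("cookie_duration outside the ALB range of 1..604800 s")
    elif t.cookie_duration_s > t.max_call_s:
        # Rule of thumb from this guide: a cookie that outlives the longest call
        # keeps pinning reconnecting callers to the old color, so blue never drains.
        problems.append("cookie_duration exceeds max call length")
    if t.deregistration_delay_s > MAX_DEREGISTRATION_DELAY_S:
        problems.append("deregistration_delay above the ALB maximum of 3600 s")
    if t.deregistration_delay_s < t.p99_call_s:
        problems.append("deregistration_delay shorter than p99 call; in-flight calls get cut")
    return problems
```

With this guide's numbers (5-minute max call, 300 s cookie, 600 s drain) and a p99 under 10 minutes, `check` returns an empty list; plugging in the 1-day default cookie flags it.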

Step 2 — Listener with weighted routing

```hcl
resource "aws_lb_listener_rule" "voice" {
  listener_arn = aws_lb_listener.https.arn

  action {
    type = "forward"
    forward {
      target_group {
        arn    = aws_lb_target_group.blue.arn
        weight = 100
      }
      target_group {
        arn    = aws_lb_target_group.green.arn
        weight = 0
      }
      stickiness {
        duration = 300
        enabled  = true
      }
    }
  }

  condition {
    path_pattern {
      values = ["/realtime/*"]
    }
  }
}
```
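Why does flipping the weights move only new calls? A toy model helps. This is not ALB's implementation, only a sketch assuming the behavior this guide relies on: a caller holding an unexpired target-group stickiness cookie keeps its group even at weight 0, and everyone else is assigned by weight. `route` is a hypothetical helper.

```python
import random
from typing import Optional


def route(cookie_color: Optional[str], cookie_age_s: float, weights: dict[str, int],
          stickiness_s: int, rng: random.Random) -> str:
    """Toy model of a weighted forward action with target-group stickiness."""
    # A fresh cookie for a group still in the rule wins, regardless of weight.
    if cookie_color in weights and cookie_age_s < stickiness_s:
        return cookie_color
    # No cookie (or an expired one): weighted pick among groups with weight > 0.
    colors = [c for c, w in weights.items() if w > 0]
    return rng.choices(colors, weights=[weights[c] for c in colors], k=1)[0]
```

After the 0/100 flip, a new caller always gets green, a caller whose cookie is 60 s old stays on blue, and once that cookie passes the 300 s window the caller falls through to green. That is why the cookie duration bounds how long blue keeps receiving reconnects.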

Step 3 — Move call state to Redis

The voice agent must not keep call context in process memory. Otherwise a green pod can't resume a blue pod's call. Move to Redis:

```python
import json

import redis.asyncio as redis

# One shared client per process; blue and green pods point at the same Redis.
r = redis.from_url("redis://voice-redis:6379")


async def get_call(call_id: str) -> dict:
    """Load call context; returns {} for an unknown or expired call."""
    raw = await r.get(f"call:{call_id}")
    return json.loads(raw) if raw else {}


async def set_call(call_id: str, state: dict) -> None:
    """Persist call context with a 30-minute TTL so abandoned calls expire."""
    await r.setex(f"call:{call_id}", 1800, json.dumps(state))
```

Now any pod, blue or green, can pick up a call's state. The cookie still keeps each caller on one color, and if a reconnect does land on the other color, the call resumes from Redis instead of losing context.
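To see the reconnect path end to end, here is a sketch in which two "pods" share one store. `FakeRedis` is a hypothetical in-memory stand-in for the two `redis.asyncio` calls used above (`get`, `setex`), and `handle_turn` is an illustrative per-turn handler, not a real agent API.

```python
import asyncio
import json


class FakeRedis:
    """In-memory stand-in for redis.asyncio get/setex. TTLs are recorded, not enforced."""

    def __init__(self) -> None:
        self._data: dict[str, bytes] = {}
        self.ttls: dict[str, int] = {}

    async def get(self, key: str):
        return self._data.get(key)

    async def setex(self, key: str, ttl: int, value) -> None:
        self._data[key] = value.encode() if isinstance(value, str) else value
        self.ttls[key] = ttl


async def handle_turn(store, pod: str, call_id: str, utterance: str) -> dict:
    """Hypothetical handler: load state, append this turn, save it back."""
    raw = await store.get(f"call:{call_id}")
    state = json.loads(raw) if raw else {"turns": []}
    state["turns"].append({"pod": pod, "text": utterance})
    await store.setex(f"call:{call_id}", 1800, json.dumps(state))
    return state


async def demo() -> dict:
    shared = FakeRedis()  # both colors talk to the same store
    await handle_turn(shared, "voice-agent-blue", "c1", "I need to reschedule")
    # The caller drops and reconnects; this time the socket lands on green.
    return await handle_turn(shared, "voice-agent-green", "c1", "Thursday works")
```

`asyncio.run(demo())` returns a state whose `turns` list holds the blue turn followed by the green one. The handler is a plain read-modify-write, which is fine while one call's turns are processed one at a time; if two pods could write the same call concurrently, a Redis `WATCH`/`MULTI` transaction or a Lua script would be needed.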

Step 4 — Deploy green alongside blue

```bash
kubectl apply -f voice-agent-green.yaml
kubectl rollout status deploy/voice-agent-green --timeout=120s
```

Green is live and registered in its target group, but weight = 0 means no traffic.
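`kubectl rollout status` only confirms that the Deployment's pods are ready; it says nothing about the ALB's own health checks against `/healthz/realtime`. A minimal gate, assuming you fetch the response with boto3's `elbv2` client and `describe_target_health`, could look like this (`green_ready` is a hypothetical helper):

```python
# Example input (not executed here; GREEN_TG is a placeholder ARN):
#   resp = boto3.client("elbv2").describe_target_health(TargetGroupArn=GREEN_TG)


def green_ready(resp: dict, min_targets: int = 1) -> bool:
    """True when at least min_targets are registered and every one is 'healthy'."""
    states = [d["TargetHealth"]["State"] for d in resp.get("TargetHealthDescriptions", [])]
    # 'initial' (still being checked), 'unhealthy', 'draining' etc. all block cutover.
    return len(states) >= min_targets and all(s == "healthy" for s in states)
```

Requiring a minimum target count guards against the empty case, where `all()` on no targets would otherwise pass.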

Step 5 — Cutover with a smoke test

```bash
# Promote green to 100, blue to 0.
# Unquoted heredoc delimiter, so the shell expands ${BLUE_TG} and ${GREEN_TG}
# (inside single quotes they would be sent to AWS literally).
ACTIONS=$(cat <<EOF
[{"Type":"forward","ForwardConfig":{
  "TargetGroups":[
    {"TargetGroupArn":"${BLUE_TG}","Weight":0},
    {"TargetGroupArn":"${GREEN_TG}","Weight":100}],
  "TargetGroupStickinessConfig":{"Enabled":true,"DurationSeconds":300}}}]
EOF
)
aws elbv2 modify-rule --rule-arn "$RULE" --actions "$ACTIONS"

# New calls go green; existing sticky cookies still finish on blue.
```
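Cutover, rollback, and any intermediate split all use the same payload with different weights, so generating it avoids hand-editing JSON. A sketch, assuming boto3 for the API call; `forward_actions` is a hypothetical helper and the weights are treated as percentages:

```python
def forward_actions(blue_tg: str, green_tg: str, green_weight: int,
                    stickiness_s: int = 300) -> list[dict]:
    """Build the forward action for a blue/green split.
    green_weight=100 is the cutover; green_weight=0 is the rollback."""
    if not 0 <= green_weight <= 100:
        raise ValueError("green_weight must be between 0 and 100")
    return [{
        "Type": "forward",
        "ForwardConfig": {
            "TargetGroups": [
                {"TargetGroupArn": blue_tg, "Weight": 100 - green_weight},
                {"TargetGroupArn": green_tg, "Weight": green_weight},
            ],
            # Keep group stickiness on so in-flight callers stay on their color.
            "TargetGroupStickinessConfig": {"Enabled": True, "DurationSeconds": stickiness_s},
        },
    }]
```

Pass the result to `boto3.client("elbv2").modify_rule(RuleArn=..., Actions=...)`, or `json.dumps` it for the CLI's `--actions`.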

Then poll until blue drains:

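```bash
# Poll every 30 s until blue reports zero active calls. The query is elided;
# active_calls is a custom metric your agent publishes (not a built-in ALB metric).
while [ "$(aws cloudwatch get-metric-data ... blue active_calls)" != "0" ]; do
  sleep 30
done
```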
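That loop exits on the first zero reading, while Step 6 asks for zero sustained for at least 5 minutes. A sketch of a gate that enforces the quiet window, assuming `read_active_calls` wraps whatever CloudWatch query you use (it is hypothetical, as is `wait_for_drain`); the clock and sleep are injectable so the logic can be tested without waiting:

```python
import time
from typing import Callable


def wait_for_drain(read_active_calls: Callable[[], int], quiet_s: float = 300,
                   poll_s: float = 30, timeout_s: float = 3600,
                   clock: Callable[[], float] = time.monotonic,
                   sleep: Callable[[float], None] = time.sleep) -> bool:
    """True once blue has read zero active calls continuously for quiet_s seconds;
    False if timeout_s elapses first."""
    start = clock()
    zero_since = None
    while clock() - start < timeout_s:
        if read_active_calls() == 0:
            if zero_since is None:
                zero_since = clock()  # start of the quiet window
            if clock() - zero_since >= quiet_s:
                return True
        else:
            zero_since = None  # a straggler reappeared; restart the window
        sleep(poll_s)
    return False
```

Resetting the window on any non-zero reading is the point: a late reconnect to blue pushes decommissioning back instead of being cut off.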


Step 6 — Decommission blue

```bash
kubectl delete deploy voice-agent-blue
```

Run this only after blue's active_calls has stayed at 0 for at least 5 minutes (a buffer for stragglers). The target group's deregistration_delay (10 min) covers connections that are still draining.

Step 7 — Auto-rollback if green smoke fails

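```bash
# Quick smoke test against green. On failure, route everything back to blue and fail the job.
# ROLLBACK_ACTIONS (hypothetical name) is the Step 5 --actions payload with the weights
# swapped: blue 100, green 0. The { ...; } group keeps `exit 1` on the failure path only.
python smoke/realtime_ping.py --url wss://agent.example.com/realtime --color green || {
  aws elbv2 modify-rule --rule-arn "$RULE" --actions "$ROLLBACK_ACTIONS"
  exit 1
}
```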
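The same gate-and-revert logic as a sketch you could drive from a deploy job. `cutover`, `set_green_weight`, and `smoke` are hypothetical: the first callable might call `modify_rule` with the Step 5 payload at the given weight, the second might run `smoke/realtime_ping.py` via `subprocess` and return whether it exited 0.

```python
from typing import Callable


def cutover(set_green_weight: Callable[[int], None], smoke: Callable[[], bool]) -> bool:
    """Send new calls to green, smoke-test it, and roll back to blue on failure.
    Returns True if green stays live."""
    set_green_weight(100)
    try:
        ok = smoke()
    except Exception:
        ok = False  # a crashing smoke test counts as a failed one
    if not ok:
        set_green_weight(0)  # rollback: all new calls back to blue
    return ok
```

Treating a smoke-test crash as a failure matters: a broken probe should never leave green taking 100% of traffic unverified.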

Pitfalls

Stickiness cookie domain mismatch — set cookie domain explicitly on the listener; otherwise multi-subdomain setups lose stickiness across reconnects.
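deregistration_delay too short — the ALB default is 300 s, so any call still running 5 minutes after its target is deregistered gets cut off. Set it to at least your p99 call length (the ALB maximum is 3600 s).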
TLS termination at ALB with mTLS to upstream needs the right ssl_policy. Default ELBSecurityPolicy-2016-08 lacks TLS 1.3.
Connection multiplexing — HTTP/2 means many calls share a connection. Stickiness on connection != stickiness on call. Test with a real client.
Forgetting Redis backups — call state in Redis is great for blue/green; lose Redis and lose every active call. Run with replicas + AOF.

How CallSphere does this in production

CallSphere does blue/green for major voice-agent versions and canary (Argo Rollouts) for prompt changes. Stickiness is 300 s, call state lives in Redis Sentinel, and the deregistration delay is 10 min. Across our k3s + Cloudflare Tunnel stack we cut over about 12 times a month with zero call drops on the blue/green path. The platform runs 37 agents, 90+ tools, and 115+ DB tables; plans are $149/$499/$1499, with a 14-day trial and a 22% affiliate program.

FAQ

Q: Blue/green vs canary for voice? Blue/green for infrastructure swaps (LB config, runtime version). Canary for agent changes (prompts, models). Use both.

Q: Why not WebRTC instead of WebSocket signaling? WebRTC media is point-to-point and doesn't go through the ALB. Only the signaling WebSocket needs sticky handling.

Q: Redis instead of in-memory state — what's the latency cost? ~1-2 ms per turn. Negligible vs voice round-trip.

Q: Can I do this with k8s Services only, no ALB? Yes — use sessionAffinity: ClientIP and two Services with weighted Ingress. Less elegant than ALB target weights, but works.

Blue/Green Voice Agent Deploys with WebSocket Sticky Sessions (2026)
