Blue-Green Deployments for AI Agents: Zero-Downtime Model and Prompt Updates

Why Blue-Green Deployments for AI Agents

Deploying a new version of an AI agent is riskier than deploying a typical web service. A subtle prompt change can make the agent behave inappropriately. A model upgrade might produce longer or shorter responses that break client parsing. A tool integration update might introduce latency that causes timeouts. You need the ability to deploy, validate, and roll back in seconds, not minutes.

Blue-green deployment maintains two identical production environments. Only one (the "live" environment) receives user traffic at any time. You deploy updates to the idle environment, validate them, then switch traffic. If anything goes wrong, you switch back, which takes seconds because the previous version is still running.
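The switch-and-rollback logic is simple enough to model in a few lines. The following is a toy, in-memory sketch (the `BlueGreen` class is hypothetical and touches no Kubernetes API). It shows that deploying to the idle slot never changes what users see, and that rolling back is the same operation as switching.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class BlueGreen:
    """Toy model of two slots; only the `live` slot receives traffic."""
    versions: Dict[str, Optional[str]] = field(
        default_factory=lambda: {"blue": None, "green": None})
    live: str = "blue"

    @property
    def idle(self) -> str:
        return "green" if self.live == "blue" else "blue"

    def deploy_to_idle(self, version: str) -> None:
        # Updating the idle slot never touches the version users are hitting.
        self.versions[self.idle] = version

    def switch(self) -> None:
        # Flip which slot is live; the previous slot keeps running untouched.
        self.live = self.idle

    def serving(self) -> Optional[str]:
        return self.versions[self.live]


bg = BlueGreen(versions={"blue": "1.2.0", "green": None})
bg.deploy_to_idle("1.3.0")  # green now holds 1.3.0; users still get 1.2.0
bg.switch()                 # users now get 1.3.0
bg.switch()                 # rollback: users get 1.2.0 again
```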

Kubernetes Blue-Green Architecture

Create two Deployments and a single Service that targets one of them:

[Diagram: general CI/CD pipeline. Git push → GitHub Actions (build plus test) → container registry (GHCR or ECR) → Helm chart with per-environment values → Kubernetes cluster, where a Deployment (rolling update), a Service plus Ingress, and an HPA (scaling on CPU and queue depth) manage inference pods on a GPU node pool that serve production traffic.]

# k8s/blue-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: agent-blue
  namespace: ai-agents
  labels:
    app: agent-service
    slot: blue
spec:
  replicas: 3
  selector:
    matchLabels:
      app: agent-service
      slot: blue
  template:
    metadata:
      labels:
        app: agent-service
        slot: blue
        version: "1.2.0"
    spec:
      containers:
        - name: agent
          image: registry.example.com/agent-service:1.2.0
          ports:
            - containerPort: 8000
          env:
            - name: OPENAI_API_KEY
              valueFrom:
                secretKeyRef:
                  name: agent-secrets
                  key: openai-api-key
            - name: AGENT_VERSION
              value: "1.2.0"
          readinessProbe:
            httpGet:
              path: /readyz
              port: 8000
            initialDelaySeconds: 10
            periodSeconds: 5

# k8s/green-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: agent-green
  namespace: ai-agents
  labels:
    app: agent-service
    slot: green
spec:
  replicas: 3
  selector:
    matchLabels:
      app: agent-service
      slot: green
  template:
    metadata:
      labels:
        app: agent-service
        slot: green
        version: "1.3.0"
    spec:
      containers:
        - name: agent
          image: registry.example.com/agent-service:1.3.0
          ports:
            - containerPort: 8000
          env:
            - name: OPENAI_API_KEY
              valueFrom:
                secretKeyRef:
                  name: agent-secrets
                  key: openai-api-key
            - name: AGENT_VERSION
              value: "1.3.0"
          readinessProbe:
            httpGet:
              path: /readyz
              port: 8000
            initialDelaySeconds: 10
            periodSeconds: 5
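Because the two manifests differ only in slot name and version, generating them from one function keeps the environments identical by construction. A minimal sketch, assuming you render manifests from Python rather than maintaining two YAML files; `agent_deployment` is a hypothetical helper that mirrors the fields above, and its JSON output (a `v1` `List`) can be piped to `kubectl apply -f -`.

```python
import json


def agent_deployment(slot: str, version: str, replicas: int = 3) -> dict:
    """Build the Deployment for one slot; only `slot` and `version` vary."""
    labels = {"app": "agent-service", "slot": slot}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": f"agent-{slot}", "namespace": "ai-agents",
                     "labels": dict(labels)},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": dict(labels)},
            "template": {
                "metadata": {"labels": {**labels, "version": version}},
                "spec": {
                    "containers": [{
                        "name": "agent",
                        "image": f"registry.example.com/agent-service:{version}",
                        "ports": [{"containerPort": 8000}],
                        "env": [
                            {"name": "OPENAI_API_KEY",
                             "valueFrom": {"secretKeyRef": {
                                 "name": "agent-secrets", "key": "openai-api-key"}}},
                            {"name": "AGENT_VERSION", "value": version},
                        ],
                        "readinessProbe": {
                            "httpGet": {"path": "/readyz", "port": 8000},
                            "initialDelaySeconds": 10,
                            "periodSeconds": 5,
                        },
                    }],
                },
            },
        },
    }


if __name__ == "__main__":
    # Emit both slots as one List, e.g. `python render.py | kubectl apply -f -`
    manifests = {"apiVersion": "v1", "kind": "List",
                 "items": [agent_deployment("blue", "1.2.0"),
                           agent_deployment("green", "1.3.0")]}
    print(json.dumps(manifests, indent=2))
```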

The Traffic-Switching Service

A single Service points to whichever slot is live:

# k8s/service.yaml
apiVersion: v1
kind: Service
metadata:
  name: agent-service
  namespace: ai-agents
spec:
  selector:
    app: agent-service
    slot: blue  # <-- Change this to "green" to switch traffic
  ports:
    - port: 80
      targetPort: 8000

Switch traffic by patching the selector:

# Switch from blue to green
kubectl patch service agent-service -n ai-agents \
  -p '{"spec": {"selector": {"slot": "green"}}}'

# Verify the switch
kubectl get endpoints agent-service -n ai-agents

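New requests reach green within seconds because all green pods are already running and have passed their readiness probes; the only delay is kube-proxy (or your ingress controller) picking up the new endpoints. Connections already open to blue pods, such as keep-alive connections or in-flight streaming responses, continue until they close, so give blue time to drain before changing it.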
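`kubectl get endpoints` lists pod IPs, which you then have to match against pods by eye. A sketch of automating that check, assuming you feed it the JSON from `kubectl get endpoints agent-service -n ai-agents -o json` and `kubectl get pods -n ai-agents -l app=agent-service,slot=green -o json`; the helper names are illustrative. It reads ready addresses from the Endpoints object's `subsets[].addresses[]` (not-ready pods appear under `notReadyAddresses` instead) and pod IPs from `status.podIP`.

```python
import json
import subprocess


def endpoint_ips(endpoints: dict) -> set:
    """Ready addresses from an Endpoints object."""
    return {addr["ip"]
            for subset in endpoints.get("subsets", [])
            for addr in subset.get("addresses", [])}


def pod_ips(pod_list: dict) -> set:
    """Pod IPs from a PodList, skipping pods that have no IP yet."""
    return {pod["status"]["podIP"]
            for pod in pod_list.get("items", [])
            if pod.get("status", {}).get("podIP")}


def service_points_at(endpoints: dict, slot_pods: dict) -> bool:
    """True when the Service has endpoints and every one is a pod of the slot."""
    ips = endpoint_ips(endpoints)
    return bool(ips) and ips <= pod_ips(slot_pods)


def kubectl_json(*args: str) -> dict:
    """Run a kubectl get command with `-o json` and parse the result."""
    out = subprocess.run(["kubectl", *args, "-o", "json"], check=True,
                         capture_output=True, text=True).stdout
    return json.loads(out)
```

For example, `service_points_at(kubectl_json("get", "endpoints", "agent-service", "-n", "ai-agents"), kubectl_json("get", "pods", "-n", "ai-agents", "-l", "app=agent-service,slot=green"))` returns True only once every endpoint behind the Service is a green pod.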

Deployment Script with Validation

Automate the deploy-validate-switch workflow:

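#!/usr/bin/env python3
# scripts/deploy.py
# Usage: python scripts/deploy.py registry.example.com/agent-service:1.3.0
import json
import subprocess
import sys
import time

import httpx

NAMESPACE = "ai-agents"
SERVICE = "agent-service"


def run(args: list[str]) -> str:
    """Run a command without a shell; exit with its stderr on failure."""
    result = subprocess.run(args, capture_output=True, text=True)
    if result.returncode != 0:
        print(f"FAILED: {' '.join(args)}\n{result.stderr}")
        sys.exit(1)
    return result.stdout.strip()


def get_live_slot() -> str:
    # No shell is involved, so the jsonpath needs no extra quoting.
    return run(["kubectl", "get", "svc", SERVICE, "-n", NAMESPACE,
                "-o", "jsonpath={.spec.selector.slot}"])


def get_idle_slot(live: str) -> str:
    return "green" if live == "blue" else "blue"


def wait_for_ready(deployment: str, timeout: int = 120) -> None:
    print(f"Waiting for {deployment} to be ready...")
    run(["kubectl", "rollout", "status", f"deployment/{deployment}",
         "-n", NAMESPACE, f"--timeout={timeout}s"])


def validate_slot(slot: str, local_port: int = 9090) -> bool:
    """Smoke-test the idle slot through a temporary port-forward."""
    # Without a shell, terminate() signals kubectl itself, not a wrapper shell.
    port_forward = subprocess.Popen(
        ["kubectl", "port-forward", f"deploy/agent-{slot}",
         f"{local_port}:8000", "-n", NAMESPACE],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    try:
        # Poll rather than sleeping a fixed time: the tunnel can take a while to open.
        for _ in range(15):
            if port_forward.poll() is not None:
                return False  # kubectl exited, e.g. no running pods in the slot
            try:
                resp = httpx.get(f"http://localhost:{local_port}/readyz", timeout=10)
                return resp.status_code == 200
            except httpx.TransportError:
                time.sleep(1)
        return False
    finally:
        port_forward.terminate()
        port_forward.wait()


def main() -> None:
    if len(sys.argv) != 2:
        sys.exit("usage: deploy.py IMAGE  (e.g. registry.example.com/agent-service:1.3.0)")
    image = sys.argv[1]
    live = get_live_slot()
    idle = get_idle_slot(live)

    print(f"Live: {live}, deploying to: {idle}")

    run(["kubectl", "set", "image", f"deployment/agent-{idle}",
         f"agent={image}", "-n", NAMESPACE])
    wait_for_ready(f"agent-{idle}")

    if not validate_slot(idle):
        print(f"Validation failed. Aborting; traffic stays on {live}.")
        sys.exit(1)

    # json.dumps builds the patch body without hand-escaping braces and quotes.
    patch = json.dumps({"spec": {"selector": {"slot": idle}}})
    run(["kubectl", "patch", "svc", SERVICE, "-n", NAMESPACE, "-p", patch])
    print(f"Traffic switched to {idle}")


if __name__ == "__main__":
    main()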
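The script's only check is `/readyz`, which the readiness probe has already confirmed by the time `rollout status` returns. The risks listed at the top (responses that break client parsing, added latency) call for checks on actual agent output. A sketch of the evaluation half, assuming you have already sent a few representative prompts to the idle slot through the port-forward and recorded status, latency, and body for each; the `{"reply": ...}` response shape and both thresholds are hypothetical placeholders for your own API and latency budget.

```python
import json
import math
from dataclasses import dataclass


@dataclass
class SmokeResult:
    latency_ms: float
    status_code: int
    body: str


def p95(values: list) -> float:
    """Nearest-rank 95th percentile."""
    ordered = sorted(values)
    return ordered[math.ceil(0.95 * len(ordered)) - 1]


def evaluate(results: list, max_p95_ms: float = 5000,
             max_reply_chars: int = 4000) -> list:
    """Return failure messages; an empty list means the slot passed."""
    if not results:
        return ["no smoke-test results"]
    failures = []
    for i, r in enumerate(results):
        if r.status_code != 200:
            failures.append(f"request {i}: HTTP {r.status_code}")
            continue
        try:
            payload = json.loads(r.body)
        except json.JSONDecodeError:
            failures.append(f"request {i}: response is not valid JSON")
            continue
        # Hypothetical response shape {"reply": "..."}; adjust to your API.
        reply = payload.get("reply") if isinstance(payload, dict) else None
        if not isinstance(reply, str):
            failures.append(f"request {i}: missing string field 'reply'")
        elif len(reply) > max_reply_chars:
            failures.append(f"request {i}: reply is {len(reply)} chars")
    latency = p95([r.latency_ms for r in results])
    if latency > max_p95_ms:
        failures.append(f"p95 latency {latency:.0f} ms exceeds {max_p95_ms} ms")
    return failures
```

`validate_slot` could then return `not evaluate(results)` and print the failure list when it is non-empty, so a prompt or model change that alters the response format blocks the switch.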

Rollback Procedure

Rollback is a single command that switches traffic back to the previous slot:

# If green is live and broken, switch back to blue
kubectl patch service agent-service -n ai-agents \
  -p '{"spec": {"selector": {"slot": "blue"}}}'

As long as you have not scaled it down, the old version is still running with full replicas: no image pulls, no pod startups, no waiting.
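The "no waiting" guarantee holds only while the previous slot still has all its replicas ready. A minimal guard, assuming you pass it the JSON from `kubectl get deploy agent-blue -n ai-agents -o json`; both function names are illustrative. It relies on two Kubernetes conventions: `spec.replicas` defaults to 1 when unset, and `status.readyReplicas` is omitted when no replicas are ready.

```python
import json


def ready_for_instant_rollback(deployment: dict) -> bool:
    """True if the previous slot's Deployment is fully scaled and ready."""
    desired = deployment.get("spec", {}).get("replicas", 1)
    ready = deployment.get("status", {}).get("readyReplicas", 0)
    return desired > 0 and ready >= desired


def rollback_command(previous_slot: str) -> list:
    """argv that points the agent-service Service back at the previous slot."""
    patch = json.dumps({"spec": {"selector": {"slot": previous_slot}}})
    return ["kubectl", "patch", "service", "agent-service",
            "-n", "ai-agents", "-p", patch]
```

If the guard returns False, scale the old slot back up and wait for `kubectl rollout status` before patching the Service, rather than sending traffic to a slot with no ready pods.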

Canary Testing Before Full Switch

Route a percentage of traffic to the new slot before committing:

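# k8s/green-service.yaml: a Service that always selects the green slot
apiVersion: v1
kind: Service
metadata:
  name: agent-green
  namespace: ai-agents
spec:
  selector:
    app: agent-service
    slot: green
  ports:
    - port: 80
      targetPort: 8000
---
# k8s/canary-ingress.yaml: traffic splitting with ingress-nginx annotations.
# The canary annotations only take effect when a primary (non-canary) Ingress
# for the same host already routes to the stable slot.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: agent-canary
  namespace: ai-agents
  annotations:
    nginx.ingress.kubernetes.io/canary: "true"
    nginx.ingress.kubernetes.io/canary-weight: "10"
spec:
  ingressClassName: nginx  # assumes the controller's IngressClass is named nginx
  rules:
    - host: agent.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: agent-green
                port:
                  number: 80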

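Assuming a primary Ingress for agent.example.com routes to the live blue slot, this sends roughly 10% of requests to green while blue handles the remaining 90%. Monitor error rates and latency for each slot, then raise the canary weight step by step, or roll back by setting it to 0 or deleting the agent-canary Ingress.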
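Raising the weight can be driven by a simple rule that compares the canary's error rate with the stable slot's. The sketch below is illustrative: the weight schedule, the 1-percentage-point tolerance, and the 200-request minimum are assumptions to tune, and the error and request counts would come from your own metrics system. `set_weight_command` builds a `kubectl annotate --overwrite` call that updates the `canary-weight` annotation on the `agent-canary` Ingress.

```python
WEIGHT_STEPS = [10, 25, 50, 100]  # hypothetical ramp schedule (percent of traffic)


def next_canary_step(current_weight: int,
                     canary_errors: int, canary_requests: int,
                     baseline_errors: int, baseline_requests: int,
                     tolerance: float = 0.01,
                     min_requests: int = 200) -> tuple:
    """Return (action, weight) where action is wait, rollback, increase, or promote."""
    if canary_requests < min_requests:
        return ("wait", current_weight)  # not enough canary traffic to judge yet
    canary_rate = canary_errors / max(canary_requests, 1)
    baseline_rate = baseline_errors / max(baseline_requests, 1)
    if canary_rate > baseline_rate + tolerance:
        return ("rollback", 0)  # weight 0 sends all traffic back to the stable slot
    higher = [w for w in WEIGHT_STEPS if w > current_weight]
    if not higher or higher[0] == 100:
        return ("promote", 100)  # next step is full traffic: switch the Service instead
    return ("increase", higher[0])


def set_weight_command(weight: int) -> list:
    """argv that updates the canary weight on the agent-canary Ingress."""
    return ["kubectl", "annotate", "ingress", "agent-canary", "-n", "ai-agents",
            f"nginx.ingress.kubernetes.io/canary-weight={weight}", "--overwrite"]
```

Waiting for a minimum sample keeps a handful of early errors from triggering a rollback; if you would rather fail fast, move the error-rate check above the sample-size check.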

FAQ

How long should I keep the old (idle) deployment running after a switch?

Keep it running for at least the duration of your monitoring window, typically 30 minutes to a few hours, so that you can roll back instantly if the new version degrades. Once you are confident the new version is stable, either leave the idle deployment as a standby or scale it to zero replicas to save resources. A slot scaled to zero has to schedule pods, possibly pull its image, and pass readiness checks before it can take traffic again, so rolling back to it is no longer instant.

How do blue-green deployments handle database migrations?

Database schema changes must be backward compatible. Both blue and green versions will run against the same database simultaneously during the transition. Use expand-and-contract migrations: first add new columns or tables (expand), deploy the new version, then remove old columns in a later release (contract). Never drop columns or change types in the same release that introduces the code change.
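The expand step can be seen in miniature with SQLite. A sketch using an in-memory database and a hypothetical `conversations` table: the migration adds a nullable `prompt_version` column that green writes, and blue's existing INSERT keeps working because it never mentions the new column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE conversations (id INTEGER PRIMARY KEY, transcript TEXT)")

# Expand: add a nullable column. Blue's existing queries keep working.
conn.execute("ALTER TABLE conversations ADD COLUMN prompt_version TEXT")


def blue_save(transcript: str) -> None:
    # Old code (blue) knows nothing about prompt_version.
    conn.execute("INSERT INTO conversations (transcript) VALUES (?)", (transcript,))


def green_save(transcript: str, prompt_version: str) -> None:
    # New code (green) also writes the new column.
    conn.execute(
        "INSERT INTO conversations (transcript, prompt_version) VALUES (?, ?)",
        (transcript, prompt_version),
    )


# During the switch, both versions write to the same table.
blue_save("hello from blue")
green_save("hello from green", "v2")
rows = conn.execute(
    "SELECT transcript, prompt_version FROM conversations ORDER BY id").fetchall()

# Contract happens in a later release, after blue is retired: only then
# drop old columns or make prompt_version NOT NULL.
```

Blue's row simply has `prompt_version` set to NULL, which green must tolerate when it reads conversations written before the switch.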

Can I use blue-green deployments to A/B test different AI agent prompts?

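Yes. Deploy different prompt versions to blue and green, then use canary weights to split traffic. Compare metrics such as task completion rate, user satisfaction, response latency, and cost per conversation across the two versions. Weight-based splitting does not, on its own, keep a given user on one version for a whole multi-turn conversation; if that matters, route on a header or cookie instead (ingress-nginx supports the canary-by-header and canary-by-cookie annotations). Collect enough conversations per version before declaring a winner, because with small samples noise easily looks like a real difference.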
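Comparing a rate such as task completion between the two slots needs a significance check, not just a glance at two percentages. A sketch using a standard two-proportion z-test with a pooled standard error (a normal approximation, so it assumes reasonably large samples per version); the function name and the blue/green labeling are illustrative.

```python
import math


def compare_completion(done_a: int, total_a: int,
                       done_b: int, total_b: int) -> dict:
    """Two-proportion z-test on task completion rate (A = blue, B = green)."""
    p_a = done_a / total_a
    p_b = done_b / total_b
    pooled = (done_a + done_b) / (total_a + total_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    z = (p_b - p_a) / se if se > 0 else 0.0
    # Two-sided p-value: 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return {"rate_a": p_a, "rate_b": p_b, "z": z, "p_value": p_value}
```

For example, 400 of 1,000 conversations completed on blue against 450 of 1,000 on green gives z ≈ 2.26 and a two-sided p-value of roughly 0.024.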
