Multi-Step AI Workflows: Orchestrating Claude Across Complex Tasks
Learn patterns for orchestrating Claude across multi-step workflows including sequential chains, parallel fan-out, conditional branching, and human-in-the-loop checkpoints. Includes production-ready Python examples.
Why Single-Call AI Is Not Enough
Most AI integrations start as a single API call: user sends input, model returns output, done. But real business processes are multi-step. Reviewing a contract involves extracting clauses, checking against policy, flagging risks, and generating a summary. Onboarding a customer requires validating documents, running compliance checks, creating accounts, and sending notifications.
Orchestrating Claude across multi-step workflows is the difference between "AI feature" and "AI-powered system." The challenge is not making individual calls; it is managing state, handling failures, and coordinating parallel and sequential steps efficiently.
The Four Orchestration Patterns
Pattern 1: Sequential Chain
The simplest pattern. Each step's output feeds into the next step's input.
flowchart LR
INPUT(["User intent"])
PARSE["Parse plus<br/>classify"]
PLAN["Plan and tool<br/>selection"]
AGENT["Agent loop<br/>LLM plus tools"]
GUARD{"Guardrails<br/>and policy"}
EXEC["Execute and<br/>verify result"]
OBS[("Trace and metrics")]
OUT(["Outcome plus<br/>next action"])
INPUT --> PARSE --> PLAN --> AGENT --> GUARD
GUARD -->|Pass| EXEC --> OUT
GUARD -->|Fail| AGENT
AGENT --> OBS
style AGENT fill:#4f46e5,stroke:#4338ca,color:#fff
style GUARD fill:#f59e0b,stroke:#d97706,color:#1f2937
style OBS fill:#ede9fe,stroke:#7c3aed,color:#1e1b4b
style OUT fill:#059669,stroke:#047857,color:#fff
import anthropic
from dataclasses import dataclass
client = anthropic.Anthropic()
@dataclass
class StepResult:
step_name: str
output: str
tokens_used: int
model: str
def sequential_chain(document: str) -> list[StepResult]:
    """Process a document through a sequential analysis chain (synchronous client)."""
results = []
# Step 1: Extract key information
extraction = client.messages.create(
model="claude-haiku-4-20250514", # Fast model for extraction
max_tokens=2048,
messages=[{
"role": "user",
"content": f"Extract all dates, names, monetary amounts, and "
f"obligations from this document:\n\n{document}"
}]
)
results.append(StepResult(
step_name="extraction",
output=extraction.content[0].text,
tokens_used=extraction.usage.output_tokens,
model="claude-haiku-4-20250514"
))
# Step 2: Analyze risks (uses extraction output)
risk_analysis = client.messages.create(
model="claude-sonnet-4-20250514", # Stronger model for analysis
max_tokens=4096,
messages=[{
"role": "user",
"content": f"Given these extracted elements:\n{extraction.content[0].text}"
f"\n\nIdentify potential risks, ambiguities, and "
f"missing clauses in this contract."
}]
)
results.append(StepResult(
step_name="risk_analysis",
output=risk_analysis.content[0].text,
tokens_used=risk_analysis.usage.output_tokens,
model="claude-sonnet-4-20250514"
))
# Step 3: Generate summary (uses both previous outputs)
summary = client.messages.create(
model="claude-sonnet-4-20250514",
max_tokens=2048,
messages=[{
"role": "user",
"content": f"Create an executive summary of this contract review."
f"\n\nExtracted elements:\n{extraction.content[0].text}"
f"\n\nRisk analysis:\n{risk_analysis.content[0].text}"
}]
)
results.append(StepResult(
step_name="summary",
output=summary.content[0].text,
tokens_used=summary.usage.output_tokens,
model="claude-sonnet-4-20250514"
))
return results
When to use: Tasks with clear linear dependencies where each step requires the previous step's output.
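Calling the chain is a plain function call since it uses the synchronous client. A minimal driver sketch, assuming the document lives in a hypothetical local contract.txt file:

# Hypothetical driver: contract.txt stands in for whatever document you review.
with open("contract.txt") as f:
    contract_text = f.read()

results = sequential_chain(contract_text)
for step in results:
    print(f"[{step.step_name}] {step.model}: {step.tokens_used} output tokens")
print(results[-1].output)  # the executive summary from the final step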
Pattern 2: Parallel Fan-Out / Fan-In
When multiple independent analyses can run simultaneously, fan-out to parallel calls and fan-in to combine results.
import asyncio
from anthropic import AsyncAnthropic
async_client = AsyncAnthropic()
async def parallel_analysis(document: str) -> dict:
"""Run multiple independent analyses in parallel."""
async def analyze(aspect: str, instructions: str) -> tuple[str, str]:
response = await async_client.messages.create(
model="claude-sonnet-4-20250514",
max_tokens=2048,
messages=[{
"role": "user",
"content": f"{instructions}\n\nDocument:\n{document}"
}]
)
return aspect, response.content[0].text
# Fan-out: run all analyses concurrently
tasks = [
analyze("legal", "Identify all legal obligations and liabilities."),
analyze("financial", "Extract and analyze all financial terms."),
analyze("compliance", "Check for regulatory compliance issues."),
analyze("timeline", "Extract all deadlines and milestones."),
]
results = await asyncio.gather(*tasks)
# Fan-in: combine results
analysis_map = dict(results)
# Synthesis step: combine all parallel results
synthesis = await async_client.messages.create(
model="claude-sonnet-4-20250514",
max_tokens=4096,
messages=[{
"role": "user",
"content": (
"Synthesize these analyses into a unified report:\n\n"
+ "\n\n".join(
f"## {k.title()} Analysis\n{v}"
for k, v in analysis_map.items()
)
)
}]
)
return {
"individual_analyses": analysis_map,
"synthesis": synthesis.content[0].text
}
When to use: Multiple independent analyses of the same input, where a final synthesis step combines the results.
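A short driver sketch for the fan-out above, assuming you are not already inside an event loop (contract.txt is again a stand-in). Note that asyncio.gather propagates the first exception by default; pass return_exceptions=True if you prefer to inspect per-branch failures individually.

# Hypothetical driver for the fan-out example above.
with open("contract.txt") as f:
    document = f.read()

report = asyncio.run(parallel_analysis(document))
for aspect, text in report["individual_analyses"].items():
    print(f"=== {aspect} ===\n{text[:300]}\n")
print(report["synthesis"])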
Pattern 3: Conditional Branching
Different inputs require different processing paths. A routing step decides which branch to execute.
import json
async def conditional_workflow(user_request: str) -> dict:
"""Route and process requests based on AI classification."""
# Step 1: Classify the request
classification = await async_client.messages.create(
model="claude-haiku-4-20250514",
max_tokens=256,
messages=[{
"role": "user",
"content": f"""Classify this request into exactly one category.
Categories: billing, technical_support, account_change, general_inquiry
Request: {user_request}
Respond with JSON: {{"category": "...", "confidence": 0.0-1.0}}"""
}]
)
    # Assumes the model returned bare JSON as instructed; guard this parse in
    # production (e.g. catch json.JSONDecodeError and fall back to a default category).
    route = json.loads(classification.content[0].text)
    # Step 2: Branch based on classification
    # billing_tools, tech_support_tools, and account_tools are tool schemas
    # assumed to be defined elsewhere in your codebase.
    branch_configs = {
"billing": {
"model": "claude-sonnet-4-20250514",
"system": "You are a billing specialist. Access account data via tools.",
"tools": billing_tools,
},
"technical_support": {
"model": "claude-sonnet-4-20250514",
"system": "You are a technical support engineer. Diagnose and resolve issues.",
"tools": tech_support_tools,
},
"account_change": {
"model": "claude-sonnet-4-20250514",
"system": "You are an account manager. Process account modifications.",
"tools": account_tools,
},
"general_inquiry": {
"model": "claude-haiku-4-20250514",
"system": "You are a helpful assistant. Answer general questions.",
"tools": [],
},
}
config = branch_configs.get(route["category"], branch_configs["general_inquiry"])
# Step 3: Execute the appropriate branch
response = await async_client.messages.create(
model=config["model"],
system=config["system"],
max_tokens=4096,
tools=config["tools"],
messages=[{"role": "user", "content": user_request}]
)
return {
"classification": route,
"response": response.content[0].text,
"branch_used": route["category"]
}
Pattern 4: Human-in-the-Loop Checkpoint
For high-stakes workflows, insert approval gates where a human reviews the AI's work before proceeding.
from enum import Enum
class ApprovalStatus(Enum):
PENDING = "pending"
APPROVED = "approved"
REJECTED = "rejected"
MODIFIED = "modified"
async def workflow_with_checkpoints(task: str) -> dict:
"""Execute a workflow with human approval checkpoints."""
# Step 1: AI generates a plan
plan = await async_client.messages.create(
model="claude-sonnet-4-20250514",
max_tokens=4096,
messages=[{
"role": "user",
"content": f"Create a detailed execution plan for: {task}\n"
f"List each step with expected outcomes and risks."
}]
)
# Checkpoint: save plan and wait for human approval
checkpoint_id = await save_checkpoint(
stage="plan_review",
content=plan.content[0].text,
requires_approval=True
)
    # In production this wait would be handled out of band (webhook, queue, or
    # polling); a minimal sketch of these helpers follows this example.
approval = await wait_for_approval(checkpoint_id)
if approval.status == ApprovalStatus.REJECTED:
return {"status": "rejected", "reason": approval.feedback}
# Use the potentially modified plan
approved_plan = approval.modified_content or plan.content[0].text
# Step 2: Execute the approved plan
execution = await async_client.messages.create(
model="claude-sonnet-4-20250514",
        max_tokens=8192,
messages=[{
"role": "user",
"content": f"Execute this approved plan:\n{approved_plan}"
}]
)
return {"status": "completed", "result": execution.content[0].text}
Error Handling and Retry Strategies
Multi-step workflows need robust error handling because any step can fail.
import asyncio
from anthropic import APIError, RateLimitError
async def resilient_step(
messages: list,
model: str = "claude-sonnet-4-20250514",
max_retries: int = 3,
fallback_model: str = "claude-haiku-4-20250514"
) -> str:
"""Execute a step with retries and model fallback."""
for attempt in range(max_retries):
try:
response = await async_client.messages.create(
model=model,
max_tokens=4096,
messages=messages
)
return response.content[0].text
        except RateLimitError:
            wait_time = 2 ** attempt  # Exponential backoff
            await asyncio.sleep(wait_time)  # Non-blocking sleep inside async code
        except APIError:
            if attempt == max_retries - 1 and fallback_model:
                # Last resort: try a different model
                response = await async_client.messages.create(
                    model=fallback_model,
                    max_tokens=4096,
                    messages=messages
                )
                return response.content[0].text
            # Otherwise back off and retry; raising here would make the
            # fallback unreachable on earlier attempts
            await asyncio.sleep(2 ** attempt)
raise RuntimeError(f"Step failed after {max_retries} retries")
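Any call in the earlier patterns can be routed through this wrapper. For example, the risk-analysis step from the sequential chain could be made resilient as sketched below; the helper name and prompt wiring are illustrative.

async def resilient_risk_analysis(extracted: str) -> str:
    # Illustrative: mirrors the risk-analysis prompt from Pattern 1,
    # but gains retries and model fallback from resilient_step.
    return await resilient_step(
        messages=[{
            "role": "user",
            "content": f"Given these extracted elements:\n{extracted}\n\n"
                       f"Identify potential risks, ambiguities, and missing clauses."
        }],
        model="claude-sonnet-4-20250514",
    )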
Cost Optimization: Model Routing Per Step
One of the biggest advantages of multi-step workflows is using the right model for each step. Not every step needs the most capable model.
| Step Type | Recommended Model | Why |
|---|---|---|
| Classification / routing | Haiku | Fast, cheap, highly accurate for simple decisions |
| Data extraction | Haiku or Sonnet | Structured extraction is well-handled by smaller models |
| Complex analysis | Sonnet | Good balance of capability and cost |
| Critical decisions | Opus | Highest accuracy for high-stakes reasoning |
| Synthesis / writing | Sonnet | Strong writing quality at reasonable cost |
In practice, routing models this way often cuts workflow cost by 40-60% compared with running Sonnet at every step, typically with no measurable quality degradation.
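In code, this routing can be as simple as a lookup consulted before each call. A sketch mirroring the table above; the step names are illustrative and the model IDs follow the earlier examples in this article.

# Illustrative step-type -> model mapping, mirroring the table above.
MODEL_BY_STEP_TYPE = {
    "classification": "claude-haiku-4-20250514",
    "extraction": "claude-haiku-4-20250514",
    "analysis": "claude-sonnet-4-20250514",
    "critical_decision": "claude-opus-4-20250514",
    "synthesis": "claude-sonnet-4-20250514",
}

def model_for(step_type: str) -> str:
    """Pick a model per step, defaulting to Sonnet for unknown step types."""
    return MODEL_BY_STEP_TYPE.get(step_type, "claude-sonnet-4-20250514")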
Summary
Multi-step AI workflows transform Claude from a question-answering tool into a process automation engine. The four core patterns (sequential chains, parallel fan-out, conditional branching, and human-in-the-loop checkpoints) can be combined to model almost any business process. The keys to production success are robust error handling with fallbacks, model routing for cost optimization, and checkpoint-based human oversight for high-stakes decisions.
Try CallSphere AI Voice Agents
See how AI voice agents work for your industry. Live demo available -- no signup required.