Learn Agentic AI

Sequential Agent Chaining: Building Pipeline Architectures

Learn how to build sequential agent pipelines where the output of one agent feeds directly into the next, using structured outputs for clean handoffs in the OpenAI Agents SDK.

Why Sequential Chaining Is the Most Underrated Agent Pattern

Most multi-agent tutorials jump straight to complex orchestration — handoffs, parallel execution, manager agents delegating to specialists. But the most reliable and debuggable multi-agent pattern is the simplest one: sequential chaining. Agent A finishes its work, passes a structured result to Agent B, which finishes its work and passes a structured result to Agent C.

Sequential pipelines are the assembly lines of agentic AI. Each agent has a single responsibility, a well-defined input contract, and a well-defined output contract. When something breaks, you know exactly which stage failed and why. When you need to improve quality, you can swap out a single stage without touching the rest.

This post walks through the architecture of sequential agent pipelines in the OpenAI Agents SDK, with particular focus on using structured outputs to create clean, type-safe handoffs between stages.

The Core Concept: Output as Input

A sequential chain is defined by one rule: the output of agent N becomes the input of agent N+1. This sounds trivial, but the implementation details matter enormously. If Agent A returns free-form text and Agent B expects structured data, the chain is fragile. If Agent A returns a Pydantic model and Agent B is instructed to work with that exact schema, the chain is robust.

flowchart LR
    INPUT(["User input"])
    A["Agent A<br/>Researcher"]
    B["Agent B<br/>Drafter"]
    C["Agent C<br/>Validator"]
    OUT(["Final output"])
    INPUT --> A
    A -->|"ResearchOutput"| B
    B -->|"DraftOutput"| C
    C --> OUT
    style A fill:#4f46e5,stroke:#4338ca,color:#fff
    style B fill:#4f46e5,stroke:#4338ca,color:#fff
    style C fill:#f59e0b,stroke:#d97706,color:#1f2937
    style OUT fill:#059669,stroke:#047857,color:#fff

The OpenAI Agents SDK supports structured outputs through the output_type parameter on an Agent. When you set an output type, the SDK constrains the model to return valid JSON conforming to your Pydantic model, which eliminates most parsing failures between stages.

from pydantic import BaseModel
from agents import Agent, Runner

class ResearchOutput(BaseModel):
    topic: str
    key_findings: list[str]
    sources: list[str]
    confidence_score: float

class DraftOutput(BaseModel):
    title: str
    sections: list[str]
    word_count: int
    tone: str

research_agent = Agent(
    name="Researcher",
    instructions="""You are a research analyst. Given a topic, produce
    structured research findings with sources and a confidence score
    between 0.0 and 1.0.""",
    model="gpt-4o",
    output_type=ResearchOutput,
)

draft_agent = Agent(
    name="Drafter",
    instructions="""You are a technical writer. Given research findings,
    produce a structured article draft with clear sections. Use a
    professional tone and target 800-1200 words.""",
    model="gpt-4o",
    output_type=DraftOutput,
)

Building the Pipeline Runner

The pipeline runner is a simple loop that feeds each agent's output into the next agent as input. The key design decision is how to format the handoff — you serialize the structured output into a string that the next agent can parse from its input message.

import asyncio
import json
from typing import Any

from agents import Agent, Runner

async def run_pipeline(agents: list[Agent], initial_input: str) -> Any:
    """Run a sequential pipeline of agents, passing each output as the next input."""
    current_input = initial_input

    for i, agent in enumerate(agents):
        print(f"Stage {i + 1}/{len(agents)}: Running {agent.name}")

        result = await Runner.run(agent, input=current_input)
        output = result.final_output

        # If the output is a Pydantic model, serialize it for the next agent
        if hasattr(output, 'model_dump'):
            current_input = (
                f"Previous stage ({agent.name}) produced the following output:\n"
                f"{json.dumps(output.model_dump(), indent=2)}"
            )
        else:
            current_input = str(output)

        print(f"  Completed: {agent.name}")

    return output

Now running the full pipeline is a single function call:

async def main():
    pipeline = [research_agent, draft_agent]
    result = await run_pipeline(pipeline, "Write about sequential agent pipelines in production AI systems")

    print(f"Title: {result.title}")
    print(f"Word count: {result.word_count}")
    for section in result.sections:
        print(f"  - {section[:80]}...")

asyncio.run(main())

Adding Error Handling and Retries

Production pipelines need to handle failures at any stage. If Agent B fails, you should not need to re-run Agent A (which may have cost significant tokens and time). A resilient pipeline runner captures intermediate results and supports retries per stage.

from dataclasses import dataclass, field
from typing import Any

@dataclass
class PipelineResult:
    stage_outputs: list[Any] = field(default_factory=list)
    failed_stage: int | None = None
    error: str | None = None
    completed: bool = False

async def run_pipeline_with_retries(
    agents: list[Agent],
    initial_input: str,
    max_retries: int = 2,
) -> PipelineResult:
    """Run a pipeline with per-stage retries and intermediate result capture."""
    pipeline_result = PipelineResult()
    current_input = initial_input

    for i, agent in enumerate(agents):
        for attempt in range(max_retries + 1):
            try:
                result = await Runner.run(agent, input=current_input)
                output = result.final_output
                pipeline_result.stage_outputs.append(output)

                if hasattr(output, 'model_dump'):
                    current_input = (
                        f"Previous stage ({agent.name}) produced:\n"
                        f"{json.dumps(output.model_dump(), indent=2)}"
                    )
                else:
                    current_input = str(output)

                break  # Stage succeeded; move on to the next agent

            except Exception as e:
                print(f"  Stage {i + 1} attempt {attempt + 1} failed: {e}")
                if attempt == max_retries:
                    pipeline_result.failed_stage = i
                    pipeline_result.error = str(e)
                    return pipeline_result

    pipeline_result.completed = True
    return pipeline_result

Designing Effective Stage Boundaries

The hardest part of sequential chaining is deciding where to split the pipeline. Too many stages and you waste tokens re-explaining context at each handoff. Too few stages and you lose the benefits of single-responsibility agents. Here are guidelines that work in practice.

Split when the skill set changes. If one stage requires domain expertise (medical terminology, legal reasoning) and the next requires a different skill (plain-language writing, data formatting), those should be separate agents with different system prompts.

Split when you need a quality gate. If you want to validate output before proceeding — checking that research has enough sources, or that a draft meets length requirements — insert a validation stage. This agent can either approve the output or reject it with feedback.

Do not split purely for modularity. If two stages need the same context and the same skills, combining them into a single agent is usually better. The overhead of serializing and re-parsing context is not free.

Validation Stages: The Quality Gate Pattern

A powerful extension of sequential chaining is the validation stage — an agent whose only job is to check the previous stage's output and either pass it through or flag issues.


class ValidationResult(BaseModel):
    is_valid: bool
    issues: list[str]
    suggestions: list[str]

validator_agent = Agent(
    name="QualityValidator",
    instructions="""You are a quality assurance reviewer. Evaluate the
    draft article against these criteria:
    1. All claims are supported by the provided sources
    2. The tone is professional and consistent
    3. Technical accuracy is maintained
    4. The article has a clear structure with introduction and conclusion

    Return is_valid=True only if ALL criteria are met. Otherwise list
    the specific issues and actionable suggestions.""",
    model="gpt-4o",
    output_type=ValidationResult,
)

You can insert this validation agent between any two stages. If validation fails, you can either retry the previous stage with the feedback or escalate to a human reviewer.
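One way to wire the quality gate in is a validate-and-retry loop. The sketch below is a minimal, framework-agnostic version: `produce` and `validate` are hypothetical async callables standing in for `Runner.run` calls on the drafter and validator agents, and the validator must return an object with `is_valid` and `issues` attributes (such as the ValidationResult model above).

```python
import asyncio

async def run_with_validation(produce, validate, initial_input: str, max_attempts: int = 3):
    """Run a producer stage, validate its output, and retry on failure,
    appending the validator's issues to the next attempt's input."""
    feedback = ""
    for attempt in range(max_attempts):
        draft = await produce(initial_input + feedback)
        verdict = await validate(draft)
        if verdict.is_valid:
            return draft
        # Carry the reviewer's issues into the next attempt's input
        feedback = "\n\nReviewer feedback to address:\n" + "\n".join(verdict.issues)
    raise RuntimeError(f"Validation failed after {max_attempts} attempts")
```

In a real pipeline you would cap `max_attempts` low (two or three), since each retry re-runs both the producer and the validator.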

Performance Considerations

Sequential pipelines have an inherent latency cost: each stage must complete before the next begins. For a three-stage pipeline where each LLM call takes 3-5 seconds, total latency is 9-15 seconds. Strategies to mitigate this include using faster models (gpt-4o-mini) for simpler stages, caching stage outputs for repeated inputs, and running independent sub-tasks within a stage in parallel even though the stages themselves are sequential.
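The last mitigation — overlapping independent sub-tasks inside a single stage — can be sketched with asyncio.gather. Here `subtasks` is a hypothetical list of async callables (e.g. wrappers around `Runner.run` for different research questions); the stage still occupies one slot in the sequential chain, but its wall-clock time is roughly that of the slowest sub-task rather than the sum.

```python
import asyncio

async def run_stage_subtasks(subtasks, stage_input: str) -> list:
    """Fan out a stage's independent sub-tasks concurrently and
    return their results in the order the sub-tasks were given."""
    return await asyncio.gather(*(task(stage_input) for task in subtasks))
```

The caller then merges the sub-task results into a single handoff string for the next stage, so the rest of the pipeline never knows the stage ran work in parallel.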

The token cost is also worth monitoring. Each handoff adds tokens because the next agent receives the serialized output of the previous agent as part of its input. For large outputs, consider summarizing before handoff rather than passing the complete structured output.
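A lightweight version of that idea is to compact the serialized handoff before passing it on. The helper below is an illustrative sketch, not an SDK feature: `payload` would typically be `output.model_dump()`, and `max_chars` is an arbitrary budget standing in for a real token limit.

```python
import json

def compact_handoff(stage_name: str, payload: dict, max_chars: int = 2000) -> str:
    """Serialize a stage's output for handoff, shrinking it when the
    serialized form would blow past a character budget."""
    text = json.dumps(payload, indent=2)
    if len(text) <= max_chars:
        return f"Previous stage ({stage_name}) produced:\n{text}"
    # Over budget: drop the indentation first, then hard-truncate with a marker
    text = json.dumps(payload, separators=(",", ":"))
    if len(text) > max_chars:
        text = text[:max_chars] + "...[truncated]"
    return f"Previous stage ({stage_name}) produced (compacted):\n{text}"
```

Hard truncation is a blunt instrument; for large outputs a dedicated summarizer stage (a cheap model asked to condense the payload) usually preserves more signal per token.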

When to Use Sequential Chaining vs Other Patterns

Use sequential chaining when: your workflow has a natural linear progression, each stage has clear input/output contracts, and you value debuggability and reliability over latency.

Use handoffs instead when: the workflow requires dynamic routing — the output of one stage determines which agent should handle the next step, not just what data it receives.

Use parallel execution when: multiple stages are independent and can produce results simultaneously, which are later combined by a synthesis agent.

Sequential chaining is the foundation. Master it first, then add complexity only when the problem demands it.
