Agent Planning: How AI Systems Decompose Complex Tasks into Steps

Why Agents Need Planning

A ReAct loop can handle tasks that require 5-10 tool calls, but it breaks down on complex, multi-stage goals. Ask a pure ReAct agent to "analyze our Q1 sales data, identify the top 3 underperforming regions, research competitor pricing in those regions, and write a strategy memo" — and it will either lose track of its progress or wander aimlessly between subtasks.

Planning solves this by having the agent create a structured plan before it starts executing. Instead of figuring out the next step one at a time, the agent maps out the full path first, then works through it methodically.

Task Decomposition: Breaking Goals into Steps

The simplest form of planning is task decomposition — asking the LLM to break a complex goal into a numbered list of steps.

flowchart LR
    INPUT(["User intent"])
    PARSE["Parse plus<br/>classify"]
    PLAN["Plan and tool<br/>selection"]
    AGENT["Agent loop<br/>LLM plus tools"]
    GUARD{"Guardrails<br/>and policy"}
    EXEC["Execute and<br/>verify result"]
    OBS[("Trace and metrics")]
    OUT(["Outcome plus<br/>next action"])
    INPUT --> PARSE --> PLAN --> AGENT --> GUARD
    GUARD -->|Pass| EXEC --> OUT
    GUARD -->|Fail| AGENT
    AGENT --> OBS
    style AGENT fill:#4f46e5,stroke:#4338ca,color:#fff
    style GUARD fill:#f59e0b,stroke:#d97706,color:#1f2937
    style OBS fill:#ede9fe,stroke:#7c3aed,color:#1e1b4b
    style OUT fill:#059669,stroke:#047857,color:#fff

import json
from openai import OpenAI

client = OpenAI()

def decompose_task(goal: str) -> list[str]:
    """Ask the model to break a goal into an ordered list of steps."""
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": (
                "You are a planning assistant. Break the user's goal into an ordered "
                "list of concrete, actionable steps. Each step should be independently "
                "executable. Return a JSON object with a 'steps' key containing an "
                "array of strings."
            )},
            {"role": "user", "content": goal},
        ],
        # JSON mode produces a JSON object, so the steps are wrapped in a "steps" key.
        response_format={"type": "json_object"},
    )

    result = json.loads(response.choices[0].message.content)
    return result.get("steps", [])

# Example usage
steps = decompose_task(
    "Analyze our Q1 sales data and write a strategy memo for underperforming regions"
)
# Example output (the exact steps will vary by model and run):
# [
#   "Load and summarize Q1 sales data by region",
#   "Identify the top 3 underperforming regions by revenue growth",
#   "For each underperforming region, analyze key metrics (volume, avg deal size, churn)",
#   "Research competitor pricing and positioning in those regions",
#   "Draft a strategy memo with findings and recommended actions",
#   "Review and finalize the memo"
# ]
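
Model output is not guaranteed to match the requested shape, so it can help to validate the parsed plan before handing it to an executor. The following is a minimal sketch under stated assumptions: `normalize_steps` and its `max_steps` cap are hypothetical names introduced here for illustration, not part of the OpenAI SDK or the code above.

```python
import json

def normalize_steps(raw: str, max_steps: int = 20) -> list[str]:
    """Parse a planner response and return a clean list of step strings.

    Hypothetical helper: accepts either {"steps": [...]} or a bare JSON list,
    drops empty or non-string entries, and caps the plan length.
    """
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return []  # Unparseable output: caller can retry or fall back

    if isinstance(data, dict):
        data = data.get("steps", [])
    if not isinstance(data, list):
        return []

    steps = []
    for item in data:
        if isinstance(item, str) and item.strip():
            steps.append(item.strip())
    return steps[:max_steps]
```

An empty list signals that the plan was unusable, which the caller could treat as a reason to re-prompt the planner rather than start executing.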

The Plan-and-Execute Architecture

Plan-and-Execute separates planning from execution into two distinct agents (or two distinct phases of one agent). The planner creates the step list, and the executor handles each step using a ReAct loop.

from dataclasses import dataclass

@dataclass
class PlanStep:
    description: str
    status: str = "pending"  # pending, in_progress, complete, failed
    result: str = ""

class PlanAndExecuteAgent:
    def __init__(self, tools, tool_executor):
        self.tools = tools
        self.tool_executor = tool_executor

    def run(self, goal: str) -> str:
        # Phase 1: Plan
        steps = self._create_plan(goal)
        print(f"Plan created with {len(steps)} steps")

        # Phase 2: Execute each step
        results = []
        for i, step in enumerate(steps):
            step.status = "in_progress"
            print(f"Executing step {i+1}: {step.description}")

            result = self._execute_step(step.description, results)
            step.result = result
            step.status = "complete"
            results.append({"step": step.description, "result": result})

        # Phase 3: Synthesize final output
        return self._synthesize(goal, results)

    def _create_plan(self, goal: str) -> list[PlanStep]:
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=[
                {"role": "system", "content": "Create a step-by-step plan. "
                 "Return JSON with a 'steps' array of strings."},
                {"role": "user", "content": goal},
            ],
            response_format={"type": "json_object"},
        )
        raw_steps = json.loads(response.choices[0].message.content).get("steps", [])
        return [PlanStep(description=s) for s in raw_steps]

    def _execute_step(self, step_description: str, prior_results: list) -> str:
        """Execute a single step using a mini ReAct loop."""
        context = ""
        if prior_results:
            context = "Previous step results:\n"
            for r in prior_results:
                context += f"- {r['step']}: {r['result'][:200]}\n"

        messages = [
            {"role": "system", "content": f"Execute this step: {step_description}\n{context}"},
            {"role": "user", "content": step_description},
        ]

        for _ in range(10):  # Max iterations per step
            response = client.chat.completions.create(
                model="gpt-4o",
                messages=messages,
                tools=self.tools,
            )
            msg = response.choices[0].message
            messages.append(msg)

            if not msg.tool_calls:
                # content can be None; return a string so later slicing is safe
                return msg.content or ""

            for tc in msg.tool_calls:
                args = json.loads(tc.function.arguments)
                result = self.tool_executor(tc.function.name, args)
                messages.append({
                    "role": "tool",
                    "tool_call_id": tc.id,
                    "content": json.dumps(result),
                })

        return "Step execution timed out."

    def _synthesize(self, goal: str, results: list) -> str:
        context = "\n".join(f"Step: {r['step']}\nResult: {r['result']}" for r in results)
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=[
                {"role": "system", "content": "Synthesize the step results into a final answer."},
                {"role": "user", "content": f"Goal: {goal}\n\nStep results:\n{context}"},
            ],
        )
        return response.choices[0].message.content

Hierarchical Planning

For very complex tasks, a single level of decomposition is not enough. Hierarchical planning breaks goals into subtasks, and then breaks subtasks into sub-subtasks, creating a tree of work.

Goal: "Migrate our monolith to microservices"
├── Phase 1: Assess current architecture
│   ├── Map all database dependencies
│   ├── Identify bounded contexts
│   └── Document API contracts
├── Phase 2: Design target architecture
│   ├── Define service boundaries
│   ├── Design inter-service communication
│   └── Plan data migration strategy
└── Phase 3: Execute migration
    ├── Extract first service
    ├── Set up CI/CD for new service
    └── Migrate traffic gradually

Each leaf node is small enough for a ReAct agent to handle in a single loop. The hierarchy provides structure and progress tracking that flat decomposition lacks.
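
One way to represent such a tree in code is a small recursive data structure. This is a sketch only: `PlanNode`, `leaves`, and `progress` are hypothetical names chosen for illustration, and progress is assumed to be the fraction of completed leaf tasks.

```python
from dataclasses import dataclass, field

@dataclass
class PlanNode:
    description: str
    children: list["PlanNode"] = field(default_factory=list)
    done: bool = False  # Only meaningful for leaf nodes

    def leaves(self) -> list["PlanNode"]:
        """Return the executable leaf tasks in depth-first order."""
        if not self.children:
            return [self]
        result = []
        for child in self.children:
            result.extend(child.leaves())
        return result

    def progress(self) -> float:
        """Fraction of leaf tasks marked done (assumed progress metric)."""
        leaves = self.leaves()
        return sum(leaf.done for leaf in leaves) / len(leaves)

plan = PlanNode("Migrate our monolith to microservices", [
    PlanNode("Phase 1: Assess current architecture", [
        PlanNode("Map all database dependencies"),
        PlanNode("Identify bounded contexts"),
        PlanNode("Document API contracts"),
    ]),
    PlanNode("Phase 2: Design target architecture", [
        PlanNode("Define service boundaries"),
        PlanNode("Design inter-service communication"),
        PlanNode("Plan data migration strategy"),
    ]),
    PlanNode("Phase 3: Execute migration", [
        PlanNode("Extract first service"),
        PlanNode("Set up CI/CD for new service"),
        PlanNode("Migrate traffic gradually"),
    ]),
])
```

An executor could iterate over `plan.leaves()`, hand each description to a ReAct loop, set `done = True` on success, and report `plan.progress()` between steps.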

Dynamic Replanning

Static plans break when reality does not match expectations. A step might fail, return unexpected results, or reveal that the original plan was based on wrong assumptions. Dynamic replanning handles this by reassessing the plan after each step.

def replan_if_needed(original_goal: str, completed_steps: list, remaining_steps: list) -> list[str]:
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": (
                "You are a planning assistant. Given the original goal, completed steps "
                "and their results, and the remaining planned steps, decide if the plan "
                "needs revision. Return JSON with 'needs_replan' (bool) and 'new_steps' (array)."
            )},
            {"role": "user", "content": json.dumps({
                "goal": original_goal,
                "completed": completed_steps,
                "remaining": remaining_steps,
            })},
        ],
        response_format={"type": "json_object"},
    )
    result = json.loads(response.choices[0].message.content)
    if result.get("needs_replan"):
        # Keep the current plan if the model omits 'new_steps'
        return result.get("new_steps", remaining_steps)
    return remaining_steps
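
To show where a function like `replan_if_needed` could sit in an execution loop, here is a minimal sketch. `run_with_replanning`, its `execute` and `replan` parameters, and the `max_total_steps` budget are assumptions introduced for illustration; the callables are injected so the loop can be exercised without an API call.

```python
from typing import Callable

def run_with_replanning(
    goal: str,
    steps: list[str],
    execute: Callable[[str], str],
    replan: Callable[[str, list, list], list[str]],
    max_total_steps: int = 25,
) -> list[dict]:
    """Execute steps in order, letting the planner revise the rest after each one."""
    completed: list[dict] = []
    remaining = list(steps)  # Copy so the caller's list is not mutated

    # The step budget guards against a planner that keeps adding work forever.
    while remaining and len(completed) < max_total_steps:
        step = remaining.pop(0)
        result = execute(step)
        completed.append({"step": step, "result": result})

        # Reassess after every step; the planner may return the list unchanged.
        remaining = replan(goal, completed, remaining)

    return completed
```

Passing `replan_if_needed` as `replan` and a wrapper around the executor's step function as `execute` would connect this to the code above. Replanning after every step costs one extra LLM call per step, so a cheaper variant might replan only when a step fails or returns something unexpected.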

FAQ

When should I use planning versus a simple ReAct loop?

Use a simple ReAct loop for tasks of roughly five steps or fewer where the path is straightforward (lookup, calculate, respond). Use planning when the task needs more steps than that, when steps depend on each other, or when the user's goal is abstract and needs decomposition before execution can begin.

How do I handle a step that fails during plan execution?

Three strategies in order of preference: retry the step with a different approach, skip the step and adjust downstream steps, or abort and return partial results. Which strategy to use depends on whether the failed step is critical to the overall goal. Always inform the user when a plan cannot be completed as originally designed.
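
The retry, skip, and abort order can be sketched as a small wrapper around step execution. This is an illustration under assumptions: `execute_with_fallback`, the `critical` flag, and the status strings are hypothetical, and a failure is assumed to surface as a raised exception.

```python
from typing import Callable

def execute_with_fallback(
    step: str,
    execute: Callable[[str], str],
    critical: bool,
    max_attempts: int = 2,
) -> dict:
    """Try a step, retry with failure context, then skip or abort."""
    last_error = ""
    for attempt in range(1, max_attempts + 1):
        # On a retry, pass the previous error so the executor can change approach.
        if attempt == 1:
            prompt = step
        else:
            prompt = (
                f"{step}\n(Previous attempt failed: {last_error}. "
                "Try a different approach.)"
            )
        try:
            return {"step": step, "status": "complete", "result": execute(prompt)}
        except Exception as exc:
            last_error = str(exc)

    # Retries exhausted: abort if the step is critical, otherwise skip it.
    status = "aborted" if critical else "skipped"
    return {"step": step, "status": status, "result": last_error}
```

A caller would stop the plan and return partial results on `"aborted"`, and on `"skipped"` continue with the downstream steps adjusted, reporting the gap to the user in either case.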

Does planning increase token usage significantly?

Yes, planning adds an extra LLM call for decomposition and potentially more calls for replanning. However, it can reduce total token usage on complex tasks because each step's ReAct loop is smaller and more focused, which means fewer wasted iterations from an agent losing track of the overall goal.
