The Map-Reduce Pattern for AI Agents: Parallel Processing of Large Datasets

When a Single Agent Is Not Enough

Suppose you need an AI agent to analyze 500 customer reviews, summarize a 200-page document, or evaluate code across 50 repositories. A single sequential agent would take too long and might hit context window limits. The Map-Reduce pattern solves this by splitting work into chunks, processing each chunk in parallel with separate agent instances, and then aggregating the partial results into a final output.

This pattern, borrowed from distributed computing, is one of the most practical ways to scale AI agent workloads.

The Three Phases

Split — Divide the input data into manageable chunks
Map — Process each chunk independently (in parallel)
Reduce — Combine all partial results into a single output

Implementation

import asyncio
from dataclasses import dataclass
from typing import Any, Callable, Awaitable

@dataclass
class MapResult:
    chunk_index: int
    output: Any
    success: bool
    error: str | None = None

class AgentMapReduce:
    def __init__(
        self,
        splitter: Callable[[Any], list[Any]],
        mapper: Callable[[Any, int], Awaitable[Any]],
        reducer: Callable[[list[MapResult]], Any],
        max_concurrency: int = 10,
    ):
        self.splitter = splitter
        self.mapper = mapper
        self.reducer = reducer
        self.semaphore = asyncio.Semaphore(max_concurrency)

    async def _process_chunk(self, chunk: Any,
                              index: int) -> MapResult:
        async with self.semaphore:
            try:
                output = await self.mapper(chunk, index)
                return MapResult(
                    chunk_index=index,
                    output=output,
                    success=True,
                )
            except Exception as e:
                return MapResult(
                    chunk_index=index,
                    output=None,
                    success=False,
                    error=str(e),
                )

    async def run(self, data: Any) -> Any:
        # Split phase
        chunks = self.splitter(data)
        print(f"Split into {len(chunks)} chunks")

        # Map phase — parallel execution
        tasks = [
            self._process_chunk(chunk, i)
            for i, chunk in enumerate(chunks)
        ]
        results = await asyncio.gather(*tasks)

        # Report failures
        failed = [r for r in results if not r.success]
        if failed:
            print(f"Warning: {len(failed)} chunks failed")

        # Reduce phase
        successful = sorted(
            [r for r in results if r.success],
            key=lambda r: r.chunk_index,
        )
        return self.reducer(successful)

Applying It to Review Analysis

Here is how you would analyze hundreds of customer reviews in parallel:

Hear it before you finish reading

Talk to a live CallSphere AI voice agent in your browser — 60 seconds, no signup.

Try Live →

Try Live Demo →

flowchart LR
    INPUT(["User intent"])
    PARSE["Parse plus<br/>classify"]
    PLAN["Plan and tool<br/>selection"]
    AGENT["Agent loop<br/>LLM plus tools"]
    GUARD{"Guardrails<br/>and policy"}
    EXEC["Execute and<br/>verify result"]
    OBS[("Trace and metrics")]
    OUT(["Outcome plus<br/>next action"])
    INPUT --> PARSE --> PLAN --> AGENT --> GUARD
    GUARD -->|Pass| EXEC --> OUT
    GUARD -->|Fail| AGENT
    AGENT --> OBS
    style AGENT fill:#4f46e5,stroke:#4338ca,color:#fff
    style GUARD fill:#f59e0b,stroke:#d97706,color:#1f2937
    style OBS fill:#ede9fe,stroke:#7c3aed,color:#1e1b4b
    style OUT fill:#059669,stroke:#047857,color:#fff

import openai

client = openai.AsyncOpenAI()

def split_reviews(reviews: list[str]) -> list[list[str]]:
    chunk_size = 20
    return [reviews[i:i + chunk_size]
            for i in range(0, len(reviews), chunk_size)]

async def analyze_chunk(chunk: list[str], index: int) -> dict:
    combined = "\n---\n".join(chunk)
    response = await client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system",
             "content": (
                 "Analyze these reviews. Return JSON with: "
                 "positive_count, negative_count, neutral_count, "
                 "top_themes (list of strings), average_sentiment (0-1)."
             )},
            {"role": "user", "content": combined},
        ],
        response_format={"type": "json_object"},
    )
    import json
    return json.loads(response.choices[0].message.content)

def aggregate_results(results: list[MapResult]) -> dict:
    total_positive = sum(r.output["positive_count"] for r in results)
    total_negative = sum(r.output["negative_count"] for r in results)
    total_neutral = sum(r.output["neutral_count"] for r in results)
    all_themes = []
    for r in results:
        all_themes.extend(r.output["top_themes"])

    # Deduplicate themes by frequency
    from collections import Counter
    theme_counts = Counter(all_themes)
    top_themes = [t for t, _ in theme_counts.most_common(10)]

    avg_sentiment = (
        sum(r.output["average_sentiment"] for r in results) / len(results)
    )
    return {
        "total_reviews": total_positive + total_negative + total_neutral,
        "positive": total_positive,
        "negative": total_negative,
        "neutral": total_neutral,
        "top_themes": top_themes,
        "average_sentiment": round(avg_sentiment, 3),
    }

mr = AgentMapReduce(
    splitter=split_reviews,
    mapper=analyze_chunk,
    reducer=aggregate_results,
    max_concurrency=5,
)

# reviews = [... 500 review strings ...]
# result = asyncio.run(mr.run(reviews))

Controlling Concurrency

The max_concurrency parameter controls how many map workers run simultaneously via an asyncio semaphore. This is essential for respecting API rate limits. If your LLM provider allows 10 requests per second, set concurrency to 8-9 to stay safely below the limit.

FAQ

How should I choose the chunk size for splitting?

Balance between context window limits and meaningful work units. Each chunk should contain enough data for the agent to produce a useful partial result, but not so much that it exceeds the model's context window. For GPT-4o with 128K tokens, 20-50 reviews per chunk works well.

Still reading? Stop comparing — try CallSphere live.

CallSphere ships complete AI voice agents per industry — 14 tools for healthcare, 10 agents for real estate, 4 specialists for salons. See how it actually handles a call before you book a demo.

Try Live Demo → Book 30-min Walkthrough See Pricing

What if some chunks fail during the map phase?

The implementation above captures failures per chunk without aborting the entire run. After the map phase completes, retry only the failed chunks. If failures persist, log them and proceed with partial results — in many analytical workloads, losing 2-3% of data is acceptable.

Can I chain Map-Reduce with the Pipeline pattern?

Absolutely. A pipeline stage can internally use Map-Reduce for the heavy parallel work. For example, stage 1 fetches data, stage 2 runs Map-Reduce analysis across it, and stage 3 formats the aggregated results into a report.

#AgentDesignPatterns #MapReduce #Python #ParallelProcessing #AgenticAI #LearnAI #AIEngineering

The Map-Reduce Pattern for AI Agents: Parallel Processing of Large Datasets

When a Single Agent Is Not Enough

The Three Phases

Implementation

Applying It to Review Analysis

Controlling Concurrency

FAQ

How should I choose the chunk size for splitting?

What if some chunks fail during the map phase?

Can I chain Map-Reduce with the Pipeline pattern?

Try CallSphere AI Voice Agents

Related Articles You May Like

Building Your First Agent with the OpenAI Agents SDK in 2026: A Hands-On Walkthrough

Anthropic Skills System: Loadable Tool Packs for Claude Agents

Designing Agent Loops with the Claude Agent SDK

Enterprise CIO Guide: Hippocratic AI — Healthcare Agents at Scale

Multilingual Chat Agents in 2026: The 57-Language Gap and How to Close It

Enterprise CIO Guide: Harvey AI — Legal Agents Move from Pilot to Practice