Agentic RAG Patterns: When the Agent Decides What to Retrieve
In agentic RAG, the agent itself controls retrieval. Here are the 2026 patterns and where they outperform classic retrieve-then-generate.
What Changes With Agentic RAG
Classic RAG: a fixed pipeline runs retrieval, then generation. The model has no say in whether to retrieve, what to retrieve, or whether to retrieve again. Agentic RAG: retrieval is one of several tools the agent can call. The agent decides — based on the query and intermediate results — whether to retrieve, what query to use, which corpus to hit, and when to stop.
By 2026 this is the dominant pattern for non-trivial production RAG systems. This piece walks through the patterns that work.
The Five Patterns
```mermaid
flowchart TB
    P1[1. Retrieve-or-skip] --> Use[Skip retrieval if<br/>model already knows]
    P2[2. Multi-source routing] --> Pick[Pick the right corpus]
    P3[3. Query rewriting] --> Better[Rewrite for better recall]
    P4[4. Iterative retrieval] --> Refine[Refine + retrieve again]
    P5[5. Tool-augmented retrieval] --> Mix[Mix vector + SQL + web]
```
Retrieve-or-Skip
The simplest and most undervalued pattern. The agent decides whether retrieval is even necessary. Many queries do not need retrieval at all: small talk, math, code generation, formatting requests. Skipping retrieval saves tokens and avoids irrelevant context polluting the prompt.
Implementation: a short prompt asks the model to classify the query as "needs retrieval" / "answer directly." Any non-trivial RAG system in 2026 has this gate.
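A minimal sketch of that gate, assuming the OpenAI Python client; the model name and prompt wording are illustrative, not prescriptive:

```python
from openai import OpenAI

client = OpenAI()

GATE_PROMPT = (
    "Classify the user query. Reply with exactly one word:\n"
    "RETRIEVE if answering requires looking up documents,\n"
    "DIRECT if it can be answered without retrieval\n"
    "(small talk, math, code generation, formatting).\n\n"
    "Query: {query}"
)

def needs_retrieval(query: str) -> bool:
    """Cheap, single-purpose classification call that gates retrieval."""
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # any small, fast model works for the gate
        messages=[{"role": "user", "content": GATE_PROMPT.format(query=query)}],
        max_tokens=5,
        temperature=0,
    )
    return resp.choices[0].message.content.strip().upper().startswith("RETRIEVE")
```

The gate runs on every query, so it should be the cheapest call in the system.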
Multi-Source Routing
The agent picks which corpus to hit. Production knowledge bases are rarely one homogeneous index: a customer FAQ, a product manual, a billing database, a CRM, and a web search all live side by side. The agent classifies the query and routes it.
```mermaid
flowchart LR
    Q[Query] --> R[Router Agent]
    R -->|product question| P[(Product KB)]
    R -->|customer question| C[(CRM)]
    R -->|policy question| Pol[(Policy Docs)]
    R -->|external| W[Web Search]
```
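Continuing the gate sketch above, a minimal router; the corpus labels mirror the diagram, and the retriever stubs are placeholders for your own indexes:

```python
ROUTER_PROMPT = (
    "Route the query to exactly one source. Reply with one word:\n"
    "PRODUCT (product questions), CRM (customer questions),\n"
    "POLICY (policy questions), or WEB (anything external).\n\n"
    "Query: {query}"
)

# Placeholder retrievers -- swap these stubs for real vector/DB/search clients.
def search_product_kb(q: str) -> list[str]: return []
def search_crm(q: str) -> list[str]: return []
def search_policy_docs(q: str) -> list[str]: return []
def web_search(q: str) -> list[str]: return []

RETRIEVERS = {"PRODUCT": search_product_kb, "CRM": search_crm,
              "POLICY": search_policy_docs, "WEB": web_search}

def route_and_retrieve(query: str) -> list[str]:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": ROUTER_PROMPT.format(query=query)}],
        max_tokens=4,
        temperature=0,
    )
    label = resp.choices[0].message.content.strip().upper()
    # Fall back to web search if the router emits an unknown label.
    return RETRIEVERS.get(label, web_search)(query)
```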
Query Rewriting
The user query is often not the optimal retrieval query. Rewriting expands abbreviations, fixes typos, decomposes multi-part questions, or generates a hypothetical answer to embed in place of the query (HyDE).
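Both variants are one small call each. A sketch with illustrative prompts; the HyDE passage gets embedded in place of the raw query:

```python
def rewrite_query(query: str) -> str:
    """Expand abbreviations, fix typos, make the query self-contained."""
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content":
            "Rewrite this as a clear, self-contained search query. "
            "Expand abbreviations and fix typos. Reply with the query only.\n\n"
            + query}],
        temperature=0,
    )
    return resp.choices[0].message.content.strip()

def hyde_passage(query: str) -> str:
    """HyDE: generate a hypothetical answer and embed *it* for retrieval."""
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content":
            "Write a short, plausible passage that answers:\n" + query}],
    )
    return resp.choices[0].message.content  # embed this, not the user query
```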
Iterative Retrieval
The agent retrieves once, examines the results, and decides whether to retrieve again with a different query. This is where CRAG-style (Corrective RAG) refinement lives, and it is especially valuable on multi-hop questions.
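A sketch of the loop, reusing the router above; note the hard cap on rounds, which the failure modes section below comes back to:

```python
MAX_ROUNDS = 3  # hard cap -- prevents the infinite refinement loop

def iterative_retrieve(query: str) -> list[str]:
    """Retrieve, inspect, and optionally retrieve again with a refined query."""
    docs: list[str] = []
    q = query
    for _ in range(MAX_ROUNDS):
        docs += route_and_retrieve(q)
        verdict = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content":
                "Question: " + query
                + "\n\nRetrieved so far:\n" + "\n".join(docs)
                + "\n\nIf this is enough to answer, reply DONE. "
                  "Otherwise reply with a better follow-up search query only."}],
            temperature=0,
        ).choices[0].message.content.strip()
        if verdict.upper() == "DONE":
            break
        q = verdict  # retrieve again with the refined query
    return docs
```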
Tool-Augmented Retrieval
Pure-vector RAG is one tool. Real systems mix vector search, SQL queries, knowledge-graph queries, and web search. The agent picks the right combination per query.
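One way to wire this up is native tool calling, where each retriever is a tool and the model picks the combination per query. A sketch with two of the four sources; the tool names and schemas are illustrative:

```python
TOOLS = [
    {"type": "function", "function": {
        "name": "vector_search",
        "description": "Semantic search over the document index.",
        "parameters": {"type": "object",
                       "properties": {"query": {"type": "string"}},
                       "required": ["query"]}}},
    {"type": "function", "function": {
        "name": "sql_query",
        "description": "Run a read-only SQL query against the billing database.",
        "parameters": {"type": "object",
                       "properties": {"sql": {"type": "string"}},
                       "required": ["sql"]}}},
]

resp = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user",
               "content": "How much did acme-corp spend last quarter?"}],
    tools=TOOLS,
)
# resp.choices[0].message.tool_calls holds whichever retrieval calls the agent chose.
```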
A Reference Architecture
```mermaid
flowchart LR
    User --> Agent[Agentic RAG Loop]
    Agent --> D{Decision}
    D -->|skip| Direct[Answer directly]
    D -->|retrieve| Tools[Tool-call retrieval]
    Tools --> Vec[Vector search]
    Tools --> SQL[SQL]
    Tools --> KG[Knowledge graph]
    Tools --> Web[Web search]
    Vec --> Eval[Evaluator]
    SQL --> Eval
    KG --> Eval
    Web --> Eval
    Eval -->|sufficient| Gen[Generate]
    Eval -->|insufficient| Agent
    Gen --> User
```
The decisions and the loop are what make it agentic. Each step can be a separate small LLM call or fused into the main reasoning model.
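The evaluator in the diagram is a good candidate for a separate small call. A sketch of a sufficiency check, which also catches the corpus mismatches listed under failure modes:

```python
def evaluate_retrieval(question: str, docs: list[str]) -> bool:
    """Score the retrieved context: is it sufficient to answer the question?"""
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content":
            "Question: " + question
            + "\n\nContext:\n" + "\n".join(docs)
            + "\n\nDoes the context contain enough information to answer? "
              "Reply SUFFICIENT or INSUFFICIENT only."}],
        max_tokens=4,
        temperature=0,
    )
    return resp.choices[0].message.content.strip().upper().startswith("SUFF")
```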
Where Agentic RAG Outperforms Classic RAG
Representative numbers from 2026 production deployments:
- Customer service: 8-15 percent higher resolution rate (the agent knows which subject-matter corpus to hit)
- Internal Q&A: 10-25 percent fewer "I do not know" answers
- Long-form research: 2x quality on multi-hop questions
- Cost: roughly 1.5x classic RAG due to extra LLM calls (retrieve-or-skip pays much of that back)
Common Failure Modes
- Over-retrieval: agent retrieves on every query out of caution; cost balloons. Fix: stricter retrieve-or-skip gate.
- Under-retrieval: agent skips retrieval and confidently hallucinates. Fix: lean toward retrieval on factual questions; calibrate the gate.
- Infinite loops: the agent keeps refining and never commits. Fix: a hard cap on retrieval rounds.
- Tool selection error: agent hits the wrong corpus. Fix: better router prompts, or a retrieval evaluator that catches mismatches.
Implementing It in 2026
LangGraph, LlamaIndex, and the OpenAI Agents SDK all ship recipes for agentic RAG. The minimal version:
- A "should I retrieve" classifier
- A "which corpus" router
- One or more retrievers as tools
- An evaluator that scores retrieval quality
- A cap on rounds (typically 3)
Most teams ship this in a week. The hard part is not the orchestration; it is the corpora and their evaluators.
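Wiring the earlier sketches together gives that minimal version; the generation prompt and the fallback answer are illustrative:

```python
def answer(query: str) -> str:
    """Minimal agentic RAG loop: gate -> route -> iterate -> evaluate -> generate."""
    context: list[str] = []
    if needs_retrieval(query):                # the "should I retrieve" classifier
        context = iterative_retrieve(query)   # router + retrievers, capped at 3 rounds
        if not evaluate_retrieval(query, context):
            return "I could not find enough information to answer that."
    prompt = query if not context else (
        "Answer the question using only the context below.\n\nContext:\n"
        + "\n".join(context) + "\n\nQuestion: " + query
    )
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content
```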
Sources
- "Active RAG" Asai et al. — https://arxiv.org/abs/2403.10131
- "Agentic RAG with LangGraph" — https://langchain-ai.github.io/langgraph
- "RAG patterns in 2026" survey — https://arxiv.org
- LlamaIndex agent recipes — https://docs.llamaindex.ai
- "HyDE: hypothetical document embeddings" — https://arxiv.org/abs/2212.10496