Learn Agentic AI

LlamaIndex Agents: RAG-First Agent Architecture for Knowledge-Intensive Tasks

Discover how LlamaIndex agents combine retrieval-augmented generation with agentic reasoning, using query engines as tools and data agents to build knowledge-intensive AI applications.

Why RAG and Agents Belong Together

Retrieval-augmented generation (RAG) gives LLMs access to external knowledge. Agents give LLMs the ability to take actions and reason over multiple steps. Separately, each has limitations: RAG pipelines cannot reason about which documents to retrieve or how to combine information from multiple sources, and agents without retrieval hallucinate when asked knowledge-intensive questions.

LlamaIndex was built from the ground up for RAG. Its agent layer extends this foundation by letting agents treat query engines, indexes, and retrieval pipelines as tools. The result is agents that are genuinely good at knowledge-intensive tasks — not just tool-calling agents with a vector store bolted on.

Query Engines as Agent Tools

The core pattern in LlamaIndex agents is wrapping query engines as tools. A query engine encapsulates an index, a retriever, and a response synthesizer. When you give this to an agent as a tool, the agent can decide when and how to query your data:

from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.core.tools import QueryEngineTool, ToolMetadata
from llama_index.core.agent import ReActAgent
from llama_index.llms.openai import OpenAI

# Load and index documents
documents = SimpleDirectoryReader("./data/financial_reports").load_data()
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine(similarity_top_k=5)

# Wrap the query engine as an agent tool
finance_tool = QueryEngineTool(
    query_engine=query_engine,
    metadata=ToolMetadata(
        name="financial_reports",
        description="""Useful for answering questions about quarterly
        financial reports, revenue figures, and earnings data.
        Input should be a specific financial question.""",
    ),
)

# Create an agent with the query engine tool
llm = OpenAI(model="gpt-4o")
agent = ReActAgent.from_tools(
    tools=[finance_tool],
    llm=llm,
    verbose=True,
)

response = agent.chat("What was the Q3 2025 revenue growth rate?")

The agent receives a question, decides whether to use the financial reports tool, formulates a retrieval query, gets the relevant chunks, and synthesizes an answer. The ReAct pattern lets it do this iteratively — if the first retrieval does not answer the question, the agent can reformulate and try again.

Multi-Index Agents

Real applications often have multiple data sources. LlamaIndex agents handle this by accepting multiple query engine tools:

# Second index for a different data source
policy_docs = SimpleDirectoryReader("./data/company_policies").load_data()
policy_index = VectorStoreIndex.from_documents(policy_docs)
policy_engine = policy_index.as_query_engine(similarity_top_k=3)

policy_tool = QueryEngineTool(
    query_engine=policy_engine,
    metadata=ToolMetadata(
        name="company_policies",
        description="""Useful for questions about company policies,
        HR guidelines, compliance requirements, and internal procedures.""",
    ),
)

# Agent with multiple knowledge sources
agent = ReActAgent.from_tools(
    tools=[finance_tool, policy_tool],
    llm=llm,
    verbose=True,
)

# The agent decides which tool to query
response = agent.chat(
    "Does our remote work policy affect the Q3 headcount projections?"
)

For this question, the agent might first query the company policies tool to understand the remote work policy, then query the financial reports tool for headcount projections, and finally synthesize both results into a coherent answer. This multi-step reasoning across data sources is where LlamaIndex agents excel.

Sub-Question Query Engine

For complex questions that span multiple sources, LlamaIndex offers a specialized pattern — the sub-question query engine. Instead of an agent loop, it decomposes a question into sub-questions and routes each to the appropriate query engine:

from llama_index.core.tools import QueryEngineTool
from llama_index.core.query_engine import SubQuestionQueryEngine

sub_question_engine = SubQuestionQueryEngine.from_defaults(
    query_engine_tools=[finance_tool, policy_tool],
    llm=llm,
)

response = sub_question_engine.query(
    "Compare our Q3 hiring costs against the approved budget in the HR policy."
)

This approach is more deterministic than the agent loop and works well when you know the question will require information from multiple sources.

Data Agents for Structured Data

LlamaIndex also supports agents that work with structured data through SQL and pandas integrations:

from llama_index.core import SQLDatabase
from llama_index.core.query_engine import NLSQLTableQueryEngine
from sqlalchemy import create_engine

engine = create_engine("postgresql://user:pass@localhost/analytics")
sql_database = SQLDatabase(engine, include_tables=["revenue", "expenses"])

sql_tool = QueryEngineTool(
    query_engine=NLSQLTableQueryEngine(
        sql_database=sql_database, llm=llm
    ),
    metadata=ToolMetadata(
        name="analytics_db",
        description="Query the analytics database for revenue and expense data.",
    ),
)

This lets agents write and execute SQL queries against your database, combining structured data retrieval with unstructured document retrieval in a single agent.


When to Choose LlamaIndex Agents

Choose LlamaIndex when your primary use case is knowledge-intensive — answering questions over documents, databases, or a combination. If your agents spend most of their time retrieving and synthesizing information rather than taking external actions, LlamaIndex's RAG-first design gives you better retrieval quality with less work.

For agents focused on external API calls, multi-agent orchestration, or code execution, other frameworks may be a better fit.

FAQ

How does LlamaIndex handle large document collections at scale?

LlamaIndex integrates with production vector stores like Pinecone, Weaviate, Qdrant, and ChromaDB. For large collections, you build the index once, persist it to the vector store, and load it on startup. The agent queries the vector store directly, so retrieval scales with the vector database.

Can LlamaIndex agents use non-RAG tools?

Yes. You can add any callable function as a FunctionTool alongside query engine tools. Agents can mix RAG tools with API calls, calculations, or any custom logic.

What is the difference between ReActAgent and the OpenAI agent in LlamaIndex?

ReActAgent uses the ReAct prompting pattern (Reason + Act) and works with any LLM. The OpenAI agent uses OpenAI's native function-calling API, which is more reliable for tool selection but only works with OpenAI models.


#LlamaIndex #RAG #AgentFrameworks #KnowledgeAgents #Python #AgenticAI #LearnAI #AIEngineering
