
LangChain RAG Chains: Document Loaders, Text Splitters, and Retrieval QA

Build end-to-end Retrieval Augmented Generation pipelines with LangChain — covering document loaders, text splitting strategies, vector stores, retrievers, and RAG chain composition.

What Is RAG and Why LangChain for It

Retrieval Augmented Generation (RAG) combines a retrieval step with LLM generation. Instead of relying solely on the model's training data, RAG fetches relevant documents from your own data source and includes them as context in the prompt. This lets the model answer questions about your specific documents, databases, or knowledge bases.

LangChain provides the full RAG pipeline as composable components: document loaders to ingest data, text splitters to chunk it, embedding models and vector stores to index it, retrievers to search it, and chain composition to wire it all together.

Step 1: Loading Documents

LangChain ships with loaders for dozens of formats — PDF, HTML, CSV, Markdown, databases, APIs, and more.

The diagram below sketches the query-time flow we are building toward; the steps that follow cover the ingestion side that makes it possible.

flowchart LR
    Q(["User query"])
    EMB["Embed query<br/>text-embedding-3"]
    VEC[("Vector DB<br/>pgvector or Pinecone")]
    RET["Top-k retrieval<br/>k = 8"]
    PROMPT["Augmented prompt<br/>system plus context"]
    LLM["LLM generation<br/>Claude or GPT"]
    CITE["Inline citations<br/>and page anchors"]
    OUT(["Grounded answer"])
    Q --> EMB --> VEC --> RET --> PROMPT --> LLM --> CITE --> OUT
    style EMB fill:#ede9fe,stroke:#7c3aed,color:#1e1b4b
    style VEC fill:#ede9fe,stroke:#7c3aed,color:#1e1b4b
    style LLM fill:#4f46e5,stroke:#4338ca,color:#fff
    style OUT fill:#059669,stroke:#047857,color:#fff

from langchain_community.document_loaders import (
    TextLoader,
    PyPDFLoader,
    CSVLoader,
    WebBaseLoader,
)

# Load a text file
text_docs = TextLoader("knowledge_base.txt").load()

# Load a PDF (one document per page)
pdf_docs = PyPDFLoader("annual_report.pdf").load()

# Load from a web page
web_docs = WebBaseLoader("https://docs.example.com/guide").load()

# Each returns a list of Document objects
print(text_docs[0].page_content[:200])
print(text_docs[0].metadata)  # {"source": "knowledge_base.txt"}

Every loader returns Document objects with page_content (the text) and metadata (source, page number, etc.). Metadata flows through the entire pipeline, so your final answers can cite sources.
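
Because metadata travels with every chunk, it pays to enrich it at load time. A minimal sketch, assuming a hypothetical department tag you want attached to the annual report:

for doc in pdf_docs:
    # Custom keys survive splitting, indexing, and retrieval unchanged
    doc.metadata["department"] = "finance"

print(pdf_docs[0].metadata)
# {"source": "annual_report.pdf", "page": 0, "department": "finance"}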

Step 2: Splitting Text into Chunks

Documents are often too large to fit in a single prompt. Text splitters break them into manageable chunks while preserving semantic coherence.

from langchain_text_splitters import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,       # Max characters per chunk
    chunk_overlap=200,     # Overlap between consecutive chunks
    separators=["\n\n", "\n", ". ", " ", ""],
)

chunks = splitter.split_documents(pdf_docs)
print(f"Split {len(pdf_docs)} pages into {len(chunks)} chunks")

RecursiveCharacterTextSplitter is the recommended default. It tries to split on paragraph boundaries first, then sentences, then words, ensuring chunks are as semantically coherent as possible. The overlap ensures that information spanning a boundary appears in at least one chunk.

For code, use RecursiveCharacterTextSplitter.from_language():

from langchain_text_splitters import RecursiveCharacterTextSplitter, Language

python_splitter = RecursiveCharacterTextSplitter.from_language(
    language=Language.PYTHON,
    chunk_size=1000,
    chunk_overlap=100,
)
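
The language-aware splitter prefers class and function boundaries before falling back to lines and characters. A quick usage sketch, assuming python_docs holds Document objects loaded from .py files:

code_chunks = python_splitter.split_documents(python_docs)
print(f"Split {len(python_docs)} files into {len(code_chunks)} chunks")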

Step 3: Embedding and Indexing

Chunks are converted to vectors using an embedding model and stored in a vector store for similarity search.

from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import FAISS

embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

# Create vector store from chunks
vectorstore = FAISS.from_documents(chunks, embeddings)

# Or use Chroma for persistence
from langchain_chroma import Chroma

vectorstore = Chroma.from_documents(
    chunks,
    embeddings,
    persist_directory="./chroma_db",
)

FAISS is fast and in-memory. Chroma persists to disk. For production, consider Pinecone, Weaviate, or pgvector for PostgreSQL.
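
Both stores can be reloaded later without re-embedding anything. A minimal sketch, assuming the index locations from above (recent langchain_community versions require opting into pickle deserialization for FAISS):

# FAISS: save the index to disk and load it back
vectorstore.save_local("faiss_index")
vectorstore = FAISS.load_local(
    "faiss_index", embeddings, allow_dangerous_deserialization=True
)

# Chroma: reopen the persisted collection by pointing at the same directory
vectorstore = Chroma(
    persist_directory="./chroma_db",
    embedding_function=embeddings,
)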

Step 4: Building the Retriever

A retriever wraps the vector store and returns the most relevant documents for a query.

retriever = vectorstore.as_retriever(
    search_type="similarity",
    search_kwargs={"k": 4},  # Return top 4 chunks
)

# Test the retriever
docs = retriever.invoke("What were Q3 revenue numbers?")
for doc in docs:
    print(doc.page_content[:100])
    print(doc.metadata)
    print("---")

You can also use search_type="mmr" (Maximal Marginal Relevance) to get diverse results rather than just the closest matches.
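
A minimal MMR sketch — fetch_k is how many candidates are pulled before the diversity re-ranking, and lambda_mult trades relevance (1.0) against diversity (0.0):

mmr_retriever = vectorstore.as_retriever(
    search_type="mmr",
    search_kwargs={"k": 4, "fetch_k": 20, "lambda_mult": 0.5},
)

docs = mmr_retriever.invoke("What were Q3 revenue numbers?")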


Step 5: Composing the RAG Chain

Now connect everything into an LCEL chain that retrieves context and generates answers.

from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough

# Format retrieved documents into a single string
def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

prompt = ChatPromptTemplate.from_template(
    """Answer the question based on the following context.
If the context does not contain enough information, say so.

Context:
{context}

Question: {question}

Answer:"""
)

rag_chain = (
    {
        "context": retriever | format_docs,
        "question": RunnablePassthrough(),
    }
    | prompt
    | ChatOpenAI(model="gpt-4o", temperature=0)
    | StrOutputParser()
)

answer = rag_chain.invoke("What were the key highlights from Q3?")
print(answer)

The dictionary step runs the retriever and passthrough in parallel. Retrieved documents are formatted into a string, while the original question is forwarded. Both feed into the prompt template.
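
The dict literal is LCEL shorthand: it gets coerced into a RunnableParallel, so the chain above is equivalent to this more explicit form:

from langchain_core.runnables import RunnableParallel

setup = RunnableParallel(
    context=retriever | format_docs,  # query -> retrieved docs -> formatted string
    question=RunnablePassthrough(),   # the raw query, forwarded unchanged
)

rag_chain = setup | prompt | ChatOpenAI(model="gpt-4o", temperature=0) | StrOutputParser()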

Adding Source Citations

To return sources alongside the answer, modify the chain to return both.

from langchain_core.runnables import RunnableParallel

rag_with_sources = RunnableParallel(
    answer=rag_chain,
    sources=retriever | (lambda docs: [d.metadata["source"] for d in docs]),
)

result = rag_with_sources.invoke("What were Q3 revenue numbers?")
print(result["answer"])
print("Sources:", result["sources"])

FAQ

How do I choose the right chunk size?

Start with 1000 characters and 200 overlap. Smaller chunks (500 characters) improve retrieval precision but may lose context. Larger chunks (2000 characters) retain more context but may dilute relevance. Test with your actual queries and documents, measuring retrieval quality.
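
A rough way to compare sizes, reusing pdf_docs and embeddings from earlier — the query here is just a placeholder for one of your real ones:

for size in (500, 1000, 2000):
    trial_splitter = RecursiveCharacterTextSplitter(chunk_size=size, chunk_overlap=size // 5)
    trial_store = FAISS.from_documents(trial_splitter.split_documents(pdf_docs), embeddings)
    hits = trial_store.similarity_search("What were Q3 revenue numbers?", k=4)
    print(size, [hit.metadata.get("page") for hit in hits])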

Can I use RAG with local models instead of OpenAI?

Yes. Replace ChatOpenAI with any LangChain model wrapper — ChatOllama for local Ollama models, for example. For embeddings, use HuggingFaceEmbeddings or OllamaEmbeddings to keep everything local.
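
A minimal local sketch, assuming Ollama is running and the langchain-ollama package is installed (the model names are examples — pull whatever you have locally):

from langchain_ollama import ChatOllama, OllamaEmbeddings

local_embeddings = OllamaEmbeddings(model="nomic-embed-text")
local_store = FAISS.from_documents(chunks, local_embeddings)
local_retriever = local_store.as_retriever(search_kwargs={"k": 4})

local_llm = ChatOllama(model="llama3.2", temperature=0)
# Swap these into the rag_chain composition in place of ChatOpenAI and OpenAIEmbeddings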

How do I update the vector store when my documents change?

Most vector stores support add_documents() to add new content. For updates, delete the old documents by ID and add the new versions. Chroma and Pinecone support upsert operations. For bulk reindexing, rebuild the vector store from scratch.
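
A minimal update sketch with Chroma, assuming you assign stable IDs at ingest time so a changed file can replace its old chunks (the ID scheme here is just an example):

# Deterministic IDs make later updates targetable
ids = [f"annual_report-{i}" for i in range(len(chunks))]
vectorstore.add_documents(chunks, ids=ids)

# When the source changes: delete the stale chunks, then re-add the new versions
vectorstore.delete(ids=ids)
new_chunks = splitter.split_documents(PyPDFLoader("annual_report.pdf").load())
vectorstore.add_documents(
    new_chunks,
    ids=[f"annual_report-{i}" for i in range(len(new_chunks))],
)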


#LangChain #RAG #VectorStore #DocumentLoading #Python #AgenticAI #LearnAI #AIEngineering
