
LangChain Output Parsers: Pydantic, JSON, and Structured Output Parsing

Learn how to extract structured data from LLM responses using LangChain output parsers — Pydantic models, JSON parsing, format instructions, and retry parsers for robust extraction.

Why Structured Output Matters

LLMs produce free-form text by default. But downstream code needs structured data — objects, lists, dictionaries, typed fields. Output parsers bridge this gap by defining an expected schema, generating format instructions for the prompt, and parsing the LLM's response into the target structure.

Without structured parsing, you end up writing fragile regex or string-splitting logic that breaks when the model changes phrasing. LangChain's parsers standardize this process and include retry mechanisms for when the model produces malformed output.
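To make that fragility concrete, here is the kind of hand-rolled extraction an output parser replaces (a toy sketch, not production code):

```python
import json
import re

# Hand-rolled extraction: exactly the fragile approach parsers replace.
# It works until the model changes its phrasing, fencing, or layout.
def naive_extract(reply: str) -> dict:
    match = re.search(r"\{.*\}", reply, re.DOTALL)
    if match is None:
        raise ValueError("no JSON object found in reply")
    return json.loads(match.group(0))

reply = 'Sure! Here is the review:\n{"title": "Inception", "rating": 8.5}'
print(naive_extract(reply))  # {'title': 'Inception', 'rating': 8.5}
```

The regex silently breaks if the model emits two objects, nests braces inside a string, or wraps the JSON in a code fence, which is why a schema-driven parser is worth the setup cost.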

The with_structured_output Approach

Modern LangChain models support with_structured_output(), which uses the model's native structured output capability (function calling or JSON mode) rather than text parsing.

from langchain_openai import ChatOpenAI
from pydantic import BaseModel, Field

class MovieReview(BaseModel):
    title: str = Field(description="The movie title")
    rating: float = Field(description="Rating from 0 to 10")
    summary: str = Field(description="One sentence summary")
    recommended: bool = Field(description="Whether you recommend this movie")

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
structured_llm = llm.with_structured_output(MovieReview)

result = structured_llm.invoke("Review the movie Inception")
print(type(result))        # <class 'MovieReview'>
print(result.title)        # "Inception"
print(result.rating)       # 8.5
print(result.recommended)  # True

This is the recommended approach for models that support it. The Pydantic schema is converted to a function/tool schema, and the model returns structured JSON that is automatically parsed.
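The derived schema looks roughly like the following. This is a hand-written approximation of the JSON Schema produced from MovieReview; the exact output LangChain emits may differ in details:

```python
# Hand-written approximation of the tool schema derived from MovieReview.
# Field descriptions become property descriptions, and every field is
# listed as required because none of them has a default.
movie_review_schema = {
    "type": "object",
    "properties": {
        "title": {"type": "string", "description": "The movie title"},
        "rating": {"type": "number", "description": "Rating from 0 to 10"},
        "summary": {"type": "string", "description": "One sentence summary"},
        "recommended": {
            "type": "boolean",
            "description": "Whether you recommend this movie",
        },
    },
    "required": ["title", "rating", "summary", "recommended"],
}

print(sorted(movie_review_schema["required"]))
```

Because the API itself is constrained to this schema, the model cannot wrap the JSON in prose or a code fence, which is what makes this approach more reliable than text parsing.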

PydanticOutputParser

For models without native structured output, the PydanticOutputParser adds format instructions to the prompt and parses the text response.

from langchain_core.output_parsers import PydanticOutputParser
from langchain_core.prompts import ChatPromptTemplate
from pydantic import BaseModel, Field

class Recipe(BaseModel):
    name: str = Field(description="Name of the recipe")
    ingredients: list[str] = Field(description="List of ingredients")
    prep_time_minutes: int = Field(description="Preparation time in minutes")
    difficulty: str = Field(description="Easy, Medium, or Hard")

parser = PydanticOutputParser(pydantic_object=Recipe)

prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful cooking assistant."),
    ("human", "Give me a recipe for {dish}.\n\n{format_instructions}"),
])

chain = prompt.partial(
    format_instructions=parser.get_format_instructions()
) | llm | parser

recipe = chain.invoke({"dish": "pasta carbonara"})
print(recipe.name)
print(recipe.ingredients)
print(recipe.prep_time_minutes)

parser.get_format_instructions() returns a string that tells the model exactly what JSON structure to produce. The parser then validates the response against the Pydantic model.
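The validation step can be sketched in plain Python. This toy version hard-codes the Recipe fields rather than deriving them from the Pydantic model, but it shows the check the parser performs on the reply:

```python
import json

# Toy sketch of the parser's validation step: the required fields and
# types are written out by hand here, whereas the real parser derives
# them from the Pydantic model.
REQUIRED = {
    "name": str,
    "ingredients": list,
    "prep_time_minutes": int,
    "difficulty": str,
}

def validate_recipe(reply: str) -> dict:
    data = json.loads(reply)
    for field, expected in REQUIRED.items():
        if not isinstance(data.get(field), expected):
            raise ValueError(f"field {field!r} missing or wrong type")
    return data

reply = (
    '{"name": "Pasta Carbonara", "ingredients": ["spaghetti", "eggs"],'
    ' "prep_time_minutes": 25, "difficulty": "Medium"}'
)
print(validate_recipe(reply)["name"])  # Pasta Carbonara
```

If the reply fails `json.loads` or a field check, the real parser raises an `OutputParserException`, which the fixing and retry parsers below know how to handle.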

JsonOutputParser

When you want raw dictionaries instead of Pydantic objects, use JsonOutputParser.

from langchain_core.output_parsers import JsonOutputParser

parser = JsonOutputParser()

chain = prompt | llm | parser
result = chain.invoke({"dish": "tacos"})
print(type(result))  # <class 'dict'>

You can optionally provide a Pydantic model for format instructions without strict validation:

parser = JsonOutputParser(pydantic_object=Recipe)
# Generates format instructions but returns a dict, not a Recipe object

StrOutputParser and CommaSeparatedListOutputParser

For simpler outputs, use lightweight parsers.

from langchain_core.output_parsers import StrOutputParser
from langchain_core.output_parsers import CommaSeparatedListOutputParser

# Plain string: extracts the .content from an AIMessage
str_parser = StrOutputParser()
result = str_parser.invoke(ai_message)  # ai_message: an AIMessage from a chat model

# Comma-separated list (include list_parser.get_format_instructions()
# in the prompt so the model emits a single comma-separated line)
list_parser = CommaSeparatedListOutputParser()
chain = prompt | llm | list_parser
result = chain.invoke({"topic": "Python frameworks"})
# e.g. ["Django", "Flask", "FastAPI", "LangChain"]
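The list parser's behavior is simple enough to sketch in a few lines of plain Python (a rough equivalent, not LangChain's actual implementation):

```python
def parse_comma_list(text: str) -> list[str]:
    # Roughly what CommaSeparatedListOutputParser does with the model's
    # text: split on commas and strip surrounding whitespace, dropping
    # any empty items left by a trailing comma.
    return [part.strip() for part in text.split(",") if part.strip()]

print(parse_comma_list("Django, Flask, FastAPI, LangChain"))
```

This is also why the parser's format instructions matter: if the model answers with a numbered list instead of a comma-separated line, the split produces garbage.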

Output-Fixing and Retry Parsers

LLMs sometimes produce output that fails to parse. Fixing and retry parsers recover from these failures automatically.

from langchain.output_parsers import OutputFixingParser, RetryOutputParser
from langchain_openai import ChatOpenAI

base_parser = PydanticOutputParser(pydantic_object=Recipe)

# Option 1: Use another LLM call to fix malformed output
fixing_parser = OutputFixingParser.from_llm(
    parser=base_parser,
    llm=ChatOpenAI(model="gpt-4o-mini"),
)

# If the base parser fails, the fixing parser sends the bad output
# to the LLM with instructions to fix the formatting
result = fixing_parser.parse(bad_output_string)

OutputFixingParser receives the malformed output and asks the LLM to reformat it. RetryOutputParser goes further by resending the original prompt along with the error, giving the LLM full context to produce a corrected response.


retry_parser = RetryOutputParser.from_llm(
    parser=base_parser,
    llm=ChatOpenAI(model="gpt-4o-mini"),
    max_retries=2,
)

# Unlike OutputFixingParser, the retry parser needs the original prompt
# as well as the bad completion:
# result = retry_parser.parse_with_prompt(bad_output_string, prompt_value)

Enum and Datetime Parsers

LangChain includes specialized parsers for common types.

from langchain.output_parsers import EnumOutputParser
from enum import Enum

class Sentiment(str, Enum):
    POSITIVE = "positive"
    NEGATIVE = "negative"
    NEUTRAL = "neutral"

parser = EnumOutputParser(enum=Sentiment)
result = parser.parse("positive")
print(result)  # Sentiment.POSITIVE
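The matching logic is roughly the following (a plain-Python sketch; the case normalization here is an assumption for robustness, not necessarily what EnumOutputParser does verbatim):

```python
from enum import Enum

class Sentiment(str, Enum):
    POSITIVE = "positive"
    NEGATIVE = "negative"
    NEUTRAL = "neutral"

def parse_sentiment(text: str) -> Sentiment:
    # Sketch of enum parsing: normalize whitespace and case (the case
    # folding is our own addition), then match against enum values.
    cleaned = text.strip().lower()
    for member in Sentiment:
        if member.value == cleaned:
            return member
    raise ValueError(
        f"{text!r} is not one of {[m.value for m in Sentiment]}"
    )

print(parse_sentiment("  Positive \n"))  # Sentiment.POSITIVE
```

Constraining the model to a closed set of values like this is the cheapest form of structured output and works well for classification tasks.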

Composing Parsers in LCEL

Parsers are runnables, so they integrate seamlessly into LCEL chains.

from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import PydanticOutputParser
from pydantic import BaseModel, Field

class Analysis(BaseModel):
    sentiment: str = Field(description="positive, negative, or neutral")
    confidence: float = Field(description="Confidence score 0-1")
    key_phrases: list[str] = Field(description="Important phrases")

parser = PydanticOutputParser(pydantic_object=Analysis)

chain = (
    ChatPromptTemplate.from_template(
        "Analyze this text: {text}\n{format_instructions}"
    ).partial(format_instructions=parser.get_format_instructions())
    | ChatOpenAI(model="gpt-4o-mini")
    | parser
)

analysis = chain.invoke({"text": "The product quality is outstanding!"})
print(analysis.sentiment)    # "positive"
print(analysis.confidence)   # 0.95

FAQ

Should I use with_structured_output or PydanticOutputParser?

Use with_structured_output() whenever the model supports it — it is more reliable because the model returns structured JSON natively rather than embedding JSON in free text. Fall back to PydanticOutputParser for models that lack native structured output support.

What happens when the LLM ignores format instructions?

The parser raises an OutputParserException. Wrap your parser with OutputFixingParser or RetryOutputParser to handle these failures automatically. Alternatively, with_structured_output avoids this issue entirely by constraining the output format at the API level.

Can I parse streaming output into structured objects?

Yes, if the model supports streaming structured output. Use JsonOutputParser with chain.stream() to receive partial JSON objects as they are generated. For Pydantic parsing, you typically need the full response before validation can occur.
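The partial-JSON trick can be illustrated in plain Python: close any open strings, arrays, and objects on the buffer seen so far, then attempt a parse. This is a toy illustration of the idea, not LangChain's streaming parser:

```python
import json

def try_parse_partial(buffer: str):
    # Best-effort parse of a possibly incomplete JSON object: track
    # open strings and brackets, append the missing closers, then
    # attempt json.loads. Returns None if the buffer is unrecoverable.
    stack = []
    in_string = False
    escaped = False
    for ch in buffer:
        if escaped:
            escaped = False
        elif ch == "\\" and in_string:
            escaped = True
        elif ch == '"':
            in_string = not in_string
        elif not in_string:
            if ch in "{[":
                stack.append("}" if ch == "{" else "]")
            elif ch in "}]" and stack:
                stack.pop()
    candidate = buffer + ('"' if in_string else "") + "".join(reversed(stack))
    try:
        return json.loads(candidate)
    except json.JSONDecodeError:
        return None

# Simulated token stream from a model emitting JSON
chunks = ['{"name": "Carbo', 'nara", "prep_time', '_minutes": 25}']
buffer = ""
for chunk in chunks:
    buffer += chunk
    partial = try_parse_partial(buffer)
    if partial is not None:
        print(partial)
```

Note that some intermediate buffers (for example, a key with no value yet) cannot be repaired, so a streaming UI should tolerate `None` between valid partial states.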


#LangChain #OutputParsing #Pydantic #StructuredData #Python #AgenticAI #LearnAI #AIEngineering
