
LangChain Output Parsers: Pydantic, JSON, and Structured Output Parsing

Learn how to extract structured data from LLM responses using LangChain output parsers — Pydantic models, JSON parsing, format instructions, and retry parsers for robust extraction.

Why Structured Output Matters

LLMs produce free-form text by default. But downstream code needs structured data — objects, lists, dictionaries, typed fields. Output parsers bridge this gap by defining an expected schema, generating format instructions for the prompt, and parsing the LLM's response into the target structure.

Without structured parsing, you end up writing fragile regex or string-splitting logic that breaks when the model changes phrasing. LangChain's parsers standardize this process and include retry mechanisms for when the model produces malformed output.
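To make that fragility concrete, here is the kind of hand-rolled extraction an output parser replaces (a toy sketch, not production code):

```python
import json
import re

# Hand-rolled extraction: exactly the fragile approach parsers replace.
# It works until the model changes its phrasing, fencing, or layout.
def naive_extract(reply: str) -> dict:
    match = re.search(r"\{.*\}", reply, re.DOTALL)
    if match is None:
        raise ValueError("no JSON object found in reply")
    return json.loads(match.group(0))

reply = 'Sure! Here is the review:\n{"title": "Inception", "rating": 8.5}'
print(naive_extract(reply))  # {'title': 'Inception', 'rating': 8.5}
```

The regex silently breaks if the model emits two objects, nests braces inside a string, or wraps the JSON in a code fence, which is why a schema-driven parser is worth the setup cost.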

The with_structured_output Approach

Modern LangChain models support with_structured_output(), which uses the model's native structured output capability (function calling or JSON mode) rather than text parsing.

from langchain_openai import ChatOpenAI
from pydantic import BaseModel, Field

class MovieReview(BaseModel):
    title: str = Field(description="The movie title")
    rating: float = Field(description="Rating from 0 to 10")
    summary: str = Field(description="One sentence summary")
    recommended: bool = Field(description="Whether you recommend this movie")

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
structured_llm = llm.with_structured_output(MovieReview)

result = structured_llm.invoke("Review the movie Inception")
print(type(result))        # <class 'MovieReview'>
print(result.title)        # "Inception"
print(result.rating)       # 8.5
print(result.recommended)  # True

This is the recommended approach for models that support it. The Pydantic schema is converted to a function/tool schema, and the model returns structured JSON that is automatically parsed.
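The derived schema looks roughly like the following. This is a hand-written approximation of the JSON Schema produced from MovieReview; the exact output LangChain emits may differ in details:

```python
# Hand-written approximation of the tool schema derived from MovieReview.
# Field descriptions become property descriptions, and every field is
# listed as required because none of them has a default.
movie_review_schema = {
    "type": "object",
    "properties": {
        "title": {"type": "string", "description": "The movie title"},
        "rating": {"type": "number", "description": "Rating from 0 to 10"},
        "summary": {"type": "string", "description": "One sentence summary"},
        "recommended": {
            "type": "boolean",
            "description": "Whether you recommend this movie",
        },
    },
    "required": ["title", "rating", "summary", "recommended"],
}

print(sorted(movie_review_schema["required"]))
```

Because the API itself is constrained to this schema, the model cannot wrap the JSON in prose or a code fence, which is what makes this approach more reliable than text parsing.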

PydanticOutputParser

For models without native structured output, the PydanticOutputParser adds format instructions to the prompt and parses the text response.

from langchain_core.output_parsers import PydanticOutputParser
from langchain_core.prompts import ChatPromptTemplate
from pydantic import BaseModel, Field

class Recipe(BaseModel):
    name: str = Field(description="Name of the recipe")
    ingredients: list[str] = Field(description="List of ingredients")
    prep_time_minutes: int = Field(description="Preparation time in minutes")
    difficulty: str = Field(description="Easy, Medium, or Hard")

parser = PydanticOutputParser(pydantic_object=Recipe)

prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful cooking assistant."),
    ("human", "Give me a recipe for {dish}.\n\n{format_instructions}"),
])

chain = prompt.partial(
    format_instructions=parser.get_format_instructions()
) | llm | parser

recipe = chain.invoke({"dish": "pasta carbonara"})
print(recipe.name)
print(recipe.ingredients)
print(recipe.prep_time_minutes)

parser.get_format_instructions() returns a string that tells the model exactly what JSON structure to produce. The parser then validates the response against the Pydantic model.
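The validation step can be sketched in plain Python. This toy version hard-codes the Recipe fields rather than deriving them from the Pydantic model, but it shows the check the parser performs on the reply:

```python
import json

# Toy sketch of the parser's validation step: the required fields and
# types are written out by hand here, whereas the real parser derives
# them from the Pydantic model.
REQUIRED = {
    "name": str,
    "ingredients": list,
    "prep_time_minutes": int,
    "difficulty": str,
}

def validate_recipe(reply: str) -> dict:
    data = json.loads(reply)
    for field, expected in REQUIRED.items():
        if not isinstance(data.get(field), expected):
            raise ValueError(f"field {field!r} missing or wrong type")
    return data

reply = (
    '{"name": "Pasta Carbonara", "ingredients": ["spaghetti", "eggs"],'
    ' "prep_time_minutes": 25, "difficulty": "Medium"}'
)
print(validate_recipe(reply)["name"])  # Pasta Carbonara
```

If the reply fails `json.loads` or a field check, the real parser raises an `OutputParserException`, which the fixing and retry parsers below know how to handle.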

JsonOutputParser

When you want raw dictionaries instead of Pydantic objects, use JsonOutputParser.

from langchain_core.output_parsers import JsonOutputParser

parser = JsonOutputParser()

chain = prompt | llm | parser
result = chain.invoke({"dish": "tacos"})
print(type(result))  # <class 'dict'>

You can optionally provide a Pydantic model for format instructions without strict validation:

parser = JsonOutputParser(pydantic_object=Recipe)
# Generates format instructions but returns a dict, not a Recipe object

StrOutputParser and CommaSeparatedListOutputParser

For simpler outputs, use lightweight parsers.

from langchain_core.output_parsers import StrOutputParser
from langchain_core.output_parsers import CommaSeparatedListOutputParser

# Plain string: extracts the .content from an AIMessage
str_parser = StrOutputParser()
result = str_parser.invoke(ai_message)  # ai_message: an AIMessage from a chat model

# Comma-separated list (include list_parser.get_format_instructions()
# in the prompt so the model emits a single comma-separated line)
list_parser = CommaSeparatedListOutputParser()
chain = prompt | llm | list_parser
result = chain.invoke({"topic": "Python frameworks"})
# e.g. ["Django", "Flask", "FastAPI", "LangChain"]
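The list parser's behavior is simple enough to sketch in a few lines of plain Python (a rough equivalent, not LangChain's actual implementation):

```python
def parse_comma_list(text: str) -> list[str]:
    # Roughly what CommaSeparatedListOutputParser does with the model's
    # text: split on commas and strip surrounding whitespace, dropping
    # any empty items left by a trailing comma.
    return [part.strip() for part in text.split(",") if part.strip()]

print(parse_comma_list("Django, Flask, FastAPI, LangChain"))
```

This is also why the parser's format instructions matter: if the model answers with a numbered list instead of a comma-separated line, the split produces garbage.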

Output-Fixing and Retry Parsers

LLMs sometimes produce output that fails to parse. Fixing and retry parsers recover from these failures automatically.

from langchain.output_parsers import OutputFixingParser, RetryOutputParser
from langchain_openai import ChatOpenAI

base_parser = PydanticOutputParser(pydantic_object=Recipe)

# Option 1: Use another LLM call to fix malformed output
fixing_parser = OutputFixingParser.from_llm(
    parser=base_parser,
    llm=ChatOpenAI(model="gpt-4o-mini"),
)

# If the base parser fails, the fixing parser sends the bad output
# to the LLM with instructions to fix the formatting
result = fixing_parser.parse(bad_output_string)

OutputFixingParser receives the malformed output and asks the LLM to reformat it. RetryOutputParser goes further by resending the original prompt along with the error, giving the LLM full context to produce a corrected response.


retry_parser = RetryOutputParser.from_llm(
    parser=base_parser,
    llm=ChatOpenAI(model="gpt-4o-mini"),
    max_retries=2,
)

# Unlike OutputFixingParser, the retry parser needs the original prompt
# as well as the bad completion:
# result = retry_parser.parse_with_prompt(bad_output_string, prompt_value)

Enum and Datetime Parsers

LangChain includes specialized parsers for common types.

from langchain.output_parsers import EnumOutputParser
from enum import Enum

class Sentiment(str, Enum):
    POSITIVE = "positive"
    NEGATIVE = "negative"
    NEUTRAL = "neutral"

parser = EnumOutputParser(enum=Sentiment)
result = parser.parse("positive")
print(result)  # Sentiment.POSITIVE
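The matching logic is roughly the following (a plain-Python sketch; the case normalization here is an assumption for robustness, not necessarily what EnumOutputParser does verbatim):

```python
from enum import Enum

class Sentiment(str, Enum):
    POSITIVE = "positive"
    NEGATIVE = "negative"
    NEUTRAL = "neutral"

def parse_sentiment(text: str) -> Sentiment:
    # Sketch of enum parsing: normalize whitespace and case (the case
    # folding is our own addition), then match against enum values.
    cleaned = text.strip().lower()
    for member in Sentiment:
        if member.value == cleaned:
            return member
    raise ValueError(
        f"{text!r} is not one of {[m.value for m in Sentiment]}"
    )

print(parse_sentiment("  Positive \n"))  # Sentiment.POSITIVE
```

Constraining the model to a closed set of values like this is the cheapest form of structured output and works well for classification tasks.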

Composing Parsers in LCEL

Parsers are runnables, so they integrate seamlessly into LCEL chains.

from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import PydanticOutputParser
from pydantic import BaseModel, Field

class Analysis(BaseModel):
    sentiment: str = Field(description="positive, negative, or neutral")
    confidence: float = Field(description="Confidence score 0-1")
    key_phrases: list[str] = Field(description="Important phrases")

parser = PydanticOutputParser(pydantic_object=Analysis)

chain = (
    ChatPromptTemplate.from_template(
        "Analyze this text: {text}\n{format_instructions}"
    ).partial(format_instructions=parser.get_format_instructions())
    | ChatOpenAI(model="gpt-4o-mini")
    | parser
)

analysis = chain.invoke({"text": "The product quality is outstanding!"})
print(analysis.sentiment)    # "positive"
print(analysis.confidence)   # 0.95

FAQ

Should I use with_structured_output or PydanticOutputParser?

Use with_structured_output() whenever the model supports it — it is more reliable because the model returns structured JSON natively rather than embedding JSON in free text. Fall back to PydanticOutputParser for models that lack native structured output support.

What happens when the LLM ignores format instructions?

The parser raises an OutputParserException. Wrap your parser with OutputFixingParser or RetryOutputParser to handle these failures automatically. Alternatively, with_structured_output avoids this issue entirely by constraining the output format at the API level.

Can I parse streaming output into structured objects?

Yes, if the model supports streaming structured output. Use JsonOutputParser with chain.stream() to receive partial JSON objects as they are generated. For Pydantic parsing, you typically need the full response before validation can occur.
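The partial-JSON trick can be illustrated in plain Python: close any open strings, arrays, and objects on the buffer seen so far, then attempt a parse. This is a toy illustration of the idea, not LangChain's streaming parser:

```python
import json

def try_parse_partial(buffer: str):
    # Best-effort parse of a possibly incomplete JSON object: track
    # open strings and brackets, append the missing closers, then
    # attempt json.loads. Returns None if the buffer is unrecoverable.
    stack = []
    in_string = False
    escaped = False
    for ch in buffer:
        if escaped:
            escaped = False
        elif ch == "\\" and in_string:
            escaped = True
        elif ch == '"':
            in_string = not in_string
        elif not in_string:
            if ch in "{[":
                stack.append("}" if ch == "{" else "]")
            elif ch in "}]" and stack:
                stack.pop()
    candidate = buffer + ('"' if in_string else "") + "".join(reversed(stack))
    try:
        return json.loads(candidate)
    except json.JSONDecodeError:
        return None

# Simulated token stream from a model emitting JSON
chunks = ['{"name": "Carbo', 'nara", "prep_time', '_minutes": 25}']
buffer = ""
for chunk in chunks:
    buffer += chunk
    partial = try_parse_partial(buffer)
    if partial is not None:
        print(partial)
```

Note that some intermediate buffers (for example, a key with no value yet) cannot be repaired, so a streaming UI should tolerate `None` between valid partial states.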


#LangChain #OutputParsing #Pydantic #StructuredData #Python #AgenticAI #LearnAI #AIEngineering
