
OpenAI Structured Outputs: The Evolution of Function Calling and Type-Safe AI

OpenAI's Structured Outputs guarantee valid JSON responses matching your schema. How it works, migration from function calling, and patterns for production type-safe AI applications.

From Free Text to Guaranteed Structure

One of the most persistent challenges in building LLM-powered applications has been getting models to produce reliably structured output. A model that generates beautiful JSON 95% of the time and malformed text 5% of the time creates cascading failures in downstream systems. OpenAI's Structured Outputs feature, introduced in mid-2024 and refined throughout 2025, addresses this definitively.

The Evolution of Output Control

The journey to reliable structured output has gone through several stages:

Stage 1: Prompt engineering (2022-2023)

"Return your answer as JSON with fields: name, age, city"
→ Sometimes works, sometimes wraps in markdown, sometimes adds commentary

Stage 2: JSON mode (2023)

response = client.chat.completions.create(
    model="gpt-4o",
    response_format={"type": "json_object"},
    messages=[...]  # the prompt itself must mention "JSON" or the API rejects the request
)
# Guarantees valid JSON, but no schema enforcement

Stage 3: Function calling (2023-2024)

tools = [{
    "type": "function",
    "function": {
        "name": "extract_contact",
        "parameters": {
            "type": "object",
            "properties": {
                "name": {"type": "string"},
                "email": {"type": "string", "format": "email"}
            }
        }
    }
}]
# Model chooses to call the function, but schema compliance not guaranteed

Stage 4: Structured Outputs (2024-2025)

from openai import OpenAI
from pydantic import BaseModel

client = OpenAI()

class Contact(BaseModel):
    name: str
    email: str
    phone: str | None
    company: str

response = client.beta.chat.completions.parse(
    model="gpt-4o",
    response_format=Contact,
    messages=[{"role": "user", "content": "Extract: John at Acme, [email protected]"}]
)

contact = response.choices[0].message.parsed
# contact.name == "John", contact.email == "[email protected]"
# Type-safe, schema-compliant, guaranteed

How Structured Outputs Work Internally

OpenAI achieves guaranteed schema compliance through constrained decoding — modifying the token generation process to only allow tokens that are valid according to the target schema at each step.

The process:

  1. The JSON schema is converted into a context-free grammar (CFG)
  2. At each generation step, the CFG is used to compute a mask of valid next tokens
  3. Invalid tokens receive -infinity logit scores, making them impossible to select
  4. The result is guaranteed to be valid JSON matching the schema

This is fundamentally different from hoping the model follows instructions. The model cannot produce invalid output because invalid tokens are literally excluded from consideration.
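The masking step can be sketched in miniature. This is a toy with a hypothetical six-token vocabulary and a hand-written stand-in for the grammar, not OpenAI's actual implementation — but it shows why invalid output is impossible rather than merely unlikely:

```python
import math

def valid_next_tokens(partial_output: str) -> set[str]:
    # Stand-in for the compiled CFG: given the output so far,
    # return the only tokens that keep it valid JSON.
    if partial_output == "":
        return {'{'}
    if partial_output.endswith('{'):
        return {'"name"'}
    if partial_output.endswith('"name"'):
        return {':'}
    if partial_output.endswith(':'):
        return {'"John"'}
    return {'}'}

def mask_logits(logits: dict[str, float], partial: str) -> dict[str, float]:
    allowed = valid_next_tokens(partial)
    # Invalid tokens get -infinity, so softmax assigns them zero probability.
    return {tok: (score if tok in allowed else -math.inf)
            for tok, score in logits.items()}

# The model strongly "prefers" free text ('hello'), but the mask forbids it.
raw = {'{': 0.1, '}': 0.0, '"name"': 0.2, ':': 0.0, '"John"': 0.3, 'hello': 2.5}
masked = mask_logits(raw, "")
best = max(masked, key=masked.get)  # only '{' survives the mask
```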


Practical Patterns

Pattern 1: Data extraction with type safety

from pydantic import BaseModel, Field
from typing import Literal

class InvoiceItem(BaseModel):
    description: str
    quantity: int = Field(ge=1)
    unit_price: float = Field(ge=0)

class Invoice(BaseModel):
    invoice_number: str
    date: str
    vendor: str
    items: list[InvoiceItem]
    currency: Literal["USD", "EUR", "GBP"]
    total: float

response = client.beta.chat.completions.parse(
    model="gpt-4o-mini",
    response_format=Invoice,
    messages=[{"role": "user", "content": f"Extract invoice data: {raw_text}"}]
)

Pattern 2: Multi-step reasoning with structured intermediate state


class ReasoningStep(BaseModel):
    step_number: int
    thought: str
    conclusion: str

class Analysis(BaseModel):
    reasoning: list[ReasoningStep]
    final_answer: str
    confidence: Literal["high", "medium", "low"]

Pattern 3: Classification with constrained output

class TicketClassification(BaseModel):
    category: Literal["billing", "technical", "account", "feature_request"]
    priority: Literal["critical", "high", "medium", "low"]
    summary: str
    requires_human: bool
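Pattern 3 pays off downstream: because every field is guaranteed to be one of the listed literals, routing logic can branch on them with no defensive checks. A minimal sketch — the route function and queue names are illustrative, not part of any API:

```python
from typing import Literal
from pydantic import BaseModel

class TicketClassification(BaseModel):
    category: Literal["billing", "technical", "account", "feature_request"]
    priority: Literal["critical", "high", "medium", "low"]
    summary: str
    requires_human: bool

def route(ticket: TicketClassification) -> str:
    # Hypothetical routing policy; queue names are made up for illustration.
    if ticket.requires_human or ticket.priority == "critical":
        return "human_escalation"
    return f"queue_{ticket.category}"

# Schema enforcement means parsed output always validates against the model:
ticket = TicketClassification.model_validate_json(
    '{"category": "billing", "priority": "low", '
    '"summary": "Duplicate charge on invoice", "requires_human": false}'
)
```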

Function Calling + Structured Outputs

Structured Outputs also applies to function calling, ensuring that tool arguments strictly match the defined schema:

tools = [{
    "type": "function",
    "function": {
        "name": "query_database",
        "strict": True,  # Enable structured outputs for this function
        "parameters": {
            "type": "object",
            "properties": {
                "table": {"type": "string", "enum": ["users", "orders", "products"]},
                "filters": {
                    "type": "array",
                    "items": {
                        "type": "object",
                        "properties": {
                            "field": {"type": "string"},
                            "operator": {"type": "string", "enum": ["=", ">", "<", ">=", "<="]},
                            "value": {"type": "string"}
                        },
                        "required": ["field", "operator", "value"]
                    }
                }
            },
            "required": ["table"],
            "additionalProperties": False
        }
    }
}]

With strict: True, the model's function call arguments are guaranteed to match the schema — no more try/except blocks for malformed tool arguments.

Limitations and Considerations

  • Latency: Constrained decoding adds ~100-200ms overhead for schema processing on the first request with a new schema (cached afterward)
  • Schema restrictions: Some JSON Schema features are not supported ($ref cycles, patternProperties, some format validators)
  • All fields required: In strict mode, all object properties must be listed in required — optional fields should use nullable types instead
  • No additionalProperties: Must be set to false in strict mode — the model outputs exactly the defined fields
  • Model dependency: Currently supported on GPT-4o, GPT-4o-mini, and o-series models

Impact on Application Architecture

Structured Outputs fundamentally simplifies the LLM application stack. Before Structured Outputs, applications needed:

  • Output parsing logic with error handling
  • Retry loops for malformed responses
  • Validation layers to check schema compliance
  • Fallback strategies for parse failures

With Structured Outputs, the parsing layer effectively disappears. The model output is your typed data structure, period. This reduces code complexity, eliminates an entire category of runtime errors, and makes LLM outputs as reliable as traditional API responses.
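For contrast, here is a sketch of the defensive loop that Structured Outputs makes unnecessary (`call_llm` is a stand-in for any completion call returning raw text; the ad-hoc "name" check stands in for a validation layer):

```python
import json

def parse_with_retries(call_llm, max_retries: int = 3) -> dict:
    # The pre-Structured-Outputs pattern: parse, validate, retry, give up.
    for _ in range(max_retries):
        raw = call_llm()
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            continue  # malformed output -- burn another API call
        if "name" in data:  # ad-hoc schema compliance check
            return data
    raise ValueError("model never produced valid output")

# Simulate a flaky model: free text, wrong schema, then a good response.
_responses = iter(['not json', '{"oops": true}', '{"name": "John"}'])
result = parse_with_retries(lambda: next(_responses))

# With Structured Outputs, all of the above collapses to:
#   contact = response.choices[0].message.parsed
```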


Sources: OpenAI — Structured Outputs Documentation, OpenAI — Introducing Structured Outputs, OpenAI Cookbook — Structured Outputs Examples
