Learn Agentic AI

Production LangGraph: Deploying Stateful Agents with LangGraph Cloud

Deploy LangGraph agents to production using LangGraph Cloud with API endpoints, cron triggers, monitoring, scaling strategies, and operational best practices for stateful agent workflows.

From Development to Production

Building a LangGraph agent locally is straightforward. Running one in production — handling concurrent users, persisting state across restarts, monitoring execution, recovering from failures, and scaling under load — requires careful architecture. LangGraph Cloud provides managed infrastructure for deploying stateful agents, but you can also self-host with the right patterns.

Structuring Your Project for Deployment

Before wiring up deployment, it helps to see the shape of a typical production agent: a supervisor node routes work to specialist nodes, a ToolNode executes tool calls, a Postgres checkpointer persists state between steps, and an interrupt can pause the graph for human approval:

flowchart TD
    USER(["User input"])
    SUPER["Supervisor node<br/>routes by state"]
    A["Specialist node A<br/>research"]
    B["Specialist node B<br/>writing"]
    TOOL{"Tool call<br/>needed?"}
    EXEC["Tool executor<br/>ToolNode"]
    CHK[("Postgres<br/>checkpointer")]
    INT{"interrupt for<br/>human approval?"}
    HUMAN(["Human reviewer"])
    OUT(["Final response"])
    USER --> SUPER
    SUPER --> A
    SUPER --> B
    A --> TOOL
    B --> TOOL
    TOOL -->|Yes| EXEC --> SUPER
    TOOL -->|No| INT
    INT -->|Yes| HUMAN --> SUPER
    INT -->|No| OUT
    SUPER <--> CHK
    style SUPER fill:#4f46e5,stroke:#4338ca,color:#fff
    style CHK fill:#ede9fe,stroke:#7c3aed,color:#1e1b4b
    style OUT fill:#059669,stroke:#047857,color:#fff
    style HUMAN fill:#f59e0b,stroke:#d97706,color:#1f2937
LangGraph Cloud expects a langgraph.json configuration file at the project root:

# langgraph.json — deployment configuration
{
    "dependencies": ["."],
    "graphs": {
        "my_agent": "./agent/graph.py:graph"
    },
    "env": ".env"
}

The graphs field maps a graph ID (used as the assistant_id when calling the API) to the location of a compiled graph object, in path/to/module.py:variable form. Your graph module exports the compiled graph:

# agent/graph.py
from typing import TypedDict, Annotated
from langgraph.graph import StateGraph, START, END
from langgraph.graph.message import add_messages
from langchain_openai import ChatOpenAI
from langchain_core.tools import tool
from langgraph.prebuilt import ToolNode

class AgentState(TypedDict):
    messages: Annotated[list, add_messages]

@tool
def lookup_order(order_id: str) -> str:
    """Look up order details by ID."""
    # Production implementation here
    return f"Order {order_id}: shipped, arriving March 20"

tools = [lookup_order]
llm = ChatOpenAI(model="gpt-4o-mini").bind_tools(tools)
tool_node = ToolNode(tools)

def agent(state: AgentState) -> dict:
    return {"messages": [llm.invoke(state["messages"])]}

def should_continue(state: AgentState):
    last = state["messages"][-1]
    if hasattr(last, "tool_calls") and last.tool_calls:
        return "tools"
    return "end"

builder = StateGraph(AgentState)
builder.add_node("agent", agent)
builder.add_node("tools", tool_node)
builder.add_edge(START, "agent")
builder.add_conditional_edges("agent", should_continue, {
    "tools": "tools",
    "end": END,
})
builder.add_edge("tools", "agent")

graph = builder.compile()
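The conditional edge is the piece most worth unit-testing before deploying. A minimal sketch that exercises the same routing logic with a hypothetical FakeMessage stand-in, so the test runs without an LLM or the langchain packages (real messages are AIMessage objects exposing the same tool_calls attribute):

```python
from dataclasses import dataclass, field

@dataclass
class FakeMessage:
    """Stand-in for an AI message; only tool_calls matters for routing."""
    tool_calls: list = field(default_factory=list)

def should_continue(state) -> str:
    # Same logic as the graph's conditional edge: route to the tool
    # node if the last message requested a tool call, otherwise end.
    last = state["messages"][-1]
    if hasattr(last, "tool_calls") and last.tool_calls:
        return "tools"
    return "end"

# A plain response ends the loop; a tool call routes to the executor.
assert should_continue({"messages": [FakeMessage()]}) == "end"
assert should_continue({"messages": [FakeMessage(tool_calls=[{"name": "lookup_order"}])]}) == "tools"
```

Catching a routing bug here is far cheaper than discovering it as an infinite tool loop in production.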

Deploying to LangGraph Cloud

Deploy using the LangGraph CLI:

pip install langgraph-cli
langgraph up

This runs the API server locally (in Docker) so you can exercise the deployment configuration before shipping. For cloud deployment:

langgraph deploy --project my-agent

The deployment creates API endpoints for your graph with built-in persistence, streaming, and thread management.

API Endpoints

Once deployed, LangGraph Cloud exposes REST endpoints:

# Create a new thread
curl -X POST https://your-deployment.langgraph.app/threads \
  -H "Content-Type: application/json" \
  -d '{}'

# Run the agent on a thread
curl -X POST https://your-deployment.langgraph.app/threads/{thread_id}/runs \
  -H "Content-Type: application/json" \
  -d '{
    "assistant_id": "my_agent",
    "input": {
      "messages": [{"role": "human", "content": "Track order 12345"}]
    }
  }'

# Stream responses
curl -X POST https://your-deployment.langgraph.app/threads/{thread_id}/runs/stream \
  -H "Content-Type: application/json" \
  -d '{
    "assistant_id": "my_agent",
    "input": {
      "messages": [{"role": "human", "content": "What is the status?"}]
    },
    "stream_mode": "messages"
  }'
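If you are calling these endpoints from Python without the SDK, the run-creation request can be assembled with the standard library. A sketch that only constructs the request, mirroring the curl payload above (the base URL is a placeholder for your deployment):

```python
import json
import urllib.request

def build_run_request(base_url: str, thread_id: str, text: str) -> urllib.request.Request:
    """Build the POST request that starts a run on an existing thread."""
    payload = {
        "assistant_id": "my_agent",
        "input": {"messages": [{"role": "human", "content": text}]},
    }
    return urllib.request.Request(
        f"{base_url}/threads/{thread_id}/runs",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Send with urllib.request.urlopen(build_run_request(...)) — or use the
# typed SDK client shown in the next section.
```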

Using the Python SDK

The LangGraph SDK provides a typed client for interacting with deployed agents:

from langgraph_sdk import get_client

client = get_client(url="https://your-deployment.langgraph.app")

# Create a thread
thread = await client.threads.create()

# Run the agent
result = await client.runs.create(
    thread_id=thread["thread_id"],
    assistant_id="my_agent",
    input={"messages": [{"role": "human", "content": "Track order 12345"}]},
)

# Stream responses
async for chunk in client.runs.stream(
    thread_id=thread["thread_id"],
    assistant_id="my_agent",
    input={"messages": [{"role": "human", "content": "Any updates?"}]},
    stream_mode="messages",
):
    print(chunk)

Cron Triggers for Scheduled Agents

Run agents on a schedule for monitoring, reporting, or maintenance tasks:

# langgraph.json
{
    "dependencies": ["."],
    "graphs": {
        "monitor": "./agent/monitor.py:graph"
    },
    "crons": {
        "daily_report": {
            "graph": "monitor",
            "schedule": "0 9 * * *",
            "input": {
                "messages": [{"role": "human", "content": "Generate daily status report"}]
            }
        }
    }
}

The cron trigger creates a new thread for each execution, runs the graph, and stores the result. You can query past cron runs through the API.

Monitoring and Observability

LangGraph integrates with LangSmith for tracing and monitoring:

# Set environment variables for tracing
import os
os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_API_KEY"] = "your-langsmith-key"
os.environ["LANGCHAIN_PROJECT"] = "production-agent"

Every graph execution is traced end-to-end, showing node timings, LLM calls, tool invocations, and state transitions. Set up alerts for error rates, latency spikes, and token usage.
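Traces become much more useful when each invocation carries metadata and tags you can filter on. A small sketch of building the standard runnable config; the thread_id and customer_tier keys here are illustrative:

```python
def run_config(thread_id: str, customer_tier: str) -> dict:
    """Config dict passed to graph.invoke / graph.ainvoke."""
    return {
        "configurable": {"thread_id": thread_id},
        "metadata": {"customer_tier": customer_tier},  # filterable in LangSmith
        "tags": ["production", "orders"],              # searchable trace tags
    }

# Usage: graph.invoke(inputs, run_config("support-123", "pro"))
```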


Self-Hosted Production Patterns

If you prefer to self-host rather than use LangGraph Cloud, here are the essential patterns:

# Use PostgreSQL for production checkpointing
import os
from fastapi import FastAPI
from langgraph.checkpoint.postgres import PostgresSaver

DB_URI = os.environ["DATABASE_URL"]

# The context manager owns the database connection, so it must stay open
# for the lifetime of the server — in a real app, manage this with a
# FastAPI lifespan handler rather than a script-level `with` block.
with PostgresSaver.from_conn_string(DB_URI) as checkpointer:
    checkpointer.setup()  # creates the checkpoint tables on first run
    graph = builder.compile(checkpointer=checkpointer)

    # Wrap in FastAPI for HTTP access
    app = FastAPI()

    @app.post("/chat/{thread_id}")
    async def chat(thread_id: str, message: str):
        config = {"configurable": {"thread_id": thread_id}}
        result = await graph.ainvoke(
            {"messages": [{"role": "human", "content": message}]},
            config,
        )
        return {"response": result["messages"][-1].content}

Use PostgreSQL for state persistence, Redis for caching, and a process manager like Gunicorn with Uvicorn workers for concurrency.
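The process-manager piece can be as simple as the following (assuming the FastAPI code above is saved as app.py; worker count and timeout are starting points to tune):

```shell
# 4 Uvicorn workers behind Gunicorn; raise --timeout for long-running
# graph executions, and scale -w to your cores and expected concurrency
gunicorn app:app -k uvicorn.workers.UvicornWorker -w 4 --timeout 120
```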

Scaling Considerations

Stateful agents require careful scaling. Each thread is independent, so you can distribute threads across workers. But a single thread's execution must happen on one worker since the in-progress state is in memory. Use sticky sessions or a queue-based architecture where each run is claimed by exactly one worker.
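The sticky-routing idea can be sketched in a few lines: hash each thread ID to a fixed worker so all runs for that thread land on the same machine. The worker labels here are placeholders — in production they would be queue consumers or pods:

```python
import hashlib

WORKERS = ["worker-0", "worker-1", "worker-2"]

def worker_for_thread(thread_id: str) -> str:
    """Deterministically map a thread to one worker (sticky routing)."""
    digest = hashlib.sha256(thread_id.encode()).hexdigest()
    return WORKERS[int(digest, 16) % len(WORKERS)]
```

Because the mapping is deterministic, no coordination service is needed for routing — though resizing the pool remaps threads, which is only safe between runs (consistent hashing reduces that churn if it matters).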

FAQ

How much does LangGraph Cloud cost?

LangGraph Cloud pricing is based on compute time and storage. Check the LangSmith pricing page for current rates. For high-volume deployments, self-hosting with PostgreSQL and your own compute is typically more cost-effective.

Can I deploy multiple graph versions simultaneously?

Yes. LangGraph Cloud supports versioned deployments. You can route traffic between versions using assistant IDs, enabling canary deployments and A/B testing of different agent configurations.

How do I handle secrets and API keys in production?

Never hardcode secrets. Use environment variables configured through the .env file referenced in langgraph.json or through your cloud provider's secrets management. LangGraph Cloud encrypts environment variables at rest and injects them at runtime.
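A minimal .env sketch, using the variable names from the examples in this article (values are placeholders):

```shell
# .env — referenced by langgraph.json; keep out of version control
OPENAI_API_KEY=sk-...
DATABASE_URL=postgresql://user:pass@host:5432/agentdb
LANGCHAIN_TRACING_V2=true
LANGCHAIN_API_KEY=...
LANGCHAIN_PROJECT=production-agent
```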


#LangGraph #Production #Deployment #LangGraphCloud #Python #AgenticAI #LearnAI #AIEngineering

