
MCPServerStreamableHTTP: Connecting to Remote Tool Servers

Connect agents to remote MCP tool servers using MCPServerStreamableHTTP with authentication headers, timeout configuration, retry policies, tool caching, and production deployment patterns.

When to Use Streamable HTTP

MCPServerStdio works great when the tool server runs on the same machine as the agent. But in production, your tools often live on remote servers — a company API, a cloud service, a shared tool server accessible by multiple agents. MCPServerStreamableHTTP connects your agent to remote MCP servers over HTTP, with support for streaming responses, authentication, retries, and tool caching.

Use Streamable HTTP when:

  • The MCP server runs on a different machine or in the cloud
  • Multiple agents need to share the same tool server
  • The tool server needs to scale independently from agents
  • You need authentication, rate limiting, or other HTTP-layer features

Basic Configuration

from agents.mcp import MCPServerStreamableHTTP

server = MCPServerStreamableHTTP(
    name="Remote Tools",
    params={
        "url": "https://tools.example.com/mcp",
    },
)

The url points to the MCP endpoint on the remote server. The Streamable HTTP transport communicates using HTTP POST requests with JSON-RPC payloads and receives streaming responses via Server-Sent Events.
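To make the wire format concrete, here is the shape of a single JSON-RPC message a Streamable HTTP client POSTs to that endpoint. This is illustrative only — the real session starts with an `initialize` request, sends `Accept: application/json, text/event-stream`, and is handled for you by the SDK:

```python
import json

# One JSON-RPC request body as sent over Streamable HTTP (illustrative).
# The SDK constructs and sends these for you; you never build them by hand.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/list",
    "params": {},
}
body = json.dumps(request)
print(body)
```

The server replies either with a plain JSON response or with an SSE stream carrying one or more messages, which is what lets long-running tools report results incrementally.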

flowchart LR
    HOST(["MCP host<br/>Claude Desktop or IDE"])
    CLIENT["MCP client"]
    subgraph SERVERS["MCP Servers"]
        S1["Filesystem server"]
        S2["GitHub server"]
        S3["Postgres server"]
        SX["Custom tool server"]
    end
    LLM["LLM session"]
    OUT(["Grounded action"])
    HOST <--> CLIENT
    CLIENT <-->|stdio or HTTP+SSE| S1
    CLIENT <--> S2
    CLIENT <--> S3
    CLIENT <--> SX
    CLIENT --> LLM --> OUT
    style HOST fill:#f1f5f9,stroke:#64748b,color:#0f172a
    style CLIENT fill:#4f46e5,stroke:#4338ca,color:#fff
    style OUT fill:#059669,stroke:#047857,color:#fff

Authentication with Headers

Most remote MCP servers require authentication. Pass headers in the configuration:

Hear it before you finish reading

Talk to a live CallSphere AI voice agent in your browser — 60 seconds, no signup.

Try Live Demo →
import os

server = MCPServerStreamableHTTP(
    name="Authenticated Tools",
    params={
        "url": "https://tools.example.com/mcp",
        "headers": {
            "Authorization": f"Bearer {os.environ['MCP_API_KEY']}",
            "X-Org-Id": "org_12345",
        },
    },
)

For OAuth-based authentication where tokens expire:

class TokenRefreshingMCPServer:
    """Wrapper that refreshes auth tokens before connecting."""

    def __init__(self, url: str, token_provider):
        self.url = url
        self.token_provider = token_provider

    async def get_server(self) -> MCPServerStreamableHTTP:
        token = await self.token_provider.get_valid_token()
        return MCPServerStreamableHTTP(
            name="OAuth Tools",
            params={
                "url": self.url,
                "headers": {
                    "Authorization": f"Bearer {token}",
                },
            },
        )

# Usage — OAuthTokenProvider is a placeholder for whatever token client
# your auth stack provides; it is not part of the Agents SDK.
token_provider = OAuthTokenProvider(
    client_id="your_client_id",
    client_secret="your_client_secret",
    token_url="https://auth.example.com/token",
)

refreshing_server = TokenRefreshingMCPServer(
    url="https://tools.example.com/mcp",
    token_provider=token_provider,
)

server = await refreshing_server.get_server()
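The `OAuthTokenProvider` above stands in for your own token client. A minimal sketch of the expiry-aware behavior it needs — cache the token, refresh shortly before it expires (class name, `skew_seconds`, and the `(token, expires_in)` fetch contract are all assumptions, not SDK API):

```python
import asyncio
import time

class OAuthTokenProvider:
    """Hypothetical sketch: cache an access token, refresh before expiry."""

    def __init__(self, fetch_token, skew_seconds: float = 60.0):
        # fetch_token is an async callable returning (access_token, expires_in)
        self.fetch_token = fetch_token
        self.skew = skew_seconds
        self._token = None
        self._expires_at = 0.0

    async def get_valid_token(self) -> str:
        now = time.monotonic()
        if self._token is None or now >= self._expires_at - self.skew:
            token, expires_in = await self.fetch_token()
            self._token = token
            self._expires_at = now + expires_in
        return self._token

async def demo():
    calls = 0

    async def fake_fetch():
        nonlocal calls
        calls += 1
        return f"token-{calls}", 3600  # token valid for an hour

    provider = OAuthTokenProvider(fake_fetch)
    first = await provider.get_valid_token()
    second = await provider.get_valid_token()  # served from cache
    return first, second, calls

first, second, calls = asyncio.run(demo())
```

Refreshing slightly before expiry (the skew) avoids sending a token that dies mid-request.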

Timeout and Retry Configuration

Remote servers can be slow or temporarily unavailable. Configure timeouts and retries to handle this gracefully:

server = MCPServerStreamableHTTP(
    name="Resilient Remote Tools",
    params={
        "url": "https://tools.example.com/mcp",
        "headers": {
            "Authorization": f"Bearer {os.environ['MCP_API_KEY']}",
        },
        "timeout": 30,           # Connection timeout in seconds
        "sse_read_timeout": 300,  # SSE stream read timeout for long operations
    },
)

The distinction between timeout and sse_read_timeout matters: timeout is the initial connection timeout, while sse_read_timeout controls how long to wait for streaming data. Long-running tools (like database migrations or file processing) need a generous sse_read_timeout.

Retry with Backoff

For production reliability, raise the client session timeout and add retry logic:

from agents.mcp import MCPServerStreamableHTTP

server = MCPServerStreamableHTTP(
    name="Production Tools",
    params={
        "url": "https://tools.example.com/mcp",
        "headers": {"Authorization": f"Bearer {os.environ['MCP_API_KEY']}"},
    },
    # Read timeout for the underlying MCP client session —
    # raise it when tools can legitimately run for minutes
    client_session_timeout_seconds=300,
)

For more control over retries, wrap the server connection with custom logic:


import asyncio

async def connect_with_retry(
    server: MCPServerStreamableHTTP,
    max_attempts: int = 3,
    base_delay: float = 1.0,
) -> bool:
    """Connect to an MCP server with exponential backoff."""
    for attempt in range(max_attempts):
        try:
            await server.connect()
            return True
        except Exception as e:
            if attempt == max_attempts - 1:
                raise
            delay = base_delay * (2 ** attempt)
            print(f"Connection attempt {attempt + 1} failed: {e}. Retrying in {delay}s...")
            await asyncio.sleep(delay)
    return False
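With the defaults above, the backoff schedule works out to 1s, then 2s, before the final attempt:

```python
# Delay schedule produced by base_delay * (2 ** attempt) with the
# defaults above (base_delay=1.0, max_attempts=3). The last attempt
# raises instead of sleeping, so only the first two delays are used.
delays = [1.0 * (2 ** attempt) for attempt in range(3)]
print(delays)  # [1.0, 2.0, 4.0]
```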

Caching Tool Lists for Performance

Every time you enter the async with block, the client fetches the server's tool list. For servers with stable tool sets, this is redundant overhead. Enable caching:

server = MCPServerStreamableHTTP(
    name="Cached Tools",
    params={
        "url": "https://tools.example.com/mcp",
        "headers": {"Authorization": f"Bearer {os.environ['MCP_API_KEY']}"},
    },
    cache_tools_list=True,  # Cache the tool list across connections
)

With cache_tools_list=True, the tool list is fetched once and reused on subsequent connections. This saves a round trip on every agent run. Disable caching only if the server's tools change frequently.
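Conceptually, the cache amounts to memoizing the tools/list response. A sketch of the idea — this is an illustration with an added TTL, not the SDK's internal implementation:

```python
import time

class ToolListCache:
    """Illustrative memoization of a tools/list response with a TTL.
    Not the SDK's implementation — just the underlying idea."""

    def __init__(self, ttl_seconds: float = 300.0):
        self.ttl = ttl_seconds
        self._tools = None
        self._fetched_at = 0.0

    def get(self, fetch):
        # fetch performs the real tools/list round trip
        stale = time.monotonic() - self._fetched_at > self.ttl
        if self._tools is None or stale:
            self._tools = fetch()
            self._fetched_at = time.monotonic()
        return self._tools

calls = 0

def fake_fetch():
    global calls
    calls += 1
    return ["search_contacts", "get_pipeline_summary"]

cache = ToolListCache()
tools = cache.get(fake_fetch)
tools = cache.get(fake_fetch)  # second call hits the cache, no round trip
```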

Building an Agent with Remote API Tools

Here is a complete example connecting to a remote CRM tools server:

import asyncio
import os

from agents import Agent, Runner
from agents.mcp import MCPServerStreamableHTTP

async def main():
    # Connect to a remote CRM tool server
    crm_server = MCPServerStreamableHTTP(
        name="CRM Tools",
        params={
            "url": "https://crm-tools.internal.company.com/mcp",
            "headers": {
                "Authorization": f"Bearer {os.environ['CRM_MCP_TOKEN']}",
                "X-Team": "sales",
            },
            "timeout": 15,
            "sse_read_timeout": 120,
        },
        cache_tools_list=True,
    )

    # Connect to a remote analytics server
    analytics_server = MCPServerStreamableHTTP(
        name="Analytics Tools",
        params={
            "url": "https://analytics-tools.internal.company.com/mcp",
            "headers": {
                "Authorization": f"Bearer {os.environ['ANALYTICS_MCP_TOKEN']}",
            },
        },
        cache_tools_list=True,
    )

    async with crm_server, analytics_server:
        agent = Agent(
            name="Sales Intelligence Agent",
            instructions="""You are a sales intelligence assistant with access
            to CRM data and analytics tools.

            Use CRM tools to look up contacts, deals, and account history.
            Use analytics tools to pull pipeline metrics and forecasts.

            Always cite specific data points when making recommendations.
            Never guess — if you cannot find the data, say so.""",
            mcp_servers=[crm_server, analytics_server],
        )

        result = await Runner.run(
            agent,
            input="What is the current pipeline value for Q2 and which deals are most at risk?",
        )
        print(result.final_output)

asyncio.run(main())

Building a Remote MCP Server

Here is how to build the server side using FastMCP with HTTP transport:

# crm_tools_server.py
from collections.abc import AsyncIterator
from contextlib import asynccontextmanager

import asyncpg
from mcp.server.fastmcp import FastMCP

db_pool = None

@asynccontextmanager
async def lifespan(server: FastMCP) -> AsyncIterator[None]:
    """Open the database pool on startup, close it on shutdown.

    Creating the pool here, rather than in a separate asyncio.run() call,
    keeps it bound to the event loop the server actually runs on.
    """
    global db_pool
    db_pool = await asyncpg.create_pool(dsn="postgresql://user:pass@db:5432/crm")
    try:
        yield
    finally:
        await db_pool.close()

mcp = FastMCP("CRM Tools", host="0.0.0.0", port=8080, lifespan=lifespan)

@mcp.tool()
async def search_contacts(query: str, limit: int = 10) -> str:
    """Search CRM contacts by name, email, or company."""
    rows = await db_pool.fetch(
        """
        SELECT name, email, company, deal_count, total_revenue
        FROM contacts
        WHERE name ILIKE $1 OR email ILIKE $1 OR company ILIKE $1
        ORDER BY total_revenue DESC
        LIMIT $2
        """,
        f"%{query}%",
        limit,
    )
    if not rows:
        return "No contacts found matching the query."
    results = []
    for r in rows:
        results.append(
            f"- {r['name']} ({r['email']}) at {r['company']}: "
            f"{r['deal_count']} deals, ${r['total_revenue']:,.0f} revenue"
        )
    return "\n".join(results)

@mcp.tool()
async def get_pipeline_summary(quarter: str) -> str:
    """Get deal pipeline summary for a given quarter (e.g., 'Q2 2026')."""
    rows = await db_pool.fetch(
        """
        SELECT stage, COUNT(*) as deal_count, SUM(value) as total_value
        FROM deals
        WHERE quarter = $1
        GROUP BY stage
        ORDER BY total_value DESC
        """,
        quarter,
    )
    if not rows:
        return f"No pipeline data found for {quarter}."
    lines = [f"Pipeline for {quarter}:"]
    for r in rows:
        lines.append(
            f"  {r['stage']}: {r['deal_count']} deals, ${r['total_value']:,.0f}"
        )
    return "\n".join(lines)

if __name__ == "__main__":
    # mcp.run() blocks and manages its own event loop; the lifespan handler
    # above takes care of creating and closing the database pool.
    mcp.run(transport="streamable-http")

Production Deployment Patterns

  1. Health checks — Add a /health endpoint to your MCP server for load balancer probes
  2. Rate limiting — Implement per-client rate limits to prevent one agent from monopolizing resources
  3. Request logging — Log every tool invocation with trace IDs for debugging
  4. Circuit breaker — If the remote server fails repeatedly, stop trying and fall back gracefully
  5. mTLS — Use mutual TLS for service-to-service authentication in internal networks
  6. Connection pooling — Reuse HTTP connections across multiple agent runs
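
The circuit-breaker pattern from the list above can be sketched in a few lines. Thresholds and the cooldown are illustrative; wire `allow_request`/`record_failure` around your actual `server.connect()` calls:

```python
import time

class CircuitBreaker:
    """Stop calling a failing server; allow one trial call after a cooldown."""

    def __init__(self, max_failures: int = 3, reset_after: float = 30.0):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.failures = 0
        self.opened_at = None  # monotonic timestamp when the circuit opened

    def allow_request(self) -> bool:
        if self.opened_at is None:
            return True
        if time.monotonic() - self.opened_at >= self.reset_after:
            self.opened_at = None  # half-open: permit one trial call
            return True
        return False

    def record_success(self) -> None:
        self.failures = 0
        self.opened_at = None

    def record_failure(self) -> None:
        self.failures += 1
        if self.failures >= self.max_failures:
            self.opened_at = time.monotonic()

breaker = CircuitBreaker(max_failures=2, reset_after=30.0)
breaker.record_failure()
breaker.record_failure()          # circuit opens here
blocked = not breaker.allow_request()
```

When the circuit is open, skip the remote call entirely and return a fallback (cached data, a "tools unavailable" message) instead of letting every agent run eat the full timeout.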

MCPServerStreamableHTTP is the production transport for multi-service architectures where tools live on dedicated servers.
