
Caching MCP Tool Definitions for Performance

Dramatically reduce agent startup latency by caching MCP tool definitions with cache_tools_list, implementing cache invalidation strategies, and benchmarking the performance gains in production agents.

The Hidden Cost of Tool Discovery

Every time an MCP agent starts a run, it calls list_tools() on each connected MCP server. This discovery step fetches the name, description, and JSON schema for every tool the server exposes. For a stdio server, that means spawning a subprocess, waiting for initialization, and exchanging JSON-RPC messages. For an HTTP server, it means a network round-trip.

When you have a single server with five tools, the cost is negligible. But production agents often connect to three, four, or more servers — a filesystem server, a database server, a search server, and a custom business logic server. Each server might expose ten to twenty tools. Suddenly, tool discovery adds 500 milliseconds to two seconds of latency before the agent can process its first message.

The fix is straightforward: cache the tool definitions so that discovery only happens once.

Enabling cache_tools_list

The OpenAI Agents SDK supports tool caching directly on MCP server instances. When you set cache_tools_list=True, the SDK stores the tool definitions after the first list_tools() call and reuses them on subsequent agent runs without re-fetching:

(Diagram: an MCP host such as Claude Desktop or an IDE connects through an MCP client, over stdio or HTTP+SSE, to filesystem, GitHub, Postgres, and custom tool servers; the client feeds tool definitions into the LLM session, which produces grounded actions.)
from agents.mcp import MCPServerStdio, MCPServerStreamableHTTP

# Stdio server with caching enabled
filesystem_server = MCPServerStdio(
    name="Filesystem",
    params={
        "command": "npx",
        "args": ["-y", "@modelcontextprotocol/server-filesystem", "/data"],
    },
    cache_tools_list=True,
)

# HTTP server with caching enabled
db_server = MCPServerStreamableHTTP(
    name="Database",
    params={
        "url": "http://localhost:8001/mcp",
    },
    cache_tools_list=True,
)

With caching enabled, the first agent run performs normal tool discovery. Every subsequent run skips the discovery step entirely and uses the cached schemas. For stdio servers, this is especially impactful because it avoids re-spawning the subprocess just to enumerate tools.


How the Cache Works Internally

The caching mechanism is simple but effective. When cache_tools_list is True, the SDK stores the result of list_tools() in memory on the server object. On subsequent calls, it returns the stored list immediately instead of making a JSON-RPC request.
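In isolation, the mechanism amounts to memoizing a single fetch. Here is a minimal sketch of the idea — not the SDK's actual implementation, and `fetch` stands in for the real JSON-RPC discovery call:

```python
class CachingToolLister:
    """Illustrative memoization of a tool-list fetch (not the SDK's internals)."""

    def __init__(self, fetch, cache_tools_list=True):
        self._fetch = fetch                  # callable performing the expensive discovery
        self._cache_enabled = cache_tools_list
        self._cached = None                  # holds the tool list after the first fetch

    def list_tools(self):
        if self._cache_enabled and self._cached is not None:
            return self._cached              # cache hit: no RPC, no subprocess work
        tools = self._fetch()                # cache miss: perform real discovery
        if self._cache_enabled:
            self._cached = tools
        return tools

    def invalidate_tools_cache(self):
        self._cached = None                  # next list_tools() re-discovers
```

However many times `list_tools()` is called, the underlying fetch runs once until the cache is invalidated.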

This means the cache lives for the lifetime of the server object. If you create a new MCPServerStdio instance, it starts with an empty cache. If you reuse the same instance across multiple Runner.run() calls — which is the recommended pattern — the cache persists.

from agents import Agent, Runner
from agents.mcp import MCPServerStdio

# Create server once, reuse across runs
server = MCPServerStdio(
    name="Tools",
    params={"command": "npx", "args": ["-y", "@modelcontextprotocol/server-tools"]},
    cache_tools_list=True,
)

agent = Agent(
    name="Assistant",
    instructions="You are a helpful assistant.",
    mcp_servers=[server],
)

async def handle_request(user_message: str):
    # First call: discovers tools, caches them
    # All subsequent calls: uses cached tool list
    result = await Runner.run(agent, user_message)
    return result.final_output

Invalidating the Cache

Caching introduces a consistency problem. If the MCP server adds, removes, or modifies tools after the initial discovery, the cached list becomes stale. The agent might try to call a tool that no longer exists, or miss a newly added tool.

The SDK provides invalidate_tools_cache() to handle this:

# After deploying a new version of the MCP server
filesystem_server.invalidate_tools_cache()

# The next Runner.run() call will re-discover tools
result = await Runner.run(agent, "List all files in /data")

You can also build automatic invalidation into your workflow. A common pattern is to invalidate on a schedule or in response to deployment events:

from datetime import datetime

from agents import Runner

class ManagedMCPServer:
    def __init__(self, server, refresh_interval_seconds=300):
        self.server = server
        self.refresh_interval = refresh_interval_seconds
        self.last_refresh = datetime.now()

    async def maybe_refresh(self):
        elapsed = (datetime.now() - self.last_refresh).total_seconds()
        if elapsed > self.refresh_interval:
            self.server.invalidate_tools_cache()
            self.last_refresh = datetime.now()

    async def run_agent(self, agent, message):
        await self.maybe_refresh()
        return await Runner.run(agent, message)

Another approach is event-driven invalidation. If your MCP servers are deployed via CI/CD, you can send a webhook or message to your agent service whenever a server is redeployed:

from fastapi import FastAPI

app = FastAPI()
servers = {}

@app.post("/webhook/server-deployed")
async def on_server_deployed(server_name: str):
    if server_name in servers:
        servers[server_name].invalidate_tools_cache()
        return {"status": "cache_invalidated", "server": server_name}
    return {"status": "server_not_found"}

Latency Benchmarks

To quantify the impact of caching, here are measurements from a real agent setup with three MCP servers. The environment uses MCPServerStdio for a filesystem server, MCPServerStreamableHTTP for a database server, and another stdio server for a custom tools package.

Without caching (tool discovery on every run):

Server                  Discovery Time
Filesystem (stdio)      420ms
Database (HTTP)         85ms
Custom tools (stdio)    380ms
Total                   885ms

With cache_tools_list=True (after first run):

Server                  Discovery Time
Filesystem (stdio)      <1ms
Database (HTTP)         <1ms
Custom tools (stdio)    <1ms
Total                   <3ms

That is a 99.7% reduction in tool discovery latency. For an agent handling real-time chat, cutting 880 milliseconds from every response cycle is transformative.
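The headline figure is just arithmetic on the two totals:

```python
uncached_ms = 885  # total discovery time without caching
cached_ms = 3      # upper bound on the cached total
reduction = (uncached_ms - cached_ms) / uncached_ms
print(f"Discovery latency reduction: {reduction:.1%}")  # → 99.7%
```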

Benchmarking Your Own Setup

You can measure tool discovery latency in your own environment with a simple timing wrapper:

import time
from agents.mcp import MCPServerStdio

async def benchmark_tool_discovery(server, iterations=10):
    times = []
    for i in range(iterations):
        server.invalidate_tools_cache()
        start = time.perf_counter()
        tools = await server.list_tools()
        elapsed = (time.perf_counter() - start) * 1000
        times.append(elapsed)
        print(f"  Run {i+1}: {elapsed:.1f}ms ({len(tools)} tools)")
    avg = sum(times) / len(times)
    print(f"  Average: {avg:.1f}ms")
    return avg

async def benchmark_cached(server, iterations=10):
    # Prime the cache
    await server.list_tools()
    times = []
    for i in range(iterations):
        start = time.perf_counter()
        tools = await server.list_tools()
        elapsed = (time.perf_counter() - start) * 1000
        times.append(elapsed)
    avg = sum(times) / len(times)
    print(f"  Cached average: {avg:.2f}ms")
    return avg

When Not to Cache

Caching is not always the right choice. Avoid it when:

  • Tools change frequently during development. If you are actively iterating on an MCP server and adding or renaming tools, stale caches will cause confusing errors.
  • The server is short-lived. If each agent run creates and destroys a new server instance, caching provides no benefit because the cache is lost with the instance.
  • Tool availability is dynamic. Some servers expose different tools based on the authenticated user or context. Caching a tool list from one user would be incorrect for another.

For all other cases — and especially in production where server definitions are stable — enabling cache_tools_list=True is one of the simplest and highest-impact performance optimizations available.

Production Recommendations

  1. Always enable caching in production. Set cache_tools_list=True on every MCP server instance that has a stable tool set.
  2. Use long-lived server objects. Create MCP server instances at application startup and reuse them across requests. Do not recreate them per request.
  3. Invalidate on deploy. Wire your CI/CD pipeline to call invalidate_tools_cache() whenever an MCP server is redeployed.
  4. Monitor discovery latency. Log the time spent in tool discovery so you can detect regressions when servers add new tools or infrastructure changes affect subprocess startup.
  5. Set refresh intervals for safety. Even with deploy-triggered invalidation, add a periodic refresh (every five to ten minutes) as a safety net against missed events.
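Recommendation 5 can be wired up as a small background task. A sketch, assuming only that the server object exposes an invalidate_tools_cache() method:

```python
import asyncio

async def periodic_cache_refresh(server, interval_seconds=300):
    """Safety-net refresh: force tool re-discovery every interval_seconds.

    `server` can be any object with an invalidate_tools_cache() method,
    such as a long-lived MCP server instance.
    """
    while True:
        await asyncio.sleep(interval_seconds)
        server.invalidate_tools_cache()

# At application startup (illustrative):
#   refresh_task = asyncio.create_task(periodic_cache_refresh(server, 300))
# On shutdown:
#   refresh_task.cancel()
```

Because invalidation only clears the cache, the cost is paid lazily: the next run after a refresh re-discovers tools, and every run in between stays fast.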

Tool caching is a small configuration change with outsized impact. It eliminates the most common source of unnecessary latency in multi-server MCP agents and is the first optimization you should apply when moving from prototype to production.
