Learn Agentic AI
11 min read

Custom Model Providers with OpenAI Agents SDK: Using Any LLM as Your Agent Brain

Learn how to implement the Model protocol in OpenAI Agents SDK to connect any LLM — Anthropic Claude, local Ollama models, or custom endpoints — as your agent's reasoning engine with full tool-calling support.

Why Custom Model Providers Matter

The OpenAI Agents SDK ships with built-in support for OpenAI models, but production teams rarely use a single LLM vendor. You might need Claude for nuanced reasoning, a local Llama model for cost-sensitive tasks, or a fine-tuned endpoint for domain-specific work. The SDK's Model protocol lets you swap in any LLM without changing your agent logic.

This decoupling is the key architectural insight: your agent's behavior (instructions, tools, handoffs) stays the same regardless of which model powers the reasoning.

Understanding the Model Protocol

The SDK defines a Model protocol that any custom provider must implement. At its core, you need to provide a single method — get_response — that accepts the agent's conversation history and returns a structured response.

flowchart LR
    INPUT(["User input"])
    AGENT["Agent<br/>name plus instructions"]
    HAND{"Handoff to<br/>another agent?"}
    SUB["Sub-agent<br/>specialist"]
    GUARD{"Guardrail<br/>passed?"}
    TOOL["Tool call"]
    SDK[("Tracing<br/>OpenAI dashboard")]
    OUT(["Final output"])
    INPUT --> AGENT --> HAND
    HAND -->|Yes| SUB --> GUARD
    HAND -->|No| GUARD
    GUARD -->|Yes| TOOL --> AGENT
    GUARD -->|Block| OUT
    AGENT --> OUT
    AGENT --> SDK
    style AGENT fill:#4f46e5,stroke:#4338ca,color:#fff
    style GUARD fill:#f59e0b,stroke:#d97706,color:#1f2937
    style SDK fill:#ede9fe,stroke:#7c3aed,color:#1e1b4b
    style OUT fill:#059669,stroke:#047857,color:#fff
Here's a skeleton implementation of the protocol for Anthropic Claude:

from __future__ import annotations

from dataclasses import dataclass
from typing import Any

import anthropic

from agents import Agent, Runner, Model, ModelProvider
from agents.items import TResponseInputItem, TResponseOutputItem, ModelResponse
from agents.models import ModelUsage

class AnthropicModel(Model):
    """Custom model that routes agent calls to Anthropic Claude."""

    def __init__(self, model_name: str = "claude-sonnet-4-20250514"):
        self.model_name = model_name
        self.client = anthropic.AsyncAnthropic()

    async def get_response(
        self,
        system_instructions: str | None,
        input: list[TResponseInputItem],
        model_settings: Any,
        tools: list,
        output_schema: Any | None,
        handoffs: list,
        tracing: Any,
    ) -> ModelResponse:
        # Convert SDK messages to Anthropic format
        messages = self._convert_messages(input)

        response = await self.client.messages.create(
            model=self.model_name,
            # Explicit None checks so a caller-supplied 0 is not silently replaced
            max_tokens=model_settings.max_tokens if model_settings.max_tokens is not None else 4096,
            system=system_instructions or "",
            messages=messages,
            temperature=model_settings.temperature if model_settings.temperature is not None else 0.7,
        )

        return self._convert_response(response)

    def _convert_messages(self, input_items):
        """Transform SDK input items to Anthropic message format."""
        messages = []
        for item in input_items:
            if hasattr(item, "role") and hasattr(item, "content"):
                messages.append({
                    "role": item.role if item.role != "system" else "user",
                    "content": item.content,
                })
        return messages if messages else [{"role": "user", "content": "Hello"}]

    def _convert_response(self, response):
        """Transform Anthropic response back to SDK format."""
        # Collect the text from Claude's content blocks
        output_text = ""
        for block in response.content:
            if block.type == "text":
                output_text += block.text

        return ModelResponse(
            # Simplified — a full implementation wraps output_text (and any
            # tool-use blocks) in the SDK's output item types
            output=[],
            usage=ModelUsage(
                input_tokens=response.usage.input_tokens,
                output_tokens=response.usage.output_tokens,
                requests=1,
            ),
            response_id=response.id,
        )

Building a Custom Model Provider

A ModelProvider maps model name strings to Model instances. This lets you register multiple backends under a single provider.

class MultiModelProvider(ModelProvider):
    """Routes model names to different LLM backends."""

    def __init__(self):
        self._models: dict[str, Model] = {}

    def register(self, name: str, model: Model):
        self._models[name] = model

    def get_model(self, model_name: str | None) -> Model:
        if model_name and model_name in self._models:
            return self._models[model_name]
        raise ValueError(f"Unknown model: {model_name}")

# Register providers
provider = MultiModelProvider()
provider.register("claude-sonnet", AnthropicModel("claude-sonnet-4-20250514"))
provider.register("claude-haiku", AnthropicModel("claude-haiku-4-20250514"))
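Raising on unknown names is deliberately strict; a common variant falls back to a default backend instead. A standalone sketch of that pattern (strings stand in for Model instances, and the class name is illustrative, not part of the SDK):

```python
class FallbackRegistry:
    """Name-to-backend registry like MultiModelProvider above,
    but with an optional default instead of always raising."""

    def __init__(self, default=None):
        self._models = {}
        self._default = default  # used when a name is not registered

    def register(self, name, model):
        self._models[name] = model

    def get_model(self, name):
        if name in self._models:
            return self._models[name]
        if self._default is not None:
            return self._default
        raise ValueError(f"Unknown model: {name}")

registry = FallbackRegistry(default="local-llama")
registry.register("claude-sonnet", "anthropic-backend")
print(registry.get_model("claude-sonnet"))  # anthropic-backend
print(registry.get_model("gpt-4o"))         # local-llama (fallback)
```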

Connecting a Local Ollama Model

For local inference, you can implement a provider that calls Ollama's HTTP API.

import httpx

class OllamaModel(Model):
    def __init__(self, model_name: str = "llama3", base_url: str = "http://localhost:11434"):
        self.model_name = model_name
        self.base_url = base_url
        self.client = httpx.AsyncClient(timeout=120.0)

    async def get_response(self, system_instructions, input, model_settings, tools, output_schema, handoffs, tracing):
        messages = []
        if system_instructions:
            messages.append({"role": "system", "content": system_instructions})
        for item in input:
            if hasattr(item, "role"):
                messages.append({"role": item.role, "content": item.content})

        resp = await self.client.post(
            f"{self.base_url}/api/chat",
            json={"model": self.model_name, "messages": messages, "stream": False},
        )
        resp.raise_for_status()
        data = resp.json()
        return self._build_response(data)

    def _build_response(self, data):
        """Map Ollama's /api/chat JSON into the SDK response shape."""
        # The non-streaming endpoint returns the assistant turn under
        # "message" and token counts under "prompt_eval_count"/"eval_count"
        return ModelResponse(
            output=[],  # Simplified — wrap data["message"]["content"] in proper output items
            usage=ModelUsage(
                input_tokens=data.get("prompt_eval_count", 0),
                output_tokens=data.get("eval_count", 0),
                requests=1,
            ),
            response_id=None,  # Ollama does not return a response id
        )

Wiring It Into Your Agent

Once your provider is ready, pass it when creating an agent.

import asyncio

from agents import RunConfig

agent = Agent(
    name="research_assistant",
    instructions="You are a helpful research assistant.",
    model="claude-sonnet",  # This name is resolved by the provider
)

async def main():
    result = await Runner.run(
        agent,
        input="Summarize the latest advances in quantum computing.",
        run_config=RunConfig(model_provider=provider),
    )
    print(result.final_output)

asyncio.run(main())

The agent code has zero awareness of which vendor is running under the hood. Switching from Claude to a local Llama model is a one-line configuration change.

When to Use Custom Providers

Custom model providers solve real production problems: cost optimization by routing simple tasks to cheaper models, compliance by keeping sensitive data on local models, redundancy by failing over between vendors, and specialization by directing domain tasks to fine-tuned endpoints.
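The cost-optimization case reduces to choosing a registered model name before the run. A toy routing heuristic — the 200-character threshold and tier names are illustrative assumptions, not SDK features:

```python
def pick_model(prompt: str) -> str:
    """Route short, simple prompts to the cheaper tier."""
    # Crude proxy for task complexity; swap in a classifier in practice
    return "claude-haiku" if len(prompt) < 200 else "claude-sonnet"

print(pick_model("What's 2 + 2?"))                        # claude-haiku
print(pick_model("Analyze this contract: " + "x" * 300))  # claude-sonnet
```

The returned name is then passed as the agent's model, and the provider resolves it to the matching backend.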

FAQ

Can I use tool calling with custom model providers?

Yes, but your custom Model implementation must convert the SDK's tool definitions into whatever format your target LLM expects. For Anthropic, this means transforming the JSON schema into Claude's tool format. For local models without native tool calling, you can inject tool descriptions into the system prompt and parse the output yourself.
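A sketch of that conversion, treating each SDK tool as a dict with a name, description, and JSON-schema parameters (the key names mirror FunctionTool's fields — verify them against your SDK version); Anthropic expects the same schema under the input_schema key:

```python
def to_anthropic_tools(tools: list[dict]) -> list[dict]:
    """Translate SDK-style tool definitions into Anthropic's tool format."""
    return [
        {
            "name": t["name"],
            "description": t["description"],
            "input_schema": t["params_json_schema"],  # same JSON schema, new key
        }
        for t in tools
    ]

sdk_tool = {
    "name": "get_weather",
    "description": "Look up current weather for a city",
    "params_json_schema": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}
print(to_anthropic_tools([sdk_tool])[0]["input_schema"]["required"])  # ['city']
```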

Does streaming work with custom providers?

The SDK supports a get_stream_response method alongside get_response. Implement this method to return an async iterator of chunks. If you skip it, the SDK falls back to the non-streaming path, which still works but returns the full response at once.
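The shape of such a method, sketched without a real backend — the event dicts here are illustrative, since the SDK defines its own streaming event types:

```python
import asyncio

async def stream_sketch(parts):
    """Async generator yielding delta events, then a completion marker."""
    for part in parts:              # stand-in for vendor stream chunks
        await asyncio.sleep(0)      # yield control, as a real client would
        yield {"type": "output_text.delta", "delta": part}
    yield {"type": "completed"}

async def collect():
    return [event async for event in stream_sketch(["Hel", "lo"])]

events = asyncio.run(collect())
text = "".join(e["delta"] for e in events if e["type"] == "output_text.delta")
print(text)  # Hello
```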

How do I handle authentication for multiple providers?

Each Model instance manages its own authentication. Store API keys in environment variables and read them in each model's constructor. Avoid passing keys through the agent layer — the model provider encapsulates all vendor-specific details.
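A minimal version of that pattern — the environment-variable name follows Anthropic's convention, and failing fast at construction keeps misconfiguration obvious:

```python
import os

class AuthedBackend:
    """Reads its own credential at construction; the agent layer never sees it."""

    def __init__(self, env_var: str = "ANTHROPIC_API_KEY"):
        key = os.environ.get(env_var)
        if not key:
            raise RuntimeError(f"{env_var} is not set")
        self._api_key = key  # hand to the vendor client, never to the agent

os.environ["ANTHROPIC_API_KEY"] = "sk-test-placeholder"  # demo only
backend = AuthedBackend()
```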


#OpenAIAgentsSDK #CustomModelProvider #LLMIntegration #Anthropic #Ollama #Python #AgenticAI #LearnAI #AIEngineering
