Learn Agentic AI

Building Agents Without Frameworks: When Raw API Calls Beat Abstractions

Learn when and how to build agents using direct LLM API calls instead of frameworks, with a minimal implementation that demonstrates the agent loop, tool calling, and state management from scratch.

The Framework Tax

Every framework adds a layer between your code and the LLM API. That layer provides convenience — tool registration, conversation management, retry logic — but also adds complexity. You inherit the framework's abstractions, opinions, bugs, and update cadence. When something goes wrong, you debug through the framework's stack traces instead of your own code.

For many use cases, the framework tax is worth paying. But for others — especially simple agents, latency-sensitive applications, or systems with unusual requirements — building directly against the LLM API gives you full control with minimal overhead.

The Minimal Agent Loop

An agent is fundamentally a loop: send a message to the LLM, check if it wants to call a tool, execute the tool, send the result back, and repeat until the LLM produces a final response. Here is that loop, tool definitions included, in under 100 lines:

flowchart LR
    USER(["User message"])
    LLM["LLM call<br/>with tool schemas"]
    CHECK{"Tool calls<br/>requested?"}
    EXEC["Execute tool,<br/>append result"]
    DONE(["Final response"])
    USER --> LLM --> CHECK
    CHECK -- yes --> EXEC --> LLM
    CHECK -- no --> DONE
    style LLM fill:#4f46e5,stroke:#4338ca,color:#fff
    style EXEC fill:#f59e0b,stroke:#d97706,color:#1f2937
    style DONE fill:#ede9fe,stroke:#7c3aed,color:#1e1b4b
import json
import openai

client = openai.OpenAI()

# Tool registry: maps function names to callables
TOOLS = {}

def tool(func):
    """Register a function as an agent tool."""
    TOOLS[func.__name__] = func
    return func

@tool
def get_weather(city: str) -> str:
    """Get the current weather for a city."""
    # In production, call a real weather API
    return json.dumps({"city": city, "temp_f": 72, "condition": "sunny"})

@tool
def calculate(expression: str) -> str:
    """Evaluate a mathematical expression."""
    try:
        # Restricted eval: no builtins. Still not safe for untrusted
        # input; prefer ast.literal_eval or a real expression parser.
        result = eval(expression, {"__builtins__": {}})
        return json.dumps({"result": result})
    except Exception as e:
        return json.dumps({"error": str(e)})

# Tool schemas for the API
TOOL_SCHEMAS = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string", "description": "City name"}
                },
                "required": ["city"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "calculate",
            "description": "Evaluate a mathematical expression.",
            "parameters": {
                "type": "object",
                "properties": {
                    "expression": {"type": "string", "description": "Math expression"}
                },
                "required": ["expression"],
            },
        },
    },
]

def run_agent(user_message: str, system_prompt: str = "You are a helpful assistant.") -> str:
    """Run a minimal agent loop."""
    messages = [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_message},
    ]

    max_iterations = 10
    for _ in range(max_iterations):
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=messages,
            tools=TOOL_SCHEMAS,
        )
        choice = response.choices[0]

        # If the model wants to call tools
        if choice.finish_reason == "tool_calls":
            messages.append(choice.message)
            for tool_call in choice.message.tool_calls:
                fn_name = tool_call.function.name
                fn_args = json.loads(tool_call.function.arguments)
                result = TOOLS[fn_name](**fn_args)
                messages.append({
                    "role": "tool",
                    "tool_call_id": tool_call.id,
                    "content": result,
                })
        else:
            # Final response
            return choice.message.content

    return "Agent reached maximum iterations without completing."

# Usage
answer = run_agent("What is the weather in Tokyo, and what is 42 * 17?")
print(answer)

This is a complete, working agent. It handles multi-tool calls in a single turn, loops until the LLM decides it is done, and caps iterations to prevent runaway costs. No framework required.

Adding Streaming

Streaming is straightforward with the raw API:

def run_agent_streaming(user_message: str, system_prompt: str = "You are a helpful assistant."):
    """Run agent with streaming final response."""
    messages = [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_message},
    ]

    max_iterations = 10
    for _ in range(max_iterations):
        # Non-streaming call for tool use turns
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=messages,
            tools=TOOL_SCHEMAS,
        )
        choice = response.choices[0]

        if choice.finish_reason == "tool_calls":
            messages.append(choice.message)
            for tool_call in choice.message.tool_calls:
                fn_name = tool_call.function.name
                fn_args = json.loads(tool_call.function.arguments)
                result = TOOLS[fn_name](**fn_args)
                messages.append({
                    "role": "tool",
                    "tool_call_id": tool_call.id,
                    "content": result,
                })
        else:
            # Re-request the final turn with stream=True. Note this pays
            # for the final generation twice; a leaner version would
            # stream every call and accumulate tool-call deltas instead.
            stream = client.chat.completions.create(
                model="gpt-4o",
                messages=messages,
                stream=True,
            )
            for chunk in stream:
                if chunk.choices[0].delta.content:
                    yield chunk.choices[0].delta.content
            return

    yield "Agent reached maximum iterations."

When Frameworks Are Not Worth It

Simple single-agent tools: If your agent has 2-5 tools and a single conversation loop, the raw API is cleaner than importing a framework.

Latency-critical paths: Frameworks add milliseconds of overhead per turn from abstraction layers, event hooks, and serialization. For sub-second agent responses, every millisecond counts.

Unusual conversation patterns: If your agent loop does not fit the standard "LLM calls tools in a loop" pattern — for example, you need to interleave human approval steps, external event triggers, or custom branching logic — a framework's assumptions may fight you.
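Because you own the tool-dispatch step, interleaving a human approval check is a few lines rather than a fight with framework hooks. A minimal sketch, assuming nothing beyond the tool registry above (`execute_with_approval` and `approver` are illustrative names, not any SDK's API):

```python
import json

# Hedged sketch of a human approval gate spliced into the dispatch step.
# The approver can be a CLI prompt, a web UI, or a ticket queue; here it
# is any callable (name, args) -> bool.
def execute_with_approval(tools, name, args, approver):
    """Run a registered tool only if the approver callback signs off;
    otherwise return a denial message the LLM can see and react to."""
    if not approver(name, args):
        return json.dumps({"approved": False,
                           "reason": f"human rejected call to {name}"})
    return tools[name](**args)

# In the loop, this replaces the bare TOOLS[fn_name](**fn_args) call:
tools = {"delete_record": lambda record_id: json.dumps({"deleted": record_id})}
deny_all = lambda name, args: False
print(execute_with_approval(tools, "delete_record", {"record_id": "r1"}, deny_all))
```

Returning the denial as a tool result, instead of raising, lets the conversation continue: the LLM can explain the refusal to the user or try a different approach.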

Learning and understanding: Building from scratch once teaches you what frameworks actually do. You become a better user of frameworks when you understand the primitives underneath.

When Frameworks Win

Multi-agent orchestration: Handoffs, delegation, and group chat patterns are genuinely complex. Frameworks like the OpenAI Agents SDK and AutoGen save significant effort here.

Observability: Built-in tracing, logging, and debugging tools in frameworks like LangChain (with LangSmith) or the Agents SDK are hard to replicate manually.


Rapid prototyping: When you need to test an idea quickly, frameworks eliminate boilerplate and let you focus on the logic.

Team projects: Frameworks provide conventions that keep a team's code consistent. Without a framework, every developer invents their own agent loop.

FAQ

Is there a performance difference between framework agents and raw API agents?

The LLM API call dominates execution time (hundreds of milliseconds to seconds). Framework overhead is typically 1-10ms per turn — negligible for most applications. The performance argument for raw APIs is strongest in high-throughput scenarios processing thousands of agent runs per second.

How do I handle errors in a framework-free agent?

Add try/except around tool execution and include the error in the tool response message. The LLM will see the error and can retry or adjust its approach. Also add timeout handling on the API call itself and validate tool arguments before execution.
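That advice can be sketched as a drop-in replacement for the bare `TOOLS[fn_name](**fn_args)` call in the loop above (the helper name `safe_execute_tool` is illustrative):

```python
import json

# Defensive tool execution: always return a JSON string, so the LLM
# sees a readable error instead of the agent process crashing.
def safe_execute_tool(tools, name, raw_args):
    if name not in tools:
        return json.dumps({"error": f"unknown tool: {name}"})
    try:
        args = json.loads(raw_args)
    except json.JSONDecodeError as e:
        return json.dumps({"error": f"arguments were not valid JSON: {e}"})
    try:
        return tools[name](**args)
    except TypeError as e:
        # Wrong or missing argument names from the model
        return json.dumps({"error": f"bad arguments: {e}"})
    except Exception as e:
        return json.dumps({"error": str(e)})

tools = {"double": lambda n: json.dumps({"result": n * 2})}
print(safe_execute_tool(tools, "double", '{"n": 21}'))   # happy path
print(safe_execute_tool(tools, "double", '{"m": 21}'))   # bad argument name
```

For the API call itself, the OpenAI Python SDK accepts a `timeout` setting on the client (for example `OpenAI(timeout=30.0)`), which covers the timeout half of the advice.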

Should I build my own framework over time?

Many teams start with raw API calls and gradually extract reusable patterns into an internal library. This is a valid approach — you end up with a framework tailored to your specific needs. The risk is maintaining it as the LLM APIs evolve.
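One of the first patterns teams tend to extract is the duplication between the tool functions and the hand-written `TOOL_SCHEMAS`: a registry that derives the schema from each function's signature and docstring makes the signature the single source of truth. A sketch under that assumption (`ToolRegistry` is a hypothetical internal-library class, not an existing package):

```python
import inspect
import json

_PY_TO_JSON = {str: "string", int: "integer", float: "number", bool: "boolean"}

class ToolRegistry:
    """One decorator that both registers the callable and derives its
    API schema from type hints, defaults, and the docstring."""

    def __init__(self):
        self.tools = {}

    def tool(self, func):
        self.tools[func.__name__] = func
        return func

    def schemas(self):
        out = []
        for name, func in self.tools.items():
            props, required = {}, []
            for pname, param in inspect.signature(func).parameters.items():
                props[pname] = {"type": _PY_TO_JSON.get(param.annotation, "string")}
                if param.default is inspect.Parameter.empty:
                    required.append(pname)  # no default means required
            out.append({
                "type": "function",
                "function": {
                    "name": name,
                    "description": (func.__doc__ or "").strip(),
                    "parameters": {"type": "object", "properties": props,
                                   "required": required},
                },
            })
        return out

registry = ToolRegistry()

@registry.tool
def get_weather(city: str) -> str:
    """Get the current weather for a city."""
    return json.dumps({"city": city, "temp_f": 72})

print(json.dumps(registry.schemas(), indent=2))
```

This is roughly what frameworks do under their `@tool` decorators; owning the twenty-line version means you can extend it (enums, nested objects, validation) exactly as far as your agents need.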


#AgentArchitecture #APIDesign #Python #MinimalAgents #FrameworkFree #AgenticAI #LearnAI #AIEngineering
