Building a Data Analysis Agent: Natural Language to SQL and Visualizations

Why Data Analysis Needs an Agent Layer

Most organizations store critical business data in SQL databases, but only a fraction of employees know how to write SQL. A data analysis agent bridges this gap by accepting natural language questions like "What were our top 5 products by revenue last quarter?" and returning both the answer and a visualization — no SQL knowledge required.

This is not just a text-to-SQL translator. A proper data analysis agent forms a loop: it understands the schema, generates a query, executes it, checks for errors, builds a chart if appropriate, and explains the results in plain language. Each of these steps requires a different tool, and the agent orchestrates them autonomously.

Architecture Overview

The agent needs four core tools:

flowchart LR
    INPUT(["User intent"])
    PARSE["Parse plus<br/>classify"]
    PLAN["Plan and tool<br/>selection"]
    AGENT["Agent loop<br/>LLM plus tools"]
    GUARD{"Guardrails<br/>and policy"}
    EXEC["Execute and<br/>verify result"]
    OBS[("Trace and metrics")]
    OUT(["Outcome plus<br/>next action"])
    INPUT --> PARSE --> PLAN --> AGENT --> GUARD
    GUARD -->|Pass| EXEC --> OUT
    GUARD -->|Fail| AGENT
    AGENT --> OBS
    style AGENT fill:#4f46e5,stroke:#4338ca,color:#fff
    style GUARD fill:#f59e0b,stroke:#d97706,color:#1f2937
    style OBS fill:#ede9fe,stroke:#7c3aed,color:#1e1b4b
    style OUT fill:#059669,stroke:#047857,color:#fff

Schema inspector — retrieves table names, columns, and types from the database
SQL executor — runs a generated query and returns rows
Chart generator — creates visualizations from tabular results
Interpreter — produces a natural language summary of the data

Here is the foundational setup:

import sqlite3
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import io
import base64
from agents import Agent, Runner, function_tool

DB_PATH = "sales.db"

def get_connection():
    conn = sqlite3.connect(DB_PATH)
    conn.row_factory = sqlite3.Row
    return conn

Tool 1: Schema Inspector

The agent must understand the database structure before writing SQL. This tool returns every table with its columns and types:

Hear it before you finish reading

Talk to a live CallSphere AI voice agent in your browser — 60 seconds, no signup.

Try Live →

Try Live Demo →

@function_tool
def inspect_schema() -> str:
    """Return all table names, column names, and column types in the database."""
    conn = get_connection()
    cursor = conn.cursor()
    cursor.execute("SELECT name FROM sqlite_master WHERE type='table'")
    tables = [row["name"] for row in cursor.fetchall()]

    schema_parts = []
    for table in tables:
        cursor.execute(f"PRAGMA table_info({table})")
        cols = cursor.fetchall()
        col_defs = [f"  {c['name']} ({c['type']})" for c in cols]
        schema_parts.append(f"Table: {table}\n" + "\n".join(col_defs))

    conn.close()
    return "\n\n".join(schema_parts)

Tool 2: SQL Executor with Safety Checks

Never let an LLM run arbitrary write operations on your database. The executor validates that the query is read-only before running it:

@function_tool
def execute_sql(query: str) -> str:
    """Execute a read-only SQL query and return up to 50 rows as text."""
    normalized = query.strip().upper()
    if not normalized.startswith("SELECT"):
        return "Error: Only SELECT queries are allowed."

    conn = get_connection()
    try:
        cursor = conn.cursor()
        cursor.execute(query)
        rows = cursor.fetchmany(50)
        if not rows:
            return "Query returned 0 rows."
        headers = rows[0].keys()
        lines = ["\t".join(headers)]
        for row in rows:
            lines.append("\t".join(str(row[h]) for h in headers))
        return "\n".join(lines)
    except Exception as e:
        return f"SQL Error: {e}"
    finally:
        conn.close()

Tool 3: Chart Generator

When the data suits a visual representation, the agent can produce a bar chart, line chart, or pie chart:

@function_tool
def generate_chart(
    chart_type: str, labels: list[str], values: list[float], title: str
) -> str:
    """Create a chart and return it as a base64-encoded PNG image.
    chart_type must be one of: bar, line, pie."""
    fig, ax = plt.subplots(figsize=(8, 5))
    if chart_type == "bar":
        ax.bar(labels, values)
    elif chart_type == "line":
        ax.plot(labels, values, marker="o")
    elif chart_type == "pie":
        ax.pie(values, labels=labels, autopct="%1.1f%%")
    else:
        return "Error: Unsupported chart type."

    ax.set_title(title)
    fig.tight_layout()

    buf = io.BytesIO()
    fig.savefig(buf, format="png")
    plt.close(fig)
    buf.seek(0)
    return base64.b64encode(buf.read()).decode()

Assembling the Agent

Wire the tools together and give the agent clear instructions about its workflow:

data_analyst = Agent(
    name="Data Analyst",
    instructions="""You are a data analysis agent. When the user asks a question:
1. Call inspect_schema to understand the database structure.
2. Write a SQL query to answer the question. Use execute_sql to run it.
3. If the query fails, read the error and fix the SQL.
4. If the results suit a chart, call generate_chart.
5. Always end with a plain-English interpretation of the findings.""",
    tools=[inspect_schema, execute_sql, generate_chart],
)

result = Runner.run_sync(
    data_analyst, "Which product category had the highest revenue last month?"
)
print(result.final_output)

The agent loop handles the rest. It inspects the schema, discovers the orders and products tables, writes a JOIN with a date filter, executes the query, generates a bar chart, and summarizes: "Electronics led with $42,300 in revenue last month, followed by Apparel at $31,800."

Production Hardening Tips

Query cost limits. Wrap the executor with a timeout and a row cap so a poorly written query cannot lock the database or return millions of rows.

Still reading? Stop comparing — try CallSphere live.

CallSphere ships complete AI voice agents per industry — 14 tools for healthcare, 10 agents for real estate, 4 specialists for salons. See how it actually handles a call before you book a demo.

Try Live Demo → Book 30-min Walkthrough See Pricing

Schema caching. Call inspect_schema once per session and inject the result into the agent instructions rather than calling the tool on every question.

Parameterized queries. For databases with user-supplied filter values, extend the executor to accept parameters and use parameterized queries to prevent SQL injection.

FAQ

Can this agent handle JOINs across multiple tables?

Yes. By providing the full schema — including foreign key relationships — the LLM reliably generates multi-table JOINs. Include sample rows in the schema description if the column names are ambiguous.

How do I prevent the agent from running destructive queries?

The executor shown above rejects any query that does not start with SELECT. For stronger guarantees, connect with a read-only database user that has no INSERT, UPDATE, or DELETE privileges at the database level.

What if the generated SQL is incorrect?

The agent loop naturally handles this. When execute_sql returns an error message, the LLM reads it, identifies the issue (wrong column name, missing GROUP BY), and generates a corrected query on the next iteration.

#DataAnalysis #TexttoSQL #Visualization #Python #AIAgents #AgenticAI #LearnAI #AIEngineering

Building a Data Analysis Agent: Natural Language to SQL and Visualizations

Why Data Analysis Needs an Agent Layer

Architecture Overview

Tool 1: Schema Inspector

Tool 2: SQL Executor with Safety Checks

Tool 3: Chart Generator

Assembling the Agent

Production Hardening Tips

FAQ

Can this agent handle JOINs across multiple tables?

How do I prevent the agent from running destructive queries?

What if the generated SQL is incorrect?

Try CallSphere AI Voice Agents

Related Articles You May Like

Multi-Agent Handoffs with the OpenAI Agents SDK: The Pattern That Actually Scales (2026)

AI Agent M&A Activity 2026: Aircall–Vogent, Meta–PlayAI, OpenAI's Six Deals

Building Your First Agent with the OpenAI Agents SDK in 2026: A Hands-On Walkthrough

LangGraph State-Machine Architecture: A Principal-Engineer Deep Dive (2026)

LangGraph Checkpointers in Production: Durable, Resumable Agents with Eval Replay

LangGraph Supervisor Pattern: Orchestrating Multi-Agent Teams in 2026