Skip to content
Learn Agentic AI
Learn Agentic AI14 min read2 views

Building a Data Analysis Agent: Natural Language to SQL and Visualizations

Learn how to build an AI agent that converts natural language questions into SQL queries, executes them against a database, generates charts from the results, and provides plain-English interpretations of the data.

Why Data Analysis Needs an Agent Layer

Most organizations store critical business data in SQL databases, but only a fraction of employees know how to write SQL. A data analysis agent bridges this gap by accepting natural language questions like "What were our top 5 products by revenue last quarter?" and returning both the answer and a visualization — no SQL knowledge required.

This is not just a text-to-SQL translator. A proper data analysis agent forms a loop: it understands the schema, generates a query, executes it, checks for errors, builds a chart if appropriate, and explains the results in plain language. Each of these steps requires a different tool, and the agent orchestrates them autonomously.

Architecture Overview

The agent needs four core tools:

flowchart LR
    INPUT(["User intent"])
    PARSE["Parse plus<br/>classify"]
    PLAN["Plan and tool<br/>selection"]
    AGENT["Agent loop<br/>LLM plus tools"]
    GUARD{"Guardrails<br/>and policy"}
    EXEC["Execute and<br/>verify result"]
    OBS[("Trace and metrics")]
    OUT(["Outcome plus<br/>next action"])
    INPUT --> PARSE --> PLAN --> AGENT --> GUARD
    GUARD -->|Pass| EXEC --> OUT
    GUARD -->|Fail| AGENT
    AGENT --> OBS
    style AGENT fill:#4f46e5,stroke:#4338ca,color:#fff
    style GUARD fill:#f59e0b,stroke:#d97706,color:#1f2937
    style OBS fill:#ede9fe,stroke:#7c3aed,color:#1e1b4b
    style OUT fill:#059669,stroke:#047857,color:#fff
  1. Schema inspector — retrieves table names, columns, and types from the database
  2. SQL executor — runs a generated query and returns rows
  3. Chart generator — creates visualizations from tabular results
  4. Interpreter — produces a natural language summary of the data

Here is the foundational setup:

import sqlite3
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import io
import base64
from agents import Agent, Runner, function_tool

DB_PATH = "sales.db"

def get_connection():
    conn = sqlite3.connect(DB_PATH)
    conn.row_factory = sqlite3.Row
    return conn

Tool 1: Schema Inspector

The agent must understand the database structure before writing SQL. This tool returns every table with its columns and types:

Hear it before you finish reading

Talk to a live CallSphere AI voice agent in your browser — 60 seconds, no signup.

Try Live Demo →
@function_tool
def inspect_schema() -> str:
    """Return all table names, column names, and column types in the database."""
    conn = get_connection()
    cursor = conn.cursor()
    cursor.execute("SELECT name FROM sqlite_master WHERE type='table'")
    tables = [row["name"] for row in cursor.fetchall()]

    schema_parts = []
    for table in tables:
        cursor.execute(f"PRAGMA table_info({table})")
        cols = cursor.fetchall()
        col_defs = [f"  {c['name']} ({c['type']})" for c in cols]
        schema_parts.append(f"Table: {table}\n" + "\n".join(col_defs))

    conn.close()
    return "\n\n".join(schema_parts)

Tool 2: SQL Executor with Safety Checks

Never let an LLM run arbitrary write operations on your database. The executor validates that the query is read-only before running it:

@function_tool
def execute_sql(query: str) -> str:
    """Execute a read-only SQL query and return up to 50 rows as text."""
    normalized = query.strip().upper()
    if not normalized.startswith("SELECT"):
        return "Error: Only SELECT queries are allowed."

    conn = get_connection()
    try:
        cursor = conn.cursor()
        cursor.execute(query)
        rows = cursor.fetchmany(50)
        if not rows:
            return "Query returned 0 rows."
        headers = rows[0].keys()
        lines = ["\t".join(headers)]
        for row in rows:
            lines.append("\t".join(str(row[h]) for h in headers))
        return "\n".join(lines)
    except Exception as e:
        return f"SQL Error: {e}"
    finally:
        conn.close()

Tool 3: Chart Generator

When the data suits a visual representation, the agent can produce a bar chart, line chart, or pie chart:

@function_tool
def generate_chart(
    chart_type: str, labels: list[str], values: list[float], title: str
) -> str:
    """Create a chart and return it as a base64-encoded PNG image.
    chart_type must be one of: bar, line, pie."""
    fig, ax = plt.subplots(figsize=(8, 5))
    if chart_type == "bar":
        ax.bar(labels, values)
    elif chart_type == "line":
        ax.plot(labels, values, marker="o")
    elif chart_type == "pie":
        ax.pie(values, labels=labels, autopct="%1.1f%%")
    else:
        return "Error: Unsupported chart type."

    ax.set_title(title)
    fig.tight_layout()

    buf = io.BytesIO()
    fig.savefig(buf, format="png")
    plt.close(fig)
    buf.seek(0)
    return base64.b64encode(buf.read()).decode()

Assembling the Agent

Wire the tools together and give the agent clear instructions about its workflow:

data_analyst = Agent(
    name="Data Analyst",
    instructions="""You are a data analysis agent. When the user asks a question:
1. Call inspect_schema to understand the database structure.
2. Write a SQL query to answer the question. Use execute_sql to run it.
3. If the query fails, read the error and fix the SQL.
4. If the results suit a chart, call generate_chart.
5. Always end with a plain-English interpretation of the findings.""",
    tools=[inspect_schema, execute_sql, generate_chart],
)

result = Runner.run_sync(
    data_analyst, "Which product category had the highest revenue last month?"
)
print(result.final_output)

The agent loop handles the rest. It inspects the schema, discovers the orders and products tables, writes a JOIN with a date filter, executes the query, generates a bar chart, and summarizes: "Electronics led with $42,300 in revenue last month, followed by Apparel at $31,800."

Production Hardening Tips

Query cost limits. Wrap the executor with a timeout and a row cap so a poorly written query cannot lock the database or return millions of rows.

Still reading? Stop comparing — try CallSphere live.

CallSphere ships complete AI voice agents per industry — 14 tools for healthcare, 10 agents for real estate, 4 specialists for salons. See how it actually handles a call before you book a demo.

Schema caching. Call inspect_schema once per session and inject the result into the agent instructions rather than calling the tool on every question.

Parameterized queries. For databases with user-supplied filter values, extend the executor to accept parameters and use parameterized queries to prevent SQL injection.

FAQ

Can this agent handle JOINs across multiple tables?

Yes. By providing the full schema — including foreign key relationships — the LLM reliably generates multi-table JOINs. Include sample rows in the schema description if the column names are ambiguous.

How do I prevent the agent from running destructive queries?

The executor shown above rejects any query that does not start with SELECT. For stronger guarantees, connect with a read-only database user that has no INSERT, UPDATE, or DELETE privileges at the database level.

What if the generated SQL is incorrect?

The agent loop naturally handles this. When execute_sql returns an error message, the LLM reads it, identifies the issue (wrong column name, missing GROUP BY), and generates a corrected query on the next iteration.


#DataAnalysis #TexttoSQL #Visualization #Python #AIAgents #AgenticAI #LearnAI #AIEngineering

Share

Try CallSphere AI Voice Agents

See how AI voice agents work for your industry. Live demo available -- no signup required.

Related Articles You May Like

Agentic AI

Multi-Agent Handoffs with the OpenAI Agents SDK: The Pattern That Actually Scales (2026)

Handoffs done right — when one agent should hand control to another, how to preserve context, and how to evaluate the handoff decision itself.

AI Strategy

AI Agent M&A Activity 2026: Aircall–Vogent, Meta–PlayAI, OpenAI's Six Deals

Q1 2026 saw a record acquisition wave: Aircall bought Vogent (May), Meta acquired Manus and PlayAI, OpenAI closed six deals. The voice AI consolidation phase has begun.

Agentic AI

Building Your First Agent with the OpenAI Agents SDK in 2026: A Hands-On Walkthrough

Step-by-step build of a working agent with the OpenAI Agents SDK — Agent class, tools, handoffs, tracing — plus an eval pipeline that catches regressions before merge.

Agentic AI

LangGraph State-Machine Architecture: A Principal-Engineer Deep Dive (2026)

How LangGraph's StateGraph, channels, and reducers actually work — with a working multi-step agent, eval hooks at every node, and the patterns that survive production.

Agentic AI

LangGraph Checkpointers in Production: Durable, Resumable Agents with Eval Replay

Use LangGraph's checkpointer to make agents resumable across crashes and human-in-the-loop pauses, then replay any checkpoint into your eval pipeline.

Agentic AI

LangGraph Supervisor Pattern: Orchestrating Multi-Agent Teams in 2026

The supervisor pattern in LangGraph for coordinating specialist agents, with full code, an eval pipeline that scores routing accuracy, and the failure modes to watch for.