
Database Query Tools: Letting AI Agents Read from PostgreSQL and SQLite

Learn how to build safe database query tools that let AI agents retrieve data from PostgreSQL and SQLite. Covers parameterized queries, read-only enforcement, result formatting, and guardrails against destructive operations.

The Case for Database Tools

Most useful AI agents need access to data. Rather than pre-loading everything into the prompt, database query tools let agents fetch exactly the data they need on demand. This keeps context windows lean and allows agents to answer questions about large datasets that would never fit in a single prompt.

The challenge is safety. You are giving an LLM the ability to run SQL against your database. This post shows how to do it with proper guardrails.

Designing the Tool Schema

The schema should guide the LLM to produce well-structured queries:

query_tool_schema = {
    "type": "function",
    "function": {
        "name": "query_database",
        "description": "Execute a read-only SQL query against the application database and return results. Only SELECT statements are allowed. Use this to look up customer data, order history, product information, or aggregate statistics. Always include a LIMIT clause to avoid returning too many rows.",
        "parameters": {
            "type": "object",
            "properties": {
                "sql": {
                    "type": "string",
                    "description": "A SQL SELECT query. Must start with SELECT. Include LIMIT to cap results."
                },
                "params": {
                    "type": "array",
                    "items": {"type": "string"},
                    "description": "Positional parameters for the query, substituted for the $1, $2, ... placeholders."
                }
            },
            "required": ["sql"]
        }
    }
}

Including the params field encourages the LLM to use parameterized queries instead of string interpolation, though you should enforce this server-side too.
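To see how the schema plays out at runtime, here is a minimal dispatcher sketch that unpacks a model tool call into sql and params. The handle_tool_call helper and its echo behavior are illustrative, not part of any SDK; a real agent loop would hand the arguments to the query tool:

```python
import json

def handle_tool_call(name: str, arguments: str) -> str:
    # Hypothetical dispatcher: route the model's tool call to the query tool.
    if name != "query_database":
        return f"Unknown tool: {name}"
    args = json.loads(arguments)
    sql = args["sql"]
    params = args.get("params", [])
    # A real agent loop would call execute_query(sql, params) here;
    # this sketch just echoes what would run.
    return f"Would execute: {sql} | params: {params}"

call_args = '{"sql": "SELECT name FROM customers WHERE id = $1 LIMIT 10", "params": ["42"]}'
print(handle_tool_call("query_database", call_args))
```

Because params arrives as a separate array, the user-supplied value never touches the SQL string on your side either.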

The Safety Layer: Read-Only Enforcement

Never trust that the LLM will only generate SELECT statements. Validate every query before execution:

import re
import sqlparse

FORBIDDEN_KEYWORDS = {
    "INSERT", "UPDATE", "DELETE", "DROP", "ALTER", "CREATE",
    "TRUNCATE", "GRANT", "REVOKE", "EXEC", "EXECUTE",
    "INTO",  # catches SELECT INTO
}

def validate_query(sql: str) -> tuple[bool, str]:
    normalized = sql.strip().upper()

    if not normalized.startswith("SELECT"):
        return False, "Only SELECT queries are allowed"

    parsed = sqlparse.parse(sql)
    if len(parsed) != 1:
        return False, "Only single statements are allowed"

    words = set(re.findall(r'\b[A-Z]+\b', normalized))

    dangerous = words.intersection(FORBIDDEN_KEYWORDS)
    if dangerous:
        return False, f"Forbidden keywords detected: {dangerous}"

    if "LIMIT" not in normalized:
        return False, "Query must include a LIMIT clause"

    return True, "OK"

This is defense in depth. Even if the LLM generates a valid-looking query, you parse it, check for forbidden keywords, enforce single-statement execution, and require a LIMIT clause.
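A quick way to sanity-check the layering is to run representative attacks through the validator. The sketch below is a simplified, dependency-free stand-in for validate_query (a semicolon check replaces sqlparse's statement count, so the behavior is indicative rather than identical):

```python
import re

FORBIDDEN = {"INSERT", "UPDATE", "DELETE", "DROP", "ALTER", "CREATE",
             "TRUNCATE", "GRANT", "REVOKE", "EXEC", "EXECUTE", "INTO"}

def validate_simple(sql: str) -> tuple[bool, str]:
    # Simplified stand-in for validate_query: no sqlparse, a semicolon
    # test substitutes for the single-statement check.
    normalized = sql.strip().rstrip(";").upper()
    if not normalized.startswith("SELECT"):
        return False, "Only SELECT queries are allowed"
    if ";" in normalized:
        return False, "Only single statements are allowed"
    if FORBIDDEN & set(re.findall(r"\b[A-Z]+\b", normalized)):
        return False, "Forbidden keywords detected"
    if "LIMIT" not in normalized:
        return False, "Query must include a LIMIT clause"
    return True, "OK"

cases = [
    "SELECT id FROM orders LIMIT 10",                  # legitimate
    "SELECT * FROM users LIMIT 5; DROP TABLE users",   # stacked statements
    "DELETE FROM orders WHERE id = 1",                 # write attempt
    "SELECT id FROM orders",                           # missing LIMIT
]
results = {sql: validate_simple(sql)[0] for sql in cases}
```

Only the first query passes; each of the others trips a different layer, which is the point of stacking the checks.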

PostgreSQL Implementation

For PostgreSQL, use asyncpg with a read-only connection:

import asyncpg
import json

class PostgreSQLTool:
    def __init__(self, dsn: str):
        self.dsn = dsn
        self.pool = None

    async def connect(self):
        self.pool = await asyncpg.create_pool(
            self.dsn,
            min_size=2,
            max_size=10,
            command_timeout=10,
            # Applied to every connection the pool creates, so the whole
            # pool is read-only, not just one acquired connection.
            server_settings={"default_transaction_read_only": "on"},
        )

    async def execute_query(self, sql: str, params: list = None) -> str:
        is_valid, message = validate_query(sql)
        if not is_valid:
            return f"Query rejected: {message}"

        try:
            async with self.pool.acquire() as conn:
                await conn.execute("SET statement_timeout = '5s'")
                rows = await conn.fetch(sql, *(params or []))

                results = [dict(row) for row in rows]
                return json.dumps(results, default=str, indent=2)
        except asyncpg.PostgresError as e:
            return f"Database error: {str(e)}"

Key safety measures here: a connection pool with a maximum size to prevent resource exhaustion, a 5-second statement timeout to kill runaway queries, and read-only transaction mode applied to every pooled connection.

SQLite Implementation

SQLite is simpler but needs the same guardrails. One difference: Python's sqlite3 driver binds positional parameters with ? placeholders rather than PostgreSQL's $1, $2, so adjust the tool description accordingly if you target SQLite:

import sqlite3
import json

class SQLiteTool:
    def __init__(self, db_path: str):
        self.db_path = db_path

    def execute_query(self, sql: str, params: list = None) -> str:
        is_valid, message = validate_query(sql)
        if not is_valid:
            return f"Query rejected: {message}"

        try:
            conn = sqlite3.connect(
                f"file:{self.db_path}?mode=ro",
                uri=True,
                timeout=5,
            )
            try:
                conn.row_factory = sqlite3.Row
                cursor = conn.execute(sql, params or [])
                rows = [dict(row) for row in cursor.fetchall()]
            finally:
                # Close even when the query raises, so connections never leak
                conn.close()

            return json.dumps(rows, default=str, indent=2)
        except sqlite3.Error as e:
            return f"Database error: {str(e)}"

The mode=ro URI parameter opens the database in true read-only mode: SQLite itself rejects every write, so even a query that slips past validation cannot modify data.

Formatting Results for the LLM

Raw JSON works, but you can help the LLM interpret results by adding metadata:

def format_results(rows: list) -> str:
    if not rows:
        return "No results found for the query."

    output = f"Query returned {len(rows)} row(s).\n\n"
    columns = list(rows[0].keys())
    output += "Columns: " + ", ".join(columns) + "\n\n"
    output += json.dumps(rows, default=str, indent=2)

    if len(rows) >= 100:
        output += "\n\nNote: Results may be truncated. Consider adding more specific WHERE conditions."

    return output
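For example, fed a two-row result set, the formatter produces a row count, a column list, and the JSON payload. The function is restated here (sample data invented) so the snippet runs on its own:

```python
import json

def format_results(rows: list) -> str:
    # Same formatter as above, restated so this snippet runs standalone.
    if not rows:
        return "No results found for the query."
    output = f"Query returned {len(rows)} row(s).\n\n"
    output += "Columns: " + ", ".join(rows[0].keys()) + "\n\n"
    output += json.dumps(rows, default=str, indent=2)
    if len(rows) >= 100:
        output += "\n\nNote: Results may be truncated."
    return output

sample = [{"id": 7, "status": "shipped"}, {"id": 8, "status": "pending"}]
print(format_results(sample))
```

The upfront row count and column list let the model orient itself before reading the JSON body.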

Providing Schema Context

The LLM needs to know your database structure to write correct queries. Include the schema in the system prompt or provide a separate tool:

schema_tool_schema = {
    "type": "function",
    "function": {
        "name": "get_database_schema",
        "description": "Returns the database table names, column names, and column types. Call this before writing a query if you are unsure about the table structure.",
        "parameters": {
            "type": "object",
            "properties": {},
            "required": []
        }
    }
}

This lets the agent discover the schema on demand rather than stuffing it all into the system prompt upfront.
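For SQLite, the handler behind get_database_schema can lean on the built-in introspection pragmas; a minimal sketch, with an illustrative table:

```python
import sqlite3

def get_database_schema(conn: sqlite3.Connection) -> str:
    # Describe every table via sqlite_master and PRAGMA table_info.
    tables = [r[0] for r in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    lines = []
    for table in tables:
        cols = conn.execute(f"PRAGMA table_info({table})").fetchall()
        # Each table_info row is (cid, name, type, notnull, default, pk).
        col_desc = ", ".join(f"{c[1]} {c[2]}" for c in cols)
        lines.append(f"{table}({col_desc})")
    return "\n".join(lines)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, total REAL, placed_at TEXT)")
schema = get_database_schema(conn)
print(schema)
```

For PostgreSQL the equivalent query reads from information_schema.columns; filter it to the tables the agent is allowed to see.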

FAQ

Should I let the LLM write raw SQL or use an abstraction layer?

Raw SQL with guardrails is the most flexible approach and what most production systems use. Abstraction layers like predefined query templates are safer but limit the agent to only the queries you anticipated. Start with raw SQL plus validation, and add templates for common queries as you identify them.

How do I prevent the agent from reading sensitive tables?

Create a dedicated database user with SELECT permissions only on approved tables. This is enforced at the database level, which is more reliable than keyword filtering. Additionally, use a schema tool that only exposes the tables the agent is allowed to query.

What if the query returns thousands of rows?

Enforce a maximum LIMIT in your validation layer (e.g., cap at 100 rows regardless of what the LLM requests). Truncate results before returning them to the LLM and include a note that results were truncated. Large result sets waste context tokens and rarely improve the agent's answer.
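A sketch of that cap (enforce_limit is a hypothetical helper; it assumes at most one plain-integer LIMIT clause, which covers most LLM-generated queries):

```python
import re

MAX_ROWS = 100

def enforce_limit(sql: str, max_rows: int = MAX_ROWS) -> str:
    # Clamp an oversized LIMIT, or append one if it is missing entirely.
    match = re.search(r"\bLIMIT\s+(\d+)\b", sql, re.IGNORECASE)
    if match is None:
        return f"{sql.rstrip().rstrip(';')} LIMIT {max_rows}"
    if int(match.group(1)) > max_rows:
        return sql[:match.start(1)] + str(max_rows) + sql[match.end(1):]
    return sql

print(enforce_limit("SELECT * FROM orders LIMIT 5000"))
```

Run it after validate_query and before execution, so even a compliant-looking query cannot flood the context window.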


#Database #PostgreSQL #SQLite #ToolDesign #AIAgents #AgenticAI #LearnAI #AIEngineering
