# Building a Survey Analysis Agent: AI-Powered Qualitative Data Processing
Build an AI agent that processes survey responses at scale — categorizing open-ended answers, detecting sentiment, extracting recurring themes, and generating executive-ready reports with statistical backing.
## The Qualitative Data Problem
Quantitative survey data (ratings, multiple choice) is easy to analyze — pivot tables and averages handle it well. But the richest insights hide in open-ended responses: "What would you improve about our product?" Reading and manually coding 5,000 free-text responses takes weeks. An AI survey analysis agent categorizes responses, measures sentiment, extracts themes, and generates reports in minutes.
The agent combines rule-based tools for structured data with LLM-powered tools for the qualitative analysis that makes survey data truly valuable.
## Loading Survey Data
The first tool loads survey responses and separates quantitative from qualitative fields:
```mermaid
flowchart LR
    INPUT(["User intent"])
    PARSE["Parse plus<br/>classify"]
    PLAN["Plan and tool<br/>selection"]
    AGENT["Agent loop<br/>LLM plus tools"]
    GUARD{"Guardrails<br/>and policy"}
    EXEC["Execute and<br/>verify result"]
    OBS[("Trace and metrics")]
    OUT(["Outcome plus<br/>next action"])
    INPUT --> PARSE --> PLAN --> AGENT --> GUARD
    GUARD -->|Pass| EXEC --> OUT
    GUARD -->|Fail| AGENT
    AGENT --> OBS
    style AGENT fill:#4f46e5,stroke:#4338ca,color:#fff
    style GUARD fill:#f59e0b,stroke:#d97706,color:#1f2937
    style OBS fill:#ede9fe,stroke:#7c3aed,color:#1e1b4b
    style OUT fill:#059669,stroke:#047857,color:#fff
```
```python
import pandas as pd
from agents import Agent, Runner, function_tool

_survey_data: dict = {}

@function_tool
def load_survey(file_path: str) -> str:
    """Load survey responses from a CSV file. Identifies quantitative
    and qualitative (text) columns automatically."""
    try:
        df = pd.read_csv(file_path)
    except Exception as e:
        return f"Error loading survey: {e}"
    numeric_cols = df.select_dtypes(include="number").columns.tolist()
    text_cols = df.select_dtypes(include="object").columns.tolist()
    _survey_data["df"] = df
    _survey_data["numeric_cols"] = numeric_cols
    _survey_data["text_cols"] = text_cols
    profile = (
        f"Survey loaded: {len(df)} responses\n"
        f"Quantitative columns ({len(numeric_cols)}): {', '.join(numeric_cols)}\n"
        f"Text columns ({len(text_cols)}): {', '.join(text_cols)}\n"
    )
    if not text_cols:
        return profile + "\nNo free-text columns found."
    profile += f"\nSample text responses from '{text_cols[0]}' (first 3):\n"
    for i, val in enumerate(df[text_cols[0]].dropna().head(3)):
        profile += f"  {i+1}. {str(val)[:200]}\n"
    return profile
```
## Quantitative Summary Tool
Handle the easy part first — aggregate ratings, NPS scores, and numeric fields:
```python
@function_tool
def quantitative_summary() -> str:
    """Generate statistical summaries for all numeric survey columns."""
    if "df" not in _survey_data:
        return "No survey loaded."
    df = _survey_data["df"]
    numeric_cols = _survey_data["numeric_cols"]
    if not numeric_cols:
        return "No numeric columns found in survey."
    lines = ["Quantitative Summary:"]
    for col in numeric_cols:
        series = df[col].dropna()
        lines.append(
            f"\n  {col}:\n"
            f"    Mean: {series.mean():.2f}\n"
            f"    Median: {series.median():.2f}\n"
            f"    Std Dev: {series.std():.2f}\n"
            f"    Min: {series.min()}, Max: {series.max()}\n"
            f"    Response count: {len(series)}"
        )
    return "\n".join(lines)
```
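The section mentions NPS scores, but `quantitative_summary` only reports generic statistics. A dedicated NPS helper is a hypothetical addition (not part of the original agent), assuming a standard 0-10 "how likely are you to recommend us?" column:

```python
import pandas as pd

def nps_score(series: pd.Series) -> float:
    """Net Promoter Score: % promoters (9-10) minus % detractors (0-6)."""
    s = series.dropna()
    promoters = (s >= 9).sum()
    detractors = (s <= 6).sum()
    return round(100 * (promoters - detractors) / len(s), 1)

ratings = pd.Series([10, 9, 8, 7, 6, 3, 10, 9, 5, 8])
print(nps_score(ratings))  # 4 promoters, 3 detractors of 10 -> 10.0
```

Registering this as another `@function_tool` would let the agent report NPS alongside the means and medians above.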
## Categorization Tool
This tool processes batches of open-ended responses through the LLM to assign categories:
```python
_categorized: list[dict] = []

@function_tool
def categorize_responses(
    column: str, categories: str, batch_size: int = 20
) -> str:
    """Categorize text responses into predefined categories.
    Returns a summary of category distribution.
    Categories should be comma-separated."""
    if "df" not in _survey_data:
        return "No survey loaded."
    df = _survey_data["df"]
    if column not in df.columns:
        return f"Column '{column}' not found."
    responses = df[column].dropna().tolist()
    cat_list = [c.strip() for c in categories.split(",")]
    # Store for the LLM to process in the agent loop
    _categorized.clear()
    batch = responses[:batch_size]
    return (
        f"Ready to categorize {len(responses)} responses into: {cat_list}\n"
        f"First batch ({len(batch)} responses):\n"
        + "\n".join(f"  [{i}] {r[:150]}" for i, r in enumerate(batch))
        + "\n\nAssign each response a category from the list above. "
        "Return as JSON: [{index: category}, ...]"
    )
```
## Sentiment Analysis Tool
Measure the emotional tone of responses using a structured scoring approach:
```python
@function_tool
def analyze_sentiment(column: str, sample_size: int = 50) -> str:
    """Analyze sentiment distribution across text responses.
    Returns responses grouped for LLM-based sentiment scoring."""
    if "df" not in _survey_data:
        return "No survey loaded."
    df = _survey_data["df"]
    if column not in df.columns:
        return f"Column '{column}' not found."
    responses = df[column].dropna().tolist()
    sample = responses[:sample_size]
    return (
        f"Analyze sentiment for {len(sample)} responses from '{column}'.\n"
        f"Score each as: positive, neutral, or negative.\n\n"
        + "\n".join(f"  [{i}] {r[:200]}" for i, r in enumerate(sample))
        + "\n\nReturn counts: {positive: N, neutral: N, negative: N} "
        "and list the 3 most positive and 3 most negative verbatims."
    )
```
## Theme Extraction Tool
Beyond predefined categories, the agent should discover emergent themes:
```python
@function_tool
def extract_themes(column: str, num_themes: int = 5) -> str:
    """Extract the top recurring themes from open-ended responses.
    Provides response samples for LLM-based theme identification."""
    if "df" not in _survey_data:
        return "No survey loaded."
    df = _survey_data["df"]
    if column not in df.columns:
        return f"Column '{column}' not found."
    responses = df[column].dropna().tolist()
    return (
        f"Identify the top {num_themes} themes from {len(responses)} responses.\n"
        f"For each theme provide: name, description, frequency estimate, "
        f"and 2 representative quotes.\n\n"
        f"Responses (showing first 30):\n"
        + "\n".join(f"  [{i}] {r[:200]}" for i, r in enumerate(responses[:30]))
    )
```
## Assembling the Survey Agent
```python
survey_agent = Agent(
    name="Survey Analyst",
    instructions="""You are a survey analysis agent. When given survey data:
    1. Call load_survey to understand the structure.
    2. Call quantitative_summary for all numeric metrics.
    3. For each text column, call analyze_sentiment to gauge overall tone.
    4. Call extract_themes to discover what respondents care about most.
    5. If the user specifies categories, use categorize_responses.
    6. Produce a final report with:
       - Executive Summary (3-5 bullet points)
       - Quantitative Highlights
       - Sentiment Overview
       - Key Themes (with supporting quotes)
       - Recommendations based on the data""",
    tools=[
        load_survey, quantitative_summary, categorize_responses,
        analyze_sentiment, extract_themes,
    ],
)
```
## Running the Analysis
```python
result = Runner.run_sync(
    survey_agent,
    "Analyze the file customer_feedback_q1.csv. I want to understand "
    "overall satisfaction, what themes emerge from the open-ended feedback, "
    "and what our top 3 priorities for improvement should be.",
)
print(result.final_output)
```
The agent loads the data, summarizes the 1-5 satisfaction ratings (mean: 3.7), runs sentiment analysis on the comments (62% positive, 15% negative), extracts five themes (pricing concerns, onboarding friction, feature requests for mobile, praise for support team, integration gaps), and recommends priorities based on frequency and sentiment intensity.
## FAQ
### How does this handle surveys in multiple languages?
The LLM naturally processes text in many languages. For best results, add an instruction: "Detect the language of each response and analyze it in that language, then translate theme names and quotes to English for the report." This handles multilingual surveys without pre-translation.
### Can the agent process thousands of responses without hitting token limits?

Process responses in batches. The categorization and sentiment tools shown above use a `batch_size` parameter. The agent processes each batch, accumulates results in tool state, and synthesizes at the end. For very large surveys (10,000+ responses), pre-filter with keyword matching before LLM analysis.
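The batching and keyword pre-filtering described above can be sketched with two hypothetical helpers (not part of the agent's tools):

```python
def batch_responses(responses: list, batch_size: int = 20) -> list[list]:
    """Split responses into fixed-size batches so each LLM call stays
    well under the context window."""
    return [
        responses[i:i + batch_size]
        for i in range(0, len(responses), batch_size)
    ]

def keyword_prefilter(responses: list[str], keywords: list[str]) -> list[str]:
    """Keep only responses mentioning at least one keyword -- a cheap
    filter before spending LLM tokens on a very large survey."""
    lowered = [k.lower() for k in keywords]
    return [r for r in responses if any(k in r.lower() for k in lowered)]

responses = [f"response {i}" for i in range(45)]
print([len(b) for b in batch_responses(responses)])  # [20, 20, 5]
print(keyword_prefilter(["Pricing is too high", "Love it"], ["pricing"]))
# ['Pricing is too high']
```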
### How do I validate the accuracy of AI-generated categories?
Run a calibration step: manually code 50-100 responses and compare them against the agent's categorization. Calculate inter-rater agreement (Cohen's kappa). If agreement is above 0.7, the agent is reliable for the remaining responses.
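Cohen's kappa is straightforward to compute by hand (scikit-learn's `cohen_kappa_score` does the same thing). A self-contained sketch with made-up calibration labels:

```python
from collections import Counter

def cohens_kappa(rater_a: list[str], rater_b: list[str]) -> float:
    """Cohen's kappa: agreement between two coders, corrected for the
    agreement expected by chance from each coder's label frequencies."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(
        (counts_a[c] / n) * (counts_b[c] / n)
        for c in set(counts_a) | set(counts_b)
    )
    return (observed - expected) / (1 - expected)

# Hypothetical calibration sample: human codes vs. agent codes
human = ["pricing", "support", "pricing", "onboarding", "pricing", "support"]
agent = ["pricing", "support", "pricing", "pricing", "pricing", "support"]
print(round(cohens_kappa(human, agent), 3))  # 0.7 -- at the threshold
```

In practice you would run this on 50-100 manually coded responses, as the answer above suggests, rather than six.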