Learn Agentic AI

Building a Survey Analysis Agent: AI-Powered Qualitative Data Processing

Build an AI agent that processes survey responses at scale — categorizing open-ended answers, detecting sentiment, extracting recurring themes, and generating executive-ready reports with statistical backing.

The Qualitative Data Problem

Quantitative survey data (ratings, multiple choice) is easy to analyze — pivot tables and averages handle it well. But the richest insights hide in open-ended responses: "What would you improve about our product?" Reading and manually coding 5,000 free-text responses takes weeks. An AI survey analysis agent categorizes responses, measures sentiment, extracts themes, and generates reports in minutes.

The agent combines rule-based tools for structured data with LLM-powered tools for the qualitative analysis that makes survey data truly valuable.

Loading Survey Data

The first tool loads survey responses and separates quantitative from qualitative fields:

import pandas as pd
import json
from agents import Agent, Runner, function_tool

_survey_data: dict = {}

@function_tool
def load_survey(file_path: str) -> str:
    """Load survey responses from a CSV file. Identifies quantitative
    and qualitative (text) columns automatically."""
    try:
        df = pd.read_csv(file_path)
    except Exception as e:
        return f"Error loading survey: {e}"

    numeric_cols = df.select_dtypes(include="number").columns.tolist()
    text_cols = df.select_dtypes(include="object").columns.tolist()

    _survey_data["df"] = df
    _survey_data["numeric_cols"] = numeric_cols
    _survey_data["text_cols"] = text_cols

    profile = (
        f"Survey loaded: {len(df)} responses\n"
        f"Quantitative columns ({len(numeric_cols)}): {', '.join(numeric_cols)}\n"
        f"Text columns ({len(text_cols)}): {', '.join(text_cols)}\n"
    )
    if text_cols:  # guard: some surveys have no free-text columns
        profile += f"\nSample text responses from '{text_cols[0]}' (first 3):\n"
        for i, val in enumerate(df[text_cols[0]].dropna().head(3)):
            profile += f"  {i+1}. {str(val)[:200]}\n"

    return profile

Quantitative Summary Tool

Handle the easy part first — aggregate ratings, NPS scores, and numeric fields:

@function_tool
def quantitative_summary() -> str:
    """Generate statistical summaries for all numeric survey columns."""
    if "df" not in _survey_data:
        return "No survey loaded."

    df = _survey_data["df"]
    numeric_cols = _survey_data["numeric_cols"]

    if not numeric_cols:
        return "No numeric columns found in survey."

    lines = ["Quantitative Summary:"]
    for col in numeric_cols:
        series = df[col].dropna()
        lines.append(
            f"\n  {col}:\n"
            f"    Mean: {series.mean():.2f}\n"
            f"    Median: {series.median():.2f}\n"
            f"    Std Dev: {series.std():.2f}\n"
            f"    Min: {series.min()}, Max: {series.max()}\n"
            f"    Response count: {len(series)}"
        )

    return "\n".join(lines)
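The section intro mentions NPS, but the generic summary above doesn't compute it. A minimal sketch of an NPS helper you could register as an extra tool — the 0-10 recommendation-scale column is an assumption about your survey's schema:

```python
import pandas as pd

def nps_score(series: pd.Series) -> float:
    """Net Promoter Score: % promoters (9-10) minus % detractors (0-6)."""
    s = series.dropna()
    promoters = (s >= 9).mean() * 100
    detractors = (s <= 6).mean() * 100
    return round(promoters - detractors, 1)

# Example: 4 promoters, 3 passives, 3 detractors out of 10 -> NPS = 10.0
scores = pd.Series([10, 9, 9, 10, 8, 7, 8, 3, 5, 6])
print(nps_score(scores))
```

Passives (7-8) count in the denominator but neither add nor subtract, which is why NPS can swing hard on small samples.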

Categorization Tool

This tool processes batches of open-ended responses through the LLM to assign categories:

_categorized: list[dict] = []

@function_tool
def categorize_responses(
    column: str, categories: str, batch_size: int = 20
) -> str:
    """Categorize text responses into predefined categories.
    Returns a summary of category distribution.
    Categories should be comma-separated."""
    if "df" not in _survey_data:
        return "No survey loaded."

    df = _survey_data["df"]
    if column not in df.columns:
        return f"Column '{column}' not found."

    responses = df[column].dropna().tolist()
    cat_list = [c.strip() for c in categories.split(",")]

    # This tool returns a prompt; the agent itself assigns the categories
    # in its next reply. Clear any results from a previous run.
    _categorized.clear()
    batch = responses[:batch_size]

    return (
        f"Ready to categorize {len(responses)} responses into: {cat_list}\n"
        f"First batch ({len(batch)} responses):\n"
        + "\n".join(f"  [{i}] {str(r)[:150]}" for i, r in enumerate(batch))
        + "\n\nAssign each response a category from the list above. "
        "Return as JSON: [{index: category}, ...]"
    )

Sentiment Analysis Tool

Measure the emotional tone of responses using a structured scoring approach:

@function_tool
def analyze_sentiment(column: str, sample_size: int = 50) -> str:
    """Analyze sentiment distribution across text responses.
    Returns responses grouped for LLM-based sentiment scoring."""
    if "df" not in _survey_data:
        return "No survey loaded."

    df = _survey_data["df"]
    if column not in df.columns:
        return f"Column '{column}' not found."
    responses = df[column].dropna().tolist()
    sample = responses[:sample_size]

    return (
        f"Analyze sentiment for {len(sample)} responses from '{column}'.\n"
        f"Score each as: positive, neutral, or negative.\n\n"
        + "\n".join(f"  [{i}] {str(r)[:200]}" for i, r in enumerate(sample))
        + "\n\nReturn counts: {positive: N, neutral: N, negative: N} "
        "and list the 3 most positive and 3 most negative verbatims."
    )

Theme Extraction Tool

Beyond predefined categories, the agent should discover emergent themes:

@function_tool
def extract_themes(column: str, num_themes: int = 5) -> str:
    """Extract the top recurring themes from open-ended responses.
    Provides response samples for LLM-based theme identification."""
    if "df" not in _survey_data:
        return "No survey loaded."

    df = _survey_data["df"]
    if column not in df.columns:
        return f"Column '{column}' not found."
    responses = df[column].dropna().tolist()

    return (
        f"Identify the top {num_themes} themes from {len(responses)} responses.\n"
        f"For each theme provide: name, description, frequency estimate, "
        f"and 2 representative quotes.\n\n"
        f"Responses (showing first 30):\n"
        + "\n".join(f"  [{i}] {r[:200]}" for i, r in enumerate(responses[:30]))
    )
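The LLM's frequency estimates are guesses from a 30-response sample; a cheap keyword count over the full response set can sanity-check them. A sketch, where the per-theme keyword lists are assumptions you would supply for your own survey:

```python
def estimate_theme_frequency(
    responses: list[str], theme_keywords: dict[str, list[str]]
) -> dict[str, int]:
    """Count how many responses mention at least one keyword per theme.
    A single response can count toward multiple themes."""
    counts = {theme: 0 for theme in theme_keywords}
    for r in responses:
        text = r.lower()
        for theme, keywords in theme_keywords.items():
            if any(kw in text for kw in keywords):
                counts[theme] += 1
    return counts

responses = [
    "The price is too high for small teams",
    "Onboarding was confusing at first",
    "Great support, but pricing tiers are unclear",
]
print(estimate_theme_frequency(responses, {
    "Pricing": ["price", "pricing", "cost"],
    "Onboarding": ["onboarding", "setup"],
}))
# -> {'Pricing': 2, 'Onboarding': 1}
```

Substring matching is crude (it will miss paraphrases and catch false positives like "priceless"), but it grounds the LLM's "frequency estimate" in a number you can defend.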

Assembling the Survey Agent

survey_agent = Agent(
    name="Survey Analyst",
    instructions="""You are a survey analysis agent. When given survey data:
1. Call load_survey to understand the structure.
2. Call quantitative_summary for all numeric metrics.
3. For each text column, call analyze_sentiment to gauge overall tone.
4. Call extract_themes to discover what respondents care about most.
5. If the user specifies categories, use categorize_responses.
6. Produce a final report with:
   - Executive Summary (3-5 bullet points)
   - Quantitative Highlights
   - Sentiment Overview
   - Key Themes (with supporting quotes)
   - Recommendations based on the data""",
    tools=[
        load_survey, quantitative_summary, categorize_responses,
        analyze_sentiment, extract_themes,
    ],
)

Running the Analysis

result = Runner.run_sync(
    survey_agent,
    "Analyze the file customer_feedback_q1.csv. I want to understand "
    "overall satisfaction, what themes emerge from the open-ended feedback, "
    "and what our top 3 priorities for improvement should be.",
)
print(result.final_output)

The agent loads the data, summarizes the 1-5 satisfaction ratings (mean: 3.7), runs sentiment analysis on the comments (62% positive, 15% negative), extracts five themes (pricing concerns, onboarding friction, feature requests for mobile, praise for support team, integration gaps), and recommends priorities based on frequency and sentiment intensity.


FAQ

How does this handle surveys in multiple languages?

The LLM naturally processes text in many languages. For best results, add an instruction: "Detect the language of each response and analyze it in that language, then translate theme names and quotes to English for the report." This handles multilingual surveys without pre-translation.

Can the agent process thousands of responses without hitting token limits?

Process responses in batches. The categorization and sentiment tools shown above use a batch_size parameter. The agent processes each batch, accumulates results in tool state, and synthesizes at the end. For very large surveys (10,000+ responses), pre-filter with keyword matching before LLM analysis.
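The batching and pre-filter strategy described here can be sketched as two small helpers (the names `batched` and `prefilter` are hypothetical, not part of the article's tool set):

```python
from typing import Iterator

def batched(responses: list[str], batch_size: int = 20) -> Iterator[list[str]]:
    """Yield successive batches so each LLM call stays under token limits."""
    for start in range(0, len(responses), batch_size):
        yield responses[start:start + batch_size]

def prefilter(responses: list[str], keywords: list[str]) -> list[str]:
    """Keep only responses mentioning any keyword -- a cheap filter before
    spending LLM tokens on very large surveys."""
    lowered = [kw.lower() for kw in keywords]
    return [r for r in responses if any(kw in r.lower() for kw in lowered)]

responses = [f"response {i}" for i in range(45)]
print([len(b) for b in batched(responses)])  # -> [20, 20, 5]
```

Each batch becomes one tool-call round trip; the agent accumulates per-batch results (as in `_categorized` above) and synthesizes once all batches are done.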

How do I validate the accuracy of AI-generated categories?

Run a calibration step: manually code 50-100 responses and compare them against the agent's categorization. Calculate inter-rater agreement (Cohen's kappa). If agreement is above 0.7, the agent is reliable for the remaining responses.
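A minimal kappa calculation for that calibration step, assuming two aligned label lists (scikit-learn's `cohen_kappa_score` computes the same statistic if you prefer a library call):

```python
def cohens_kappa(human: list[str], agent: list[str]) -> float:
    """Cohen's kappa: agreement between two raters, corrected for chance."""
    assert len(human) == len(agent), "raters must label the same responses"
    n = len(human)
    labels = set(human) | set(agent)
    # Observed agreement: fraction of responses where both raters agree.
    observed = sum(h == a for h, a in zip(human, agent)) / n
    # Expected agreement: chance overlap given each rater's label frequencies.
    expected = sum(
        (human.count(lbl) / n) * (agent.count(lbl) / n) for lbl in labels
    )
    return (observed - expected) / (1 - expected)

human = ["pos", "pos", "neg", "neg"]
agent = ["pos", "neg", "neg", "neg"]
print(round(cohens_kappa(human, agent), 2))  # -> 0.5
```

Here the raters agree on 3 of 4 responses (0.75 observed), but chance alone predicts 0.5 agreement, so kappa lands at 0.5 — below the 0.7 threshold the FAQ suggests, meaning this hypothetical categorization would need prompt tuning before scaling out.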


