# Building a Survey Analysis Agent: AI-Powered Qualitative Data Processing
Build an AI agent that processes survey responses at scale — categorizing open-ended answers, detecting sentiment, extracting recurring themes, and generating executive-ready reports with statistical backing.
## The Qualitative Data Problem
Quantitative survey data (ratings, multiple choice) is easy to analyze — pivot tables and averages handle it well. But the richest insights hide in open-ended responses: "What would you improve about our product?" Reading and manually coding 5,000 free-text responses takes weeks. An AI survey analysis agent categorizes responses, measures sentiment, extracts themes, and generates reports in minutes.
The agent combines rule-based tools for structured data with LLM-powered tools for the qualitative analysis that makes survey data truly valuable.
## Loading Survey Data
The first tool loads survey responses and separates quantitative from qualitative fields:
```mermaid
flowchart LR
    INPUT(["User intent"])
    PARSE["Parse plus<br/>classify"]
    PLAN["Plan and tool<br/>selection"]
    AGENT["Agent loop<br/>LLM plus tools"]
    GUARD{"Guardrails<br/>and policy"}
    EXEC["Execute and<br/>verify result"]
    OBS[("Trace and metrics")]
    OUT(["Outcome plus<br/>next action"])
    INPUT --> PARSE --> PLAN --> AGENT --> GUARD
    GUARD -->|Pass| EXEC --> OUT
    GUARD -->|Fail| AGENT
    AGENT --> OBS
    style AGENT fill:#4f46e5,stroke:#4338ca,color:#fff
    style GUARD fill:#f59e0b,stroke:#d97706,color:#1f2937
    style OBS fill:#ede9fe,stroke:#7c3aed,color:#1e1b4b
    style OUT fill:#059669,stroke:#047857,color:#fff
```
```python
import pandas as pd
from agents import Agent, Runner, function_tool

_survey_data: dict = {}

@function_tool
def load_survey(file_path: str) -> str:
    """Load survey responses from a CSV file. Identifies quantitative
    and qualitative (text) columns automatically."""
    try:
        df = pd.read_csv(file_path)
    except Exception as e:
        return f"Error loading survey: {e}"
    numeric_cols = df.select_dtypes(include="number").columns.tolist()
    text_cols = df.select_dtypes(include="object").columns.tolist()
    _survey_data["df"] = df
    _survey_data["numeric_cols"] = numeric_cols
    _survey_data["text_cols"] = text_cols
    profile = (
        f"Survey loaded: {len(df)} responses\n"
        f"Quantitative columns ({len(numeric_cols)}): {', '.join(numeric_cols)}\n"
        f"Text columns ({len(text_cols)}): {', '.join(text_cols)}\n"
    )
    if not text_cols:
        return profile + "\nNo free-text columns found."
    profile += f"\nSample text responses from '{text_cols[0]}' (first 3):\n"
    for i, val in enumerate(df[text_cols[0]].dropna().head(3)):
        profile += f"  {i+1}. {str(val)[:200]}\n"
    return profile
```
## Quantitative Summary Tool
Handle the easy part first — aggregate ratings, NPS scores, and numeric fields:
```python
@function_tool
def quantitative_summary() -> str:
    """Generate statistical summaries for all numeric survey columns."""
    if "df" not in _survey_data:
        return "No survey loaded."
    df = _survey_data["df"]
    numeric_cols = _survey_data["numeric_cols"]
    if not numeric_cols:
        return "No numeric columns found in survey."
    lines = ["Quantitative Summary:"]
    for col in numeric_cols:
        series = df[col].dropna()
        lines.append(
            f"\n  {col}:\n"
            f"    Mean: {series.mean():.2f}\n"
            f"    Median: {series.median():.2f}\n"
            f"    Std Dev: {series.std():.2f}\n"
            f"    Min: {series.min()}, Max: {series.max()}\n"
            f"    Response count: {len(series)}"
        )
    return "\n".join(lines)
```
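The section mentions NPS scores, but `quantitative_summary` only reports generic statistics. A dedicated NPS helper is a hypothetical addition (not part of the original agent), assuming a standard 0-10 "how likely are you to recommend us?" column:

```python
import pandas as pd

def nps_score(series: pd.Series) -> float:
    """Net Promoter Score: % promoters (9-10) minus % detractors (0-6)."""
    s = series.dropna()
    promoters = (s >= 9).sum()
    detractors = (s <= 6).sum()
    return round(100 * (promoters - detractors) / len(s), 1)

ratings = pd.Series([10, 9, 8, 7, 6, 3, 10, 9, 5, 8])
print(nps_score(ratings))  # 4 promoters, 3 detractors of 10 -> 10.0
```

Registering this as another `@function_tool` would let the agent report NPS alongside the means and medians above.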
## Categorization Tool
This tool processes batches of open-ended responses through the LLM to assign categories:
```python
_categorized: list[dict] = []

@function_tool
def categorize_responses(
    column: str, categories: str, batch_size: int = 20
) -> str:
    """Categorize text responses into predefined categories.
    Returns a summary of category distribution.
    Categories should be comma-separated."""
    if "df" not in _survey_data:
        return "No survey loaded."
    df = _survey_data["df"]
    if column not in df.columns:
        return f"Column '{column}' not found."
    responses = df[column].dropna().tolist()
    cat_list = [c.strip() for c in categories.split(",")]
    # Store for the LLM to process in the agent loop
    _categorized.clear()
    batch = responses[:batch_size]
    return (
        f"Ready to categorize {len(responses)} responses into: {cat_list}\n"
        f"First batch ({len(batch)} responses):\n"
        + "\n".join(f"  [{i}] {r[:150]}" for i, r in enumerate(batch))
        + "\n\nAssign each response a category from the list above. "
        "Return as JSON: [{index: category}, ...]"
    )
```
## Sentiment Analysis Tool
Measure the emotional tone of responses using a structured scoring approach:
```python
@function_tool
def analyze_sentiment(column: str, sample_size: int = 50) -> str:
    """Analyze sentiment distribution across text responses.
    Returns responses grouped for LLM-based sentiment scoring."""
    if "df" not in _survey_data:
        return "No survey loaded."
    df = _survey_data["df"]
    if column not in df.columns:
        return f"Column '{column}' not found."
    responses = df[column].dropna().tolist()
    sample = responses[:sample_size]
    return (
        f"Analyze sentiment for {len(sample)} responses from '{column}'.\n"
        f"Score each as: positive, neutral, or negative.\n\n"
        + "\n".join(f"  [{i}] {r[:200]}" for i, r in enumerate(sample))
        + "\n\nReturn counts: {positive: N, neutral: N, negative: N} "
        "and list the 3 most positive and 3 most negative verbatims."
    )
```
## Theme Extraction Tool
Beyond predefined categories, the agent should discover emergent themes:
```python
@function_tool
def extract_themes(column: str, num_themes: int = 5) -> str:
    """Extract the top recurring themes from open-ended responses.
    Provides response samples for LLM-based theme identification."""
    if "df" not in _survey_data:
        return "No survey loaded."
    df = _survey_data["df"]
    if column not in df.columns:
        return f"Column '{column}' not found."
    responses = df[column].dropna().tolist()
    return (
        f"Identify the top {num_themes} themes from {len(responses)} responses.\n"
        f"For each theme provide: name, description, frequency estimate, "
        f"and 2 representative quotes.\n\n"
        f"Responses (showing first 30):\n"
        + "\n".join(f"  [{i}] {r[:200]}" for i, r in enumerate(responses[:30]))
    )
```
## Assembling the Survey Agent
```python
survey_agent = Agent(
    name="Survey Analyst",
    instructions="""You are a survey analysis agent. When given survey data:
    1. Call load_survey to understand the structure.
    2. Call quantitative_summary for all numeric metrics.
    3. For each text column, call analyze_sentiment to gauge overall tone.
    4. Call extract_themes to discover what respondents care about most.
    5. If the user specifies categories, use categorize_responses.
    6. Produce a final report with:
       - Executive Summary (3-5 bullet points)
       - Quantitative Highlights
       - Sentiment Overview
       - Key Themes (with supporting quotes)
       - Recommendations based on the data""",
    tools=[
        load_survey, quantitative_summary, categorize_responses,
        analyze_sentiment, extract_themes,
    ],
)
```
## Running the Analysis
```python
result = Runner.run_sync(
    survey_agent,
    "Analyze the file customer_feedback_q1.csv. I want to understand "
    "overall satisfaction, what themes emerge from the open-ended feedback, "
    "and what our top 3 priorities for improvement should be.",
)
print(result.final_output)
```
The agent loads the data, summarizes the 1-5 satisfaction ratings (mean: 3.7), runs sentiment analysis on the comments (62% positive, 15% negative), extracts five themes (pricing concerns, onboarding friction, feature requests for mobile, praise for support team, integration gaps), and recommends priorities based on frequency and sentiment intensity.
## FAQ
### How does this handle surveys in multiple languages?
The LLM naturally processes text in many languages. For best results, add an instruction: "Detect the language of each response and analyze it in that language, then translate theme names and quotes to English for the report." This handles multilingual surveys without pre-translation.
### Can the agent process thousands of responses without hitting token limits?

Process responses in batches. The categorization and sentiment tools shown above use a `batch_size` parameter. The agent processes each batch, accumulates results in tool state, and synthesizes at the end. For very large surveys (10,000+ responses), pre-filter with keyword matching before LLM analysis.
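The batching and keyword pre-filtering described above can be sketched with two hypothetical helpers (not part of the agent's tools):

```python
def batch_responses(responses: list, batch_size: int = 20) -> list[list]:
    """Split responses into fixed-size batches so each LLM call stays
    well under the context window."""
    return [
        responses[i:i + batch_size]
        for i in range(0, len(responses), batch_size)
    ]

def keyword_prefilter(responses: list[str], keywords: list[str]) -> list[str]:
    """Keep only responses mentioning at least one keyword -- a cheap
    filter before spending LLM tokens on a very large survey."""
    lowered = [k.lower() for k in keywords]
    return [r for r in responses if any(k in r.lower() for k in lowered)]

responses = [f"response {i}" for i in range(45)]
print([len(b) for b in batch_responses(responses)])  # [20, 20, 5]
print(keyword_prefilter(["Pricing is too high", "Love it"], ["pricing"]))
# ['Pricing is too high']
```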
### How do I validate the accuracy of AI-generated categories?
Run a calibration step: manually code 50-100 responses and compare them against the agent's categorization. Calculate inter-rater agreement (Cohen's kappa). If agreement is above 0.7, the agent is reliable for the remaining responses.
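Cohen's kappa is straightforward to compute by hand (scikit-learn's `cohen_kappa_score` does the same thing). A self-contained sketch with made-up calibration labels:

```python
from collections import Counter

def cohens_kappa(rater_a: list[str], rater_b: list[str]) -> float:
    """Cohen's kappa: agreement between two coders, corrected for the
    agreement expected by chance from each coder's label frequencies."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(
        (counts_a[c] / n) * (counts_b[c] / n)
        for c in set(counts_a) | set(counts_b)
    )
    return (observed - expected) / (1 - expected)

# Hypothetical calibration sample: human codes vs. agent codes
human = ["pricing", "support", "pricing", "onboarding", "pricing", "support"]
agent = ["pricing", "support", "pricing", "pricing", "pricing", "support"]
print(round(cohens_kappa(human, agent), 3))  # 0.7 -- at the threshold
```

In practice you would run this on 50-100 manually coded responses, as the answer above suggests, rather than six.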