Skip to content
Learn Agentic AI
Learn Agentic AI10 min read2 views

Text Classification for Agent Routing: Intent Detection and Topic Categorization

Build robust text classification systems for AI agent routing using intent detection, multi-label classification, and zero-shot approaches with practical Python implementations.

Why Classification Is the Agent's First Decision

Every message that reaches a multi-agent system must be routed to the right handler. A billing question should go to the billing agent. A technical issue should go to the support agent. A sales inquiry should go to the sales agent. Text classification is the mechanism that makes this routing fast and accurate.

Intent detection is a specialized form of text classification where the classes represent user goals: "check_balance," "reset_password," "schedule_appointment." Topic categorization groups messages by subject matter: "billing," "technical," "general." Both are essential for agents that handle diverse user requests.

Building an Intent Classifier with scikit-learn

For agents with a known set of intents and training data, a traditional ML classifier is fast and reliable.

flowchart LR
    INPUT(["User intent"])
    PARSE["Parse plus<br/>classify"]
    PLAN["Plan and tool<br/>selection"]
    AGENT["Agent loop<br/>LLM plus tools"]
    GUARD{"Guardrails<br/>and policy"}
    EXEC["Execute and<br/>verify result"]
    OBS[("Trace and metrics")]
    OUT(["Outcome plus<br/>next action"])
    INPUT --> PARSE --> PLAN --> AGENT --> GUARD
    GUARD -->|Pass| EXEC --> OUT
    GUARD -->|Fail| AGENT
    AGENT --> OBS
    style AGENT fill:#4f46e5,stroke:#4338ca,color:#fff
    style GUARD fill:#f59e0b,stroke:#d97706,color:#1f2937
    style OBS fill:#ede9fe,stroke:#7c3aed,color:#1e1b4b
    style OUT fill:#059669,stroke:#047857,color:#fff
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
import joblib

training_data = [
    ("What is my account balance?", "check_balance"),
    ("How much do I owe?", "check_balance"),
    ("I forgot my password", "reset_password"),
    ("Can't log in to my account", "reset_password"),
    ("I want to cancel my subscription", "cancel_subscription"),
    ("How do I unsubscribe?", "cancel_subscription"),
    ("Can I speak to a manager?", "escalate"),
    ("I need to talk to a real person", "escalate"),
]

texts, labels = zip(*training_data)

classifier = Pipeline([
    ("tfidf", TfidfVectorizer(ngram_range=(1, 2), max_features=5000)),
    ("clf", LogisticRegression(max_iter=1000)),
])

classifier.fit(texts, labels)

def classify_intent(text: str) -> dict:
    prediction = classifier.predict([text])[0]
    probabilities = classifier.predict_proba([text])[0]
    confidence = max(probabilities)
    return {"intent": prediction, "confidence": round(confidence, 3)}

print(classify_intent("I need to check how much money is in my account"))
# {'intent': 'check_balance', 'confidence': 0.82}

Zero-Shot Classification: No Training Data Required

When you cannot collect labeled training data — or when new intents appear frequently — zero-shot classification lets you define categories at inference time.

Hear it before you finish reading

Talk to a live CallSphere AI voice agent in your browser — 60 seconds, no signup.

Try Live Demo →
from transformers import pipeline

zero_shot = pipeline(
    "zero-shot-classification",
    model="facebook/bart-large-mnli",
)

def classify_zero_shot(
    text: str,
    candidate_labels: list[str],
    multi_label: bool = False,
) -> dict:
    result = zero_shot(
        text,
        candidate_labels=candidate_labels,
        multi_label=multi_label,
    )
    return {
        label: round(score, 3)
        for label, score in zip(result["labels"], result["scores"])
    }

labels = ["billing", "technical_support", "sales", "general_inquiry"]
message = "My invoice shows a charge I didn't authorize"

print(classify_zero_shot(message, labels))
# {'billing': 0.891, 'general_inquiry': 0.054,
#  'technical_support': 0.033, 'sales': 0.022}

Zero-shot models use natural language inference under the hood. They evaluate whether the text "entails" each candidate label. This means your label names matter — "billing_dispute" will perform differently than "billing" for the same input.

Multi-Label Classification

Users often express multiple intents in a single message: "I need to update my address and also check when my next payment is due." Multi-label classification assigns multiple categories simultaneously.

def classify_multi_intent(text: str, labels: list[str]) -> list[dict]:
    result = zero_shot(
        text,
        candidate_labels=labels,
        multi_label=True,
    )
    return [
        {"label": label, "score": round(score, 3)}
        for label, score in zip(result["labels"], result["scores"])
        if score > 0.5
    ]

labels = ["update_info", "check_payment", "billing", "technical"]
message = "Please change my address and tell me when my next bill is due"

print(classify_multi_intent(message, labels))
# [{'label': 'update_info', 'score': 0.92},
#  {'label': 'check_payment', 'score': 0.87},
#  {'label': 'billing', 'score': 0.61}]

Confidence-Based Routing

Raw classification output needs a decision layer. Here is a routing pattern that handles low-confidence predictions gracefully.

from dataclasses import dataclass
from typing import Optional

@dataclass
class RouteDecision:
    target_agent: str
    confidence: float
    fallback_used: bool

class IntentRouter:
    def __init__(self, classifier, threshold: float = 0.7):
        self.classifier = classifier
        self.threshold = threshold
        self.agent_map = {
            "check_balance": "billing_agent",
            "reset_password": "auth_agent",
            "cancel_subscription": "retention_agent",
            "escalate": "human_agent",
        }

    def route(self, message: str) -> RouteDecision:
        result = self.classifier(message)
        intent = result["intent"]
        confidence = result["confidence"]

        if confidence >= self.threshold and intent in self.agent_map:
            return RouteDecision(
                target_agent=self.agent_map[intent],
                confidence=confidence,
                fallback_used=False,
            )

        return RouteDecision(
            target_agent="general_agent",
            confidence=confidence,
            fallback_used=True,
        )

The threshold is critical. Set it too low and misrouted messages frustrate users. Set it too high and too many messages fall through to the general agent. Start at 0.7 and adjust based on production metrics.

Still reading? Stop comparing — try CallSphere live.

CallSphere ships complete AI voice agents per industry — 14 tools for healthcare, 10 agents for real estate, 4 specialists for salons. See how it actually handles a call before you book a demo.

Combining Classification Approaches

Production agents often layer multiple classifiers. A fast keyword-based filter catches obvious cases, a traditional ML model handles most traffic, and a zero-shot model handles the long tail.

class HybridClassifier:
    def __init__(self, keyword_rules, ml_model, zero_shot_model):
        self.keyword_rules = keyword_rules
        self.ml_model = ml_model
        self.zero_shot_model = zero_shot_model

    def classify(self, text: str, labels: list[str]) -> dict:
        # Tier 1: keyword match (microseconds)
        for pattern, label in self.keyword_rules.items():
            if pattern.lower() in text.lower():
                return {"label": label, "confidence": 1.0, "tier": "keyword"}

        # Tier 2: ML model (milliseconds)
        ml_result = self.ml_model(text)
        if ml_result["confidence"] > 0.8:
            return {**ml_result, "tier": "ml"}

        # Tier 3: zero-shot (slower but flexible)
        zs_result = self.zero_shot_model(text, labels)
        top_label = max(zs_result, key=zs_result.get)
        return {
            "label": top_label,
            "confidence": zs_result[top_label],
            "tier": "zero_shot",
        }

FAQ

How many training examples do I need per intent for a reliable classifier?

For traditional ML classifiers like logistic regression with TF-IDF, aim for at least 50 to 100 examples per intent. For fine-tuned transformer models, 20 to 50 examples per class can be sufficient. If you have fewer than 20 examples, use zero-shot classification instead and gradually collect labeled data from production traffic to build a training set.

How do I handle intents that overlap semantically?

Overlapping intents are a design problem, not a model problem. If "check_balance" and "billing_inquiry" are hard to distinguish, consider merging them into a single intent and using a second-stage classifier or rule-based logic to differentiate subtypes. Alternatively, use multi-label classification and let the downstream agent handle both intents.

What is the best way to add new intents without retraining from scratch?

Use a zero-shot model as your primary classifier and maintain a mapping from zero-shot labels to agent routes. Adding a new intent is as simple as adding a new label string. Once you collect enough production examples for the new intent, fine-tune your ML model and redeploy. This gives you immediate coverage with zero-shot and improved accuracy over time with supervised learning.


#TextClassification #IntentDetection #NLP #ZeroShot #AIAgents #Python #AgenticAI #LearnAI #AIEngineering

Share

Try CallSphere AI Voice Agents

See how AI voice agents work for your industry. Live demo available -- no signup required.

Related Articles You May Like

AI Strategy

AI Agent M&A Activity 2026: Aircall–Vogent, Meta–PlayAI, OpenAI's Six Deals

Q1 2026 saw a record acquisition wave: Aircall bought Vogent (May), Meta acquired Manus and PlayAI, OpenAI closed six deals. The voice AI consolidation phase has begun.

Agentic AI

LangGraph State-Machine Architecture: A Principal-Engineer Deep Dive (2026)

How LangGraph's StateGraph, channels, and reducers actually work — with a working multi-step agent, eval hooks at every node, and the patterns that survive production.

Agentic AI

LangGraph Checkpointers in Production: Durable, Resumable Agents with Eval Replay

Use LangGraph's checkpointer to make agents resumable across crashes and human-in-the-loop pauses, then replay any checkpoint into your eval pipeline.

Agentic AI

Multi-Agent Handoffs with the OpenAI Agents SDK: The Pattern That Actually Scales (2026)

Handoffs done right — when one agent should hand control to another, how to preserve context, and how to evaluate the handoff decision itself.

Agentic AI

Building Your First Agent with the OpenAI Agents SDK in 2026: A Hands-On Walkthrough

Step-by-step build of a working agent with the OpenAI Agents SDK — Agent class, tools, handoffs, tracing — plus an eval pipeline that catches regressions before merge.

Agentic AI

LangGraph Supervisor Pattern: Orchestrating Multi-Agent Teams in 2026

The supervisor pattern in LangGraph for coordinating specialist agents, with full code, an eval pipeline that scores routing accuracy, and the failure modes to watch for.