Learn Agentic AI

Named Entity Recognition for AI Agents: Extracting People, Places, and Organizations

Learn how to implement Named Entity Recognition in AI agent pipelines using spaCy and LLMs, covering entity types, custom entity training, and real-time extraction strategies.

Why Agents Need Named Entity Recognition

When an AI agent receives a message like "Schedule a meeting with Sarah Chen at the Austin office next Tuesday," it must extract three distinct pieces of structured information: a person (Sarah Chen), a location (Austin office), and a date (next Tuesday). Named Entity Recognition (NER) is the NLP technique that performs this extraction automatically.

Without NER, an agent would have to rely entirely on the LLM to parse every incoming message from scratch. While LLMs are capable of entity extraction, dedicated NER pipelines are faster, cheaper, and more predictable for high-volume workloads. The best agent architectures combine both approaches — fast NER for common entities and LLM fallback for ambiguous cases.

Entity Types Every Agent Developer Should Know

The standard NER taxonomy covers these categories:

  • PERSON — individual names (Sarah Chen, Dr. Patel)
  • ORG — companies, agencies, institutions (Acme Corp, FDA)
  • GPE — geopolitical entities like countries, cities, states (Austin, France)
  • DATE — absolute or relative dates (March 15, next Tuesday)
  • MONEY — monetary values ($500, 12.5 million euros)
  • PRODUCT — named products (iPhone 16, Tesla Model Y)

Most pre-trained NER models handle these out of the box. Custom entities — like internal project names, medical codes, or proprietary terms — require additional training.


NER with spaCy: The Fast Path

spaCy provides production-grade NER that runs in milliseconds per document. Here is a complete extraction pipeline.

import spacy

nlp = spacy.load("en_core_web_trf")  # Transformer-based model

def extract_entities(text: str) -> dict[str, list[str]]:
    """Extract named entities grouped by type."""
    doc = nlp(text)
    entities: dict[str, list[str]] = {}

    for ent in doc.ents:
        if ent.label_ not in entities:
            entities[ent.label_] = []
        if ent.text not in entities[ent.label_]:
            entities[ent.label_].append(ent.text)

    return entities

message = "Tell John at Microsoft to review the Q3 report by March 20th."
result = extract_entities(message)
# {'PERSON': ['John'], 'ORG': ['Microsoft'], 'DATE': ['Q3', 'March 20th']}

Training Custom Entities

When your agent operates in a specialized domain, you need custom entity types. Here is how to train spaCy to recognize a custom PRODUCT_CODE entity.

import spacy
from spacy.training import Example
import random

nlp = spacy.blank("en")
ner = nlp.add_pipe("ner")
ner.add_label("PRODUCT_CODE")

train_data = [
    ("Order SKU-4829 immediately", {"entities": [(6, 14, "PRODUCT_CODE")]}),
    ("Check stock for SKU-1100", {"entities": [(16, 24, "PRODUCT_CODE")]}),
    ("We need SKU-7753 and SKU-2201", {
        "entities": [(8, 16, "PRODUCT_CODE"), (21, 29, "PRODUCT_CODE")]
    }),
]

optimizer = nlp.initialize()  # spaCy v3 API; replaces the deprecated begin_training()
for epoch in range(30):
    random.shuffle(train_data)
    for text, annotations in train_data:
        example = Example.from_dict(nlp.make_doc(text), annotations)
        nlp.update([example], sgd=optimizer)

doc = nlp("Ship SKU-9981 to warehouse B")
for ent in doc.ents:
    print(f"{ent.text} -> {ent.label_}")
# SKU-9981 -> PRODUCT_CODE

LLM-Based NER for Complex Cases

For ambiguous text or zero-shot entity types, LLMs provide flexible extraction without training data.

import json
import openai

client = openai.OpenAI()  # openai>=1.0 client interface

def llm_extract_entities(text: str, entity_types: list[str]) -> dict:
    """Use an LLM for zero-shot entity extraction."""
    prompt = f"""Extract the following entity types from the text.
Return JSON only. Entity types: {', '.join(entity_types)}

Text: {text}"""

    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
        response_format={"type": "json_object"},
        temperature=0,
    )
    # Parse the JSON string so callers get a dict, not raw text
    return json.loads(response.choices[0].message.content)

result = llm_extract_entities(
    "Dr. Amara Osei from Nairobi General prescribed amoxicillin 500mg",
    ["PERSON", "FACILITY", "MEDICATION", "DOSAGE"]
)
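Even with `response_format` set, it is worth validating the raw JSON text of a model reply before handing it to downstream tools. A hypothetical helper (the function name and tolerated quirks are assumptions, not part of any API):

```python
import json

def parse_entity_response(raw: str, entity_types: list[str]) -> dict[str, list[str]]:
    """Validate an LLM's JSON reply: keep only requested entity types
    and coerce bare strings into lists (a common LLM quirk)."""
    data = json.loads(raw)
    cleaned: dict[str, list[str]] = {}
    for key, values in data.items():
        if key not in entity_types:
            continue  # drop entity types we never asked for
        if isinstance(values, str):
            values = [values]
        cleaned[key] = [v for v in values if isinstance(v, str)]
    return cleaned

raw = '{"PERSON": "Dr. Amara Osei", "MEDICATION": ["amoxicillin"], "EXTRA": ["x"]}'
print(parse_entity_response(raw, ["PERSON", "MEDICATION"]))
# {'PERSON': ['Dr. Amara Osei'], 'MEDICATION': ['amoxicillin']}
```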

Integrating NER into an Agent Pipeline

The most effective pattern is a two-tier approach: fast spaCy extraction first, with LLM fallback for unrecognized patterns.


class NERProcessor:
    """Fast spaCy tier that flags ambiguous entities for LLM review."""

    # spaCy does not expose per-entity confidence scores out of the box,
    # so route the labels the small model confuses most often to the LLM
    # tier. Adjust this set for your domain.
    REVIEW_LABELS = {"WORK_OF_ART", "PRODUCT", "NORP", "FAC"}

    def __init__(self):
        self.nlp = spacy.load("en_core_web_sm")

    def process(self, text: str) -> dict:
        doc = self.nlp(text)
        entities: dict[str, list[str]] = {}
        needs_review: list[str] = []

        for ent in doc.ents:
            if ent.label_ in self.REVIEW_LABELS:
                needs_review.append(ent.text)
            else:
                entities.setdefault(ent.label_, []).append(ent.text)

        return {
            "entities": entities,
            "needs_llm_review": needs_review,
        }
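Wiring the two tiers together can be as simple as a routing function. This sketch assumes a processor exposing the `process()` result shape above and an LLM extractor with the signature from the previous section; the entity-type list is illustrative.

```python
def extract_with_fallback(processor, llm_extractor, text: str) -> dict:
    """Run the fast spaCy tier first; call the LLM tier only when the
    fast tier flags spans for review, keeping cost and latency low."""
    result = processor.process(text)
    if result["needs_llm_review"]:
        # Illustrative entity-type list; tailor it to your domain.
        result["llm_entities"] = llm_extractor(
            text, ["PERSON", "ORG", "GPE", "DATE"]
        )
    return result
```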

FAQ

When should I use spaCy NER versus LLM-based extraction?

Use spaCy for high-throughput scenarios where you need consistent, fast extraction of standard entity types. Use LLM-based extraction when you need zero-shot recognition of novel entity types, when the text is highly ambiguous, or when you cannot invest in training data for a custom model.

How do I handle entities that span multiple tokens or contain special characters?

spaCy handles multi-token entities natively through its span-based architecture. During training, define entity boundaries using character offsets that encompass the full span. For special characters like hyphens or periods in entity names, ensure your tokenizer does not split them incorrectly by adding custom tokenization rules.
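One way to keep hyphenated tokens whole, adapted from spaCy's tokenizer-customization pattern: rebuild the infix rules without the letter-hyphen-letter splitter. The sample sentence is illustrative.

```python
import spacy
from spacy.lang.char_classes import (
    ALPHA, ALPHA_LOWER, ALPHA_UPPER, CONCAT_QUOTES, LIST_ELLIPSES, LIST_ICONS,
)
from spacy.util import compile_infix_regex

nlp = spacy.blank("en")  # works the same on a loaded model

# Rebuild the default infix patterns, omitting the rule that splits
# on hyphens between letters, so tokens like "mother-in-law" stay whole.
infixes = (
    LIST_ELLIPSES
    + LIST_ICONS
    + [
        r"(?<=[0-9])[+\-\*^](?=[0-9-])",
        r"(?<=[{al}{q}])\.(?=[{au}{q}])".format(
            al=ALPHA_LOWER, au=ALPHA_UPPER, q=CONCAT_QUOTES
        ),
        r"(?<=[{a}]),(?=[{a}])".format(a=ALPHA),
        r"(?<=[{a}0-9])[:<>=/](?=[{a}])".format(a=ALPHA),
    ]
)
nlp.tokenizer.infix_finditer = compile_infix_regex(infixes).finditer

print([t.text for t in nlp("Ship the mother-in-law unit")])
# ['Ship', 'the', 'mother-in-law', 'unit']
```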

Can I combine multiple NER models in a single agent pipeline?

Yes. A common pattern is to run a general-purpose model for standard entities (PERSON, ORG, GPE) and a domain-specific model for specialized entities (MEDICATION, LEGAL_CLAUSE). Merge the results and use a deduplication step to handle overlapping spans, keeping the prediction with higher confidence.
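The deduplication step above can be sketched as a greedy pass over candidate spans sorted by confidence. The `(start, end, label, score)` tuple layout is an assumption for illustration, not a spaCy API.

```python
def merge_spans(
    spans: list[tuple[int, int, str, float]]
) -> list[tuple[int, int, str, float]]:
    """Greedy overlap resolution: the highest-confidence span claims its
    character range first; overlapping lower-confidence spans are dropped."""
    kept: list[tuple[int, int, str, float]] = []
    for start, end, label, score in sorted(spans, key=lambda s: -s[3]):
        # Keep this span only if it does not overlap any span kept so far
        if all(end <= k_start or start >= k_end for k_start, k_end, _, _ in kept):
            kept.append((start, end, label, score))
    return sorted(kept)

candidates = [
    (0, 10, "PERSON", 0.91),      # general-purpose model
    (0, 10, "MEDICATION", 0.97),  # domain model wins the overlap
    (15, 22, "ORG", 0.88),
]
print(merge_spans(candidates))
# [(0, 10, 'MEDICATION', 0.97), (15, 22, 'ORG', 0.88)]
```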


#NER #NLP #SpaCy #EntityExtraction #Python #AIAgents #AgenticAI #LearnAI #AIEngineering
