
Tool Description Templates LLMs Reliably Call (2026)

A bad tool description costs you 30%+ in argument accuracy. We share the contract-style template — purpose, when-to-use, when-NOT-to-use, parameter constraints, two examples — that lifts CallSphere's Healthcare 14-tool stack to 96% function-arg accuracy across GPT-4o, Claude, and Gemini.

TL;DR — Research from Gorilla and ToolAlpaca shows precise tool descriptions improve parameter accuracy 30%+ versus terse ones. The fix: write each description as a contract — purpose, when to use, when NOT to use, parameter formats with examples, expected return shape — and never reuse a description across two tools.

The technique

A reliable tool description has six parts:

  1. One-line purpose — verb-first, end-state oriented. "Books a clinical appointment for an existing patient."
  2. When to use — explicit triggers. "Call when the user confirms a date AND a provider."
  3. When NOT to use — kills cross-tool confusion. "Do NOT call for new-patient registration; use create_patient first."
  4. Parameter contract — type, format, regex, example. "phone: E.164 string, e.g. +14155551234".
  5. Return shape — short JSON sketch.
  6. Two examples — one happy path, one disambiguation case.

Best-in-class tool descriptions read like API docs written for a junior engineer who must pick the right call from 14 candidates inside 200ms.
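As a sketch, the six parts can be assembled mechanically so no tool skips a section. The helper below is illustrative (the function and heading names are ours, not CallSphere's actual prompt):

```python
def build_tool_description(purpose, when_to_use, when_not_to_use,
                           param_contracts, return_shape, examples):
    """Assemble a contract-style tool description from its six parts."""
    lines = [purpose, "", "WHEN TO USE: " + when_to_use, "", "WHEN NOT TO USE:"]
    lines += [f"- {rule}" for rule in when_not_to_use]
    lines += ["", "PARAMETERS:"]
    lines += [f"- {name}: {contract}" for name, contract in param_contracts.items()]
    lines += ["", "RETURNS: " + return_shape, "", "EXAMPLES:"]
    lines += [f"{i}) {ex}" for i, ex in enumerate(examples, 1)]
    return "\n".join(lines)

desc = build_tool_description(
    purpose="Books a clinical appointment for an EXISTING patient.",
    when_to_use="caller confirmed a provider, an ISO-8601 start_time, and a patient_id.",
    when_not_to_use=["New-patient registration: use create_patient first."],
    param_contracts={"phone": "E.164 string, e.g. +14155551234"},
    return_shape='{"appointment_id": "...", "status": "confirmed"}',
    examples=["Happy: book_appointment(patient_id='p_91', ...)"],
)
print(desc)
```

Generating descriptions from one template also makes it trivial to lint for missing sections before they ever reach production.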



Why it works

LLMs choose tools by computing similarity between the user turn and each tool's description. When two descriptions both contain "creates an appointment" and only diverge in fine print, the model's softmax flattens and you get the wrong tool 10–20% of the time. Adding a when NOT to use clause forces the descriptions apart in embedding space and gives the model a hard negative signal.
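A toy softmax over similarity scores makes the flattening concrete. The numbers below are made up for illustration (real similarities live inside the model), but the shape of the effect holds:

```python
import math

def softmax(scores):
    """Convert raw similarity scores into a probability distribution."""
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Two near-duplicate descriptions: similarity scores cluster, softmax flattens.
vague = softmax([0.82, 0.79])
# Contract-style descriptions with a when-NOT clause: scores separate.
tight = softmax([0.82, 0.31])

print(f"vague: {vague[0]:.2f} vs {vague[1]:.2f}")  # ~0.51 vs 0.49 — coin flip
print(f"tight: {tight[0]:.2f} vs {tight[1]:.2f}")  # ~0.62 vs 0.38 — clear winner
```

The when-NOT clause is what widens the gap between the two raw scores in the first place; the softmax then amplifies that gap into a decisive pick.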

Modern LLMs in 2026 (GPT-4o, Claude 4.x, Gemini 2.5) are explicitly trained to parse description fields with structured headings. They reward you for treating the description like a contract and they punish ambiguity.

flowchart TD
  USER[User turn] --> EMB[Embed turn]
  EMB --> SIM{Similarity to each tool desc}
  SIM -->|tight desc| PICK[Correct tool 96%]
  SIM -->|vague desc| WRONG[Wrong tool 18%]
  PICK --> ARGS[Fill args from contract]
  ARGS --> CALL[Tool call]

CallSphere implementation

CallSphere's Healthcare 14-tool system prompt uses this exact template per tool. The OneRoof real-estate vertical's Triage Aria routes across 10 specialist agents by reading their routing_hint field — the same description pattern. Salon's narrow 6-tool stack uses two-shot examples per description because the call space is small enough to demonstrate every common path. We measure tool accuracy nightly across all 37 agents, 90+ tools, 115+ DB tables, 6 verticals.
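A nightly accuracy run can be as simple as a labeled set of user turns scored against expected calls. A minimal sketch, assuming a `call_model` function that wraps your provider's API (every name here is hypothetical, not CallSphere's internal harness):

```python
from dataclasses import dataclass, field

@dataclass
class ToolCase:
    user_turn: str
    expected_tool: str
    expected_args: dict = field(default_factory=dict)

def score_tool_accuracy(cases, call_model):
    """call_model(user_turn) -> (tool_name, args). Returns (tool_acc, arg_acc)."""
    tool_hits = arg_hits = 0
    for case in cases:
        tool, args = call_model(case.user_turn)
        if tool == case.expected_tool:
            tool_hits += 1
            if args == case.expected_args:  # arg accuracy counted only on correct tool
                arg_hits += 1
    n = len(cases)
    return tool_hits / n, arg_hits / n

# Stub model for illustration; always picks book_appointment.
def fake_model(turn):
    return "book_appointment", {"patient_id": "p_91"}

cases = [
    ToolCase("Book me with Dr Patel", "book_appointment", {"patient_id": "p_91"}),
    ToolCase("Am I already registered?", "lookup_patient"),
]
print(score_tool_accuracy(cases, fake_model))  # (0.5, 0.5)
```

Run the same labeled set against each provider to catch regressions when a model version or a description changes.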

Pricing: Starter $149, Growth $499, Scale $1,499. 14-day trial + 22% affiliate. Build your own tool stack via the Build Your Agent flow.


Build steps with prompt code

{
  "name": "book_appointment",
  "description": "Books a clinical appointment for an EXISTING patient.\n\nWHEN TO USE: caller has confirmed (1) a provider name from list_providers, (2) an ISO-8601 start_time, (3) an existing patient_id from lookup_patient.\n\nWHEN NOT TO USE:\n- Caller has not been verified — call lookup_patient first.\n- Provider unknown — call list_providers first.\n- Time is vague (\"sometime next week\") — clarify with caller first.\n\nEXAMPLES:\n1) Happy: book_appointment(patient_id='p_91', provider='dr_patel', start_time='2026-05-08T15:00-04:00')\n2) Reschedule: cancel_appointment then book_appointment.",
  "parameters": {
    "type": "object",
    "additionalProperties": false,
    "required": ["patient_id","provider","start_time"],
    "properties": {
      "patient_id": {"type":"string","pattern":"^p_[a-z0-9]{2,}$","description":"From lookup_patient"},
      "provider":   {"type":"string","description":"Slug from list_providers, e.g. dr_patel"},
      "start_time": {"type":"string","format":"date-time","description":"ISO-8601 with TZ offset, never naive"}
    }
  }
}
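Strict-mode schemas enforce this contract at the API layer, but you can also pre-validate arguments client-side before executing the call. A stdlib-only sketch of the same contract (the validator function is ours, not part of any SDK):

```python
import re
from datetime import datetime

def validate_book_appointment(args):
    """Check args against the book_appointment parameter contract above."""
    errors = []
    required = {"patient_id", "provider", "start_time"}
    if missing := required - args.keys():
        errors.append(f"missing: {sorted(missing)}")
    if extra := args.keys() - required:  # mirrors additionalProperties: false
        errors.append(f"unexpected: {sorted(extra)}")
    if (pid := args.get("patient_id")) and not re.fullmatch(r"p_[a-z0-9]{2,}", pid):
        errors.append("patient_id must match ^p_[a-z0-9]{2,}$")
    if ts := args.get("start_time"):
        try:
            if datetime.fromisoformat(ts).tzinfo is None:
                errors.append("start_time must carry a TZ offset, never naive")
        except ValueError:
            errors.append("start_time is not ISO-8601")
    return errors

print(validate_book_appointment({
    "patient_id": "p_91",
    "provider": "dr_patel",
    "start_time": "2026-05-08T15:00-04:00",
}))  # []
```

Catching a naive timestamp or malformed patient_id before the call means the agent can re-prompt the caller instead of burning a round trip on a 4xx.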

FAQ

Q: How long should a description be? 80–250 tokens. Below 50 you lose disambiguation; above 300 you crowd context.
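To enforce that band in CI, a rough four-characters-per-token heuristic is usually close enough for English prose (the helper and thresholds below are illustrative, not an exact tokenizer):

```python
def rough_token_count(text):
    # Rough heuristic: ~4 characters per token for English prose.
    return max(1, len(text) // 4)

def check_description_length(desc, lo=80, hi=250):
    """Flag descriptions outside the 80-250 estimated-token band."""
    n = rough_token_count(desc)
    if n < lo:
        return f"too short ({n} est. tokens): add disambiguation"
    if n > hi:
        return f"too long ({n} est. tokens): trim to protect context"
    return f"ok ({n} est. tokens)"

print(check_description_length("Books a clinical appointment."))
```

For exact counts, swap in your provider's tokenizer; the heuristic only needs to be accurate enough to catch the 20-token stubs and the 500-token essays.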

Q: Should I include error cases? Yes — one line. "Returns 409 if the slot is taken; agent should call list_open_slots."

Q: Does this work for Claude and Gemini too? Yes. The contract style is provider-agnostic. Claude additionally benefits from wrapping the when-NOT section in <not_for> tags (see post 7).

Q: What about strict-mode JSON schemas? Use them — see post 3. Strict mode enforces the contract at the API layer.
