
CrewAI Callbacks and Event Hooks: Monitoring Agent Progress in Real Time

Implement step callbacks, task callbacks, and custom event handlers in CrewAI to monitor agent reasoning in real time, log progress, and build observable multi-agent systems.

Why Observability Matters in Multi-Agent Systems

When a single LLM call produces unexpected output, you read the prompt and response. When a crew of five agents runs for three minutes and produces a poor result, debugging is exponentially harder. Which agent went off track? At which step? Did a tool return bad data? Did an agent misinterpret context from a previous task?

CrewAI's callback system solves this by giving you hooks into every step of agent execution. You can log progress, track costs, save intermediate results, send notifications, or halt execution — all without modifying your agent or task definitions.

Task Callbacks

The simplest callback is attached at the task level. It fires when a task completes and receives the task output.

[Diagram: a hierarchical crew. A manager agent routes Task A (research), Task B (analyze), and Task C (draft) to researcher, analyst, and writer agents; the researcher uses tools (web search, files), and the writer produces the final crew output.]

Attaching a task callback looks like this:
from crewai import Agent, Task
import json
from datetime import datetime

def on_task_complete(output):
    # `output` is a TaskOutput: log a compact, structured summary
    log_entry = {
        "timestamp": datetime.now().isoformat(),
        "description": output.description[:80],
        "output_length": len(output.raw),
        "output_preview": output.raw[:200],
    }
    print(f"[TASK DONE] {json.dumps(log_entry, indent=2)}")

researcher = Agent(
    role="Researcher",
    goal="Find accurate data",
    backstory="Expert researcher.",
)

task = Task(
    description="Research the top 5 AI startups funded in 2026.",
    expected_output="A numbered list with company name, funding amount, and focus area.",
    agent=researcher,
    callback=on_task_complete,
)

The callback receives a TaskOutput object with properties including raw (the string output), description (the task description), and agent (the role of the agent that executed it). This is your primary tool for logging what each task produced.
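
Because the callback receives the full TaskOutput, it is also a natural place to persist intermediate results so a failed run keeps its partial work. A minimal sketch (the task_outputs.jsonl filename is just an example):

import json

def save_output(output):
    # Append one JSON line per completed task.
    with open("task_outputs.jsonl", "a") as f:
        f.write(json.dumps({
            "agent": str(output.agent),
            "description": output.description,
            "raw": output.raw,
        }) + "\n")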

Step Callbacks

Step callbacks fire at each reasoning step within an agent's execution loop. They provide granular visibility into the agent's thought process, tool calls, and intermediate outputs:

from crewai import Agent

def on_agent_step(step_output):
    # The object passed here varies by CrewAI version (an agent action,
    # a tool result, or a final answer), so read attributes defensively.
    print(f"[STEP] Type: {type(step_output).__name__}")
    tool = getattr(step_output, "tool", None)
    if tool:
        print(f"[STEP] Tool used: {tool}")
        print(f"[STEP] Tool input: {getattr(step_output, 'tool_input', '')}")
    result = getattr(step_output, "result", None) or getattr(step_output, "text", "")
    print(f"[STEP] Output: {str(result)[:150]}...")
    print("---")

researcher = Agent(
    role="Researcher",
    goal="Find accurate data using web search",
    backstory="Expert online researcher.",
    step_callback=on_agent_step,
    verbose=True,
)

Step callbacks let you see exactly what the agent is thinking at each iteration. When an agent makes a bad tool call or misinterprets data, the step callback captures the exact moment things went wrong.

Building a Structured Logger

For production systems, combine callbacks with a structured logging system:

import logging
import json
from datetime import datetime

logging.basicConfig(
    filename="crew_execution.log",
    level=logging.INFO,
    format="%(message)s",
)

class CrewLogger:
    def __init__(self, crew_name: str):
        self.crew_name = crew_name
        self.task_count = 0  # tasks completed so far

    def on_task_complete(self, output):
        self.task_count += 1
        entry = {
            "crew": self.crew_name,
            "event": "task_complete",
            "task_number": self.task_count,
            "timestamp": datetime.now().isoformat(),
            "description": output.description[:100],
            "output_chars": len(output.raw),
        }
        logging.info(json.dumps(entry))

    def on_step(self, step_output):
        entry = {
            "crew": self.crew_name,
            "event": "agent_step",
            "task_number": self.task_count + 1,  # the task currently running
            "timestamp": datetime.now().isoformat(),
            # step payload types vary by CrewAI version; stringify defensively
            "action": str(getattr(step_output, "action", step_output))[:100],
        }
        logging.info(json.dumps(entry))

logger = CrewLogger("market_research")

Use the logger with your agents and tasks:

researcher = Agent(
    role="Researcher",
    goal="Find data",
    backstory="Expert researcher.",
    step_callback=logger.on_step,
)

task = Task(
    description="Research AI market trends.",
    expected_output="A summary of 5 trends.",
    agent=researcher,
    callback=logger.on_task_complete,
)

This produces a structured log file that can be ingested by any log aggregation system — ELK, Datadog, CloudWatch, or a simple script that parses JSON lines.
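
Because each line is standalone JSON, analysis scripts stay trivial. For instance, to summarize a run from the file above:

import json

with open("crew_execution.log") as f:
    events = [json.loads(line) for line in f if line.strip()]

steps = [e for e in events if e["event"] == "agent_step"]
tasks = [e for e in events if e["event"] == "task_complete"]
print(f"{len(steps)} agent steps across {len(tasks)} completed tasks")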

Cost Tracking with Callbacks

One of the most practical uses of callbacks is tracking how much work each run performs. Step counts and tool calls are a rough proxy for token usage and cost:

class CostTracker:
    def __init__(self):
        self.total_steps = 0
        self.tool_calls = 0
        self.tasks_completed = 0

    def on_step(self, step_output):
        self.total_steps += 1
        # tool attribute presence varies by payload type; probe defensively
        if getattr(step_output, "tool", None):
            self.tool_calls += 1

    def on_task_complete(self, output):
        self.tasks_completed += 1

    def summary(self):
        return {
            "total_steps": self.total_steps,
            "tool_calls": self.tool_calls,
            "tasks_completed": self.tasks_completed,
            "avg_steps_per_task": (
                self.total_steps / self.tasks_completed
                if self.tasks_completed > 0
                else 0
            ),
        }

tracker = CostTracker()

After a crew run, call tracker.summary() to understand how much work each execution required. Track this over time to identify optimization opportunities.
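
Wiring the tracker into a crew takes one argument per hook. A minimal sketch, reusing the researcher and task from earlier (model credentials are assumed to be configured in your environment):

from crewai import Crew

crew = Crew(
    agents=[researcher],
    tasks=[task],
    step_callback=tracker.on_step,           # fires on every agent step
    task_callback=tracker.on_task_complete,  # fires when each task finishes
)

result = crew.kickoff()
print(tracker.summary())

If you need actual token counts rather than step counts, recent CrewAI versions also attach usage metrics to the crew object after kickoff(); check your version's documentation for the exact attribute.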


Halting Execution from Callbacks

While CrewAI does not natively support halting execution from a callback, you can raise an exception to stop a run:

class SafetyGuard:
    def __init__(self, max_steps: int = 50):
        self.max_steps = max_steps
        self.step_count = 0

    def on_step(self, step_output):
        self.step_count += 1
        if self.step_count > self.max_steps:
            raise RuntimeError(
                f"Safety limit reached: {self.max_steps} steps exceeded. "
                "Agent may be in a loop."
            )

This prevents runaway agents from consuming unlimited tokens. Set the threshold based on your expected task complexity.
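
To wire the guard in, pass its method as a step callback and catch the exception around kickoff(). A sketch, assuming a crew built from this agent as in the earlier examples:

guard = SafetyGuard(max_steps=40)

researcher = Agent(
    role="Researcher",
    goal="Find data",
    backstory="Expert researcher.",
    step_callback=guard.on_step,  # raising here aborts the run
)

try:
    crew.kickoff()
except RuntimeError as e:
    print(f"Run halted: {e}")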

FAQ

Can I use async callbacks?

CrewAI's callback system currently expects synchronous functions. If you need to perform async operations (like writing to an async database), use a synchronous wrapper that schedules the async work or writes to a queue that an async consumer processes.
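
One workable pattern: the callback only enqueues, and a separate consumer coroutine drains the queue. A sketch, where handle_event is a hypothetical stand-in for your async database write:

import asyncio
import queue

events: "queue.Queue[str]" = queue.Queue()

def on_step(step_output):
    # Keep the synchronous callback fast: enqueue and return immediately.
    events.put(str(step_output)[:500])

async def consume_events():
    while True:
        # Read from the thread-safe queue without blocking the event loop.
        item = await asyncio.to_thread(events.get)
        await handle_event(item)  # hypothetical async write goes here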

Do callbacks affect agent performance?

Callbacks add negligible overhead — they run between LLM calls, not during them. The LLM inference time dominates execution. A callback that takes 10 milliseconds is invisible when each LLM call takes 1 to 3 seconds.

Can I attach multiple callbacks to the same agent?

Not directly. The step_callback parameter accepts a single function. To run multiple handlers, create a dispatcher function that calls all your handlers sequentially within a single callback.
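
For example, a minimal dispatcher that fans one callback out to the logger, tracker, and guard objects defined earlier:

def dispatch(*handlers):
    """Combine several step handlers into a single callback."""
    def callback(step_output):
        for handler in handlers:
            handler(step_output)
    return callback

researcher = Agent(
    role="Researcher",
    goal="Find data",
    backstory="Expert researcher.",
    step_callback=dispatch(logger.on_step, tracker.on_step, guard.on_step),
)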


#CrewAI #Callbacks #Observability #Monitoring #Python #AgenticAI #LearnAI #AIEngineering

