Learn Agentic AI

Real-Time Agent Dashboards with Grafana: Visualizing Performance and Health Metrics

Learn how to set up Grafana dashboards for AI agent monitoring, configure data sources, design effective panels for latency, throughput, and error rates, and create alert rules that catch problems before users notice.

Why Grafana for Agent Monitoring

Grafana is the de facto standard for operational dashboards because it connects to virtually any data source, renders time-series data well, and provides a robust alerting engine. For AI agents, you need to visualize metrics that span multiple layers: API latency, token throughput, error rates, conversation volume, and model performance, often from different backends.

A single Grafana dashboard can pull from Prometheus for infrastructure metrics, PostgreSQL for business metrics, and Loki for log-based insights, presenting a unified view of agent health.

Exporting Agent Metrics to Prometheus

The first step is instrumenting your agent code to export metrics in a format Grafana can consume. Prometheus is the most common metrics backend; use the prometheus-client library to expose counters, histograms, and gauges. The diagram below sketches the broader telemetry pipeline, with traces and logs flowing through an OpenTelemetry Collector alongside metrics; the rest of this section focuses on the Prometheus path.

flowchart LR
    APP(["Agent or API"])
    SDK["OTel SDK<br/>GenAI conventions"]
    COL["OTel Collector"]
    subgraph BACKENDS["Backends"]
        TR[("Traces<br/>Tempo or Honeycomb")]
        MET[("Metrics<br/>Prometheus")]
        LOG[("Logs<br/>Loki or ELK")]
    end
    DASH["Grafana plus alerts"]
    PAGE(["Pager"])
    APP --> SDK --> COL
    COL --> TR
    COL --> MET
    COL --> LOG
    TR --> DASH
    MET --> DASH
    LOG --> DASH
    DASH --> PAGE
    style SDK fill:#4f46e5,stroke:#4338ca,color:#fff
    style DASH fill:#f59e0b,stroke:#d97706,color:#1f2937
    style PAGE fill:#dc2626,stroke:#b91c1c,color:#fff
from prometheus_client import (
    Counter, Histogram, Gauge, start_http_server
)

# Define metrics
CONVERSATION_TOTAL = Counter(
    "agent_conversations_total",
    "Total conversations started",
    ["agent_name"],
)

MESSAGE_LATENCY = Histogram(
    "agent_message_latency_seconds",
    "Time to generate agent response",
    ["agent_name", "model"],
    buckets=[0.1, 0.25, 0.5, 1.0, 2.5, 5.0, 10.0],
)

TOKEN_USAGE = Counter(
    "agent_tokens_total",
    "Total tokens consumed",
    ["agent_name", "model", "token_type"],
)

ACTIVE_CONVERSATIONS = Gauge(
    "agent_active_conversations",
    "Currently active conversations",
    ["agent_name"],
)

ERROR_TOTAL = Counter(
    "agent_errors_total",
    "Total errors encountered",
    ["agent_name", "error_type"],
)

# Start metrics server on port 8090
start_http_server(8090)
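With the exporter listening on port 8090, Prometheus needs a scrape job pointing at it. A minimal sketch of that config, written from Python in the same spirit as the provisioning example later in this post; the job name, target host, and interval are placeholders for your environment (Kubernetes deployments would typically use service discovery instead of static targets):

```python
# Minimal Prometheus scrape job for the agent exporter above.
# "agent-host" is a placeholder for wherever the agent runs.
SCRAPE_CONFIG = """\
scrape_configs:
  - job_name: ai-agents
    scrape_interval: 15s
    static_configs:
      - targets: ["agent-host:8090"]
"""

with open("prometheus.yml", "w") as f:
    f.write(SCRAPE_CONFIG)
```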

Instrumenting the Agent Loop

Wrap your agent's message handling with metric recording. The key is to capture timing, token counts, and outcomes at every step.

import time

class InstrumentedAgent:
    def __init__(self, name: str, model: str = "gpt-4o"):
        self.name = name
        self.model = model
        self._seen_conversations: set[str] = set()

    async def handle_message(
        self, conversation_id: str, user_message: str
    ) -> str:
        # Count each conversation once, on its first message, so the
        # agent_conversations_total counter actually moves
        if conversation_id not in self._seen_conversations:
            self._seen_conversations.add(conversation_id)
            CONVERSATION_TOTAL.labels(agent_name=self.name).inc()
        ACTIVE_CONVERSATIONS.labels(agent_name=self.name).inc()
        start_time = time.time()
        try:
            # _generate_response wraps the actual model call and returns
            # the content plus prompt/completion token counts
            response = await self._generate_response(user_message)
            latency = time.time() - start_time
            MESSAGE_LATENCY.labels(
                agent_name=self.name, model=self.model
            ).observe(latency)
            TOKEN_USAGE.labels(
                agent_name=self.name,
                model=self.model,
                token_type="prompt",
            ).inc(response["prompt_tokens"])
            TOKEN_USAGE.labels(
                agent_name=self.name,
                model=self.model,
                token_type="completion",
            ).inc(response["completion_tokens"])
            return response["content"]
        except Exception as exc:
            ERROR_TOTAL.labels(
                agent_name=self.name,
                error_type=type(exc).__name__,
            ).inc()
            raise
        finally:
            ACTIVE_CONVERSATIONS.labels(agent_name=self.name).dec()
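Once instrumented, the /metrics endpoint on port 8090 serves plain-text samples in the Prometheus exposition format. A rough sketch of how one labeled sample is rendered (the real format also emits HELP and TYPE comment lines, which this simplification omits):

```python
def exposition_line(name: str, labels: dict[str, str], value: float) -> str:
    """Render one metric sample roughly the way /metrics does."""
    label_str = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
    return f"{name}{{{label_str}}} {value}"

print(exposition_line(
    "agent_tokens_total",
    {"agent_name": "support", "model": "gpt-4o", "token_type": "prompt"},
    1234.0,
))
# agent_tokens_total{agent_name="support",model="gpt-4o",token_type="prompt"} 1234.0
```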

Grafana Data Source Configuration

Configure Prometheus as a data source in Grafana. If you also want to query business metrics from PostgreSQL, add it as a second data source.

# grafana_provisioning.py — generate provisioning YAML
import yaml

datasources = {
    "apiVersion": 1,
    "datasources": [
        {
            "name": "Prometheus",
            "type": "prometheus",
            "url": "http://prometheus:9090",
            "access": "proxy",
            "isDefault": True,
        },
        {
            "name": "PostgreSQL",
            "type": "postgres",
            "url": "postgres-host:5432",
            "database": "agent_analytics",
            "user": "grafana_reader",
            "jsonData": {"sslmode": "require"},
            "secureJsonData": {"password": "${GRAFANA_PG_PASSWORD}"},
        },
    ],
}

with open("/etc/grafana/provisioning/datasources/agents.yaml", "w") as f:
    yaml.dump(datasources, f)

Dashboard Panel Design

An effective agent dashboard has four sections: overview, performance, errors, and cost. Each section contains panels that answer specific operational questions.

# Dashboard JSON model generator
def create_agent_dashboard() -> dict:
    return {
        "dashboard": {
            "title": "AI Agent Operations",
            "panels": [
                {
                    "title": "Conversations per Minute",
                    "type": "timeseries",
                    "targets": [{
                        "expr": "rate(agent_conversations_total[5m]) * 60",
                        "legendFormat": "{{agent_name}}",
                    }],
                    "gridPos": {"h": 8, "w": 12, "x": 0, "y": 0},
                },
                {
                    "title": "P95 Response Latency",
                    "type": "timeseries",
                    "targets": [{
                        # Aggregate across instances before taking the
                        # quantile, keeping only le and agent_name
                        "expr": (
                            "histogram_quantile(0.95, sum by (agent_name, le) "
                            "(rate(agent_message_latency_seconds_bucket[5m])))"
                        ),
                        "legendFormat": "{{agent_name}}",
                    }],
                    "gridPos": {"h": 8, "w": 12, "x": 12, "y": 0},
                },
                {
                    "title": "Error Rate",
                    "type": "stat",
                    "targets": [{
                        # sum() both sides: the error counter carries an
                        # error_type label the conversation counter lacks,
                        # so dividing the raw vectors would match nothing
                        "expr": (
                            "sum(rate(agent_errors_total[5m])) / "
                            "sum(rate(agent_conversations_total[5m])) * 100"
                        ),
                    }],
                    "gridPos": {"h": 4, "w": 6, "x": 0, "y": 8},
                },
                {
                    "title": "Active Conversations",
                    "type": "gauge",
                    "targets": [{
                        "expr": "agent_active_conversations",
                    }],
                    "gridPos": {"h": 4, "w": 6, "x": 6, "y": 8},
                },
            ],
        },
    }
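The P95 panel leans on histogram_quantile, which estimates a quantile by linear interpolation across the cumulative le buckets defined in MESSAGE_LATENCY. A pure-Python sketch of the estimation, simplified from Prometheus's actual implementation (which also handles the +Inf bucket and various edge cases):

```python
def histogram_quantile(q: float, buckets: list[tuple[float, float]]) -> float:
    """Estimate quantile q from (upper_bound, cumulative_count) pairs,
    sorted by upper bound, via linear interpolation within a bucket."""
    total = buckets[-1][1]
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank:
            # Interpolate where `rank` falls inside this bucket
            frac = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * frac
        prev_bound, prev_count = bound, count
    return buckets[-1][0]

# 100 requests: 80 finished under 0.25s, 95 under 0.5s, all under 1s
buckets = [(0.1, 50.0), (0.25, 80.0), (0.5, 95.0), (1.0, 100.0)]
print(histogram_quantile(0.95, buckets))  # 0.5
```

Note that the result is only as precise as the bucket boundaries, which is why the Histogram above defines buckets clustered around the latencies you actually care about.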

Alert Rules

Dashboards are useless if nobody is looking at them. Alerts bridge the gap by notifying the team when metrics cross critical thresholds.


def create_alert_rules() -> list[dict]:
    return [
        {
            "name": "High Agent Latency",
            "condition": (
                "histogram_quantile(0.95, "
                "rate(agent_message_latency_seconds_bucket[5m])) > 5"
            ),
            "for": "5m",
            "severity": "warning",
            "message": "Agent P95 latency exceeds 5 seconds",
        },
        {
            "name": "Elevated Error Rate",
            # sum() both sides so the error_type label on the error
            # counter does not break vector matching
            "condition": (
                "sum(rate(agent_errors_total[5m])) / "
                "sum(rate(agent_conversations_total[5m])) > 0.05"
            ),
            "for": "3m",
            "severity": "critical",
            "message": "Agent error rate exceeds 5%",
        },
        {
            "name": "Token Budget Exceeded",
            "condition": (
                "increase(agent_tokens_total[1h]) > 1000000"
            ),
            "for": "0m",
            "severity": "warning",
            "message": "Agent consumed over 1M tokens in the past hour",
        },
    ]
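These dicts are backend-agnostic; to load them you would render them into your alerting backend's format. A sketch targeting Prometheus's rule-file syntax (Grafana-managed alerts would go through its provisioning API instead; the field mapping here is an illustrative assumption, not an official schema):

```python
def to_prometheus_rule(rule: dict) -> str:
    """Render one rule dict (shaped like create_alert_rules output)
    as a Prometheus alerting-rule YAML fragment."""
    return (
        f"  - alert: {rule['name'].replace(' ', '')}\n"
        f"    expr: {rule['condition']}\n"
        f"    for: {rule['for']}\n"
        f"    labels:\n"
        f"      severity: {rule['severity']}\n"
        f"    annotations:\n"
        f"      summary: {rule['message']}\n"
    )

rule = {
    "name": "High Agent Latency",
    "condition": (
        "histogram_quantile(0.95, "
        "rate(agent_message_latency_seconds_bucket[5m])) > 5"
    ),
    "for": "5m",
    "severity": "warning",
    "message": "Agent P95 latency exceeds 5 seconds",
}
rule_file = "groups:\n- name: agent-alerts\n  rules:\n" + to_prometheus_rule(rule)
```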

FAQ

Should I use Prometheus or push metrics directly to Grafana Cloud?

Prometheus works best if you already run Kubernetes or have infrastructure for scraping. For simpler setups, Grafana Cloud with the OpenTelemetry Collector lets you push metrics directly without managing Prometheus. The dashboards and PromQL queries work the same either way.

How long should I retain high-resolution metrics?

Keep 15-second resolution data for 7 days, 1-minute aggregations for 30 days, and 5-minute aggregations for 1 year. This balances storage costs with the ability to investigate recent incidents in detail and spot long-term trends. Configure Prometheus retention rules or use Thanos for long-term storage.
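The storage impact of that tiered policy is easy to estimate. Prometheus's TSDB typically compresses samples down to roughly 1-2 bytes each (a rule of thumb, not a guarantee), so a quick back-of-the-envelope calculation per time series:

```python
# Samples per series under the tiered retention described above
tiers = {
    "15s for 7 days": 7 * 24 * 3600 // 15,
    "1m for 30 days": 30 * 24 * 3600 // 60,
    "5m for 1 year": 365 * 24 * 3600 // 300,
}
BYTES_PER_SAMPLE = 2  # pessimistic end of the ~1-2 byte rule of thumb

for tier, samples in tiers.items():
    kib = samples * BYTES_PER_SAMPLE / 1024
    print(f"{tier}: {samples:,} samples, about {kib:.0f} KiB per series")
```

Even at two bytes per sample the high-resolution tier costs well under 100 KiB per series, so cardinality (the number of label combinations), not retention length, is usually what drives Prometheus storage and memory costs.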

What is the most important single panel for an agent dashboard?

The error rate panel. Token usage and latency are important for optimization, but errors directly impact user experience. A spike in errors means users are getting failed responses. Display error rate as a percentage with a threshold line at your SLA target (typically 1-2%) and configure an alert when it exceeds that threshold for more than 3 minutes.


#Grafana #Monitoring #Dashboards #Observability #AIAgents #AgenticAI #LearnAI #AIEngineering
