
Metrics & Observability

Kurral captures latency, cost, token usage, and reliability metrics for every LLM call across all your agents. Use the dashboard to monitor performance, identify regressions, and control spending.

What Gets Measured

Every LLM call through the Kurral proxy records:

Metric          Description
Latency         Total request time (ms)
TTFT            Time to first token for streaming calls (ms)
Input tokens    Tokens in the request
Output tokens   Tokens in the response
Cost            Calculated from provider-specific pricing per model
Model           Which model was used
Status          Success, error, or timeout
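
For orientation, one of these per-call records can be sketched as a simple data structure. The field names below are assumptions for illustration, not Kurral's actual schema:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class LLMCallRecord:
    """Illustrative shape of a per-call metric record (field names are assumed)."""
    model: str                # which model was used
    latency_ms: float         # total request time
    ttft_ms: Optional[float]  # time to first token; None for non-streaming calls
    input_tokens: int         # tokens in the request
    output_tokens: int        # tokens in the response
    cost_usd: float           # computed from provider-specific pricing
    status: str               # "success", "error", or "timeout"

record = LLMCallRecord(
    model="gpt-4o", latency_ms=1250, ttft_ms=310,
    input_tokens=420, output_tokens=180, cost_usd=0.0049, status="success",
)
```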

Dashboard Views

Overview Metrics

Top-level cards showing:

  • Total sessions — across all agents
  • Total tokens — input + output
  • Total cost — aggregated across all providers
  • Avg latency — mean response time
  • P50 / P90 / P99 latency — latency percentiles
  • Active models — models in use across agents

Latency Breakdown

  • By model — compare latency across GPT-4o, Claude Sonnet, Gemini, etc.
  • By use case — latency grouped by semantic bucket (if using SDK tracing)
  • Percentiles — P50, P90, P99 distribution
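
The percentile figures are order statistics over per-call latencies. As a reference for interpreting them, here is a minimal nearest-rank sketch (not Kurral's internal implementation):

```python
def percentile(values, p):
    """Nearest-rank percentile: smallest value >= p percent of the sample."""
    ordered = sorted(values)
    # ceil(p/100 * n) gives the 1-based nearest rank
    rank = max(1, -(-p * len(ordered) // 100))
    return ordered[rank - 1]

latencies_ms = [820, 980, 1010, 1250, 1400, 2100, 2600, 3300, 4100, 4500]
p50 = percentile(latencies_ms, 50)  # 1400
p90 = percentile(latencies_ms, 90)  # 4100
p99 = percentile(latencies_ms, 99)  # 4500
```

Note how P99 is dominated by the slowest calls, which is why it surfaces tail regressions that the mean hides.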

Cost Analysis

  • By model — which models are consuming the most budget
  • By environment — production vs. staging vs. development spend
  • Time series — daily cost trend

Usage Patterns

  • Top consumers — which agents use the most tokens
  • Top use cases — which workflows drive the most cost

Agent-Level Metrics

Each agent has its own metrics accessible via the dashboard or API.

Dashboard

Go to Agents → click an agent → Overview tab. Shows:

  • 7-day session volume bar chart (real data)
  • Total sessions, tokens, cost
  • Average and percentile latency
  • Security score average

API

GET /api/web/agents/{agent_id}/metrics?days=7

Parameters:

Parameter   Type      Default   Description
days        integer   7         Lookback period (1-90)

Response:

{
  "period_days": 7,
  "total_sessions": 142,
  "total_tokens": 89420,
  "total_cost": 1.2340,
  "avg_latency_ms": 1250,
  "p50_latency_ms": 980,
  "p90_latency_ms": 2100,
  "p99_latency_ms": 4500,
  "avg_security_score": 78.5,
  "total_scans": 3,
  "daily_breakdown": {
    "2025-01-08": {"count": 18, "tokens": 12400, "cost": 0.18},
    "2025-01-09": {"count": 22, "tokens": 14200, "cost": 0.21}
  }
}
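
The daily_breakdown field can be aggregated client-side. A sketch using the two days shown in the sample above (a real response covers the full lookback period):

```python
# Trimmed from the sample response above.
metrics = {
    "daily_breakdown": {
        "2025-01-08": {"count": 18, "tokens": 12400, "cost": 0.18},
        "2025-01-09": {"count": 22, "tokens": 14200, "cost": 0.21},
    }
}

days = metrics["daily_breakdown"].values()
sessions = sum(day["count"] for day in days)           # 40
tokens = sum(day["tokens"] for day in days)            # 26600
cost = round(sum(day["cost"] for day in days), 2)      # 0.39
```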

Flight Recorder

The Flight Recorder is Kurral's session detail view. Click any session to see a timeline of every event in the agent execution:

Timeline Lanes

Lane             What it Shows
User Input       The initial query or message
LLM              Model calls with input/output
Tool Execution   Tool calls with parameters and results
Policy           Safety gate blocks, rate limits
Outcome          Final agent response

Event Details

Click any event in the timeline to see:

  • Full input/output data
  • Timing (start, duration)
  • Token usage for LLM events
  • Tool parameters and return values
  • Error messages and stack traces

Bookmarks

Mark important events for quick reference during investigation.


Observability Summary API

GET /api/web/observe/summary?range=7d

Returns aggregated observability metrics:

{
  "total_runs": 452,
  "success_rate": 94.2,
  "error_rate": 5.8,
  "avg_latency_ms": 1340,
  "p95_latency_ms": 3200,
  "total_tokens": 234000,
  "total_cost": 12.45,
  "reliability_score": 87,
  "top_tools_by_error": [
    {"tool": "search_orders", "error_rate": 2.1}
  ],
  "cost_by_model": [
    {"model": "gpt-4o", "cost": 8.20},
    {"model": "claude-sonnet-4-5", "cost": 4.25}
  ]
}
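
One common use of this summary is simple alerting. A sketch against the sample response above, with illustrative thresholds (the values are assumptions, not Kurral defaults; tune them to your own SLOs):

```python
# Trimmed from the sample summary response above.
summary = {
    "success_rate": 94.2, "error_rate": 5.8,
    "p95_latency_ms": 3200, "reliability_score": 87,
}

# Illustrative thresholds -- adjust to your own SLOs.
alerts = []
if summary["error_rate"] > 5.0:
    alerts.append("error rate above 5%")
if summary["p95_latency_ms"] > 3000:
    alerts.append("p95 latency above 3s")
if summary["reliability_score"] < 80:
    alerts.append("reliability score below 80")
```

With the sample data, the first two checks fire and the reliability check passes.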

Cost Calculation

Kurral calculates cost using provider-specific pricing tables. Costs are computed per-call based on:

  • Model used
  • Input token count
  • Output token count
  • Provider pricing at time of call

Cost is shown in USD and broken down at the session, agent, and account level.
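
The per-call computation amounts to scaling token counts by per-token prices. A minimal sketch with an illustrative pricing entry (the prices here are placeholders, not Kurral's actual pricing tables, which vary by provider and date):

```python
# Illustrative per-million-token prices in USD (placeholder values).
PRICING = {
    "gpt-4o": {"input": 2.50, "output": 10.00},
}

def call_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD for one call: token counts scaled by per-million-token prices."""
    p = PRICING[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

cost = call_cost("gpt-4o", input_tokens=1200, output_tokens=400)
# (1200 * 2.50 + 400 * 10.00) / 1e6 = 0.007
```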