
Metrics & Observability

Kurral captures latency, cost, token usage, and reliability metrics for every LLM call across all your agents. Use the dashboard to monitor performance, identify regressions, and control spending.

What Gets Measured

Every LLM call through the Kurral proxy records:

Metric          Description
Latency         Total request time (ms)
TTFT            Time to first token for streaming calls (ms)
Input tokens    Tokens in the request
Output tokens   Tokens in the response
Cost            Calculated from provider-specific pricing per model
Model           Which model was used
Status          Success, error, or timeout
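
For orientation, one of these per-call records can be sketched as a simple data structure. The field names below are assumptions for illustration, not Kurral's actual schema:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class LLMCallRecord:
    """Illustrative shape of a per-call metric record (field names are assumed)."""
    model: str                # which model was used
    latency_ms: float         # total request time
    ttft_ms: Optional[float]  # time to first token; None for non-streaming calls
    input_tokens: int         # tokens in the request
    output_tokens: int        # tokens in the response
    cost_usd: float           # computed from provider-specific pricing
    status: str               # "success", "error", or "timeout"

record = LLMCallRecord(
    model="gpt-4o", latency_ms=1250, ttft_ms=310,
    input_tokens=420, output_tokens=180, cost_usd=0.0049, status="success",
)
```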

Dashboard Views

Overview Metrics

Top-level cards showing:

  • Total sessions — across all agents
  • Total tokens — input + output
  • Total cost — aggregated across all providers
  • Avg latency — mean response time
  • P50 / P90 / P99 latency — latency percentiles
  • Active models — models in use across agents

Latency Breakdown

  • By model — compare latency across GPT-4o, Claude Sonnet, Gemini, etc.
  • By use case — latency grouped by semantic bucket (if using SDK tracing)
  • Percentiles — P50, P90, P99 distribution
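
The percentile figures are order statistics over per-call latencies. As a reference for interpreting them, here is a minimal nearest-rank sketch (not Kurral's internal implementation):

```python
def percentile(values, p):
    """Nearest-rank percentile: smallest value >= p percent of the sample."""
    ordered = sorted(values)
    # ceil(p/100 * n) gives the 1-based nearest rank
    rank = max(1, -(-p * len(ordered) // 100))
    return ordered[rank - 1]

latencies_ms = [820, 980, 1010, 1250, 1400, 2100, 2600, 3300, 4100, 4500]
p50 = percentile(latencies_ms, 50)  # 1400
p90 = percentile(latencies_ms, 90)  # 4100
p99 = percentile(latencies_ms, 99)  # 4500
```

Note how P99 is dominated by the slowest calls, which is why it surfaces tail regressions that the mean hides.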

Cost Analysis

  • By model — which models are consuming the most budget
  • By environment — production vs. staging vs. development spend
  • Time series — daily cost trend

Usage Patterns

  • Top consumers — which agents use the most tokens
  • Top use cases — which workflows drive the most cost

Agent-Level Metrics

Each agent has its own metrics accessible via the dashboard or API.

Dashboard

Go to Agents → click an agent → Overview tab. Shows:

  • 7-day session volume bar chart (real data)
  • Total sessions, tokens, cost
  • Average and percentile latency
  • Security score average

API

GET /api/web/agents/{agent_id}/metrics?days=7

Parameters:

Parameter   Type      Default   Description
days        integer   7         Lookback period (1-90)

Response:

{
  "period_days": 7,
  "total_sessions": 142,
  "total_tokens": 89420,
  "total_cost": 1.2340,
  "avg_latency_ms": 1250,
  "p50_latency_ms": 980,
  "p90_latency_ms": 2100,
  "p99_latency_ms": 4500,
  "avg_security_score": 78.5,
  "total_scans": 3,
  "daily_breakdown": {
    "2025-01-08": {"count": 18, "tokens": 12400, "cost": 0.18},
    "2025-01-09": {"count": 22, "tokens": 14200, "cost": 0.21}
  }
}
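
The daily_breakdown field can be aggregated client-side. A sketch using the two days shown in the sample above (a real response covers the full lookback period):

```python
# Trimmed from the sample response above.
metrics = {
    "daily_breakdown": {
        "2025-01-08": {"count": 18, "tokens": 12400, "cost": 0.18},
        "2025-01-09": {"count": 22, "tokens": 14200, "cost": 0.21},
    }
}

days = metrics["daily_breakdown"].values()
sessions = sum(day["count"] for day in days)           # 40
tokens = sum(day["tokens"] for day in days)            # 26600
cost = round(sum(day["cost"] for day in days), 2)      # 0.39
```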

Flight Recorder

The Flight Recorder is Kurral's session detail view. Click any session to see a timeline of every event in the agent execution:

Timeline Lanes

Lane             What it Shows
User Input       The initial query or message
LLM              Model calls with input/output
Tool Execution   Tool calls with parameters and results
Policy           Safety gate blocks, rate limits
Outcome          Final agent response

Event Details

Click any event in the timeline to see:

  • Full input/output data
  • Timing (start, duration)
  • Token usage for LLM events
  • Tool parameters and return values
  • Error messages and stack traces

Bookmarks

Mark important events for quick reference during investigation.


Observability Summary API

GET /api/web/observe/summary?range=7d

Returns aggregated observability metrics:

{
  "total_runs": 452,
  "success_rate": 94.2,
  "error_rate": 5.8,
  "avg_latency_ms": 1340,
  "p95_latency_ms": 3200,
  "total_tokens": 234000,
  "total_cost": 12.45,
  "reliability_score": 87,
  "top_tools_by_error": [
    {"tool": "search_orders", "error_rate": 2.1}
  ],
  "cost_by_model": [
    {"model": "gpt-4o", "cost": 8.20},
    {"model": "claude-sonnet-4-5", "cost": 4.25}
  ]
}
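
One common use of this summary is simple alerting. A sketch against the sample response above, with illustrative thresholds (the values are assumptions, not Kurral defaults; tune them to your own SLOs):

```python
# Trimmed from the sample summary response above.
summary = {
    "success_rate": 94.2, "error_rate": 5.8,
    "p95_latency_ms": 3200, "reliability_score": 87,
}

# Illustrative thresholds -- adjust to your own SLOs.
alerts = []
if summary["error_rate"] > 5.0:
    alerts.append("error rate above 5%")
if summary["p95_latency_ms"] > 3000:
    alerts.append("p95 latency above 3s")
if summary["reliability_score"] < 80:
    alerts.append("reliability score below 80")
```

With the sample data, the first two checks fire and the reliability check passes.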

Cost Calculation

Kurral calculates cost using provider-specific pricing tables. Costs are computed per-call based on:

  • Model used
  • Input token count
  • Output token count
  • Provider pricing at time of call

Cost is shown in USD and broken down at the session, agent, and account level.
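
The per-call computation amounts to scaling token counts by per-token prices. A minimal sketch with an illustrative pricing entry (the prices here are placeholders, not Kurral's actual pricing tables, which vary by provider and date):

```python
# Illustrative per-million-token prices in USD (placeholder values).
PRICING = {
    "gpt-4o": {"input": 2.50, "output": 10.00},
}

def call_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD for one call: token counts scaled by per-million-token prices."""
    p = PRICING[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

cost = call_cost("gpt-4o", input_tokens=1200, output_tokens=400)
# (1200 * 2.50 + 400 * 10.00) / 1e6 = 0.007
```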