
Proxy Integration

Route your AI agent's LLM calls through the Kurral proxy for automatic observability and security scanning. Zero changes to your agent's core logic.


How It Works

Your Agent  ──▶  Kurral Proxy  ──▶  LLM Provider (OpenAI / Anthropic / Gemini)
                    │
             Captures every call:
             tokens, cost, latency,
             request/response content

Instead of calling api.openai.com or api.anthropic.com directly, your agent calls the Kurral proxy. The proxy authenticates the request, forwards it to the real provider, captures observability data, and returns the response unchanged.
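Concretely, the rewiring amounts to a base-URL swap plus two Kurral headers. As an illustrative sketch (the `proxy_config` helper is hypothetical, not part of any Kurral SDK; the paths match the endpoints reference later in this page):

```python
# Map each provider to its proxy base URL and build the Kurral headers
# that every proxied request needs. Helper names are illustrative.

KURRAL_BASE = "https://kurral-api.onrender.com"

PROXY_PATHS = {
    "openai": "/api/proxy/openai/v1",
    "anthropic": "/api/proxy/anthropic",
    "google": "/api/proxy/google",
}

def proxy_config(provider: str, kurral_key: str, agent_key: str):
    """Return (base_url, default_headers) for routing an SDK through Kurral."""
    base_url = KURRAL_BASE + PROXY_PATHS[provider]
    headers = {
        "X-Kurral-API-Key": kurral_key,  # Kurral auth
        "x-kurral-agent": agent_key,     # which agent made the call
    }
    return base_url, headers

base_url, headers = proxy_config("openai", "kr_live_example", "my-agent")
# base_url == "https://kurral-api.onrender.com/api/proxy/openai/v1"
```

Your provider API key stays wherever it already lives (SDK argument or environment variable); only the destination and the two Kurral headers change.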


Prerequisites

  • A Kurral API key (kr_live_...) from the dashboard
  • An agent registered in the dashboard (note the agent key)
  • Your LLM provider API key (OpenAI, Anthropic, or Gemini)

Quick Start by Provider

OpenAI

Before (direct):

from openai import OpenAI
client = OpenAI()  # calls api.openai.com

After (through proxy):

from openai import OpenAI

client = OpenAI(
    base_url="https://kurral-api.onrender.com/api/proxy/openai/v1",
    api_key="sk-your-openai-key",  # still your real OpenAI key
    default_headers={
        "X-Kurral-API-Key": "kr_live_your-kurral-key",
        "x-kurral-agent": "your-agent-key",
    },
)

Anthropic

Before (direct):

import anthropic
client = anthropic.Anthropic()  # calls api.anthropic.com

After (through proxy):

import anthropic

client = anthropic.Anthropic(
    base_url="https://kurral-api.onrender.com/api/proxy/anthropic",
    api_key="sk-ant-your-anthropic-key",  # still your real Anthropic key
    default_headers={
        "X-Kurral-API-Key": "kr_live_your-kurral-key",
        "x-kurral-agent": "your-agent-key",
    },
)

Gemini

Replace the base URL in your HTTP calls:

Before:

POST https://generativelanguage.googleapis.com/v1beta/models/gemini-2.0-flash:generateContent?key=YOUR_KEY

After:

POST https://kurral-api.onrender.com/api/proxy/google/v1beta/models/gemini-2.0-flash:generateContent?key=YOUR_KEY

Headers:
  X-Kurral-API-Key: kr_live_your-kurral-key
  x-kurral-agent: your-agent-key
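If you are calling Gemini without an SDK, the same request can be assembled in Python. A sketch under stated assumptions (the `gemini_proxy_request` helper is illustrative; send the result with any HTTP client):

```python
KURRAL_BASE = "https://kurral-api.onrender.com"

def gemini_proxy_request(model: str, prompt: str, gemini_key: str,
                         kurral_key: str, agent_key: str):
    """Build the URL, headers, and JSON body for a proxied generateContent call."""
    url = f"{KURRAL_BASE}/api/proxy/google/v1beta/models/{model}:generateContent"
    headers = {
        "Content-Type": "application/json",
        "x-goog-api-key": gemini_key,    # provider auth, forwarded upstream
        "X-Kurral-API-Key": kurral_key,  # Kurral auth, stripped before forwarding
        "x-kurral-agent": agent_key,
    }
    body = {"contents": [{"parts": [{"text": prompt}]}]}
    return url, headers, body

url, headers, body = gemini_proxy_request(
    "gemini-2.0-flash", "Hello", "YOUR_KEY",
    "kr_live_your-kurral-key", "your-agent-key",
)
# Send with e.g. requests.post(url, headers=headers, json=body)
```

Passing the Gemini key as the `x-goog-api-key` header (rather than the `?key=` query parameter) keeps it out of logged URLs; the proxy accepts either.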

Environment Variables

Keep your config clean:

# .env
KURRAL_API_KEY=kr_live_your-kurral-key
KURRAL_API_URL=https://kurral-api.onrender.com
KURRAL_AGENT_KEY=your-agent-key

# Provider key (unchanged)
ANTHROPIC_API_KEY=sk-ant-your-key
# or
OPENAI_API_KEY=sk-your-key

Then in your code:

import os
import anthropic

KURRAL_API_URL = os.getenv("KURRAL_API_URL", "https://kurral-api.onrender.com")
KURRAL_API_KEY = os.getenv("KURRAL_API_KEY")
KURRAL_AGENT_KEY = os.getenv("KURRAL_AGENT_KEY")

client = anthropic.Anthropic(
    base_url=f"{KURRAL_API_URL}/api/proxy/anthropic",  # api_key is read from ANTHROPIC_API_KEY
    default_headers={
        "X-Kurral-API-Key": KURRAL_API_KEY,
        "x-kurral-agent": KURRAL_AGENT_KEY,
    },
)

Required Headers

Every proxy request must include:

Header            | Value                    | Purpose
X-Kurral-API-Key  | kr_live_...              | Kurral authentication (use this)
X-API-Key         | kr_live_...              | Kurral authentication (legacy, deprecated Q3 2026)
x-kurral-agent    | Agent key (min 3 chars)  | Identifies which agent made the call
Provider auth     | Varies (see below)       | Forwarded to the LLM provider

Note: X-API-Key is supported for backward compatibility but will be deprecated in Q3 2026. Use X-Kurral-API-Key to avoid header collisions, especially with Anthropic, which uses x-api-key for provider auth.

Provider authentication is passed through unchanged:

  • OpenAI: Authorization: Bearer sk-... (set automatically by the SDK)
  • Anthropic: x-api-key: sk-ant-... (set automatically by the SDK)
  • Gemini: x-goog-api-key header or ?key= query parameter

Note: Kurral strips its own headers before forwarding to the provider. Your provider API key is never stored by Kurral.

Optional Headers

Header               | Values                           | Default        | Purpose
x-kurral-session-id  | Any UUID                         | Auto-generated | Group multiple LLM calls into one session
x-kurral-retention   | none, metadata, full             | metadata       | Controls what request/response content is stored
x-kurral-env         | production, staging, development | production     | Environment tag for filtering
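These headers can be bundled into a small helper and passed per call via the SDKs' extra_headers parameter; a sketch (the `kurral_optional_headers` helper is illustrative, not a Kurral API):

```python
def kurral_optional_headers(session_id=None, retention=None, env=None) -> dict:
    """Build only the optional headers you want to override.

    Omitted arguments fall back to the proxy defaults: an auto-generated
    session, 'metadata' retention, and the 'production' environment tag.
    """
    headers = {}
    if session_id is not None:
        headers["x-kurral-session-id"] = session_id
    if retention is not None:
        if retention not in {"none", "metadata", "full"}:
            raise ValueError(f"invalid retention level: {retention}")
        headers["x-kurral-retention"] = retention
    if env is not None:
        headers["x-kurral-env"] = env
    return headers

# e.g. client.messages.create(..., extra_headers=kurral_optional_headers(env="staging"))
```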

Session Grouping

By default, each LLM call creates its own session. To group a multi-turn conversation into a single session:

import uuid

session_id = str(uuid.uuid4())

# All calls with this session_id appear as one session in the dashboard
response1 = client.messages.create(
    ...,
    extra_headers={"x-kurral-session-id": session_id},
)

response2 = client.messages.create(
    ...,
    extra_headers={"x-kurral-session-id": session_id},
)

Data Retention

Control what Kurral stores with the x-kurral-retention header:

Level    | What's Stored
none     | Only metadata (model, tokens, cost, latency)
metadata | Metadata + truncated content summaries
full     | Complete request and response bodies

Streaming

Streaming works transparently. The proxy forwards chunks in real-time while capturing the full response for observability.

# Streaming works exactly the same as direct calls
with client.messages.stream(
    model="claude-sonnet-4-5-20250929",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello"}],
) as stream:
    for text in stream.text_stream:
        print(text, end="")

For streaming calls, Kurral also captures time-to-first-token (TTFT) in addition to total latency.


Proxy Endpoints Reference

Proxy Path                                                    | Upstream Provider
/api/proxy/openai/v1/chat/completions                         | OpenAI Chat Completions
/api/proxy/openai/v1/responses                                | OpenAI Responses API
/api/proxy/anthropic/v1/messages                              | Anthropic Messages API
/api/proxy/google/v1beta/models/{model}:generateContent       | Gemini
/api/proxy/google/v1beta/models/{model}:streamGenerateContent | Gemini (streaming)
/api/proxy/health                                             | Health check

What Gets Captured

Every LLM call through the proxy automatically records:

  • Token usage — input tokens, output tokens, total
  • Cost — calculated from model-specific pricing
  • Latency — total request time and time-to-first-token (streaming)
  • Model — which model was used
  • Agent — which agent made the call
  • Session — grouped conversation context
  • Content — request/response bodies (based on retention setting)
  • Tool interactions — tool definitions, tool call arguments (from LLM responses), and tool results (from follow-up requests) are all part of the LLM conversation and captured automatically

Because tool calling in OpenAI, Anthropic, and Gemini flows through the messages API — the LLM requests a tool call, the agent executes it, and the result is sent back in the next message — the proxy sees the full tool interaction loop without any additional instrumentation.
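The shape of that loop can be sketched with plain dicts following Anthropic's tool-use message format (the `run_tool` callable stands in for your local tool execution; helper names are illustrative):

```python
def extract_tool_calls(content_blocks: list) -> list:
    """Pull tool_use blocks out of an assistant message's content."""
    return [b for b in content_blocks if b.get("type") == "tool_use"]

def tool_results_message(tool_calls: list, run_tool) -> dict:
    """Execute each requested tool locally and package the results as the
    next user message. The proxy observes both the assistant's tool_use
    blocks and this follow-up message, so it sees the full loop."""
    results = [
        {
            "type": "tool_result",
            "tool_use_id": call["id"],
            "content": run_tool(call["name"], call["input"]),
        }
        for call in tool_calls
    ]
    return {"role": "user", "content": results}

# Example with a fake assistant response and a stub tool:
assistant_content = [
    {"type": "text", "text": "Let me check the weather."},
    {"type": "tool_use", "id": "toolu_01", "name": "get_weather",
     "input": {"city": "Oslo"}},
]
calls = extract_tool_calls(assistant_content)
followup = tool_results_message(calls, lambda name, args: f"{args['city']}: 12C")
```

In a real agent, `followup` is appended to the conversation and sent back through the proxy, where it is captured like any other message.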

All of this appears in the Kurral dashboard under the agent's session list.


Rate Limits

The proxy enforces per-user rate limits:

Limit                   | Default
Requests per minute     | 60
Max concurrent requests | 10
Max request body size   | 10 MB
Request timeout         | 120 seconds
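When you exceed the per-minute limit, the proxy returns 429; a common client-side response is exponential backoff. An illustrative sketch (names here are not a Kurral API; in practice you would catch the SDK's own rate-limit exception, e.g. anthropic.RateLimitError or openai.RateLimitError):

```python
import time

class RateLimitError(Exception):
    """Stand-in for the SDK-specific rate-limit exception."""

def backoff_delays(attempts: int, base: float = 1.0, cap: float = 30.0) -> list:
    """Exponential schedule: base, 2*base, 4*base, ... capped at `cap` seconds."""
    return [min(base * 2 ** i, cap) for i in range(attempts)]

def call_with_retry(send, attempts: int = 4, base: float = 1.0):
    """Call send() until it succeeds, sleeping between 429s; re-raise on the last try."""
    delays = backoff_delays(attempts, base)
    for i, delay in enumerate(delays):
        try:
            return send()
        except RateLimitError:
            if i == len(delays) - 1:
                raise
            time.sleep(delay)
```

At the default 60 requests per minute, a steady one-request-per-second pace never triggers the limit; backoff only matters for bursty workloads.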

What Stays the Same

After rewiring to the proxy:

  • All client.messages.create() / client.chat.completions.create() calls — unchanged
  • Streaming calls — unchanged
  • Tool use / function calling — unchanged
  • Your MCP server — unchanged (tool execution happens locally; the proxy captures tool interactions via the LLM conversation)

What You Can Remove

Once routed through the proxy, manual observability code becomes unnecessary:

  • Manual session upload functions — proxy captures this automatically
  • Manual token counting — proxy extracts tokens from provider responses
  • Manual cost calculation — proxy calculates cost per model
  • POST /api/sessions/ calls — replaced by automatic proxy capture

Troubleshooting

Error                              | Cause                             | Fix
401 Unauthorized                   | Missing or invalid Kurral API key | Use X-Kurral-API-Key (preferred) with your kr_live_... key
400 x-kurral-agent header required | Missing agent header              | Add x-kurral-agent to default headers
429 Rate limit exceeded            | Too many requests per minute      | Reduce request rate or contact support
502 Bad Gateway                    | Upstream provider error           | Check that your provider API key is valid
Provider auth error                | Provider key not forwarded        | Ensure your provider API key is set (SDK reads it from env)
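Most of the 400/401 cases above can be caught before a request is sent; a hypothetical pre-flight check (not a Kurral API, just the documented header rules as code):

```python
def check_kurral_headers(headers: dict) -> list:
    """Return a list of likely problems with a proxy request's headers."""
    problems = []
    key = headers.get("X-Kurral-API-Key") or headers.get("X-API-Key")
    if not key:
        problems.append("missing Kurral API key (expect 401)")
    elif not key.startswith("kr_live_"):
        problems.append("Kurral API key does not look like kr_live_... (expect 401)")
    agent = headers.get("x-kurral-agent", "")
    if len(agent) < 3:
        problems.append("x-kurral-agent missing or shorter than 3 chars (expect 400)")
    return problems

# check_kurral_headers({"X-Kurral-API-Key": "kr_live_abc",
#                       "x-kurral-agent": "my-agent"}) returns []
```

Running this against your default_headers dict during startup surfaces configuration mistakes before the first LLM call fails.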