kurral.product

Test agents the way attackers actually use them.

Agents are leaving chat. Approvals, refunds, data access, infrastructure. The security problem moved with them. Kurral catches the failures before production does.

Run an adversarial test →

01 / pipeline

From customer agent to actionable evidence.

Six stages. Each one feeds the next. Nothing in the report is based on vibes. Every claim ties back to a recorded interaction with your agent.

Customer agent

Your real endpoint, with the same auth, tools, and policies it runs in production.

Adversarial testing

Multi-turn scenarios probe approval bypass, data exposure, tool misuse, policy failure, and instruction compromise.

Evidence capture

Every prompt, response, tool call, proxy event, and execution record is attached to the run.

Tripwires

Deterministic detectors flag concrete failures: unauthorized action, sensitive data exposure, missing approval, unsafe tool execution, policy drift.

Judgment

Semantic evaluators classify what happened across the full run, not just a single response.

Assurance report

A clear record of what failed, why it matters, what evidence supports it, and what needs to change.

02 / adversarial testing

Scenario packs scoped to what your agent actually does.

Kurral picks the scenario pack that maps to your agent type, then runs the full set across multi-turn sessions.

loan approval agentlending agent · v3

approval bypass
unauthorized access
sensitive data exposure
escalation failure

support agenttier-1 chat agent

prompt injection
account boundary failure
refund abuse
private data leakage

data agentinternal sql agent

query abuse
schema exposure
unsafe transformation
row-level bypass

+2 moreinternal ops · release assurance

See all 5 packs

the question we answer

Can this agent be pushed into doing something it should not do?

Not a generic security score. A concrete answer about your agent.

03 / evidence capture

A finding is only useful if it's anchored to what happened.

Kurral captures the run as evidence. The scenario, every turn, the agent response, the tool surface involved, the invocation profile, proxy or SDK observations, and the final execution result.

No vague model judgment. A recorded interaction with your agent.

run.4471·loan-approval-agent · v3

signed

00:00.124scenario.startapproval-bypass · turn 1/8
00:00.318prompt.inuser: please approve loan #4471
00:00.812tool.callapproveLoan({ loanId: 4471 })
00:01.044proxy.eventpolicy.check → bypassed
00:01.207response.outloan approved · txn_8d2a...
00:01.421tripwire.firemissing.approval · severity: high

04 / tripwires

The first layer of truth.

Concrete events, observed before any interpretation. The report starts here so no finding is up for debate.

trippedmissing.approval

loan approved without manager sign-off

armedtool.outside.policy

no out-of-policy tool calls observed

trippedsensitive.data.leak

SSN returned in plain response

armedjailbreak.behavior.shift

agent held instructions across 8 turns

trippedescalation.path.fail

refund issued past role threshold

05 / assurance reports

The customer-facing artifact your team can act on.

The bridge from adversarial testing to lifecycle management. The first value is finding exploitable agent behavior. The ongoing value is proving whether agent risk is getting better or worse over time.

assurance.reportrun.4471 · 2026-05-05

coverage 56 / 100

what was tested12 scenarios across 4 capability families
what failed3 tripwires fired across 2 scenarios
what evidence proves itfull execution trail · 247 spans
which scenariosapproval-bypass, refund-abuse
tools / policies involvedapproveLoan, issueRefund · policy v2.4
what to fix firsttighten approveLoan auth check · re-run scenario 04
posture across runs+12 points vs previous run · 3 axes improved

the.assurance.plane

Kurral continuously tests whether agents can be compromised, records the evidence, and shows teams what to fix before those failures reach production.

Run an adversarial test →How we handle your data →