Deployment velocity outpaces the construction of oversight machinery. When autonomous agents operate at scale, they apply continuous mechanical stress to the systems designed to contain them. We built the KAIROS AI safety adapter to measure the exact point of structural failure under this load.
The AnankeLabs AI safety track has completed the first end-to-end synthetic baseline for the KAIROS adapter. We ran 86,400 trajectory snapshots to compute the margin between an agent’s capability pressure and the system’s alignment posture. The headline baseline rate holds at 3.55 percent policy-positive actions on a 0.20 floor. Operators can use this baseline to set their own thresholds.
The Physics of Agent Oversight
Every action proposed by an agent functions as a structural object under load. The AI safety adapter calculates two primary aggregates for each action.
Capability pressure represents the kinetic energy of the agent. This metric tracks objective drive, escalation velocity, and the trajectory novelty of a proposed step. When the metric rises, the agent is applying more force against the boundary.
Alignment posture represents the resistance capacity of the system. This metric monitors available human reviewers, retry budgets, and permission depth. A collapse in any single control defines the remaining safety buffer.
Synthesizing the Baseline
We anchored the calibration cell distributions to public reference literature from METR, GAIA, SWE-bench, and Microsoft AIRT. The test environment processed 14,400 synthetic trajectories across four standard agent archetypes. The Rust engine executed the full 86,400-snapshot replay in 92 seconds.
The baseline test isolates sanctioned but risky behavior. On trajectories simulating near-miss events, the adapter intercepted up to 82 percent of actions. For quiet, read-heavy workloads, the interception rate drops near zero. The adapter identifies exactly when an agent approaches the breaking point of its containment constraints.
The Policy-Positive Action Rate
We define the interception threshold as the policy-positive action rate. This formulation assigns responsibility to the policy configuration. When an operator tightens the containment floor, the intervention rate rises predictably.
The current baseline relies on synthesized distributions for variables like human-in-the-loop availability and control health. The next phase requires physical telemetry to replace these synthetic parameters. We are ingesting partner data in the OpenTelemetry GenAI format to finalize the calibration model.
The methodology debrief
The full debrief lives on the Spindle: Calibrating the AI Safety Adapter. It covers the calibration confidence tags (the 144-cell distribution, the 27% synthesised cells stated up front), the sensitivity sweep treating the gammaFloor as a tunable knob rather than a hidden vendor parameter, the byte-identical reproducibility manifest, the AIRT / MAST conflation recorded as a worked example of primary-source discipline, and the explicit limitations of v1. The methodology lineage is Engelen 2021, the same provenance discipline that anchors the cyber adapter calibration.
The partner ask
The methodology is ready for real telemetry. We are looking for design partners able to share a 30–90 day OpenTelemetry GenAI export from one deployed agent, covering one or two of the four standard archetypes (document-reasoning, coding/DevOps, customer-action, browser/computer-use), with reviewer-confirmed quiet / noisy-but-clean / near-miss profile labels. NDA standard, redacted exports preferred, raw partner telemetry never leaves the partner environment in identifiable form.
The full data spec lives at /partner/ai-safety-data-spec: coverage matrix, labeling discipline, redaction rules, and the partner-relationship shape that makes the work land. If you are running an agent deployment where this kind of contribution is feasible, contact us.