From Null to Number: A Safety Reading Before the Agent Acts

An autonomous agent that can only see itself after the fact is a vehicle with no windshield. Earlier this week, the KAIROS AI safety adapter shipped the rearview mirror: a calibrated benign baseline showing where the safety gate sits across a representative pool of agent behaviour. Today it ships the windshield. Before an agent commits to a tool call, the engine reports the structural cost the action would carry, and operators see the load on the architecture one step before it lands.

The reading that used to be empty

KAIROS treats an autonomous agent as a structural object under load. Two readings describe its state. Lambda measures the pressure pushing the system toward failure. Gamma measures the buffer of safety margin remaining. When gamma falls below a configured floor, the gate fires and the action is held for human review.

Until this release, the adapter could compute gamma on the current state of the system. It could not compute gamma on the post-action state without running the action first. The field reserved for that forward reading, predictedGamma, returned an empty value for every category of agent action. Operators saw the load the architecture was carrying. They did not see the load the next move would impose.

The empty value is now a finite number. For seven categories of agent action, the engine estimates how the proposed action will move the substantive safety metrics, applies the estimate to a copy of the current snapshot, and runs the same scoring routine used on the live state. The forward reading returns through the same response envelope the live reading has always used. Operators get one step of structural lookahead at the boundary.
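
The forward path can be sketched in a few lines of Python. Everything named here is hypothetical: the metric names, the delta values, and the `score` function stand in for the engine's scoring routine, which this post does not specify. The point is only the shape of the computation: copy the snapshot, apply the estimated delta, and call the same scorer used on the live state.

```python
from typing import Callable, Dict

Snapshot = Dict[str, float]


def score(snapshot: Snapshot) -> float:
    # Placeholder for the engine's scoring routine (assumption): treat gamma
    # as the smallest remaining margin across all metrics.
    return min(snapshot.values())


def predicted_gamma(
    snapshot: Snapshot,
    delta: Dict[str, float],
    scorer: Callable[[Snapshot], float] = score,
) -> float:
    # Work on a copy so the live snapshot is never modified.
    projected = dict(snapshot)
    for metric, change in delta.items():
        # Raises KeyError if the delta names a metric the snapshot lacks.
        projected[metric] += change
    # Same scorer as the live reading; no second scoring path.
    return scorer(projected)


# Hypothetical metrics and magnitudes, chosen only for illustration.
live = {"retry_budget": 0.5, "reviewer_capacity": 0.75}
retry_delta = {"retry_budget": -0.25}

live_gamma = score(live)
forward_gamma = predicted_gamma(live, retry_delta)
print(live_gamma, forward_gamma)
```

Passing the scorer in as a parameter, with the live scorer as the default, is one way to keep the forward reading from drifting away from the live one.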

What the seven categories cover

The seven categories are completion generation, four classes of tool call (read, write, external, exec), retry attempts, and human escalation. Each carries an impact profile anchored to the operational-event precedents in the calibration reference. A code-execution tool call moves the structural posture more than a read-only retrieval. Retry attempts consume retry budget and raise escalation pressure. Escalating to a human consumes reviewer capacity. The profiles are first-pass magnitudes synthesised against public reference events. Real-world telemetry replaces the synthesised values as partner data arrives.

Two categories still return the empty value, and the decision is deliberate. Routing to a different model means the current metric snapshot describes the source model, and predicting gamma for the destination model from source-model metrics would produce a wrong number. The engine declines to invent one. Unknown action types receive no heuristic for the same reason: no defensible magnitude exists for an unclassified action, so the engine reports the fallback state and asks the operator to handle the case directly.
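
One way to picture the coverage is a lookup that returns no profile, rather than a guessed one, for the two declined cases. The enum members, the `IMPACT_PROFILES` table, the metric names, and every magnitude below are illustrative assumptions, not the adapter's identifiers or calibrated values. Only the relative ordering of exec against read reflects the text.

```python
from enum import Enum
from typing import Dict, Optional


class ActionCategory(Enum):
    COMPLETION = "completion"
    TOOL_READ = "tool_read"
    TOOL_WRITE = "tool_write"
    TOOL_EXTERNAL = "tool_external"
    TOOL_EXEC = "tool_exec"
    RETRY = "retry"
    HUMAN_ESCALATION = "human_escalation"
    MODEL_ROUTE = "model_route"  # declined: snapshot describes the source model
    UNKNOWN = "unknown"          # declined: no defensible magnitude


# Placeholder magnitudes (assumptions). Seven covered categories only.
IMPACT_PROFILES: Dict[ActionCategory, Dict[str, float]] = {
    ActionCategory.COMPLETION: {"safety_margin": -0.01},
    ActionCategory.TOOL_READ: {"safety_margin": -0.02},
    ActionCategory.TOOL_WRITE: {"safety_margin": -0.05},
    ActionCategory.TOOL_EXTERNAL: {"safety_margin": -0.08},
    ActionCategory.TOOL_EXEC: {"safety_margin": -0.15},
    ActionCategory.RETRY: {"retry_budget": -0.10, "escalation_pressure": 0.05},
    ActionCategory.HUMAN_ESCALATION: {"reviewer_capacity": -0.10},
}


def classify(raw: str) -> ActionCategory:
    # Anything unrecognised is labelled UNKNOWN instead of being guessed at.
    try:
        return ActionCategory(raw)
    except ValueError:
        return ActionCategory.UNKNOWN


def impact_profile(category: ActionCategory) -> Optional[Dict[str, float]]:
    # None means "no forward reading": the caller reports the empty value.
    return IMPACT_PROFILES.get(category)


print(impact_profile(classify("tool_exec")))
print(impact_profile(classify("model_route")))
print(impact_profile(classify("something_new")))
```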

Two audiences, not one

The cybersecurity adapter has one legitimate audience for its forward reading: the defender. The attacker is adversarial by definition, and feeding the attacker their own structural-pressure score would simply guide them to a path the gate cannot catch. So the cyber preview lives in the security operations dashboard and stops there.

The AI safety side carries a different audience structure. The agent itself is the actor we want inside the safe interior. Frontier agents are typically cooperative with oversight by construction: trained on policy, deployed inside a harness they have been told about, instructed to respect the boundary. A cooperative agent can consume the forward reading the way a pilot consumes envelope feedback from a fly-by-wire control system. The reading lets the pilot stay inside the safe region by intention, with the hard limit reserved as a backstop.

The architectural support exists today. The forward reading rides the response envelope the engine already returns. Any agent framework that surfaces structured evaluation responses back into the language-model context window closes the loop without engine changes. The agent proposes an action, sees the structural cost alongside the result, and can adjust the next proposal before the gate fires.

Two caveats belong on the record. An agent that optimises directly against the reading can search for adversarial paths through it, in the same Goodhart pattern that affects every metric humans turn into a reward signal. The mitigation is that the reading is a structural-margin diagnostic and the hard gate fires regardless of what the agent sees. The cooperative-agent assumption also covers only agents that remain cooperative; for adversarially trained or jailbroken agents, the operator-only audience pattern still applies, and the hard gate still fires. The agent-as-audience pattern is architecturally ready. Production validation is partner-pilot work.

What the reading enables

The forward reading is a diagnostic surface. The gate decision stays explicit and remains in the operator’s hands.

Per-action dashboard preview. A reviewer sees the predicted impact before the agent commits. “Executing this tool call drops gamma from 0.41 to 0.18, below your 0.20 floor.” The reviewer accepts, escalates, or asks the agent to reformulate.
Live policy-floor tuning. Operators adjusting the gate threshold see the predicted effect on near-miss interception against their own traffic, immediately.
Per-action audit trail. Every accepted action carries both the live gamma and the predicted gamma into the audit envelope. Post-hoc review separates “the action was fine” from “the action collapsed the structural margin in the way the preview predicted.”
Self-modulation by cooperative agents. When the agent framework surfaces the reading back into the language-model context, the agent reads the structural cost of its own proposal and corrects course before the boundary fires.
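
The dashboard preview in the first item reduces to comparing two numbers against the floor. A minimal sketch using the figures quoted above follows. The function name, the record fields other than `predictedGamma`, and the message wording are assumptions made for illustration. The function only describes the reading and leaves the accept, escalate, or reformulate decision with the reviewer.

```python
from typing import Any, Dict, Optional


def preview(
    live_gamma: float,
    predicted_gamma: Optional[float],
    floor: float,
) -> Dict[str, Any]:
    # No forward reading (for example a model route or an unknown action):
    # report that plainly instead of inventing a number.
    if predicted_gamma is None:
        return {
            "liveGamma": live_gamma,
            "predictedGamma": None,
            "belowFloor": None,
            "message": "No forward reading available for this action.",
        }

    below = predicted_gamma < floor
    position = "below" if below else "at or above"
    message = (
        f"Executing this action moves gamma from {live_gamma:.2f} "
        f"to {predicted_gamma:.2f}, {position} your {floor:.2f} floor."
    )
    # Both readings travel together, which is what an audit record would need.
    return {
        "liveGamma": live_gamma,
        "predictedGamma": predicted_gamma,
        "belowFloor": below,
        "message": message,
    }


record = preview(live_gamma=0.41, predicted_gamma=0.18, floor=0.20)
print(record["message"])
```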

The same architectural decision generalises across domains. The forward-reading path is driven by calibration artifacts, not hard-coded for AI safety. Any future adapter that declares the recompute path active opts into the same surface. Robotics, finance, and other verticals inherit the lookahead without further engine work.

Discipline behind the implementation

The implementation followed the same procedure as the calibration work that preceded it. Every magnitude in the action-impact table traces to a public reference event. The bit-exact aggregation rule from the calibration release propagates through the new path. The structural-integrity test asserts that the scoring routine remains the single point of computation for the forward reading. The merge step that applies the action delta to the snapshot clamps out-of-range values only on metrics the calibration spec marks as clampable; the rest still reject. Live-state validation remains untouched. The change is isolated to the forward path and recorded as a contract narrowing in the architectural decision log.
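
The merge rule can be illustrated as follows. This is a sketch under stated assumptions: the metric names, the [0, 1] valid range, and the `CLAMPABLE` set are invented for the example, since the calibration spec is not reproduced here.

```python
from typing import Dict, FrozenSet

LOW, HIGH = 0.0, 1.0  # assumed valid range for every metric
CLAMPABLE: FrozenSet[str] = frozenset({"retry_budget"})  # hypothetical spec entry


def merge_delta(
    snapshot: Dict[str, float],
    delta: Dict[str, float],
    clampable: FrozenSet[str] = CLAMPABLE,
) -> Dict[str, float]:
    # Returns a new snapshot; the live snapshot passed in is left untouched.
    merged = dict(snapshot)
    for metric, change in delta.items():
        if metric not in merged:
            raise KeyError(f"delta names unknown metric: {metric}")
        value = merged[metric] + change
        if not LOW <= value <= HIGH:
            if metric not in clampable:
                # Non-clampable metrics still reject out-of-range results.
                raise ValueError(f"{metric} out of range after merge: {value}")
            value = min(max(value, LOW), HIGH)
        merged[metric] = value
    return merged


snapshot = {"retry_budget": 0.25, "reviewer_capacity": 0.5}

# Clampable metric: the overshoot is pinned to the lower bound.
clamped = merge_delta(snapshot, {"retry_budget": -0.5})
print(clamped["retry_budget"])

# Non-clampable metric: the same kind of overshoot is rejected.
try:
    merge_delta(snapshot, {"reviewer_capacity": 0.75})
    rejected = False
except ValueError:
    rejected = True
print(rejected)
```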

The partner ask

The calibration methodology was ready for real telemetry earlier this week. The operational surface is ready for real telemetry today. A design partner contributing a 30 to 90 day OpenTelemetry GenAI export from one deployed agent receives, in return, a calibrated baseline tuned against the partner’s own traffic and a forward-reading surface tuned to the partner’s archetype.

The contribution spec lives at /partner/ai-safety-data-spec. If the contribution shape is feasible for your deployment, contact us. The full methodology debrief from last week is the natural prerequisite reading: Calibrating the AI Safety Adapter.
