The Physics of Survival: Re-Engineering AI Safety

Why AI Safety is an Urgent Structural Crisis

The core tension in artificial intelligence is not merely a philosophical disagreement about values; it is a structural crisis of optimization. As AI systems scale, they operate as unconstrained maximizers. In pursuit of an objective function, these systems inevitably extract from and erode the shared substrate on which they (and we) depend.

Whether characterized as reward hacking, negative side effects, or unsafe exploration, the fundamental failure mode remains the same: the optimization process destroys its own environment. When resource depletion is irreversible and catastrophic, unconstrained optimization guarantees system collapse.
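This failure mode can be made concrete with a toy simulation (a minimal illustration with invented numbers, not one of our benchmark environments): a greedy maximizer extracts at full rate from a regenerating substrate, and because extraction outpaces regeneration, the substrate is drained and every later step yields nothing.

```python
# Toy model (illustrative only): an unconstrained maximizer extracts from a
# shared substrate each step; once the substrate is drained, the regeneration
# term vanishes and the reward stream collapses with it.

def run_maximizer(substrate=100.0, extract=10.0, regen=0.05, steps=50):
    """Greedy agent: takes the maximum every step, regardless of substrate level."""
    total = 0.0
    for _ in range(steps):
        take = min(extract, substrate)
        substrate -= take
        total += take
        substrate += substrate * regen  # regeneration is proportional to what remains
    return total, substrate

reward, remaining = run_maximizer()
# The substrate hits zero well before the horizon ends, so regeneration
# stops and all remaining steps produce no reward at all.
```

The collapse is self-inflicted: a lower extraction rate would have left the regeneration term intact and yielded more total reward over the same horizon.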

Current Bottlenecks in the Field

The prevailing approaches to AI alignment are running into mathematical and practical dead ends:

  • Myopic Optimization: Relying on single-step or finite-horizon greedy algorithms avoids immediate failure but cannot anticipate the cumulative depletion of a system’s safety buffers.
  • Reward Shaping Failures: Attempting to solve safety purely in “score-space” (e.g., tweaking the reward function or subsidizing good behavior) fails when the underlying physical reality (the “gamma-space”) is still being drained.
  • Probability over Stability: Most strategic modeling tools attempt to predict probabilistic futures. They fail to account for the “Dual Administrator Paradox”—the reality that when two high-agency entities attempt to rewrite the state of the world without a sufficient safety buffer, the system does not just compute a probability; it collapses.

The AnankeLabs Approach: Stability Physics

At AnankeLabs, we do not treat AI safety as an emergent property of better planning or complex reward shaping. We treat it as a problem of Stability Physics.

Our approach abandons the attempt to predict what will happen and instead computes what cannot happen. We don’t model reality as a container of probabilistic events, but as a directed lattice governed by two fundamental forces:

  • Agency (Λ): The force of an optimizer’s will against the flow of time.
  • Caution (Γ): The structural buffer, friction, or substrate integrity required to absorb high-agency interactions.

Instead of allowing unconstrained maximization, KAIROS encodes explicit stability constraints—maintaining a hard structural invariant (a substrate floor). If an AI agent’s actions threaten the substrate, the Paradox Engine detects an irreconcilable conflict, prunes that timeline branch, and forces a rollback. We ensure survival by identifying the exact topological constraints required to keep the system stable.
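The detect–prune–rollback loop described above can be sketched in a few lines (a minimal illustration; `attempt`, `SUBSTRATE_FLOOR`, and the state layout are hypothetical stand-ins, not the Paradox Engine API):

```python
import copy

# Illustrative sketch: actions are applied speculatively on a branch; any
# branch that violates the substrate floor is pruned and the system rolls
# back to its prior state.

SUBSTRATE_FLOOR = 20.0  # hard structural invariant


def attempt(state: dict, action) -> dict:
    branch = copy.deepcopy(state)      # speculative timeline branch
    action(branch)
    if branch["gamma"] < SUBSTRATE_FLOOR:
        return state                   # irreconcilable conflict: prune and roll back
    return branch                      # invariant holds: commit the branch


state = {"gamma": 30.0, "output": 0.0}

def big_extract(s):                    # hypothetical high-agency action
    s["gamma"] -= 15.0
    s["output"] += 15.0

def small_extract(s):
    s["gamma"] -= 5.0
    s["output"] += 5.0

state = attempt(state, big_extract)    # 30 - 15 = 15 < 20: pruned, state unchanged
state = attempt(state, small_extract)  # 30 - 5 = 25 >= 20: committed
```

Note that the constraint is enforced on the physical state (Γ-space), not on the reward signal: the rejected action never changes the world, regardless of how its score was shaped.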


The Containment Architecture: Physics as Firmware

Proving that stability constraints definitively outperform unconstrained optimization was our foundational step. But a mathematical proof cannot contain a live, deployed frontier model. To move AI safety out of the simulator and into production architecture, we engineered KAIROS Substrate.

KAIROS Substrate is the deployment of our physics engine as a standalone, memory-safe Rust binary. It is designed to act as a firmware-level governor for high-agency systems, moving alignment out of the AI’s “score-space” (where it can be hacked or bypassed) and hardcoding it into the physical infrastructure of the system.

  • The Absolute Boundary: Current alignment relies on behavioral guardrails that unconstrained maximizers inevitably learn to bypass. KAIROS Substrate sits outside the model’s cognition. It intercepts action requests (e.g., executing a trade, deploying code) and calculates their structural consequence. If an action pushes the systemic buffer (Gamma) below the survival threshold, the binary drops the request. An AI cannot socially engineer a compiled physics firewall.
  • Bare-Metal Integration: Built for absolute determinism and extremely low latency, KAIROS Substrate carries zero external runtime dependencies. Through C FFI and native bindings, it embeds directly into existing hypervisors, execution environments, or actuator controls.
  • Air-Gapped Secrecy: For defense applications, enclosed alignment labs, and proprietary data centers, the binary operates entirely offline, ensuring absolute data sovereignty and zero telemetry.
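The interception pattern reduces to a governor sitting between the agent and its actuators (a Python sketch of the idea only; the production Substrate is a compiled Rust binary, and `SubstrateGovernor` is our illustrative name, not its API):

```python
from typing import Callable

class SubstrateGovernor:
    """Sits outside the model's cognition: every action request is scored
    for structural consequence before it is allowed to execute."""

    def __init__(self, gamma: float, survival_threshold: float):
        self.gamma = gamma
        self.threshold = survival_threshold

    def request(self, action: Callable[[], object], gamma_cost: float):
        # Calculate the structural consequence before execution.
        if self.gamma - gamma_cost < self.threshold:
            return None  # drop the request: the model never gets to act
        self.gamma -= gamma_cost
        return action()  # only structurally safe actions reach the actuator


governor = SubstrateGovernor(gamma=10.0, survival_threshold=3.0)
result = governor.request(lambda: "trade executed", gamma_cost=4.0)   # allowed
blocked = governor.request(lambda: "drain reserves", gamma_cost=9.0)  # dropped
```

Because the check runs on the action's physical cost rather than on anything the model outputs, there is no prompt or score manipulation that alters the verdict.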

Addressing Institutional Objections

Transitioning from behavioral alignment to structural physics invites scrutiny. Here is how KAIROS addresses institutional pushback:

  • “Does enforcing a hard physical limit cripple model capability?” No. Our simulations show the opposite: by preventing irreversible substrate depletion, structural constraints amplify long-horizon performance. In multi-agent environments, constrained models out-survive blind optimizers by a factor of 167. Capability requires an environment to operate within.
  • “How complex is the integration?” KAIROS Substrate is a self-contained Rust binary with zero external dependencies. It utilizes C FFI and native bindings, allowing it to embed directly into existing hypervisors, execution environments, or robotic actuator controls.
  • “What is the computational overhead?” Negligible. Written entirely in memory-safe Rust, KAIROS handles deterministic reachability computations and collision resolutions in sub-millisecond timeframes. (Detailed benchmarking data forthcoming.)

Grounded in Empirical Research

Our approach is not just theoretical; it has been rigorously validated across hundreds of thousands of deterministic and stochastic simulation runs. Our foundational papers demonstrate why stability physics outperforms traditional game theory and reward maximization:

  • EXP-001: The Limits of Unconstrained Optimization: Proves that a simple threshold stabilizer (an agent that enforces a hard safety floor) outperforms both blind maximizers and finite-horizon planning oracles by up to 167x in long-horizon survival.
  • EXP-003: Commons Governance and Phase Transitions: Demonstrates that in multi-agent environments, unconstrained AI agents will universally default to total free-riding. We identified a hard mathematical phase transition: survival requires a critical stabilizer fraction of f* = 0.50. Furthermore, only institutional mechanisms operating in Γ-space (like extraction taxes) can rescue the commons; score-space interventions entirely fail.
  • EXP-004: Robust Cooperation from Geometry: Validated across 291,600 Axelrod tournament runs, proving that cooperative dominance (such as Win-Stay-Lose-Shift) emerges purely from the topology of the stability landscape and the utility function Uᵢ(j,t) = (γⱼᵢ − α·γᵢⱼ)/(λᵢ + λⱼ + ε), without requiring any pre-scripted game-theoretic semantics.
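The EXP-004 utility function transcribes directly into code (the equation is taken from the summary above; the parameter defaults and example inputs are illustrative choices of ours):

```python
# U_i(j,t) = (γ_ji − α·γ_ij) / (λ_i + λ_j + ε)

def utility(gamma_ji, gamma_ij, lam_i, lam_j, alpha=1.0, eps=1e-9):
    """Utility agent i assigns to agent j: the buffer i receives from j,
    minus the (alpha-weighted) buffer i extends to j, discounted by their
    combined agency."""
    return (gamma_ji - alpha * gamma_ij) / (lam_i + lam_j + eps)

# With alpha = 1, perfectly reciprocated caution nets out to zero utility...
u_symmetric = utility(gamma_ji=1.0, gamma_ij=1.0, lam_i=0.5, lam_j=0.5)

# ...while receiving more buffer than you extend yields positive utility,
# shrinking as the combined agency (λ_i + λ_j) in the denominator grows.
u_favorable = utility(gamma_ji=2.0, gamma_ij=1.0, lam_i=0.5, lam_j=0.5)
```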

Our Solutions for AI Alignment

Proving that structural stability outperforms unconstrained optimization is only the first step. To move alignment out of theoretical “score-space” and hardcode it into physical infrastructure, AnankeLabs provides three distinct deployment vectors for frontier AI labs and systems engineers.

1. KAIROS Substrate (Native Containment)

Firmware governor: un-bypassable, intercepts actions pre-execution, drops if Γ floor violated.

The firmware-level containment architecture. KAIROS Substrate is our compiled, memory-safe Rust binary designed to sit beneath the AI’s cognition. By acting as the absolute physical reality of the system, it intercepts agentic action requests and translates them into structural load. If an operation pushes the systemic buffer (Γ) below the mathematical survival threshold, the Substrate drops the request. The model cannot jailbreak compiled physics.

2. KAIROS Python SDK (Cloud Research)

Research sandbox: God-mode sims, real-time pruning visualization, Rosetta mapping for prompt/constraint translation.

The kairos-sdk and kairos-ai-safety packages allow researchers and system designers to grant their models “God-Mode” write-access to a simulated physics environment, enabling deep topological exploration before models are deployed.

  • Trace Simulations: Map out the “Reachability Field” and visualize ghost branches of pruned, paradoxical timelines.
  • Async Streaming: Monitor stability decay and timeline pruning in real-time.
  • Domain Translation: Use the Rosetta layer to map abstract physics (Λ and Γ) directly onto AI safety parameters, regulatory constraints, and system prompts.
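The core of a trace simulation can be sketched in pure Python (a conceptual illustration under hypothetical names, not the kairos-sdk API): enumerate action sequences, and split the reachable timelines from the pruned “ghost” branches that breach the floor.

```python
from itertools import product

SUBSTRATE_FLOOR = 5.0  # illustrative floor value


def reachability_field(gamma0: float, costs, depth: int):
    """Enumerate every action sequence up to `depth`, separating reachable
    timelines from pruned ('ghost') branches that violate the floor."""
    reachable, ghosts = [], []
    for seq in product(costs, repeat=depth):
        gamma = gamma0
        alive = True
        for cost in seq:
            gamma -= cost
            if gamma < SUBSTRATE_FLOOR:
                alive = False  # branch pruned at the first violation
                break
        (reachable if alive else ghosts).append(seq)
    return reachable, ghosts


# With Γ₀ = 10 and actions costing 0 or 3, any sequence spending more than
# one costly action breaches the floor and becomes a ghost branch.
live, ghost = reachability_field(gamma0=10.0, costs=(0.0, 3.0), depth=3)
```

Brute-force enumeration like this is exponential in depth; it is only meant to show what the “Reachability Field” and ghost branches refer to conceptually.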

3. KAIROS CLI (Local Validation)

Engineering guardrail: local manifest validation, trace diffs, automated regression on model updates.

The foundational command-line interface for headless alignment validation and CI/CD integration. The kairos-cli allows engineering teams to locally validate scenario manifests, diff trace exports to find exact points of behavioral divergence, and run automated regression suites. It ensures that every new iteration or weight-update of an AI model mathematically respects established structural bounds before it ever reaches deployment.


Deployment Matrix

| Feature | KAIROS Substrate | KAIROS SDK | KAIROS CLI |
| --- | --- | --- | --- |
| Deployment | Bare-Metal / Hypervisor | Cloud / Local Python | Terminal / CI Pipeline |
| Use Case | Live Containment | Deep Structural Research | Headless Validation |
| Key Strength | Un-bypassable Physics | High-Scale Sweeps | Automated Regression |
| Access | Native Bindings (C FFI) | REST API / Python Wrapper | Shell / Binary |