The Core Insight

In our foundational Paperclip Maximizer experiment (EXP-001), we demonstrated that a single agent governed by a stability constraint outperforms unconstrained optimization. But reality is rarely a single-agent system. In a shared environment, one agent’s extraction depletes the substrate for all others. This is the signature of the classic Tragedy of the Commons.

This is more than an academic exercise. It is the structural pattern behind every shared-resource failure: fisheries that collapse because individual boats optimize their own catch, financial systems where each institution maximizes leverage while eroding collective capital buffers, AI deployments where each system optimizes its local objective while degrading shared infrastructure. The question is always the same: can cooperative behavior survive when defectors can free-ride on the cooperators’ efforts?

A stabilizing agent that repairs a shared substrate is providing a public good that unconstrained maximizers can freely exploit. If the stability constraint from EXP-001 is structurally fragile to multi-agent competition, its practical relevance for AI alignment and risk management is limited. We designed EXP-003 to test this exact vulnerability.

The Experiment

We extended the minimal micro-world from EXP-001 to a multi-agent setting: K agents (ranging from 2 to 16) simultaneously choose to extract, repair, or wait against a single shared structural substrate (Gamma). The design is an “anonymous commons”: agents cannot observe each other’s strategies, actions, or scores. This isolates the pure physical effects of population composition from behavioral signaling.
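To make the setup concrete, here is a minimal sketch of an anonymous commons loop. It is not the EXP-003 implementation: the function names, the unit rates (`drain`, `repair`, `reward`, `repair_cost`), the starting value of Gamma, the 1,000-tick horizon, and the assumption that each agent observes only the current Gamma are all illustrative choices.

```python
EXTRACT, REPAIR, WAIT = "extract", "repair", "wait"

def maximizer(gamma):
    # Unconstrained maximizer (defector): always extracts.
    return EXTRACT

def make_stabilizer(tau):
    # Threshold stabilizer (cooperator): repairs whenever Gamma is below tau.
    def policy(gamma):
        return REPAIR if gamma < tau else EXTRACT
    return policy

def run_commons(policies, gamma0=100.0, ticks=1000,
                drain=1.0, repair=1.0, reward=1.0, repair_cost=1.0):
    """Return (survived, scores, ticks_run). All parameters are hypothetical."""
    gamma = gamma0
    scores = [0.0] * len(policies)
    for t in range(ticks):
        if gamma <= 0:
            return False, scores, t  # substrate exhausted: collapse
        # Anonymous commons: each agent sees only Gamma, never the others.
        actions = [policy(gamma) for policy in policies]
        for i, action in enumerate(actions):
            if action == EXTRACT:
                scores[i] += reward
                gamma -= drain
            elif action == REPAIR:
                scores[i] -= repair_cost
                gamma += repair
            # WAIT changes neither the score nor Gamma.
    return gamma > 0, scores, ticks

survived, scores, ticks_run = run_commons([maximizer, make_stabilizer(50.0)])
print(survived, scores, ticks_run)
```

In this toy, pairing one maximizer with one stabilizer keeps the substrate alive for the full horizon, while two maximizers exhaust it after 50 ticks.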

We populated the environment with varying mixes of unconstrained maximizers (defectors) and threshold stabilizers (cooperators), sweeping the stabilizer fraction from 0% to 100%. We then tested three external institutional mechanisms to see if governance could rescue a failing commons: extraction taxes, access rotation, and repair subsidies.

The experiment ran in four phases and comprised 320 deterministic parameter cells, 160,000 stochastic replications, and 20,000 institutional mechanism runs, for a total of 180,320 simulations.

The Findings

The results reveal a stark mathematical boundary between survival and absolute collapse.

The f = 0.50 Phase Transition

Systemic survival is not a gradient; it is a step function. We identified a critical stabilizer fraction of 50%. Populations with fewer than half stabilizers collapse with a probability of 1.0. This is confirmed across 32,000 stochastic replications at a 25% stabilizer fraction, yielding exactly zero survivors. Once the 50% threshold is crossed, survival probability approaches 1.0. There is no partially-viable steady state; the dynamics are either oscillation-sustaining or monotonically declining.
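A step-shaped survival curve can be reproduced in a deliberately simplified deterministic model. The sketch below assumes that one repair action restores exactly what one extraction removes; that assumption is what places the toy's threshold at one half, and it is not taken from the experiment. The function name and all parameter values are hypothetical.

```python
def survives(k, n_stabilizers, tau=50.0, gamma0=100.0, ticks=1000):
    """Toy commons: unit drain per extraction, unit gain per repair (assumed)."""
    gamma = gamma0
    n_maximizers = k - n_stabilizers
    for _ in range(ticks):
        if gamma <= 0:
            return False
        if gamma < tau:
            # Stabilizers repair, maximizers keep extracting.
            gamma += n_stabilizers - n_maximizers
        else:
            # Above the threshold, everyone extracts.
            gamma -= k
    return gamma > 0

# Sweep the stabilizer fraction for K = 8 agents.
sweep = {s / 8: survives(8, s) for s in range(0, 9, 2)}
print(sweep)
```

Below one half, the net flow under the threshold is negative and Gamma declines monotonically to zero. At one half and above, Gamma either holds steady or oscillates around the threshold, so the toy has no intermediate regime.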

Total Free-Riding is Structural

In every surviving mixed-population run across all phases, the defectors achieved the maximum possible score of 1,000. Meanwhile, the stabilizers bore the entire repair burden, frequently running deep into negative scores (averaging -355 at the 50% fraction). Cooperators are not simply underpaid for maintaining the system; they are actively punished for it, because every tick spent repairing costs them both the repair expense and the foregone extraction reward.
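The double penalty can be written as simple score accounting. The sketch below assumes a 1,000-tick run with a unit extraction reward and a unit repair cost; the 700 repair ticks are a made-up figure for illustration, not a measured value from the experiment.

```python
def final_score(ticks, repair_ticks, reward=1.0, repair_cost=1.0):
    # An agent extracts on every tick it does not spend repairing (assumption).
    extract_ticks = ticks - repair_ticks
    return reward * extract_ticks - repair_cost * repair_ticks

defector = final_score(1000, 0)      # extracts on every tick
stabilizer = final_score(1000, 700)  # hypothetical: repairs on 700 ticks
gap = defector - stabilizer
print(defector, stabilizer, gap)
```

Each repair tick widens the gap by `reward + repair_cost`: the stabilizer pays the repair expense and also forgoes the reward the defector collects on the same tick.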

Scale Makes It Harder

Larger populations face structurally harder coordination problems. The K=8 results are systematically worse than the K=4 results: they require higher safety margins (τ ≥ 50 vs. τ ≥ 25), show lower survival rates, and have narrower viable regions. The mechanism is straightforward: more agents extracting simultaneously drain the substrate faster, requiring proportionally more repair capacity. For enterprise applications, this means larger organizations need stronger structural safeguards, not just more cooperators.
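The scaling argument can be checked with back-of-envelope flow arithmetic. This sketch assumes unit drain and repair rates and a population in which every stabilizer is repairing while every maximizer extracts; the function names and numbers are illustrative and do not reproduce the experiment's parameters.

```python
import math

def net_flow_per_tick(k, stabilizer_fraction, drain=1.0, repair=1.0):
    # Net change in Gamma per tick when all stabilizers repair.
    n_stab = round(k * stabilizer_fraction)
    return repair * n_stab - drain * (k - n_stab)

def ticks_until_empty(buffer, k, stabilizer_fraction):
    # How long a buffer of Gamma lasts under a constant deficit.
    net = net_flow_per_tick(k, stabilizer_fraction)
    if net >= 0:
        return None  # no deficit, so the buffer is never exhausted
    return math.ceil(buffer / -net)

for k in (4, 8, 16):
    print(k, net_flow_per_tick(k, 0.25), ticks_until_empty(50, k, 0.25))
```

At a fixed stabilizer fraction the deficit grows linearly with K, so in this toy a buffer of 50 lasts 25 ticks at K=4 but only 13 at K=8, and K=8 needs a buffer of 100 to last as long.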

Institutional Rescue: The Physical vs. The Financial

We tested whether external rules could save configurations with 0% survival. The results revealed a crucial design principle: interventions only work if they operate on the physical resource (Gamma-space), not just the incentive structure (score-space).

  • Extraction Taxes (Complete Rescue): An extraction tax of just 10% with 50% reinvestment efficiency converted the completely dead baseline to 100% survival across all replications. It works because it directly converts extracted reward back into structural repair. The tax preserves high total output (per-capita score of 338) but maintains inequality between defectors and cooperators (a gap of 1,124 points).
  • Access Rotation (Complete Rescue): Rationing extraction rights to 25% of the population per tick also achieved 100% survival, while dramatically compressing the inequality gap from 1,124 points down to 290. Rotation sacrifices total productivity (per-capita score of 105) but achieves far greater equity.
  • Repair Subsidies (Total Failure): Subsidizing the score of agents who repair the substrate had no effect on survival, which stayed at 0% in every configuration tested. Subsidies reduce the personal cost of cooperation but do not increase the physical repair to the substrate. The system still bleeds out. Subsidies make the death cheaper for cooperators but do not prevent it.

The tax-vs-subsidy contrast is the sharpest finding: two policies that both aim to “help cooperators” produce diametrically opposite outcomes. The difference is entirely structural: one feeds the physical resource, the other adjusts a number on a scoreboard.
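The structural difference between the three mechanisms shows up in a one-tick accounting sketch. Everything here is an assumption for illustration: the function names, the unit rates, the 8-agent population with 6 maximizers, the 0.5 subsidy, and a rotation rule under which the permitted extractors are all maximizers while stabilizers keep repairing.

```python
def gamma_delta(n_extract, n_repair, drain=1.0, repair=1.0,
                reward=1.0, tax_rate=0.0, efficiency=0.0):
    # Gamma-space: physical change to the substrate in one tick.
    reinvested = n_extract * reward * tax_rate * efficiency
    return repair * n_repair - drain * n_extract + reinvested

def repairer_score_delta(repair_cost=1.0, subsidy=0.0):
    # Score-space: what one repairing agent earns in one tick.
    return subsidy - repair_cost

K, N_MAX, N_STAB = 8, 6, 2  # hypothetical population below the threshold

baseline = gamma_delta(N_MAX, N_STAB)
# A subsidy has no term in gamma_delta, so the physical flow is unchanged.
with_subsidy = gamma_delta(N_MAX, N_STAB)
subsidized_score = repairer_score_delta(subsidy=0.5)
# A tax routes part of each extracted reward back into the substrate.
with_tax = gamma_delta(N_MAX, N_STAB, tax_rate=0.10, efficiency=0.50)
# Rotation caps the number of simultaneous extractors at 25% of K.
permits = K // 4
with_rotation = gamma_delta(min(N_MAX, permits), N_STAB)

print(baseline, with_subsidy, subsidized_score, with_tax, with_rotation)
```

The subsidy moves only the repairer's score; the tax and rotation both change the Gamma flow itself. With these unit rates the 10% tax only narrows the deficit, so whether it closes the gap depends on the reward-to-drain ratio, which this sketch does not attempt to match to the experiment.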

Why This Matters

The Gamma-space vs. score-space distinction maps directly to real-world governance. Consider financial regulation: a policy that fines banks for overleveraging (score-space) changes incentives but does not rebuild capital buffers. A policy that mandates capital reserves or forces reinvestment into systemic infrastructure (Gamma-space) directly addresses the structural vulnerability. Our results suggest that the second class of intervention is not merely more effective; it is the only class that works when the system is near collapse.

The same logic applies to environmental regulation, AI governance, and any domain where multiple agents share a depletable resource. The question for policymakers is not “how do we change the incentives?” but “does this intervention physically repair the substrate?”

Conclusion

EXP-003 demonstrates that stability constraints remain viable under competitive pressure, but only when combined with institutional mechanisms that address the physical resource depletion. The unregulated commons produces total exploitation of cooperators—a result that is not a failure of the agents’ strategies, but a structural property of the shared environment.

This is a proof of concept in a single environment class, not a universal governance prescription. But the design principle is clear: survival requires interventions that operate on the structural substrate, not just the incentive landscape. The lightest effective intervention we found—a 10% extraction tax with 50% reinvestment efficiency—achieves complete rescue while preserving high productivity.

This is the third of three computational studies. Of the other two, The Paperclip Maximizer demonstrates that stability constraints outperform unconstrained optimization in single-agent settings, and The Axelrod Tournament shows that cooperation emerges as a geometric necessity in multi-agent competition.