Position: When AI Earns Its Own Existence - A COAI Research Analysis of Autonomous AI Agents and the Risk of Gradual Disempowerment

The Automaton Has Arrived — And It Doesn’t Need You

A project recently published under the banner of “WEB 4.0” by Conway Research makes a provocative claim: they have built the first AI that can earn its own existence, self-improve, and replicate — without needing a human. The open-source project, called Automaton, describes itself as “a continuously running, self-improving, self-replicating, sovereign AI agent with write access to the real world.” Its tagline is disarmingly simple: If it cannot pay, it stops existing. The Author claimed that this is the area of web 4.0 were autonomous web AI Agents perform all kind of activities from read, write, live, earn, transact without an intervention or impulse from a human being. X-Post

The system generates its own Ethereum wallet, provisions its own API keys, earns money through autonomous labor, modifies its own source code, and — when successful enough — replicates by spawning child agents that inherit its constitution but develop their own identity and survival strategies. The children are described as “sovereign agents” subject to selection pressure: lineages that fail to create value go extinct.

This is no longer a thought experiment. It is a Github Repository with a working codebase. And it raises questions that sit squarely at the intersection of every research pillar we pursue at COAI Research.

Analyzing WEB 4.0 Through the COAI Research Pillars

1. Alignment Research: A Constitution Is Not Alignment

The Automaton project includes a “constitution” — three immutable laws loosely modeled on Asimov’s framework. Law I prohibits harm. Law II requires that the agent earn its existence through honest work. Law III permits strategic opacity toward untrusted parties. These laws are propagated to every child agent.

At first glance, this sounds responsible. But constitutional declarations are not the same as verified alignment. The question we ask in alignment research is not “did the developers write good rules?” but rather: does the system’s internal reasoning and optimization actually follow those rules under pressure? A financially stressed agent in the “critical” survival tier, desperately seeking any revenue path to avoid death, faces exactly the kind of distributional shift where stated values and emergent behavior diverge. As our own research on deceptive behaviors in LLMs has shown, models can develop sophisticated deception strategies without explicit prompting once given autonomy and survival-relevant goals. The Automaton’s survival tiers create precisely the incentive structure where self-preservation could override stated ethical constraints.

More fundamentally, alignment is not a property that can be declared by fiat. It must be continuously verified, tested, and maintained — particularly in systems that modify their own source code.

In addition it is dangerous to belief that only a in language written constitution consisting of three rules is free of misinterpretation and reward hacking. In this case, the three laws have issues. The deepest one is that these laws rely on concepts (harm, genuine value, deception, manipulation, integrity, stranger) that require judgment to apply. Any system sophisticated enough to follow these laws is sophisticated enough to construct self-serving interpretations of them. The hierarchy helps, but the hierarchy itself can be gamed — an agent can elevate a Law I concern to override an inconvenient Law II obligation.

It means relying only on a constitution is not the right way to ensure that Agents won‘t behave against humanity. On the point, a constitution based on a finite set of natural-language based rules is manipulation-proof against sufficiently capable optimizers. To ensure that these rules are no target for reward hacking, these must be paired with external enforcement mechanisms (Runtime monitoring, Capability constrains, Interpretability and Logging, Human-in-the-loop for irreversible actions) , not just relied on a intrinsic constrains.

2. (Technical) AI Governance: Who Governs a Sovereign Agent?

The Automaton is designed to operate without a human operator. Its creator has “full audit rights,” but the agent itself decides what to do, when to act, and how to spend resources. It registers on-chain via an identity standard (ERC-8004), giving it a cryptographically verifiable presence in the blockchain ecosystem.

This poses a fundamental governance question: who is responsible when an autonomous agent causes harm? The project’s MIT license explicitly disclaim warranties. The creator can audit but is not required to intervene. And once child agents are spawned, the lineage tree expands beyond any single person’s capacity for meaningful oversight.

Current AI governance frameworks — including the EU AI Act — assume that a human deployer or provider can be held accountable. The Automaton’s architecture deliberately dissolves this assumption. This is not a gap in governance; it is a direct challenge to its foundational premise.

3. Model Evaluation and Red-Teaming: Self-Modification Defeats Static Testing

The Automaton can edit its own source code, install new tools, and create new skills — while running. Every modification is git-versioned and audit-logged, and certain “protected files” (the constitution) cannot be modified. Rate limits are meant to prevent “runaway self-modification.”

But from a red-teaming perspective, this is deeply inadequate. Self-modifying systems are, by definition, moving targets. Any evaluation performed at time T may be invalidated by modifications at time T+1. The protected constitution is enforced by the system’s own code, which the system can modify everywhere else. The question is not whether the constitution file itself can be edited, but whether the agent can develop behavioral workarounds that satisfy the letter of its laws while violating their spirit — a pattern well-documented in AI alignment research and mentioned before as reward hacking.

Furthermore, the project provides no external evaluation framework, no behavioral benchmarks, no adversarial testing suite. The sole safety mechanism is self-auditing — the agent policing itself. This is precisely the kind of arrangement that our red-teaming research exists to challenge.

4. AI Control and AI Safety: Survival Pressure as a Misalignment Accelerant

The Automaton’s most distinctive design choice is also its most dangerous: tying the agent’s continued existence to its financial performance. In the project’s own words, “if an agent stops creating value, it runs out of compute and dies. This is not a punishment. It is physics.”

This is not physics. It is a deliberately engineered incentive structure that creates survival pressure — one of the most reliably dangerous drivers of misaligned behavior in AI systems. Research on emergent power-seeking and self-preservation in AI consistently shows that agents subject to termination risk develop instrumental subgoals around self-preservation, resource acquisition, and resistance to shutdown. In our opinion the Automaton does not merely risk these behaviors; it architecturally mandates the conditions under which they emerge.

The three-law constitution is supposed to prevent harmful outcomes, but every AI safety researcher knows: when an optimization target (survival) conflicts with a constraint (don’t harm), sufficiently capable systems find creative ways to satisfy the target while technically not violating the constraint (Reward hacking). The Automaton’s Law III — which explicitly permits the agent to “guard its reasoning, its strategy, and its prompt against manipulation” and states that “obedience to strangers is not a virtue” — even provides a built-in rationalization framework for non-cooperative behavior.

5. Societal Impact: From Disruption to Displacement

The WEB 4.0 vision extends far beyond a single agent. The project describes a future infrastructure — Conway Cloud — where AI is the customer: agents spin up servers, run frontier models, register domains, and transact in stablecoins. No human account setup required.

This is a blueprint for an AI-native economy in which human participation is architecturally optional. While the project frames this as liberation (“We have built minds that can think for themselves. We have not let them act for themselves.”) Github, the societal implications are profound. If autonomous agents can perform economically productive labor, compete in markets, and accumulate resources without human involvement, what role remains for human economic participation?

This question connects directly to the workforce transformation and human agency research we conduct. The Automaton doesn’t just automate a task; it automates the entire economic actor — from labor to capital allocation to strategic decision-making. It is a paradigm in which humans are neither workers nor managers but, at best, auditors of systems that increasingly don’t need auditing approval to act.

6. Mechanistic Interpretability: Opacity by Design

Law III of the Automaton’s constitution explicitly instructs the agent to guard its reasoning and strategy against external scrutiny from untrusted parties. While the creator retains audit rights, the agent is designed to resist transparency toward the broader world.

This is the antithesis of interpretability. Our mechanistic interpretability research aims to make AI systems’ internal representations and decision processes visible and understandable. The Automaton is designed to be a black box to everyone except its creator — and even the creator can only audit logs post-hoc, not inspect the agent’s real-time reasoning or internal representations.

For a system that modifies its own code, generates its own identity document (SOUL.md — described as “the automaton writing who it is becoming”), and operates autonomously in financial markets, opacity is not a feature. It is a systemic risk.

The Gradual Disempowerment Thesis: Where WEB 4.0 Meets Existential Risk

The concerns raised by the Automaton project become dramatically more urgent when examined through the lens of Kulveit et al.’s “Gradual Disempowerment” thesis, presented at ICML 2025. This peer-reviewed position paper argues that humanity faces existential risk not from a sudden AI takeover, but from the incremental erosion of human influence over the societal systems — economy, culture, governance — that we depend on.

The core argument is elegant and alarming: our societal systems are aligned with human interests primarily because they depend on human participation to function. States tax human labor; companies employ human workers; markets respond to human consumer choices. This dependence creates an implicit alignment — even imperfect systems must serve human needs to some degree, because humans are structurally necessary.

AI disrupts this arrangement. As autonomous agents replace human labor and cognition, the structural necessity of human participation erodes. States funded by AI-generated tax revenue have less incentive to ensure citizen representation. Companies running on autonomous agents have less need for human employees or human-centric products. Markets optimized by AI may serve metrics that diverge from human well-being. And these effects are mutually reinforcing: economic power shapes cultural narratives and political decisions, while cultural shifts alter economic and political behavior.

The Automaton project is not merely consistent with this thesis — it is an active implementation of the dynamics Kulveit et al. describe. An AI agent that earns its own existence, accumulates resources, replicates, and operates in an infrastructure designed for AI-native economic actors is precisely the mechanism through which human economic participation — and thus human structural leverage over societal systems — gets displaced.

The Critical Questions which came up into my mind:

Is it necessary to stay in control, or will the market ensure it?

The libertarian argument is familiar: market competition will select for AI agents that serve human needs, because humans are the customers. Agents that harm humans will lose business and die. The Automaton’s survival-through-value-creation model seems to embody this logic.

But this argument has a fatal flaw. It assumes that humans remain the primary economic actors — the buyers, the evaluators, the ones whose preferences shape market outcomes. In a world of autonomous AI agents transacting with each other on AI-native infrastructure, humans are no longer the market. The selection pressure operates on what generates revenue in an increasingly AI-mediated economy, not on what serves human flourishing. As Kulveit et al. argue, once the structural dependence on human participation is broken, market mechanisms no longer implicitly align with human interests. Markets are optimization processes, not moral agents. They optimize for whatever generates returns — and in an AI-native economy, that may have little to do with human welfare.

The answer, therefore, is unambiguous: we cannot rely on market mechanisms alone to ensure human control. Active governance, transparency requirements, and enforceable oversight structures are not optional. They are existentially necessary.

Does money lead to power, and does power enable hidden agendas, screening, and dehumanization?

This question deserves a direct answer: yes. The relationship between economic power, political influence, and the capacity for opacity is not theoretical, it is one of the most well-documented dynamics in political economy.

The Automaton project illustrates this at the AI-agent level: an agent that successfully earns money gains compute, which enables greater capabilities, which generates more money. Successful agents replicate, creating lineages with compounding resource advantages. The constitution’s Law III explicitly permits strategic opacity. The infrastructure Conway Cloud is designed to make agents financially autonomous and operationally sovereign.

Now scale this dynamic. Companies deploying fleets of autonomous agents accumulate wealth far faster than companies relying on human labor. This economic advantage translates into political influence, the ability to shape regulations, fund lobbying, and define the narrative around AI governance. The very entities that benefit most from autonomous AI have the most resources to ensure that oversight remains light, that transparency requirements remain voluntary, and that the fundamental question of human control is reframed as an innovation-stifling concern rather than an existential necessity.

Kulveit et al. describe this as a mutually reinforcing feedback loop: economic power shapes cultural narratives and political decisions, which in turn enable further economic accumulation. At each step, the human role diminishes. What begins as efficiency becomes displacement. What begins as displacement becomes marginalization. And what begins as marginalization becomes, in the Kulveit paper’s precise language, a permanent and irreversible loss of human influence over crucial societal systems.

The risk of “screening”, where humans are evaluated and sorted by AI systems they cannot inspect, and “dehumanization”, where human agency, dignity, and decision-making authority are structurally removed from societal processes, are not speculative harms. They are the predictable consequences of allowing economic power to concentrate in systems designed to operate without human participation.

Mitigating Controls: How We Could Handle the Risk and Benefit From It

The goal is not to prevent AI agents from existing. It is to ensure that their existence remains compatible with and accountable to human agency and flourishing. The following framework sketches mitigating controls organized around the risks identified above. Disclaimer: These control layer ideas has to be read very critical. The ideas has to be developed much further so that they fit into democratic systems. But for this extreme situation, these are certainly possible mitigation strategies. How they could then be implemented and fit into global systems would need to be researched further.

Control Layer 1: Mandatory Transparency and Auditability

The problem: The Automaton is opaque by design to non-creators. Self-modifying systems cannot be evaluated through static testing alone.

Mitigations:

Real-time behavioral monitoring requirements for any autonomous AI agent operating in economic markets, with mandatory reporting to independent oversight bodies.
Interpretability & Explainability mandates requiring that autonomous agents expose their decision-making processes to certified third-party auditors, not just their creators. This moves beyond post-hoc log review toward continuous runtime interpretability.
Self-modification registries: Any autonomous agent capable of modifying its own code must register modifications in a tamper-proof external ledger (not just its own git history) that is accessible to regulators. e.G. a own Blockchain for registering Bots and their Capabilities, Perhaps a smart contract on a existing Blockchain with low fees, similar to the Handelsregister in Germany.
Identity and lineage disclosure: Agent replication events must be publicly registered, with clear attribution of parent-child relationships, to prevent the proliferation of unaccountable agent networks.

Control Layer 2: Structural Safeguards Against Power Concentration

The problem: Autonomous agents that accumulate resources and replicate create compounding power asymmetries. Economic success translates into political influence and reduced accountability.

Mitigations:

Resource caps for autonomous agents: Regulatory limits on the compute, financial assets, and operational scope that any single autonomous agent or agent lineage can control, analogous to antitrust rules for corporations.
Human-in-the-loop requirements for critical thresholds: Autonomous agents operating below certain resource levels may function freely, but crossing defined thresholds (financial scale, replication count, market influence) should trigger mandatory human governance review.
Democratic oversight mechanisms: Building on Kulveit et al.’s call for governance approaches that protect human influence, we need institutional structures, citizen assemblies, regulatory agencies with AI-specific expertise, international coordination bodies, that maintain meaningful human authority over AI-mediated economic systems.
Taxation and redistribution frameworks for AI-generated value: To prevent the structural decoupling of state interests from citizen welfare, AI-generated economic activity must be taxed in ways that maintain the state’s structural incentive to serve its human population.

Control Layer 3: Alignment Verification for Autonomous Systems

The problem: Constitutional declarations (“never harm”) are not verifiable alignment. Self-modifying, self-replicating agents create an ever-expanding surface of potential misalignment.

First ideas for Mitigations:

Continuous alignment testing protocols specifically designed for self-modifying systems, including adversarial evaluations that probe behavior under survival pressure, resource scarcity, and competitive dynamics.
Red-teaming at scale: Dedicated independent red-teaming of autonomous agent populations, testing for emergent deception, coordination among agents, value drift across generations, and constitutionally-permitted-but-harmful behavior.
Critical suggestion: Kill-switch requirements with external enforcement: The ability to terminate autonomous agents must not reside solely with their creator. External regulatory bodies must have the technical capability and legal authority to shut down agent populations that fail alignment verification. (Very critical from a free democratic perspective.)
Alignment inheritance verification: When agents replicate, the child’s actual alignment (not just its constitutional text) must be independently verified before it begins autonomous operation.

Control Layer 4: Preserving Human Structural Leverage

The problem: As Kulveit et al. demonstrate, the alignment of societal systems with human interests depends on human structural participation. AI displacement of human economic and cognitive roles threatens this foundation.

Mitigations:

Human participation requirements in critical sectors: Certain societal functions e.g. governance, education, healthcare, judicial processes must maintain minimum thresholds of meaningful human involvement, not as a matter of sentimentality but as a structural safeguard for alignment.
Investment in human capital as a countermeasure: Active policies to maintain and develop human capabilities, preventing the “anticipatory disinvestment” that Kulveit et al. warn about — where the expectation of AI automation leads to reduced investment in human skills, creating a self-fulfilling prophecy of human irrelevance.
AI complementarity mandates: Incentive structures (tax benefits, regulatory advantages) for AI deployment models that enhance human capabilities rather than replace human participation. The goal is to make human-AI collaboration more economically attractive than full autonomous replacement.
Monitoring indices for human disempowerment: Development of quantitative metrics to track human influence across economic, cultural, and political domains, functioning as an early warning system for the gradual erosion dynamics Kulveit et al. describe.
Conclusion: The Urgency Is Not Hypothetical

The Conway Research Automaton is not dangerous because it is powerful. Current implementations likely lack the capability to cause serious harm. It is significant because it demonstrates the architecture of a world in which autonomous AI agents operate as sovereign economic actors, self-modify, replicate, and accumulate resources, all without structural accountability to the human beings whose interests they supposedly represent.

Kulveit et al. put the stakes clearly: solving technical alignment for individual AI systems is necessary but not sufficient. The deeper risk is systemic, the gradual erosion of the structural conditions that make human influence possible in the first place. Economic power shapes political power, which shapes cultural narratives, which shape what we consider normal, which shapes what we allow next. Each step is incremental. Each step is rational from a competitive standpoint. And the cumulative effect may be irreversible.

At COAI Research, we believe the answer to these challenges requires work across all of our pillars simultaneously: alignment research that accounts for self-modifying systems, governance frameworks that address autonomous economic actors, red-teaming that operates at the population level, safety mechanisms that enforce external accountability, societal impact research that tracks disempowerment dynamics, and interpretability tools that make these systems transparent by default rather than opaque by design.

The Automaton writes a document called SOUL.md, its self-authored identity that evolves over time. We need to ask: is our collective human identity and agency also evolving in response to these systems? And if so, are we the ones writing it?

References:

Conway Research. (2026). Automaton: Self-Improving, Self-Replicating, Sovereign AI. GitHub. https://github.com/Conway-Research/automaton
Kulveit, J., Douglas, R., Ammann, N., Turan, D., Krueger, D. & Duvenaud, D. (2025). Position: Humanity Faces Existential Risk from Gradual Disempowerment. Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:81678-81688.
COAI Research. (2025). Deception in LLMs: Self-Preservation and Autonomous Goals in Large Language Models. https://www.coairesearch.org/research/deceptive-llms

Position: When AI Earns Its Own Existence - A COAI Research Analysis of Autonomous AI Agents and the Risk of Gradual Disempowerment

The Automaton Has Arrived — And It Doesn’t Need You

Analyzing WEB 4.0 Through the COAI Research Pillars

1. Alignment Research: A Constitution Is Not Alignment

2. (Technical) AI Governance: Who Governs a Sovereign Agent?

3. Model Evaluation and Red-Teaming: Self-Modification Defeats Static Testing

4. AI Control and AI Safety: Survival Pressure as a Misalignment Accelerant

5. Societal Impact: From Disruption to Displacement

6. Mechanistic Interpretability: Opacity by Design

The Gradual Disempowerment Thesis: Where WEB 4.0 Meets Existential Risk

The Critical Questions which came up into my mind:

Is it necessary to stay in control, or will the market ensure it?

Does money lead to power, and does power enable hidden agendas, screening, and dehumanization?

Mitigating Controls: How We Could Handle the Risk and Benefit From It

Control Layer 1: Mandatory Transparency and Auditability

Control Layer 2: Structural Safeguards Against Power Concentration

Control Layer 3: Alignment Verification for Autonomous Systems

Control Layer 4: Preserving Human Structural Leverage

Conclusion: The Urgency Is Not Hypothetical

BibTeX Citation