COAI — Human-Compatible AI
Shaping how humans and AI live and work together — safely and responsibly.
As AI systems become more powerful and integrated into various aspects of our lives, humans themselves change — in how they reason, lead, and collaborate. COAI conducts research on joint human–AI systems, ensuring they remain aligned with human values, understandable, and governable — while actively shaping how humans and AI co-evolve. Our work focuses on areas vital for keeping humanity in charge of AI development in the long run.
About Us
COAI Research is a non-profit research institute dedicated to ensuring AI systems remain aligned with human values and interests as they grow more capable — and to understanding how humans themselves must adapt in the process. Founded in Germany, we work at the intersection of AI safety, interpretability, and human–AI collaboration.
Our team conducts foundational and applied research in mechanistic interpretability, alignment analysis, and the systematic evaluation of AI systems. We focus particularly on identifying and mitigating risks that could emerge from advanced AI capabilities, while also investigating how work processes, organizations, and leadership evolve as AI takes on greater roles. Through international collaboration with research institutions, we contribute to the global effort of making human–AI systems demonstrably safe, effective, and beneficial.
Our Research
COAI enables human–AI co-evolution through three complementary research pillars. Together, they provide the scientific foundation for ensuring AI systems remain aligned with human values — and for shaping joint human–AI systems that are transparent, governable, and beneficial.
Co-Evolve: Shaping Human–AI Systems
Developing, testing, and refining how humans and AI work and learn together.
Detection, understanding, and control are essential capabilities — but they are means, not the end. COAI develops and evaluates models of human–AI collaboration in real and simulated environments — from hybrid teams and leadership models to organizational processes and societal implications. We build multi-agent testbeds and human–AI teaming simulations to study how collaboration succeeds and fails under realistic conditions. And we investigate the question at the heart of our work: how do humans maintain expertise, judgment, and agency in a world where AI increasingly shapes how we think, decide, and lead?
enabled by
Detect
Making Human–AI Systems Observable
We analyze AI agent behavior to identify hazards and leverage points for reliable human–AI collaboration. Our research focuses on understanding how hybrid teams of humans and AI agents can work together while maintaining human expertise and authority. We study emergent behavior in multi-agent collectives — including coordination, collusion, and competition dynamics — and develop methods to detect when agents hide capabilities, manipulate their training process, or misrepresent their intentions. Beyond AI behavior, we also track how human behavior shifts in response to AI capabilities — changes in expertise, decision-making patterns, and authority structures that emerge as humans and AI agents work together.
Understand
Making Joint Behavior Intelligible
Transparency is the foundation for meaningful human oversight and responsible co-evolution. We reverse-engineer the internal computations of AI models — their circuits, features, and representations — to make visible how they process information and arrive at decisions. Building on this, we focus on explainability: tracing why a system made a specific decision and communicating that reasoning in a way humans can verify. Beyond individual model behavior, we study interaction dynamics: how humans and AI systems develop shared reasoning patterns, how human decision-making adapts in the presence of AI, and where misalignment between human intent and system behavior emerges.
Control
Maintaining Meaningful Human Agency
Anticipating risks is essential for sustained human agency over AI-integrated systems. We research how to ensure AI systems pursue their intended goals and remain compatible with human values throughout their operation. This includes developing robust methods for maintaining reliable and predictable behavior under distribution shift, adversarial conditions, and novel situations — so that safety guarantees hold not only in controlled settings but also when systems encounter the complexity of real-world deployment. Beyond technical robustness, we develop governance mechanisms for joint human–AI decision-making and frameworks for accountability in hybrid teams.
Vision & Mission
Our Vision
To be one of the EU's leading research institutes for human-compatible AI — ensuring AI systems remain fundamentally aligned with human values through pioneering safety research, while actively shaping how humans and AI evolve together.
Our Mission
To advance the understanding and control of AI systems to safeguard human interests, while investigating how humans and AI can collaborate effectively and responsibly. We leverage deep technical analysis, systematic evaluation, and applied research into human–AI co-evolution to support the development of beneficial AI that genuinely serves humanity's needs.
Upcoming Events
AI Transparency Conference (AITC 2026)
Two-day research conference on transparent and human-compatible AI. Tracks: Detect, Understand, Control. Free admission.
Conference details →
Latest Notes
How Exposed Is the German Job Market to AI?
An analysis of 44 million workers across 266 occupations, inspired by Karpathy’s US study — adapted for Germany’s unique labor market. An opinionated...
Read more →
The Shoggoth in a Prison: A Framework for AI Safety at Scale
As AI models scale toward trillions of parameters, they become increasingly capable yet harder to interpret, raising the risk of subtle misalignmen...
Read more →
Position: When AI Earns Its Own Existence - A COAI Research Analysis of Autonomous AI Agents and the Risk of Gradual Disempowerment
The Automaton Has Arrived — And It Doesn’t Need You
Read more →