COAI — Human-Compatible AI
Shaping how humans and AI live and work together — safely and responsibly.
As AI systems become more powerful and integrated into various aspects of our lives, humans themselves change — in how they reason, lead, and collaborate. COAI conducts research on joint human–AI systems, ensuring they remain aligned with human values, understandable, and governable — while actively shaping how humans and AI co-evolve. Our work focuses on areas vital for keeping humanity in charge of AI development in the long run.
About Us
COAI Research is a non-profit research institute dedicated to ensuring AI systems remain aligned with human values and interests as they grow more capable — and to understanding how humans themselves must adapt in the process. Founded in Germany, we work at the intersection of AI safety, interpretability, and human–AI collaboration.
Our team conducts foundational and applied research in mechanistic interpretability, alignment analysis, and the systematic evaluation of AI systems. We focus particularly on identifying and mitigating risks that could emerge from advanced AI capabilities, while also investigating how work processes, organizations, and leadership evolve as AI takes on greater roles. Through international collaboration with research institutions, we contribute to the global effort of making human–AI systems demonstrably safe, effective, and beneficial.
Our Research
COAI enables human–AI co-evolution through three complementary research pillars. Together, they provide the scientific foundation for ensuring AI systems remain aligned with human values — and for shaping joint human–AI systems that are transparent, governable, and beneficial.
Co-Evolve: Shaping Human–AI Systems
Developing, testing, and refining how humans and AI work and learn together.
Detection, understanding, and control are essential capabilities — but they are means, not the end. COAI develops and evaluates models of human–AI collaboration in real and simulated environments — from hybrid teams and leadership models to organizational processes and societal implications. We build multi-agent testbeds and human–AI teaming simulations to study how collaboration succeeds and fails under realistic conditions. And we investigate the question at the heart of our work: how do humans maintain expertise, judgment, and agency in a world where AI increasingly shapes how we think, decide, and lead?
enabled by
Detect
Making Human–AI Systems Observable
We analyze AI agent behavior to identify hazards and leverage points for reliable human–AI collaboration. Our research focuses on understanding how hybrid teams of humans and AI agents can work together while maintaining human expertise and authority. We study emergent behavior in multi-agent collectives — including coordination, collusion, and competition dynamics — and develop methods to detect when agents hide capabilities, manipulate their training process, or misrepresent their intentions. Beyond AI behavior, we also track how human behavior shifts in response to AI capabilities — changes in expertise, decision-making patterns, and authority structures that emerge as humans and AI agents work together.
Understand
Making Joint Behavior Intelligible
Transparency is the foundation for meaningful human oversight and responsible co-evolution. We reverse-engineer the internal computations of AI models — their circuits, features, and representations — to make visible how they process information and arrive at decisions. Building on this, we focus on explainability: tracing why a system made a specific decision and communicating that reasoning in a way humans can verify. Beyond individual model behavior, we study interaction dynamics: how humans and AI systems develop shared reasoning patterns, how human decision-making adapts in the presence of AI, and where misalignment between human intent and system behavior emerges.
Control
Maintaining Meaningful Human Agency
Anticipating risks is essential for sustained human agency over AI-integrated systems. We research how to ensure AI systems pursue their intended goals and remain compatible with human values throughout their operation. This includes developing robust methods for maintaining reliable and predictable behavior under distribution shift, adversarial conditions, and novel situations — so that safety guarantees hold not only in controlled settings but also when systems encounter the complexity of real-world deployment. Beyond technical robustness, we develop governance mechanisms for joint human–AI decision-making and frameworks for accountability in hybrid teams.
Vision & Mission
Our Vision
To be one of the EU's leading research institutes for human-compatible AI — ensuring AI systems remain fundamentally aligned with human values through pioneering safety research, while actively shaping how humans and AI evolve together.
Our Mission
To advance the understanding and control of AI systems to safeguard human interests, while investigating how humans and AI can collaborate effectively and responsibly. We leverage deep technical analysis, systematic evaluation, and applied research into human–AI co-evolution to support the development of beneficial AI that genuinely serves humanity's needs.
Upcoming Events
AI Transparency Conference (AITC 2026)
Two-day research conference on transparent and human-compatible AI. Tracks: Detect, Understand, Control. Free admission.
Conference details →
Latest Notes
How Exposed Is the German Job Market to AI?
An analysis of 44 million workers across 266 occupations, inspired by Karpathy’s US study — adapted for Germany’s unique labor market. An opinionated...
Read more →
The Shoggoth in a Prison: A Framework for AI Safety at Scale
As AI models scale toward trillions of parameters, they become increasingly capable yet harder to interpret, raising the risk of subtle misalignmen...
Read more →
Position: When AI Earns Its Own Existence - A COAI Research Analysis of Autonomous AI Agents and the Risk of Gradual Disempowerment
The Automaton Has Arrived — And It Doesn’t Need You
Read more →