COAI — Human Compatible AI

As AI systems become more powerful and integrated into various aspects of our lives, it's crucial to ensure they remain aligned with human goals and values. Our research focuses on areas vital for keeping humanity in charge of AI development in the long run.

About Us

COAI Research is a non-profit research institute dedicated to ensuring AI systems remain aligned with human values and interests as they grow more capable. Founded in Germany, we work at the intersection of AI safety, interpretability, and human-AI interaction.

Our team conducts foundational research in mechanistic interpretability, alignment analysis, and systematic evaluation of AI systems. We focus particularly on identifying and mitigating risks that could emerge from advanced AI capabilities. Through international collaboration with research institutions, we contribute to the global effort of making AI systems demonstrably safe and beneficial.

Our Research

Our work is organized around three core pillars — Detect, Understand, and Control — to ensure AI systems remain aligned with human values throughout their lifecycle.

Detect

Agent Behaviour & Collaboration

Understand

Transparency & Interpretability

Control

AI Control & AI Safety

Detect

We analyze AI agent behavior to identify hazards and ensure reliable human-AI collaboration. Our research focuses on understanding how hybrid teams of humans and AI agents can work together while maintaining human expertise and authority. We study emergent behavior in multi-agent collectives — including coordination, collision, and competition dynamics — and develop methods to detect when agents hide capabilities, manipulate their training process, or misrepresent their intentions. We also examine the broader societal impact of increased AI autonomy, investigating how work processes, education, and societal structures must adapt as cognitive capabilities shift.

Understand

Transparency is the foundation for meaningful human oversight and intervention. We reverse-engineer the internal computations of AI models — their circuits, features, and representations — to make visible how they process information and arrive at decisions. Beyond interpretability, we focus on explainability: making it possible to trace why a system made a specific decision and communicating that reasoning in a way humans can verify. Through causal analysis we trace the chains of reasoning within model behavior, test counterfactual scenarios, and develop targeted interventions that allow researchers to understand and influence what AI systems actually do under the hood.

Control

Anticipating risks is essential for sustained human control over AI systems. We research how to ensure AI systems pursue their intended goals and remain compatible with human values throughout their operation. This includes developing robust methods for maintaining reliable and predictable behavior under distribution shift, adversarial conditions, and novel situations — so that safety guarantees hold not only in controlled settings but also when systems encounter the complexity of real-world deployment.

Supporting Activities

Across all three pillars, our work is supported by a set of cross-cutting activities. We develop governance frameworks and contribute to technical AI governance standards. We build research infrastructure including multi-agent testbeds, human-AI teaming simulations, synthetic data generation pipelines, and curated benchmark datasets. Through red teaming and systematic evaluation, we adversarially test AI systems and develop scalable oversight methods for robustness analysis. We continuously advance our research methodologies, define focused research directions, and invest in enabling the next generation of AI safety researchers.

Vision & Mission

Our Vision

To be one of the EU's leading research institutes for ensuring AI systems remain fundamentally aligned with human values and goals through pioneering safety research, systematic analysis, and risk mitigation.

Our Mission

To advance the understanding and control of AI systems to safeguard human interests, leveraging deep technical analysis and evaluation methods to support the development of beneficial AI that genuinely serves humanity's needs.

Upcoming Events

Jun
5–6
2026

AI Transparency Conference (AITC 2026)

Two-day research conference on transparent and human-compatible AI. Tracks: Detect, Understand, Control. Free admission.

Conference details →