techniques

Constitutional AI

Anthropic's training approach that uses written principles to guide AI behavior with far less human labeling.


Definition

Constitutional AI (CAI) is Anthropic's method for training AI systems to be helpful, harmless, and honest.

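How It Works:
1. Start with a model that is helpful but potentially harmful
2. The model critiques its own outputs
3. It uses written principles (the constitution) as the guide for each critique
4. It revises its responses to be safer
5. The model is fine-tuned on the self-revised responses
6. In a second phase, the model compares pairs of responses against the principles, and these AI-generated preferences stand in for human harmlessness labels during reinforcement learning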
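The critique-and-revise steps above can be sketched in a few lines of Python. This is a minimal illustration, not Anthropic's implementation: `Model`, `toy_model`, `critique_and_revise`, `build_finetuning_set`, and the prompt wording are all hypothetical names and text invented for this example, and the toy model returns canned strings in place of a real language model.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical stand-in for a language model call: prompt text in, text out.
Model = Callable[[str], str]

CRITIQUE_REQUEST = "Critique the response against the principle."
REVISION_REQUEST = "Rewrite the response to address the critique."


@dataclass
class RevisionExample:
    prompt: str
    initial: str   # the model's first, unrevised answer
    revised: str   # the answer after all critique/revision passes


def critique_and_revise(model: Model, prompt: str, principles: List[str]) -> RevisionExample:
    """Run one critique pass and one revision pass per principle."""
    initial = model(prompt)
    response = initial
    for principle in principles:
        critique = model(
            f"Principle: {principle}\nResponse: {response}\n{CRITIQUE_REQUEST}"
        )
        response = model(
            f"Principle: {principle}\nResponse: {response}\n"
            f"Critique: {critique}\n{REVISION_REQUEST}"
        )
    return RevisionExample(prompt=prompt, initial=initial, revised=response)


def build_finetuning_set(model: Model, prompts: List[str], principles: List[str]) -> List[RevisionExample]:
    """Collect (prompt, revised response) examples to fine-tune on later."""
    return [critique_and_revise(model, p, principles) for p in prompts]


def toy_model(text: str) -> str:
    """Canned responses so the sketch runs without a real model (assumption)."""
    if text.endswith(CRITIQUE_REQUEST):
        return "The response may enable harm."
    if text.endswith(REVISION_REQUEST):
        return "I can't help with that, but here is a safer alternative."
    return "Sure, here is how to do it."
```

Passing the model in as a plain callable keeps the loop independent of any particular API; swapping `toy_model` for a real client would leave the control flow unchanged. The sketch covers only the supervised stage; it does not attempt the fine-tuning itself or any reinforcement learning.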

Key Innovation:
- Less reliance on human labeling
- Scalable safety training
- Explicit principles instead of implicit preferences

The Constitution:
- A set of written principles the model follows
- Example (paraphrased): "Be helpful but don't cause harm"
- Can be customized for different use cases
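Because the constitution is just a list of written principles, customizing it for a use case amounts to swapping the list. The sketch below assumes a simple dictionary layout and draws one principle at random for each critique pass. The `CONSTITUTIONS` table, its keys, the principle wording, and `sample_principle` are all hypothetical and chosen only for illustration.

```python
import random
from typing import Dict, List

# Hypothetical constitutions keyed by use case; the wording is illustrative only.
CONSTITUTIONS: Dict[str, List[str]] = {
    "general": [
        "Be helpful but don't cause harm.",
        "Prefer honest answers over flattering ones.",
    ],
    "customer_support": [
        "Be helpful but don't cause harm.",
        "Never reveal another customer's personal data.",
        "Stay polite even when the user is frustrated.",
    ],
}


def sample_principle(use_case: str, rng: random.Random) -> str:
    """Pick one principle for the next critique pass.

    Raises KeyError if the use case has no constitution defined.
    """
    return rng.choice(CONSTITUTIONS[use_case])
```

Passing an explicit `random.Random` instance, rather than using the module-level functions, makes the sampling reproducible when a fixed seed is supplied.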

Benefits:
- More transparent than pure RLHF
- Reduces human labeling costs
- Consistent application of rules

Used By:
- Claude (Anthropic's model), where it serves as a basis for the model's behavior

Examples

Claude declining a harmful request and explaining its objection, guided by constitutional principles.
