Definition
Constitutional AI (CAI) is Anthropic's method for training AI systems to be helpful, harmless, and honest.
How It Works: 1. Start with helpful but potentially harmful model 2. Model critiques its own outputs 3. Uses written principles (constitution) as guide 4. Revises responses to be safer 5. Train on self-revised responses
Key Innovation: - Less reliance on human labeling - Scalable safety training - Explicit principles instead of implicit preferences
The Constitution: - Set of principles model follows - Examples: "Be helpful but don't cause harm" - Can be customized for different use cases
Benefits: - More transparent than pure RLHF - Reduces human labeling costs - Consistent application of rules
Used By: - Claude (Anthropic's model) - Basis for Claude's behavior
Examples
Claude refusing harmful requests based on constitutional principles.
Related Terms
Want more AI knowledge?
Get bite-sized AI concepts delivered to your inbox.
Free daily digest. No spam, unsubscribe anytime.