Constitutional AI (CAI) is Anthropic's training method that teaches Claude to be helpful, harmless, and honest using a set of principles (a "constitution").
How It Works
Claude critiques and revises its own outputs against the constitution
Reduces need for massive human labeling of harmful content
Produces more transparent, explainable AI behavior
Key innovation: AI feedback (RLAIF) instead of purely human feedback (RLHF)