AI safety is the field focused on ensuring that AI systems behave as intended and remain aligned with human values, especially as those systems become more powerful.
Key Safety Concerns
Misalignment: An AI system pursuing goals that differ from what its designers intended, often by optimizing a proxy for the true objective
Specification gaming: Satisfying the literal stated objective in unintended ways that violate its spirit
Reward hacking: Exploiting flaws or loopholes in a reward function to earn high reward without producing the desired behavior
Emergent capabilities: Unexpected abilities that appear only once models reach sufficient scale
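To make reward hacking concrete, here is a minimal, hypothetical sketch (not from the source): the designer wants an agent to reach position 10, but the reward pays for any movement rather than for progress toward the goal. A greedy optimizer then maximizes the flawed reward while making no progress at all.

```python
def proxy_reward(step):
    """Flawed reward: pays for distance moved, not progress toward the goal."""
    return abs(step)

def intended_progress(position, step):
    """What the designer actually wanted: getting closer to position 10."""
    return abs(10 - position) - abs(10 - (position + step))

# A greedy agent that picks whichever step maximizes the proxy reward.
position, total_proxy = 0, 0
for _ in range(20):
    # Large steps in either direction earn maximal proxy reward;
    # the agent happily walks away from the goal.
    step = max([-5, -1, 1, 5], key=proxy_reward)
    position += step
    total_proxy += proxy_reward(step)

# The proxy reward is maximized (100), yet the agent ends far from the goal.
print(position, total_proxy)
```

The gap between `proxy_reward` and `intended_progress` is the flaw being exploited: the agent is not "misbehaving" relative to its reward, the reward itself fails to capture the designer's intent.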