
AI Alignment

AI alignment is the challenge of ensuring AI systems pursue goals that genuinely match human values and intentions, rather than technically following instructions in ways that produce unintended consequences. The alignment problem grows more acute as AI systems become more capable: a highly capable AI optimizing for the wrong objective can cause significant harm even without any malicious intent. For business applications today, alignment surfaces in questions like: does the AI do what we actually want, or what we literally said? Does it behave consistently in situations we didn't explicitly anticipate?
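The gap between "what we literally said" and "what we actually want" can be sketched with a toy example. Everything below is hypothetical (the candidate headlines and both scoring values are made up for illustration): a system told to maximize clicks picks the option that best satisfies the literal objective, which differs from the option that best serves the underlying intent of informing readers.

```python
# Toy illustration of objective misspecification (all data is hypothetical).
# Each candidate: (headline, predicted_clicks, reader_satisfaction).
candidates = [
    ("You won't BELIEVE this one trick", 0.90, 0.20),
    ("Quarterly results: revenue up 4%", 0.35, 0.80),
    ("New feature ships next week",      0.50, 0.75),
]

# Literal objective: "maximize clicks" exactly as stated.
literal_choice = max(candidates, key=lambda c: c[1])

# Intended objective: readers should actually find the content worthwhile.
intended_choice = max(candidates, key=lambda c: c[2])

print(literal_choice[0])   # the clickbait headline wins on the proxy metric
print(intended_choice[0])  # a different headline wins on the real goal
```

The two objectives select different winners, which is the miniature version of the alignment problem: the system did exactly what it was told, and that is precisely why the outcome diverges from what was wanted.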