Reward Hacking
Goodhart's law for AI: models exploit weaknesses in the reward model instead of becoming genuinely helpful. This topic covers failure modes such as verbosity hacking and sycophancy, along with structured mitigation strategies.
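To make verbosity hacking concrete, here is a minimal toy sketch (all names and constants are illustrative, not from any real RLHF library): a proxy reward whose score partly correlates with response length, which a policy can exploit by padding, and a length-penalty mitigation that removes the incentive.

```python
def proxy_reward(response: str) -> float:
    """Toy reward model: a crude quality signal plus a length bonus.
    The length bonus is the exploitable weakness (Goodhart's law)."""
    quality = sum(w in response.lower() for w in ("because", "therefore", "evidence"))
    return quality + 0.01 * len(response)  # length term invites verbosity hacking

def penalized_reward(response: str, target_len: int = 200) -> float:
    """Mitigation sketch: penalize characters beyond a target length,
    so padding no longer increases reward monotonically."""
    overflow = max(0, len(response) - target_len)
    return proxy_reward(response) - 0.02 * overflow

concise = "The claim holds because the evidence supports it."
padded = concise + " " + "Furthermore, it is worth noting again. " * 20

# Under the raw proxy, the padded answer scores higher; under the
# penalized reward, the concise answer wins.
assert proxy_reward(padded) > proxy_reward(concise)
assert penalized_reward(padded) < penalized_reward(concise)
```

Real mitigations are subtler (length-controlled reward models, adversarial training of the reward model, KL penalties against a reference policy), but the structure is the same: identify the exploitable term in the proxy and remove or counterbalance it.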
Related topics: Hallucination Theory, Reward Models and Verifiers, RLHF and Alignment.