
Mechanistic Interpretability: Features, Circuits, and Causal Faithfulness

Reverse-engineering trained neural networks. Coverage of the superposition hypothesis, sparse autoencoders for feature extraction, the linear representation hypothesis and its counterexamples, induction heads and IOI as canonical circuits, sparse feature circuits (Marks et al. 2024), cross-layer transcoders (Lindsey et al. 2024), activation patching (noising vs denoising), faithfulness checks, frontier-scale evidence from Anthropic's Scaling Monosemanticity (Templeton et al. 2024) and DeepMind's Gemma Scope (Lieberum et al. 2024), and the limits of current interpretability.

