
Sparse Autoencoders for Interpretability: TopK, JumpReLU, Matryoshka, and Scaling

Sparse autoencoders (SAEs) decompose neural network activations into interpretable feature directions by learning an overcomplete dictionary under a sparsity constraint. This section covers the superposition hypothesis; the L1-penalized SAE objective and its shrinkage failure mode; TopK SAEs (Gao et al. 2024, OpenAI); JumpReLU SAEs (Rajamanoharan et al. 2024, DeepMind); Matryoshka SAEs (Bushnaq / Bussmann 2024); Anthropic's million-feature decomposition of Claude 3 Sonnet (Templeton et al. 2024); and feature splitting, dead features, evaluation, and steering.
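
As a concrete illustration of the core idea, here is a minimal PyTorch sketch of a TopK SAE. The class name `TopKSAE`, the dimensions, and `k=32` are illustrative assumptions for this overview, not the reference implementation from Gao et al. (2024).

```python
# Minimal TopK sparse autoencoder sketch (hypothetical names and dims;
# not the exact architecture or training setup from Gao et al. 2024).
import torch
import torch.nn as nn


class TopKSAE(nn.Module):
    def __init__(self, d_model: int, d_dict: int, k: int):
        super().__init__()
        # Overcomplete dictionary: d_dict >> d_model.
        self.encoder = nn.Linear(d_model, d_dict)
        self.decoder = nn.Linear(d_dict, d_model)
        self.k = k

    def forward(self, x: torch.Tensor) -> tuple[torch.Tensor, torch.Tensor]:
        # Keep only the k largest pre-activations per example. Sparsity is
        # enforced structurally, rather than via an L1 penalty, which avoids
        # the shrinkage that L1 applies to the surviving activations.
        pre = self.encoder(x)
        topk = torch.topk(pre, self.k, dim=-1)
        codes = torch.zeros_like(pre).scatter_(
            -1, topk.indices, torch.relu(topk.values)
        )
        recon = self.decoder(codes)
        return recon, codes


# The training objective is plain reconstruction error; no sparsity term.
sae = TopKSAE(d_model=768, d_dict=32768, k=32)
x = torch.randn(4, 768)  # stand-in for residual-stream activations
recon, codes = sae(x)
loss = ((recon - x) ** 2).mean()
```

An L1-penalized SAE would instead use a ReLU encoder and add `lambda * codes.abs().sum()` to the reconstruction loss; the TopK construction above replaces that penalty with a hard sparsity budget of exactly k active features per input.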
