Sparse Autoencoders for Interpretability: TopK, JumpReLU, Matryoshka, and Scaling
Sparse autoencoders (SAEs) decompose neural network activations into interpretable feature directions by learning an overcomplete dictionary under a sparsity constraint. Covers the superposition hypothesis; the L1 SAE objective and its shrinkage failure mode; TopK SAEs (Gao et al. 2024 / OpenAI); JumpReLU SAEs (Rajamanoharan et al. 2024 / DeepMind); Matryoshka SAEs (Bushnaq / Bussmann 2024); Anthropic's million-feature decomposition of Claude 3 Sonnet (Templeton et al. 2024); feature splitting; dead features; evaluation; and steering.
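The core decomposition above can be sketched in a few lines. This is a minimal, illustrative TopK-SAE forward pass in NumPy (all names and shapes are assumptions, not a reference implementation): the encoder maps a model activation into an overcomplete feature space, and the TopK activation keeps only the k largest feature pre-activations, enforcing sparsity exactly rather than via an L1 penalty.

```python
import numpy as np

def topk_sae_forward(x, W_enc, b_enc, W_dec, b_dec, k):
    """Sketch of one TopK SAE forward pass (illustrative names/shapes).

    x:     (d_model,)        input activation vector
    W_enc: (d_model, d_sae)  encoder weights, d_sae >> d_model (overcomplete)
    W_dec: (d_sae, d_model)  decoder dictionary; rows are feature directions
    k:     number of features allowed to fire per input
    """
    pre = (x - b_dec) @ W_enc + b_enc      # encoder pre-activations
    f = np.maximum(pre, 0.0)               # ReLU
    if k < f.size:
        drop = np.argpartition(f, -k)[:-k] # indices of all but the top k
        f[drop] = 0.0                      # zero everything outside the top-k
    x_hat = f @ W_dec + b_dec              # sparse reconstruction of x
    return f, x_hat

rng = np.random.default_rng(0)
d_model, d_sae, k = 8, 32, 4
W_enc = rng.normal(size=(d_model, d_sae)) / np.sqrt(d_model)
W_dec = rng.normal(size=(d_sae, d_model)) / np.sqrt(d_sae)
f, x_hat = topk_sae_forward(rng.normal(size=d_model),
                            W_enc, np.zeros(d_sae), W_dec, np.zeros(d_model), k)
print((f > 0).sum())  # at most k features are active
```

Training would minimize the reconstruction error ||x - x_hat||^2; with TopK the sparsity level is fixed architecturally, which avoids the activation shrinkage induced by an L1 penalty.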