Unlock: Forgetting Transformer (FoX)
FoX adds a data-dependent forget gate to softmax attention. The gate down-weights unnormalized attention scores between past and present positions, giving the transformer a learned, recency-biased decay. FoX is FlashAttention-compatible, works without positional embeddings, and improves long-context language modeling and length extrapolation.
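Below is a minimal, naive sketch of the forgetting-attention idea described above, assuming a single head with precomputed q/k/v and one pre-sigmoid forget-gate logit per time step. The function name `forgetting_attention` and all tensor names are illustrative, not the paper's reference code (a practical implementation would fold the decay into a FlashAttention-style kernel rather than materialize the full T×T bias).

```python
# Naive O(T^2) sketch of forgetting attention (single head, no projections shown).
import torch
import torch.nn.functional as F

def forgetting_attention(q, k, v, fgate_logits):
    """q, k, v: (T, d) tensors; fgate_logits: (T,) pre-sigmoid forget-gate logits."""
    T, d = q.shape
    log_f = F.logsigmoid(fgate_logits)          # log f_t, data-dependent gates f_t in (0, 1)
    c = torch.cumsum(log_f, dim=0)              # c_t = sum_{l <= t} log f_l
    # D[i, j] = c_i - c_j = sum_{l = j+1 .. i} log f_l: the learned decay added to the
    # unnormalized attention score between query position i and earlier key position j.
    D = c.unsqueeze(1) - c.unsqueeze(0)         # (T, T)
    scores = (q @ k.T) / d ** 0.5 + D           # biased, unnormalized attention scores
    mask = torch.tril(torch.ones(T, T)).bool()  # causal mask: keys j <= query i
    scores = scores.masked_fill(~mask, float("-inf"))
    return F.softmax(scores, dim=-1) @ v        # (T, d)

# Tiny usage example with random inputs.
T, d = 8, 16
q, k, v = (torch.randn(T, d) for _ in range(3))
fgate_logits = torch.randn(T)                   # in FoX these would come from the token inputs
out = forgetting_attention(q, k, v, fgate_logits)
print(out.shape)                                # torch.Size([8, 16])
```

Note that no positional embedding is applied in this sketch: the data-dependent decay bias D is the only position-dependent signal, which is consistent with the claim above that FoX works without positional embeddings.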
Prerequisites: 173 total · 0 mastered · 0 working · 145 gaps (prerequisite mastery 16%)
Recommended probe
Chernoff Bounds is your weakest prerequisite with available questions; you haven't been assessed on this topic yet (3 questions, no quiz).
Other prerequisite topics:
- Attention Mechanism Theory (Research) — not assessed, 11 questions
- Recurrent Neural Networks (Advanced) — not assessed, 3 questions, no quiz
- Transformer Architecture (Research) — not assessed, 11 questions