Forgetting Transformer (FoX)

FoX adds a data-dependent forget gate to softmax attention. The gate down-weights unnormalized attention scores between past and present positions, giving the transformer a learned, recency-biased decay. FoX is FlashAttention-compatible, works without positional embeddings, and improves long-context language modeling and length extrapolation.
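Concretely, the gate produces a scalar f_t in (0, 1) per position (in the paper, a sigmoid of a learned projection of the input, per head), and the attention logit between query i and an earlier key j is shifted by the cumulative log-gate D_ij = sum_{l=j+1..i} log f_l, which is always non-positive and grows more negative with distance. Below is a minimal single-head sketch of this mechanism; the function name fox_attention is ours for illustration, and it materializes the full T-by-T decay matrix for clarity, whereas the paper's FlashAttention-compatible kernel avoids doing so.

```python
import torch
import torch.nn.functional as F

def fox_attention(q, k, v, f_logits):
    """Single-head forgetting attention (illustrative sketch, not the
    paper's FlashAttention kernel).

    q, k, v:  (T, d) queries, keys, values for one head.
    f_logits: (T,) pre-sigmoid forget-gate logits, one scalar per
              position (data-dependent, e.g. a learned projection of x_t).
    """
    T, d = q.shape
    # Gate in (0, 1); work in log space for a stable cumulative product.
    log_f = F.logsigmoid(f_logits)            # (T,)
    # c[i] = sum_{l<=i} log f_l, so D[i, j] = c[i] - c[j]
    #      = sum_{l=j+1..i} log f_l  (<= 0 for j <= i).
    c = torch.cumsum(log_f, dim=0)            # (T,)
    D = c[:, None] - c[None, :]               # (T, T) decay bias
    # Standard scaled dot-product logits plus the decay bias.
    logits = q @ k.T / d ** 0.5 + D
    # Causal mask: position i attends only to j <= i.
    causal = torch.tril(torch.ones(T, T, dtype=torch.bool))
    logits = logits.masked_fill(~causal, float("-inf"))
    return torch.softmax(logits, dim=-1) @ v
```

With all gates saturated at 1 (log f_l = 0) the bias vanishes and this reduces to ordinary causal softmax attention; smaller gates impose a sharper recency decay. A quick smoke test under these assumptions:

```python
T, d = 8, 16
q, k, v = torch.randn(T, d), torch.randn(T, d), torch.randn(T, d)
f_logits = torch.randn(T)   # in FoX, a learned function of the input
out = fox_attention(q, k, v, f_logits)   # (T, d)
```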
