Batch Normalization
Question 1 of 5 · foundation (difficulty 3/10) · compare
Why do transformer-style language models usually prefer LayerNorm or RMSNorm over BatchNorm?
A. They guarantee that the model will never overfit
B. They use the test set to estimate better normalization statistics
C. They remove the need for attention layers entirely
D. They normalize within a token's hidden features instead of relying on batch statistics
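As background for this comparison: the key difference between the two normalization families is the axis over which statistics are computed. A minimal NumPy sketch (function names `layer_norm` and `batch_norm` are illustrative, not from any particular library) shows that LayerNorm uses only a single example's own features, while BatchNorm's output for one example depends on the rest of the batch:

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Normalize each row (token) over its own hidden features:
    # no statistics from other examples are involved.
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def batch_norm(x, eps=1e-5):
    # Normalize each feature over the batch dimension:
    # the result for one example depends on the other examples.
    mean = x.mean(axis=0, keepdims=True)
    var = x.var(axis=0, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

x = np.array([[1.0, 2.0, 3.0],
              [4.0, 6.0, 8.0]])

# LayerNorm of the first row is identical whether or not
# the second row is present in the batch.
ln_full = layer_norm(x)
ln_single = layer_norm(x[:1])

# BatchNorm of the first row changes when the batch changes.
bn_full = batch_norm(x)
bn_single = batch_norm(x[:1])
```

This batch dependence is why BatchNorm is awkward for autoregressive language models, which are trained with variable sequence lengths and served one token at a time.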