Batch Normalization
Question 1 of 5 · foundation (difficulty 3/10) · compare
Why do transformer-style language models usually prefer LayerNorm or RMSNorm over BatchNorm?
A. They guarantee that the model will never overfit
B. They use the test set to estimate better normalization statistics
C. They remove the need for attention layers entirely
D. They normalize within a token's hidden features instead of relying on batch statistics
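As background for this comparison: the key difference between the two normalization families is the axis over which statistics are computed. A minimal NumPy sketch (function names `layer_norm` and `batch_norm` are illustrative, not from any particular library) shows that LayerNorm uses only a single example's own features, while BatchNorm's output for one example depends on the rest of the batch:

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Normalize each row (token) over its own hidden features:
    # no statistics from other examples are involved.
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def batch_norm(x, eps=1e-5):
    # Normalize each feature over the batch dimension:
    # the result for one example depends on the other examples.
    mean = x.mean(axis=0, keepdims=True)
    var = x.var(axis=0, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

x = np.array([[1.0, 2.0, 3.0],
              [4.0, 6.0, 8.0]])

# LayerNorm of the first row is identical whether or not
# the second row is present in the batch.
ln_full = layer_norm(x)
ln_single = layer_norm(x[:1])

# BatchNorm of the first row changes when the batch changes.
bn_full = batch_norm(x)
bn_single = batch_norm(x[:1])
```

This batch dependence is why BatchNorm is awkward for autoregressive language models, which are trained with variable sequence lengths and served one token at a time.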