Attention Mechanism Theory

Question 1 of 8
120s · intermediate (4/10) · conceptual
In the transformer self-attention mechanism, why are the attention scores divided by the square root of the key dimension before applying softmax?
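Not part of the quiz itself, but a minimal sketch of the effect this question is probing, assuming query and key entries are i.i.d. with zero mean and unit variance (all names and values below are illustrative): the dot product of two such d_k-dimensional vectors has variance d_k, so unscaled scores grow with the key dimension and push the softmax toward a near-one-hot distribution with vanishing gradients. Dividing by sqrt(d_k) keeps the score variance near 1 regardless of d_k.

```python
import numpy as np

# Illustrative sketch: the variance of dot products between random
# query/key vectors grows linearly with d_k; the 1/sqrt(d_k) scaling
# used in scaled dot-product attention normalizes it back to ~1.
rng = np.random.default_rng(0)

for d_k in (16, 64, 256):
    q = rng.standard_normal((10_000, d_k))   # 10k random queries
    k = rng.standard_normal((10_000, d_k))   # 10k random keys
    raw = (q * k).sum(axis=1)                # unscaled scores q . k
    scaled = raw / np.sqrt(d_k)              # scaled scores
    print(f"d_k={d_k:4d}  var(raw)={raw.var():7.1f}  "
          f"var(scaled)={scaled.var():.3f}")
```

Running this prints var(raw) close to 16, 64, and 256 respectively, while var(scaled) stays near 1.0, which is why the scaling keeps softmax inputs in a well-conditioned range as the key dimension grows.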