
Transformer architecture Trivia Questions

How much do you really know about the Transformer architecture? Below are 8 true-or-false statements, each followed by the answer and an explanation.

1.

The Transformer architecture was introduced in a 2017 paper titled 'Attention Is All You Need.'


Easy
✓ TRUE

This is correct. The Transformer debuted in June 2017 in the paper by Vaswani et al., revolutionizing NLP by replacing recurrence with attention mechanisms.

2.

The original Transformer had a single attention mechanism, not multi-head attention.


Easy
✗ FALSE

False. The original paper introduced multi-head attention (8 heads in the base model) to let the model attend to different representation subspaces simultaneously.
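
To make "different representation subspaces" concrete, here is a minimal NumPy sketch of the head split, using the base model's dimensions from the paper; the variable names are ours, and the learned Q/K/V projections that precede the split in a real implementation are omitted.

```python
import numpy as np

d_model, n_heads = 512, 8          # base model from the 2017 paper
d_head = d_model // n_heads        # 64 dimensions per head

x = np.random.randn(10, d_model)   # 10 tokens, one 512-dim vector each

# Each head sees its own 64-dim slice, so it can attend according to a
# different subspace of the representation.
heads = x.reshape(10, n_heads, d_head).transpose(1, 0, 2)
print(heads.shape)                 # (8, 10, 64)
```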

3.

The number of attention heads in a Transformer must always be a power of two.


Medium
✗ FALSE

Myth. Powers of two are common (e.g., 8, 16), but nothing requires them; GPT-3's largest model uses 96 heads, which isn't a power of two. In typical implementations the only hard constraint is that the head count evenly divide the model dimension.
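
A quick sanity check of that divisibility constraint, plugging in GPT-3's published dimensions (a toy snippet, not any library's code):

```python
# Powers of two are conventional, not mandatory; in typical
# implementations the hard constraint is that n_heads divides d_model.
d_model, n_heads = 12288, 96   # GPT-3 175B's published dimensions
assert d_model % n_heads == 0
print(d_model // n_heads)      # 128 dimensions per head
```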

4.

Transformers use a fixed positional encoding based on sine and cosine functions of different frequencies.


Medium
✓ TRUE

True. The original paper used sinusoidal positional encodings to inject order information, though learned embeddings are now common too.
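
For reference, here is a minimal NumPy sketch of the paper's formula, PE(pos, 2i) = sin(pos / 10000^(2i/d_model)) with cosine on the odd dimensions (the function name is ours):

```python
import numpy as np

def sinusoidal_encoding(n_positions: int, d_model: int) -> np.ndarray:
    """Sine on even dimensions, cosine on odd ones, per the 2017 paper."""
    positions = np.arange(n_positions)[:, None]        # (pos, 1)
    dims = np.arange(0, d_model, 2)[None, :]           # the 2i values
    angles = positions / (10000.0 ** (dims / d_model))
    pe = np.zeros((n_positions, d_model))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

print(sinusoidal_encoding(50, 512).shape)              # (50, 512)
```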

5.

Transformers process tokens in sequence, one word at a time, like RNNs do.


Medium
✗ FALSE

False. Transformers process all tokens in parallel using self-attention, not sequentially. This parallelization is a key advantage over RNNs.
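
A toy sketch of why this is parallel: scaled dot-product attention over a whole sequence reduces to a couple of matrix products, with no loop over positions (shapes and names illustrative):

```python
import numpy as np

seq_len, d = 6, 64
Q = np.random.randn(seq_len, d)   # queries for all 6 tokens at once
K = np.random.randn(seq_len, d)   # keys
V = np.random.randn(seq_len, d)   # values

# One matrix product scores every token against every other token;
# an RNN would need a step-by-step loop instead.
scores = Q @ K.T / np.sqrt(d)                    # (6, 6)
scores -= scores.max(axis=-1, keepdims=True)     # numerical stability
weights = np.exp(scores)
weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
out = weights @ V                                # all 6 outputs in parallel
```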

6.

During self-attention, the Transformer decoder can attend to all positions in the target sequence, including future ones.


Hard
✗ FALSE

False. The decoder's masked self-attention prevents each position from attending to future target tokens, preserving autoregressive generation during training. (Its cross-attention over the encoder output, by contrast, is unmasked.)
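
Here is a minimal sketch of the causal mask in practice (pure NumPy, values illustrative): future positions are set to negative infinity before the softmax, so they receive exactly zero attention weight.

```python
import numpy as np

seq_len = 5
scores = np.random.randn(seq_len, seq_len)  # stand-in attention scores

# Positions j > i are the "future"; mask them out before the softmax.
future = np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)
scores[future] = -np.inf

weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)
print(np.round(weights, 2))  # lower-triangular attention weights
```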

7.

In the original Transformer, layer normalization is applied after the residual connection, not before.


Hard
✓ TRUE

True. The original design ('post-norm') places layer normalization after the residual addition. Many modern models use 'pre-norm' instead for training stability.
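
Both orderings side by side, as a schematic sketch; `sublayer` stands for attention or the feed-forward network and `norm` for layer normalization, passed in as plain functions rather than any particular library's modules.

```python
def post_norm_block(x, sublayer, norm):
    # Original 2017 design: normalize *after* adding the residual.
    return norm(x + sublayer(x))

def pre_norm_block(x, sublayer, norm):
    # Common modern variant: normalize first, keeping an untouched
    # residual path, which tends to train more stably in deep stacks.
    return x + sublayer(norm(x))
```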

8.

Transformers cannot handle variable-length inputs without padding or truncation.


Hard
✗ FALSE

Myth. They inherently handle variable lengths via attention masks; padding is just for batching efficiency, not a fundamental limitation.
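
A minimal sketch of a padding mask in action, batching a length-4 and a length-2 sequence together (all values illustrative):

```python
import numpy as np

# Two "sentences" padded to a common length of 4; True marks real tokens.
pad_mask = np.array([[1, 1, 1, 1],
                     [1, 1, 0, 0]], dtype=bool)   # (batch, key)

scores = np.random.randn(2, 4, 4)                 # (batch, query, key)

# Block attention *to* padded positions so they never influence outputs.
scores = np.where(pad_mask[:, None, :], scores, -np.inf)

weights = np.exp(scores - scores.max(-1, keepdims=True))
weights /= weights.sum(-1, keepdims=True)         # padded keys get weight 0
```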


Want to test yourself in real time?

Swipe right for True, left for False. New questions every day on PopBluff.

Play PopBluff Free →