Where this topic leads
Topics that build on Transformer Architecture
Once you have covered Transformer Architecture, these are the topics that cite it as a prerequisite. Pick by tier and by the area you want to push into next.
Editor's suggested next (33)
- Mechanistic Interpretability: Features, Circuits, and Causal Faithfulness
- Hallucination Theory
- RLHF and Alignment
- Attention for Protein Structure: AlphaFold and Successors
- Attention Is All You Need (Paper)
- Audio Language Models
- BERT and the Pretrain-Finetune Paradigm
- Chain-of-Thought and Reasoning
- Claude Model Family
- Cohere Models
- Deep Learning for Time Series
- DeepSeek Models
- Donut and OCR-Free Document Understanding
- Forgetting Transformer (FoX)
- Gemini and Google Models
- GPT Series Evolution
- Induction Heads
- LLaMA and Open Weight Models
- Mistral Models
- Mixture of Experts
- Model Comparison Table
- Model Merging and Weight Averaging
- Multi-Token Prediction
- Plan-then-Generate
- Post-Training Overview
- Prompt Engineering and In-Context Learning
- Qwen and Chinese Models
- Residual Stream and Transformer Internals
- Speculative Decoding and Quantization
- Structured Output and Constrained Generation
- Tabular Foundation Models as Bayesian Inference Engines
- Tool-Augmented Reasoning
- Vision Transformer Lineage: ViT, DeiT, Swin, MAE, DINOv2, SAM
Core flagship topics (7)
- Attention Is All You Need (Paper) · layer 4 · llm-construction
- Chain-of-Thought and Reasoning · layer 5 · llm-construction
- DeepSeek Models · layer 5 · model-timeline
- Hallucination Theory · layer 4 · llm-construction
- Mechanistic Interpretability: Features, Circuits, and Causal Faithfulness · layer 4 · ai-safety
- Tabular Foundation Models as Bayesian Inference Engines · layer 3 · bayesian-ml-frontier
- Vision Transformer Lineage: ViT, DeiT, Swin, MAE, DINOv2, SAM · layer 4 · beyond-llms
Standard topics (20)
- BERT and the Pretrain-Finetune Paradigm · layer 4 · llm-construction
- Claude Model Family · layer 5 · model-timeline
- Cohere Models · layer 4 · model-timeline
- Deep Learning for Time Series · layer 3 · ml-methods
- Forgetting Transformer (FoX) · layer 4 · llm-construction
- Gemini and Google Models · layer 5 · model-timeline
- GPT Series Evolution · layer 5 · model-timeline
- Induction Heads · layer 4 · llm-construction
- LLaMA and Open Weight Models · layer 5 · model-timeline
- Mistral Models · layer 4 · model-timeline
- Mixture of Experts · layer 4 · llm-construction
- Model Comparison Table · layer 5 · model-timeline
- Multi-Token Prediction · layer 5 · llm-construction
- Post-Training Overview · layer 5 · llm-construction
- Prompt Engineering and In-Context Learning · layer 5 · llm-construction
- Residual Stream and Transformer Internals · layer 4 · llm-construction
- RLHF and Alignment · layer 4 · llm-construction
- Speculative Decoding and Quantization · layer 5 · llm-construction
- Structured Output and Constrained Generation · layer 5 · llm-construction
- Tool-Augmented Reasoning · layer 5 · llm-construction
Advanced or specialty topics (6)
- Attention for Protein Structure: AlphaFold and Successors · layer 4 · applied-ml
- Audio Language Models · layer 5 · beyond-llms
- Donut and OCR-Free Document Understanding · layer 5 · llm-construction
- Model Merging and Weight Averaging · layer 5 · llm-construction
- Plan-then-Generate · layer 5 · llm-construction
- Qwen and Chinese Models · layer 5 · model-timeline