Where this topic leads
Topics that build on Transformer Architecture
Once you have covered Transformer Architecture, these are the topics that cite it as a prerequisite. Pick by tier and by the area you want to push into next.
Editor's suggested next (33)
- Mechanistic Interpretability: Features, Circuits, and Causal Faithfulness
- Hallucination Theory
- RLHF and Alignment
- Attention for Protein Structure: AlphaFold and Successors
- Attention Is All You Need (Paper)
- Audio Language Models
- BERT and the Pretrain-Finetune Paradigm
- Chain-of-Thought and Reasoning
- Claude Model Family
- Cohere Models
- Deep Learning for Time Series
- DeepSeek Models
- Donut and OCR-Free Document Understanding
- Forgetting Transformer (FoX)
- Gemini and Google Models
- GPT Series Evolution
- Induction Heads
- LLaMA and Open Weight Models
- Mistral Models
- Mixture of Experts
- Model Comparison Table
- Model Merging and Weight Averaging
- Multi-Token Prediction
- Plan-then-Generate
- Post-Training Overview
- Prompt Engineering and In-Context Learning
- Qwen and Chinese Models
- Residual Stream and Transformer Internals
- Speculative Decoding and Quantization
- Structured Output and Constrained Generation
- Tabular Foundation Models as Bayesian Inference Engines
- Tool-Augmented Reasoning
- Vision Transformer Lineage: ViT, DeiT, Swin, MAE, DINOv2, SAM
Core flagship topics (7)
- Attention Is All You Need (Paper) · layer 4 · llm-construction
- Chain-of-Thought and Reasoning · layer 5 · llm-construction
- DeepSeek Models · layer 5 · model-timeline
- Hallucination Theory · layer 4 · llm-construction
- Mechanistic Interpretability: Features, Circuits, and Causal Faithfulness · layer 4 · ai-safety
- Tabular Foundation Models as Bayesian Inference Engines · layer 3 · bayesian-ml-frontier
- Vision Transformer Lineage: ViT, DeiT, Swin, MAE, DINOv2, SAM · layer 4 · beyond-llms
Standard topics (20)
- BERT and the Pretrain-Finetune Paradigm · layer 4 · llm-construction
- Claude Model Family · layer 5 · model-timeline
- Cohere Models · layer 4 · model-timeline
- Deep Learning for Time Series · layer 3 · ml-methods
- Forgetting Transformer (FoX) · layer 4 · llm-construction
- Gemini and Google Models · layer 5 · model-timeline
- GPT Series Evolution · layer 5 · model-timeline
- Induction Heads · layer 4 · llm-construction
- LLaMA and Open Weight Models · layer 5 · model-timeline
- Mistral Models · layer 4 · model-timeline
- Mixture of Experts · layer 4 · llm-construction
- Model Comparison Table · layer 5 · model-timeline
- Multi-Token Prediction · layer 5 · llm-construction
- Post-Training Overview · layer 5 · llm-construction
- Prompt Engineering and In-Context Learning · layer 5 · llm-construction
- Residual Stream and Transformer Internals · layer 4 · llm-construction
- RLHF and Alignment · layer 4 · llm-construction
- Speculative Decoding and Quantization · layer 5 · llm-construction
- Structured Output and Constrained Generation · layer 5 · llm-construction
- Tool-Augmented Reasoning · layer 5 · llm-construction
Advanced or specialty topics (6)
- Attention for Protein Structure: AlphaFold and Successors · layer 4 · applied-ml
- Audio Language Models · layer 5 · beyond-llms
- Donut and OCR-Free Document Understanding · layer 5 · llm-construction
- Model Merging and Weight Averaging · layer 5 · llm-construction
- Plan-then-Generate · layer 5 · llm-construction
- Qwen and Chinese Models · layer 5 · model-timeline