Reading Paths
Structured sequences through the material. Each path has a clear starting point, ordered topics, and an estimated time commitment. These are curated reading spines, not adaptive schedules.
ML Theory Core
The classical spine: ERM, uniform convergence, VC dimension, Rademacher complexity. Start here if you want to understand why learning from data works.
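The path's two central objects, in standard notation; the constant and log factors in the bound vary by source:

```latex
% Empirical risk minimization over a hypothesis class H:
\hat{h} = \arg\min_{h \in \mathcal{H}} \hat{R}_n(h),
\qquad \hat{R}_n(h) = \frac{1}{n}\sum_{i=1}^{n} \ell(h(x_i), y_i).
% Uniform convergence controls the gap between empirical and true risk.
% For a class of VC dimension d, with probability at least 1 - \delta:
\sup_{h \in \mathcal{H}} \big| R(h) - \hat{R}_n(h) \big|
  \le c \, \sqrt{\frac{d \log(n/d) + \log(1/\delta)}{n}}.
```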
Concentration Inequalities
From Markov to Matrix Bernstein. The inequality toolkit that every generalization bound, random matrix argument, and stability proof depends on.
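The first rungs of that ladder, in standard form; Chebyshev follows by applying Markov to the squared deviation, and Hoeffding sharpens the rate to exponential:

```latex
% Markov (X >= 0):
\Pr[X \ge t] \le \frac{\mathbb{E}[X]}{t}.
% Chebyshev (Markov applied to (X - \mathbb{E}X)^2):
\Pr\big[\,|X - \mathbb{E}X| \ge t\,\big] \le \frac{\mathrm{Var}(X)}{t^2}.
% Hoeffding (X_i independent, a_i \le X_i \le b_i):
\Pr\Big[\Big|\tfrac{1}{n}\textstyle\sum_i (X_i - \mathbb{E}X_i)\Big| \ge t\Big]
  \le 2\exp\!\Big(\frac{-2 n^2 t^2}{\sum_i (b_i - a_i)^2}\Big).
```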
Master Linear Algebra
Linear maps, matrix operations, norms, eigenspaces, SVD, PCA, Jacobians, and matrix calculus. The algebra spine behind ML theory and neural networks.
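A taste of the endpoint: a minimal numpy sketch of PCA via the SVD. Function and variable names are illustrative, not drawn from any particular text.

```python
import numpy as np

def pca_via_svd(X: np.ndarray, k: int):
    """Project rows of X onto the top-k principal components via SVD."""
    Xc = X - X.mean(axis=0)                    # center each feature
    # Thin SVD: Xc = U @ diag(S) @ Vt; rows of Vt are principal directions
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:k]                        # (k, d) top-k directions
    scores = Xc @ components.T                 # (n, k) projected data
    explained = (S[:k] ** 2) / (S ** 2).sum()  # variance ratio per component
    return scores, components, explained

# Usage: 200 samples in 5 correlated dims, keep 2 components
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 5))
scores, comps, ratio = pca_via_svd(X, k=2)
print(scores.shape, ratio.round(3))
```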
Basic Neural Network From Scratch
Build a tiny MLP before jumping to transformers: linear layers, activations, losses, gradient descent, backprop, softmax, cross-entropy, and generalization checks.
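The whole path compressed into one sketch: a one-hidden-layer MLP with hand-written backprop on synthetic data. All names, shapes, and hyperparameters are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny MLP: linear -> ReLU -> linear -> softmax cross-entropy.
n, d, h, c = 256, 4, 16, 3
X = rng.normal(size=(n, d))
y = np.argmax(X[:, :3], axis=1)                # learnable synthetic labels

W1 = rng.normal(size=(d, h)) * 0.1; b1 = np.zeros(h)
W2 = rng.normal(size=(h, c)) * 0.1; b2 = np.zeros(c)
lr = 0.1

for step in range(200):
    # Forward pass
    z1 = X @ W1 + b1
    a1 = np.maximum(z1, 0)                     # ReLU
    logits = a1 @ W2 + b2
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(logits); p /= p.sum(axis=1, keepdims=True)  # softmax
    loss = -np.log(p[np.arange(n), y]).mean()  # cross-entropy

    # Backward pass (chain rule, batch-averaged)
    dlogits = p.copy(); dlogits[np.arange(n), y] -= 1; dlogits /= n
    dW2 = a1.T @ dlogits; db2 = dlogits.sum(axis=0)
    da1 = dlogits @ W2.T
    dz1 = da1 * (z1 > 0)                       # ReLU gradient
    dW1 = X.T @ dz1; db1 = dz1.sum(axis=0)

    for param, grad in [(W1, dW1), (b1, db1), (W2, dW2), (b2, db2)]:
        param -= lr * grad                     # vanilla gradient descent

print(f"final loss: {loss:.3f}")
```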
Build an LLM from Scratch
A two-stage decoder-only path: next-token prediction, causal masking, embeddings, transformer blocks, then KV cache, FlashAttention, and modern inference.
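The first-stage core in miniature, as a hedged numpy sketch of single-head causal self-attention; weights and shapes are illustrative:

```python
import numpy as np

def causal_self_attention(x, Wq, Wk, Wv):
    """Single-head causal attention: each position attends only to itself
    and earlier positions, the mechanism behind next-token prediction."""
    T, d = x.shape
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(d)              # (T, T) similarities
    mask = np.triu(np.ones((T, T), dtype=bool), k=1)
    scores[mask] = -np.inf                     # block attention to the future
    scores -= scores.max(axis=-1, keepdims=True)
    w = np.exp(scores); w /= w.sum(axis=-1, keepdims=True)  # row softmax
    return w @ v                               # weighted mix of past values

rng = np.random.default_rng(0)
T, d = 6, 8
x = rng.normal(size=(T, d))
Wq, Wk, Wv = (rng.normal(size=(d, d)) * 0.1 for _ in range(3))
print(causal_self_attention(x, Wq, Wk, Wv).shape)  # (6, 8)
```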
Deep Learning Systems From Scratch
A shape-and-memory rebuild track: linear layers, manual backprop, attention ledgers, transformer forward passes, KV cache, roofline reasoning, and accelerator constraints.
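The flavor of those ledger exercises, as a back-of-envelope sketch. The model numbers below are assumed and illustrative (a generic 7B-class decoder), not measurements:

```python
def kv_cache_bytes(layers, heads, head_dim, seq_len, batch, bytes_per_elem=2):
    # 2x for keys and values, stored per layer, per head, per token
    return 2 * layers * heads * head_dim * seq_len * batch * bytes_per_elem

def arithmetic_intensity(flops, bytes_moved):
    # FLOPs per byte; compare against hardware peak_flops / peak_bandwidth
    return flops / bytes_moved

# Illustrative decoder: 32 layers, 32 heads of dim 128, fp16, 4k context
cache = kv_cache_bytes(layers=32, heads=32, head_dim=128, seq_len=4096, batch=1)
print(f"KV cache: {cache / 2**30:.2f} GiB")    # ~2 GiB at 4k context

# A matmul (m,k)@(k,n) does 2*m*k*n FLOPs and moves ~2*(m*k + k*n + m*n) fp16 bytes
m, k, n = 1, 4096, 4096                        # batch-1 decode step: a GEMV
ai = arithmetic_intensity(2 * m * k * n, 2 * (m * k + k * n + m * n))
print(f"arithmetic intensity: {ai:.1f} FLOP/byte")  # ~1: memory-bound decode
```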
Mathematical Maturity
Measure theory, Radon-Nikodym, convex duality, martingales, information theory. The serious math infrastructure that separates surface-level from real understanding.
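One formula that ties several of these threads together, in standard form: KL divergence as an expected Radon-Nikodym derivative, with convex duality entering through the Donsker-Varadhan representation.

```latex
% For P \ll Q (P absolutely continuous with respect to Q):
D_{\mathrm{KL}}(P \,\|\, Q)
  = \int \log \frac{dP}{dQ} \, dP
  = \mathbb{E}_{P}\!\left[\log \frac{dP}{dQ}\right] \ge 0.
% Donsker-Varadhan: the supremum over bounded measurable f,
D_{\mathrm{KL}}(P \,\|\, Q)
  = \sup_{f} \; \mathbb{E}_P[f] - \log \mathbb{E}_Q\!\left[e^{f}\right].
```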
Modern Generalization
Where classical theory fails and what replaces it. Implicit bias, double descent, NTK, benign overfitting, scaling laws. The frontier of understanding why deep learning works.
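The object at the center of the NTK portion, stated in its standard form:

```latex
% For a network f(x; \theta), the neural tangent kernel is
\Theta(x, x')
  = \big\langle \nabla_\theta f(x;\theta),\; \nabla_\theta f(x';\theta) \big\rangle.
% In the infinite-width limit \Theta stays approximately fixed during training,
% so gradient descent on the network behaves like kernel regression with \Theta.
```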
Frontier ML (2025-2026)
Post-training, test-time compute, agents, MoE, Mamba, diffusion, context engineering. The topics that dominate current research and systems work.
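One of these made concrete: a minimal, illustrative sketch of sparse top-k mixture-of-experts routing, not any specific model's implementation.

```python
import numpy as np

def moe_top_k(x, gate_W, experts, k=2):
    """A learned gate scores experts per token; only the top-k run, and
    their outputs are blended by renormalized gate weights."""
    logits = x @ gate_W                          # (tokens, n_experts)
    top = np.argsort(logits, axis=-1)[:, -k:]    # indices of the k best experts
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        g = logits[t, top[t]]
        g = np.exp(g - g.max()); g /= g.sum()    # softmax over selected gates
        for w, e in zip(g, top[t]):
            out[t] += w * experts[e](x[t])       # weighted expert outputs
    return out

rng = np.random.default_rng(0)
d, n_exp = 8, 4
mats = [rng.normal(size=(d, d)) * 0.1 for _ in range(n_exp)]
experts = [lambda v, M=M: v @ M for M in mats]   # each "expert" is a linear map
x = rng.normal(size=(5, d))
print(moe_top_k(x, rng.normal(size=(d, n_exp)), experts).shape)  # (5, 8)
```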