Reading Paths
Structured sequences through the material. Each path has a clear starting point, ordered topics, and an estimated time commitment. These are curated reading spines, not adaptive schedules.
ML Theory Core
The classical spine: ERM, uniform convergence, VC dimension, Rademacher complexity. Start here if you want to understand why learning from data works.
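The path's two central objects, in standard notation; the constant and log factors in the bound vary by source:

```latex
% Empirical risk minimization over a hypothesis class H:
\hat{h} = \arg\min_{h \in \mathcal{H}} \hat{R}_n(h),
\qquad \hat{R}_n(h) = \frac{1}{n}\sum_{i=1}^{n} \ell(h(x_i), y_i).
% Uniform convergence controls the gap between empirical and true risk.
% For a class of VC dimension d, with probability at least 1 - \delta:
\sup_{h \in \mathcal{H}} \big| R(h) - \hat{R}_n(h) \big|
  \le c \, \sqrt{\frac{d \log(n/d) + \log(1/\delta)}{n}}.
```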
Concentration Inequalities
From Markov to Matrix Bernstein. The inequality toolkit that every generalization bound, random matrix argument, and stability proof depends on.
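The first rungs of that ladder, in standard form; Chebyshev follows by applying Markov to the squared deviation, and Hoeffding sharpens the rate to exponential:

```latex
% Markov (X >= 0):
\Pr[X \ge t] \le \frac{\mathbb{E}[X]}{t}.
% Chebyshev (Markov applied to (X - \mathbb{E}X)^2):
\Pr\big[\,|X - \mathbb{E}X| \ge t\,\big] \le \frac{\mathrm{Var}(X)}{t^2}.
% Hoeffding (X_i independent, a_i \le X_i \le b_i):
\Pr\Big[\Big|\tfrac{1}{n}\textstyle\sum_i (X_i - \mathbb{E}X_i)\Big| \ge t\Big]
  \le 2\exp\!\Big(\frac{-2 n^2 t^2}{\sum_i (b_i - a_i)^2}\Big).
```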
Master Linear Algebra
Linear maps, matrix operations, norms, eigenspaces, SVD, PCA, Jacobians, and matrix calculus. The algebra spine behind ML theory and neural networks.
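A taste of the endpoint: a minimal numpy sketch of PCA via the SVD. Function and variable names are illustrative, not drawn from any particular text.

```python
import numpy as np

def pca_via_svd(X: np.ndarray, k: int):
    """Project rows of X onto the top-k principal components via SVD."""
    Xc = X - X.mean(axis=0)                    # center each feature
    # Thin SVD: Xc = U @ diag(S) @ Vt; rows of Vt are principal directions
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:k]                        # (k, d) top-k directions
    scores = Xc @ components.T                 # (n, k) projected data
    explained = (S[:k] ** 2) / (S ** 2).sum()  # variance ratio per component
    return scores, components, explained

# Usage: 200 samples in 5 correlated dims, keep 2 components
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 5))
scores, comps, ratio = pca_via_svd(X, k=2)
print(scores.shape, ratio.round(3))
```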
Basic Neural Network From Scratch
Build a tiny MLP before jumping to transformers: linear layers, activations, losses, gradient descent, backprop, softmax, cross-entropy, and generalization checks.
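The whole path compressed into one sketch: a one-hidden-layer MLP with hand-written backprop on synthetic data. All names, shapes, and hyperparameters are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny MLP: linear -> ReLU -> linear -> softmax cross-entropy.
n, d, h, c = 256, 4, 16, 3
X = rng.normal(size=(n, d))
y = np.argmax(X[:, :3], axis=1)                # learnable synthetic labels

W1 = rng.normal(size=(d, h)) * 0.1; b1 = np.zeros(h)
W2 = rng.normal(size=(h, c)) * 0.1; b2 = np.zeros(c)
lr = 0.1

for step in range(200):
    # Forward pass
    z1 = X @ W1 + b1
    a1 = np.maximum(z1, 0)                     # ReLU
    logits = a1 @ W2 + b2
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(logits); p /= p.sum(axis=1, keepdims=True)  # softmax
    loss = -np.log(p[np.arange(n), y]).mean()  # cross-entropy

    # Backward pass (chain rule, batch-averaged)
    dlogits = p.copy(); dlogits[np.arange(n), y] -= 1; dlogits /= n
    dW2 = a1.T @ dlogits; db2 = dlogits.sum(axis=0)
    da1 = dlogits @ W2.T
    dz1 = da1 * (z1 > 0)                       # ReLU gradient
    dW1 = X.T @ dz1; db1 = dz1.sum(axis=0)

    for param, grad in [(W1, dW1), (b1, db1), (W2, dW2), (b2, db2)]:
        param -= lr * grad                     # vanilla gradient descent

print(f"final loss: {loss:.3f}")
```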
Build an LLM from Scratch
A two-stage decoder-only path: next-token prediction, causal masking, embeddings, transformer blocks, then KV cache, FlashAttention, and modern inference.
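The first-stage core in miniature, as a hedged numpy sketch of single-head causal self-attention; weights and shapes are illustrative:

```python
import numpy as np

def causal_self_attention(x, Wq, Wk, Wv):
    """Single-head causal attention: each position attends only to itself
    and earlier positions, the mechanism behind next-token prediction."""
    T, d = x.shape
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(d)              # (T, T) similarities
    mask = np.triu(np.ones((T, T), dtype=bool), k=1)
    scores[mask] = -np.inf                     # block attention to the future
    scores -= scores.max(axis=-1, keepdims=True)
    w = np.exp(scores); w /= w.sum(axis=-1, keepdims=True)  # row softmax
    return w @ v                               # weighted mix of past values

rng = np.random.default_rng(0)
T, d = 6, 8
x = rng.normal(size=(T, d))
Wq, Wk, Wv = (rng.normal(size=(d, d)) * 0.1 for _ in range(3))
print(causal_self_attention(x, Wq, Wk, Wv).shape)  # (6, 8)
```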
Deep Learning Systems From Scratch
A shape-and-memory rebuild track: linear layers, manual backprop, attention ledgers, transformer forward passes, KV cache, roofline reasoning, and accelerator constraints.
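The flavor of those ledger exercises, as a back-of-envelope sketch. The model numbers below are assumed and illustrative (a generic 7B-class decoder), not measurements:

```python
def kv_cache_bytes(layers, heads, head_dim, seq_len, batch, bytes_per_elem=2):
    # 2x for keys and values, stored per layer, per head, per token
    return 2 * layers * heads * head_dim * seq_len * batch * bytes_per_elem

def arithmetic_intensity(flops, bytes_moved):
    # FLOPs per byte; compare against hardware peak_flops / peak_bandwidth
    return flops / bytes_moved

# Illustrative decoder: 32 layers, 32 heads of dim 128, fp16, 4k context
cache = kv_cache_bytes(layers=32, heads=32, head_dim=128, seq_len=4096, batch=1)
print(f"KV cache: {cache / 2**30:.2f} GiB")    # ~2 GiB at 4k context

# A matmul (m,k)@(k,n) does 2*m*k*n FLOPs and moves ~2*(m*k + k*n + m*n) fp16 bytes
m, k, n = 1, 4096, 4096                        # batch-1 decode step: a GEMV
ai = arithmetic_intensity(2 * m * k * n, 2 * (m * k + k * n + m * n))
print(f"arithmetic intensity: {ai:.1f} FLOP/byte")  # ~1: memory-bound decode
```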
Mathematical Maturity
Measure theory, Radon-Nikodym, convex duality, martingales, information theory. The serious math infrastructure that separates surface-level from real understanding.
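One formula that ties several of these threads together, in standard form: KL divergence as an expected Radon-Nikodym derivative, with convex duality entering through the Donsker-Varadhan representation.

```latex
% For P \ll Q (P absolutely continuous with respect to Q):
D_{\mathrm{KL}}(P \,\|\, Q)
  = \int \log \frac{dP}{dQ} \, dP
  = \mathbb{E}_{P}\!\left[\log \frac{dP}{dQ}\right] \ge 0.
% Donsker-Varadhan: the supremum over bounded measurable f,
D_{\mathrm{KL}}(P \,\|\, Q)
  = \sup_{f} \; \mathbb{E}_P[f] - \log \mathbb{E}_Q\!\left[e^{f}\right].
```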
Modern Generalization
Where classical theory fails and what replaces it. Implicit bias, double descent, NTK, benign overfitting, scaling laws. The frontier of understanding why deep learning works.
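The object at the center of the NTK portion, stated in its standard form:

```latex
% For a network f(x; \theta), the neural tangent kernel is
\Theta(x, x')
  = \big\langle \nabla_\theta f(x;\theta),\; \nabla_\theta f(x';\theta) \big\rangle.
% In the infinite-width limit \Theta stays approximately fixed during training,
% so gradient descent on the network behaves like kernel regression with \Theta.
```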
Frontier ML (2025-2026)
Post-training, test-time compute, agents, MoE, Mamba, diffusion, context engineering. The topics that dominate current research and systems work.
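One of these made concrete: a minimal, illustrative sketch of sparse top-k mixture-of-experts routing, not any specific model's implementation.

```python
import numpy as np

def moe_top_k(x, gate_W, experts, k=2):
    """A learned gate scores experts per token; only the top-k run, and
    their outputs are blended by renormalized gate weights."""
    logits = x @ gate_W                          # (tokens, n_experts)
    top = np.argsort(logits, axis=-1)[:, -k:]    # indices of the k best experts
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        g = logits[t, top[t]]
        g = np.exp(g - g.max()); g /= g.sum()    # softmax over selected gates
        for w, e in zip(g, top[t]):
            out[t] += w * experts[e](x[t])       # weighted expert outputs
    return out

rng = np.random.default_rng(0)
d, n_exp = 8, 4
mats = [rng.normal(size=(d, d)) * 0.1 for _ in range(n_exp)]
experts = [lambda v, M=M: v @ M for M in mats]   # each "expert" is a linear map
x = rng.normal(size=(5, d))
print(moe_top_k(x, rng.normal(size=(d, n_exp)), experts).shape)  # (5, 8)
```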