Post
What OpenAI's goblin episode reveals about reward models
A small creature-metaphor habit in GPT-5.5 becomes a clean case study in reward models, proxy objectives, behavior transfer, and synthetic-data feedback loops.
A source-backed guide to machine learning theory, statistics, optimization, and deep learning, organized around the prerequisites that actually connect.
Start
TheoremPath is not a flat syllabus. Pick the layer that matches the missing prerequisite, then move through the graph in order.
Take the diagnostic ->
Atlas
The important objects are dependencies: probability before concentration, concentration before uniform convergence, optimization before training dynamics.
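That ordering is, concretely, a topological order on a prerequisite graph. A minimal sketch using Python's standard library (the node names are illustrative, not the site's actual graph):

```python
from graphlib import TopologicalSorter

# Each topic maps to the prerequisites it depends on
# (illustrative edges, not TheoremPath's real dependency graph).
prereqs = {
    "concentration": {"probability"},
    "uniform_convergence": {"concentration"},
    "training_dynamics": {"optimization"},
}

# static_order() yields nodes so that every prerequisite
# appears before the topics that need it.
order = list(TopologicalSorter(prereqs).static_order())

assert order.index("probability") < order.index("concentration")
assert order.index("concentration") < order.index("uniform_convergence")
assert order.index("optimization") < order.index("training_dynamics")
print(order)
```

Any path the site recommends through the graph must be consistent with such an order; the choice among independent topics is where the diagnostic comes in.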
Method
Topic pages stay public. Sign-in is for saved notes, diagnostics, and review state; the theory itself remains readable without an account.
The site separates a theorem statement, its assumptions, and the page-level explanation so evidence attaches to the claim it actually supports.
Missed items map to prerequisite concepts, not broad topic pages. The next step is a graph repair, not another generic lesson.
Formal wrappers appear only when the Lean theorem matches the governed claim scope and the manifest records the exact proof object.
Labs
Interactive work is for mechanics: gradients moving, random vectors concentrating, matrix maps changing geometry.
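One of those mechanics in miniature: the Euclidean norm of a d-dimensional standard Gaussian vector concentrates tightly around sqrt(d) as d grows. A self-contained sketch of the effect (sample counts and dimensions are illustrative, not the demo's code):

```python
import math
import random

random.seed(0)

def gaussian_norm(d: int) -> float:
    """Euclidean norm of a standard Gaussian vector in R^d."""
    return math.sqrt(sum(random.gauss(0.0, 1.0) ** 2 for _ in range(d)))

# Normalized norms ||x|| / sqrt(d) cluster ever more tightly around 1
# as the dimension grows: concentration of measure.
spreads = {}
for d in (10, 1000, 100000):
    samples = [gaussian_norm(d) / math.sqrt(d) for _ in range(20)]
    spreads[d] = max(samples) - min(samples)
    print(d, round(spreads[d], 4))
```

The spread of the normalized norms shrinks by orders of magnitude between d = 10 and d = 100000, which is exactly the behavior the interactive version lets you watch.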
Browse all demos ->
Recent work
Post
What OpenAI's goblin episode reveals about reward models
Lab
Step through the forward and reverse processes of a toy 2D diffusion model. Watch noise schedules, score estimates, and DDIM vs DDPM samplers side by side.
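The forward half of that process has a closed form, which is what makes stepping it cheap. A sketch under an assumed linear beta schedule (schedule values are illustrative, not the demo's actual configuration):

```python
import math
import random

random.seed(0)

# Linear beta schedule from 1e-4 to 0.02 over T steps (illustrative values).
T = 1000
betas = [1e-4 + (0.02 - 1e-4) * t / (T - 1) for t in range(T)]

# alpha_bar_t is the running product of (1 - beta), the surviving signal fraction.
alpha_bars = []
prod = 1.0
for b in betas:
    prod *= 1.0 - b
    alpha_bars.append(prod)

def forward_sample(x0: float, t: int) -> float:
    """Closed-form DDPM forward process: x_t ~ N(sqrt(abar_t) * x0, 1 - abar_t)."""
    abar = alpha_bars[t]
    return math.sqrt(abar) * x0 + math.sqrt(1.0 - abar) * random.gauss(0.0, 1.0)

# The signal coefficient sqrt(abar_t) decays toward 0: late x_t is almost pure noise.
for t in (0, 250, 999):
    print(t, round(math.sqrt(alpha_bars[t]), 4))
```

Because x_t depends on x_0 in closed form, the lab can jump to any timestep directly rather than simulating every intermediate noising step.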
Lab
Train a 2-layer attention-only transformer in your browser. Watch the induction circuit form during the loss-cliff phase transition; ablate any head to see in-context learning collapse.
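The circuit that forms at the loss cliff implements a simple algorithm: look for an earlier occurrence of the current token, and copy whatever followed it. A plain-Python sketch of that pattern (the algorithmic skeleton, not the transformer itself):

```python
def induction_predict(tokens):
    """At each position t, find earlier positions j where tokens[j-1] matches
    tokens[t], and predict tokens[j] -- the token that followed the match.
    This is the pattern an induction head implements with attention."""
    preds = []
    for t in range(len(tokens)):
        pred = None
        for j in range(1, t + 1):
            if tokens[j - 1] == tokens[t]:
                pred = tokens[j]  # copy what followed the earlier match
        preds.append(pred)
    return preds

print(induction_predict(list("abcab")))  # -> [None, None, None, 'b', 'c']
```

On "abcab" the head has nothing to match until the second "a", where it correctly predicts "b"; ablating the head in the lab removes exactly this copy-forward behavior.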
The fastest route through hard theory is not more pages. It is a visible dependency path and one honest next step.