Unlock: Inference Systems Overview
The modern LLM inference stack: batching strategies, scheduling, memory management with paged attention, model parallelism for serving, and why FLOPs do not equal latency when memory bandwidth is the bottleneck.
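The last point, that FLOPs do not equal latency, can be made concrete with a roofline-style estimate: at small batch sizes, each decode step must stream every weight byte from HBM, so memory bandwidth, not compute, sets the floor. A minimal sketch, using illustrative A100-class numbers (peak BF16 throughput and HBM bandwidth are assumptions, not measurements):

```python
def decode_step_time(params_bytes, batch_size,
                     peak_flops=312e12,   # assumed peak BF16 FLOP/s (A100-class)
                     mem_bw=2.0e12):      # assumed HBM bandwidth in bytes/s
    """Roofline-style lower bound for one autoregressive decode step.

    Each step must (a) stream every weight byte from HBM once and
    (b) perform roughly 2 FLOPs per parameter per sequence in the batch.
    The step cannot finish faster than the slower of the two.
    """
    n_params = params_bytes / 2                      # 2 bytes per bf16 weight
    compute_s = 2 * n_params * batch_size / peak_flops
    memory_s = params_bytes / mem_bw
    return max(compute_s, memory_s), compute_s, memory_s

# A hypothetical 13B-parameter model in bf16 (~26 GB of weights), batch 1:
total_s, compute_s, memory_s = decode_step_time(26e9, batch_size=1)
# Streaming the weights dominates: ~13 ms of memory time versus
# well under 1 ms of compute time, so the step is bandwidth-bound.
assert memory_s > 10 * compute_s
```

Batching amortizes the weight traffic: compute time grows with batch size while the per-step memory time stays flat, which is why continuous batching raises throughput without (initially) hurting per-token latency.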
196 prerequisites · 0 mastered · 0 working · 151 gaps
Prerequisite mastery: 23%
Recommended probe: Chernoff Bounds (Core) — your weakest prerequisite with available questions. You have not been assessed on this topic yet (3 questions available).
- McDiarmid's Inequality (Advanced) — not assessed, 13 questions
- Symmetrization Inequality (Advanced) — not assessed, 3 questions
- VC Dimension (Core) — not assessed, 58 questions
- Contraction Inequality (Advanced) — not assessed, 1 question
- Edge and On-Device ML (Frontier) — no quiz
- KV Cache (Frontier) — no quiz
- Model Compression and Pruning (Advanced) — not assessed, 2 questions
- Docker and Containers for ML (Research) — no quiz
- Kubernetes for ML Workloads (Research) — no quiz
- Megakernels (Frontier) — no quiz