Unlock: Inference Systems Overview
The modern LLM inference stack: batching strategies, scheduling, memory management with paged attention, model parallelism for serving, and why FLOPs do not equal latency when memory bandwidth is the bottleneck.
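The last point, that FLOPs do not equal latency, can be made concrete with a roofline-style estimate: at small batch sizes, each decode step must stream every weight byte from HBM, so memory bandwidth, not compute, sets the floor. A minimal sketch, using illustrative A100-class numbers (peak BF16 throughput and HBM bandwidth are assumptions, not measurements):

```python
def decode_step_time(params_bytes, batch_size,
                     peak_flops=312e12,   # assumed peak BF16 FLOP/s (A100-class)
                     mem_bw=2.0e12):      # assumed HBM bandwidth in bytes/s
    """Roofline-style lower bound for one autoregressive decode step.

    Each step must (a) stream every weight byte from HBM once and
    (b) perform roughly 2 FLOPs per parameter per sequence in the batch.
    The step cannot finish faster than the slower of the two.
    """
    n_params = params_bytes / 2                      # 2 bytes per bf16 weight
    compute_s = 2 * n_params * batch_size / peak_flops
    memory_s = params_bytes / mem_bw
    return max(compute_s, memory_s), compute_s, memory_s

# A hypothetical 13B-parameter model in bf16 (~26 GB of weights), batch 1:
total_s, compute_s, memory_s = decode_step_time(26e9, batch_size=1)
# Streaming the weights dominates: ~13 ms of memory time versus
# well under 1 ms of compute time, so the step is bandwidth-bound.
assert memory_s > 10 * compute_s
```

Batching amortizes the weight traffic: compute time grows with batch size while the per-step memory time stays flat, which is why continuous batching raises throughput without (initially) hurting per-token latency.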
196 prerequisites · 0 mastered · 0 working · 151 gaps
Prerequisite mastery: 23%
Recommended probe: Chernoff Bounds (Core) — your weakest prerequisite with available questions. You have not been assessed on this topic yet (3 questions available).
- McDiarmid's Inequality (Advanced) — not assessed, 13 questions
- Symmetrization Inequality (Advanced) — not assessed, 3 questions
- VC Dimension (Core) — not assessed, 58 questions
- Contraction Inequality (Advanced) — not assessed, 1 question
- Edge and On-Device ML (Frontier) — no quiz
- KV Cache (Frontier) — no quiz
- Model Compression and Pruning (Advanced) — not assessed, 2 questions
- Docker and Containers for ML (Research) — no quiz
- Kubernetes for ML Workloads (Research) — no quiz
- Megakernels (Frontier) — no quiz