Concentration Probability
Chernoff Bounds
The Chernoff method: the universal technique for deriving exponential tail bounds by optimizing over the moment generating function, yielding the tightest possible exponential concentration.
Why This Matters
The Chernoff method is not a single inequality. It is a technique, and it is the single most important proof technique in concentration inequalities. Every exponential tail bound you have seen (Hoeffding, Bernstein, sub-Gaussian bounds, Bennett) is derived by the same recipe:
- Exponentiate: convert $\mathbb{P}(X \ge t)$ into $\mathbb{P}(e^{\lambda X} \ge e^{\lambda t})$
- Apply Markov: $\mathbb{P}(e^{\lambda X} \ge e^{\lambda t}) \le e^{-\lambda t}\,\mathbb{E}[e^{\lambda X}]$
- Optimize: choose $\lambda > 0$ to minimize the bound
This three-step recipe, the Chernoff method, produces the tightest exponential bound achievable from the moment generating function. If you master this one idea, you can derive Hoeffding, Bernstein, and sub-Gaussian bounds from scratch.
Read this page as the operational sequel to Moment Generating Functions: that page explains what the MGF is, and this page shows how the MGF becomes a tail bound after one Markov step and one optimization step.
Quick Version
- Exponentiate. Replace the event $\{X \ge t\}$ by $\{e^{\lambda X} \ge e^{\lambda t}\}$. This turns a tail event into a statement about a non-negative random variable.
- Apply Markov. Bound the probability by $e^{-\lambda t}\,\mathbb{E}[e^{\lambda X}]$. This converts the tail event into an MGF expression.
- Optimize over $\lambda > 0$. Choose the best tilt parameter. This gives the sharpest exponential certificate available from the MGF.
If the MGF is well controlled, Chernoff gives exponential decay. If the MGF does not exist, the method has nothing to optimize and the certificate disappears.
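The recipe is short enough to run numerically. The sketch below is a minimal illustration, with a grid search over $\lambda$ standing in for the exact minimization; `log_mgf` is any function you supply returning $\log \mathbb{E}[e^{\lambda X}]$:

```python
import math

def chernoff_bound(log_mgf, t, lam_max=50.0, steps=10000):
    """Numerically minimize exp(log_mgf(lam) - lam*t) over lam > 0.

    Steps 1-2 of the recipe give the bound exp(-lam*t) * E[exp(lam*X)]
    for each lam; step 3 is the minimization, done here on a grid.
    """
    best = 1.0  # lam -> 0+ recovers the trivial bound P <= 1
    for k in range(1, steps + 1):
        lam = lam_max * k / steps
        exponent = log_mgf(lam) - lam * t
        if exponent < 0:  # a nonnegative exponent cannot beat the trivial bound
            best = min(best, math.exp(exponent))
    return best

# Standard Gaussian: log MGF is lam^2/2, so the optimized bound is exp(-t^2/2).
bound = chernoff_bound(lambda lam: lam * lam / 2, t=3.0)
print(bound)
```

The grid contains the optimal $\lambda = t = 3$ exactly, so the output matches $e^{-9/2}$.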
Mental Model
Think of the Chernoff method as applying Markov's inequality to the "best possible" monotone function of $X$. Markov says $\mathbb{P}(X \ge t) \le \mathbb{E}[\phi(X)]/\phi(t)$ for any non-negative increasing $\phi$. The exponential is the optimal choice because it grows the fastest (among functions whose expectation is finite), penalizing large values of $X$ most aggressively. The free parameter $\lambda$ lets you tune the exponential to the specific threshold $t$.
Why is this tighter than Chebyshev? Chebyshev uses $\phi(x) = x^2$ applied to $|X - \mu|$, which gives polynomial tails ($\sim 1/t^2$). The exponential gives exponential tails ($e^{-ct}$ or $e^{-ct^2}$), which are dramatically better for large $t$.
The MGF Technique Step-by-Step
The Chernoff method is the proof pattern, not a specific bound. Read each step alongside the prerequisite it depends on; the recall boxes name those prerequisites explicitly so the derivation is auditable end-to-end.
Step 1 — Exponentiate. For $\lambda > 0$, monotonicity of $x \mapsto e^{\lambda x}$ gives
$$\mathbb{P}(X \ge t) = \mathbb{P}(e^{\lambda X} \ge e^{\lambda t}).$$
The event has not changed; only its algebraic form has. This step is an exact equality; no inequality has been used yet.
Recall: Markov's inequality. For a nonnegative random variable $Y$ and any $a > 0$, $\mathbb{P}(Y \ge a) \le \mathbb{E}[Y]/a$. The next step applies this to $Y = e^{\lambda X}$, which is nonnegative because the exponential is. See Markov's inequality.
Step 2 — Apply Markov. Markov on the nonnegative random variable $e^{\lambda X}$:
$$\mathbb{P}(X \ge t) \le e^{-\lambda t}\,\mathbb{E}[e^{\lambda X}] = e^{-\lambda t} M_X(\lambda).$$
The tail probability is now an MGF expression. The certificate is no longer distribution-specific; it depends only on $M_X(\lambda)$.
Step 3 — Use independence to factor. When $X = \sum_{i=1}^n X_i$ with the $X_i$ independent,
$$\mathbb{E}[e^{\lambda X}] = \prod_{i=1}^n \mathbb{E}[e^{\lambda X_i}].$$
Each factor is a one-variable MGF that admits a per-summand bound (Hoeffding's lemma for bounded variables, Bernstein's MGF lemma when variance information is available, and so on).
Recall: $1 + x \le e^x$, valid for every real $x$. The Bernoulli MGF for $X_i \sim \mathrm{Bern}(p_i)$ is $1 + p_i(e^\lambda - 1)$, which the inequality bounds by $e^{p_i(e^\lambda - 1)}$. The same recipe, with different per-summand MGF bounds, drives every Chernoff-style proof on this site.
Step 4 — Optimize over $\lambda$. The bound depends on the free parameter $\lambda$. Taking the infimum gives the Legendre transform of the log-MGF $\psi_X(\lambda) = \log M_X(\lambda)$:
$$\mathbb{P}(X \ge t) \le \inf_{\lambda > 0} e^{-\lambda t} M_X(\lambda) = \exp\Big(-\sup_{\lambda > 0}\big[\lambda t - \psi_X(\lambda)\big]\Big) = e^{-\psi_X^*(t)}.$$
The optimum $\lambda^*$ is characterized by $\psi_X'(\lambda^*) = t$, i.e., the exponentially tilted distribution has mean exactly $t$. This is the geometric content of large-deviations theory.
Formal Setup and Notation
Let $X$ be a real-valued random variable with moment generating function (MGF) $M_X(\lambda) = \mathbb{E}[e^{\lambda X}]$.
Moment Generating Function
The moment generating function of $X$ is $M_X(\lambda) = \mathbb{E}[e^{\lambda X}]$, defined for those $\lambda$ where the expectation is finite. The MGF "generates" moments: $M_X^{(k)}(0) = \mathbb{E}[X^k]$. The existence of the MGF in a neighborhood of zero is equivalent to the distribution having exponentially decaying tails.
Log-Moment Generating Function (Cumulant Generating Function)
The log-MGF or cumulant generating function is $\psi_X(\lambda) = \log M_X(\lambda)$. It is convex in $\lambda$. The Chernoff bound becomes $\mathbb{P}(X \ge t) \le e^{-\psi_X^*(t)}$. The Legendre transform $\psi_X^*(t) = \sup_{\lambda > 0}[\lambda t - \psi_X(\lambda)]$ is the rate function of large deviations theory.
Core Method
The Chernoff Method (General Upper Tail)
Statement
For any random variable $X$ whose MGF exists for some $\lambda > 0$, and any $t$:
$$\mathbb{P}(X \ge t) \le \inf_{\lambda > 0} e^{-\lambda t} M_X(\lambda) = e^{-\psi_X^*(t)},$$
where $\psi_X^*(t) = \sup_{\lambda > 0}[\lambda t - \psi_X(\lambda)]$ is the Fenchel-Legendre dual of the log-MGF.
Intuition
The Chernoff method converts a tail probability into an optimization problem. For each $\lambda > 0$, the bound $\mathbb{P}(X \ge t) \le e^{-\lambda t} M_X(\lambda)$ is valid. Different values of $\lambda$ give different bounds, and you pick the best one. The optimal $\lambda^*$ satisfies $\psi_X'(\lambda^*) = t$: the tilted distribution has mean exactly $t$.
Geometrically: $\psi_X^*(t)$ is the Legendre transform of $\psi_X$, and it measures how "surprising" the event $\{X \ge t\}$ is relative to the distribution of $X$. Larger $\psi_X^*(t)$ means more surprising, hence less probable.
Proof Sketch
For any $\lambda > 0$:
$$\mathbb{P}(X \ge t) = \mathbb{P}(e^{\lambda X} \ge e^{\lambda t}) \le e^{-\lambda t}\,\mathbb{E}[e^{\lambda X}].$$
The first equality holds because $x \mapsto e^{\lambda x}$ is strictly increasing. The inequality is Markov applied to the non-negative random variable $e^{\lambda X}$. Since this holds for all $\lambda > 0$, take the infimum.
Why It Matters
The Chernoff method is a meta-theorem: it transforms the problem of bounding tails into the problem of bounding the MGF, which is often easier.
Different MGF bounds lead to different named inequalities:
- If $M_X(\lambda) \le e^{\sigma^2 \lambda^2/2}$: sub-Gaussian bound (Hoeffding-type)
- If $\psi_X(\lambda) \le \dfrac{\lambda^2 \sigma^2/2}{1 - b\lambda}$: Bernstein-type bound
- If $\mathbb{E}[e^{\lambda X_i}] \le e^{p_i(e^\lambda - 1)}$: multiplicative Chernoff for Binomials
- If the MGF exists only on a bounded interval $[0, 1/b)$: exponential (sub-exponential-type) tail bound
Every exponential concentration inequality is a Chernoff bound with a specific MGF estimate plugged in.
Failure Mode
The Chernoff method requires the MGF to exist in a neighborhood of zero. For heavy-tailed distributions (Pareto, Cauchy), the MGF is infinite for all $\lambda > 0$, and the Chernoff method gives nothing. For these, you need moment-based methods (Chebyshev when the variance is finite, or higher-moment bounds when $\mathbb{E}|X|^k < \infty$ for larger $k$).
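To see the failure concretely, the sketch below uses an assumed Pareto example with density $f(x) = 2x^{-3}$ for $x \ge 1$ (tail $\mathbb{P}(X > x) = x^{-2}$) and integrates $e^{\lambda x} f(x)$ up to a truncation point. The truncated integrals blow up as the cutoff grows, so $\mathbb{E}[e^{\lambda X}] = \infty$ for every $\lambda > 0$:

```python
import math

def truncated_mgf(lam, cutoff, steps=20000):
    """Midpoint-rule integral of exp(lam*x) * 2*x**-3 over [1, cutoff].

    The exponential eventually dominates the polynomial decay, so these
    truncated values diverge as the cutoff increases: the MGF is infinite.
    """
    h = (cutoff - 1.0) / steps
    return sum(
        math.exp(lam * (1 + (i + 0.5) * h)) * 2 * (1 + (i + 0.5) * h) ** -3 * h
        for i in range(steps)
    )

vals = [truncated_mgf(0.5, c) for c in (10.0, 50.0, 100.0)]
print(vals)  # each truncation level is far larger than the last
```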
Multiplicative Chernoff Bounds
The most important special case: sums of independent Bernoulli (or more generally, $[0,1]$-valued) random variables. The "multiplicative" form expresses deviations as a fraction of the mean, which gives tighter bounds than additive Hoeffding when the mean is small.
Multiplicative Chernoff Bound
Statement
Let $X_1, \ldots, X_n$ be independent random variables with $X_i \in [0,1]$. Let $X = \sum_{i=1}^n X_i$ and $\mu = \mathbb{E}[X]$. Then:
Upper tail: For $\delta > 0$:
$$\mathbb{P}(X \ge (1+\delta)\mu) \le \left(\frac{e^\delta}{(1+\delta)^{1+\delta}}\right)^\mu$$
Simplified upper tail: For $0 < \delta \le 1$:
$$\mathbb{P}(X \ge (1+\delta)\mu) \le e^{-\mu\delta^2/3}$$
Lower tail: For $0 < \delta < 1$:
$$\mathbb{P}(X \le (1-\delta)\mu) \le e^{-\mu\delta^2/2}$$
Intuition
The multiplicative form measures deviations relative to the mean. If $\mu = 10$, a deviation of $5$ is a 50% overshoot ($\delta = 0.5$). The bound depends on $\mu\delta^2$: the product of the mean and the squared relative deviation.
Why is this better than Hoeffding for small means? Hoeffding gives $\mathbb{P}(X \ge \mu + s) \le e^{-2s^2/n}$. With $s = \delta\mu$ and small $\mu$: Hoeffding gives $e^{-2\delta^2\mu^2/n}$ while multiplicative Chernoff gives $e^{-\mu\delta^2/3}$. When $\mu \ll n$ (rare events), Chernoff is dramatically tighter.
Proof Sketch
Step 1: Apply the Chernoff method to $X = \sum_i X_i$ with parameter $\lambda > 0$: $\mathbb{P}(X \ge (1+\delta)\mu) \le e^{-\lambda(1+\delta)\mu}\,\mathbb{E}[e^{\lambda X}]$.
Step 2: Bound the MGF of each $X_i \sim \mathrm{Bern}(p_i)$: $\mathbb{E}[e^{\lambda X_i}] = 1 + p_i(e^\lambda - 1) \le e^{p_i(e^\lambda - 1)}$, using $1 + x \le e^x$.
Step 3: Multiply: $\mathbb{E}[e^{\lambda X}] \le e^{\mu(e^\lambda - 1)}$.
Step 4: Total bound: $\mathbb{P}(X \ge (1+\delta)\mu) \le e^{\mu(e^\lambda - 1) - \lambda(1+\delta)\mu}$.
Step 5: Optimize: set $\lambda = \log(1+\delta)$ to get $\left(e^\delta/(1+\delta)^{1+\delta}\right)^\mu$. The simplified form follows from the inequality $(1+\delta)\log(1+\delta) - \delta \ge \delta^2/3$ for $0 < \delta \le 1$.
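The last step leans on the calculus fact that $(1+\delta)\log(1+\delta) - \delta \ge \delta^2/3$ on $(0, 1]$. A quick grid check (an illustration, not a proof) confirms the gap is nonnegative:

```python
import math

# Gap between the exact exponent (1+d)*ln(1+d) - d and the simplified d^2/3.
# A nonnegative gap on (0, 1] means the exact multiplicative bound
# (e^d / (1+d)^(1+d))^mu is at most exp(-mu * d^2 / 3).
gap = min(
    (1 + d) * math.log(1 + d) - d - d * d / 3
    for d in (k / 1000 for k in range(1, 1001))
)
print(gap)  # smallest gap on the grid; positive, shrinking toward 0 as d -> 0
```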
Why It Matters
The multiplicative Chernoff bound is the standard tool for:
- Randomized algorithms: bounding the probability that a random hash function has too many collisions
- Sampling: how many samples are needed to estimate a proportion within relative error $\delta$
- Network analysis: bounding node degrees in random graphs
- Binomial concentration: any setting where you sum independent 0/1 indicators and the expected sum may be small
The multiplicative form is especially important when $\mu \ll n$: for rare events, Hoeffding wastes a factor of roughly $n/\mu$ in the exponent.
Failure Mode
The simplified form $e^{-\mu\delta^2/3}$ is valid only for $0 < \delta \le 1$ (at most doubling the mean). For $\delta > 1$, you must use the exact form or the weaker bound $\mathbb{P}(X \ge (1+\delta)\mu) \le e^{-\mu\delta/3}$. Also, the bound requires independence; for dependent indicators, you need Azuma-Hoeffding or other martingale methods.
Bernstein Inequality
Hoeffding uses only the support of each summand. When the variance is much smaller than the squared range, Hoeffding is loose. Bernstein's inequality keeps the variance information and produces a tighter bound in exactly this regime.
Bernstein's Inequality
Statement
Let $X_1, \ldots, X_n$ be independent random variables with $\mathbb{E}[X_i] = 0$, $|X_i| \le b$ almost surely, and $\sigma_i^2 = \mathbb{E}[X_i^2]$. Let $\sigma^2 = \sum_{i=1}^n \sigma_i^2$. Then for all $t > 0$:
$$\mathbb{P}\left(\sum_{i=1}^n X_i \ge t\right) \le \exp\left(-\frac{t^2}{2(\sigma^2 + bt/3)}\right)$$
Intuition
The bound interpolates between two regimes via the denominator $\sigma^2 + bt/3$:
- Small $t$ (variance regime): when $bt \ll \sigma^2$, the denominator is approximately $\sigma^2$ and the bound reduces to $e^{-t^2/(2\sigma^2)}$, the Gaussian tail controlled by the total variance. This matches what the CLT predicts for sums whose individual summands are small compared to the aggregate spread.
- Large $t$ (Poisson/Hoeffding regime): when $bt \gg \sigma^2$, the denominator is dominated by $bt/3$ and the bound becomes $e^{-3t/(2b)}$, an exponential tail linear in $t$. This is the regime where a single bounded summand can contribute most of the deviation.
Bernstein strictly dominates Hoeffding when $\sigma^2 \ll nb^2$, which is the common case for rare-event indicators, heavy-imbalance Bernoullis, or any setting where most summands have tiny variance.
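The advantage is easy to quantify. The sketch below evaluates both tail bounds under assumed illustrative parameters (1000 mean-zero summands bounded by $b = 1$, each with variance $0.01$, so the total variance $10$ is far below the worst case $nb^2 = 1000$):

```python
import math

def bernstein_bound(t, var, b):
    # P(sum X_i >= t) <= exp(-t^2 / (2*(var + b*t/3))), var = total variance
    return math.exp(-t * t / (2 * (var + b * t / 3)))

def hoeffding_bound(t, n, b):
    # For mean-zero X_i in [-b, b] (range 2b), Hoeffding gives exp(-t^2/(2*n*b^2))
    return math.exp(-t * t / (2 * n * b * b))

n, b, total_var = 1000, 1.0, 10.0  # assumed illustrative setting
for t in (5.0, 10.0, 20.0):
    print(t, bernstein_bound(t, total_var, b), hoeffding_bound(t, n, b))
```

Because the denominator uses the true variance rather than the worst-case range, Bernstein's exponent is orders of magnitude larger at every deviation level shown.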
Proof Sketch
Apply the Chernoff method to $\sum_i X_i$. Control each MGF by the Bernstein condition: for $0 < \lambda < 3/b$,
$$\mathbb{E}[e^{\lambda X_i}] \le \exp\left(\frac{\lambda^2 \sigma_i^2/2}{1 - \lambda b/3}\right).$$
Multiply across and take logs: $\psi(\lambda) \le \frac{\lambda^2 \sigma^2/2}{1 - \lambda b/3}$. Optimize with the substitution $\lambda = \frac{t}{\sigma^2 + bt/3}$ to obtain the stated bound.
Why It Matters
Bernstein is the right tool whenever you know both a uniform bound and a variance proxy. It is the backbone of empirical process bounds, matrix Bernstein, and sample-complexity arguments in statistical learning where the effective variance can be much smaller than the worst-case range.
Failure Mode
Requires an almost-sure bound $|X_i| \le b$. For sub-exponential tails without an almost-sure bound, use the Bernstein-type inequality for sub-exponential variables (which replaces $b$ by the Orlicz $\psi_1$ parameter). Also, the variance must be finite and correctly estimated; plugging in a loose upper bound on $\sigma^2$ weakens the inequality.
Azuma-Hoeffding (Martingales)
Hoeffding requires independence. Many natural sums are not independent but do form a martingale with bounded differences. Azuma-Hoeffding gives a Chernoff-style bound in exactly this setting.
Azuma-Hoeffding Inequality
Statement
Let $(M_k)_{k=0}^n$ be a martingale (equivalently, let $D_k = M_k - M_{k-1}$ be a martingale difference sequence) with $|M_k - M_{k-1}| \le c_k$ almost surely for each $k$. Then for all $t > 0$:
$$\mathbb{P}(|M_n - M_0| \ge t) \le 2\exp\left(-\frac{t^2}{2\sum_{k=1}^n c_k^2}\right)$$
Intuition
Define $D_k = M_k - M_{k-1}$, so $\mathbb{E}[D_k \mid \mathcal{F}_{k-1}] = 0$ and $|D_k| \le c_k$. Conditional on the past, $D_k$ behaves like a bounded zero-mean variable, so Hoeffding's lemma applied conditionally gives $\mathbb{E}[e^{\lambda D_k} \mid \mathcal{F}_{k-1}] \le e^{\lambda^2 c_k^2/2}$. Tower over $k$ to multiply these bounds, then run the Chernoff method on $M_n - M_0$. The exponent is identical to Hoeffding's; only the assumption is weakened from independence to the martingale property.
Proof Sketch
For $\lambda > 0$, use the tower property:
$$\mathbb{E}[e^{\lambda(M_n - M_0)}] = \mathbb{E}\big[e^{\lambda(M_{n-1} - M_0)}\,\mathbb{E}[e^{\lambda D_n} \mid \mathcal{F}_{n-1}]\big] \le e^{\lambda^2 c_n^2/2}\,\mathbb{E}[e^{\lambda(M_{n-1} - M_0)}].$$
Iterate to get $\mathbb{E}[e^{\lambda(M_n - M_0)}] \le \exp(\lambda^2 \sum_k c_k^2/2)$. Apply the Chernoff method and optimize with $\lambda = t/\sum_k c_k^2$. A symmetric argument for $M_0 - M_n$ handles the lower tail; the factor of 2 is the union bound.
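A simple sanity check: a $\pm 1$ random walk is a martingale with difference bounds $c_k = 1$, so its empirical tail frequency should sit below the Azuma-Hoeffding bound. This is a Monte Carlo illustration with an assumed fixed seed, not part of the proof:

```python
import math
import random

def azuma_bound(t, cs):
    # Two-sided Azuma-Hoeffding: P(|M_n - M_0| >= t) <= 2*exp(-t^2 / (2*sum c_k^2))
    return 2 * math.exp(-t * t / (2 * sum(c * c for c in cs)))

random.seed(0)  # assumed seed, for a reproducible illustration
n, t, trials = 100, 25, 2000
hits = sum(
    abs(sum(random.choice((-1, 1)) for _ in range(n))) >= t
    for _ in range(trials)
)
print(hits / trials, azuma_bound(t, [1.0] * n))  # empirical frequency vs. bound
```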
Why It Matters
Azuma-Hoeffding generalizes Hoeffding beyond independence to any martingale with bounded differences. Typical applications:
- McDiarmid's inequality (bounded differences for functions of independent variables) is a direct corollary via the Doob martingale.
- Stochastic optimization: convergence rates for online algorithms where iterates depend on all past samples.
- Graph and combinatorial concentration: chromatic number, longest increasing subsequence, edge exposure in random graphs.
Failure Mode
Requires an almost-sure bound on each increment. If the differences are only bounded in $L^2$ or in probability, Azuma-Hoeffding does not apply; use Freedman's inequality (a martingale Bernstein) or sub-Gaussian martingale concentration. The bound is loose when the conditional variances are much smaller than $c_k^2$; in that case use Freedman or a Bernstein-type martingale bound.
Why Chernoff Gives the Tightest Exponential Bounds
The Chernoff bound is the best possible exponential bound derivable from the MGF. More precisely:
Claim. For any random variable $X$, among all bounds of the form $\mathbb{P}(X \ge t) \le e^{-r(t)}$ that can be derived from knowledge of the MGF alone, the Chernoff bound with $r(t) = \psi_X^*(t)$ is optimal.
Why. Each $\lambda > 0$ certifies the rate $\lambda t - \psi_X(\lambda)$, and any rate derivable from the MGF is of this form for some $\lambda$. Taking the supremum over $\lambda$ gives exactly $\psi_X^*(t) = \sup_{\lambda > 0}[\lambda t - \psi_X(\lambda)]$.
Connection to large deviations. For i.i.d. sums $S_n = X_1 + \cdots + X_n$, the Chernoff bound gives $\mathbb{P}(S_n \ge nt) \le e^{-n\psi^*(t)}$. The remarkable fact (Cramér's theorem) is that this is asymptotically exact:
$$\lim_{n \to \infty} \frac{1}{n} \log \mathbb{P}(S_n \ge nt) = -\psi^*(t) \quad \text{for } t > \mathbb{E}[X_1].$$
So the Chernoff method not only gives an upper bound; it finds the correct exponential rate. This is the starting point of large deviations theory.
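For Bernoulli sums this is checkable by direct computation: the rate function at level $a > p$ is the KL divergence $D(\mathrm{Bern}(a)\,\|\,\mathrm{Bern}(p))$, and the exact normalized log-tail sits below it while converging to it. The values of $p$ and $a$ below are assumed for illustration:

```python
import math

def kl(a, p):
    # Rate function for Bernoulli(p) at level a > p: D(Bern(a) || Bern(p))
    return a * math.log(a / p) + (1 - a) * math.log((1 - a) / (1 - p))

def normalized_log_tail(n, p, a):
    # (1/n) * log P(S_n >= a*n), computed from the exact Binomial pmf
    tail = sum(
        math.comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(math.ceil(a * n), n + 1)
    )
    return math.log(tail) / n

p, a = 0.5, 0.7
for n in (50, 200, 800):
    # Chernoff guarantees the first value is <= the second; Cramér says
    # the gap closes as n grows.
    print(n, normalized_log_tail(n, p, a), -kl(a, p))
```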
Proof Ideas and Templates Used
The Chernoff method is a proof template consisting of three steps:
- Exponentiate: introduce the free parameter $\lambda$ via $\mathbb{P}(X \ge t) = \mathbb{P}(e^{\lambda X} \ge e^{\lambda t})$
- Apply Markov: convert the tail probability to the MGF expression $e^{-\lambda t} M_X(\lambda)$
- Optimize: choose $\lambda$ to minimize the bound
This template is universal. The art lies in step 2.5: bounding the MGF. Different MGF bounds produce different named inequalities:
- Hoeffding's lemma for bounded $X_i \in [a_i, b_i]$ gives Hoeffding's inequality with Gaussian-style decay $\exp(-2t^2/\sum_i (b_i - a_i)^2)$.
- Bernstein's condition with variance $\sigma^2$ and a uniform bound $b$ gives Bernstein's inequality with tail shape $\exp(-t^2/(2(\sigma^2 + bt/3)))$.
- A sub-Gaussian MGF bound $M_X(\lambda) \le e^{\sigma^2\lambda^2/2}$ gives a sub-Gaussian tail $e^{-t^2/(2\sigma^2)}$.
- The exact Binomial MGF gives the multiplicative Chernoff bound with tail shape $\left(e^\delta/(1+\delta)^{1+\delta}\right)^\mu$.
Canonical Examples
Chernoff for a standard Gaussian
Let $Z \sim N(0,1)$. The MGF is $M_Z(\lambda) = e^{\lambda^2/2}$. The Chernoff bound gives:
$$\mathbb{P}(Z \ge t) \le e^{-\lambda t + \lambda^2/2}.$$
Optimize: $\lambda = t$ gives exponent $-t^2/2$. So $\mathbb{P}(Z \ge t) \le e^{-t^2/2}$.
This is the standard Gaussian tail bound. The exact Gaussian tail is $\mathbb{P}(Z \ge t) \sim \frac{1}{t\sqrt{2\pi}} e^{-t^2/2}$, so Chernoff is tight up to the polynomial factor $\frac{1}{t\sqrt{2\pi}}$.
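The tightness claim can be checked against the exact tail, which the standard library exposes through the complementary error function:

```python
import math

def gaussian_tail_exact(t):
    # P(Z >= t) for Z ~ N(0,1), via the complementary error function
    return 0.5 * math.erfc(t / math.sqrt(2))

for t in (1.0, 2.0, 4.0):
    chernoff = math.exp(-t * t / 2)
    # The ratio exact/chernoff approaches 1/(t*sqrt(2*pi)), so rescaling by
    # t*sqrt(2*pi) should drift toward 1 as t grows.
    print(t, gaussian_tail_exact(t), chernoff,
          gaussian_tail_exact(t) / chernoff * t * math.sqrt(2 * math.pi))
```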
Estimating a rare event probability
Flip a coin with heads probability $p = 0.01$ a total of $n = 1000$ times. Let $X$ = number of heads, so $\mu = np = 10$.
What is $\mathbb{P}(X \ge 20)$? This is $\mathbb{P}(X \ge 2\mu)$, i.e., $\delta = 1$.
Multiplicative Chernoff: $\mathbb{P}(X \ge 20) \le \left(\frac{e}{4}\right)^{10} \approx 0.021$.
Hoeffding: $\mathbb{P}(X \ge \mu + 10) \le e^{-2 \cdot 10^2/1000} = e^{-0.2} \approx 0.82$.
Chernoff gives $\approx 0.021$ vs. Hoeffding's $\approx 0.82$. The multiplicative Chernoff is dramatically tighter because it uses the fact that $\mu$ is small relative to $n$.
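Both bounds can be compared against the exact Binomial tail. The sketch below uses assumed illustrative values ($n = 1000$ flips, $p = 0.01$, threshold $2\mu = 20$ heads, so $\delta = 1$):

```python
import math

n, p, thresh = 1000, 0.01, 20  # assumed illustrative values
mu, delta = n * p, 1.0

# Exact Binomial tail P(X >= 20), summed directly from the pmf.
exact = sum(math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(thresh, n + 1))
# Exact multiplicative Chernoff form (e^d / (1+d)^(1+d))^mu.
chernoff = (math.exp(delta) / (1 + delta) ** (1 + delta)) ** mu
# Additive Hoeffding exp(-2*s^2/n) with s = delta*mu.
hoeffding = math.exp(-2 * (delta * mu) ** 2 / n)

print(exact, chernoff, hoeffding)
```

The exact tail is smaller still, but Chernoff captures its order of magnitude while Hoeffding barely improves on the trivial bound.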
Common Confusions
Chernoff is a method, not a single inequality
When people say "Chernoff bound," they sometimes mean the general method (optimize over in the MGF bound) and sometimes mean the specific multiplicative bound for sums of independent [0,1] variables. Be aware of which one is meant. Hoeffding's inequality is a Chernoff bound, derived by plugging Hoeffding's lemma into the Chernoff method. The method is more general than any single application.
The Chernoff bound is one-sided
The basic Chernoff method with $e^{\lambda X}$, $\lambda > 0$, bounds the upper tail $\mathbb{P}(X \ge t)$. For the lower tail $\mathbb{P}(X \le t)$, use $\lambda < 0$ (equivalently, apply the upper-tail bound to $-X$). For two-sided bounds, combine the two one-sided bounds via a union bound, picking up a factor of 2.
MGF must exist for Chernoff to work
The Chernoff method requires $\mathbb{E}[e^{\lambda X}] < \infty$ for some $\lambda > 0$. This fails for heavy-tailed distributions: the Cauchy distribution has no finite moments at all, and the $\mathrm{Exp}(1)$ distribution has $M_X(\lambda) = \infty$ for $\lambda \ge 1$. For exponential random variables, you can still apply Chernoff, but only for $\lambda < 1$, limiting the range of tilt parameters you can optimize over.
Connection to Large Deviations: Sanov's Theorem Preview
The Chernoff bound for i.i.d. sums is the starting point of large deviations theory. Cramér's theorem says $\frac{1}{n}\log \mathbb{P}(S_n/n \ge t) \to -\psi^*(t)$, and this extends to a much more general setting.
Sanov's theorem generalizes this to empirical distributions: if $\hat{P}_n$ is the empirical distribution of $n$ i.i.d. draws from $P$, then the probability that $\hat{P}_n$ lands in a set $A$ of distributions decays as $\exp\big(-n \inf_{Q \in A} D_{\mathrm{KL}}(Q \,\|\, P)\big)$, where $D_{\mathrm{KL}}$ is the KL divergence.
This connects Chernoff bounds to information theory: the "rate" at which empirical observations deviate from the truth is measured by KL divergence.
Summary
- The Chernoff method: $\mathbb{P}(X \ge t) \le \inf_{\lambda > 0} e^{-\lambda t}\,\mathbb{E}[e^{\lambda X}]$; optimize over $\lambda$
- This gives the tightest exponential bound from the MGF
- Multiplicative Chernoff for Binomials: $\mathbb{P}(X \ge (1+\delta)\mu) \le e^{-\mu\delta^2/3}$ for $0 < \delta \le 1$
- The multiplicative form is tighter than Hoeffding when $\mu \ll n$
- Every exponential concentration inequality (Hoeffding, Bernstein, etc.) is a Chernoff bound with a specific MGF estimate
- Bernstein interpolates: variance-driven Gaussian tail for small $t$, linear-in-$t$ Hoeffding-style tail when $bt$ dominates the variance
- Azuma-Hoeffding extends Hoeffding from independent sums to martingales with bounded differences
- The optimal rate $\psi^*(t)$ equals the large deviations rate function (Cramér's theorem)
- Requires the MGF to exist; fails for heavy-tailed distributions
Exercises
Problem
Let $X \sim \mathrm{Exp}(1)$ (exponential with rate 1). Use the Chernoff method to derive a tail bound for $\mathbb{P}(X \ge t)$ for $t > 1$.
Problem
A randomized algorithm succeeds with probability $p = 2/3$ on each independent trial. You run it $n = 100$ times and take a majority vote. Use the multiplicative Chernoff bound to bound the probability that fewer than 50 trials succeed (i.e., the majority vote fails).
Problem
Derive Hoeffding's inequality from the Chernoff method. Specifically, let $X_1, \ldots, X_n$ be independent with $X_i \in [a_i, b_i]$. Starting from the Chernoff bound, use Hoeffding's lemma ($\mathbb{E}[e^{\lambda(X_i - \mathbb{E}X_i)}] \le e^{\lambda^2 (b_i - a_i)^2/8}$) to derive:
$$\mathbb{P}\left(\sum_{i=1}^n (X_i - \mathbb{E}X_i) \ge t\right) \le \exp\left(-\frac{2t^2}{\sum_{i=1}^n (b_i - a_i)^2}\right)$$
References
Canonical:
- Mitzenmacher & Upfal, Probability and Computing (2017), Chapter 4
- Boucheron, Lugosi, Massart, Concentration Inequalities (2013), Chapters 2-3 (Chernoff and Bernstein), Chapter 6 (martingale methods)
- Vershynin, High-Dimensional Probability (2018), Chapter 2
Current:
- Motwani & Raghavan, Randomized Algorithms (1995), Chapter 4
- Wainwright, High-Dimensional Statistics (2019), Chapter 2 (sub-Gaussian, Bernstein, Azuma)
- van Handel, Probability in High Dimension (2016), Chapters 1-3
Next Topics
Building on the Chernoff method:
- Sub-Gaussian random variables: the abstract framework that captures all distributions whose MGF satisfies a Gaussian-type bound
- Sub-exponential random variables: the next distributional class, for distributions whose MGF exists only in a limited range
Last reviewed: April 23, 2026
Canonical graph
Required before and derived from this topic
These links come from prerequisite edges in the curriculum graph. Editorial suggestions are shown here only when the target page also cites this page as a prerequisite.
Required prerequisites
- Concentration Inequalities (layer 1 · tier 1)
- Moment Generating Functions (layer 0A · tier 2)
Derived topics
- Hoeffding's Lemma (layer 1 · tier 1)
- Bennett's Inequality (layer 2 · tier 1)
- Chi-Squared Concentration (layer 2 · tier 1)
- Sub-Exponential Random Variables (layer 2 · tier 1)
- Sub-Gaussian Random Variables (layer 2 · tier 1)