Sub-Exponential Random Variables
The distributional class between sub-Gaussian and heavy-tailed: heavier tails than Gaussian, the Orlicz norm characterization, Bernstein condition, and the two-regime concentration bound.
Why This Matters
Sub-Gaussian random variables have tails decaying like $e^{-t^2/(2\sigma^2)}$ and give the cleanest concentration bounds. But many quantities in ML are not sub-Gaussian:
- Chi-squared random variables (sums of squared Gaussians)
- Products of sub-Gaussian random variables
- Exponential random variables (waiting times)
- Squared losses in regression with Gaussian noise
These have tails decaying like $e^{-t/\nu}$: exponentially, but with a linear exponent rather than a quadratic one. The sub-exponential class captures exactly this level of tail behavior, and the Bernstein inequality for sub-exponential sums provides the right concentration bound: sub-Gaussian for small deviations, exponential for large deviations.
Tail Class Board
(Interactive figure: moving a deviation threshold shows three tail assumptions separating. The concentration story is not just small probability; it is whether the exponent is quadratic, linear, or only polynomial. Larger thresholds expose the difference between quadratic, linear, and polynomial decay, and a larger scale parameter makes the linear tail regime arrive earlier and decay more slowly.)
The exponent shape is what determines whether sample complexity pays a log, square-root-log, or worse penalty.
ML translation
Sub-Gaussian assumptions power clean Hoeffding-style bounds. Squares, products, and chi-squared terms usually require sub-exponential Bernstein bounds instead.
Mental Model
The tail decay hierarchy is:
| Class | Tail decay | MGF | Example |
|---|---|---|---|
| Sub-Gaussian | $e^{-t^2/(2\sigma^2)}$ | Finite for all $\lambda \in \mathbb{R}$ | Bounded, Gaussian |
| Sub-exponential | $e^{-t/\nu}$ | Finite for $\lvert\lambda\rvert < 1/\alpha$ | $\chi^2$, Exponential |
| Heavy-tailed | Polynomial, e.g. $t^{-p}$ | Infinite for every $\lambda \neq 0$ | Cauchy, Pareto |
Sub-exponential is the intermediate class. The MGF exists in a neighborhood of zero (not everywhere), which means the Chernoff method works for small $\lambda$ but breaks for large $\lambda$. This produces a concentration bound with two regimes.
Formal Setup and Definitions
Let $X$ be a centered random variable ($\mathbb{E}[X] = 0$).
Sub-Exponential Random Variable (MGF characterization)
A centered random variable $X$ is sub-exponential with parameters $(\nu^2, \alpha)$ if and only if for all $\lvert\lambda\rvert < 1/\alpha$:

$$\mathbb{E}[e^{\lambda X}] \le \exp\!\left(\frac{\lambda^2 \nu^2}{2}\right)$$

The parameter $\nu^2$ plays the role of a "variance proxy" (controlling the sub-Gaussian regime), and $\alpha$ controls the radius $1/\alpha$ where the MGF bound holds.
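As a sanity check on the definition, the sketch below verifies the MGF bound numerically for a centered $\mathrm{Exp}(1)$ variable, whose exact MGF is $e^{-\lambda}/(1-\lambda)$. The parameter choice $(\nu^2, \alpha) = (2, 2)$ is an assumption made for this illustration (one valid choice, not a canonical constant):

```python
import numpy as np

def mgf_centered_exp(lam: float) -> float:
    """Exact MGF of X - 1 for X ~ Exp(1): E[e^{lam(X-1)}] = e^{-lam}/(1-lam), lam < 1."""
    return float(np.exp(-lam) / (1.0 - lam))

# Candidate sub-exponential parameters for the centered Exp(1) variable
# (an illustrative assumption, checked numerically below):
nu_sq, alpha = 2.0, 2.0

lams = np.linspace(-0.49, 0.49, 199)          # stay inside |lam| < 1/alpha = 0.5
mgf_ok = all(mgf_centered_exp(l) <= np.exp(l**2 * nu_sq / 2.0) for l in lams)

# Outside the radius the MGF genuinely blows up relative to any Gaussian bound:
diverges = mgf_centered_exp(0.999) > np.exp(0.999**2 * nu_sq / 2.0)
print(mgf_ok, diverges)
```

The check passes everywhere inside the radius and fails near $\lambda = 1$, where the true MGF diverges: exactly the "finite only in a neighborhood of zero" behavior the definition encodes.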
Sub-Exponential Norm (Orlicz ψ₁ norm)
The sub-exponential norm (or $\psi_1$-norm) of $X$ is:

$$\|X\|_{\psi_1} = \inf\left\{t > 0 : \mathbb{E}\left[e^{|X|/t}\right] \le 2\right\}$$

A random variable $X$ is sub-exponential if and only if $\|X\|_{\psi_1} < \infty$.

Compare to the sub-Gaussian norm: $\|X\|_{\psi_2} = \inf\{t > 0 : \mathbb{E}[e^{X^2/t^2}] \le 2\}$. The $\psi_1$ norm uses $e^{|X|/t}$ (linear in $X$) while $\psi_2$ uses $e^{X^2/t^2}$ (quadratic). The linear growth in the exponent is what makes sub-exponential tails heavier than sub-Gaussian.
Bernstein Condition
A centered random variable $X$ satisfies the Bernstein condition with parameters $(\sigma^2, b)$ if and only if for all integers $k \ge 2$:

$$\left|\mathbb{E}[X^k]\right| \le \frac{1}{2}\, k!\, \sigma^2 b^{k-2}$$

This is equivalent (up to constants) to being sub-exponential with parameters $(\nu, \alpha) \asymp (\sigma, b)$. The condition controls all moments: the $k$-th moment grows at most like $k!\, \sigma^2 b^{k-2}$, compared to sub-Gaussian where moments grow like $(C\sigma\sqrt{k})^k$.
Equivalent Characterizations
The following are equivalent (up to constants in parameters):
- MGF condition: $\mathbb{E}[e^{\lambda X}] \le e^{\lambda^2 \nu^2/2}$ for $\lvert\lambda\rvert < 1/\alpha$
- Tail condition: $\mathbb{P}(|X| \ge t) \le 2e^{-t/K_1}$ for all $t \ge 0$
- Moment condition: $(\mathbb{E}|X|^k)^{1/k} \le K_2\, k$ for all $k \ge 1$
- Orlicz norm: $\|X\|_{\psi_1} < \infty$
- Bernstein condition: $\left|\mathbb{E}[X^k]\right| \le \frac{1}{2}\, k!\, \sigma^2 b^{k-2}$ for $k \ge 2$
Compare to sub-Gaussian characterization 3: $(\mathbb{E}|X|^k)^{1/k} \le K\sqrt{k}$ for all $k \ge 1$. Sub-exponential moments grow like $k$; sub-Gaussian moments grow like $\sqrt{k}$. Both forms use the $k$-th root $(\mathbb{E}|X|^k)^{1/k}$. The extra factor of $\sqrt{k}$ is the precise difference between the two classes.
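The moment-growth gap can be seen with exact formulas: for $Z \sim N(0,1)$, $\mathbb{E}|Z|^k = 2^{k/2}\Gamma((k+1)/2)/\sqrt{\pi}$, and for $X \sim \mathrm{Exp}(1)$, $\mathbb{E}[X^k] = k!$. A minimal sketch (the `*_rate` names are illustrative, not standard notation):

```python
import math

def gauss_abs_moment(k: int) -> float:
    # E|Z|^k for Z ~ N(0,1): 2^{k/2} * Gamma((k+1)/2) / sqrt(pi)
    return 2.0 ** (k / 2) * math.gamma((k + 1) / 2) / math.sqrt(math.pi)

def exp_moment(k: int) -> float:
    # E[X^k] for X ~ Exp(1): k! = Gamma(k + 1)
    return math.gamma(k + 1)

ks = list(range(2, 41))
# (E|X|^k)^{1/k} normalized by the predicted growth rate; both stay bounded:
gauss_rate = [gauss_abs_moment(k) ** (1 / k) / math.sqrt(k) for k in ks]  # sqrt(k) growth
exp_rate = [exp_moment(k) ** (1 / k) / k for k in ks]                     # k growth
# Normalizing the exponential moments by sqrt(k) instead shows unbounded growth:
exp_vs_sqrt = [exp_moment(k) ** (1 / k) / math.sqrt(k) for k in ks]
grows = exp_vs_sqrt[-1] > 2 * exp_vs_sqrt[0]
print(round(max(gauss_rate), 3), round(max(exp_rate), 3), grows)
```

Both normalized sequences stay below 1, while the mismatched normalization grows without bound: the numerical face of the $\sqrt{k}$ gap between the two classes.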
Main Theorems
Products of Sub-Gaussians are Sub-Exponential
Statement
If $X$ and $Y$ are sub-Gaussian random variables (not necessarily independent), then the product $XY$ is sub-exponential, with:

$$\|XY\|_{\psi_1} \le \|X\|_{\psi_2}\, \|Y\|_{\psi_2}$$
Intuition
Multiplying two "Gaussian-like" variables produces something "exponential-like." A concrete example: if $Z \sim N(0,1)$, then $Z$ is sub-Gaussian, but $Z^2$ is chi-squared (with 1 degree of freedom) and sub-exponential. The tails get heavier because multiplying two variables that can be large makes the product even larger.
Proof Sketch
Tail bound (qualitative). If $|XY| \ge t$, then at least one of $|X| \ge \sqrt{t}$ or $|Y| \ge \sqrt{t}$ must hold. By union bound:

$$\mathbb{P}(|XY| \ge t) \le \mathbb{P}(|X| \ge \sqrt{t}) + \mathbb{P}(|Y| \ge \sqrt{t})$$

where each sub-Gaussian tail is evaluated at $\sqrt{t}$, yielding $2\exp(-ct)$. The resulting tail is sub-exponential.

Norm bound via AM-GM. The sharper identity uses the pointwise bound $|XY| \le \tfrac{1}{2}(X^2 + Y^2)$. Combined with the key identity relating the $\psi_1$ and $\psi_2$ norms:

$$\|X^2\|_{\psi_1} = \|X\|_{\psi_2}^2$$

(which follows directly from the definitions: $\mathbb{E}[e^{X^2/t^2}] \le 2$ means $\mathbb{E}[e^{|X^2|/t^2}] \le 2$), we get:

$$\|XY\|_{\psi_1} \le \tfrac{1}{2}\left(\|X\|_{\psi_2}^2 + \|Y\|_{\psi_2}^2\right)$$

Cauchy-Schwarz on the right side (or applying the same argument to the rescaled variables $X/\|X\|_{\psi_2}$ and $Y/\|Y\|_{\psi_2}$) yields the product form $\|XY\|_{\psi_1} \le \|X\|_{\psi_2}\|Y\|_{\psi_2}$.
Why It Matters
This explains why chi-squared variables, quadratic forms, and many statistics in learning theory are sub-exponential: they involve products or squares of sub-Gaussian quantities. When you see $X^2$ or $XY$ in a bound, expect sub-exponential concentration, not sub-Gaussian.
Failure Mode
The product of two sub-exponential variables is generally not sub-exponential (it may be even heavier-tailed). Sub-exponential is not closed under multiplication, unlike sub-Gaussian being closed under addition. This limits the composability of sub-exponential bounds.
Bernstein Inequality for Sub-Exponential Sums
Statement
Let $X_1, \dots, X_n$ be independent centered sub-exponential random variables with parameters $(\nu_i^2, \alpha_i)$. Then for any $t \ge 0$:

$$\mathbb{P}\!\left(\left|\sum_{i=1}^n X_i\right| \ge t\right) \le 2\exp\!\left(-c \min\!\left(\frac{t^2}{\sum_{i=1}^n \nu_i^2},\; \frac{t}{\max_i \alpha_i}\right)\right)$$

where $c > 0$ is a universal constant. For i.i.d. variables with common parameters $(\nu^2, \alpha)$ and sample mean $\bar{X}_n = \frac{1}{n}\sum_{i=1}^n X_i$:

$$\mathbb{P}\left(|\bar{X}_n| \ge t\right) \le 2\exp\!\left(-\frac{n}{2} \min\!\left(\frac{t^2}{\nu^2},\; \frac{t}{\alpha}\right)\right)$$
Intuition
The bound has two regimes separated by the threshold $t^* = \nu^2/\alpha$:

Small deviations ($t \le \nu^2/\alpha$): The $t^2/\nu^2$ term dominates the minimum. The bound is $2\exp(-nt^2/(2\nu^2))$, a sub-Gaussian tail. For moderate deviations, the sub-exponential variable behaves as if it were sub-Gaussian.

Large deviations ($t > \nu^2/\alpha$): The $t/\alpha$ term dominates. The bound is $2\exp(-nt/(2\alpha))$, an exponential (not Gaussian) tail. For large deviations, the heavier tails kick in and concentration is weaker.

The transition between regimes is smooth. The sub-Gaussian regime gives the familiar $O(\sqrt{\log(1/\delta)/n})$ rates for moderate confidence levels.
Proof Sketch
Use the Chernoff method. For $0 \le \lambda < 1/\alpha$:

$$\mathbb{P}\!\left(\sum_{i=1}^n X_i \ge t\right) \le e^{-\lambda t} \prod_{i=1}^n \mathbb{E}[e^{\lambda X_i}] \le \exp\!\left(-\lambda t + \frac{\lambda^2 n \nu^2}{2}\right)$$

Optimize over $\lambda$:
- If the unconstrained optimum $\lambda^* = t/(n\nu^2)$ satisfies $\lambda^* < 1/\alpha$, we are in the sub-Gaussian regime: the bound is $\exp(-t^2/(2n\nu^2))$.
- If $\lambda^* \ge 1/\alpha$, set $\lambda = 1/\alpha$ (at the boundary): the bound is $\exp(-t/\alpha + n\nu^2/(2\alpha^2))$. For large $t$, this gives $\exp(-t/(2\alpha))$.
Taking the minimum of the two cases gives the stated bound.
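The two-case optimization can be written out directly. This sketch assumes the i.i.d. setup above; `chernoff_exponent` is a hypothetical helper name that clips the optimizer at the MGF radius $1/\alpha$:

```python
def chernoff_exponent(t: float, n: int, nu_sq: float, alpha: float) -> float:
    """Best exponent from the constrained Chernoff step:
    sup over 0 <= lam < 1/alpha of (lam * t - lam^2 * n * nu_sq / 2)."""
    lam_star = t / (n * nu_sq)                 # unconstrained optimizer
    if lam_star < 1.0 / alpha:                 # sub-Gaussian regime
        return t ** 2 / (2 * n * nu_sq)
    lam = 1.0 / alpha                          # clip at the MGF radius
    return lam * t - lam ** 2 * n * nu_sq / 2

n, nu_sq, alpha = 100, 1.0, 2.0
t_star = n * nu_sq / alpha                     # regime threshold for the sum
small = chernoff_exponent(0.5 * t_star, n, nu_sq, alpha)   # quadratic in t
large = chernoff_exponent(4.0 * t_star, n, nu_sq, alpha)   # linear in t
# Past the threshold the exponent grows linearly in t, with slope 1/alpha:
linear_ok = abs(
    chernoff_exponent(8 * t_star, n, nu_sq, alpha)
    - chernoff_exponent(4 * t_star, n, nu_sq, alpha)
    - 4 * t_star / alpha
) < 1e-9
print(small, large, linear_ok)
```

Below the threshold the returned exponent is $t^2/(2n\nu^2)$ (quadratic); above it the increments are exactly linear with slope $1/\alpha$, which is the two-regime shape of the stated bound.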
Why It Matters
This is the correct concentration bound for quantities like $\frac{1}{n}\sum_{i=1}^n Z_i^2$ where each $Z_i$ is Gaussian. The sub-Gaussian bound does not apply (these are not sub-Gaussian), but the Bernstein bound does. The two-regime behavior is what you actually observe in practice: moderate deviations look Gaussian, but extreme deviations reveal the heavier tails.
Failure Mode
The bound requires knowing (or bounding) both $\nu$ and $\alpha$. For the sample variance of $N(0, \sigma^2)$ data, each centered term $\sigma^2(Z_i^2 - 1)$ has $\nu = 2\sigma^2$ and $\alpha = 4\sigma^2$. If these parameters are poorly estimated, the bound may be loose in one or both regimes.
Key Examples of Sub-Exponential Variables
Chi-squared with k degrees of freedom
If $Z_1, \dots, Z_k \sim N(0,1)$ independently, then $Y = \sum_{i=1}^k Z_i^2 \sim \chi^2_k$ with $\mathbb{E}[Y] = k$ and $\mathrm{Var}(Y) = 2k$.

The centered variable $Y - k$ is sub-exponential with parameters $\nu = 2\sqrt{k}$ and $\alpha = 4$ (Wainwright 2019, Example 2.4, using the normalization where each $Z_i^2 - 1$ has sub-exponential parameters $(\nu, \alpha) = (2, 4)$). It is not sub-Gaussian because each $Z_i^2$ has MGF $\mathbb{E}[e^{\lambda Z_i^2}] = (1 - 2\lambda)^{-1/2}$, which diverges at $\lambda = 1/2$. A sub-Gaussian MGF would be finite for all $\lambda$.

The Bernstein bound gives: $\mathbb{P}(|Y - k| \ge t) \le 2\exp\!\left(-\min\!\left(\frac{t^2}{8k}, \frac{t}{8}\right)\right)$.
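A Monte Carlo sanity check of this bound, using the parameters above (sample size, seed, and the grid of thresholds are arbitrary choices for the illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
k = 5
Y = rng.chisquare(k, size=200_000)        # chi-squared with k dof, mean k

def bernstein_bound(t: float, k: int) -> float:
    # Two-regime bound with nu = 2*sqrt(k), alpha = 4:
    return 2.0 * np.exp(-min(t**2 / (8 * k), t / 8))

ts = [2.0, 5.0, 10.0, 20.0]
empirical = {t: float(np.mean(np.abs(Y - k) >= t)) for t in ts}
bound_holds = all(empirical[t] <= bernstein_bound(t, k) for t in ts)
print(empirical, bound_holds)
```

The empirical tail frequencies sit below the bound at every threshold; the bound is loosest in the crossover region near $t \approx \nu^2/\alpha = k$.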
Laurent-Massart bound (canonical form). The sharpest explicit constants come from Laurent and Massart (2000), Lemma 1: for all $t \ge 0$,

$$\mathbb{P}\!\left(Y - k \ge 2\sqrt{kt} + 2t\right) \le e^{-t}, \qquad \mathbb{P}\!\left(k - Y \ge 2\sqrt{kt}\right) \le e^{-t}$$

The two-term form on the upper tail exposes both regimes directly: the $2\sqrt{kt}$ piece is the sub-Gaussian contribution (dominant when $t \lesssim k$), the $2t$ piece is the exponential contribution (dominant when $t \gtrsim k$). This is the standard reference for chi-squared tails in high-dimensional statistics.
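The Laurent-Massart inequalities can be checked the same way by simulation (degrees of freedom, seed, and thresholds are arbitrary illustration choices):

```python
import numpy as np

rng = np.random.default_rng(1)
k = 5
Y = rng.chisquare(k, size=200_000)    # chi-squared with k degrees of freedom

lm_ok = True
for t in [0.5, 1.0, 2.0, 3.0]:
    upper = float(np.mean(Y - k >= 2 * np.sqrt(k * t) + 2 * t))  # upper tail
    lower = float(np.mean(k - Y >= 2 * np.sqrt(k * t)))          # lower tail
    lm_ok = lm_ok and upper <= np.exp(-t) and lower <= np.exp(-t)
print(lm_ok)
```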
Exponential distribution
If $X \sim \mathrm{Exp}(\lambda)$, then $X - 1/\lambda$ is centered with

$$\mathbb{E}\!\left[e^{s(X - 1/\lambda)}\right] = \frac{\lambda\, e^{-s/\lambda}}{\lambda - s} \quad \text{for } s < \lambda$$

This is finite only for $s < \lambda$, confirming sub-exponential (not sub-Gaussian) behavior. The $\psi_1$ norm is $\|X\|_{\psi_1} = 2/\lambda$ (since $X \ge 0$, solve $\mathbb{E}[e^{X/t}] = \frac{\lambda t}{\lambda t - 1} = 2$).
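Since the MGF of $|X| = X$ is available in closed form ($\mathbb{E}[e^{X/t}] = t/(t-1)$ for $\mathrm{Exp}(1)$ and $t > 1$), the $\psi_1$ norm can be recovered by bisection; analytically it equals $2$, and it scales to $2/\lambda$ for general rate $\lambda$:

```python
def mgf_abs_exp(t: float) -> float:
    # E[e^{X/t}] for X ~ Exp(1) (X >= 0, so |X| = X): t / (t - 1), valid for t > 1
    return t / (t - 1.0)

# Bisection for the smallest t with E[e^{X/t}] <= 2 (the mgf is decreasing in t):
lo, hi = 1.001, 10.0
for _ in range(80):
    mid = 0.5 * (lo + hi)
    if mgf_abs_exp(mid) <= 2.0:
        hi = mid
    else:
        lo = mid
psi1_norm = hi
print(psi1_norm)
```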
Product of two independent standard normals
Let $Z_1, Z_2 \sim N(0,1)$ independently. The product $W = Z_1 Z_2$ has $\mathbb{E}[W] = 0$, and its density (a variance-gamma law, with density $\frac{1}{\pi} K_0(|w|)$ where $K_0$ is the modified Bessel function of the second kind) has Laplace-like tails:

$$\mathbb{P}(|W| \ge t) \le C e^{-ct}$$

with universal constants $C, c > 0$. This is a genuine sub-exponential (linear-exponent) tail, not a $e^{-ct^2}$ tail. The naive union bound in the proof sketch above already delivers this rate, because substituting $\sqrt{t}$ into the Gaussian tail $2e^{-u^2/2}$ yields $2e^{-t/2}$.

The norm bound $\|Z_1 Z_2\|_{\psi_1} \le \|Z_1\|_{\psi_2}\, \|Z_2\|_{\psi_2}$ confirms sub-exponential membership.
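A quick simulation confirms the union-bound rate $4e^{-t/2}$ for the product tail (the factor 4 comes from two Gaussian tails of $2e^{-t/2}$ each; sample size and seed are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500_000
W = rng.standard_normal(n) * rng.standard_normal(n)   # W = Z1 * Z2

union_ok = True
for t in [1.0, 2.0, 4.0, 6.0]:
    empirical = float(np.mean(np.abs(W) >= t))
    union_ok = union_ok and empirical <= 4.0 * np.exp(-t / 2)  # union-bound rate
print(union_ok)
```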
Sub-Exponential Closure Properties
- Sum closure (worst case). For any sub-exponential $X, Y$ (not necessarily independent), $X + Y$ is sub-exponential with $\|X + Y\|_{\psi_1} \le \|X\|_{\psi_1} + \|Y\|_{\psi_1}$. Unlike the sub-Gaussian variance proxy for independent sums, the norm adds linearly, not in quadrature. This is tight under arbitrary correlation.
- Independent sums (Bernstein regime). When the $X_i$ are independent, the linear bound is pessimistic: the Bernstein inequality above gives a two-regime concentration bound for $\sum_i X_i$ with a sub-Gaussian small-deviation regime controlled by $\sum_i \nu_i^2$. This is a concentration statement about the sum, not a norm identity, and it is what actually drives rates in ML theory.
- Scalar multiplication: $\|cX\|_{\psi_1} = |c|\, \|X\|_{\psi_1}$.
- Product of sub-Gaussians: $\|XY\|_{\psi_1} \le \|X\|_{\psi_2}\, \|Y\|_{\psi_2}$.
- Not closed under products: The product of two sub-exponential variables may be heavier than sub-exponential.
Connection to Bernstein's Classical Inequality
The classical Bernstein inequality (from the concentration-inequalities page) for bounded random variables $|X_i - \mu| \le b$:

$$\mathbb{P}\!\left(\left|\frac{1}{n}\sum_{i=1}^n X_i - \mu\right| \ge t\right) \le 2\exp\!\left(-\frac{n t^2}{2(\sigma^2 + bt/3)}\right)$$

is a special case of the sub-exponential Bernstein inequality. Bounded random variables are sub-exponential (since they are even sub-Gaussian), and the denominator $\sigma^2 + bt/3$ captures both regimes: sub-Gaussian when $bt \ll \sigma^2$ and exponential when $bt \gg \sigma^2$.

Equivalence of classical and modern forms. The modern min-form and the classical quotient form are equivalent up to universal constants. This follows from the elementary comparison

$$\tfrac{1}{2}\min(a, b) \le \frac{ab}{a + b} \le \min(a, b)$$

writing $a = t^2/\sigma^2$ and $b' = t/b$: both expressions interpolate between the sub-Gaussian regime (small $t$) and the exponential regime (large $t$) at the same threshold $t^* \asymp \sigma^2/b$.
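The elementary comparison $\tfrac{1}{2}\min(a,b) \le \frac{ab}{a+b} \le \min(a,b)$ behind this equivalence can be spot-checked numerically (the sampling range is an arbitrary choice, and a tiny floating-point slack is included):

```python
import random

random.seed(0)
comparison_ok = True
for _ in range(10_000):
    a = random.uniform(1e-3, 100.0)
    b = random.uniform(1e-3, 100.0)
    quotient = a * b / (a + b)     # shape of the classical (quotient) exponent
    m = min(a, b)                  # shape of the modern (min) exponent
    comparison_ok = comparison_ok and (
        0.5 * m * (1 - 1e-12) <= quotient <= m * (1 + 1e-12)
    )
print(comparison_ok)
```

The proof is one line: if $a \le b$ then $\frac{ab}{a+b} \le \frac{ab}{b} = a$ and $\frac{ab}{a+b} \ge \frac{ab}{2b} = \frac{a}{2}$.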
Common Confusions
Sub-exponential is BETWEEN sub-Gaussian and heavy-tailed
Every sub-Gaussian variable is sub-exponential, but not vice versa. Sub-exponential is a weaker condition (allows heavier tails). The hierarchy is: bounded $\subset$ sub-Gaussian $\subset$ sub-exponential $\subset$ finite variance $\subset$ all distributions.
The MGF bound holds only for small lambda
For sub-Gaussian variables, $\mathbb{E}[e^{\lambda X}] \le e^{\lambda^2 \sigma^2/2}$ for all $\lambda \in \mathbb{R}$. For sub-exponential, the bound holds only for $\lvert\lambda\rvert < 1/\alpha$. Beyond this radius, the MGF may diverge. This is why the Chernoff method produces two regimes: you can only optimize over $\lambda$ up to the boundary $1/\alpha$.
ψ₁ norm vs ψ₂ norm
$\psi_2$ (sub-Gaussian) uses $\mathbb{E}[e^{X^2/t^2}]$ in the definition: the square of $X$ in the exponent. $\psi_1$ (sub-exponential) uses $\mathbb{E}[e^{|X|/t}]$: the absolute value of $X$. The square in $\psi_2$ is what forces the tails to be Gaussian-like. Every sub-Gaussian variable is sub-exponential because $|x|/t \le x^2/t^2 + 1$ pointwise, so $\|X\|_{\psi_1} \le C\, \|X\|_{\psi_2}$ for an appropriate constant $C$.
Summary
- Sub-exponential tails: $\mathbb{P}(|X| \ge t) \le 2e^{-ct}$ (linear exponent)
- Sub-Gaussian tails: $\mathbb{P}(|X| \ge t) \le 2e^{-ct^2}$ (quadratic exponent)
- Products of sub-Gaussians are sub-exponential ($\|XY\|_{\psi_1} \le \|X\|_{\psi_2}\|Y\|_{\psi_2}$)
- Chi-squared and squared losses are sub-exponential, not sub-Gaussian
- Bernstein bound has two regimes: sub-Gaussian for small $t$, exponential for large $t$
- Threshold between regimes: $t^* = \nu^2/\alpha$
- $\psi_1$ norm adds linearly for sums; the sub-Gaussian variance proxy adds in quadrature for independent sums
Exercises
Problem
Show that $X \sim \mathrm{Exp}(1)$ is sub-exponential but not sub-Gaussian. Compute the MGF $\mathbb{E}[e^{\lambda X}] = \frac{1}{1-\lambda}$ and show it is finite only for $\lambda < 1$.
Problem
Let $Z \sim N(0,1)$. Show that $Z^2 - 1$ (centered chi-squared with 1 degree of freedom) is sub-exponential by verifying a tail bound of the form $\mathbb{P}(|Z^2 - 1| \ge t) \le 2e^{-ct}$ for large $t$.
Problem
Let $Z_1, \dots, Z_n \sim N(0,1)$ independently. Using the Bernstein inequality for sub-exponential variables, bound $\mathbb{P}\!\left(\left|\frac{1}{n}\sum_{i=1}^n Z_i^2 - 1\right| \ge t\right)$. Identify the two regimes.
References
Canonical:
- Vershynin, High-Dimensional Probability (2018), Sections 2.7-2.8
- Boucheron, Lugosi, Massart, Concentration Inequalities (2013), Chapter 2
- Laurent and Massart, "Adaptive Estimation of a Quadratic Functional by Model Selection," Annals of Statistics, 28(5):1302-1338 (2000). Canonical chi-squared tail constants (Lemma 1).
Current:
- Wainwright, High-Dimensional Statistics (2019), Chapter 2
- Rigollet and Hutter, High-Dimensional Statistics (MIT lecture notes, 2023)
- van Handel, Probability in High Dimension (2016), Chapters 1-3
- Götze, Sambale, Sinulis, "Concentration inequalities for polynomials in α-sub-exponential random variables," Electron. J. Probab. (2021). Extends the framework to sub-Weibull tails and polynomial functionals.
- Catoni, PAC-Bayesian Supervised Classification (IMS Lecture Notes, 2007). The Bernstein condition plays the central role in PAC-Bayes excess-risk bounds.
Related results:
- See Hanson-Wright inequality for quadratic forms in sub-Gaussian vectors, which sharpens the products-of-sub-Gaussians theorem from this page.
Next Topics
Building on sub-exponential theory:
- Matrix concentration: extending sub-exponential bounds to matrix-valued random variables
- Epsilon-nets and covering numbers: combining concentration with geometric discretization
Last reviewed: April 18, 2026
Canonical graph
Required before and derived from this topic
These links come from prerequisite edges in the curriculum graph. Editorial suggestions are shown here only when the target page also cites this page as a prerequisite.
Required prerequisites
- Chernoff Bounds (layer 1 · tier 1)
- Concentration Inequalities (layer 1 · tier 1)
- Bennett's Inequality (layer 2 · tier 1)
- Bernstein Inequality (layer 2 · tier 1)
- Chi-Squared Concentration (layer 2 · tier 1)
Derived topics
- Epsilon-Nets and Covering Numbers (layer 3 · tier 1)
- Matrix Concentration (layer 3 · tier 1)