Foundations
Moment Generating Functions
Moment generating functions encode moments, control light-tailed behavior, and power Chernoff bounds, sub-Gaussian estimates, and exponential-family theory.
Why This Matters
The moment generating function is the main workhorse connecting light-tailed probability distributions to concentration inequalities. When the MGF exists in a neighborhood of zero (sub-Gaussian, sub-exponential, bounded), it powers the Chernoff method, exponential tilting, and the sub-Gaussian / sub-exponential machinery. When it does not exist (Cauchy, heavy-tailed power laws), the characteristic function is the strictly more general tool, and concentration results require different machinery (Chebyshev, truncation, Nemirovski-style norms).
![Five-panel infographic: definition of the MGF M_X(t) = E[e^{tX}], why it generates all moments via differentiation at zero, examples for Gaussian, exponential, Poisson distributions, the uniqueness theorem connecting MGF to distribution, and why Chernoff bounds use the MGF (exponential Markov yields exponential tail decay).](/images/topics/moment-generating-functions/moment-generating-functions-overview.png)
The Chernoff method is: apply Markov's inequality to $e^{tX}$ and optimize over $t > 0$. This is an MGF computation.
A random variable $X$ is sub-Gaussian with parameter $\sigma^2$ if and only if the MGF of the centered variable satisfies $\mathbb{E}[e^{t(X - \mathbb{E}X)}] \le e^{\sigma^2 t^2 / 2}$ for all $t \in \mathbb{R}$. The simpler form $\mathbb{E}[e^{tX}] \le e^{\sigma^2 t^2 / 2}$ holds only when $\mathbb{E}[X] = 0$. Within the sub-Gaussian world, the entire concentration story reduces to bounding this centered MGF.
Exponential families are distributions whose density is $p_\theta(x) = h(x)\exp\!\big(\theta^\top T(x) - A(\theta)\big)$, relying on properties of the exponential function. The function $A(\theta)$ is the log-MGF of the sufficient statistic $T(X)$ under the base measure.
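As a concrete check of this connection, here is a small numeric sketch (the Poisson family in natural form and the specific value $\theta = 0.7$ are illustrative choices, not from the text): for the Poisson, $A(\theta) = e^\theta$, and the derivative $A'(\theta)$ should equal the mean of the sufficient statistic $T(X) = X$.

```python
import math

# Poisson in natural form: p_theta(x) = exp(theta*x - A(theta)) / x!,
# with sufficient statistic T(x) = x and log-partition A(theta) = exp(theta).
theta = 0.7
lam = math.exp(theta)          # mean parameter: lambda = A'(theta)

# Numerical derivative of A at theta (central difference).
h = 1e-6
A = math.exp
dA = (A(theta + h) - A(theta - h)) / (2 * h)

# Mean of T(X) = X under Poisson(lambda), by direct summation of the pmf.
mean_T = sum(x * lam**x * math.exp(-lam) / math.factorial(x) for x in range(60))

print(dA, mean_T)              # both approximately equal lambda
```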
Want the moving picture first? The Chernoff / MGF Tilt Lab lets you drag the threshold and tilt parameter yourself, so you can see when the MGF gives a real certificate and when heavy tails break the method.
Quick Version
| Object | Plain meaning | Why it matters |
|---|---|---|
| $M_X(t) = \mathbb{E}[e^{tX}]$ | Exponential average of $X$ | Encodes moments and tail sensitivity |
| $K_X(t) = \log M_X(t)$ | Log-MGF / cumulant generator | Adds cleanly under independent sums |
| Chernoff method | Apply Markov to $e^{tX}$ | Turns MGF control into tail bounds |
| Failure mode | MGF infinite for $t \ne 0$ | Heavy tails need different tools |
The MGF is not just "a fancy way to store moments." It is the object that lets light-tailed distributions talk to concentration inequalities.
Visual Intuition
Read the picture this way:
- the blue curve is the log-MGF / cumulant generator
- the amber line is the exponential tilt you chose
- the vertical gap is the rate that powers the upper-tail bound
This is the geometric core of the Chernoff method. If the log-MGF is finite and well behaved, you can tilt and optimize. If the MGF blows up, the whole certificate disappears.
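The tilt-and-gap geometry can be reproduced numerically. A minimal sketch for a standard normal, where $K(t) = t^2/2$ (the threshold $a = 2$ and the grid are illustrative choices): the Chernoff rate is $\sup_{t > 0}\{ta - K(t)\}$, which for the Gaussian equals $a^2/2$ at the optimal tilt $t^\star = a$.

```python
import numpy as np

# Log-MGF (cumulant generator) of a standard normal: K(t) = t^2 / 2.
K = lambda t: t ** 2 / 2

a = 2.0                        # threshold for the upper-tail bound
ts = np.linspace(0, 10, 100001)
gap = ts * a - K(ts)           # vertical gap between the tilt line and the curve
rate = gap.max()               # Chernoff rate: sup_t { t*a - K(t) }
t_star = ts[gap.argmax()]      # optimal tilt

print(rate, t_star)            # for the Gaussian: rate = a^2/2, t* = a
```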
Core Definitions
Moment Generating Function
The moment generating function of a random variable $X$ is:

$$M_X(t) = \mathbb{E}\left[e^{tX}\right],$$

defined for all $t$ where this expectation is finite. The MGF may not exist for all $t$; when it exists in an open interval around 0, it determines the distribution uniquely.
Extracting Moments
The $n$-th moment of $X$ is the $n$-th derivative of $M_X$ at zero:

$$\mathbb{E}[X^n] = M_X^{(n)}(0).$$

This follows from differentiating under the expectation:

$$\frac{d^n}{dt^n}\,\mathbb{E}\left[e^{tX}\right] = \mathbb{E}\left[X^n e^{tX}\right],$$

and evaluating at $t = 0$. The interchange of differentiation and expectation is valid when $M_X$ exists in a neighborhood of $0$.
In particular: $\mathbb{E}[X] = M_X'(0)$ and $\mathrm{Var}(X) = M_X''(0) - M_X'(0)^2$.
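The derivative-at-zero recipe can be checked numerically. A sketch using the Exponential MGF $M(t) = \lambda/(\lambda - t)$ with $\lambda = 2$ (an illustrative choice) and finite differences in place of symbolic derivatives:

```python
# MGF of Exponential(rate 2), finite for t < 2; exact moments: E[X^n] = n!/2^n.
lam = 2.0
M = lambda t: lam / (lam - t)

h = 1e-5
M1 = (M(h) - M(-h)) / (2 * h)             # central difference for M'(0)  = E[X]
M2 = (M(h) - 2 * M(0) + M(-h)) / h**2     # central difference for M''(0) = E[X^2]

mean = M1                                 # should be 1/lambda = 0.5
var = M2 - M1 ** 2                        # should be 1/lambda^2 = 0.25
print(mean, var)                          # approximately 0.5 and 0.25
```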
Main Theorems
MGF Uniqueness Theorem
Statement
If two random variables $X$ and $Y$ have moment generating functions $M_X$ and $M_Y$ that are finite and equal for all $t$ in some open interval $(-\delta, \delta)$ with $\delta > 0$, then $X$ and $Y$ have the same distribution.
Intuition
The MGF encodes the entire distribution, not just the moments. If two distributions agree on their MGFs in a neighborhood of zero, they must be the same distribution. This is stronger than moment matching: there exist distinct distributions with identical moments of all orders, but they cannot have identical MGFs in a neighborhood of zero.
Proof Sketch
The MGF is related to the characteristic function by analytic continuation. If $M_X$ is finite on $(-\delta, \delta)$, the characteristic function extends analytically to a strip in the complex plane. By the uniqueness theorem for characteristic functions (Lévy inversion), the distribution is determined.
Why It Matters
This theorem justifies the "MGF technique" for identifying distributions. If you compute the MGF of a sum of independent Gaussians and recognize it as the MGF of another Gaussian, you can conclude the sum is Gaussian. This approach is cleaner than convolution arguments.
Failure Mode
The MGF must exist in a neighborhood of 0, not just at 0 (where it always equals 1). Heavy-tailed distributions like the Cauchy distribution have no MGF. For those distributions, use the characteristic function instead, which always exists.
MGF of Independent Sum
Statement
If $X$ and $Y$ are independent random variables whose MGFs exist, then:

$$M_{X+Y}(t) = M_X(t)\, M_Y(t).$$
Intuition
Independence means $\mathbb{E}[f(X)g(Y)] = \mathbb{E}[f(X)]\,\mathbb{E}[g(Y)]$. Apply this with $f(x) = e^{tx}$ and $g(y) = e^{ty}$.
Proof Sketch
$M_{X+Y}(t) = \mathbb{E}[e^{t(X+Y)}] = \mathbb{E}[e^{tX} e^{tY}] = \mathbb{E}[e^{tX}]\,\mathbb{E}[e^{tY}] = M_X(t)\,M_Y(t)$, where the third equality uses independence.
Why It Matters
This is why MGFs are the natural tool for sums of independent variables. Addition of random variables corresponds to multiplication of MGFs. Taking logs: the cumulant generating function is additive for independent sums.
Failure Mode
Fails without independence. For dependent variables, $M_{X+Y}(t) \ne M_X(t)\,M_Y(t)$ in general.
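The product rule is easy to confirm by simulation. A Monte Carlo sketch with illustrative choices (Exponential rate 2 and standard normal, $t = 0.3$, a fixed seed), comparing the empirical MGF of the sum against the product of empirical MGFs and against the closed form:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000
t = 0.3                                   # a point where both MGFs are finite

# Independent X ~ Exponential(rate 2) and Y ~ N(0, 1).
X = rng.exponential(scale=0.5, size=n)    # rate 2  <->  scale 1/2
Y = rng.standard_normal(n)

mgf_sum = np.exp(t * (X + Y)).mean()      # empirical M_{X+Y}(t)
mgf_prod = np.exp(t * X).mean() * np.exp(t * Y).mean()

# Closed forms: M_X(t) = 2/(2 - t), M_Y(t) = exp(t^2/2).
exact = (2 / (2 - t)) * np.exp(t ** 2 / 2)
print(mgf_sum, mgf_prod, exact)           # all three agree to Monte Carlo error
```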
Canonical Examples
MGF of a Gaussian
Let $X \sim \mathcal{N}(\mu, \sigma^2)$. Then:

$$M_X(t) = \exp\!\left(\mu t + \frac{\sigma^2 t^2}{2}\right).$$

This exists for all $t \in \mathbb{R}$. Setting $\mu = 0$: $M_X(t) = e^{\sigma^2 t^2 / 2}$, which is the defining condition for sub-Gaussian random variables with parameter $\sigma^2$.
The Chernoff method in one line
For any $t > 0$: $\mathbb{P}(X \ge a) = \mathbb{P}(e^{tX} \ge e^{ta}) \le e^{-ta}\,\mathbb{E}[e^{tX}] = e^{-ta} M_X(t)$. The first step is monotonicity of $x \mapsto e^{tx}$; the second is Markov's inequality. Optimize over $t > 0$ to get the tightest bound. This is the entire Chernoff method.
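For a standard normal the optimized bound is $\inf_{t>0} e^{-ta} M_X(t) = e^{-a^2/2}$ at $t^\star = a$. A quick simulation sketch (threshold $a = 2$, sample size, and seed are illustrative) shows the bound is valid but not tight:

```python
import numpy as np

rng = np.random.default_rng(1)
a = 2.0
X = rng.standard_normal(2_000_000)

# Optimized Chernoff bound for N(0,1): inf_t e^{-ta} e^{t^2/2} = e^{-a^2/2}.
chernoff = np.exp(-a ** 2 / 2)
empirical = (X >= a).mean()

print(empirical, chernoff)   # empirical ~ 0.0228, bound = 0.1353: valid, not tight
```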
MGF Reference Table
The MGF of standard distributions, with domain of finiteness. Here $t \in \mathbb{R}$ unless restricted.
| Distribution | $M_X(t)$ | Domain (finite) |
|---|---|---|
| Bernoulli($p$) | $1 - p + p e^t$ | all $t$ |
| Binomial($n, p$) | $(1 - p + p e^t)^n$ | all $t$ |
| Poisson($\lambda$) | $\exp\!\big(\lambda(e^t - 1)\big)$ | all $t$ |
| Geometric($p$), support $\{1, 2, \dots\}$ | $\dfrac{p e^t}{1 - (1 - p)e^t}$ | $t < -\log(1 - p)$ |
| Exponential, rate $\lambda$ | $\dfrac{\lambda}{\lambda - t}$ | $t < \lambda$ |
| Gamma, shape $k$, rate $\lambda$ | $\left(\dfrac{\lambda}{\lambda - t}\right)^k$ | $t < \lambda$ |
| Chi-squared, $k$ d.o.f. | $(1 - 2t)^{-k/2}$ | $t < 1/2$ |
| Normal($\mu, \sigma^2$) | $\exp\!\big(\mu t + \sigma^2 t^2 / 2\big)$ | all $t$ |
| Uniform($a, b$) | $\dfrac{e^{tb} - e^{ta}}{t(b - a)}$ for $t \ne 0$, $1$ at $t = 0$ | all $t$ |
Bernoulli, Binomial, Poisson, and Normal have MGFs finite on all of $\mathbb{R}$. Geometric, Exponential, Gamma, and Chi-squared are only finite on a left half-line. Heavy-tailed distributions (Cauchy, Pareto with small shape, lognormal) have $M_X(t) = \infty$ for all $t > 0$.
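Entries in the table can be spot-checked by Monte Carlo. A sketch for two rows (the parameters $\lambda = 1.5$, rate 1, $t = 0.4$, and the seed are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(2)
n, t = 1_000_000, 0.4

# Poisson(lambda = 1.5): M(t) = exp(lambda (e^t - 1)), finite for all t.
lam = 1.5
pois = rng.poisson(lam, n)
pois_mc = np.exp(t * pois).mean()
pois_exact = np.exp(lam * (np.exp(t) - 1))
print(pois_mc, pois_exact)

# Exponential(rate 1): M(t) = 1/(1 - t), finite only for t < 1.
expo = rng.exponential(1.0, n)
expo_mc = np.exp(t * expo).mean()
expo_exact = 1 / (1 - t)
print(expo_mc, expo_exact)
```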
Cumulants and Cumulant Generating Function
Cumulant Generating Function
The cumulant generating function is $K_X(t) = \log M_X(t)$, defined on the open set where $M_X$ is finite. The $n$-th cumulant is $\kappa_n = K_X^{(n)}(0)$.
The low-order cumulants recover familiar summaries: $\kappa_1 = \mathbb{E}[X]$, $\kappa_2 = \mathrm{Var}(X)$, and $\kappa_3 = \mathbb{E}\big[(X - \mathbb{E}X)^3\big]$.
Skewness is $\kappa_3 / \kappa_2^{3/2}$ and excess kurtosis is $\kappa_4 / \kappa_2^2$. A Gaussian has $\kappa_n = 0$ for all $n \ge 3$, so nonzero higher cumulants measure non-Gaussianity.
The defining property: cumulants add under independent sums. If $X$ and $Y$ are independent, then $K_{X+Y}(t) = K_X(t) + K_Y(t)$, hence $\kappa_n(X+Y) = \kappa_n(X) + \kappa_n(Y)$ for every $n$. This is cleaner than the moment rule, which requires binomial sums. Cumulant additivity is exploited in independent component analysis (ICA), where sources are recovered by maximizing the absolute fourth cumulant of projections (an explicit non-Gaussianity measure) of the observed mixture.
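CGF additivity can be seen directly in simulation. A sketch with two independent Poissons ($\lambda = 2$ and $\lambda = 3$, $t = 0.3$, and the seed are illustrative): the empirical CGF of the sum matches the sum of the empirical CGFs, and both match the closed form $K(t) = \lambda(e^t - 1)$ with $\lambda = 5$.

```python
import numpy as np

rng = np.random.default_rng(4)
n, t = 1_000_000, 0.3
X = rng.poisson(2.0, n)
Y = rng.poisson(3.0, n)

# Empirical cumulant generating function evaluated at t.
K = lambda Z: np.log(np.exp(t * Z).mean())

K_sum = K(X + Y)
K_add = K(X) + K(Y)
K_exact = 5 * (np.exp(t) - 1)      # Poisson(5): K(t) = lambda (e^t - 1)
print(K_sum, K_add, K_exact)       # all three agree to Monte Carlo error
```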
Multivariate MGF
Multivariate MGF
For a random vector $X \in \mathbb{R}^d$, the multivariate moment generating function is:

$$M_X(t) = \mathbb{E}\left[e^{\langle t, X \rangle}\right], \qquad t \in \mathbb{R}^d,$$

on the set where the expectation is finite.
Key facts parallel the scalar case. If $M_X$ is finite in an open neighborhood of the origin in $\mathbb{R}^d$, it determines the joint distribution of $X$ uniquely. Mixed moments are recovered by partial derivatives: $\mathbb{E}\big[X_1^{k_1} \cdots X_d^{k_d}\big] = \partial_{t_1}^{k_1} \cdots \partial_{t_d}^{k_d} M_X(t)$ evaluated at $t = 0$. For a multivariate Gaussian $X \sim \mathcal{N}(\mu, \Sigma)$: $M_X(t) = \exp\!\left(\langle t, \mu \rangle + \tfrac{1}{2} t^\top \Sigma t\right)$, valid for all $t \in \mathbb{R}^d$. If all linear combinations $\langle a, X \rangle$ are Gaussian, $X$ is jointly Gaussian; this criterion is often easiest to check via the scalar MGF of $\langle a, X \rangle$.
When the multivariate MGF fails to exist, the multivariate characteristic function always does and plays the same role.
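The multivariate Gaussian MGF formula can be verified by simulation. A sketch with an illustrative mean, covariance, evaluation point $t$, and seed:

```python
import numpy as np

rng = np.random.default_rng(3)
mu = np.array([1.0, -0.5])
Sigma = np.array([[1.0, 0.3],
                  [0.3, 0.5]])
t = np.array([0.2, -0.1])

X = rng.multivariate_normal(mu, Sigma, size=1_000_000)

mc = np.exp(X @ t).mean()                      # empirical E[exp(<t, X>)]
exact = np.exp(t @ mu + 0.5 * t @ Sigma @ t)   # exp(<t,mu> + t'Sigma t / 2)
print(mc, exact)                               # agree to Monte Carlo error
```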
Connection to Concentration
MGFs power two adjacent machines. The Chernoff bound $\mathbb{P}(X \ge a) \le \inf_{t > 0} e^{-ta} M_X(t)$ is obtained by Markov on $e^{tX}$ and optimizing $t$, so every Chernoff bound is an MGF computation. Sub-Gaussian random variables are defined by the MGF inequality $\mathbb{E}[e^{t(X - \mathbb{E}X)}] \le e^{\sigma^2 t^2 / 2}$ for all $t \in \mathbb{R}$, and sub-exponential variables by a similar bound on a strip around 0. Concentration for bounded, Lipschitz, and light-tailed functionals reduces to bounding the CGF near 0.
Common Confusions
Moments existing does not imply MGF exists
A distribution can have all moments finite yet have no MGF. The lognormal distribution has $\mathbb{E}[X^k] < \infty$ for all $k$ but $\mathbb{E}[e^{tX}] = \infty$ for all $t > 0$. The MGF is a stronger condition than having all moments.
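This gap is visible numerically. For the standard lognormal ($X = e^Z$, $Z \sim \mathcal{N}(0,1)$), every moment has the closed form $\mathbb{E}[X^k] = e^{k^2/2}$, yet the truncated integral $\int_0^B e^{tx} p(x)\,dx$ grows without bound as $B$ increases. A sketch ($t = 0.5$ and the truncation points are illustrative choices):

```python
import math

import numpy as np

# All moments of the standard lognormal are finite: E[X^k] = e^{k^2/2}.
for k in range(1, 5):
    print(k, math.exp(k * k / 2))

# But E[e^{tX}] diverges for any t > 0: truncated integrals keep growing.
t = 0.5

def truncated_mgf(B, m=200_000):
    """Riemann-sum approximation of int_0^B e^{tx} p(x) dx for the lognormal."""
    x = np.linspace(1e-9, B, m)
    pdf = np.exp(-np.log(x) ** 2 / 2) / (x * math.sqrt(2 * math.pi))
    return float(np.sum(np.exp(t * x) * pdf) * (x[1] - x[0]))

vals = {B: truncated_mgf(B) for B in (10, 50, 100)}
print(vals)   # grows rapidly with B: no finite limit
```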
MGF vs characteristic function vs cumulant generating function
The MGF is $M_X(t) = \mathbb{E}[e^{tX}]$. The characteristic function is $\varphi_X(t) = \mathbb{E}[e^{itX}]$, which always exists. The cumulant generating function (CGF) is $K_X(t) = \log M_X(t)$. The CGF is additive for independent sums. In concentration inequality proofs, you work with the CGF.
Exercises
Problem
Compute the MGF of a Bernoulli($p$) random variable. Use it to find $\mathbb{E}[X]$ and $\mathrm{Var}(X)$.
Problem
Let $X_1, \dots, X_n$ be i.i.d. $\mathcal{N}(0, 1)$. Use MGFs to prove that $\sum_{i=1}^n X_i^2$ is distributed as $\chi^2_n$.
References
Canonical:
- Casella & Berger, Statistical Inference (2002), Chapter 2.3
- Billingsley, Probability and Measure (1995), Section 21
Current:
- Wainwright, High-Dimensional Statistics (2019), Chapters 2-3 (MGFs in concentration)
- Vershynin, High-Dimensional Probability (2018), Chapter 2 (sub-Gaussian MGF condition)
Last reviewed: April 26, 2026
Canonical graph
Required before and derived from this topic
These links come from prerequisite edges in the curriculum graph. Editorial suggestions are shown here only when the target page also cites this page as a prerequisite.
Required prerequisites
- Common Probability Distributions (layer 0A · tier 1)
- Expectation, Variance, Covariance, and Moments (layer 0A · tier 1)
- Exponential Function Properties (layer 0A · tier 1)
Derived topics
- Distributions Atlas (layer 0A · tier 1)
- Normal Distribution (layer 0A · tier 1)
- The Multivariate Normal Distribution (layer 0B · tier 1)
- Characteristic Functions (layer 1 · tier 1)
- Chernoff Bounds (layer 1 · tier 1)
- +7 more on the derived-topics page.