
Foundations

Skewness, Kurtosis, and Higher Moments

Distribution shape beyond mean and variance: skewness measures tail asymmetry, kurtosis measures tail extremeness, cumulants are the cleaner language, and heavy-tailed distributions break all of these.

Core · Tier 1 · Stable · Core spine · ~55 min

Why This Matters

Mean and variance tell you the center and spread of a distribution. They say nothing about shape. Two distributions can have identical mean and variance but completely different tail behavior, asymmetry, and outlier frequency. Skewness and kurtosis capture this shape information.

Most textbooks get the interpretation wrong. Kurtosis is not peakedness. It measures how much extreme values dominate the distribution. This page gives the correct interpretations and shows exactly when these statistics fail.

Correct Interpretations

Definition

Skewness

The third standardized moment:

$$\gamma_1 = \mathbb{E}\!\left[\left(\frac{X - \mu}{\sigma}\right)^3\right]$$

Correct interpretation: the third standardized moment. Positive skew means the right tail contributes more cubed-deviation mass (the heuristic "longer right tail" reading); negative skew is the mirror image. Zero skewness only means the third standardized moment is zero. It does not imply the distribution is symmetric, and it does not imply symmetric tails: there are asymmetric distributions whose positive and negative cubed deviations happen to cancel. Symmetric distributions have zero skewness whenever the third moment exists, but the converse fails.
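The converse failure is easy to exhibit numerically. A minimal sketch — the three-point distribution below is a hypothetical example I constructed so that the cubed deviations cancel exactly; it is not a standard named distribution:

```python
import numpy as np

# Hypothetical three-point distribution, chosen so the cubed deviations cancel:
# support {-2, 1, 3} with probabilities {0.4, 0.5, 0.1}.
x = np.array([-2.0, 1.0, 3.0])
p = np.array([0.4, 0.5, 0.1])

mu = np.sum(p * x)                                  # mean: 0.4*(-2) + 0.5*1 + 0.1*3 = 0
var = np.sum(p * (x - mu) ** 2)                     # variance
skew = np.sum(p * ((x - mu) / np.sqrt(var)) ** 3)   # third standardized moment

# The support is plainly asymmetric, yet the skewness is zero (up to float roundoff).
print(mu, skew)
```

The point of the construction: the single large positive deviation at 3 (small probability) balances the moderate negative deviation at −2 (large probability) once both are cubed.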

Watch Out

Skewness is not about which way the distribution leans

People say "right-skewed means the distribution leans right." Wrong. Right-skewed (positive skew) means the RIGHT TAIL is longer. The bulk of the mass is actually on the LEFT. Income distributions are right-skewed: most people earn moderate amounts, but a long right tail extends to very high incomes. The mean is pulled right of the median.
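To see the mean–median pull numerically, here is a quick sketch using a lognormal sample as a stand-in for income data (the parameters are arbitrary illustrations, not calibrated to any real income distribution):

```python
import numpy as np

rng = np.random.default_rng(0)
# Lognormal sample as a toy "income" distribution (parameters are arbitrary).
incomes = rng.lognormal(mean=10.0, sigma=1.0, size=100_000)

median = np.median(incomes)   # where the bulk of the mass sits
mean = np.mean(incomes)       # pulled right by the long upper tail

print(mean > median)  # True: the mean sits right of the median
```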

Definition

Kurtosis

The fourth standardized moment:

$$\mathrm{Kurt}[X] = \mathbb{E}\!\left[\left(\frac{X - \mu}{\sigma}\right)^4\right]$$

(Written here as $\mathrm{Kurt}[X]$ rather than $\kappa_4$, since $\kappa_4$ is reserved below for the fourth cumulant, a different quantity.)

The normal distribution has kurtosis exactly 3.

Correct interpretation: how much extreme values (outliers) dominate the distribution, compared to moderate deviations. High kurtosis means rare but extreme values contribute disproportionately. It is about TAIL WEIGHT, not peak shape.

Why Kurtosis Measures Tails, Not Peaks

The $z^4$ weighting makes extreme values dominate the kurtosis integral

[Figure: the $z^4$ weight plotted against z-score from $-3\sigma$ to $3\sigma$. Observations beyond roughly $\pm 2\sigma$ dominate the kurtosis integral ("tail contribution"), while observations near the mean contribute almost nothing ("near-zero contribution").]
Watch Out

Kurtosis is NOT peakedness

This is one of the most persistent wrong claims in statistics education. Kurtosis does not measure how peaked or flat the distribution is. A distribution can be flat-topped with high kurtosis (if it has heavy tails) or peaked with low kurtosis (if it has light tails). The fourth power $z^4$ amplifies extreme z-scores. Kurtosis measures the contribution of the tails, period.
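A quick numerical sketch of this point: draw from a heavy-tailed distribution, standardize, and ask what fraction of the $z^4$ mass comes from the small minority of observations beyond $2\sigma$. Student $t$ with $\nu = 5$ is used here purely as an illustrative heavy-tailed choice:

```python
import numpy as np

rng = np.random.default_rng(1)
z = rng.standard_t(df=5, size=200_000)
z = (z - z.mean()) / z.std()        # standardize the sample

contrib = z ** 4                    # each observation's kurtosis contribution
tail = np.abs(z) > 2.0

print(tail.mean())                          # only a few percent of observations...
print(contrib[tail].sum() / contrib.sum())  # ...supply most of the fourth-power mass
```

The handful of points beyond $2\sigma$ carry the bulk of the fourth-power sum, which is exactly why kurtosis is a tail statistic, not a peak statistic.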

Definition

Excess Kurtosis

$$\text{Excess kurtosis} = \mathbb{E}\!\left[\left(\frac{X - \mu}{\sigma}\right)^4\right] - 3$$

Subtracts the normal distribution's kurtosis as a baseline. The normal has excess kurtosis 0. Positive excess kurtosis means heavier tails than normal. Negative excess kurtosis (possible; the minimum is $-2$) means lighter tails than normal.

This is not a different concept. It is just kurtosis recentered so the Gaussian baseline is zero.

Definition

Coefficient of Variation

$$\mathrm{CV} = \frac{\sigma}{\mu}, \quad \text{often written } \frac{\sigma}{|\mu|} \text{ when } \mu \text{ can be negative}$$

Relative variability compared to the mean. Dimensionless. Useful for comparing spread across different scales (e.g., comparing variability of heights in centimeters vs. inches gives the same CV). The CV is meaningful mainly for positive-scale quantities where zero has a real interpretation; for sign-changing data, the absolute-value form is the standard convention but interpretation is still fragile.

When CV is useful: positive-scale data where zero has meaning (waiting times, concentrations, demand).

When CV is garbage: mean near zero (CV explodes), data crosses zero (interpretation breaks), or variance is infinite.
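These failure modes are easy to guard against in code. A small sketch — the function name and the guard conditions are my own choices here, not a standard API:

```python
import numpy as np

def coefficient_of_variation(x):
    """Sample CV = s / mean. Only trustworthy for positive-scale data."""
    x = np.asarray(x, dtype=float)
    mean = x.mean()
    # Refuse the cases called out above: sign-changing data or mean near zero.
    if np.any(x <= 0) or np.isclose(mean, 0.0):
        raise ValueError("CV unreliable: non-positive data or mean near zero")
    return x.std(ddof=1) / mean

# Dimensionless: the same waiting times in minutes and seconds give the same CV.
waiting_min = np.array([3.1, 4.7, 2.2, 6.0, 3.8])
waiting_sec = 60.0 * waiting_min

cv_min = coefficient_of_variation(waiting_min)
cv_sec = coefficient_of_variation(waiting_sec)
print(np.isclose(cv_min, cv_sec))  # True: CV is scale-invariant
```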

When Moments Exist and When They Do Not

Higher moments require stronger integrability conditions. They fail in a strict order: the fourth moment fails before the third, and the third before the second.

| Distribution | Mean? | Variance? | Skewness? | Kurtosis? |
| --- | --- | --- | --- | --- |
| Normal | Yes | Yes | Yes | Yes |
| Uniform | Yes | Yes | Yes | Yes |
| Laplace | Yes | Yes | Yes | Yes |
| Student $t$, $\nu = 5$ | Yes | Yes | Yes | Yes |
| Student $t$, $\nu = 4$ | Yes | Yes | Yes | No |
| Student $t$, $\nu = 3$ | Yes | Yes | No | No |
| Student $t$, $\nu = 2$ | Yes | No | No | No |
| Student $t$, $\nu = 1$ (Cauchy) | No | No | No | No |
| Pareto, $\alpha > 4$ | Yes | Yes | Yes | Yes |
| Pareto, $3 < \alpha \leq 4$ | Yes | Yes | Yes | No |
| Pareto, $2 < \alpha \leq 3$ | Yes | Yes | No | No |
| Pareto, $1 < \alpha \leq 2$ | Yes | No | No | No |
| Pareto, $\alpha \leq 1$ | No | No | No | No |

The rule for Student $t$: the $k$-th moment exists only if $\nu > k$.

The rule for Pareto: the $k$-th raw moment exists only if $\alpha > k$.

These two rules alone cover most cases you will encounter.
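The practical symptom of a nonexistent moment is that its sample estimate never settles down. A sketch comparing sample kurtosis of a Gaussian (fourth moment exists, estimate converges to 3) against Student $t$ with $\nu = 3$ (fourth moment does not exist):

```python
import numpy as np

rng = np.random.default_rng(42)

def sample_kurtosis(x):
    z = (x - x.mean()) / x.std()
    return float(np.mean(z ** 4))

for n in (10_000, 100_000, 1_000_000):
    g = rng.normal(size=n)             # 4th moment exists: estimate stabilizes near 3
    t3 = rng.standard_t(df=3, size=n)  # 4th moment diverges: estimate drifts and jumps
    print(n, sample_kurtosis(g), sample_kurtosis(t3))
```

The Gaussian column hovers near 3 at every sample size; the $t_3$ column is erratic and grows with $n$, because the estimate is dominated by whichever extreme observation the sample happened to contain.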

Cumulants: The Better Language

Moments mix information from lower orders into higher orders. Cumulants isolate the genuinely new information at each order.

Definition

Cumulants

Cumulants $\kappa_1, \kappa_2, \kappa_3, \ldots$ are defined through the cumulant generating function (the log of the moment generating function):

$$K(t) = \log M(t) = \log \mathbb{E}\left[e^{tX}\right] = \sum_{k=1}^{\infty} \kappa_k \frac{t^k}{k!}$$

The first four cumulants are:

| Cumulant | Value | Meaning |
| --- | --- | --- |
| $\kappa_1$ | Mean $\mu$ | Center |
| $\kappa_2$ | Variance $\sigma^2$ | Spread |
| $\kappa_3$ | Third central moment | Asymmetry |
| $\kappa_4$ | Fourth central moment minus $3\sigma^4$ | Tail departure from Gaussian |
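These low-order cumulants can be estimated directly with k-statistics, the standard unbiased estimators of cumulants, via `scipy.stats.kstat`. As a numerical sanity check, Exp(1) is convenient because its cumulants are simply $\kappa_k = (k-1)!$, i.e. 1, 1, 2, 6:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
# Exp(1) has cumulant generating function -log(1 - t), so kappa_k = (k-1)!.
x = rng.exponential(scale=1.0, size=500_000)

# scipy.stats.kstat(data, k) returns the k-statistic: an unbiased cumulant estimator.
for k, exact in zip(range(1, 5), (1, 1, 2, 6)):
    print(k, stats.kstat(x, k), "exact:", exact)
```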
Theorem

Gaussian Characterization via Cumulants

Statement

Suppose the moment generating function of $X$ is finite in a neighborhood of zero, so the cumulant generating function $K(t) = \log M(t)$ is analytic at $0$ and all cumulants $\kappa_k$ are well-defined. Under this regularity assumption, $X$ is Gaussian if and only if all cumulants of order $k \geq 3$ vanish: $\kappa_k = 0$ for all $k \geq 3$.

Without the MGF-finiteness assumption, "all cumulants vanish for $k \geq 3$" can be ill-defined (cumulants beyond a finite order may not exist) or insufficient on its own to pin down the distribution.

Intuition

The Gaussian is the only distribution that is "pure location and scale." Every other distribution carries additional shape information in its higher cumulants. If you measure any departure from Gaussianity, it shows up as a nonzero cumulant somewhere.

Why It Matters

This theorem is why cumulants are the natural language for measuring non-Gaussianity. The third cumulant measures asymmetry departure from Gaussian. The fourth cumulant measures tail departure. Each higher cumulant captures a new independent direction of non-Gaussianity. This is the theoretical basis for tests of normality and for independent component analysis (ICA).

Failure Mode

The theorem requires existence of the MGF in a neighborhood of zero, which excludes heavy-tailed distributions. For distributions where the MGF does not exist (e.g., Student $t$ with low degrees of freedom), cumulants beyond a certain order do not exist, and the characterization cannot be applied.

| Property | Moments | Cumulants |
| --- | --- | --- |
| Easy to define | Yes | Slightly less |
| Easy to interpret at low order | Yes | Yes |
| Clean under sums of independent variables | No | Yes ($\kappa_k$ is additive) |
| Redundant across orders | More | Less |
| Better for serious theory | Not really | Yes |

The additivity property is the main reason cumulants matter: if $X$ and $Y$ are independent, then $\kappa_k(X + Y) = \kappa_k(X) + \kappa_k(Y)$ for all $k$. Moments do not share this: raw moments are additive only for $k = 1$, and central moments lose additivity beyond $k = 3$.
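Additivity is easy to check empirically with k-statistics (`scipy.stats.kstat`, unbiased cumulant estimators). As an illustrative pair: Exp(1) has $\kappa_3 = 2$ and Gamma(3, 1) has $\kappa_3 = 3 \cdot 2! = 6$, so their independent sum should have $\kappa_3 \approx 8$:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n = 400_000
x = rng.exponential(size=n)        # kappa_3 = 2 for Exp(1)
y = rng.gamma(shape=3.0, size=n)   # kappa_3 = 3 * 2! = 6 for Gamma(3, 1)

lhs = stats.kstat(x + y, 3)                   # kappa_3 of the independent sum
rhs = stats.kstat(x, 3) + stats.kstat(y, 3)   # sum of the kappa_3's (exact value: 8)
print(lhs, rhs)
```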

Tail Probability: What Practitioners Actually Care About

Rather than memorizing kurtosis values, look at tail probabilities directly.

| Distribution | $P(\lvert X \rvert > 2\sigma)$ | $P(\lvert X \rvert > 3\sigma)$ | Interpretation |
| --- | --- | --- | --- |
| Normal | ~4.6% | ~0.27% | Baseline |
| Student $t$, $\nu = 5$ | ~5.0% | ~1.2% | Heavier tails |
| Laplace | ~5.9% | ~1.4% | Heavier tails |
| Uniform | 0% | 0% | Bounded, no tail events |

This table is more useful than raw kurtosis values because it shows what actually happens in practice: how often do extreme events occur?
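Tail probabilities like these can be computed directly from survival functions, after rescaling each distribution to unit variance — a sketch, with the scale factors following from $\mathrm{Var} = \nu/(\nu-2)$ for Student $t$ and $\mathrm{Var} = 2b^2$ for Laplace:

```python
import numpy as np
from scipy import stats

def two_sided_tail(dist, c):
    """P(|X| > c) for a symmetric, zero-mean distribution."""
    return 2.0 * dist.sf(c)

# Each distribution rescaled so its standard deviation is exactly 1.
dists = {
    "normal": stats.norm(),
    "t(5)": stats.t(df=5, scale=np.sqrt(3.0 / 5.0)),     # Var(t_5) = 5/3
    "laplace": stats.laplace(scale=1.0 / np.sqrt(2.0)),  # Var(Laplace) = 2b^2
}

for name, d in dists.items():
    print(name, two_sided_tail(d, 2.0), two_sided_tail(d, 3.0))
```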

Exercises

Exercise · Core

Problem

Compute the skewness and kurtosis of the exponential distribution with rate $\lambda = 1$. Is it right-skewed or left-skewed?

Exercise · Core

Problem

The Cauchy distribution has no finite mean. What happens if you compute the sample mean of 1000 Cauchy observations and repeat this experiment 100 times? What do you observe about the sample means?
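A sketch for running this experiment (results vary by seed; the point is what the spread of the 100 sample means looks like):

```python
import numpy as np

rng = np.random.default_rng(0)

# 100 repetitions: draw 1000 standard Cauchy observations, record the sample mean.
sample_means = np.array([rng.standard_cauchy(1000).mean() for _ in range(100)])

print(sample_means.min(), sample_means.max())
print(np.abs(sample_means).max())
```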

Exercise · Advanced

Problem

Prove that for independent random variables $X$ and $Y$, the cumulant of the sum equals the sum of cumulants: $\kappa_k(X + Y) = \kappa_k(X) + \kappa_k(Y)$.

References

Canonical:

  • Casella & Berger, Statistical Inference (2002), Chapter 2
  • DeCarlo, "On the Meaning and Use of Kurtosis" (Psychological Methods, 1997). The definitive correction to the "peakedness" myth.

Current:

  • Westfall, "Kurtosis as Peakedness, 1905-2014. R.I.P." (The American Statistician, 2014). Comprehensive debunking.
  • Stuart and Ord, Kendall's Advanced Theory of Statistics, Vol. 1: Distribution Theory (1994). Standard moments-and-cumulants reference.
  • McCullagh, Tensor Methods in Statistics (1987). Modern cumulant/tensor-moment treatment with Bell-polynomial moment-cumulant conversion.


Last reviewed: April 26, 2026
