Foundations
Cramér-Wold Theorem
A multivariate distribution is uniquely determined by all of its one-dimensional projections. This reduces multivariate convergence in distribution to checking univariate projections, and is the standard tool for proving the multivariate CLT.
Why This Matters
The central limit theorem in one dimension says $\sqrt{n}(\bar X_n - \mu) \xrightarrow{d} N(0, \sigma^2)$. But in statistics and ML, you almost always work with vectors: the MLE $\hat\theta_n \in \mathbb{R}^d$, the gradient $\nabla_\theta \ell(\theta)$, the sample covariance matrix entries. The multivariate CLT says $\sqrt{n}(\bar X_n - \mu) \xrightarrow{d} N(0, \Sigma)$, but proving convergence in distribution for random vectors is harder than for scalars.
The Cramér-Wold theorem solves this: to prove a random vector converges in distribution, it suffices to prove that every one-dimensional projection $t^\top X_n$ converges. This reduces a $d$-dimensional problem to infinitely many one-dimensional problems, each of which can be handled by the scalar CLT.
The Theorem
Cramér-Wold Theorem
Statement
Let $X_n$ and $X$ be random vectors in $\mathbb{R}^d$. Then $X_n \xrightarrow{d} X$ if and only if $t^\top X_n \xrightarrow{d} t^\top X$ for every $t \in \mathbb{R}^d$. In particular:
A multivariate distribution is uniquely determined by the collection of all its one-dimensional marginals (projections onto arbitrary directions).
Intuition
If two distributions agree on every 1D shadow (projection), they must be the same distribution. Conversely, if two sequences of distributions get close in every 1D shadow, they get close in the full $d$-dimensional space. The projection $t^\top X$ is a scalar random variable, so you can use all the scalar tools (characteristic functions, univariate CLT, moment conditions) to check convergence direction by direction.
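The fact that a projection $t^\top X$ is an ordinary scalar random variable, with variance $t^\top \Sigma t$, can be checked numerically. The covariance matrix and direction below are illustrative choices, not from the text:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative 3-d covariance matrix and projection direction.
Sigma = np.array([[2.0, 0.5, 0.0],
                  [0.5, 1.0, 0.3],
                  [0.0, 0.3, 1.5]])
t = np.array([1.0, -2.0, 0.5])

# Draw samples of X ~ N(0, Sigma) and project onto t.
X = rng.multivariate_normal(mean=np.zeros(3), cov=Sigma, size=200_000)
proj = X @ t                      # t^T X: a plain scalar random variable

# The scalar projection has variance t^T Sigma t.
print(proj.var())                 # empirical, close to the line below
print(t @ Sigma @ t)              # theoretical: 3.775 for these choices
```

Any scalar diagnostic (histogram, QQ-plot, moment check) can now be applied to `proj` directly.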
Proof Sketch
The characteristic function of $t^\top X$ is $\varphi_{t^\top X}(s) = E[e^{is\,t^\top X}] = \varphi_X(st)$, the characteristic function of $X$ evaluated at the vector $st$.
If $t^\top X_n \xrightarrow{d} t^\top X$ for all $t$, then $\varphi_{t^\top X_n}(s) \to \varphi_{t^\top X}(s)$ for each $s$, since $x \mapsto e^{isx}$ is bounded and continuous. But $\varphi_{X_n}(t) = \varphi_{t^\top X_n}(1)$, so $\varphi_{X_n}(t) \to \varphi_X(t)$ for all $t \in \mathbb{R}^d$. By the multivariate Lévy continuity theorem, $X_n \xrightarrow{d} X$.
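The identity $\varphi_{t^\top X}(s) = \varphi_X(st)$ can be sanity-checked by Monte Carlo for a Gaussian $X$, where $\varphi_X(u) = \exp(-\tfrac12 u^\top \Sigma u)$ is available in closed form. The specific $\Sigma$, $t$, and $s$ below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative choices: a 2-d Gaussian, one direction t, one frequency s.
Sigma = np.array([[1.0, 0.6],
                  [0.6, 2.0]])
t = np.array([0.7, -0.3])
s = 1.3

X = rng.multivariate_normal(np.zeros(2), Sigma, size=400_000)

# Characteristic function of the scalar projection t^T X, evaluated at s ...
ecf_projection = np.mean(np.exp(1j * s * (X @ t)))

# ... equals the characteristic function of X evaluated at the vector s*t.
# For N(0, Sigma) the latter is exp(-(st)^T Sigma (st) / 2) in closed form.
u = s * t
cf_closed_form = np.exp(-0.5 * u @ Sigma @ u)

print(abs(ecf_projection - cf_closed_form))  # small Monte Carlo error
```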
Why It Matters
The standard proof of the multivariate CLT uses Cramér-Wold: to show $\sqrt{n}(\bar X_n - \mu) \xrightarrow{d} N(0, \Sigma)$, fix any $t \in \mathbb{R}^d$ and note that $t^\top \bar X_n$ is a sample mean of scalars $t^\top X_i$ with variance $t^\top \Sigma t$. The scalar CLT gives $\sqrt{n}(t^\top \bar X_n - t^\top \mu) \xrightarrow{d} N(0, t^\top \Sigma t)$. Since this holds for all $t$, Cramér-Wold gives the full multivariate result.
This same technique proves asymptotic normality of multivariate MLE, multivariate delta method results, and joint convergence of multiple statistics.
Failure Mode
You must check ALL directions $t \in \mathbb{R}^d$, not just the coordinate directions. Checking only $t = e_1, \dots, e_d$ (the standard basis) establishes convergence of each coordinate marginally, but marginal convergence does not imply joint convergence. The full collection of projections captures the dependence structure that marginals miss.
Application: Multivariate CLT Proof
The multivariate CLT follows immediately from the scalar CLT plus Cramér-Wold:
1. Let $X_1, X_2, \dots$ be i.i.d. random vectors in $\mathbb{R}^d$ with mean $\mu$ and covariance $\Sigma$.
2. Fix any $t \in \mathbb{R}^d$. Define $Y_i = t^\top X_i$. Then $Y_1, Y_2, \dots$ are i.i.d. scalars with mean $t^\top \mu$ and variance $t^\top \Sigma t$.
3. By the scalar CLT: $\sqrt{n}(\bar Y_n - t^\top \mu) \xrightarrow{d} N(0, t^\top \Sigma t)$.
4. But $N(0, t^\top \Sigma t)$ is the distribution of $t^\top Z$ where $Z \sim N(0, \Sigma)$.
5. Since step 3 holds for all $t$, Cramér-Wold gives $\sqrt{n}(\bar X_n - \mu) \xrightarrow{d} N(0, \Sigma)$.
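The steps above can be checked by simulation: for each direction $t$, the standardized projected mean should have standard deviation close to $\sqrt{t^\top \Sigma t}$. The construction below (exponential building blocks, so the $X_i$ are decidedly non-Gaussian) is an illustrative choice:

```python
import numpy as np

rng = np.random.default_rng(2)

# Non-Gaussian i.i.d. vectors: X_i = (E1, E1 + E2) with E1, E2 ~ Exp(1),
# so mu = (1, 2) and Sigma = [[1, 1], [1, 2]] (an illustrative construction).
n, reps = 500, 5_000
E = rng.exponential(size=(reps, n, 2))
X = np.stack([E[:, :, 0], E[:, :, 0] + E[:, :, 1]], axis=2)

mu = np.array([1.0, 2.0])
Sigma = np.array([[1.0, 1.0],
                  [1.0, 2.0]])

for t in [np.array([1.0, 0.0]), np.array([0.0, 1.0]), np.array([2.0, -1.0])]:
    # Steps 2-3 of the proof: Y_i = t^T X_i, and sqrt(n)(Ybar_n - t^T mu)
    # should be approximately N(0, t^T Sigma t).
    Y_bar = (X @ t).mean(axis=1)               # one sample mean per replication
    Z = np.sqrt(n) * (Y_bar - t @ mu)
    print(t, Z.std(), np.sqrt(t @ Sigma @ t))  # empirical vs. theoretical sd
```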
This proof is a few lines once you have the scalar CLT and Cramér-Wold. Without Cramér-Wold, you would need to work directly with multivariate characteristic functions, which is messier.
Common Confusions
Marginal convergence is not the same as joint convergence
If $(X_n, Y_n) \xrightarrow{d} (X, Y)$, then $X_n \xrightarrow{d} X$ and $Y_n \xrightarrow{d} Y$ (marginals converge). But the converse is false: marginal convergence does not imply joint convergence. Cramér-Wold fixes this by checking ALL linear combinations, not just the individual coordinates. The projection $aX_n + bY_n$ captures the dependence between $X_n$ and $Y_n$.
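A minimal numeric illustration of why coordinate projections miss dependence, using two illustrative bivariate distributions that share $N(0,1)$ marginals:

```python
import numpy as np

rng = np.random.default_rng(3)
size = 100_000

# Two joints with identical N(0,1) marginals (illustrative):
# an independent pair vs. a perfectly correlated pair.
X_ind = rng.standard_normal(size)
Y_ind = rng.standard_normal(size)

X_cor = rng.standard_normal(size)
Y_cor = X_cor.copy()              # Y = X: same marginal, different joint

# Coordinate projections (t = e_1, t = e_2) cannot tell the joints apart:
print(X_ind.std(), X_cor.std())   # both close to 1
print(Y_ind.std(), Y_cor.std())   # both close to 1

# The diagonal projection a = b = 1 does: Var(X + Y) = 2 when the pair is
# independent but 4 when Y = X.
print((X_ind + Y_ind).var())      # close to 2
print((X_cor + Y_cor).var())      # close to 4
```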
Cramér-Wold does require all directions, but the proof handles them uniformly
The theorem requires convergence of $t^\top X_n$ for every $t \in \mathbb{R}^d$. In a symbolic proof you fix an arbitrary but unspecified $t$, derive $t^\top X_n \xrightarrow{d} t^\top X$ using parameters that depend symbolically on $t$ (e.g., the variance $t^\top \Sigma t$ in the multivariate CLT proof), and then conclude that the convergence holds for every $t$, because the argument did not use any specific value of $t$. This is what people informally mean by "checking a generic $t$": one symbolic argument that works for all directions.
What this is not: it is not "check a randomly chosen direction", it is not "check almost every direction", and it is not "check a finite set of directions". Cramér-Wold genuinely requires every $t \in \mathbb{R}^d$, and counterexamples exist where checking only the coordinate directions, or only a finite or measure-zero set of directions, is insufficient.
Exercises
Problem
Use the Cramér-Wold theorem to show that if $X_n \xrightarrow{d} X$ in $\mathbb{R}^d$ and $A$ is a fixed $k \times d$ matrix, then $AX_n \xrightarrow{d} AX$.
Problem
Give an example of random vectors $(X_n, Y_n)$ in $\mathbb{R}^2$ such that $X_n \xrightarrow{d} X$ and $Y_n \xrightarrow{d} Y$ marginally, but $(X_n, Y_n)$ does not converge in distribution to $(X, Y)$.
References
Canonical:
- Billingsley, Convergence of Probability Measures (2nd ed., 1999), Section 29
- van der Vaart, Asymptotic Statistics (1998), Theorem 2.4 (Cramér-Wold device)
- Durrett, Probability: Theory and Examples (5th ed., 2019), Theorem 3.9.5
Historical:
- Cramér & Wold, "Some Theorems on Distribution Functions" (1936)
Last reviewed: April 26, 2026
Canonical graph
Required before and derived from this topic.
Required prerequisites
- Central Limit Theorem (layer 0B · tier 1)
- Measure-Theoretic Probability (layer 0B · tier 1)
Derived topics
- Asymptotic Statistics: M-Estimators, Delta Method, LAN (layer 0B · tier 1)
- High-Dimensional Probability (Vershynin) (layer 2 · tier 1)