
Concentration Probability

Measure Concentration and Geometric Functional Analysis

High-dimensional geometry is counterintuitive: Lipschitz functions concentrate, random projections preserve distances, and most of a sphere's measure sits near the equator. Johnson-Lindenstrauss, Gaussian concentration, and Levy's lemma.

Advanced · Tier 1 · Stable · Core spine · ~70 min

Why This Matters

High-dimensional spaces behave nothing like low-dimensional ones. In Rd\mathbb{R}^d with d=1000d = 1000, a random Gaussian vector has norm close to d\sqrt{d} with overwhelming probability. Two independent random unit vectors are nearly orthogonal with overwhelming probability. A Lipschitz function of a Gaussian vector is close to its mean with overwhelming probability.
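These claims are easy to check empirically. The sketch below is illustrative only: the dimension, sample counts, and seed are my choices, not values from the page.

```python
import numpy as np

# Empirical sketch of the claims above; d and the sample counts are
# illustrative choices, not values from the text.
rng = np.random.default_rng(0)
d = 1000

X = rng.standard_normal((200, d))                 # 200 independent Gaussian vectors
norms = np.linalg.norm(X, axis=1) / np.sqrt(d)    # normalized norms, all near 1
norm_spread = norms.max() - norms.min()

U = X / np.linalg.norm(X, axis=1, keepdims=True)  # unit vectors
cosines = U[:100] @ U[100:].T                     # cosines between independent draws
typical_cosine = np.abs(cosines).mean()           # order 1/sqrt(d)

print(norm_spread, typical_cosine)
```

With d = 1000, the normalized norms span a range of roughly a tenth, and typical cosines are a few hundredths, both shrinking as d grows.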

Five-panel infographic on measure concentration in high dimensions: the Gaussian shell phenomenon (norm concentrates near sqrt(n)), almost-orthogonality of independent draws (cosine of order 1/sqrt(n)), Gaussian concentration of Lipschitz functions, geometric consequences (clustering in narrow shell, regular pairwise distances, Johnson-Lindenstrauss-style projections preserve structure), and why it matters (high-dim probability, random projections, statistical learning, optimization, modern ML).
In high dimensions, randomness does not spread out wildly. It concentrates around typical geometric behavior, which is what makes random projections, JL embeddings, and dimensionality reduction work.

These phenomena are not curiosities. They are the reason dimensionality reduction works, random projections preserve structure, and empirical averages concentrate. The Johnson-Lindenstrauss lemma, which enables practical dimensionality reduction from dd to O(logn/ε2)O(\log n / \varepsilon^2), is a direct consequence of Gaussian concentration.


High dimension turns randomness into a narrow geometric ledger

The proofs below keep repeating the same move: prove one random quantity is close to its mean, then pay a union-bound price to make the statement uniform over many objects.

  • Thin shell: the coordinates are random, but the normalized radius becomes predictable in high dimension.
  • Equator belt (most area is near any equator): a random point on a large sphere has a tiny projection onto any fixed direction.
  • Distance ledger (distances stay within 1±ε1 \pm \varepsilon): a random projection concentrates every pairwise distance once kk is large enough.
  • One variable: Gaussian concentration controls one Lipschitz observation of a random vector.
  • Many pairs: Johnson-Lindenstrauss repeats that bound over all point pairs.
  • Sphere: Levy's lemma says most surface area is trapped near the median level set.

The Visual Spine

The page has three objects to keep straight:

  1. A Gaussian vector has many random coordinates, but its radius is almost deterministic after normalization by d\sqrt d.
  2. A high-dimensional sphere puts most surface area near any equator, so a fixed coordinate or projection is usually small.
  3. A random projection preserves a finite cloud because every pairwise distance concentrates, and the union bound only costs logn\log n.

Use the High-Dimensional Probability Lab alongside this page. The lab shows the thin-shell picture, random matrix spectra, and spiked PCA thresholds; this page gives the theorem statements and proof templates behind those pictures.

Gaussian Concentration

Definition

Lipschitz Function

A function f:RdRf: \mathbb{R}^d \to \mathbb{R} is LL-Lipschitz if and only if

f(x)f(y)Lxy2|f(x)-f(y)| \leq L\|x-y\|_2

for all x,yx,y. The smallest such LL is the Lipschitz constant.

Theorem

Gaussian Concentration of Lipschitz Functions

Statement

Let XN(0,Id)X \sim \mathcal{N}(0, I_d) and f:RdRf: \mathbb{R}^d \to \mathbb{R} be LL-Lipschitz. Then for all t>0t > 0:

P ⁣(f(X)Ef(X)t)2exp ⁣(t22L2)\mathbb{P}\!\left(|f(X)-\mathbb{E}f(X)| \geq t\right) \leq 2\exp\!\left(-\frac{t^2}{2L^2}\right)

The function f(X)f(X) is sub-Gaussian with parameter LL.

Intuition

Each coordinate of XX contributes independently to f(X)f(X), and the Lipschitz condition limits how much each coordinate can influence the output. The dd independent contributions each have bounded effect, and their sum concentrates by the same mechanism as a sum of bounded random variables. The dimension dd does not appear in the bound.

Proof Sketch

Use the Gaussian Poincaré inequality:

Var(f(X))Ef(X)2L2.\operatorname{Var}(f(X)) \leq \mathbb{E}\|\nabla f(X)\|^2 \leq L^2.

For the tail bound, apply the Herbst argument: compute the log-MGF

ψ(λ)=logEexp ⁣(λ(f(X)Ef(X))),\psi(\lambda) = \log \mathbb{E}\exp\!\left(\lambda(f(X)-\mathbb{E}f(X))\right),

show ψ(λ)λ2L2/2\psi(\lambda)\leq \lambda^2L^2/2 using the Gaussian log-Sobolev inequality, and then apply the Chernoff bound.

Why It Matters

This single result has enormous consequences. It implies that the norm X\|X\| concentrates around d\sqrt{d}, inner products X,Y\langle X, Y \rangle concentrate around 0, and random projections preserve distances. Most of high-dimensional probability follows from applying this inequality to appropriately chosen Lipschitz functions.
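As a sanity check, the sketch below compares the empirical tail of the 1-Lipschitz function f(x) = ‖x‖ with the stated bound; the choices d = 500, t = 2, and the sample size are mine, for illustration.

```python
import numpy as np

# Monte Carlo sanity check of the tail bound for the 1-Lipschitz function
# f(x) = ||x||; the choices of d, t, and sample size are illustrative.
rng = np.random.default_rng(1)
d, n_samples, t = 500, 20_000, 2.0

norms = np.linalg.norm(rng.standard_normal((n_samples, d)), axis=1)
empirical_tail = np.mean(np.abs(norms - norms.mean()) >= t)
gaussian_bound = 2 * np.exp(-t**2 / 2)   # L = 1, so 2*exp(-t^2 / (2 L^2))

print(empirical_tail, "<=", gaussian_bound)
```

The empirical tail sits well below the bound; the bound is not tight for this particular f, but it holds in every dimension.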

Failure Mode

The result requires Gaussianity (or more generally, a distribution satisfying a log-Sobolev inequality). For heavy-tailed distributions, Lipschitz functions do not concentrate at the Gaussian rate. The bound also requires ff to be globally Lipschitz; local Lipschitz conditions are not sufficient.

Johnson-Lindenstrauss Lemma

Lemma

Johnson-Lindenstrauss Lemma

Statement

For any set of nn points x1,,xnRdx_1,\ldots,x_n\in \mathbb{R}^d and any ε(0,1)\varepsilon \in (0,1), there exists a linear map A:RdRkA: \mathbb{R}^d \to \mathbb{R}^k with

k=O ⁣(lognε2)k = O\!\left(\frac{\log n}{\varepsilon^2}\right)

such that for all pairs i,ji,j,

(1ε)xixj2AxiAxj2(1+ε)xixj2.(1-\varepsilon)\|x_i-x_j\|^2 \leq \|Ax_i-Ax_j\|^2 \leq (1+\varepsilon)\|x_i-x_j\|^2.

The map A=1kΠA = \frac{1}{\sqrt{k}} \Pi where Π\Pi is a k×dk \times d matrix with i.i.d. N(0,1)\mathcal{N}(0,1) entries works with high probability.

Intuition

A random projection of a vector xx has squared norm that concentrates around x2\|x\|^2. This is because

Ax2=1kj=1k(πjx)2,πjxN(0,x2).\|Ax\|^2 = \frac{1}{k}\sum_{j=1}^k (\pi_j^\top x)^2, \qquad \pi_j^\top x \sim \mathcal{N}(0,\|x\|^2).

The sum of kk squared Gaussians concentrates around its mean by chi-squared concentration. Taking a union bound over all (n2)\binom{n}{2} pairs gives the result.

Proof Sketch

Fix a single pair (xi,xj)(x_i,x_j) and let v=xixjv=x_i-x_j. Then

Av2=v2k=1kZ2,ZN(0,1)\|Av\|^2 = \frac{\|v\|^2}{k}\sum_{\ell=1}^k Z_\ell^2, \qquad Z_\ell \sim \mathcal{N}(0,1)

i.i.d. By sub-exponential concentration of chi-squared random variables,

P ⁣(Av2v21>ε)2exp(ckε2).\mathbb{P}\!\left( \left|\frac{\|Av\|^2}{\|v\|^2}-1\right|>\varepsilon \right) \leq 2\exp(-ck\varepsilon^2).

For ε<1\varepsilon<1, choosing k=Clogn/ε2k=C\log n/\varepsilon^2 and applying a union bound over (n2)<n2\binom{n}{2}<n^2 pairs gives failure probability at most 2n2exp(cClogn)2n^2\exp(-cC\log n), which is small for CC large enough.

Why It Matters

This lemma says that nn points in arbitrarily high dimension can be embedded into O(logn/ε2)O(\log n/\varepsilon^2) dimensions while approximately preserving all pairwise distances. The target dimension depends only on nn and ε\varepsilon, not on the original dimension dd. This is the theoretical foundation for random projection methods in dimensionality reduction, approximate nearest neighbor search, and compressed sensing.
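A minimal sketch of the Gaussian construction A = Π/√k from the lemma statement; n, d, ε, and the constant 24 in the choice of k are illustrative, not dictated by the text.

```python
import numpy as np

# Sketch of the Gaussian projection A = Pi / sqrt(k); n, d, eps, and the
# constant 24 in k are illustrative choices, not from the text.
rng = np.random.default_rng(2)
n, d, eps = 50, 10_000, 0.5
k = int(24 * np.log(n) / eps**2)         # k = O(log n / eps^2)

X = rng.standard_normal((n, d))          # n points in R^d
A = rng.standard_normal((k, d)) / np.sqrt(k)
Y = X @ A.T                              # projected points in R^k

# Check every pairwise squared distance is preserved up to (1 +/- eps).
i, j = np.triu_indices(n, k=1)
orig = np.sum((X[i] - X[j])**2, axis=1)
proj = np.sum((Y[i] - Y[j])**2, axis=1)
ratios = proj / orig
max_distortion = max(ratios.max() - 1, 1 - ratios.min())

print(k, max_distortion)
```

Here k is a few hundred while d is ten thousand, and every one of the 1225 pairwise distances is preserved within the requested factor.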

Failure Mode

The O(logn/ε2)O(\log n/\varepsilon^2) target dimension is tight: there exist point sets requiring Ω(logn/ε2)\Omega(\log n/\varepsilon^2) dimensions for any linear map. For ε\varepsilon very small, the target dimension can still be large. The lemma also only preserves Euclidean distances; other metrics require different techniques.

Example

From one vector to all pairs

Gaussian concentration usually starts with one random quantity. For JL, the one-pair statement is:

P ⁣(A(xixj)2xixj21>ε)2exp(ckε2).\mathbb{P}\!\left( \left|\frac{\|A(x_i-x_j)\|^2}{\|x_i-x_j\|^2}-1\right|>\varepsilon \right) \leq 2\exp(-ck\varepsilon^2).

There are fewer than n2n^2 pairs. A union bound therefore asks for

n2exp(ckε2)1.n^2\exp(-ck\varepsilon^2) \ll 1.

Solving this inequality gives klogn/ε2k \gtrsim \log n/\varepsilon^2. The dimension reduction theorem is not magic: it is one chi-squared tail bound plus a union bound over pairs.
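The union-bound arithmetic can be packaged as a one-line dimension calculator; the constant c = 1/8 below is a typical chi-squared tail constant chosen for illustration, and delta is a hypothetical failure probability, neither taken from the text.

```python
import math

# Solve n^2 * exp(-c k eps^2) <= delta for k; c = 1/8 is an illustrative
# chi-squared tail constant, delta an illustrative failure probability.
def jl_dimension(n, eps, delta=0.01, c=1/8):
    # Require n^2 * exp(-c k eps^2) <= delta, i.e.
    # k >= log(n^2 / delta) / (c eps^2).
    return math.ceil(math.log(n**2 / delta) / (c * eps**2))

k = jl_dimension(n=100_000, eps=0.1)
print(k)
```

Note the ε⁻² factor dominates: halving ε quadruples k, while squaring n merely doubles the log term.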

Concentration on the Sphere

Lemma

Levy's Lemma (Concentration on the Sphere)

Statement

Let XX be uniformly distributed on Sd1S^{d-1} and f:Sd1Rf: S^{d-1} \to \mathbb{R} be LL-Lipschitz. Let MfM_f be the median of f(X)f(X). Then:

P ⁣(f(X)Mft)2exp ⁣((d1)t22L2).\mathbb{P}\!\left(|f(X)-M_f|\geq t\right) \leq 2\exp\!\left(-\frac{(d-1)t^2}{2L^2}\right).

Concentration improves with dimension: the exponent grows like d1d-1, so higher dimension means tighter bounds.

Intuition

On a high-dimensional sphere, most of the surface area is concentrated near any equator. A Lipschitz function cannot vary much on this concentrated strip. The effective number of "independent directions" on the sphere is d1d-1, which is why the exponent contains d1d-1.

Proof Sketch

The proof uses the spherical isoperimetric inequality: among all sets of a given measure on Sd1S^{d-1}, spherical caps have the smallest boundary. For a cap of measure 1/21/2 (the hemisphere), the tt-enlargement has measure at least

1exp ⁣((d1)t22).1-\exp\!\left(-\frac{(d-1)t^2}{2}\right).

For a Lipschitz function, the set {fMf+t}\{f \leq M_f+t\} contains the t/Lt/L-enlargement of {fMf}\{f \leq M_f\}.

Why It Matters

Levy's lemma makes precise the idea that high-dimensional spheres are "effectively one-dimensional" from the perspective of Lipschitz functions. Any continuous measurement on a high-dimensional sphere will give nearly the same value for almost every point. This is a key ingredient in proofs of the Johnson-Lindenstrauss lemma via spherical projections.
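A quick empirical check, with d and the sample count chosen for illustration: sampling uniform points on the sphere by normalizing Gaussians and applying the 1-Lipschitz function f(x) = x₁ (projection onto a fixed direction) shows fluctuations of order 1/√d.

```python
import numpy as np

# Empirical check with illustrative d and sample count: normalize
# Gaussians to get uniform points on S^{d-1}, then apply the
# 1-Lipschitz function f(x) = x_1 (projection onto a fixed direction).
rng = np.random.default_rng(3)
d, n_samples = 1000, 20_000

G = rng.standard_normal((n_samples, d))
S = G / np.linalg.norm(G, axis=1, keepdims=True)   # uniform on the sphere

spread = S[:, 0].std()    # fluctuation of f(X); should be about 1/sqrt(d)
print(spread, 1 / np.sqrt(d))
```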

Failure Mode

The bound degrades for non-Lipschitz functions. For discontinuous functions or functions with large Lipschitz constant relative to the sphere's radius, the concentration is useless. The bound is also trivial in low ambient dimension: under the convention XUnif(Sd1)X \sim \operatorname{Unif}(S^{d-1}), the case d=1d=1 gives S0={1,+1}S^0=\{-1,+1\}, two points rather than a circle. The circle is S1S^1, corresponding to d=2d=2. For d=2d=2, the exponent is only

(d1)t22=t22,-\frac{(d-1)t^2}{2}=-\frac{t^2}{2},

so the bound gives the usual one-dimensional Gaussian tail rate, with no dimensional improvement. The exp((d1)t2/2)\exp(-(d-1)t^2/2) concentration becomes useful once dd is large.

Why High-Dimensional Geometry is Counterintuitive

Three specific phenomena that defy low-dimensional intuition:

  1. Norm concentration: if XN(0,Id)X \sim \mathcal{N}(0,I_d), then X/d1\|X\|/\sqrt{d}\to 1 in probability. The norm is approximately deterministic.

  2. Near-orthogonality: if X,YN(0,Id)X,Y \sim \mathcal{N}(0,I_d) independently, then X,Y/(XY)\langle X, Y \rangle / (\|X\|\|Y\|) has standard deviation 1/d1/\sqrt{d}. Random vectors are nearly orthogonal.

  3. Volume in corners: most of the volume of a high-dimensional cube [1,1]d[-1, 1]^d lies far from the center, toward the corners; a typical uniform point has norm about d/3\sqrt{d/3}, while the inscribed unit ball has negligible volume for large dd.
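The third phenomenon is easy to see by simulation; d and the sample size below are arbitrary illustrative choices.

```python
import numpy as np

# Monte Carlo sketch of the cube phenomenon; d and the sample count are
# arbitrary illustrative choices.
rng = np.random.default_rng(4)
d, n_samples = 50, 100_000

U = rng.uniform(-1, 1, size=(n_samples, d))   # uniform points in [-1, 1]^d
norms = np.linalg.norm(U, axis=1)

inside_ball = np.mean(norms <= 1.0)   # fraction inside the inscribed unit ball
typical_norm = norms.mean()           # concentrates near sqrt(d/3)

print(inside_ball, typical_norm)
```

At d = 50, not a single sample of the hundred thousand lands inside the inscribed unit ball, and the typical norm already exceeds 4.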

Common Confusions

Watch Out

JL says nothing about structure preservation beyond distances

The JL lemma preserves pairwise Euclidean distances, and only that. It makes no direct guarantee about cluster structure, manifold geometry, or topological properties; those survive only to the extent that pairwise distances determine them. For structure beyond distances, you need methods like t-SNE or UMAP that optimize different objectives.

Watch Out

A dimension-free Gaussian concentration bound does not make dimension irrelevant

The bound

P ⁣(f(X)Ef(X)t)2et2/(2L2)\mathbb{P}\!\left(|f(X)-\mathbb{E}f(X)|\geq t\right) \leq 2e^{-t^2/(2L^2)}

does not contain dd, but the Lipschitz constant LL and the mean Ef(X)\mathbb{E}f(X) can depend on dd. For example, X\|X\| is 11-Lipschitz with mean approximately d\sqrt{d}. The bound says X\|X\| deviates from d\sqrt{d} by at most O(1)O(1), which is a relative deviation of O(1/d)O(1/\sqrt{d}).
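A small experiment makes this concrete (dimensions, sample counts, and seed are my choices): the mean absolute deviation of ‖X‖ from √d stays O(1) as d grows, while the relative deviation shrinks.

```python
import numpy as np

# Illustrative experiment: the mean absolute deviation | ||X|| - sqrt(d) |
# stays O(1) as d grows, so the relative deviation shrinks like 1/sqrt(d).
rng = np.random.default_rng(5)
abs_devs = {}
for d in [100, 10_000, 100_000]:
    norms = np.linalg.norm(rng.standard_normal((50, d)), axis=1)
    abs_devs[d] = np.abs(norms - np.sqrt(d)).mean()
    print(d, abs_devs[d], abs_devs[d] / np.sqrt(d))
```

The absolute deviation hovers around the same constant across three orders of magnitude in d; only the relative deviation improves.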

Summary

  • Lipschitz functions of Gaussians are sub-Gaussian with parameter equal to the Lipschitz constant; no dependence on dimension.
  • JL: nn points in Rd\mathbb{R}^d can be projected to O(logn/ε2)O(\log n/\varepsilon^2) dimensions, preserving distances up to (1±ε)(1\pm\varepsilon).
  • Levy's lemma: on Sd1S^{d-1}, concentration improves with dimension.
  • High-dimensional norm concentrates: Xd\|X\| \approx \sqrt{d} for XN(0,Id)X \sim \mathcal{N}(0,I_d).
  • Random vectors are nearly orthogonal in high dimensions.

Exercises

ExerciseCore

Problem

Let XN(0,Id)X \sim \mathcal{N}(0,I_d). Show that f(X)=Xf(X)=\|X\| is 11-Lipschitz and use Gaussian concentration to bound

P ⁣(XEXt).\mathbb{P}\!\left(\left|\|X\|-\mathbb{E}\|X\|\right|\geq t\right).
ExerciseAdvanced

Problem

Prove the JL lemma for a single pair: if vRdv \in \mathbb{R}^d and A=1kΠA = \frac{1}{\sqrt{k}}\Pi with Π\Pi having i.i.d. N(0,1)\mathcal{N}(0,1) entries, show that

P ⁣(Av2v21>ε)2exp(ckε2)\mathbb{P}\!\left( \left|\frac{\|Av\|^2}{\|v\|^2}-1\right|>\varepsilon \right) \leq 2\exp(-ck\varepsilon^2)

for ε(0,1)\varepsilon \in (0,1).

References

  • Boucheron, Lugosi, and Massart, Concentration Inequalities, Chapter 5.
  • Wainwright, High-Dimensional Statistics (2019), Chapter 2.
  • van Handel, Probability in High Dimension (2016), Chapters 1-3.


Last reviewed: April 21, 2026
