Variance Reduction Techniques
Get the same accuracy with fewer samples by exploiting correlation, known quantities, and stratification. Antithetic variates, control variates, stratification, and Rao-Blackwellization.
Why This Matters
Monte Carlo estimation has a fundamental limitation: the standard error of the mean decreases as $O(1/\sqrt{n})$. To halve the error, you need four times as many samples. Variance reduction techniques break this bottleneck by using structure in the problem to get more information per sample. In Bayesian inference, reinforcement learning, and simulation, these techniques can reduce computation by orders of magnitude.
Mental Model
Imagine estimating the average height of people in a city by random sampling. Naive: pick people at random. Smarter: sample equal numbers from each neighborhood (stratification). Even smarter: if you know the average income of each neighborhood and income correlates with height, use that information to adjust your estimate (control variates). The idea is always the same: use what you already know to reduce uncertainty in what you do not.
Formal Setup and Notation
We want to estimate $\mu = \mathbb{E}[f(X)]$ where $X \sim p$. The naive Monte Carlo estimator is:

$$\hat{\mu}_{\mathrm{MC}} = \frac{1}{n} \sum_{i=1}^{n} f(X_i)$$

with variance $\mathrm{Var}(\hat{\mu}_{\mathrm{MC}}) = \sigma^2 / n$, where $\sigma^2 = \mathrm{Var}(f(X))$. Variance reduction constructs an alternative estimator $\tilde{\mu}$ with $\mathrm{Var}(\tilde{\mu}) < \sigma^2 / n$ while keeping $\mathbb{E}[\tilde{\mu}] = \mu$.
Antithetic Variates
Generate pairs $(X_i, X_i')$ that are negatively correlated but each marginally distributed as $p$. The estimator:

$$\hat{\mu}_{\mathrm{AV}} = \frac{1}{2n} \sum_{i=1}^{n} \left( f(X_i) + f(X_i') \right)$$

has variance $\frac{\sigma^2}{2n} \left( 1 + \mathrm{Corr}(f(X), f(X')) \right)$. When $\mathrm{Corr}(f(X), f(X')) < 0$, this variance is less than $\sigma^2 / 2n$, the variance of $2n$ independent samples.

For example, if $X \sim \mathrm{Uniform}(0, 1)$, set $X' = 1 - X$. Then $X'$ is also $\mathrm{Uniform}(0, 1)$ but negatively correlated with $X$. If $f$ is monotone, $f(X)$ and $f(1 - X)$ are negatively correlated, so the variance drops.
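A minimal sketch of this recipe in Python (the integrand $e^u$ and the sample sizes are illustrative choices, not from the text): antithetic pairs $(U, 1 - U)$ against the same evaluation budget of $2n$ independent draws.

```python
import math
import random

random.seed(0)

def f(u):
    # Monotone test integrand with known mean: E[f(U)] = e - 1 for U ~ Uniform(0,1)
    return math.exp(u)

def mean(xs):
    return sum(xs) / len(xs)

def var(xs):
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

n = 50_000

# Naive Monte Carlo: 2n independent draws, averaged in pairs for a fair comparison
naive_pairs = [0.5 * (f(random.random()) + f(random.random())) for _ in range(n)]

# Antithetic: n pairs (U, 1 - U), same total budget of 2n evaluations
anti_pairs = [0.5 * (f(u) + f(1.0 - u)) for u in (random.random() for _ in range(n))]

print(mean(anti_pairs))                    # close to e - 1 ≈ 1.7183
print(var(anti_pairs) / var(naive_pairs))  # well below 1: the antithetic pairs win
```

Because $e^u$ is increasing, $f(U)$ and $f(1 - U)$ are strongly negatively correlated, so the per-pair variance drops by more than an order of magnitude here.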
Control Variates
Let $g$ be a function whose expectation $\theta = \mathbb{E}[g(X)]$ is known. The control variate estimator is:

$$\hat{\mu}_{\mathrm{CV}} = \frac{1}{n} \sum_{i=1}^{n} \left( f(X_i) - \beta \left( g(X_i) - \theta \right) \right)$$

for some coefficient $\beta$. This is unbiased for any $\beta$ because $\mathbb{E}[g(X) - \theta] = 0$. Its variance is:

$$\mathrm{Var}(\hat{\mu}_{\mathrm{CV}}) = \frac{1}{n} \left( \mathrm{Var}(f) - 2\beta \, \mathrm{Cov}(f, g) + \beta^2 \, \mathrm{Var}(g) \right)$$
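A hedged sketch of this estimator (the integrand, the control $g(u) = u$, and the fixed coefficient are illustrative choices): because $\theta = \mathbb{E}[U] = 1/2$ is known exactly, the adjustment term has mean zero and the estimator stays unbiased for any $\beta$.

```python
import math
import random

random.seed(1)

f = math.exp              # target: E[f(U)] = e - 1 for U ~ Uniform(0,1)
g = lambda u: u           # control variate with analytically known mean
theta = 0.5               # theta = E[g(U)]
beta = 1.7                # any fixed beta keeps the estimator unbiased

def mean(xs):
    return sum(xs) / len(xs)

def var(xs):
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

n = 100_000
us = [random.random() for _ in range(n)]

plain = [f(u) for u in us]
cv = [f(u) - beta * (g(u) - theta) for u in us]

print(mean(plain), mean(cv))   # both close to e - 1 ≈ 1.7183
print(var(cv) / var(plain))    # far below 1: strong variance reduction
```

Here $f$ and $g$ are highly correlated, so even a hand-picked $\beta$ near the optimum removes most of the variance; choosing $\beta$ optimally is the subject of the theorem below.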
Stratified Sampling
Partition the sample space into disjoint strata $S_1, \dots, S_K$ with $w_k = P(X \in S_k)$. Sample $n_k$ points from each stratum (the conditional distribution $p(x \mid x \in S_k)$) and combine:

$$\hat{\mu}_{\mathrm{strat}} = \sum_{k=1}^{K} w_k \cdot \frac{1}{n_k} \sum_{i=1}^{n_k} f(X_{k,i})$$

Under proportional allocation $n_k = n w_k$, the variance is $\frac{1}{n} \sum_k w_k \sigma_k^2$, where $\sigma_k^2 = \mathrm{Var}(f(X) \mid X \in S_k)$. This is always at most $\sigma^2 / n$, with strict improvement whenever the stratum means $\mu_k$ differ (the reduction equals $\frac{1}{n} \sum_k w_k (\mu_k - \mu)^2$, the between-stratum variance). With arbitrary $n_k$, stratified sampling can be worse than naive Monte Carlo; Neyman-optimal allocation $n_k \propto w_k \sigma_k$ is optimal when the $\sigma_k$ are known.
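A sketch with assumed choices ($f(x) = e^x$ on $[0, 1]$, $K = 10$ equal-width strata, proportional allocation): each estimator is replicated many times so the empirical variances can be compared directly.

```python
import math
import random

random.seed(2)

f = math.exp
K = 10            # number of equal-probability strata on [0, 1]
n = 1_000         # total samples per estimate
nk = n // K       # proportional allocation: n_k = n * w_k with w_k = 1/K

def naive_estimate():
    return sum(f(random.random()) for _ in range(n)) / n

def stratified_estimate():
    total = 0.0
    for k in range(K):
        # Sample uniformly within stratum [k/K, (k+1)/K)
        total += sum(f((k + random.random()) / K) for _ in range(nk)) / nk
    return total / K  # combine with weights w_k = 1/K

def var(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

trials = 500
naive = [naive_estimate() for _ in range(trials)]
strat = [stratified_estimate() for _ in range(trials)]

print(var(strat) / var(naive))  # well below 1: only within-stratum variance remains
```

The between-stratum variance of $e^x$ dominates on $[0, 1]$, so pinning down how many samples land in each stratum removes most of the estimator's variance.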
Rao-Blackwellization
If $X = (Y, Z)$ and you can compute $\mathbb{E}[f(X) \mid Y]$ analytically, then replace $f(X_i)$ with $\mathbb{E}[f(X) \mid Y_i]$:

$$\hat{\mu}_{\mathrm{RB}} = \frac{1}{n} \sum_{i=1}^{n} \mathbb{E}[f(X) \mid Y_i]$$

By the law of total variance, $\mathrm{Var}(\mathbb{E}[f(X) \mid Y]) \le \mathrm{Var}(f(X))$. Conditioning out part of the randomness always reduces variance.
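A sketch under an assumed setup: estimating $P(Y + Z > 2)$ with $Y, Z \sim \mathcal{N}(0, 1)$ i.i.d. The crude estimator averages indicators $f(Y, Z)$; the Rao-Blackwellized one averages $\mathbb{E}[f \mid Y] = 1 - \Phi(2 - Y)$, with $Z$ integrated out analytically.

```python
import math
import random

random.seed(3)

def Phi(x):
    # Standard normal CDF via the error function
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

c = 2.0
n = 50_000

crude, rb = [], []
for _ in range(n):
    y = random.gauss(0.0, 1.0)
    z = random.gauss(0.0, 1.0)
    crude.append(1.0 if y + z > c else 0.0)  # f(Y, Z): raw indicator
    rb.append(1.0 - Phi(c - y))              # E[f | Y]: Z averaged out exactly

def mean_var(xs):
    m = sum(xs) / len(xs)
    v = sum((x - m) ** 2 for x in xs) / (len(xs) - 1)
    return m, v

m_crude, v_crude = mean_var(crude)
m_rb, v_rb = mean_var(rb)
# True value: P(N(0, 2) > 2) = 1 - Phi(sqrt(2)) ≈ 0.0786
print(m_crude, m_rb)
print(v_rb / v_crude)  # below 1, as the Rao-Blackwell theorem guarantees
```

Both estimators are unbiased for the same probability; the Rao-Blackwellized one simply discards the sampling noise contributed by $Z$.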
Main Theorems
Optimal Control Variate Coefficient
Statement
The variance of the control variate estimator is minimized by:

$$\beta^* = \frac{\mathrm{Cov}(f(X), g(X))}{\mathrm{Var}(g(X))}$$

The minimum variance is:

$$\mathrm{Var}(\hat{\mu}_{\mathrm{CV}}^*) = \frac{\sigma^2}{n} \left( 1 - \rho^2 \right)$$

where $\rho$ is the correlation between $f(X)$ and $g(X)$.
Intuition
The optimal $\beta^*$ is the regression coefficient of $f(X)$ on $g(X)$. The variance reduction factor is $1 - \rho^2$: if $f$ and $g$ are highly correlated ($|\rho| \approx 1$), almost all variance is eliminated. The control variate is like subtracting the "explained" part of $f$.
Proof Sketch
$\mathrm{Var}(\hat{\mu}_{\mathrm{CV}}) = \frac{1}{n} \left( \mathrm{Var}(f) - 2\beta \, \mathrm{Cov}(f, g) + \beta^2 \, \mathrm{Var}(g) \right)$. This is a quadratic in $\beta$ with minimum at $\beta^* = \mathrm{Cov}(f, g) / \mathrm{Var}(g)$. Substitute back to get $\frac{\sigma^2}{n} (1 - \rho^2)$.
Why It Matters
This tells you exactly how powerful a control variate will be: it depends entirely on the correlation $\rho$. With correlation $\rho$, you reduce variance by the factor $1 - \rho^2$, equivalent to using $1/(1 - \rho^2)$ times more samples. In practice, you estimate $\beta^*$ from data, which adds small overhead.
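The plug-in estimate of $\beta^*$ uses sample covariance and variance. A sketch with assumed choices ($f(u) = e^u$, $g(u) = u$, $U \sim \mathrm{Uniform}(0, 1)$), which also reports the predicted reduction factor $1 - \hat{\rho}^2$:

```python
import math
import random

random.seed(4)

f = math.exp
g = lambda u: u
theta = 0.5                 # known mean of g under Uniform(0, 1)

n = 100_000
us = [random.random() for _ in range(n)]
fs = [f(u) for u in us]
gs = [g(u) for u in us]

mf = sum(fs) / n
mg = sum(gs) / n
cov_fg = sum((a - mf) * (b - mg) for a, b in zip(fs, gs)) / (n - 1)
var_f = sum((a - mf) ** 2 for a in fs) / (n - 1)
var_g = sum((b - mg) ** 2 for b in gs) / (n - 1)

beta_hat = cov_fg / var_g                 # plug-in estimate of beta*
rho_sq = cov_fg ** 2 / (var_f * var_g)    # squared sample correlation

cv_estimate = mf - beta_hat * (mg - theta)
print(beta_hat)       # near the exact beta* = (1 - (e - 1)/2) / (1/12) ≈ 1.690
print(1.0 - rho_sq)   # predicted variance reduction factor, ≈ 0.016
print(cv_estimate)    # near e - 1
```

Estimating $\beta^*$ from the same samples costs one extra pass over the data and, as noted below, introduces only a negligible finite-sample bias.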
Failure Mode
If $\rho = 0$, the control variate does nothing. Also, estimating $\beta^*$ from the same samples introduces a small bias in finite samples (the product of two estimated quantities). With large $n$, this bias is negligible.
Rao-Blackwell Theorem (Variance Reduction)
Statement
Let $h(Y) = \mathbb{E}[f(X) \mid Y]$. Then:

$$\mathrm{Var}(h(Y)) \le \mathrm{Var}(f(X))$$

with equality if and only if $f(X) = h(Y)$ almost surely (i.e., $f(X)$ does not actually depend on $Z$).
Intuition
By conditioning on $Y$, you analytically average out the randomness in $Z$. This removes the component of variance due to $Z$, leaving only the variance due to $Y$. It is like having infinitely many samples of $Z$ for each value of $Y$.
Proof Sketch
By the law of total variance: $\mathrm{Var}(f(X)) = \mathbb{E}[\mathrm{Var}(f(X) \mid Y)] + \mathrm{Var}(\mathbb{E}[f(X) \mid Y])$. Since $\mathbb{E}[\mathrm{Var}(f(X) \mid Y)] \ge 0$, we get $\mathrm{Var}(\mathbb{E}[f(X) \mid Y]) \le \mathrm{Var}(f(X))$.
Why It Matters
Rao-Blackwellization is the most principled variance reduction technique: it is guaranteed to help and never hurts. In Gibbs sampling, if you can compute conditional expectations for some variables analytically, always do so. It is free variance reduction.
Failure Mode
The technique requires being able to compute $\mathbb{E}[f(X) \mid Y]$ analytically, which is often intractable. It also requires choosing the decomposition $X = (Y, Z)$ wisely. If $Z$ contributes little variance, the reduction is small.
Canonical Examples
Control variate for option pricing
Estimating $\mathbb{E}[e^{-rT} \max(S_T - K, 0)]$ for a call option. Use $S_T$ (the terminal stock price) as a control variate. Under risk-neutral pricing, $\mathbb{E}[S_T] = S_0 e^{rT}$ is known. Since the payoff is highly correlated with $S_T$, this dramatically reduces variance.
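A hedged sketch of this example (the GBM parameters $S_0 = 100$, $K = 105$, $r = 0.05$, $\sigma = 0.2$, $T = 1$ are made up for illustration): simulate terminal prices, use $S_T$ with its known risk-neutral mean $S_0 e^{rT}$ as the control, and estimate $\beta$ from the same samples.

```python
import math
import random

random.seed(5)

# Illustrative parameters, not from the text
S0, K, r, sigma, T = 100.0, 105.0, 0.05, 0.2, 1.0
n = 100_000

disc = math.exp(-r * T)
ES_T = S0 * math.exp(r * T)   # known risk-neutral mean of S_T

payoffs, prices = [], []
for _ in range(n):
    z = random.gauss(0.0, 1.0)
    sT = S0 * math.exp((r - 0.5 * sigma ** 2) * T + sigma * math.sqrt(T) * z)
    prices.append(sT)
    payoffs.append(disc * max(sT - K, 0.0))

mp = sum(payoffs) / n
ms = sum(prices) / n
cov = sum((p - mp) * (s - ms) for p, s in zip(payoffs, prices)) / (n - 1)
var_s = sum((s - ms) ** 2 for s in prices) / (n - 1)
beta = cov / var_s            # estimated optimal coefficient

cv = [p - beta * (s - ES_T) for p, s in zip(payoffs, prices)]

def var(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

print(sum(cv) / n)             # price estimate, near the Black-Scholes value
print(var(cv) / var(payoffs))  # well below 1
```

The payoff is a monotone function of $S_T$, so the two are strongly correlated and the underlying itself is a natural, zero-cost control.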
Antithetic variates for integral estimation
Estimate $\int_0^1 f(x) \, dx$ for a monotone increasing $f$. Generate $U_i \sim \mathrm{Uniform}(0, 1)$ and use pairs $(U_i, 1 - U_i)$. Since $f$ is increasing, $f(U)$ and $f(1 - U)$ are negatively correlated. The estimator $\frac{1}{2} \left( f(U) + f(1 - U) \right)$ has lower variance than $f(U)$ alone.
Adjacent Techniques
Two techniques sit next to the four above and are worth naming even though they are covered in more detail elsewhere.
Common random numbers (CRN). When estimating a difference $\mathbb{E}[f_A(X)] - \mathbb{E}[f_B(X)]$ (for example, the effect of a policy change or a design parameter), reuse the same underlying random numbers for both simulations. The variance of the difference becomes $\mathrm{Var}(\hat{\mu}_A) + \mathrm{Var}(\hat{\mu}_B) - 2 \, \mathrm{Cov}(\hat{\mu}_A, \hat{\mu}_B)$. When $f_A$ and $f_B$ are similar, the positive covariance cancels most of the variance, and the paired estimator is dramatically more efficient than two independent estimators. This is the Monte Carlo analog of a paired $t$-test and is standard in simulation-based A/B comparisons.
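A sketch with assumed systems $f_A(u) = e^u$ and a slightly perturbed $f_B(u) = e^{0.95u}$ (both illustrative): the paired estimator reuses one uniform stream for both systems, and a replication loop compares the empirical variances of the two difference estimators.

```python
import math
import random

random.seed(6)

def fA(u):
    return math.exp(u)

def fB(u):
    return math.exp(0.95 * u)   # a slightly perturbed version of system A

n = 2_000
trials = 300

def diff_independent():
    # Two independent simulations, two independent random streams
    a = sum(fA(random.random()) for _ in range(n)) / n
    b = sum(fB(random.random()) for _ in range(n)) / n
    return a - b

def diff_crn():
    # One stream of uniforms, shared by both systems
    us = [random.random() for _ in range(n)]
    a = sum(fA(u) for u in us) / n
    b = sum(fB(u) for u in us) / n
    return a - b

def var(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

ind = [diff_independent() for _ in range(trials)]
crn = [diff_crn() for _ in range(trials)]

print(var(crn) / var(ind))  # far below 1: the shared-stream covariance cancels
```

Because $f_A - f_B$ is small and smooth, almost all the per-draw noise is common to both systems and cancels in the paired difference.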
Quasi-Monte Carlo (QMC). Replace i.i.d. samples with a low-discrepancy sequence (Sobol', Halton, Niederreiter nets). For smooth integrands in dimension $d$, the Koksma-Hlawka inequality gives error $O((\log n)^d / n)$, which beats Monte Carlo's $O(n^{-1/2})$ for moderate $d$. Randomized QMC (scrambled Sobol') gives unbiased estimators with variance that can decrease as $O(n^{-3} (\log n)^{d-1})$ for sufficiently smooth integrands. QMC does not slot into a plug-and-play "reduce variance" recipe. It replaces the sampling scheme itself and breaks the i.i.d. assumption underlying ordinary variance formulas. For finance and simulation-heavy ML pipelines (e.g., expectation computations in variational objectives), QMC is often the single biggest gain available.
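A self-contained sketch using a 1-D van der Corput (radical-inverse) sequence, the simplest low-discrepancy construction; real pipelines would use scrambled Sobol' points from a library. The integrand $e^x$ is an illustrative choice.

```python
import math
import random

def van_der_corput(i, base=2):
    # Radical inverse of the integer i in the given base: mirror the digits
    # of i across the radix point, yielding a low-discrepancy point in [0, 1)
    x, denom = 0.0, 1.0
    while i > 0:
        denom *= base
        x += (i % base) / denom
        i //= base
    return x

f = math.exp
true_value = math.e - 1.0   # integral of e^x over [0, 1]
n = 4096

# Deterministic low-discrepancy points vs. pseudo-random points, same budget
qmc_est = sum(f(van_der_corput(i + 1)) for i in range(n)) / n

random.seed(7)
mc_est = sum(f(random.random()) for _ in range(n)) / n

print(abs(qmc_est - true_value))  # near O(1/n) decay for this smooth integrand
print(abs(mc_est - true_value))   # O(1/sqrt(n)) scale
```

The low-discrepancy points fill $[0, 1)$ far more evenly than i.i.d. uniforms, which is exactly why the ordinary i.i.d. variance formulas no longer apply to the QMC estimate.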
Common Confusions
Variance reduction does not change the rate
Antithetic, control, stratified, and Rao-Blackwell estimators all reduce the constant in the $O(n^{-1/2})$ error but keep the rate. You still need $4\times$ as many samples to halve the error. The improvement is in the constant $\sigma$, which can be enormous in practice but does not change the asymptotic rate.
QMC is the one exception: by abandoning i.i.d. sampling it achieves a faster-than-$n^{-1/2}$ error decay on sufficiently smooth integrands. It is a genuinely different estimation regime, not a variance-reduction add-on to Monte Carlo.
Control variates require known expectations
The control variate must have a known mean $\theta = \mathbb{E}[g(X)]$. If you have to estimate $\theta$, it is no longer a control variate. It becomes an importance sampling or regression adjustment problem.
Summary
- Antithetic variates: use negative correlation between sample pairs
- Control variates: subtract a known-mean quantity correlated with the target; optimal coefficient is $\beta^* = \mathrm{Cov}(f, g) / \mathrm{Var}(g)$
- Stratification: partition the space and sample within strata; always helps under proportional allocation
- Rao-Blackwellization: condition out part of the randomness analytically; guaranteed to reduce variance by the law of total variance
- Common random numbers: pair simulations that share a random seed when estimating differences of expectations
- Quasi-Monte Carlo: replace i.i.d. samples with a low-discrepancy sequence for faster-than-$n^{-1/2}$ convergence on smooth integrands
- Classical variance reduction changes the constant, not the rate; QMC changes the rate itself
Exercises
Problem
You want to estimate $\mathbb{E}[f(U)]$ where $U \sim \mathrm{Uniform}(0, 1)$ and $f$ is monotone. Explain how to use antithetic variates and why it reduces variance.
Problem
Derive the optimal control variate coefficient $\beta^*$ and show that the variance reduction factor is $1 - \rho^2$, where $\rho$ is the correlation between $f(X)$ and $g(X)$.
References
Canonical:
- Robert & Casella, Monte Carlo Statistical Methods (2004), Chapter 4
- Ross, Simulation (2012), Chapter 9
- Glasserman, Monte Carlo Methods in Financial Engineering (Springer, 2004), Chapters 4 (variance reduction), 5 (QMC) — the standard practitioner reference; all four classical techniques plus CRN and QMC are treated with concrete examples
Current:
- Owen, Monte Carlo Theory, Methods, and Examples (2013), Chapters 8-9 (variance reduction) and 15-17 (QMC) — open-access textbook
- Dick, Kuo, Sloan, "High-dimensional integration: the quasi-Monte Carlo way," Acta Numerica 22 (2013), 133-288 — modern QMC theory
- Niederreiter, Random Number Generation and Quasi-Monte Carlo Methods (SIAM, 1992) — foundational QMC monograph
- Gelman et al., Bayesian Data Analysis (2013), Chapters 10-12
- Brooks et al., Handbook of MCMC (2011), Chapters 1-5
Next Topics
The natural next steps from variance reduction:
- Burn-in and convergence diagnostics: knowing when MCMC samples are usable
- Hamiltonian Monte Carlo: a sampler that naturally has low variance
Last reviewed: April 13, 2026
Canonical graph
Required before and derived from this topic
These links come from prerequisite edges in the curriculum graph. Editorial suggestions are shown here only when the target page also cites this page as a prerequisite.
Required prerequisites
- Importance Sampling (layer 2 · tier 1)
Derived topics
- Burn-in and Convergence Diagnostics (layer 2 · tier 2)
- Hamiltonian Monte Carlo (layer 3 · tier 2)