Mathematical Infrastructure
Time Reversal of SDEs
Anderson (1982): any forward Itô SDE has an explicit time-reversed SDE whose drift involves the score function (the gradient of the log marginal density). This is the single result that turns a forward noising process into a generative sampler and underlies every score-based diffusion model.
Why This Matters
Modern score-based diffusion models work by running an SDE backward in time. The forward SDE corrupts data into noise (typically a variance-preserving or variance-exploding noising schedule ending at a Gaussian); the backward SDE generates new samples by reversing this process, starting from Gaussian noise and integrating back to the data distribution. The fact that this is possible at all — and that the backward drift has a closed form involving the score of the forward marginal — is the content of Anderson's time-reversal theorem (1982).
Anderson's theorem is the single mathematical result that licenses every score-based generative model: DDPM, NCSN, DDIM, EDM, score-SDE, flow-matching variants, and most controlled-generation methods. It tells you that learning the score $\nabla_x \log p_t(x)$ of the forward noising marginal is sufficient to invert the noising and sample from the data distribution. The score-matching loss is what makes this score learnable; the time-reversal theorem is what makes the learned score useful.
Beyond generative modeling, time reversal sits behind detailed-balance arguments for non-reversible Langevin samplers, dual representations in stochastic control, and the path-measure identities that drive Schrödinger-bridge and Föllmer-process methods. It is one of the most consequential results in stochastic calculus for ML.
Mental Model
A forward SDE pushes a particle from time $0$ to time $T$. Its marginal density $p_t$ evolves by the Fokker–Planck equation and "spreads out" as time advances. To run the dynamics backward, starting at time $T$ distributed according to the spread-out marginal $p_T$ and recovering a sample from $p_0$, you need a different SDE that pushes particles in the opposite direction. Anderson's theorem gives you that SDE explicitly.
The backward SDE has the same diffusion as the forward one. The backward drift is the forward drift minus a correction involving the score $\nabla_x \log p_t$:

$$d\bar{x}_t = \left[f(\bar{x}_t, t) - g(t)^2\, \nabla_x \log p_t(\bar{x}_t)\right] dt + g(t)\, d\bar{w}_t$$

The term $g(t)^2\, \nabla_x \log p_t$ is exactly what the Fokker–Planck equation needs subtracted to make the marginal flow run in reverse. The score field $\nabla_x \log p_t$ is the only extra information required; everything else (drift, diffusion, time grid) is shared between the forward and backward processes.
Formal Statement
Reverse-Time SDE (Anderson 1982)
Let $x_t$ solve the forward Itô SDE

$$dx_t = f(x_t, t)\, dt + g(x_t, t)\, dw_t, \qquad t \in [0, T],$$

with marginal density $p_t$ satisfying the Fokker–Planck equation. Then the reverse-time process satisfies the SDE

$$d\bar{x}_t = \left[f(\bar{x}_t, t) - \nabla \cdot \big(g g^\top\big)(\bar{x}_t, t) - \big(g g^\top\big)(\bar{x}_t, t)\, \nabla_x \log p_t(\bar{x}_t)\right] dt + g(\bar{x}_t, t)\, d\bar{w}_t,$$

where $\bar{w}_t$ is a Brownian motion adapted to the backward filtration. In the common case where $g$ does not depend on $x$, the divergence term vanishes and the formula simplifies to

$$d\bar{x}_t = \left[f(\bar{x}_t, t) - g(t)^2\, \nabla_x \log p_t(\bar{x}_t)\right] dt + g(t)\, d\bar{w}_t.$$

Equivalently, in a forward time coordinate $s = T - t$ running from $0$ to $T$, the process $y_s = \bar{x}_{T-s}$ satisfies

$$dy_s = \left[-f(y_s, T-s) + g(T-s)^2\, \nabla_x \log p_{T-s}(y_s)\right] ds + g(T-s)\, dw_s$$

(with $y_0 \sim p_T$ and $w_s$ a backward Brownian motion).
The score field $\nabla_x \log p_t(x)$ is the gradient of the log-density of the forward marginal at time $t$. It is the only place where information about the original data distribution enters the backward dynamics.
The Theorem
Anderson's Time-Reversal Theorem
Statement
Under the assumptions above, the law of the reverse-time process $(x_{T-s})_{s \in [0,T]}$ matches the law of the backward SDE with initial distribution $p_T$. The terminal distribution at time $0$ of the backward SDE is exactly $p_0$, the original initial distribution of the forward SDE.
Intuition
Both the forward and the reverse-time process have the same marginals $p_t$; they just traverse them in opposite directions. The Fokker–Planck equation is a continuity equation for these marginals, and there is exactly one drift field that produces the marginal flow in reverse: it is the forward drift minus the "Stein gradient" $g^2 \nabla_x \log p_t$. The diffusion stays the same because, under time reversal, the Fokker–Planck equation is symmetric in the second-order diffusion term but antisymmetric in the first-order drift term.
Proof Sketch
Write the forward Fokker–Planck equation as a continuity equation $\partial_t p_t = -\nabla \cdot J_t$ with current $J_t = f\, p_t - \tfrac{1}{2} \nabla \cdot \big(g g^\top p_t\big)$. Decompose $J_t = v_t\, p_t$ for an effective drift $v_t$. The reverse-time process must produce the same current with the opposite sign of the time direction, which (after symmetry-of-noise arguments via Girsanov) forces the backward drift to be $f - g g^\top \nabla_x \log p_t$, plus the divergence term when $g$ depends on $x$. Anderson's original 1982 paper does this directly via Itô's lemma on the time-reversed filtration; Haussmann and Pardoux (1986) provided the modern measure-theoretic proof.
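The current computation can be checked symbolically in a fully explicit case. A minimal sketch, assuming `sympy` is available: take the 1D OU process $dx = -x\,dt + \sqrt{2}\,dw$ started from a centered Gaussian, for which $p_t$ is an explicit Gaussian, verify the forward Fokker–Planck equation, and then verify that the reversed density satisfies a forward Fokker–Planck equation with the Anderson drift.

```python
# Symbolic check of the reverse-drift claim for a 1D OU process,
# dx = -x dt + sqrt(2) dw, whose marginal p_t is an explicit Gaussian.
# (Illustrative special case; the general theorem needs no Gaussian structure.)
import sympy as sp

x, t = sp.symbols("x t", real=True)
s0 = sp.symbols("s0", positive=True)  # initial standard deviation sigma_0

var = 1 + (s0**2 - 1) * sp.exp(-2 * t)           # sigma_t^2
p = sp.exp(-x**2 / (2 * var)) / sp.sqrt(2 * sp.pi * var)

# Forward Fokker-Planck: dp/dt = d/dx(x p) + d^2 p/dx^2   (f = -x, g^2 = 2)
forward_fp = sp.diff(p, t) - (sp.diff(x * p, x) + sp.diff(p, x, 2))
assert sp.simplify(forward_fp) == 0

# Anderson drift in forward-running reversed time: b = -f + g^2 d/dx log p
b = x + 2 * sp.diff(sp.log(p), x)

# Reversed density q_s = p_{T-s} satisfies dq/ds = -d/dx(b q) + d^2 q/dx^2.
# In t-coordinates (d/ds = -d/dt) this reads:
reverse_fp = -sp.diff(p, t) + sp.diff(b * p, x) - sp.diff(p, x, 2)
assert sp.simplify(reverse_fp) == 0
print("forward and reverse Fokker-Planck identities verified")
```

Both residuals simplify to zero, which is exactly the divergence-of-current calculation in the sketch above, instantiated for a linear Gaussian process.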
Why It Matters
This is the result that made score-based diffusion possible. Before 2020, generative models used either invertible flows (which constrain architectures to be invertible) or GANs (which are unstable and have no likelihood). Score-based diffusion (Song et al., 2021) said: train a network to learn $s_\theta(x, t) \approx \nabla_x \log p_t(x)$ for a known forward noising SDE, then plug the learned score into Anderson's reverse SDE and sample. Architecture constraints vanish: any function approximator works for the score. The training objective is regression, not adversarial. The reverse SDE is exactly Anderson's formula; the only innovation was learning the score parametrically and treating the noise schedule $\sigma(t)$ (variance-exploding) or $\beta(t)$ (variance-preserving) as a hyperparameter.
Failure Mode
The theorem requires $p_t$ to be smooth and strictly positive everywhere $x_t$ might visit. For data distributions supported on a low-dimensional manifold in $\mathbb{R}^d$ (which is the realistic case for images), $p_0$ is not a density on $\mathbb{R}^d$; it is a measure concentrated on the manifold. The forward noising process smooths it into a strictly positive $p_t$ for $t > 0$, so the score is well-defined on the bulk of the time interval. But the backward SDE evaluated near $t = 0$ encounters a score field that diverges or is poorly defined near the manifold. This is why diffusion samplers stop the reverse process at a small $t = \epsilon > 0$ rather than running all the way to $t = 0$, and why singular-perturbation behavior near $t = 0$ is the hardest part of diffusion sampler engineering (Karras et al. 2022).
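The divergence is easy to see in one dimension. A small sketch, under the simplifying assumption that the "data" is a point mass at the origin noised to $\mathcal{N}(0, \sigma^2)$, so the exact score is $-x/\sigma^2$ and its typical magnitude scales like $1/\sigma$:

```python
import numpy as np

# Point-mass data noised to N(0, sigma^2): the exact score is -x / sigma^2.
# At a typical noisy sample, |x| is on the order of sigma, so the score
# magnitude scales like 1/sigma and diverges as the noise level sigma -> 0.
rng = np.random.default_rng(0)
for sigma in [1.0, 0.1, 0.01, 0.001]:
    x = rng.normal(0.0, sigma, size=100_000)
    mean_abs_score = np.mean(np.abs(-x / sigma**2))
    print(f"sigma = {sigma:<6g} mean |score| = {mean_abs_score:.1f}")
```

Each tenfold reduction in $\sigma$ multiplies the typical score magnitude by ten, which is the numerical face of the blow-up that forces samplers to stop at $t = \epsilon$.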
Score-Based Diffusion: The Canonical Use
The forward "variance-preserving" noising SDE is

$$dx_t = -\tfrac{1}{2} \beta(t)\, x_t\, dt + \sqrt{\beta(t)}\, dw_t$$

for a schedule $\beta(t) > 0$. With $x_0 \sim p_{\text{data}}$ and large enough $T$, the marginal $p_T$ is approximately standard Gaussian.
By Anderson, the corresponding backward SDE is

$$d\bar{x}_t = \left[-\tfrac{1}{2} \beta(t)\, \bar{x}_t - \beta(t)\, \nabla_x \log p_t(\bar{x}_t)\right] dt + \sqrt{\beta(t)}\, d\bar{w}_t.$$
A score network $s_\theta(x, t)$ is trained with score matching. At sampling time, replace $\nabla_x \log p_t$ with $s_\theta$ in the backward SDE and integrate numerically (Euler–Maruyama, predictor–corrector, or higher-order solvers). Each integration step pushes a Gaussian sample closer to the data distribution, and at $t \approx 0$ the sample is approximately drawn from $p_{\text{data}}$.
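This pipeline can be sketched end to end in one dimension when the data distribution is a Gaussian mixture, because then the VP marginals and the exact score are available in closed form and no network is needed. A minimal sketch; the schedule constants, mixture parameters, and step counts are illustrative choices, not canonical values:

```python
import numpy as np

# Data: the mixture 0.5*N(-2, 0.5^2) + 0.5*N(+2, 0.5^2).
# Forward VP SDE: dx = -0.5*beta(t)*x dt + sqrt(beta(t)) dw on t in [0, 1],
# with the linear schedule beta(t) = bmin + t*(bmax - bmin).
bmin, bmax = 0.1, 10.0
weights = np.array([0.5, 0.5])
mus = np.array([-2.0, 2.0])
sds = np.array([0.5, 0.5])

def beta(t):
    return bmin + t * (bmax - bmin)

def alpha(t):
    # alpha_t = exp(-0.5 * int_0^t beta), so x_t | x_0 ~ N(alpha_t x_0, 1 - alpha_t^2)
    return np.exp(-0.5 * (bmin * t + 0.5 * (bmax - bmin) * t**2))

def exact_score(x, t):
    # p_t is itself a Gaussian mixture; its score is a posterior-weighted
    # sum of the component scores -(x - m_i) / v_i.
    a = alpha(t)
    m = a * mus                       # component means at time t
    v = (a * sds) ** 2 + 1 - a**2     # component variances at time t
    logw = np.log(weights) - 0.5 * np.log(v) - 0.5 * (x[:, None] - m) ** 2 / v
    r = np.exp(logw - logw.max(axis=1, keepdims=True))
    r /= r.sum(axis=1, keepdims=True)
    return (r * (-(x[:, None] - m) / v)).sum(axis=1)

# Reverse SDE (Anderson): Euler-Maruyama from t = 1 down to t = eps.
rng = np.random.default_rng(1)
n, steps, eps = 20_000, 500, 1e-3
x = rng.normal(size=n)                # p_1 is approximately N(0, 1)
ts = np.linspace(1.0, eps, steps + 1)
for i in range(steps):
    t, dt = ts[i], ts[i] - ts[i + 1]
    b = beta(t)
    drift = -0.5 * b * x - b * exact_score(x, t)   # f - g^2 * score
    x = x - drift * dt + np.sqrt(b * dt) * rng.normal(size=n)

print(f"sample mean {x.mean():+.2f} (target 0.00), std {x.std():.2f} (target 2.06)")
```

Swapping `exact_score` for a trained $s_\theta$ turns this sketch into the standard diffusion sampling loop; everything else is Anderson's formula plus a time discretization.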
The probability-flow ODE (Song et al. 2021) is the deterministic dual of the same backward dynamics; see probability-flow-ode for the closed-form connection.
Worked Example: Time-Reversed OU
Take the forward OU SDE $dx_t = -x_t\, dt + \sqrt{2}\, dw_t$ on $\mathbb{R}$ with $x_0 \sim \mathcal{N}(\mu_0, \sigma_0^2)$. The marginal is $p_t = \mathcal{N}\!\left(\mu_0 e^{-t},\; \sigma_t^2\right)$ with $\sigma_t^2 = 1 + (\sigma_0^2 - 1) e^{-2t}$. The score is

$$\nabla_x \log p_t(x) = -\frac{x - \mu_0 e^{-t}}{\sigma_t^2}.$$

Anderson's reverse SDE (with $g = \sqrt{2}$ constant) is integrated from $t = T$ down to $t = 0$. Substituting the score and simplifying gives an explicit Gauss–Markov reverse process whose terminal distribution at $t = 0$ is exactly $\mathcal{N}(\mu_0, \sigma_0^2)$. This is the cleanest verification of Anderson's theorem: forward OU and its time reversal are both linear Gaussian processes, and you can check their marginals match analytically.
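The analytic check above can also be run numerically. A sketch with illustrative values $\mu_0 = 1$, $\sigma_0 = 0.5$, $T = 2$, using plain Euler–Maruyama on the reverse SDE:

```python
import numpy as np

# Forward OU: dx = -x dt + sqrt(2) dw with x_0 ~ N(mu0, s0^2).
# Marginal: p_t = N(mu0 * exp(-t), 1 + (s0^2 - 1) * exp(-2t)).
mu0, s0, T = 1.0, 0.5, 2.0

def mean(t):
    return mu0 * np.exp(-t)

def var(t):
    return 1.0 + (s0**2 - 1.0) * np.exp(-2.0 * t)

def score(x, t):
    return -(x - mean(t)) / var(t)

# Anderson's reverse SDE, x_{t-dt} = x_t - [f - g^2*score] dt + sqrt(2 dt) z,
# integrated from t = T down to t = 0, started from the exact marginal p_T.
rng = np.random.default_rng(0)
n, steps = 50_000, 1000
dt = T / steps
x = rng.normal(mean(T), np.sqrt(var(T)), size=n)
for i in range(steps):
    t = T - i * dt
    drift = -x - 2.0 * score(x, t)   # f - g^2 * score, with f = -x, g^2 = 2
    x = x - drift * dt + np.sqrt(2.0 * dt) * rng.normal(size=n)

print(f"terminal mean {x.mean():.3f} (target {mu0}), std {x.std():.3f} (target {s0})")
```

Up to Euler discretization and Monte Carlo error, the terminal sample matches $\mathcal{N}(\mu_0, \sigma_0^2)$, which is exactly the theorem's claim about the backward terminal distribution.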
Common Confusions
The reverse SDE is NOT the forward SDE with negated drift
A naive guess: to reverse an SDE, just flip the sign of the drift $f$. This is wrong: it gives the wrong stationary distribution and the wrong intermediate marginals. The correct backward drift is $f - g^2 \nabla_x \log p_t$, with the score correction. The score correction is the nontrivial content of Anderson's theorem; without it, reversal would need nothing more than a sign flip and the theorem would be trivial.
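The failure is easy to demonstrate on the OU process from the worked example. A sketch with illustrative parameters, comparing Anderson's drift against the sign-flip-only reversal:

```python
import numpy as np

# Forward OU: dx = -x dt + sqrt(2) dw from N(mu0, s0^2); marginal variance
# sigma_t^2 = 1 + (s0^2 - 1) exp(-2t). Reverse it two ways from t = T.
mu0, s0, T, steps, n = 1.0, 0.5, 2.0, 1000, 20_000
dt = T / steps
rng = np.random.default_rng(0)

def run(use_score):
    x = rng.normal(mu0 * np.exp(-T), np.sqrt(1 + (s0**2 - 1) * np.exp(-2 * T)), n)
    for i in range(steps):
        t = T - i * dt
        m, v = mu0 * np.exp(-t), 1 + (s0**2 - 1) * np.exp(-2 * t)
        scr = -(x - m) / v if use_score else 0.0
        drift = -x - 2.0 * scr      # f - g^2*score; score dropped in naive case
        x = x - drift * dt + np.sqrt(2 * dt) * rng.normal(size=n)
    return x

correct = run(True)    # Anderson's drift: terminal std should be near s0 = 0.5
naive = run(False)     # sign-flipped forward drift only: variance blows up
print(f"correct std {correct.std():.2f}, naive std {naive.std():.2f}")
```

The score-corrected reversal lands on the right terminal distribution; the sign-flip-only dynamics are an unstable linear SDE whose variance grows exponentially instead of contracting back to $\sigma_0^2$.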
The backward Brownian motion is a different Brownian motion
The forward and backward SDEs use different Brownian motions: $w_t$ is adapted to the forward filtration, while $\bar{w}_t$ is adapted to the backward filtration. They are not the same process run in reverse. The two SDEs only agree in distribution, not pathwise. Running the forward SDE and then "playing the tape backwards" does not produce a sample from the backward SDE. This matters for any analysis that tries to couple forward and backward trajectories.
Time reversal works for any forward SDE; the score is the only data-dependent piece
Some papers describe diffusion as if it required a special "noise schedule" to make reversal possible. It does not. Anderson's theorem applies to any forward SDE with smooth positive marginals. The choice of forward SDE (variance-preserving, variance-exploding, sub-VP, EDM-style) only affects how easy the score is to learn and how well-behaved the reverse sampler is, not whether reversal is mathematically valid.
Exercises
Problem
Verify Anderson's formula in the simplest case: standard Brownian motion on $\mathbb{R}$ ($f = 0$, $g = 1$) with $x_0 = 0$. Write down $p_t$, compute $\nabla_x \log p_t$, write the backward SDE, and confirm that the backward dynamics produce the right marginals.
Problem
Prove the divergence-of-current calculation that underlies Anderson's formula: starting from the forward Fokker–Planck equation $\partial_t p_t = -\nabla \cdot (f p_t) + \tfrac{1}{2} g^2 \Delta p_t$ with constant $g$, show that the time-reversed density $\bar{p}_s = p_{T-s}$ satisfies a forward Fokker–Planck equation with drift $-f + g^2 \nabla_x \log p_{T-s}$ and the same diffusion coefficient $g$.
References
Canonical:
- Anderson, Reverse-time diffusion equation models (Stochastic Processes and their Applications 12, 1982). The original paper. Concise, four pages, and still the cleanest derivation.
- Haussmann and Pardoux, Time reversal of diffusions (Annals of Probability 14, 1986). The rigorous measure-theoretic treatment under modern existence-and-uniqueness assumptions.
- Föllmer, Random fields and diffusion processes (Lecture Notes in Mathematics 1362, 1988). Functional-analytic proof using entropy and dual processes; useful for the Schrödinger-bridge connection.
- Pavliotis, Stochastic Processes and Applications (Springer, 2014), Section 4.6. Modern textbook treatment of time reversal in the framework of reversible Markov processes.
Current:
- Song, Sohl-Dickstein, Kingma, Kumar, Ermon, and Poole, Score-based generative modeling through stochastic differential equations (ICLR 2021). The paper that brought Anderson's theorem into modern generative modeling; introduces variance-preserving / variance-exploding SDEs and the probability-flow ODE.
- Karras, Aittala, Aila, and Laine, Elucidating the design space of diffusion-based generative models (NeurIPS 2022). The "EDM" paper; analyzes how the choice of forward SDE and the singular behavior of the score near $t = 0$ affect sampler quality.
- Kingma, Salimans, Poole, and Ho, Variational diffusion models (NeurIPS 2021). Recasts diffusion in terms of variational lower bounds; gives an alternative entry point to the same time-reversal result.
- De Bortoli, Thornton, Heng, and Doucet, Diffusion Schrödinger bridge with applications to score-based generative modeling (NeurIPS 2021). Connects time-reversal to the Schrödinger-bridge / entropic-OT framework.
Next Topics
- Score Matching: the training objective that learns $\nabla_x \log p_t$ for the score network in the reverse SDE.
- Diffusion Models: the family of generative models built directly on Anderson's theorem.
- Probability Flow ODE: the deterministic dual of the reverse SDE with the same marginals.
- Fokker–Planck Equation: the PDE machinery behind the proof.
- Stochastic Differential Equations: the parent framework of forward and backward processes.
Last reviewed: April 18, 2026