Stochastic Differential Equations

Sneiderman, Robby

SDEs of the form $dX_t = b\,dt + \sigma\,dB_t$: strong and weak solutions, existence and uniqueness under Lipschitz conditions, Euler-Maruyama discretization, and the canonical examples that appear throughout ML (Ornstein-Uhlenbeck, geometric Brownian motion, Langevin dynamics).

Advanced · Tier 2 · Stable · Supporting · ~55 min

Prerequisites

Brownian Motion, Itô's Lemma, Stochastic Calculus for ML

Why This Matters

An SDE is a differential equation driven by Brownian motion:

$dX_t = b(X_t, t)\,dt + \sigma(X_t, t)\,dB_t.$

This is the mathematical object behind Langevin dynamics (the sampler inside SGLD), the forward process of diffusion models, the continuous-time limit of stochastic gradient descent, the state dynamics in continuous-time RL, and the Black-Scholes model in finance. Knowing when an SDE has a unique solution, how to discretize it, and how the solution's density evolves (Fokker-Planck) is the bridge between the stochastic-calculus toolbox and the models that use it.

Mental Model

An SDE is an ODE with noise. The drift $b(X_t, t)$ pulls the trajectory deterministically; the diffusion $\sigma(X_t, t)$ injects random fluctuations proportional to $dB_t$. The interplay between drift and diffusion determines whether the process has a stationary distribution, how fast it mixes, and whether samples from the SDE can be used for inference.

The integral form is more honest than the differential notation:

$X_t = X_0 + \int_0^t b(X_s, s)\,ds + \int_0^t \sigma(X_s, s)\,dB_s.$

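The first integral is an ordinary (pathwise) Riemann integral. The second is an Itô integral, which cannot be defined path by path as a Riemann-Stieltjes integral because $B_s$ has infinite total variation on every interval; it is built instead as an $L^2$ limit of left-endpoint sums, using the fact that the quadratic variation of $B$ over $[0,t]$ is finite and equals $t$.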
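A quick numerical way to see the infinite-variation claim: on a finer and finer grid, the sum of absolute Brownian increments keeps growing (roughly like $\sqrt{2nt/\pi}$ for $n$ grid steps), while the sum of squared increments settles at $t$. The sketch below is a minimal NumPy illustration; the function name `variations` and the grid sizes are illustrative choices, not part of the text.

```python
import numpy as np

def variations(t=1.0, n=100_000, seed=0):
    """Total and quadratic variation of one simulated Brownian path on [0, t] with n steps."""
    rng = np.random.default_rng(seed)
    dB = rng.normal(0.0, np.sqrt(t / n), size=n)   # independent N(0, t/n) increments
    total_var = np.abs(dB).sum()                   # sum |dB| ~ sqrt(2 n t / pi): grows without bound
    quad_var = (dB ** 2).sum()                     # sum dB^2 -> t as the grid is refined
    return total_var, quad_var

for n in (100, 10_000, 1_000_000):
    tv, qv = variations(n=n)
    print(f"n={n:>9,}  total variation={tv:9.2f}  quadratic variation={qv:.4f}")
```

The divergence of the first column is exactly why the left-endpoint (Itô) convention has to be fixed by hand rather than inherited from Riemann-Stieltjes theory.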

Formal Setup

Definition

Strong Solution

A strong solution of $dX_t = b(X_t,t)\,dt + \sigma(X_t,t)\,dB_t$ is a process $X_t$ that is adapted to the filtration generated by the given Brownian motion $B_t$ and the initial condition $X_0$, and that satisfies the integral equation almost surely. The solution is built on top of the specific noise realization $B_t(\omega)$.

Definition

Weak Solution

A weak solution is a filtered probability space, a Brownian motion $B_t$ on that space, and an adapted process $X_t$ satisfying the SDE. The Brownian motion is part of the solution, not given in advance. Weak existence is a statement about the law of $X_t$; strong existence is a statement about pathwise construction from a given $B_t$.

Definition

Pathwise Uniqueness

The SDE has pathwise uniqueness if any two solutions $X_t$, $\tilde{X}_t$ defined on the same probability space, driven by the same Brownian motion, and starting from the same initial condition satisfy $P(X_t = \tilde{X}_t \text{ for all } t) = 1$.

Definition

Euler-Maruyama Discretization

Given time step $\Delta t$ and grid $t_k = k\Delta t$, the Euler-Maruyama scheme approximates the SDE by:

$X_{k+1} = X_k + b(X_k, t_k)\,\Delta t + \sigma(X_k, t_k)\,\sqrt{\Delta t}\,Z_k$

where $Z_k \sim \mathcal{N}(0,1)$ are i.i.d. This is the stochastic analogue of forward Euler.
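The recursion translates directly into a loop over the grid. The sketch below assumes a scalar state, uses NumPy, and vectorizes over independent paths; `euler_maruyama` and its argument names are hypothetical, not taken from any library.

```python
import numpy as np

def euler_maruyama(b, sigma, x0, t_end, n_steps, n_paths=1, seed=None):
    """Euler-Maruyama for the scalar SDE dX = b(X, t) dt + sigma(X, t) dB.

    b and sigma take (x, t) with x a NumPy array over paths.
    Returns the time grid, shape (n_steps + 1,), and paths, shape (n_paths, n_steps + 1).
    """
    rng = np.random.default_rng(seed)
    dt = t_end / n_steps
    t = np.linspace(0.0, t_end, n_steps + 1)          # t_k = k * dt
    x = np.empty((n_paths, n_steps + 1))
    x[:, 0] = x0
    for k in range(n_steps):
        z = rng.standard_normal(n_paths)                # Z_k ~ N(0, 1), i.i.d. across steps and paths
        xk = x[:, k]
        x[:, k + 1] = xk + b(xk, t[k]) * dt + sigma(xk, t[k]) * np.sqrt(dt) * z
    return t, x

# Usage: Ornstein-Uhlenbeck dX = -theta X dt + s dB started at X_0 = 1.
theta, s = 2.0, 1.0
t, paths = euler_maruyama(lambda x, t: -theta * x, lambda x, t: s, x0=1.0,
                          t_end=5.0, n_steps=500, n_paths=5_000, seed=0)
print(paths[:, -1].var(), "vs stationary variance", s**2 / (2 * theta))
```

With $\Delta t = 0.01$ the sample variance should land slightly above $\sigma^2/(2\theta) = 0.25$: part of the gap is Monte Carlo noise and part is the $O(\Delta t)$ bias of the scheme itself.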

Main Theorems

Theorem

Existence and Uniqueness for SDEs

Statement

If the drift $b$ and diffusion $\sigma$ satisfy

$|b(x,t) - b(y,t)| + |\sigma(x,t) - \sigma(y,t)| \leq L|x - y|$

for all $x, y, t$ (Lipschitz condition), and

$|b(x,t)|^2 + |\sigma(x,t)|^2 \leq K(1 + |x|^2)$

(linear growth condition), then for any initial condition $X_0$ that is independent of $B$ and satisfies $E[|X_0|^2] < \infty$, the SDE has a unique strong solution on $[0,T]$ for any $T < \infty$. The solution satisfies $E[\sup_{0 \leq t \leq T} |X_t|^2] < \infty$.

Intuition

The Lipschitz condition prevents the solution from branching (uniqueness); the linear growth condition prevents it from exploding in finite time (global existence). The proof is a stochastic Picard iteration: define $X_t^{(n+1)}$ as the integral of $b$ and $\sigma$ evaluated at $X_t^{(n)}$, then show the sequence converges using the Lipschitz bound and Grönwall's inequality. This mirrors the deterministic ODE proof, with mean-square estimates of the form $E[\sup_t |\cdot|^2]$ (controlled through Itô isometry) replacing pathwise sup-norm estimates.

Proof Sketch

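Define the Picard iterates $X^{(0)}_t = X_0$ and $X^{(n+1)}_t = X_0 + \int_0^t b(X^{(n)}_s, s)\,ds + \int_0^t \sigma(X^{(n)}_s, s)\,dB_s$. Bounding the drift integral with the Cauchy-Schwarz inequality, the stochastic integral with Doob's maximal inequality and Itô isometry, and both integrands with the Lipschitz bound gives, for $t \leq T$: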

$E\!\left[\sup_{s \leq t} |X^{(n+1)}_s - X^{(n)}_s|^2\right] \leq C \int_0^t E\!\left[\sup_{r \leq s}|X^{(n)}_r - X^{(n-1)}_r|^2\right] ds.$

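Iterating gives $E[\sup_{s \leq t} |X^{(n+1)}_s - X^{(n)}_s|^2] \leq C_0 (Ct)^n / n!$, where $C_0 = E[\sup_{s \leq T} |X^{(1)}_s - X^{(0)}_s|^2]$ is finite by the linear growth condition. Because $\sum_n \sqrt{C_0 (CT)^n / n!} < \infty$, the series $\sum (X^{(n+1)} - X^{(n)})$ converges in the norm $E[\sup_{t \leq T} |\cdot|^2]^{1/2}$, and the limit solves the integral equation. Uniqueness follows from applying the same Grönwall argument to the difference of two solutions.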
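The Picard scheme from the proof can be run numerically once both integrals are replaced by left-endpoint sums on one fixed Brownian path. In the sketch below (NumPy; the coefficients and the name `picard_iterates` are illustrative assumptions) the sup-distance between successive iterates should shrink very quickly, in line with the $(Ct)^n/n!$ bound, and the limit should coincide with the Euler-Maruyama path driven by the same increments, since that path is the fixed point of the discrete map.

```python
import numpy as np

def picard_iterates(b, sigma, x0, dB, dt, n_iter):
    """Discrete Picard iteration X^(n+1) = X_0 + sum b(X^(n)) dt + sum sigma(X^(n)) dB on one fixed path.

    Left-endpoint sums stand in for the Riemann and Ito integrals.
    Returns the sup-distance between successive iterates and the last iterate.
    """
    x = np.full(len(dB) + 1, float(x0))                      # X^(0)_t = X_0
    gaps = []
    for _ in range(n_iter):
        drift = np.concatenate(([0.0], np.cumsum(b(x[:-1]) * dt)))
        noise = np.concatenate(([0.0], np.cumsum(sigma(x[:-1]) * dB)))
        x_new = x0 + drift + noise
        gaps.append(np.max(np.abs(x_new - x)))               # sup_t |X^(n+1)_t - X^(n)_t|
        x = x_new
    return np.array(gaps), x

rng = np.random.default_rng(7)
n_steps, dt = 1_000, 1e-3
dB = rng.normal(0.0, np.sqrt(dt), n_steps)                   # one fixed Brownian path on [0, 1]
b = lambda x: -2.0 * x                                       # Lipschitz drift (constant 2)
sigma = lambda x: 0.5 + 0.2 * np.sin(x)                      # Lipschitz diffusion (constant 0.2)

gaps, x_picard = picard_iterates(b, sigma, 1.0, dB, dt, n_iter=60)

# Fixed point of the discrete map = Euler-Maruyama path driven by the same increments.
x_em = np.empty(n_steps + 1)
x_em[0] = 1.0
for k in range(n_steps):
    x_em[k + 1] = x_em[k] + b(x_em[k]) * dt + sigma(x_em[k]) * dB[k]

print(gaps[[0, 5, 10, 20, 40]])
print(np.max(np.abs(x_picard - x_em)))
```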

Why It Matters

This theorem tells you when an SDE is well-posed. Langevin dynamics ($b = -\nabla V$, $\sigma = \sqrt{2\beta^{-1}}$) has a unique solution whenever the potential $V$ has a Lipschitz gradient. The Ornstein-Uhlenbeck process satisfies the conditions with linear drift. Geometric Brownian motion has multiplicative noise $\sigma(x) = \sigma x$, which is globally Lipschitz with constant $|\sigma|$ and satisfies the linear growth condition, so the theorem applies directly without any localization argument. Multiplicative-noise SDEs need extra work only when $\sigma$ grows superlinearly (e.g. $\sigma(x) = x^2$, where localization plus a separate non-explosion argument is needed) or fails to be Lipschitz near some point (e.g. $\sigma(x) = \sqrt{|x|}$ in the CIR process, which is handled by the Yamada-Watanabe criterion below).

Failure Mode

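When the Lipschitz condition fails, solutions may not be unique. The classic example is $dX_t = |X_t|^{\alpha}\,dB_t$ with $X_0 = 0$ and $0 < \alpha < 1/2$: the diffusion coefficient is Hölder-$\alpha$ but not Lipschitz at $x = 0$, and both $X_t \equiv 0$ and a nontrivial (time-changed Brownian) solution exist, so uniqueness fails even in law. At $\alpha = 1/2$, $\sigma(x) = |x|^{1/2}$ sits exactly at the Yamada-Watanabe threshold (see below) and pathwise uniqueness holds. Pathwise uniqueness can also fail while uniqueness in law holds: for Tanaka's equation $dX_t = \operatorname{sgn}(X_t)\,dB_t$, $X_0 = 0$, every solution is a Brownian motion by Lévy's characterization, yet whenever $X_t$ solves the equation so does $-X_t$ with the same Brownian motion. When the linear growth condition fails, solutions can explode in finite time even with locally Lipschitz coefficients: $dX_t = X_t^2\,dt$ with $X_0 = 1$ has solution $X_t = 1/(1 - t)$, which blows up at $t = 1$.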

Definition

Yamada-Watanabe Uniqueness Conditions

The Lipschitz hypothesis on $\sigma$ in the existence-and-uniqueness theorem can be weakened. Yamada-Watanabe (1971) proved pathwise uniqueness for one-dimensional SDEs $dX_t = b(X_t)\,dt + \sigma(X_t)\,dB_t$ under

$|\sigma(x) - \sigma(y)| \leq \rho(|x - y|),$

where $\rho: [0, \infty) \to [0, \infty)$ is increasing with $\rho(0) = 0$ and $\int_0^{\varepsilon} \rho(u)^{-2}\,du = \infty$ for every $\varepsilon > 0$ (the Yamada-Watanabe modulus). For the drift, ordinary Lipschitz continuity of $b$ suffices. The canonical example: $\rho(u) = u^{1/2}$ satisfies the integral condition (since $\int_0^{\varepsilon} u^{-1}\,du = \infty$), so $\sigma(x) = |x|^{1/2}$ gives pathwise uniqueness even though it is not Lipschitz at $0$. This justifies the standard treatment of the CIR process and similar square-root diffusions.

The companion result (Yamada-Watanabe 1971): weak existence plus pathwise uniqueness implies strong existence (and uniqueness in law). So once you check the Yamada-Watanabe pathwise-uniqueness condition and have any weak existence result, you automatically get a unique strong solution.

Theorem

Euler-Maruyama Convergence

Statement

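Under the Lipschitz and linear growth conditions (plus $1/2$-Hölder continuity in $t$ when the coefficients depend on time), the Euler-Maruyama scheme converges:

Strong convergence (pathwise): $E[\sup_{0 \leq t \leq T} |X_t - X_t^{\Delta t}|^2]^{1/2} = O(\sqrt{\Delta t})$
Weak convergence (distributional): if, in addition, $b$, $\sigma$, and the test function $g$ are sufficiently smooth (for example, four times continuously differentiable with polynomially bounded derivatives), $|E[g(X_T)] - E[g(X_T^{\Delta t})]| = O(\Delta t)$

Strong order is $1/2$; weak order is $1$.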
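Both rates can be checked on geometric Brownian motion, whose exact solution $S_T = S_0\exp((\mu - \sigma^2/2)T + \sigma B_T)$ can be evaluated on the same Brownian path as the scheme. The sketch below is illustrative (NumPy, hypothetical name `gbm_em_errors`, parameters chosen by us); it measures the strong error at the endpoint only rather than as a supremum over the path, and it uses the fact that under Euler-Maruyama each GBM step multiplies $S$ by an independent factor with mean $1 + \mu\Delta t$, so $E[S_N] = S_0(1 + \mu\Delta t)^N$ exactly and the weak error for $g(x) = x$ needs no Monte Carlo.

```python
import numpy as np

def gbm_em_errors(mu=0.5, sig=0.5, s0=1.0, t_end=1.0, levels=(16, 32, 64, 128, 256),
                  n_paths=10_000, seed=1):
    """Endpoint strong error and exact weak error (g(x) = x) of Euler-Maruyama for GBM."""
    rng = np.random.default_rng(seed)
    n_fine = max(levels)
    dB_fine = rng.normal(0.0, np.sqrt(t_end / n_fine), size=(n_paths, n_fine))
    # Exact solution on the same Brownian path: S_T = S_0 exp((mu - sig^2/2) T + sig B_T)
    s_exact = s0 * np.exp((mu - 0.5 * sig**2) * t_end + sig * dB_fine.sum(axis=1))
    strong, weak = [], []
    for n in levels:
        dt = t_end / n
        # Coarse increments are sums of consecutive fine increments (same path, coarser grid)
        dB = dB_fine.reshape(n_paths, n, n_fine // n).sum(axis=2)
        s = np.full(n_paths, s0)
        for k in range(n):
            s = s + mu * s * dt + sig * s * dB[:, k]
        strong.append(np.sqrt(np.mean((s - s_exact) ** 2)))
        # E[S_N] = S_0 (1 + mu dt)^N exactly under Euler-Maruyama, so no sampling error here
        weak.append(abs(s0 * (1 + mu * dt) ** n - s0 * np.exp(mu * t_end)))
    return t_end / np.array(levels), np.array(strong), np.array(weak)

dts, strong, weak = gbm_em_errors()
strong_order = np.polyfit(np.log(dts), np.log(strong), 1)[0]   # slope of log error vs log dt
weak_order = np.polyfit(np.log(dts), np.log(weak), 1)[0]
print(f"strong order ~ {strong_order:.2f}, weak order ~ {weak_order:.2f}")
```

The fitted slopes should come out near $1/2$ and $1$; with coarse grids the strong slope can sit a little above $1/2$ because $O(\Delta t)$ terms have not yet died out.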

Intuition

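Strong convergence is slower than forward Euler for ODEs (order $1/2$ vs order $1$) because Euler-Maruyama drops the second-order Itô-Taylor term $\tfrac{1}{2}\sigma\sigma'\,\Delta t\,(Z_k^2 - 1)$. Each dropped term has mean zero and root-mean-square size $O(\Delta t)$; over $T/\Delta t$ independent steps these errors accumulate like a random walk, to $\sqrt{T/\Delta t}\cdot O(\Delta t) = O(\sqrt{\Delta t})$. Weak convergence is faster (order $1$) because these mean-zero errors cancel in expectation. For additive noise ($\sigma' = 0$) with a smooth drift, the dropped term vanishes and Euler-Maruyama already has strong order $1$. If you care about the law of $X_T$ (Monte Carlo estimation), use the weak rate; if you need the actual path (e.g., coupling arguments), use the strong rate.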

Proof Sketch

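For strong convergence: expand $X_{t_{k+1}} - X_{t_k}$ using the integral form, subtract the Euler step, and bound the two remainders using the Lipschitz condition together with $E|X_s - X_{t_k}|^2 = O(\Delta t)$ for $s \in [t_k, t_{k+1}]$. The drift remainder $\int_{t_k}^{t_{k+1}} [b(X_s, s) - b(X_{t_k}, t_k)]\,ds$ is $O(\Delta t^{3/2})$ per step, so $T/\Delta t$ steps contribute $O(\sqrt{\Delta t})$. The diffusion remainder $\int_{t_k}^{t_{k+1}} [\sigma(X_s, s) - \sigma(X_{t_k}, t_k)]\,dB_s$ has mean square $O(\Delta t^2)$ per step by Itô isometry; these remainders are martingale increments, so their mean squares add up to $O(\Delta t)$, again an $O(\sqrt{\Delta t})$ error in $L^2$. Doob's maximal inequality controls the supremum over $t$, and Grönwall's inequality handles the propagation of earlier errors, giving the global $O(\sqrt{\Delta t})$ bound.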

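For weak convergence: let $u(x, t) = E[g(X_T) \mid X_t = x]$, which solves the backward Kolmogorov equation $\partial_t u + b\,\partial_x u + \tfrac{1}{2}\sigma^2\,\partial_x^2 u = 0$ with $u(\cdot, T) = g$. Telescoping gives $E[g(X_T^{\Delta t})] - E[g(X_T)] = \sum_k E[u(X_{k+1}, t_{k+1}) - u(X_k, t_k)]$, where $X_k$ are the Euler-Maruyama iterates. Taylor-expanding $u$ around $(X_k, t_k)$, the $O(\Delta t)$ terms cancel by the backward equation because the Euler-Maruyama increment has the correct mean and variance to leading order; the remaining terms are $O(\Delta t^2)$ per step, with $E[Z_k] = 0$ and $E[Z_k^3] = 0$ removing the odd moments. Summing $T/\Delta t$ steps gives the global $O(\Delta t)$ bound.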

Why It Matters

Every practical SDE simulation (SGLD, diffusion model sampling, Monte Carlo pricing) uses a discretization scheme. Euler-Maruyama is the simplest. Knowing the convergence orders tells you how many steps you need: halving the pathwise error requires 4x the steps (strong order $1/2$), but halving the distributional error requires only 2x the steps (weak order $1$). Higher-order schemes (Milstein, stochastic Runge-Kutta) improve the strong rate to $1$ by including the $dB \cdot dB$ term.

Failure Mode

When $\sigma$ depends on $x$ (multiplicative noise), the Euler-Maruyama iterate can become negative for processes that should stay positive (e.g., geometric Brownian motion, the CIR process); for CIR a negative iterate makes $\sqrt{X_k}$ undefined at the next step. Implicit schemes or the Milstein correction handle this better. For stiff SDEs (large Lipschitz constant), explicit Euler-Maruyama requires very small $\Delta t$ for stability.

The Milstein Scheme

For scalar SDEs, the Milstein scheme (Milstein 1975) augments Euler-Maruyama with a correction that captures the Itô-Taylor term involving $(dB_t)^2 = dt$ :

$X_{k+1} = X_k + b(X_k)\,\Delta t + \sigma(X_k)\,\sqrt{\Delta t}\,Z_k + \tfrac{1}{2}\sigma(X_k)\,\sigma'(X_k)\,\Delta t\,(Z_k^2 - 1).$

The extra term $\frac{1}{2}\sigma\sigma'\,\Delta t\,(Z_k^2 - 1)$ is the Itô correction for the second-order term in the stochastic Taylor expansion. When $\sigma$ does not depend on $x$ (additive noise), $\sigma' = 0$ and Milstein reduces to Euler-Maruyama. When $\sigma$ has nontrivial state dependence, Milstein achieves strong order 1 versus Euler-Maruyama's strong order $1/2$, at the cost of needing $\sigma'$ (a partial derivative). For multidimensional SDEs with non-commutative noise, the Milstein scheme requires Lévy area approximations and becomes substantially more involved.
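A side-by-side strong-error comparison on geometric Brownian motion, where the exact solution is available on the same Brownian path. For $\sigma(x) = \sigma x$ we have $\sigma\sigma' = \sigma^2 x$, so the Milstein correction is $\tfrac{1}{2}\sigma^2 S_k\,\Delta t\,(Z_k^2 - 1)$, and since the simulated increment is $\Delta B_k = \sqrt{\Delta t}\,Z_k$, $\Delta t\,(Z_k^2 - 1) = \Delta B_k^2 - \Delta t$. This is a NumPy sketch with illustrative parameters and a hypothetical function name; errors are measured at the endpoint only.

```python
import numpy as np

def strong_errors(mu=0.5, sig=0.5, s0=1.0, t_end=1.0, levels=(16, 32, 64, 128, 256),
                  n_paths=10_000, seed=2):
    """Endpoint RMS errors of Euler-Maruyama and Milstein for GBM on shared Brownian paths."""
    rng = np.random.default_rng(seed)
    n_fine = max(levels)
    dB_fine = rng.normal(0.0, np.sqrt(t_end / n_fine), size=(n_paths, n_fine))
    s_exact = s0 * np.exp((mu - 0.5 * sig**2) * t_end + sig * dB_fine.sum(axis=1))
    err_em, err_mil = [], []
    for n in levels:
        dt = t_end / n
        dB = dB_fine.reshape(n_paths, n, n_fine // n).sum(axis=2)   # coarse increments, same path
        s_em = np.full(n_paths, s0)
        s_mil = np.full(n_paths, s0)
        for k in range(n):
            inc = dB[:, k]                                           # sqrt(dt) * Z_k
            s_em = s_em + mu * s_em * dt + sig * s_em * inc
            # Milstein: extra term (1/2) sigma sigma' dt (Z^2 - 1) = (1/2) sig^2 S (inc^2 - dt)
            s_mil = (s_mil + mu * s_mil * dt + sig * s_mil * inc
                     + 0.5 * sig**2 * s_mil * (inc**2 - dt))
        err_em.append(np.sqrt(np.mean((s_em - s_exact) ** 2)))
        err_mil.append(np.sqrt(np.mean((s_mil - s_exact) ** 2)))
    return t_end / np.array(levels), np.array(err_em), np.array(err_mil)

dts, err_em, err_mil = strong_errors()
order_em = np.polyfit(np.log(dts), np.log(err_em), 1)[0]
order_mil = np.polyfit(np.log(dts), np.log(err_mil), 1)[0]
print(f"Euler-Maruyama strong order ~ {order_em:.2f}, Milstein strong order ~ {order_mil:.2f}")
```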

Canonical Examples

Example

Ornstein-Uhlenbeck Process

$dX_t = -\theta X_t\,dt + \sigma\,dB_t$ with $\theta > 0$. This is the continuous-time analogue of an AR(1) process. The solution is $X_t = X_0 e^{-\theta t} + \sigma \int_0^t e^{-\theta(t-s)}\,dB_s$, and the stationary distribution is $\mathcal{N}(0, \sigma^2/(2\theta))$. In ML: it is Langevin dynamics (the continuous-time process behind SGLD) for the quadratic potential $V(x) = \theta x^2/2$ with $\sigma = \sqrt{2\beta^{-1}}$, and, with time-dependent coefficients, it is the forward process of variance-preserving diffusion models.
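Because the solution is an explicit Gaussian, the OU process can be simulated without discretization error: over a step of length $\Delta t$, $X_{t+\Delta t} = X_t e^{-\theta\Delta t} + \sigma\sqrt{(1 - e^{-2\theta\Delta t})/(2\theta)}\,Z$. The NumPy sketch below (hypothetical helper `ou_exact_step`, illustrative parameters) checks the stationary variance and, for contrast, evaluates the stationary variance of the Euler-Maruyama recursion $X_{k+1} = (1 - \theta\Delta t)X_k + \sigma\sqrt{\Delta t}\,Z_k$, which is $\sigma^2/(\theta(2 - \theta\Delta t))$ and therefore slightly too large.

```python
import numpy as np

def ou_exact_step(x, theta, sigma, dt, rng):
    """One exact transition of dX = -theta X dt + sigma dB over a step of length dt."""
    decay = np.exp(-theta * dt)
    sd = sigma * np.sqrt((1.0 - decay**2) / (2.0 * theta))   # std of sigma * int e^{-theta(t-s)} dB_s
    return x * decay + sd * rng.standard_normal(x.shape)

theta, sigma, dt = 2.0, 1.0, 0.1
rng = np.random.default_rng(3)
x = np.full(100_000, 3.0)                  # X_0 = 3 on every path
for _ in range(50):                        # t = 5: the memory term e^{-theta t} = e^{-10} is negligible
    x = ou_exact_step(x, theta, sigma, dt, rng)

stationary_var = sigma**2 / (2 * theta)                     # N(0, sigma^2 / (2 theta))
em_stationary_var = sigma**2 / (theta * (2 - theta * dt))   # Euler-Maruyama AR(1) fixed point
print(x.mean(), x.var(), stationary_var, em_stationary_var)
```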

Example

Geometric Brownian Motion

$dS_t = \mu S_t\,dt + \sigma S_t\,dB_t$. Applying Itô's formula to $\log S_t$ gives $S_t = S_0 \exp\!\big((\mu - \sigma^2/2)t + \sigma B_t\big)$. This is the Black-Scholes stock price model. The Itô correction $-\sigma^2/2$ is the reason the geometric mean return is lower than the arithmetic mean return: $\log S_t$ grows at rate $\mu - \sigma^2/2$, and the median of $S_t$ is $S_0 e^{(\mu - \sigma^2/2)t}$. The mean is nevertheless $E[S_t] = S_0 e^{\mu t}$, because $E[e^{\sigma B_t}] = e^{\sigma^2 t/2}$ exactly offsets the $-\sigma^2/2$ in the exponent.
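The gap between mean and median is easy to see by sampling the exact solution directly. A minimal NumPy sketch with illustrative parameters:

```python
import numpy as np

mu, sig, s0, t = 0.1, 0.4, 1.0, 2.0
rng = np.random.default_rng(4)
b_t = rng.normal(0.0, np.sqrt(t), size=1_000_000)           # B_t ~ N(0, t)
s_t = s0 * np.exp((mu - 0.5 * sig**2) * t + sig * b_t)    # exact GBM solution at time t

mean_theory = s0 * np.exp(mu * t)                          # E[S_t] = S_0 e^{mu t}
median_theory = s0 * np.exp((mu - 0.5 * sig**2) * t)      # median = S_0 e^{(mu - sigma^2/2) t}
growth_rate = np.mean(np.log(s_t / s0)) / t                # sample estimate of mu - sigma^2/2

print(f"mean   {s_t.mean():.4f}  (theory {mean_theory:.4f})")
print(f"median {np.median(s_t):.4f}  (theory {median_theory:.4f})")
print(f"log growth rate {growth_rate:.4f}  (theory {mu - 0.5 * sig**2:.4f})")
```

The mean is pulled up by a thin right tail of large outcomes, while a typical path grows at the smaller rate $\mu - \sigma^2/2$.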

Example

CIR Process (Square-Root Diffusion)

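$dX_t = \kappa(\theta - X_t)\,dt + \sigma\sqrt{X_t}\,dB_t$ with $2\kappa\theta > \sigma^2$ (Feller condition). This models interest rates and variance processes in finance. Because the diffusion coefficient vanishes at $X_t = 0$ while the drift $\kappa\theta > 0$ pushes upward there, the process never becomes negative; the Feller condition additionally guarantees that, started from $X_0 > 0$, it never reaches $0$. The stationary distribution is Gamma with shape $2\kappa\theta/\sigma^2$ and rate $2\kappa/\sigma^2$ (mean $\theta$, variance $\theta\sigma^2/(2\kappa)$).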
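Plain Euler-Maruyama can push a CIR iterate below zero, after which $\sqrt{X_k}$ is undefined. The sketch below uses a "full truncation" variant of Euler-Maruyama: only the positive part of the iterate enters the drift and the square root. This is one common positivity fix from the quantitative finance literature, chosen by us for illustration rather than prescribed by the text; the function name and parameters are hypothetical. It then compares the long-run moments with the stationary Gamma law.

```python
import numpy as np

def cir_full_truncation(kappa, theta, sigma, x0, t_end, n_steps, n_paths, seed=None):
    """CIR dX = kappa (theta - X) dt + sigma sqrt(X) dB via a full-truncation Euler scheme.

    The auxiliary iterate may dip below zero; only its positive part enters
    the drift and the square root, and the positive part is what is returned.
    """
    rng = np.random.default_rng(seed)
    dt = t_end / n_steps
    x = np.full(n_paths, float(x0))
    for _ in range(n_steps):
        xp = np.maximum(x, 0.0)
        z = rng.standard_normal(n_paths)
        x = x + kappa * (theta - xp) * dt + sigma * np.sqrt(xp * dt) * z
    return np.maximum(x, 0.0)

kappa, theta, sigma = 2.0, 0.04, 0.3                 # 2 kappa theta = 0.16 > sigma^2 = 0.09 (Feller)
x_T = cir_full_truncation(kappa, theta, sigma, x0=0.04, t_end=10.0,
                          n_steps=1_000, n_paths=20_000, seed=5)

shape, rate = 2 * kappa * theta / sigma**2, 2 * kappa / sigma**2   # stationary Gamma(shape, rate)
print(x_T.mean(), shape / rate)          # Gamma mean = theta
print(x_T.var(), shape / rate**2)        # Gamma variance = theta sigma^2 / (2 kappa)
```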

The Fokker-Planck Connection

If $X_t$ solves $dX_t = b(X_t)\,dt + \sigma(X_t)\,dB_t$ and has a smooth density $p(x,t)$, that density satisfies the Fokker-Planck equation (forward Kolmogorov equation):

$\frac{\partial p}{\partial t} = -\frac{\partial}{\partial x}[b(x)\,p] + \frac{1}{2}\frac{\partial^2}{\partial x^2}[\sigma^2(x)\,p].$

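This PDE governs how the probability mass evolves. In diffusion models, the forward SDE has a known Fokker-Planck equation whose solution converges to a Gaussian. The reverse-time SDE used for score-based generation has a drift involving the score $\nabla_x \log p(x, t)$ and is constructed so that its marginals are $p(x, T - t)$: it traces the same Fokker-Planck solution backward in time.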
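For the Ornstein-Uhlenbeck drift $b(x) = -\theta x$ with constant $\sigma$, the Fokker-Planck equation can be solved numerically and compared with the known answer: starting from $\mathcal{N}(0, v_0)$, the variance at time $t$ is $v_0 e^{-2\theta t} + \sigma^2(1 - e^{-2\theta t})/(2\theta)$. The sketch below uses explicit finite differences (central differences in $x$, forward Euler in $t$) on a truncated grid; the grid, step sizes, and the name `fokker_planck_ou` are illustrative assumptions, with the time step kept well inside the explicit scheme's stability limit $\Delta t \lesssim \Delta x^2/\sigma^2$.

```python
import numpy as np

def fokker_planck_ou(theta=1.0, sigma=1.0, v0=0.05, t_end=1.0, L=4.0, dx=0.02, dt=1e-4):
    """Explicit finite differences for dp/dt = -d/dx[b p] + (1/2) d^2/dx^2[sigma^2 p], b(x) = -theta x."""
    x = np.arange(-L, L + dx / 2, dx)
    p = np.exp(-x**2 / (2 * v0))
    p /= p.sum() * dx                                  # start from N(0, v0), normalized on the grid
    D = 0.5 * sigma**2                                 # (1/2) sigma^2, constant here
    for _ in range(int(round(t_end / dt))):
        bp = -theta * x * p                            # b(x) p(x)
        dp = np.zeros_like(p)
        dp[1:-1] = -(bp[2:] - bp[:-2]) / (2 * dx)      # -d/dx [b p], central difference
        dp[1:-1] += D * (p[2:] - 2 * p[1:-1] + p[:-2]) / dx**2   # (1/2) sigma^2 d^2 p / dx^2
        p = p + dt * dp                                # forward Euler in time
        p[0] = p[-1] = 0.0                             # far-field boundary; mass out there is negligible
    return x, p

theta, sigma, v0, t_end = 1.0, 1.0, 0.05, 1.0
x, p = fokker_planck_ou(theta, sigma, v0, t_end)
dx = x[1] - x[0]
mass = p.sum() * dx
mean = (x * p).sum() * dx / mass
var = ((x - mean) ** 2 * p).sum() * dx / mass
exact_var = v0 * np.exp(-2 * theta * t_end) + sigma**2 * (1 - np.exp(-2 * theta * t_end)) / (2 * theta)
print(mass, var, exact_var)
```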

Common Confusions

Watch Out

Itô vs Stratonovich gives different SDEs, not different solutions to the same SDE

The SDE $dX = \sigma(X)\,dB$ in the Itô sense and $dX = \sigma(X) \circ dB$ in the Stratonovich sense are different equations with different solutions. They can be converted: a Stratonovich SDE $dX = \sigma(X) \circ dB$ equals the Itô SDE $dX = \frac{1}{2}\sigma(X)\sigma'(X)\,dt + \sigma(X)\,dB$. The choice of convention is not a matter of taste when the diffusion coefficient depends on $X$.
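For $\sigma(x) = x$ both conventions have closed-form solutions: the Itô equation $dX = X\,dB$ gives $X_t = e^{B_t - t/2}$ (mean $1$), while the Stratonovich equation $dX = X \circ dB$ gives $X_t = e^{B_t}$ (mean $e^{t/2}$). The sketch below simulates both from $X_0 = 1$, using Euler-Maruyama for the Itô form and the Heun predictor-corrector scheme, which converges to the Stratonovich solution, for the other. It also runs Euler-Maruyama on the converted Itô SDE $dX = \tfrac{1}{2}X\,dt + X\,dB$ to check the conversion formula. NumPy; the function name and parameters are illustrative.

```python
import numpy as np

def ito_vs_stratonovich(n_paths=100_000, n_steps=200, t_end=1.0, seed=6):
    """Mean of X_T for sigma(x) = x under three discretizations driven by the same increments."""
    rng = np.random.default_rng(seed)
    dt = t_end / n_steps
    sigma = lambda x: x                       # sigma(x) = x, so (1/2) sigma(x) sigma'(x) = x / 2
    x_ito = np.ones(n_paths)                  # Ito SDE dX = X dB, Euler-Maruyama
    x_strat = np.ones(n_paths)                # Stratonovich SDE dX = X o dB, Heun scheme
    x_conv = np.ones(n_paths)                 # converted Ito SDE dX = (1/2) X dt + X dB, Euler-Maruyama
    for _ in range(n_steps):
        dB = rng.normal(0.0, np.sqrt(dt), n_paths)
        x_ito = x_ito + sigma(x_ito) * dB
        pred = x_strat + sigma(x_strat) * dB                           # Euler predictor
        x_strat = x_strat + 0.5 * (sigma(x_strat) + sigma(pred)) * dB  # trapezoidal corrector
        x_conv = x_conv + 0.5 * x_conv * dt + sigma(x_conv) * dB
    return x_ito.mean(), x_strat.mean(), x_conv.mean()

m_ito, m_strat, m_conv = ito_vs_stratonovich()
# Exact means at T = 1: Ito exp(B_T - T/2) has mean 1; Stratonovich exp(B_T) has mean e^{1/2}
print(m_ito, m_strat, m_conv, np.exp(0.5))
```

Same noise, same coefficient, different calculus: the two means should differ by roughly 65%, and the Heun result should agree with the converted Itô SDE.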

Watch Out

Strong order 1/2 does not mean Euler-Maruyama is useless

The $O(\sqrt{\Delta t})$ strong rate looks slow, but for Monte Carlo estimation (the main use case), only the weak rate matters. With weak order $1$, Euler-Maruyama converges as fast as forward Euler does for ODEs when you care about expectations. Strong convergence matters for path-dependent functionals and coupling arguments, not for computing $E[g(X_T)]$.

Watch Out

Not every SDE has a stationary distribution

An SDE has a stationary distribution only if drift and diffusion balance so that probability mass reaches an equilibrium. The Ornstein-Uhlenbeck process has one ($\theta > 0$ pulls mass back to zero). Geometric Brownian motion does not: it drifts to $+\infty$ or $0$ depending on the sign of $\mu - \sigma^2/2$, and oscillates between the two when $\mu = \sigma^2/2$. Checking for stationarity requires verifying that the Fokker-Planck equation has a normalizable steady-state solution.
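For scalar SDEs the steady-state check can be done by hand. Assuming the stationary probability current $J = b\,p - \tfrac{1}{2}\partial_x(\sigma^2 p)$ vanishes (the usual situation for a diffusion on the line or half-line that does not leak mass through its boundaries), setting $\partial p/\partial t = 0$ in the Fokker-Planck equation gives the formula below. Applying it to the two examples reproduces both claims.

```latex
% Zero-flux steady state of  0 = -\partial_x[b p] + \tfrac12 \partial_x^2[\sigma^2 p]:
p_\infty(x) \;\propto\; \frac{1}{\sigma^2(x)} \exp\!\left( \int^x \frac{2\,b(y)}{\sigma^2(y)}\,dy \right)

% Ornstein-Uhlenbeck: b(x) = -\theta x, \ \sigma(x) = \sigma
p_\infty(x) \propto \exp\!\left(-\frac{\theta x^2}{\sigma^2}\right)
\;\Rightarrow\; \mathcal{N}\!\left(0, \frac{\sigma^2}{2\theta}\right) \quad \text{(normalizable)}

% Geometric Brownian motion on (0, \infty): b(x) = \mu x, \ \sigma(x) = \sigma x
p_\infty(x) \propto \frac{1}{\sigma^2 x^2} \exp\!\left(\frac{2\mu}{\sigma^2} \ln x\right)
= \frac{x^{2\mu/\sigma^2 - 2}}{\sigma^2}
\quad \text{(a power of } x \text{, never integrable on } (0, \infty))
```

A power $x^a$ is integrable near $0$ only if $a > -1$ and near $\infty$ only if $a < -1$, so no choice of $\mu$ rescues geometric Brownian motion.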

Exercises

ExerciseCore

Problem

Apply Itô's formula to $f(x) = x^2$ and the Ornstein-Uhlenbeck SDE $dX_t = -\theta X_t\,dt + \sigma\,dB_t$ to derive $dX_t^2$. Use this to compute $E[X_t^2]$ when $X_0 = 0$.

ExerciseAdvanced

Problem

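The Milstein scheme adds the term $\frac{1}{2}\sigma(X_k)\sigma'(X_k)[(Z_k^2 - 1)\Delta t]$ to the Euler-Maruyama step. For geometric Brownian motion $dS = \mu S\,dt + \sigma S\,dB$, show that the difference between one Milstein step and the exact one-step solution $S_{k+1} = S_k \exp\!\big((\mu - \sigma^2/2)\Delta t + \sigma\sqrt{\Delta t}\,Z_k\big)$ is $O(\Delta t^{3/2})$, whereas for the Euler-Maruyama step it is $O(\Delta t)$. Explain heuristically why this local accuracy gives Milstein strong order 1.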

References

Canonical textbooks:

Øksendal, Stochastic Differential Equations (6th ed., Springer, 2003), Chapters 5-8.
Karatzas & Shreve, Brownian Motion and Stochastic Calculus (2nd ed., Springer, 1991), Chapter 5.
Revuz & Yor, Continuous Martingales and Brownian Motion (3rd ed., Springer, 1999). The standard reference for the modern martingale-theoretic treatment.
Protter, Stochastic Integration and Differential Equations (2nd ed., Springer, 2004). Semimartingale approach with general jump-diffusion theory.
Kloeden & Platen, Numerical Solution of Stochastic Differential Equations (Springer, 1992), Chapters 9-10. The reference for Euler-Maruyama, Milstein, and stochastic Runge-Kutta.

Foundational papers:

Yamada & Watanabe, "On the Uniqueness of Solutions of Stochastic Differential Equations" (J. Math. Kyoto 11, 1971). The Yamada-Watanabe pathwise-uniqueness criterion and the weak-existence-plus-pathwise-uniqueness theorem.
Milstein, "Approximate Integration of Stochastic Differential Equations" (Theory of Probability and Its Applications 19(3), 1975). The Milstein scheme.

Current:

Pavliotis, Stochastic Processes and Applications (Springer, 2014), Chapters 3-4. ML-friendly treatment of OU, Langevin, and Fokker-Planck.
Le Gall, Brownian Motion, Martingales, and Stochastic Calculus (Springer, 2016), Chapter 7.
Da Prato & Zabczyk, Stochastic Equations in Infinite Dimensions (2nd ed., Cambridge, 2014). The reference for SPDEs and infinite-dimensional SDEs (relevant to function-space diffusion models).
Song et al., "Score-Based Generative Modeling through Stochastic Differential Equations" (ICLR 2021; arXiv:2011.13456). SDE framework for diffusion models; the reverse-time SDE and probability flow ODE.

Last reviewed: April 26, 2026
