
Mathematical Infrastructure

Stochastic Differential Equations

SDEs of the form dX = b dt + sigma dB: strong and weak solutions, existence and uniqueness under Lipschitz conditions, Euler-Maruyama discretization, and the canonical examples that appear throughout ML (Ornstein-Uhlenbeck, geometric Brownian motion, Langevin dynamics).

Advanced · Tier 2 · Stable · Supporting · ~55 min

Why This Matters

An SDE is a differential equation driven by Brownian motion:

$$dX_t = b(X_t, t)\,dt + \sigma(X_t, t)\,dB_t.$$

This is the mathematical object behind Langevin dynamics (the sampler inside SGLD), the forward process of diffusion models, the continuous-time limit of stochastic gradient descent, the state dynamics in continuous-time RL, and the Black-Scholes model in finance. Knowing when an SDE has a unique solution, how to discretize it, and how the solution's density evolves (Fokker-Planck) is the bridge between the stochastic-calculus toolbox and the models that use it.

Mental Model

An SDE is an ODE with noise. The drift $b(X_t, t)$ pulls the trajectory deterministically; the diffusion $\sigma(X_t, t)$ injects random fluctuations proportional to $dB_t$. The interplay between drift and diffusion determines whether the process has a stationary distribution, how fast it mixes, and whether samples from the SDE can be used for inference.

The integral form is more honest than the differential notation:

$$X_t = X_0 + \int_0^t b(X_s, s)\,ds + \int_0^t \sigma(X_s, s)\,dB_s.$$

The first integral is an ordinary Riemann integral. The second is an Itô integral, which requires different rules because $B_s$ has infinite variation.

Formal Setup

Definition

Strong Solution

A strong solution of $dX_t = b(X_t,t)\,dt + \sigma(X_t,t)\,dB_t$ is a process $X_t$ that is adapted to the filtration of the given Brownian motion $B_t$ and satisfies the integral equation pathwise. The solution is built on top of the specific noise realization $B_t(\omega)$.

Definition

Weak Solution

A weak solution is a probability space, a Brownian motion $B_t$ on that space, and a process $X_t$ satisfying the SDE. The Brownian motion is part of the solution, not given in advance. Weak existence is a statement about the law of $X_t$; strong existence is a statement about pathwise construction from a given $B_t$.

Definition

Pathwise Uniqueness

The SDE has pathwise uniqueness if any two solutions $X_t$, $\tilde{X}_t$ defined on the same probability space, driven by the same Brownian motion, and starting from the same initial condition satisfy $P(X_t = \tilde{X}_t \text{ for all } t) = 1$.

Definition

Euler-Maruyama Discretization

Given time step $\Delta t$ and grid $t_k = k\Delta t$, the Euler-Maruyama scheme approximates the SDE by:

$$X_{k+1} = X_k + b(X_k, t_k)\,\Delta t + \sigma(X_k, t_k)\,\sqrt{\Delta t}\,Z_k$$

where $Z_k \sim \mathcal{N}(0,1)$ are i.i.d. This is the stochastic analogue of forward Euler.
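The update rule translates almost line for line into code. The following is a minimal sketch for a scalar SDE, using only the Python standard library; the function name `euler_maruyama`, its argument order, and the Ornstein-Uhlenbeck parameters in the usage lines are illustrative assumptions rather than anything fixed by the text.

```python
import math
import random

def euler_maruyama(b, sigma, x0, T, n_steps, rng):
    """Simulate one path of dX = b(X, t) dt + sigma(X, t) dB on [0, T].

    b and sigma are callables (x, t) -> float; rng is a random.Random instance.
    Returns the list [X_0, X_1, ..., X_n] on the grid t_k = k * T / n_steps.
    """
    dt = T / n_steps
    x = x0
    path = [x0]
    for k in range(n_steps):
        t = k * dt
        z = rng.gauss(0.0, 1.0)  # Z_k ~ N(0, 1), independent across steps
        x = x + b(x, t) * dt + sigma(x, t) * math.sqrt(dt) * z
        path.append(x)
    return path

# Illustrative usage (hypothetical parameters): dX = -X dt + 0.5 dB, X_0 = 1.
rng = random.Random(0)
path = euler_maruyama(lambda x, t: -x, lambda x, t: 0.5,
                      x0=1.0, T=1.0, n_steps=100, rng=rng)
```

Setting `sigma` to zero recovers forward Euler for the ODE $\dot{x} = b(x, t)$, which is a convenient sanity check.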

Main Theorems

Theorem

Existence and Uniqueness for SDEs

Statement

If the drift $b$ and diffusion $\sigma$ satisfy

$$|b(x,t) - b(y,t)| + |\sigma(x,t) - \sigma(y,t)| \leq L|x - y|$$

for all $x, y, t$ (Lipschitz condition), and

$$|b(x,t)|^2 + |\sigma(x,t)|^2 \leq K(1 + |x|^2)$$

(linear growth condition), then for any initial condition $X_0$ with $E[|X_0|^2] < \infty$, the SDE has a unique strong solution on $[0,T]$ for any $T < \infty$. The solution satisfies $E[\sup_{0 \leq t \leq T} |X_t|^2] < \infty$.

Intuition

Lipschitz drift and diffusion prevent the solution from branching (uniqueness) or exploding (existence). The proof is a stochastic Picard iteration: define $X_t^{(n+1)}$ as the integral of $b$ and $\sigma$ evaluated at $X_t^{(n)}$, then show the sequence converges using the Lipschitz bound and Grönwall's inequality. This mirrors the deterministic ODE proof, with $L^2$ norms replacing sup-norms.

Proof Sketch

Define the Picard iterates $X^{(0)}_t = X_0$ and $X^{(n+1)}_t = X_0 + \int_0^t b(X^{(n)}_s, s)\,ds + \int_0^t \sigma(X^{(n)}_s, s)\,dB_s$. Using Itô isometry and the Lipschitz bound:

$$E\!\left[\sup_{s \leq t} |X^{(n+1)}_s - X^{(n)}_s|^2\right] \leq C \int_0^t E\!\left[\sup_{r \leq s}|X^{(n)}_r - X^{(n-1)}_r|^2\right] ds.$$

Iterating gives a bound of order $C^n t^n / n!$, which is summable. The series $\sum (X^{(n+1)} - X^{(n)})$ converges in $L^2$, giving the solution. Uniqueness follows from applying the same Grönwall argument to the difference of two solutions.

Why It Matters

This theorem tells you when an SDE is well-posed. Langevin dynamics ($b = -\nabla V$, $\sigma = \sqrt{2\beta^{-1}}$) has a unique solution whenever the potential $V$ has Lipschitz gradient. The Ornstein-Uhlenbeck process satisfies the conditions with linear drift. Geometric Brownian motion has multiplicative noise $\sigma(x) = \sigma x$, which is globally Lipschitz with constant $|\sigma|$ and satisfies the linear growth condition, so the theorem applies directly without any localization argument. Multiplicative-noise SDEs only require localization when $\sigma$ grows superlinearly (e.g. $\sigma(x) = x^2$) or is non-Lipschitz on bounded sets (e.g. $\sigma(x) = \sqrt{|x|}$ for the CIR process).

Failure Mode

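When the Lipschitz condition fails, the theorem no longer applies and solutions may not be unique. A borderline case: $dX_t = |X_t|^{1/2}\,dB_t$ with $X_0 = 0$. The diffusion coefficient $\sigma(x) = |x|^{1/2}$ is Hölder-$1/2$ but not Lipschitz at $x = 0$, yet pathwise uniqueness still holds by the Yamada-Watanabe criterion (see below), so $X \equiv 0$ is the only solution. Uniqueness genuinely fails one step further: for $dX_t = |X_t|^{\alpha}\,dB_t$ with $0 < \alpha < 1/2$ and $X_0 = 0$ (Girsanov's example), $X \equiv 0$ is a solution but so are nontrivial ones, and even uniqueness in law is lost. When both Lipschitz and linear growth fail, solutions can explode in finite time.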

Definition

Yamada-Watanabe Uniqueness Conditions

The Lipschitz hypothesis on $\sigma$ in the existence-and-uniqueness theorem can be weakened. Yamada-Watanabe (1971) proved pathwise uniqueness for one-dimensional SDEs $dX_t = b(X_t)\,dt + \sigma(X_t)\,dB_t$ under

$$|\sigma(x) - \sigma(y)| \leq \rho(|x - y|),$$

where $\rho: [0, \infty) \to [0, \infty)$ is increasing with $\rho(0) = 0$ and $\int_0^{\varepsilon} \rho(u)^{-2}\,du = \infty$ for every $\varepsilon > 0$ (the Yamada-Watanabe modulus). For the drift $b$, Lipschitz continuity suffices. The canonical example: $\rho(u) = u^{1/2}$ satisfies the integral condition (since $\int_0^{\varepsilon} u^{-1}\,du = \infty$), so $\sigma(x) = |x|^{1/2}$ gives pathwise uniqueness even though it is not Lipschitz at $0$. This justifies the standard treatment of the CIR process and similar square-root diffusions.

The companion result (Yamada-Watanabe 1971): weak existence + pathwise uniqueness implies strong existence. So once you check Yamada-Watanabe pathwise uniqueness and any weak existence result, you automatically get a unique strong solution.
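As a worked check of the integral condition, take the Hölder modulus $\rho(u) = u^{\alpha}$ with $\alpha > 0$ (an illustrative family chosen here, not part of the theorem statement):

```latex
\int_0^{\varepsilon} \rho(u)^{-2}\,du
  = \int_0^{\varepsilon} u^{-2\alpha}\,du
  = \begin{cases}
      \dfrac{\varepsilon^{\,1-2\alpha}}{1-2\alpha} < \infty, & 0 < \alpha < \tfrac{1}{2}, \\[1.5ex]
      \infty, & \alpha \geq \tfrac{1}{2}.
    \end{cases}
```

So the criterion covers Hölder exponents $\alpha \geq 1/2$, with the square root as the borderline case, and is silent for $\alpha < 1/2$.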

Theorem

Euler-Maruyama Convergence

Statement

Under the Lipschitz and linear growth conditions, the Euler-Maruyama scheme converges:

  • Strong convergence (pathwise): $E[\sup_{0 \leq t \leq T} |X_t - X_t^{\Delta t}|^2]^{1/2} = O(\sqrt{\Delta t})$
  • Weak convergence (distributional): for smooth test functions $g$, $|E[g(X_T)] - E[g(X_T^{\Delta t})]| = O(\Delta t)$

Strong order is $1/2$; weak order is $1$.

Intuition

Strong convergence is slower than for deterministic ODEs (order $1/2$ vs order $1$) because the Brownian noise introduces $O(\sqrt{\Delta t})$ fluctuations that cannot be captured by a single Euler step. Weak convergence is faster (order $1$) because distributional errors average out across realizations. If you care about the law of $X_T$ (Monte Carlo estimation), use the weak rate; if you need the actual path (e.g., coupling arguments), use the strong rate.

Proof Sketch

For strong convergence: expand $X_{t_{k+1}} - X_{t_k}$ using the integral form, subtract the Euler step, and bound the remainder using Itô isometry and the Lipschitz condition. The drift remainder $\int_{t_k}^{t_{k+1}} [b(X_s, s) - b(X_{t_k}, t_k)]\,ds$ has magnitude $O(\Delta t^{3/2})$ per step after controlling $|X_s - X_{t_k}| = O(\sqrt{\Delta t})$. The diffusion remainder $\int_{t_k}^{t_{k+1}} [\sigma(X_s, s) - \sigma(X_{t_k}, t_k)]\,dB_s$ is $O(\Delta t)$ in $L^2$ per step, and because these increments are orthogonal their squares add. Summing $O(T/\Delta t)$ steps and applying Grönwall's inequality gives the global $O(\sqrt{\Delta t})$ bound.

For weak convergence: use the Itô-Taylor expansion of $g(X_T)$ truncated at the appropriate order. The extra cancellation in the weak case comes from $E[Z_k] = 0$ and $E[Z_k^3] = 0$.

Why It Matters

Every practical SDE simulation (SGLD, diffusion model sampling, Monte Carlo pricing) uses a discretization scheme. Euler-Maruyama is the simplest. Knowing the convergence orders tells you how many steps you need: halving the pathwise error requires 4x the steps (strong order $1/2$), but halving the distributional error requires only 2x the steps (weak order $1$). Higher-order schemes (Milstein, stochastic Runge-Kutta) improve the strong rate to $1$ by including the $dB \cdot dB$ term.
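The "4x the steps to halve the pathwise error" rule can be checked empirically. The sketch below estimates the root-mean-square terminal error of Euler-Maruyama on geometric Brownian motion, where the exact solution is available from the same Brownian increments. The function name `gbm_strong_error` and the parameter values are assumptions chosen for illustration, and the terminal error is used as a simple stand-in for the supremum over the path.

```python
import math
import random

def gbm_strong_error(n_steps, n_paths, mu=0.05, sigma=0.5, s0=1.0, T=1.0, seed=0):
    """RMS error at time T between Euler-Maruyama and the exact GBM solution,
    both driven by the same Brownian increments."""
    rng = random.Random(seed)
    dt = T / n_steps
    sq_err = 0.0
    for _ in range(n_paths):
        s, b_T = s0, 0.0
        for _ in range(n_steps):
            dB = math.sqrt(dt) * rng.gauss(0.0, 1.0)
            s += mu * s * dt + sigma * s * dB  # Euler-Maruyama step
            b_T += dB                          # accumulate B_T for the exact formula
        exact = s0 * math.exp((mu - 0.5 * sigma ** 2) * T + sigma * b_T)
        sq_err += (s - exact) ** 2
    return math.sqrt(sq_err / n_paths)

err_coarse = gbm_strong_error(n_steps=16, n_paths=4000)
err_fine = gbm_strong_error(n_steps=64, n_paths=4000)  # 4x the steps
ratio = err_coarse / err_fine                          # strong order 1/2 predicts about 2
```

The ratio is a Monte Carlo estimate, so it fluctuates around 2 rather than hitting it exactly; more paths tighten the estimate.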

Failure Mode

When $\sigma$ depends on $x$ (multiplicative noise), the Euler-Maruyama scheme can produce negative values for processes that should be positive (e.g., geometric Brownian motion, CIR process). Implicit schemes or the Milstein correction handle this better. For stiff SDEs (large Lipschitz constant), explicit Euler requires very small $\Delta t$ for stability.

The Milstein Scheme

For scalar SDEs, the Milstein scheme (Milstein 1975) augments Euler-Maruyama with a correction that captures the Itô-Taylor term involving $(dB_t)^2 = dt$:

$$X_{k+1} = X_k + b(X_k)\,\Delta t + \sigma(X_k)\,\sqrt{\Delta t}\,Z_k + \tfrac{1}{2}\sigma(X_k)\,\sigma'(X_k)\,\Delta t\,(Z_k^2 - 1).$$

The extra term $\frac{1}{2}\sigma\sigma'\,\Delta t\,(Z_k^2 - 1)$ is the Itô correction for the second-order term in the stochastic Taylor expansion. When $\sigma$ does not depend on $x$ (additive noise), $\sigma' = 0$ and Milstein reduces to Euler-Maruyama. When $\sigma$ has nontrivial state dependence, Milstein achieves strong order $1$ versus Euler-Maruyama's strong order $1/2$, at the cost of needing the derivative $\sigma'$. For multidimensional SDEs with non-commutative noise, the Milstein scheme requires Lévy area approximations and becomes substantially more involved.
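A minimal sketch of the scalar Milstein update, compared against Euler-Maruyama on geometric Brownian motion with shared noise. The helper names `milstein_step` and `strong_errors_gbm` and all parameter values are hypothetical, chosen only for this illustration; passing a zero derivative switches the correction off and recovers Euler-Maruyama.

```python
import math
import random

def milstein_step(x, b, sigma, dsigma, dt, z):
    """One scalar Milstein update; dsigma(x) is the derivative sigma'(x)."""
    s = sigma(x)
    return (x + b(x) * dt + s * math.sqrt(dt) * z
            + 0.5 * s * dsigma(x) * dt * (z * z - 1.0))

def strong_errors_gbm(n_steps, n_paths, mu=0.05, sig=0.5, s0=1.0, T=1.0, seed=0):
    """RMS terminal errors (Euler-Maruyama, Milstein) against the exact GBM
    solution, with all three driven by the same Brownian increments."""
    rng = random.Random(seed)
    dt = T / n_steps
    b = lambda x: mu * x
    sigma = lambda x: sig * x
    dsigma = lambda x: sig
    zero = lambda x: 0.0  # no correction term: plain Euler-Maruyama
    se_em = se_mil = 0.0
    for _ in range(n_paths):
        x_em = x_mil = s0
        b_T = 0.0
        for _ in range(n_steps):
            z = rng.gauss(0.0, 1.0)
            x_em = milstein_step(x_em, b, sigma, zero, dt, z)
            x_mil = milstein_step(x_mil, b, sigma, dsigma, dt, z)
            b_T += math.sqrt(dt) * z
        exact = s0 * math.exp((mu - 0.5 * sig ** 2) * T + sig * b_T)
        se_em += (x_em - exact) ** 2
        se_mil += (x_mil - exact) ** 2
    return math.sqrt(se_em / n_paths), math.sqrt(se_mil / n_paths)

err_em, err_mil = strong_errors_gbm(n_steps=50, n_paths=2000)
```

At the same step size the Milstein error should come out noticeably smaller than the Euler-Maruyama error, in line with the strong orders stated above.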

Canonical Examples

Example

Ornstein-Uhlenbeck Process

$dX_t = -\theta X_t\,dt + \sigma\,dB_t$ with $\theta > 0$. This is the continuous-time analogue of an AR(1) process. The solution is $X_t = X_0 e^{-\theta t} + \sigma \int_0^t e^{-\theta(t-s)}\,dB_s$. The stationary distribution is $\mathcal{N}(0, \sigma^2/(2\theta))$. In ML: this is the SDE behind SGLD (Langevin dynamics for a quadratic potential $V(x) = \theta x^2/2$) and the forward process of variance-preserving diffusion models.
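The stationary variance $\sigma^2/(2\theta)$ is easy to check by simulation. The sketch below runs Euler-Maruyama paths long enough for the initial condition to be forgotten and compares the sample variance with the formula; the function name `ou_terminal_samples`, the parameter values, and the tolerance implied by the sample size are illustrative assumptions.

```python
import math
import random

def ou_terminal_samples(theta, sigma, x0, T, n_steps, n_paths, seed=0):
    """Euler-Maruyama samples of X_T for dX = -theta X dt + sigma dB."""
    rng = random.Random(seed)
    dt = T / n_steps
    scale = sigma * math.sqrt(dt)
    samples = []
    for _ in range(n_paths):
        x = x0
        for _ in range(n_steps):
            x += -theta * x * dt + scale * rng.gauss(0.0, 1.0)
        samples.append(x)
    return samples

theta, sigma = 1.0, 1.0
samples = ou_terminal_samples(theta, sigma, x0=0.0, T=4.0, n_steps=200, n_paths=3000)
mean = sum(samples) / len(samples)
var = sum((x - mean) ** 2 for x in samples) / (len(samples) - 1)
target = sigma ** 2 / (2 * theta)  # stationary variance from the text
```

With $T = 4$ and $\theta = 1$ the transient factor $e^{-2\theta T}$ is negligible, so `var` should sit close to `target`, up to Monte Carlo noise and a small discretization bias.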

Example

Geometric Brownian Motion

$dS_t = \mu S_t\,dt + \sigma S_t\,dB_t$. Applying Itô's formula to $\log S_t$ gives $S_t = S_0 \exp\!\big((\mu - \sigma^2/2)t + \sigma B_t\big)$. This is the Black-Scholes stock price model. The Itô correction $-\sigma^2/2$ is the reason the geometric mean return is lower than the arithmetic mean return; it is also why $E[S_t] = S_0 e^{\mu t}$ despite the $-\sigma^2/2$ in the exponent.
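Spelled out, the two claims follow from Itô's formula with $(dS_t)^2 = \sigma^2 S_t^2\,dt$ and the Gaussian moment generating function $E[e^{\sigma B_t}] = e^{\sigma^2 t/2}$:

```latex
\begin{aligned}
d(\log S_t)
  &= \frac{1}{S_t}\,dS_t - \frac{1}{2}\,\frac{1}{S_t^2}\,(dS_t)^2
   = \big(\mu\,dt + \sigma\,dB_t\big) - \tfrac{1}{2}\sigma^2\,dt
   = \big(\mu - \tfrac{1}{2}\sigma^2\big)\,dt + \sigma\,dB_t, \\[1ex]
E[S_t]
  &= S_0\,e^{(\mu - \sigma^2/2)\,t}\,E\big[e^{\sigma B_t}\big]
   = S_0\,e^{(\mu - \sigma^2/2)\,t}\,e^{\sigma^2 t/2}
   = S_0\,e^{\mu t}.
\end{aligned}
```

The $-\sigma^2/2$ in the exponent is cancelled exactly by the convexity of the exponential.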

Example

CIR Process (Square-Root Diffusion)

$dX_t = \kappa(\theta - X_t)\,dt + \sigma\sqrt{X_t}\,dB_t$ with $2\kappa\theta > \sigma^2$ (Feller condition). This models interest rates and variance processes in finance. The square-root diffusion vanishes at $X_t = 0$, where the positive drift $\kappa\theta$ pushes the process back up, so it stays nonnegative; under the Feller condition it never reaches zero at all. The stationary distribution is Gamma.
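In discrete time the picture is less clean: a plain Euler-Maruyama step can overshoot below zero, after which $\sqrt{X_k}$ is undefined. One common workaround, usually called full truncation, replaces $X_k$ by $\max(X_k, 0)$ inside the drift and diffusion. It is a swapped-in technique, not something prescribed by the text, and the function name and parameter values below are hypothetical.

```python
import math
import random

def cir_full_truncation(kappa, theta, sigma, x0, T, n_steps, rng):
    """Terminal value of a CIR path using Euler-Maruyama with full truncation:
    the state may dip below zero internally, but drift and diffusion only ever
    see its positive part, and the reported value is clipped at zero."""
    dt = T / n_steps
    x = x0
    for _ in range(n_steps):
        xp = max(x, 0.0)  # positive part used in both coefficients
        x = x + kappa * (theta - xp) * dt + sigma * math.sqrt(xp * dt) * rng.gauss(0.0, 1.0)
    return max(x, 0.0)

# Illustrative parameters satisfying the Feller condition 2*kappa*theta > sigma^2.
kappa, theta, sigma = 2.0, 0.04, 0.3
rng = random.Random(0)
terminal = [cir_full_truncation(kappa, theta, sigma, theta, 1.0, 100, rng)
            for _ in range(2000)]
mean_terminal = sum(terminal) / len(terminal)  # starting at theta, E[X_t] stays at theta
```

Starting from $X_0 = \theta$, the exact mean is constant in time, which gives a simple check that the truncation has not introduced a large bias.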

The Fokker-Planck Connection

If $X_t$ solves $dX_t = b(X_t)\,dt + \sigma(X_t)\,dB_t$ and has a smooth density $p(x,t)$, that density satisfies the Fokker-Planck equation (forward Kolmogorov equation):

$$\frac{\partial p}{\partial t} = -\frac{\partial}{\partial x}[b(x)\,p] + \frac{1}{2}\frac{\partial^2}{\partial x^2}[\sigma^2(x)\,p].$$

This PDE governs how the probability mass evolves. In diffusion models, the forward SDE has a known Fokker-Planck equation whose solution converges to a Gaussian; the reverse SDE (score-based generation) runs the same PDE backward in time.
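As a worked instance, take the Ornstein-Uhlenbeck case $b(x) = -\theta x$ with constant $\sigma$ and look for a steady state ($\partial p/\partial t = 0$) with zero probability flux at infinity:

```latex
\begin{aligned}
0 &= \frac{\partial}{\partial x}\big[\theta x\,p\big]
     + \frac{\sigma^2}{2}\,\frac{\partial^2 p}{\partial x^2}
   = \frac{\partial}{\partial x}\Big[\theta x\,p + \frac{\sigma^2}{2}\,\frac{\partial p}{\partial x}\Big], \\[1ex]
\theta x\,p + \frac{\sigma^2}{2}\,p' &= 0
  \;\Longrightarrow\;
  p(x) \propto \exp\!\Big(-\frac{\theta x^2}{\sigma^2}\Big)
  = \exp\!\Big(-\frac{x^2}{2\,\sigma^2/(2\theta)}\Big).
\end{aligned}
```

This is the $\mathcal{N}(0, \sigma^2/(2\theta))$ density, matching the stationary distribution quoted for the Ornstein-Uhlenbeck process.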

Common Confusions

Watch Out

Itô vs Stratonovich gives different SDEs, not different solutions to the same SDE

The SDE $dX = \sigma(X)\,dB$ in the Itô sense and $dX = \sigma(X) \circ dB$ in the Stratonovich sense are different equations with different solutions. They can be converted: a Stratonovich SDE $dX = \sigma(X) \circ dB$ equals the Itô SDE $dX = \frac{1}{2}\sigma(X)\sigma'(X)\,dt + \sigma(X)\,dB$. The choice of convention is not a matter of taste when the diffusion coefficient depends on $X$.
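A concrete case, assuming the linear coefficient $\sigma(x) = x$ for illustration, makes the difference visible. The solutions follow from the geometric Brownian motion formula with $(\mu, \sigma) = (0, 1)$ and $(\tfrac{1}{2}, 1)$ respectively:

```latex
\begin{aligned}
\text{Itô:}\quad
  & dX_t = X_t\,dB_t
  && \Longrightarrow\quad X_t = X_0\,e^{B_t - t/2}, \\[1ex]
\text{Stratonovich:}\quad
  & dX_t = X_t \circ dB_t
    \;\Longleftrightarrow\;
    dX_t = \tfrac{1}{2}X_t\,dt + X_t\,dB_t
  && \Longrightarrow\quad X_t = X_0\,e^{B_t}.
\end{aligned}
```

The Itô solution is a martingale with constant mean $X_0$, while the Stratonovich solution has mean $X_0 e^{t/2}$.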

Watch Out

Strong order 1/2 does not mean Euler-Maruyama is useless

The $O(\sqrt{\Delta t})$ strong rate looks slow, but for Monte Carlo estimation (the main use case), only the weak rate matters. With weak order $1$, Euler-Maruyama converges as fast as forward Euler does for ODEs when you care about expectations. Strong convergence matters for path-dependent functionals and coupling arguments, not for computing $E[g(X_T)]$.

Watch Out

Not every SDE has a stationary distribution

An SDE has a stationary distribution only if drift and diffusion balance so that probability mass reaches an equilibrium. The Ornstein-Uhlenbeck process has one ($\theta > 0$ pulls mass back to zero). Geometric Brownian motion does not (it drifts to $+\infty$ or $0$ depending on the sign of $\mu - \sigma^2/2$). Checking for stationarity requires verifying that the Fokker-Planck equation has a normalizable steady-state solution.

Exercises

Exercise (Core)

Problem

Apply Itô's formula to $f(x) = x^2$ and the Ornstein-Uhlenbeck SDE $dX_t = -\theta X_t\,dt + \sigma\,dB_t$ to derive $d(X_t^2)$. Use this to compute $E[X_t^2]$ when $X_0 = 0$.

Exercise (Advanced)

Problem

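The Milstein scheme adds the term $\frac{1}{2}\sigma(X_k)\sigma'(X_k)[(Z_k^2 - 1)\Delta t]$ to the Euler-Maruyama step. Show that for geometric Brownian motion $dS = \mu S\,dt + \sigma S\,dB$, one Milstein step agrees with the Taylor expansion of the exact one-step update $S_{k+1} = S_k \exp\!\big((\mu - \sigma^2/2)\Delta t + \sigma\sqrt{\Delta t}\,Z_k\big)$ through all terms of order $\Delta t$, so the local error is $O(\Delta t^{3/2})$ (consistent with strong order 1). Which term does Euler-Maruyama miss?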

References

Canonical textbooks:

  • Oksendal, Stochastic Differential Equations (6th ed., Springer, 2003), Chapters 5-8.
  • Karatzas & Shreve, Brownian Motion and Stochastic Calculus (2nd ed., Springer, 1991), Chapter 5.
  • Revuz & Yor, Continuous Martingales and Brownian Motion (3rd ed., Springer, 1999). The standard reference for the modern martingale-theoretic treatment.
  • Protter, Stochastic Integration and Differential Equations (2nd ed., Springer, 2004). Semimartingale approach with general jump-diffusion theory.
  • Kloeden & Platen, Numerical Solution of Stochastic Differential Equations (Springer, 1992), Chapters 9-10. The reference for Euler-Maruyama, Milstein, and stochastic Runge-Kutta.

Foundational papers:

  • Yamada & Watanabe, "On the Uniqueness of Solutions of Stochastic Differential Equations" (J. Math. Kyoto 11, 1971). The Yamada-Watanabe pathwise-uniqueness criterion and the weak-existence-plus-pathwise-uniqueness theorem.
  • Milstein, "Approximate Integration of Stochastic Differential Equations" (Theory of Probability and Its Applications 19(3), 1975). The Milstein scheme.

Current:

  • Pavliotis, Stochastic Processes and Applications (Springer, 2014), Chapters 3-4. ML-friendly treatment of OU, Langevin, and Fokker-Planck.
  • Le Gall, Brownian Motion, Martingales, and Stochastic Calculus (Springer, 2016), Chapter 7.
  • Da Prato & Zabczyk, Stochastic Equations in Infinite Dimensions (2nd ed., Cambridge, 2014). The reference for SPDEs and infinite-dimensional SDEs (relevant to function-space diffusion models).
  • Song et al., "Score-Based Generative Modeling through Stochastic Differential Equations" (ICLR 2021; arXiv:2011.13456). SDE framework for diffusion models; the reverse-time SDE and probability flow ODE.

Last reviewed: April 26, 2026
