Mathematical Infrastructure

Ito's Lemma

The chain rule of stochastic calculus: if a process follows an SDE, applying a smooth function to it yields a modified SDE with an extra second-order correction term that has no analogue in ordinary calculus.

Advanced · Tier 2 · Stable · Supporting · ~50 min

Why This Matters

In ordinary calculus, if you know $dx/dt$ and want $df(x)/dt$, you apply the chain rule: $f'(x) \cdot dx/dt$. In stochastic calculus, this formula is wrong. The chain rule picks up an extra term proportional to $f''(x)\,\sigma^2$. This correction term is the reason stochastic calculus exists as a separate subject.

Five-panel infographic: why the ordinary chain rule fails for Brownian paths, setup with an Ito process dX = mu dt + sigma dW, the multiplication rules (dt^2 = 0, dW^2 = dt, dt dW = 0), Ito's lemma chain rule with the second-order correction, and applications (Black-Scholes, geometric Brownian motion).
Ito's lemma is the chain rule for stochastic processes. The dW^2 = dt term forces a second-order correction that does not appear in ordinary calculus.

Derivations in diffusion models, score-based generative models, and mathematical finance rely on Ito's lemma throughout. If you cannot apply it mechanically, you cannot follow these papers.

Ito Isometry

Before stating the chain rule, we record the core identity that makes the Ito integral well-defined as an $L^2$ object.

Theorem

Ito Isometry

Statement

For $f \in \mathcal{L}^2([0,T])$ adapted to the filtration of $W_t$:

$$\mathbb{E}\left[\left(\int_0^T f(s)\,dW_s\right)^2\right] = \mathbb{E}\left[\int_0^T f(s)^2\,ds\right]$$

Intuition

The stochastic integral is a linear isometry from the Hilbert space of adapted square-integrable processes (with the $dt \otimes dP$ inner product) into $L^2(\Omega)$. Variance on the left equals expected energy on the right.

Why It Matters

This identity is how the Ito integral is constructed. Simple processes satisfy it by direct computation. The integral of a general adapted $L^2$ process is defined as the $L^2$ limit of integrals of simple processes, and the isometry guarantees the limit exists and is unique. The standard existence and uniqueness proofs for SDEs rest on it.
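
A quick Monte Carlo check can make the isometry concrete. The sketch below takes the integrand $f(s) = W_s$ on $[0, 1]$, approximates the Ito integral by left-endpoint sums, and estimates both sides, which should each be close to $T^2/2 = 0.5$. The function name, step count, path count, and seed are illustrative assumptions.

```python
import math
import random

def isometry_check(T=1.0, n_steps=50, n_paths=20_000, seed=0):
    """Estimate both sides of the Ito isometry for the integrand f(s) = W_s."""
    rng = random.Random(seed)
    dt = T / n_steps
    sd = math.sqrt(dt)
    lhs_total = 0.0  # accumulates (sum_i W_i * dW_i)^2 over paths
    rhs_total = 0.0  # accumulates sum_i W_i^2 * dt over paths
    for _ in range(n_paths):
        w = 0.0
        ito_integral = 0.0
        energy = 0.0
        for _ in range(n_steps):
            dw = rng.gauss(0.0, sd)
            ito_integral += w * dw  # integrand evaluated at the left endpoint
            energy += w * w * dt
            w += dw
        lhs_total += ito_integral ** 2
        rhs_total += energy
    return lhs_total / n_paths, rhs_total / n_paths

lhs, rhs = isometry_check()
print(f"E[(int W dW)^2] ~ {lhs:.3f}, E[int W^2 ds] ~ {rhs:.3f}")
```

With a finite number of steps both estimates sit slightly below 0.5, because the discrete sums use $W$ at the left endpoint of each subinterval.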

Mental Model

A Brownian motion path is so rough that $(dW_t)^2$ does not vanish. It equals $dt$ in a precise sense (quadratic variation). When you Taylor expand $f(X_t)$, the second-order term $\frac{1}{2} f''(X_t)(dX_t)^2$ survives because $(dX_t)^2$ contains a $\sigma^2 dt$ piece. In ordinary calculus, $(dx)^2 = 0$. In stochastic calculus, $(dW)^2 = dt$.

Setup

Let $X_t$ be an Ito process satisfying the SDE:

$$dX_t = \mu(X_t, t)\,dt + \sigma(X_t, t)\,dW_t$$

where $W_t$ is standard Brownian motion, $\mu$ is the drift, and $\sigma$ is the diffusion coefficient.

Definition

Quadratic Variation Rule

The multiplication rules for Ito calculus are:

$$dt \cdot dt = 0, \quad dt \cdot dW_t = 0, \quad dW_t \cdot dW_t = dt$$

These rules follow from the quadratic variation of Brownian motion: $\langle W \rangle_t = t$.
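
The rule $dW_t \cdot dW_t = dt$ can be seen numerically. The following sketch simulates one Brownian path on $[0, 1]$ and sums the squared increments and the absolute increments over a fine partition. The function name, partition size, and seed are assumptions chosen for illustration.

```python
import math
import random

def path_variations(t=1.0, n_steps=100_000, seed=0):
    """Return (sum of dW^2, sum of |dW|) for one simulated Brownian path on [0, t]."""
    rng = random.Random(seed)
    sd = math.sqrt(t / n_steps)  # each increment is N(0, dt)
    quad = 0.0
    total = 0.0
    for _ in range(n_steps):
        dw = rng.gauss(0.0, sd)
        quad += dw * dw
        total += abs(dw)
    return quad, total

quad, total = path_variations()
print(f"sum (dW)^2 ~ {quad:.4f}, sum |dW| ~ {total:.1f}")
```

The squared increments add up to roughly $t$, while the sum of absolute increments grows like $\sqrt{n}$ as the partition is refined, which is the discrete picture of finite quadratic variation and infinite total variation.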

Main Theorems

Theorem

Ito's Lemma (One Dimension)

Statement

Let $f(x, t) \in C^{2,1}$. Then $Y_t = f(X_t, t)$ satisfies:

$$df = \left(\frac{\partial f}{\partial t} + \mu \frac{\partial f}{\partial x} + \frac{\sigma^2}{2} \frac{\partial^2 f}{\partial x^2}\right) dt + \sigma \frac{\partial f}{\partial x}\, dW_t$$

The term $\frac{\sigma^2}{2} \frac{\partial^2 f}{\partial x^2}\,dt$ is the Ito correction. It has no analogue in ordinary calculus.
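
Applying the formula is mechanical once the partial derivatives are known. As a minimal sketch, the hypothetical helper below (its name and signature are not from any library) takes $\mu$, $\sigma$, and the three partial derivatives as functions of $(x, t)$ and returns the drift and diffusion coefficients of $Y_t = f(X_t, t)$.

```python
def ito_coefficients(mu, sigma, f_t, f_x, f_xx):
    """Return (drift, diffusion) of Y_t = f(X_t, t) as functions of (x, t)."""
    def drift(x, t):
        # df/dt + mu * df/dx + (sigma^2 / 2) * d2f/dx2
        return f_t(x, t) + mu(x, t) * f_x(x, t) + 0.5 * sigma(x, t) ** 2 * f_xx(x, t)

    def diffusion(x, t):
        # sigma * df/dx
        return sigma(x, t) * f_x(x, t)

    return drift, diffusion

# Illustrative case: X = W (mu = 0, sigma = 1) and f(x, t) = x^3.
drift, diffusion = ito_coefficients(
    mu=lambda x, t: 0.0,
    sigma=lambda x, t: 1.0,
    f_t=lambda x, t: 0.0,
    f_x=lambda x, t: 3.0 * x ** 2,
    f_xx=lambda x, t: 6.0 * x,
)
print(drift(0.5, 0.0), diffusion(0.5, 0.0))
```

For this choice the helper reproduces $d(W_t^3) = 3W_t\,dt + 3W_t^2\,dW_t$: the entire drift comes from the Ito correction.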

Intuition

Taylor expand $f$ to second order. The first-order terms give the ordinary chain rule. The second-order term $\frac{1}{2} f''(dX)^2$ normally vanishes, but here $(dX)^2 = \sigma^2 (dW)^2 + \text{higher order} = \sigma^2 dt$. So $\frac{1}{2} f'' \sigma^2 dt$ survives.

Proof Sketch

Partition $[0, t]$ into $n$ subintervals. Write the telescoping sum $f(X_t, t) - f(X_0, 0) = \sum_i [f(X_{t_{i+1}}, t_{i+1}) - f(X_{t_i}, t_i)]$. Taylor expand each increment to second order. The first-order terms converge to the $dt$ integrals and the Ito integral. The second-order terms converge to $\frac{1}{2} \int_0^t \frac{\partial^2 f}{\partial x^2}(X_s, s)\, \sigma^2(X_s, s)\,ds$ because of the quadratic variation of $W_t$. Cross terms and higher-order terms vanish in $L^2$.

Why It Matters

This is the computational workhorse of stochastic calculus. Every application requires you to start with a process $X_t$ and compute the dynamics of some function of it. Without Ito's lemma, you cannot derive the Black-Scholes equation, compute the SDE for the score function in diffusion models, or analyze Langevin dynamics.

Failure Mode

The formula requires $f \in C^{2,1}$. If $f$ is not twice differentiable (e.g., $f(x) = |x|$), the standard Ito lemma does not apply. You need the Tanaka-Meyer formula, which introduces local time. Also, this is the Ito version. The Stratonovich chain rule has no correction term but changes the integral definition.

Canonical Examples

Example

Geometric Brownian Motion

Let $S_t$ satisfy $dS_t = \mu S_t\,dt + \sigma S_t\,dW_t$ (stock price model). Apply Ito's lemma to $f(S) = \ln S$. We have $f' = 1/S$ and $f'' = -1/S^2$.

$$d(\ln S_t) = \frac{1}{S_t}(\mu S_t\,dt + \sigma S_t\,dW_t) + \frac{1}{2}\left(-\frac{1}{S_t^2}\right)\sigma^2 S_t^2\,dt$$

$$= \left(\mu - \frac{\sigma^2}{2}\right)dt + \sigma\,dW_t$$

So $\ln S_t$ is a Brownian motion with drift $\mu - \sigma^2/2$ and volatility $\sigma$. The correction term $-\sigma^2/2$ is why the expected log return is less than $\mu$.
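
The drift of $\ln S_t$ can be checked by simulation. The sketch below discretizes the SDE for $S_t$ with the Euler-Maruyama scheme (a standard discretization, not something the text above prescribes) and averages $\ln S_T$ over many paths. The parameter values $\mu = 0.1$, $\sigma = 0.3$, $S_0 = 1$, $T = 1$ and the function name are assumptions for illustration.

```python
import math
import random

def mean_log_terminal(mu=0.1, sigma=0.3, T=1.0, n_steps=100, n_paths=10_000, seed=0):
    """Euler-Maruyama simulation of GBM with S_0 = 1; returns the sample mean of ln S_T."""
    rng = random.Random(seed)
    dt = T / n_steps
    sd = math.sqrt(dt)
    total = 0.0
    for _ in range(n_paths):
        s = 1.0
        for _ in range(n_steps):
            # One step of dS = mu S dt + sigma S dW
            s += mu * s * dt + sigma * s * rng.gauss(0.0, sd)
        total += math.log(s)
    return total / n_paths

estimate = mean_log_terminal()
ito_drift = 0.1 - 0.3 ** 2 / 2  # mu - sigma^2 / 2
print(f"mean ln S_T ~ {estimate:.4f}, Ito prediction {ito_drift:.4f}, ordinary chain rule 0.1000")
```

The sample mean should land near $(\mu - \sigma^2/2)\,T = 0.055$ rather than $\mu T = 0.1$, up to Monte Carlo and discretization error.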

Example

Square of Brownian motion

Let $f(W_t) = W_t^2$. Here $f' = 2W_t$, $f'' = 2$, $\mu = 0$, $\sigma = 1$.

$$d(W_t^2) = 2W_t\,dW_t + \frac{1}{2}(2)(1)^2\,dt = 2W_t\,dW_t + dt$$

Integrating: $W_t^2 = 2\int_0^t W_s\,dW_s + t$. This gives the Ito integral identity $\int_0^t W_s\,dW_s = \frac{1}{2}(W_t^2 - t)$.
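
This identity holds path by path, so it can be checked on a single simulated path. The sketch below approximates $\int_0^t W_s\,dW_s$ by a left-endpoint sum and compares it with $\frac{1}{2}(W_t^2 - t)$ and with $\frac{1}{2}W_t^2$, the value the ordinary chain rule would suggest. The function name, partition size, and seed are assumptions.

```python
import math
import random

def ito_sum_vs_closed_form(t=1.0, n_steps=100_000, seed=1):
    """Return (left-endpoint sum of W dW, (W_t^2 - t) / 2, W_t^2 / 2) for one path."""
    rng = random.Random(seed)
    sd = math.sqrt(t / n_steps)
    w = 0.0
    ito_sum = 0.0
    for _ in range(n_steps):
        dw = rng.gauss(0.0, sd)
        ito_sum += w * dw  # W evaluated before the increment is added
        w += dw
    return ito_sum, 0.5 * (w * w - t), 0.5 * w * w

ito_sum, ito_formula, naive = ito_sum_vs_closed_form()
print(f"sum ~ {ito_sum:.4f}, Ito formula {ito_formula:.4f}, ordinary calculus {naive:.4f}")
```

The sum tracks the Ito formula closely and misses the ordinary-calculus value by about $t/2$.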

Theorem

Multidimensional Ito's Lemma

Statement

If $dX_t^i = \mu^i\,dt + \sum_j \sigma^{ij}\,dW_t^j$, then for $f \in C^{2,1}$:

$$df = \frac{\partial f}{\partial t}\,dt + \sum_i \frac{\partial f}{\partial x^i}\,dX_t^i + \frac{1}{2}\sum_{i,j} \frac{\partial^2 f}{\partial x^i \partial x^j}\,d\langle X^i, X^j \rangle_t$$

where $d\langle X^i, X^j \rangle_t = \sum_k \sigma^{ik}\sigma^{jk}\,dt$.

Intuition

The same idea as the 1D case, but now the quadratic covariation between different components contributes through the full Hessian matrix of $f$.
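
A short worked case may help: taking $f(x, y) = xy$ in two dimensions gives the stochastic product rule. The notation $X = X^1$, $Y = X^2$ is an assumption made here for readability.

```latex
% Worked example: two-dimensional Ito's lemma applied to f(x, y) = x y.
% Assumed notation: X = X^1 and Y = X^2, with coefficients as in the statement.
\begin{aligned}
\frac{\partial f}{\partial t} &= 0, \quad
\frac{\partial f}{\partial x} = y, \quad
\frac{\partial f}{\partial y} = x, \quad
\frac{\partial^2 f}{\partial x\,\partial y} = 1, \quad
\frac{\partial^2 f}{\partial x^2} = \frac{\partial^2 f}{\partial y^2} = 0, \\
d(X_t Y_t) &= Y_t\,dX_t + X_t\,dY_t
  + \tfrac{1}{2}\bigl(d\langle X, Y\rangle_t + d\langle Y, X\rangle_t\bigr) \\
&= Y_t\,dX_t + X_t\,dY_t + d\langle X, Y\rangle_t, \\
d\langle X, Y\rangle_t &= \sum_k \sigma^{1k}\sigma^{2k}\,dt.
\end{aligned}
```

The only surviving second-order contribution is the mixed partial, so the correction to the ordinary product rule is exactly the covariation of the two components.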

Proof Sketch

Same Taylor expansion argument as the 1D case, applied componentwise. The cross terms $dX^i\,dX^j$ contribute through the covariation $\langle X^i, X^j \rangle$.

Why It Matters

Diffusion models in high dimensions (image generation) operate on multidimensional SDEs. The multivariate version is needed to derive the reverse-time SDE and the score matching objective.

Failure Mode

Same regularity requirements as the 1D case, but now applied to all partial derivatives up to second order.

Common Confusions

Watch Out

Why not just use the ordinary chain rule?

Because $(dW_t)^2 = dt \neq 0$. Brownian paths have nonzero quadratic variation on every interval (and therefore infinite total variation), so the second-order term in the Taylor expansion does not vanish. If you apply the ordinary chain rule, you get the wrong drift. The geometric Brownian motion example shows this: the ordinary chain rule gives drift $\mu$ for $\ln S$, but the correct drift is $\mu - \sigma^2/2$.

Watch Out

Ito vs Stratonovich

In Stratonovich calculus, the chain rule has no correction term: $df = f'(X_t) \circ dX_t$. The price is that the Stratonovich integral is defined as a midpoint Riemann sum, not the left-endpoint sum that defines the Ito integral. Ito integrals are martingales under suitable integrability conditions (useful for probability arguments). Stratonovich integrals obey the ordinary chain rule (useful for physics). They are related by: the Ito SDE $dX = \mu\,dt + \sigma\,dW$ corresponds to the Stratonovich SDE $dX = (\mu - \frac{1}{2}\sigma \sigma')\,dt + \sigma \circ dW$, where $\sigma' = \partial\sigma/\partial x$.
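
The drift conversion is easy to get backwards, so a small sketch may help. The hypothetical helper below (name and signature are assumptions, not a library API) maps an Ito drift to the corresponding Stratonovich drift using $\mu - \frac{1}{2}\sigma\sigma'$, and is applied to geometric Brownian motion with illustrative values $\mu = 0.1$, $\sigma = 0.3$.

```python
def ito_to_stratonovich_drift(mu, sigma, dsigma_dx):
    """Return the Stratonovich drift x -> mu(x) - 0.5 * sigma(x) * sigma'(x)."""
    def drift(x):
        return mu(x) - 0.5 * sigma(x) * dsigma_dx(x)
    return drift

# Geometric Brownian motion in Ito form: dS = m S dt + v S dW (m, v are assumed values).
m, v = 0.1, 0.3
strat_drift = ito_to_stratonovich_drift(
    mu=lambda s: m * s,
    sigma=lambda s: v * s,
    dsigma_dx=lambda s: v,
)
print(strat_drift(2.0), (m - v ** 2 / 2) * 2.0)
```

For geometric Brownian motion the Stratonovich drift is $(\mu - \sigma^2/2)\,S$. Applying the ordinary chain rule to $\ln S$ in Stratonovich form then gives drift $\mu - \sigma^2/2$, the same answer Ito's lemma produces with its correction term.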

Feynman-Kac Formula

Ito's lemma gives a direct bridge between parabolic PDEs and SDEs. For a PDE of the form $\partial_t u + \mu\,\partial_x u + \tfrac{1}{2}\sigma^2\,\partial_{xx} u - r u = 0$ with terminal condition $u(T, x) = \phi(x)$, the solution has the probabilistic representation $u(t, x) = \mathbb{E}[e^{-r(T-t)} \phi(X_T) \mid X_t = x]$ where $X_t$ follows $dX_t = \mu\,dt + \sigma\,dW_t$. The derivation is one line of Ito: apply the lemma to $e^{-r(s-t)} u(s, X_s)$ and the PDE kills the drift, leaving a martingale whose expectation reduces to the terminal value. This identity is the foundation of Black-Scholes pricing and is closely related to the PDE arguments behind the reverse-time SDE construction in score-based generative models.
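
The representation can be sanity-checked in a case where both sides are computable. The sketch below assumes constant $\mu$, $\sigma$, $r$ and the payoff $\phi(x) = x^2$, for which $u(t, x) = e^{-r\tau}\bigl((x + \mu\tau)^2 + \sigma^2\tau\bigr)$ with $\tau = T - t$ solves the PDE, and compares this with a Monte Carlo average. All parameter values and function names are illustrative assumptions.

```python
import math
import random

def feynman_kac_mc(x=1.0, t=0.0, T=1.0, mu=0.5, sigma=0.4, r=0.05,
                   n_samples=100_000, seed=0):
    """Monte Carlo estimate of E[exp(-r (T - t)) * phi(X_T) | X_t = x] with phi(y) = y^2."""
    rng = random.Random(seed)
    tau = T - t
    total = 0.0
    for _ in range(n_samples):
        # With constant coefficients X_T is Gaussian, so it is sampled exactly.
        x_T = x + mu * tau + sigma * math.sqrt(tau) * rng.gauss(0.0, 1.0)
        total += x_T * x_T
    return math.exp(-r * tau) * total / n_samples

def closed_form(x=1.0, t=0.0, T=1.0, mu=0.5, sigma=0.4, r=0.05):
    """u(t, x) = exp(-r tau) * ((x + mu tau)^2 + sigma^2 tau), with tau = T - t."""
    tau = T - t
    return math.exp(-r * tau) * ((x + mu * tau) ** 2 + sigma ** 2 * tau)

estimate = feynman_kac_mc()
exact = closed_form()
print(f"Monte Carlo ~ {estimate:.4f}, PDE solution {exact:.4f}")
```

The two numbers should agree to within Monte Carlo error. For state-dependent coefficients the terminal value would instead have to be simulated with a time-stepping scheme.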

Summary

  • The Ito correction term is $\frac{1}{2} f''(x)\,\sigma^2\,dt$
  • It exists because $(dW)^2 = dt$, not zero
  • For $\ln S$ of geometric Brownian motion, the correction gives drift $\mu - \sigma^2/2$
  • The Ito integral $\int_0^t W_s\,dW_s = \frac{1}{2}(W_t^2 - t)$, not $\frac{1}{2}W_t^2$ as the ordinary chain rule would give

Exercises

ExerciseCore

Problem

Apply Ito's lemma to $f(W_t) = e^{W_t}$. What SDE does $Y_t = e^{W_t}$ satisfy?

ExerciseAdvanced

Problem

Let $X_t$ satisfy $dX_t = -X_t\,dt + \sqrt{2}\,dW_t$ (Ornstein-Uhlenbeck process). Apply Ito's lemma to $f(X_t, t) = X_t^2 e^{2t}$ and use it to compute $\mathbb{E}[X_t^2]$ given $X_0 = 0$.

References

Canonical:

  • Oksendal, B. (2003). Stochastic Differential Equations, 6th ed. Springer. Chapters 3-4 (Ito integral, Ito's formula).
  • Karatzas, I. and Shreve, S. (1991). Brownian Motion and Stochastic Calculus, 2nd ed. Springer. Chapter 3 (stochastic integration).
  • Shreve, S. (2004). Stochastic Calculus for Finance II. Springer. Chapter 4.
  • Le Gall, J.-F. (2016). Brownian Motion, Martingales, and Stochastic Calculus. Springer. Chapter 5.

Current:

  • Song et al., "Score-Based Generative Modeling through SDEs" (2021), Appendix A.

Next Topics

  • Diffusion models: the primary ML application of Ito's lemma today
  • Flow matching: an alternative to diffusion that avoids SDEs but relates to them

Last reviewed: April 18, 2026
