Ito's Lemma

Sneiderman, Robby

The chain rule of stochastic calculus: if a process follows an SDE, applying a smooth function to it yields a modified SDE with an extra second-order correction term that has no analogue in ordinary calculus.

Advanced · Tier 2 · Stable · Supporting · ~50 min

Prerequisites

Stochastic Calculus for ML

Why This Matters

In ordinary calculus, if you know $dx/dt$ and want $df(x)/dt$ , you apply the chain rule: $f'(x) \cdot dx/dt$ . In stochastic calculus, this formula is wrong. The chain rule picks up an extra term proportional to $f''(x) \sigma^2$ . This correction term is the reason stochastic calculus exists as a separate subject.

Figure: five-panel infographic covering why the ordinary chain rule fails for Brownian paths, the setup with an Ito process dX = mu dt + sigma dW, the multiplication rules (dt^2 = 0, dW^2 = dt, dt dW = 0), Ito's lemma with the second-order correction, and applications (Black-Scholes, geometric Brownian motion). Caption: Ito's lemma is the chain rule for stochastic processes. The dW^2 = dt rule forces a second-order correction that does not appear in ordinary calculus.

Derivations in diffusion models, score-based generative models, and mathematical finance rely on Ito's lemma throughout. If you cannot apply it mechanically, you will struggle to follow these papers.

Ito Isometry

Before stating the chain rule, we record the core identity that makes the Ito integral well-defined as an $L^2$ object.

Theorem

Ito Isometry

Statement

For $f \in \mathcal{L}^2([0,T])$ adapted to the filtration of $W_t$ :

$\mathbb{E}\left[\left(\int_0^T f(s)\,dW_s\right)^2\right] = \mathbb{E}\left[\int_0^T f(s)^2\,ds\right]$

Intuition

The stochastic integral is a linear isometry from the Hilbert space of adapted square-integrable processes (with the $dt \otimes dP$ inner product) into $L^2(\Omega)$ . Variance on the left equals expected energy on the right.

Why It Matters

This identity is how the Ito integral is constructed. Simple processes satisfy it by direct computation. General adapted $L^2$ processes are defined as $L^2$ limits of simple processes, and the isometry guarantees the limit exists and is unique. The standard existence and uniqueness proofs for SDEs rely on it.
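The isometry can be checked numerically. The sketch below is a minimal Monte Carlo illustration, not part of the original text: it assumes the integrand $f(s) = W_s$ on $[0, 1]$, for which both sides equal $\int_0^T s\,ds = T^2/2$, and approximates the Ito integral by left-endpoint sums. The function name `isometry_estimates`, the seed, and the step and path counts are illustrative choices.

```python
import math
import random

def isometry_estimates(T=1.0, n_steps=100, n_paths=4000, seed=0):
    """Monte Carlo estimates of both sides of the Ito isometry for f(s) = W_s.

    Returns (lhs, rhs) where
      lhs ~ E[(int_0^T W_s dW_s)^2]  (left-endpoint sums approximate the Ito integral)
      rhs ~ E[int_0^T W_s^2 ds]
    """
    rng = random.Random(seed)
    dt = T / n_steps
    sqrt_dt = math.sqrt(dt)
    lhs_total = 0.0
    rhs_total = 0.0
    for _ in range(n_paths):
        w = 0.0          # current value W_{t_i}
        integral = 0.0   # running sum of W_{t_i} * (W_{t_{i+1}} - W_{t_i})
        energy = 0.0     # running sum of W_{t_i}^2 * dt
        for _ in range(n_steps):
            dw = rng.gauss(0.0, sqrt_dt)
            integral += w * dw   # integrand is evaluated before the increment (adaptedness)
            energy += w * w * dt
            w += dw
        lhs_total += integral ** 2
        rhs_total += energy
    return lhs_total / n_paths, rhs_total / n_paths

lhs, rhs = isometry_estimates()
print(f"E[(int W dW)^2] ~ {lhs:.3f}   E[int W^2 ds] ~ {rhs:.3f}   exact T^2/2 = 0.5")
```

Both estimates carry Monte Carlo and discretization error, so they should agree with each other and with $T^2/2$ only approximately.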

Mental Model

A Brownian motion path is so rough that $(dW_t)^2$ does not vanish. It equals $dt$ in a precise sense (quadratic variation). When you Taylor expand $f(X_t)$ , the second-order term $\frac{1}{2} f''(X_t)(dX_t)^2$ survives because $(dX_t)^2$ contains a $\sigma^2 dt$ piece. In ordinary calculus, $(dx)^2 = 0$ . In stochastic calculus, $(dW)^2 = dt$ .

Setup

Let $X_t$ be an Ito process satisfying the SDE:

$dX_t = \mu(X_t, t)\,dt + \sigma(X_t, t)\,dW_t$

where $W_t$ is standard Brownian motion, $\mu$ is the drift, and $\sigma$ is the diffusion coefficient.

Definition

Quadratic Variation Rule $(d W)^{2} = d t$

The multiplication rules for Ito calculus are:

$dt \cdot dt = 0, \quad dt \cdot dW_t = 0, \quad dW_t \cdot dW_t = dt$

These rules follow from the quadratic variation of Brownian motion: $\langle W \rangle_t = t$ .
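A quick simulation makes the rule $dW_t \cdot dW_t = dt$ concrete. The sketch below sums squared increments along one discretized Brownian path and, for contrast, along the smooth path $x(s) = \sin(s)$. The helper names, the seed, and the horizon $t = 2$ are assumptions made for illustration.

```python
import math
import random

def quadratic_variation(increments):
    """Sum of squared increments along one discretized path."""
    return sum(dx * dx for dx in increments)

def brownian_increments(t, n, rng):
    """n independent N(0, t/n) increments of a Brownian path on [0, t]."""
    dt = t / n
    return [rng.gauss(0.0, math.sqrt(dt)) for _ in range(n)]

def smooth_increments(t, n):
    """Increments of the smooth path x(s) = sin(s) on the same grid, for contrast."""
    dt = t / n
    return [math.sin((i + 1) * dt) - math.sin(i * dt) for i in range(n)]

rng = random.Random(0)  # fixed seed, chosen arbitrarily
t = 2.0
for n in (100, 10_000):
    qv_w = quadratic_variation(brownian_increments(t, n, rng))
    qv_smooth = quadratic_variation(smooth_increments(t, n))
    print(f"n={n:6d}  Brownian: {qv_w:.4f}  smooth: {qv_smooth:.6f}")
```

As the grid is refined, the Brownian sum should settle near $t$ while the smooth sum shrinks toward zero, which is the content of $\langle W \rangle_t = t$ versus $(dx)^2 = 0$.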

Main Theorems

Theorem

Ito's Lemma (One Dimension)

Statement

Let $f(x, t) \in C^{2,1}$ (twice continuously differentiable in $x$, once in $t$). Then $Y_t = f(X_t, t)$ satisfies:

$df = \left(\frac{\partial f}{\partial t} + \mu \frac{\partial f}{\partial x} + \frac{\sigma^2}{2} \frac{\partial^2 f}{\partial x^2}\right) dt + \sigma \frac{\partial f}{\partial x}\, dW_t$

The term $\frac{\sigma^2}{2} \frac{\partial^2 f}{\partial x^2}\,dt$ is the Ito correction. It has no analogue in ordinary calculus.

Intuition

Taylor expand $f$ to second order. The first-order terms give the ordinary chain rule. The second-order term $\frac{1}{2} f''(dX)^2$ normally vanishes, but here $(dX)^2 = \sigma^2 (dW)^2 + \text{higher order} = \sigma^2 dt$ . So $\frac{1}{2} f'' \sigma^2 dt$ survives.

Proof Sketch

Partition $[0, t]$ into $n$ subintervals. Write the telescoping sum $f(X_t, t) - f(X_0, 0) = \sum_i [f(X_{t_{i+1}}, t_{i+1}) - f(X_{t_i}, t_i)]$. Taylor expand each increment to first order in $t$ and second order in $x$. The first-order terms converge to $\int_0^t \partial_s f\,ds + \int_0^t \partial_x f\,dX_s$, where the $dW$ part is an Ito integral. The second-order terms converge to $\frac{1}{2} \int_0^t \partial_{xx} f(X_s, s)\,\sigma^2(X_s, s)\,ds$ because of the quadratic variation of $W_t$. Cross terms and higher-order terms vanish in $L^2$.
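The partition argument can be mimicked numerically on a single path. The sketch below is an illustration under stated assumptions, not a proof: it simulates one Euler-Maruyama path, accumulates the right-hand side of Ito's lemma with left-endpoint sums, and compares it with the actual change in $f$, with and without the correction term. The function `ito_pathwise_check` and the test case $f(x, t) = x^2$ with constant $\mu = 0.5$, $\sigma = 0.8$ are hypothetical choices.

```python
import math
import random

def ito_pathwise_check(f, f_t, f_x, f_xx, mu, sigma,
                       x0=1.0, T=1.0, n_steps=10_000, seed=0):
    """Simulate one Euler-Maruyama path of dX = mu dt + sigma dW and accumulate
    the right-hand side of Ito's lemma with left-endpoint sums.

    Returns (actual, ito, naive):
      actual = f(X_T, T) - f(X_0, 0)
      ito    = sum of (f_t + mu f_x + 0.5 sigma^2 f_xx) dt + sigma f_x dW
      naive  = the same sum without the 0.5 sigma^2 f_xx term
    """
    rng = random.Random(seed)
    dt = T / n_steps
    sqrt_dt = math.sqrt(dt)
    x, t = x0, 0.0
    ito = 0.0
    naive = 0.0
    for _ in range(n_steps):
        dw = rng.gauss(0.0, sqrt_dt)
        m, s = mu(x, t), sigma(x, t)
        first_order = (f_t(x, t) + m * f_x(x, t)) * dt + s * f_x(x, t) * dw
        naive += first_order
        ito += first_order + 0.5 * s * s * f_xx(x, t) * dt
        x += m * dt + s * dw
        t += dt
    actual = f(x, T) - f(x0, 0.0)
    return actual, ito, naive

# Illustrative choice: f(x, t) = x^2 with constant coefficients mu = 0.5, sigma = 0.8.
actual, ito, naive = ito_pathwise_check(
    f=lambda x, t: x * x,
    f_t=lambda x, t: 0.0,
    f_x=lambda x, t: 2.0 * x,
    f_xx=lambda x, t: 2.0,
    mu=lambda x, t: 0.5,
    sigma=lambda x, t: 0.8,
)
print(f"actual change: {actual:.4f}")
print(f"Ito sum:       {ito:.4f}")
print(f"naive sum:     {naive:.4f}  (gap should be near sigma^2 * T = 0.64)")
```

For this choice of $f$ the only discrepancy between the actual change and the Ito sum is $\sum_i [(\Delta X_i)^2 - \sigma^2 \Delta t]$, which shrinks as the partition is refined. The naive sum misses the full $\sigma^2 T$.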

Why It Matters

This is the computational workhorse of stochastic calculus. Every application requires you to start with a process $X_t$ and compute the dynamics of some function of it. Without Ito's lemma, you cannot derive the Black-Scholes equation, compute the SDE for the score function in diffusion models, or analyze Langevin dynamics.

Failure Mode

The formula requires $f \in C^{2,1}$ . If $f$ is not twice differentiable (e.g., $f(x) = |x|$ ), the standard Ito lemma does not apply. You need the Tanaka-Meyer formula, which introduces local time. Also, this is the Ito version. The Stratonovich chain rule has no correction term but changes the integral definition.

Canonical Examples

Example

Geometric Brownian Motion

Let $S_t$ satisfy $dS_t = \mu S_t\,dt + \sigma S_t\,dW_t$ (stock price model). Apply Ito's lemma to $f(S) = \ln S$ . We have $f' = 1/S$ and $f'' = -1/S^2$ .

$d(\ln S_t) = \frac{1}{S_t}(\mu S_t\,dt + \sigma S_t\,dW_t) + \frac{1}{2}\left(-\frac{1}{S_t^2}\right)\sigma^2 S_t^2\,dt$

$= \left(\mu - \frac{\sigma^2}{2}\right)dt + \sigma\,dW_t$

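So $\ln S_t$ is a Brownian motion with drift $\mu - \sigma^2/2$ and volatility $\sigma$, which gives $S_t = S_0 \exp\left((\mu - \sigma^2/2)t + \sigma W_t\right)$. The correction term $-\sigma^2/2$ is why the expected log return per unit time is less than $\mu$.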
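The drift of $\ln S_t$ can be checked by simulation. The sketch below applies an Euler-Maruyama scheme to $S_t$ itself (not to $\ln S_t$) and averages $\ln(S_T/S_0)$ over many paths. The parameter values $\mu = 0.1$, $\sigma = 0.4$, the seed, and the function name `mean_log_return` are assumptions chosen for illustration.

```python
import math
import random

def mean_log_return(mu=0.1, sigma=0.4, T=1.0, n_steps=50, n_paths=4000, seed=0):
    """Average of ln(S_T / S_0) over Euler-Maruyama paths of dS = mu S dt + sigma S dW."""
    rng = random.Random(seed)
    dt = T / n_steps
    sqrt_dt = math.sqrt(dt)
    total = 0.0
    for _ in range(n_paths):
        s = 1.0  # S_0 = 1, so ln(S_T / S_0) = ln(S_T)
        for _ in range(n_steps):
            dw = rng.gauss(0.0, sqrt_dt)
            s += mu * s * dt + sigma * s * dw  # the scheme steps S itself, not ln S
        total += math.log(s)
    return total / n_paths

mu, sigma, T = 0.1, 0.4, 1.0
estimate = mean_log_return(mu, sigma, T)
print(f"simulated mean log return:           {estimate:.4f}")
print(f"Ito prediction (mu - sigma^2/2) T:   {(mu - 0.5 * sigma**2) * T:.4f}")
print(f"ordinary chain rule prediction mu T: {mu * T:.4f}")
```

Up to sampling and discretization error, the simulated mean should sit near the Ito prediction rather than near $\mu T$.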

Example

Square of Brownian motion

Let $f(W_t) = W_t^2$ . Here $f' = 2W_t$ , $f'' = 2$ , $\mu = 0$ , $\sigma = 1$ .

$d(W_t^2) = 2W_t\,dW_t + \frac{1}{2}(2)(1)^2\,dt = 2W_t\,dW_t + dt$

Integrating: $W_t^2 = 2\int_0^t W_s\,dW_s + t$ . This gives the Ito integral identity $\int_0^t W_s\,dW_s = \frac{1}{2}(W_t^2 - t)$ .
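This identity is easy to test on a simulated path. The sketch below forms the left-endpoint sum $\sum_i W_{t_i}(W_{t_{i+1}} - W_{t_i})$ and compares it with $\frac{1}{2}(W_T^2 - T)$ and with the ordinary-calculus guess $\frac{1}{2}W_T^2$. The function name, seed, and grid size are illustrative assumptions.

```python
import math
import random

def ito_integral_w_dw(T=1.0, n_steps=10_000, seed=0):
    """Left-endpoint sum for int_0^T W_s dW_s on one path. Returns (ito_sum, W_T)."""
    rng = random.Random(seed)
    sqrt_dt = math.sqrt(T / n_steps)
    w = 0.0
    ito_sum = 0.0
    for _ in range(n_steps):
        dw = rng.gauss(0.0, sqrt_dt)
        ito_sum += w * dw  # W evaluated at the left endpoint of each subinterval
        w += dw
    return ito_sum, w

T = 1.0
ito_sum, w_T = ito_integral_w_dw(T)
print(f"left-endpoint sum: {ito_sum:.4f}")
print(f"(W_T^2 - T) / 2:   {0.5 * (w_T**2 - T):.4f}")
print(f"W_T^2 / 2:         {0.5 * w_T**2:.4f}")
```

Algebraically, the left-endpoint sum equals $\frac{1}{2}\left(W_T^2 - \sum_i (\Delta W_i)^2\right)$ exactly, so the match with $\frac{1}{2}(W_T^2 - T)$ is as good as the approximation $\sum_i (\Delta W_i)^2 \approx T$.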

Theorem

Multidimensional Ito's Lemma

Statement

If $dX_t^i = \mu^i\,dt + \sum_j \sigma^{ij}\,dW_t^j$ , then for $f \in C^{2,1}$ :

$df = \frac{\partial f}{\partial t}\,dt + \sum_i \frac{\partial f}{\partial x^i}\,dX_t^i + \frac{1}{2}\sum_{i,j} \frac{\partial^2 f}{\partial x^i \partial x^j}\,d\langle X^i, X^j \rangle_t$

where $d\langle X^i, X^j \rangle_t = \sum_k \sigma^{ik}\sigma^{jk}\,dt$ .

Intuition

The same idea as the 1D case, but now the quadratic covariation between different components contributes through the full Hessian matrix of $f$ .

Proof Sketch

Same Taylor expansion argument as the 1D case, applied componentwise. The cross terms $dX^i dX^j$ contribute through the covariation $\langle X^i, X^j \rangle$ .
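The multidimensional formula reduces to a few sums once the gradient and Hessian of $f$ are known at a point. The sketch below is a minimal pointwise implementation using plain lists. The helper names `covariation_rate` and `ito_coefficients` are hypothetical, and the example (the product $f(x) = x^1 x^2$ of two driftless processes with correlation $0.5$) is an assumption chosen because it recovers the Ito product rule.

```python
import math

def covariation_rate(sigma):
    """Return a = sigma sigma^T, so that d<X^i, X^j>_t = a[i][j] dt."""
    d, m = len(sigma), len(sigma[0])
    return [[sum(sigma[i][k] * sigma[j][k] for k in range(m)) for j in range(d)]
            for i in range(d)]

def ito_coefficients(f_t, grad, hess, mu, sigma):
    """Drift and per-noise diffusion coefficients of f(X_t, t) at a single point.

    f_t  : partial derivative of f in t
    grad : list of first partials of f in x^i
    hess : matrix of second partials of f
    mu   : drift vector of X;  sigma : d x m diffusion matrix of X
    """
    d, m = len(mu), len(sigma[0])
    a = covariation_rate(sigma)
    drift = (f_t
             + sum(grad[i] * mu[i] for i in range(d))
             + 0.5 * sum(hess[i][j] * a[i][j] for i in range(d) for j in range(d)))
    diffusion = [sum(grad[i] * sigma[i][k] for i in range(d)) for k in range(m)]
    return drift, diffusion

# Example: f(x) = x^1 * x^2 at the point x = (2, 3), two driftless unit-variance
# processes with correlation rho = 0.5.
rho = 0.5
sigma = [[1.0, 0.0], [rho, math.sqrt(1.0 - rho * rho)]]
drift, diffusion = ito_coefficients(
    f_t=0.0,
    grad=[3.0, 2.0],                  # (x^2, x^1) evaluated at x = (2, 3)
    hess=[[0.0, 1.0], [1.0, 0.0]],
    mu=[0.0, 0.0],
    sigma=sigma,
)
print("drift of the product:", drift)
print("diffusion coefficients:", diffusion)
```

In this example the drift comes entirely from the off-diagonal Hessian entries paired with the covariation $d\langle X^1, X^2 \rangle_t = \rho\,dt$.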

Why It Matters

Diffusion models in high dimensions (image generation) operate on multidimensional SDEs. The multivariate version is needed to derive the reverse-time SDE and the score matching objective.

Failure Mode

Same regularity requirements as the 1D case, but now applied to all partial derivatives up to second order.

Common Confusions

Watch Out

Why not just use the ordinary chain rule?

Because $(dW_t)^2 = dt \neq 0$. Brownian paths have nonzero quadratic variation on every interval (and therefore infinite total variation), so the second-order term in the Taylor expansion does not vanish. If you apply the ordinary chain rule, you get the wrong drift. The geometric Brownian motion example shows this: the ordinary chain rule gives drift $\mu$ for $\ln S$, but the correct drift is $\mu - \sigma^2/2$.

Watch Out

Ito vs Stratonovich

In Stratonovich calculus, the chain rule has no correction term: $df = f'(X_t) \circ dX_t$. The price is that the Stratonovich integral is defined as a midpoint Riemann sum, not the left-endpoint sum used by Ito. Ito integrals of suitably integrable adapted processes are martingales (useful for probability arguments). Stratonovich integrals obey the ordinary chain rule (useful for physics). They are related by: the Ito SDE $dX = \mu\,dt + \sigma\,dW$ corresponds to the Stratonovich SDE $dX = (\mu - \frac{1}{2}\sigma \sigma')\,dt + \sigma \circ dW$, where $\sigma' = \partial \sigma / \partial x$.
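The two integral definitions can be compared on one simulated path. The sketch below, offered as an illustration under stated assumptions, approximates $\int_0^T g(W_s)\,dW_s$ with $g(w) = w^2$ using the left-endpoint value (Ito) and the midpoint value $(W_{t_i} + W_{t_{i+1}})/2$ (Stratonovich). The function name, seed, and choice of $g$ are hypothetical. The gap between the two sums should be close to $\frac{1}{2}\int_0^T g'(W_s)\,ds$, the same mechanism that produces the $\frac{1}{2}\sigma\sigma'$ drift shift.

```python
import math
import random

def ito_and_stratonovich_sums(g, T=1.0, n_steps=10_000, seed=0):
    """Approximate int_0^T g(W) dW on one path with two evaluation points.

    Returns (ito, strat, path) where ito uses g at the left endpoint and
    strat uses g at the midpoint value (W_i + W_{i+1}) / 2.
    """
    rng = random.Random(seed)
    sqrt_dt = math.sqrt(T / n_steps)
    w = 0.0
    path = [w]
    ito = 0.0
    strat = 0.0
    for _ in range(n_steps):
        dw = rng.gauss(0.0, sqrt_dt)
        ito += g(w) * dw
        strat += g(w + 0.5 * dw) * dw
        w += dw
        path.append(w)
    return ito, strat, path

T, n = 1.0, 10_000
ito, strat, path = ito_and_stratonovich_sums(lambda w: w * w, T, n)
w_T = path[-1]
# For g(w) = w^2: half the time integral of g'(W_s) = 2 W_s, i.e. int_0^T W_s ds.
half_int_g_prime = sum(path[:-1]) * (T / n)
print(f"Stratonovich sum:   {strat:.4f}   W_T^3 / 3: {w_T**3 / 3:.4f}")
print(f"Stratonovich - Ito: {strat - ito:.4f}   (1/2) int g'(W) ds: {half_int_g_prime:.4f}")
```

The midpoint sum tracks $W_T^3/3$, the answer the ordinary chain rule would give, while the left-endpoint sum differs from it by the correction integral.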

Feynman-Kac Formula

Ito's lemma gives a direct bridge between parabolic PDEs and SDEs. For a PDE of the form $\partial_t u + \mu\,\partial_x u + \tfrac{1}{2}\sigma^2\,\partial_{xx} u - r u = 0$ with terminal condition $u(T, x) = \phi(x)$, the solution has the probabilistic representation $u(t, x) = \mathbb{E}[e^{-r(T-t)} \phi(X_T) \mid X_t = x]$ where $X_t$ follows $dX_t = \mu\,dt + \sigma\,dW_t$. The derivation is one application of Ito's lemma: apply it to $e^{-r(s-t)} u(s, X_s)$ and the PDE kills the drift, leaving a martingale whose expectation reduces to the terminal value. This identity is the foundation of Black-Scholes pricing and also underlies the reverse-time SDE construction in score-based generative models.
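The representation can be sanity-checked in a case with a closed form. The sketch below assumes constant $\mu$, $\sigma$, $r$ and the terminal condition $\phi(x) = x^2$, for which $u(t, x) = e^{-r\tau}\left((x + \mu\tau)^2 + \sigma^2\tau\right)$ with $\tau = T - t$ solves the PDE (this can be verified by direct substitution). With constant coefficients $X_T$ is Gaussian, so it is sampled exactly instead of time-stepped. The function names and parameter values are illustrative assumptions.

```python
import math
import random

def feynman_kac_mc(phi, x, t, T, mu, sigma, r, n_samples=20_000, seed=0):
    """Monte Carlo estimate of E[exp(-r (T - t)) phi(X_T) | X_t = x]
    for constant mu and sigma, where X_T is Gaussian and can be sampled exactly."""
    rng = random.Random(seed)
    tau = T - t
    total = 0.0
    for _ in range(n_samples):
        x_T = x + mu * tau + sigma * math.sqrt(tau) * rng.gauss(0.0, 1.0)
        total += phi(x_T)
    return math.exp(-r * tau) * total / n_samples

def u_closed_form(x, t, T, mu, sigma, r):
    """Solution of the PDE for phi(x) = x^2 with constant coefficients:
    u = exp(-r tau) * ((x + mu tau)^2 + sigma^2 tau), with tau = T - t."""
    tau = T - t
    return math.exp(-r * tau) * ((x + mu * tau) ** 2 + sigma ** 2 * tau)

params = dict(x=1.0, t=0.0, T=1.0, mu=0.2, sigma=0.5, r=0.05)
mc = feynman_kac_mc(lambda y: y * y, **params)
exact = u_closed_form(**params)
print(f"Monte Carlo: {mc:.4f}   closed form: {exact:.4f}")
```

The two numbers should agree up to Monte Carlo error. For state-dependent coefficients the exact sampling step would need to be replaced by a time-stepping scheme.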

Summary

The Ito correction term is $\frac{1}{2} f''(x) \sigma^2 dt$
It exists because $(dW)^2 = dt$ , not zero
For $\ln S$ of geometric Brownian motion, the correction gives drift $\mu - \sigma^2/2$
The Ito integral $\int_0^t W_s\,dW_s = \frac{1}{2}(W_t^2 - t)$ , not $\frac{1}{2}W_t^2$ as the ordinary chain rule would give

Exercises

Exercise (Core)

Problem

Apply Ito's lemma to $f(W_t) = e^{W_t}$ . What SDE does $Y_t = e^{W_t}$ satisfy?

Exercise (Advanced)

Problem

Let $X_t$ satisfy $dX_t = -X_t\,dt + \sqrt{2}\,dW_t$ (Ornstein-Uhlenbeck process). Apply Ito's lemma to $f(X_t, t) = X_t^2 e^{2t}$ and use it to compute $\mathbb{E}[X_t^2]$ given $X_0 = 0$ .

References

Canonical:

Oksendal, B. (2003). Stochastic Differential Equations, 6th ed. Springer. Chapters 3-4 (Ito integral, Ito's formula).
Karatzas, I. and Shreve, S. (1991). Brownian Motion and Stochastic Calculus, 2nd ed. Springer. Chapter 3 (stochastic integration).
Shreve, S. (2004). Stochastic Calculus for Finance II. Springer. Chapter 4.
Le Gall, J.-F. (2016). Brownian Motion, Martingales, and Stochastic Calculus. Springer. Chapter 5.

Current:

Song et al., "Score-Based Generative Modeling through SDEs" (2021), Appendix A.

Next Topics

Diffusion models: the primary ML application of Ito's lemma today
Flow matching: an alternative to diffusion that avoids SDEs but relates to them

Last reviewed: April 18, 2026
