Algorithms Foundations
Dynamic Programming
Solve complex optimization problems by decomposing them into overlapping subproblems with optimal substructure. The algorithmic backbone of sequence models, control theory, and reinforcement learning.
Why This Matters
Dynamic programming is how you solve problems that greedy cannot touch. Whenever an optimization problem has optimal substructure and overlapping subproblems, DP gives you an efficient exact solution by trading space for time. Beyond algorithms courses, the Bellman equation underlying DP is the same Bellman equation that drives all of reinforcement learning.
If you understand DP deeply, you understand value iteration, policy iteration, and Q-learning at the algorithmic level—all of which appear in Markov decision processes.
Mental Model
Instead of solving a problem from scratch, ask: can I express the solution to this problem in terms of solutions to smaller versions of the same problem? If yes, solve all the small problems first (or cache them as you encounter them) and build up to the full solution.
The key difference from divide-and-conquer: subproblems overlap. In merge sort, the subproblems are disjoint. In DP, the same subproblem appears many times, so you solve it once and store the result.
Core Definitions
Optimal Substructure
A problem exhibits optimal substructure with respect to a chosen decomposition into subproblems if some optimal solution to the whole problem decomposes into optimal solutions to the subproblems induced by that choice. Optimal substructure is a property of the decomposition, not of the problem in the abstract: the same problem can admit one decomposition with optimal substructure (which DP can exploit) and other decompositions where it fails. The classic cautionary example is the longest simple path problem, where the natural prefix decomposition does not preserve optimality.
Overlapping Subproblems
A recursive decomposition has overlapping subproblems if and only if the same subproblem is encountered multiple times during the recursion. Without memoization, this causes exponential blowup. With memoization or bottom-up tabulation, each subproblem is solved exactly once.
Bellman Equation
The Bellman equation expresses the value of a state as the immediate reward plus the discounted value of the successor state under the optimal action:

$$V^*(s) = \max_{a} \left[ R(s, a) + \gamma \sum_{s'} P(s' \mid s, a) \, V^*(s') \right]$$
This is the DP recursion for sequential decision-making under uncertainty.
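To make the equation concrete, here is a single Bellman backup on a toy two-state MDP. The states, transition probabilities, and rewards are invented purely for illustration; the function is a minimal sketch of the max-over-actions computation.

```python
# A single Bellman backup on a hypothetical 2-state MDP.
GAMMA = 0.9

# P[s][a] = list of (probability, next_state); R[s][a] = immediate reward.
P = {
    "s0": {"stay": [(1.0, "s0")], "go": [(0.8, "s1"), (0.2, "s0")]},
    "s1": {"stay": [(1.0, "s1")], "go": [(1.0, "s0")]},
}
R = {
    "s0": {"stay": 0.0, "go": 1.0},
    "s1": {"stay": 2.0, "go": 0.0},
}

def bellman_backup(V, s):
    """One application of V(s) <- max_a [ R(s,a) + gamma * E[V(s')] ]."""
    return max(
        R[s][a] + GAMMA * sum(p * V[s2] for p, s2 in P[s][a])
        for a in P[s]
    )

V0 = {"s0": 0.0, "s1": 0.0}
print(bellman_backup(V0, "s0"))  # 1.0: with V=0, only immediate reward counts
```

Iterating this backup over all states until convergence is exactly value iteration.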
Main Theorems
Bellman's Principle of Optimality
Statement
An optimal policy has the property that, regardless of the initial state and initial decision, the remaining decisions must constitute an optimal policy with regard to the state resulting from the first decision.
Intuition
If you are on the optimal path from A to C through B, then the sub-path from B to C must itself be optimal. If it were not, you could replace it with a better sub-path and improve the whole solution, contradicting optimality.
Proof Sketch
By contradiction. Suppose the overall solution $S$ is optimal but its restriction to some subproblem is not. Then replacing the subproblem solution with a better one yields a strictly better overall solution, contradicting the optimality of $S$.
Why It Matters
This principle is the justification for DP. Without it, solving subproblems optimally and combining them would not yield a globally optimal solution. It is also the foundation of the Bellman equation in RL: the value of a state depends only on optimal behavior from that state onward.
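The A-to-C-through-B intuition can be checked concretely with shortest paths in a DAG (graph and edge weights here are made up for illustration): the DP relaxation builds each node's optimal distance out of its predecessors' optimal distances, which is the principle of optimality at work.

```python
# Shortest paths in a DAG via DP over a topological order.
import math

edges = {  # node -> list of (successor, weight)
    "A": [("B", 2), ("C", 5)],
    "B": [("C", 1), ("D", 4)],
    "C": [("D", 1)],
    "D": [],
}
topo = ["A", "B", "C", "D"]  # a topological order of the DAG

dist = {v: math.inf for v in topo}
dist["A"] = 0
for u in topo:
    for v, w in edges[u]:
        # Relaxation: the optimal dist[v] is built from the optimal dist[u].
        dist[v] = min(dist[v], dist[u] + w)

print(dist["D"])  # 4, via A->B->C->D
```

Note that the optimal A-to-D path passes through C, and its prefix A->B->C (cost 3) is itself the optimal A-to-C path, exactly as the principle requires.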
Failure Mode
The principle fails when decisions interact in ways that prevent decomposition. For example, in problems with global constraints that couple all decisions (certain scheduling problems with resource sharing), optimal substructure may not hold.
Top-Down vs Bottom-Up
Top-down (memoization): Write the natural recursion, then add a cache. You only solve subproblems that are actually needed. Easier to code but has recursion overhead and potential stack depth issues.
Bottom-up (tabulation): Determine the order of subproblems, fill in a table from smallest to largest. No recursion overhead. Enables space optimization when you only need the previous row of the table.
Both approaches have the same asymptotic time complexity when both visit the same set of subproblems. When a query touches only a small fraction of the state space, top-down memoization can be asymptotically cheaper because it explores only reachable subproblems, while a naive bottom-up sweep fills the entire table. The choice is otherwise a matter of convenience and constant factors.
Canonical Examples
Fibonacci Numbers
The naive recursion has exponential time ($\Theta(\phi^n)$ calls, where $\phi \approx 1.618$) because it recomputes the same values. DP reduces this to $O(n)$ time and $O(1)$ space (keep only the last two values).
This is the simplest possible illustration: overlapping subproblems ($F(n-2)$ is computed by both $F(n)$ and $F(n-1)$) and optimal substructure (trivially, since there is no optimization).
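A minimal sketch of the $O(n)$-time, $O(1)$-space version:

```python
def fib(n: int) -> int:
    """Bottom-up Fibonacci keeping only the last two values."""
    a, b = 0, 1  # invariant: a = F(i), b = F(i+1)
    for _ in range(n):
        a, b = b, a + b
    return a

print(fib(10))  # 55
```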
0/1 Knapsack
Given items with values $v_1, \dots, v_n$ and weights $w_1, \dots, w_n$, and capacity $W$, maximize total value. Define $dp[i][w]$ = maximum value using the first $i$ items with capacity $w$. The recurrence is:

$$dp[i][w] = \begin{cases} dp[i-1][w] & \text{if } w_i > w \\ \max\big(dp[i-1][w],\; dp[i-1][w - w_i] + v_i\big) & \text{otherwise} \end{cases}$$

Time: $O(nW)$. Space: $O(nW)$, reducible to $O(W)$ since each row depends only on the previous row. Note: this is pseudo-polynomial because $W$ may be exponential in the input size.
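A sketch of the space-optimized version with a single row of size $W+1$ (the item values and weights below are illustrative numbers, not from the text):

```python
def knapsack(values, weights, W):
    """0/1 knapsack in O(nW) time and O(W) space.
    dp[w] = best value with capacity w using the items seen so far."""
    dp = [0] * (W + 1)
    for v, wt in zip(values, weights):
        # Iterate capacities downward so each item is used at most once;
        # an upward sweep would turn this into the unbounded knapsack.
        for w in range(W, wt - 1, -1):
            dp[w] = max(dp[w], dp[w - wt] + v)
    return dp[W]

print(knapsack([60, 100, 120], [10, 20, 30], 50))  # 220
```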
Longest Common Subsequence (LCS)
Given strings $X$ and $Y$, find the length of their longest common subsequence. Define $dp[i][j]$ = LCS length of the prefixes $X[1..i]$ and $Y[1..j]$:

$$dp[i][j] = \begin{cases} dp[i-1][j-1] + 1 & \text{if } X_i = Y_j \\ \max\big(dp[i-1][j],\; dp[i][j-1]\big) & \text{otherwise} \end{cases}$$

Time and space: $O(mn)$ where $m = |X|$, $n = |Y|$.
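The recurrence translates directly into a table fill; a minimal sketch:

```python
def lcs_len(x: str, y: str) -> int:
    """LCS length in O(mn) time and space; row/column 0 is the empty prefix."""
    m, n = len(x), len(y)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if x[i - 1] == y[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[m][n]

print(lcs_len("ABCBDAB", "BDCABA"))  # 4 (e.g., "BCBA")
```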
Time-Space Tradeoffs
Many DP problems allow space reduction:
- Rows depending only on the previous row: reduce space from $O(nW)$ to $O(W)$ by keeping only the current and previous rows (knapsack).
- Hirschberg's trick: for LCS, compute the optimal value in $O(m + n)$ space, and recover the actual subsequence in $O(mn)$ time using divide-and-conquer over the DP table.
- Knuth's optimization and the divide-and-conquer optimization: for specific recurrence structures, reduce time from $O(n^3)$ to $O(n^2)$ or from $O(n^2)$ to $O(n \log n)$.
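As a sketch of the keep-only-two-rows idea applied to LCS (value only, no traceback; the function name is mine):

```python
def lcs_len_tworow(x: str, y: str) -> int:
    """LCS length in O(min(m, n)) space by keeping only two DP rows."""
    if len(y) > len(x):
        x, y = y, x  # make y the shorter string so rows are short
    prev = [0] * (len(y) + 1)
    for i in range(1, len(x) + 1):
        cur = [0] * (len(y) + 1)
        for j in range(1, len(y) + 1):
            if x[i - 1] == y[j - 1]:
                cur[j] = prev[j - 1] + 1
            else:
                cur[j] = max(prev[j], cur[j - 1])
        prev = cur
    return prev[-1]

print(lcs_len_tworow("ABCBDAB", "BDCABA"))  # 4
```

Recovering the subsequence itself in linear space requires the full Hirschberg divide-and-conquer, which this sketch omits.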
Common Confusions
DP is not just recursion with memoization
Memoization is a technique for implementing DP, not the essence of it. The essence is the principle of optimality: the problem must decompose into subproblems whose optimal solutions combine into an overall optimal solution. If optimal substructure does not hold, memoizing a recursion gives you the right answers to subproblems but combining them does not give the right answer to the whole problem. The recursion must encode a valid decomposition.
Pseudo-polynomial is not polynomial
The 0/1 knapsack DP runs in $O(nW)$ time. This looks polynomial, but $W$ is a number in the input, not the size of the input. The input encodes $W$ in $O(\log W)$ bits, so $W$ is exponential in the input size. Knapsack is NP-hard, and the DP is a pseudo-polynomial time algorithm. This distinction matters for complexity theory.
DP vs greedy vs divide-and-conquer
- Greedy: make one choice, never reconsider. Works when structure such as a matroid or an exchange argument guarantees the local choice is globally safe.
- Divide-and-conquer: split into disjoint subproblems, solve recursively, combine. No overlapping subproblems (e.g., merge sort).
- DP: split into overlapping subproblems, solve each once, combine. Requires optimal substructure.
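The greedy-vs-DP contrast can be made concrete with coin change on denominations {1, 3, 4}, a standard counterexample where the largest-coin-first rule is suboptimal:

```python
def greedy_coins(amount, coins):
    """Largest-coin-first heuristic; not optimal for all denominations."""
    count = 0
    for c in sorted(coins, reverse=True):
        count += amount // c
        amount %= c
    return count

def dp_coins(amount, coins):
    """Exact DP: dp[a] = minimum coins to make amount a."""
    INF = float("inf")
    dp = [0] + [INF] * amount
    for a in range(1, amount + 1):
        dp[a] = min((dp[a - c] + 1 for c in coins if c <= a), default=INF)
    return dp[amount]

print(greedy_coins(6, [1, 3, 4]))  # 3 coins: 4 + 1 + 1
print(dp_coins(6, [1, 3, 4]))      # 2 coins: 3 + 3
```

Greedy commits to the coin 4 and can never reconsider; DP examines the overlapping subproblems for amounts 1 through 5 and finds the better decomposition.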
Memoization vs Tabulation
Both approaches implement DP but differ in traversal order and practical tradeoffs.
Top-down (memoization) writes the natural recursion, then intercepts repeat calls with a cache. In Python, this is a dictionary keyed by subproblem arguments; in systems code, it is a hash map or an array indexed by state. Only the subproblems actually needed are computed. For problems where most subproblems are irrelevant to a given query (e.g., a DP over a large 2D state space where a query reaches only a narrow band of states), memoization avoids the wasted work of filling the whole table.
Concrete structure for coin change:
    def min_coins(amount, coins, memo=None):
        # A fresh cache per top-level call avoids Python's shared
        # mutable-default-argument pitfall (memo={} persists across calls).
        if memo is None:
            memo = {}
        if amount in memo:
            return memo[amount]
        if amount == 0:
            return 0
        best = float('inf')
        for c in coins:
            if c <= amount:
                best = min(best, 1 + min_coins(amount - c, coins, memo))
        memo[amount] = best
        return best
Bottom-up (tabulation) determines a topological order of the subproblems, then fills an array in that order. No recursion overhead; no stack depth limits. Space optimization is natural: if row $i$ depends only on row $i-1$, keep just one row.
Concrete structure for the same problem:
    def min_coins_bu(amount, coins):
        dp = [float('inf')] * (amount + 1)
        dp[0] = 0
        for a in range(1, amount + 1):
            for c in coins:
                if c <= a:
                    dp[a] = min(dp[a], 1 + dp[a - c])
        return dp[amount]
Key tradeoffs:
| | Top-down (memoization) | Bottom-up (tabulation) |
|---|---|---|
| Traversal | Demand-driven, only needed states | All states in fixed order |
| Overhead | Recursion stack, hash lookup | Array indexing, predictable |
| Space opt. | Harder (cache must retain all states) | Easy (overwrite old rows) |
| Stack depth | Risky for large inputs (Python recursion limit) | None |
| Clarity | Matches the recurrence directly | Requires explicit ordering |
Both have identical asymptotic complexity when they visit the same set of subproblems. The choice depends on whether most subproblems are needed (prefer bottom-up) and whether the ordering is hard to compute (prefer top-down).
The Bellman equations in reinforcement learning typically use bottom-up tabulation when the state space is small and explicit, and function approximation (not exact DP) when the state space is too large to enumerate.
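A minimal value-iteration sketch over a small explicit state space; the chain MDP, rewards, and discount factor below are invented for illustration. Bellman backups are applied repeatedly until the values stop changing.

```python
# Value iteration on a deterministic 5-state chain: from state s you may
# "move" to s+1 or "stay"; entering the last state yields reward 1.
GAMMA = 0.9
N = 5

def value_iteration(tol=1e-8):
    V = [0.0] * N  # V[N-1] is terminal and stays 0
    while True:
        delta = 0.0
        for s in range(N - 1):
            move = (1.0 if s + 1 == N - 1 else 0.0) + GAMMA * V[s + 1]
            stay = GAMMA * V[s]
            new = max(move, stay)
            delta = max(delta, abs(new - V[s]))
            V[s] = new
        if delta < tol:
            return V

V = value_iteration()
print([round(v, 3) for v in V])  # [0.729, 0.81, 0.9, 1.0, 0.0]
```

The values decay by a factor of $\gamma$ per step of distance from the goal, which is exactly the discounted-reward structure of the Bellman equation.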
Summary
- DP requires two properties: optimal substructure and overlapping subproblems
- The Bellman equation is the DP recursion for sequential decision problems
- Top-down (memoization) and bottom-up (tabulation) match asymptotically when they visit the same subproblems; memoization can be asymptotically cheaper when the reachable state set is sparse
- Space optimization is often possible by keeping only the previous layer
- Pseudo-polynomial time (e.g., the $O(nW)$ knapsack DP) is not the same as polynomial time
- The principle of optimality is the justification for why DP gives correct answers
Exercises
Problem
Write the DP recurrence for the minimum number of coins needed to make change for amount $A$ using coin denominations $c_1, \dots, c_k$. What is the time and space complexity?
Problem
The naive recursive Fibonacci makes $\Theta(\phi^n)$ calls. Explain precisely why memoization reduces this to $O(n)$, and why bottom-up DP reduces space from $O(n)$ to $O(1)$.
Problem
The matrix chain multiplication problem asks for the optimal parenthesization of the product $A_1 A_2 \cdots A_n$. Write the DP recurrence and give its time complexity. Why does greedy fail here?
References
Canonical:
- Cormen, Leiserson, Rivest, Stein, Introduction to Algorithms (CLRS), Chapters 14-15 (DP foundations, LCS, matrix chain)
- Bellman, Dynamic Programming (1957), Chapters 3-4 (principle of optimality, sequential decision processes)
RL Connection:
- Sutton & Barto, Reinforcement Learning: An Introduction (2018), Chapters 3-4 (MDPs, Bellman equations, value iteration)
Space optimization:
- Hirschberg, "A Linear Space Algorithm for Computing Maximal Common Subsequences" (1975)
- Knuth, "Optimum Binary Search Trees" (1971) — optimization for monotone recurrences
Accessible:
- Kleinberg & Tardos, Algorithm Design, Chapter 6 (weighted interval scheduling, sequence alignment, RNA folding)
Advanced:
- Bertsekas, Dynamic Programming and Optimal Control (2017), Vol. I, Chapters 1-2 (stochastic DP, infinite-horizon problems)
Next Topics
- Greedy algorithms: the simpler paradigm for problems with matroid structure
- Convex optimization basics: continuous optimization with similar decomposition ideas
Last reviewed: April 26, 2026