Foundations
Compactness and Heine-Borel
Sequential compactness, the Heine-Borel theorem in finite dimensions, the extreme value theorem, and why compactness is the key assumption in optimization.
Prerequisites
Why This Matters
Optimization problems in ML ask: does a minimum exist? Compactness is the standard tool for guaranteeing existence of minimizers. The extreme value theorem says a continuous function on a compact set attains its minimum. Without compactness, minima may not exist (the infimum is not achieved), and many proof strategies in learning theory break down.
This page leans on the prerequisite notions of metric spaces, convergence, and completeness.
Core Definitions
Open Cover
An open cover of a set $A$ is a collection of open sets $\{U_i\}_{i \in I}$ such that $A \subseteq \bigcup_{i \in I} U_i$. A subcover is a subcollection that still covers $A$.
Compactness
A subset $K$ of a metric space is compact if and only if every open cover of $K$ has a finite subcover. Compactness is a topological property: it does not depend on the particular metric, only on the topology it induces.
Sequential Compactness
A set $K$ is sequentially compact if and only if every sequence in $K$ has a subsequence that converges to a point of $K$. In metric spaces, compactness and sequential compactness are equivalent.
Bounded Set
A subset $A$ of a metric space $(X, d)$ is bounded if and only if there exist $x_0 \in X$ and $R > 0$ such that $d(x, x_0) \le R$ for all $x \in A$. In $\mathbb{R}^n$, this means $A$ is contained in some ball of finite radius.
Main Theorems
Heine-Borel Theorem
Statement
A subset $K \subseteq \mathbb{R}^n$ is compact if and only if $K$ is closed and bounded.
Intuition
Bounded means sequences cannot escape to infinity. Closed means limits of convergent subsequences stay in the set. Together, they guarantee every sequence has a convergent subsequence with limit in $K$.
Proof Sketch
Compact implies closed and bounded: if $K$ is not bounded, a sequence $(x_n)$ in $K$ with $\|x_n\| \ge n$ has no convergent subsequence in $K$. If $K$ is not closed, there is a limit point $p \notin K$; a sequence in $K$ converging to $p$ has no subsequence converging in $K$. Closed and bounded implies compact: by Bolzano-Weierstrass, every bounded sequence in $\mathbb{R}^n$ has a convergent subsequence. Since $K$ is closed, the limit is in $K$.
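The Bolzano-Weierstrass step can be made concrete. The sketch below (illustrative only, not part of the formal proof) extracts a convergent subsequence from a bounded real sequence by bisection: at each stage, keep the half-interval holding the most remaining terms and record one index from it. The specific oscillating sequence is an arbitrary illustrative choice.

```python
def bw_subsequence_indices(x, lo, hi, stages=12):
    """Bisection sketch of Bolzano-Weierstrass: return strictly increasing
    indices of a subsequence of x (a finite bounded sequence in [lo, hi])
    whose terms home in on a cluster point."""
    idx, last = [], -1
    for _ in range(stages):
        mid = (lo + hi) / 2.0
        left = [i for i in range(last + 1, len(x)) if lo <= x[i] <= mid]
        right = [i for i in range(last + 1, len(x)) if mid < x[i] <= hi]
        if not left and not right:
            break  # finite data exhausted; a true infinite sequence never runs out
        if len(left) >= len(right):
            hi, pick = mid, left[0]    # keep the fuller half-interval
        else:
            lo, pick = mid, right[0]
        idx.append(pick)
        last = pick
    return idx

# Bounded oscillating sequence x_n = (-1)^n (1 + 1/(n+1)), values in [-2, 2].
x = [(-1) ** n * (1 + 1 / (n + 1)) for n in range(200_000)]
idx = bw_subsequence_indices(x, -2.0, 2.0)
sub = [x[i] for i in idx]  # terms hug the cluster point -1
```

The chosen half always contains a cluster point of the sequence, which is why the extracted terms converge.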
Why It Matters
Heine-Borel makes compactness easy to check in $\mathbb{R}^n$: just verify closed and bounded. This is the standard way to establish that a constrained optimization problem has a solution.
Failure Mode
Heine-Borel fails in infinite-dimensional spaces. The closed unit ball in $\ell^2$ is closed and bounded but not compact (the standard basis vectors have no convergent subsequence). In infinite dimensions, compactness is a much stronger condition than closed-and-bounded.
Extreme Value Theorem
Statement
A continuous function $f : K \to \mathbb{R}$ on a nonempty compact set $K$ attains its maximum and minimum: there exist $x_*, x^* \in K$ with $f(x_*) \le f(x) \le f(x^*)$ for all $x \in K$.
Intuition
Compactness prevents the minimizing sequence from escaping (either to infinity or to a boundary point outside $K$). Continuity preserves convergence, so the limit of a minimizing sequence is a minimizer.
Proof Sketch
Let $m = \inf_{x \in K} f(x)$. Take a sequence $(x_n)$ in $K$ with $f(x_n) \to m$. By compactness, a subsequence $x_{n_k} \to x_* \in K$. By continuity, $f(x_*) = \lim_k f(x_{n_k}) = m$. So the infimum is attained at $x_*$. The same argument handles the supremum.
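A numerical companion to the proof sketch (illustrative, not a proof): on a compact interval, exhaustive grid search exhibits a point where a continuous function's minimum is approximately attained. The function and interval are arbitrary illustrative choices.

```python
import math

def grid_min(f, a, b, n=100_001):
    """Approximate (argmin, min) of f on the compact interval [a, b]
    by exhaustive grid search."""
    xs = [a + (b - a) * k / (n - 1) for k in range(n)]
    k_best = min(range(n), key=lambda k: f(xs[k]))
    return xs[k_best], f(xs[k_best])

# A continuous function on the compact set [0, 2]; EVT guarantees a minimizer.
f = lambda x: (x - 1.0) ** 2 + math.sin(3.0 * x)
x_star, f_star = grid_min(f, 0.0, 2.0)  # x_star approximates a minimizer in [0, 2]
```

The grid points play the role of the minimizing sequence $(x_n)$ in the proof: their images decrease toward the infimum, and compactness keeps the limiting argument inside the interval.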
Why It Matters
This theorem is invoked implicitly whenever you write $\min_{\theta \in \Theta} L(\theta)$ for a continuous loss $L$ over a compact parameter set $\Theta$. Without compactness, you can only write $\inf_{\theta \in \Theta} L(\theta)$, and the minimizer may not exist.
Failure Mode
Fails without compactness: $f(x) = x$ on the open interval $(0, 1)$ has $\inf f = 0$ but no minimizer.
Fails without continuity: define $f : [0, 1] \to \mathbb{R}$ by $f(x) = x$ for $x > 0$ and $f(0) = 1$. The domain $[0, 1]$ is compact, but $f$ is discontinuous at $0$ (the right limit is $0$ while $f(0) = 1$). The infimum is $0$, but no minimizer exists: every $x > 0$ gives $f(x) = x > 0$, and $f(0) = 1 > 0$. Continuity is the missing hypothesis, not compactness.
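Both failure modes are easy to see numerically. This sketch uses two illustrative functions: $f(x) = x$ on the open interval $(0,1)$ (domain not compact) and a function on the compact $[0,1]$ with a jump at $0$ (function not continuous).

```python
# Failure mode 1 (domain not compact): f(x) = x on (0, 1).
# The minimizing sequence x_n = 1/n drives f(x_n) toward inf f = 0,
# but the limit 0 lies outside the open interval: no minimizer.
f = lambda x: x
seq = [1.0 / n for n in range(1, 7)]
f_vals = [f(x) for x in seq]            # 1, 0.5, ... approaching 0, all positive

# Failure mode 2 (function not continuous): g on the compact [0, 1]
# with g(x) = x for x > 0 and g(0) = 1; the jump at 0 lets the
# infimum 0 escape even though the domain is compact.
g = lambda x: x if x > 0 else 1.0
g_vals = [g(x) for x in seq]            # also approaches 0, never reaches it
```

In both cases every evaluated value stays strictly above the infimum $0$, which is exactly the statement that no minimizer exists.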
Compact ⟺ Complete + Totally Bounded
In any metric space $(X, d)$, the following are equivalent for a set $K \subseteq X$:
- $K$ is compact.
- $K$ is sequentially compact: every sequence in $K$ has a subsequence converging to a point of $K$.
- $K$ is complete and totally bounded.
Total boundedness means: for every $\varepsilon > 0$, $K$ can be covered by finitely many open balls of radius $\varepsilon$. The minimum number of such balls is the covering number $N(\varepsilon, K, d)$. This is the analytic origin of covering-number arguments in learning theory and high-dimensional probability, and it is the route by which compactness enters Rademacher complexity bounds via Dudley's chaining inequality.
This equivalence makes Heine-Borel feel less special. In $\mathbb{R}^n$, "closed" means complete and "bounded" means totally bounded, so closed + bounded is exactly compactness. In an infinite-dimensional normed space, bounded does not imply totally bounded: the closed unit ball in $\ell^2$ is bounded but contains the orthonormal basis $(e_n)$ with $\|e_n - e_m\| = \sqrt{2}$ for $n \ne m$, so no finite collection of $\varepsilon$-balls with $\varepsilon < \sqrt{2}/2$ covers it. This is why closed + bounded no longer implies compact in infinite dimensions.
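Total boundedness is constructive in one dimension: a greedy sweep builds an $\varepsilon$-cover of a point cloud in $[0, 1]$, and the ball count tracks the $1/(2\varepsilon)$ scaling of the covering number $N(\varepsilon, [0,1], |\cdot|)$. This is an illustrative sketch using closed balls on a finite grid, not a general covering-number algorithm.

```python
def greedy_cover(points, eps):
    """Greedy 1-d eps-cover: sweep the sorted points; whenever a point is
    not covered by the last ball, open a new ball centered eps to its
    right (so the new ball covers it plus the next eps of material)."""
    centers = []
    for p in sorted(points):
        if not centers or p > centers[-1] + eps:   # p is uncovered so far
            centers.append(p + eps)
    return centers

points = [k / 1000 for k in range(1001)]           # dense grid in [0, 1]
counts = {eps: len(greedy_cover(points, eps)) for eps in (0.5, 0.1, 0.01)}
# counts grow roughly like 1/(2*eps) as eps shrinks
```

Shrinking $\varepsilon$ makes the count blow up like $1/(2\varepsilon)$ but always keeps it finite, which is precisely total boundedness of the interval.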
Why Compactness Matters in ML
Compactness is one clean sufficient condition for several ML-relevant theorems, not the universal assumption. Be precise about what each theorem actually needs.
- Existence of minimizers via EVT. A continuous loss on a compact domain attains its minimum (extreme value theorem). A hard parameter constraint set $\{\theta : \|\theta\| \le R\}$ is closed and bounded in $\mathbb{R}^d$, hence compact, hence the existence proof goes through directly. Ordinary regularization on an unbounded domain is a different argument: coercivity ($L(\theta) \to \infty$ as $\|\theta\| \to \infty$) plus lower semicontinuity (or strict convexity) gives existence of a minimizer over $\mathbb{R}^d$ without literal compactness.
- Covering-number arguments. What is needed is total boundedness of the relevant hypothesis class at the relevant scale, not full compactness. Compactness implies total boundedness (the bridge theorem above), but many learning-theory bounds work under weaker entropy assumptions on the class.
- Continuity arguments. Compactness can upgrade pointwise to uniform statements only with extra hypotheses. Dini's theorem, for example, requires monotone convergence of continuous functions to a continuous limit on a compact space.
- Function-approximation theory. The Arzelà-Ascoli theorem characterizes relatively compact families in $C(K)$ via uniform boundedness and equicontinuity. Universal approximation theorems are usually stated on compact domains for clean topology, but Arzelà-Ascoli is a separate compactness principle, not the engine of basic universal approximation.
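The coercivity route mentioned in the first bullet can be sketched numerically. For an illustrative coercive loss $L(\theta) = (\theta - 3)^2 + 0.1\,|\theta|$ on all of $\mathbb{R}$, the sublevel set $\{\theta : L(\theta) \le L(0)\}$ sits inside a closed bounded interval, so the search reduces to a compact set where EVT applies. The loss and constants here are assumptions chosen for the demo.

```python
import math

# Illustrative coercive loss on the unbounded domain R.
lam = 0.1
L = lambda t: (t - 3.0) ** 2 + lam * abs(t)

# Sublevel-set bound: L(theta) <= L(0) forces (theta - 3)^2 <= L(0),
# i.e. theta lies in the compact interval [3 - R, 3 + R] with R = sqrt(L(0)).
R = math.sqrt(L(0.0))
a, b = 3.0 - R, 3.0 + R

# EVT now applies on [a, b]; a fine grid search locates the minimizer.
# Analytically, for theta > 0: 2(theta - 3) + 0.1 = 0 gives theta = 2.95.
n = 200_001
grid = [a + (b - a) * k / (n - 1) for k in range(n)]
theta_star = min(grid, key=L)
```

The point of the sketch is the reduction step: any global minimizer satisfies $L(\theta) \le L(0)$, so nothing is lost by restricting the unbounded problem to the compact interval $[a, b]$.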
Examples
The closed unit ball in $\ell^2$ is closed and bounded but not compact
Let $\ell^2$ be the space of square-summable real sequences with norm $\|x\| = \left(\sum_n x_n^2\right)^{1/2}$. The closed unit ball $B = \{x \in \ell^2 : \|x\| \le 1\}$ is closed (the norm is continuous, so $B$ is the preimage of the closed set $[0, 1]$) and bounded.
Yet $B$ is not compact. The standard basis vectors $e_n$ (a $1$ in position $n$, zeros elsewhere) all lie in $B$, but for $n \ne m$, $\|e_n - e_m\| = \sqrt{2}$. No subsequence of $(e_n)$ is Cauchy, so none converges. Sequential compactness fails, and Heine-Borel is therefore an $\mathbb{R}^n$-specific theorem, not a general metric-space fact.
In ML, this is what changes when you switch from a finite-parameter model to an infinite-dimensional function space: compactness arguments need to be rebuilt around weak topologies, equicontinuity (Arzelà-Ascoli), or explicit covering-number bounds.
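The pairwise-distance claim is easy to check on finite truncations of the basis vectors (a sketch; any ambient dimension at least the largest basis index works the same way).

```python
import math
from itertools import combinations

# Truncated standard basis vectors of l^2: e(n) has a 1 in slot n.
dim = 8

def e(n):
    v = [0.0] * dim
    v[n] = 1.0
    return v

def dist(u, v):
    """Euclidean distance, the l^2 norm of u - v."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

dists = [dist(e(i), e(j)) for i, j in combinations(range(dim), 2)]
# every pairwise distance is sqrt(2), so no subsequence of (e_n) is Cauchy
```

Since every pair of distinct basis vectors sits at distance $\sqrt{2}$, no subsequence can get within, say, $1$ of a common limit, which is the concrete failure of sequential compactness.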
Why $\arg\min$ may not be a singleton or even nonempty
Take $L(\theta) = \sin^2(\theta)$ on the closed half-line $[0, \infty)$. The set is closed but not bounded; the minimum value $0$ is attained at infinitely many points $\theta = k\pi$, $k = 0, 1, 2, \dots$, so $\arg\min$ is nonempty but not a singleton. Now take $L(\theta) = e^{-\theta}$ on the same domain: the infimum is $0$, no minimizer exists at all, and $\arg\min$ is empty. The same loss on the compact restriction $[0, T]$ has minimizer $\theta = T$. Compactness is exactly the condition that turns "$\inf$" into "$\min$".
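A numeric sketch of the empty-versus-attained dichotomy, using the illustrative loss $L(\theta) = e^{-\theta}$: on $[0, \infty)$ the infimum $0$ is approached but never attained, while on a compact truncation $[0, T]$ (here $T = 5$, an arbitrary choice) the minimum is attained at the endpoint.

```python
import math

L = lambda t: math.exp(-t)

# On [0, inf): L(theta_n) -> 0 along theta_n = 10 n, but L(theta) > 0
# for every theta, so the argmin over the half-line is empty.
tail = [L(10.0 * n) for n in range(6)]

# On the compact truncation [0, T]: L is strictly decreasing, so the
# minimizer is the right endpoint theta = T.
T = 5.0
grid = [T * k / 10_000 for k in range(10_001)]
theta_star = min(grid, key=L)   # = 5.0, the endpoint
```

The grid search attains the minimum only because the truncated domain is compact; on the full half-line every candidate is beaten by a point further right.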
Common Confusions
Closed and bounded is not enough in infinite dimensions
A common mistake is applying Heine-Borel in function spaces. The closed unit ball in any infinite-dimensional normed space is never compact. This is why weak compactness and related notions are needed in functional analysis and measure-theoretic probability.
Compact subsets of R^n are always complete
Every compact metric space is complete (Cauchy sequences have convergent subsequences, whose limits are in the compact set). But completeness alone does not give compactness: $\mathbb{R}^n$ is complete but not compact.
Exercises
Problem
Is the set $\{1/n : n \ge 1\}$ compact? What about $\{0\} \cup \{1/n : n \ge 1\}$?
Problem
Give an example of a continuous function on a closed (but unbounded) subset of that does not attain its infimum. Explain which hypothesis of the extreme value theorem fails.
References
Canonical:
- Rudin, Principles of Mathematical Analysis (1976), Chapter 2 (sections on compactness)
- Munkres, Topology (2000), Chapter 3
- Apostol, Mathematical Analysis (1974), Chapter 3 (compact subsets of R^n)
For ML context:
- Shalev-Shwartz & Ben-David, Understanding Machine Learning (2014), Chapter 27 (covering numbers and compactness)
- Folland, Real Analysis (1999), Section 4.4 (compactness in metric spaces and function spaces)
- Aliprantis & Border, Infinite Dimensional Analysis (2006), Chapters 2-3 (compactness in topological vector spaces)
Last reviewed: April 18, 2026
Canonical graph
Required before and derived from this topic
These links come from prerequisite edges in the curriculum graph. Editorial suggestions are shown here only when the target page also cites this page as a prerequisite.
Required prerequisites
1. Metric Spaces, Convergence, and Completeness (layer 0A · tier 1)
Derived topics
No published topic currently declares this as a prerequisite.