Foundations
Inner Product Spaces and Orthogonality
Inner product axioms, Cauchy-Schwarz inequality, orthogonality, Gram-Schmidt, projections, and the bridge to Hilbert spaces.
Prerequisites
Why This Matters
Inner products define geometry in vector spaces: lengths, angles, and orthogonality. Kernel methods replace the standard dot product with arbitrary inner products to measure similarity in high-dimensional feature spaces. Projections onto subspaces underlie least squares linear regression, PCA, and conditional expectation.
The angle-bracket notation $\langle x, y \rangle$ denotes the inner product, and $x \perp y$ denotes orthogonality.
Core Definitions
Inner Product
An inner product on a real vector space $V$ is a function $\langle \cdot, \cdot \rangle : V \times V \to \mathbb{R}$ satisfying:
- Symmetry: $\langle x, y \rangle = \langle y, x \rangle$
- Linearity in first argument: $\langle \alpha x + \beta y, z \rangle = \alpha \langle x, z \rangle + \beta \langle y, z \rangle$
- Positive definiteness: $\langle x, x \rangle \geq 0$, with equality iff $x = 0$
The induced norm is $\|x\| = \sqrt{\langle x, x \rangle}$. The standard inner product on $\mathbb{R}^n$ is $\langle x, y \rangle = \sum_{i=1}^n x_i y_i = x^\top y$.
Orthogonality
Two vectors $x, y$ are orthogonal if and only if $\langle x, y \rangle = 0$, written $x \perp y$. A set $\{e_1, \dots, e_k\}$ is orthonormal if and only if $\langle e_i, e_j \rangle = \delta_{ij}$ (Kronecker delta). The orthogonal complement of a subspace $U \subseteq V$ is $U^\perp = \{x \in V : \langle x, u \rangle = 0 \text{ for all } u \in U\}$.
Orthogonal Projection
The orthogonal projection of $x$ onto a subspace $U$ is the unique $P_U x \in U$ minimizing $\|x - u\|$ over $u \in U$, when this minimizer exists. If $\{e_1, \dots, e_k\}$ is an orthonormal basis for $U$:
$$P_U x = \sum_{i=1}^k \langle x, e_i \rangle e_i$$
The residual $x - P_U x$ lies in $U^\perp$.
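A minimal numerical sketch of this formula, with an illustrative vector and orthonormal basis (here, the xy-plane in $\mathbb{R}^3$):

```python
import numpy as np

x = np.array([2.0, 3.0, 4.0])
# Orthonormal basis for the xy-plane, a 2D subspace of R^3.
e1 = np.array([1.0, 0.0, 0.0])
e2 = np.array([0.0, 1.0, 0.0])

p = (x @ e1) * e1 + (x @ e2) * e2   # P_U x = sum_i <x, e_i> e_i
r = x - p                           # residual

print(p)                 # [2. 3. 0.]
print(r @ e1, r @ e2)    # 0.0 0.0 : residual lies in U-perp
```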
In finite dimensions every subspace admits a unique projection. In infinite-dimensional Hilbert spaces, existence and uniqueness of $P_U x$ require $U$ to be a closed subspace (Hilbert projection theorem). For a non-closed subspace, the infimum may not be attained in $U$; one then projects onto its closure $\overline{U}$.
Orthogonal projection is the geometry behind least squares. When the linear system $Ax = b$ has no exact solution, the closest reachable point is $A A^+ b$, where $A^+$ is the Moore-Penrose pseudoinverse. The residual $b - A A^+ b$ lies in $\operatorname{col}(A)^\perp$, which is the optimality condition. The Pseudoinverse Geometry Lab makes this draggable: move $b$ and watch the residual stay perpendicular to $\operatorname{col}(A)$.
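A quick NumPy check of this geometry, using an illustrative overdetermined system:

```python
import numpy as np

# An overdetermined system Ax = b with no exact solution.
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
b = np.array([1.0, 0.0, 2.0])

x_hat = np.linalg.pinv(A) @ b   # Moore-Penrose least squares solution
p = A @ x_hat                   # projection of b onto col(A)
r = b - p                       # residual

# Optimality condition: the residual is orthogonal to every column of A.
print(A.T @ r)   # ~ [0. 0.]
```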
Hilbert Space (Preview)
A Hilbert space is a complete inner product space: every Cauchy sequence converges. Finite-dimensional inner product spaces are automatically Hilbert spaces. Infinite-dimensional examples include (the space of square-integrable functions) and reproducing kernel Hilbert spaces (RKHS) used in kernel methods.
Main Theorems
Cauchy-Schwarz Inequality
Statement
For all $x, y$ in an inner product space:
$$|\langle x, y \rangle| \leq \|x\| \, \|y\|$$
Equality holds if and only if $x$ and $y$ are linearly dependent.
Intuition
The cosine of the angle between two vectors has magnitude at most 1. In any inner product space, we can define $\cos \theta = \frac{\langle x, y \rangle}{\|x\| \|y\|}$, and Cauchy-Schwarz says this is well-defined.
Proof Sketch
For $y \neq 0$, define $p(t) = \|x - ty\|^2 = \|x\|^2 - 2t \langle x, y \rangle + t^2 \|y\|^2$. Then $p(t) \geq 0$ for all $t$, and at the minimizer $t^* = \langle x, y \rangle / \|y\|^2$ we get $p(t^*) = \|x\|^2 - \langle x, y \rangle^2 / \|y\|^2 \geq 0$. Rearranging gives $\langle x, y \rangle^2 \leq \|x\|^2 \|y\|^2$.
Why It Matters
Cauchy-Schwarz is the single most used inequality in analysis. It proves the triangle inequality for the induced norm. It bounds inner products (and hence correlations, cosine similarities, kernel evaluations) in terms of norms.
Failure Mode
The inequality is tight only when vectors are linearly dependent. For nearly orthogonal vectors, the bound is very loose. In high dimensions, random vectors tend to be nearly orthogonal, so the bound often has significant slack.
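A sketch of this slack, assuming random standard Gaussian vectors; the average Cauchy-Schwarz ratio falls far below the bound of 1 as the dimension grows:

```python
import numpy as np

rng = np.random.default_rng(0)

def avg_cs_ratio(n, trials=200):
    """Average |<x,y>| / (||x|| ||y||) over random Gaussian pairs in R^n."""
    vals = []
    for _ in range(trials):
        x, y = rng.standard_normal(n), rng.standard_normal(n)
        vals.append(abs(x @ y) / (np.linalg.norm(x) * np.linalg.norm(y)))
    return float(np.mean(vals))

# Cauchy-Schwarz bounds each ratio by 1; in high dimensions the average
# is close to 0 because random vectors are nearly orthogonal.
for n in (2, 100, 10000):
    print(n, avg_cs_ratio(n))
```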
Gram-Schmidt Orthogonalization
Given linearly independent vectors $v_1, \dots, v_k$, the Gram-Schmidt process produces an orthonormal set $e_1, \dots, e_k$ spanning the same subspace:
$$u_j = v_j - \sum_{i=1}^{j-1} \langle v_j, e_i \rangle e_i, \qquad e_j = \frac{u_j}{\|u_j\|}$$
This is the constructive proof that every finite-dimensional inner product space has an orthonormal basis. In matrix form, Gram-Schmidt applied to the columns of $A$ produces the QR decomposition $A = QR$, which is central to numerical matrix operations.
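The recurrence above translates directly to code. A minimal (classical, not numerically hardened) sketch with illustrative input columns:

```python
import numpy as np

def gram_schmidt(V):
    """Orthonormalize the columns of V (assumed linearly independent)."""
    Q = np.zeros_like(V, dtype=float)
    for j in range(V.shape[1]):
        u = V[:, j].copy()
        for i in range(j):
            u -= (V[:, j] @ Q[:, i]) * Q[:, i]   # subtract projection onto e_i
        Q[:, j] = u / np.linalg.norm(u)          # normalize
    return Q

V = np.array([[1.0, 1.0],
              [1.0, 0.0],
              [0.0, 1.0]])
Q = gram_schmidt(V)
print(np.round(Q.T @ Q, 10))   # identity matrix: columns are orthonormal
```

In floating point, the classical version above loses orthogonality for ill-conditioned inputs; production code uses modified Gram-Schmidt or Householder QR (what `np.linalg.qr` does).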
Bessel's Inequality and the Polarization Identity
Two identities recur throughout analysis and ML.
Bessel's Inequality
Let $\{e_i\}$ be an orthonormal set in an inner product space (finite or countable). For every $x$:
$$\sum_i \langle x, e_i \rangle^2 \leq \|x\|^2$$
with equality if and only if $x$ lies in the closure of $\operatorname{span}\{e_i\}$. Each term $\langle x, e_i \rangle^2$ is the energy of $x$ along the direction $e_i$; the inequality says total projected energy cannot exceed total energy. Equality is Parseval's identity and characterizes complete orthonormal systems.
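A numeric illustration, assuming an incomplete orthonormal set (the first three standard basis vectors of $\mathbb{R}^5$):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal(5)

# Incomplete orthonormal set: first 3 standard basis vectors of R^5.
E = np.eye(5)[:, :3]
coeffs = E.T @ x                       # <x, e_i> for each e_i
projected_energy = np.sum(coeffs**2)

# Bessel: projected energy <= total energy; equality needs a complete basis.
print(projected_energy, x @ x)
```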
Polarization Identity
For real inner product spaces, the inner product is recoverable from the norm:
$$\langle x, y \rangle = \tfrac{1}{4}\left(\|x + y\|^2 - \|x - y\|^2\right)$$
Combined with the parallelogram law $\|x + y\|^2 + \|x - y\|^2 = 2\|x\|^2 + 2\|y\|^2$, this characterizes inner-product norms among all norms (Jordan-von Neumann theorem). Practically, it is how kernel methods compute inner products from distances: $\langle x, y \rangle = \tfrac{1}{2}\left(\|x\|^2 + \|y\|^2 - \|x - y\|^2\right)$, the trick that turns a distance kernel into an inner product kernel.
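Both recovery formulas can be checked in a few lines (random vectors here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
x, y = rng.standard_normal(4), rng.standard_normal(4)
n = np.linalg.norm

# Polarization: recover <x, y> from norms alone.
polar = 0.25 * (n(x + y)**2 - n(x - y)**2)
# Distance-to-inner-product trick used in kernel methods.
from_dist = 0.5 * (n(x)**2 + n(y)**2 - n(x - y)**2)

print(polar, from_dist, x @ y)   # all three agree
```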
Beyond the Standard Dot Product
Two inner products that appear constantly in ML.
Frobenius Inner Product on Matrices
For $A, B \in \mathbb{R}^{m \times n}$:
$$\langle A, B \rangle_F = \sum_{i,j} A_{ij} B_{ij} = \operatorname{tr}(A^\top B)$$
with induced norm $\|A\|_F = \sqrt{\sum_{i,j} A_{ij}^2}$. Vectorizing $A$ and $B$ shows this is just the standard dot product on $\mathbb{R}^{mn}$. The Frobenius inner product gives the geometry behind low-rank matrix approximation: the Eckart-Young theorem says the best rank-$k$ approximation of $A$ in Frobenius norm is its truncated SVD, and the optimal value equals the residual singular-value energy $\sqrt{\sum_{i > k} \sigma_i^2}$.
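A sketch verifying both claims on an illustrative random matrix:

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((6, 4))
B = rng.standard_normal((6, 4))

# Frobenius inner product: elementwise sum equals the trace form.
print(np.isclose(np.sum(A * B), np.trace(A.T @ B)))   # True

# Eckart-Young: truncated SVD is the best rank-k approximation,
# with Frobenius error equal to the residual singular-value energy.
k = 2
U, s, Vt = np.linalg.svd(A, full_matrices=False)
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
err = np.linalg.norm(A - A_k, 'fro')
print(err, np.sqrt(np.sum(s[k:]**2)))   # equal
```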
Mahalanobis Inner Product
For a symmetric positive-definite matrix $M$:
$$\langle x, y \rangle_M = x^\top M y$$
with induced Mahalanobis distance $d_M(x, y) = \sqrt{(x - y)^\top M (x - y)}$. Choosing $M = \Sigma^{-1}$ (the inverse covariance) makes the geometry whitening-invariant: directions of high variance get down-weighted so that Mahalanobis distance reflects standardized deviation rather than raw Euclidean displacement. Quadratic discriminant analysis, Gaussian likelihoods, and metric learning (Xing et al. 2002, Weinberger and Saul 2009) all use Mahalanobis inner products. The factorization $M = L L^\top$ shows that $\langle x, y \rangle_M = \langle L^\top x, L^\top y \rangle$: a Mahalanobis inner product is just a standard inner product after a linear transformation.
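A sketch of that factorization argument, using the Cholesky decomposition $M = L L^\top$ on an illustrative positive-definite matrix:

```python
import numpy as np

rng = np.random.default_rng(4)
# Build a random symmetric positive-definite M.
G = rng.standard_normal((3, 3))
M = G @ G.T + 3 * np.eye(3)

x, y = rng.standard_normal(3), rng.standard_normal(3)

# Mahalanobis inner product computed directly...
direct = x @ M @ y
# ...equals a standard inner product after the linear map L^T,
# where M = L L^T is the Cholesky factorization.
L = np.linalg.cholesky(M)
transformed = (L.T @ x) @ (L.T @ y)

print(direct, transformed)   # agree
```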
Common Confusions
Orthogonality depends on the inner product
Vectors that are orthogonal under the standard dot product may not be orthogonal under a different inner product. In ML, when using a Mahalanobis distance $d_M$, the notion of orthogonality changes with $M$.
Projection minimizes distance, not angle
The orthogonal projection $P_U x$ is the closest point in $U$ to $x$ in the norm sense. It does not necessarily minimize the angle between $x$ and elements of $U$.
Canonical Examples
Least squares as projection
Given $Ax = b$ with no exact solution, the least squares solution $\hat{x}$ minimizes $\|Ax - b\|^2$. This is equivalent to projecting $b$ onto the column space of $A$. The projection satisfies the normal equations $A^\top A \hat{x} = A^\top b$, which express the condition that the residual $b - A\hat{x}$ is orthogonal to the column space.
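A minimal check of the normal equations, with an illustrative design matrix and targets; the least squares solve comes from `np.linalg.lstsq`:

```python
import numpy as np

# Illustrative overdetermined design matrix (intercept + slope) and targets.
A = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0],
              [1.0, 4.0]])
b = np.array([1.0, 3.0, 2.0, 5.0])

x_hat, *_ = np.linalg.lstsq(A, b, rcond=None)
r = b - A @ x_hat

# Normal equations A^T A x_hat = A^T b, equivalently A^T r = 0.
print(np.allclose(A.T @ A @ x_hat, A.T @ b))   # True
print(np.allclose(A.T @ r, 0.0))               # True
```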
Exercises
Problem
Let $x$ and $y$ be nonzero vectors in $\mathbb{R}^3$ with the standard inner product. Compute the projection $P_y x = \frac{\langle x, y \rangle}{\langle y, y \rangle} y$ and verify that $x - P_y x$ is orthogonal to $y$.
Problem
Prove that in any inner product space, the induced norm satisfies the parallelogram law: $\|x + y\|^2 + \|x - y\|^2 = 2\|x\|^2 + 2\|y\|^2$. Show conversely that any norm satisfying the parallelogram law comes from an inner product.
References
Canonical:
- Axler, Linear Algebra Done Right (2024), Chapters 6-7
- Halmos, Finite-Dimensional Vector Spaces (1958), Chapters 1-2
- Strang, Linear Algebra and Its Applications (2006), Section 4.4 (orthogonality and projections)
For ML context:
- Deisenroth, Faisal, Ong, Mathematics for Machine Learning (2020), Chapter 3
- Horn & Johnson, Matrix Analysis (2013), Chapter 5 (norms and inner products)
- Trefethen & Bau, Numerical Linear Algebra (1997), Lectures 7-8 (Gram-Schmidt and QR decomposition)
Last reviewed: April 26, 2026
Canonical graph
Required before and derived from this topic
These links come from prerequisite edges in the curriculum graph. Editorial suggestions are shown here only when the target page also cites this page as a prerequisite.
Required prerequisites
- Vectors, Matrices, and Linear Maps (layer 0A · tier 1)
Derived topics
- Eigenvalues and Eigenvectors (layer 0A · tier 1)
- Gram Matrices and Kernel Matrices (layer 1 · tier 1)
- Functional Analysis Core (layer 0B · tier 2)
- Semantic Search and Embeddings (layer 3 · tier 2)
- Random Matrix Theory Overview (layer 4 · tier 2)