
Mathematical Infrastructure

Spectral Theory of Operators

Spectral theorem for compact self-adjoint operators on Hilbert spaces: every such operator has a countable orthonormal eigenbasis with real eigenvalues accumulating only at zero. This is the infinite-dimensional backbone of PCA, kernel methods, and neural tangent kernel theory.

Advanced · Tier 3 · Stable · Supporting · ~65 min

Why This Matters

In finite dimensions, the spectral theorem for symmetric matrices gives you PCA: diagonalize the covariance matrix and read off the principal components via eigenvalues and eigenvectors. But many ML methods live in infinite-dimensional spaces. Kernel methods operate in a reproducing kernel Hilbert space. Gaussian processes define covariance operators on function spaces. The neural tangent kernel is an infinite-width limit operator. To rigorously analyze any of these, you need the spectral theorem in infinite dimensions.

Five-panel infographic on spectral theory of linear operators: definitions of the resolvent and spectrum (point, continuous, residual), the spectral theorem for self-adjoint operators on Hilbert space, examples (multiplication operators, integral operators, differential operators), the functional calculus, and applications in ML (kernel methods, Laplacian eigenfunctions, Koopman operators).
Spectral theory generalizes eigen-decomposition to infinite-dimensional operators, and underwrites kernel methods, Laplacian-based learning, and Koopman analysis of dynamical systems.

The result is remarkably clean: compact self-adjoint operators on Hilbert spaces behave almost exactly like symmetric matrices, with a countable orthonormal eigenbasis and real eigenvalues.

Mental Model

A compact self-adjoint operator $T$ on a Hilbert space is the infinite-dimensional analog of a real symmetric matrix. Think of $T$ as stretching space along orthogonal directions (eigenfunctions) by amounts (eigenvalues) that decay to zero. The eigenvalues form a sequence converging to zero, and the eigenfunctions form a complete orthonormal basis for the closure of the range of $T$.

The key difference from finite dimensions: the spectrum can have a continuous part in general, but compactness forces it to be discrete (except possibly at zero).

Formal Setup and Notation

Let $\mathcal{H}$ be a separable Hilbert space with inner product $\langle \cdot, \cdot \rangle$. An operator $T: \mathcal{H} \to \mathcal{H}$ is self-adjoint if and only if $\langle Tx, y \rangle = \langle x, Ty \rangle$ for all $x, y \in \mathcal{H}$. It is compact if and only if it maps bounded sets to precompact sets (sets with compact closure).

Definition

Compact Operator

An operator $T: \mathcal{H} \to \mathcal{H}$ is compact if and only if for every bounded sequence $\{x_n\}$, the sequence $\{Tx_n\}$ has a convergent subsequence. Equivalently, $T$ is the norm limit of finite-rank operators. In ML: integral operators defined by continuous kernels on compact domains are compact.

Definition

Rayleigh Quotient

The Rayleigh quotient of a self-adjoint operator $T$ at $x \neq 0$ is:

$$R(x) = \frac{\langle Tx, x \rangle}{\langle x, x \rangle}$$

The supremum of $R(x)$ over all nonzero $x$ equals the largest eigenvalue (or the supremum of the spectrum). This is the infinite-dimensional analog of the variational characterization of eigenvalues.
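In finite dimensions this variational principle can be checked directly. A minimal NumPy sketch, with a random symmetric matrix as an arbitrary stand-in for a self-adjoint operator: random vectors never beat the top eigenvalue, and the top eigenvector attains it.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((40, 40))
A = (A + A.T) / 2  # symmetric, i.e. self-adjoint in finite dimensions

def rayleigh(x):
    # R(x) = <Ax, x> / <x, x>
    return (x @ A @ x) / (x @ x)

lams, vecs = np.linalg.eigh(A)  # eigenvalues in ascending order
lam_max = lams[-1]

samples = rng.standard_normal((5000, 40))
# No vector's Rayleigh quotient exceeds the largest eigenvalue...
assert max(rayleigh(x) for x in samples) <= lam_max + 1e-10
# ...and the corresponding eigenvector attains it.
assert abs(rayleigh(vecs[:, -1]) - lam_max) < 1e-10
```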

Main Theorems

Theorem

Spectral Theorem for Compact Self-Adjoint Operators

Statement

Let $T: \mathcal{H} \to \mathcal{H}$ be a compact self-adjoint operator on a separable Hilbert space. Then there exists a countable (finite or infinite) orthonormal sequence $\{e_n\}$ of eigenvectors with corresponding real eigenvalues $\{\lambda_n\}$ such that:

$$Tx = \sum_{n} \lambda_n \langle x, e_n \rangle e_n \quad \text{for all } x \in \mathcal{H}$$

The eigenvalues satisfy $|\lambda_1| \geq |\lambda_2| \geq \cdots$ and $\lambda_n \to 0$ if the sequence is infinite. The eigenvectors $\{e_n\}$ form an orthonormal basis for $\overline{\text{range}(T)}$.

Intuition

A compact self-adjoint operator is a diagonal "matrix" with countably many entries: in the orthonormal coordinates given by its eigenvectors, it simply scales each direction by an eigenvalue, and those eigenvalues decay to zero. You can truncate the sum after $k$ terms to get a rank-$k$ approximation, just like truncated SVD in finite dimensions.
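The truncation claim can be tested numerically. A sketch with an arbitrary positive semidefinite matrix (so eigenvalues equal singular values): cutting the spectral sum after $k$ terms leaves an operator-norm error of exactly the next eigenvalue.

```python
import numpy as np

rng = np.random.default_rng(1)
B = rng.standard_normal((60, 60))
T = B @ B.T  # positive semidefinite: eigenvalues = singular values

lams, E = np.linalg.eigh(T)
lams, E = lams[::-1], E[:, ::-1]  # decreasing eigenvalue order

k = 5
# Truncated spectral sum: T_k = sum_{i<=k} lam_i e_i e_i^T
T_k = (E[:, :k] * lams[:k]) @ E[:, :k].T

# Operator-norm error of the rank-k truncation is the (k+1)-th eigenvalue.
err = np.linalg.norm(T - T_k, ord=2)
assert abs(err - lams[k]) < 1e-8 * lams[0]
```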

Proof Sketch

Step 1: Show $\|T\| = \sup_{\|x\|=1} |\langle Tx, x \rangle|$, so either $\lambda_1 = \|T\|$ or $\lambda_1 = -\|T\|$ is attained (by compactness, the supremum is achieved at some unit vector). Step 2: The maximizer $e_1$ is an eigenvector with eigenvalue $\lambda_1$. Step 3: $T$ maps $\{e_1\}^\perp$ to itself (by self-adjointness), and the restriction is again compact and self-adjoint. Step 4: Induct to extract $e_2, e_3, \ldots$. Step 5: The resulting eigenvalues converge to zero by compactness.

Why It Matters

This theorem is the foundation of kernel PCA, Mercer's theorem, and the analysis of integral operators. When you compute the eigendecomposition of a kernel matrix, you are approximating this spectral decomposition. The decay rate of the eigenvalues determines the effective dimensionality of the RKHS and the statistical difficulty of kernel learning.

Failure Mode

Without compactness, the conclusion of the spectral theorem in this "countable eigenvalues converging to zero" form fails. The identity operator $I$ on a separable infinite-dimensional Hilbert space has spectrum $\{1\}$ and every nonzero vector is an eigenvector with eigenvalue $1$, so it does admit an orthonormal eigenbasis, just not one whose eigenvalues decay to zero, which is exactly the property compactness would force. More general bounded self-adjoint operators (e.g. multiplication by $x$ on $L^2[0,1]$) have purely continuous spectrum and no eigenvectors at all. Without self-adjointness, eigenvalues can be complex and eigenvectors need not be orthogonal.

Theorem

Min-Max Characterization of Eigenvalues (Courant-Fischer)

Statement

The $k$-th positive eigenvalue of a compact self-adjoint operator $T$ (listed in decreasing order) satisfies:

$$\lambda_k = \max_{\dim(V) = k} \ \min_{x \in V,\, \|x\| = 1} \langle Tx, x \rangle = \min_{\dim(V) = k-1} \ \max_{x \perp V,\, \|x\| = 1} \langle Tx, x \rangle$$

Intuition

The $k$-th eigenvalue is the best worst-case Rayleigh quotient over all $k$-dimensional subspaces. This gives a variational characterization that does not require knowing the previous eigenvectors. It directly implies that eigenvalues are stable under perturbations (Weyl's inequality).

Proof Sketch

For the max-min form: any $k$-dimensional subspace $V$ must intersect $\overline{\text{span}}\{e_k, e_{k+1}, \ldots\}$ nontrivially (by dimension counting). On this intersection, $\langle Tx, x \rangle \leq \lambda_k$ for unit $x$. Choosing $V = \text{span}\{e_1, \ldots, e_k\}$ achieves equality.

Why It Matters

The min-max principle is the theoretical foundation of PCA optimality. It proves that the top-$k$ eigenvectors give the best rank-$k$ subspace for capturing variance. It also gives eigenvalue perturbation bounds (Weyl's inequality) that control how eigenvalues change when the operator is perturbed.

Failure Mode

The min-max principle applies to self-adjoint operators. For non-self-adjoint operators (e.g., non-symmetric kernel matrices), singular values replace eigenvalues and the characterization involves both left and right subspaces.

The spectral theory here underpins the concentration inequalities used in kernel learning theory, where eigenvalue decay controls the effective complexity of the hypothesis class.

Connection to ML

Definition

Mercer's Theorem (Connection)

If $k: \mathcal{X} \times \mathcal{X} \to \mathbb{R}$ is a continuous positive definite kernel on a compact domain, the integral operator $(T_k f)(x) = \int k(x, x') f(x')\, dx'$ is compact and self-adjoint. The spectral theorem gives $k(x, x') = \sum_n \lambda_n e_n(x) e_n(x')$. This is Mercer's theorem. The eigenfunctions $e_n$ (scaled by $\sqrt{\lambda_n}$) give the feature map, and $\lambda_n$ controls the importance of each feature.
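A finite-sample version of the expansion is easy to exhibit. Eigendecomposing the Gram matrix of a Gaussian kernel (bandwidth 0.1, an arbitrary illustrative choice) gives a discrete Mercer sum; truncations reconstruct the kernel values with error bounded by the first discarded eigenvalue.

```python
import numpy as np

rng = np.random.default_rng(3)
x = np.sort(rng.uniform(0, 1, 200))
K = np.exp(-(x[:, None] - x[None, :]) ** 2 / (2 * 0.1 ** 2))  # Gaussian Gram matrix

lams, E = np.linalg.eigh(K)
lams, E = lams[::-1], E[:, ::-1]  # decreasing eigenvalue order

# Full spectral sum recovers K exactly; 20 terms are already accurate
# because the eigenvalues decay fast.
assert np.allclose(K, (E * lams) @ E.T)
K_20 = (E[:, :20] * lams[:20]) @ E[:, :20].T
assert np.linalg.norm(K - K_20, ord=2) <= lams[20] + 1e-8
```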

Operator Classes and Spectral Structure

Hilbert-Schmidt and trace-class operators

A compact operator $T$ is Hilbert-Schmidt if and only if $\sum_n \sigma_n^2 < \infty$, equivalently $\text{tr}(T^*T) < \infty$. It is trace-class if and only if $\sum_n \sigma_n < \infty$, where $\sigma_n$ are the singular values (for self-adjoint $T$, $\sigma_n = |\lambda_n|$). The containments are strict: trace-class $\subset$ Hilbert-Schmidt $\subset$ compact. Kernels with $\iint k(x,x')^2 \, dx \, dx' < \infty$ induce Hilbert-Schmidt integral operators, and continuous Mercer kernels on compact domains are trace-class with $\text{tr}(T_k) = \int k(x,x) \, dx$.

Singular value decomposition for compact operators

For a compact operator $T: \mathcal{H}_1 \to \mathcal{H}_2$ (not necessarily self-adjoint), there exist orthonormal systems $\{u_n\}$ in $\mathcal{H}_1$ and $\{v_n\}$ in $\mathcal{H}_2$ and non-negative singular values $\sigma_n \to 0$ such that $Tx = \sum_n \sigma_n \langle x, u_n \rangle v_n$. The $\sigma_n$ are the eigenvalues of $(T^*T)^{1/2}$. This is the exact parallel of the finite-dimensional SVD and reduces to the spectral theorem when $T$ is self-adjoint (with $\sigma_n = |\lambda_n|$).
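In finite dimensions the parallel is exact. This sketch checks the expansion $Tx = \sum_n \sigma_n \langle x, u_n \rangle v_n$ on a random rectangular matrix, and that the $\sigma_n$ are recovered as eigenvalues of $(TT^*)^{1/2}$ (which shares its nonzero spectrum with $(T^*T)^{1/2}$):

```python
import numpy as np

rng = np.random.default_rng(6)
T = rng.standard_normal((25, 40))  # rectangular: not self-adjoint
U, s, Vt = np.linalg.svd(T, full_matrices=False)

x = rng.standard_normal(40)
# T x = sum_n sigma_n <x, u_n> v_n, with u_n the rows of Vt (input space)
# and v_n the columns of U (output space).
Tx = sum(s[n] * (Vt[n] @ x) * U[:, n] for n in range(len(s)))
assert np.allclose(T @ x, Tx)

# The singular values are the square roots of the eigenvalues of T T^*.
assert np.allclose(np.sort(s), np.sort(np.sqrt(np.linalg.eigvalsh(T @ T.T))))
```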

Weyl's inequality

For self-adjoint compact operators $A, B$ with eigenvalues arranged in decreasing order, $\lambda_{j+k-1}(A+B) \leq \lambda_j(A) + \lambda_k(B)$. This is the foundational spectral perturbation result: small additive perturbations produce small eigenvalue shifts. It follows directly from the Courant-Fischer min-max characterization.
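Weyl's inequality is easy to sanity-check on random symmetric matrices (finite-dimensional stand-ins for compact self-adjoint operators):

```python
import numpy as np

rng = np.random.default_rng(2)

def sym(n):
    M = rng.standard_normal((n, n))
    return (M + M.T) / 2

n = 30
A, B = sym(n), sym(n)
lam = lambda M: np.sort(np.linalg.eigvalsh(M))[::-1]  # decreasing order
la, lb, lab = lam(A), lam(B), lam(A + B)

# Weyl: lambda_{j+k-1}(A + B) <= lambda_j(A) + lambda_k(B)  (1-indexed),
# i.e. lab[j + k] <= la[j] + lb[k] with 0-indexed arrays.
for j in range(n):
    for k in range(n - j):
        assert lab[j + k] <= la[j] + lb[k] + 1e-10
```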

Schatten pp-norms

For a compact operator $T$ with singular values $\{\sigma_n\}$, the Schatten $p$-norm is $\|T\|_{S_p} = (\sum_n \sigma_n^p)^{1/p}$ for $1 \leq p < \infty$, with $\|T\|_{S_\infty} = \sigma_1$ recovering the operator norm. $S_1$ is the trace class; $S_2$ is the Hilbert-Schmidt class (a Hilbert space under $\langle A, B \rangle_{S_2} = \text{tr}(A^*B)$). The Schatten classes are dual: $S_p^* = S_q$ with $1/p + 1/q = 1$ for $1 < p < \infty$, paralleling $\ell^p$ duality.
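All of these norms are computed from singular values. A quick NumPy check that $S_1$ and $S_2$ coincide with the nuclear and Frobenius norms, and that the Schatten norms decrease toward the operator norm:

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((30, 20))
s = np.linalg.svd(A, compute_uv=False)  # singular values, decreasing

def schatten(p):
    return (s ** p).sum() ** (1.0 / p)

# S_1 is the nuclear norm, S_2 the Frobenius (Hilbert-Schmidt) norm,
# and S_infinity the operator norm (largest singular value).
assert np.isclose(schatten(1), np.linalg.norm(A, ord='nuc'))
assert np.isclose(schatten(2), np.linalg.norm(A, ord='fro'))
assert np.isclose(s[0], np.linalg.norm(A, ord=2))
# ||A||_{S_p} is non-increasing in p.
assert schatten(1) >= schatten(2) >= s[0]
```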

Canonical Examples

Example

Spectral decomposition of a Gaussian kernel operator

Consider the Gaussian kernel $k(x, x') = \exp(-\|x - x'\|^2 / (2\sigma^2))$ on $[0,1]$. The integral operator has eigenvalues that decay exponentially fast: $\lambda_n \sim C \exp(-cn)$. This rapid decay means the RKHS is effectively finite-dimensional, explaining why Gaussian kernel methods work well even with moderate sample sizes. The eigenfunctions resemble Hermite functions. This rate is exact on $\mathbb{R}^d$ with Gaussian measure via Mehler's formula; on bounded domains with Lebesgue measure the decay is super-algebraic with a bandwidth-dependent rate (Santin-Schaback 2016, arXiv:1504.02868).
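The decay is visible numerically. Scaled eigenvalues of a Gaussian Gram matrix (bandwidth 0.2, an illustrative choice) collapse within a few dozen modes:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 500
x = rng.uniform(0, 1, n)
K = np.exp(-(x[:, None] - x[None, :]) ** 2 / (2 * 0.2 ** 2))

lams = np.sort(np.linalg.eigvalsh(K))[::-1] / n  # scaled Gram eigenvalues

# The spectrum collapses within a few dozen modes: the 31st eigenvalue is
# already at least six orders of magnitude below the first.
assert lams[30] < 1e-6 * lams[0]
```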

Example

NTK spectral analysis

The neural tangent kernel $\Theta(x, x')$ at infinite width defines a compact self-adjoint operator on $L^2$ of the data distribution. The eigenvalue decay rate of this NTK operator determines the learning speed of gradient descent: modes with large eigenvalues are learned first, and modes with small eigenvalues are learned slowly or not at all. This is the spectral bias of neural networks.
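A toy computation makes the bias concrete. Under gradient flow on kernel least squares, the residual in eigen-mode $j$ decays like $\exp(-\lambda_j t)$, so with any decaying spectrum (here a hypothetical geometric one, chosen only for illustration) the low-eigenvalue modes visibly lag:

```python
import numpy as np

# Hypothetical geometric NTK spectrum, for illustration only.
lams = 2.0 ** -np.arange(8)     # lambda_j = 2^{-j}
t = 10.0
residual = np.exp(-lams * t)    # per-mode residual under gradient flow at time t

# Residuals grow with mode index: large-eigenvalue modes are fit first.
assert np.all(np.diff(residual) > 0)
assert residual[0] < 1e-4 < residual[-1]
```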

Common Confusions

Watch Out

Bounded does not mean compact

Every compact operator is bounded, but not conversely. The identity operator on an infinite-dimensional Hilbert space is bounded but not compact. The compactness condition is strictly stronger and is what forces the eigenvalues to form a discrete sequence converging to zero.

Watch Out

Zero can be in the spectrum without being an eigenvalue

For compact operators, zero may or may not be an eigenvalue, but it is always in the spectrum when the Hilbert space is infinite-dimensional. The null space of $T$ (eigenvectors for eigenvalue $0$) can be trivial, finite-dimensional, or infinite-dimensional.

Watch Out

The empirical kernel matrix is not the operator

Training-time analysis often works with the Gram matrix $K_{ij} = k(x_i, x_j)$, whose eigenvalues scale with $n$ and whose eigenvectors are vectors in $\mathbb{R}^n$. The underlying object is the integral operator $T_k$ on $L^2$, whose eigenvalues $\lambda_n$ are fixed and whose eigenfunctions are functions on $\mathcal{X}$. The two are linked by concentration: $\tfrac{1}{n}\lambda_j(K) \to \lambda_j(T_k)$ under mild conditions, with rate controlled by kernel smoothness (Koltchinskii-Giné 2000). When people write "kernel eigenvalue decay", they usually mean the operator eigenvalues; the matrix eigenvalues are only a finite-sample proxy.
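The scaling can be verified against a kernel whose operator spectrum is known in closed form. For the Brownian covariance kernel $k(x, y) = \min(x, y)$ on $[0,1]$, the operator eigenvalues are $\lambda_j = ((j - 1/2)\pi)^{-2}$, and the scaled Gram eigenvalues on a fine grid match them:

```python
import numpy as np

n = 2000
x = (np.arange(n) + 0.5) / n  # uniform midpoint grid on [0, 1]
K = np.minimum(x[:, None], x[None, :])  # Brownian covariance Gram matrix

emp = np.sort(np.linalg.eigvalsh(K))[::-1] / n  # scaled Gram eigenvalues
exact = 1.0 / (((np.arange(1, 6) - 0.5) * np.pi) ** 2)

# Top empirical eigenvalues agree with the operator eigenvalues
# up to discretization error of order 1/n.
assert np.allclose(emp[:5], exact, rtol=1e-2)
```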

Summary

  • Compact self-adjoint operators have countable eigenvalues converging to zero
  • Eigenvectors form an orthonormal basis for the closure of the range
  • Rayleigh quotient variational principle gives eigenvalues without knowing eigenvectors
  • Courant-Fischer min-max theorem implies PCA optimality and perturbation bounds
  • Mercer's theorem connects kernel functions to spectral decompositions
  • Eigenvalue decay rate controls effective dimensionality and learning difficulty

Exercises

ExerciseCore

Problem

Let $T$ be a compact self-adjoint operator with eigenvalues $\lambda_1 \geq \lambda_2 \geq \cdots \geq 0$. Show that the best rank-$k$ approximation to $T$ in operator norm is $T_k = \sum_{i=1}^k \lambda_i \langle \cdot, e_i \rangle e_i$ and the approximation error is $\|T - T_k\| = \lambda_{k+1}$.

ExerciseAdvanced

Problem

Suppose a kernel $k$ on $[0,1]$ has eigenvalues $\lambda_n = n^{-2s}$ for $s > 1/2$. What is the trace of the integral operator $T_k$? What does this represent in terms of the kernel?

References

Canonical:

  • Reed & Simon, Methods of Modern Mathematical Physics I: Functional Analysis (1980), Chapter VI
  • Lax, Functional Analysis (2002), Chapters 28-31

Current:

  • Steinwart & Christmann, Support Vector Machines (2008), Chapter 4 (Mercer's theorem for ML)
  • Wainwright, High-Dimensional Statistics (2019), Chapter 12 (kernel operators)
  • Santin & Schaback, "Approximation of eigenfunctions in kernel-based spaces" (2016), arXiv:1504.02868 (eigenvalue decay rates for Gaussian kernels on bounded domains)

Further directions

  • Polar decomposition for non-self-adjoint operators
  • Resolvent and functional calculus
  • Stone's theorem for unbounded self-adjoint operators
  • Spectral theorem for normal operators (broader class)
  • Connection to empirical PCA and SVD of data matrix
  • Kotani-Simon theorem on eigenvalue decay for Hilbert-Schmidt kernels


Last reviewed: April 18, 2026
