Mathematical Infrastructure
Spectral Theory of Operators
Spectral theorem for compact self-adjoint operators on Hilbert spaces: every such operator has a countable orthonormal eigenbasis with real eigenvalues accumulating only at zero. This is the infinite-dimensional backbone of PCA, kernel methods, and neural tangent kernel theory.
Why This Matters
In finite dimensions, the spectral theorem for symmetric matrices gives you PCA: diagonalize the covariance matrix and read off the principal components via eigenvalues and eigenvectors. But many ML methods live in infinite-dimensional spaces. Kernel methods operate in a reproducing kernel Hilbert space. Gaussian processes define covariance operators on function spaces. The neural tangent kernel is an infinite-width limit operator. To rigorously analyze any of these, you need the spectral theorem in infinite dimensions.
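To make the finite-dimensional case concrete, here is a minimal NumPy sketch of PCA as exactly this diagonalization; the synthetic data matrix `X` (500 samples, 5 features) is an illustrative assumption.

```python
# A minimal sketch of PCA as eigendecomposition of the covariance matrix;
# the synthetic data matrix X (500 samples, 5 features) is an assumption.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5)) @ np.diag([3.0, 2.0, 1.0, 0.5, 0.1])

C = np.cov(X, rowvar=False)                  # symmetric covariance matrix
eigvals, eigvecs = np.linalg.eigh(C)         # real eigenvalues, orthonormal basis
order = np.argsort(eigvals)[::-1]            # sort by decreasing variance
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

scores = X @ eigvecs[:, :2]                  # project onto top-2 components
print(np.round(eigvals, 3))                  # variances along principal axes
```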

The result is remarkably clean: compact self-adjoint operators on Hilbert spaces behave almost exactly like symmetric matrices, with a countable orthonormal eigenbasis and real eigenvalues.
Mental Model
A compact self-adjoint operator on a Hilbert space is the infinite-dimensional analog of a real symmetric matrix. Think of $T$ as stretching space along orthogonal directions (eigenfunctions) by amounts (eigenvalues) that decay to zero. The eigenvalues form a sequence converging to zero, and the eigenfunctions form a complete orthonormal basis for the closure of the range of $T$.
The key difference from finite dimensions: the spectrum can have a continuous part in general, but compactness forces it to be discrete (except possibly at zero).
Formal Setup and Notation
Let $H$ be a separable Hilbert space with inner product $\langle \cdot, \cdot \rangle$. An operator $T: H \to H$ is self-adjoint if and only if $\langle Tx, y \rangle = \langle x, Ty \rangle$ for all $x, y \in H$. It is compact if and only if it maps bounded sets to precompact sets (sets with compact closure).
Compact Operator
An operator $T$ is compact if and only if for every bounded sequence $(x_n)$, the sequence $(Tx_n)$ has a convergent subsequence. Equivalently, $T$ is the norm limit of finite-rank operators. In ML: integral operators defined by continuous kernels on compact domains are compact.
Rayleigh Quotient
The Rayleigh quotient of a self-adjoint operator $T$ at $x \neq 0$ is:
$$R(x) = \frac{\langle Tx, x \rangle}{\langle x, x \rangle}.$$
The supremum of $R(x)$ over all nonzero $x$ equals the largest eigenvalue (or, in general, the supremum of the spectrum). This is the infinite-dimensional analog of the variational characterization of eigenvalues.
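A quick numeric illustration of this variational characterization, using a random symmetric matrix as a finite-dimensional stand-in for a self-adjoint operator (the matrix size and sample count are arbitrary choices):

```python
# Numeric check, assuming a random symmetric matrix as a stand-in for a
# self-adjoint operator: random Rayleigh quotients never exceed the top
# eigenvalue, and the top eigenvector attains it.
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(50, 50))
T = (A + A.T) / 2                            # symmetric (self-adjoint)

def rayleigh(T, x):
    return (x @ T @ x) / (x @ x)

eigvals, eigvecs = np.linalg.eigh(T)
top_val, top_vec = eigvals[-1], eigvecs[:, -1]

samples = [rayleigh(T, rng.normal(size=50)) for _ in range(10_000)]
print(max(samples) <= top_val)               # True: bounded by lambda_max
print(np.isclose(rayleigh(T, top_vec), top_val))  # True: attained
```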
Main Theorems
Spectral Theorem for Compact Self-Adjoint Operators
Statement
Let $T$ be a compact self-adjoint operator on a separable Hilbert space $H$. Then there exists a countable (finite or infinite) orthonormal sequence of eigenvectors $(e_n)$ with corresponding real eigenvalues $(\lambda_n)$ such that:
$$Tx = \sum_n \lambda_n \langle x, e_n \rangle e_n \qquad \text{for all } x \in H.$$
The eigenvalues satisfy $|\lambda_1| \ge |\lambda_2| \ge \cdots$ and $\lambda_n \to 0$ if the sequence is infinite. The eigenvectors form an orthonormal basis for $\overline{\operatorname{ran}(T)}$, the closure of the range of $T$.
Intuition
A compact self-adjoint operator is a "matrix" with countably many rows and columns that is diagonal in an orthonormal basis, with diagonal entries (eigenvalues) that decay to zero. You can truncate the sum after $k$ terms to get a rank-$k$ approximation, just like truncated SVD in finite dimensions.
Proof Sketch
Step 1: Show $\|T\| = \sup_{\|x\| = 1} |\langle Tx, x \rangle|$, so either $\|T\|$ or $-\|T\|$ is achieved as a value of the Rayleigh quotient (by compactness of the unit ball image). Step 2: The maximizer $e_1$ is an eigenvector with eigenvalue $\lambda_1 = \pm\|T\|$. Step 3: $T$ maps $\{e_1\}^\perp$ to itself (by self-adjointness), and the restriction $T|_{\{e_1\}^\perp}$ is again compact and self-adjoint. Step 4: Induct to extract $e_2, e_3, \ldots$. Step 5: The remaining eigenvalues converge to zero by compactness.
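The deflation argument in Steps 3 and 4 can be mirrored numerically. The sketch below, a finite-dimensional stand-in with an assumed random symmetric matrix, extracts the dominant eigenpair by power iteration, subtracts the rank-one piece, and repeats; recovering the full spectrum this way is exactly the proof's induction.

```python
# A finite-dimensional mirror of the proof's deflation argument, assuming
# a random symmetric matrix T: find the dominant eigenpair, subtract the
# rank-one piece (restricting T to the orthogonal complement), repeat.
import numpy as np

rng = np.random.default_rng(2)
A = rng.normal(size=(6, 6))
T = (A + A.T) / 2

def top_eigenpair(M, iters=5_000):
    """Power iteration: converges to the eigenvalue largest in magnitude."""
    x = np.random.default_rng(0).normal(size=M.shape[0])
    for _ in range(iters):
        x = M @ x
        x /= np.linalg.norm(x)
    return x @ M @ x, x                      # Rayleigh quotient, eigenvector

vals, M = [], T.copy()
for _ in range(T.shape[0]):
    lam, v = top_eigenpair(M)
    vals.append(lam)
    M = M - lam * np.outer(v, v)             # deflate: zero out the found mode

print(np.round(sorted(vals), 6))             # should match the true spectrum
print(np.round(np.linalg.eigvalsh(T), 6))
```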
Why It Matters
This theorem is the foundation of kernel PCA, Mercer's theorem, and the analysis of integral operators. When you compute the eigendecomposition of a kernel matrix, you are approximating this spectral decomposition. The decay rate of the eigenvalues determines the effective dimensionality of the RKHS and the statistical difficulty of kernel learning.
Failure Mode
Without compactness, the conclusion of the spectral theorem in this "countable eigenvalues converging to zero" form fails. The identity operator on a separable infinite-dimensional Hilbert space has spectrum $\{1\}$ and every nonzero vector is an eigenvector with eigenvalue $1$, so it does admit an orthonormal eigenbasis, just not one whose eigenvalues decay to zero, which is exactly the property compactness would force. More general bounded self-adjoint operators (e.g. multiplication by $x$ on $L^2([0,1])$) have purely continuous spectrum and no eigenvectors at all. Without self-adjointness, eigenvalues can be complex and eigenvectors need not be orthogonal.
Min-Max Characterization of Eigenvalues (Courant-Fischer)
Statement
The $k$-th positive eigenvalue $\lambda_k$ of a compact self-adjoint operator (in decreasing order) satisfies:
$$\lambda_k = \max_{\substack{V \subseteq H \\ \dim V = k}} \; \min_{\substack{x \in V \\ x \neq 0}} \frac{\langle Tx, x \rangle}{\langle x, x \rangle} = \min_{\substack{V \subseteq H \\ \dim V = k-1}} \; \max_{\substack{x \perp V \\ x \neq 0}} \frac{\langle Tx, x \rangle}{\langle x, x \rangle}.$$
Intuition
The $k$-th eigenvalue is the best worst-case Rayleigh quotient over all $k$-dimensional subspaces. This gives a variational characterization that does not require knowing the previous eigenvectors. It directly implies that eigenvalues are stable under perturbations (Weyl's inequality).
Proof Sketch
For the max-min form: any $k$-dimensional subspace $V$ must intersect $\operatorname{span}\{e_1, \ldots, e_{k-1}\}^\perp$ nontrivially (by dimension counting). On this intersection, $R(x) \le \lambda_k$. Choosing $V = \operatorname{span}\{e_1, \ldots, e_k\}$ achieves equality.
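A numeric check of the max-min form on a random symmetric matrix (the dimensions and the choice $k = 4$ are illustrative): no random $k$-dimensional subspace beats $\lambda_k$, and the span of the top $k$ eigenvectors attains it.

```python
# Numeric illustration of Courant-Fischer (max-min form), assuming a
# random symmetric matrix and k = 4: the worst-case Rayleigh quotient
# over any k-dimensional subspace is at most lambda_k, and the top-k
# eigenspace attains it.
import numpy as np

rng = np.random.default_rng(3)
A = rng.normal(size=(20, 20))
T = (A + A.T) / 2
eigvals, eigvecs = np.linalg.eigh(T)
eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]   # decreasing order
k = 4

def min_rayleigh(T, V):
    """min over unit x in span(V): smallest eigenvalue of the compression."""
    Q, _ = np.linalg.qr(V)                   # orthonormal basis of the subspace
    return np.linalg.eigvalsh(Q.T @ T @ Q).min()

best = max(min_rayleigh(T, rng.normal(size=(20, k))) for _ in range(2_000))
print(best <= eigvals[k - 1] + 1e-12)        # True for every subspace
print(np.isclose(min_rayleigh(T, eigvecs[:, :k]), eigvals[k - 1]))  # attained
```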
Why It Matters
The min-max principle is the theoretical foundation of PCA optimality. It proves that the top $k$ eigenvectors span the best $k$-dimensional subspace for capturing variance. It also gives eigenvalue perturbation bounds (Weyl's inequality) that control how eigenvalues change when the operator is perturbed.
Failure Mode
The min-max principle applies to self-adjoint operators. For non-self-adjoint operators (e.g., non-symmetric kernel matrices), singular values replace eigenvalues and the characterization involves both left and right subspaces.
The spectral theory here underpins the concentration inequalities used in kernel learning theory, where eigenvalue decay controls the effective complexity of the hypothesis class.
Connection to ML
Mercer's Theorem (Connection)
If $k(x, y)$ is a continuous positive definite kernel on a compact domain $\mathcal{X}$, the integral operator $(T_k f)(x) = \int_{\mathcal{X}} k(x, y) f(y) \, d\mu(y)$ is compact and self-adjoint. The spectral theorem gives $k(x, y) = \sum_i \lambda_i \phi_i(x) \phi_i(y)$, with absolute and uniform convergence. This is Mercer's theorem. The eigenfunctions define the feature map $x \mapsto (\sqrt{\lambda_i}\, \phi_i(x))_i$, and $\lambda_i$ controls the importance of each feature.
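A Nyström-style sketch of the Mercer expansion, assuming a Gaussian kernel on a uniform grid over $[0, 1]$ (the kernel, bandwidth, and grid size are illustrative): eigendecomposing the scaled Gram matrix gives approximate eigenvalues and $L^2$-normalized eigenfunctions, and the expansion $\sum_i \lambda_i \phi_i(x) \phi_i(y)$ rebuilds the kernel.

```python
# A Nystrom-style check of Mercer's theorem, assuming a Gaussian kernel
# on a uniform grid over [0, 1]: eigenpairs of K/n approximate the
# operator's (lambda_i, phi_i), and the Mercer sum rebuilds the kernel.
import numpy as np

n = 400
x = np.linspace(0.0, 1.0, n)                 # grid points, weight 1/n each
K = np.exp(-(x[:, None] - x[None, :]) ** 2 / 0.1)

evals, evecs = np.linalg.eigh(K / n)         # approximate operator spectrum
evals, evecs = evals[::-1], evecs[:, ::-1]
phis = evecs * np.sqrt(n)                    # L^2([0,1])-normalized eigenfunctions

K_rebuilt = (phis * evals) @ phis.T          # sum_i lambda_i phi_i(x) phi_i(y)
print(np.abs(K - K_rebuilt).max())           # ~1e-13: expansion recovers k(x, y)
```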
Operator Classes and Spectral Structure
Hilbert-Schmidt and trace-class operators
A compact operator $T$ is Hilbert-Schmidt if and only if $\sum_n \lambda_n^2 < \infty$, equivalently $\sum_n \|T e_n\|^2 < \infty$ for some (hence every) orthonormal basis $(e_n)$. It is trace-class if and only if $\sum_n |\lambda_n| < \infty$ (using singular values in the non-self-adjoint case). The containments are strict: trace-class $\subsetneq$ Hilbert-Schmidt $\subsetneq$ compact. Mercer kernels with $k \in L^2(\mathcal{X} \times \mathcal{X})$ induce Hilbert-Schmidt integral operators, and continuous Mercer kernels on compact domains are trace-class with $\operatorname{tr}(T_k) = \int_{\mathcal{X}} k(x, x) \, d\mu(x)$.
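Both characterizations can be checked by quadrature. The sketch below, assuming the Gaussian kernel on $[0, 1]$ with a uniform grid, compares the spectral-side sums against the kernel-side integrals for the Hilbert-Schmidt norm and the trace.

```python
# Quadrature check, assuming the Gaussian kernel on [0, 1] with Lebesgue
# measure: Hilbert-Schmidt norm squared = double integral of k(x, y)^2,
# trace = integral of k(x, x) along the diagonal.
import numpy as np

n = 500
x = np.linspace(0.0, 1.0, n)
K = np.exp(-(x[:, None] - x[None, :]) ** 2 / 0.1)

evals = np.linalg.eigvalsh(K / n)            # approximate operator eigenvalues
print(np.sum(evals ** 2), np.mean(K ** 2))   # sum lambda_n^2 vs iint k^2
print(np.sum(evals), np.mean(np.diag(K)))    # sum lambda_n vs int k(x, x)
```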
Singular value decomposition for compact operators
For a compact operator $T: H_1 \to H_2$ (not necessarily self-adjoint), there exist orthonormal systems $(v_n)$ in $H_1$ and $(u_n)$ in $H_2$ and non-negative singular values $s_1 \ge s_2 \ge \cdots \to 0$ such that $Tx = \sum_n s_n \langle x, v_n \rangle u_n$. The $s_n^2$ are the eigenvalues of $T^* T$. This is the exact parallel of the finite-dimensional SVD and reduces to the spectral theorem when $T$ is self-adjoint (with $s_n = |\lambda_n|$).
Weyl's inequality
For self-adjoint compact operators $A$ and $B$ with eigenvalues arranged in decreasing order, $|\lambda_k(A) - \lambda_k(B)| \le \|A - B\|$ for all $k$. This is the foundational spectral perturbation result: small additive perturbations produce small eigenvalue shifts. It follows directly from the Courant-Fischer min-max characterization.
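A quick numeric check of the inequality, with an assumed random symmetric matrix and a small symmetric perturbation:

```python
# Numeric check of Weyl's inequality with an assumed random symmetric A
# and a small symmetric perturbation E.
import numpy as np

rng = np.random.default_rng(4)
A = rng.normal(size=(30, 30)); A = (A + A.T) / 2
E = 0.01 * rng.normal(size=(30, 30)); E = (E + E.T) / 2

shift = np.abs(np.linalg.eigvalsh(A + E) - np.linalg.eigvalsh(A))
print(shift.max() <= np.linalg.norm(E, 2))   # True: shifts bounded by ||E||
```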
Schatten -norms
For a compact operator with singular values $(s_n)$, the Schatten $p$-norm is $\|T\|_{S_p} = \big(\sum_n s_n^p\big)^{1/p}$ for $1 \le p < \infty$, with $p = \infty$ recovering the operator norm. $S_1$ is the trace class, $S_2$ is the Hilbert-Schmidt class (a Hilbert space under $\langle S, T \rangle = \operatorname{tr}(T^* S)$). The Schatten classes are dual: $(S_p)^* \cong S_q$ with $1/p + 1/q = 1$, paralleling the duality of $\ell^p$ and $\ell^q$.
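A small sketch computing Schatten norms from singular values (the test matrix is an arbitrary assumption), checking the $p = 2$ and $p = \infty$ cases against NumPy's Frobenius and operator norms:

```python
# Schatten p-norms from singular values, with an arbitrary random matrix
# standing in for a compact operator.
import numpy as np

rng = np.random.default_rng(5)
T = rng.normal(size=(8, 5))
s = np.linalg.svd(T, compute_uv=False)       # singular values, decreasing

def schatten(s, p):
    return s.max() if np.isinf(p) else float((s ** p).sum() ** (1.0 / p))

print(schatten(s, 1))                                 # trace (nuclear) norm
print(schatten(s, 2), np.linalg.norm(T, "fro"))       # Hilbert-Schmidt = Frobenius
print(schatten(s, np.inf), np.linalg.norm(T, 2))      # operator norm
```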
Canonical Examples
Spectral decomposition of a Gaussian kernel operator
Consider the Gaussian kernel $k(x, y) = \exp(-(x - y)^2 / (2\sigma^2))$ on $\mathbb{R}$. The integral operator has eigenvalues that decay exponentially fast: $\lambda_n \propto B^n$ for some $B \in (0, 1)$. This rapid decay means the RKHS is effectively finite-dimensional, explaining why Gaussian kernel methods work well even with moderate sample sizes. The eigenfunctions resemble Hermite functions. This geometric rate is exact on $\mathbb{R}$ with Gaussian measure via Mehler's formula. On bounded domains with Lebesgue measure the decay is super-algebraic with a bandwidth-dependent rate (Santin & Schaback 2016, arXiv:1504.02868).
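An empirical check of the geometric rate, assuming standard Gaussian inputs (the Mehler-formula setting) and a unit-bandwidth Gaussian kernel: ratios of successive eigenvalues of the scaled Gram matrix should level off at a constant $B < 1$.

```python
# Eigenvalue decay for the Gaussian kernel with N(0, 1) inputs, assuming
# unit bandwidth (the setting where Mehler's formula applies): ratios of
# successive eigenvalues of K/n should approach a constant B < 1.
import numpy as np

rng = np.random.default_rng(6)
n = 2_000
x = rng.normal(size=n)
K = np.exp(-(x[:, None] - x[None, :]) ** 2 / 2.0)

evals = np.linalg.eigvalsh(K / n)[::-1][:12]  # top 12 operator eigenvalues
print(evals[1:] / evals[:-1])                 # roughly constant ratio B
```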
NTK spectral analysis
The neural tangent kernel at infinite width defines a compact self-adjoint operator on $L^2(\mu)$, where $\mu$ is the input distribution. The eigenvalue decay rate of this NTK operator determines the learning speed of gradient descent: modes with large eigenvalues are learned first, and modes with small eigenvalues are learned slowly or not at all. This is the spectral bias of neural networks.
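The mode-by-mode learning speed can be simulated directly with kernel gradient descent, a simplified stand-in for full NTK training; the kernel, target, and step size below are illustrative assumptions. The residual along the $i$-th eigenvector contracts by the factor $(1 - \eta \lambda_i)$ per step, so large-eigenvalue modes are fit first.

```python
# Kernel gradient descent as a spectral-bias demo, assuming a Gaussian
# kernel, target sin(6x), and a small step size. Gradient descent on
# 0.5 * ||f - y||^2 gives residual r_{t+1} = (I - eta * K) r_t, so the
# component along the i-th eigenvector shrinks by (1 - eta * lambda_i)
# per step: large-eigenvalue modes are learned first.
import numpy as np

rng = np.random.default_rng(7)
n = 200
x = np.sort(rng.uniform(-1, 1, size=n))
K = np.exp(-(x[:, None] - x[None, :]) ** 2 / 0.02)
y = np.sin(6 * x)

evals, evecs = np.linalg.eigh(K)
evals, evecs = evals[::-1], evecs[:, ::-1]   # decreasing order

eta, steps = 1e-3, 500
f = np.zeros(n)                              # predictions at training points
for _ in range(steps):
    f = f - eta * K @ (f - y)                # gradient step in function space

residual = evecs.T @ (f - y)                 # residual in the eigenbasis
predicted = (evecs.T @ (-y)) * (1 - eta * evals) ** steps
print(np.abs(residual[:5]))                  # top modes: essentially fit
print(np.abs(residual - predicted).max())    # matches the per-mode decay law
```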
Common Confusions
Bounded does not mean compact
Every compact operator is bounded, but not conversely. The identity operator on an infinite-dimensional Hilbert space is bounded but not compact. The compactness condition is strictly stronger and is what forces the eigenvalues to form a discrete sequence converging to zero.
Zero can be in the spectrum without being an eigenvalue
For compact operators, zero may or may not be an eigenvalue, but it is always in the spectrum when the Hilbert space is infinite-dimensional. The null space of $T$ (the eigenspace for eigenvalue $0$) can be trivial, finite-dimensional, or infinite-dimensional.
The empirical kernel matrix is not the operator
Training-time analysis often works with the Gram matrix $K_{ij} = k(x_i, x_j)$, whose eigenvalues scale with $n$ and whose eigenvectors are vectors in $\mathbb{R}^n$. The underlying object is the integral operator on $L^2(\mu)$, whose eigenvalues are fixed and whose eigenfunctions are functions on $\mathcal{X}$. The two are linked by concentration: under mild conditions, $\lambda_i(K)/n \to \lambda_i(T_k)$ with rate controlled by kernel smoothness (Koltchinskii-Giné 2000). When people write "kernel eigenvalue decay", they usually mean the operator eigenvalues; the matrix eigenvalues are only a finite-sample proxy.
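A sketch of this concentration, assuming a Gaussian kernel with uniform inputs on $[0, 1]$: the rescaled Gram eigenvalues $\lambda_i(K)/n$ stabilize as $n$ grows.

```python
# Gram eigenvalues as finite-sample proxies, assuming a Gaussian kernel
# and uniform inputs on [0, 1]: the top values of eigvals(K)/n stabilize
# as n grows, approximating the fixed integral-operator eigenvalues.
import numpy as np

rng = np.random.default_rng(8)
for n in (100, 400, 1_600):
    x = rng.uniform(0, 1, size=n)
    K = np.exp(-(x[:, None] - x[None, :]) ** 2 / 0.1)
    top = np.linalg.eigvalsh(K / n)[::-1][:4]
    print(n, np.round(top, 4))               # rows converge as n increases
```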
Summary
- Compact self-adjoint operators have countable eigenvalues converging to zero
- Eigenvectors form an orthonormal basis for the closure of the range
- Rayleigh quotient variational principle gives eigenvalues without knowing eigenvectors
- Courant-Fischer min-max theorem implies PCA optimality and perturbation bounds
- Mercer's theorem connects kernel functions to spectral decompositions
- Eigenvalue decay rate controls effective dimensionality and learning difficulty
Exercises
Problem
Let $T$ be a compact self-adjoint operator with eigenvalues $|\lambda_1| \ge |\lambda_2| \ge \cdots$. Show that the best rank-$k$ approximation to $T$ in operator norm is $T_k = \sum_{n=1}^{k} \lambda_n \langle \cdot, e_n \rangle e_n$ and the approximation error is $\|T - T_k\| = |\lambda_{k+1}|$.
Problem
Suppose a kernel $k$ on a compact domain has eigenvalues $\lambda_1, \lambda_2, \ldots$ with $\sum_n \lambda_n < \infty$. What is the trace of the integral operator $T_k$? What does this represent in terms of the kernel?
References
Canonical:
- Reed & Simon, Methods of Modern Mathematical Physics I: Functional Analysis (1980), Chapter VI
- Lax, Functional Analysis (2002), Chapters 28-31
Current:
- Steinwart & Christmann, Support Vector Machines (2008), Chapter 4 (Mercer's theorem for ML)
- Wainwright, High-Dimensional Statistics (2019), Chapter 12 (kernel operators)
- Santin & Schaback, "Approximation of eigenfunctions in kernel-based spaces" (2016), arXiv:1504.02868 (eigenvalue decay rates for Gaussian kernels on bounded domains)
Further directions
- Polar decomposition for non-self-adjoint operators
- Resolvent and functional calculus
- Stone's theorem for unbounded self-adjoint operators
- Spectral theorem for normal operators (broader class)
- Connection to empirical PCA and SVD of data matrix
- Kotani-Simon theorem on eigenvalue decay for Hilbert-Schmidt kernels
Next Topics
Natural extensions from spectral theory:
- Kernels and RKHS: reproducing kernel Hilbert spaces built on Mercer's theorem
- Neural tangent kernel: spectral analysis of infinite-width neural networks
- Principal component analysis: finite-sample spectral methods
- Functional analysis core: Hahn-Banach, uniform boundedness, open mapping, Banach-Alaoglu (the big four supporting all of this)
Last reviewed: April 18, 2026
Canonical graph
Required before and derived from this topic
These links come from prerequisite edges in the curriculum graph. Editorial suggestions are shown here only when the target page also cites this page as a prerequisite.
Required prerequisites
- Eigenvalues and Eigenvectors (layer 0A · tier 1)
- Complex Numbers for Fourier (layer 0A · tier 2)
- Functional Analysis Core (layer 0B · tier 2)
Derived topics
- DeepONet (layer 3 · tier 2)
- Fourier Neural Operator (layer 3 · tier 2)