Foundations
Matrix Operations and Properties
Essential matrix operations for ML: trace, determinant, inverse, pseudoinverse, Schur complement, and the Sherman-Morrison-Woodbury formula. When and why each matters.
Why This Matters
Matrices are the language of ML. Weight matrices, covariance matrices, kernel matrices, Hessians. They are everywhere. You need to know what operations you can perform on them, what those operations mean, and when they are numerically safe.
Three operations recur throughout: the trace, the determinant, and the inverse.
This page covers the operations that appear repeatedly in ML theory and practice.
If you want to build the geometric intuition first, the Matrix Mechanics Lab is the hands-on version of this page. It shows what matrix multiplication does to shapes, vectors, and coordinate systems before the abstract identities take over.
Quick Version
- Trace. Adds the diagonal entries and, for square matrices, equals the sum of eigenvalues.
- Determinant. Measures signed area or volume scaling. Zero means the map crushes some direction completely. Numerically, use singular values or a condition number rather than determinant magnitude to judge near singularity.
- Inverse. Exists only when the map is nonsingular, but the real numerical question is whether inversion is stable enough to trust.
- Pseudoinverse. Gives the minimum-norm least-squares solution when the exact inverse does not exist.
- Schur complement / SMW. These are the “block algebra” and “cheap update” tools that keep Gaussian, kernel, and optimization computations tractable.
Mental Model
A matrix represents a linear map between vector spaces. Trace, determinant, inverse, and pseudoinverse each extract a different invariant of that map: trace sums the eigenvalues, determinant is the signed volume-scaling factor, inverse exists when no eigenvalue is zero, and pseudoinverse handles the rank-deficient case via the SVD.
Trace
The trace of a square matrix $A \in \mathbb{R}^{n \times n}$ is the sum of its diagonal elements:

$$\operatorname{tr}(A) = \sum_{i=1}^{n} A_{ii}$$

Equivalently, the trace equals the sum of eigenvalues: $\operatorname{tr}(A) = \sum_{i=1}^{n} \lambda_i$.
Key properties of the trace:
- Linearity: $\operatorname{tr}(A + B) = \operatorname{tr}(A) + \operatorname{tr}(B)$ and $\operatorname{tr}(cA) = c\operatorname{tr}(A)$
- Cyclic property: $\operatorname{tr}(ABC) = \operatorname{tr}(BCA) = \operatorname{tr}(CAB)$
- Transpose invariance: $\operatorname{tr}(A^\top) = \operatorname{tr}(A)$
The cyclic property is extremely useful. It lets you rearrange matrix products inside a trace, which simplifies many derivations in ML (e.g., computing gradients of matrix expressions).
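Both identities are easy to check numerically. A minimal NumPy sketch (the matrices and the random seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 4))
B = rng.standard_normal((4, 5))
C = rng.standard_normal((5, 3))

# Cyclic property: tr(ABC) = tr(BCA) = tr(CAB)
t1 = np.trace(A @ B @ C)
t2 = np.trace(B @ C @ A)
t3 = np.trace(C @ A @ B)
print(np.isclose(t1, t2) and np.isclose(t2, t3))  # True

# For a square matrix, the trace equals the sum of eigenvalues
M = rng.standard_normal((4, 4))
print(np.isclose(np.trace(M), np.linalg.eigvals(M).sum().real))  # True
```

Note that the factors need not be square for the cyclic property, only conformable so that each product is square.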
Determinant
The determinant of a square matrix $A \in \mathbb{R}^{n \times n}$ equals the product of its eigenvalues:

$$\det(A) = \prod_{i=1}^{n} \lambda_i$$

Geometrically, $|\det(A)|$ measures the factor by which $A$ scales volumes. The matrix is singular (not invertible) if and only if $\det(A) = 0$.
Key properties:
- $\det(AB) = \det(A)\det(B)$
- $\det(A^\top) = \det(A)$
- $\det(cA) = c^n \det(A)$ for $A \in \mathbb{R}^{n \times n}$
In ML, determinants appear in Gaussian distributions (the normalization constant involves $\det(\Sigma)$), in volume arguments for information theory, and in Bayesian model selection.
In numerical code, determinant magnitude is a poor invertibility diagnostic. The product of many eigenvalues can underflow, overflow, or look tiny simply because the dimension is large. If you care whether solving with $A$ is safe, look at singular values, rank, or the condition number. If you need $\log\det(A)$ for a positive definite matrix, compute it through a Cholesky factorization: if $A = LL^\top$, then $\log\det(A) = 2\sum_{i} \log L_{ii}$.
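The log-space habit can be sketched in NumPy (the synthetic positive definite matrix is arbitrary): `slogdet` and the Cholesky route agree, and neither ever materializes the raw determinant.

```python
import numpy as np

rng = np.random.default_rng(0)
G = rng.standard_normal((100, 100))
S = G @ G.T + 100 * np.eye(100)      # symmetric positive definite

# Stay in log space: slogdet returns (sign, log|det|)
sign, logdet = np.linalg.slogdet(S)

# Cholesky route: S = L L^T  =>  logdet(S) = 2 * sum(log(diag(L)))
L = np.linalg.cholesky(S)
logdet_chol = 2.0 * np.sum(np.log(np.diag(L)))

print(sign == 1.0, np.isclose(logdet, logdet_chol))  # True True
```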
Matrix Inverse
For a square matrix $A$, the inverse $A^{-1}$ satisfies $AA^{-1} = A^{-1}A = I$. It exists if and only if $\det(A) \neq 0$ (equivalently, all eigenvalues are nonzero).
Key identities:
- $(AB)^{-1} = B^{-1}A^{-1}$ (note the reversed order)
- $(A^\top)^{-1} = (A^{-1})^\top$
When Inversion is Dangerous
Condition Number
The condition number of a matrix $A$ is:

$$\kappa(A) = \frac{\sigma_{\max}}{\sigma_{\min}}$$

where $\sigma_{\max}$ and $\sigma_{\min}$ are the largest and smallest singular values. A large condition number means the matrix is nearly singular and inversion is numerically unstable.
Rule of thumb: if $\kappa(A) \approx 10^k$, you lose about $k$ digits of accuracy when solving by inversion. For double precision (about 16 digits), $\kappa(A) \gtrsim 10^{12}$ is dangerous.
In practice, avoid explicit matrix inversion. Use factorizations (Cholesky, LU, QR) to solve linear systems instead.
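A small NumPy sketch of that habit (the matrix and right-hand side are arbitrary): check `np.linalg.cond` when in doubt, and prefer `np.linalg.solve` over forming the inverse.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
A = rng.standard_normal((n, n)) + n * np.eye(n)  # comfortably nonsingular
b = rng.standard_normal(n)

print(np.linalg.cond(A) < 1e3)     # well-conditioned here: True

x_good = np.linalg.solve(A, b)     # factorization-based solve (preferred)
x_bad = np.linalg.inv(A) @ b       # explicit inverse (avoid in real code)
print(np.allclose(x_good, x_bad))  # True for this well-conditioned A
```

For a well-conditioned matrix both routes agree; the solve is still faster and loses less accuracy as $\kappa(A)$ grows.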
Moore-Penrose Pseudoinverse
The Moore-Penrose pseudoinverse $A^+$ of a matrix $A \in \mathbb{R}^{m \times n}$ is the unique matrix satisfying four conditions: (1) $AA^+A = A$, (2) $A^+AA^+ = A^+$, (3) $(AA^+)^\top = AA^+$, (4) $(A^+A)^\top = A^+A$.
For full column rank ($\operatorname{rank}(A) = n$): $A^+ = (A^\top A)^{-1}A^\top$
For full row rank ($\operatorname{rank}(A) = m$): $A^+ = A^\top(AA^\top)^{-1}$
The pseudoinverse gives the least-squares solution to $Ax = b$ when $A$ is not square or not invertible: $x^* = A^+b$ minimizes $\|Ax - b\|_2$. When the solution is non-unique, $x^*$ also has the smallest $\|x\|_2$ among the minimizers — the minimum-norm least-squares solution.
This is exactly what happens in linear regression: the OLS solution is $\hat{\beta} = (X^\top X)^{-1}X^\top y = X^+ y$.
The cleanest geometric reading is the SVD form $A^+ = V\Sigma^+ U^\top$, where $\Sigma^+$ inverts the nonzero singular values and zeroes the rest. That form is numerically stable and works for every matrix shape. See SVD. For the orthogonal-projection picture (closest reachable output, simplest input that reaches it), the standalone Pseudoinverse Geometry Lab walks the three cases — overdetermined, underdetermined, rank-deficient — as draggable diagrams.
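A sketch comparing three routes to the same least-squares solution (NumPy, with an arbitrary tall matrix): the SVD-based `pinv`, the `lstsq` solver usually preferred in practice, and the normal equations, shown only for comparison because they square the condition number.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((10, 3))   # tall: overdetermined system Ax ~ b
b = rng.standard_normal(10)

x_pinv = np.linalg.pinv(A) @ b                   # SVD-based pseudoinverse
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)  # least-squares solver (preferred)
x_normal = np.linalg.solve(A.T @ A, A.T @ b)     # normal equations (fragile)

print(np.allclose(x_pinv, x_lstsq), np.allclose(x_pinv, x_normal))  # True True
```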
Transpose and Adjoint
The transpose swaps rows and columns: $(A^\top)_{ij} = A_{ji}$.
The conjugate transpose (adjoint) also conjugates complex entries: $(A^*)_{ij} = \overline{A_{ji}}$. For real matrices, $A^* = A^\top$.
A matrix is symmetric if and only if $A = A^\top$. A matrix $Q$ is orthogonal if and only if $Q^\top Q = QQ^\top = I$. These properties simplify many computations. Symmetric matrices have real eigenvalues, and orthogonal matrices preserve lengths: $\|Qx\|_2 = \|x\|_2$. A symmetric matrix with all nonnegative eigenvalues is positive semidefinite.
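A small NumPy sketch of the two special classes (the random matrices are arbitrary): a symmetrized matrix has real eigenvalues, and an orthogonal factor from a QR factorization preserves lengths.

```python
import numpy as np

rng = np.random.default_rng(0)

# Symmetric: A = A^T; eigvalsh exploits symmetry and returns real eigenvalues
S = rng.standard_normal((5, 5))
S = (S + S.T) / 2
print(np.allclose(S, S.T), np.linalg.eigvalsh(S).dtype)  # True float64

# Orthogonal: Q^T Q = I, and lengths are preserved
Q, _ = np.linalg.qr(rng.standard_normal((5, 5)))
x = rng.standard_normal(5)
print(np.allclose(Q.T @ Q, np.eye(5)),
      np.isclose(np.linalg.norm(Q @ x), np.linalg.norm(x)))  # True True
```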
Schur Complement
Given a block matrix:

$$M = \begin{pmatrix} A & B \\ C & D \end{pmatrix}$$

If $A$ is invertible, the Schur complement of $A$ in $M$ is:

$$M/A = D - CA^{-1}B$$

The determinant factors as $\det(M) = \det(A)\det(M/A)$.
The Schur complement appears in:
- Gaussian conditioning (deriving conditional distributions from joint)
- Block matrix inversion
- Optimization (eliminating variables in quadratic forms)
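The determinant factorization can be checked numerically. A NumPy sketch with an arbitrary, well-conditioned block matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
p, q = 3, 2
M = rng.standard_normal((p + q, p + q)) + (p + q) * np.eye(p + q)
A, B = M[:p, :p], M[:p, p:]
C, D = M[p:, :p], M[p:, p:]

# Schur complement of A in M: M/A = D - C A^{-1} B (use solve, not inv)
S = D - C @ np.linalg.solve(A, B)

# det(M) = det(A) * det(M/A)
print(np.isclose(np.linalg.det(M), np.linalg.det(A) * np.linalg.det(S)))  # True
```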
Sherman-Morrison-Woodbury Formula
Statement
If $A \in \mathbb{R}^{n \times n}$ is invertible, $U \in \mathbb{R}^{n \times k}$, $C \in \mathbb{R}^{k \times k}$ is invertible, and $V \in \mathbb{R}^{k \times n}$, then:

$$(A + UCV)^{-1} = A^{-1} - A^{-1}U\left(C^{-1} + VA^{-1}U\right)^{-1}VA^{-1}$$
Intuition
When you add a low-rank update to a matrix whose inverse you already know, you can compute the new inverse by solving a smaller $k \times k$ system instead of a full $n \times n$ inversion. This is a huge saving when $k \ll n$.
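The identity is easy to verify numerically. A NumPy sketch (the sizes and the well-conditioned $A$ are arbitrary); note that the only new inversion on the Woodbury path is $k \times k$:

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 200, 5
G = rng.standard_normal((n, n))
A = G @ G.T + n * np.eye(n)            # well-conditioned n x n base matrix
U = rng.standard_normal((n, k))
C = np.eye(k)
V = rng.standard_normal((k, n))

A_inv = np.linalg.inv(A)               # assumed already known
# Woodbury: only the k x k matrix C^{-1} + V A^{-1} U gets inverted
inner = np.linalg.inv(np.linalg.inv(C) + V @ A_inv @ U)
woodbury = A_inv - A_inv @ U @ inner @ V @ A_inv

direct = np.linalg.inv(A + U @ C @ V)  # full n x n inversion, for comparison
print(np.allclose(woodbury, direct))   # True
```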
Proof Sketch
Multiply $(A + UCV)$ by the proposed right-hand side and verify you get $I$. This is a direct algebraic verification: expand the product, factor the update terms as $UC(C^{-1} + VA^{-1}U)(C^{-1} + VA^{-1}U)^{-1}VA^{-1}$, and watch them cancel.
Why It Matters
This formula appears throughout ML: online learning (rank-1 updates to covariance matrices), Kalman filters, Gaussian process inference with structured kernels, and Bayesian linear regression. Any time you have $A^{-1}$ and need $(A + UCV)^{-1}$, use this formula.
Failure Mode
The formula requires both $C$ and the inner matrix $C^{-1} + VA^{-1}U$ to be invertible. If either is singular (or nearly so), the formula is inapplicable or numerically unstable.
The special case with $k = 1$, $U = u$, $V = v^\top$, $C = 1$ (rank-1 update) is the Sherman-Morrison formula:

$$(A + uv^\top)^{-1} = A^{-1} - \frac{A^{-1}uv^\top A^{-1}}{1 + v^\top A^{-1}u}$$
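The rank-1 case in NumPy (a sketch with an arbitrary diagonally dominant $A$ and a small update, so both inverses exist and the denominator stays away from zero):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
A = np.diag([2.0, 3.0, 4.0, 5.0]) + 0.1 * rng.standard_normal((n, n))
u = 0.1 * rng.standard_normal(n)
v = rng.standard_normal(n)

A_inv = np.linalg.inv(A)
denom = 1.0 + v @ A_inv @ u                        # must be nonzero
sm = A_inv - np.outer(A_inv @ u, v @ A_inv) / denom

print(np.allclose(sm, np.linalg.inv(A + np.outer(u, v))))  # True
```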
ML Implementation Patterns
| Task | Algebraic object | Safer implementation habit |
|---|---|---|
| Solve $Ax = b$ | $A^{-1}b$ | Use `solve`, LU, QR, or Cholesky; do not form $A^{-1}$ |
| Gaussian log likelihood | $\log\det(\Sigma)$ | Use Cholesky and sum log diagonals |
| Least squares | $A^+b$ | Use QR or SVD, especially when $A$ is ill-conditioned |
| Rank or low-rank structure | $\operatorname{rank}(A)$ | Inspect singular values and choose a tolerance |
| Online covariance / kernel update | $(A + UCV)^{-1}$ | Use SMW only when the small inner system is well-conditioned |
Common Confusions
Trace and determinant serve different purposes
The trace sums eigenvalues; the determinant multiplies them. A matrix can have large trace but zero determinant (e.g., $\operatorname{diag}(100, 0)$ has trace $100$ and determinant $0$). The trace tells you about the "total magnitude" of eigenvalues; the determinant tells you whether the matrix is invertible and how it scales volume.
Never invert matrices explicitly
In nearly all practical ML code, you should solve $Ax = b$ using a linear system solver (e.g., `np.linalg.solve`), not by computing $A^{-1}$ and then multiplying. Direct inversion is slower, less numerically stable, and rarely necessary.
A tiny determinant is not the same as a bad solve
Determinants multiply all volume scalings together, so they depend strongly on dimension and units. A matrix can have a tiny determinant because it has many moderately small singular values, while the solve may still be acceptable. A matrix can also have determinant 1 and be badly conditioned if one singular value is huge and another is tiny. Use singular values or $\kappa(A)$ when the question is numerical safety.
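Both failure directions are easy to demonstrate. A NumPy sketch with deliberately artificial matrices:

```python
import numpy as np

# Perfectly conditioned, yet the determinant underflows to zero:
# det = 0.1**400 = 1e-400, below the smallest double
A = 0.1 * np.eye(400)
print(np.linalg.det(A), np.linalg.cond(A))  # 0.0 1.0

# Determinant ~1, yet the matrix is terribly conditioned
B = np.diag([1e8, 1e-8])
print(np.linalg.det(B), np.linalg.cond(B))  # ~1.0 ~1e16
```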
Summary
- $\operatorname{tr}(A)$ = sum of eigenvalues; use the cyclic property freely
- $\det(A)$ = product of eigenvalues; zero means singular
- The pseudoinverse $A^+$ gives the least-squares solution to $Ax = b$
- Condition number $\kappa(A) = \sigma_{\max}/\sigma_{\min}$ measures how dangerous inversion is
- Schur complements factor block matrices and appear in Gaussian conditioning
- Sherman-Morrison-Woodbury turns rank-$k$ updates into small $k \times k$ solves
- Use Cholesky/QR/SVD/logdet tricks in code; algebraic identities are not always numerically safe as written
- In practice, solve linear systems instead of inverting matrices
Exercises
Problem
Pick a small invertible matrix $A$ and compute $\operatorname{tr}(A)$, $\det(A)$, $A^{-1}$, and $\kappa(A)$ (using the 2-norm).
Problem
Show that $\operatorname{tr}(AB) = \operatorname{tr}(BA)$ for any matrices $A \in \mathbb{R}^{m \times n}$ and $B \in \mathbb{R}^{n \times m}$.
Problem
You have computed $A^{-1}$ for an $n \times n$ matrix $A$. A new data point arrives, requiring you to compute $(A + uu^\top)^{-1}$ where $u \in \mathbb{R}^n$. What is the computational cost using Sherman-Morrison versus recomputing the inverse from scratch?
References
Canonical:
- Strang, Introduction to Linear Algebra (2016), Chapters 2, 5, 6
- Horn & Johnson, Matrix Analysis (2013), Chapters 0-1
- Golub & Van Loan, Matrix Computations (2013), Chapters 2-3 (matrix operations and LU/QR factorization)
Current:
- Petersen & Pedersen, The Matrix Cookbook (2012). Essential reference for matrix identities.
- Axler, Linear Algebra Done Right (2024), Chapters 3-4 (linear maps, invertibility, determinants)
- Deisenroth, Faisal, Ong, Mathematics for Machine Learning (2020), Chapter 4 (matrix decompositions)
Last reviewed: April 26, 2026