ML Methods
Hyperbolic Embeddings for Graphs
Negatively curved space embeds tree-like graphs with exponentially less distortion than Euclidean space at the same dimension. The Poincare ball model, Mobius gyrovector arithmetic, and Sala et al.'s sharp distortion bound make this concrete; Ganea et al. lift the construction to neural network layers.
Why This Matters
A perfect binary tree of depth $d$ has $2^{d+1}-1$ nodes, so the number of nodes within graph distance $r$ of the root grows like $2^r$. A Euclidean ball of radius $r$ in $\mathbb{R}^n$ has volume $\Theta(r^n)$, polynomial in $r$. To fit exponentially many nodes into a polynomial volume while preserving distances, an embedding must compress: tree leaves get shoved together and pairwise distances are crushed. This is not a tuning issue. It is a volume mismatch between the data and the space.
Hyperbolic space has volume that grows like $e^{(n-1)r}$ in the metric radius $r$ (see non-Euclidean and hyperbolic geometry). The exponential matches the tree. Sala, De Sa, Gu, and Re (2018) made this quantitative: any tree on $N$ nodes embeds into the Poincare disk with distortion $1+\epsilon$ using $O(\frac{1}{\epsilon}\log N)$ bits of precision, while no Euclidean embedding of the worst-case tree into any fixed dimension achieves distortion bounded independently of $N$.
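The mismatch is easy to see numerically. A plain-Python sketch comparing the node count of a depth-$r$ binary tree against the area $\pi r^2$ of a Euclidean disk and the area $2\pi(\cosh r - 1)$ of a hyperbolic disk of the same metric radius:

```python
import math

# Nodes within graph distance r of the root of a perfect binary tree,
# vs. the area available at metric radius r in each geometry.
for r in [5, 10, 20]:
    tree_nodes = 2 ** (r + 1) - 1                        # grows like 2^r
    euclidean_area = math.pi * r ** 2                    # polynomial in r
    hyperbolic_area = 2 * math.pi * (math.cosh(r) - 1)   # grows like e^r
    print(f"r={r:2d}  nodes={tree_nodes:>9d}  "
          f"euclidean={euclidean_area:9.1f}  hyperbolic={hyperbolic_area:.3g}")
```

The tree and the hyperbolic disk grow in lockstep; the Euclidean disk falls behind immediately.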
This was not folklore. Nickel and Kiela (2017) reported 5-dimensional Poincare embeddings of WordNet hypernymy beating 200-dimensional Euclidean embeddings on link prediction. Ganea, Becigneul, and Hofmann (2018) then showed how to build neural-network layers (linear maps, biases, activations, attention) that operate inside the hyperbolic space rather than around it.
Setup: the Poincare ball
Poincare ball model
The Poincare ball is the open unit ball
$$\mathbb{B}^n = \{x \in \mathbb{R}^n : \|x\| < 1\},$$
equipped with the Riemannian metric
$$g_x = \lambda_x^2\, g^{E}, \qquad \lambda_x = \frac{2}{1-\|x\|^2},$$
where $g^E$ is the Euclidean metric. The induced distance between $x, y \in \mathbb{B}^n$ is
$$d_{\mathbb{B}}(x, y) = \operatorname{arcosh}\!\left(1 + 2\,\frac{\|x-y\|^2}{(1-\|x\|^2)(1-\|y\|^2)}\right).$$
The boundary $\partial\mathbb{B}^n$ is at infinite distance from any interior point. The model has constant sectional curvature $-1$.
Two facts make $\mathbb{B}^n$ usable as an embedding space:
- Geodesics through the origin are straight lines. The geodesic from $0$ to a point $x$ is the Euclidean segment, with hyperbolic arclength $d_{\mathbb{B}}(0, x) = 2\operatorname{artanh}\|x\|$.
- Other geodesics are circular arcs orthogonal to the boundary $\partial\mathbb{B}^n$. Picture: a "straight line" curves toward the center because pushing along the boundary is metrically expensive (the conformal factor $\lambda_x \to \infty$ as $\|x\| \to 1$).
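The distance formula is short enough to sanity-check in code. A minimal sketch (the helper name `poincare_dist` is mine) verifying that it reduces to $2\operatorname{artanh}\|x\|$ on geodesics through the origin:

```python
import numpy as np

def poincare_dist(x, y):
    """Distance in the Poincare ball model (curvature -1)."""
    sq = np.sum((x - y) ** 2)
    denom = (1.0 - x @ x) * (1.0 - y @ y)
    return np.arccosh(1.0 + 2.0 * sq / denom)

x = np.array([0.3, 0.4])          # ||x|| = 0.5
d = poincare_dist(np.zeros(2), x)
print(d, 2 * np.arctanh(0.5))     # the two values agree
```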
The distortion bound for tree embeddings
The central result. Distortion of an embedding $f : (X, d_X) \to (Y, d_Y)$ is the multiplicative gap between graph and embedded distances:
$$D(f) = \left(\max_{u \neq v} \frac{d_Y(f(u), f(v))}{d_X(u, v)}\right) \cdot \left(\max_{u \neq v} \frac{d_X(u, v)}{d_Y(f(u), f(v))}\right).$$
A perfect (isometric) embedding has distortion $1$.
Sala-De Sa-Gu-Re distortion bound for tree embeddings
Theorem (Sala et al., 2018). Any weighted tree on $N$ nodes embeds into the Poincare disk $\mathbb{B}^2$ with distortion at most $1+\epsilon$, and the embedding can be represented with $O(\frac{1}{\epsilon}\log N)$ bits of precision.
Intuition
Embed the root at the origin. For each node already placed, reserve an angular cone for each child and place the child at hyperbolic distance equal to the edge weight along the bisector of its cone. Because the circumference of a hyperbolic circle of radius $r$ scales like $2\pi\sinh r \approx \pi e^r$, the room available to each child grows exponentially with its distance from the origin, so even a high-branching tree fits without children colliding. Scaling the whole embedding by a large factor $\tau$ pushes nodes outward where there is exponential room, controlling angular interference.
Proof Sketch
Embedding construction (Sala et al., Algorithm 1). Place the root at $0$. Process nodes in BFS order. For a node $v$ at hyperbolic distance $r$ from the origin with $k$ children, partition the angular cone available at $v$ (the full $2\pi$ minus the angle reserved back toward its parent) into $k$ sectors. Place each child $c$ at hyperbolic distance $w(v, c)$ from $v$ along the bisector of its sector, where $w(v, c)$ is the edge weight.
Why the construction has low distortion. Two nodes $u, v$ in the tree have tree distance $d_T(u, v) = d_T(u, a) + d_T(a, v)$, where $a$ is their lowest common ancestor. After scaling by $\tau$, the embedded points satisfy $\tau\, d_T(u, v) - C(\deg_{\max}) \le d_{\mathbb{B}}(f(u), f(v)) \le \tau\, d_T(u, v)$, where $\deg_{\max}$ is the max degree. The negative correction term $C(\deg_{\max})$ is the angular wiggle from approximating "go up then down" by a single hyperbolic geodesic. It is bounded uniformly in $N$, so for $\tau$ large enough relative to $1/\epsilon$, every pairwise ratio lies in $[\frac{1}{1+\epsilon},\, 1]$.
Why $O(\frac{1}{\epsilon}\log N)$ precision bits are needed. Hyperbolic distances near the boundary saturate floating-point precision. A point at hyperbolic distance $r$ from the origin has Euclidean norm $\tanh(r/2) = 1 - 2e^{-r} + O(e^{-2r})$, so distinguishing two points a constant hyperbolic distance apart requires $\Omega(r)$ bits to keep $1 - \|x\|$ from rounding to zero. Scaling pushes typical embedded distances to $\Theta(\frac{\log N}{\epsilon})$, hence the bound. Sala et al. give an exact ulp-level analysis.
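The float64 saturation is easy to demonstrate (a sketch; the crossover near $r \approx 37$ comes from half an ulp of 1.0 being about $1.1 \times 10^{-16}$):

```python
import numpy as np

# A point at hyperbolic distance r from the origin has Euclidean norm
# tanh(r/2) = 1 - 2e^{-r} + O(e^{-2r}). Once 2e^{-r} drops below half an
# ulp of 1.0, the norm rounds to exactly 1.0: the point leaves the open ball
# and every distance involving it evaluates to inf.
for r in [10.0, 20.0, 40.0]:
    norm = np.tanh(r / 2)
    print(f"r={r:5.1f}  norm={norm!r}  rounds_to_boundary={norm == 1.0}")
```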
Euclidean lower bound. Bourgain (1985) and Linial-London-Rabinovich (1995) show that any $N$-point metric embeds into $\ell_2$ with distortion $O(\log N)$, and that this is tight for constant-degree expanders, which require $\Omega(\log N)$ distortion in any Euclidean space. For trees the relevant bound is Bourgain's (1986): the complete binary tree on $N$ nodes requires distortion $\Omega(\sqrt{\log\log N})$ even with unbounded dimension, and in any fixed dimension the volume argument above forces distortion growing polynomially in $N$. So scaling dimension at fixed distortion does not save Euclidean embeddings of trees: distortion growing with $N$ is unavoidable in $\mathbb{R}^n$ for every $n$.
Why It Matters
This is the only result that cleanly justifies "use hyperbolic space for hierarchies." The constructive direction tells you the minimum dimension and precision needed. The Euclidean lower bound tells you that no clever Euclidean parametrization can match it asymptotically. Together they explain Nickel and Kiela's 5D-vs-200D result without appealing to mystery: the Euclidean baseline was paying distortion that grows with the size of the hierarchy, which a 200-dimensional bag-of-features partially absorbed but did not remove.
Failure Mode
The theorem is about trees and tree-like graphs (low Gromov hyperbolicity $\delta$). Grids, lattices, and dense graphs have $\delta$ on the order of their diameter and gain nothing. The same construction applied to a 2D grid embeds it with distortion that grows with the grid size, much worse than its trivial Euclidean embedding, which is near-isometric. Always estimate $\delta$ on a sample of 4-tuples before committing to a hyperbolic embedding (Adcock-Sullivan-Mahoney 2013 give a sampling estimator).
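The 4-point estimator can be sketched with standard-library Python. This is a simplified version under my own naming (`sample_delta`, `bfs_dists`); it recomputes BFS per quadruple, which is wasteful but keeps the sketch short, whereas the estimator in Adcock et al. amortizes the distance computations:

```python
import random
from collections import deque

def bfs_dists(adj, src):
    """Single-source shortest paths in an unweighted graph."""
    dist = {src: 0}
    q = deque([src])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                q.append(v)
    return dist

def sample_delta(adj, n_samples=1000, seed=0):
    """Four-point estimate of Gromov hyperbolicity from random quadruples."""
    rng = random.Random(seed)
    nodes = list(adj)
    delta = 0.0
    for _ in range(n_samples):
        w, x, y, z = rng.sample(nodes, 4)
        d = {s: bfs_dists(adj, s) for s in (w, x, y)}
        sums = sorted([d[w][x] + d[y][z],
                       d[w][y] + d[x][z],
                       d[w][z] + d[x][y]], reverse=True)
        delta = max(delta, (sums[0] - sums[1]) / 2)  # gap of the two largest sums
    return delta

# Trees are 0-hyperbolic; a path is a tree, so the estimate must be 0.
path = {i: [j for j in (i - 1, i + 1) if 0 <= j < 30] for i in range(30)}
print(sample_delta(path))   # 0.0
```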
Mobius gyrovector arithmetic
To build neural networks in $\mathbb{B}^n$ you need an analogue of vector addition that respects the manifold. Ungar's (2008) gyrovector formalism provides one.
Mobius addition on the Poincare ball
For $x, y \in \mathbb{B}^n$, the Mobius sum is
$$x \oplus y = \frac{(1 + 2\langle x, y\rangle + \|y\|^2)\, x + (1 - \|x\|^2)\, y}{1 + 2\langle x, y\rangle + \|x\|^2\|y\|^2}.$$
It is not associative or commutative, but it is a smooth left-loop operation: $0 \oplus x = x$, $(-x) \oplus x = 0$, and for each $x$ the map $y \mapsto x \oplus y$ is a smooth bijection $\mathbb{B}^n \to \mathbb{B}^n$. The left inverse of $x$ is $-x$.
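These properties are mechanical to check (a sketch; `mobius_add` is a local helper name, and the sampled points are kept well inside the ball):

```python
import numpy as np

def mobius_add(x, y):
    """Mobius addition on the Poincare ball (curvature -1)."""
    xy = np.dot(x, y)
    x2, y2 = np.dot(x, x), np.dot(y, y)
    num = (1 + 2 * xy + y2) * x + (1 - x2) * y
    return num / (1 + 2 * xy + x2 * y2)

rng = np.random.default_rng(0)
x, y, z = (rng.uniform(-0.5, 0.5, 3) for _ in range(3))  # norms <= 0.87 < 1

print(np.allclose(mobius_add(np.zeros(3), x), x))        # left identity: True
print(np.allclose(mobius_add(-x, x), np.zeros(3)))       # left inverse: True
lhs = mobius_add(mobius_add(x, y), z)
rhs = mobius_add(x, mobius_add(y, z))
print(np.allclose(lhs, rhs))                             # associativity fails: False
```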
Exponential and logarithmic maps at the origin
The exponential map at the origin is
$$\exp_0(v) = \tanh(\|v\|)\,\frac{v}{\|v\|},$$
and the logarithm is its inverse:
$$\log_0(y) = \operatorname{artanh}(\|y\|)\,\frac{y}{\|y\|}.$$
$\exp_0$ takes a tangent vector $v$ at $0$ and returns the point in $\mathbb{B}^n$ reached by moving along the geodesic from $0$ in that direction for hyperbolic distance equal to the tangent vector's Riemannian length $\lambda_0\|v\| = 2\|v\|$.
A Ganea-style hyperbolic linear layer is then
$$f(x) = \exp_0\!\big(W \log_0(x)\big) \oplus b,$$
where $W \in \mathbb{R}^{m \times n}$ and $b \in \mathbb{B}^m$. The construction reads as: lift $x$ to the tangent space at the origin, apply a Euclidean linear map, project back to the manifold, then translate by Mobius addition. Hyperbolic activations, attention scores, and MLR classifiers follow the same lift-act-project pattern.
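A minimal numpy sketch of the layer under the conventions above (curvature $-1$, $\exp_0(v) = \tanh(\|v\|)\, v/\|v\|$; all names are local to the sketch, not an official API):

```python
import numpy as np

def mobius_add(x, y):
    xy, x2, y2 = np.dot(x, y), np.dot(x, x), np.dot(y, y)
    return ((1 + 2 * xy + y2) * x + (1 - x2) * y) / (1 + 2 * xy + x2 * y2)

def exp0(v):
    """Exponential map at the origin (curvature -1)."""
    n = np.linalg.norm(v)
    return np.tanh(n) * v / n if n > 1e-15 else v

def log0(p):
    """Logarithmic map at the origin; inverse of exp0."""
    n = np.linalg.norm(p)
    return np.arctanh(n) * p / n if n > 1e-15 else p

def hyp_linear(W, b, x):
    """Lift to the tangent space, apply W, project back, Mobius-translate by b."""
    return mobius_add(exp0(W @ log0(x)), b)

W = np.array([[0.5, -0.3], [0.2, 0.8]])
b = np.array([0.1, -0.2])           # the bias must itself lie in the ball
x = np.array([0.3, 0.4])
y = hyp_linear(W, b, x)
print(np.linalg.norm(y) < 1.0)      # output stays inside the ball: True
```

With $W = I$ and $b = 0$ the layer is the identity, which is a quick correctness check on the exp/log pair.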
Hyperbolic distance is not a rescaling of Euclidean distance
The Poincare distance formula is not just Euclidean distance rescaled by a function of position. Two points with small Euclidean separation near the boundary can be very far apart in $\mathbb{B}^n$ because the conformal factor $\lambda_x = \frac{2}{1 - \|x\|^2} \to \infty$ as $\|x\| \to 1$. This is why hyperbolic embeddings have capacity: the boundary is at infinity, so there is always more room to push deeper subtrees outward.
Hyperbolic neural networks are not Euclidean networks with a tanh squash
A squashing nonlinearity can force outputs into the unit ball, so a Euclidean MLP with a final tanh-style squash produces points in $\mathbb{B}^n$ as a set. That is not a hyperbolic neural network. The intermediate operations (matrix multiplication, addition, normalization) treat the points as Euclidean and ignore the metric. A hyperbolic network uses Mobius operations and Riemannian gradients computed via $\exp_0$ and $\log_0$, so each parameter step respects the geometry. Treating points as Euclidean, then re-projecting after each step, loses exactly the structure the model was supposed to exploit.
Curvature is a hyperparameter, not a fixed truth
Most papers set the curvature to $-1$ for convenience. Curvature scales distances and gradients in ways that interact with optimization and numerical precision. Chami et al. (2019) make the curvature learnable per layer; the learned values often drift toward zero on graphs with low Gromov $\delta$ but high local clustering, indicating that the model wants Euclidean geometry in places. If your model insists on near-zero curvature everywhere, that is evidence the dataset is not actually tree-like.
When hyperbolic embeddings help
The right diagnostic is Gromov hyperbolicity $\delta$, the smallest constant such that every geodesic triangle is $\delta$-thin (each side lies within distance $\delta$ of the union of the other two). Trees have $\delta = 0$; trees with extra cross-edges have small $\delta$; expanders and 2D grids have $\delta$ on the order of their diameter.
Empirically:
| Graph family | Typical $\delta$ | Hyperbolic gain over Euclidean at fixed dim |
|---|---|---|
| WordNet hypernymy, Gene Ontology, citation chains | Near $0$ (tree-like) | Large; Nickel-Kiela report 30%+ MAP improvements at 5-50 dim |
| Social networks (Facebook, citation networks) | Small to moderate | Moderate; Chami et al. show small but consistent gains |
| Web graphs, biological PPI | Moderate | Marginal; depends on subgraph and task |
| Road networks, 2D meshes, regular grids | Large (order of the diameter) | None or negative; Euclidean is the right choice |
| Random Erdos-Renyi | Large for typical densities | None |
The takeaway: estimate $\delta$ before committing. It is a one-pass sampling computation.
Worked Example: a 7-node binary tree in
Take the perfect binary tree with root $r$, children $a, b$, and grandchildren $a_1, a_2, b_1, b_2$. All edges have weight $1$.
Place $f(r) = 0$. Place the two children on opposite rays at hyperbolic distance $1$:
$$f(a) = \big(\tanh\tfrac{1}{2},\, 0\big) \approx (0.4621,\, 0), \qquad f(b) = \big({-\tanh\tfrac{1}{2}},\, 0\big).$$
For $a$, the angular cone facing away from $r$ is the right half-plane. Split it into two $90°$ sectors and place the grandchildren on the bisectors at hyperbolic distance $1$ from $f(a)$. This requires Mobius translation: $f(a_1) = f(a) \oplus \tanh\tfrac{1}{2}\,(\cos 45°,\, \sin 45°)$, and similarly for $a_2, b_1, b_2$.
The hyperbolic distance between the sibling grandchildren $f(a_1)$ and $f(a_2)$ comes out to roughly $1.51$ (hyperbolic law of cosines with a right angle at $f(a)$: $\cosh d = \cosh^2 1$), and the tree distance is $2$. With scale factor $\tau = 10$ (each edge embedded with length $10$), the same construction gives $d = \operatorname{arcosh}(\cosh^2 10) \approx 19.3$ versus tree distance $20$, distortion about $1.04$. Pushing $\tau$ higher pushes the ratio further toward $1$, illustrating the construction in the proof sketch.
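The numbers can be checked directly (a sketch with local helper names; since left Mobius translation is an isometry, the constructed distance should match the law-of-cosines value $\operatorname{arcosh}(\cosh^2 1)$ exactly):

```python
import numpy as np

def mobius_add(x, y):
    xy, x2, y2 = np.dot(x, y), np.dot(x, x), np.dot(y, y)
    return ((1 + 2 * xy + y2) * x + (1 - x2) * y) / (1 + 2 * xy + x2 * y2)

def poincare_dist(x, y):
    sq = np.sum((x - y) ** 2)
    return np.arccosh(1 + 2 * sq / ((1 - x @ x) * (1 - y @ y)))

t = np.tanh(0.5)                    # Euclidean norm at hyperbolic distance 1
fa = np.array([t, 0.0])             # child a
u1 = np.array([np.cos(np.pi / 4), np.sin(np.pi / 4)])
u2 = np.array([np.cos(np.pi / 4), -np.sin(np.pi / 4)])
fa1 = mobius_add(fa, t * u1)        # grandchild a1, distance 1 from f(a)
fa2 = mobius_add(fa, t * u2)        # grandchild a2

d = poincare_dist(fa1, fa2)
print(round(float(d), 4))           # ~1.513, vs tree distance 2
print(round(float(np.arccosh(np.cosh(1.0) ** 2)), 4))   # law-of-cosines value
```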
Problem
Verify directly that the Poincare ball model has constant curvature $-1$ at the origin. Specifically: compute the geodesic distance from $0$ to a point $x$ along the Euclidean segment, and show it equals $2\operatorname{artanh}\|x\|$. Then compute the circumference of the hyperbolic circle of radius $r$ centered at $0$ and show it equals $2\pi\sinh r$, the signature of curvature $-1$.
Problem
Prove the Euclidean lower bound fails for the path graph: the unweighted path on $N$ nodes embeds into Euclidean space with distortion $1$ (isometrically, into $\mathbb{R}$), but any embedding of the complete binary tree on $N$ nodes has distortion $\Omega(\sqrt{\log\log N})$.
Then explain why the complete binary tree is the right witness: what specific feature of it defeats Euclidean embedding that paths and stars do not have?
Problem
Implement Mobius addition in numpy and verify three properties on random points in $\mathbb{B}^n$:
- Left identity: $0 \oplus x = x$.
- Left inverse: $(-x) \oplus x = 0$.
- Non-associativity: there exist $x, y, z$ such that $(x \oplus y) \oplus z \neq x \oplus (y \oplus z)$.
Then describe what the gyration $\operatorname{gyr}[x, y]$ corrects for and where it appears in the Ganea hyperbolic-MLP construction (in Mobius scalar and matrix multiplication versus pointwise activation).
References
- Maximilian Nickel, Douwe Kiela. Poincare Embeddings for Learning Hierarchical Representations. NeurIPS 2017. The empirical opener: 5D Poincare beats 200D Euclidean on WordNet hypernymy. arXiv:1705.08039
- Frederic Sala, Christopher De Sa, Albert Gu, Christopher Re. Representation Tradeoffs for Hyperbolic Embeddings. ICML 2018. Sharp distortion bound (1+epsilon at O(log N / epsilon) precision) and an information-theoretic lower bound matching it. arXiv:1804.03329
- Octavian Ganea, Gary Becigneul, Thomas Hofmann. Hyperbolic Neural Networks. NeurIPS 2018. Mobius addition, exp/log maps, and hyperbolic linear, GRU, and MLR layers; the construction reused by every later hyperbolic NN paper. arXiv:1805.09112
- Ines Chami, Rex Ying, Christopher Re, Jure Leskovec. Hyperbolic Graph Convolutional Neural Networks. NeurIPS 2019. Hyperbolic GCNs with learnable curvature; gains on tree-like graphs, parity on others. The empirical evidence that curvature is dataset-dependent. arXiv:1910.12933
- James W. Anderson. Hyperbolic Geometry (2nd ed.). Springer Undergraduate Mathematics Series, 2005. Chapters 1 to 4: Poincare disk and upper-half-plane models, isometries, distance formulas, Mobius transformations as the conformal automorphism group.
- Abraham A. Ungar. Analytic Hyperbolic Geometry and Albert Einstein's Special Theory of Relativity. World Scientific, 2008. Chapters 3 and 6: gyrovector spaces, Mobius gyrogroup, gyration as the associativity defect.
- Martin R. Bridson, Andre Haefliger. Metric Spaces of Non-Positive Curvature. Springer Grundlehren 319, 1999. Chapter II.1 (CAT(0) spaces) and III.H (hyperbolic groups and Gromov hyperbolicity).
- Aaron B. Adcock, Blair D. Sullivan, Michael W. Mahoney. Tree-Like Structure in Large Social and Information Networks. ICDM 2013. Sampling estimator for Gromov hyperbolicity delta on graphs with millions of nodes, with case studies.
- Albert Gu, Frederic Sala, Beliz Gunel, Christopher Re. Learning Mixed-Curvature Representations in Product Spaces. ICLR 2019. Per-component curvature on product manifolds for graphs with mixed Gromov $\delta$.
- Maximillian Nickel, Douwe Kiela. Learning Continuous Hierarchies in the Lorentz Model of Hyperbolic Geometry. ICML 2018. The Lorentz (hyperboloid) model is numerically better behaved than the Poincare ball for optimization near the boundary; preferred in modern hyperbolic NN implementations. arXiv:1806.03417
- Tao Yu, Christopher De Sa. Numerically Accurate Hyperbolic Embeddings Using Tiling-Based Models. NeurIPS 2019. Multicomponent floating-point representations that avoid the rounding pathology of the Poincare ball. arXiv:1911.08543
- Aaron Lou, Isay Katsman, Qingxuan Jiang, Serge Belongie, Ser-Nam Lim, Christopher De Sa. Differentiating through the Frechet Mean. ICML 2020. Closed-form Riemannian centroids and gradients on hyperbolic manifolds, enabling layer-wise hyperbolic batch normalization. arXiv:2003.00335
- Yiding Yang, Liang Wu, Wei Xu et al. Hyperbolic Graph Convolutional Networks: A Survey (2024). Synthesis of post-2019 hyperbolic GNN work covering Lorentz GCN, hyperbolic attention, hyperbolic graph transformers, and benchmarks. arXiv:2406.16438
Related Topics
- Non-Euclidean and Hyperbolic Geometry — the geometric grounder this page builds on
- Distance Metrics Compared — where hyperbolic distance fits among other metric choices
- Metric Spaces, Convergence, Completeness
- Riemannian Optimization — the optimizer side of hyperbolic NNs
- Graph Neural Networks — the Euclidean baseline
- Representation Learning Theory
Last reviewed: April 26, 2026