Occupancy Networks and Neural Fields
Representing 3D geometry and appearance as continuous functions parameterized by neural networks: NeRF, occupancy networks, DeepSDF, volume rendering, and the connection to Gaussian splatting.
Why This Matters
Traditional 3D representations (meshes, voxel grids, point clouds) are discrete and fixed-resolution. Neural fields represent 3D geometry and appearance as continuous functions parameterized by neural networks. This allows querying the scene at arbitrary resolution and learning 3D structure directly from 2D images.
NeRF (Neural Radiance Fields) demonstrated that a simple MLP can represent complex scenes with photorealistic quality, trained only from posed photographs. This opened new directions in 3D reconstruction, view synthesis, and scene understanding.
Mental Model
A neural field is a function where the input is a coordinate (position in space, or position plus viewing direction) and the output is a property at that coordinate (color, density, occupancy, signed distance). The network parameters encode the entire scene. Querying the function at a new coordinate gives you the scene property at that point.
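The mental model can be made concrete with a tiny coordinate-based network. This is a minimal numpy sketch with random weights (a trained field would encode a real scene); the architecture and sizes here are illustrative assumptions, not any particular paper's model:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy coordinate-based MLP: one hidden layer mapping a 3D point to a scalar.
# Weights are random here; training would make them encode an actual scene.
W1, b1 = rng.normal(size=(64, 3)), np.zeros(64)
W2, b2 = rng.normal(size=(1, 64)), np.zeros(1)

def field(x):
    """Query the field at a 3D coordinate -> scalar property in (0, 1)."""
    h = np.maximum(W1 @ x + b1, 0.0)                 # ReLU hidden layer
    return 1.0 / (1.0 + np.exp(-(W2 @ h + b2)))      # sigmoid output

# The same function answers queries at any coordinate: there is no grid,
# so there is no fixed resolution.
print(field(np.array([0.1, 0.2, 0.3])))
```

The entire "scene" lives in `W1, b1, W2, b2`; querying a new coordinate is just another forward pass.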
Representation Families at a Glance
| Family | Query or primitive | Typical supervision | How geometry or images are recovered | Best fit | Main constraint |
|---|---|---|---|---|---|
| Occupancy network | 3D point → occupancy probability | object observations: point clouds, voxels, single images | extract a level set with marching cubes | shape reconstruction and completion | no direct view-dependent appearance model |
| Signed distance field | 3D point → signed distance | shape observations plus SDF samples or surface constraints | zero level set or sphere tracing | geometry-first reconstruction with normals | color and radiance must be added separately |
| Radiance field (NeRF family) | 3D point + view direction → color, density | posed multi-view images of one scene | volume rendering along rays | novel-view synthesis with view-dependent effects | expensive query-time evaluation unless heavily accelerated |
| Gaussian splatting | explicit Gaussians with position, covariance, opacity, color | posed multi-view images of one scene | rasterize and alpha-composite projected splats | fast rendering and interactive view synthesis | explicit memory footprint and geometry often needs extra regularization |
The important split is not just "implicit vs explicit." Occupancy networks and DeepSDF were introduced as geometry representations for objects or shape classes. NeRF and Gaussian splatting are scene-reconstruction methods for novel-view synthesis. They all live in the coordinate-based 3D world, but they do not answer the same question.
Neural Radiance Fields (NeRF)
Neural Radiance Field
A NeRF represents a scene as a continuous function:

$$F_\Theta : (\mathbf{x}, \mathbf{d}) \mapsto (\mathbf{c}, \sigma)$$

where $\mathbf{x} \in \mathbb{R}^3$ is 3D position, $\mathbf{d}$ is a unit viewing direction, $\mathbf{c}$ is emitted RGB color, and $\sigma \geq 0$ is volume density. The density depends only on position (geometry is view-independent), while color depends on both position and direction (capturing view-dependent effects like specular highlights).
The network architecture is a simple MLP with positional encoding. The input coordinates are mapped through sinusoidal functions at multiple frequencies before being fed to the network:

$$\gamma(p) = \left(\sin(2^0 \pi p), \cos(2^0 \pi p), \ldots, \sin(2^{L-1} \pi p), \cos(2^{L-1} \pi p)\right)$$
This positional encoding lets the MLP represent high-frequency spatial detail that it would otherwise smooth over (due to the spectral bias of MLPs toward low-frequency functions).
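A minimal numpy sketch of this encoding, applied per coordinate. The exact frequency convention varies between implementations (some include the factor of $\pi$, some drop it), so treat the constants here as one reasonable choice rather than the definitive one:

```python
import numpy as np

def positional_encoding(p, num_freqs=10):
    """Map each coordinate through sin/cos at frequencies 2^0 ... 2^(L-1).

    p: array of shape (..., d). Returns shape (..., 2 * num_freqs * d).
    """
    freqs = 2.0 ** np.arange(num_freqs)        # 1, 2, 4, ..., 2^(L-1)
    angles = p[..., None] * np.pi * freqs      # (..., d, L)
    enc = np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)
    return enc.reshape(*p.shape[:-1], -1)

x = np.array([0.5, -0.25, 1.0])                # one 3D point
print(positional_encoding(x).shape)            # (60,): 2 * 10 * 3 features
```

The MLP then sees 60 input features instead of 3, which is what lets it fit sharp spatial variation.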
Volume Rendering
Volume Rendering for Neural Radiance Fields
Statement
The expected color of a camera ray $\mathbf{r}(t) = \mathbf{o} + t\mathbf{d}$ is:

$$C(\mathbf{r}) = \int_{t_n}^{t_f} T(t)\, \sigma(\mathbf{r}(t))\, \mathbf{c}(\mathbf{r}(t), \mathbf{d})\, dt, \qquad T(t) = \exp\!\left(-\int_{t_n}^{t} \sigma(\mathbf{r}(s))\, ds\right)$$

where $T(t)$ is the accumulated transmittance from the near plane $t_n$ to point $t$. The product $T(t)\,\sigma(\mathbf{r}(t))$ gives the probability density that the ray terminates at $t$.
Intuition
A ray travels through space, accumulating color from each point weighted by two factors: how dense the material is at that point ($\sigma$) and how much light has already been blocked before reaching that point ($T$). Dense regions contribute more color. Regions behind opaque surfaces contribute nothing because $T$ is near zero.
Proof Sketch
Model light transport as a 1D absorption-emission process along the ray. The transmittance satisfies $\frac{dT}{dt} = -\sigma(\mathbf{r}(t))\, T(t)$ with $T(t_n) = 1$, giving the exponential form. The color integral follows from summing the emitted radiance at each point, weighted by the probability of the ray reaching that point and being absorbed there.
Why It Matters
This equation is differentiable with respect to and , which are outputs of the neural network. By comparing the rendered pixel color to the observed pixel color in a training image, you can backpropagate through the volume rendering integral to train the NeRF. The only supervision needed is posed 2D images.
Failure Mode
The integral is approximated by quadrature (summing over discrete samples along the ray). Too few samples produce aliasing and miss thin structures. Too many samples are computationally expensive. Hierarchical sampling (coarse then fine) mitigates this but does not eliminate it. Training also requires accurate camera poses; errors in pose estimation produce blurry reconstructions.
In practice, the integral is approximated as:

$$\hat{C}(\mathbf{r}) = \sum_{i=1}^{N} T_i \left(1 - e^{-\sigma_i \delta_i}\right) \mathbf{c}_i, \qquad T_i = \exp\!\left(-\sum_{j<i} \sigma_j \delta_j\right)$$

where $\delta_i = t_{i+1} - t_i$ is the distance between adjacent samples $i$ and $i+1$.
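This quadrature can be written directly in a few lines of numpy. A minimal sketch for a single ray, with densities and colors supplied as arrays (in a real NeRF they would come from MLP evaluations at the sample points):

```python
import numpy as np

def render_ray(sigmas, colors, deltas):
    """Quadrature approximation of the volume rendering integral.

    sigmas: (N,) densities at the N samples along the ray
    colors: (N, 3) emitted RGB at each sample
    deltas: (N,) distances between adjacent samples
    """
    alphas = 1.0 - np.exp(-sigmas * deltas)          # per-sample opacity
    # T_i = exp(-sum_{j<i} sigma_j * delta_j): transmittance up to sample i
    trans = np.exp(-np.concatenate([[0.0], np.cumsum(sigmas * deltas)[:-1]]))
    weights = trans * alphas                         # ray-termination weights
    return weights @ colors                          # expected pixel color

# One dense red sample behind empty space: pixel should be nearly pure red.
sigmas = np.array([0.0, 0.0, 50.0])
colors = np.array([[0, 0, 1], [0, 1, 0], [1, 0, 0]], dtype=float)
deltas = np.full(3, 0.1)
print(render_ray(sigmas, colors, deltas))
```

Because every step is a differentiable numpy-style operation, the same computation in an autodiff framework lets gradients flow from the pixel loss back into $\sigma_i$ and $\mathbf{c}_i$, which is exactly the training signal described above.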
Occupancy Networks
Occupancy Network
An occupancy network represents a 3D surface as the decision boundary of a classifier:

$$f_\theta : \mathbb{R}^3 \to [0, 1]$$

where $f_\theta(\mathbf{x})$ is the probability that point $\mathbf{x}$ is inside the object. The surface is the level set $\{\mathbf{x} : f_\theta(\mathbf{x}) = \tau\}$, typically with threshold $\tau = 0.5$.
The surface can be extracted at any resolution using marching cubes on a grid of query points. Unlike voxel grids, the resolution is limited only by the density of the query grid, not by the representation itself.
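The resolution-free property can be seen by querying the same field on grids of different sizes. A minimal numpy sketch, using a soft analytic "inside a sphere" test as a stand-in for a trained occupancy network (marching cubes, e.g. `skimage.measure.marching_cubes`, would then run on the resulting grid):

```python
import numpy as np

def occupancy(points, radius=0.5, sharpness=20.0):
    """Stand-in for a trained occupancy network: soft sphere-interior test.
    points: (N, 3) -> occupancy probabilities in (0, 1)."""
    d = np.linalg.norm(points, axis=-1)
    return 1.0 / (1.0 + np.exp(sharpness * (d - radius)))

def query_grid(res):
    """Query the field on a res^3 grid of points in [-1, 1]^3."""
    axis = np.linspace(-1, 1, res)
    grid = np.stack(np.meshgrid(axis, axis, axis, indexing="ij"), axis=-1)
    return occupancy(grid.reshape(-1, 3)).reshape(res, res, res)

# Same field, two grid resolutions: the representation itself has no grid,
# so the extraction resolution is a free choice at query time.
coarse, fine = query_grid(16), query_grid(64)
print(coarse.shape, fine.shape)
```

The choice of 16 vs 64 only changes how finely the level set is sampled; the underlying function is unchanged.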
DeepSDF: Signed Distance Functions
DeepSDF
A neural signed distance function maps points to their signed distance from the surface:

$$f_\theta : \mathbb{R}^3 \to \mathbb{R}$$

where $f_\theta(\mathbf{x}) > 0$ outside the object, $f_\theta(\mathbf{x}) < 0$ inside, and $f_\theta(\mathbf{x}) = 0$ on the surface. The gradient $\nabla_{\mathbf{x}} f_\theta(\mathbf{x})$ gives the surface normal at any point.
DeepSDF has a geometric advantage over occupancy networks: the SDF value gives the distance to the nearest surface point, enabling efficient sphere tracing for rendering and providing a natural regularizer ($\|\nabla f_\theta\| = 1$ almost everywhere for a true SDF, the eikonal property).
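Sphere tracing exploits exactly this property: the SDF value is a safe step size, because no surface can be closer than the signed distance. A minimal sketch on an analytic sphere SDF standing in for a DeepSDF MLP:

```python
import numpy as np

def sdf_sphere(p, radius=1.0):
    """Analytic SDF of a unit sphere at the origin (stand-in for a DeepSDF MLP)."""
    return np.linalg.norm(p) - radius

def sphere_trace(origin, direction, sdf, max_steps=64, eps=1e-5, t_max=100.0):
    """March along the ray, stepping by the SDF value each iteration.
    Returns the hit point, or None if the ray escapes."""
    t = 0.0
    for _ in range(max_steps):
        p = origin + t * direction
        d = sdf(p)
        if d < eps:
            return p            # converged onto the zero level set
        t += d                  # safe step: nearest surface is d away
        if t > t_max:
            return None         # ray left the scene
    return None

hit = sphere_trace(np.array([0.0, 0.0, -3.0]),
                   np.array([0.0, 0.0, 1.0]), sdf_sphere)
print(hit)  # lands on (0, 0, -1), the front of the unit sphere
```

With a neural SDF the only change is that `sdf` calls the network, and the normal at the hit point comes from the gradient $\nabla f_\theta$ (via autodiff or finite differences).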
Gaussian Splatting
3D Gaussian Splatting (2023) represents scenes as a collection of 3D Gaussian primitives, each with position, covariance, color, and opacity. Rendering projects these Gaussians onto the image plane and alpha-composites them.
This is an explicit representation (a finite set of primitives with explicit parameters) rather than an implicit one (a function evaluated at query points). The key advantages:
- Rendering speed: Rasterization of Gaussians is much faster than ray marching through a neural field. Real-time rendering at high resolution is possible.
- Optimization: Each Gaussian's parameters are optimized directly via gradient descent on the rendering loss. Adaptive densification adds Gaussians where the reconstruction error is high.
The tradeoff is representational, not just numerical. Gaussian splatting stores many explicit scene primitives and renders them quickly. NeRF stores a compact implicit scene function and evaluates it at render time. Raw 3D Gaussian Splatting is often much faster to render, but later follow-on work such as 2D Gaussian Splatting adds geometric regularization because appearance quality and geometric accuracy are not identical goals.
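Per pixel, the rasterizer sorts the splats that cover it by depth and alpha-composites them front to back; no network is evaluated. A minimal numpy sketch of that compositing step for one pixel (the projection of each 3D Gaussian to a per-pixel opacity is assumed to have happened already):

```python
import numpy as np

def composite(colors, alphas):
    """Front-to-back alpha compositing of depth-sorted splats over one pixel.

    colors: (N, 3) splat colors, nearest splat first
    alphas: (N,) per-splat opacity after projecting the Gaussian to this pixel
    """
    out = np.zeros(3)
    transmittance = 1.0
    for c, a in zip(colors, alphas):
        out += transmittance * a * c       # contribution attenuated by what's in front
        transmittance *= 1.0 - a           # remaining light for splats behind
    return out

# A half-opaque green splat in front of an opaque red one.
colors = np.array([[0.0, 1.0, 0.0], [1.0, 0.0, 0.0]])
alphas = np.array([0.5, 1.0])
print(composite(colors, alphas))  # [0.5, 0.5, 0.0]
```

Note the structural similarity to the NeRF quadrature: both are transmittance-weighted sums. The difference is that here the per-primitive opacities are explicit optimized parameters rather than outputs of a network queried along a ray.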
Common Confusions
Neural fields are not neural networks that output meshes
A neural field is a function from coordinates to properties, evaluated pointwise. It does not output a mesh or point cloud directly. Extracting a mesh requires querying the field on a dense grid and running marching cubes (for occupancy/SDF) or rendering many views (for NeRF). The representation is continuous and implicit; the mesh is a derived output.
Occupancy networks, DeepSDF, and NeRF are not interchangeable
Occupancy networks and DeepSDF are geometry representations. Vanilla NeRF is a scene-specific radiance field for novel-view synthesis. If the task is shape completion, you care about surfaces and normals. If the task is view synthesis, you care about rendered color along rays. Putting these models under the same "neural fields" umbrella is useful, but pretending they solve the same problem is not.
NeRF requires posed images, not just any photo collection
NeRF needs accurate camera intrinsics and extrinsics (position and orientation) for each training image. These are typically obtained from structure-from-motion (SfM) tools like COLMAP. Without accurate poses, NeRF cannot learn a consistent 3D scene. Recent work (Nerfacto, BARF) jointly optimizes poses and the neural field, but this remains harder than the fixed-pose setting.
Continuous representation does not mean cheap evaluation
A continuous field removes voxel-grid discretization, but the cost moves to query-time evaluation. Vanilla NeRF still requires many samples per ray and one network evaluation per sample. Much of the systems work after NeRF, including hash encodings and splatting, is about reducing that query cost rather than changing the underlying 3D task.
Gaussian splatting is not a neural network
3D Gaussian Splatting uses gradient-based optimization but the scene representation is a set of Gaussians with explicit parameters, not a neural network. There are no learned weights, hidden layers, or activation functions. It is a differentiable rendering framework, not a neural field.
Summary
- Neural fields represent 3D scenes as continuous functions parameterized by neural networks
- NeRF maps (position, direction) to (color, density) and renders via volume integration
- Volume rendering is differentiable, enabling training from 2D images alone
- Occupancy networks use a binary classifier; DeepSDF uses signed distance
- Occupancy and SDF fields are geometry-first; NeRF and splatting are usually scene-specific rendering methods
- Gaussian splatting trades implicit compactness for explicit rendering speed
- Positional encoding is critical for representing high-frequency detail in MLPs
Exercises
Problem
A NeRF samples 64 points along each ray, and the image is 800x800 pixels. How many forward passes through the MLP are needed to render one image? If each forward pass takes 10 microseconds, how long does rendering take?
Problem
Explain why a standard MLP without positional encoding struggles to represent a scene with sharp edges and fine texture. What does the positional encoding specifically enable?
References
Canonical:
- Mildenhall et al., "NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis" (ECCV 2020)
- Mescheder et al., "Occupancy Networks: Learning 3D Reconstruction in Function Space" (CVPR 2019)
- Park et al., "DeepSDF: Learning Continuous Signed Distance Functions for Shape Representation" (CVPR 2019)
Current:
- Kerbl et al., "3D Gaussian Splatting for Real-Time Radiance Field Rendering" (SIGGRAPH 2023)
- Müller et al., "Instant Neural Graphics Primitives with a Multiresolution Hash Encoding" (SIGGRAPH 2022). Hash-grid acceleration for neural fields
- Tancik et al., "Fourier Features Let Networks Learn High Frequency Functions in Low Dimensional Domains" (NeurIPS 2020). Why positional encodings counter spectral bias
- Huang et al., "2D Gaussian Splatting for Geometrically Accurate Radiance Fields" (2024). A geometry-focused follow-on to 3D Gaussian Splatting
- Tewari et al., "Advances in Neural Rendering" (Eurographics STAR 2022)
Last reviewed: April 23, 2026