NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis
* Authors: [[Ben Mildenhall]], [[Pratul P. Srinivasan]], [[Matthew Tancik]], [[Jonathan T. Barron]], [[Ravi Ramamoorthi]], [[Ren Ng]]
First Impressions
NeRF proposes a neural radiance field representation of a 3D scene: a learned mapping from a 5D input (a 3D spatial location plus a 2D viewing direction) to the volume density and emitted color at that point.
Further reading: 都2022年了,我不允许你还不懂NeRF, an article by mathfinder on Zhihu.
Introduction
What is NeRF: We represent a static scene as a continuous 5D function that outputs the radiance emitted in each direction (θ, φ) at each point (x, y, z) in space, and a density at each point which acts like a differential opacity controlling how much radiance is accumulated by a ray passing through (x, y, z).
Optimization: minimize the error between each observed image and the corresponding view rendered from the representation.
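Concretely, training minimizes a squared photometric error between rendered and observed pixel colors. The sketch below is a minimal illustration under that assumption; the function name and shapes are hypothetical, not the authors' code.

```python
import torch

def photometric_loss(rendered: torch.Tensor, observed: torch.Tensor) -> torch.Tensor:
    """Mean squared error between rendered and ground-truth pixel colors,
    each a (R, 3) batch of RGB values for R rays."""
    return ((rendered - observed) ** 2).mean()
```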
Shortcomings of the basic NeRF representation:
- it does not converge to a sufficiently high-resolution representation
- it is inefficient in the required number of samples per camera ray
Addressed by: (1) transforming the input 5D coordinates with a positional encoding that enables the MLP to represent higher-frequency functions, and (2) a hierarchical sampling procedure that reduces the number of queries required to adequately sample this high-frequency scene representation.
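To make the first fix concrete, here is a minimal sketch of the sin/cos positional encoding $\gamma(p) = (\sin(2^0 \pi p), \cos(2^0 \pi p), \ldots, \sin(2^{L-1} \pi p), \cos(2^{L-1} \pi p))$ from the paper, applied independently to each input coordinate. The function name and tensor shapes are illustrative, not the authors' code.

```python
import math
import torch

def positional_encoding(p: torch.Tensor, L: int) -> torch.Tensor:
    """Map each coordinate of p to (sin(2^0 pi p), cos(2^0 pi p), ...,
    sin(2^{L-1} pi p), cos(2^{L-1} pi p)).

    p: (..., D) tensor of coordinates (D = 3 for locations or unit directions).
    Returns a (..., 2 * L * D) tensor of encoded features.
    """
    freqs = 2.0 ** torch.arange(L, dtype=p.dtype, device=p.device)   # 2^0 .. 2^{L-1}
    angles = math.pi * p[..., None] * freqs                          # (..., D, L)
    enc = torch.cat([torch.sin(angles), torch.cos(angles)], dim=-1)  # (..., D, 2L)
    return enc.flatten(-2)                                           # (..., 2*L*D)

# The paper uses L = 10 for the location x and L = 4 for the direction d,
# giving 60- and 24-dimensional encodings respectively.
gamma_x = positional_encoding(torch.rand(1024, 3), L=10)  # (1024, 60)
gamma_d = positional_encoding(torch.rand(1024, 3), L=4)   # (1024, 24)
```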
Contributions
- Representing continuous scenes with complex geometry and materials as 5D neural radiance fields, parameterized as basic MLP networks.
- A differentiable rendering procedure based on classical volume rendering techniques (see the sketch after this list).
- A positional encoding that maps each input 5D coordinate into a higher-dimensional space.
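For the second contribution, below is a minimal sketch of the quadrature rule the paper adopts from classical volume rendering, $\hat{C}(\mathbf{r}) = \sum_i T_i (1 - \exp(-\sigma_i \delta_i)) \mathbf{c}_i$ with $T_i = \exp(-\sum_{j<i} \sigma_j \delta_j)$. The function name, tensor shapes, and the large padding value for the last interval follow common NeRF reimplementations and are assumptions here, not the authors' exact code.

```python
import torch

def render_rays(sigma: torch.Tensor, color: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
    """Quadrature of the volume rendering integral along each ray.

    sigma: (R, N)    densities at N samples on each of R rays
    color: (R, N, 3) RGB predicted at each sample
    t:     (R, N)    depths of the samples along each ray
    Returns (R, 3) rendered pixel colors.
    """
    # delta_i = t_{i+1} - t_i; pad the last interval with a huge value so the
    # final sample absorbs any remaining radiance.
    delta = t[..., 1:] - t[..., :-1]
    delta = torch.cat([delta, torch.full_like(delta[..., :1], 1e10)], dim=-1)

    alpha = 1.0 - torch.exp(-sigma * delta)  # opacity contributed by sample i
    # T_i = exp(-sum_{j<i} sigma_j delta_j) = prod_{j<i} (1 - alpha_j),
    # implemented as an exclusive (right-shifted) cumulative product.
    trans = torch.cumprod(1.0 - alpha + 1e-10, dim=-1)
    trans = torch.cat([torch.ones_like(trans[..., :1]), trans[..., :-1]], dim=-1)

    weights = trans * alpha                      # per-sample compositing weights
    return (weights[..., None] * color).sum(dim=-2)
```

Because every operation here is differentiable, the photometric loss can be backpropagated through the renderer into the MLP's parameters.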
Related Work
- Neural 3D shape representations
  - Shortcomings:
    - require access to ground-truth 3D geometry
    - limited to simple shapes with low geometric complexity
- View synthesis and image-based rendering
  - gradient-based mesh optimization
    - optimizing a mesh from images alone is often difficult
    - requires a template mesh with fixed topology as initialization
  - volumetric representations
    - scaling to higher-resolution imagery is fundamentally limited by poor time and space complexity
Neural Radiance Field Scene Representation
inputs:
- 3D location $\mathbf{x} = (x, y, z)$
- 2D viewing direction $(\theta, \phi)$, expressed as a 3D Cartesian unit vector $\mathbf{d}$

outputs:
- color $\mathbf{c} = (r, g, b)$
- volume density $\sigma$
MLP network $F_\Theta: (\mathbf{x}, \mathbf{d}) \rightarrow(\mathbf{c}, \sigma)$
The network predicts:
- the density $\sigma$ as a function of only the location $\mathbf{x}$
- the RGB color $\mathbf{c}$ as a function of both location and viewing direction

Restricting $\sigma$ to the location alone keeps the scene geometry multiview-consistent, while letting color depend on the viewing direction allows non-Lambertian effects such as specular highlights.
The MLP $F_\Theta$:
- processes the input 3D coordinate $\mathbf{x}$ with 8 fully-connected layers (ReLU activations, 256 channels per layer) and outputs $\sigma$ and a 256-dimensional feature vector.
- This feature vector is then concatenated with the camera ray's viewing direction and passed to one additional fully-connected layer (ReLU activation, 128 channels) that outputs the view-dependent RGB color.
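Putting this together, here is a minimal PyTorch sketch of $F_\Theta$. The input dimensions assume the positional encodings above (60 dims for $\gamma(\mathbf{x})$ with L = 10, 24 dims for $\gamma(\mathbf{d})$ with L = 4), and the exact placement of the density and feature heads is a reasonable reading of the description rather than the authors' released code, which also adds a skip connection re-injecting $\gamma(\mathbf{x})$ at an intermediate layer (omitted here for brevity).

```python
import torch
import torch.nn as nn

class NeRFMLP(nn.Module):
    """Sketch of F_Theta: an 8-layer, 256-channel ReLU trunk on the encoded
    location, a density head, and a 128-channel RGB branch on the trunk
    feature concatenated with the encoded viewing direction."""

    def __init__(self, x_dim: int = 60, d_dim: int = 24, width: int = 256):
        super().__init__()
        layers = [nn.Linear(x_dim, width), nn.ReLU()]
        for _ in range(7):
            layers += [nn.Linear(width, width), nn.ReLU()]
        self.trunk = nn.Sequential(*layers)
        self.sigma_head = nn.Linear(width, 1)    # density depends on location only
        self.feature = nn.Linear(width, width)   # the 256-d feature vector
        self.rgb_hidden = nn.Linear(width + d_dim, 128)
        self.rgb_head = nn.Linear(128, 3)

    def forward(self, gamma_x: torch.Tensor, gamma_d: torch.Tensor):
        h = self.trunk(gamma_x)
        sigma = torch.relu(self.sigma_head(h)).squeeze(-1)  # keep density non-negative
        feat = self.feature(h)
        h = torch.relu(self.rgb_hidden(torch.cat([feat, gamma_d], dim=-1)))
        rgb = torch.sigmoid(self.rgb_head(h))               # colors in [0, 1]
        return rgb, sigma
```

Note that the viewing direction enters only the RGB branch, which is exactly how the architecture enforces the density/color split described above.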