NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis

* Authors: [[Ben Mildenhall]], [[Pratul P. Srinivasan]], [[Matthew Tancik]], [[Jonathan T. Barron]], [[Ravi Ramamoorthi]], [[Ren Ng]]


First Impressions

NeRF proposes a neural radiance field representation of 3D scenes: a function from a 5D input (a 3D spatial location plus a 2D viewing direction) to the volume density and color at that point in space.

Reference: 「都2022年了,我不允许你还不懂NeRF」, an article by mathfinder on Zhihu (知乎)

Introduction

What is NeRF: We represent a static scene as a continuous 5D function that outputs the radiance emitted in each direction (θ, φ) at each point (x, y, z) in space, and a density at each point which acts like a differential opacity controlling how much radiance is accumulated by a ray passing through (x, y, z).

Optimization: minimize the error between each observed image and the corresponding views rendered from the representation.
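
A minimal sketch of this objective in PyTorch (the tensors here are stand-ins for the renderer's output and the observed pixels; all names are illustrative):

```python
import torch

# Stand-ins: `pred_rgb` would come from rendering rays through the radiance
# field; `target_rgb` holds the observed pixel colors for the same rays.
pred_rgb = torch.rand(1024, 3, requires_grad=True)
target_rgb = torch.rand(1024, 3)

# Photometric loss: mean squared error between rendered and observed colors.
loss = torch.mean((pred_rgb - target_rgb) ** 2)
loss.backward()  # gradients flow back through the (differentiable) renderer
```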

Shortcomings of the basic NeRF representation:

  1. it does not converge to a sufficiently high-resolution representation
  2. it is inefficient in the required number of samples per camera ray

Addressed by: transforming input 5D coordinates with a positional encoding that enables the MLP to represent higher-frequency functions, and a hierarchical sampling procedure that reduces the number of queries required to adequately sample this high-frequency scene representation.
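
The positional encoding is $\gamma(p) = (\sin(2^0 \pi p), \cos(2^0 \pi p), \ldots, \sin(2^{L-1} \pi p), \cos(2^{L-1} \pi p))$, applied to each coordinate separately. A minimal PyTorch sketch (the function name is mine; the paper uses $L = 10$ for locations and $L = 4$ for directions):

```python
import math
import torch

def positional_encoding(p: torch.Tensor, num_freqs: int) -> torch.Tensor:
    """Map each coordinate to (sin(2^k * pi * p), cos(2^k * pi * p)), k < num_freqs."""
    freqs = (2.0 ** torch.arange(num_freqs)) * math.pi   # 2^0*pi ... 2^(L-1)*pi
    angles = p[..., None] * freqs                        # (..., dim, L)
    enc = torch.cat([torch.sin(angles), torch.cos(angles)], dim=-1)
    return enc.flatten(-2)                               # (..., dim * 2L)

x = torch.rand(4, 3)                                     # a batch of 3D locations
print(positional_encoding(x, 10).shape)                  # torch.Size([4, 60])
```

The hierarchical procedure trains a coarse and a fine network and re-samples points along each ray in proportion to the coarse network's compositing weights; that step is omitted here.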

Contributions

  1. An approach for representing continuous scenes with complex geometry and materials as 5D neural radiance fields, parameterized as basic MLP networks.
  2. A differentiable rendering procedure based on classical volume rendering techniques (a sketch follows this list).
  3. A positional encoding to map each input 5D coordinate into a higher-dimensional space.
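
The rendering procedure in item 2 discretizes the volume rendering integral as $\hat{C}(\mathbf{r}) = \sum_i T_i (1 - \exp(-\sigma_i \delta_i)) \mathbf{c}_i$ with transmittance $T_i = \exp(-\sum_{j<i} \sigma_j \delta_j)$, where $\delta_i$ is the distance between adjacent samples. A minimal PyTorch sketch of that compositing step (names and shapes are illustrative):

```python
import torch

def composite(sigma: torch.Tensor, rgb: torch.Tensor, deltas: torch.Tensor) -> torch.Tensor:
    """Alpha-composite per-sample densities and colors into one color per ray.

    sigma:  (N_rays, N_samples)     densities at the ray samples
    rgb:    (N_rays, N_samples, 3)  colors at the ray samples
    deltas: (N_rays, N_samples)     distances between adjacent samples
    """
    alpha = 1.0 - torch.exp(-sigma * deltas)             # opacity of each segment
    # T_i = prod_{j<i} (1 - alpha_j): transmittance accumulated before sample i
    trans = torch.cumprod(
        torch.cat([torch.ones_like(alpha[:, :1]), 1.0 - alpha + 1e-10], dim=-1),
        dim=-1,
    )[:, :-1]
    weights = trans * alpha                              # (N_rays, N_samples)
    return (weights[..., None] * rgb).sum(dim=-2)        # (N_rays, 3)
```

Because every operation above is differentiable, the photometric loss can be backpropagated all the way into $F_\Theta$.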

Related Work

  • Shortcomings of neural 3D shape representations:
    1. they require access to ground-truth 3D geometry
    2. they are limited to simple shapes with low geometric complexity
  • View synthesis and image-based rendering
    • gradient-based mesh optimization
      1. optimization based on image reprojection is often difficult (local minima, poorly conditioned loss landscape)
      2. requires a template mesh with fixed topology to be provided as an initialization
    • volumetric representations
      1. scaling to higher-resolution imagery is fundamentally limited by poor time and space complexity

Neural Radiance Field Scene Representation

inputs:

  • 3D location $\mathbf{x} = (x, y, z)$
  • 2D viewing direction $(\theta, \phi)$, expressed as a 3D Cartesian unit vector $\mathbf{d}$

outputs:

  • color $\mathbf{c} = (r, g, b)$
  • volume density $\sigma$

MLP network $F_\Theta: (\mathbf{x}, \mathbf{d}) \rightarrow(\mathbf{c}, \sigma)$

The network predicts:

  1. the density $\sigma$ as a function of only the location $\mathbf{x}$
  2. the RGB color $\mathbf{c}$ as a function of both location and viewing direction

The MLP $F_\Theta$:

  1. processes the input 3D coordinate $\mathbf{x}$ with 8 fully-connected layers (using ReLU activations and 256 channels per layer), and outputs $\sigma$ and a 256-dimensional feature vector.
  2. This feature vector is then concatenated with the camera ray’s viewing direction and passed to one additional fully-connected layer (using a ReLU activation and 128 channels) that outputs the view-dependent RGB color (a sketch follows below).
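
A minimal PyTorch sketch of this architecture (positional encoding and the paper's skip connection are omitted, so the input sizes below are the raw coordinate dimensions):

```python
import torch
import torch.nn as nn

class NeRFMLP(nn.Module):
    """F_Theta as described above: x -> (feature, sigma), then (feature, d) -> rgb."""

    def __init__(self, x_dim: int = 3, d_dim: int = 3):
        super().__init__()
        layers = [nn.Linear(x_dim, 256), nn.ReLU()]
        for _ in range(7):                      # 8 fully-connected layers in total
            layers += [nn.Linear(256, 256), nn.ReLU()]
        self.trunk = nn.Sequential(*layers)
        self.sigma_head = nn.Linear(256, 1)     # density depends on location only
        self.feature = nn.Linear(256, 256)      # 256-dimensional feature vector
        self.rgb_head = nn.Sequential(          # one 128-channel layer, then RGB
            nn.Linear(256 + d_dim, 128), nn.ReLU(),
            nn.Linear(128, 3), nn.Sigmoid(),    # sigmoid keeps colors in [0, 1]
        )

    def forward(self, x: torch.Tensor, d: torch.Tensor):
        h = self.trunk(x)
        sigma = torch.relu(self.sigma_head(h))  # ReLU keeps the density non-negative
        rgb = self.rgb_head(torch.cat([self.feature(h), d], dim=-1))
        return rgb, sigma

model = NeRFMLP()
rgb, sigma = model(torch.rand(1024, 3), torch.rand(1024, 3))  # (1024, 3), (1024, 1)
```

Because $\mathbf{d}$ enters only after $\sigma$ is produced, the density is view-independent by construction, while the color can vary with viewing direction.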