High-resolution 3D-Consistent Image Synthesis with Neural Fields

Photo-realistic free-view image synthesis of real-world scenes is a long-standing problem in computer vision and computer graphics. The traditional graphics pipeline requires production-quality 3D models, computationally expensive rendering, and manual work, which makes it hard to apply to large-scale image synthesis across a wide range of real-world scenes. Meanwhile, generative models such as GANs can be trained on large collections of unstructured images to synthesize high-quality images. However, most generative models operate in 2D space. They therefore lack a 3D understanding of the training images, so they cannot synthesize multi-view-consistent images of the same 3D scene, and they offer no direct 3D camera control over the generated images.

Our goal is to explore 3D-aware deep generative models that can naturally synthesize high-resolution, multi-view-consistent images. By combining Neural Radiance Fields (NeRF) with generative models, it is possible to build 3D structure into the model. One output of this research is StyleNeRF, a new 3D-aware generative model for high-resolution, 3D-consistent image synthesis at interactive rates. It also allows control of the 3D camera pose and of specific style attributes.

However, multiple challenges remain in pushing this direction forward toward (1) general unaligned objects, (2) accurate geometry, and (3) dynamic and scene-level generation.

Non-autoregressive Sequence Generation