Dynamic NeRF Methods Comparison
A comparison study of various dynamic NeRF methods
This project is a comparison study of various dynamic neural rendering methods: we compare HyperNeRF against Nerfies and TiNeuVox on several real-world scenes captured with an iPhone 13 Pro. Notably, this project was done in the summer of 2023, right before 3D Gaussian Splatting was published.
Overview
We train HyperNeRF on custom Nerfies-format datasets and render RGB + depth videos. Where necessary, we rewrite the rendering scripts of the various methods, such as the JAX code in Nerfies.
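For context, a Nerfies-format dataset (as produced by the Nerfies capture-processing notebook) stores its train/val split in a `dataset.json` file. A minimal sketch of reading that split, assuming the standard layout with `"train_ids"` and `"val_ids"` lists (the helper name and the example ids are ours):

```python
import json
import os
import tempfile


def load_split(root):
    """Read the train/val split from a Nerfies-format dataset.

    Assumes the standard layout produced by the Nerfies processing
    notebook: a dataset.json containing "train_ids" and "val_ids".
    """
    with open(os.path.join(root, "dataset.json")) as f:
        dataset = json.load(f)
    return dataset["train_ids"], dataset["val_ids"]


# Tiny self-contained example with hypothetical image ids.
root = tempfile.mkdtemp()
with open(os.path.join(root, "dataset.json"), "w") as f:
    json.dump({"ids": ["000", "001", "002"],
               "train_ids": ["000", "002"],
               "val_ids": ["001"]}, f)

train_ids, val_ids = load_split(root)
print(train_ids, val_ids)  # ['000', '002'] ['001']
```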
Comparison
The models were mostly trained for about 80k steps (200k for the coffee scene, 150k for the drink scene), as beyond that the PSNR on the val set only improves marginally (by about 1 dB for the face scene when training for 250k vs. 80k steps) and, subjectively, the quality of the novel-view renders no longer improves. We note that for HyperNeRF the authors used a more advanced config for their training, as explained in their Configuration section; they explicitly state that the standard config will give worse results. However, we were not able to run their novel-view config, as it requires extra data processing for custom data (creating a points.npy file to obtain a point cloud).
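Since PSNR is the metric used throughout this comparison, for reference: for images normalized to [0, max], PSNR = 20·log10(max) − 10·log10(MSE). A minimal sketch:

```python
import numpy as np


def psnr(pred, gt, max_val=1.0):
    """Peak signal-to-noise ratio (dB) between two images in [0, max_val]."""
    mse = np.mean((pred.astype(np.float64) - gt.astype(np.float64)) ** 2)
    return 20 * np.log10(max_val) - 10 * np.log10(mse)


# Example: a prediction off by 0.1 everywhere -> MSE = 0.01 -> 20 dB.
gt = np.zeros((4, 4, 3))
pred = gt + 0.1
print(round(psnr(pred, gt), 2))  # 20.0
```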
Comparison Notes
In this section we compare HyperNeRF with Nerfies, InstantNGP & TiNeuVox. For Nerfies, we also use a rendering script that merges their eval script and rendering notebook; this file can be found inside the nerfies/ folder. Fortunately, both the Python packages and the dataset format from HyperNeRF can be used out of the box, as HyperNeRF is based on Nerfies. TiNeuVox also uses the same dataset format as Nerfies and HyperNeRF; we just need to create a .py config for the dataset, similar to the ones in their repo. We note that training TiNeuVox on custom data requires setting the background loss in the config to False, as the Nerfies notebook does not compute the point cloud out of the box. We also slightly modify TiNeuVox/load_hyper.py, because the timestamp key is missing in the processed Nerfies format for custom data.
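The idea behind our load_hyper.py modification is to fall back to a timestamp derived from each image's time_id when no explicit timestamp is present. A sketch of that fallback, assuming the Nerfies metadata layout where metadata.json maps each image id to a per-image dict with a `time_id` (the helper name and example data are ours):

```python
def derive_timestamps(metadata, ids):
    """Fallback: derive normalized timestamps in [0, 1] from time_id
    when the processed Nerfies metadata lacks an explicit timestamp key.

    metadata: dict mapping image id -> per-image dict (metadata.json).
    ids: ordered list of image ids for the split.
    """
    time_ids = [metadata[i]["time_id"] for i in ids]
    t_max = max(time_ids)
    # Guard against division by zero for a single-frame (static) capture.
    if t_max == 0:
        return [0.0 for _ in time_ids]
    return [t / t_max for t in time_ids]


# Hypothetical three-frame capture:
meta = {"000": {"time_id": 0}, "001": {"time_id": 1}, "002": {"time_id": 2}}
print(derive_timestamps(meta, ["000", "001", "002"]))  # [0.0, 0.5, 1.0]
```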
Face Scene (Comparison)
| Method | PSNR ↑ |
|---|---|
| HyperNeRF | 30.65 (Train) / 29.04 (Val) |
| Nerfies | 28.12 (Train) / 26.74 (Val) |
| TiNeuVox | 31.12 (Train) / 29.40 (Val) |
Here, HyperNeRF outperforms Nerfies on the metrics, and its colours look a bit more photorealistic, whereas Nerfies looks somewhat pale. However, some noise is present in the HyperNeRF novel-view rendering that is absent for Nerfies. We also make a visual comparison with InstantNGP; as we were not able to figure out how to obtain its metrics, this comparison is only qualitative. Due to the simplicity of this scene, InstantNGP also performs quite well, as long as the camera views stay centered around the captured trajectory. Similar to the results reported on the HyperNeRF dataset, TiNeuVox also outperforms HyperNeRF. Unfortunately, novel-view video rendering is not implemented for TiNeuVox, so only rendering of train and eval images is possible with the open-source repo.
Smoothie Scene (Comparison)
| Method | PSNR ↑ |
|---|---|
| HyperNeRF | 30.65 (Train) / 29.04 (Val) |
| TiNeuVox | 30.92 (Train) / 29.55 (Val) |
| Nerfies | 29.02 (Train) / 27.39 (Val) |
This scene is more challenging: Nerfies produces an inconsistent colour within the mixer. InstantNGP also struggles with this scene, especially at more extreme viewpoints, as it does not model dynamics; we therefore use a more centered camera path for it than for Nerfies and HyperNeRF. TiNeuVox outperforms all other methods, being slightly better than HyperNeRF.
Runtime Comparison
It should also be noted that InstantNGP trains significantly faster (within a few minutes), whereas Nerfies and HyperNeRF were trained for 1-2 hours on the scenes above. TiNeuVox also trains about 3-6x faster than HyperNeRF & Nerfies, finishing in about 20 minutes for a scene with 100 images.
Outputs
Face Scene
Coffee Scene
Smoothie Scene
Cat Scene
Drink Scene
Baseline Comparisons
References
- Park, Keunhong, et al. “HyperNeRF: A Higher-Dimensional Representation for Topologically Varying Neural Radiance Fields.” ACM Transactions on Graphics, vol. 40, no. 6, 2021.
- Park, Keunhong, et al. “Nerfies: Deformable Neural Radiance Fields.” Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021.
- Müller, Thomas, et al. “Instant Neural Graphics Primitives with a Multiresolution Hash Encoding.” ACM Transactions on Graphics, 2022.
- Fang, Jiemin, et al. “Fast Dynamic Radiance Fields with Time-Aware Neural Voxels.” SIGGRAPH Asia 2022 Conference Papers, 2022, pp. 1-9.
- Kerbl, Bernhard, et al. “3D Gaussian Splatting for Real-Time Radiance Field Rendering.” ACM Transactions on Graphics, vol. 42, no. 4, 2023, Article 139.