distillnerf

[NeurIPS 2024] DistillNeRF: Perceiving 3D Scenes from Single-Glance Images by Distilling Neural Fields and Foundation Model Features

https://github.com/nvlabs/distillnerf

Science Score: 54.0%

This score indicates how likely this project is to be science-related based on various indicators:

✓ CITATION.cff file (found)
✓ codemeta.json file (found)
✓ .zenodo.json file (found)
○ DOI references
✓ Academic publication links: arxiv.org, scholar.google
○ Academic email domains
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: low (11.1%)

Last synced: 6 months ago

Repository

Basic Info

Host: GitHub
Owner: NVlabs
License: other
Language: Python
Default Branch: main
Homepage: https://distillnerf.github.io/
Size: 5.06 MB

Statistics

Stars: 29
Watchers: 7
Forks: 1
Open Issues: 2
Releases: 0

Created over 1 year ago · Last pushed about 1 year ago

Metadata Files

Readme License Citation

[NeurIPS 2024] DistillNeRF: Perceiving 3D Scenes from Single-Glance Images by Distilling Neural Fields and Foundation Model Features

We introduce DistillNeRF, a generalizable model for 3D scene representation, self-supervised by natural sensor streams along with distillation from offline NeRFs and vision foundation models. It supports rendering RGB, depth, and foundation feature images, without test-time per-scene optimization, and enables downstream tasks such as zero-shot 3D semantic occupancy prediction and open-vocabulary text queries.

DistillNeRF overview

Letian Wang, Seung Wook Kim, Jiawei Yang, Cunjun Yu, Boris Ivanovic, Steven L. Waslander, Yue Wang, Sanja Fidler, Marco Pavone, Peter Karkus

Webpage: https://distillnerf.github.io/
Paper: https://arxiv.org/abs/2406.12095
Video: https://www.youtube.com/watch?v=HRRmYGubTEU

Installation
Code Structure
Dataset preparation
Run code
- Data Inspection
- Training
Visualizations with trained models
Citation
Acknowledgement
Licence

Installation

Our code is developed on Ubuntu 22.04 using Python 3.8 and PyTorch 1.13.1+cu116. Please note that the code has only been tested with these specified versions. We recommend using conda for the installation of dependencies.

Create the distillnerf conda environment and install all dependencies:

```shell
conda create -n distillnerf python=3.8 -y
conda activate distillnerf
. setup.sh
export PYTHONPATH=.
```
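Because the code has only been tested with Python 3.8 and PyTorch 1.13.1+cu116, it can help to confirm your environment matches before running anything. The helper below is a minimal sketch, not a script shipped with the repo; `check_versions` is a hypothetical name, and it takes version values as arguments so it can also be exercised without PyTorch installed.

```python
import sys

# Versions the README reports as tested; other versions are untested, not necessarily broken.
TESTED_PYTHON = (3, 8)
TESTED_TORCH = "1.13.1+cu116"


def check_versions(python_version, torch_version):
    """Return warnings for versions that differ from the tested setup.

    python_version: tuple such as sys.version_info[:2]
    torch_version: string such as str(torch.__version__), or None if torch is missing
    """
    warnings = []
    if tuple(python_version[:2]) != TESTED_PYTHON:
        warnings.append(
            f"Python {python_version[0]}.{python_version[1]} differs from tested 3.8"
        )
    if torch_version is None:
        warnings.append("PyTorch is not installed")
    elif torch_version != TESTED_TORCH:
        warnings.append(f"PyTorch {torch_version} differs from tested {TESTED_TORCH}")
    return warnings


if __name__ == "__main__":
    try:
        import torch
        installed_torch = str(torch.__version__)
    except ImportError:
        installed_torch = None
    for message in check_versions(sys.version_info[:2], installed_torch):
        print("WARNING:", message)
```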

Code Structure

Our code base is built on mmdetection3d; we implement our model, config files, data loader, losses, hooks, and utils in projects/DistillNeRF.

    DistillNeRF
    ├── ...
    ├── projects/
    │   ├── DistillNeRF/
    │   │   ├── configs     # Config files
    │   │   ├── datasets    # Customized data loader
    │   │   ├── hooks       # Customized WandB logging hooks
    │   │   ├── losses      # Customized losses
    │   │   ├── models      # DistillNeRF model, model wrapper, and model components
    │   │   ├── modules     # Other components used in our model
    │   │   ├── pipelines   # Customized data reading
    │   │   ├── utils       # Util tools
    ├── ...

We also provide scripts for creating a Docker environment in docker, running different variants of our model in sample_scripts, launching training on Slurm in slurm_scripts, and some visualization tools in tools.

Dataset preparation

NuScenes Dataset: See NuScenes Dataset Preparation for detailed instructions on preparing the NuScenes dataset and the additional required files.
Waymo Dataset: See Waymo NOTR Dataset Preparation for detailed instructions on preparing the Waymo dataset.

Prepare depth images for distillation

In our paper, we train offline per-scene NeRFs, render depth images from them, and save these to the data/nuscenes/ or data/waymo/ directory, respectively. We'll release the depth images used in our paper soon.

In this repo, we also provide some temporary data so that you can at least run through the code and train a model without depth supervision from per-scene NeRFs. To do so, download this Temporary File and place it in the root directory of the repo. We have set the SKIP_MISSING parameter in the dataset config (e.g. projects.DistillNeRF.configs.datasets.dataset_config.py) to True so that the dataloader loads these temporary data files. When you start training your model, set SKIP_MISSING back to False to avoid mis-loading data.
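The role of SKIP_MISSING can be pictured with the hypothetical loader below. This is an illustrative sketch, not the repo's dataloader: it assumes that with the flag on, samples whose depth file is absent are dropped, and with it off, a missing file raises an error so mis-loaded data cannot go unnoticed. The function name `filter_samples` and the `depth_path` key are assumptions.

```python
import os


def filter_samples(samples, skip_missing):
    """Keep samples whose depth file exists (hypothetical illustration).

    samples: list of dicts, each with a hypothetical "depth_path" key.
    skip_missing: mirrors the role of SKIP_MISSING in the dataset config.
    """
    kept = []
    for sample in samples:
        if os.path.exists(sample["depth_path"]):
            kept.append(sample)
        elif skip_missing:
            # Temporary-data mode: drop samples without rendered depth.
            continue
        else:
            # Full-training mode: a missing file signals mis-placed data, so fail loudly.
            raise FileNotFoundError(sample["depth_path"])
    return kept
```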

Please use these data only if you have agreed to the nuScenes terms for non-commercial use (https://www.nuscenes.org/nuscenes). The preprocessed dataset is released under the CC BY-NC-SA 4.0 licence.

Download auxiliary models

From DepthAnything, download the pretrained depth-anything-base weights, which are used to generate depth features.

Download the PointRend model weights, which are used to generate sky masks.

Create a new directory named aux_models and place both model files in it.
Folder structure: the final directory should look like this:

    DistillNeRF
    ├── ...
    ├── aux_models/
    │   ├── depth_anything_vitb14.pth
    │   ├── model_final_cf6ac1.pkl
    ├── checkpoint/
    ├── data/
    │   ├── nuscenes/
    │   │   ├── maps/
    │   │   ├── samples/
    │   │   ├── sweeps/
    │   │   ├── v1.0-test/
    │   │   ├── nuscenes_infos_train_sweeps.pkl
    │   │   ├── nuscenes_infos_val_sweeps.pkl
    │   │   ├── nuscenes_infos_val_temporal_v2.pkl
    │   ├── waymo/
    │   │   ├── kitti_format/
    │   │   ├── waymo_format/
    ├── templt_files/
    ├── ...
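To confirm the folder structure above before launching anything, a small check like the following may help. It is a sketch rather than a repo tool: the list covers only the auxiliary-model and nuScenes entries shown in the tree (Waymo entries are left out, since you may use only one dataset), and `missing_paths` is a hypothetical helper name.

```python
from pathlib import Path

# Entries taken from the folder structure above, relative to the repo root.
EXPECTED_PATHS = [
    "aux_models/depth_anything_vitb14.pth",
    "aux_models/model_final_cf6ac1.pkl",
    "checkpoint",
    "data/nuscenes/maps",
    "data/nuscenes/samples",
    "data/nuscenes/sweeps",
    "data/nuscenes/v1.0-test",
    "data/nuscenes/nuscenes_infos_train_sweeps.pkl",
    "data/nuscenes/nuscenes_infos_val_sweeps.pkl",
    "data/nuscenes/nuscenes_infos_val_temporal_v2.pkl",
]


def missing_paths(repo_root, expected=EXPECTED_PATHS):
    """Return the expected entries that do not exist under repo_root."""
    root = Path(repo_root)
    return [rel for rel in expected if not (root / rel).exists()]


if __name__ == "__main__":
    # Run from the repo root; prints nothing when the layout is complete.
    for rel in missing_paths("."):
        print("missing:", rel)
```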

Run code

Here we provide scripts for visualizing the data, training, and visualizing predictions. If you're running the model locally with limited compute, you can append the following argument to your command (after --cfg-options) so that the model loads only 1 camera instead of 6, which should run on most machines:

model.num_camera=1
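When combining several overrides, all key=value pairs follow a single --cfg-options flag, as in the repo's data-inspection and training commands. The sketch below assembles such a command with model.num_camera=1 added; `build_command` is a hypothetical helper, not a repo script, and it only prints the command rather than running it.

```python
import shlex


def build_command(script, config, cfg_options, extra_args=()):
    """Assemble an argv list, placing every key=value override after --cfg-options."""
    argv = ["python", script, config, *extra_args]
    if cfg_options:
        argv.append("--cfg-options")
        argv.extend(f"{key}={value}" for key, value in cfg_options.items())
    return argv


cmd = build_command(
    "tools/train.py",
    "./projects/DistillNeRF/configs/model_wrapper/model_wrapper.py",
    {"model.visualize_imgs": True, "model.num_camera": 1},
    extra_args=["--seed", "0", "--work-dir=../work_dir_debug"],
)
# shlex.join (Python 3.8+) renders the list as a copy-pasteable shell command.
print(shlex.join(cmd))
```

To actually launch it, the list could be passed to subprocess.run(cmd, check=True) from the repo root.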

Below we provide commands for NuScenes; for Waymo, please refer to the last section here.

Data Inspection

Before starting training, you may want to inspect the data and some initial predictions from our model. We've included scripts for visualizing them.

visualize images

To run one DistillNeRF model (with parameterized space and depth distillation), use

python tools/train.py ./projects/DistillNeRF/configs/model_wrapper/model_wrapper.py --seed 0 --work-dir=../work_dir_debug --cfg-options model.visualize_imgs=True

Sample scripts for more models can be found in sample_scripts/visualize_images.

visualize voxels

To run one DistillNeRF model (with parameterized space and depth distillation), use

python tools/train.py ./projects/DistillNeRF/configs/model_wrapper/model_wrapper.py --seed 0 --work-dir=../work_dir_debug --cfg-options model.visualize_voxels=True

Here we simply set visualize_voxels to True instead of visualize_imgs. Sample scripts for more models can be found in sample_scripts/visualize_voxels.

Training

Wandb setup (optional)

Before you start training, you may want to set up wandb to log metrics and predictions.

Run wandb online or wandb offline to choose whether the logs are uploaded to the cloud or saved locally.

To set up your wandb account, follow the wandb prompt after you launch training. Alternatively, uncomment these lines in tools/train.py and add your WANDB_API_KEY in advance:

os.environ["WANDB_API_KEY"] = 'YOUR_WANDB_API_KEY'
os.environ["WANDB_MODE"] = "online"
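If you prefer not to edit tools/train.py, the same environment variables can be set from a small launcher before training starts. This sketch assumes, as the lines above do, that wandb reads WANDB_MODE and WANDB_API_KEY from the environment when it initializes; `configure_wandb` is a hypothetical helper, and "disabled" is wandb's mode for turning logging off entirely.

```python
import os


def configure_wandb(mode="offline", api_key=None):
    """Set wandb environment variables before wandb is imported or initialized.

    mode: "online" uploads runs to the cloud, "offline" keeps them local,
          "disabled" turns wandb logging off.
    api_key: optional key; leave as None to be prompted by wandb instead.
    """
    if mode not in {"online", "offline", "disabled"}:
        raise ValueError(f"unsupported WANDB_MODE: {mode}")
    os.environ["WANDB_MODE"] = mode
    if api_key is not None:
        os.environ["WANDB_API_KEY"] = api_key
    return mode
```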

Training script

To train one DistillNeRF model (without depth distillation or parameterized space), use

python tools/train.py ./projects/DistillNeRF/configs/model_wrapper/model_wrapper_linearspace_no_depth_distilll.py --seed 0 --work-dir=../work_dir_debug

We also provide an example Slurm script for training this model:

. ./slurm_scripts/launch_nuscenes_linearspace_no_depth_distill.sh

For scripts covering other variants of our model, refer to sample_scripts/training.

Visualizations with trained models

Download the Trained Model Weights to inspect visualized predictions from our model.

Please use these models only if you have agreed to the nuScenes terms for non-commercial use (https://www.nuscenes.org/nuscenes). The models are released under the CC BY-NC-SA 4.0 licence.

Once you have a trained model, you can use the scripts below to visualize images and voxels and to perform novel-view synthesis. Visualizations are saved to a default directory. To skip saving them, append model.save_visualized_imgs=False to your command; to change the output directory, append model.vis_save_directory=YOUR_VIS_DIR.
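Each option appended after --cfg-options is a dotted path into the nested config, so model.vis_save_directory=YOUR_VIS_DIR overrides the vis_save_directory field inside the model section. The sketch below shows the idea with plain dictionaries; it is a simplified stand-in for mmcv's config merging, not its implementation, and the default values in `base` are made up for illustration rather than taken from the repo's configs.

```python
import copy


def merge_dotted(cfg, overrides):
    """Return a copy of a nested dict with "a.b.c"-style keys overridden."""
    merged = copy.deepcopy(cfg)
    for dotted_key, value in overrides.items():
        *parents, leaf = dotted_key.split(".")
        node = merged
        # Walk (and create if needed) the intermediate sections.
        for name in parents:
            node = node.setdefault(name, {})
        node[leaf] = value
    return merged


# Hypothetical defaults, for illustration only.
base = {"model": {"visualize_imgs": False, "save_visualized_imgs": True}}
cfg = merge_dotted(base, {
    "model.visualize_imgs": True,
    "model.save_visualized_imgs": False,
    "model.vis_save_directory": "YOUR_VIS_DIR",
})
print(cfg["model"])
```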

visualize images

To run one DistillNeRF model (without parameterized space), use

python ./tools/visualization.py ./projects/DistillNeRF/configs/model_wrapper/model_wrapper_linearspace.py ./checkpoint/model_linearspace.pth --cfg-options model.visualize_imgs=True

For more examples, refer to sample_scripts/visualization_images_with_model.

visualize voxels

To run one DistillNeRF model (with parameterized space), use

python ./tools/visualization.py ./projects/DistillNeRF/configs/model_wrapper/model_wrapper.py ./checkpoint/model.pth --cfg-options model.visualize_voxels=True

For more examples, refer to sample_scripts/visualization_voxels_with_model.
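A voxel visualization typically starts by turning a density or occupancy grid into the 3D coordinates of occupied cells, which can then be plotted as a point cloud. The sketch below assumes a uniform, linear-space grid with a known voxel size and origin; for the parameterized-space model, grid indices would additionally have to be mapped back through the inverse parameterization, which is not covered here. `occupied_voxel_centers` and the example grid values are hypothetical.

```python
import numpy as np


def occupied_voxel_centers(density, voxel_size, origin, threshold=0.5):
    """Return (N, 3) world coordinates of voxel centers whose density exceeds threshold.

    density: (X, Y, Z) array of predicted densities or occupancy probabilities
    voxel_size: edge length of one voxel, in meters
    origin: world coordinate of the outer corner of voxel (0, 0, 0)
    """
    indices = np.argwhere(density > threshold)  # (N, 3) integer grid indices
    return np.asarray(origin, dtype=float) + (indices + 0.5) * voxel_size


# Example: a small grid with a single occupied cell.
grid = np.zeros((100, 100, 8))
grid[50, 50, 2] = 1.0
points = occupied_voxel_centers(grid, voxel_size=0.4, origin=(-20.0, -20.0, -1.0))
```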

foundation model feature visualization

To visualize DINO features, run

python ./tools/visualization.py ./projects/DistillNeRF/configs/model_wrapper/model_wrapper_linearspace_dino.py ./checkpoint/model_linearspace_dino.pth --cfg-options model.visualize_foundation_model_feat=True

To visualize CLIP features, run

python ./tools/visualization.py ./projects/DistillNeRF/configs/model_wrapper/model_wrapper_linearspace_clip.py ./checkpoint/model_linearspace_clip.pth --cfg-options model.visualize_foundation_model_feat=True
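Foundation-model features such as DINO and CLIP have far more than three channels, so they must be compressed to three channels before they can be shown as an image. A common way to do this is a PCA projection; the sketch below assumes that approach purely for illustration, and tools/visualization.py may do something different. The random array stands in for a rendered feature image, and the feature map needs at least 3 channels.

```python
import numpy as np


def features_to_rgb(features):
    """Project an (H, W, C) feature map (C >= 3) to (H, W, 3) colors in [0, 1] via PCA."""
    h, w, c = features.shape
    flat = features.reshape(-1, c).astype(np.float64)
    flat = flat - flat.mean(axis=0, keepdims=True)
    # Rows of vt are the principal directions of the centered feature vectors.
    _, _, vt = np.linalg.svd(flat, full_matrices=False)
    projected = flat @ vt[:3].T
    # Min-max normalize each component independently so it maps to one color channel.
    lo = projected.min(axis=0, keepdims=True)
    hi = projected.max(axis=0, keepdims=True)
    rgb = (projected - lo) / np.maximum(hi - lo, 1e-8)
    return rgb.reshape(h, w, 3)


rng = np.random.default_rng(0)
dino_like = rng.normal(size=(32, 48, 64))  # stand-in for a rendered feature image
rgb = features_to_rgb(dino_like)
```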

open-vocabulary query

To run an open-vocabulary query, use

python ./tools/visualization.py ./projects/DistillNeRF/configs/model_wrapper/model_wrapper_linearspace_clip.py ./checkpoint/model_linearspace_clip.pth --cfg-options model.language_query=True
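Open-vocabulary queries over distilled CLIP features are commonly scored by the cosine similarity between a CLIP text embedding of the query and each per-pixel or per-voxel CLIP feature. The sketch below illustrates that general technique only; how the repo obtains text embeddings, normalizes scores, or thresholds them is not described here, so the helper names and the 0.5 default threshold are assumptions, and the random features are placeholders.

```python
import numpy as np


def query_relevance(features, text_embedding):
    """Cosine similarity between each feature vector and a text embedding.

    features: (..., C) CLIP-space features, e.g. rendered per pixel or per voxel
    text_embedding: (C,) CLIP text embedding of the query
    """
    feats = features / np.linalg.norm(features, axis=-1, keepdims=True)
    text = text_embedding / np.linalg.norm(text_embedding)
    return feats @ text


def query_mask(features, text_embedding, threshold=0.5):
    """Boolean mask of locations whose similarity to the query exceeds threshold."""
    return query_relevance(features, text_embedding) > threshold


rng = np.random.default_rng(0)
clip_feats = rng.normal(size=(4, 4, 8))  # placeholder feature image
text = clip_feats[1, 2].copy()  # placeholder "text embedding" matching one pixel
mask = query_mask(clip_feats, text, threshold=0.99)
```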

novel-view synthesis - RGB

To run one DistillNeRF model (without parameterized space), use

python ./tools/novel_view_synthesis.py ./projects/DistillNeRF/configs/model_wrapper/model_wrapper_linearspace.py ./checkpoint/model_linearspace.pth --cfg-options model.visualize_imgs=True

The command above generates 3 novel views. To generate more novel views and create a video, use this command:

. ./tools/novel_view_synthesis.sh

Note that you need to choose which model to use by commenting and uncommenting the corresponding lines in ./tools/novel_view_synthesis.sh.
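Novel views are rendered from camera poses that were not among the inputs. One simple way to obtain such poses, assumed here purely for illustration (the repo's script may choose views differently), is to translate an input camera along its own axes. `shifted_poses` is a hypothetical helper; supplying more offsets yields more frames that could then be stitched into a video.

```python
import numpy as np


def shifted_poses(cam_to_world, offsets):
    """Create novel camera-to-world poses by translating a camera along its own axes.

    cam_to_world: (4, 4) homogeneous pose of an input camera
    offsets: iterable of (dx, dy, dz) shifts in the camera's local frame, in meters
    """
    poses = []
    for offset in offsets:
        pose = cam_to_world.copy()
        # The rotation block maps a local-frame offset into world coordinates.
        local = np.asarray(offset, dtype=float)
        pose[:3, 3] = cam_to_world[:3, 3] + cam_to_world[:3, :3] @ local
        poses.append(pose)
    return np.stack(poses)


# Three lateral shifts of an identity pose, giving three novel viewpoints.
novel = shifted_poses(np.eye(4), [(-1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (1.0, 0.0, 0.0)])
```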

novel-view synthesis - foundation model feature

To generate novel views of DINO features, use

python ./tools/novel_view_synthesis.py ./projects/DistillNeRF/configs/model_wrapper/model_wrapper_linearspace_dino.py ./checkpoint/model_linearspace_dino.pth --cfg-options model.visualize_foundation_model_feat=True

To generate novel views of CLIP features, use

python ./tools/novel_view_synthesis.py ./projects/DistillNeRF/configs/model_wrapper/model_wrapper_linearspace_clip.py ./checkpoint/model_linearspace_clip.pth --cfg-options model.visualize_foundation_model_feat=True

Again, the commands above generate 3 novel views. To generate more novel views and create a video, use this command:

. ./tools/novel_view_synthesis.sh

Note that you need to choose which model to use by commenting and uncommenting the corresponding lines in ./tools/novel_view_synthesis.sh.

Citation

Please consider citing our paper if you find this repo or our paper useful for your research:

```bibtex
@misc{wang2024distillnerf,
  title={DistillNeRF: Perceiving 3D Scenes from Single-Glance Images by Distilling Neural Fields and Foundation Model Features},
  author={Letian Wang and Seung Wook Kim and Jiawei Yang and Cunjun Yu and Boris Ivanovic and Steven L. Waslander and Yue Wang and Sanja Fidler and Marco Pavone and Peter Karkus},
  year={2024},
  eprint={2406.12095},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}
```

Licence

The source code is released under the NSCL licence. The preprocessed dataset and pretrained models are under the CC BY-NC-SA 4.0 licence.

This implementation is based on MMDetection3D and follows its licence. Thanks for the great work!

Please also refer to the third_party_notice folder for a list of the open-source software we used to process the data, along with their licences.

Owner

Name: NVIDIA Research Projects
Login: NVlabs
Kind: organization

Website: http://research.nvidia.com
Repositories: 166
Profile: https://github.com/NVlabs

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
  - name: "MMDetection3D Contributors"
title: "OpenMMLab's Next-generation Platform for General 3D Object Detection"
date-released: 2020-07-23
url: "https://github.com/open-mmlab/mmdetection3d"
license: Apache-2.0

GitHub Events

Total

Issues event: 5
Watch event: 31
Issue comment event: 3
Fork event: 1

Last Year

Issues event: 5
Watch event: 31
Issue comment event: 3
Fork event: 1

Issues and Pull Requests

Last synced: 6 months ago

All Time

Total issues: 4
Total pull requests: 0
Average time to close issues: 21 days
Average time to close pull requests: N/A
Total issue authors: 3
Total pull request authors: 0
Average comments per issue: 0.0
Average comments per pull request: 0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0

Past Year

Issues: 4
Pull requests: 0
Average time to close issues: 21 days
Average time to close pull requests: N/A
Issue authors: 3
Pull request authors: 0
Average comments per issue: 0.0
Average comments per pull request: 0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0

Top Authors

Issue Authors

ucredu (2)
jiayanqi (1)
johnren-code (1)

Dependencies

docker/Dockerfile docker

pytorch/pytorch ${PYTORCH}-cuda${CUDA}-cudnn${CUDNN}-devel build

docker/serve/Dockerfile docker

pytorch/pytorch ${PYTORCH}-cuda${CUDA}-cudnn${CUDNN}-devel build

projects/DistillNeRF/pyproject.toml pypi

requirements/build.txt pypi

requirements/docs.txt pypi

docutils ==0.16.0
m2r *
mistune ==0.8.4
myst-parser *
sphinx ==4.0.2
sphinx-copybutton *
sphinx_markdown_tables *

requirements/mminstall.txt pypi

mmcv-full >=1.4.8,<=1.6.0
mmdet >=2.24.0,<=3.0.0
mmsegmentation >=0.20.0,<=1.0.0

requirements/optional.txt pypi

open3d *
spconv *
waymo-open-dataset-tf-2-1-0 ==1.2.0

requirements/readthedocs.txt pypi

mmcv >=1.4.8
mmdet >=2.24.0
mmsegmentation >=0.20.1
torch *
torchvision *

requirements/runtime.txt pypi

lyft_dataset_sdk *
networkx >=2.2,<2.3
numba ==0.53.0
numpy *
nuscenes-devkit *
plyfile *
scikit-image *
tensorboard *
trimesh >=2.35.39,<2.35.40

requirements/tests.txt pypi

asynctest * test
codecov * test
flake8 * test
interrogate * test
isort * test
kwarray * test
pytest * test
pytest-cov * test
pytest-runner * test
ubelt * test
xdoctest >=0.10.0 test
yapf * test

requirements.txt pypi

requirements_distillnerf.txt pypi

black *
efficientnet_pytorch *
einops *
isort *
lpips *
matplotlib *
numba ==0.53.0
numpy ==1.23.5
pandas *
pytest *
pytest-xdist *
seaborn *
tqdm ==4.65.2
wandb *

setup.py pypi
