discoscene

CVPR 2023 Highlight: DiscoScene

https://github.com/snap-research/discoscene

Science Score: 54.0%

This score indicates how likely this project is to be science-related based on various indicators:

✓ CITATION.cff file: Found CITATION.cff file
✓ codemeta.json file: Found codemeta.json file
✓ .zenodo.json file: Found .zenodo.json file
○ DOI references
✓ Academic publication links: Links to arxiv.org
○ Academic email domains
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: Low similarity (12.7%) to scientific vocabulary

Last synced: 10 months ago

Repository

CVPR 2023 Highlight: DiscoScene

Basic Info

Host: GitHub
Owner: snap-research
License: other
Language: Python
Default Branch: master
Homepage: https://snap-research.github.io/discoscene/
Size: 24.1 MB

Statistics

Stars: 149
Watchers: 24
Forks: 3
Open Issues: 4
Releases: 0

Created over 3 years ago · Last pushed almost 3 years ago

Metadata Files

Readme Contributing License Citation

DiscoScene: Spatially Disentangled Generative Radiance Field for Controllable 3D-aware Scene Synthesis
Official PyTorch implementation of the CVPR 2023 Highlight paper

Figure: Framework of DiscoScene.

DiscoScene: Spatially Disentangled Generative Radiance Field for Controllable 3D-aware Scene Synthesis
Yinghao Xu, Menglei Chai, Zifan Shi, Sida Peng, Ivan Skorokhodov, Aliaksandr Siarohin, Ceyuan Yang, Yujun Shen, Hsin-Ying Lee, Bolei Zhou, Sergey Tulyakov

[Paper] [Project Page] [Demo]

This work presents DisCoScene: a 3D-aware generative model for high-quality and controllable scene synthesis. The key ingredient of our approach is a very abstract object-level representation (3D bounding boxes without semantic annotation) as the scene layout prior, which is simple to obtain, general to describe various scene contents, and yet informative to disentangle objects and background. Moreover, it serves as an intuitive user control for scene editing. Based on such a prior, our model spatially disentangles the whole scene into object-centric generative radiance fields by learning on only 2D images with the global-local discrimination. Our model obtains the generation fidelity and editing flexibility of individual objects while being able to efficiently compose objects and the background into a complete scene. We demonstrate state-of-the-art performance on many scene datasets, including the challenging Waymo outdoor dataset.

Requirements

All our models are trained and tested on V100 and A100 GPUs.
64-bit Python 3.8 and PyTorch 1.11.0.
CUDA 11.3 or later.
Users can use the following commands to install the packages

    conda create -n discoscene python=3.8
    conda activate discoscene
    pip install -r requirements.txt --extra-index-url https://download.pytorch.org/whl/cu113
    pip install -e .

Preparing datasets

We provide a script to download the datasets. Clevr and 3D-Front are synthetic datasets and we will release the rendering scripts soon.

    bash download_datasets.sh
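The version requirements listed above can also be checked programmatically before training. The following is a minimal sketch, not part of the repository: the helper name `find_requirement_problems` is hypothetical, and it assumes a strict reading of the requirements (Python 3.8 exactly, PyTorch 1.11.0 exactly, CUDA 11.3 or later).

```python
from typing import List, Optional, Sequence, Tuple


def _version_tuple(version: str) -> Tuple[int, ...]:
    # Drop a local build suffix such as "+cu113" before splitting on dots.
    # Pre-release strings like "1.11.0a0" are not handled by this sketch.
    return tuple(int(part) for part in version.split("+")[0].split("."))


def find_requirement_problems(
    python_version: Sequence[int],
    torch_version: str,
    cuda_version: Optional[str],
) -> List[str]:
    """Return a list of human-readable mismatches (empty if all checks pass)."""
    problems = []
    # Assumption: "Python 3.8" is read as exactly major.minor == 3.8.
    if tuple(python_version[:2]) != (3, 8):
        problems.append("Python 3.8 is expected")
    # Assumption: "PyTorch 1.11.0" is read as an exact version match.
    if _version_tuple(torch_version) != (1, 11, 0):
        problems.append("PyTorch 1.11.0 is expected")
    # CUDA 11.3 or later; None means a CPU-only PyTorch build.
    if cuda_version is None or _version_tuple(cuda_version) < (11, 3):
        problems.append("CUDA 11.3 or later is expected")
    return problems


# Example with hard-coded values that match the listed requirements.
print(find_requirement_problems((3, 8, 10), "1.11.0+cu113", "11.3"))
```

With PyTorch installed, the three arguments can be taken from `sys.version_info`, `torch.__version__`, and `torch.version.cuda`.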

Test Demo

Download the pretrained models with the script

    bash download_models.sh

Users can use the following command to generate demo videos

    python render.py \
        --work_dir ${WORK_DIR} \
        --checkpoint ${MODEL_PATH} \
        --num ${NUM} \
        --seed ${SEED} \
        --step ${STEP} \
        --render_type ${RENDER_TYPE} \
        --generate_html ${SAVE_HTML} \
        --dataset_type ${DATASET} \
        --ssaa ${SSAA} \
        discoscene \
        --val_anno_path ${VAL_ANN} \
        --val_data_file_format dir \
        --num_bbox ${NUM_BBOX}

where

WORK_DIR refers to the path to save the results.
MODEL_PATH refers to the path of the pretrained model.
NUM refers to the number of samples to synthesize.
SEED refers to the random seed used for sampling.
STEP refers to the number of steps for the generated video.
RENDER_TYPE refers to the type of control applied in the rendered videos: rotate_object, move_object, add_object, delete_object, rotate_camera, or move_camera.
SAVE_HTML controls whether to also save the images in an HTML page for easier visualization when rendering videos.
DATASET refers to the type of dataset, including clevr, 3dfront and waymo.
SSAA refers to the ratio of supersampling anti-aliasing.
VAL_ANN refers to the layout information.
NUM_BBOX refers to the number of bounding boxes in the annotation file.
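The parameters above can be assembled into the render.py command line from Python. The following is a minimal sketch, not part of the repository: the helper `build_render_command` is hypothetical, all example values (paths, counts, the spelling of the `--generate_html` value) are placeholders, and only the flag names and the allowed RENDER_TYPE and DATASET values are taken from this README.

```python
import shlex
from typing import List

# Values this README lists for --render_type and --dataset_type.
RENDER_TYPES = {
    "rotate_object", "move_object", "add_object",
    "delete_object", "rotate_camera", "move_camera",
}
DATASET_TYPES = {"clevr", "3dfront", "waymo"}


def build_render_command(work_dir, checkpoint, num, seed, step, render_type,
                         generate_html, dataset_type, ssaa, val_anno_path,
                         num_bbox) -> List[str]:
    """Build the argument list for render.py; the command is not executed."""
    if render_type not in RENDER_TYPES:
        raise ValueError(f"unknown render type: {render_type}")
    if dataset_type not in DATASET_TYPES:
        raise ValueError(f"unknown dataset type: {dataset_type}")
    return [
        "python", "render.py",
        "--work_dir", str(work_dir),
        "--checkpoint", str(checkpoint),
        "--num", str(num),
        "--seed", str(seed),
        "--step", str(step),
        "--render_type", render_type,
        # The accepted spelling of this value is not stated in the README,
        # so it is passed through unchanged.
        "--generate_html", str(generate_html),
        "--dataset_type", dataset_type,
        "--ssaa", str(ssaa),
        "discoscene",
        "--val_anno_path", str(val_anno_path),
        "--val_data_file_format", "dir",
        "--num_bbox", str(num_bbox),
    ]


# Placeholder values for illustration only.
command = build_render_command(
    work_dir="work_dirs/demo", checkpoint="checkpoints/model.pth",
    num=4, seed=0, step=30, render_type="rotate_object",
    generate_html="true", dataset_type="clevr", ssaa=1,
    val_anno_path="data/clevr/val_anno", num_bbox=8,
)
print(shlex.join(command))
```

Returning a list rather than a single string means the result can be passed directly to `subprocess.run` without shell quoting concerns.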

We include scripts for rendering demo videos on the Clevr, 3D-Front and Waymo datasets. Users can use the following commands to generate demo videos.

    bash ./scripts/discoscene/rendering/test_clevr256.sh
    bash ./scripts/discoscene/rendering/test_3dfront256.sh
    bash ./scripts/discoscene/rendering/test_waymo256.sh

Training

For example, users can use the following command to train DiscoScene at a resolution of 256x256

    ./scripts/training_demos/discoscene_res256.sh \
        ${NUM_GPUS} \
        ${DATA_PATH} \
        [OPTIONS]

where

NUM_GPUS refers to the number of GPUs used for training.
DATA_PATH refers to the path to the dataset (zip format is strongly recommended).
[OPTIONS] refers to any additional option to pass. Detailed instructions on available options can be found via python train.py discoscene --help.

NOTE: This demo script uses discoscene_res256 as the default job_name, which identifies the experiment. Concretely, a directory named job_name is created under the root working directory, which is work_dirs/ by default. To prevent overwriting previous experiments, training is interrupted with an exception if the job_name directory already exists. Use the --job_name=${JOB_NAME} option to specify a new job name.
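The overwrite guard described in the note can be illustrated with a short sketch. This is not the repository's actual code: the function name `create_job_dir` is hypothetical, and the use of `FileExistsError` is an assumption, since the README only says that an exception is raised.

```python
import os
import tempfile


def create_job_dir(job_name: str, root: str = "work_dirs") -> str:
    """Create <root>/<job_name> and refuse to reuse an existing directory."""
    job_dir = os.path.join(root, job_name)
    # exist_ok=False makes os.makedirs raise FileExistsError when the
    # directory is already present, so earlier results are never overwritten.
    os.makedirs(job_dir, exist_ok=False)
    return job_dir


# Demonstration inside a temporary root so nothing is left on disk.
with tempfile.TemporaryDirectory() as root:
    print(create_job_dir("discoscene_res256", root))
    try:
        create_job_dir("discoscene_res256", root)
    except FileExistsError:
        print("job directory already exists; choose a new job name")
```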

We include the training scripts for Clevr, 3D-Front and Waymo. Users can use these bash files to train our model

    bash ./scripts/discoscene/training/train_clevr256.sh
    bash ./scripts/discoscene/training/train_3dfront256.sh
    bash ./scripts/discoscene/training/train_waymo256.sh

Evaluation

Users can use the following command to evaluate a well-trained model

    ./scripts/test_metrics_discoscene.sh \
        ${NUM_GPUS} \
        ${DATA_PATH} \
        ${ANNOTATION_PATH} \
        ${MODEL_PATH} \
        ${NUM_FAKE_SAMPLES} \
        fid [OPTIONS]

Here is the evaluation example for Clevr

    bash scripts/discoscene/evaluation/evaluate_clevr256.sh

BibTeX

    @InProceedings{Xu_2023_CVPR,
        author    = {Xu, Yinghao and Chai, Menglei and Shi, Zifan and Peng, Sida and Skorokhodov, Ivan and Siarohin, Aliaksandr and Yang, Ceyuan and Shen, Yujun and Lee, Hsin-Ying and Zhou, Bolei and Tulyakov, Sergey},
        title     = {DisCoScene: Spatially Disentangled Generative Radiance Fields for Controllable 3D-Aware Scene Synthesis},
        booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
        year      = {2023},
    }

Owner

Name: Snap Research
Login: snap-research
Kind: organization

Website: https://research.snap.com/
Repositories: 17
Profile: https://github.com/snap-research

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
- family-names: "Shen"
  given-names: "Yujun"
- family-names: "Zhang"
  given-names: "Zhiyi"
- family-names: "Yang"
  given-names: "Dingdong"
- family-names: "Xu"
  given-names: "Yinghao"
- family-names: "Yang"
  given-names: "Ceyuan"
- family-names: "Zhu"
  given-names: "Jiapeng"
title: "Hammer: An Efficient Toolkit for Training Deep Models"
version: 1.0.0
date-released: 2022-02-08
url: "https://github.com/bytedance/Hammer"

GitHub Events

Total

Issues event: 1
Watch event: 6
Fork event: 1

Last Year

Issues event: 1
Watch event: 6
Fork event: 1

Issues and Pull Requests

Last synced: 11 months ago

All Time

Total issues: 6
Total pull requests: 1
Average time to close issues: 6 months
Average time to close pull requests: 20 days
Total issue authors: 6
Total pull request authors: 1
Average comments per issue: 1.5
Average comments per pull request: 0.0
Merged pull requests: 1
Bot issues: 0
Bot pull requests: 0

Past Year

Issues: 1
Pull requests: 0
Average time to close issues: N/A
Average time to close pull requests: N/A
Issue authors: 1
Pull request authors: 0
Average comments per issue: 0.0
Average comments per pull request: 0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0

View more stats

Top Authors

Issue Authors

HongliXiao (1)
t-martyniuk (1)
OrangeSodahub (1)
shim94kr (1)
Torment123 (1)
rxjfighting (1)

Pull Request Authors

justimyhxu (1)

Top Labels

Issue Labels

Pull Request Labels

documentation (1) enhancement (1)

Dependencies

requirements.txt pypi

bs4 *
click *
cloup *
easydict *
einops *
imageio *
lmdb *
matplotlib *
mrcfile *
ninja ==1.10.2
numpy ==1.21.5
opencv-python-headless ==4.5.5.62
pillow ==9.0.0
psutil *
pymcubes *
requests *
rich *
scikit-learn ==1.0.2
scikit-video ==1.1.11
scipy ==1.7.3
tensorboard ==2.7.0
torch ==1.11.0
torch-tb-profiler ==0.3.1
torchaudio ==0.11.0
torchvision ==0.12.0
tqdm *
trimesh *

setup.py pypi
