bidiff

[CVPR'24] Text-to-3D Generation with Bidirectional Diffusion using both 2D and 3D priors

https://github.com/bidiff/bidiff

Science Score: 54.0%

This score indicates how likely this project is to be science-related based on various indicators:

✓ CITATION.cff file: Found CITATION.cff file
✓ codemeta.json file: Found codemeta.json file
✓ .zenodo.json file: Found .zenodo.json file
○ DOI references
✓ Academic publication links: Links to arxiv.org, scholar.google
○ Academic email domains
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: Low similarity (8.9%) to scientific vocabulary

Last synced: 10 months ago

Repository

Basic Info

Host: GitHub
Owner: BiDiff
License: apache-2.0
Language: Python
Default Branch: main
Homepage:
Size: 50.6 MB

Statistics

Stars: 170
Watchers: 14
Forks: 5
Open Issues: 2
Releases: 0

Created over 2 years ago · Last pushed over 2 years ago

Metadata Files

Readme, Contributing, License, Code of conduct, Citation

README.md

Lihe Ding¹,⁴* Shaocong Dong²* Zhanpeng Huang³ Zibin Wang³
Yiyuan Zhang¹ Kaixiong Gong¹ Dan Xu² Tianfan Xue¹

¹ The Chinese University of Hong Kong
² The Hong Kong University of Science and Technology
³ SenseTime
⁴ Shanghai AI Laboratory
* Equal Contribution   Corresponding Author

Text-to-3D Generation with Bidirectional Diffusion using both 2D and 3D priors

- [x] Implement BiDiff on diffusers (training && inference).
- [x] Replace NeuS with FlexiCubes.
- [ ] Release the weights trained on Objaverse-LVIS.
- [ ] Release the processed training data.
- [ ] Release the data processing scripts.
- [ ] Re-train our model on Objaverse-XL.
- [ ] Hugging Face live demo.
- [x] Support fully decoupled texture and geometry control (below are results from BiDiff sampling).

NEWS

BiDiff supports fully decoupled texture and geometry control now.
We implement an initial version of BiDiff on diffusers and improve the 3D representation from NeuS to FlexiCubes.
Data, weights, and a more detailed document are coming.

1. High-quality 3D Object Generation

Click the GIF to access the high-resolution video.

"An eagle head."	"A GUNDAM robot."

"A Nike sport shoes."	"A house in Van Gogh style."

2. Meshes with Authentic Textures

Click the GIF to access the high-resolution video.


"Bear."	"Fruit."	"Cow."

3. Bidirectional Diffusion (BiDiff) Framework

Most 3D generation research focuses on up-projecting 2D foundation models into 3D space, either by minimizing a 2D Score Distillation Sampling (SDS) loss or by fine-tuning on multi-view datasets. Without explicit 3D priors, these methods often produce geometric anomalies and multi-view inconsistency. Recently, researchers have attempted to improve the genuineness of 3D objects by training directly on 3D datasets, albeit at the cost of low-quality texture generation due to the limited texture diversity in 3D datasets. To harness the advantages of both approaches, we propose Bidirectional Diffusion (BiDiff), a unified framework that incorporates both a 3D and a 2D diffusion process to preserve 3D fidelity and 2D texture richness, respectively. Moreover, as a simple combination may yield inconsistent generation results, we further bridge them with novel bidirectional guidance. In addition, our method can be used as an initialization for optimization-based models to further improve the quality of the 3D model and the efficiency of optimization, reducing the process from 3.4 hours to 20 minutes. Experimental results show that our model achieves high-quality, diverse, and scalable 3D generation.

The BiDiff framework operates as follows: (a) At each step of diffusion, we render the 3D diffusion's intermediate outputs into 2D images, which then guide the denoising of the 2D diffusion model. Simultaneously, the intermediate multi-view outputs from the 2D diffusion are re-projected to assist the denoising of the 3D diffusion model. Red arrows show the bidirectional guidance, which ensures that both diffusion processes evolve coherently. (b) We use the outcomes of the 2D-3D diffusion as a strong initialization for optimization methods, allowing for further refinement with fewer optimization steps.
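The data flow in step (a) can be illustrated with a toy loop. This is only a schematic sketch, not the BiDiff implementation: the "renderer" is a plain average along each axis, the "re-projection" smears each view back into a volume, and the denoisers are replaced by a simple blend toward the guidance signal. All function names and parameters here (`render`, `reproject`, `guided_step`, `weight`) are hypothetical.

```python
import numpy as np

def render(volume):
    """Toy 'renderer': average the volume along each axis to get 3 views."""
    return np.stack([volume.mean(axis=a) for a in range(3)])

def reproject(views, size):
    """Toy 're-projection': smear each view back along its axis, then average."""
    lifted = [
        np.broadcast_to(np.expand_dims(v, a), (size, size, size))
        for a, v in enumerate(views)
    ]
    return np.mean(lifted, axis=0)

def guided_step(x, guidance, weight=0.5):
    """Placeholder for a guided denoising step: blend the state with guidance."""
    return (1.0 - weight) * x + weight * guidance

def bidirectional_sampling(size=8, steps=10, seed=0):
    rng = np.random.default_rng(seed)
    volume = rng.standard_normal((size, size, size))  # state of the 3D branch
    views = rng.standard_normal((3, size, size))      # state of the 2D branch
    gaps = []  # mean |render(volume) - views| before each update
    for _ in range(steps):
        guide_2d = render(volume)          # 3D -> 2D guidance
        guide_3d = reproject(views, size)  # 2D -> 3D guidance
        gaps.append(float(np.abs(guide_2d - views).mean()))
        # Both branches are updated from the same pair of intermediate states.
        views = guided_step(views, guide_2d)
        volume = guided_step(volume, guide_3d)
    return volume, views, gaps
```

Calling `bidirectional_sampling()` returns the final toy volume, the three toy views, and the per-step gap between the rendered volume and the views. In this toy setting the gap shrinks as the two branches exchange guidance, which mirrors the role the bidirectional guidance plays in keeping both processes coherent.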

4. More Results

Getting Started

The code is tested on torch 2.0.1 and cuda 11.7. Data and weights will be uploaded later.

```sh
# cuda 11.7
# torch 2.0.1
# diffusers origin 0.18.0.dev0

pip install -e ".[torch]"
pip install git+https://github.com/NVlabs/nvdiffrast/
pip install kaolin==0.15.0 -f https://nvidia-kaolin.s3.us-east-2.amazonaws.com/torch-2.0.1_cu117.html
sudo apt-get install libsparsehash-dev
pip install --upgrade git+https://github.com/mit-han-lab/torchsparse.git@v1.4.0
pip install imageio trimesh tqdm matplotlib torch_scatter ninja einops
```
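Because the code is only stated to be tested against specific versions, a quick check of the installed packages can save debugging time. The sketch below is an assumption-laden helper, not part of the repository: the function name `check_versions` is hypothetical, and it only compares the torch and kaolin versions named above, ignoring local build suffixes such as `+cu117`.

```python
from importlib import metadata

# Versions named in the installation notes above.
EXPECTED = {"torch": "2.0.1", "kaolin": "0.15.0"}

def check_versions(expected=EXPECTED, get_version=metadata.version):
    """Return {package: problem} for packages that are missing or mismatched."""
    report = {}
    for name, want in expected.items():
        try:
            found = get_version(name)
        except metadata.PackageNotFoundError:
            report[name] = f"not installed (tested with {want})"
            continue
        # Drop a local build suffix such as "+cu117" before comparing.
        if found.split("+")[0] != want:
            report[name] = f"found {found}, tested with {want}"
    return report
```

An empty dictionary from `check_versions()` means both packages match the tested versions. A mismatch is not necessarily fatal, since other versions may also work.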

Train

We provide a sh file for training. Please modify the parameters and GPUs in it.

```bash
cd ./examples/bidiff
bash ./scripts/train_bidiff.sh
```

Inference

We provide a sh file for inference.

```bash
cd ./examples/bidiff
bash ./scripts/sample_bidiff.sh
```

You can specify a batch inference configuration file with --sample_config_file. In the configuration file (JSON), you can specify multiple prompts and parameters; the number of values given for each parameter should be consistent. Inference is executed once per combination, i.e. prompts x negative_prompts x PARAMETERS times.
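To make the run count concrete, the sketch below enumerates the combinations for a small batch configuration. The JSON layout and key names (`prompts`, `negative_prompts`, `parameters`, `num_inference_steps`, `guidance_scale`) are assumptions chosen for illustration; check the configuration files shipped with the repository for the actual schema.

```python
import itertools
import json

# Hypothetical configuration; the real schema may differ.
CONFIG_TEXT = """
{
  "prompts": ["An eagle head.", "A GUNDAM robot."],
  "negative_prompts": [""],
  "parameters": {
    "num_inference_steps": [50, 100],
    "guidance_scale": [7.5, 10.0]
  }
}
"""

def enumerate_runs(config):
    """List every (prompt, negative prompt, parameter set) combination."""
    params = config["parameters"]
    lengths = {len(values) for values in params.values()}
    if len(lengths) > 1:
        raise ValueError("every parameter must list the same number of values")
    n_sets = lengths.pop() if lengths else 1
    # The i-th parameter set takes the i-th value of every parameter.
    param_sets = [{k: v[i] for k, v in params.items()} for i in range(n_sets)]
    return [
        {"prompt": p, "negative_prompt": n, **s}
        for p, n, s in itertools.product(
            config["prompts"], config["negative_prompts"], param_sets
        )
    ]

runs = enumerate_runs(json.loads(CONFIG_TEXT))
```

With two prompts, one negative prompt, and two parameter sets, `runs` holds 2 x 1 x 2 = 4 entries, so the total sampling time grows multiplicatively with each list.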

Citation

If the paper and the code are helpful for your research, please kindly cite:

```
@article{ding2023text,
  title={Text-to-3D Generation with Bidirectional Diffusion using both 2D and 3D priors},
  author={Ding, Lihe and Dong, Shaocong and Huang, Zhanpeng and Wang, Zibin and Zhang, Yiyuan and Gong, Kaixiong and Xu, Dan and Xue, Tianfan},
  journal={arXiv preprint arXiv:2312.04963},
  year={2023},
}
```

Owner

Login: BiDiff
Kind: user

Repositories: 1
Profile: https://github.com/BiDiff

Citation (CITATION.cff)

cff-version: 1.2.0
title: 'Diffusers: State-of-the-art diffusion models'
message: >-
  If you use this software, please cite it using the
  metadata from this file.
type: software
authors:
  - given-names: Patrick
    family-names: von Platen
  - given-names: Suraj
    family-names: Patil
  - given-names: Anton
    family-names: Lozhkov
  - given-names: Pedro
    family-names: Cuenca
  - given-names: Nathan
    family-names: Lambert
  - given-names: Kashif
    family-names: Rasul
  - given-names: Mishig
    family-names: Davaadorj
  - given-names: Thomas
    family-names: Wolf
repository-code: 'https://github.com/huggingface/diffusers'
abstract: >-
  Diffusers provides pretrained diffusion models across
  multiple modalities, such as vision and audio, and serves
  as a modular toolbox for inference and training of
  diffusion models.
keywords:
  - deep-learning
  - pytorch
  - image-generation
  - diffusion
  - text2image
  - image2image
  - score-based-generative-modeling
  - stable-diffusion
license: Apache-2.0
version: 0.12.1

GitHub Events

Total

Watch event: 11

Last Year

Watch event: 11

Dependencies

examples/research_projects/colossalai/requirement.txt pypi

Jinja2 *
diffusers *
ftfy *
tensorboard *
torch *
torchvision *
transformers *

examples/research_projects/dreambooth_inpaint/requirements.txt pypi

Jinja2 *
accelerate >=0.16.0
diffusers ==0.9.0
ftfy *
tensorboard *
torchvision *
transformers >=4.21.0

examples/research_projects/intel_opts/textual_inversion/requirements.txt pypi

Jinja2 *
accelerate >=0.16.0
ftfy *
intel_extension_for_pytorch >=1.13
tensorboard *
torchvision *
transformers >=4.21.0

examples/research_projects/intel_opts/textual_inversion_dfq/requirements.txt pypi

accelerate *
ftfy *
modelcards *
neural-compressor *
tensorboard *
torchvision *
transformers >=4.25.0

examples/research_projects/lora/requirements.txt pypi

Jinja2 *
accelerate >=0.16.0
datasets *
ftfy *
tensorboard *
torchvision *
transformers >=4.25.1

examples/research_projects/mulit_token_textual_inversion/requirements.txt pypi

Jinja2 *
accelerate >=0.16.0
ftfy *
tensorboard *
torchvision *
transformers >=4.25.1

examples/research_projects/mulit_token_textual_inversion/requirements_flax.txt pypi

Jinja2 *
flax *
ftfy *
optax *
tensorboard *
torch *
torchvision *
transformers >=4.25.1

examples/research_projects/multi_subject_dreambooth/requirements.txt pypi

Jinja2 *
accelerate >=0.16.0
ftfy *
tensorboard *
torchvision *
transformers >=4.25.1

examples/research_projects/onnxruntime/text_to_image/requirements.txt pypi

accelerate >=0.16.0
datasets *
ftfy *
modelcards *
tensorboard *
torchvision *
transformers >=4.25.1

examples/research_projects/onnxruntime/textual_inversion/requirements.txt pypi

accelerate >=0.16.0
ftfy *
modelcards *
tensorboard *
torchvision *
transformers >=4.25.1

examples/research_projects/onnxruntime/unconditional_image_generation/requirements.txt pypi

accelerate >=0.16.0
datasets *
tensorboard *
torchvision *

examples/text_to_image/requirements.txt pypi

Jinja2 *
accelerate >=0.16.0
datasets *
ftfy *
tensorboard *
torchvision *
transformers >=4.25.1

examples/text_to_image/requirements_flax.txt pypi

Jinja2 *
datasets *
flax *
ftfy *
optax *
tensorboard *
torch *
torchvision *
transformers >=4.25.1

examples/textual_inversion/requirements.txt pypi

Jinja2 *
accelerate >=0.16.0
ftfy *
tensorboard *
torchvision *
transformers >=4.25.1

examples/textual_inversion/requirements_flax.txt pypi

Jinja2 *
flax *
ftfy *
optax *
tensorboard *
torch *
torchvision *
transformers >=4.25.1

examples/unconditional_image_generation/requirements.txt pypi

accelerate >=0.16.0
datasets *
torchvision *

pyproject.toml pypi

setup.py pypi

deps *

src/diffusers/models/shap_e/setup.py pypi

Pillow *
blobfile *
clip *
filelock *
fire *
humanize *
matplotlib *
numpy *
requests *
scikit-image *
scipy *
torch *
tqdm *

.github/actions/setup-miniconda/action.yml actions

actions/cache v2 composite

.github/workflows/build_docker_images.yml actions

actions/checkout v3 composite
docker/build-push-action v3 composite
docker/login-action v2 composite

.github/workflows/build_documentation.yml actions

.github/workflows/build_pr_documentation.yml actions

.github/workflows/delete_doc_comment.yml actions

.github/workflows/delete_doc_comment_trigger.yml actions

.github/workflows/nightly_tests.yml actions

./.github/actions/setup-miniconda * composite
actions/checkout v3 composite
actions/upload-artifact v2 composite

.github/workflows/pr_quality.yml actions

actions/checkout v3 composite
actions/setup-python v4 composite

.github/workflows/pr_tests.yml actions

actions/checkout v3 composite
actions/upload-artifact v2 composite

.github/workflows/push_tests.yml actions

actions/checkout v3 composite
actions/upload-artifact v2 composite

.github/workflows/push_tests_fast.yml actions

actions/checkout v3 composite
actions/upload-artifact v2 composite

.github/workflows/push_tests_mps.yml actions

./.github/actions/setup-miniconda * composite
actions/checkout v3 composite
actions/upload-artifact v2 composite

.github/workflows/stale.yml actions

actions/checkout v2 composite
actions/setup-python v1 composite

.github/workflows/typos.yml actions

actions/checkout v3 composite
crate-ci/typos v1.12.4 composite

.github/workflows/upload_pr_documentation.yml actions

docker/diffusers-flax-cpu/Dockerfile docker

ubuntu 20.04 build

docker/diffusers-flax-tpu/Dockerfile docker

ubuntu 20.04 build

docker/diffusers-onnxruntime-cpu/Dockerfile docker

ubuntu 20.04 build

docker/diffusers-onnxruntime-cuda/Dockerfile docker

nvidia/cuda 11.6.2-cudnn8-devel-ubuntu20.04 build

docker/diffusers-pytorch-cpu/Dockerfile docker

ubuntu 20.04 build

docker/diffusers-pytorch-cuda/Dockerfile docker

nvidia/cuda 11.7.1-cudnn8-runtime-ubuntu20.04 build

examples/bidiff/requirements.txt pypi

PyMCubes *
SentencePiece *
accelerate >=0.16.0
bitsandbytes *
datasets *
easydict *
einops *
ftfy *
icecream *
imageio *
inplace_abn *
matplotlib *
ninja *
omegaconf *
opencv-python *
pandas *
tensorboard *
torch_scatter *
torchvision *
tqdm *
transformers >=4.25.1
trimesh *

examples/controlnet/requirements.txt pypi

accelerate >=0.16.0
datasets *
ftfy *
tensorboard *
torchvision *
transformers >=4.25.1

examples/controlnet/requirements_flax.txt pypi

Jinja2 *
datasets *
flax *
ftfy *
optax *
tensorboard *
torch *
torchvision *
transformers >=4.25.1

examples/custom_diffusion/requirements.txt pypi

Jinja2 *
accelerate *
ftfy *
tensorboard *
torchvision *
transformers >=4.25.1

examples/dreambooth/requirements.txt pypi

Jinja2 *
accelerate >=0.16.0
ftfy *
tensorboard *
torchvision *
transformers >=4.25.1

examples/dreambooth/requirements_flax.txt pypi

Jinja2 *
flax *
ftfy *
optax *
tensorboard *
torch *
torchvision *
transformers >=4.25.1

examples/instruct_pix2pix/requirements.txt pypi

accelerate >=0.16.0
datasets *
ftfy *
tensorboard *
torchvision *
transformers >=4.25.1
