Science Score: 54.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: found CITATION.cff file
- ✓ codemeta.json file: found codemeta.json file
- ✓ .zenodo.json file: found .zenodo.json file
- ○ DOI references
- ✓ Academic publication links: links to arxiv.org
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (10.9%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: jungletada
- License: apache-2.0
- Language: Python
- Default Branch: master
- Size: 62.5 KB
Statistics
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
Exploiting Diffusion Prior for Generalizable Dense Prediction
Hsin-Ying Lee, Hung-Yu Tseng, Hsin-Ying Lee, Ming-Hsuan Yang
CVPR 2024
This is the official implementation of Exploiting Diffusion Prior for Generalizable Dense Prediction.
Quick Start
Installation
Our implementation is based on Python 3.10 and CUDA 11.3.
Required
```shell
diffusers==0.20.0
pytorch==1.12.1
torchvision==0.13.1
transformers==4.31.0
```
Optional
```shell
accelerate  # for training
gradio      # for demo
omegaconf   # for configuration
xformers    # for acceleration
```
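A minimal environment sketch, assuming conda and pip (the environment name and the CUDA 11.3 wheel index are our choices, not part of the repo):
```shell
# Hypothetical setup for Python 3.10 + CUDA 11.3; adjust to your system
conda create -n dmp python=3.10 -y
conda activate dmp
# PyTorch 1.12.1 / torchvision 0.13.1 built for CUDA 11.3
pip install torch==1.12.1+cu113 torchvision==0.13.1+cu113 \
    --extra-index-url https://download.pytorch.org/whl/cu113
pip install diffusers==0.20.0 transformers==4.31.0
# Optional extras
pip install accelerate gradio omegaconf xformers
```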
Checkpoints
We provide the model weights of five tasks for reproducing the results in the paper. These checkpoints are trained with 10K synthesized bedroom images, prompts, and pseudo ground truths.
In addition, for normal and depth prediction, we provide weights trained on more diverse scenes and without prompts, which are more suitable for practical use cases.
Download the weights from this Google Drive and place them in the root directory.
Run
For checkpoints with `-notext`, set `disable_prompts=True`.
```python
from PIL import Image
from pipeline import Pipeline

LORA_DIR = 'ckpt/normal-scene100-notext'
disable_prompts = LORA_DIR.endswith('-notext')
ppl = Pipeline(
    disable_prompts=disable_prompts,
    lora_ckpt=LORA_DIR,
    device='cuda',
    mixed_precision='fp16',
)
img = Image.open('/path/to/img')
```
For depth prediction,
```python
output_np_array = ppl(img, inference_step=5, target_mode='F')
```
Otherwise,
```python
output_pil_img = ppl(img, inference_step=5, target_mode='RGB')
```
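As a usage sketch for the two cases above (the output file names are illustrative, not part of the repo):
```python
import numpy as np

# Depth prediction ('F' mode) returns a NumPy array; save it for evaluation.
np.save('depth.npy', output_np_array)

# The other tasks return a PIL image.
output_pil_img.save('prediction.png')
```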
Alternatively, we provide a Gradio demo. You can launch it with
```shell
python app.py
```
and access the app at `localhost:7860`.
Data Generation
Images
We conduct the experiments with synthetic images so that we can control and analyze performance across data domains. We first generate prompts from scene keywords, then generate images from those prompts.
To generate prompts,
```shell
python tools/gencap.py KEYWORD -n NUMBER_OF_PROMPTS -o OUTPUT_TXT
```
KEYWORD can be a single word or a text file containing multiple keywords, one per line.
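For example (keywords.txt is a hypothetical file name; the count mirrors the 10K images used for the paper's checkpoints):
```shell
# Single keyword
python tools/gencap.py bedroom -n 10000 -o prompts.txt
# Or a file with one keyword per line
python tools/gencap.py keywords.txt -n 10000 -o prompts.txt
```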
To generate images,
```shell
python tools/txt2img.py --from-file PROMPTS_TXT --output OUTPUT_DIR --batch-size BSZ
```
These two scripts are thin wrappers around Hugging Face's transformers and diffusers.
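For example (the output directory and batch size are illustrative):
```shell
python tools/txt2img.py --from-file prompts.txt --output data/images --batch-size 8
```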
Then make a meta file to record images and prompts. Prompts are not necessary if you set `disable_prompts` (see the Training section).
```shell
python tools/makemeta.py --imgs IMAGE_DIR [--captions PROMPTS]
```
It collects the PNG and JPG files in IMAGE_DIR, sorts them by file name, and generates a metadata.jsonl in IMAGE_DIR in the same format as Hugging Face's ImageFolder. If prompts are provided, they should be in the same order as the sorted file names.
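For reference, ImageFolder's metadata.jsonl holds one JSON object per line; a sketch with illustrative file names and captions (the exact caption column name used by this repo is an assumption):
```json
{"file_name": "00000.png", "text": "a photo of a cozy bedroom with a wooden bed"}
{"file_name": "00001.png", "text": "a photo of a bright bedroom with large windows"}
```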
Pseudo Ground Truths
Then we generate pseudo ground truths with the following code bases:
- surface normals: 3DCommonCorruptions
- depths: ZoeDepth
- albedo and shading: PIE-Net
- semantic segmentation: EVA-02
For normals, albedo, and shading, clone each repo, set up its environment, and put `getnorm.py` and `getintr.py` in the corresponding directory.
For depths, `getdepth.py` can be run in isolation.
```shell
python tools/get{norm,depth,intr}.py -i INPUT_IMG_DIR -o OUTPUT_DIR
```
These scripts store the predictions in LMDB by default. The keys are the file names without extensions; albedo and shading outputs get an extra `-r` (reflectance) or `-s` (shading) suffix. Use `--save-files` to save outputs as individual files instead.
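A sketch of reading predictions back from such a database with the python lmdb bindings (the path and key are illustrative; how each record is deserialized depends on the task):
```python
import lmdb

# Open the prediction database read-only (path is illustrative)
env = lmdb.open('OUTPUT_DIR', readonly=True, lock=False)
with env.begin() as txn:
    # Keys are file names without extensions; albedo/shading add -r / -s
    raw = txn.get(b'00000-r')  # reflectance record for 00000.png
    if raw is not None:
        print(f'record size: {len(raw)} bytes')
env.close()
```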
For semantic segmentation, generate segmentation maps with eva02_L_ade_seg_upernet_sz512 in EVA-02.
Detailed instructions
1. Download `eva02_L_ade_seg_upernet_sz512.pth`
2. Make a directory with an arbitrary name, e.g. `dummy`, and make another directory named `images` under it.
   ```shell
   mkdir -p dummy/images
   ```
3. Link the directory of input images as `validation` in `dummy/images`
   ```shell
   ln -s INPUT_IMG_DIR dummy/images/validation
   ```
4. Modify `data_root` at line 3 in `configs/_base_/datasets/ade20k.py` to be `dummy`
5. Run `test.py` in the [EVA-02 repo](https://github.com/baaivision/EVA/tree/master/EVA-02/seg) with
   ```shell
   python test.py \
       configs/eva02/upernet/upernet_eva02_large_24_512_slide_80k.py \
       eva02_L_ade_seg_upernet_sz512.pth \
       --show-dir OUTPUT_DIR \
       --opacity 1
   ```
6. Convert the segmentation maps with better color mapping for classes commonly seen in bedrooms.
   ```shell
   python tools/color2cls.py INPUT_DIR OUTPUT_DIR --pal 1 --ext png
   python tools/cls2color.py INPUT_DIR OUTPUT_DIR --pal 2
   ```

Then collect the segmentation maps in LMDB.
```shell
python tools/makedb.py INPUT_DIR OUTPUT_DB
```
Training
To reproduce the trained models, the following script gives the basic setting for all tasks. The script is adapted from an example provided by Hugging Face.
```bash
accelerate launch \
    --num_processes=2 --mixed_precision="fp16" train.py \
    --train_batch_size=8 \
    --max_train_steps=50000 \
    --learning_rate=1e-04 \
    --lr_scheduler="cosine" \
    --lr_warmup_steps=0 \
    --prediction_type="v_prediction"
```
- Additionally, for depths, set `--target_mode=F` and `--target_scale=8` (a combined depth example follows this list).
- For depths, albedo, shading, and segmentation, set `--random_flip`.
- For albedo, set `--target_extra_key=r`.
- For shading, set `--target_extra_key=s`.
- To add and train LoRA for only self-attention, set `--self_attn_only`.
- To disable prompts, set `--disable_prompts`.
- To enable xformers, set `--enable_xformers_memory_efficient_attention`.
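Putting these together, a hypothetical full command for depth training (the flag values come from the notes above; everything else is the basic setting):
```bash
accelerate launch \
    --num_processes=2 --mixed_precision="fp16" train.py \
    --train_batch_size=8 \
    --max_train_steps=50000 \
    --learning_rate=1e-04 \
    --lr_scheduler="cosine" \
    --lr_warmup_steps=0 \
    --prediction_type="v_prediction" \
    --target_mode=F \
    --target_scale=8 \
    --random_flip
```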
Inference
To generate predictions, run infer.py with the same options you used for train.py.
```shell
DATA_DIR="/path/to/source/images"  # optional
PROMPTS="/path/to/prompts.txt"     # optional
LORA_DIR="/path/to/train/output"
OUTPUT_DIR="/path/to/output"

python infer.py \
    --src $DATA_DIR \
    --prompts $PROMPTS \
    --lora-ckpt $LORA_DIR \
    --output $OUTPUT_DIR \
    --config config.yaml \
    --batch-size 4
```
For depths, set `--target-mode=F` and `--target-scale=8`. The script generates depths and saves them in NumPy compressed `.npz` format under the key `x`.
Optionally set `--target-pred-type`, `--self-attn-only`, and `--disable-prompts` to match the training configuration. If you don't provide `--src`, it will generate images with the original (no LoRA) model from `--prompts`. If you don't set `--disable-prompts` but forget to provide `--prompts`, it will raise an error.
More settings for the generation process, such as the number of generation steps and the guidance scale, are in config.yaml.
Besides, in the paper we construct the samples of previous diffusion steps from the input images and the estimated output predictions, but we empirically found that the original DDIM, which estimates both input images and output predictions, gives slightly worse in-domain performance and slightly better generalizability. The difference is small, though. The results in the paper were generated with the oracle process; set `--use-oracle-ddim` to use exactly the same generation process as the paper.
Also note that the words in infer.py's options are connected by hyphens (`-`), not underscores (`_`).
Evaluation
The evaluation script runs on GPU. For normals,
```shell
python test/eval.py PRED GROUND_TRUTH --metrics l1 angular
```
For depths,
```shell
python test/eval.py PRED GROUND_TRUTH --metrics rel delta --ext npz --abs
python test/eval.py PRED GROUND_TRUTH --metrics rmse --ext npz --abs --norm
```
For albedo and shading,
```shell
python test/eval.py PRED GROUND_TRUTH --metrics mse
```
For segmentation, turn output images into class maps.
```shell
python tools/color2cls.py INPUT_DIR OUTPUT_DIR --pal 2 --ext npy --filter
```
Then calculate mIoU.
```shell
python test/miou.py PRED GROUND_TRUTH
```
The mIoU evaluation is borrowed from MMSegmentation.
Acknowledgement
This repo contains code from diffusers and MMSegmentation.
Citation
```bibtex
@InProceedings{lee2024dmp,
    author    = {Lee, Hsin-Ying and Tseng, Hung-Yu and Lee, Hsin-Ying and Yang, Ming-Hsuan},
    title     = {Exploiting Diffusion Prior for Generalizable Dense Prediction},
    booktitle = {Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
    year      = {2024},
}
```
Owner
- Name: DINGJIE PENG
- Login: jungletada
- Kind: user
- Location: Tokyo, Japan
- Repositories: 1
- Profile: https://github.com/jungletada
Citation (CITATION.cff)
cff-version: 1.2.0
title: >-
  Exploiting Diffusion Prior for Generalizable Dense Prediction
message: >-
  If you find this work interesting, please cite it as below.
authors:
  - given-names: Hsin-Ying
    family-names: Lee
    affiliation: UC Merced
  - given-names: Hung-Yu
    family-names: Tseng
    affiliation: Meta
  - given-names: Hsin-Ying
    family-names: Lee
    affiliation: Snap
  - given-names: Ming-Hsuan
    family-names: Yang
    affiliation: UC Merced
preferred-citation:
  type: conference-paper
  title: Exploiting Diffusion Prior for Generalizable Dense Prediction
  authors:
    - given-names: Hsin-Ying
      family-names: Lee
      affiliation: UC Merced
    - given-names: Hung-Yu
      family-names: Tseng
      affiliation: Meta
    - given-names: Hsin-Ying
      family-names: Lee
      affiliation: Snap
    - given-names: Ming-Hsuan
      family-names: Yang
      affiliation: UC Merced
  year: 2024
  collection-title: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
GitHub Events
Total
- Push event: 2
- Create event: 2
Last Year
- Push event: 2
- Create event: 2