https://github.com/aim-uofa/genpercept
[ICLR2025] GenPercept: Diffusion Models Trained with Large Data Are Transferable Visual Models
Science Score: 33.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: found codemeta.json file
- ○ .zenodo.json file
- ○ DOI references
- ✓ Academic publication links: links to arxiv.org
- ✓ Committers with academic emails: 1 of 4 committers (25.0%) from academic institutions
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (7.8%) to scientific vocabulary
Keywords
Repository
Basic Info
- Host: GitHub
- Owner: aim-uofa
- License: bsd-2-clause
- Language: Python
- Default Branch: main
- Homepage: https://huggingface.co/spaces/guangkaixu/GenPercept
- Size: 38.2 MB
Statistics
- Stars: 188
- Watchers: 5
- Forks: 7
- Open Issues: 6
- Releases: 0
Topics
Metadata Files
README.md
[ICLR2025] What Matters When Repurposing Diffusion Models for General Dense Perception Tasks?
Former Title: "Diffusion Models Trained with Large Data Are Transferable Visual Models"

[Guangkai Xu](https://github.com/guangkaixu/), [Yongtao Ge](https://yongtaoge.github.io/), [Mingyu Liu](https://mingyulau.github.io/), [Chengxiang Fan](https://leaf1170124460.github.io/), [Kangyang Xie](https://github.com/felix-ky), [Zhiyue Zhao](https://github.com/ZhiyueZhau), [Hao Chen](https://stan-haochen.github.io/), [Chunhua Shen](https://cshen.github.io/)

Zhejiang University

### [HuggingFace (Space)](https://huggingface.co/spaces/guangkaixu/GenPercept) | [HuggingFace (Model)](https://huggingface.co/guangkaixu/genpercept-models) | [arXiv](https://arxiv.org/abs/2403.06090)

#### 🔥 Fine-tune diffusion models for perception tasks, and run inference in only one step! ✈️
📢 News
- 2025.1.24: 🎉🎉🎉 GenPercept has been accepted by ICLR 2025. 🎉🎉🎉
- 2024.10.25: Update the GenPercept HuggingFace App demo.
- 2024.10.24: Release the latest training and inference code, built on the accelerate library and based on Marigold.
- 2024.10.24: Release the arXiv v3 paper, with a reorganized structure and more detailed analysis.
- 2024.4.30: Release checkpoint weights of surface normal and dichotomous image segmentation.
- 2024.4.7: Add HuggingFace App demo.
- 2024.4.6: Release inference code and depth checkpoint weight of GenPercept in the GitHub repo.
- 2024.3.15: Release arXiv v2 paper, with supplementary material.
- 2024.3.10: Release arXiv v1 paper.
📚 Download Resource Summary
- Space-Huggingface demo: https://huggingface.co/spaces/guangkaixu/GenPercept.
- Models-all (including ablation study): https://huggingface.co/guangkaixu/genpercept-exps.
- Models-main-paper: https://huggingface.co/guangkaixu/genpercept-models.
- Models-depth: https://huggingface.co/guangkaixu/genpercept-depth.
- Models-normal: https://huggingface.co/guangkaixu/genpercept-normal.
- Models-dis: https://huggingface.co/guangkaixu/genpercept-dis.
- Models-matting: https://huggingface.co/guangkaixu/genpercept-matting.
- Models-seg: https://huggingface.co/guangkaixu/genpercept-seg.
- Models-disparity: https://huggingface.co/guangkaixu/genpercept-disparity.
- Models-disparity-dpt-head: https://huggingface.co/guangkaixu/genpercept-disparity-dpt-head.
- Datasets-input demo: https://huggingface.co/datasets/guangkaixu/genpercept-input-demo.
- Datasets-evaluation data: https://huggingface.co/datasets/guangkaixu/genpercept_datasets_eval.
- Datasets-evaluation results: https://huggingface.co/datasets/guangkaixu/genpercept-exps-eval.
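If you prefer fetching these resources programmatically, here is a minimal sketch using huggingface_hub's snapshot_download; the repo IDs come from the list above, while the local target directories are arbitrary choices, not paths mandated by the repository.

```python
# Sketch: fetch the main-paper checkpoints and the demo inputs from the Hub.
# The local_dir values are arbitrary; adjust them to match your setup.
from huggingface_hub import snapshot_download

snapshot_download(repo_id="guangkaixu/genpercept-models",
                  local_dir="weights/genpercept-models")
snapshot_download(repo_id="guangkaixu/genpercept-input-demo",
                  repo_type="dataset", local_dir="input")
```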
🖥️ Dependencies
```bash
conda create -n genpercept python=3.10
conda activate genpercept
pip install -r requirements.txt
pip install -e .
```
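After installation, a quick sanity check can confirm that the pinned versions resolved correctly; this snippet is illustrative and not part of the repository. The expected versions follow the dependency list at the end of this page.

```python
# Illustrative environment check; expected values follow requirements.txt.
import torch, diffusers, transformers, xformers

print(torch.__version__)          # expected: 2.0.1
print(xformers.__version__)       # expected: 0.0.20
print(diffusers.__version__)      # expected: >= 0.20.1
print(transformers.__version__)   # expected: >= 4.32.1
print(torch.cuda.is_available())  # the reported timings assume a CUDA GPU
```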
🚀 Inference
Using Command-line Scripts
Download the stable-diffusion-2-1 checkpoints and our trained models from HuggingFace, and put them under ./pretrained_weights/ and ./weights/, respectively. You can download them with the scripts script/download_sd21.sh and script/download_weights.sh, or download the weights of depth, normal, dichotomous image segmentation, matting, segmentation, disparity, and disparity-dpt-head separately.
Then place images in the ./input/ directory. We offer demo images on HuggingFace, which you can also download with the script script/download_sample_data.sh. Then run inference with the scripts below.
```bash
# Depth
source script/infer/main_paper/inference_genpercept_depth.sh
# Normal
source script/infer/main_paper/inference_genpercept_normal.sh
# Dichotomous image segmentation (DIS)
source script/infer/main_paper/inference_genpercept_dis.sh
# Matting
source script/infer/main_paper/inference_genpercept_matting.sh
# Segmentation
source script/infer/main_paper/inference_genpercept_seg.sh
# Disparity
source script/infer/main_paper/inference_genpercept_disparity.sh
# Disparity (DPT head)
source script/infer/main_paper/inference_genpercept_disparity_dpt_head.sh
```
If you would like to change the input folder path, UNet path, or output path, pass them as parameters:
```bash
# Assign values
input_rgb_dir=...
unet=...
output_dir=...
# Take depth as an example
source script/infer/main_paper/inference_genpercept_depth.sh $input_rgb_dir $unet $output_dir
```
For a general inference script, please see script/infer/inference_general.sh.
Thanks to our one-step perception paradigm, inference runs much faster than multi-step diffusion pipelines (around 0.4 s per image on an A800 GPU).
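For intuition, here is a minimal sketch of what one-step perception with a Stable Diffusion 2.1 backbone can look like using the diffusers library. The checkpoint layout (a fine-tuned UNet under guangkaixu/genpercept-models), the fixed timestep, and the conditioning details are assumptions for illustration, not the repository's exact pipeline.

```python
# Sketch of one-step deterministic perception with an SD-2.1 backbone.
# Assumed: the Hub repo stores a fine-tuned UNet in a "unet" subfolder;
# the single fixed timestep below is illustrative.
import torch
from diffusers import AutoencoderKL, UNet2DConditionModel

device = "cuda" if torch.cuda.is_available() else "cpu"
vae = AutoencoderKL.from_pretrained(
    "stabilityai/stable-diffusion-2-1", subfolder="vae").to(device)
unet = UNet2DConditionModel.from_pretrained(
    "guangkaixu/genpercept-models", subfolder="unet").to(device)

@torch.no_grad()
def one_step_predict(rgb: torch.Tensor, text_embed: torch.Tensor) -> torch.Tensor:
    """rgb: (B, 3, H, W) in [-1, 1]; text_embed: CLIP text embeddings.
    Encodes the RGB image, runs a single UNet forward pass at a fixed
    timestep, and decodes the predicted latent back to image space."""
    latent = vae.encode(rgb.to(device)).latent_dist.mode()
    latent = latent * vae.config.scaling_factor
    t = torch.tensor([999], device=device)  # single fixed timestep
    pred = unet(latent, t, encoder_hidden_states=text_embed).sample
    return vae.decode(pred / vae.config.scaling_factor).sample
```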
Using torch.hub
TODO
🔥 Train
NOTE: We implement training with the accelerate library, but observe worse training accuracy with multiple GPUs than with a single GPU under the same effective_batch_size and max_iter. Your assistance in resolving this issue would be greatly appreciated. Thank you very much!
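For context, the effective batch size under accelerate is the product of the per-GPU batch size, the number of GPUs, and the gradient-accumulation steps, so a multi-GPU run is only comparable to a single-GPU run when these products match. The function below is a generic sketch of that arithmetic, not repository code.

```python
# Generic accelerate batch-size arithmetic (not repository code).
def effective_batch_size(per_gpu_bs: int, num_gpus: int, grad_accum_steps: int) -> int:
    """Total samples contributing to one optimizer step."""
    return per_gpu_bs * num_gpus * grad_accum_steps

# Two configurations with the same effective batch size of 8:
assert effective_batch_size(per_gpu_bs=8, num_gpus=1, grad_accum_steps=1) == 8
assert effective_batch_size(per_gpu_bs=2, num_gpus=4, grad_accum_steps=1) == 8
```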
Preparation
- Datasets: TODO. Place the training datasets under datasets/.
- Download stable-diffusion-2-1 from HuggingFace and put the checkpoints under ./pretrained_weights/. You can also download them with the script script/download_sd21.sh.
Start Training
The training scripts that reproduce the arXiv v3 paper are released in script/, with their configs stored in config/. Models with max_train_batch_size > 2 were trained on an H100, and those with max_train_batch_size <= 2 on an RTX 4090. Run the train script:
```bash
# Take the depth training of the main paper as an example
source script/train_sd21_main_paper/sd21_train_accelerate_genpercept_1card_ensure_depth_bs8_per_accu_pixel_mse_ssi_grad_loss.sh
```
🎖️ Eval
Preparation
- Download the evaluation datasets and place them in datasets_eval/.
- Download our trained models of the main paper and the ablation study in Section 3 of the arXiv v3 paper, and place them in weights/genpercept-exps.
Start Evaluation
The evaluation scripts are stored in script/eval_sd21.
```bash
# Take "ensemble1 + step1" as an example
source script/eval_sd21/eval_ensemble1_step1/0_infer_eval_all.sh
```
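For background, affine-invariant depth evaluation in this line of work typically aligns the prediction to the ground truth in scale and shift before computing AbsRel and δ1. The sketch below shows those standard metrics generically; it is not the repository's exact evaluation code.

```python
# Generic affine-invariant depth metrics (AbsRel, delta1); a sketch,
# not the repository's evaluation code.
import numpy as np

def align_scale_shift(pred: np.ndarray, gt: np.ndarray) -> np.ndarray:
    """Least-squares scale/shift alignment of pred to gt."""
    A = np.stack([pred.ravel(), np.ones(pred.size)], axis=1)
    (s, t), *_ = np.linalg.lstsq(A, gt.ravel(), rcond=None)
    return s * pred + t

def abs_rel(pred: np.ndarray, gt: np.ndarray) -> float:
    """Mean absolute relative error."""
    return float(np.mean(np.abs(pred - gt) / gt))

def delta1(pred: np.ndarray, gt: np.ndarray) -> float:
    """Fraction of pixels with max(pred/gt, gt/pred) < 1.25."""
    ratio = np.maximum(pred / gt, gt / pred)
    return float(np.mean(ratio < 1.25))
```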
📖 Recommended Works
- Marigold: Repurposing Diffusion-Based Image Generators for Monocular Depth Estimation. arXiv, GitHub.
- GeoWizard: Unleashing the Diffusion Priors for 3D Geometry Estimation from a Single Image. arXiv, GitHub.
- FrozenRecon: Pose-free 3D Scene Reconstruction with Frozen Depth Models. arXiv, GitHub.
👍 Results in Paper
Depth and Surface Normal
Dichotomous Image Segmentation
Image Matting
Image Segmentation
🎫 License
For non-commercial academic use, this project is licensed under the 2-clause BSD License. For commercial use, please contact Chunhua Shen.
🎓 Citation
@article{xu2024diffusion,
title={What Matters When Repurposing Diffusion Models for General Dense Perception Tasks?},
author={Xu, Guangkai and Ge, Yongtao and Liu, Mingyu and Fan, Chengxiang and Xie, Kangyang and Zhao, Zhiyue and Chen, Hao and Shen, Chunhua},
journal={arXiv preprint arXiv:2403.06090},
year={2024}
}
Owner
- Name: Advanced Intelligent Machines (AIM)
- Login: aim-uofa
- Kind: organization
- Location: China
- Repositories: 23
- Profile: https://github.com/aim-uofa
A research team at Zhejiang University, focusing on Computer Vision and broad AI research ...
GitHub Events
Total
- Issues event: 15
- Watch event: 82
- Issue comment event: 10
- Push event: 6
- Fork event: 2
Last Year
- Issues event: 15
- Watch event: 82
- Issue comment event: 10
- Push event: 6
- Fork event: 2
Committers
Last synced: 9 months ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| guangkaixu | g****u@g****m | 20 |
| Chunhua Shen | 1****n | 4 |
| hugoycj | 5****9@q****m | 4 |
| YongtaoGe | y****e@a****u | 1 |
Committer Domains (Top 20 + Academic)
Issues and Pull Requests
Last synced: 6 months ago
All Time
- Total issues: 21
- Total pull requests: 1
- Average time to close issues: about 1 month
- Average time to close pull requests: 28 days
- Total issue authors: 19
- Total pull request authors: 1
- Average comments per issue: 1.05
- Average comments per pull request: 0.0
- Merged pull requests: 1
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 10
- Pull requests: 0
- Average time to close issues: 3 days
- Average time to close pull requests: N/A
- Issue authors: 10
- Pull request authors: 0
- Average comments per issue: 0.6
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- gituser123456789000 (2)
- jk4011 (2)
- shixuan7 (1)
- Binyr (1)
- haodong2000 (1)
- faruknane (1)
- yangcong1617 (1)
- gwang-kim (1)
- liziwennba (1)
- 2hiTee (1)
- yyy-123-abc (1)
- chester256 (1)
- Boatsure (1)
- CoinCheung (1)
- mldemox (1)
Pull Request Authors
- hugoycj (2)
Top Labels
Issue Labels
Pull Request Labels
Dependencies
- accelerate >=0.22.0
- datasets >=2.15.0
- diffusers >=0.20.1
- h5py *
- jsonlines *
- matplotlib >=3.8.2
- more_itertools *
- opencv-python *
- peft *
- plyfile *
- scikit-image *
- scipy >=1.11.4
- tensorboard *
- torch ==2.0.1
- torchvision *
- transformers >=4.32.1
- xformers ==0.0.20