https://github.com/aim-uofa/genpercept
[ICLR2025] GenPercept: Diffusion Models Trained with Large Data Are Transferable Visual Models
Science Score: 33.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: found codemeta.json file
- ○ .zenodo.json file
- ○ DOI references
- ✓ Academic publication links: links to arxiv.org
- ✓ Committers with academic emails: 1 of 4 committers (25.0%) from academic institutions
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (7.8%) to scientific vocabulary
Keywords
Repository
Basic Info
- Host: GitHub
- Owner: aim-uofa
- License: bsd-2-clause
- Language: Python
- Default Branch: main
- Homepage: https://huggingface.co/spaces/guangkaixu/GenPercept
- Size: 38.2 MB
Statistics
- Stars: 188
- Watchers: 5
- Forks: 7
- Open Issues: 6
- Releases: 0
Topics
Metadata Files
README.md
[ICLR2025] What Matters When Repurposing Diffusion Models for General Dense Perception Tasks?
Former Title: "Diffusion Models Trained with Large Data Are Transferable Visual Models"

[Guangkai Xu](https://github.com/guangkaixu/), [Yongtao Ge](https://yongtaoge.github.io/), [Mingyu Liu](https://mingyulau.github.io/), [Chengxiang Fan](https://leaf1170124460.github.io/), [Kangyang Xie](https://github.com/felix-ky), [Zhiyue Zhao](https://github.com/ZhiyueZhau), [Hao Chen](https://stan-haochen.github.io/), [Chunhua Shen](https://cshen.github.io/)

Zhejiang University

### [HuggingFace (Space)](https://huggingface.co/spaces/guangkaixu/GenPercept) | [HuggingFace (Model)](https://huggingface.co/guangkaixu/genpercept-models) | [arXiv](https://arxiv.org/abs/2403.06090)

#### 🔥 Fine-tune diffusion models for perception tasks, and run inference in only one step! ✈️
📢 News
- 2025.1.24: 🎉🎉🎉 GenPercept has been accepted by ICLR 2025. 🎉🎉🎉
- 2024.10.25: Update the GenPercept HuggingFace App demo.
- 2024.10.24: Release the latest training and inference code, built on the accelerate library and based on Marigold.
- 2024.10.24: Release the arXiv v3 paper, with a reorganized structure and more detailed analysis.
- 2024.4.30: Release checkpoint weights of surface normal and dichotomous image segmentation.
- 2024.4.7: Add HuggingFace App demo.
- 2024.4.6: Release inference code and depth checkpoint weight of GenPercept in the GitHub repo.
- 2024.3.15: Release arXiv v2 paper, with supplementary material.
- 2024.3.10: Release arXiv v1 paper.
📚 Download Resource Summary
- Space-Huggingface demo: https://huggingface.co/spaces/guangkaixu/GenPercept.
- Models-all (including ablation study): https://huggingface.co/guangkaixu/genpercept-exps.
- Models-main-paper: https://huggingface.co/guangkaixu/genpercept-models.
- Models-depth: https://huggingface.co/guangkaixu/genpercept-depth.
- Models-normal: https://huggingface.co/guangkaixu/genpercept-normal.
- Models-dis: https://huggingface.co/guangkaixu/genpercept-dis.
- Models-matting: https://huggingface.co/guangkaixu/genpercept-matting.
- Models-seg: https://huggingface.co/guangkaixu/genpercept-seg.
- Models-disparity: https://huggingface.co/guangkaixu/genpercept-disparity.
- Models-disparity-dpt-head: https://huggingface.co/guangkaixu/genpercept-disparity-dpt-head.
- Datasets-input demo: https://huggingface.co/datasets/guangkaixu/genpercept-input-demo.
- Datasets-evaluation data: https://huggingface.co/datasets/guangkaixu/genpercept_datasets_eval.
- Datasets-evaluation results: https://huggingface.co/datasets/guangkaixu/genpercept-exps-eval.
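If you prefer fetching these resources programmatically, here is a minimal sketch using huggingface_hub's snapshot_download; the repo IDs come from the list above, while the local target directories are arbitrary choices, not paths mandated by the repository.

```python
# Sketch: fetch the main-paper checkpoints and the demo inputs from the Hub.
# The local_dir values are arbitrary; adjust them to match your setup.
from huggingface_hub import snapshot_download

snapshot_download(repo_id="guangkaixu/genpercept-models",
                  local_dir="weights/genpercept-models")
snapshot_download(repo_id="guangkaixu/genpercept-input-demo",
                  repo_type="dataset", local_dir="input")
```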
🖥️ Dependencies
```bash
conda create -n genpercept python=3.10
conda activate genpercept
pip install -r requirements.txt
pip install -e .
```
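After installation, a quick sanity check can confirm that the pinned versions resolved correctly; this snippet is illustrative and not part of the repository. The expected versions follow the dependency list at the end of this page.

```python
# Illustrative environment check; expected values follow requirements.txt.
import torch, diffusers, transformers, xformers

print(torch.__version__)          # expected: 2.0.1
print(xformers.__version__)       # expected: 0.0.20
print(diffusers.__version__)      # expected: >= 0.20.1
print(transformers.__version__)   # expected: >= 4.32.1
print(torch.cuda.is_available())  # the reported timings assume a CUDA GPU
```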
🚀 Inference
Using Command-line Scripts
Download the stable-diffusion-2-1 checkpoints and our trained models from HuggingFace, and put them under ./pretrained_weights/ and ./weights/, respectively. You can download them with the scripts script/download_sd21.sh and script/download_weights.sh, or download the weights of depth, normal, dichotomous image segmentation, matting, segmentation, disparity, and disparity-dpt-head separately.
Then place images in the ./input/ directory. We offer demo images on HuggingFace, which you can also download with the script script/download_sample_data.sh. Then run inference with the scripts below.
```bash
# Depth
source script/infer/main_paper/inference_genpercept_depth.sh
# Normal
source script/infer/main_paper/inference_genpercept_normal.sh
# Dichotomous image segmentation (DIS)
source script/infer/main_paper/inference_genpercept_dis.sh
# Matting
source script/infer/main_paper/inference_genpercept_matting.sh
# Segmentation
source script/infer/main_paper/inference_genpercept_seg.sh
# Disparity
source script/infer/main_paper/inference_genpercept_disparity.sh
# Disparity (DPT head)
source script/infer/main_paper/inference_genpercept_disparity_dpt_head.sh
```
If you would like to change the input folder path, UNet path, or output path, pass them as parameters:
```bash
# Assign values
input_rgb_dir=...
unet=...
output_dir=...
# Take depth as an example
source script/infer/main_paper/inference_genpercept_depth.sh $input_rgb_dir $unet $output_dir
```
For a general inference script, please see script/infer/inference_general.sh.
Thanks to our one-step perception paradigm, inference runs much faster than multi-step diffusion pipelines (around 0.4 s per image on an A800 GPU).
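For intuition, here is a minimal sketch of what one-step perception with a Stable Diffusion 2.1 backbone can look like using the diffusers library. The checkpoint layout (a fine-tuned UNet under guangkaixu/genpercept-models), the fixed timestep, and the conditioning details are assumptions for illustration, not the repository's exact pipeline.

```python
# Sketch of one-step deterministic perception with an SD-2.1 backbone.
# Assumed: the Hub repo stores a fine-tuned UNet in a "unet" subfolder;
# the single fixed timestep below is illustrative.
import torch
from diffusers import AutoencoderKL, UNet2DConditionModel

device = "cuda" if torch.cuda.is_available() else "cpu"
vae = AutoencoderKL.from_pretrained(
    "stabilityai/stable-diffusion-2-1", subfolder="vae").to(device)
unet = UNet2DConditionModel.from_pretrained(
    "guangkaixu/genpercept-models", subfolder="unet").to(device)

@torch.no_grad()
def one_step_predict(rgb: torch.Tensor, text_embed: torch.Tensor) -> torch.Tensor:
    """rgb: (B, 3, H, W) in [-1, 1]; text_embed: CLIP text embeddings.
    Encodes the RGB image, runs a single UNet forward pass at a fixed
    timestep, and decodes the predicted latent back to image space."""
    latent = vae.encode(rgb.to(device)).latent_dist.mode()
    latent = latent * vae.config.scaling_factor
    t = torch.tensor([999], device=device)  # single fixed timestep
    pred = unet(latent, t, encoder_hidden_states=text_embed).sample
    return vae.decode(pred / vae.config.scaling_factor).sample
```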
Using torch.hub
TODO
🔥 Train
NOTE: We implement training with the accelerate library, but observe worse training accuracy with multiple GPUs than with a single GPU under the same effective_batch_size and max_iter. Your assistance in resolving this issue would be greatly appreciated. Thank you very much!
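For context, the effective batch size under accelerate is the product of the per-GPU batch size, the number of GPUs, and the gradient-accumulation steps, so a multi-GPU run is only comparable to a single-GPU run when these products match. The function below is a generic sketch of that arithmetic, not repository code.

```python
# Generic accelerate batch-size arithmetic (not repository code).
def effective_batch_size(per_gpu_bs: int, num_gpus: int, grad_accum_steps: int) -> int:
    """Total samples contributing to one optimizer step."""
    return per_gpu_bs * num_gpus * grad_accum_steps

# Two configurations with the same effective batch size of 8:
assert effective_batch_size(per_gpu_bs=8, num_gpus=1, grad_accum_steps=1) == 8
assert effective_batch_size(per_gpu_bs=2, num_gpus=4, grad_accum_steps=1) == 8
```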
Preparation
- Datasets: TODO. Place the training datasets under datasets/.
- Download stable-diffusion-2-1 from HuggingFace and put the checkpoints under ./pretrained_weights/. You can also download them with the script script/download_sd21.sh.
Start Training
The training scripts that reproduce the arXiv v3 paper are released in script/, with their configs stored in config/. Models with max_train_batch_size > 2 were trained on an H100, and those with max_train_batch_size <= 2 on an RTX 4090. Run the train script:
```bash
# Take the depth training of the main paper as an example
source script/train_sd21_main_paper/sd21_train_accelerate_genpercept_1card_ensure_depth_bs8_per_accu_pixel_mse_ssi_grad_loss.sh
```
🎖️ Eval
Preparation
- Download the evaluation datasets and place them in datasets_eval/.
- Download our trained models of the main paper and the ablation study in Section 3 of the arXiv v3 paper, and place them in weights/genpercept-exps.
Start Evaluation
The evaluation scripts are stored in script/eval_sd21.
```bash
# Take "ensemble1 + step1" as an example
source script/eval_sd21/eval_ensemble1_step1/0_infer_eval_all.sh
```
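For background, affine-invariant depth evaluation in this line of work typically aligns the prediction to the ground truth in scale and shift before computing AbsRel and δ1. The sketch below shows those standard metrics generically; it is not the repository's exact evaluation code.

```python
# Generic affine-invariant depth metrics (AbsRel, delta1); a sketch,
# not the repository's evaluation code.
import numpy as np

def align_scale_shift(pred: np.ndarray, gt: np.ndarray) -> np.ndarray:
    """Least-squares scale/shift alignment of pred to gt."""
    A = np.stack([pred.ravel(), np.ones(pred.size)], axis=1)
    (s, t), *_ = np.linalg.lstsq(A, gt.ravel(), rcond=None)
    return s * pred + t

def abs_rel(pred: np.ndarray, gt: np.ndarray) -> float:
    """Mean absolute relative error."""
    return float(np.mean(np.abs(pred - gt) / gt))

def delta1(pred: np.ndarray, gt: np.ndarray) -> float:
    """Fraction of pixels with max(pred/gt, gt/pred) < 1.25."""
    ratio = np.maximum(pred / gt, gt / pred)
    return float(np.mean(ratio < 1.25))
```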
📖 Recommended Works
- Marigold: Repurposing Diffusion-Based Image Generators for Monocular Depth Estimation. arXiv, GitHub.
- GeoWizard: Unleashing the Diffusion Priors for 3D Geometry Estimation from a Single Image. arXiv, GitHub.
- FrozenRecon: Pose-free 3D Scene Reconstruction with Frozen Depth Models. arXiv, GitHub.
👍 Results in Paper
Depth and Surface Normal
Dichotomous Image Segmentation
Image Matting
Image Segmentation
🎫 License
For non-commercial academic use, this project is licensed under the 2-clause BSD License. For commercial use, please contact Chunhua Shen.
🎓 Citation
@article{xu2024diffusion,
title={What Matters When Repurposing Diffusion Models for General Dense Perception Tasks?},
author={Xu, Guangkai and Ge, Yongtao and Liu, Mingyu and Fan, Chengxiang and Xie, Kangyang and Zhao, Zhiyue and Chen, Hao and Shen, Chunhua},
journal={arXiv preprint arXiv:2403.06090},
year={2024}
}
Owner
- Name: Advanced Intelligent Machines (AIM)
- Login: aim-uofa
- Kind: organization
- Location: China
- Repositories: 23
- Profile: https://github.com/aim-uofa
A research team at Zhejiang University, focusing on Computer Vision and broad AI research ...
GitHub Events
Total
- Issues event: 15
- Watch event: 82
- Issue comment event: 10
- Push event: 6
- Fork event: 2
Last Year
- Issues event: 15
- Watch event: 82
- Issue comment event: 10
- Push event: 6
- Fork event: 2
Committers
Last synced: 9 months ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| guangkaixu | g****u@g****m | 20 |
| Chunhua Shen | 1****n | 4 |
| hugoycj | 5****9@q****m | 4 |
| YongtaoGe | y****e@a****u | 1 |
Committer Domains (Top 20 + Academic)
Issues and Pull Requests
Last synced: 6 months ago
All Time
- Total issues: 21
- Total pull requests: 1
- Average time to close issues: about 1 month
- Average time to close pull requests: 28 days
- Total issue authors: 19
- Total pull request authors: 1
- Average comments per issue: 1.05
- Average comments per pull request: 0.0
- Merged pull requests: 1
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 10
- Pull requests: 0
- Average time to close issues: 3 days
- Average time to close pull requests: N/A
- Issue authors: 10
- Pull request authors: 0
- Average comments per issue: 0.6
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- gituser123456789000 (2)
- jk4011 (2)
- shixuan7 (1)
- Binyr (1)
- haodong2000 (1)
- faruknane (1)
- yangcong1617 (1)
- gwang-kim (1)
- liziwennba (1)
- 2hiTee (1)
- yyy-123-abc (1)
- chester256 (1)
- Boatsure (1)
- CoinCheung (1)
- mldemox (1)
Pull Request Authors
- hugoycj (2)
Top Labels
Issue Labels
Pull Request Labels
Dependencies
- accelerate >=0.22.0
- datasets >=2.15.0
- diffusers >=0.20.1
- h5py *
- jsonlines *
- matplotlib >=3.8.2
- more_itertools *
- opencv-python *
- peft *
- plyfile *
- scikit-image *
- scipy >=1.11.4
- tensorboard *
- torch ==2.0.1
- torchvision *
- transformers >=4.32.1
- xformers ==0.0.20