https://github.com/aim-uofa/genpercept

[ICLR2025] GenPercept: Diffusion Models Trained with Large Data Are Transferable Visual Models


Science Score: 33.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: arxiv.org
  • Committers with academic emails
    1 of 4 committers (25.0%) from academic institutions
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (7.8%) to scientific vocabulary

Keywords

depth-estimation dichotomous-image-segmentation human-pose-estimation iclr2025 image-matting monocular-depth-estimation one-step semantic-segmentation surface-normals
Last synced: 5 months ago

Repository

[ICLR2025] GenPercept: Diffusion Models Trained with Large Data Are Transferable Visual Models

Basic Info
Statistics
  • Stars: 188
  • Watchers: 5
  • Forks: 7
  • Open Issues: 6
  • Releases: 0
Topics
depth-estimation dichotomous-image-segmentation human-pose-estimation iclr2025 image-matting monocular-depth-estimation one-step semantic-segmentation surface-normals
Created almost 2 years ago · Last pushed about 1 year ago
Metadata Files
Readme License

README.md

[ICLR2025] What Matters When Repurposing Diffusion Models for General Dense Perception Tasks?

Former Title: "Diffusion Models Trained with Large Data Are Transferable Visual Models"

[Guangkai Xu](https://github.com/guangkaixu/), [Yongtao Ge](https://yongtaoge.github.io/), [Mingyu Liu](https://mingyulau.github.io/), [Chengxiang Fan](https://leaf1170124460.github.io/), [Kangyang Xie](https://github.com/felix-ky), [Zhiyue Zhao](https://github.com/ZhiyueZhau), [Hao Chen](https://stan-haochen.github.io/), [Chunhua Shen](https://cshen.github.io/)

Zhejiang University

### [HuggingFace (Space)](https://huggingface.co/spaces/guangkaixu/GenPercept) | [HuggingFace (Model)](https://huggingface.co/guangkaixu/genpercept-models) | [arXiv](https://arxiv.org/abs/2403.06090)

#### 🔥 Fine-tune diffusion models for perception tasks, and run inference in just one step! ✈️

📢 News

  • 2025.1.24: 🎉🎉🎉 GenPercept has been accepted by ICLR 2025. 🎉🎉🎉
  • 2024.10.25: Update GenPercept Huggingface App demo.
  • 2024.10.24: Release the latest training and inference code, which uses the accelerate library and builds on Marigold.
  • 2024.10.24: Release the arXiv v3 paper, with a reorganized structure and more detailed analysis.
  • 2024.4.30: Release checkpoint weights for surface normal and dichotomous image segmentation.
  • 2024.4.7: Add HuggingFace App demo.
  • 2024.4.6: Release inference code and depth checkpoint weight of GenPercept in the GitHub repo.
  • 2024.3.15: Release arXiv v2 paper, with supplementary material.
  • 2024.3.10: Release arXiv v1 paper.

📚 Download Resource Summary

  • Space-Huggingface demo: https://huggingface.co/spaces/guangkaixu/GenPercept.
  • Models-all (including ablation study): https://huggingface.co/guangkaixu/genpercept-exps.
  • Models-main-paper: https://huggingface.co/guangkaixu/genpercept-models.
  • Models-depth: https://huggingface.co/guangkaixu/genpercept-depth.
  • Models-normal: https://huggingface.co/guangkaixu/genpercept-normal.
  • Models-dis: https://huggingface.co/guangkaixu/genpercept-dis.
  • Models-matting: https://huggingface.co/guangkaixu/genpercept-matting.
  • Models-seg: https://huggingface.co/guangkaixu/genpercept-seg.
  • Models-disparity: https://huggingface.co/guangkaixu/genpercept-disparity.
  • Models-disparity-dpt-head: https://huggingface.co/guangkaixu/genpercept-disparity-dpt-head.
  • Datasets-input demo: https://huggingface.co/datasets/guangkaixu/genpercept-input-demo.
  • Datasets-evaluation data: https://huggingface.co/datasets/guangkaixu/genpercept_datasets_eval.
  • Datasets-evaluation results: https://huggingface.co/datasets/guangkaixu/genpercept-exps-eval.

🖥️ Dependencies

```bash
conda create -n genpercept python=3.10
conda activate genpercept
pip install -r requirements.txt
pip install -e .
```
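A quick sanity check after installation (a minimal sketch; assumes a CUDA-capable GPU is present) confirms that the pinned torch build can see the device:

```bash
# Print the installed torch version and whether CUDA is visible
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
```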

🚀 Inference

Using Command-line Scripts

Download stable-diffusion-2-1 and our trained models from HuggingFace and put the checkpoints under ./pretrained_weights/ and ./weights/, respectively. You can download them with the scripts script/download_sd21.sh and script/download_weights.sh, or download the weights of depth, normal, dichotomous image segmentation, matting, segmentation, disparity, and disparity-dpt-head separately.
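Alternatively, the checkpoints can be fetched with the Hugging Face CLI; a sketch, assuming the target directories below match what the download scripts use:

```bash
# Base SD 2.1 weights and the main-paper GenPercept checkpoints
# (target directory layout is an assumption; adjust to match the scripts)
huggingface-cli download stabilityai/stable-diffusion-2-1 --local-dir ./pretrained_weights/stable-diffusion-2-1
huggingface-cli download guangkaixu/genpercept-models --local-dir ./weights/genpercept-models
```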

Then, place images in the ./input/ directory. We offer demo images on Hugging Face, which you can also download with the script script/download_sample_data.sh. Then, run inference with the scripts below.

```bash
# Depth
source script/infer/main_paper/inference_genpercept_depth.sh

# Normal
source script/infer/main_paper/inference_genpercept_normal.sh

# Dis
source script/infer/main_paper/inference_genpercept_dis.sh

# Matting
source script/infer/main_paper/inference_genpercept_matting.sh

# Seg
source script/infer/main_paper/inference_genpercept_seg.sh

# Disparity
source script/infer/main_paper/inference_genpercept_disparity.sh

# Disparity (dpt head)
source script/infer/main_paper/inference_genpercept_disparity_dpt_head.sh
```
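To run all of the main-paper tasks in one pass, the common naming pattern of the scripts above lends itself to a simple loop (a sketch, assuming each script can be sourced back-to-back):

```bash
# Iterate over the main-paper inference scripts that share the naming pattern
for task in depth normal dis matting seg disparity; do
  source script/infer/main_paper/inference_genpercept_${task}.sh
done
```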

If you would like to change the input folder path, UNet path, and output path, pass these parameters like:

```bash
# Assign values
input_rgb_dir=...
unet=...
output_dir=...

# Take depth as an example
source script/infer/main_paper/inference_genpercept_depth.sh $input_rgb_dir $unet $output_dir
```

For a general inference script, please see script/infer/inference_general.sh for details.

Thanks to our one-step perception paradigm, inference runs much faster: around 0.4 s per image on an A800 GPU.

Using torch.hub

TODO

🔥 Train

NOTE: We implement training with the accelerate library, but observe worse training accuracy with multiple GPUs than with a single GPU under the same effective_batch_size and max_iter. Your assistance in resolving this issue would be greatly appreciated. Thank you very much!
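Until the multi-GPU discrepancy is resolved, one workaround is to pin training to a single GPU when launching with accelerate; a sketch (the entry point and config flag are illustrative, use the provided training scripts in practice):

```bash
# Restrict accelerate to one process on one visible GPU (entry point is hypothetical)
CUDA_VISIBLE_DEVICES=0 accelerate launch --num_processes 1 train.py --config config/...
```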

Preparation

Datasets: TODO

Place training datasets under datasets/.

Download stable-diffusion-2-1 from HuggingFace and put the checkpoints under ./pretrained_weights/. You can also download it with the script script/download_sd21.sh.
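Before launching, it may help to confirm the prepared folders are in place (an illustrative check; the exact dataset subfolders depend on what you downloaded):

```bash
# Fail fast if either prepared directory is missing
ls ./pretrained_weights/stable-diffusion-2-1 ./datasets
```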

Start Training

The training scripts that reproduce the arXiv v3 paper are released in script/, with their configs stored in config/. Models with max_train_batch_size > 2 are trained on an H100 GPU, and those with max_train_batch_size <= 2 on an RTX 4090. Run the training script:

```bash
# Take the depth training of the main paper as an example
source script/train_sd21_main_paper/sd21_train_accelerate_genpercept_1card_ensure_depth_bs8_per_accu_pixel_mse_ssi_grad_loss.sh
```

🎖️ Eval

Preparation

  1. Download evaluation datasets and place them in datasets_eval.
  2. Download our trained models from the main paper and the ablation study in Section 3 of the arXiv v3 paper, and place them in weights/genpercept-exps (see the sketch below).
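As with the inference checkpoints, the evaluation assets can be fetched via the Hugging Face CLI; a sketch, assuming the dataset repo name from the resource summary above:

```bash
# Evaluation datasets (dataset repo) and main-paper/ablation checkpoints
huggingface-cli download guangkaixu/genpercept_datasets_eval --repo-type dataset --local-dir ./datasets_eval
huggingface-cli download guangkaixu/genpercept-exps --local-dir ./weights/genpercept-exps
```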

Start Evaluation

The evaluation scripts are stored in script/eval_sd21.

```bash
# Take "ensemble1 + step1" as an example
source script/eval_sd21/eval_ensemble1_step1/0_infer_eval_all.sh
```

📖 Recommended Works

  • Marigold: Repurposing Diffusion-Based Image Generators for Monocular Depth Estimation. arXiv, GitHub.
  • GeoWizard: Unleashing the Diffusion Priors for 3D Geometry Estimation from a Single Image. arXiv, GitHub.
  • FrozenRecon: Pose-free 3D Scene Reconstruction with Frozen Depth Models. arXiv, GitHub.

👍 Results in Paper

Depth and Surface Normal


Dichotomous Image Segmentation


Image Matting


Image Segmentation


🎫 License

For non-commercial academic use, this project is licensed under the 2-clause BSD License. For commercial use, please contact Chunhua Shen.

🎓 Citation

```bibtex
@article{xu2024diffusion,
  title={What Matters When Repurposing Diffusion Models for General Dense Perception Tasks?},
  author={Xu, Guangkai and Ge, Yongtao and Liu, Mingyu and Fan, Chengxiang and Xie, Kangyang and Zhao, Zhiyue and Chen, Hao and Shen, Chunhua},
  journal={arXiv preprint arXiv:2403.06090},
  year={2024}
}
```

Owner

  • Name: Advanced Intelligent Machines (AIM)
  • Login: aim-uofa
  • Kind: organization
  • Location: China

A research team at Zhejiang University, focusing on Computer Vision and broad AI research ...

GitHub Events

Total
  • Issues event: 15
  • Watch event: 82
  • Issue comment event: 10
  • Push event: 6
  • Fork event: 2
Last Year
  • Issues event: 15
  • Watch event: 82
  • Issue comment event: 10
  • Push event: 6
  • Fork event: 2

Committers

Last synced: 9 months ago

All Time
  • Total Commits: 29
  • Total Committers: 4
  • Avg Commits per committer: 7.25
  • Development Distribution Score (DDS): 0.31
Past Year
  • Commits: 17
  • Committers: 3
  • Avg Commits per committer: 5.667
  • Development Distribution Score (DDS): 0.471
Top Committers

| Name | Email | Commits |
| --- | --- | --- |
| guangkaixu | g****u@g****m | 20 |
| Chunhua Shen | 1****n | 4 |
| hugoycj | 5****9@q****m | 4 |
| YongtaoGe | y****e@a****u | 1 |

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 21
  • Total pull requests: 1
  • Average time to close issues: about 1 month
  • Average time to close pull requests: 28 days
  • Total issue authors: 19
  • Total pull request authors: 1
  • Average comments per issue: 1.05
  • Average comments per pull request: 0.0
  • Merged pull requests: 1
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 10
  • Pull requests: 0
  • Average time to close issues: 3 days
  • Average time to close pull requests: N/A
  • Issue authors: 10
  • Pull request authors: 0
  • Average comments per issue: 0.6
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • gituser123456789000 (2)
  • jk4011 (2)
  • shixuan7 (1)
  • Binyr (1)
  • haodong2000 (1)
  • faruknane (1)
  • yangcong1617 (1)
  • gwang-kim (1)
  • liziwennba (1)
  • 2hiTee (1)
  • yyy-123-abc (1)
  • chester256 (1)
  • Boatsure (1)
  • CoinCheung (1)
  • mldemox (1)
Pull Request Authors
  • hugoycj (2)

Dependencies

requirements.txt pypi
  • accelerate >=0.22.0
  • datasets >=2.15.0
  • diffusers >=0.20.1
  • h5py *
  • jsonlines *
  • matplotlib >=3.8.2
  • more_itertools *
  • opencv-python *
  • peft *
  • plyfile *
  • scikit-image *
  • scipy >=1.11.4
  • tensorboard *
  • torch ==2.0.1
  • torchvision *
  • transformers >=4.32.1
  • xformers ==0.0.20
setup.py pypi