https://github.com/compvis/fm-boosting

FMBoost: Boosting Latent Diffusion with Flow Matching (ECCV 2024 Oral)

Science Score: 33.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: arxiv.org
  • Committers with academic emails
    1 of 7 committers (14.3%) from academic institutions
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (7.9%) to scientific vocabulary

Keywords

diffusion-models flow-matching stable-diffusion super-resolution
Last synced: 5 months ago

Repository

FMBoost: Boosting Latent Diffusion with Flow Matching (ECCV 2024 Oral)

Basic Info
Statistics
  • Stars: 229
  • Watchers: 28
  • Forks: 4
  • Open Issues: 5
  • Releases: 0
Topics
diffusion-models flow-matching stable-diffusion super-resolution
Created about 2 years ago · Last pushed about 1 year ago
Metadata Files
Readme License

README.md

🚀 Boosting Latent Diffusion with Flow Matching

Johannes Schusterbauer * · Ming Gui* · Pingchuan Ma* · Nick Stracke · Stefan A. Baumann · Vincent Tao Hu · Björn Ommer

CompVis Group @ LMU Munich

* equal contribution

ECCV 2024 Oral

Website Paper

cover

Samples synthesized at $1024^2$ px. We elevate DMs and similar architectures to a higher-resolution domain at exceptionally fast processing speeds. We leverage Latent Consistency Models (LCMs) distilled from SD1.5 and SDXL, respectively. To reach the same resolution as LCM (SDXL), we boost LCM-SD1.5 with our general Coupling Flow Matching (CFM) model. This yields a further speedup in the synthesis process and enables the generation of high-fidelity, high-resolution images in an average of $0.347$ seconds. LCM-SDXL fails to produce competitive results within this shortened timeframe, highlighting the effectiveness of our approach in achieving both speed and quality in image synthesis.

📝 Overview

In this work, we leverage the complementary strengths of Diffusion Models (DMs), Flow Matching models (FMs), and Variational AutoEncoders (VAEs): the diversity of stochastic DMs, the speed of FMs in training and inference stages, and the efficiency of a convolutional decoder to map latents into pixel space. This synergy results in a small diffusion model that excels in generating diverse samples at a low resolution. Flow Matching then takes a direct path from this lower-resolution representation to a higher-resolution latent, which is subsequently translated into a high-resolution image by a convolutional decoder. We achieve competitive high-resolution image synthesis at $1024^2$ and $2048^2$ pixels with minimal computational cost.

🚀 Pipeline

During training, we feed both a low- and a high-resolution image through the pre-trained encoder to obtain low- and high-resolution latent codes. Our model is trained to regress a vector field that forms a probability path from the low- to the high-res latent over $t \in [0, 1]$.

training
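The vector-field regression above can be sketched as follows. This is a minimal numpy illustration, not the authors' code; the shapes and the plain linear (straight-line) probability path are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for low- and high-res latent codes from the frozen encoder.
# With pixel-space upsampling (PSU), both share the same spatial shape.
z_low = rng.standard_normal((2, 4, 16, 16))   # low-res latent (upsampled)
z_high = rng.standard_normal((2, 4, 16, 16))  # high-res latent

# Sample a time t in [0, 1] per example and interpolate along the path.
t = rng.uniform(size=(2, 1, 1, 1))
z_t = (1.0 - t) * z_low + t * z_high

# For a linear path, the regression target is the constant vector field
# pointing from the low-res latent to the high-res latent.
target_v = z_high - z_low

# The model v_theta(z_t, t) would be trained with a simple MSE objective:
def fm_loss(pred_v, target_v):
    return np.mean((pred_v - target_v) ** 2)

# Sanity check: a perfect prediction gives zero loss.
print(fm_loss(target_v, target_v))  # 0.0
```

In practice `v_theta` would be a U-Net conditioned on `t`; the sketch only shows where the interpolant and regression target come from.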

At inference, we can take any diffusion model, generate the low-res latent, and then use our Coupling Flow Matching model to synthesize the higher-dimensional latent code. Finally, the pre-trained decoder projects the latent code back to pixel space, yielding $1024^2$ or $2048^2$ images.

inference
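At inference time, sampling from the flow amounts to integrating the learned ODE from $t=0$ (low-res latent) to $t=1$ (high-res latent). A minimal Euler-step sketch, where the constant "oracle" field below is a hypothetical stand-in for the trained CFM model:

```python
import numpy as np

def euler_sample(v_theta, z_low, n_steps=4):
    """Integrate dz/dt = v_theta(z, t) from t=0 to t=1 with Euler steps."""
    z = z_low.copy()
    dt = 1.0 / n_steps
    for i in range(n_steps):
        t = i * dt
        z = z + dt * v_theta(z, t)
    return z

# Hypothetical oracle field for a known linear path (illustration only):
z_low = np.zeros((1, 4, 16, 16))
z_high = np.ones((1, 4, 16, 16))
v_oracle = lambda z, t: z_high - z_low  # constant field of the linear path

z1 = euler_sample(v_oracle, z_low, n_steps=4)
print(np.allclose(z1, z_high))  # True
```

With a straight-line path, even very few integration steps suffice, which is one reason coupling flows are fast at inference; the real model would of course use a learned, state-dependent field.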

📈 Results

We show a zero-shot quantitative comparison of our method against other state-of-the-art methods on the COCO dataset. Our method achieves a good trade-off between performance and computational cost.

results-coco

We can cascade our models to increase the resolution of a $128^2$ px LDM 1.5 generation to a $2048^2$ px output.

cascading

You can find more qualitative results on our project page.

🔥 Usage

Please execute the following commands to download the first-stage autoencoder checkpoint (the URL contains `&`, so it must be quoted):

```bash
mkdir checkpoints
wget -O checkpoints/sd_ae.ckpt "https://www.dropbox.com/scl/fi/lvfvy7qou05kxfbqz5d42/sd_ae.ckpt?rlkey=fvtu2o48namouu9x3w08olv3o&st=vahu44z5&dl=0"
```

Data

To train the model, you must provide a config file. An example config can be found in configs/flow400_64-128/unet-base_psu.yaml. Please customize the data section for your use case.

To speed up training, we pre-compute the latents. Your dataloader should return a batch with the keys image, latent, and latent_lowres. Note that we use pixel-space upsampling (PSU in the paper), so latent and latent_lowres must have the same spatial resolution (see extract_from_batch() at line 228 of fmboost/trainer.py).
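A batch matching these keys might look like the sketch below. The shapes are assumptions for illustration (4 latent channels and an 8x downsampling factor match the SD autoencoder, but nothing here is taken from the repository's code):

```python
import numpy as np

def make_batch(batch_size=2, latent_hw=16):
    """Return a dummy batch with the keys the trainer expects.

    With pixel-space upsampling (PSU), the low-res image is upsampled in
    pixel space before encoding, so `latent_lowres` shares the spatial
    resolution of `latent`.
    """
    rng = np.random.default_rng(0)
    return {
        "image": rng.standard_normal((batch_size, 3, latent_hw * 8, latent_hw * 8)),
        "latent": rng.standard_normal((batch_size, 4, latent_hw, latent_hw)),
        "latent_lowres": rng.standard_normal((batch_size, 4, latent_hw, latent_hw)),
    }

batch = make_batch()
assert batch["latent"].shape == batch["latent_lowres"].shape
```

A real dataloader would read the pre-computed latents from disk instead of sampling noise; the point is only the key names and the matching spatial shapes.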

Training

Afterwards, you can start the training with

```bash
python3 train.py --config configs/flow400_64-128/unet-base_psu.yaml --name your-name --use_wandb
```

The flag --use_wandb enables logging to WandB. By default, metrics are only logged to a CSV file and TensorBoard. All logs are stored in the logs folder. You can also define a folder structure for your experiment name, e.g. logs/exp_name.

Resume checkpoint

If you want to resume from a checkpoint, just add the additional parameter

```bash
... --resume_checkpoint path_to_your_checkpoint.ckpt
```

This restores all states from the checkpoint (e.g. optimizer states). If you only want to load weights from a checkpoint in a non-strict manner, use the --load_weights argument instead.

Inference

We will release a pretrained checkpoint and the corresponding inference jupyter notebook soon. Stay tuned!

🎓 Citation

Please cite our paper:

```bibtex
@InProceedings{schusterbauer2024boosting,
  title     = {Boosting Latent Diffusion with Flow Matching},
  author    = {Johannes Schusterbauer and Ming Gui and Pingchuan Ma and Nick Stracke and Stefan A. Baumann and Vincent Tao Hu and Björn Ommer},
  booktitle = {ECCV},
  year      = {2024}
}
```

Owner

  • Name: CompVis - Computer Vision and Learning LMU Munich
  • Login: CompVis
  • Kind: organization
  • Email: assist.mvl@lrz.uni-muenchen.de
  • Location: Germany

Computer Vision and Learning research group at Ludwig Maximilian University of Munich (formerly Computer Vision Group at Heidelberg University)

GitHub Events

Total
  • Issues event: 4
  • Watch event: 61
  • Issue comment event: 3
  • Push event: 2
  • Fork event: 5
Last Year
  • Issues event: 4
  • Watch event: 61
  • Issue comment event: 3
  • Push event: 2
  • Fork event: 5

Committers

Last synced: 9 months ago

All Time
  • Total Commits: 32
  • Total Committers: 7
  • Avg Commits per committer: 4.571
  • Development Distribution Score (DDS): 0.75
Past Year
  • Commits: 10
  • Committers: 3
  • Avg Commits per committer: 3.333
  • Development Distribution Score (DDS): 0.4
Top Committers
Name Email Commits
joh-fischer 7****r 8
Tao Hu d****o 6
m990130 p****a@l****e 5
Pingchuan Ma P****a@l****e 5
mgui m****i@l****e 4
Johannes Fischer j****r@o****m 3
BiggerPung p****a@P****x 1
Committer Domains (Top 20 + Academic)

Issues and Pull Requests

Last synced: 9 months ago

All Time
  • Total issues: 9
  • Total pull requests: 0
  • Average time to close issues: 3 months
  • Average time to close pull requests: N/A
  • Total issue authors: 9
  • Total pull request authors: 0
  • Average comments per issue: 1.78
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 5
  • Pull requests: 0
  • Average time to close issues: about 2 months
  • Average time to close pull requests: N/A
  • Issue authors: 5
  • Pull request authors: 0
  • Average comments per issue: 0.8
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • yeates (1)
  • benearnthof (1)
  • jshilong (1)
  • SorryMaker666 (1)
  • ygean (1)
  • philipwan (1)
  • ziyizhetutanota (1)
  • NickSpanos55 (1)
  • Oguzhanercan (1)
Pull Request Authors
Top Labels
Issue Labels
Pull Request Labels