Recent Releases of diffusers

diffusers - v0.35.1 for improvements in Qwen-Image Edit

Thanks to @naykun for the following PRs that improve Qwen-Image Edit:

  • https://github.com/huggingface/diffusers/pull/12188
  • https://github.com/huggingface/diffusers/pull/12190

Published by sayakpaul 6 months ago

diffusers - Diffusers 0.35.0: Qwen Image pipelines, Flux Kontext, Wan 2.2, and more

This release comes packed with new image generation and editing pipelines, a new video pipeline, new training scripts, quality-of-life improvements, and much more. Read the release notes in full so you don't miss out on the fun stuff.

New pipelines 🧨

We welcomed new pipelines in this release:

  • Wan 2.2
  • Flux-Kontext
  • Qwen-Image
  • Qwen-Image-Edit

Wan 2.2 📹

This update to Wan provides significant improvements in video fidelity, prompt adherence, and style. Please check out the official doc to learn more.

Flux-Kontext 🎇

Flux-Kontext is a 12-billion-parameter rectified flow transformer capable of editing images based on text instructions. Please check out the official doc to learn more about it.

Qwen-Image 🌅

After a successful run of delivering language models and vision-language models, the Qwen team is back with an image generation model, which is Apache-2.0 licensed! It achieves significant advances in complex text rendering and precise image editing. To learn more about this powerful model, refer to our docs.

Thanks to @naykun for contributing both Qwen-Image and Qwen-Image-Edit via this PR and this PR.

New training scripts 🎛️

Make these newly added models your own with our training scripts.

Single-file modeling implementations

Following the 🤗 Transformers’ philosophy of single-file modeling implementations, we have started implementing modeling code in single and self-contained files. The Flux Transformer code is one example of this.

Attention refactor

We have massively refactored how we do attention in the models. This allows us to provide support for different attention backends (such as PyTorch native scaled_dot_product_attention, Flash Attention 3, SAGE attention, etc.) in the library seamlessly.

Having attention supported this way also allows us to integrate different parallelization mechanisms, which we’re actively working on. Follow this PR if you’re interested.

Users shouldn’t be affected at all by these changes. Please open an issue if you face any problems.
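As a minimal sketch of how a backend can be switched, assuming the `set_attention_backend` helper exposed by the attention dispatcher and the `"flash"` backend name (the corresponding kernel library must be installed):

```python
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
).to("cuda")

# Route the transformer's attention through Flash Attention instead of the
# default PyTorch-native scaled_dot_product_attention.
pipe.transformer.set_attention_backend("flash")

image = pipe("photo of a dog").images[0]
```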

Regional compilation

Regional compilation trims cold-start latency by only compiling the small and frequently-repeated block(s) of a model - typically a transformer layer - and enables reusing compiled artifacts for every subsequent occurrence. For many diffusion architectures, this delivers the same runtime speedups as full-graph compilation and reduces compile time by 8–10x. Refer to this doc to learn more.
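A minimal sketch of regional compilation, assuming the `compile_repeated_blocks` helper described in the doc:

```python
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
).to("cuda")

# Compile only the repeated transformer blocks; every subsequent occurrence
# reuses the compiled artifact instead of triggering a fresh compile.
pipe.transformer.compile_repeated_blocks(fullgraph=True)

image = pipe("photo of a dog").images[0]
```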

Thanks to @anijain2305 for contributing this feature in this PR.

We have also authored a number of posts centered around the use of torch.compile.

Faster pipeline loading ⚡️

Users can now load pipelines directly onto an accelerator device, leading to significantly faster load times. This becomes particularly evident when loading large pipelines like Wan and Qwen-Image.

```diff
from diffusers import DiffusionPipeline
import torch

ckpt_id = "Qwen/Qwen-Image"
pipe = DiffusionPipeline.from_pretrained(
-    ckpt_id, torch_dtype=torch.bfloat16
-).to("cuda")
+    ckpt_id, torch_dtype=torch.bfloat16, device_map="cuda"
+)
```

You can speed up loading even more by enabling parallelized loading of state dict shards. This is particularly helpful when you’re working with large models like Wan and Qwen-Image, where the model state dicts are typically sharded across multiple files.

```python
import os

os.environ["HF_ENABLE_PARALLEL_LOADING"] = "yes"

# rest of the loading code
...
```

Better GGUF integration

@Isotr0py contributed support for native GGUF CUDA kernels in this PR. This should provide an approximately 10% improvement in inference speed.

We have also worked on a tool for converting regular checkpoints to GGUF, letting the community easily share their GGUF checkpoints. Learn more here.

We now support loading of Diffusers format GGUF checkpoints.

You can learn more about all of this in our GGUF official docs.
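For reference, a GGUF checkpoint can be loaded through `from_single_file` with a `GGUFQuantizationConfig`; the community checkpoint URL below is illustrative:

```python
import torch
from diffusers import FluxPipeline, FluxTransformer2DModel, GGUFQuantizationConfig

# Illustrative community GGUF checkpoint for Flux.1-dev.
ckpt_path = "https://huggingface.co/city96/FLUX.1-dev-gguf/blob/main/flux1-dev-Q2_K.gguf"

transformer = FluxTransformer2DModel.from_single_file(
    ckpt_path,
    quantization_config=GGUFQuantizationConfig(compute_dtype=torch.bfloat16),
    torch_dtype=torch.bfloat16,
)
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", transformer=transformer, torch_dtype=torch.bfloat16
)
pipe.enable_model_cpu_offload()
```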

Modular Diffusers (Experimental)

Modular Diffusers is a system for building diffusion pipelines from individual pipeline blocks. It is highly customizable, with blocks that can be mixed and matched to adapt or create a pipeline for a specific workflow or multiple workflows.

The API is currently in active development and is being released as an experimental feature. Learn more in our docs.

All commits

  • [tests] skip instead of returning. by @sayakpaul in #11793
  • adjust to get CI test cases passed on XPU by @kaixuanliu in #11759
  • fix deprecation in lora after 0.34.0 release by @sayakpaul in #11802
  • [chore] post release v0.34.0 by @sayakpaul in #11800
  • Follow up for Group Offload to Disk by @DN6 in #11760
  • [rfc][compile] compile method for DiffusionPipeline by @anijain2305 in #11705
  • [tests] add a test on torch compile for varied resolutions by @sayakpaul in #11776
  • adjust tolerance criteria for test_float16_inference in unit test by @kaixuanliu in #11809
  • Flux Kontext by @a-r-r-o-w in #11812
  • Kontext training by @sayakpaul in #11813
  • Kontext fixes by @a-r-r-o-w in #11815
  • remove syncs before denoising in Kontext by @sayakpaul in #11818
  • [CI] disable onnx, mps, flax from the CI by @sayakpaul in #11803
  • TorchAO compile + offloading tests by @a-r-r-o-w in #11697
  • Support dynamically loading/unloading loras with group offloading by @a-r-r-o-w in #11804
  • [lora] fix: lora unloading behvaiour by @sayakpaul in #11822
  • [lora]feat: use exclude modules to loraconfig. by @sayakpaul in #11806
  • ENH: Improve speed of function expanding LoRA scales by @BenjaminBossan in #11834
  • Remove print statement in SCM Scheduler by @a-r-r-o-w in #11836
  • [tests] add test for hotswapping + compilation on resolution changes by @sayakpaul in #11825
  • reset deterministic in tearDownClass by @jiqing-feng in #11785
  • [tests] Fix failing float16 cuda tests by @a-r-r-o-w in #11835
  • [single file] Cosmos by @a-r-r-o-w in #11801
  • [docs] fix single_file example. by @sayakpaul in #11847
  • Use real-valued instead of complex tensors in Wan2.1 RoPE by @mjkvaak-amd in #11649
  • [docs] Batch generation by @stevhliu in #11841
  • [docs] Deprecated pipelines by @stevhliu in #11838
  • fix norm not training in traincontrollora_flux.py by @Luo-Yihang in #11832
  • [From Single File] support from_single_file method for WanVACE3DTransformer by @J4BEZ in #11807
  • [lora] tests for exclude_modules with Wan VACE by @sayakpaul in #11843
  • update: FluxKontextInpaintPipeline support by @vuongminh1907 in #11820
  • [Flux Kontext] Support Fal Kontext LoRA by @linoytsaban in #11823
  • [docs] Add a note of _keep_in_fp32_modules by @a-r-r-o-w in #11851
  • [benchmarks] overhaul benchmarks by @sayakpaul in #11565
  • FIX setloradevice when target layers differ by @BenjaminBossan in #11844
  • Fix Wan AccVideo/CausVid fuse_lora by @a-r-r-o-w in #11856
  • [chore] deprecate blip controlnet pipeline. by @sayakpaul in #11877
  • [docs] fix references in flux pipelines. by @sayakpaul in #11857
  • [tests] remove tests for deprecated pipelines. by @sayakpaul in #11879
  • [docs] LoRA metadata by @stevhliu in #11848
  • [training ] add Kontext i2i training by @sayakpaul in #11858
  • [CI] Fix big GPU test marker by @DN6 in #11786
  • First Block Cache by @a-r-r-o-w in #11180
  • [tests] annotate compilation test classes with bnb by @sayakpaul in #11715
  • Update chroma.md by @shm4r7 in #11891
  • [CI] Speed up GPU PR Tests by @DN6 in #11887
  • Pin k-diffusion for CI by @sayakpaul in #11894
  • [Docker] update doc builder dockerfile to include quant libs. by @sayakpaul in #11728
  • [tests] Remove more deprecated tests by @sayakpaul in #11895
  • [tests] mark the wanvace lora tester flaky by @sayakpaul in #11883
  • [tests] add compile + offload tests for GGUF. by @sayakpaul in #11740
  • feat: add multiple input image support in Flux Kontext by @Net-Mist in #11880
  • Fix unique memory address when doing group-offloading with disk by @sayakpaul in #11767
  • [SD3] CFG Cutoff fix and official callback by @asomoza in #11890
  • The Modular Diffusers by @yiyixuxu in #9672
  • [quant] QoL improvements for pipeline-level quant config by @sayakpaul in #11876
  • Bump torch from 2.4.1 to 2.7.0 in /examples/server by @dependabot[bot] in #11429
  • [LoRA] fix: disabling hooks when loading loras. by @sayakpaul in #11896
  • [utils] account for MPS when available in get_device(). by @sayakpaul in #11905
  • [ControlnetUnion] Multiple Fixes by @asomoza in #11888
  • Avoid creating tensor in CosmosAttnProcessor2_0 by @chenxiao111222 in #11761
  • [tests] Unify compilation + offloading tests in quantization by @sayakpaul in #11910
  • Speedup model loading by 4-5x ⚡ by @a-r-r-o-w in #11904
  • [docs] torch.compile blog post by @stevhliu in #11837
  • Flux: pass jointattentionkwargs when using gradient_checkpointing by @piercus in #11814
  • Fix: Align VAE processing in ControlNet SD3 training with inference by @Henry-Bi in #11909
  • Bump aiohttp from 3.10.10 to 3.12.14 in /examples/server by @dependabot[bot] in #11924
  • [tests] Improve Flux tests by @a-r-r-o-w in #11919
  • Remove device synchronization when loading weights by @a-r-r-o-w in #11927
  • Remove forced float64 from onnx stable diffusion pipelines by @lostdisc in #11054
  • Fixed bug: Uncontrolled recursive calls that caused an infinite loop when loading certain pipelines containing Transformer2DModel by @lengmo1996 in #11923
  • [ControlnetUnion] Propagate #11888 to img2img by @asomoza in #11929
  • enable flux pipeline compatible with unipc and dpm-solver by @gameofdimension in #11908
  • [training] add an offload utility that can be used as a context manager. by @sayakpaul in #11775
  • Add SkyReels V2: Infinite-Length Film Generative Model by @tolgacangoz in #11518
  • [refactor] Flux/Chroma single file implementation + Attention Dispatcher by @a-r-r-o-w in #11916
  • [docs] clarify the mapping between Transformer2DModel and finegrained variants. by @sayakpaul in #11947
  • [Modular] Updates for Custom Pipeline Blocks by @DN6 in #11940
  • [docs] Update toctree by @stevhliu in #11936
  • [docs] include bp link. by @sayakpaul in #11952
  • Fix kontext finetune issue when batch size >1 by @mymusise in #11921
  • [tests] Add test slices for Hunyuan Video by @a-r-r-o-w in #11954
  • [tests] Add test slices for Cosmos by @a-r-r-o-w in #11955
  • [tests] Add fast test slices for HiDream-Image by @a-r-r-o-w in #11953
  • [Modular] update the collection behavior by @yiyixuxu in #11963
  • fix "Expected all tensors to be on the same device, but found at least two devices" error by @yao-matrix in #11690
  • Remove logger warnings for attention backends and hard error during runtime instead by @a-r-r-o-w in #11967
  • [Examples] Uniform notations in trainfluxlora by @tomguluson92 in #10011
  • fix style by @yiyixuxu in #11975
  • [tests] Add test slices for Wan by @a-r-r-o-w in #11920
  • [docs] update guidance_scale docstring for guidance_distilled models. by @sayakpaul in #11935
  • [tests] enforce torch version in the compilation tests. by @sayakpaul in #11979
  • [modular diffusers] Wan by @a-r-r-o-w in #11913
  • [compile] logger statements create unnecessary guards during dynamo tracing by @a-r-r-o-w in #11987
  • enable quantcompile test on xpu by @yao-matrix in #11988
  • [WIP] Wan2.2 by @yiyixuxu in #12004
  • [refactor] some shared parts between hooks + docs by @a-r-r-o-w in #11968
  • [refactor] Wan single file implementation by @a-r-r-o-w in #11918
  • Fix huggingface-hub failing tests by @asomoza in #11994
  • feat: add flux kontext by @jlonge4 in #11985
  • [modular] add Modular flux for text-to-image by @sayakpaul in #11995
  • [docs] include lora fast post. by @sayakpaul in #11993
  • [docs] quant_kwargs by @stevhliu in #11712
  • [docs] Fix link by @stevhliu in #12018
  • [wan2.2] add 5b i2v by @yiyixuxu in #12006
  • wan2.2 i2v FirstBlockCache fix by @okaris in #12013
  • [core] support attention backends for LTX by @sayakpaul in #12021
  • [docs] Update index by @stevhliu in #12020
  • [Fix] huggingface-cli to hf missed files by @asomoza in #12008
  • [training-scripts] Make pytorch examples UV-compatible by @sayakpaul in #12000
  • [wan2.2] fix vae patches by @yiyixuxu in #12041
  • Allow SD pipeline to use newer schedulers, eg: FlowMatch by @ppbrown in #12015
  • [LoRA] support lightx2v lora in wan by @sayakpaul in #12040
  • Fix type of force_upcast to bool by @BerndDoser in #12046
  • Update autoencoderklcosmos.py by @tanuj-rai in #12045
  • Qwen-Image by @naykun in #12055
  • [wan2.2] follow-up by @yiyixuxu in #12024
  • tests + minor refactor for QwenImage by @a-r-r-o-w in #12057
  • Cross attention module to Wan Attention by @samuelt0 in #12058
  • fix(qwen-image): update vae license by @naykun in #12063
  • CI fixing by @paulinebm in #12059
  • enable all gpus when running ci. by @sayakpaul in #12062
  • fix the rest for all GPUs in CI by @sayakpaul in #12064
  • [docs] Install by @stevhliu in #12026
  • [wip] feat: support lora in qwen image and training script by @sayakpaul in #12056
  • [docs] small corrections to the example in the Qwen docs by @sayakpaul in #12068
  • [tests] Fix Qwen test_inference slices by @a-r-r-o-w in #12070
  • [tests] deal with the failing AudioLDM2 tests by @sayakpaul in #12069
  • optimize QwenImagePipeline to reduce unnecessary CUDA synchronization by @chengzeyi in #12072
  • Add cuda kernel support for GGUF inference by @Isotr0py in #11869
  • fix input shape for WanGGUFTexttoVideoSingleFileTests by @jiqing-feng in #12081
  • [refactor] condense group offloading by @a-r-r-o-w in #11990
  • Fix group offloading synchronization bug for parameter-only GroupModule's by @a-r-r-o-w in #12077
  • Helper functions to return skip-layer compatible layers by @a-r-r-o-w in #12048
  • Make prompt_2 optional in Flux Pipelines by @DN6 in #12073
  • [tests] tighten compilation tests for quantization by @sayakpaul in #12002
  • Implement Frequency-Decoupled Guidance (FDG) as a Guider by @dg845 in #11976
  • fix flux type hint by @DefTruth in #12089
  • [qwen] device typo by @yiyixuxu in #12099
  • [lora] adapt new LoRA config injection method by @sayakpaul in #11999
  • loraconversionutils: replace lora up/down with a/b even if transformer. in key by @Beinsezii in #12101
  • [tests] device placement for non-denoiser components in group offloading LoRA tests by @sayakpaul in #12103
  • [Modular] Fast Tests by @yiyixuxu in #11937
  • [GGUF] feat: support loading diffusers format gguf checkpoints. by @sayakpaul in #11684
  • [docs] diffusers gguf checkpoints by @sayakpaul in #12092
  • [core] add modular support for Flux I2I by @sayakpaul in #12086
  • [lora] support loading loras from lightx2v/Qwen-Image-Lightning by @sayakpaul in #12119
  • [Modular] More Updates for Custom Code Loading by @DN6 in #11969
  • enable compilation in qwen image. by @sayakpaul in #12061
  • [tests] Add inference test slices for SD3 and remove unnecessary tests by @a-r-r-o-w in #12106
  • [chore] complete the licensing statement. by @sayakpaul in #12001
  • [docs] Cache link by @stevhliu in #12105
  • [Modular] Add experimental feature warning for Modular Diffusers by @DN6 in #12127
  • Add lowcpumemusage option to fromsinglefile to align with frompretrained by @IrisRainbowNeko in #12114
  • [docs] Modular diffusers by @stevhliu in #11931
  • [Bugfix] typo fix in NPU FA by @leisuzz in #12129
  • Add QwenImage Inpainting and Img2Img pipeline by @Trgtuan10 in #12117
  • [core] parallel loading of shards by @sayakpaul in #12028
  • try to use deepseek with an agent to auto i18n to zh by @SamYuan1990 in #12032
  • [docs] Refresh effective and efficient doc by @stevhliu in #12134
  • Fix bf15/fp16 for pipelinewanvace.py by @SlimRG in #12143
  • make parallel loading flag a part of constants. by @sayakpaul in #12137
  • [docs] Parallel loading of shards by @stevhliu in #12135
  • feat: cuda device_map for pipelines. by @sayakpaul in #12122
  • [core] respect local_files_only=True when using sharded checkpoints by @sayakpaul in #12005
  • support hf_quantizer in cache warmup. by @sayakpaul in #12043
  • make test_gguf all pass on xpu by @yao-matrix in #12158
  • [docs] Quickstart by @stevhliu in #12128
  • Qwen Image Edit Support by @naykun in #12164
  • remove silu for CogView4 by @lambertwjh in #12150
  • [qwen] Qwen image edit followups by @sayakpaul in #12166
  • Minor modification to support DC-AE-turbo by @chenjy2003 in #12169
  • [Docs] typo error in qwen image by @leisuzz in #12144
  • fix: caching allocator behaviour for quantization. by @sayakpaul in #12172
  • fix(training_utils): wrap device in list for DiffusionPipeline by @MengAiDev in #12178
  • [docs] Clarify guidance scale in Qwen pipelines by @sayakpaul in #12181
  • [LoRA] feat: support more Qwen LoRAs from the community. by @sayakpaul in #12170
  • Update README.md by @Taechai in #12182
  • [chore] add lora button to qwenimage docs by @sayakpaul in #12183
  • [Wan 2.2 LoRA] add support for 2nd transformer lora loading + wan 2.2 lightx2v lora by @linoytsaban in #12074
  • Release: v0.35.0 by @sayakpaul (direct commit on v0.35.0-release)

Significant community contributions

The following contributors have made significant changes to the library over the last release:

  • @vuongminh1907
    • update: FluxKontextInpaintPipeline support (#11820)
  • @Net-Mist
    • feat: add multiple input image support in Flux Kontext (#11880)
  • @tolgacangoz
    • Add SkyReels V2: Infinite-Length Film Generative Model (#11518)
  • @naykun
    • Qwen-Image (#12055)
    • fix(qwen-image): update vae license (#12063)
    • Qwen Image Edit Support (#12164)
  • @Trgtuan10
    • Add QwenImage Inpainting and Img2Img pipeline (#12117)
  • @SamYuan1990
    • try to use deepseek with an agent to auto i18n to zh (#12032)

Published by sayakpaul 6 months ago

diffusers - Diffusers 0.34.0: New Image and Video Models, Better torch.compile Support, and more

📹 New video generation pipelines

Wan VACE

Wan VACE supports various generation techniques which achieve controllable video generation. It comes in two variants: a 1.3B model for fast iteration & prototyping, and a 14B model for high-quality generation. Some of the capabilities include:

  • Control to Video (Depth, Pose, Sketch, Flow, Grayscale, Scribble, Layout, Boundary Box, etc.). Recommended library for preprocessing videos to obtain control videos: huggingface/controlnet_aux
  • Image/Video to Video (first frame, last frame, starting clip, ending clip, random clips)
  • Inpainting and Outpainting
  • Subject to Video (faces, object, characters, etc.)
  • Composition to Video (reference anything, animate anything, swap anything, expand anything, move anything, etc.)

The code snippets available in this pull request demonstrate some examples of how videos can be generated with controllability signals.

Check out the docs to learn more.

Cosmos Predict2 Video2World

Cosmos-Predict2 is a key branch of the Cosmos World Foundation Models (WFMs) ecosystem for Physical AI, specializing in future state prediction through advanced world modeling. It offers two powerful capabilities: text-to-image generation for creating high-quality images from text descriptions, and video-to-world generation for producing visual simulations from video inputs.

The Video2World model comes in a 2B and 14B variant. Check out the docs to learn more.

LTX 0.9.7 and Distilled

LTX 0.9.7 and its distilled variants are the latest in the family of models released by Lightricks.

Check out the docs to learn more.

Hunyuan Video Framepack and F1

Framepack is a novel method for enabling long video generation. There are two released variants of Hunyuan Video trained using this technique. Check out the docs to learn more.

FusionX

The FusionX family of models and LoRAs, built on top of Wan2.1-14B, should already be supported. To load the model, use from_single_file():

```python
import torch
from diffusers import AutoModel

transformer = AutoModel.from_single_file(
    "https://huggingface.co/vrgamedevgirl84/Wan14BT2VFusioniX/blob/main/Wan14Bi2vFusioniX_fp16.safetensors",
    torch_dtype=torch.bfloat16,
)
```

To load the LoRAs, use load_lora_weights():

```python
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "Wan-AI/Wan2.1-T2V-14B-Diffusers", torch_dtype=torch.bfloat16
).to("cuda")
pipe.load_lora_weights(
    "vrgamedevgirl84/Wan14BT2VFusioniX",
    weight_name="FusionX_LoRa/Wan2.1_T2V_14B_FusionX_LoRA.safetensors",
)
```

AccVideo and CausVid (only LoRAs)

AccVideo and CausVid are two novel distillation techniques that speed up the generation time of video diffusion models while preserving quality. Diffusers supports loading their extracted LoRAs with their respective models.
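Loading such a LoRA follows the usual `load_lora_weights` flow; the repository and file names below are placeholders for an extracted AccVideo/CausVid LoRA matched to the base model:

```python
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "Wan-AI/Wan2.1-T2V-14B-Diffusers", torch_dtype=torch.bfloat16
).to("cuda")

# Placeholder repo/file names: point these at an extracted CausVid or
# AccVideo LoRA for the matching base model.
pipe.load_lora_weights("<org>/<wan-distill-lora>", weight_name="<lora>.safetensors")

# Distillation LoRAs typically allow far fewer inference steps.
video = pipe("a cat walking on grass", num_inference_steps=8).frames[0]
```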

🌠 New image generation pipelines

Cosmos Predict2 Text2Image

Text-to-image models from the Cosmos-Predict2 release. The models come in 2B and 14B variants. Check out the docs to learn more.

Chroma

Chroma is an 8.9B-parameter model based on FLUX.1-schnell. It’s fully Apache 2.0 licensed, ensuring that anyone can use, modify, and build on top of it. Check out the docs to learn more.

Thanks to @Ednaordinary for contributing it in this PR!

VisualCloze

VisualCloze: A Universal Image Generation Framework via Visual In-Context Learning is an innovative universal image generation framework built on in-context learning that offers key capabilities:

  1. Support for various in-domain tasks
  2. Generalization to unseen tasks through in-context learning
  3. Unify multiple tasks into one step and generate both target image and intermediate results
  4. Support reverse-engineering conditions from target images

Check out the docs to learn more. Thanks to @lzyhha for contributing this in this PR!

Better torch.compile support

We have worked with the PyTorch team to improve how we provide torch.compile() compatibility throughout the library. More specifically, we now test the widely used models like Flux for any recompilation and graph break issues which can get in the way of fully realizing torch.compile() benefits. Refer to the following links to learn more:

  • https://github.com/huggingface/diffusers/pull/11085
  • https://github.com/huggingface/diffusers/issues/11430

Additionally, users can combine offloading with compilation to get a better speed-memory trade-off. Below is an example:

```py
import torch
from diffusers import DiffusionPipeline

torch._dynamo.config.cache_size_limit = 10000

pipeline = DiffusionPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-schnell", torch_dtype=torch.bfloat16
)
pipeline.enable_model_cpu_offload()

# Compile.
pipeline.transformer.compile()

image = pipeline(
    prompt="An astronaut riding a horse on Mars",
    guidance_scale=0.,
    height=768,
    width=1360,
    num_inference_steps=4,
    max_sequence_length=256,
).images[0]
print(f"Max memory allocated: {torch.cuda.max_memory_allocated() / 1024**3:.2f} GB")
```

This is compatible with group offloading, too. Interested readers can check out the concerned PRs below:

  • https://github.com/huggingface/diffusers/pull/11605
  • https://github.com/huggingface/diffusers/pull/11670

You can substantially reduce memory requirements by combining quantization with offloading and then improving speed with torch.compile(). Below is an example:

```py
import torch
from diffusers import AutoModel, FluxPipeline
from diffusers import BitsAndBytesConfig as DiffusersBitsAndBytesConfig
from transformers import BitsAndBytesConfig as TransformersBitsAndBytesConfig
from transformers import T5EncoderModel

torch._dynamo.config.recompile_limit = 1000
torch_dtype = torch.bfloat16

quant_kwargs = {
    "load_in_4bit": True,
    "bnb_4bit_compute_dtype": torch_dtype,
    "bnb_4bit_quant_type": "nf4",
}
text_encoder_2_quant_config = TransformersBitsAndBytesConfig(**quant_kwargs)
dit_quant_config = DiffusersBitsAndBytesConfig(**quant_kwargs)

ckpt_id = "black-forest-labs/FLUX.1-dev"
text_encoder_2 = T5EncoderModel.from_pretrained(
    ckpt_id,
    subfolder="text_encoder_2",
    quantization_config=text_encoder_2_quant_config,
    torch_dtype=torch_dtype,
)
transformer = AutoModel.from_pretrained(
    ckpt_id,
    subfolder="transformer",
    quantization_config=dit_quant_config,
    torch_dtype=torch_dtype,
)
pipe = FluxPipeline.from_pretrained(
    ckpt_id,
    transformer=transformer,
    text_encoder_2=text_encoder_2,
    torch_dtype=torch_dtype,
)
pipe.enable_model_cpu_offload()
pipe.transformer.compile()

image = pipe(
    prompt="An astronaut riding a horse on Mars",
    guidance_scale=3.5,
    height=768,
    width=1360,
    num_inference_steps=28,
    max_sequence_length=512,
).images[0]
```

Starting from bitsandbytes==0.46.0 onwards, bnb-quantized models should be fully compatible with torch.compile() without graph-breaks. This means that when compiling a bnb-quantized model, users can do: model.compile(fullgraph=True). This can significantly improve speed while still providing memory benefits. The figure below provides a comparison with Flux.1-Dev. Refer to this benchmarking script to learn more.

(Figure: speed comparison for bnb-quantized Flux.1-Dev with torch.compile.)

Note that for 4-bit bnb models, you currently need to install a PyTorch nightly build if fullgraph=True is specified during compilation.

Huge shoutout to @anijain2305 and @StrongerXi from the PyTorch team for the incredible support.

PipelineQuantizationConfig

Users can now provide a quantization config while initializing a pipeline:

```python
import torch
from diffusers import DiffusionPipeline
from diffusers.quantizers import PipelineQuantizationConfig

pipeline_quant_config = PipelineQuantizationConfig(
    quant_backend="bitsandbytes_4bit",
    quant_kwargs={
        "load_in_4bit": True,
        "bnb_4bit_quant_type": "nf4",
        "bnb_4bit_compute_dtype": torch.bfloat16,
    },
    components_to_quantize=["transformer", "text_encoder_2"],
)
pipe = DiffusionPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    quantization_config=pipeline_quant_config,
    torch_dtype=torch.bfloat16,
).to("cuda")

image = pipe("photo of a cute dog").images[0]
```

This lowers the barrier to entry for users who want to use quantization without having to write much code. Refer to the documentation to learn more about the different configurations allowed through PipelineQuantizationConfig.

Group offloading with disk

In the previous release, we shipped “group offloading” which lets you offload blocks/nodes within a model, optimizing its memory consumption. It also lets you overlap this offloading with computation, providing a good speed-memory trade-off, especially in low VRAM environments.

However, you still need a considerable amount of system RAM to make offloading work effectively. So, low VRAM and low RAM environments would still not work.

Starting this release, users will additionally have the option to offload to disk instead of RAM, further lowering memory consumption. Set the offload_to_disk_path to enable this feature.

```python
pipeline.transformer.enable_group_offload(
    onload_device="cuda",
    offload_device="cpu",
    offload_type="leaf_level",
    offload_to_disk_path="path/to/disk",
)
```

Refer to these two tables to compare the speed and memory trade-offs.

LoRA metadata parsing

It is beneficial to include the LoraConfig in a LoRA state dict that was used to train the LoRA. In its absence, users were restricted to using the same LoRA alpha as the LoRA rank. We have modified the most popular training scripts to allow passing custom lora_alpha through the CLI. Refer to this thread for more updates. Refer to this comment for some extended clarifications.
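For example, with one of the modified DreamBooth LoRA scripts (the script name and remaining flags here are illustrative), a custom alpha can now be passed alongside the rank:

```shell
accelerate launch train_dreambooth_lora_flux.py \
  --pretrained_model_name_or_path="black-forest-labs/FLUX.1-dev" \
  --instance_data_dir="./dog" \
  --instance_prompt="a photo of sks dog" \
  --output_dir="flux-lora" \
  --rank=16 \
  --lora_alpha=32
```

The saved state dict then carries the alpha in its metadata, so loaders no longer have to assume the alpha equals the rank.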

New training scripts

  • We now have a capable training script for training robust timestep-distilled models through the SANA Sprint framework. Check out this resource for more details. Thanks to @scxue and @lawrence-cj for contributing it in this PR.
  • HiDream LoRA DreamBooth training script (docs). The script supports training with quantization. HiDream is an MIT-licensed model. So, make it yours with this training script.

Updates on educational materials on quantization

We have worked on a two-part series discussing the support of quantization in Diffusers.

All commits

  • [LoRA] support musubi wan loras. by @sayakpaul in #11243
  • fix testvanillafunetuning failure on XPU and A100 by @yao-matrix in #11263
  • make teststablediffusioninpaintfp16 pass on XPU by @yao-matrix in #11264
  • make testdicttupleoutputsequivalent pass on XPU by @yao-matrix in #11265
  • add onnxruntime-qnn & onnxruntime-cann by @xieofxie in #11269
  • make testinstantstylemultiplemasks pass on XPU by @yao-matrix in #11266
  • [BUG] Fix convertvaepttodiffusers bug by @lavinal712 in #11078
  • Fix LTX 0.9.5 single file by @hlky in #11271
  • [Tests] Cleanup lora tests utils by @sayakpaul in #11276
  • [CI] relax tolerance for unclip further by @sayakpaul in #11268
  • do not use DIFFUSERS_REQUEST_TIMEOUT for notification bot by @sayakpaul in #11273
  • Fix incorrect tilelatentmin_width calculation in AutoencoderKLMochi by @kuantuna in #11294
  • HiDream Image by @hlky in #11231
  • flow matching lcm scheduler by @quickjkee in #11170
  • Update autoencoderkl_allegro.md by @Forbu in #11303
  • Hidream refactoring follow ups by @a-r-r-o-w in #11299
  • Fix incorrect tilelatentmin_width calculations by @kuantuna in #11305
  • [ControlNet] Adds controlnet for SanaTransformer by @ishan-modi in #11040
  • make KandinskyV22PipelineInpaintCombinedFastTests::testfloat16inference pass on XPU by @yao-matrix in #11308
  • make teststablediffusionkarrassigmas pass on XPU by @yao-matrix in #11310
  • make KolorsPipelineFastTests::test_inference_batch_single_identical pass on XPU by @faaany in #11313
  • [LoRA] support more SDXL loras. by @sayakpaul in #11292
  • [HiDream] code example by @linoytsaban in #11317
  • import for FlowMatchLCMScheduler by @asomoza in #11318
  • Use float32 on mps or npu in transformerhidreamimage's rope by @hlky in #11316
  • Add skrample section to community_projects.md by @Beinsezii in #11319
  • [docs] Promote AutoModel usage by @sayakpaul in #11300
  • [LoRA] Add LoRA support to AuraFlow by @hameerabbasi in #10216
  • Fix vae.Decoder prevoutputchannel by @hlky in #11280
  • fix CPU offloading related fail cases on XPU by @yao-matrix in #11288
  • [docs] fix hidream docstrings. by @sayakpaul in #11325
  • Rewrite AuraFlowPatchEmbed.peselectionindexbasedon_dim to be torch.compile compatible by @AstraliteHeart in #11297
  • post release 0.33.0 by @sayakpaul in #11255
  • another fix for FlowMatchLCMScheduler forgotten import by @asomoza in #11330
  • Fix Hunyuan I2V for transformers>4.47.1 by @DN6 in #11293
  • unpin torch versions for onnx Dockerfile by @sayakpaul in #11290
  • [single file] enable telemetry for single file loading when using GGUF. by @sayakpaul in #11284
  • [docs] add a snippet for compilation in the auraflow docs. by @sayakpaul in #11327
  • Hunyuan I2V fast tests fix by @DN6 in #11341
  • [BUG] fixed _toctree.yml alphabetical ordering by @ishan-modi in #11277
  • Fix wrong dtype argument name as torch_dtype by @nPeppon in #11346
  • [chore] fix lora docs utils by @sayakpaul in #11338
  • [docs] add note about useduckshape in auraflow docs. by @sayakpaul in #11348
  • [LoRA] Propagate hotswap better by @sayakpaul in #11333
  • [Hi Dream] follow-up by @yiyixuxu in #11296
  • [bitsandbytes] improve dtype mismatch handling for bnb + lora. by @sayakpaul in #11270
  • Update controlnet_flux.py by @haofanwang in #11350
  • enable 2 test cases on XPU by @yao-matrix in #11332
  • [BNB] Fix testmovingtocputhrows_warning by @SunMarc in #11356
  • support Wan-FLF2V by @yiyixuxu in #11353
  • Fix: StableDiffusionXLControlNetAdapterInpaintPipeline incorrectly inherited StableDiffusionLoraLoaderMixin by @Kazuki-Yoda in #11357
  • update output for Hidream transformer by @yiyixuxu in #11366
  • [Wan2.1-FLF2V] update conversion script by @yiyixuxu in #11365
  • [Flux LoRAs] fix lr scheduler bug in distributed scenarios by @linoytsaban in #11242
  • [train_dreambooth_lora_sdxl.py] Fix the LR Schedulers when num_train_epochs is passed in a distributed training env by @kghamilton89 in #11240
  • fix issue that training flux controlnet was unstable and validation r… by @PromeAIpro in #11373
  • Fix Wan I2V prepare_latents dtype by @a-r-r-o-w in #11371
  • [BUG] fixes in kadinsky pipeline by @ishan-modi in #11080
  • Add Serialized Type Name kwarg in Model Output by @anzr299 in #10502
  • [cogview4][feat] Support attention mechanism with variable-length support and batch packing by @OleehyO in #11349
  • Support different-length pos/neg prompts for FLUX.1-schnell variants like Chroma by @josephrocca in #11120
  • [Refactor] Minor Improvement for import utils by @ishan-modi in #11161
  • Add stochastic sampling to FlowMatchEulerDiscreteScheduler by @apolinario in #11369
  • [LoRA] add LoRA support to HiDream and fine-tuning script by @linoytsaban in #11281
  • Update modeling imports by @a-r-r-o-w in #11129
  • [HiDream] move deprecation to 0.35.0 by @yiyixuxu in #11384
  • Update README_hidream.md by @AMEERAZAM08 in #11386
  • Fix group offloading with block_level and use_stream=True by @a-r-r-o-w in #11375
  • [train_dreambooth_flux] Add LANCZOS as the default interpolation mode for image resizing by @ishandutta0098 in #11395
  • [Feature] Added Xlab Controlnet support by @ishan-modi in #11249
  • Kolors additional pipelines, community contrib by @Teriks in #11372
  • [HiDream LoRA] optimizations + small updates by @linoytsaban in #11381
  • Fix Flux IP adapter argument in the pipeline example by @AeroDEmi in #11402
  • [BUG] fixed WAN docstring by @ishan-modi in #11226
  • Fix typos in strings and comments by @co63oc in #11407
  • [train_dreambooth_lora.py] Set LANCZOS as default interpolation mode for resizing by @merterbak in #11421
  • [tests] add tests to check for graph breaks, recompilation, cuda syncs in pipelines during torch.compile() by @sayakpaul in #11085
  • enable group_offload cases and quanto cases on XPU by @yao-matrix in #11405
  • enable test_layerwise_casting_memory cases on XPU by @yao-matrix in #11406
  • [tests] fix import. by @sayakpaul in #11434
  • [train_text_to_image] Better image interpolation in training scripts follow up by @tongyu0924 in #11426
  • [train_text_to_image_lora] Better image interpolation in training scripts follow up by @tongyu0924 in #11427
  • enable 28 GGUF test cases on XPU by @yao-matrix in #11404
  • [Hi-Dream LoRA] fix bug in validation by @linoytsaban in #11439
  • Fixing missing provider options argument by @urpetkov-amd in #11397
  • Set LANCZOS as the default interpolation for image resizing in ControlNet training by @YoulunPeng in #11449
  • Raise warning instead of error for block offloading with streams by @a-r-r-o-w in #11425
  • enable marigold_intrinsics cases on XPU by @yao-matrix in #11445
  • torch.compile fullgraph compatibility for Hunyuan Video by @a-r-r-o-w in #11457
  • enable consistency test cases on XPU, all passed by @yao-matrix in #11446
  • enable unidiffuser test cases on xpu by @yao-matrix in #11444
  • Add generic support for Intel Gaudi accelerator (hpu device) by @dsocek in #11328
  • Add StableDiffusion3InstructPix2PixPipeline by @xduzhangjiayu in #11378
  • make safe diffusion test cases pass on XPU and A100 by @yao-matrix in #11458
  • [test_models_transformer_hunyuan_video] help us test torch.compile() for impactful models by @tongyu0924 in #11431
  • Add LANCZOS as default interplotation mode. by @Va16hav07 in #11463
  • make autoencoders. controlnet_flux and wan_transformer_3d_single_file pass on xpu by @yao-matrix in #11461
  • [WAN] fix recompilation issues by @sayakpaul in #11475
  • Fix typos in docs and comments by @co63oc in #11416
  • [tests] xfail recent pipeline tests for specific methods. by @sayakpaul in #11469
  • cache packages_distributions by @vladmandic in #11453
  • [docs] Memory optims by @stevhliu in #11385
  • [docs] Adapters by @stevhliu in #11331
  • [train_dreambooth_lora_sdxl_advanced] Add LANCZOS as the default interpolation mode for image resizing by @yuanjua in #11471
  • [train_dreambooth_lora_flux_advanced] Add LANCZOS as the default interpolation mode for image resizing by @ysurs in #11472
  • enable semantic diffusion and stable diffusion panorama cases on XPU by @yao-matrix in #11459
  • [Feature] Implement tiled VAE encoding/decoding for Wan model. by @c8ef in #11414
  • [train_text_to_image_sdxl] Add LANCZOS as default interpolation mode for image resizing by @ParagEkbote in #11455
  • [train_dreambooth_lora_sdxl] Add --image_interpolation_mode option for image resizing (default to lanczos) by @MinJu-Ha in #11490
  • [train_dreambooth_lora_lumina2] Add LANCZOS as the default interpolation mode for image resizing by @cjfghk5697 in #11491
  • [training] feat: enable quantization for hidream lora training. by @sayakpaul in #11494
  • Set LANCZOS as the default interpolation method for image resizing. by @yijun-lee in #11492
  • Update training script for txt to img sdxl with lora supp with new interpolation. by @RogerSinghChugh in #11496
  • Fix torchao docs typo for fp8 granular quantization by @a-r-r-o-w in #11473
  • Update setup.py to pin min version of peft by @sayakpaul in #11502
  • update dep table. by @sayakpaul in #11504
  • [LoRA] use removeprefix to preserve sanity. by @sayakpaul in #11493
  • Hunyuan Video Framepack by @a-r-r-o-w in #11428
  • enable lora cases on XPU by @yao-matrix in #11506
  • [lora_conversion] Enhance key handling for OneTrainer components in LORA conversion utility by @iamwavecut in #11441
  • [docs] minor updates to bitsandbytes docs. by @sayakpaul in #11509
  • Cosmos by @a-r-r-o-w in #10660
  • clean up the Init for stable_diffusion by @yiyixuxu in #11500
  • fix audioldm by @sayakpaul (direct commit on v0.34.0-release)
  • Revert "fix audioldm" by @sayakpaul (direct commit on v0.34.0-release)
  • [LoRA] make lora alpha and dropout configurable by @linoytsaban in #11467
  • Add cross attention type for Sana-Sprint training in diffusers. by @scxue in #11514
  • Conditionally import torchvision in Cosmos transformer by @a-r-r-o-w in #11524
  • [tests] fix audioldm2 for transformers main. by @sayakpaul in #11522
  • feat: pipeline-level quantization config by @sayakpaul in #11130
  • [Tests] Enable more general testing for torch.compile() with LoRA hotswapping by @sayakpaul in #11322
  • [LoRA] support non-diffusers hidream loras by @sayakpaul in #11532
  • enable 7 cases on XPU by @yao-matrix in #11503
  • [LTXPipeline] Update latents dtype to match VAE dtype by @james-p-xu in #11533
  • enable dit integration cases on xpu by @yao-matrix in #11523
  • enable print_env on xpu by @yao-matrix in #11507
  • Change Framepack transformer layer initialization order by @a-r-r-o-w in #11535
  • [tests] add tests for framepack transformer model. by @sayakpaul in #11520
  • Hunyuan Video Framepack F1 by @a-r-r-o-w in #11534
  • enable several pipeline integration tests on XPU by @yao-matrix in #11526
  • [test_models_transformer_ltx.py] help us test torch.compile() for impactful models by @cjfghk5697 in #11512
  • Add VisualCloze by @lzyhha in #11377
  • Fix typo in train_diffusion_orpo_sdxl_lora_wds.py by @Meeex2 in #11541
  • fix: remove torch_dtype="auto" option from docstrings by @johannaSommer in #11513
  • [train_dreambooth.py] Fix the LR Schedulers when num_train_epochs is passed in a distributed training env by @kghamilton89 in #11239
  • [LoRA] small change to support Hunyuan LoRA Loading for FramePack by @linoytsaban in #11546
  • LTX Video 0.9.7 by @a-r-r-o-w in #11516
  • [tests] Enable testing for HiDream transformer by @sayakpaul in #11478
  • Update pipeline_flux_img2img.py to add missing vae_slicing and vae_tiling calls. by @Meatfucker in #11545
  • Fix deprecation warnings in test_ltx_image2video.py by @AChowdhury1211 in #11538
  • [tests] Add torch.compile test for UNet2DConditionModel by @olccihyeon in #11537
  • [Single File] GGUF/Single File Support for HiDream by @DN6 in #11550
  • [gguf] Refactor torch_function to avoid unnecessary computation by @anijain2305 in #11551
  • [tests] add tests for combining layerwise upcasting and groupoffloading. by @sayakpaul in #11558
  • [docs] Regional compilation docs by @sayakpaul in #11556
  • enhance value guard of device_agnostic_dispatch by @yao-matrix in #11553
  • Doc update by @Player256 in #11531
  • Revert error to warning when loading LoRA from repo with multiple weights by @apolinario in #11568
  • [docs] tip for group offloding + quantization by @sayakpaul in #11576
  • [LoRA] support non-diffusers LTX-Video loras by @linoytsaban in #11572
  • [WIP][LoRA] start supporting kijai wan lora. by @sayakpaul in #11579
  • [Single File] Fix loading for LTX 0.9.7 transformer by @DN6 in #11578
  • Use HF Papers by @qgallouedec in #11567
  • LTX 0.9.7-distilled; documentation improvements by @a-r-r-o-w in #11571
  • [LoRA] kijai wan lora support for I2V by @linoytsaban in #11588
  • docs: fix invalid links by @osrm in #11505
  • [docs] Remove fast diffusion tutorial by @stevhliu in #11583
  • RegionalPrompting: Inherit from Stable Diffusion by @b-sai in #11525
  • [chore] allow string device to be passed to randn_tensor. by @sayakpaul in #11559
  • Type annotation fix by @DN6 in #11597
  • [LoRA] minor fix for load_lora_weights() for Flux and a test by @sayakpaul in #11595
  • Update Intel Gaudi doc by @regisss in #11479
  • enable pipeline test cases on xpu by @yao-matrix in #11527
  • [Feature] AutoModel can load components using model_index.json by @ishan-modi in #11401
  • [docs] Pipeline-level quantization by @stevhliu in #11604
  • Fix bug when variant and safetensor file does not match by @kaixuanliu in #11587
  • [tests] Changes to the torch.compile() CI and tests by @sayakpaul in #11508
  • Fix mixed variant downloading by @DN6 in #11611
  • fix security issue in build docker ci by @sayakpaul in #11614
  • Make group offloading compatible with torch.compile() by @sayakpaul in #11605
  • [training docs] smol update to README files by @linoytsaban in #11616
  • Adding NPU for get device function by @leisuzz in #11617
  • [LoRA] improve LoRA fusion tests by @sayakpaul in #11274
  • [Sana Sprint] add image-to-image pipeline by @linoytsaban in #11602
  • [CI] fix the filename for displaying failures in lora ci. by @sayakpaul in #11600
  • [docs] PyTorch 2.0 by @stevhliu in #11618
  • [textual_inversion_sdxl.py] fix lr scheduler steps count by @yuanjua in #11557
  • Fix wrong indent for examples of controlnet script by @Justin900429 in #11632
  • removing unnecessary else statement by @YanivDorGalron in #11624
  • enable group_offloading and PipelineDeviceAndDtypeStabilityTests on XPU, all passed by @yao-matrix in #11620
  • Bug: Fixed Image 2 Image example by @vltmedia in #11619
  • typo fix in pipeline_flux.py by @YanivDorGalron in #11623
  • Fix typos in strings and comments by @co63oc in #11476
  • [docs] update torchao doc link by @sayakpaul in #11634
  • Use float32 RoPE freqs in Wan with MPS backends by @hvaara in #11643
  • [chore] misc changes in the bnb tests for consistency. by @sayakpaul in #11355
  • [tests] chore: rename lora model-level tests. by @sayakpaul in #11481
  • [docs] Caching methods by @stevhliu in #11625
  • [docs] Model cards by @stevhliu in #11112
  • [CI] Some improvements to Nightly reports summaries by @DN6 in #11166
  • [chore] bring PipelineQuantizationConfig at the top of the import chain. by @sayakpaul in #11656
  • [examples] flux-control: use num_training_steps_for_scheduler by @Markus-Pobitzer in #11662
  • use deterministic to get stable result by @jiqing-feng in #11663
  • [tests] add test for torch.compile + group offloading by @sayakpaul in #11670
  • Wan VACE by @a-r-r-o-w in #11582
  • fixed axes_dims_rope init (huggingface#11641) by @sofinvalery in #11678
  • [tests] Fix how compiler mixin classes are used by @sayakpaul in #11680
  • Introduce DeprecatedPipelineMixin to simplify pipeline deprecation process by @DN6 in #11596
  • Add community class StableDiffusionXL_T5Pipeline by @ppbrown in #11626
  • Update pipeline_flux_inpaint.py to fix padding_mask_crop returning only the inpainted area by @Meatfucker in #11658
  • Allow remote code repo names to contain "." by @akasharidas in #11652
  • [LoRA] support Flux Control LoRA with bnb 8bit. by @sayakpaul in #11655
  • [Wan] Fix VAE sampling mode in WanVideoToVideoPipeline by @tolgacangoz in #11639
  • enable torchao test cases on XPU and switch to device agnostic APIs for test cases by @yao-matrix in #11654
  • [tests] tests for compilation + quantization (bnb) by @sayakpaul in #11672
  • [tests] model-level device_map clarifications by @sayakpaul in #11681
  • Improve Wan docstrings by @a-r-r-o-w in #11689
  • Set torch_version to N/A if torch is disabled. by @rasmi in #11645
  • Avoid DtoH sync from access of nonzero() item in scheduler by @jbschlosser in #11696
  • Apply Occam's Razor in position embedding calculation by @tolgacangoz in #11562
  • [docs] add compilation bits to the bitsandbytes docs. by @sayakpaul in #11693
  • swap out token for style bot. by @sayakpaul in #11701
  • [docs] mention fp8 benefits on supported hardware. by @sayakpaul in #11699
  • Support Wan AccVideo lora by @a-r-r-o-w in #11704
  • [LoRA] parse metadata from LoRA and save metadata by @sayakpaul in #11324
  • Cosmos Predict2 by @a-r-r-o-w in #11695
  • Chroma Pipeline by @Ednaordinary in #11698
  • [LoRA ]fix flux lora loader when return_metadata is true for non-diffusers by @sayakpaul in #11716
  • [training] show how metadata stuff should be incorporated in training scripts. by @sayakpaul in #11707
  • Fix misleading comment by @carlthome in #11722
  • Add Pruna optimization framework documentation by @davidberenstein1957 in #11688
  • Support more Wan loras (VACE) by @a-r-r-o-w in #11726
  • [LoRA training] update metadata use for lora alpha + README by @linoytsaban in #11723
  • ⚡️ Speed up method AutoencoderKLWan.clear_cache by 886% by @misrasaurabh1 in #11665
  • [training] add ds support to lora hidream by @leisuzz in #11737
  • [tests] device_map tests for all models. by @sayakpaul in #11708
  • [chore] change to 2025 licensing for remaining by @sayakpaul in #11741
  • Chroma Follow Up by @DN6 in #11725
  • [Quantizers] add is_compileable property to quantizers. by @sayakpaul in #11736
  • Update more licenses to 2025 by @a-r-r-o-w in #11746
  • Add missing HiDream license by @a-r-r-o-w in #11747
  • Bump urllib3 from 2.2.3 to 2.5.0 in /examples/server by @dependabot[bot] in #11748
  • [LoRA] refactor lora loading at the model-level by @sayakpaul in #11719
  • [CI] Fix WAN VACE tests by @DN6 in #11757
  • [CI] Fix SANA tests by @DN6 in #11756
  • Fix HiDream pipeline test module by @DN6 in #11754
  • make group offloading work with disk/nvme transfers by @sayakpaul in #11682
  • Update Chroma Docs by @DN6 in #11753
  • fix invalid component handling behaviour in PipelineQuantizationConfig by @sayakpaul in #11750
  • Fix failing cpu offload test for LTX Latent Upscale by @DN6 in #11755
  • [docs] Quantization + torch.compile + offloading by @stevhliu in #11703
  • [docs] device_map by @stevhliu in #11711
  • [docs] LoRA scale scheduling by @stevhliu in #11727
  • Fix dimensionalities in apply_rotary_emb functions' comments by @tolgacangoz in #11717
  • enable deterministic in bnb 4 bit tests by @jiqing-feng in #11738
  • enable cpu offloading of new pipelines on XPU & use device agnostic empty to make pipelines work on XPU by @yao-matrix in #11671
  • [tests] properly skip tests instead of return by @sayakpaul in #11771
  • [CI] Skip ONNX Upscale tests by @DN6 in #11774
  • [Wan] Fix mask padding in Wan VACE pipeline. by @bennyguo in #11778
  • Add --lora_alpha and metadata handling to train_dreambooth_lora_sana.py by @imbr92 in #11744
  • [docs] minor cleanups in the lora docs. by @sayakpaul in #11770
  • [lora] only remove hooks that we add back by @yiyixuxu in #11768
  • [tests] Fix HunyuanVideo Framepack device tests by @a-r-r-o-w in #11789
  • [chore] raise as early as possible in group offloading by @sayakpaul in #11792
  • [tests] Fix group offloading and layerwise casting test interaction by @a-r-r-o-w in #11796
  • guard omnigen processor. by @sayakpaul in #11799
  • Release: v0.34.0 by @sayakpaul (direct commit on v0.34.0-release)

Significant community contributions

The following contributors have made significant changes to the library over the last release:

  • @yao-matrix
    • fix test_vanilla_funetuning failure on XPU and A100 (#11263)
    • make test_stable_diffusion_inpaint_fp16 pass on XPU (#11264)
    • make test_dict_tuple_outputs_equivalent pass on XPU (#11265)
    • make test_instant_style_multiple_masks pass on XPU (#11266)
    • make KandinskyV22PipelineInpaintCombinedFastTests::test_float16_inference pass on XPU (#11308)
    • make test_stable_diffusion_karras_sigmas pass on XPU (#11310)
    • fix CPU offloading related fail cases on XPU (#11288)
    • enable 2 test cases on XPU (#11332)
    • enable group_offload cases and quanto cases on XPU (#11405)
    • enable test_layerwise_casting_memory cases on XPU (#11406)
    • enable 28 GGUF test cases on XPU (#11404)
    • enable marigold_intrinsics cases on XPU (#11445)
    • enable consistency test cases on XPU, all passed (#11446)
    • enable unidiffuser test cases on xpu (#11444)
    • make safe diffusion test cases pass on XPU and A100 (#11458)
    • make autoencoders. controlnet_flux and wan_transformer_3d_single_file pass on xpu (#11461)
    • enable semantic diffusion and stable diffusion panorama cases on XPU (#11459)
    • enable lora cases on XPU (#11506)
    • enable 7 cases on XPU (#11503)
    • enable dit integration cases on xpu (#11523)
    • enable print_env on xpu (#11507)
    • enable several pipeline integration tests on XPU (#11526)
    • enhance value guard of device_agnostic_dispatch (#11553)
    • enable pipeline test cases on xpu (#11527)
    • enable group_offloading and PipelineDeviceAndDtypeStabilityTests on XPU, all passed (#11620)
    • enable torchao test cases on XPU and switch to device agnostic APIs for test cases (#11654)
    • enable cpu offloading of new pipelines on XPU & use device agnostic empty to make pipelines work on XPU (#11671)
  • @hlky
    • Fix LTX 0.9.5 single file (#11271)
    • HiDream Image (#11231)
    • Use float32 on mps or npu in transformer_hidream_image's rope (#11316)
    • Fix vae.Decoder prev_output_channel (#11280)
  • @quickjkee
    • flow matching lcm scheduler (#11170)
  • @ishan-modi
    • [ControlNet] Adds controlnet for SanaTransformer (#11040)
    • [BUG] fixed _toctree.yml alphabetical ordering (#11277)
    • [BUG] fixes in kadinsky pipeline (#11080)
    • [Refactor] Minor Improvement for import utils (#11161)
    • [Feature] Added Xlab Controlnet support (#11249)
    • [BUG] fixed WAN docstring (#11226)
    • [Feature] AutoModel can load components using model_index.json (#11401)
  • @linoytsaban
    • [HiDream] code example (#11317)
    • [Flux LoRAs] fix lr scheduler bug in distributed scenarios (#11242)
    • [LoRA] add LoRA support to HiDream and fine-tuning script (#11281)
    • [HiDream LoRA] optimizations + small updates (#11381)
    • [Hi-Dream LoRA] fix bug in validation (#11439)
    • [LoRA] make lora alpha and dropout configurable (#11467)
    • [LoRA] small change to support Hunyuan LoRA Loading for FramePack (#11546)
    • [LoRA] support non-diffusers LTX-Video loras (#11572)
    • [LoRA] kijai wan lora support for I2V (#11588)
    • [training docs] smol update to README files (#11616)
    • [Sana Sprint] add image-to-image pipeline (#11602)
    • [LoRA training] update metadata use for lora alpha + README (#11723)
  • @hameerabbasi
    • [LoRA] Add LoRA support to AuraFlow (#10216)
  • @DN6
    • Fix Hunyuan I2V for transformers>4.47.1 (#11293)
    • Hunyuan I2V fast tests fix (#11341)
    • [Single File] GGUF/Single File Support for HiDream (#11550)
    • [Single File] Fix loading for LTX 0.9.7 transformer (#11578)
    • Type annotation fix (#11597)
    • Fix mixed variant downloading (#11611)
    • [CI] Some improvements to Nightly reports summaries (#11166)
    • Introduce DeprecatedPipelineMixin to simplify pipeline deprecation process (#11596)
    • Chroma Follow Up (#11725)
    • [CI] Fix WAN VACE tests (#11757)
    • [CI] Fix SANA tests (#11756)
    • Fix HiDream pipeline test module (#11754)
    • Update Chroma Docs (#11753)
    • Fix failing cpu offload test for LTX Latent Upscale (#11755)
    • [CI] Skip ONNX Upscale tests (#11774)
  • @yiyixuxu
    • [Hi Dream] follow-up (#11296)
    • support Wan-FLF2V (#11353)
    • update output for Hidream transformer (#11366)
    • [Wan2.1-FLF2V] update conversion script (#11365)
    • [HiDream] move deprecation to 0.35.0 (#11384)
    • clean up the Init for stable_diffusion (#11500)
    • [lora] only remove hooks that we add back (#11768)
  • @Teriks
    • Kolors additional pipelines, community contrib (#11372)
  • @co63oc
    • Fix typos in strings and comments (#11407)
    • Fix typos in docs and comments (#11416)
    • Fix typos in strings and comments (#11476)
  • @xduzhangjiayu
    • Add StableDiffusion3InstructPix2PixPipeline (#11378)
  • @scxue
    • Add cross attention type for Sana-Sprint training in diffusers. (#11514)
  • @lzyhha
    • Add VisualCloze (#11377)
  • @b-sai
    • RegionalPrompting: Inherit from Stable Diffusion (#11525)
  • @Ednaordinary
    • Chroma Pipeline (#11698)

- Python
Published by sayakpaul 8 months ago

diffusers - v0.33.1: fix ftfy import

All commits

  • fix ftfy import for wan pipelines by @yiyixuxu in #11262

- Python
Published by yiyixuxu 11 months ago

diffusers - Diffusers 0.33.0: New Image and Video Models, Memory Optimizations, Caching Methods, Remote VAEs, New Training Scripts, and more

New Pipelines for Video Generation

Wan 2.1

Wan2.1 is a comprehensive and open suite of video foundation models that pushes the boundaries of video generation. The release includes four model variants and three pipelines: text-to-video, image-to-video, and video-to-video.

  • Wan-AI/Wan2.1-T2V-1.3B-Diffusers
  • Wan-AI/Wan2.1-T2V-14B-Diffusers
  • Wan-AI/Wan2.1-I2V-14B-480P-Diffusers
  • Wan-AI/Wan2.1-I2V-14B-720P-Diffusers

Check out the docs here to learn more.

LTX Video 0.9.5

LTX Video 0.9.5 is the updated version of the super-fast LTX Video model series. The latest model introduces additional conditioning options, such as keyframe-based animation and video extension (both forward and backward).

To support these additional conditioning inputs, we’ve introduced the LTXConditionPipeline and LTXVideoCondition object.

To learn more about the usage, check out the docs here.

Hunyuan Image to Video

Hunyuan utilizes a pre-trained Multimodal Large Language Model (MLLM) with a Decoder-Only architecture as the text encoder. The input image is processed by the MLLM to generate semantic image tokens. These tokens are then concatenated with the video latent tokens, enabling comprehensive full-attention computation across the combined data and seamlessly integrating information from both the image and its associated caption.

To learn more, check out the docs here.


New Pipelines for Image Generation

Sana-Sprint

SANA-Sprint is an efficient diffusion model for ultra-fast text-to-image generation. SANA-Sprint is built on a pre-trained foundation model and augmented with hybrid distillation, dramatically reducing inference steps from 20 to 1-4, rivaling the quality of models like Flux.

Shoutout to @lawrence-cj for their help and guidance on this PR.

Check out the pipeline docs of SANA-Sprint to learn more.

Lumina2

Lumina-Image-2.0 is a 2B parameter flow-based diffusion transformer for text-to-image generation released under the Apache 2.0 license.

Check out the docs to learn more. Thanks to @zhuole1025 for contributing this through this PR.

One can also LoRA fine-tune Lumina2, taking advantage of its Apache-2.0 licensing. Check out the guide for more details.

Omnigen

OmniGen is a unified image generation model that can handle multiple tasks including text-to-image, image editing, subject-driven generation, and various computer vision tasks within a single framework. The model consists of a VAE, and a single transformer based on Phi-3 that handles text and image encoding as well as the diffusion process.

Check out the docs to learn more about OmniGen. Thanks to @staoxiao for contributing OmniGen in this PR.

Others

  • CogView4 (thanks to @zRzRzRzRzRzRzR for contributing CogView4 in this PR)

New Memory Optimizations

Layerwise Casting

PyTorch supports torch.float8_e4m3fn and torch.float8_e5m2 as weight storage dtypes, but they can’t be used for computation on many devices due to unimplemented kernel support.

However, you can still use these dtypes to store model weights in FP8 precision and upcast them to a widely supported dtype such as torch.float16 or torch.bfloat16 on-the-fly when the layers are used in the forward pass. This is known as layerwise weight-casting. This can potentially cut down the VRAM requirements of a model by 50%.  

Code

```py
import torch
from diffusers import CogVideoXPipeline, CogVideoXTransformer3DModel
from diffusers.utils import export_to_video

model_id = "THUDM/CogVideoX-5b"

# Load the model in bfloat16 and enable layerwise casting
transformer = CogVideoXTransformer3DModel.from_pretrained(model_id, subfolder="transformer", torch_dtype=torch.bfloat16)
transformer.enable_layerwise_casting(storage_dtype=torch.float8_e4m3fn, compute_dtype=torch.bfloat16)

# Load the pipeline
pipe = CogVideoXPipeline.from_pretrained(model_id, transformer=transformer, torch_dtype=torch.bfloat16)
pipe.to("cuda")

prompt = (
    "A panda, dressed in a small, red jacket and a tiny hat, sits on a wooden stool in a serene bamboo forest. "
    "The panda's fluffy paws strum a miniature acoustic guitar, producing soft, melodic tunes. Nearby, a few other "
    "pandas gather, watching curiously and some clapping in rhythm. Sunlight filters through the tall bamboo, "
    "casting a gentle glow on the scene. The panda's face is expressive, showing concentration and joy as it plays. "
    "The background includes a small, flowing stream and vibrant green foliage, enhancing the peaceful and magical "
    "atmosphere of this unique musical performance."
)
video = pipe(prompt=prompt, guidance_scale=6, num_inference_steps=50).frames[0]
export_to_video(video, "output.mp4", fps=8)
```

Group Offloading

Group offloading is the middle ground between sequential and model offloading. It works by offloading groups of internal layers (either torch.nn.ModuleList or torch.nn.Sequential), which uses less memory than model-level offloading. It is also faster than sequential-level offloading because the number of device synchronizations is reduced.

On CUDA devices, we also have the option to enable layer prefetching with CUDA streams. The next layer to be executed is loaded onto the accelerator device while the current layer is being executed, which makes inference substantially faster while still keeping VRAM requirements very low. In effect, computation is overlapped with data transfer.

One thing to note is that using CUDA streams can cause a considerable spike in CPU RAM usage. Please ensure that the available CPU RAM is 2 times the size of the model if you choose to set use_stream=True. You can reduce CPU RAM usage by setting low_cpu_mem_usage=True. This should limit the CPU RAM used to be roughly the same as the size of the model, but will introduce slight latency in the inference process.

You can also use record_stream=True when using use_stream=True to obtain more speedups at the expense of slightly increased memory usage.

Code

```py
import torch
from diffusers import CogVideoXPipeline
from diffusers.utils import export_to_video

# Load the pipeline
onload_device = torch.device("cuda")
offload_device = torch.device("cpu")
pipe = CogVideoXPipeline.from_pretrained("THUDM/CogVideoX-5b", torch_dtype=torch.bfloat16)

# We can utilize the enable_group_offload method for Diffusers model implementations
pipe.transformer.enable_group_offload(
    onload_device=onload_device,
    offload_device=offload_device,
    offload_type="leaf_level",
    use_stream=True,
)

prompt = (
    "A panda, dressed in a small, red jacket and a tiny hat, sits on a wooden stool in a serene bamboo forest. "
    "The panda's fluffy paws strum a miniature acoustic guitar, producing soft, melodic tunes. Nearby, a few other "
    "pandas gather, watching curiously and some clapping in rhythm. Sunlight filters through the tall bamboo, "
    "casting a gentle glow on the scene. The panda's face is expressive, showing concentration and joy as it plays. "
    "The background includes a small, flowing stream and vibrant green foliage, enhancing the peaceful and magical "
    "atmosphere of this unique musical performance."
)
video = pipe(prompt=prompt, guidance_scale=6, num_inference_steps=50).frames[0]

# This utilized about 14.79 GB. It can be further reduced by using tiling and
# using leaf_level offloading throughout the pipeline.
print(f"Max memory reserved: {torch.cuda.max_memory_allocated() / 1024**3:.2f} GB")
export_to_video(video, "output.mp4", fps=8)
```

Group offloading can also be applied to non-Diffusers models such as text encoders from the transformers library.

Code

```py
import torch
from diffusers import CogVideoXPipeline
from diffusers.hooks import apply_group_offloading

# Load the pipeline
onload_device = torch.device("cuda")
offload_device = torch.device("cpu")
pipe = CogVideoXPipeline.from_pretrained("THUDM/CogVideoX-5b", torch_dtype=torch.bfloat16)

# For any other model implementations, the apply_group_offloading function can be used
apply_group_offloading(
    pipe.text_encoder,
    onload_device=onload_device,
    offload_type="block_level",
    num_blocks_per_group=2,
)
```

Remote Components

Remote components are an experimental feature designed to offload memory-intensive steps of the inference pipeline to remote endpoints. The initial implementation focuses primarily on VAE decoding operations. Below are the currently supported model endpoints:

| Model | Endpoint | VAE model |
|---|---|---|
| Stable Diffusion v1 | https://q1bj3bpq6kzilnsu.us-east-1.aws.endpoints.huggingface.cloud | stabilityai/sd-vae-ft-mse |
| Stable Diffusion XL | https://x2dmsqunjd6k9prw.us-east-1.aws.endpoints.huggingface.cloud | madebyollin/sdxl-vae-fp16-fix |
| Flux | https://whhx50ex1aryqvw6.us-east-1.aws.endpoints.huggingface.cloud | black-forest-labs/FLUX.1-schnell |
| HunyuanVideo | https://o7ywnmrahorts457.us-east-1.aws.endpoints.huggingface.cloud | hunyuanvideo-community/HunyuanVideo |

This is an example of using remote decoding with the Hunyuan Video pipeline:

Code

```py
import torch
from diffusers import HunyuanVideoPipeline, HunyuanVideoTransformer3DModel
from diffusers.utils.remote_utils import remote_decode

model_id = "hunyuanvideo-community/HunyuanVideo"
transformer = HunyuanVideoTransformer3DModel.from_pretrained(
    model_id, subfolder="transformer", torch_dtype=torch.bfloat16
)
pipe = HunyuanVideoPipeline.from_pretrained(
    model_id, transformer=transformer, vae=None, torch_dtype=torch.float16
).to("cuda")

latent = pipe(
    prompt="A cat walks on the grass, realistic",
    height=320,
    width=512,
    num_frames=61,
    num_inference_steps=30,
    output_type="latent",
).frames

video = remote_decode(
    endpoint="https://o7ywnmrahorts457.us-east-1.aws.endpoints.huggingface.cloud/",
    tensor=latent,
    output_type="mp4",
)
if isinstance(video, bytes):
    with open("video.mp4", "wb") as f:
        f.write(video)
```

Check out the docs to learn more.

Introducing Cached Inference for DiTs

Cached Inference for Diffusion Transformer models is a performance optimization that significantly accelerates the denoising process by caching intermediate values. This technique reduces redundant computations across timesteps, resulting in faster generation with a slight dip in output quality.

Check out the docs to learn more about the available caching methods.

Pyramid Attention Broadcast

```py
import torch
from diffusers import CogVideoXPipeline, PyramidAttentionBroadcastConfig

pipe = CogVideoXPipeline.from_pretrained("THUDM/CogVideoX-5b", torch_dtype=torch.bfloat16)
pipe.to("cuda")

config = PyramidAttentionBroadcastConfig(
    spatial_attention_block_skip_range=2,
    spatial_attention_timestep_skip_range=(100, 800),
    current_timestep_callback=lambda: pipe.current_timestep,
)
pipe.transformer.enable_cache(config)
```

FasterCache

```py
import torch
from diffusers import CogVideoXPipeline, FasterCacheConfig

pipe = CogVideoXPipeline.from_pretrained("THUDM/CogVideoX-5b", torch_dtype=torch.bfloat16)
pipe.to("cuda")

config = FasterCacheConfig(
    spatial_attention_block_skip_range=2,
    spatial_attention_timestep_skip_range=(-1, 901),
    unconditional_batch_skip_range=2,
    attention_weight_callback=lambda _: 0.5,
    is_guidance_distilled=True,
)
pipe.transformer.enable_cache(config)
```

Quantization

Quanto Backend

Diffusers now has support for the Quanto quantization backend, which provides float8, int8, int4, and int2 quantization dtypes.

```python
import torch
from diffusers import FluxTransformer2DModel, QuantoConfig

model_id = "black-forest-labs/FLUX.1-dev"
quantization_config = QuantoConfig(weights_dtype="float8")
transformer = FluxTransformer2DModel.from_pretrained(
    model_id,
    subfolder="transformer",
    quantization_config=quantization_config,
    torch_dtype=torch.bfloat16,
)
```

Quanto int8 models are also compatible with torch.compile:

```py
import torch
from diffusers import FluxTransformer2DModel, QuantoConfig

model_id = "black-forest-labs/FLUX.1-dev"
quantization_config = QuantoConfig(weights_dtype="int8")
transformer = FluxTransformer2DModel.from_pretrained(
    model_id,
    subfolder="transformer",
    quantization_config=quantization_config,
    torch_dtype=torch.bfloat16,
)
transformer.compile()
```

Improved loading for uintx TorchAO checkpoints with torch>=2.6

TorchAO checkpoints currently have to be serialized using pickle. For some quantization dtypes using the uintx format, such as uint4wo, this involves saving subclassed TorchAO Tensor objects in the model file. This made loading the models directly with Diffusers tricky, since we do not allow deserializing arbitrary Python objects from pickle files.

Torch 2.6 allows adding the expected Tensor classes to torch's safe globals, which lets us directly load TorchAO checkpoints containing these objects.

```diff
- state_dict = torch.load("/path/to/flux_uint4wo/diffusion_pytorch_model.bin", weights_only=False, map_location="cpu")
- with init_empty_weights():
-     transformer = FluxTransformer2DModel.from_config("/path/to/flux_uint4wo/config.json")
- transformer.load_state_dict(state_dict, strict=True, assign=True)
+ transformer = FluxTransformer2DModel.from_pretrained("/path/to/flux_uint4wo/")
```

LoRAs

We have shipped a couple of improvements on the LoRA front in this release.

🚨 Improved coverage for loading non-diffusers LoRA checkpoints for Flux

Take note of the breaking change introduced in this PR. 🚨 We suggest upgrading your peft installation to the latest version (pip install -U peft), especially when dealing with Flux LoRAs.

torch.compile() support when hotswapping LoRAs without triggering recompilation

A common use case when serving multiple adapters is to load one adapter first, generate images, load another adapter, generate more images, and so on. This workflow normally requires calling load_lora_weights(), set_adapters(), and possibly delete_adapters() to save memory. Moreover, if the model is compiled with torch.compile, performing these steps triggers recompilation, which takes time.

To better support this common workflow, you can “hotswap” a LoRA adapter to avoid accumulating memory and, in some cases, recompilation. Hotswapping requires an adapter to already be loaded; the new adapter's weights are then swapped in place for the existing ones.
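As a sketch of that workflow (the LoRA file names and the target_rank value are placeholders; method names follow the hotswapping docs):

```python
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
).to("cuda")

# Reserve capacity for the largest-rank adapter you plan to swap in
# (64 here is a placeholder value).
pipe.enable_lora_hotswap(target_rank=64)

pipe.load_lora_weights("lora_a.safetensors", adapter_name="default")
pipe.transformer.compile()  # compile once

image_a = pipe("a photo of a cat").images[0]

# Swap the weights in place: no memory accumulation and, with matching
# ranks, no recompilation.
pipe.load_lora_weights("lora_b.safetensors", adapter_name="default", hotswap=True)
image_b = pipe("a photo of a dog").images[0]
```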

Check out the docs to learn more about this feature.

The other major change is support for loading LoRAs into quantized model checkpoints.
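For example, a sketch of loading a LoRA on top of a bitsandbytes 4-bit quantized Flux transformer (the LoRA file name is a placeholder):

```python
import torch
from diffusers import BitsAndBytesConfig, FluxPipeline, FluxTransformer2DModel

quant_config = BitsAndBytesConfig(
    load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16
)
transformer = FluxTransformer2DModel.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    subfolder="transformer",
    quantization_config=quant_config,
    torch_dtype=torch.bfloat16,
)
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", transformer=transformer, torch_dtype=torch.bfloat16
).to("cuda")

# The LoRA is loaded on top of the 4-bit quantized transformer
pipe.load_lora_weights("flux_lora.safetensors")
```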

dtype Maps for Pipelines

Since various pipelines require their components to run in different compute dtypes, we now support passing a dtype map when initializing a pipeline:

```python
import torch
from diffusers import HunyuanVideoPipeline

pipe = HunyuanVideoPipeline.from_pretrained(
    "hunyuanvideo-community/HunyuanVideo",
    torch_dtype={"transformer": torch.bfloat16, "default": torch.float16},
)
print(pipe.transformer.dtype, pipe.vae.dtype)  # (torch.bfloat16, torch.float16)
```

AutoModel

This release includes an AutoModel object similar to the one found in transformers that automatically fetches the appropriate model class for the provided repo.

```python
from diffusers import AutoModel

unet = AutoModel.from_pretrained("runwayml/stable-diffusion-v1-5", subfolder="unet")
```

All commits

  • [Sana 4K] Add vae tiling option to avoid OOM by @leisuzz in #10583
  • IP-Adapter for StableDiffusion3Img2ImgPipeline by @guiyrt in #10589
  • [DC-AE, SANA] fix SanaMultiscaleLinearAttention apply_quadratic_attention bf16 by @chenjy2003 in #10595
  • Move buffers to device by @hlky in #10523
  • [Docs] Update SD3 ip_adapter model_id to diffusers checkpoint by @guiyrt in #10597
  • Scheduling fixes on MPS by @hlky in #10549
  • [Docs] Add documentation about using ParaAttention to optimize FLUX and HunyuanVideo by @chengzeyi in #10544
  • NPU adaption for RMSNorm by @leisuzz in #10534
  • implementing flux on TPUs with ptxla by @entrpn in #10515
  • [core] ConsisID by @SHYuanBest in #10140
  • [training] set rest of the blocks with requires_grad False. by @sayakpaul in #10607
  • chore: remove redundant words by @sunxunle in #10609
  • bugfix for npu not support float64 by @baymax591 in #10123
  • [chore] change licensing to 2025 from 2024. by @sayakpaul in #10615
  • Enable dreambooth lora finetune example on other devices by @jiqing-feng in #10602
  • Remove the FP32 Wrapper when evaluating by @lmxyy in #10617
  • [tests] make tests device-agnostic (part 3) by @faaany in #10437
  • fix offload gpu tests etc by @yiyixuxu in #10366
  • Remove cache migration script by @Wauplin in #10619
  • [core] Layerwise Upcasting by @a-r-r-o-w in #10347
  • Improve TorchAO error message by @a-r-r-o-w in #10627
  • [CI] Update HF_TOKEN in all workflows by @DN6 in #10613
  • add onnxruntime-migraphx as part of check for onnxruntime in import_utils.py by @kahmed10 in #10624
  • [Tests] modify the test slices for the failing flax test by @sayakpaul in #10630
  • [docs] fix image path in para attention docs by @sayakpaul in #10632
  • [docs] uv installation by @stevhliu in #10622
  • width and height are mixed-up by @raulc0399 in #10629
  • Add IP-Adapter example to Flux docs by @hlky in #10633
  • removing redundant requires_grad = False by @YanivDorGalron in #10628
  • [chore] add a script to extract loras from full fine-tuned models by @sayakpaul in #10631
  • Add pipeline_stable_diffusion_xl_attentive_eraser by @Anonym0u3 in #10579
  • NPU Adaption for Sanna by @leisuzz in #10409
  • Add sigmoid scheduler in scheduling_ddpm.py docs by @JacobHelwig in #10648
  • create a script to train autoencoderkl by @lavinal712 in #10605
  • Add community pipeline for semantic guidance for FLUX by @Marlon154 in #10610
  • ControlNet Union controlnet_conditioning_scale for multiple control inputs by @hlky in #10666
  • [training] Convert to ImageFolder script by @hlky in #10664
  • Add provider_options to OnnxRuntimeModel by @hlky in #10661
  • fix check_inputs func in LuminaText2ImgPipeline by @victolee0 in #10651
  • SDXL ControlNet Union pipelines, make control_image argument immutible by @Teriks in #10663
  • Revert RePaint scheduler 'fix' by @GiusCat in #10644
  • [core] Pyramid Attention Broadcast by @a-r-r-o-w in #9562
  • [fix] refer use_framewise_encoding on AutoencoderKLHunyuanVideo._encode by @hanchchch in #10600
  • Refactor gradient checkpointing by @a-r-r-o-w in #10611
  • [Tests] conditionally check fp8_e4m3_bf16_max_memory < fp8_e4m3_fp32_max_memory by @sayakpaul in #10669
  • Fix pipeline dtype unexpected change when using SDXL reference community pipelines in float16 mode by @dimitribarbot in #10670
  • [tests] update llamatokenizer in hunyuanvideo tests by @sayakpaul in #10681
  • support StableDiffusionAdapterPipeline.from_single_file by @Teriks in #10552
  • fix(hunyuan-video): typo in height and width input check by @badayvedat in #10684
  • [FIX] check_inputs function in Auraflow Pipeline by @SahilCarterr in #10678
  • Fix enable memory efficient attention on ROCm by @tenpercent in #10564
  • Fix inconsistent random transform in instruct pix2pix by @Luvata in #10698
  • feat(training-utils): support device and dtype params in compute_density_for_timestep_sampling by @badayvedat in #10699
  • Fixed grammar in "write_own_pipeline" readme by @N0-Flux-given in #10706
  • Fix Documentation about Image-to-Image Pipeline by @ParagEkbote in #10704
  • [bitsandbytes] Simplify bnb int8 dequant by @sayakpaul in #10401
  • Fix train_text_to_image.py --help by @nkthiebaut in #10711
  • Notebooks for Community Scripts-6 by @ParagEkbote in #10713
  • [Fix] Type Hint in from_pretrained() to Ensure Correct Type Inference by @SahilCarterr in #10714
  • add provider_options in from_pretrained by @xieofxie in #10719
  • [Community] Enhanced Model Search by @suzukimain in #10417
  • [bugfix] NPU Adaption for Sana by @leisuzz in #10724
  • Quantized Flux with IP-Adapter by @hlky in #10728
  • EDMEulerScheduler accept sigmas, add final_sigmas_type by @hlky in #10734
  • [LoRA] fix peft state dict parsing by @sayakpaul in #10532
  • Add Self type hint to ModelMixin's from_pretrained by @hlky in #10742
  • [Tests] Test layerwise casting with training by @sayakpaul in #10765
  • speedup hunyuan encoder causal mask generation by @dabeschte in #10764
  • [CI] Fix Truffle Hog failure by @DN6 in #10769
  • Add OmniGen by @staoxiao in #10148
  • feat: new community mixture_tiling_sdxl pipeline for SDXL by @elismasilva in #10759
  • Add support for lumina2 by @zhuole1025 in #10642
  • Refactor OmniGen by @a-r-r-o-w in #10771
  • Faster set_adapters by @Luvata in #10777
  • [Single File] Add Single File support for Lumina Image 2.0 Transformer by @DN6 in #10781
  • Fix use_lu_lambdas and use_karras_sigmas with beta_schedule=squaredcos_cap_v2 in DPMSolverMultistepScheduler by @hlky in #10740
  • MultiControlNetUnionModel on SDXL by @guiyrt in #10747
  • fix: [Community pipeline] Fix flattened elements on image by @elismasilva in #10774
  • make tensors contiguous before passing to safetensors by @faaany in #10761
  • Disable PEFT input autocast when using fp8 layerwise casting by @a-r-r-o-w in #10685
  • Update FlowMatch docstrings to mention correct output classes by @a-r-r-o-w in #10788
  • Refactor CogVideoX transformer forward by @a-r-r-o-w in #10789
  • Module Group Offloading by @a-r-r-o-w in #10503
  • Update Custom Diffusion Documentation for Multiple Concept Inference to resolve issue #10791 by @puhuk in #10792
  • [FIX] check_inputs function in lumina2 by @SahilCarterr in #10784
  • follow-up refactor on lumina2 by @yiyixuxu in #10776
  • CogView4 (supports different length c and uc) by @zRzRzRzRzRzRzR in #10649
  • typo fix by @YanivDorGalron in #10802
  • Extend Support for callback_on_step_end for AuraFlow and LuminaText2Img Pipelines by @ParagEkbote in #10746
  • [chore] update notes generation spaces by @sayakpaul in #10592
  • [LoRA] improve lora support for flux. by @sayakpaul in #10810
  • Fix max_shift value in flux and related functions to 1.15 (issue #10675) by @puhuk in #10807
  • [docs] add missing entries to the lora docs. by @sayakpaul in #10819
  • DiffusionPipeline mixin to+FromOriginalModelMixin/FromSingleFileMixin from_single_file type hint by @hlky in #10811
  • [LoRA] make set_adapters() robust on silent failures. by @sayakpaul in #9618
  • [FEAT] Model loading refactor by @SunMarc in #10604
  • [misc] feat: introduce a style bot. by @sayakpaul in #10274
  • Remove print statements by @a-r-r-o-w in #10836
  • [tests] use proper gemma class and config in lumina2 tests. by @sayakpaul in #10828
  • [LoRA] add LoRA support to Lumina2 and fine-tuning script by @sayakpaul in #10818
  • [Utils] add utilities for checking if certain utilities are properly documented by @sayakpaul in #7763
  • Add missing isinstance for arg checks in GGUFParameter by @AstraliteHeart in #10834
  • [tests] test encode_prompt() in isolation by @sayakpaul in #10438
  • store activation cls instead of function by @SunMarc in #10832
  • fix: support transformer models' generation_config in pipeline by @JeffersonQin in #10779
  • Notebooks for Community Scripts-7 by @ParagEkbote in #10846
  • [CI] install accelerate transformers from main by @sayakpaul in #10289
  • [CI] run fast gpu tests conditionally on pull requests. by @sayakpaul in #10310
  • SD3 IP-Adapter runtime checkpoint conversion by @guiyrt in #10718
  • Some consistency-related fixes for HunyuanVideo by @a-r-r-o-w in #10835
  • SkyReels Hunyuan T2V & I2V by @a-r-r-o-w in #10837
  • fix: run tests from a pr workflow. by @sayakpaul in #9696
  • [chore] template for remote vae. by @sayakpaul in #10849
  • fix remote vae template by @sayakpaul in #10852
  • [CI] Fix incorrectly named test module for Hunyuan DiT by @DN6 in #10854
  • [CI] Update always test Pipelines list in Pipeline fetcher by @DN6 in #10856
  • device_map in load_model_dict_into_meta by @hlky in #10851
  • [Fix] Docs overview.md by @SahilCarterr in #10858
  • remove format check for safetensors file by @SunMarc in #10864
  • [docs] LoRA support by @stevhliu in #10844
  • Comprehensive type checking for from_pretrained kwargs by @guiyrt in #10758
  • Fix torch_dtype in Kolors text encoder with transformers v4.49 by @hlky in #10816
  • [LoRA] restrict certain keys to be checked for peft config update. by @sayakpaul in #10808
  • Add SD3 ControlNet to AutoPipeline by @hlky in #10888
  • [docs] Update prompt weighting docs by @stevhliu in #10843
  • [docs] Flux group offload by @stevhliu in #10847
  • [Fix] fp16 unscaling in train_dreambooth_lora_sdxl by @SahilCarterr in #10889
  • [docs] Add CogVideoX Schedulers by @a-r-r-o-w in #10885
  • [chore] correct qk norm list. by @sayakpaul in #10876
  • [Docs] Fix toctree sorting by @DN6 in #10894
  • [refactor] SD3 docs & remove additional code by @a-r-r-o-w in #10882
  • [refactor] Remove additional Flux code by @a-r-r-o-w in #10881
  • [CI] Improvements to conditional GPU PR tests by @DN6 in #10859
  • Multi IP-Adapter for Flux pipelines by @guiyrt in #10867
  • Fix Callback Tensor Inputs of the SDXL Controlnet Inpaint and Img2img Pipelines are missing "controlnet_image". by @CyberVy in #10880
  • Security fix by @ydshieh in #10905
  • Marigold Update: v1-1 models, Intrinsic Image Decomposition pipeline, documentation by @toshas in #10884
  • [Tests] fix: lumina2 lora fuse_nan test by @sayakpaul in #10911
  • Fix Callback Tensor Inputs of the SD Controlnet Pipelines are missing some elements. by @CyberVy in #10907
  • [CI] Fix Fast GPU tests on PR by @DN6 in #10912
  • [CI] Fix for failing IP Adapter test in Fast GPU PR tests by @DN6 in #10915
  • Experimental per control type scale for ControlNet Union by @hlky in #10723
  • [style bot] improve security for the stylebot. by @sayakpaul in #10908
  • [CI] Update Stylebot Permissions by @DN6 in #10931
  • [Alibaba Wan Team] continue on #10921 Wan2.1 by @yiyixuxu in #10922
  • Support IPAdapter for more Flux pipelines by @hlky in #10708
  • Add remote_decode to remote_utils by @hlky in #10898
  • Update VAE Decode endpoints by @hlky in #10939
  • [chore] fix-copies to flux pipelines by @sayakpaul in #10941
  • [Tests] Remove more encode prompts tests by @sayakpaul in #10942
  • Add EasyAnimateV5.1 text-to-video, image-to-video, control-to-video generation model by @bubbliiiing in #10626
  • Fix SD2.X clip single file load projection_dim by @Teriks in #10770
  • add from_single_file to animatediff by @ in #10924
  • Add Example of IPAdapterScaleCutoffCallback to Docs by @ParagEkbote in #10934
  • Update pipeline_cogview4.py by @zRzRzRzRzRzRzR in #10944
  • Fix redundant prev_output_channel assignment in UNet2DModel by @ahmedbelgacem in #10945
  • Improve load_ip_adapter RAM Usage by @CyberVy in #10948
  • [tests] make tests device-agnostic (part 4) by @faaany in #10508
  • Update evaluation.md by @sayakpaul in #10938
  • [LoRA] feat: support non-diffusers lumina2 LoRAs. by @sayakpaul in #10909
  • [Quantization] support pass MappingType for TorchAoConfig by @a120092009 in #10927
  • Fix the missing parentheses when calling is_torchao_available in quantization_config.py. by @CyberVy in #10961
  • [LoRA] Support Wan by @a-r-r-o-w in #10943
  • Fix incorrect seed initialization when args.seed is 0 by @azolotenkov in #10964
  • feat: add Mixture-of-Diffusers ControlNet Tile upscaler Pipeline for SDXL by @elismasilva in #10951
  • [Docs] CogView4 comment fix by @zRzRzRzRzRzRzR in #10957
  • update check_input for cogview4 by @yiyixuxu in #10966
  • Add VAE Decode endpoint slow test by @hlky in #10946
  • [flux lora training] fix t5 training bug by @linoytsaban in #10845
  • use style bot GH Action from huggingface_hub by @hanouticelina in #10970
  • [train_dreambooth_lora.py] Fix the LR Schedulers when num_train_epochs is passed in a distributed training env by @flyxiv in #10973
  • [tests] fix tests for save load components by @sayakpaul in #10977
  • Fix loading OneTrainer Flux LoRA by @hlky in #10978
  • fix default values of Flux guidance_scale in docstrings by @catwell in #10982
  • [CI] remove synchornized. by @sayakpaul in #10980
  • Bump jinja2 from 3.1.5 to 3.1.6 in /examples/research_projects/realfill by @dependabot[bot] in #10984
  • Fix Flux Controlnet Pipeline callback_tensor_inputs Missing Some Elements by @CyberVy in #10974
  • [Single File] Add user agent to SF download requests. by @DN6 in #10979
  • Add CogVideoX DDIM Inversion to Community Pipelines by @LittleNyima in #10956
  • fix wan i2v pipeline bugs by @yupeng1111 in #10975
  • Hunyuan I2V by @a-r-r-o-w in #10983
  • Fix Graph Breaks When Compiling CogView4 by @chengzeyi in #10959
  • Wan VAE move scaling to pipeline by @hlky in #10998
  • [LoRA] remove full key prefix from peft. by @sayakpaul in #11004
  • [Single File] Add single file support for Wan T2V/I2V by @DN6 in #10991
  • Add STG to community pipelines by @kinam0252 in #10960
  • [LoRA] Improve copied from comments in the LoRA loader classes by @sayakpaul in #10995
  • Fix for fetching variants only by @DN6 in #10646
  • [Quantization] Add Quanto backend by @DN6 in #10756
  • [Single File] Add single file loading for SANA Transformer by @ishan-modi in #10947
  • [LoRA] Improve warning messages when LoRA loading becomes a no-op by @sayakpaul in #10187
  • [LoRA] CogView4 by @a-r-r-o-w in #10981
  • [Tests] improve quantization tests by additionally measuring the inference memory savings by @sayakpaul in #11021
  • [Research Project] Add AnyText: Multilingual Visual Text Generation And Editing by @tolgacangoz in #8998
  • [Quantization] Allow loading TorchAO serialized Tensor objects with torch>=2.6 by @DN6 in #11018
  • fix: mixture tiling sdxl pipeline - adjust gerating time_ids & embeddings by @elismasilva in #11012
  • [LoRA] support wan i2v loras from the world. by @sayakpaul in #11025
  • Fix SD3 IPAdapter feature extractor by @hlky in #11027
  • chore: fix help messages in advanced diffusion examples by @wonderfan in #10923
  • Fix missing **kwargs in lora_pipeline.py by @CyberVy in #11011
  • Fix for multi-GPU WAN inference by @AmericanPresidentJimmyCarter in #10997
  • [Refactor] Clean up import utils boilerplate by @DN6 in #11026
  • Use output_size in repeat_interleave by @hlky in #11030
  • [hybrid inference 🍯🐝] Add VAE encode by @hlky in #11017
  • Wan Pipeline scaling fix, type hint warning, multi generator fix by @hlky in #11007
  • [LoRA] change to warning from info when notifying the users about a LoRA no-op by @sayakpaul in #11044
  • Rename Lumina(2)Text2ImgPipeline -> Lumina(2)Pipeline by @hlky in #10827
  • making formatted_images initialization compact by @YanivDorGalron in #10801
  • Fix aclnnRepeatInterleaveIntWithDim error on NPU for get_1d_rotary_pos_embed by @ZhengKai91 in #10820
  • [Tests] restrict memory tests for quanto for certain schemes. by @sayakpaul in #11052
  • [LoRA] feat: support non-diffusers wan t2v loras. by @sayakpaul in #11059
  • [examples/controlnet/train_controlnet_sd3.py] Fixes #11050 - Cast prompt_embeds and pooled_prompt_embeds to weight_dtype to prevent dtype mismatch by @andjoer in #11051
  • reverts accidental change that removes attn_mask in attn. Improves fl… by @entrpn in #11065
  • Fix deterministic issue when getting pipeline dtype and device by @dimitribarbot in #10696
  • [Tests] add requires peft decorator. by @sayakpaul in #11037
  • CogView4 Control Block by @zRzRzRzRzRzRzR in #10809
  • [CI] pin transformers version for benchmarking. by @sayakpaul in #11067
  • Fix Wan I2V Quality by @chengzeyi in #11087
  • LTX 0.9.5 by @a-r-r-o-w in #10968
  • make PR GPU tests conditioned on styling. by @sayakpaul in #11099
  • Group offloading improvements by @a-r-r-o-w in #11094
  • Fix pipeline_flux_controlnet.py by @co63oc in #11095
  • update readme instructions. by @entrpn in #11096
  • Resolve stride mismatch in UNet's ResNet to support Torch DDP by @jinc7461 in #11098
  • Fix Group offloading behaviour when using streams by @a-r-r-o-w in #11097
  • Quality options in export_to_video by @hlky in #11090
  • [CI] uninstall deps properly from pr gpu tests. by @sayakpaul in #11102
  • [BUG] Fix Autoencoderkl train script by @lavinal712 in #11113
  • [Wan LoRAs] make T2V LoRAs compatible with Wan I2V by @linoytsaban in #11107
  • [tests] enable bnb tests on xpu by @faaany in #11001
  • [fix bug] PixArt inference_steps=1 by @lawrence-cj in #11079
  • Flux with Remote Encode by @hlky in #11091
  • [tests] make cuda only tests device-agnostic by @faaany in #11058
  • Provide option to reduce CPU RAM usage in Group Offload by @DN6 in #11106
  • remove F.rms_norm for now by @yiyixuxu in #11126
  • Notebooks for Community Scripts-8 by @ParagEkbote in #11128
  • fix callback_tensor_inputs of sd controlnet inpaint pipeline missing some elements by @CyberVy in #11073
  • [core] FasterCache by @a-r-r-o-w in #10163
  • add sana-sprint by @yiyixuxu in #11074
  • Don't override torch_dtype and don't use when quantization_config is set by @hlky in #11039
  • Update README and example code for AnyText usage by @tolgacangoz in #11028
  • Modify the implementation of retrieve_timesteps in CogView4-Control. by @zRzRzRzRzRzRzR in #11125
  • [fix SANA-Sprint] by @lawrence-cj in #11142
  • New HunyuanVideo-I2V by @a-r-r-o-w in #11066
  • [doc] Fix Korean Controlnet Train doc by @flyxiv in #11141
  • Improve information about group offloading and layerwise casting by @a-r-r-o-w in #11101
  • add a timestep scale for sana-sprint teacher model by @lawrence-cj in #11150
  • [Quantization] dtype fix for GGUF + fix BnB tests by @DN6 in #11159
  • Set self.hf_peft_config_loaded to True when LoRA is loaded using load_lora_adapter in PeftAdapterMixin class by @kentdan3msu in #11155
  • WanI2V encode_image by @hlky in #11164
  • [Docs] Update Wan Docs with memory optimizations by @DN6 in #11089
  • Fix LatteTransformer3DModel dtype mismatch with enable_temporal_attentions by @hlky in #11139
  • Raise warning and round down if Wan num_frames is not 4k + 1 by @a-r-r-o-w in #11167
  • [Docs] Fix environment variables in installation.md by @remarkablemark in #11179
  • Add latents_mean and latents_std to SDXLLongPromptWeightingPipeline by @hlky in #11034
  • Bug fix in LTXImageToVideoPipeline.prepare_latents() when latents is already set by @kakukakujirori in #10918
  • [tests] no hard-coded cuda by @faaany in #11186
  • [WIP] Add Wan Video2Video by @DN6 in #11053
  • map BACKEND_RESET_MAX_MEMORY_ALLOCATED to reset_peak_memory_stats on XPU by @yao-matrix in #11191
  • fix autocast by @jiqing-feng in #11190
  • fix: for checking mandatory and optional pipeline components by @elismasilva in #11189
  • remove unnecessary call to F.pad by @bm-synth in #10620
  • allow models to run with a user-provided dtype map instead of a single dtype by @hlky in #10301
  • [tests] HunyuanDiTControlNetPipeline inference precision issue on XPU by @faaany in #11197
  • Revert save_model in ModelMixin save_pretrained and use safe_serialization=False in test by @hlky in #11196
  • [docs] torch_dtype map by @hlky in #11194
  • Fix enable_sequential_cpu_offload in CogView4Pipeline by @hlky in #11195
  • SchedulerMixin from_pretrained and ConfigMixin Self type annotation by @hlky in #11192
  • Update import_utils.py by @Lakshaysharma048 in #10329
  • Add CacheMixin to Wan and LTX Transformers by @DN6 in #11187
  • feat: [Community Pipeline] - FaithDiff Stable Diffusion XL Pipeline by @elismasilva in #11188
  • [Model Card] standardize advanced diffusion training sdxl lora by @chiral-carbon in #7615
  • Change KolorsPipeline LoRA Loader to StableDiffusion by @BasileLewan in #11198
  • Update Style Bot workflow by @hanouticelina in #11202
  • Fixed requests.get function call by adding timeout parameter. by @kghamilton89 in #11156
  • Fix Single File loading for LTX VAE by @DN6 in #11200
  • [feat]Add strength in flux_fill pipeline (denoising strength for fluxfill) by @Suprhimp in #10603
  • [LTX0.9.5] Refactor LTXConditionPipeline for text-only conditioning by @tolgacangoz in #11174
  • Add Wan with STG as a community pipeline by @Ednaordinary in #11184
  • Add missing MochiEncoder3D.gradient_checkpointing attribute by @mjkvaak-amd in #11146
  • enable 1 case on XPU by @yao-matrix in #11219
  • ensure dtype match between diffused latents and vae weights by @heyalexchoi in #8391
  • [docs] MPS update by @stevhliu in #11212
  • Add support to pass image embeddings to the WAN I2V pipeline. by @goiri in #11175
  • [train_controlnet.py] Fix the LR schedulers when num_train_epochs is passed in a distributed training env by @Bhavay-2001 in #8461
  • [Training] Better image interpolation in training scripts by @asomoza in #11206
  • [LoRA] Implement hot-swapping of LoRA by @BenjaminBossan in #9453
  • introduce compute arch specific expectations and fix test_sd3_img2img_inference failure by @yao-matrix in #11227
  • [Flux LoRA] fix issues in flux lora scripts by @linoytsaban in #11111
  • Flux quantized with lora by @hlky in #10990
  • [feat] implement record_stream when using CUDA streams during group offloading by @sayakpaul in #11081
  • [bitsandbytes] improve replacement warnings for bnb by @sayakpaul in #11132
  • minor update to sana sprint docs. by @sayakpaul in #11236
  • [docs] minor updates to dtype map docs. by @sayakpaul in #11237
  • [LoRA] support more comyui loras for Flux 🚨 by @sayakpaul in #10985
  • fix: SD3 ControlNet validation so that it runs on a A100. by @sayakpaul in #11238
  • AudioLDM2 Fixes by @hlky in #11244
  • AutoModel by @hlky in #11115
  • fix FluxReduxSlowTests::test_flux_redux_inference case failure on XPU by @yao-matrix in #11245
  • [docs] AutoModel by @hlky in #11250
  • Update Ruff to latest Version by @DN6 in #10919
  • fix flux controlnet bug by @free001style in #11152
  • fix timeout constant by @sayakpaul in #11252
  • fix consisid imports by @sayakpaul in #11254
  • Release: v0.33.0 by @sayakpaul (direct commit on v0.33.0-release)

Significant community contributions

The following contributors have made significant changes to the library over the last release:

  • @guiyrt
    • IP-Adapter for StableDiffusion3Img2ImgPipeline (#10589)
    • [Docs] Update SD3 ip_adapter model_id to diffusers checkpoint (#10597)
    • MultiControlNetUnionModel on SDXL (#10747)
    • SD3 IP-Adapter runtime checkpoint conversion (#10718)
    • Comprehensive type checking for from_pretrained kwargs (#10758)
    • Multi IP-Adapter for Flux pipelines (#10867)
  • @chengzeyi
    • [Docs] Add documentation about using ParaAttention to optimize FLUX and HunyuanVideo (#10544)
    • Fix Graph Breaks When Compiling CogView4 (#10959)
    • Fix Wan I2V Quality (#11087)
  • @entrpn
    • implementing flux on TPUs with ptxla (#10515)
    • reverts accidental change that removes attn_mask in attn. Improves fl… (#11065)
    • update readme instructions. (#11096)
  • @SHYuanBest
    • [core] ConsisID (#10140)
  • @faaany
    • [tests] make tests device-agnostic (part 3) (#10437)
    • make tensors contiguous before passing to safetensors (#10761)
    • [tests] make tests device-agnostic (part 4) (#10508)
    • [tests] enable bnb tests on xpu (#11001)
    • [tests] make cuda only tests device-agnostic (#11058)
    • [tests] no hard-coded cuda (#11186)
    • [tests] HunyuanDiTControlNetPipeline inference precision issue on XPU (#11197)
  • @yiyixuxu
    • fix offload gpu tests etc (#10366)
    • follow-up refactor on lumina2 (#10776)
    • [Alibaba Wan Team] continue on #10921 Wan2.1 (#10922)
    • update check_input for cogview4 (#10966)
    • remove F.rms_norm for now (#11126)
    • add sana-sprint (#11074)
  • @DN6
    • [CI] Update HF_TOKEN in all workflows (#10613)
    • [CI] Fix Truffle Hog failure (#10769)
    • [Single File] Add Single File support for Lumina Image 2.0 Transformer (#10781)
    • [CI] Fix incorrectly named test module for Hunyuan DiT (#10854)
    • [CI] Update always test Pipelines list in Pipeline fetcher (#10856)
    • [Docs] Fix toctree sorting (#10894)
    • [CI] Improvements to conditional GPU PR tests (#10859)
    • [CI] Fix Fast GPU tests on PR (#10912)
    • [CI] Fix for failing IP Adapter test in Fast GPU PR tests (#10915)
    • [CI] Update Stylebot Permissions (#10931)
    • [Single File] Add user agent to SF download requests. (#10979)
    • [Single File] Add single file support for Wan T2V/I2V (#10991)
    • Fix for fetching variants only (#10646)
    • [Quantization] Add Quanto backend (#10756)
    • [Quantization] Allow loading TorchAO serialized Tensor objects with torch>=2.6 (#11018)
    • [Refactor] Clean up import utils boilerplate (#11026)
    • Provide option to reduce CPU RAM usage in Group Offload (#11106)
    • [Quantization] dtype fix for GGUF + fix BnB tests (#11159)
    • [Docs] Update Wan Docs with memory optimizations (#11089)
    • [WIP] Add Wan Video2Video (#11053)
    • Add CacheMixin to Wan and LTX Transformers (#11187)
    • Fix Single File loading for LTX VAE (#11200)
    • Update Ruff to latest Version (#10919)
  • @Anonym0u3
    • Add pipeline_stable_diffusion_xl_attentive_eraser (#10579)
  • @lavinal712
    • create a script to train autoencoderkl (#10605)
    • [BUG] Fix Autoencoderkl train script (#11113)
  • @Marlon154
    • Add community pipeline for semantic guidance for FLUX (#10610)
  • @ParagEkbote
    • Fix Documentation about Image-to-Image Pipeline (#10704)
    • Notebooks for Community Scripts-6 (#10713)
    • Extend Support for callback_on_step_end for AuraFlow and LuminaText2Img Pipelines (#10746)
    • Notebooks for Community Scripts-7 (#10846)
    • Add Example of IPAdapterScaleCutoffCallback to Docs (#10934)
    • Notebooks for Community Scripts-8 (#11128)
  • @suzukimain
    • [Community] Enhanced Model Search (#10417)
  • @staoxiao
    • Add OmniGen (#10148)
  • @elismasilva
    • feat: new community mixture_tiling_sdxl pipeline for SDXL (#10759)
    • fix: [Community pipeline] Fix flattened elements on image (#10774)
    • feat: add Mixture-of-Diffusers ControlNet Tile upscaler Pipeline for SDXL (#10951)
    • fix: mixture tiling sdxl pipeline - adjust gerating time_ids & embeddings (#11012)
    • fix: for checking mandatory and optional pipeline components (#11189)
    • feat: [Community Pipeline] - FaithDiff Stable Diffusion XL Pipeline (#11188)
  • @zhuole1025
    • Add support for lumina2 (#10642)
  • @zRzRzRzRzRzRzR
    • CogView4 (supports different length c and uc) (#10649)
    • Update pipeline_cogview4.py (#10944)
    • [Docs] CogView4 comment fix (#10957)
    • CogView4 Control Block (#10809)
    • Modify the implementation of retrieve_timesteps in CogView4-Control. (#11125)
  • @toshas
    • Marigold Update: v1-1 models, Intrinsic Image Decomposition pipeline, documentation (#10884)
  • @bubbliiiing
    • Add EasyAnimateV5.1 text-to-video, image-to-video, control-to-video generation model (#10626)
  • @LittleNyima
    • Add CogVideoX DDIM Inversion to Community Pipelines (#10956)
  • @kinam0252
    • Add STG to community pipelines (#10960)
  • @tolgacangoz
    • [Research Project] Add AnyText: Multilingual Visual Text Generation And Editing (#8998)
    • Update README and example code for AnyText usage (#11028)
    • [LTX0.9.5] Refactor LTXConditionPipeline for text-only conditioning (#11174)
  • @Ednaordinary
    • Add Wan with STG as a community pipeline (#11184)

- Python
Published by sayakpaul 11 months ago

diffusers - v0.32.2

Fixes for Flux Single File loading, LoRA loading for 4bit BnB Flux, Hunyuan Video

This patch release:

  • Fixes a regression in loading ComfyUI-format single file checkpoints for Flux
  • Fixes a regression in loading LoRAs with bitsandbytes 4bit quantized Flux models
  • Adds unload_lora_weights for Flux Control
  • Fixes a bug that prevents Hunyuan Video from running with batch size > 1
  • Allows Hunyuan Video to load LoRAs created with the original repository code

All commits

  • [Single File] Fix loading Flux Dev finetunes with Comfy Prefix by @DN6 in #10545
  • [CI] Update HF Token on Fast GPU Model Tests by @DN6 #10570
  • [CI] Update HF Token in Fast GPU Tests by @DN6 #10568
  • Fix batch > 1 in HunyuanVideo by @hlky in #10548
  • Fix HunyuanVideo produces NaN on PyTorch<2.5 by @hlky in #10482
  • Fix hunyuan video attention mask dim by @a-r-r-o-w in #10454
  • [LoRA] Support original format loras for HunyuanVideo by @a-r-r-o-w in #10376
  • [LoRA] feat: support loading loras into 4bit quantized Flux models. by @sayakpaul in #10578
  • [LoRA] clean up load_lora_into_text_encoder() and fuse_lora() copied from by @sayakpaul in #10495
  • [LoRA] feat: support unload_lora_weights() for Flux Control. by @sayakpaul in #10206
  • Fix Flux multiple Lora loading bug by @maxs-kan in #10388
  • [LoRA] fix: lora unloading when using expanded Flux LoRAs. by @sayakpaul in #10397

- Python
Published by DN6 about 1 year ago

diffusers - v0.32.1

TorchAO Quantizer fixes

This patch release fixes a few bugs related to the TorchAO Quantizer introduced in v0.32.0.

  • Importing Diffusers would raise an error in PyTorch versions lower than 2.3.0. This should no longer be a problem.
  • Device maps did not work as expected when using the quantizer, so we now raise an error if one is used. Support for using device maps with different quantization backends will be added in the near future.
  • Quantization was not performed due to faulty logic. This is now fixed and better tested.

Refer to our documentation to learn more about how to use different quantization backends.
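As a minimal sketch of the configuration side of the TorchAO path this patch hardens (assuming `diffusers>=0.32` with `torchao` installed; `"int8wo"` is one example quant type, see the quantization docs for the supported list):

```python
from diffusers import TorchAoConfig

# int8 weight-only quantization; "int8wo" is one of the quant_type
# strings accepted by the TorchAO backend.
quantization_config = TorchAoConfig("int8wo")
```

The config is then passed via the `quantization_config` argument of a model's `from_pretrained` call (for example when loading a transformer), at which point the weights are quantized on load.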

All commits

  • make style for https://github.com/huggingface/diffusers/pull/10368 by @yiyixuxu in #10370
  • fix test pypi installation in the release workflow by @sayakpaul in #10360
  • Fix TorchAO related bugs; revert device_map changes by @a-r-r-o-w in #10371

- Python
Published by a-r-r-o-w about 1 year ago

diffusers - Diffusers 0.32.0: New video pipelines, new image pipelines, new quantization backends, new training scripts, and more

https://github.com/user-attachments/assets/34d5f7ca-8e33-4401-8109-5c245ce7595f

This release took a while, but it has many exciting updates. It contains several new pipelines for image and video generation, new quantization backends, and more.

Going forward, to provide more transparency to the community about ongoing developments and releases in Diffusers, we will be making use of a roadmap tracker.

New Video Generation Pipelines 📹

Open video generation models are on the rise, and we’re pleased to provide comprehensive integration support for all of them. The following video pipelines are bundled in this release:

Check out this section to learn more about the fine-tuning options available for these new video models.

New Image Generation Pipelines

Important Note about the new Flux Models

We can combine regular Flux.1 Dev LoRAs with Flux Control LoRAs, Flux Control, and Flux Fill. For example, you can enable few-step inference with Flux Fill using:

```python
from diffusers import FluxFillPipeline
from diffusers.utils import load_image
import torch

pipe = FluxFillPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-Fill-dev", torch_dtype=torch.bfloat16
).to("cuda")

adapter_id = "alimama-creative/FLUX.1-Turbo-Alpha"
pipe.load_lora_weights(adapter_id)

image = load_image("https://huggingface.co/datasets/diffusers/diffusers-images-docs/resolve/main/cup.png")
mask = load_image("https://huggingface.co/datasets/diffusers/diffusers-images-docs/resolve/main/cup_mask.png")

image = pipe(
    prompt="a white paper cup",
    image=image,
    mask_image=mask,
    height=1632,
    width=1232,
    guidance_scale=30,
    num_inference_steps=8,
    max_sequence_length=512,
    generator=torch.Generator("cpu").manual_seed(0),
).images[0]
image.save("flux-fill-dev.png")
```

To learn more, check out the documentation.

[!NOTE]
SANA is a small model compared to others like Flux: Sana-0.6B can be deployed on a 16GB laptop GPU and takes less than a second to generate a 1024×1024 image. We also support LoRA fine-tuning of SANA. Check out this section for more details.

Acknowledgements

  • Shoutout to @lawrence-cj and @chenjy2003 for contributing SANA in this PR. SANA also features a Deep Compression Autoencoder, which was contributed by @lawrence-cj in this PR.
  • Shoutout to @guiyrt for contributing SD3.5 IP Adapter in this PR.

New Quantization Backends

Please be aware of the following caveats:

  • TorchAO quantized checkpoints cannot be serialized in safetensors currently. This may change in the future.
  • In this release, GGUF only supports loading pre-quantized checkpoints into models. Support for saving models with GGUF quantization will be added in the future.
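
As a sketch of the load-only GGUF flow, the only choice to make is a config fragment (assuming `diffusers` and the `gguf` package are installed):

```python
import torch
from diffusers import GGUFQuantizationConfig

# GGUF checkpoints ship pre-quantized; the config only selects the dtype
# that quantized weights are dequantized to during compute.
quantization_config = GGUFQuantizationConfig(compute_dtype=torch.bfloat16)
```

This config, together with a path or URL to a `.gguf` checkpoint, goes to a model's `from_single_file` method; as noted above, saving back to GGUF is not supported in this release.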

New training scripts

This release features many new training scripts for the community to play with:

All commits

  • post-release 0.31.0 by @sayakpaul in #9742
  • fix bug in require_accelerate_version_greater by @faaany in #9746
  • [Official callbacks] SDXL Controlnet CFG Cutoff by @asomoza in #9311
  • [SD3-5 dreambooth lora] update model cards by @linoytsaban in #9749
  • config attribute not found error for FluxImagetoImage Pipeline for multi controlnet solved by @rshah240 in #9586
  • Some minor updates to the nightly and push workflows by @sayakpaul in #9759
  • [Docs] fix docstring typo in SD3 pipeline by @shenzhiy21 in #9765
  • [bugfix] bugfix for npu free memory by @leisuzz in #9640
  • [research_projects] add flux training script with quantization by @sayakpaul in #9754
  • Add a doc for AWS Neuron in Diffusers by @JingyaHuang in #9766
  • [refactor] enhance readability of flux related pipelines by @Luciennnnnnn in #9711
  • Added Support of Xlabs controlnet to FluxControlNetInpaintPipeline by @SahilCarterr in #9770
  • [research_projects] Update README.md to include a note about NF4 T5-xxl by @sayakpaul in #9775
  • [Fix] train_dreambooth_lora_flux_advanced ValueError: unexpected save model by @rootonchair in #9777
  • [Fix] remove setting lr for T5 text encoder when using prodigy in flux dreambooth lora script by @biswaroop1547 in #9473
  • [SD 3.5 Dreambooth LoRA] support configurable training block & layers by @linoytsaban in #9762
  • [flux dreambooth lora training] make LoRA target modules configurable + small bug fix by @linoytsaban in #9646
  • adds the pipeline for pixart alpha controlnet by @raulc0399 in #8857
  • [core] Allegro T2V by @a-r-r-o-w in #9736
  • Allegro VAE fix by @a-r-r-o-w in #9811
  • [CI] add new runner for testing by @sayakpaul in #9699
  • [training] fixes to the quantization training script and add AdEMAMix optimizer as an option by @sayakpaul in #9806
  • [training] use the lr when using 8bit adam. by @sayakpaul in #9796
  • [Tests] clean up and refactor gradient checkpointing tests by @sayakpaul in #9494
  • [CI] add a big GPU marker to run memory-intensive tests separately on CI by @sayakpaul in #9691
  • [LoRA] fix: lora loading when using with a device_mapped model. by @sayakpaul in #9449
  • Revert "[LoRA] fix: lora loading when using with a device_mapped mode… by @yiyixuxu in #9823
  • [Model Card] standardize advanced diffusion training sd15 lora by @chiral-carbon in #7613
  • NPU Adaption for FLUX by @leisuzz in #9751
  • Fixes EMAModel "from_pretrained" method by @SahilCarterr in #9779
  • Update train_controlnet_flux.py, fix size mismatch issue in validation by @ScilenceForest in #9679
  • Handling mixed precision for dreambooth flux lora training by @icsl-Jeon in #9565
  • Reduce Memory Cost in Flux Training by @leisuzz in #9829
  • Add Diffusion Policy for Reinforcement Learning by @DorsaRoh in #9824
  • [feat] add load_lora_adapter() for compatible models by @sayakpaul in #9712
  • Refactor training_utils.py by @RogerSinghChugh in #9815
  • [core] Mochi T2V by @a-r-r-o-w in #9769
  • [Fix] Test of sd3 lora by @SahilCarterr in #9843
  • Fix: Remove duplicated comma in distributed_inference.md by @vahidaskari in #9868
  • Add new community pipeline for 'Adaptive Mask Inpainting', introduced in [ECCV2024] ComA by @jellyheadandrew in #9228
  • Updated encode_prompt_with_clip and encode_prompt in train_dreambooth_sd3 by @SahilCarterr in #9800
  • [Core] introduce controlnet module by @sayakpaul in #8768
  • [Flux] reduce explicit device transfers and typecasting in flux. by @sayakpaul in #9817
  • Improve downloads of sharded variants by @DN6 in #9869
  • [fix] Replaced shutil.copy with shutil.copyfile by @SahilCarterr in #9885
  • Enabling gradient checkpointing in eval() mode by @MikeTkachuk in #9878
  • [FIX] Fix TypeError in DreamBooth SDXL when use_dora is False by @SahilCarterr in #9879
  • [Advanced LoRA v1.5] fix: gradient unscaling problem by @sayakpaul in #7018
  • Revert "[Flux] reduce explicit device transfers and typecasting in flux." by @sayakpaul in #9896
  • Feature IP Adapter Xformers Attention Processor by @elismasilva in #9881
  • Notebooks for Community Scripts Examples by @ParagEkbote in #9905
  • Fix Progress Bar Updates in SD 1.5 PAG Img2Img pipeline by @painebenjamin in #9925
  • Update pipeline_flux_img2img.py by @example-git in #9928
  • add depth controlnet sd3 pre-trained checkpoints to docs by @pureexe in #9937
  • Move Wuerstchen Dreambooth to research_projects by @ParagEkbote in #9935
  • Update ip_adapter.py by @mkknightr in #8882
  • Modify apply_overlay for inpainting with padding_mask_crop (Inpainting area: "Only Masked") by @clarkkent0618 in #8793
  • Correct pipeline_output.py to the type Mochi by @twobob in #9945
  • Add all AttnProcessor classes in AttentionProcessor type by @Prgckwb in #9909
  • Fixed Nits in Docs and Example Script by @ParagEkbote in #9940
  • Add server example by @thealmightygrant in #9918
  • CogVideoX 1.5 by @zRzRzRzRzRzRzR in #9877
  • Notebooks for Community Scripts-2 by @ParagEkbote in #9952
  • [advanced flux training] bug fix + reduce memory cost as in #9829 by @linoytsaban in #9838
  • [LoRA] feat: save_lora_adapter() by @sayakpaul in #9862
  • Make CogVideoX RoPE implementation consistent by @a-r-r-o-w in #9963
  • [CI] Unpin torch<2.5 in CI by @DN6 in #9961
  • Move IP Adapter Scripts to research project by @ParagEkbote in #9960
  • add skip_layers argument to SD3 transformer model class by @bghira in #9880
  • Fix beta and exponential sigmas + add tests by @hlky in #9954
  • Flux latents fix by @DN6 in #9929
  • [LoRA] enable LoRA for Mochi-1 by @sayakpaul in #9943
  • Improve control net block index for sd3 by @linjiapro in #9758
  • Update handle single blocks on convert_xlabs_flux_lora_to_diffusers by @raulmosa in #9915
  • fix controlnet module refactor by @yiyixuxu in #9968
  • Fix prepare latent image ids and vae sample generators for flux by @a-r-r-o-w in #9981
  • [Tests] skip nan lora tests on PyTorch 2.5.1 CPU. by @sayakpaul in #9975
  • make pipelines tests device-agnostic (part1) by @faaany in #9399
  • ControlNet from_single_file when already converted by @hlky in #9978
  • Flux Fill, Canny, Depth, Redux by @a-r-r-o-w in #9985
  • [SD3 dreambooth lora] smol fix to checkpoint saving by @linoytsaban in #9993
  • [Docs] add: missing pipelines from the spec. by @sayakpaul in #10005
  • Add prompt about wandb in examples/dreambooth/readme. by @SkyCol in #10014
  • [docs] Fix CogVideoX table by @a-r-r-o-w in #10008
  • Notebooks for Community Scripts-3 by @ParagEkbote in #10032
  • Sd35 controlnet by @yiyixuxu in #10020
  • Add beta, exponential and karras sigmas to FlowMatchEulerDiscreteScheduler by @hlky in #10001
  • Update sdxl reference pipeline to latest sdxl pipeline by @dimitribarbot in #9938
  • [Community Pipeline] Add some feature for regional prompting pipeline by @cjkangme in #9874
  • Add sdxl controlnet reference community pipeline by @dimitribarbot in #9893
  • Change image_gen_aux repository URL by @asomoza in #10048
  • make pipelines tests device-agnostic (part2) by @faaany in #9400
  • [Mochi-1] ensuring to compute the fourier features in FP32 in Mochi encoder by @sayakpaul in #10031
  • [Fix] Syntax error by @SahilCarterr in #10068
  • [CI] Add quantization by @sayakpaul in #9832
  • Add sigmas to Flux pipelines by @hlky in #10081
  • Fixed Nits in Evaluation Docs by @ParagEkbote in #10063
  • fix link in the docs by @coding-famer in #10058
  • fix offloading for sd3.5 controlnets by @yiyixuxu in #10072
  • [Single File] Fix SD3.5 single file loading by @DN6 in #10077
  • Fix num_images_per_prompt>1 with Skip Guidance Layers in StableDiffusion3Pipeline by @hlky in #10086
  • [Single File] Pass token when fetching interpreted config by @DN6 in #10082
  • Interpolate fix on cuda for large output tensors by @pcuenca in #10067
  • Convert sigmas to np.array in FlowMatch set_timesteps by @hlky in #10088
  • fix: missing AutoencoderKL lora adapter by @beniz in #9807
  • Let server decide default repo visibility by @Wauplin in #10047
  • Fix some documentation in ./src/diffusers/models/embeddings.py for demo by @DTG2005 in #9579
  • Don't stale close-to-merge by @pcuenca in #10096
  • Add StableDiffusion3PAGImg2Img Pipeline + Fix SD3 Unconditional PAG by @painebenjamin in #9932
  • Notebooks for Community Scripts-4 by @ParagEkbote in #10094
  • Fix Broken Link in Optimization Docs by @ParagEkbote in #10105
  • DPM++ third order fixes by @StAlKeR7779 in #9104
  • update by @aihao2000 in #7067
  • Avoid compiling a progress bar. by @lsb in #10098
  • [Bug fix] "previous_timestep()" in DDPM scheduling compatible with "trailing" and "linspace" options by @AnandK27 in #9384
  • Fix multi-prompt inference by @hlky in #10103
  • Test skip_guidance_layers in SD3 pipeline by @hlky in #10102
  • Use parameters + buffers when deciding upscale_dtype by @universome in #9882
  • [tests] refactor vae tests by @sayakpaul in #9808
  • add torch_xla support in pipeline_stable_audio.py by @ in #10109
  • Fix pipeline_stable_audio formatting by @hlky in #10114
  • [bitsandbytes] allow directly CUDA placements of pipelines loaded with bnb components by @sayakpaul in #9840
  • Fix Broken Links in ReadMe by @ParagEkbote in #10117
  • Add sigmas to pipelines using FlowMatch by @hlky in #10116
  • [Flux Redux] add prompt & multiple image input by @linoytsaban in #10056
  • Fix a bug in the state dict judgment in ip_adapter.py. by @zhangp365 in #10095
  • Fix a bug for SD35 control net training and improve control net block index by @linjiapro in #10065
  • pass attn mask arg for flux by @yiyixuxu in #10122
  • [docs] load_lora_adapter by @stevhliu in #10119
  • Use torch.device instead of current device index for BnB quantizer by @a-r-r-o-w in #10069
  • [Tests] fix condition argument in xfail. by @sayakpaul in #10099
  • [Tests] xfail incompatible SD configs. by @sayakpaul in #10127
  • [FIX] Bug in FluxPosEmbed by @SahilCarterr in #10115
  • [Guide] Quantize your Diffusion Models with bnb by @ariG23498 in #10012
  • Remove duplicate checks for len(generator) != batch_size when generator is a list by @a-r-r-o-w in #10134
  • [community] Load Models from Sources like Civitai into Existing Pipelines by @suzukimain in #9986
  • [DC-AE] Add the official Deep Compression Autoencoder code (32x, 64x, 128x compression ratios) by @lawrence-cj in #9708
  • fixed a dtype bfloat16 bug in torch_utils.py by @zhangp365 in #10125
  • [LoRA] deprecate save_attn_procs() by @sayakpaul in #10126
  • Update ptxla training by @entrpn in #9864
  • support sd3.5 for controlnet example by @DavyMorgan in #9860
  • [Single file] Support revision argument when loading single file config by @a-r-r-o-w in #10168
  • [community pipeline] Add RF-inversion Flux pipeline by @linoytsaban in #9816
  • Improve post-processing performance by @soof-golan in #10170
  • Use torch in get_3d_rotary_pos_embed/_allegro by @hlky in #10161
  • Flux Control LoRA by @a-r-r-o-w in #9999
  • Add PAG Support for Stable Diffusion Inpaint Pipeline by @darshil0805 in #9386
  • [community pipeline rf-inversion] - fix example in doc by @linoytsaban in #10179
  • Fix Nonetype attribute error when loading multiple Flux loras by @jonathanyin12 in #10182
  • Added error when len(gligen_images) is not equal to len(gligen_phrases) in StableDiffusionGLIGENTextImagePipeline by @SahilCarterr in #10176
  • [Single File] Add single file support for AutoencoderDC by @DN6 in #10183
  • Add ControlNetUnion by @hlky in #10131
  • fix min-snr implementation by @ethansmith2000 in #8466
  • Add support for XFormers in SD3 by @CanvaChen in #8583
  • [LoRA] add a test to ensure set_adapters() and attn kwargs outs match by @sayakpaul in #10110
  • [CI] merge peft pr workflow into the main pr workflow. by @sayakpaul in #10042
  • [WIP][Training] Flux Control LoRA training script by @sayakpaul in #10130
  • [core] LTX Video by @a-r-r-o-w in #10021
  • Ci update tpu by @paulinebm in #10197
  • Remove negative_* from SDXL callback by @hlky in #10203
  • refactor StableDiffusionXLControlNetUnion by @hlky in #10200
  • Update StableDiffusion3Img2ImgPipeline: add image size validation by @ZHJ19970917 in #10166
  • Remove mps workaround for fp16 GELU, which is now supported natively by @skotapati in #10133
  • [RF inversion community pipeline] add eta_decay by @linoytsaban in #10199
  • Allow image resolutions multiple of 8 instead of 64 in SVD pipeline by @mlfarinha in #6646
  • Use torch in get_2d_sincos_pos_embed and get_3d_sincos_pos_embed by @hlky in #10156
  • add reshape to fix usememoryefficient_attention in flax by @entrpn in #7918
  • Add offload option in flux-control training by @Adenialzz in #10225
  • Test error raised when loading normal and expanding loras together in Flux by @a-r-r-o-w in #10188
  • [Sana] Add Sana, including SanaPipeline, SanaPAGPipeline, LinearAttentionProcessor, Flow-based DPM-Solver and so on. by @lawrence-cj in #9982
  • [Tests] update always test pipelines list. by @sayakpaul in #10143
  • Update sana.md with minor corrections by @sayakpaul in #10232
  • [docs] minor stuff to ltx video docs. by @sayakpaul in #10229
  • Fix format issue in push_test yml by @DN6 in #10235
  • [core] Hunyuan Video by @a-r-r-o-w in #10136
  • Update pipeline_controlnet.py: add support for PyTorch XLA by @ in #10222
  • [Docs] add rest of the lora loader mixins to the docs. by @sayakpaul in #10230
  • Use t instead of timestep in _apply_perturbed_attention_guidance by @hlky in #10243
  • Add dynamic_shifting to SD3 by @hlky in #10236
  • Fix use_flow_sigmas by @hlky in #10242
  • Fix ControlNetUnion callback_tensor_inputs by @hlky in #10218
  • Use non-human subject in StableDiffusion3ControlNetPipeline example by @hlky in #10214
  • Add enable_vae_tiling to AllegroPipeline, fix example by @hlky in #10212
  • Fix checkpoint in CogView3PlusPipeline example by @hlky in #10211
  • Fix RePaint Scheduler by @hlky in #10185
  • Add ControlNetUnion to AutoPipeline from_pretrained by @hlky in #10219
  • fix downsample bug in MidResTemporalBlock1D by @holmosaint in #10250
  • [core] TorchAO Quantizer by @a-r-r-o-w in #10009
  • [docs] Add missing AttnProcessors by @stevhliu in #10246
  • [chore] add contribution note for lawrence. by @sayakpaul in #10253
  • Fix copied from comment in Mochi lora loader by @a-r-r-o-w in #10255
  • [LoRA] Support LTX Video by @a-r-r-o-w in #10228
  • [docs] Clarify dtypes for Sana by @a-r-r-o-w in #10248
  • [Single File] Add GGUF support by @DN6 in #9964
  • Fix Mochi Quality Issues by @DN6 in #10033
  • [tests] Remove/rename unsupported quantization torchao type by @a-r-r-o-w in #10263
  • [docs] delete_adapters() by @stevhliu in #10245
  • [Community Pipeline] Fix typo that cause error on regional prompting pipeline by @cjkangme in #10251
  • Add set_shift to FlowMatchEulerDiscreteScheduler by @hlky in #10269
  • [LoRA] feat: lora support for SANA. by @sayakpaul in #10234
  • [chore] fix: licensing headers in mochi and ltx by @sayakpaul in #10275
  • Use torch in get_2d_rotary_pos_embed by @hlky in #10155
  • [chore] fix: reamde -> readme by @sayakpaul in #10276
  • Make time_embed_dim of UNet2DModel changeable by @Bichidian in #10262
  • Support pass kwargs to sd3 custom attention processor by @Matrix53 in #9818
  • Flux Control(Depth/Canny) + Inpaint by @affromero in #10192
  • Fix sigma_last with use_flow_sigmas by @hlky in #10267
  • Fix Doc links in GGUF and Quantization overview docs by @DN6 in #10279
  • Make zeroing prompt embeds for Mochi Pipeline configurable by @DN6 in #10284
  • [Single File] Add single file support for Flux Canny, Depth and Fill by @DN6 in #10288
  • [tests] Fix broken cuda, nightly and lora tests on main for CogVideoX by @a-r-r-o-w in #10270
  • Rename Mochi integration test correctly by @a-r-r-o-w in #10220
  • [tests] remove nullop import checks from lora tests by @a-r-r-o-w in #10273
  • [chore] Update README_sana.md to update the default model by @sayakpaul in #10285
  • Hunyuan VAE tiling fixes and transformer docs by @a-r-r-o-w in #10295
  • Add Flux Control to AutoPipeline by @hlky in #10292
  • Update lora_conversion_utils.py by @zhaowendao30 in #9980
  • Check correct model type is passed to from_pretrained by @hlky in #10189
  • [LoRA] Support HunyuanVideo by @SHYuanBest in #10254
  • [Single File] Add single file support for Mochi Transformer by @DN6 in #10268
  • Allow Mochi Transformer to be split across multiple GPUs by @DN6 in #10300
  • Fix local_files_only for checkpoints with shards by @hlky in #10294
  • Fix failing lora tests after HunyuanVideo lora by @a-r-r-o-w in #10307
  • unet's sample_size attribute is to accept tuple(h, w) in StableDiffusionPipeline by @Foundsheep in #10181
  • Enable Gradient Checkpointing for UNet2DModel (New) by @dg845 in #7201
  • [WIP] SD3.5 IP-Adapter Pipeline Integration by @guiyrt in #9987
  • Add support for sharded models when TorchAO quantization is enabled by @a-r-r-o-w in #10256
  • Make tensors in ResNet contiguous for Hunyuan VAE by @a-r-r-o-w in #10309
  • [Single File] Add GGUF support for LTX by @DN6 in #10298
  • [LoRA] feat: support loading regular Flux LoRAs into Flux Control, and Fill by @sayakpaul in #10259
  • [Tests] add integration tests for lora expansion stuff in Flux. by @sayakpaul in #10318
  • Mochi docs by @DN6 in #9934
  • [Docs] Update ltx_video.md to remove generator from `from_pretrained()` by @sayakpaul in #10316
  • docs: fix a mistake in docstring by @Leojc in #10319
  • [BUG FIX] [Stable Audio Pipeline] Resolve torch.Tensor.new_zeros() TypeError in function prepare_latents caused by audio_vae_length by @syntaxticsugr in #10306
  • [docs] Fix quantization links by @stevhliu in #10323
  • [Sana]add 2K related model for Sana by @lawrence-cj in #10322
  • [Docs] Update gguf.md to remove generator from the pipeline from_pretrained by @sayakpaul in #10299
  • Fix push_tests_mps.yml by @hlky in #10326
  • Fix EMAModel test_from_pretrained by @hlky in #10325
  • Support Flux IP Adapter by @hlky in #10261
  • flux controlnet inpaint config bug by @yigitozgenc in #10291
  • Community hosted weights for diffusers format HunyuanVideo weights by @a-r-r-o-w in #10344
  • Fix enable_sequential_cpu_offload in test_kandinsky_combined by @hlky in #10324
  • update get_parameter_dtype by @yiyixuxu in #10342
  • [Single File] Add Single File support for HunYuan video by @DN6 in #10320
  • [Sana bug] bug fix for 2K model config by @lawrence-cj in #10340
  • .from_single_file() - Add missing .shape by @gau-nernst in #10332
  • Bump minimum TorchAO version to 0.7.0 by @a-r-r-o-w in #10293
  • [docs] fix: torchao example. by @sayakpaul in #10278
  • [tests] Refactor TorchAO serialization fast tests by @a-r-r-o-w in #10271
  • [SANA LoRA] sana lora training tests and misc. by @sayakpaul in #10296
  • [Single File] Fix loading by @DN6 in #10349
  • [Tests] QoL improvements to the LoRA test suite by @sayakpaul in #10304
  • Fix FluxIPAdapterTesterMixin by @hlky in #10354
  • Fix failing CogVideoX LoRA fuse test by @a-r-r-o-w in #10352
  • Rename LTX blocks and docs title by @a-r-r-o-w in #10213
  • [LoRA] test fix by @sayakpaul in #10351
  • [Tests] Fix more tests sayak by @sayakpaul in #10359
  • [core] LTX Video 0.9.1 by @a-r-r-o-w in #10330
  • Release: v0.32.0 by @sayakpaul (direct commit on v0.32.0-release)

Significant community contributions

The following contributors have made significant changes to the library over the last release:

  • @faaany
    • fix bug in require_accelerate_version_greater (#9746)
    • make pipelines tests device-agnostic (part1) (#9399)
    • make pipelines tests device-agnostic (part2) (#9400)
  • @linoytsaban
    • [SD3-5 dreambooth lora] update model cards (#9749)
    • [SD 3.5 Dreambooth LoRA] support configurable training block & layers (#9762)
    • [flux dreambooth lora training] make LoRA target modules configurable + small bug fix (#9646)
    • [advanced flux training] bug fix + reduce memory cost as in #9829 (#9838)
    • [SD3 dreambooth lora] smol fix to checkpoint saving (#9993)
    • [Flux Redux] add prompt & multiple image input (#10056)
    • [community pipeline] Add RF-inversion Flux pipeline (#9816)
    • [community pipeline rf-inversion] - fix example in doc (#10179)
    • [RF inversion community pipeline] add eta_decay (#10199)
  • @raulc0399
    • adds the pipeline for pixart alpha controlnet (#8857)
  • @yiyixuxu
    • Revert "[LoRA] fix: lora loading when using with a device_mapped mode… (#9823)
    • fix controlnet module refactor (#9968)
    • Sd35 controlnet (#10020)
    • fix offloading for sd3.5 controlnets (#10072)
    • pass attn mask arg for flux (#10122)
    • update get_parameter_dtype (#10342)
  • @jellyheadandrew
    • Add new community pipeline for 'Adaptive Mask Inpainting', introduced in [ECCV2024] ComA (#9228)
  • @DN6
    • Improve downloads of sharded variants (#9869)
    • [CI] Unpin torch<2.5 in CI (#9961)
    • Flux latents fix (#9929)
    • [Single File] Fix SD3.5 single file loading (#10077)
    • [Single File] Pass token when fetching interpreted config (#10082)
    • [Single File] Add single file support for AutoencoderDC (#10183)
    • Fix format issue in push_test yml (#10235)
    • [Single File] Add GGUF support (#9964)
    • Fix Mochi Quality Issues (#10033)
    • Fix Doc links in GGUF and Quantization overview docs (#10279)
    • Make zeroing prompt embeds for Mochi Pipeline configurable (#10284)
    • [Single File] Add single file support for Flux Canny, Depth and Fill (#10288)
    • [Single File] Add single file support for Mochi Transformer (#10268)
    • Allow Mochi Transformer to be split across multiple GPUs (#10300)
    • [Single File] Add GGUF support for LTX (#10298)
    • Mochi docs (#9934)
    • [Single File] Add Single File support for HunYuan video (#10320)
    • [Single File] Fix loading (#10349)
  • @ParagEkbote
    • Notebooks for Community Scripts Examples (#9905)
    • Move Wuerstchen Dreambooth to research_projects (#9935)
    • Fixed Nits in Docs and Example Script (#9940)
    • Notebooks for Community Scripts-2 (#9952)
    • Move IP Adapter Scripts to research project (#9960)
    • Notebooks for Community Scripts-3 (#10032)
    • Fixed Nits in Evaluation Docs (#10063)
    • Notebooks for Community Scripts-4 (#10094)
    • Fix Broken Link in Optimization Docs (#10105)
    • Fix Broken Links in ReadMe (#10117)
  • @painebenjamin
    • Fix Progress Bar Updates in SD 1.5 PAG Img2Img pipeline (#9925)
    • Add StableDiffusion3PAGImg2Img Pipeline + Fix SD3 Unconditional PAG (#9932)
  • @hlky
    • Fix beta and exponential sigmas + add tests (#9954)
    • ControlNet from_single_file when already converted (#9978)
    • Add beta, exponential and karras sigmas to FlowMatchEulerDiscreteScheduler (#10001)
    • Add sigmas to Flux pipelines (#10081)
    • Fix num_images_per_prompt>1 with Skip Guidance Layers in StableDiffusion3Pipeline (#10086)
    • Convert sigmas to np.array in FlowMatch set_timesteps (#10088)
    • Fix multi-prompt inference (#10103)
    • Test skip_guidance_layers in SD3 pipeline (#10102)
    • Fix pipeline_stable_audio formatting (#10114)
    • Add sigmas to pipelines using FlowMatch (#10116)
    • Use torch in get_3d_rotary_pos_embed/_allegro (#10161)
    • Add ControlNetUnion (#10131)
    • Remove negative_* from SDXL callback (#10203)
    • refactor StableDiffusionXLControlNetUnion (#10200)
    • Use torch in get_2d_sincos_pos_embed and get_3d_sincos_pos_embed (#10156)
    • Use t instead of timestep in _apply_perturbed_attention_guidance (#10243)
    • Add dynamic_shifting to SD3 (#10236)
    • Fix use_flow_sigmas (#10242)
    • Fix ControlNetUnion callback_tensor_inputs (#10218)
    • Use non-human subject in StableDiffusion3ControlNetPipeline example (#10214)
    • Add enable_vae_tiling to AllegroPipeline, fix example (#10212)
    • Fix checkpoint in CogView3PlusPipeline example (#10211)
    • Fix RePaint Scheduler (#10185)
    • Add ControlNetUnion to AutoPipeline from_pretrained (#10219)
    • Add set_shift to FlowMatchEulerDiscreteScheduler (#10269)
    • Use torch in get_2d_rotary_pos_embed (#10155)
    • Fix sigma_last with use_flow_sigmas (#10267)
    • Add Flux Control to AutoPipeline (#10292)
    • Check correct model type is passed to from_pretrained (#10189)
    • Fix local_files_only for checkpoints with shards (#10294)
    • Fix push_tests_mps.yml (#10326)
    • Fix EMAModel test_from_pretrained (#10325)
    • Support Flux IP Adapter (#10261)
    • Fix enable_sequential_cpu_offload in test_kandinsky_combined (#10324)
    • Fix FluxIPAdapterTesterMixin (#10354)
  • @dimitribarbot
    • Update sdxl reference pipeline to latest sdxl pipeline (#9938)
    • Add sdxl controlnet reference community pipeline (#9893)
  • @suzukimain
    • [community] Load Models from Sources like Civitai into Existing Pipelines (#9986)
  • @lawrence-cj
    • [DC-AE] Add the official Deep Compression Autoencoder code (32x, 64x, 128x compression ratios) (#9708)
    • [Sana] Add Sana, including SanaPipeline, SanaPAGPipeline, LinearAttentionProcessor, Flow-based DPM-Solver and so on. (#9982)
    • [Sana]add 2K related model for Sana (#10322)
    • [Sana bug] bug fix for 2K model config (#10340)
  • @darshil0805
    • Add PAG Support for Stable Diffusion Inpaint Pipeline (#9386)
  • @affromero
    • Flux Control(Depth/Canny) + Inpaint (#10192)
  • @SHYuanBest
    • [LoRA] Support HunyuanVideo (#10254)
  • @guiyrt
    • [WIP] SD3.5 IP-Adapter Pipeline Integration (#9987)

- Python
Published by sayakpaul about 1 year ago

diffusers - v0.31.0

v0.31.0: Stable Diffusion 3.5 Large, CogView3, Quantization, Training Scripts, and more

Stable Diffusion 3.5 Large

Stability AI’s latest text-to-image generation model is Stable Diffusion 3.5 Large, the next iteration of Stable Diffusion 3. It comes with two checkpoints, both with 8B parameters:

  • A regular one
  • A timestep-distilled one enabling few-step inference

Make sure to fill out the form on the model page, and then run `huggingface-cli login` before running the code below.

```python
# Make sure to update diffusers first: `pip install -U diffusers`
import torch
from diffusers import StableDiffusion3Pipeline

pipe = StableDiffusion3Pipeline.from_pretrained(
    "stabilityai/stable-diffusion-3.5-large", torch_dtype=torch.bfloat16
).to("cuda")

image = pipe(
    prompt="a photo of a cat holding a sign that says hello world",
    negative_prompt="",
    num_inference_steps=40,
    height=1024,
    width=1024,
    guidance_scale=4.5,
).images[0]

image.save("sd3_hello_world.png")
```

Follow the documentation to know more.

CogView3-Plus

We added a new text-to-image model, CogView3-Plus, from the THUDM team! The model is DiT-based and supports image generation from 512 to 2048px. Thanks to @zRzRzRzRzRzRzR for contributing it!

```python
import torch
from diffusers import CogView3PlusPipeline

pipe = CogView3PlusPipeline.from_pretrained("THUDM/CogView3-Plus-3B", torch_dtype=torch.float16).to("cuda")

# Enable these optimizations to reduce GPU memory usage
pipe.enable_model_cpu_offload()
pipe.vae.enable_slicing()
pipe.vae.enable_tiling()

prompt = "A vibrant cherry red sports car sits proudly under the gleaming sun, its polished exterior smooth and flawless, casting a mirror-like reflection. The car features a low, aerodynamic body, angular headlights that gaze forward like predatory eyes, and a set of black, high-gloss racing rims that contrast starkly with the red. A subtle hint of chrome embellishes the grille and exhaust, while the tinted windows suggest a luxurious and private interior. The scene conveys a sense of speed and elegance, the car appearing as if it's about to burst into a sprint along a coastal road, with the ocean's azure waves crashing in the background."

image = pipe(
    prompt=prompt,
    guidance_scale=7.0,
    num_images_per_prompt=1,
    num_inference_steps=50,
    width=1024,
    height=1024,
).images[0]

image.save("cogview3.png")
```

Refer to the documentation to know more.

Quantization

We have landed native quantization support in Diffusers, starting with bitsandbytes as its first quantization backend. With this, we hope to see large diffusion models becoming much more accessible to run on consumer hardware.

The example below shows how to run Flux.1 Dev with the NF4 data-type. Make sure you install the libraries:

```bash
pip install -Uq git+https://github.com/huggingface/transformers@main
pip install -Uq bitsandbytes
pip install -Uq diffusers
```

```python
import torch
from diffusers import BitsAndBytesConfig, FluxTransformer2DModel

ckpt_id = "black-forest-labs/FLUX.1-dev"
nf4_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model_nf4 = FluxTransformer2DModel.from_pretrained(
    ckpt_id,
    subfolder="transformer",
    quantization_config=nf4_config,
    torch_dtype=torch.bfloat16,
)
```

Then, we use `model_nf4` to instantiate the `FluxPipeline`:

```python
from diffusers import FluxPipeline

pipeline = FluxPipeline.from_pretrained(
    ckpt_id, transformer=model_nf4, torch_dtype=torch.bfloat16
)
pipeline.enable_model_cpu_offload()

prompt = "A whimsical and creative image depicting a hybrid creature that is a mix of a waffle and a hippopotamus, basking in a river of melted butter amidst a breakfast-themed landscape. It features the distinctive, bulky body shape of a hippo. However, instead of the usual grey skin, the creature's body resembles a golden-brown, crispy waffle fresh off the griddle. The skin is textured with the familiar grid pattern of a waffle, each square filled with a glistening sheen of syrup. The environment combines the natural habitat of a hippo with elements of a breakfast table setting, a river of warm, melted butter, with oversized utensils or plates peeking out from the lush, pancake-like foliage in the background, a towering pepper mill standing in for a tree. As the sun rises in this fantastical world, it casts a warm, buttery glow over the scene. The creature, content in its butter river, lets out a yawn. Nearby, a flock of birds take flight"

image = pipeline(
    prompt=prompt,
    num_inference_steps=50,
    guidance_scale=4.5,
    max_sequence_length=512,
).images[0]
image.save("whimsical.png")
```

Follow the documentation here to know more. Additionally, check out this Colab Notebook that runs Flux.1 Dev in an end-to-end manner with NF4 quantization.

Training scripts

We have a fresh bucket of training scripts with this release:

Video model fine-tuning can be quite expensive. So, we have worked on a repository, cogvideox-factory, which provides memory-optimized scripts to fine-tune the Cog family of models.

Misc

  • We now support the loading of different kinds of Flux LoRAs, including Kohya, TheLastBen, and Xlabs.
  • Loading of Xlabs Flux ControlNets is also now supported. Thanks to @Anghellia for contributing it!

All commits

  • Feature flux controlnet img2img and inpaint pipeline by @ighoshsubho in #9408
  • Remove CogVideoX mentions from single file docs; Test updates by @a-r-r-o-w in #9444
  • set maxshardsize to None for pipeline save_pretrained by @a-r-r-o-w in #9447
  • adapt masked im2im pipeline for SDXL by @noskill in #7790
  • [Flux] add lora integration tests. by @sayakpaul in #9353
  • [training] CogVideoX Lora by @a-r-r-o-w in #9302
  • Several fixes to Flux ControlNet pipelines by @vladmandic in #9472
  • [refactor] LoRA tests by @a-r-r-o-w in #9481
  • [CI] fix nightly model tests by @sayakpaul in #9483
  • [Cog] some minor fixes and nits by @sayakpaul in #9466
  • [Tests] Reduce the model size in the lumina test by @saqlain2204 in #8985
  • Fix the bug of sd3 controlnet training when using gradient checkpointing. by @pibbo88 in #9498
  • [Schedulers] Add exponential sigmas / exponential noise schedule by @hlky in #9499
  • Allow DDPMPipeline half precision by @sbinnee in #9222
  • Add Noise Schedule/Schedule Type to Schedulers Overview documentation by @hlky in #9504
  • fix bugs for sd3 controlnet training by @xduzhangjiayu in #9489
  • [Doc] Fix path and and also import imageio by @LukeLIN-web in #9506
  • [CI] allow faster downloads from the Hub in CI. by @sayakpaul in #9478
  • a few fix for SingleFile tests by @yiyixuxu in #9522
  • Add exponential sigmas to other schedulers and update docs by @hlky in #9518
  • [Community Pipeline] Batched implementation of Flux with CFG by @sayakpaul in #9513
  • Update community_projects.md by @lee101 in #9266
  • [docs] Model sharding by @stevhliu in #9521
  • update get_parameter_dtype by @yiyixuxu in #9526
  • [Doc] Improved level of clarity for latents_to_rgb. by @LagPixelLOL in #9529
  • [Schedulers] Add beta sigmas / beta noise schedule by @hlky in #9509
  • flux controlnet fix (control_modes batch & others) by @yiyixuxu in #9507
  • [Tests] Fix ChatGLMTokenizer by @asomoza in #9536
  • [bug] Precedence of operations in VAE should be slicing -> tiling by @a-r-r-o-w in #9342
  • [LoRA] make set_adapters() method more robust. by @sayakpaul in #9535
  • [examples] add train flux-controlnet scripts in example. by @PromeAIpro in #9324
  • [Tests] [LoRA] clean up the serialization stuff. by @sayakpaul in #9512
  • [Core] fix variant-identification. by @sayakpaul in #9253
  • [refactor] remove conv_cache from CogVideoX VAE by @a-r-r-o-w in #9524
  • [train_instruct_pix2pix.py] Fix the LR schedulers when num_train_epochs is passed in a distributed training env by @AnandK27 in #9316
  • [chore] fix: retain memory utility. by @sayakpaul in #9543
  • [LoRA] support Kohya Flux LoRAs that have text encoders as well by @sayakpaul in #9542
  • Add beta sigmas to other schedulers and update docs by @hlky in #9538
  • Add PAG support to StableDiffusionControlNetPAGInpaintPipeline by @juancopi81 in #8875
  • Support bfloat16 for Upsample2D by @darhsu in #9480
  • fix cogvideox autoencoder decode by @Xiang-cd in #9569
  • [sd3] make sure height and size are divisible by 16 by @yiyixuxu in #9573
  • fix xlabs FLUX lora conversion typo by @Clement-Lelievre in #9581
  • [Chore] add a note on the versions in Flux LoRA integration tests by @sayakpaul in #9598
  • fix vae dtype when accelerate config using --mixed_precision="fp16" by @xduzhangjiayu in #9601
  • refac: docstrings in import_utils.py by @yijun-lee in #9583
  • Fix for use_safetensors parameters, allow use of parameter on loading submodels by @elismasilva in #9576
  • Update distributed_inference.md to include `transformer.device_map` by @sayakpaul in #9553
  • fix: CogVideox train dataset preprocess_data crop video by @glide-the in #9574
  • [LoRA] Handle DoRA better by @sayakpaul in #9547
  • Fixed noise_pred_text referenced before assignment. by @LagPixelLOL in #9537
  • Fix the bug that joint_attention_kwargs is not passed to the FLUX's transformer attention processors by @HorizonWind2004 in #9517
  • refac/pipeline_output by @yijun-lee in #9582
  • [LoRA] allow loras to be loaded with low_cpu_mem_usage. by @sayakpaul in #9510
  • add PAG support for SD Img2Img by @SahilCarterr in #9463
  • make controlnet support interrupt by @pureexe in #9620
  • [LoRA] fix dora test to catch the warning properly. by @sayakpaul in #9627
  • flux controlnet control_guidance_start and control_guidance_end implement by @ighoshsubho in #9571
  • fix IsADirectoryError when running the training code for sd3_dreambooth_lora_16gb.ipynb by @alaister123 in #9634
  • Add Differential Diffusion to Kolors by @saqlain2204 in #9423
  • FluxMultiControlNetModel by @hlky in #9647
  • [CI] replace ubuntu version to 22.04. by @sayakpaul in #9656
  • [docs] Fix xDiT doc image damage by @Eigensystem in #9655
  • [Tests] increase transformers version in test_low_cpu_mem_usage_with_loading by @sayakpaul in #9662
  • Flux - soft inpainting via differential diffusion by @ryanlyn in #9268
  • CogView3Plus DiT by @zRzRzRzRzRzRzR in #9570
  • Improve the performance and suitable for NPU computing by @leisuzz in #9642
  • [Community Pipeline] Add 🪆Matryoshka Diffusion Models by @tolgacangoz in #9157
  • Added Lora Support to SD3 Img2Img Pipeline by @SahilCarterr in #9659
  • Add pred_original_sample to if not return_dict path by @hlky in #9649
  • Convert list/tuple of SD3ControlNetModel to SD3MultiControlNetModel by @hlky in #9652
  • Convert list/tuple of HunyuanDiT2DControlNetModel to HunyuanDiT2DMultiControlNetModel by @hlky in #9651
  • Refactor SchedulerOutput and add pred_original_sample in DPMSolverSDE, Heun, KDPM2Ancestral and KDPM2 by @hlky in #9650
  • Slight performance improvement to Euler, EDMEuler, FlowMatchHeun, KDPM2Ancestral by @hlky in #9616
  • [Fix] when run load pretrained with local_files_only, local variable 'cached_folder' referenced before assignment by @RobinXL in #9376
  • [Chore] fix import of EntryNotFoundError. by @sayakpaul in #9676
  • Dreambooth lora flux bug 3dtensor to 2dtensor by @0x-74 in #9653
  • refactor image_processor.py file by @charchit7 in #9608
  • [doc] Fix some docstrings in src/diffusers/training_utils.py by @mreraser in #9606
  • [docs] refactoring docstrings in community/hd_painter.py by @Jwaminju in #9593
  • [docs] refactoring docstrings in models/embeddings_flax.py by @Jwaminju in #9592
  • Fix some documentation in ./src/diffusers/models/adapter.py by @ahnjj in #9591
  • [training] CogVideoX-I2V LoRA by @a-r-r-o-w in #9482
  • [authored by @Anghellia) Add support of Xlabs Controlnets #9638 by @yiyixuxu in #9687
  • Docs: CogVideoX by @glide-the in #9578
  • Resolves [BUG] 'GatheredParameters' object is not callable by @charchit7 in #9614
  • [LoRA] log a warning when there are missing keys in the LoRA loading. by @sayakpaul in #9622
  • [SD3 dreambooth-lora training] small updates + bug fixes by @linoytsaban in #9682
  • [peft] simple update when unscale by @sweetcocoa in #9689
  • [pipeline] CogVideoX-Fun Control by @a-r-r-o-w in #9671
  • [core] improve VAE encode/decode framewise batching by @a-r-r-o-w in #9684
  • [tests] fix name and unskip CogI2V integration test by @a-r-r-o-w in #9683
  • [Flux] Add advanced training script + support textual inversion inference by @linoytsaban in #9434
  • [refactor] DiffusionPipeline.download by @a-r-r-o-w in #9557
  • [advanced flux lora script] minor updates to readme by @linoytsaban in #9705
  • Fix bug in Textual Inversion Unloading by @bonlime in #9304
  • Add prompt scheduling callback to community scripts by @hlky in #9718
  • [CI] pin max torch version to fix CI errors by @a-r-r-o-w in #9709
  • [Docker] pin torch versions in the dockerfiles. by @sayakpaul in #9721
  • make deps_table_update to fix CI tests by @a-r-r-o-w in #9720
  • [Quantization] Add quantization support for bitsandbytes by @sayakpaul in #9213
  • Fix typo in cogvideo pipeline by @lichenyu20 in #9722
  • [Docs] docs to xlabs controlnets. by @sayakpaul in #9688
  • [docs] add docstrings in pipline_stable_diffusion.py by @jeongiin in #9590
  • minor doc/test update by @yiyixuxu in #9734
  • [bugfix] reduce float value error when adding noise by @gameofdimension in #9004
  • fix singlestep dpm tests by @yiyixuxu in #9716
  • Fix schedule_shifted_power usage in 🪆Matryoshka Diffusion Models by @tolgacangoz in #9723
  • Update sd3 controlnet example by @DavyMorgan in #9735
  • [Fix] Using sharded checkpoints with gated repositories by @asomoza in #9737
  • [bitsandbbytes] follow-ups by @sayakpaul in #9730
  • Fix typos by @DN6 in #9739
  • is_safetensors_compatible fix by @DN6 in #9741
  • Release: v0.31.0 by @sayakpaul (direct commit on v0.31.0-release)

Significant community contributions

The following contributors have made significant changes to the library over the last release:

  • @ighoshsubho
    • Feature flux controlnet img2img and inpaint pipeline (#9408)
    • flux controlnet control_guidance_start and control_guidance_end implement (#9571)
  • @noskill
    • adapt masked im2im pipeline for SDXL (#7790)
  • @saqlain2204
    • [Tests] Reduce the model size in the lumina test (#8985)
    • Add Differential Diffusion to Kolors (#9423)
  • @hlky
    • [Schedulers] Add exponential sigmas / exponential noise schedule (#9499)
    • Add Noise Schedule/Schedule Type to Schedulers Overview documentation (#9504)
    • Add exponential sigmas to other schedulers and update docs (#9518)
    • [Schedulers] Add beta sigmas / beta noise schedule (#9509)
    • Add beta sigmas to other schedulers and update docs (#9538)
    • FluxMultiControlNetModel (#9647)
    • Add pred_original_sample to if not return_dict path (#9649)
    • Convert list/tuple of SD3ControlNetModel to SD3MultiControlNetModel (#9652)
    • Convert list/tuple of HunyuanDiT2DControlNetModel to HunyuanDiT2DMultiControlNetModel (#9651)
    • Refactor SchedulerOutput and add pred_original_sample in DPMSolverSDE, Heun, KDPM2Ancestral and KDPM2 (#9650)
    • Slight performance improvement to Euler, EDMEuler, FlowMatchHeun, KDPM2Ancestral (#9616)
    • Add prompt scheduling callback to community scripts (#9718)
  • @yiyixuxu
    • a few fix for SingleFile tests (#9522)
    • update get_parameter_dtype (#9526)
    • flux controlnet fix (control_modes batch & others) (#9507)
    • [sd3] make sure height and size are divisible by 16 (#9573)
    • [authored by @Anghellia) Add support of Xlabs Controlnets #9638 (#9687)
    • minor doc/test update (#9734)
    • fix singlestep dpm tests (#9716)
  • @PromeAIpro
    • [examples] add train flux-controlnet scripts in example. (#9324)
  • @juancopi81
    • Add PAG support to StableDiffusionControlNetPAGInpaintPipeline (#8875)
  • @glide-the
    • fix: CogVideox train dataset preprocess_data crop video (#9574)
    • Docs: CogVideoX (#9578)
  • @SahilCarterr
    • add PAG support for SD Img2Img (#9463)
    • Added Lora Support to SD3 Img2Img Pipeline (#9659)
  • @ryanlyn
    • Flux - soft inpainting via differential diffusion (#9268)
  • @zRzRzRzRzRzRzR
    • CogView3Plus DiT (#9570)
  • @tolgacangoz
    • [Community Pipeline] Add 🪆Matryoshka Diffusion Models (#9157)
    • Fix schedule_shifted_power usage in 🪆Matryoshka Diffusion Models (#9723)
  • @linoytsaban
    • [SD3 dreambooth-lora training] small updates + bug fixes (#9682)
    • [Flux] Add advanced training script + support textual inversion inference (#9434)
    • [advanced flux lora script] minor updates to readme (#9705)

- Python
Published by sayakpaul over 1 year ago

diffusers - v0.30.3: CogVideoX Image-to-Video and Video-to-Video

This patch release adds Diffusers support for the upcoming CogVideoX-5B-I2V release (an Image-to-Video generation model)! The model weights will be available by end of the week on the HF Hub at THUDM/CogVideoX-5b-I2V (Link). Stay tuned for the release!

This release features two new pipelines:

  • CogVideoXImageToVideoPipeline
  • CogVideoXVideoToVideoPipeline

Additionally, we now have support for tiled encoding in the CogVideoX VAE. This can be enabled by calling the vae.enable_tiling() method, and it is used in the new Video-to-Video pipeline to encode sample videos to latents in a memory-efficient manner.

CogVideoXImageToVideoPipeline

The code below demonstrates how to use the new image-to-video pipeline:

```python
import torch
from diffusers import CogVideoXImageToVideoPipeline
from diffusers.utils import export_to_video, load_image

pipe = CogVideoXImageToVideoPipeline.from_pretrained("THUDM/CogVideoX-5b-I2V", torch_dtype=torch.bfloat16)
pipe.to("cuda")

# Optionally, enable memory optimizations.
# If enabling CPU offloading, remember to remove `pipe.to("cuda")` above.
pipe.enable_model_cpu_offload()
pipe.vae.enable_tiling()

prompt = "An astronaut hatching from an egg, on the surface of the moon, the darkness and depth of space realised in the background. High quality, ultrarealistic detail and breath-taking movie-like camera shot."
image = load_image(
    "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/astronaut.jpg"
)
video = pipe(image, prompt, use_dynamic_cfg=True)
export_to_video(video.frames[0], "output.mp4", fps=8)
```

CogVideoXVideoToVideoPipeline

The code below demonstrates how to use the new video-to-video pipeline:

```python
import torch
from diffusers import CogVideoXDPMScheduler, CogVideoXVideoToVideoPipeline
from diffusers.utils import export_to_video, load_video

# Models: "THUDM/CogVideoX-2b" or "THUDM/CogVideoX-5b"
pipe = CogVideoXVideoToVideoPipeline.from_pretrained("THUDM/CogVideoX-5b-trial", torch_dtype=torch.bfloat16)
pipe.scheduler = CogVideoXDPMScheduler.from_config(pipe.scheduler.config)
pipe.to("cuda")

input_video = load_video(
    "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/hiker.mp4"
)
prompt = (
    "An astronaut stands triumphantly at the peak of a towering mountain. Panorama of rugged peaks and "
    "valleys. Very futuristic vibe and animated aesthetic. Highlights of purple and golden colors in "
    "the scene. The sky looks like an animated/cartoonish dream of galaxies, nebulae, stars, planets, "
    "moons, but the remainder of the scene is mostly realistic."
)

video = pipe(
    video=input_video, prompt=prompt, strength=0.8, guidance_scale=6, num_inference_steps=50
).frames[0]
export_to_video(video, "output.mp4", fps=8)
```

Shoutout to @tin2tin for the awesome demonstration!

Refer to our documentation to learn more about it.

All commits

  • [core] Support VideoToVideo with CogVideoX by @a-r-r-o-w in #9333
  • [core] CogVideoX memory optimizations in VAE encode by @a-r-r-o-w in #9340
  • [CI] Quick fix for Cog Video Test by @DN6 in #9373
  • [refactor] move positional embeddings to patch embed layer for CogVideoX by @a-r-r-o-w in #9263
  • CogVideoX-5b-I2V support by @zRzRzRzRzRzRzR in #9418

- Python
Published by a-r-r-o-w over 1 year ago

diffusers - v0.30.2: Update from single file default repository

All commits

  • update runway repo for single_file by @yiyixuxu in #9323
  • Fix Flux CLIP prompt embeds repeat for num_images_per_prompt > 1 by @DN6 in #9280
  • [IP Adapter] Fix cache_dir and local_files_only for image encoder by @asomoza in #9272

- Python
Published by asomoza over 1 year ago

diffusers - V0.30.1: CogVideoX-5B & Bug fixes

CogVideoX-5B

This patch release adds diffusers support for the upcoming CogVideoX-5B release! The model weights will be available next week on the Hugging Face Hub at THUDM/CogVideoX-5b. Stay tuned for the release!

Additionally, we have implemented VAE tiling, which reduces the memory requirements of the CogVideoX models. With this update, the total memory requirement is now 12GB for CogVideoX-2B and 21GB for CogVideoX-5B (with CPU offloading). To enable this feature, simply call enable_tiling() on the VAE.

The code below shows how to generate a video with CogVideoX-5B:

```python
import torch
from diffusers import CogVideoXPipeline
from diffusers.utils import export_to_video

prompt = "Tracking shot, late afternoon light casting long shadows, a cyclist in athletic gear pedaling down a scenic mountain road, winding path with trees and a lake in the background, invigorating and adventurous atmosphere."

pipe = CogVideoXPipeline.from_pretrained(
    "THUDM/CogVideoX-5b", torch_dtype=torch.bfloat16
)

pipe.enable_model_cpu_offload()
pipe.vae.enable_tiling()

video = pipe(
    prompt=prompt,
    num_videos_per_prompt=1,
    num_inference_steps=50,
    num_frames=49,
    guidance_scale=6,
).frames[0]

export_to_video(video, "output.mp4", fps=8)
```

https://github.com/user-attachments/assets/c2d4f7e8-ef86-4da6-8085-cb9f83f47f34

Refer to our documentation to learn more about it.

All commits

  • Update Video Loading/Export to use imageio by @DN6 in #9094
  • [refactor] CogVideoX followups + tiled decoding support by @a-r-r-o-w in #9150
  • Add Learned PE selection for Auraflow by @cloneofsimo in #9182
  • [Single File] Fix configuring scheduler via legacy kwargs by @DN6 in #9229
  • [Flux LoRA] support parsing alpha from a flux lora state dict. by @sayakpaul in #9236
  • [tests] fix broken xformers tests by @a-r-r-o-w in #9206
  • Cogvideox-5B Model adapter change by @zRzRzRzRzRzRzR in #9203
  • [Single File] Support loading Comfy UI Flux checkpoints by @DN6 in #9243

- Python
Published by yiyixuxu over 1 year ago

diffusers - v0.30.0: New Pipelines (Flux, Stable Audio, Kolors, CogVideoX, Latte, and more), New Methods (FreeNoise, SparseCtrl), and New Refactors

New pipelines


Image taken from Lumina’s GitHub.

This release features many new pipelines. Below, we provide a list:

Audio pipelines 🎼

Video pipelines 📹

  • Latte (thanks to @maxin-cn for the contribution through #8404)
  • CogVideoX (thanks to @zRzRzRzRzRzRzR for the contribution through #9082)

Image pipelines 🎇

Be sure to check out the respective docs to know more about these pipelines. Some additional pointers are below for curious minds:

  • Lumina introduces a new DiT architecture that is multilingual in nature.
  • Kolors is inspired by SDXL and is also multilingual in nature.
  • Flux introduces the largest (more than 12B parameters!) open-sourced DiT variant available to date. For efficient DreamBooth + LoRA training, we recommend @bghira’s guide here.
  • We have worked on a guide that shows how to quantize these large pipelines for memory efficiency with optimum.quanto. Check it out here.
  • CogVideoX introduces a novel and truly 3D VAE into Diffusers.

Perturbed Attention Guidance (PAG)

| Without PAG | With PAG |
|-------------|----------|

We already had community pipelines for PAG, but given its usefulness, we decided to make it a first-class citizen of the library. We have a central usage guide for PAG here, which should be the entry point for a user interested in understanding and using PAG for their use cases. We currently support the following pipelines with PAG:

  • StableDiffusionPAGPipeline
  • StableDiffusion3PAGPipeline
  • StableDiffusionControlNetPAGPipeline
  • StableDiffusionXLPAGPipeline
  • StableDiffusionXLPAGImg2ImgPipeline
  • StableDiffusionXLPAGInpaintPipeline
  • StableDiffusionXLControlNetPAGPipeline
  • PixArtSigmaPAGPipeline
  • HunyuanDiTPAGPipeline
  • AnimateDiffPAGPipeline
  • KolorsPAGPipeline

If you’re interested in helping us extend our PAG support for other pipelines, please check out this thread. Special thanks to Ahn Donghoon (@sunovivid), the author of PAG, for helping us with the integration and adding PAG support to SD3.

AnimateDiff with SparseCtrl

SparseCtrl introduces controllability into text-to-video diffusion models by leveraging signals such as line/edge sketches, depth maps, and RGB images. Inspired by ControlNet, it incorporates an additional condition encoder to process these signals within the AnimateDiff framework. It can be applied to a diverse set of applications, such as interpolation or video prediction (filling in the gaps between a sequence of images for animation), personalized image animation, sketch-to-video, depth-to-video, and more. It was introduced in SparseCtrl: Adding Sparse Controls to Text-to-Video Diffusion Models.

There are two SparseCtrl-specific checkpoints and a Motion LoRA made available by the authors.

Scribble Interpolation Example:


```python
import torch

from diffusers import AnimateDiffSparseControlNetPipeline, AutoencoderKL, MotionAdapter, SparseControlNetModel
from diffusers.schedulers import DPMSolverMultistepScheduler
from diffusers.utils import export_to_gif, load_image

device = "cuda"
motion_adapter = MotionAdapter.from_pretrained("guoyww/animatediff-motion-adapter-v1-5-3", torch_dtype=torch.float16).to(device)
controlnet = SparseControlNetModel.from_pretrained("guoyww/animatediff-sparsectrl-scribble", torch_dtype=torch.float16).to(device)
vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-mse", torch_dtype=torch.float16).to(device)
pipe = AnimateDiffSparseControlNetPipeline.from_pretrained(
    "SG161222/Realistic_Vision_V5.1_noVAE",
    motion_adapter=motion_adapter,
    controlnet=controlnet,
    vae=vae,
    torch_dtype=torch.float16,
).to(device)
pipe.scheduler = DPMSolverMultistepScheduler.from_config(
    pipe.scheduler.config, beta_schedule="linear", algorithm_type="dpmsolver++", use_karras_sigmas=True
)
pipe.load_lora_weights("guoyww/animatediff-motion-lora-v1-5-3", adapter_name="motion_lora")
pipe.fuse_lora(lora_scale=1.0)

prompt = "an aerial view of a cyberpunk city, night time, neon lights, masterpiece, high quality"
negative_prompt = "low quality, worst quality, letterboxed"

image_files = [
    "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/animatediff-scribble-1.png",
    "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/animatediff-scribble-2.png",
    "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/animatediff-scribble-3.png",
]
condition_frame_indices = [0, 8, 15]
conditioning_frames = [load_image(img_file) for img_file in image_files]

video = pipe(
    prompt=prompt,
    negative_prompt=negative_prompt,
    num_inference_steps=25,
    conditioning_frames=conditioning_frames,
    controlnet_conditioning_scale=1.0,
    controlnet_frame_indices=condition_frame_indices,
    generator=torch.Generator().manual_seed(1337),
).frames[0]
export_to_gif(video, "output.gif")
```

📜 Check out the docs here.

FreeNoise for AnimateDiff

FreeNoise is a training-free method that allows extending the generative capabilities of pretrained video diffusion models beyond their existing context/frame limits.

Instead of initializing noise for all frames at once, FreeNoise reschedules a sequence of noises for long-range correlation and performs temporal attention over them using a window-based function. We have added FreeNoise to the AnimateDiff family of models in Diffusers, allowing them to generate videos beyond their default 32-frame limit.
 

```python
import torch
from diffusers import AnimateDiffPipeline, EulerAncestralDiscreteScheduler, MotionAdapter
from diffusers.utils import export_to_gif

adapter = MotionAdapter.from_pretrained("guoyww/animatediff-motion-adapter-v1-5-2", torch_dtype=torch.float16)
pipe = AnimateDiffPipeline.from_pretrained(
    "SG161222/Realistic_Vision_V6.0_B1_noVAE", motion_adapter=adapter, torch_dtype=torch.float16
)
pipe.scheduler = EulerAncestralDiscreteScheduler(
    beta_schedule="linear",
    beta_start=0.00085,
    beta_end=0.012,
)

pipe.enable_free_noise()
pipe.vae.enable_slicing()

pipe.enable_model_cpu_offload()
frames = pipe(
    "An astronaut riding a horse on Mars.",
    num_frames=64,
    num_inference_steps=20,
    guidance_scale=7.0,
    decode_chunk_size=2,
).frames[0]

export_to_gif(frames, "freenoise-64.gif")
```

LoRA refactor

We have significantly refactored the loader classes associated with LoRA. Going forward, this will help in adding LoRA support for new pipelines and models. We now have a LoraBaseMixin class which is subclassed by the different pipeline-level LoRA loading classes such as StableDiffusionXLLoraLoaderMixin. This document provides an overview of the available classes.

Additionally, we have increased the coverage of methods within the PeftAdapterMixin class. This refactoring allows all the supported models to share common LoRA functionalities such as set_adapter(), add_adapter(), and so on.

To learn more details, please follow this PR. If you see any LoRA-related issues stemming from these refactors, please open an issue.

🚨 Fixing attention projection fusion

We discovered that the implementation of fuse_qkv_projections() was broken. This was fixed in this PR. Additionally, this PR added fusion support to AuraFlow and PixArt Sigma. An explanation of where this kind of fusion might be useful is available here.

All commits

  • [Release notification] add some info when there is an error. by @sayakpaul in #8718
  • Modify FlowMatch Scale Noise by @asomoza in #8678
  • Fix json WindowsPath crash by @vincedovy in #8662
  • Motion Model / Adapter versatility by @Arlaz in #8301
  • [Chore] perform better deprecation for vqmodeloutput by @sayakpaul in #8719
  • [Advanced dreambooth lora] adjustments to align with canonical script by @linoytsaban in #8406
  • [Tests] Fix precision related issues in slow pipeline tests by @DN6 in #8720
  • fix: ValueError when using FromOriginalModelMixin in subclasses #8440 by @fkcptlst in #8454
  • [Community pipeline] SD3 Differential Diffusion Img2Img Pipeline by @asomoza in #8679
  • Benchmarking workflow fix by @sayakpaul in #8389
  • add PAG support for SD architecture by @shauray8 in #8725
  • shift cache in benchmarking. by @sayakpaul in #8740
  • [train_controlnet_sdxl.py] Fix the LR schedulers when num_train_epochs is passed in a distributed training env by @Bhavay-2001 in #8476
  • fix the LR schedulers for dreambooth_lora by @WenheLI in #8510
  • [Tencent Hunyuan Team] Add HunyuanDiT-v1.2 Support by @gnobitab in #8747
  • Always raise from previous error by @Wauplin in #8751
  • [doc] add a tip about using SDXL refiner with hunyuan-dit and pixart by @yiyixuxu in #8735
  • Remove legacy single file model loading mixins by @DN6 in #8754
  • Allow from_transformer in SD3ControlNetModel by @haofanwang in #8749
  • [SD3 LoRA Training] Fix errors when not training text encoders by @asomoza in #8743
  • [Tests] add test suite for SD3 DreamBooth by @sayakpaul in #8650
  • [hunyuan-dit] refactor HunyuanCombinedTimestepTextSizeStyleEmbedding by @yiyixuxu in #8761
  • Enforce ordering when running Pipeline slow tests by @DN6 in #8763
  • Fix warning in UNetMotionModel by @DN6 in #8756
  • Fix indent in dreambooth lora advanced SD 15 script by @DN6 in #8753
  • Fix mistake in Single File Docs page by @DN6 in #8765
  • Reflect few contributions on philosophy.md that were not reflected on #8294 by @mreraser in #8690
  • correct attention_head_dim for JointTransformerBlock by @yiyixuxu in #8608
  • [LoRA] introduce LoraBaseMixin to promote reusability. by @sayakpaul in #8670
  • Revert "[LoRA] introduce LoraBaseMixin to promote reusability." by @sayakpaul in #8773
  • Allow SD3 DreamBooth LoRA fine-tuning on a free-tier Colab by @sayakpaul in #8762
  • Update README.md to include Colab link by @sayakpaul in #8775
  • [Chore] add dummy lora attention processors to prevent failures in other libs by @sayakpaul in #8777
  • [advanced dreambooth lora] add clip_skip arg by @linoytsaban in #8715
  • [Tencent Hunyuan Team] Add checkpoint conversion scripts and changed controlnet by @gnobitab in #8783
  • Fix minor bug in SD3 img2img test by @a-r-r-o-w in #8779
  • [Tests] fix sharding tests by @sayakpaul in #8764
  • Add vae_roundtrip.py example by @thomaseding in #7104
  • [Single File] Allow loading T5 encoder in mixed precision by @DN6 in #8778
  • Fix saving text encoder weights and kohya weights in advanced dreambooth lora script by @DN6 in #8766
  • Improve model card for push_to_hub trainers by @apolinario in #8697
  • fix loading sharded checkpoints from subfolder by @yiyixuxu in #8798
  • [Alpha-VLLM Team] Add Lumina-T2X to diffusers by @PommesPeter in #8652
  • Fix static typing and doc typos by @zhuoqun-chen in #8807
  • Remove unnecessary lines by @tolgacangoz in #8569
  • Add pipeline_stable_diffusion_3_inpaint.py for SD3 Inference by @IrohXu in #8709
  • [Tests] fix more sharding tests by @sayakpaul in #8797
  • Reformat docstring for get_timestep_embedding by @alanhdu in #8811
  • Latte: Latent Diffusion Transformer for Video Generation by @maxin-cn in #8404
  • [Core] Add Kolors by @asomoza in #8812
  • [Core] Add AuraFlow by @sayakpaul in #8796
  • Add VAE tiling option for SD3 by @DN6 in #8791
  • Add single file loading support for AnimateDiff by @DN6 in #8819
  • [Docs] add AuraFlow docs by @sayakpaul in #8851
  • [Community Pipelines] Accelerate inference of AnimateDiff by IPEX on CPU by @ustcuna in #8643
  • add PAG support sd15 controlnet by @tuanh123789 in #8820
  • [tests] fix typo in pag tests by @a-r-r-o-w in #8845
  • [Docker] include python3.10 dev and solve header missing problem by @sayakpaul in #8865
  • [Cont'd] Add the SDE variant of ~~DPM-Solver~~ and DPM-Solver++ to DPM Single Step by @tolgacangoz in #8269
  • modify pocs. by @sayakpaul in #8867
  • [Core] fix: shard loading and saving when variant is provided. by @sayakpaul in #8869
  • [Chore] allow auraflow latest to be torch compile compatible. by @sayakpaul in #8859
  • Add AuraFlowPipeline and KolorsPipeline to auto map by @Beinsezii in #8849
  • Fix multi-gpu case for train_cm_ct_unconditional.py by @tolgacangoz in #8653
  • [docs] pipeline docs for latte by @a-r-r-o-w in #8844
  • [Chore] add disable forward chunking to SD3 transformer. by @sayakpaul in #8838
  • [Core] remove resume_download from Hub related stuff by @sayakpaul in #8648
  • Add option to SSH into CPU runner. by @DN6 in #8884
  • SSH into cpu runner fix by @DN6 in #8888
  • SSH into cpu runner additional fix by @DN6 in #8893
  • [SDXL] Fix uncaught error with image to image by @asomoza in #8856
  • fix loop bug in SlicedAttnProcessor by @shinetzh in #8836
  • [fix code annotation] Adjust the dimensions of the rotary positional embedding. by @wangqixun in #8890
  • allow tensors in several schedulers step() call by @catwell in #8905
  • Use model_info.id instead of model_info.modelId by @Wauplin in #8912
  • [Training] SD3 training fixes by @sayakpaul in #8917
  • 🌐 [i18n-KO] Translated docs to Korean (added 7 docs and etc) by @Snailpong in #8804
  • [Docs] small fixes to pag guide. by @sayakpaul in #8920
  • Reflect few contributions on ethical_guidelines.md that were not reflected on #8294 by @mreraser in #8914
  • [Tests] proper skipping of request caching test by @sayakpaul in #8908
  • Add attentionless VAE support by @Gothos in #8769
  • [Benchmarking] check if runner helps to restore benchmarking by @sayakpaul in #8929
  • Update pipeline test fetcher by @DN6 in #8931
  • [Tests] reduce the model size in the audioldm2 fast test by @ariG23498 in #7846
  • fix: checkpoint save issue in advanced dreambooth lora sdxl script by @akbaig in #8926
  • [Tests] Improve transformers model test suite coverage - Temporal Transformer by @rootonchair in #8932
  • Fix Colab and Notebook checks for diffusers-cli env by @tolgacangoz in #8408
  • Fix name when saving text inversion embeddings in dreambooth advanced scripts by @DN6 in #8927
  • [Core] fix QKV fusion for attention by @sayakpaul in #8829
  • remove residual i from auraflow. by @sayakpaul in #8949
  • [CI] Skip flaky download tests in PR CI by @DN6 in #8945
  • [AuraFlow] fix long prompt handling by @sayakpaul in #8937
  • Added Code for Gradient Accumulation to work for basic_training by @RandomGamingDev in #8961
  • [AudioLDM2] Fix cache pos for GPT-2 generation by @sanchit-gandhi in #8964
  • [Tests] fix slices of 26 tests (first half) by @sayakpaul in #8959
  • [CI] Slow Test Updates by @DN6 in #8870
  • [tests] speed up animatediff tests by @a-r-r-o-w in #8846
  • [LoRA] introduce LoraBaseMixin to promote reusability. by @sayakpaul in #8774
  • Update TensorRT img2img community pipeline by @asfiyab-nvidia in #8899
  • Enable CivitAI SDXL Inpainting Models Conversion by @mazharosama in #8795
  • Revert "[LoRA] introduce LoraBaseMixin to promote reusability." by @yiyixuxu in #8976
  • fix guidance_scale value not equal to the value in comments by @efwfe in #8941
  • [Chore] remove all is from auraflow. by @sayakpaul in #8980
  • [Chore] add LoraLoaderMixin to the inits by @sayakpaul in #8981
  • Added accelerator based gradient accumulation for basic_example by @RandomGamingDev in #8966
  • [CI] Fix parallelism in nightly tests by @DN6 in #8983
  • [CI] Nightly Test Runner explicitly set runner for Setup Pipeline Matrix by @DN6 in #8986
  • [fix] FreeInit step index out of bounds by @a-r-r-o-w in #8969
  • [core] AnimateDiff SparseCtrl by @a-r-r-o-w in #8897
  • remove unused code from pag attn procs by @a-r-r-o-w in #8928
  • [Kolors] Add IP Adapter by @asomoza in #8901
  • [CI] Update runner configuration for setup and nightly tests by @XciD in #9005
  • [Docs] credit where it's due for Lumina and Latte. by @sayakpaul in #9000
  • handle lora scale and clip skip in lpw sd and sdxl community pipelines by @noskill in #8988
  • [LoRA] fix: animate diff lora stuff. by @sayakpaul in #8995
  • Stable Audio integration by @ylacombe in #8716
  • [core] Move community AnimateDiff ControlNet to core by @a-r-r-o-w in #8972
  • Fix Stable Audio repository id by @ylacombe in #9016
  • PAG variant for AnimateDiff by @a-r-r-o-w in #8789
  • Updates deps for pipeline test fetcher by @DN6 in #9033
  • fix load sharded checkpoint from a subfolder (local path) by @yiyixuxu in #8913
  • [docs] fix pia example by @a-r-r-o-w in #9015
  • Flux pipeline by @sayakpaul in #9043
  • [Core] Add PAG support for PixArtSigma by @sayakpaul in #8921
  • [Flux] allow tests to run by @sayakpaul in #9050
  • Fix Nightly Deps by @DN6 in #9036
  • Update transformer_flux.py by @haofanwang in #9060
  • Errata: Fix typos & \s+$ by @tolgacangoz in #9008
  • [refactor] create modeling blocks specific to AnimateDiff by @a-r-r-o-w in #8979
  • Fix grammar mistake. by @prideout in #9072
  • [Flux] minor documentation fixes for flux. by @sayakpaul in #9048
  • Update TensorRT txt2img and inpaint community pipelines by @asfiyab-nvidia in #9037
  • type get_attention_scores as optional in get_attention_scores by @psychedelicious in #9075
  • [refactor] apply qk norm in attention processors by @a-r-r-o-w in #9071
  • [FLUX] support LoRA by @sayakpaul in #9057
  • [Tests] Improve transformers model test suite coverage - Latte by @rootonchair in #8919
  • PAG variant for HunyuanDiT, PAG refactor by @a-r-r-o-w in #8936
  • [Docs] add stable cascade unet doc. by @sayakpaul in #9066
  • add sentencepiece as a soft dependency by @yiyixuxu in #9065
  • Fix typos by @omahs in #9077
  • Update CLIPFeatureExtractor to CLIPImageProcessor and DPTFeatureExtractor to DPTImageProcessor by @tolgacangoz in #9002
  • [Core] add QKV fusion to AuraFlow and PixArt Sigma by @sayakpaul in #8952
  • [bug] remove unreachable norm_type=ada_norm_continuous from norm3 initialization conditions by @a-r-r-o-w in #9006
  • [Tests] Improve transformers model test suite coverage - Hunyuan DiT by @rootonchair in #8916
  • update by @DN6 (direct commit on v0.30.0-release)
  • [Docs] Add community projects section to docs by @DN6 in #9013
  • add PAG support for Stable Diffusion 3 by @sunovivid in #8861
  • Fix loading sharded checkpoints when we have variants by @SunMarc in #9061
  • [Single File] Add single file support for Flux Transformer by @DN6 in #9083
  • [Kolors] Add PAG by @asomoza in #8934
  • fix train_dreambooth_lora_sd3.py loading hook by @sayakpaul in #9107
  • [core] FreeNoise by @a-r-r-o-w in #8948
  • Flux fp16 inference fix by @latentCall145 in #9097
  • [feat] allow sparsectrl to be loaded from single file by @a-r-r-o-w in #9073
  • Freenoise change vae_batch_size to decode_chunk_size by @DN6 in #9110
  • Add CogVideoX text-to-video generation model by @zRzRzRzRzRzRzR in #9082
  • Release: v0.30.0 by @sayakpaul (direct commit on v0.30.0-release)

Significant community contributions

The following contributors have made significant changes to the library over the last release:

  • @DN6
    • [Tests] Fix precision related issues in slow pipeline tests (#8720)
    • Remove legacy single file model loading mixins (#8754)
    • Enforce ordering when running Pipeline slow tests (#8763)
    • Fix warning in UNetMotionModel (#8756)
    • Fix indent in dreambooth lora advanced SD 15 script (#8753)
    • Fix mistake in Single File Docs page (#8765)
    • [Single File] Allow loading T5 encoder in mixed precision (#8778)
    • Fix saving text encoder weights and kohya weights in advanced dreambooth lora script (#8766)
    • Add VAE tiling option for SD3 (#8791)
    • Add single file loading support for AnimateDiff (#8819)
    • Add option to SSH into CPU runner. (#8884)
    • SSH into cpu runner fix (#8888)
    • SSH into cpu runner additional fix (#8893)
    • Update pipeline test fetcher (#8931)
    • Fix name when saving text inversion embeddings in dreambooth advanced scripts (#8927)
    • [CI] Skip flaky download tests in PR CI (#8945)
    • [CI] Slow Test Updates (#8870)
    • [CI] Fix parallelism in nightly tests (#8983)
    • [CI] Nightly Test Runner explicitly set runner for Setup Pipeline Matrix (#8986)
    • Updates deps for pipeline test fetcher (#9033)
    • Fix Nightly Deps (#9036)
    • update
    • [Docs] Add community projects section to docs (#9013)
    • [Single File] Add single file support for Flux Transformer (#9083)
    • Freenoise change vae_batch_size to decode_chunk_size (#9110)
  • @shauray8
    • add PAG support for SD architecture (#8725)
  • @gnobitab
    • [Tencent Hunyuan Team] Add HunyuanDiT-v1.2 Support (#8747)
    • [Tencent Hunyuan Team] Add checkpoint conversion scripts and changed controlnet (#8783)
  • @yiyixuxu
    • [doc] add a tip about using SDXL refiner with hunyuan-dit and pixart (#8735)
    • [hunyuan-dit] refactor HunyuanCombinedTimestepTextSizeStyleEmbedding (#8761)
    • correct attention_head_dim for JointTransformerBlock (#8608)
    • fix loading sharded checkpoints from subfolder (#8798)
    • Revert "[LoRA] introduce LoraBaseMixin to promote reusability." (#8976)
    • fix load sharded checkpoint from a subfolder (local path) (#8913)
    • add sentencepiece as a soft dependency (#9065)
  • @PommesPeter
    • [Alpha-VLLM Team] Add Lumina-T2X to diffusers (#8652)
  • @IrohXu
    • Add pipeline_stable_diffusion_3_inpaint.py for SD3 Inference (#8709)
  • @maxin-cn
    • Latte: Latent Diffusion Transformer for Video Generation (#8404)
  • @ustcuna
    • [Community Pipelines] Accelerate inference of AnimateDiff by IPEX on CPU (#8643)
  • @tuanh123789
    • add PAG support sd15 controlnet (#8820)
  • @Snailpong
    • 🌐 [i18n-KO] Translated docs to Korean (added 7 docs and etc) (#8804)
  • @asfiyab-nvidia
    • Update TensorRT img2img community pipeline (#8899)
    • Update TensorRT txt2img and inpaint community pipelines (#9037)
  • @ylacombe
    • Stable Audio integration (#8716)
    • Fix Stable Audio repository id (#9016)
  • @sunovivid
    • add PAG support for Stable Diffusion 3 (#8861)
  • @zRzRzRzRzRzRzR
    • Add CogVideoX text-to-video generation model (#9082)

- Python
Published by sayakpaul over 1 year ago

diffusers - v0.29.2: fix deprecation and LoRA bugs 🐞

All commits

  • [SD3] Fix mis-matched shape when num_images_per_prompt > 1 using without T5 (text_encoder_3=None) by @Dalanke in #8558
  • [LoRA] refactor lora conversion utility. by @sayakpaul in #8295
  • [LoRA] fix conversion utility so that lora dora loads correctly by @sayakpaul in #8688
  • [Chore] remove deprecation from transformer2d regarding the output class. by @sayakpaul in #8698
  • [LoRA] fix vanilla fine-tuned lora loading. by @sayakpaul in #8691
  • Release: v0.29.2 by @sayakpaul (direct commit on v0.29.2-patch)

- Python
Published by sayakpaul over 1 year ago

diffusers - v0.29.1: SD3 ControlNet, Expanded SD3 `from_single_file` support, Using long Prompts with T5 Text Encoder & Bug fixes

SD3 ControlNet


```python
import torch
from diffusers import StableDiffusion3ControlNetPipeline
from diffusers.models import SD3ControlNetModel, SD3MultiControlNetModel
from diffusers.utils import load_image

controlnet = SD3ControlNetModel.from_pretrained("InstantX/SD3-Controlnet-Canny", torch_dtype=torch.float16)

pipe = StableDiffusion3ControlNetPipeline.from_pretrained(
    "stabilityai/stable-diffusion-3-medium-diffusers",
    controlnet=controlnet,
    torch_dtype=torch.float16,
)
pipe.to("cuda")
control_image = load_image("https://huggingface.co/InstantX/SD3-Controlnet-Canny/resolve/main/canny.jpg")
prompt = "A girl holding a sign that says InstantX"
image = pipe(prompt, control_image=control_image, controlnet_conditioning_scale=0.7).images[0]
image.save("sd3.png")
```

📜 Refer to the official docs here to learn more about it.

Thanks to @haofanwang @wangqixun from the @ResearcherXman team for contributing this pipeline!

Expanded single file support

We now support all available single-file checkpoints for SD3 in diffusers! To load a single-file checkpoint that includes the T5 text encoder:

```python
import torch
from diffusers import StableDiffusion3Pipeline

pipe = StableDiffusion3Pipeline.from_single_file(
    "https://huggingface.co/stabilityai/stable-diffusion-3-medium/blob/main/sd3_medium_incl_clips_t5xxlfp8.safetensors",
    torch_dtype=torch.float16,
)
pipe.enable_model_cpu_offload()

image = pipe("a picture of a cat holding a sign that says hello world").images[0]
image.save("sd3-single-file-t5-fp8.png")
```

Using Long Prompts with the T5 Text Encoder

We increased the default sequence length for the T5 Text Encoder from a maximum of 77 to 256! It can be adjusted to accept fewer or more tokens by setting the max_sequence_length to a maximum of 512. Keep in mind that longer sequences require additional resources and will result in longer generation times. This effect is particularly noticeable during batch inference.

```python
prompt = "A whimsical and creative image depicting a hybrid creature that is a mix of a waffle and a hippopotamus. This imaginative creature features the distinctive, bulky body of a hippo, but with a texture and appearance resembling a golden-brown, crispy waffle. The creature might have elements like waffle squares across its skin and a syrup-like sheen. It’s set in a surreal environment that playfully combines a natural water habitat of a hippo with elements of a breakfast table setting, possibly including oversized utensils or plates in the background. The image should evoke a sense of playful absurdity and culinary fantasy."

image = pipe(
    prompt=prompt,
    negative_prompt="",
    num_inference_steps=28,
    guidance_scale=4.5,
    max_sequence_length=512,
).images[0]
```

(Comparison images: before vs. max_sequence_length=256 vs. max_sequence_length=512 — not shown here.)

All commits

  • Release: v0.29.0 by @sayakpaul (direct commit on v0.29.1-patch)
  • prepare for patch release by @yiyixuxu (direct commit on v0.29.1-patch)
  • fix warning log for Transformer SD3 by @sayakpaul in #8496
  • Add SD3 AutoPipeline mappings by @Beinsezii in #8489
  • Add Hunyuan AutoPipe mapping by @Beinsezii in #8505
  • Expand Single File support in SD3 Pipeline by @DN6 in #8517
  • [Single File Loading] Handle unexpected keys in CLIP models when accelerate isn't installed. by @DN6 in #8462
  • Fix sharding when no device_map is passed by @SunMarc in #8531
  • [SD3 Inference] T5 Token limit by @asomoza in #8506
  • Fix gradient checkpointing issue for Stable Diffusion 3 by @Carolinabanana in #8542
  • Support SD3 ControlNet and Multi-ControlNet. by @wangqixun in #8566
  • fix from_single_file for checkpoints with t5 by @yiyixuxu in #8631
  • [SD3] Fix mis-matched shape when num_images_per_prompt > 1 using without T5 (text_encoder_3=None) by @Dalanke in #8558

Significant community contributions

The following contributors have made significant changes to the library over the last release:

  • @wangqixun
    • Support SD3 ControlNet and Multi-ControlNet. (#8566)

- Python
Published by yiyixuxu over 1 year ago

diffusers - v0.29.0: Stable Diffusion 3

This release emphasizes Stable Diffusion 3, Stability AI’s latest iteration of the Stable Diffusion family of models. It was introduced in Scaling Rectified Flow Transformers for High-Resolution Image Synthesis by Patrick Esser, Sumith Kulal, Andreas Blattmann, Rahim Entezari, Jonas Müller, Harry Saini, Yam Levi, Dominik Lorenz, Axel Sauer, Frederic Boesel, Dustin Podell, Tim Dockhorn, Zion English, Kyle Lacey, Alex Goodwin, Yannik Marek, and Robin Rombach.

As the model is gated, before using it with diffusers, you first need to go to the Stable Diffusion 3 Medium Hugging Face page, fill in the form and accept the gate. Once you are in, you need to log in so that your system knows you’ve accepted the gate.

```bash
huggingface-cli login
```

The code below shows how to perform text-to-image generation with SD3:

```python
import torch
from diffusers import StableDiffusion3Pipeline

pipe = StableDiffusion3Pipeline.from_pretrained("stabilityai/stable-diffusion-3-medium-diffusers", torch_dtype=torch.float16)
pipe = pipe.to("cuda")

image = pipe(
    "A cat holding a sign that says hello world",
    negative_prompt="",
    num_inference_steps=28,
    guidance_scale=7.0,
).images[0]
image
```


Refer to our documentation for learning all the optimizations you can apply to SD3 as well as the image-to-image pipeline.

Additionally, we support DreamBooth + LoRA fine-tuning of Stable Diffusion 3 through rectified flow. Check out this directory for more details.

- Python
Published by sayakpaul over 1 year ago

diffusers - v0.28.2: fix `from_single_file` clip model checkpoint key error 🐞

  • Change checkpoint key used to identify CLIP models in single file checkpoints by @DN6 in #8319

- Python
Published by yiyixuxu over 1 year ago

diffusers - v0.28.1: HunyuanDiT and Transformer2D model class variants

This patch release primarily introduces the Hunyuan DiT pipeline from the Tencent team.

Hunyuan DiT


Hunyuan DiT is a transformer-based diffusion pipeline, introduced in the Hunyuan-DiT: A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding paper by the Tencent Hunyuan team.

```python
import torch
from diffusers import HunyuanDiTPipeline

pipe = HunyuanDiTPipeline.from_pretrained(
    "Tencent-Hunyuan/HunyuanDiT-Diffusers", torch_dtype=torch.float16
)
pipe.to("cuda")

# You may also use an English prompt, as HunyuanDiT supports both English and Chinese
# prompt = "An astronaut riding a horse"
prompt = "一个宇航员在骑马"
image = pipe(prompt).images[0]
```

🧠 This pipeline has support for multi-linguality.

📜 Refer to the official docs here to learn more about it.

Thanks to @gnobitab, for contributing Hunyuan DiT in #8240.

All commits

  • Release: v0.28.0 by @sayakpaul (direct commit on v0.28.1-patch)
  • [Core] Introduce class variants for Transformer2DModel by @sayakpaul in #7647
  • resolve comflicts by @toshas (direct commit on v0.28.1-patch)
  • Tencent Hunyuan Team: add HunyuanDiT related updates by @gnobitab in #8240
  • Tencent Hunyuan Team - Updated Doc for HunyuanDiT by @gnobitab in #8383
  • [Transformer2DModel] Handle norm_type safely while remapping by @sayakpaul in #8370
  • Release: v0.28.1 by @sayakpaul (direct commit on v0.28.1-patch)

Significant community contributions

The following contributors have made significant changes to the library over the last release:

  • @gnobitab
    • Tencent Hunyuan Team: add HunyuanDiT related updates (#8240)
    • Tencent Hunyuan Team - Updated Doc for HunyuanDiT (#8383)

- Python
Published by sayakpaul over 1 year ago

diffusers - v0.28.0: Marigold, PixArt Sigma, AnimateDiff SDXL, InstantStyle, VQGAN Training Script, and more

Diffusion models are known for their abilities in the space of generative modeling. This release of diffusers introduces the first official pipeline (Marigold) for discriminative tasks such as depth estimation and surface normal estimation!

Starting this release, we will also highlight the changes and features from the library that make it easy to integrate community checkpoints, features, and so on. Read on!

Marigold

Proposed in Marigold: Repurposing Diffusion-Based Image Generators for Monocular Depth Estimation, Marigold introduces a diffusion model and an associated fine-tuning protocol for monocular depth estimation. It can also be extended to perform surface normal estimation.


(Image taken from the official repository)

The code snippet below shows how to use this pipeline for depth estimation:

```python
import diffusers
import torch

pipe = diffusers.MarigoldDepthPipeline.from_pretrained(
    "prs-eth/marigold-depth-lcm-v1-0", variant="fp16", torch_dtype=torch.float16
).to("cuda")

image = diffusers.utils.load_image("https://marigoldmonodepth.github.io/images/einstein.jpg")
depth = pipe(image)

vis = pipe.image_processor.visualize_depth(depth.prediction)
vis[0].save("einstein_depth.png")

depth_16bit = pipe.image_processor.export_depth_to_16bit_png(depth.prediction)
depth_16bit[0].save("einstein_depth_16bit.png")
```

Check out the API documentation here. We also have a detailed guide about the pipeline here.

Thanks to @toshas, one of the authors of Marigold, who contributed this in #7847.

🌀 Massive Refactor of from_single_file 🌀

We have further refactored from_single_file to align its logic more closely to the from_pretrained method. The biggest benefit of doing this is that it allows us to expand single file loading support beyond Stable Diffusion-like pipelines and models. It also makes it easier to load models that are saved and shared in their original format.

Some of the changes introduced in this refactor:

  1. When loading a single file checkpoint, we will attempt to use the keys present in the checkpoint to infer a model repository on the Hugging Face Hub that we can use to configure the pipeline. For example, if you are using a single file checkpoint based on SD 1.5, we would use the configuration files in the runwayml/stable-diffusion-v1-5 repository to configure the model components and pipeline.
  2. Suppose this inferred configuration isn’t appropriate for your checkpoint. In that case, you can override it using the config argument and pass in either a path to a local model repo or a repo id on the Hugging Face Hub.

```python
pipe = StableDiffusionPipeline.from_single_file("...", config=<model repo id or local repo path>)
```

  3. Deprecation of model configuration arguments for the from_single_file method in Pipelines, such as num_in_channels, scheduler_type, image_size, and upcast_attention. This is an anti-pattern that we supported in previous versions of the library, when we assumed it would only be relevant to Stable Diffusion-based models. However, given the demand to support other model types, we feel it is necessary for single-file loading behavior to adhere to the conventions set in our other loading methods. Configuring individual model components through a pipeline loading method is not something we support in from_pretrained, and therefore we will be deprecating support for this behavior in from_single_file as well.
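The config-inference idea described in point 1 can be pictured with a toy sketch: look for a "signature" key that only appears in one model family, and map it to a config repository. The key names and repo mapping below are illustrative examples only, not the actual heuristics diffusers uses internally.

```python
# Toy illustration of key-based config inference for single-file checkpoints.
# The signature keys and repo ids here are hypothetical, chosen for illustration.
SIGNATURE_KEY_TO_REPO = {
    "model.diffusion_model.joint_blocks.0.context_block.attn.qkv.weight": "stabilityai/stable-diffusion-3-medium-diffusers",
    "model.diffusion_model.input_blocks.0.0.weight": "runwayml/stable-diffusion-v1-5",
}

def infer_config_repo(state_dict_keys):
    """Return the first config repo whose signature key appears in the checkpoint."""
    for signature, repo in SIGNATURE_KEY_TO_REPO.items():
        if signature in state_dict_keys:
            return repo
    return None  # fall back to an explicit `config=` argument

keys = {"model.diffusion_model.input_blocks.0.0.weight", "alphas_cumprod"}
print(infer_config_repo(keys))  # runwayml/stable-diffusion-v1-5
```

When no signature matches, the user-supplied config argument shown above takes over.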

PixArt Sigma

PixArt Sigma is the successor to PixArt Alpha. PixArt Sigma is capable of directly generating images at 4K resolution. It can also produce images of markedly higher fidelity and improved alignment with text prompts. It comes with a massive sequence length of 300 (for reference, PixArt Alpha has a maximum sequence length of 120)!


(Taken from the project website.)


```python
import torch
from diffusers import PixArtSigmaPipeline

# You can replace the checkpoint id with "PixArt-alpha/PixArt-Sigma-XL-2-512-MS" too.
pipe = PixArtSigmaPipeline.from_pretrained(
    "PixArt-alpha/PixArt-Sigma-XL-2-1024-MS", torch_dtype=torch.float16
)

# Enable memory optimizations.
pipe.enable_model_cpu_offload()

prompt = "A small cactus with a happy face in the Sahara desert."
image = pipe(prompt).images[0]
```

📃 Refer to the documentation here to learn more about PixArt Sigma.

Thanks to @lawrence-cj, one of the authors of PixArt Sigma, who contributed this in #7857.

AnimateDiff SDXL

@a-r-r-o-w contributed the Stable Diffusion XL (SDXL) version of AnimateDiff in #6721. However, note that this is currently an experimental feature, as only a beta release of the motion adapter checkpoint is available.

```python
import torch
from diffusers.models import MotionAdapter
from diffusers import AnimateDiffSDXLPipeline, DDIMScheduler
from diffusers.utils import export_to_gif

adapter = MotionAdapter.from_pretrained("guoyww/animatediff-motion-adapter-sdxl-beta", torch_dtype=torch.float16)

model_id = "stabilityai/stable-diffusion-xl-base-1.0"
scheduler = DDIMScheduler.from_pretrained(
    model_id,
    subfolder="scheduler",
    clip_sample=False,
    beta_schedule="linear",
    steps_offset=1,
)
pipe = AnimateDiffSDXLPipeline.from_pretrained(
    model_id,
    motion_adapter=adapter,
    scheduler=scheduler,
    torch_dtype=torch.float16,
    variant="fp16",
)
pipe.enable_model_cpu_offload()

# enable memory savings
pipe.enable_vae_slicing()
pipe.enable_vae_tiling()

output = pipe(
    prompt="a panda surfing in the ocean, realistic, high quality",
    negative_prompt="low quality, worst quality",
    num_inference_steps=20,
    guidance_scale=8,
    width=1024,
    height=1024,
    num_frames=16,
)

frames = output.frames[0]
export_to_gif(frames, "animation.gif")
```

📜 Refer to the documentation to learn more.

Block-wise LoRA

@UmerHA contributed support for controlling the scales of different LoRA blocks in a granular manner in #7352. Depending on the LoRA checkpoint being used, this granular control can significantly impact the quality of the generated outputs. The following code block shows how this feature can be used during inference:

```python
...

adapter_weight_scales = {"unet": {"down": 0, "mid": 1, "up": 0}}
pipe.set_adapters("pixel", adapter_weight_scales)
image = pipe(
    prompt, num_inference_steps=30, generator=torch.manual_seed(0)
).images[0]
```

✍️ Refer to our documentation for more details and a full-fledged example.

InstantStyle

Granular control of scale extends to IP-Adapters too. @DannHuang contributed support for InstantStyle, aka granular control of IP-Adapter scales, in #7668. The following code block shows how this feature can be used when performing inference with IP-Adapters:

```python
...

scale = {
    "down": {"block_2": [0.0, 1.0]},
    "up": {"block_0": [0.0, 1.0, 0.0]},
}
pipeline.set_ip_adapter_scale(scale)
```

This way, one can generate images that follow only the style or only the layout of the image prompt, with significantly improved diversity. This is achieved by activating the IP-Adapter only in specific parts of the model.

Check out the documentation here.

ControlNetXS

ControlNet-XS was introduced in ControlNet-XS by Denis Zavadski and Carsten Rother. It is based on the observation that the control model in the original ControlNet can be made much smaller and still produce good results. ControlNet-XS generates images comparable to a regular ControlNet, but it is 20-25% faster (see the benchmark with StableDiffusion-XL) and uses ~45% less memory.

ControlNet-XS is supported for both Stable Diffusion and Stable Diffusion XL.

Thanks to @UmerHA for contributing ControlNet-XS in #5827 and #6772.

Custom Timesteps

We introduced custom timesteps support for some of our pipelines and schedulers. You can now set your scheduler with a list of arbitrary timesteps. For example, you can use the AYS timesteps schedule to achieve very nice results with only 10 denoising steps.

```python
from diffusers.schedulers import AysSchedules

sampling_schedule = AysSchedules["StableDiffusionXLTimesteps"]
pipe = StableDiffusionXLPipeline.from_pretrained(
    "SG161222/RealVisXL_V4.0",
    torch_dtype=torch.float16,
    variant="fp16",
).to("cuda")

pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config, algorithm_type="sde-dpmsolver++")
prompt = "A cinematic shot of a cute little rabbit wearing a jacket and doing a thumbs up"
image = pipe(prompt=prompt, timesteps=sampling_schedule).images[0]
```

Check out the documentation here

device_map in Pipelines 🧪

We have introduced experimental support for device_map in our pipelines. This feature becomes relevant when you have multiple accelerators to distribute the components of a pipeline. Currently, we support only “balanced” device_map. However, we plan to support other device mapping strategies relevant to diffusion models in the future.

```python
from diffusers import DiffusionPipeline
import torch

pipeline = DiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16,
    device_map="balanced",
)
image = pipeline("a dog").images[0]
```

In cases where you are limited to low-VRAM accelerators, you can use device_map to benefit from them. Below, we simulate a situation where we have access to two GPUs, each with only 1GB of VRAM (through the max_memory argument).

```python
from diffusers import DiffusionPipeline
import torch

max_memory = {0: "1GB", 1: "1GB"}
pipeline = DiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16,
    use_safetensors=True,
    device_map="balanced",
    max_memory=max_memory,
)
image = pipeline("a dog").images[0]
```

📜 Refer to the documentation to learn more about it.

VQGAN Training Script 📈

VQGAN, proposed in Taming Transformers for High-Resolution Image Synthesis, is a crucial component in the modern generative image modeling toolbox. Once it is trained, its encoder can be leveraged to compute general-purpose tokens from input images.

Thanks to @isamu-isozaki, who contributed a script and related utilities to train VQGANs in #5483. For details, refer to the official training directory.
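The token-computation idea can be sketched in a few lines: a trained encoder maps an image to latent vectors, and each vector is replaced by the index of its nearest codebook entry. The NumPy sketch below illustrates only this quantization step with a random codebook; it is not the trained VQGAN itself.

```python
import numpy as np

def quantize(latents, codebook):
    """Map each latent vector to the index of its nearest codebook entry.

    latents:  (N, D) array of encoder outputs
    codebook: (K, D) array of learned codebook vectors
    returns:  (N,) array of integer token ids
    """
    # Squared Euclidean distance between every latent and every codebook entry.
    dists = ((latents[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    return dists.argmin(axis=1)

rng = np.random.default_rng(0)
codebook = rng.normal(size=(16, 4))   # K=16 entries of dimension D=4
latents = codebook[[3, 7, 7]] + 0.01  # latents lying near entries 3, 7, 7
tokens = quantize(latents, codebook)
print(tokens)  # [3 7 7]
```

In a real VQGAN, the codebook is learned jointly with the encoder and decoder, and the resulting token ids are what downstream transformers consume.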

VideoProcessor Class

Similar to the VaeImageProcessor class, we have introduced a VideoProcessor to help make the preprocessing and postprocessing of videos easier and a little more streamlined across the pipelines. Refer to the documentation to learn more.
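To make the preprocessing/postprocessing symmetry concrete, here is a minimal NumPy sketch of the kind of work such a processor performs: stacking frames into a batched tensor normalized to [-1, 1], and inverting that back to displayable frames. This is an illustration only; the actual VideoProcessor API in diffusers may differ.

```python
import numpy as np

def preprocess_video(frames):
    """Stack uint8 (H, W, 3) frames into a normalized (1, C, T, H, W) batch.

    Illustrates typical video preprocessing; not the exact diffusers API.
    """
    video = np.stack(frames).astype(np.float32) / 255.0  # (T, H, W, C) in [0, 1]
    video = video * 2.0 - 1.0                            # scale to [-1, 1]
    return video.transpose(3, 0, 1, 2)[None]             # (1, C, T, H, W)

def postprocess_video(video):
    """Invert the preprocessing back to a list of uint8 frames."""
    video = (video[0].transpose(1, 2, 3, 0) + 1.0) / 2.0  # (T, H, W, C) in [0, 1]
    return [np.round(f * 255.0).astype(np.uint8) for f in video]

frames = [np.full((8, 8, 3), v, dtype=np.uint8) for v in (0, 128, 255)]
batch = preprocess_video(frames)
print(batch.shape)  # (1, 3, 3, 8, 8)
```

The round trip is lossless up to rounding, which is why a shared processor keeps pipelines consistent.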

New Guides 📑

Starting with this release, we provide guides and tutorials to help users get started with some of the most frequently used tasks in image and video generation. For this release, we have a series of three guides about outpainting with different techniques:

Official Callbacks

We introduced official callbacks that you can conveniently plug into your pipeline. For example, you can turn off classifier-free guidance after a chosen fraction of the denoising steps with SDXLCFGCutoffCallback.

```python
import torch
from diffusers import StableDiffusionXLPipeline
from diffusers.callbacks import SDXLCFGCutoffCallback

callback = SDXLCFGCutoffCallback(cutoff_step_ratio=0.4)
pipeline = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
    variant="fp16",
).to("cuda")
prompt = "a sports car at the road, best quality, high quality, high detail, 8k resolution"
out = pipeline(
    prompt=prompt,
    num_inference_steps=25,
    callback_on_step_end=callback,
)
```

Read more on our documentation 📜

Community Pipelines and from_pipe API

Starting with this release note, we will highlight new community pipelines! More and more of our pipelines are added as community pipelines first and graduate to official pipelines once people start using them a lot. Community pipelines are not required to follow diffusers’ coding style, so contributing one is the easiest way to contribute to diffusers 😊 

We also introduced a from_pipe API that’s very useful for community pipelines that share checkpoints with our official pipelines and improve generation quality in some way :) You can use from_pipe(...) to load many community pipelines without additional memory overhead. With this API, you can easily switch between different pipelines to apply different techniques.

Read more about from_pipe API in our documentation 📃.

Here are four new community pipelines since our last release.

BoxDiff

BoxDiff lets you use bounding box coordinates for more controlled generation. Here is an example of how you can apply this technique to a Stable Diffusion pipeline you have already created (i.e., pipe_sd in the example below):

```python
pipe_box = DiffusionPipeline.from_pipe(
    pipe_sd,
    custom_pipeline="pipeline_stable_diffusion_boxdiff",
)
pipe_box.enable_model_cpu_offload()

phrases = ["aurora", "reindeer", "meadow", "lake", "mountain"]
boxes = [[1, 3, 512, 202], [75, 344, 421, 495], [1, 327, 508, 507], [2, 217, 507, 341], [1, 135, 509, 242]]
boxes = [[x / 512 for x in box] for box in boxes]

generator = torch.Generator(device="cpu").manual_seed(42)
images = pipe_box(
    prompt,
    boxdiff_phrases=phrases,
    boxdiff_boxes=boxes,
    boxdiff_kwargs={"attention_res": 16, "normalize_eot": True},
    num_inference_steps=50,
    generator=generator,
).images
```

Check out this community pipeline here

HD-Painter

HD-Painter can enhance inpainting pipelines with improved prompt faithfulness and higher-resolution generation (up to 2K). You can switch from BoxDiff to HD-Painter like this:

```python
pipe = DiffusionPipeline.from_pipe(
    pipe_box,
    custom_pipeline="hd_painter",
)
pipe.scheduler = DDIMScheduler.from_config(pipe.scheduler.config)

prompt = "wooden boat"
init_image = load_image("https://raw.githubusercontent.com/Picsart-AI-Research/HD-Painter/main/assets/samples/images/2.jpg")
mask_image = load_image("https://raw.githubusercontent.com/Picsart-AI-Research/HD-Painter/main/assets/samples/masks/2.png")

image = pipe(
    prompt,
    init_image,
    mask_image,
    use_rasg=True,
    use_painta=True,
    generator=torch.manual_seed(12345),
).images[0]
```

Check out this community pipeline here

Differential Diffusion

Differential Diffusion enables customization of the amount of change per pixel or per image region. It’s very effective in inpainting and outpainting.

```python
pipeline = DiffusionPipeline.from_pipe(
    pipe_sdxl,
    custom_pipeline="pipeline_stable_diffusion_xl_differential_img2img",
).to("cuda")
pipeline.scheduler = DPMSolverMultistepScheduler.from_config(pipeline.scheduler.config, use_karras_sigmas=True)

prompt = "a green pear"
negative_prompt = "blurry"

image = pipeline(
    prompt=prompt,
    negative_prompt=negative_prompt,
    guidance_scale=7.5,
    num_inference_steps=25,
    original_image=image,
    image=image,
    strength=1.0,
    map=mask,
).images[0]
```
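The map argument is just a grayscale change map: per-pixel values close to 0 keep the region fixed, while values close to 1 allow full change. A toy sketch of building such a map with plain Python lists (a real pipeline would expect an image or tensor instead):

```python
def gradient_change_map(width, height):
    # Left edge stays unchanged (0.0); right edge may change fully (1.0).
    return [[x / (width - 1) for x in range(width)] for _ in range(height)]

change_map = gradient_change_map(4, 2)
# change_map[0] ramps from 0.0 on the left up to 1.0 on the right
```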

Check out this community pipeline here.

FRESCO

FRESCO, introduced in FRESCO: Spatial-Temporal Correspondence for Zero-Shot Video Translation, enables zero-shot video-to-video translation. Learn more about it from here.

All Commits

  • clean dep installation step in push_tests by @sayakpaul in #7382
  • [LoRA test suite] refactor the test suite and cleanse it by @sayakpaul in #7316
  • [Custom Pipelines with Custom Components] fix multiple things by @sayakpaul in #7304
  • Fix typos by @standardAI in #7411
  • fix: enable unet_3d_condition to support time_cond_proj_dim by @yhZhai in #7364
  • add: space within docs to calculate mememory usage. by @sayakpaul (direct commit on v0.28.0-release)
  • Revert "add: space within docs to calculate mememory usage." by @sayakpaul (direct commit on v0.28.0-release)
  • [Docs] add missing output image by @sayakpaul in #7425
  • add a "Community Scripts" section by @yiyixuxu in #7358
  • add: space for calculating memory usagee. by @sayakpaul in #7414
  • [refactor] Fix FreeInit behaviour by @a-r-r-o-w in #7410
  • Remove distutils by @sayakpaul in #7455
  • [IP-Adapter] Fix IP-Adapter Support and Refactor Callback for StableDiffusionPanoramaPipeline by @standardAI in #7262
  • [Research Projects] ORPO diffusion for alignment by @sayakpaul in #7423
  • Additional Memory clean up for slow tests by @DN6 in #7436
  • Fix for str_to_bool definition in testing utils by @DN6 in #7461
  • [Docs] Fix typos by @standardAI in #7451
  • Fixed minor error in test_lora_layers_peft.py by @UmerHA in #7394
  • Small ldm3d fix by @estelleafl in #7464
  • [tests] skip dynamo tests when python is 3.12. by @sayakpaul in #7458
  • feat: support DoRA LoRA from community by @sayakpaul in #7371
  • Fix broken link by @salcc in #7472
  • Update train_dreambooth_lora_sd15_advanced.py by @ernestchu in #7433
  • [Training utils] add kohya conversion dict. by @sayakpaul in #7435
  • Fix Tiling in ConsistencyDecoderVAE by @standardAI in #7290
  • diffusers#7426 fix stable diffusion xl inference on MPS when dtypes shift unexpectedly due to pytorch bugs by @bghira in #7446
  • Fix missing raise statements in check_inputs by @TonyLianLong in #7473
  • Add device arg to offloading with combined pipelines by @Disty0 in #7471
  • fix torch.compile for multi-controlnet of sdxl inpaint by @yiyixuxu in #7476
  • [chore] make the instructions on fetching all commits clearer. by @sayakpaul in #7474
  • Skip test_lora_fuse_nan on mps by @UmerHA in #7481
  • [Chore] Fix Colab notebook links in README.md by @thliang01 in #7495
  • [Modeling utils chore] import load_model_dict_into_meta only once by @sayakpaul in #7437
  • Improve nightly tests by @sayakpaul in #7385
  • add: a helpful message when quality and repo consistency checks fail. by @sayakpaul in #7475
  • apple mps: training support for SDXL (ControlNet, LoRA, Dreambooth, T2I) by @bghira in #7447
  • cpu_offload: remove all hooks before offload by @yiyixuxu in #7448
  • Bug fix for controlnetpipeline check_image by @Fantast616 in #7103
  • fix OOM for test_vae_tiling by @yiyixuxu in #7510
  • [Tests] Speed up some fast pipeline tests by @sayakpaul in #7477
  • Memory clean up on all Slow Tests by @DN6 in #7514
  • Implements Blockwise lora by @UmerHA in #7352
  • Quick-Fix for #7352 block-lora by @UmerHA in #7523
  • add Instant id sdxl image2image pipeline by @linoytsaban in #7507
  • Perturbed-Attention Guidance by @HyoungwonCho in #7512
  • Add final_sigma_zero to UniPCMultistep by @Beinsezii in #7517
  • Fix IP Adapter Support for SAG Pipeline by @Stepheni12 in #7260
  • [Community pipeline] Marigold depth estimation update -- align with marigold v0.1.5 by @markkua in #7524
  • Fix typo in CPU offload test by @DN6 in #7542
  • Fix SVD bug (shape of time_context) by @KimbingNg in #7268
  • fix the cpu offload tests by @yiyixuxu in #7544
  • add HD-Painter pipeline by @haikmanukyan in #7520
  • add a from_pipe method to DiffusionPipeline by @yiyixuxu in #7241
  • [Community pipeline] SDXL Differential Diffusion Img2Img Pipeline by @asomoza in #7550
  • Fix FreeU tests by @DN6 in #7540
  • [Release tests] make nightly workflow dispatchable. by @sayakpaul in #7541
  • [Chore] remove class assignments for linear and conv. by @sayakpaul in #7553
  • [Tests] Speed up fast pipelines part II by @sayakpaul in #7521
  • 7529 do not disable autocast for cuda devices by @bghira in #7530
  • add: utility to format our docs too 📜 by @sayakpaul in #7314
  • UniPC Multistep fix tensor dtype/device on order=3 by @Beinsezii in #7532
  • UniPC Multistep add rescale_betas_zero_snr by @Beinsezii in #7531
  • [Core] refactor transformers 2d into multiple init variants. by @sayakpaul in #7491
  • [Chore] increase number of workers for the tests. by @sayakpaul in #7558
  • Update pipeline_animatediff_video2video.py by @AbhinavGopal in #7457
  • Skip test_freeu_enabled on MPS by @UmerHA in #7570
  • [Tests] reduce block sizes of UNet and VAE tests by @sayakpaul in #7560
  • [IF] add set_begin_index for all IF pipelines by @yiyixuxu in #7577
  • Add AudioLDM2 TTS by @tuanh123789 in #5381
  • Allow more arguments to be passed to convert_from_ckpt by @w4ffl35 in #7222
  • [Docs] fix bugs in callback docs by @Adenialzz in #7594
  • Add missing restore() EMA call in train SDXL script by @christopher-beckham in #7599
  • disable test_conversion_when_using_device_map by @yiyixuxu in #7620
  • Multi-image masking for single IP Adapter by @fabiorigano in #7499
  • add utilities for updating diffusers pipeline metadata. by @sayakpaul in #7573
  • [Core] refactor transformer_2d forward logic into meaningful conditions. by @sayakpaul in #7489
  • [Workflows] remove installation of libsndfile1-dev and libgl1 from workflows by @sayakpaul in #7543
  • [Core] add "balanced" device_map support to pipelines by @sayakpaul in #6857
  • add the option of upsample function for tiny vae by @IDKiro in #7604
  • [docs] remove duplicate tip block. by @sayakpaul in #7625
  • Modularize instruct_pix2pix SD inferencing during and after training in examples by @satani99 in #7603
  • [Tests] reduce the model sizes in the SD fast tests by @sayakpaul in #7580
  • [docs] Prompt enhancer by @stevhliu in #7565
  • [docs] T2I by @stevhliu in #7623
  • Fix cpu offload related slow tests by @yiyixuxu in #7618
  • [Core] fix img2img pipeline for Playground by @sayakpaul in #7627
  • Skip PEFT LoRA Scaling if the scale is 1.0 by @stevenjlm in #7576
  • LCM Distill Scripts Fix Bug when Initializing Target U-Net by @dg845 in #6848
  • Fixed YAML loading. by @YiqinZhao in #7579
  • fix: Replaced deprecated logger.warn with logger.warning by @Sai-Suraj-27 in #7643
  • FIX Setting device for DoRA parameters by @BenjaminBossan in #7655
  • Add (Scheduled) Pseudo-Huber Loss training scripts to research projects by @kabachuha in #7527
  • make docker-buildx mandatory. by @sayakpaul in #7652
  • fix: metadata token by @sayakpaul in #7631
  • don't install peft from the source with uv for now. by @sayakpaul in #7679
  • Fixing implementation of ControlNet-XS by @UmerHA in #6772
  • [Core] is_cosxl_edit arg in SDXL ip2p. by @sayakpaul in #7650
  • [Docs] Add TGATE in section optimization by @WentianZhang-ML in #7639
  • fix: Updated ruff configuration to avoid deprecated configuration warning by @Sai-Suraj-27 in #7637
  • Don't install PEFT with UV in slow tests by @DN6 in #7697
  • [Workflows] remove installation of redundant modules from flax PR tests by @sayakpaul in #7662
  • [Docs] Update TGATE in section optimization. by @WentianZhang-ML in #7698
  • [docs] Pipeline loading by @stevhliu in #7684
  • Add tailscale action to push_test by @glegendre01 in #7709
  • Move IP Adapter Face ID to core by @fabiorigano in #7186
  • adding back test_conversion_when_using_device_map by @yiyixuxu in #7704
  • Cast height, width to int inside prepare latents by @DN6 in #7691
  • Cleanup ControlnetXS by @DN6 in #7701
  • fix: Fixed type annotations for compatability with python 3.8 by @Sai-Suraj-27 in #7648
  • fix/add tailscale key in case of failure by @glegendre01 in #7719
  • Animatediff Controlnet Community Pipeline IP Adapter Fix by @AbhinavGopal in #7413
  • Update Wuerschten Test by @DN6 in #7700
  • Fix Kandinksy V22 tests by @DN6 in #7699
  • [docs] AutoPipeline by @stevhliu in #7714
  • Remove redundant lines by @philipbutler in #7396
  • Support InstantStyle by @DannHuang in #7668
  • Restore AttnProcessor2_0 in unload_ip_adapter by @fabiorigano in #7727
  • fix: Fixed a wrong decorator by modifying it to @classmethod by @Sai-Suraj-27 in #7653
  • [Metadata utils] fix: json lines ordering. by @sayakpaul in #7744
  • [docs] Clean up toctree by @stevhliu in #7715
  • Fix failing VAE tiling test by @DN6 in #7747
  • Fix test for consistency decoder. by @DN6 in #7746
  • PixArt-Sigma Implementation by @lawrence-cj in #7654
  • [PixArt] fix small nits in pixart sigma by @sayakpaul in #7767
  • [Tests] mark UNetControlNetXSModelTests::test_forward_no_control to be flaky by @sayakpaul in #7771
  • Fix lora device test by @sayakpaul in #7738
  • [docs] Reproducible pipelines by @stevhliu in #7769
  • [docs] Refactor image quality docs by @stevhliu in #7758
  • Convert RGB to BGR for the SDXL watermark encoder by @btlorch in #7013
  • [docs] Fix AutoPipeline docstring by @stevhliu in #7779
  • Add PixArtSigmaPipeline to AutoPipeline mapping by @Beinsezii in #7783
  • [Docs] Update image masking and face id example by @fabiorigano in #7780
  • Add DREAM training by @AmericanPresidentJimmyCarter in #6381
  • [Scheduler] introduce sigma schedule. by @sayakpaul in #7649
  • Update InstantStyle usage in IP-Adapter documentation by @DannHuang in #7806
  • Check for latents, before calling prepare_latents - sdxlImg2Img by @nileshkokane01 in #7582
  • Add debugging workflow by @DN6 in #7778
  • [Pipeline] Fix error of SVD pipeline when num_videos_per_prompt > 1 by @wuyushuwys in #7786
  • Safetensor loading in AnimateDiff conversion scripts by @DN6 in #7764
  • Adding TextualInversionLoaderMixin for the controlnet_inpaint_sd_xl pipeline by @jschoormans in #7288
  • Added get_velocity function to EulerDiscreteScheduler. by @RuiningLi in #7733
  • Set main_input_name in StableDiffusionSafetyChecker to "clip_input" by @clinty in #7500
  • [Tests] reduce the model size in the ddim fast test by @ariG23498 in #7803
  • [Tests] reduce the model size in the ddpm fast test by @ariG23498 in #7797
  • [Tests] reduce the model size in the amused fast test by @ariG23498 in #7804
  • [Core] introduce nosplit_modules to ModelMixin by @sayakpaul in #6396
  • Add B-Lora training option to the advanced dreambooth lora script by @linoytsaban in #7741
  • SSH Runner Workflow Update by @DN6 in #7822
  • Fix CPU offload in docstring by @standardAI in #7827
  • [docs] Community pipelines by @stevhliu in #7819
  • Fix for pipeline slow test fetcher by @DN6 in #7824
  • [Tests] fix: device map tests for models by @sayakpaul in #7825
  • update the logic of is_sequential_cpu_offload by @yiyixuxu in #7788
  • [ip-adapter] fix ip-adapter for StableDiffusionInstructPix2PixPipeline by @yiyixuxu in #7820
  • [Tests] reduce the model size in the audioldm fast test by @ariG23498 in #7833
  • Fix key error for dictionary with randomized order in convert_ldm_unet_checkpoint by @yunseongcho in #7680
  • Fix hanging pipeline fetching by @DN6 in #7837
  • Update download diff format tests by @DN6 in #7831
  • Update CI cache by @DN6 in #7832
  • move to new runners by @glegendre01 in #7839
  • Change GPU Runners by @glegendre01 in #7840
  • Update deps for pipe test fetcher by @DN6 in #7838
  • [Tests] reduce the model size in the blipdiffusion fast test by @ariG23498 in #7849
  • Respect resume_download deprecation by @Wauplin in #7843
  • Remove installing python again in container by @DN6 in #7852
  • Add Ascend NPU support for SDXL fine-tuning and fix the model saving bug when using DeepSpeed. by @HelloWorldBeginner in #7816
  • [docs] LCM by @stevhliu in #7829
  • Ci - change cache folder by @glegendre01 in #7867
  • [docs] Distilled inference by @stevhliu in #7834
  • Fix for "no lora weight found module" with some loras by @asomoza in #7875
  • 7879 - adjust documentation to use naruto dataset, since pokemon is now gated by @bghira in #7880
  • Modification on the PAG community pipeline (re) by @HyoungwonCho in #7876
  • Fix image upcasting by @standardAI in #7858
  • Check shape and remove deprecated APIs in scheduling_ddpm_flax.py by @ppham27 in #7703
  • [Pipeline] AnimateDiff SDXL by @a-r-r-o-w in #6721
  • fix offload test by @yiyixuxu in #7868
  • Allow users to save SDXL LoRA weights for only one text encoder by @dulacp in #7607
  • Remove dead code and fix f-string issue by @standardAI in #7720
  • Fix several imports by @standardAI in #7712
  • [Refactor] Better align from_single_file logic with from_pretrained by @DN6 in #7496
  • [Tests] fix things after #7013 by @sayakpaul in #7899
  • Set max parallel jobs on slow test runners by @DN6 in #7878
  • fix _optional_components in StableCascadeCombinedPipeline by @yiyixuxu in #7894
  • [scheduler] support custom timesteps and sigmas by @yiyixuxu in #7817
  • upgrade to python 3.10 in the Dockerfiles by @sayakpaul in #7893
  • add missing image processors to the docs by @sayakpaul in #7910
  • [Core] introduce videoprocessor. by @sayakpaul in #7776
  • #7535 Update FloatTensor type hints to Tensor by @vanakema in #7883
  • fix bugs when using deepspeed in sdxl by @HelloWorldBeginner in #7917
  • add custom sigmas and timesteps for StableDiffusionXLControlNet pipeline by @neuron-party in #7913
  • fix: Fixed a wrong link to supported python versions in contributing.md file by @Sai-Suraj-27 in #7638
  • [Core] fix offload behaviour when device_map is enabled. by @sayakpaul in #7919
  • Add Ascend NPU support for SDXL. by @HelloWorldBeginner in #7916
  • Official callbacks by @asomoza in #7761
  • fix AnimateDiff creation with a unet loaded with IP Adapter by @fabiorigano in #7791
  • [LoRA] Fix LoRA tests (side effects of RGB ordering) part ii by @sayakpaul in #7932
  • fix multicontrolnet save_pretrained logic for compatibility by @rebel-kblee in #7821
  • Update requirements.txt for text_to_image by @ktakita1011 in #7892
  • Bump transformers from 4.36.0 to 4.38.0 in /examples/research_projects/realfill by @dependabot[bot] in #7635
  • fix VAE loading issue in train_dreambooth by @bssrdf in #7632
  • Expansion proposal of diffusers-cli env by @standardAI in #7403
  • update to use hf-workflows for reporting the Docker build statuses by @sayakpaul in #7938
  • [Core] separate the loading utilities in modeling similar to pipelines. by @sayakpaul in #7943
  • Fix added_cond_kwargs when using IP-Adapter in StableDiffusionXLControlNetInpaintPipeline by @detkov in #7924
  • [Pipeline] Adding BoxDiff to community examples by @zjysteven in #7947
  • [tests] decorate StableDiffusion21PipelineSingleFileSlowTests with slow. by @sayakpaul in #7941
  • Adding VQGAN Training script by @isamu-isozaki in #5483
  • move to GH hosted M1 runner by @glegendre01 in #7949
  • [Workflows] add a workflow that can be manually triggered on a PR. by @sayakpaul in #7942
  • refactor: Refactored code by Merging isinstance calls by @Sai-Suraj-27 in #7710
  • Fix the text tokenizer name in logger warning of PixArt pipelines by @liang-hou in #7912
  • Fix AttributeError in train_lcm_distill_lora_sdxl_wds.py by @jainalphin in #7923
  • Consistent SDXL Controlnet callback tensor inputs by @asomoza in #7958
  • remove unsafe workflow. by @sayakpaul in #7967
  • [tests] fix Pixart Sigma tests by @sayakpaul in #7966
  • Fix typo in "attention" by @jacobmarks in #7977
  • Update pipeline_controlnet_inpaint_sd_xl.py by @detkov in #7983
  • [docs] add doc for PixArtSigmaPipeline by @lawrence-cj in #7857
  • Passing cross_attention_kwargs to StableDiffusionInstructPix2PixPipeline by @AlexeyZhuravlev in #7961
  • fix: Fixed few docstrings according to the Google Style Guide by @Sai-Suraj-27 in #7717
  • Make VAE compatible to torch.compile() by @rootonchair in #7984
  • [docs] VideoProcessor by @stevhliu in #7965
  • Use HF_TOKEN env var in CI by @Wauplin in #7993
  • fix: Attribute error in Logger object (logger.warning) by @AMohamedAakhil in #8183
  • Remove unnecessary single file tests for SD Cascade UNet by @DN6 in #7996
  • Fix resize issue in SVD pipeline with VideoProcessor by @DN6 in #8229
  • Create custom container for doc builder by @DN6 in #8263
  • Use freedesktop_os_release() in diffusers cli for Python >=3.10 by @DN6 in #8235
  • [Community Pipeline] FRESCO: Spatial-Temporal Correspondence for Zero-Shot Video Translation by @SingleZombie in #8239
  • [Chore] run the documentation workflow in a custom container. by @sayakpaul in #8266
  • Respect resume_download deprecation V2 by @Wauplin in #8267
  • Clean up from_single_file docs by @DN6 in #8268
  • sampling bug fix in diffusers tutorial "basic_training.md" by @yue-here in #8223
  • Fix a grammatical error in the raise messages by @standardAI in #8272
  • Fix CPU Offloading Usage & Typos by @standardAI in #8230
  • Add details about 1-stage implementation in I2VGen-XL docs by @dhaivat1729 in #8282
  • [Workflows] add a more secure way to run tests from a PR. by @sayakpaul in #7969
  • Add zip package to doc builder image by @DN6 in #8284
  • [Pipeline] Marigold depth and normals estimation by @toshas in #7847
  • Release: v0.28.0 by @sayakpaul (direct commit on v0.28.0-release)

Significant community contributions

The following contributors have made significant changes to the library over the last release:

  • @standardAI
    • Fix typos (#7411)
    • [IP-Adapter] Fix IP-Adapter Support and Refactor Callback for StableDiffusionPanoramaPipeline (#7262)
    • [Docs] Fix typos (#7451)
    • Fix Tiling in ConsistencyDecoderVAE (#7290)
    • Fix CPU offload in docstring (#7827)
    • Fix image upcasting (#7858)
    • Remove dead code and fix f-string issue (#7720)
    • Fix several imports (#7712)
    • Expansion proposal of diffusers-cli env (#7403)
    • Fix a grammatical error in the raise messages (#8272)
    • Fix CPU Offloading Usage & Typos (#8230)
  • @a-r-r-o-w
    • [refactor] Fix FreeInit behaviour (#7410)
    • [Pipeline] AnimateDiff SDXL (#6721)
  • @UmerHA
    • Fixed minor error in test_lora_layers_peft.py (#7394)
    • Skip test_lora_fuse_nan on mps (#7481)
    • Implements Blockwise lora (#7352)
    • Quick-Fix for #7352 block-lora (#7523)
    • Skip test_freeu_enabled on MPS (#7570)
    • Fixing implementation of ControlNet-XS (#6772)
  • @bghira
    • diffusers#7426 fix stable diffusion xl inference on MPS when dtypes shift unexpectedly due to pytorch bugs (#7446)
    • apple mps: training support for SDXL (ControlNet, LoRA, Dreambooth, T2I) (#7447)
    • 7529 do not disable autocast for cuda devices (#7530)
    • 7879 - adjust documentation to use naruto dataset, since pokemon is now gated (#7880)
  • @HyoungwonCho
    • Perturbed-Attention Guidance (#7512)
    • Modification on the PAG community pipeline (re) (#7876)
  • @haikmanukyan
    • add HD-Painter pipeline (#7520)
  • @fabiorigano
    • Multi-image masking for single IP Adapter (#7499)
    • Move IP Adapter Face ID to core (#7186)
    • Restore AttnProcessor2_0 in unload_ip_adapter (#7727)
    • [Docs] Update image masking and face id example (#7780)
    • fix AnimateDiff creation with a unet loaded with IP Adapter (#7791)
  • @kabachuha
    • Add (Scheduled) Pseudo-Huber Loss training scripts to research projects (#7527)
  • @lawrence-cj
    • PixArt-Sigma Implementation (#7654)
    • [docs] add doc for PixArtSigmaPipeline (#7857)
  • @vanakema
    • #7535 Update FloatTensor type hints to Tensor (#7883)
  • @zjysteven
    • [Pipeline] Adding BoxDiff to community examples (#7947)
  • @isamu-isozaki
    • Adding VQGAN Training script (#5483)
  • @SingleZombie
    • [Community Pipeline] FRESCO: Spatial-Temporal Correspondence for Zero-Shot Video Translation (#8239)
  • @toshas
    • [Pipeline] Marigold depth and normals estimation (#7847)

- Python
Published by sayakpaul almost 2 years ago

diffusers - v0.27.2: Fix scheduler `add_noise` 🐞, embeddings in StableCascade, `scale` when using LoRA

All commits

  • [scheduler] fix a bug in add_noise by @yiyixuxu in https://github.com/huggingface/diffusers/pull/7386
  • [LoRA] fix cross_attention_kwargs problems and tighten tests by @sayakpaul in https://github.com/huggingface/diffusers/pull/7388
  • Fix issue with prompt embeds and latents in SD Cascade Decoder with multiple image embeddings for a single prompt. by @DN6 in https://github.com/huggingface/diffusers/pull/7381

- Python
Published by sayakpaul almost 2 years ago

diffusers - v0.27.1: Clear `scale` argument confusion for LoRA

All commits

  • Release: v0.27.0 by @DN6 (direct commit on v0.27.1-patch)
  • [LoRA] pop the LoRA scale so that it doesn't get propagated to the weights by @sayakpaul in #7338
  • Release: 0.27.1-patch by @sayakpaul (direct commit on v0.27.1-patch)

- Python
Published by sayakpaul almost 2 years ago

diffusers - v0.27.0: Stable Cascade, Playground v2.5, EDM-style training, IP-Adapter image embeds, and more

Stable Cascade

We are adding support for a new text-to-image model building on Würstchen called Stable Cascade, which comes with a non-commercial license. The Stable Cascade line of pipelines differs from Stable Diffusion in that it is built upon three distinct models and allows for hierarchical compression of image latents, achieving remarkable outputs.

```python
from diffusers import StableCascadePriorPipeline, StableCascadeDecoderPipeline
import torch

prior = StableCascadePriorPipeline.from_pretrained(
    "stabilityai/stable-cascade-prior",
    torch_dtype=torch.bfloat16,
).to("cuda")

prompt = "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"
image_emb = prior(prompt=prompt).image_embeddings[0]

decoder = StableCascadeDecoderPipeline.from_pretrained(
    "stabilityai/stable-cascade",
    torch_dtype=torch.bfloat16,
).to("cuda")

image = decoder(image_embeddings=image_emb, prompt=prompt).images[0]
image
```

📜 Check out the docs here to know more about the model.

Note: You will need torch>=2.2.0 to use the torch.bfloat16 data type with the Stable Cascade pipeline.

Playground v2.5

PlaygroundAI released a new v2.5 model (playgroundai/playground-v2.5-1024px-aesthetic), which particularly excels at aesthetics. The model closely follows the architecture of Stable Diffusion XL, except for a few tweaks. This release comes with support for this model:

```python
from diffusers import DiffusionPipeline
import torch

pipe = DiffusionPipeline.from_pretrained(
    "playgroundai/playground-v2.5-1024px-aesthetic",
    torch_dtype=torch.float16,
    variant="fp16",
).to("cuda")

prompt = "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"
image = pipe(prompt=prompt, num_inference_steps=50, guidance_scale=3).images[0]
image
```

Loading from the original single-file checkpoint is also supported:

```python
from diffusers import StableDiffusionXLPipeline, EDMDPMSolverMultistepScheduler
import torch

url = "https://huggingface.co/playgroundai/playground-v2.5-1024px-aesthetic/blob/main/playground-v2.5-1024px-aesthetic.safetensors"
pipeline = StableDiffusionXLPipeline.from_single_file(url)
pipeline.to(device="cuda", dtype=torch.float16)

prompt = "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"
image = pipeline(prompt=prompt, guidance_scale=3.0).images[0]
image.save("playground_test_image.png")
```

You can also perform LoRA DreamBooth training with the playgroundai/playground-v2.5-1024px-aesthetic checkpoint:

```bash
accelerate launch train_dreambooth_lora_sdxl.py \
  --pretrained_model_name_or_path="playgroundai/playground-v2.5-1024px-aesthetic" \
  --instance_data_dir="dog" \
  --output_dir="dog-playground-lora" \
  --mixed_precision="fp16" \
  --instance_prompt="a photo of sks dog" \
  --resolution=1024 \
  --train_batch_size=1 \
  --gradient_accumulation_steps=4 \
  --learning_rate=1e-4 \
  --use_8bit_adam \
  --report_to="wandb" \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --max_train_steps=500 \
  --validation_prompt="A photo of sks dog in a bucket" \
  --validation_epochs=25 \
  --seed="0" \
  --push_to_hub
```

To know more, follow the instructions here.

EDM-style training support

EDM refers to the training and sampling techniques introduced in the following paper: Elucidating the Design Space of Diffusion-Based Generative Models. We have introduced support for training using the EDM formulation in our train_dreambooth_lora_sdxl.py script.

To train stabilityai/stable-diffusion-xl-base-1.0 using the EDM formulation, you just have to specify the --do_edm_style_training flag in your training command, and voila 🤗

If you’re interested in extending this formulation to other training scripts, we refer you to this PR.

New schedulers with the EDM formulation

To better support the Playground v2.5 model and EDM-style training in general, we are bringing support for EDMDPMSolverMultistepScheduler and EDMEulerScheduler. These support the EDM formulations of the DPMSolverMultistepScheduler and EulerDiscreteScheduler, respectively.
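For intuition, the EDM formulation spaces its noise levels by interpolating linearly in sigma ** (1/rho) space, following Karras et al. A minimal sketch of that schedule (the default sigma_min, sigma_max, and rho values follow the EDM paper and are assumptions, not necessarily the schedulers' exact defaults):

```python
def edm_sigmas(num_steps, sigma_min=0.002, sigma_max=80.0, rho=7.0):
    # Interpolate linearly in sigma**(1/rho) space, then raise back to the rho power.
    min_inv, max_inv = sigma_min ** (1 / rho), sigma_max ** (1 / rho)
    ramp = [i / (num_steps - 1) for i in range(num_steps)]
    return [(max_inv + r * (min_inv - max_inv)) ** rho for r in ramp]

sigmas = edm_sigmas(10)  # decreases from ~80.0 down to ~0.002
```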

Trajectory Consistency Distillation

Trajectory Consistency Distillation (TCD) enables a model to generate higher quality and more detailed images with fewer steps. Moreover, owing to the effective error mitigation during the distillation process, TCD demonstrates superior performance even under conditions of large inference steps. It was proposed in Trajectory Consistency Distillation.

This release comes with support for a TCDScheduler that enables this kind of fast sampling. Much like LCM-LoRA, TCD requires an additional adapter for the acceleration. The code snippet below shows example usage:

```python
import torch
from diffusers import StableDiffusionXLPipeline, TCDScheduler

device = "cuda"
base_model_id = "stabilityai/stable-diffusion-xl-base-1.0"
tcd_lora_id = "h1t/TCD-SDXL-LoRA"

pipe = StableDiffusionXLPipeline.from_pretrained(base_model_id, torch_dtype=torch.float16, variant="fp16").to(device)
pipe.scheduler = TCDScheduler.from_config(pipe.scheduler.config)

pipe.load_lora_weights(tcd_lora_id)
pipe.fuse_lora()

prompt = "Painting of the orange cat Otto von Garfield, Count of Bismarck-Schönhausen, Duke of Lauenburg, Minister-President of Prussia. Depicted wearing a Prussian Pickelhaube and eating his favorite meal - lasagna."

image = pipe(
    prompt=prompt,
    num_inference_steps=4,
    guidance_scale=0,
    eta=0.3,
    generator=torch.Generator(device=device).manual_seed(0),
).images[0]
```


📜 Check out the docs here to know more about TCD.

Many thanks to @mhh0318 for contributing the TCDScheduler in #7174 and the guide in #7259.

IP-Adapter image embeddings and masking

All the pipelines supporting IP-Adapter accept an ip_adapter_image_embeds argument. If you need to run the IP-Adapter multiple times with the same image, you can encode the image once, save the embedding to disk, and reuse it. This saves computation time and is especially useful when building UIs. Additionally, ComfyUI image embeddings for IP-Adapters are fully compatible with Diffusers and should work out of the box.
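The save-once-reuse-later pattern does not depend on diffusers at all; here is a minimal stdlib sketch of disk-caching embeddings keyed by image (all names are hypothetical, and a real cache would store actual tensors rather than lists):

```python
import os
import pickle
import tempfile

def cached_embeds(image_id, encode, cache_path):
    # Load the on-disk cache if present, encode only on a miss, then persist.
    cache = {}
    if os.path.exists(cache_path):
        with open(cache_path, "rb") as f:
            cache = pickle.load(f)
    if image_id not in cache:
        cache[image_id] = encode(image_id)
        with open(cache_path, "wb") as f:
            pickle.dump(cache, f)
    return cache[image_id]

calls = []
def fake_encode(image_id):
    calls.append(image_id)
    return [0.1, 0.2, 0.3]  # stand-in for a real image embedding

path = os.path.join(tempfile.mkdtemp(), "embeds.pkl")
first = cached_embeds("cat.png", fake_encode, path)
second = cached_embeds("cat.png", fake_encode, path)  # served from disk, no re-encode
```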

We have also introduced support for providing binary masks to specify which portion of the output image should be assigned to an IP-Adapter. For each input IP-Adapter image, a binary mask and an IP-Adapter must be provided. Thanks to @fabiorigano for contributing this feature through #6847.

📜 To know about the exact usage of both of the above, refer to our official guide.

We thank our community members, @fabiorigano, @asomoza, and @cubiq, for their guidance and input on these features.

Guide on merging LoRAs

Merging LoRAs can be a fun and creative way to create new and unique images. Diffusers provides merging support with the set_adapters method which concatenates the weights of the LoRAs to merge.

Now, Diffusers also supports the add_weighted_adapter method from the PEFT library, unlocking more efficient merging methods like TIES, DARE, linear, and even combinations of these, such as dare_ties.
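In the simplest (linear) case, merging boils down to a weighted sum of the adapters' weights. A toy sketch with plain lists standing in for tensors (not PEFT's actual implementation, which also handles pruning strategies like TIES and DARE):

```python
def merge_linear(lora_a, lora_b, weight_a=0.5, weight_b=0.5):
    # Weighted elementwise sum over matching parameter names.
    return {
        name: [weight_a * a + weight_b * b for a, b in zip(lora_a[name], lora_b[name])]
        for name in lora_a
    }

merged = merge_linear({"up.weight": [2.0, 4.0]}, {"up.weight": [0.0, 2.0]})
print(merged)  # {'up.weight': [1.0, 3.0]}
```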

📜 Take a look at the Merge LoRAs guide to learn more about merging in Diffusers.

LEDITS++

We are adding support for the real-image editing technique LEDITS++: Limitless Image Editing using Text-to-Image Models, a parameter-free method requiring no fine-tuning or optimization. To edit real images, the LEDITS++ pipelines first invert the image using the DPM-solver++ scheduler, which enables editing with as few as 20 total diffusion steps for inversion and inference combined. LEDITS++ guidance is defined so that it reflects both the direction of the edit (whether to push away from or towards the edit concept) and the strength of the effect. The guidance also includes a masking term focused on relevant image regions, which, especially for multiple edits, ensures that the corresponding guidance terms for each concept remain mostly isolated, limiting interference.

The code snippet below shows example usage:

```python
import torch
import requests
from io import BytesIO

from PIL import Image
from diffusers import LEditsPPPipelineStableDiffusionXL, AutoencoderKL

device = "cuda"
base_model_id = "stabilityai/stable-diffusion-xl-base-1.0"

vae = AutoencoderKL.from_pretrained("madebyollin/sdxl-vae-fp16-fix", torch_dtype=torch.float16)

pipe = LEditsPPPipelineStableDiffusionXL.from_pretrained(
    base_model_id, vae=vae, torch_dtype=torch.float16
).to(device)


def download_image(url):
    response = requests.get(url)
    return Image.open(BytesIO(response.content)).convert("RGB")


img_url = "https://www.aiml.informatik.tu-darmstadt.de/people/mbrack/tennis.jpg"
image = download_image(img_url)

# Invert the real image first; editing then happens in the inverted latent space.
_ = pipe.invert(
    image=image,
    num_inversion_steps=50,
    skip=0.2,
)

edited_image = pipe(
    editing_prompt=["tennis ball", "tomato"],
    reverse_editing_direction=[True, False],
    edit_guidance_scale=[5.0, 10.0],
    edit_threshold=[0.9, 0.85],
).images[0]
```


📜 Check out the docs here to learn more about LEDITS++.

Thanks to @manuelbrack for contributing this in #6074.

All commits

  • Fix flaky IP Adapter test by @DN6 in #6960
  • Move SDXL T2I Adapter lora test into PEFT workflow by @DN6 in #6965
  • Allow passing config_file argument to ControlNetModel when using from_single_file by @DN6 in #6959
  • [PEFT / docs] Add a note about torch.compile by @younesbelkada in #6864
  • [Core] Harmonize single file ckpt model loading by @sayakpaul in #6971
  • fix: controlnet inpaint single file. by @sayakpaul in #6975
  • [docs] IP-Adapter by @stevhliu in #6897
  • fix IPAdapter unload_ip_adapter test by @yiyixuxu in #6972
  • [advanced sdxl lora script] - fix #6967 bug when using prior preservation loss by @linoytsaban in #6968
  • [IP Adapters] feat: allow low_cpu_mem_usage in ip adapter loading by @sayakpaul in #6946
  • Fix diffusers import prompt2prompt by @ihkap11 in #6927
  • add: peft to the benchmark workflow by @sayakpaul in #6989
  • Fix procecss process by @co63oc in #6591
  • Standardize model card for textual inversion sdxl by @Stepheni12 in #6963
  • Update textual_inversion.py by @Bhavay-2001 in #6952
  • [docs] Fix callout by @stevhliu in #6998
  • [docs] Video generation by @stevhliu in #6701
  • start deprecation cycle for lora_attention_proc 👋 by @sayakpaul in #7007
  • Add documentation for strength parameter in Controlnet_img2img pipelines by @tlpss in #6951
  • Fixed typos in dosctrings of init() and in forward() of Unet3DConditionModel by @MK-2012 in #6663
  • [SVD] fix a bug when passing image as tensor by @yiyixuxu in #6999
  • Fix deprecation warning for torch.utils._pytree._register_pytree_node in PyTorch 2.2 by @zyinghua in #7008
  • [IP2P] Make text encoder truly optional in InstructPix2Pix by @sayakpaul in #6995
  • IP-Adapter attention masking by @fabiorigano in #6847
  • Fix Pixart Slow Tests by @DN6 in #6962
  • [from_single_file] pass torch_dtype to set_module_tensor_to_device by @yiyixuxu in #6994
  • [Refactor] FreeInit for AnimateDiff based pipelines by @DN6 in #6874
  • [Community Pipelines]Accelerate inference of stable diffusion xl (SDXL) by IPEX on CPU by @ustcuna in #6683
  • Add section on AnimateLCM to docs by @DN6 in #7024
  • IP-Adapter support for StableDiffusionXLControlNetInpaintPipeline by @rootonchair in #6941
  • Support IP Adapter weight loading in StableDiffusionXLControlNetInpaintPipeline by @tontan2545 in #7031
  • Fix alt text and image links in AnimateLCM docs by @DN6 in #7029
  • Update ControlNet Inpaint single file test by @DN6 in #7022
  • Fix load_model_dict_into_meta for ControlNet from_single_file by @DN6 in #7034
  • Remove disable_full_determinism from StableVideoDiffusion xformers test. by @DN6 in #7039
  • update header by @pravdomil in #6596
  • fix doc example for from_single_file by @yiyixuxu in #7015
  • Fix typos in text_to_image examples by @standardAI in #7050
  • Update checkpoint_merger pipeline to pass the "variant" argument by @lstein in #6670
  • allow explicit tokenizer & text_encoder in unload_textual_inversion by @H3zi in #6977
  • re-add unet refactor PR by @yiyixuxu in #7044
  • IPAdapterTesterMixin by @a-r-r-o-w in #6862
  • [Refactor] save_model_card function in text_to_image examples by @standardAI in #7051
  • Fix typos by @standardAI in #7068
  • Fix docstring of community pipeline imagic by @chongdashu in #7062
  • Change images to image. The variable images is not used anywhere by @bimsarapathiraja in #7074
  • fix: TensorRTStableDiffusionPipeline cannot set guidance_scale by @caiyueliang in #7065
  • [Refactor] StableDiffusionReferencePipeline inheriting from DiffusionPipeline by @standardAI in #7071
  • Fix truthy-ness condition in pipelines that use denoising_start by @a-r-r-o-w in #6912
  • Fix head_to_batch_dim for IPAdapterAttnProcessor by @fabiorigano in #7077
  • [docs] Minor updates by @stevhliu in #7063
  • Modularize Dreambooth LoRA SD inferencing during and after training by @rootonchair in #6654
  • Modularize Dreambooth LoRA SDXL inferencing during and after training by @rootonchair in #6655
  • [Community] Bug fix + Latest IP-Adapter impl. for AnimateDiff img2vid/controlnet by @a-r-r-o-w in #7086
  • Pass use_linear_projection parameter to mid block in UNetMotionModel by @Stepheni12 in #7035
  • Resize image before crop by @jiqing-feng in #7095
  • Small change to download in dance diffusion convert script by @DN6 in #7070
  • Fix EMA in train_text_to_image_sdxl.py by @standardAI in #7048
  • Make LoRACompatibleConv padding_mode work. by @jinghuan-Chen in #6031
  • [Easy] edit issue and PR templates by @sayakpaul in #7092
  • FIX [PEFT / Core] Copy the state dict when passing it to load_lora_weights by @younesbelkada in #7058
  • [Core] pass revision in the loading_kwargs. by @sayakpaul in #7019
  • [Examples] Multiple enhancements to the ControlNet training scripts by @sayakpaul in #7096
  • move to uv in the Dockerfiles. by @sayakpaul in #7094
  • Add tests to check configs when using single file loading by @DN6 in #7099
  • denormalize latents with the mean and std if available by @patil-suraj in #7111
  • [Dockerfile] remove uv from docker jax tpu by @sayakpaul in #7115
  • Add EDMEulerScheduler by @patil-suraj in #7109
  • add DPM scheduler with EDM formulation by @patil-suraj in #7120
  • [Docs] Fix typos by @standardAI in #7118
  • DPMSolverMultistep add rescale_betas_zero_snr by @Beinsezii in #7097
  • [Tests] make test steps dependent on certain things and general cleanup of the workflows by @sayakpaul in #7026
  • fix kwarg in the SDXL LoRA DreamBooth by @sayakpaul in #7124
  • [Diffusers CI] Switch slow test runners by @DN6 in #7123
  • [stalebot] don't close the issue if the stale label is removed by @yiyixuxu in #7106
  • refactor: move model helper function in pipeline to a mixin class by @ultranity in #6571
  • [docs] unet type hints by @a-r-r-o-w in #7134
  • use uv for installing stuff in the workflows. by @sayakpaul in #7116
  • limit documentation workflow runs for relevant changes. by @sayakpaul in #7125
  • add: support for notifying the maintainers about the docker ci status. by @sayakpaul in #7113
  • Fix setting fp16 dtype in AnimateDiff convert script. by @DN6 in #7127
  • [Docs] Fix typos by @standardAI in #7131
  • [ip-adapter] refactor prepare_ip_adapter_image_embeds and skip load image_encoder by @yiyixuxu in #7016
  • [CI] fix path filtering in the documentation workflows by @sayakpaul in #7153
  • [Urgent][Docker CI] pin uv version for now and a minor change in the Slack notification by @sayakpaul in #7155
  • Fix LCM benchmark test by @sayakpaul in #7158
  • [CI] Remove max parallel flag on slow test runners by @DN6 in #7162
  • Fix vae_encodings_fn hash in train_text_to_image_sdxl.py by @lhoestq in #7171
  • fix: loading problem for sdxl lora dreambooth by @sayakpaul in #7166
  • Map speedup by @kopyl in #6745
  • [stalebot] fix a bug by @yiyixuxu in #7156
  • Support EDM-style training in DreamBooth LoRA SDXL script by @sayakpaul in #7126
  • Fix PixArt 256px inference by @lawrence-cj in #6789
  • [ip-adapter] fix problem using embeds with the plus version of ip adapters by @asomoza in #7189
  • feat: add ip adapter benchmark by @sayakpaul in #6936
  • [Docs] more elaborate example for peft torch.compile by @sayakpaul in #7161
  • adding callback_on_step_end for StableDiffusionLDM3DPipeline by @rootonchair in #7149
  • Update requirements.txt to remove huggingface-cli by @sayakpaul in #7202
  • [advanced dreambooth lora sdxl] add DoRA training feature by @linoytsaban in #7072
  • FIx torch and cuda version in ONNX tests by @DN6 in #7164
  • [training scripts] add tags of diffusers-training by @linoytsaban in #7206
  • fix a bug in from_config by @yiyixuxu in #7192
  • Fix: UNet2DModel::init type hints; fixes issue #4806 by @fpgaminer in #7175
  • Fix typos by @standardAI in #7181
  • Enable PyTorch's FakeTensorMode for EulerDiscreteScheduler scheduler by @thiagocrepaldi in #7151
  • [docs] Improve SVD pipeline docs by @a-r-r-o-w in #7087
  • [Docs] Update callback.md code example by @rootonchair in #7150
  • [Core] errors should be caught as soon as possible. by @sayakpaul in #7203
  • [Community] PromptDiffusion Pipeline by @iczaw in #6752
  • add TCD Scheduler by @mhh0318 in #7174
  • SDXL Turbo support and example launch by @bram-w in #6473
  • [bug] Fix float/int guidance scale not working in StableVideoDiffusionPipeline by @JinayJain in #7143
  • [Pipeline] Wuerstchen v3 aka Stable Cascade pipeline by @kashif in #6487
  • Update train_dreambooth_lora_sdxl_advanced.py by @landmann in #7227
  • [Core] move out the utilities from pipeline_utils.py by @sayakpaul in #7234
  • Refactor Prompt2Prompt: Inherit from DiffusionPipeline by @ihkap11 in #7211
  • add DoRA training feature to sdxl dreambooth lora script by @linoytsaban in #7235
  • fix: remove duplicated code in TemporalBasicTransformerBlock. by @AsakusaRinne in #7212
  • [Examples] fix: prior preservation setting in DreamBooth LoRA SDXL script. by @sayakpaul in #7242
  • fix: support for loading playground v2.5 single file checkpoint. by @sayakpaul in #7230
  • Raise an error when trying to use SD Cascade Decoder with dtype bfloat16 and torch < 2.2 by @DN6 in #7244
  • Remove the line. Using it create wrong output by @bimsarapathiraja in #7075
  • [docs] Merge LoRAs by @stevhliu in #7213
  • use self.device by @pravdomil in #6595
  • [docs] Community tips by @stevhliu in #7137
  • [Core] throw error when patch inputs and layernorm are provided for Transformers2D by @sayakpaul in #7200
  • [Tests] fix: VAE tiling tests when setting the right device by @sayakpaul in #7246
  • [Utils] Improve " # Copied from ..." statements in the pipelines by @sayakpaul in #6917
  • [Easy] fix: save_model_card utility of the DreamBooth SDXL LoRA script by @sayakpaul in #7258
  • Make mid block optional for flax UNet by @mar-muel in #7083
  • Solve missing clip_sample implementation in FlaxDDIMScheduler. by @hi-sushanta in #7017
  • [Tests] fix config checking tests by @sayakpaul in #7247
  • [docs] IP-Adapter image embedding by @stevhliu in #7226
  • Adds denoising_end parameter to ControlNetPipeline for SDXL by @UmerHA in #6175
  • Add npu support by @MengqingCao in #7144
  • [Community Pipeline] Skip Marigold depth_colored with color_map=None by @qqii in #7170
  • update the signature of fromsinglefile by @yiyixuxu in #7216
  • [UNetSpatioTemporalCondition] fix default num_attention_heads in unet_spatio_temporal_condition by @Wang-Xiaodong1899 in #7205
  • [docs/nits] Fix return values based on return_dict and minor doc updates by @a-r-r-o-w in #7105
  • [Chore] remove tf mention by @sayakpaul in #7245
  • Fix gmflow_dir by @pravdomil in #6583
  • Support latents_mean and latents_std by @haofanwang in #7132
  • Inline InputPadder by @pravdomil in #6582
  • [Dockerfiles] add: a workflow to check if docker containers can be built in case of modifications by @sayakpaul in #7129
  • instruct pix2pix pipeline: remove sigma scaling when computing classifier free guidance by @erliding in #7006
  • Change export_to_video default by @DN6 in #6990
  • [Chore] switch to logger.warning by @sayakpaul in #7289
  • [LoRA] use the PyTorch classes wherever needed and start deprecation cycles by @sayakpaul in #7204
  • Add single file support for Stable Cascade by @DN6 in #7274
  • Fix passing pooled prompt embeds to Cascade Decoder and Combined Pipeline by @DN6 in #7287
  • Fix loading Img2Img refiner components in from_single_file by @DN6 in #7282
  • [Chore] clean residue from copy-pasting in the UNet single file loader by @sayakpaul in #7295
  • Update Cascade documentation by @DN6 in #7257
  • Update Stable Cascade Conversion Scripts by @DN6 in #7271
  • [Pipeline] Add LEDITS++ pipelines by @manuelbrack in #6074
  • [PyPI publishing] feat: automate the process of pypi publication to some extent. by @sayakpaul in #7270
  • add: support for notifying maintainers about the nightly test status by @sayakpaul in #7117
  • Fix Wrong Text-encoder Grad Setting in Custom_Diffusion Training by @Rbrq03 in #7302
  • Add Intro page of TCD by @mhh0318 in #7259
  • Fix typos in UNet2DConditionModel documentation by @alexanderbonnet in #7291
  • Change step_offset scheduler docstrings by @Beinsezii in #7128
  • update get_order_list if statement by @kghamilton89 in #7309
  • add: pytest log installation by @sayakpaul in #7313
  • [Tests] Fix incorrect constant in VAE scaling test. by @DN6 in #7301
  • log loss per image by @noskill in #7278
  • add edm schedulers in doc by @patil-suraj in #7319
  • [Advanced DreamBooth LoRA SDXL] Support EDM-style training (follow up of #7126) by @linoytsaban in #7182
  • Update Cascade Tests by @DN6 in #7324
  • Release: v0.27.0 by @DN6 (direct commit on v0.27.0-release)

Significant community contributions

The following contributors have made significant changes to the library over the last release:

  • @ihkap11
    • Fix diffusers import prompt2prompt (#6927)
    • Refactor Prompt2Prompt: Inherit from DiffusionPipeline (#7211)
  • @ustcuna
    • [Community Pipelines]Accelerate inference of stable diffusion xl (SDXL) by IPEX on CPU (#6683)
  • @rootonchair
    • IP-Adapter support for StableDiffusionXLControlNetInpaintPipeline (#6941)
    • Modularize Dreambooth LoRA SD inferencing during and after training (#6654)
    • Modularize Dreambooth LoRA SDXL inferencing during and after training (#6655)
    • adding callback_on_step_end for StableDiffusionLDM3DPipeline (#7149)
    • [Docs] Update callback.md code example (#7150)
  • @standardAI
    • Fix typos in text_to_image examples (#7050)
    • [Refactor] save_model_card function in text_to_image examples (#7051)
    • Fix typos (#7068)
    • [Refactor] StableDiffusionReferencePipeline inheriting from DiffusionPipeline (#7071)
    • Fix EMA in train_text_to_image_sdxl.py (#7048)
    • [Docs] Fix typos (#7118)
    • [Docs] Fix typos (#7131)
    • Fix typos (#7181)
  • @a-r-r-o-w
    • IPAdapterTesterMixin (#6862)
    • Fix truthy-ness condition in pipelines that use denoising_start (#6912)
    • [Community] Bug fix + Latest IP-Adapter impl. for AnimateDiff img2vid/controlnet (#7086)
    • [docs] unet type hints (#7134)
    • [docs] Improve SVD pipeline docs (#7087)
    • [docs/nits] Fix return values based on return_dict and minor doc updates (#7105)
  • @ultranity
    • refactor: move model helper function in pipeline to a mixin class (#6571)
  • @iczaw
    • [Community] PromptDiffusion Pipeline (#6752)
  • @mhh0318
    • add TCD Scheduler (#7174)
    • Add Intro page of TCD (#7259)
  • @manuelbrack
    • [Pipeline] Add LEDITS++ pipelines (#6074)

- Python
Published by sayakpaul almost 2 years ago

diffusers - v0.26.3: Patch release to fix DPMSolverSinglestepScheduler and configuring VAE from single file mixin

All commits

  • Fix configuring VAE from single file mixin by @DN6 in #6950
  • [DPMSolverSinglestepScheduler] correct get_order_list for solver_order=2 and lower_order_final=True by @yiyixuxu in #6953

- Python
Published by yiyixuxu about 2 years ago

diffusers - v0.26.2: Patch fix for adding `self.use_ada_layer_norm_*` params back to `BasicTransformerBlock`

In v0.26.0, we introduced a bug 🐛 in the BasicTransformerBlock by removing some boolean flags. This caused many popular libraries, such as tomesd, to break. We have fixed that in this release. Thanks to @vladmandic for bringing this to our attention.

All commits

  • add self.use_ada_layer_norm_* params back to BasicTransformerBlock by @yiyixuxu in #6841

- Python
Published by sayakpaul about 2 years ago

diffusers - v0.26.1: Patch release to fix `torchvision` dependency

In the v0.26.0 release, we slipped in the torchvision library as a required library, which shouldn't have been the case. This is now fixed.

All commits

  • add is_torchvision_available by @yiyixuxu in #6800

- Python
Published by sayakpaul about 2 years ago

diffusers - v0.26.0: New video pipelines, single-file checkpoint revamp, multi IP-Adapter inference with multiple images

This new release comes with two new video pipelines, a more unified and consistent experience for single-file checkpoint loading, support for inference with multiple IP-Adapters and multiple reference images, and more.

I2VGenXL

I2VGenXL is an image-to-video pipeline, proposed in I2VGen-XL: High-Quality Image-to-Video Synthesis via Cascaded Diffusion Models.

```python
import torch

from diffusers import I2VGenXLPipeline
from diffusers.utils import export_to_gif, load_image

repo_id = "ali-vilab/i2vgen-xl"
pipeline = I2VGenXLPipeline.from_pretrained(repo_id, torch_dtype=torch.float16).to("cuda")
pipeline.enable_model_cpu_offload()

image_url = "https://huggingface.co/datasets/diffusers/docs-images/resolve/main/i2vgen_xl_images/img_0001.jpg"
image = load_image(image_url).convert("RGB")
prompt = "A green frog floats on the surface of the water on green lotus leaves, with several pink lotus flowers, in a Chinese painting style."
negative_prompt = "Distorted, discontinuous, Ugly, blurry, low resolution, motionless, static, disfigured, disconnected limbs, Ugly faces, incomplete arms"
generator = torch.manual_seed(8888)

frames = pipeline(
    prompt=prompt,
    image=image,
    num_inference_steps=50,
    negative_prompt=negative_prompt,
    generator=generator,
).frames
export_to_gif(frames[0], "i2v.gif")
```


📜 Check out the docs here.

PIA

PIA is a Personalized Image Animator, that aligns with condition images, controls motion by text, and is compatible with various T2I models without specific tuning. PIA uses a base T2I model with temporal alignment layers for image animation. A key component of PIA is the condition module, which transfers appearance information for individual frame synthesis in the latent space, thus allowing a stronger focus on motion alignment. PIA was introduced in PIA: Your Personalized Image Animator via Plug-and-Play Modules in Text-to-Image Models.

```python
import torch

from diffusers import EulerDiscreteScheduler, MotionAdapter, PIAPipeline
from diffusers.utils import export_to_gif, load_image

adapter = MotionAdapter.from_pretrained("openmmlab/PIA-condition-adapter")
pipe = PIAPipeline.from_pretrained(
    "SG161222/Realistic_Vision_V6.0_B1_noVAE", motion_adapter=adapter, torch_dtype=torch.float16
)

pipe.scheduler = EulerDiscreteScheduler.from_config(pipe.scheduler.config)
pipe.enable_model_cpu_offload()
pipe.enable_vae_slicing()

image = load_image(
    "https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/pix2pix/cat_6.png?download=true"
)
image = image.resize((512, 512))
prompt = "cat in a field"
negative_prompt = "wrong white balance, dark, sketches, worst quality, low quality"

generator = torch.Generator("cpu").manual_seed(0)
output = pipe(image=image, prompt=prompt, generator=generator)
frames = output.frames[0]
export_to_gif(frames, "pia-animation.gif")
```


📜 Check out the docs here.

Multiple IP-Adapters + Multiple reference images support (“Instant LoRA” Feature)

IP-Adapters are becoming quite popular, so we have added support for performing inference with multiple IP-Adapters and multiple reference images! Thanks to @asomoza for their help. Get started with the code below:

```python
import torch

from diffusers import AutoPipelineForText2Image, DDIMScheduler
from diffusers.utils import load_image
from transformers import CLIPVisionModelWithProjection

image_encoder = CLIPVisionModelWithProjection.from_pretrained(
    "h94/IP-Adapter",
    subfolder="models/image_encoder",
    torch_dtype=torch.float16,
)

pipeline = AutoPipelineForText2Image.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
    image_encoder=image_encoder,
)
pipeline.scheduler = DDIMScheduler.from_config(pipeline.scheduler.config)

# Load two IP-Adapters: one for style, one for the face.
pipeline.load_ip_adapter(
    "h94/IP-Adapter",
    subfolder="sdxl_models",
    weight_name=["ip-adapter-plus_sdxl_vit-h.safetensors", "ip-adapter-plus-face_sdxl_vit-h.safetensors"],
)
pipeline.set_ip_adapter_scale([0.7, 0.3])

pipeline.enable_model_cpu_offload()

face_image = load_image("https://huggingface.co/datasets/YiYiXu/testing-images/resolve/main/women_input.png")

style_folder = "https://huggingface.co/datasets/YiYiXu/testing-images/resolve/main/style_ziggy"
style_images = [load_image(f"{style_folder}/img{i}.png") for i in range(10)]

generator = torch.Generator(device="cpu").manual_seed(0)

image = pipeline(
    prompt="wonderwoman",
    ip_adapter_image=[style_images, face_image],
    negative_prompt="monochrome, lowres, bad anatomy, worst quality, low quality",
    num_inference_steps=50,
    generator=generator,
).images[0]
```

(Images: the reference style images, the reference face image, and the output image.)

📜 Check out the docs here.

Single-file checkpoint loading

The from_single_file() utility has been refactored for better readability and to follow semantics similar to from_pretrained(). Support for loading single-file checkpoints and configs from URLs has also been added.

DPM scheduler fix

We introduced a fix for the DPM schedulers, so you can now use them with SDXL to generate high-quality images in fewer steps than with the Euler scheduler.

Apart from these, we have done a myriad of refactoring to improve the library design and will continue to do so in the coming days.

All commits

  • [docs] Fix missing API function by @stevhliu in #6604
  • Fix failing tests due to Posix Path by @DN6 in #6627
  • Update convert_from_ckpt.py / read checkpoint config yaml contents by @spezialspezial in #6633
  • [Community] Experimental AnimateDiff Image to Video (open to improvements) by @a-r-r-o-w in #6509
  • refactor: extract init/forward function in UNet2DConditionModel by @ultranity in #6478
  • Modularize InstructPix2Pix SDXL inferencing during and after training in examples by @sang-k in #6569
  • Fixed the bug related to saving DeepSpeed models. by @HelloWorldBeginner in #6628
  • fix DPM Scheduler with use_karras_sigmas option by @yiyixuxu in #6477
  • fix SDXL-kdiffusion tests by @yiyixuxu in #6647
  • add padding_mask_crop to all inpaint pipelines by @rootonchair in #6360
  • add Sa-Solver by @lawrence-cj in #5975
  • Add tearDown method to LoRA tests. by @DN6 in #6660
  • [Diffusion DPO] apply fixes from #6547 by @sayakpaul in #6668
  • Update README by @standardAI in #6669
  • [Big refactor] move unets to unets module 🦋 by @sayakpaul in #6630
  • Standardise outputs for video pipelines by @DN6 in #6626
  • fix dpm related slow test failure by @yiyixuxu in #6680
  • [Tests] Test for passing local config file to from_single_file() by @sayakpaul in #6638
  • [Refactor] Update from single file by @DN6 in #6428
  • [WIP][Community Pipeline] InstaFlow! One-Step Stable Diffusion with Rectified Flow by @ayushtues in #6057
  • Add InstantID Pipeline by @haofanwang in #6673
  • [Docs] update: tutorials ja | AUTOPIPELINE.md by @YasunaCoffee in #6629
  • [Fix bugs] pipeline_controlnet_sd_xl.py by @haofanwang in #6653
  • SD 1.5 Support For Advanced Lora Training (train_dreambooth_lora_sdxl_advanced.py) by @brandostrong in #6449
  • AnimateDiff Video to Video by @a-r-r-o-w in #6328
  • [docs] UViT2D by @stevhliu in #6643
  • Correct sigmas cpu settings by @patrickvonplaten in #6708
  • [docs] AnimateDiff Video-to-Video by @a-r-r-o-w in #6712
  • fix community README by @a-r-r-o-w in #6645
  • fix custom diffusion training with concept list by @AIshutin in #6710
  • Add IP Adapters to slow tests by @DN6 in #6714
  • Move tests for SD inference variant pipelines into their own modules by @DN6 in #6707
  • Add Community Example Consistency Training Script by @dg845 in #6717
  • Add UFOGenScheduler to Community Examples by @dg845 in #6650
  • [Hub] feat: explicitly tag to diffusers when using push_to_hub by @sayakpaul in #6678
  • Correct SNR weighted loss in v-prediction case by only adding 1 to SNR on the denominator by @thuliu-yt16 in #6307
  • changed to posix unet by @gzguevara in #6719
  • Change os.path to pathlib Path by @Stepheni12 in #6737
  • correct hflip arg by @sayakpaul in #6743
  • Add unload_textual_inversion method by @fabiorigano in #6656
  • [Core] move transformer scripts to transformers modules by @sayakpaul in #6747
  • Update lora.md with a more accurate description of rank by @xhedit in #6724
  • Fix mixed precision fine-tuning for text-to-image-lora-sdxl example. by @sajadn in #6751
  • udpate ip-adapter slow tests by @yiyixuxu in #6760
  • Update export to video to support new tensor_to_vid function in video pipelines by @DN6 in #6715
  • [DDPMScheduler] Load alpha_cumprod to device to avoid redundant data movement. by @woshiyyya in #6704
  • Fix bug in ResnetBlock2D.forward where LoRA Scale gets Overwritten by @dg845 in #6736
  • add note about serialization by @sayakpaul in #6764
  • Update traindiffusiondpo.py by @viettmab in #6754
  • Pin torch < 2.2.0 in test runners by @DN6 in #6780
  • [Kandinsky tests] add is_flaky to test_model_cpu_offload_forward_pass by @sayakpaul in #6762
  • add ipo, hinge and cpo loss to dpo trainer by @kashif in #6788
  • Fix setting scaling factor in VAE config by @DN6 in #6779
  • Add PIA Model/Pipeline by @DN6 in #6698
  • [docs] Add missing parameter by @stevhliu in #6775
  • [IP-Adapter] Support multiple IP-Adapters by @yiyixuxu in #6573
  • [sdxl k-diffusion pipeline]move sigma to device by @yiyixuxu in #6757
  • [Feat] add I2VGenXL for image-to-video generation by @sayakpaul in #6665
  • Release: v0.26.0 by @ (direct commit on v0.26.0-release)
  • fix torchvision import by @patrickvonplaten in #6796

Significant community contributions

The following contributors have made significant changes to the library over the last release:

  • @a-r-r-o-w
    • [Community] Experimental AnimateDiff Image to Video (open to improvements) (#6509)
    • AnimateDiff Video to Video (#6328)
    • [docs] AnimateDiff Video-to-Video (#6712)
    • fix community README (#6645)
  • @ultranity
    • refactor: extract init/forward function in UNet2DConditionModel (#6478)
  • @lawrence-cj
    • add Sa-Solver (#5975)
  • @ayushtues
    • [WIP][Community Pipeline] InstaFlow! One-Step Stable Diffusion with Rectified Flow (#6057)
  • @haofanwang
    • Add InstantID Pipeline (#6673)
    • [Fix bugs] pipeline_controlnet_sd_xl.py (#6653)
  • @brandostrong
    • SD 1.5 Support For Advanced Lora Training (train_dreambooth_lora_sdxl_advanced.py) (#6449)
  • @dg845
    • Add Community Example Consistency Training Script (#6717)
    • Add UFOGenScheduler to Community Examples (#6650)
    • Fix bug in ResnetBlock2D.forward where LoRA Scale gets Overwritten (#6736)

- Python
Published by sayakpaul about 2 years ago

diffusers - Patch release

Make sure diffusers can correctly be used in offline mode again: https://github.com/huggingface/diffusers/pull/1767#issuecomment-1896194917

  • Respect offline mode when loading pipeline by @Wauplin in #6456
  • Fix offline mode import by @Wauplin in #6467

- Python
Published by patrickvonplaten about 2 years ago

diffusers - v0.25.0: aMUSEd, faster SDXL, interruptable pipelines

aMUSEd

collage_full

aMUSEd is a lightweight text-to-image model based on the MUSE architecture. It is particularly useful in applications that require a lightweight and fast model, such as generating many images quickly at once. aMUSEd is currently a research release.

aMUSEd is a VQVAE token-based transformer that can generate an image in fewer forward passes than many diffusion models. In contrast to MUSE, it uses the smaller text encoder CLIP-L/14 instead of T5-XXL. Due to its small parameter count and few-forward-pass generation process, aMUSEd can generate many images quickly. This benefit is particularly pronounced at larger batch sizes.

Text-to-image generation

```python
import torch

from diffusers import AmusedPipeline

pipe = AmusedPipeline.from_pretrained(
    "amused/amused-512", variant="fp16", torch_dtype=torch.float16
)
pipe = pipe.to("cuda")

prompt = "cowboy"
image = pipe(prompt, generator=torch.manual_seed(8)).images[0]
image.save("text2image_512.png")
```

Image-to-image generation

```python
import torch

from diffusers import AmusedImg2ImgPipeline
from diffusers.utils import load_image

pipe = AmusedImg2ImgPipeline.from_pretrained(
    "amused/amused-512", variant="fp16", torch_dtype=torch.float16
)
pipe = pipe.to("cuda")

prompt = "apple watercolor"
input_image = (
    load_image(
        "https://huggingface.co/amused/amused-512/resolve/main/assets/image2image_256_orig.png"
    )
    .resize((512, 512))
    .convert("RGB")
)

image = pipe(prompt, input_image, strength=0.7, generator=torch.manual_seed(3)).images[0]
image.save("image2image_512.png")
```

Inpainting

```python
import torch

from diffusers import AmusedInpaintPipeline
from diffusers.utils import load_image

pipe = AmusedInpaintPipeline.from_pretrained(
    "amused/amused-512", variant="fp16", torch_dtype=torch.float16
)
pipe = pipe.to("cuda")

prompt = "a man with glasses"
input_image = (
    load_image(
        "https://huggingface.co/amused/amused-512/resolve/main/assets/inpainting_256_orig.png"
    )
    .resize((512, 512))
    .convert("RGB")
)
mask = (
    load_image(
        "https://huggingface.co/amused/amused-512/resolve/main/assets/inpainting_256_mask.png"
    )
    .resize((512, 512))
    .convert("L")
)

image = pipe(prompt, input_image, mask, generator=torch.manual_seed(3)).images[0]
image.save("inpainting_512.png")
```

📜 Docs: https://huggingface.co/docs/diffusers/main/en/api/pipelines/amused

🛠️ Models:

Faster SDXL

We’re excited to present an array of optimization techniques that can be used to accelerate the inference latency of text-to-image diffusion models. All of these can be done in native PyTorch without requiring additional C++ code.

SDXL_Batch_Size__1_Steps__30

These techniques are not specific to Stable Diffusion XL (SDXL) and can be used to improve other text-to-image diffusion models too. Starting from default fp32 precision, we can achieve a 3x speed improvement by applying different PyTorch optimization techniques. We encourage you to check out the detailed docs provided below.

Note: Compared to the default way most people use Diffusers (fp16 + SDPA), applying all the optimizations explained in the blog below yields a 30% speed-up.

📜 Docs: https://huggingface.co/docs/diffusers/main/en/tutorials/fast_diffusion 🌠 PyTorch blog post: https://pytorch.org/blog/accelerating-generative-ai-3/

Interruptible pipelines

Interrupting the diffusion process is particularly useful when building UIs that work with Diffusers because it allows users to stop the generation process if they're unhappy with the intermediate results. You can incorporate this into your pipeline with a callback.

This callback function should take the following arguments: pipe, i, t, and callback_kwargs (this must be returned). Set the pipeline's _interrupt attribute to True to stop the diffusion process after a certain number of steps. You are also free to implement your own custom stopping logic inside the callback.

In this example, the diffusion process is stopped after 10 steps even though num_inference_steps is set to 50.

```python
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
pipe.enable_model_cpu_offload()
num_inference_steps = 50


def interrupt_callback(pipe, i, t, callback_kwargs):
    stop_idx = 10
    if i == stop_idx:
        pipe._interrupt = True
    return callback_kwargs


pipe(
    "A photo of a cat",
    num_inference_steps=num_inference_steps,
    callback_on_step_end=interrupt_callback,
)
```

📜 Docs: https://huggingface.co/docs/diffusers/main/en/using-diffusers/callback

peft in our LoRA training examples

We incorporated peft in all the officially supported training examples concerning LoRA. This greatly simplifies the code and improves readability. LoRA training has never been easier, thanks to peft!

More memory-friendly version of LCM LoRA SDXL training

We incorporated best practices from peft to make LCM LoRA training for SDXL more memory-friendly. As such, you don't have to initialize two UNets (teacher and student) anymore. This version also integrates with the datasets library for quick experimentation. Check out this section for more details.

All commits

  • [docs] Fix video link by @stevhliu in #5986
  • Fix LLMGroundedDiffusionPipeline super class arguments by @KristianMischke in #5993
  • Remove a duplicated line? by @sweetcocoa in #6010
  • [examples/advanced_diffusion_training] bug fixes and improvements for LoRA Dreambooth SDXL advanced training script by @linoytsaban in #5935
  • [advanced_dreambooth_lora_sdxl_training_script] readme fix by @linoytsaban in #6019
  • [docs] Fix SVD video by @stevhliu in #6004
  • [Easy] minor edits to setup.py by @sayakpaul in #5996
  • [From Single File] Allow Text Encoder to be passed by @patrickvonplaten in #6020
  • [Community Pipeline] Regional Prompting Pipeline by @hako-mikan in #6015
  • [logging] Fix assertion bug by @standardAI in #6012
  • [Docs] Update a link by @standardAI in #6014
  • added attention_head_dim, attention_type, resolution_idx by @charchit7 in #6011
  • fix style by @patrickvonplaten (direct commit on v0.25.0)
  • [Kandinsky 3.0] Follow-up TODOs by @yiyixuxu in #5944
  • [schedulers] create self.sigmas during init by @yiyixuxu in #6006
  • Post Release: v0.24.0 by @patrickvonplaten in #5985
  • LLMGroundedDiffusionPipeline: inherit from DiffusionPipeline and fix peft by @TonyLianLong in #6023
  • adapt PixArtAlphaPipeline for pixart-lcm model by @lawrence-cj in #5974
  • [PixArt Tests] remove fast tests from slow suite by @sayakpaul in #5945
  • [LoRA serialization] fix: duplicate unet prefix problem. by @sayakpaul in #5991
  • [advanced dreambooth lora sdxl training script] improve help tags by @linoytsaban in #6035
  • fix StableDiffusionTensorRT super args error by @gujingit in #6009
  • Update value_guided_sampling.py by @Parth38 in #6027
  • Update Tests Fetcher by @DN6 in #5950
  • Add variant argument to dreambooth lora sdxl advanced by @levi in #6021
  • [Feature] Support IP-Adapter Plus by @okotaku in #5915
  • [Community Pipeline] DemoFusion: Democratising High-Resolution Image Generation With No $$$ by @RuoyiDu in #6022
  • [advanced dreambooth lora training script][bugfix] change token_abstraction type to str by @linoytsaban in #6040
  • [docs] Add Kandinsky 3 by @stevhliu in #5988
  • [docs] #Copied from mechanism by @stevhliu in #6007
  • Move kandinsky convert script by @DN6 in #6047
  • Pin Ruff Version by @DN6 in #6059
  • Ldm unet convert fix by @DN6 in #6038
  • Fix demofusion by @radames in #6049
  • [From single file] remove depr warning by @patrickvonplaten in #6043
  • [advanced_dreambooth_lora_sdxl_training_script] save embeddings locally fix by @apolinario in #6058
  • Device agnostic testing by @arsalanu in #5612
  • [feat] allow SDXL pipeline to run with fused QKV projections by @sayakpaul in #6030
  • fix by @DN6 (direct commit on v0.25.0)
  • Use CC12M for LCM WDS training example by @pcuenca in #5908
  • Disable Tests Fetcher by @DN6 in #6060
  • [Advanced Diffusion Training] Cache latents to avoid VAE passes for every training step by @apolinario in #6076
  • [Euler Discrete] Fix sigma by @patrickvonplaten in #6078
  • Harmonize HF environment variables + deprecate use_auth_token by @Wauplin in #6066
  • [docs] SDXL Turbo by @stevhliu in #6065
  • Add ControlNet-XS support by @UmerHA in #5827
  • Fix typing inconsistency in Euler discrete scheduler by @iabaldwin in #6052
  • [PEFT] Adapt example scripts to use PEFT by @younesbelkada in #5388
  • Fix clearing backend cache from device agnostic testing by @DN6 in #6075
  • [Community] AnimateDiff + Controlnet Pipeline by @a-r-r-o-w in #5928
  • EulerDiscreteScheduler add rescale_betas_zero_snr by @Beinsezii in #6024
  • Add support for IPAdapterFull by @fabiorigano in #5911
  • Fix a bug in add_noise function by @yiyixuxu in #6085
  • [Advanced Diffusion Script] Add Widget default text by @apolinario in #6100
  • [Advanced Training Script] Fix pipe example by @apolinario in #6106
  • IP-Adapter for StableDiffusionControlNetImg2ImgPipeline by @charchit7 in #5901
  • IP adapter support for most pipelines by @a-r-r-o-w in #5900
  • Correct type annotation for VaeImageProcessor.numpy_to_pil by @edwardwli in #6111
  • [Docs] Fix typos by @standardAI in #6122
  • [feat: Benchmarking Workflow] add stuff for a benchmarking workflow by @sayakpaul in #5839
  • [Community] Add SDE Drag pipeline by @Monohydroxides in #6105
  • [docs] IP-Adapter API doc by @stevhliu in #6140
  • Add missing subclass docs, Fix broken example in SD_safe by @a-r-r-o-w in #6116
  • [advanced dreambooth lora sdxl training script] load pipeline for inference only if validation prompt is used by @linoytsaban in #6171
  • [docs] Add missing \ in lora.md by @pierd in #6174
  • [Sigmas] Keep sigmas on CPU by @patrickvonplaten in #6173
  • LoRA test fixes by @DN6 in #6163
  • Add PEFT to training deps by @DN6 in #6148
  • Clean Up Comments in LCM(-LoRA) Distillation Scripts. by @dg845 in #6145
  • Compile test fix by @DN6 in #6104
  • [LoRA] add an error message when dealing with best_guess_weight_name offline by @sayakpaul in #6184
  • [Core] feat: enable fused attention projections for other SD and SDXL pipelines by @sayakpaul in #6179
  • [Benchmarks] fix: lcm benchmarking reporting by @sayakpaul in #6198
  • [Refactor autoencoders] feat: introduce autoencoders module by @sayakpaul in #6129
  • Fix the test script in examples/text_to_image/README.md by @krahets in #6209
  • Nit fix to training params by @osanseviero in #6200
  • [Training] remove deprecated method from lora scripts. by @sayakpaul in #6207
  • Fix SDXL Inpainting from single file with Refiner Model by @DN6 in #6147
  • Fix possible re-conversion issues after extracting from safetensors by @d8ahazard in #6097
  • Fix t2i. blog url by @abinthomasonline in #6205
  • [Text-to-Video] Clean up pipeline by @patrickvonplaten in #6213
  • [Torch Compile] Fix torch compile for svd vae by @patrickvonplaten in #6217
  • Deprecate Pipelines by @DN6 in #6169
  • Update README.md by @TilmannR in #6191
  • Support img2img and inpaint in lpw-xl by @a-r-r-o-w in #6114
  • Update train_text_to_image_lora.py by @haofanwang in #6144
  • [SVD] Fix guidance scale by @patrickvonplaten in #6002
  • Slow Test for Pipelines minor fixes by @DN6 in #6221
  • Add converter method for ip adapters by @fabiorigano in #6150
  • offload the optional module image_encoder by @yiyixuxu in #6151
  • fix: init for vae during pixart tests by @sayakpaul in #6215
  • [T2I LoRA training] fix: unscale fp16 gradient problem by @sayakpaul in #6119
  • ControlNetXS fixes. by @DN6 in #6228
  • add peft dependency to fast push tests by @sayakpaul in #6229
  • [refactor embeddings]pixart-alpha by @yiyixuxu in #6212
  • [Docs] Fix a code example in the ControlNet Inpainting documentation by @raven38 in #6236
  • [docs] Batched seeds by @stevhliu in #6237
  • [Fix] Fix Regional Prompting Pipeline by @hako-mikan in #6188
  • EulerAncestral add rescale_betas_zero_snr by @Beinsezii in #6187
  • [Refactor upsamplers and downsamplers] separate out upsamplers and downsamplers. by @sayakpaul in #6128
  • Bump transformers from 4.34.0 to 4.36.0 in /examples/research_projects/realfill by @dependabot[bot] in #6255
  • fix: unscale fp16 gradient problem & potential error by @lvzii in #6086
  • [Refactor] move diffedit out of stable_diffusion by @sayakpaul in #6260
  • move attend and excite out of stable_diffusion by @sayakpaul (direct commit on v0.25.0)
  • Revert "move attend and excite out of stable_diffusion" by @sayakpaul (direct commit on v0.25.0)
  • [Training] remove deprecated method from lora scripts again by @Yimi81 in #6266
  • [Refactor] move k diffusion out of stable_diffusion by @sayakpaul in #6267
  • [Refactor] move gligen out of stable diffusion. by @sayakpaul in #6265
  • [Refactor] move sag out of stable_diffusion by @sayakpaul in #6264
  • TST Fix LoRA test that fails with PEFT >= 0.7.0 by @BenjaminBossan in #6216
  • [Refactor] move attend and excite out of stable_diffusion. by @sayakpaul in #6261
  • [Refactor] move panorama out of stable_diffusion by @sayakpaul in #6262
  • [Deprecated pipelines] remove pix2pix zero from init by @sayakpaul in #6268
  • [Refactor] move ldm3d out of stable_diffusion. by @sayakpaul in #6263
  • open muse by @williamberman in #5437
  • Remove ONNX inpaint legacy by @DN6 in #6269
  • Remove peft tests from old lora backend tests by @DN6 in #6273
  • Allow diffusers to load with Flax, w/o PyTorch by @pcuenca in #6272
  • [Community Pipeline] Add Marigold Monocular Depth Estimation by @markkua in #6249
  • Fix Prodigy optimizer in SDXL Dreambooth script by @apolinario in #6290
  • [LoRA PEFT] fix LoRA loading so that correct alphas are parsed by @sayakpaul in #6225
  • LoRA Unfusion test fix by @DN6 in #6291
  • Fix typos in the ValueError for a nested image list as StableDiffusionControlNetPipeline input. by @celestialphineas in #6286
  • fix RuntimeError: Input type (float) and bias type (c10::Half) should be the same in train_text_to_image_lora.py by @mwkldeveloper in #6259
  • fix: t2i adapter paper link by @sayakpaul in #6314
  • fix: lora peft dummy components by @sayakpaul in #6308
  • [Tests] Speed up example tests by @sayakpaul in #6319
  • fix: cannot set guidance_scale by @Jannchie in #6326
  • Change LCM-LoRA README Script Example Learning Rates to 1e-4 by @dg845 in #6304
  • [Peft] fix saving / loading when unet is not "unet" by @kashif in #6046
  • [Wuerstchen] fix fp16 training and correct lora args by @kashif in #6245
  • [docs] fix: animatediff docs by @sayakpaul in #6339
  • [Training] Add datasets version of LCM LoRA SDXL by @sayakpaul in #5778
  • [Peft / Lora] Add adapter_names in fuse_lora by @younesbelkada in #5823
  • [Diffusion fast] add doc for diffusion fast by @sayakpaul in #6311
  • Add rescale_betas_zero_snr Argument to DDPMScheduler by @dg845 in #6305
  • Interruptable Pipelines by @DN6 in #5867
  • Update Animatediff docs by @DN6 in #6341
  • Add AnimateDiff conversion scripts by @DN6 in #6340
  • amused other pipelines docs by @williamberman in #6343
  • [Docs] fix: video rendering on svd. by @sayakpaul in #6330
  • [SDXL-IP2P] Update README_sdxl, Replace the link for wandb log with the correct run by @priprapre in #6270
  • adding auto1111 features to inpainting pipeline by @yiyixuxu in #6072
  • Remove unused parameters and fixed FutureWarning by @Justin900429 in #6317
  • amused update links to new repo by @williamberman in #6344
  • [LoRA] make LoRAs trained with peft loadable when peft isn't installed by @sayakpaul in #6306
  • Move ControlNetXS into Community Folder by @DN6 in #6316
  • fix: use retrieve_latents by @Jannchie in #6337
  • Fix LCM distillation bug when creating the guidance scale embeddings using multiple GPUs. by @dg845 in #6279
  • Fix "push_to_hub only create repo in consistency model lora SDXL training script" by @aandyw in #6102
  • Fix chunking in SVD by @DN6 in #6350
  • Add PEFT to advanced training script by @apolinario in #6294
  • Release: v0.25.0 by @sayakpaul (direct commit on v0.25.0)

Significant community contributions

The following contributors have made significant changes to the library over the last release:

  • @hako-mikan
    • [Community Pipeline] Regional Prompting Pipeline (#6015)
    • [Fix] Fix Regional Prompting Pipeline (#6188)
  • @TonyLianLong
    • LLMGroundedDiffusionPipeline: inherit from DiffusionPipeline and fix peft (#6023)
  • @okotaku
    • [Feature] Support IP-Adapter Plus (#5915)
  • @RuoyiDu
    • [Community Pipeline] DemoFusion: Democratising High-Resolution Image Generation With No $$$ (#6022)
  • @UmerHA
    • Add ControlNet-XS support (#5827)
  • @a-r-r-o-w
    • [Community] AnimateDiff + Controlnet Pipeline (#5928)
    • IP adapter support for most pipelines (#5900)
    • Add missing subclass docs, Fix broken example in SD_safe (#6116)
    • Support img2img and inpaint in lpw-xl (#6114)
  • @Monohydroxides
    • [Community] Add SDE Drag pipeline (#6105)
  • @dg845
    • Clean Up Comments in LCM(-LoRA) Distillation Scripts. (#6145)
    • Change LCM-LoRA README Script Example Learning Rates to 1e-4 (#6304)
    • Add rescale_betas_zero_snr Argument to DDPMScheduler (#6305)
    • Fix LCM distillation bug when creating the guidance scale embeddings using multiple GPUs. (#6279)
  • @markkua
    • [Community Pipeline] Add Marigold Monocular Depth Estimation (#6249)

- Python
Published by sayakpaul about 2 years ago

diffusers - v0.24.0: IP Adapters, Kandinsky 3.0, Stable Video Diffusion, SDXL Turbo

Stable Video Diffusion, SDXL Turbo, IP Adapters, Kandinsky 3.0

Stable Video Diffusion

Stable Video Diffusion is a powerful image-to-video generation model that can generate high-resolution (576x1024), 2-4 second videos conditioned on an input image.

Image to Video Generation

There are two variants of SVD: SVD and SVD-XT. The SVD checkpoint is trained to generate 14 frames, and the SVD-XT checkpoint is further finetuned to generate 25 frames.

You need to condition the generation on an initial image, as follows:

```python
import torch

from diffusers import StableVideoDiffusionPipeline
from diffusers.utils import load_image, export_to_video

pipe = StableVideoDiffusionPipeline.from_pretrained(
    "stabilityai/stable-video-diffusion-img2vid-xt", torch_dtype=torch.float16, variant="fp16"
)
pipe.enable_model_cpu_offload()

# Load the conditioning image
image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/svd/rocket.png?download=true")
image = image.resize((1024, 576))

generator = torch.manual_seed(42)
frames = pipe(image, decode_chunk_size=8, generator=generator).frames[0]

export_to_video(frames, "generated.mp4", fps=7)
```

Since generating videos is more memory intensive, we can use the decode_chunk_size argument to control how many frames are decoded at once. This will reduce the memory usage. It's recommended to tweak this value based on your GPU memory. Setting decode_chunk_size=1 will decode one frame at a time and will use the least amount of memory, but the video might have some flickering.
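The chunking idea itself is simple. As an illustrative sketch in plain Python (not the actual diffusers implementation), splitting the frames into chunks of at most decode_chunk_size bounds how many frames sit in decoder memory at once:

```python
def chunk_indices(num_frames, decode_chunk_size):
    """Split frame indices into consecutive chunks of at most decode_chunk_size."""
    frames = list(range(num_frames))
    return [frames[i:i + decode_chunk_size] for i in range(0, num_frames, decode_chunk_size)]

# 14 frames (the SVD checkpoint) with decode_chunk_size=8 -> two decoder passes: 8 frames, then 6
print([len(c) for c in chunk_indices(14, 8)])  # [8, 6]

# decode_chunk_size=1 -> 14 single-frame passes: lowest memory, but slowest and may flicker
print(len(chunk_indices(14, 1)))  # 14
```

Fewer, larger chunks are faster and more temporally consistent; smaller chunks trade that for lower peak memory.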

Additionally, we also use model cpu offloading to reduce the memory usage.

rocket_generated

SDXL Turbo

SDXL Turbo is an adversarial time-distilled Stable Diffusion XL (SDXL) model capable of running inference in as little as 1 step. Also, it does not use classifier-free guidance, further increasing its speed. On a good consumer GPU, you can now generate an image in just 100ms.

Text-to-Image

For text-to-image, pass a text prompt. By default, SDXL Turbo generates a 512x512 image, and that resolution gives the best results. You can try setting the height and width parameters to 768x768 or 1024x1024, but you should expect quality degradations when doing so.

Make sure to set guidance_scale to 0.0 to disable it, as the model was trained without it. A single inference step is enough to generate high quality images. Increasing the number of steps to 2, 3 or 4 should improve image quality.

```py
from diffusers import AutoPipelineForText2Image
import torch

pipeline_text2image = AutoPipelineForText2Image.from_pretrained("stabilityai/sdxl-turbo", torch_dtype=torch.float16, variant="fp16")
pipeline_text2image = pipeline_text2image.to("cuda")

prompt = "A cinematic shot of a baby racoon wearing an intricate italian priest robe."

image = pipeline_text2image(prompt=prompt, guidance_scale=0.0, num_inference_steps=1).images[0]
image
```

generated image of a racoon in a robe

Image-to-image

For image-to-image generation, make sure that num_inference_steps * strength is larger or equal to 1. The image-to-image pipeline will run for int(num_inference_steps * strength) steps, e.g. 0.5 * 2.0 = 1 step in our example below.
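The step arithmetic can be made concrete with a tiny helper (the name effective_steps is ours for illustration, not a diffusers API):

```python
def effective_steps(num_inference_steps, strength):
    """Number of denoising steps an image-to-image run actually executes."""
    return int(num_inference_steps * strength)

print(effective_steps(2, 0.5))  # 1  -> valid: at least one step runs
print(effective_steps(1, 0.5))  # 0  -> invalid: num_inference_steps * strength < 1
```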

```py
from diffusers import AutoPipelineForImage2Image
from diffusers.utils import load_image, make_image_grid

# use from_pipe to avoid consuming additional memory when loading a checkpoint
pipeline = AutoPipelineForImage2Image.from_pipe(pipeline_text2image).to("cuda")

init_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/cat.png")
init_image = init_image.resize((512, 512))

prompt = "cat wizard, gandalf, lord of the rings, detailed, fantasy, cute, adorable, Pixar, Disney, 8k"

image = pipeline(prompt, image=init_image, strength=0.5, guidance_scale=0.0, num_inference_steps=2).images[0]
make_image_grid([init_image, image], rows=1, cols=2)
```

Image-to-image generation sample using SDXL Turbo

IP Adapters

IP Adapters have proven remarkably powerful at generating images conditioned on other images.

Thanks to @okotaku, we have added IP adapters to the most important pipelines, allowing you to combine them for a variety of different workflows, e.g. they work with Img2Img, ControlNet, and LCM-LoRA out of the box.

LCM-LoRA

```python
from diffusers import DiffusionPipeline, LCMScheduler
import torch
from diffusers.utils import load_image

model_id = "sd-dreambooth-library/herge-style"
lcm_lora_id = "latent-consistency/lcm-lora-sdv1-5"

pipe = DiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16)

pipe.load_ip_adapter("h94/IP-Adapter", subfolder="models", weight_name="ip-adapter_sd15.bin")
pipe.load_lora_weights(lcm_lora_id)
pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)
pipe.enable_model_cpu_offload()

prompt = "best quality, high quality"
image = load_image("https://user-images.githubusercontent.com/24734142/266492875-2d50d223-8475-44f0-a7c6-08b51cb53572.png")
images = pipe(
    prompt=prompt,
    ip_adapter_image=image,
    num_inference_steps=4,
    guidance_scale=1,
).images[0]
```

yiyi_test_2_out

ControlNet

```py
from diffusers import StableDiffusionControlNetPipeline, ControlNetModel
import torch
from diffusers.utils import load_image

controlnet_model_path = "lllyasviel/control_v11f1p_sd15_depth"
controlnet = ControlNetModel.from_pretrained(controlnet_model_path, torch_dtype=torch.float16)

pipeline = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16)
pipeline.to("cuda")

image = load_image("https://huggingface.co/datasets/YiYiXu/testing-images/resolve/main/statue.png")
depth_map = load_image("https://huggingface.co/datasets/YiYiXu/testing-images/resolve/main/depth.png")

pipeline.load_ip_adapter("h94/IP-Adapter", subfolder="models", weight_name="ip-adapter_sd15.bin")

generator = torch.Generator(device="cpu").manual_seed(33)
images = pipeline(
    prompt='best quality, high quality',
    image=depth_map,
    ip_adapter_image=image,
    negative_prompt="monochrome, lowres, bad anatomy, worst quality, low quality",
    num_inference_steps=50,
    generator=generator,
).images
images[0].save("yiyi_test_2_out.png")
```

| ip_image | condition | output |
| :-: | :-: | :-: |
| statue | depth | yiyi_test_2_out |

For more information: :point_right: https://huggingface.co/docs/diffusers/main/en/using-diffusers/loading_adapters#ip-adapter

Kandinsky 3.0

The Kandinsky team has released the third version, which features much improved text-to-image alignment thanks to using Flan-T5 as the text encoder.

Text-to-Image

```py
from diffusers import AutoPipelineForText2Image
import torch

pipe = AutoPipelineForText2Image.from_pretrained("kandinsky-community/kandinsky-3", variant="fp16", torch_dtype=torch.float16)
pipe.enable_model_cpu_offload()

prompt = "A photograph of the inside of a subway train. There are raccoons sitting on the seats. One of them is reading a newspaper. The window shows the city in the background."

generator = torch.Generator(device="cpu").manual_seed(0)
image = pipe(prompt, num_inference_steps=25, generator=generator).images[0]
```

Image-to-Image

```py
from diffusers import AutoPipelineForImage2Image
from diffusers.utils import load_image
import torch

pipe = AutoPipelineForImage2Image.from_pretrained("kandinsky-community/kandinsky-3", variant="fp16", torch_dtype=torch.float16)
pipe.enable_model_cpu_offload()

prompt = "A painting of the inside of a subway train with tiny raccoons."
image = load_image("https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/kandinsky3/t2i.png")

generator = torch.Generator(device="cpu").manual_seed(0)
image = pipe(prompt, image=image, strength=0.75, num_inference_steps=25, generator=generator).images[0]
```

Check it out: - :point_right: https://huggingface.co/docs/diffusers/main/en/api/pipelines/kandinsky3#kandinsky-3

All commits

  • LCM-LoRA docs by @patil-suraj in #5782
  • [Docs] Update and make improvements by @standardAI in #5819
  • [docs] Fix title by @stevhliu in #5831
  • Improve setup.py and add dependency check by @patrickvonplaten in #5826
  • [Docs] add: japanese sdxl as a reference by @sayakpaul in #5844
  • Set usedforsecurity=False in hashlib methods (FIPS compliance) by @Wauplin in #5790
  • fix memory consistency decoder test by @williamberman in #5828
  • [PEFT] Unpin peft by @patrickvonplaten in #5850
  • Speed up the peft lora unload by @pacman100 in #5741
  • [Tests/LoRA/PEFT] Test also on PEFT / transformers / accelerate latest by @younesbelkada in #5820
  • UnboundLocalError in SDXLInpaint.prepare_latents() by @a-r-r-o-w in #5648
  • [ControlNet] fix import in single file loading by @sayakpaul in #5834
  • [Styling] stylify using ruff by @kashif in #5841
  • [Community] [WIP] LCM Interpolation Pipeline by @a-r-r-o-w in #5767
  • [JAX] Replace uses of jax.devices("cpu") with jax.local_devices(backend="cpu") by @hvaara in #5864
  • [test / peft] Fix silent behaviour on PR tests by @younesbelkada in #5852
  • fix an issue that ipex occupy too much memory, it will not impact per… by @linlifan in #5625
  • Update LCMScheduler Inference Timesteps to be More Evenly Spaced by @dg845 in #5836
  • Revert "[Docs] Update and make improvements" by @standardAI in #5858
  • [docs] Loader APIs by @stevhliu in #5813
  • Update README.md by @co63oc in #5855
  • Add tests fetcher by @DN6 in #5848
  • Addition of new callbacks to controlnets by @a-r-r-o-w in #5812
  • [docs] MusicLDM by @stevhliu in #5854
  • Add features to the Dreambooth LoRA SDXL training script by @linoytsaban in #5508
  • [feat] IP Adapters (author @okotaku ) by @yiyixuxu in #5713
  • [Lora] Seperate logic by @patrickvonplaten in #5809
  • ControlNet+Adapter pipeline, and ControlNet+Adapter+Inpaint pipeline by @affromero in #5869
  • Adds an advanced version of the SD-XL DreamBooth LoRA training script supporting pivotal tuning by @linoytsaban in #5883
  • [bug fix] fix small bug in readme template of sdxl lora training script by @linoytsaban in #5906
  • [bug fix] fix small bug in readme template of sdxl lora training script by @linoytsaban in #5914
  • [Docs] add: 8bit inference with pixart alpha by @sayakpaul in #5814
  • [@cene555][Kandinsky 3.0] Add Kandinsky 3.0 by @patrickvonplaten in #5913
  • [Examples] Allow downloading variant model files by @patrickvonplaten in #5531
  • [Fix: pixart-alpha] random 512px resolution bug by @lawrence-cj in #5842
  • [Core] add support for gradient checkpointing in transformer_2d by @sayakpaul in #5943
  • Deprecate KarrasVeScheduler and ScoreSdeVpScheduler by @a-r-r-o-w in #5269
  • Add Custom Timesteps Support to LCMScheduler and Supported Pipelines by @dg845 in #5874
  • set the model to train state before accelerator prepare by @sywangyi in #5099
  • Avoid computing min() that is expensive when do_normalize is False in the image processor by @ivanprado in #5896
  • Fix LCM Stable Diffusion distillation bug related to parsing unet_time_cond_proj_dim by @dg845 in #5893
  • add LoRA weights load and fuse support for IPEX pipeline by @linlifan in #5920
  • Replace multiple variables with one variable. by @hi-sushanta in #5715
  • fix: error on device for lpw_stable_diffusion_xl pipeline if pipe.enable_sequential_cpu_offload() enabled by @VicGrygorchyk in #5885
  • [Vae] Make sure all vae's work with latent diffusion models by @patrickvonplaten in #5880
  • [Tests] Make sure that we don't run tests multiple times by @patrickvonplaten in #5949
  • [Community Pipeline] Diffusion Posterior Sampling for General Noisy Inverse Problems by @tongdaxu in #5939
  • [From_pretrained] Fix warning by @patrickvonplaten in #5948
  • [load_textual_inversion]: allow multiple tokens by @yiyixuxu in #5837
  • [docs] Fix space by @stevhliu in #5898
  • fix: minor typo in docstring by @soumik12345 in #5961
  • [ldm3d] Ldm3d upscaler to community pipeline by @estelleafl in #5870
  • [docs] Update pipeline list by @stevhliu in #5952
  • [Tests] Refactor test_examples.py for better readability by @sayakpaul in #5946
  • added doc for Kandinsky3.0 by @charchit7 in #5937
  • [bug fix] Inpainting for MultiAdapter by @affromero in #5922
  • Rename output_dir argument by @linhqyy in #5916
  • [LoRA refactor] move several state dict conversion utils out of lora.py by @sayakpaul in #5955
  • Support of ip-adapter to the StableDiffusionControlNetInpaintPipeline by @juancopi81 in #5887
  • [docs] LCM training by @stevhliu in #5796
  • Controlnet ssd 1b support by @MarkoKostiv in #5779
  • [Pipeline] Add TextToVideoZeroSDXLPipeline by @vahramtadevosyan in #4695
  • [Wuerstchen] Adapt lora training example scripts to use PEFT by @kashif in #5959
  • Fixed custom module importing on Windows by @PENGUINLIONG in #5891
  • Add SVD by @patil-suraj in #5895
  • [SDXL Turbo] Add some docs by @patrickvonplaten in #5982
  • Fix SVD doc by @patil-suraj in #5983

Significant community contributions

The following contributors have made significant changes to the library over the last release:

  • @a-r-r-o-w
    • UnboundLocalError in SDXLInpaint.prepare_latents() (#5648)
    • [Community] [WIP] LCM Interpolation Pipeline (#5767)
    • Addition of new callbacks to controlnets (#5812)
    • Deprecate KarrasVeScheduler and ScoreSdeVpScheduler (#5269)
  • @dg845
    • Update LCMScheduler Inference Timesteps to be More Evenly Spaced (#5836)
    • Add Custom Timesteps Support to LCMScheduler and Supported Pipelines (#5874)
    • Fix LCM Stable Diffusion distillation bug related to parsing unet_time_cond_proj_dim (#5893)
  • @affromero
    • ControlNet+Adapter pipeline, and ControlNet+Adapter+Inpaint pipeline (#5869)
    • [bug fix] Inpainting for MultiAdapter (#5922)
  • @tongdaxu
    • [Community Pipeline] Diffusion Posterior Sampling for General Noisy Inverse Problems (#5939)
  • @estelleafl
    • [ldm3d] Ldm3d upscaler to community pipeline (#5870)
  • @vahramtadevosyan
    • [Pipeline] Add TextToVideoZeroSDXLPipeline (#4695)

- Python
Published by patrickvonplaten about 2 years ago

diffusers - [Patch release] Make sure we install correct PEFT version

Small patch release to make sure the correct PEFT version is installed.

All commits

  • Improve setup.py and add dependency check by @patrickvonplaten in #5826

- Python
Published by patrickvonplaten over 2 years ago

diffusers - v0.23.0: LCM LoRA, SDXL LCM, Consistency Decoder from DALL-E 3

LCM LoRA, LCM SDXL, Consistency Decoder

LCM LoRA

Latent Consistency Models (LCM) made quite the mark in the Stable Diffusion community by enabling ultra-fast inference. LCM author @luosiallen, alongside @patil-suraj and @dg845, managed to extend the LCM support for Stable Diffusion XL (SDXL) and pack everything into a LoRA.

The approach is called LCM LoRA.

Below is an example of using LCM LoRA, taking just 4 inference steps:

```python
from diffusers import DiffusionPipeline, LCMScheduler
import torch

model_id = "stabilityai/stable-diffusion-xl-base-1.0"
lcm_lora_id = "latent-consistency/lcm-lora-sdxl"

pipe = DiffusionPipeline.from_pretrained(model_id, variant="fp16", torch_dtype=torch.float16).to("cuda")

pipe.load_lora_weights(lcm_lora_id)
pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)

prompt = "close-up photography of old man standing in the rain at night, in a street lit by lamps, leica 35mm summilux"
image = pipe(
    prompt=prompt,
    num_inference_steps=4,
    guidance_scale=1,
).images[0]
```

You can combine the LoRA with Img2Img, Inpaint, ControlNet, ...

as well as with other LoRAs 🤯

image (31)

👉 Checkpoints 📜 Docs

If you want to learn more about the approach, please have a look at the following:

LCM SDXL

Continuing the work of Latent Consistency Models (LCM), we've applied the approach to SDXL as well and give you SSD-1B and SDXL fine-tuned checkpoints.

```python
from diffusers import DiffusionPipeline, UNet2DConditionModel, LCMScheduler
import torch

unet = UNet2DConditionModel.from_pretrained(
    "latent-consistency/lcm-sdxl",
    torch_dtype=torch.float16,
    variant="fp16",
)
pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", unet=unet, torch_dtype=torch.float16
).to("cuda")
pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)

prompt = "Self-portrait oil painting, a beautiful cyborg with golden hair, 8k"

generator = torch.manual_seed(0)
image = pipe(
    prompt=prompt, num_inference_steps=4, generator=generator, guidance_scale=1.0
).images[0]
```

👉 Checkpoints 📜 Docs

Consistency Decoder

OpenAI open-sourced the consistency decoder used in DALL-E 3. It improves the decoding part in the Stable Diffusion v1 family of models.

```python
import torch
from diffusers import StableDiffusionPipeline, ConsistencyDecoderVAE

vae = ConsistencyDecoderVAE.from_pretrained("openai/consistency-decoder", torch_dtype=torch.float16)
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", vae=vae, torch_dtype=torch.float16
).to("cuda")

pipe("horse", generator=torch.manual_seed(0)).images
```

Find the documentation here to learn more.

All commits

  • [Custom Pipelines] Make sure that community pipelines can use repo revision by @patrickvonplaten in #5659
  • post release (v0.22.0) by @sayakpaul in #5658
  • Add Pixart to AUTO_TEXT2IMAGE_PIPELINES_MAPPING by @Beinsezii in #5664
  • Update custom diffusion attn processor by @DN6 in #5663
  • Model tests xformers fixes by @DN6 in #5679
  • Update free model hooks by @DN6 in #5680
  • Fix Basic Transformer Block by @DN6 in #5683
  • Explicit torch/flax dependency check by @DN6 in #5673
  • [PixArt-Alpha] fix mask_feature so that precomputed embeddings work with a batch size > 1 by @sayakpaul in #5677
  • Make sure DDPM and diffusers can be used without Transformers by @sayakpaul in #5668
  • [PixArt-Alpha] Support non-square images by @sayakpaul in #5672
  • Improve LCMScheduler by @dg845 in #5681
  • [Docs] Fix typos, improve, update at Using Diffusers' Task page by @standardAI in #5611
  • Replacing the nn.Mish activation function with a get_activation function. by @hi-sushanta in #5651
  • speed up Shap-E fast test by @yiyixuxu in #5686
  • Fix the misaligned pipeline usage in dreamshaper docstrings by @kirill-fedyanin in #5700
  • Fixed is_safetensors_compatible() handling of windows path separators by @PhilLab in #5650
  • [LCM] Fix img2img by @patrickvonplaten in #5698
  • [PixArt-Alpha] fix mask feature condition. by @sayakpaul in #5695
  • Fix styling issues by @patrickvonplaten in #5699
  • Add adapter fusing + PEFT to the docs by @apolinario in #5662
  • Fix prompt bug in AnimateDiff by @DN6 in #5702
  • [Bugfix] fix error of peft lora when xformers enabled by @okotaku in #5697
  • Install accelerate from PyPI in PR test runner by @DN6 in #5721
  • consistency decoder by @williamberman in #5694
  • Correct consist dec by @patrickvonplaten in #5722
  • LCM Add Tests by @patrickvonplaten in #5707
  • [LCM] add: lcm docs. by @sayakpaul in #5723
  • Add LCM Scripts by @patil-suraj in #5727

- Python
Published by sayakpaul over 2 years ago

diffusers - v0.22.3: Fix PixArtAlpha and LCM Image-to-Image pipelines

🐛 There were some sneaky bugs in the PixArt-Alpha and LCM Image-to-Image pipelines which have been fixed in this release.

All commits

  • [LCM] Fix img2img by @patrickvonplaten in #5698
  • [PixArt-Alpha] fix mask feature condition. by @sayakpaul in #5695

- Python
Published by sayakpaul over 2 years ago

diffusers - Patch Release v0.22.2: Fix Animate Diff, fix DDPM import, Pixart various

  • Fix Basic Transformer Block by @DN6 in #5683
  • [PixArt-Alpha] fix mask_feature so that precomputed embeddings work with a batch size > 1 by @sayakpaul in #5677
  • Make sure DDPM and diffusers can be used without Transformers by @sayakpaul in #5668
  • [PixArt-Alpha] Support non-square images by @sayakpaul in #5672

- Python
Published by patrickvonplaten over 2 years ago

diffusers - Patch Release: Fix community vs. hub pipelines revision

  • [Custom Pipelines] Make sure that community pipelines can use repo revision by @patrickvonplaten

- Python
Published by patrickvonplaten over 2 years ago

diffusers - v0.22.0: LCM, PixArt-Alpha, AnimateDiff, PEFT integration for LoRA, and more

Latent Consistency Models (LCM)


LCMs enable significantly faster inference for diffusion models: they require far fewer inference steps to produce high-resolution images without compromising image quality too much. Below is a usage example:

```python
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained("SimianLuo/LCM_Dreamshaper_v7", torch_dtype=torch.float32)

# To save GPU memory, torch.float16 can be used, but it may compromise image quality.
pipe.to(torch_device="cuda", torch_dtype=torch.float32)

prompt = "Self-portrait oil painting, a beautiful cyborg with golden hair, 8k"

# Can be set to 1~50 steps. LCM supports fast inference even with <= 4 steps. Recommended: 1~8 steps.
num_inference_steps = 4

images = pipe(prompt=prompt, num_inference_steps=num_inference_steps, guidance_scale=8.0).images
```

Refer to the documentation to learn more.

LCM comes with both text-to-image and image-to-image pipelines and they were contributed by @luosiallen, @nagolinc, and @dg845.

PixArt-Alpha


PixArt-Alpha is a Transformer-based text-to-image diffusion model that rivals the quality of the existing state-of-the-art ones, such as Stable Diffusion XL, Imagen, and DALL-E 2, while being more efficient.

It was trained on T5 text embeddings and has a maximum sequence length of 120 tokens. Thus, it allows for more detailed prompt inputs, unlocking better-quality generations.

Despite the large text encoder, with model offloading it takes a little under 11 GB of VRAM to run the PixArtAlphaPipeline:

```python
import torch
from diffusers import PixArtAlphaPipeline

pipeline_id = "PixArt-alpha/PixArt-XL-2-1024-MS"
pipeline = PixArtAlphaPipeline.from_pretrained(pipeline_id, torch_dtype=torch.float16)
pipeline.enable_model_cpu_offload()

prompt = "A small cactus with a happy face in the Sahara desert."
image = pipeline(prompt).images[0]
image.save("sahara.png")
```

Check out the docs to learn more.

AnimateDiff


AnimateDiff is a modelling framework that allows you to create videos using pre-existing Stable Diffusion text-to-image models. It achieves this by inserting motion module layers into a frozen text-to-image model and training it on video clips to extract a motion prior.

These motion modules are applied after the ResNet and Attention blocks in the Stable Diffusion UNet. Their purpose is to introduce coherent motion across image frames. To support these modules, we introduce the concepts of a MotionAdapter and a UNetMotionModel. These serve as a convenient way to use these motion modules with existing Stable Diffusion models.

The following example demonstrates how you can utilize the motion modules with an existing Stable Diffusion text-to-image model.

```python
import torch
from diffusers import MotionAdapter, AnimateDiffPipeline, DDIMScheduler
from diffusers.utils import export_to_gif

# Load the motion adapter
adapter = MotionAdapter.from_pretrained("guoyww/animatediff-motion-adapter-v1-5-2")

# Load a SD 1.5 based finetuned model
model_id = "SG161222/Realistic_Vision_V5.1_noVAE"
pipe = AnimateDiffPipeline.from_pretrained(model_id, motion_adapter=adapter)
scheduler = DDIMScheduler.from_pretrained(
    model_id, subfolder="scheduler", clip_sample=False, timestep_spacing="linspace", steps_offset=1
)
pipe.scheduler = scheduler

# Enable memory savings
pipe.enable_vae_slicing()
pipe.enable_model_cpu_offload()

output = pipe(
    prompt=(
        "masterpiece, bestquality, highlydetailed, ultradetailed, sunset, "
        "orange sky, warm lighting, fishing boats, ocean waves seagulls, "
        "rippling water, wharf, silhouette, serene atmosphere, dusk, evening glow, "
        "golden hour, coastal landscape, seaside scenery"
    ),
    negative_prompt="bad quality, worse quality",
    num_frames=16,
    guidance_scale=7.5,
    num_inference_steps=25,
    generator=torch.Generator("cpu").manual_seed(42),
)
frames = output.frames[0]
export_to_gif(frames, "animation.gif")
```

You can convert an existing 2D UNet into a UNetMotionModel:

```python
from diffusers import MotionAdapter, UNetMotionModel, UNet2DConditionModel

unet = UNetMotionModel()

# Load from an existing 2D UNet and MotionAdapter
unet2D = UNet2DConditionModel.from_pretrained("SG161222/Realistic_Vision_V5.1_noVAE", subfolder="unet")
motion_adapter = MotionAdapter.from_pretrained("guoyww/animatediff-motion-adapter-v1-5-2")

# Load the motion adapter at init
unet_motion = UNetMotionModel.from_unet2d(unet2D, motion_adapter=motion_adapter)

# Or load motion modules after init
unet_motion.load_motion_modules(motion_adapter)

# Freeze all 2D UNet layers except for the motion modules for finetuning
unet_motion.freeze_unet2d_params()

# Save only the motion modules
unet_motion.save_motion_modules("path/to/motion_modules", push_to_hub=True)
```

AnimateDiff also comes with motion LoRA modules, letting you control subtleties:

```python
import torch
from diffusers import MotionAdapter, AnimateDiffPipeline, DDIMScheduler
from diffusers.utils import export_to_gif

# Load the motion adapter
adapter = MotionAdapter.from_pretrained("guoyww/animatediff-motion-adapter-v1-5-2")

# Load a SD 1.5 based finetuned model, then a motion LoRA
model_id = "SG161222/Realistic_Vision_V5.1_noVAE"
pipe = AnimateDiffPipeline.from_pretrained(model_id, motion_adapter=adapter)
pipe.load_lora_weights("guoyww/animatediff-motion-lora-zoom-out", adapter_name="zoom-out")

scheduler = DDIMScheduler.from_pretrained(
    model_id, subfolder="scheduler", clip_sample=False, timestep_spacing="linspace", steps_offset=1
)
pipe.scheduler = scheduler

# Enable memory savings
pipe.enable_vae_slicing()
pipe.enable_model_cpu_offload()

output = pipe(
    prompt=(
        "masterpiece, bestquality, highlydetailed, ultradetailed, sunset, "
        "orange sky, warm lighting, fishing boats, ocean waves seagulls, "
        "rippling water, wharf, silhouette, serene atmosphere, dusk, evening glow, "
        "golden hour, coastal landscape, seaside scenery"
    ),
    negative_prompt="bad quality, worse quality",
    num_frames=16,
    guidance_scale=7.5,
    num_inference_steps=25,
    generator=torch.Generator("cpu").manual_seed(42),
)
frames = output.frames[0]
export_to_gif(frames, "animation.gif")
```

Check out the documentation to learn more.

PEFT 🤝 Diffusers

There are many adapters (LoRA, for example) trained in different styles to achieve different effects. You can even combine multiple adapters to create new and unique images. With the 🤗 PEFT integration in 🤗 Diffusers, it is really easy to load and manage adapters for inference.

Here is an example of combining multiple LoRAs using this new integration:

```python
from diffusers import DiffusionPipeline
import torch

pipe_id = "stabilityai/stable-diffusion-xl-base-1.0"
pipe = DiffusionPipeline.from_pretrained(pipe_id, torch_dtype=torch.float16).to("cuda")

# Load LoRA 1.
pipe.load_lora_weights("CiroN2022/toy-face", weight_name="toy_face_sdxl.safetensors", adapter_name="toy")

# Load LoRA 2.
pipe.load_lora_weights("nerijs/pixel-art-xl", weight_name="pixel-art-xl.safetensors", adapter_name="pixel")

# Combine the adapters.
pipe.set_adapters(["pixel", "toy"], adapter_weights=[0.5, 1.0])

# Perform inference.
prompt = "toy_face of a hacker with a hoodie, pixel art"
image = pipe(
    prompt, num_inference_steps=30, cross_attention_kwargs={"scale": 1.0}, generator=torch.manual_seed(0)
).images[0]
image
```


Refer to the documentation to learn more.

Community components with community pipelines

We have had support for community pipelines for a while now. This enables fast integration for pipelines we cannot directly integrate within the core codebase of the library. However, community pipelines always rely on the building blocks from Diffusers, which can be restrictive for advanced use cases.

To address this, we’re elevating community pipelines with community components starting this release 🤗 By specifying trust_remote_code=True and structuring the pipeline repository in a specific way, users can customize their pipeline and component code as flexibly as possible:

```python
from diffusers import DiffusionPipeline
import torch

pipeline = DiffusionPipeline.from_pretrained(
    "<username>/<repo_id>", trust_remote_code=True, torch_dtype=torch.float16
).to("cuda")

prompt = "hello"

# Text embeds
prompt_embeds, negative_embeds = pipeline.encode_prompt(prompt)

# Keyframes generation (8x64x40, 2fps)
video_frames = pipeline(
    prompt_embeds=prompt_embeds,
    negative_prompt_embeds=negative_embeds,
    num_frames=8,
    height=40,
    width=64,
    num_inference_steps=2,
    guidance_scale=9.0,
    output_type="pt",
).frames
```

Refer to the documentation to learn more.

Dynamic callbacks

Most 🤗 Diffusers pipelines now accept a callback_on_step_end argument that lets you change the default behavior of the denoising loop with custom-defined functions. Here is an example of a callback function that disables classifier-free guidance after 40% of the inference steps to save compute with a minimal tradeoff in quality.

```python
def callback_dynamic_cfg(pipe, step_index, timestep, callback_kwargs):
    # adjust the batch_size of prompt_embeds according to guidance_scale
    if step_index == int(pipe.num_timesteps * 0.4):
        prompt_embeds = callback_kwargs["prompt_embeds"]
        prompt_embeds = prompt_embeds.chunk(2)[-1]

        # update guidance_scale and prompt_embeds
        pipe._guidance_scale = 0.0
        callback_kwargs["prompt_embeds"] = prompt_embeds
    return callback_kwargs
```

Here’s how you can use it:

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16)
pipe = pipe.to("cuda")

prompt = "a photo of an astronaut riding a horse on mars"

generator = torch.Generator(device="cuda").manual_seed(1)
out = pipe(
    prompt,
    generator=generator,
    callback_on_step_end=callback_dynamic_cfg,
    callback_on_step_end_tensor_inputs=["prompt_embeds"],
)

out.images[0].save("out_custom_cfg.png")
```

Check out the docs to learn more.
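The contract is simple: at the end of each denoising step, the pipeline calls your function and adopts the tensors it returns. The framework-free sketch below illustrates that control flow; the loop and function names here are illustrative stand-ins, not diffusers internals.

```python
def run_toy_denoise_loop(num_steps, callback_on_step_end=None, state=None):
    """Toy denoising loop: at the end of every step, a callback may mutate state."""
    state = dict(state or {})
    for step_index, timestep in enumerate(range(num_steps, 0, -1)):
        # ... one denoising step would happen here ...
        if callback_on_step_end is not None:
            state = callback_on_step_end(step_index, timestep, state)
    return state


def drop_cfg_after_40pct(step_index, timestep, state):
    # mirror the pattern above: switch guidance off at 40% of the steps
    if step_index == int(state["num_steps"] * 0.4):
        state["guidance_scale"] = 0.0
    return state


final = run_toy_denoise_loop(10, drop_cfg_after_40pct, {"num_steps": 10, "guidance_scale": 7.5})
print(final["guidance_scale"])  # 0.0
```

In the real pipelines, `callback_on_step_end_tensor_inputs` plays the role of the mutable state: it names which tensors are passed into, and read back from, the callback.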

All commits

  • [PEFT / LoRA ] Fix text encoder scaling by @younesbelkada in #5204
  • Fix doc KO unconditional_image_generation.md by @mishig25 in #5236
  • Flax: Ignore PyTorch, ONNX files when they coexist with Flax weights by @pcuenca in #5237
  • Fixed constants.py not using hugging face hub environment variable by @Zanz2 in #5222
  • Compile test fixes by @DN6 in #5235
  • [PEFT warnings] Only sure deprecation warnings in the future by @patrickvonplaten in #5240
  • Add docstrings in forward methods of adapter model by @Nandika-A in #5253
  • make style by @patrickvonplaten (direct commit on main)
  • [WIP] Refactor UniDiffuser Pipeline and Tests by @dg845 in #4948
  • fix: how print training resume logs. by @sayakpaul in #5117
  • Add docstring for the AutoencoderKL's decode by @freespirit in #5242
  • Add a docstring for the AutoencoderKL's encode by @freespirit in #5239
  • Update UniPC to support 1D diffusion. by @leng-yue in #5199
  • [Schedulers] Fix callback steps by @patrickvonplaten in #5261
  • make fix copies by @patrickvonplaten (direct commit on main)
  • [Research folder] Add SDXL example by @patrickvonplaten in #5275
  • Fix UniPC scheduler for 1D by @patrickvonplaten in #5276
  • New Pipeline Slow Test runners by @DN6 in #5131
  • handle case when controlnet is list or tuple by @noskill in #5179
  • make style by @patrickvonplaten (direct commit on main)
  • Zh doc by @WADreaming in #4807
  • ✨ [Core] Add FreeU mechanism by @kadirnar in #5164
  • pin torch version by @DN6 in #5297
  • add: entry for DDPO support. by @sayakpaul in #5250
  • Min-SNR Gamma: correct the fix for SNR weighted loss in v-prediction … by @bghira in #5238
  • Update bug-report.yml by @patrickvonplaten (direct commit on main)
  • Bump tolerance on shape test by @DN6 in #5289
  • Add from single file to StableDiffusionUpscalePipeline and StableDiffusionLatentUpscalePipeline by @DN6 in #5194
  • [LoRA] fix: torch.compile() for lora conv by @sayakpaul in #5298
  • [docs] Improved inpaint docs by @stevhliu in #5210
  • Minor fixes by @TimothyAlexisVass in #5309
  • [Hacktoberfest]Fixing issues #5241 by @jgyfutub in #5255
  • Update README.md by @ShubhamJagtap2000 in #5267
  • fix typo in train dreambooth lora description by @themez in #5332
  • Fix [core/GLIGEN]: TypeError when iterating over 0-d tensor with In-painting mode when EulerAncestralDiscreteScheduler is used by @rchuzh99 in #5305
  • fix inference in custom diffusion by @caopulan in #5329
  • Improve performance of fast test by reducing down blocks by @sepal in #5290
  • make-fast-test-for-StableDiffusionControlNetPipeline-faster by @m0saan in #5292
  • Improve typehints and docs in diffusers/models by @a-r-r-o-w in #5299
  • Add py.typed for PEP 561 compliance by @byarbrough in #5326
  • [HacktoberFest] Add missing docstrings to diffusers/models by @a-r-r-o-w in #5248
  • make style by @patrickvonplaten (direct commit on main)
  • Fix links in docs to adapter code by @johnowhitaker in #5323
  • replace references to deprecated KeyArray & PRNGKeyArray by @jakevdp in #5324
  • Fix loading broken LoRAs that could give NaN by @patrickvonplaten in #5316
  • [JAX] Replace uses of jnp.array in types with jnp.ndarray. by @hvaara in #4719
  • Add missing dependency in requirements file by @juliensimon in #5345
  • fix problem of 'accelerator.is_main_process' to run in multiple GPUs by @jiaqiw09 in #5340
  • [docs] Create a mask for inpainting by @stevhliu in #5322
  • Adding PyTorch XLA support for sdxl inference by @ssusie in #5273
  • [Examples] use lora_linear instead of deprecated lora attn procs. by @sayakpaul in #5331
  • Improve typehints and docs in diffusers/models by @a-r-r-o-w in #5312
  • Fix StableDiffusionXLImg2ImgPipeline creation in sdxl tutorial by @soumik12345 in #5367
  • I Added Doc-String Into The class. by @hi-sushanta in #5293
  • make style by @patrickvonplaten (direct commit on main)
  • [docs] Minor fixes by @stevhliu in #5369
  • New xformers test runner by @DN6 in #5349
  • [Core] Add FreeU to all the core pipelines and their (mostly-used) derivatives by @sayakpaul in #5376
  • [core / PEFT / LoRA] Integrate PEFT into Unet by @younesbelkada in #5151
  • [Bot] FIX stale.py uses timezone-aware datetime by @sayakpaul in #5396
  • [Examples] fix unconditioning generation training example for mixed-precision training by @sayakpaul in #5407
  • [Wuerstchen] text to image training script by @kashif in #5052
  • [Docs] add docs on peft diffusers integration by @sayakpaul in #5359
  • chore: fix typos by @afuetterer in #5386
  • [Examples] Update with HFApi by @sayakpaul in #5393
  • Add ability to mix usage of T2I-Adapter(s) and ControlNet(s). by @GreggHelt2 in #5362
  • make style by @patrickvonplaten (direct commit on main)
  • [Core] Fix/pipeline without text encoders for SDXL by @sayakpaul in #5301
  • [Examples] Follow up of #5393 by @sayakpaul in #5420
  • changed channel parameters for UNET and VAE. Changed configs parameters of CLIPText by @aeros29 in #5370
  • Chore: Typo fixed in multiple files by @SusheelThapa in #5422
  • Update base image for slow CUDA tests by @DN6 in #5426
  • Fix pipe fetcher for slow tests by @DN6 in #5424
  • make fix copies by @patrickvonplaten (direct commit on main)
  • Merge branch 'main' of https://github.com/huggingface/diffusers by @patrickvonplaten (direct commit on main)
  • [from_single_file()]fix: local single file loading. by @sayakpaul in #5440
  • Add latent consistency by @patrickvonplaten in #5438
  • Update-DeepFloyd-IF-Pipelines-Docstrings by @m0saan in #5304
  • style(sdxl): remove identity assignments by @liang-hou in #5418
  • Fix the order of width and height of original size in SDXL training script by @linjiapro in #5382
  • make style by @patrickvonplaten (direct commit on main)
  • Beautiful Doc string added into the UNetMidBlock2D class. by @hi-sushanta in #5389
  • make style by @patrickvonplaten (direct commit on main)
  • fix une2td ignoring class_labels by @kesimeg in #5401
  • Added support to create asymmetrical U-Net structures by @Gothos in #5400
  • [PEFT] Fix scale unscale with LoRA adapters by @younesbelkada in #5417
  • Make T2I-Adapter downscale padding match the UNet by @RyanJDick in #5435
  • Update README.md by @anvilarth in #5497
  • fixed SDXL text encoder training bug #5016 by @shyammarjit in #5078
  • make style by @patrickvonplaten (direct commit on main)
  • [torch.compile] fix graph break problems partially by @sayakpaul in #5453
  • Fix Slow Tests by @DN6 in #5469
  • Fix typo in controlnet docs by @MrSyee in #5486
  • [BUG] in transformer_temporal Fix Bugs by @zideliu in #5496
  • [docs] Fix links by @stevhliu in #5499
  • fix a few issues in controlnet inpaint pipelines by @yiyixuxu in #5470
  • Fixed autoencoder typo by @abhisharsinha in #5500
  • [Core] Refactor activation and normalization layers by @sayakpaul in #5493
  • Register BaseOutput subclasses as supported torch.utils._pytree nodes by @BowenBao in #5459
  • Japanese docs by @isamu-isozaki in #5478
  • [docs] General updates by @stevhliu in #5378
  • Add Latent Consistency Models Pipeline by @dg845 in #5448
  • fix typo by @mymusise in #5505
  • fix error of peft lora when xformers enabled by @AnyISalIn in #5506
  • fix a bug in 2nd order schedulers when using in ensemble of experts config by @yiyixuxu in #5511
  • [Schedulers] Fix 2nd order other than heun by @patrickvonplaten in #5526
  • Add a new community pipeline by @nagolinc in #5477
  • make style by @patrickvonplaten (direct commit on main)
  • Improve typehints and docs in diffusers/models by @a-r-r-o-w in #5391
  • make fix-copies by @patrickvonplaten (direct commit on main)
  • Fix missing punctuation in PHILOSOPHY.md by @RampagingSloth in #5530
  • fix a bug on torch_dtype argument in from_single_file of ControlNetModel by @xuyxu in #5528
  • [docs] Loader docs by @stevhliu in #5473
  • Add from_pt flag to enable model from PT by @RissyRan in #5501
  • Remove multiple if-else statement in the get_activation function. by @hi-sushanta in #5446
  • [Tests] Speed up expert of mixture tests by @patrickvonplaten in #5533
  • [Tests] Optimize test configurations for faster execution by @p1kit in #5535
  • [Remote code] Add functionality to run remote models, schedulers, pipelines by @patrickvonplaten in #5472
  • Update train_dreambooth.py - fix typos by @nickkolok in #5539
  • correct checkpoint in kandinsky2.2 doc page by @yiyixuxu in #5550
  • [Core] fix FreeU disable method by @sayakpaul in #5552
  • [docs] Internal classes API by @stevhliu in #5513
  • fix error reported 'find_unused_parameters' running in multiple GPUs by @jiaqiw09 in #5355
  • docs: initial pt translation by @SirMonteiro in #5549
  • Fix moved expandmask function by @patrickvonplaten in #5581
  • [PEFT / Tests] Add peft slow tests on push by @younesbelkada in #5419
  • Add realfill by @thuanz123 in #5456
  • add fix to be able use StableDiffusionXLAdapterPipeline.fromsinglefile by @pshtif in #5547
  • Stabilize DPM++, especially for SDXL and SDE-DPM++ by @LuChengTHU in #5541
  • Fix incorrect loading of custom pipeline by @a-r-r-o-w in #5568
  • [core / PEFT ]Bump transformers min version for PEFT integration by @younesbelkada in #5579
  • Fix divide by zero RuntimeWarning by @TimothyAlexisVass in #5543
  • [Community Pipelines] add textual inversion support for stablediffusionipex by @miaojinc in #5571
  • fix a mistake in text2image training script for kandinsky2.2 by @yiyixuxu in #5244
  • Update docker image for xformers by @DN6 in #5597
  • [Docs] Fix typos by @standardAI in #5583
  • [Docs] Fix typos, improve, update at Tutorials page by @standardAI in #5586
  • [docs] Lu lambdas by @stevhliu in #5602
  • Update final CPU offloading code for more diffusion pipelines by @clarencechen in #5589
  • [Core] enable lora for sdxl adapters too and add slow tests. by @ilisparrow in #5555
  • fix by @patrickvonplaten (direct commit on main)
  • Remove Redundant Variables from Encoder and Decoder by @hi-sushanta in #5569
  • Revert "Fix the order of width and height of original size in SDXL training script" by @patrickvonplaten in #5614
  • [PEFT / LoRA] Fix civitai bug when network alpha is an empty dict by @younesbelkada in #5608
  • [Docs] Fix typos, improve, update at Get Started page by @standardAI in #5587
  • [SDXL Adapter] Revert load lora by @patrickvonplaten in #5615
  • [docs] Kandinsky guide by @stevhliu in #4555
  • [remote code] document trust remote code. by @sayakpaul in #5620
  • [Tests] Fix cpu offload test by @patrickvonplaten in #5626
  • [Docs] Fix typos, improve, update at Conceptual Guides page by @standardAI in #5585
  • Animatediff Proposal by @DN6 in #5413
  • [Docs] Fix typos, improve, update at Using Diffusers' Loading & Hub page by @standardAI in #5584
  • [LCM] Make sure img2img works by @patrickvonplaten in #5632
  • Update animatediff docs to include section on Motion LoRAs by @DN6 in #5639
  • [Easy] Minor AnimateDiff Doc nits by @sayakpaul in #5640
  • fix a bug in AutoPipeline.from_pipe() when creating a controlnet pipeline from an existing controlnet by @yiyixuxu in #5638
  • [Easy] clean up the LCM docstrings. by @sayakpaul in #5637
  • Model loading speed optimization by @RyanJDick in #5635
  • Clean up LCM Pipeline and Test Code. by @dg845 in #5641
  • [Docs] Fix typos, improve, update at Using Diffusers' Tecniques page by @standardAI in #5627
  • [Core] support for tiny autoencoder in img2img by @sayakpaul in #5636
  • Remove the redundant line from the adapter.py file. by @hi-sushanta in #5618
  • add callbacks to denoising step by @yiyixuxu in #5427
  • [Feat] PixArt-Alpha by @sayakpaul in #5642
  • correct pipeline class name by @sayakpaul in #5652

Significant community contributions

The following contributors have made significant changes to the library over the last release:

  • @dg845
    • [WIP] Refactor UniDiffuser Pipeline and Tests (#4948)
    • Add Latent Consistency Models Pipeline (#5448)
    • Clean up LCM Pipeline and Test Code. (#5641)
  • @kadirnar
    • ✨ [Core] Add FreeU mechanism (#5164)
  • @a-r-r-o-w
    • Improve typehints and docs in diffusers/models (#5299)
    • [HacktoberFest] Add missing docstrings to diffusers/models (#5248)
    • Improve typehints and docs in diffusers/models (#5312)
    • Improve typehints and docs in diffusers/models (#5391)
    • Fix incorrect loading of custom pipeline (#5568)
  • @isamu-isozaki
    • Japanese docs (#5478)
  • @nagolinc
    • Add a new community pipeline (#5477)
  • @SirMonteiro
    • docs: initial pt translation (#5549)
  • @thuanz123
    • Add realfill (#5456)
  • @standardAI
    • [Docs] Fix typos (#5583)
    • [Docs] Fix typos, improve, update at Tutorials page (#5586)
    • [Docs] Fix typos, improve, update at Get Started page (#5587)
    • [Docs] Fix typos, improve, update at Conceptual Guides page (#5585)
    • [Docs] Fix typos, improve, update at Using Diffusers' Loading & Hub page (#5584)
    • [Docs] Fix typos, improve, update at Using Diffusers' Tecniques page (#5627)

- Python
Published by patrickvonplaten over 2 years ago

diffusers - Patch Release: Fix Lora fusing/unfusing

  • [Lora] fix lora fuse unfuse in #5003 by @patrickvonplaten

- Python
Published by patrickvonplaten over 2 years ago

diffusers - Patch Release: Fix LoRA attention processor for xformers.

  • [LoRA, Xformers] Fix xformers lora by @patrickvonplaten in https://github.com/huggingface/diffusers/pull/5201

- Python
Published by sayakpaul over 2 years ago

diffusers - Patch Release: CPU offloading + Lora load/Text inv load & Multi Adapter

  • [Textual inversion] Refactor textual inversion to make it cleaner by @patrickvonplaten in #5076
  • t2i Adapter community member fix by @williamberman in #5090
  • remove unused adapter weights in constructor by @williamberman in #5088
  • [LoRA] don't break offloading for incompatible lora ckpts. by @sayakpaul in #5085

- Python
Published by patrickvonplaten over 2 years ago

diffusers - Patch Release v0.21.1: Fix import and config loading for `from_single_file`

  • Fix model offload bug when key isn't present by @DN6 in #5030
  • [Import] Don't force transformers to be installed by @patrickvonplaten in #5035
  • allow loading of sd models from safetensors without online lookups using local config files by @vladmandic in #5019
  • [Import] Add missing settings / Correct some dummy imports by @patrickvonplaten in #5036

- Python
Published by patrickvonplaten over 2 years ago

diffusers - v0.21.0: Würstchen, Faster LoRA loading, Faster imports, T2I Adapters for SDXL, and more

Würstchen

Würstchen is a diffusion model, whose text-conditional model works in a highly compressed latent space of images, allowing cheaper and faster inference.

Here is how to use Würstchen as a pipeline:

```python
import torch
from diffusers import AutoPipelineForText2Image
from diffusers.pipelines.wuerstchen import DEFAULT_STAGE_C_TIMESTEPS

pipeline = AutoPipelineForText2Image.from_pretrained("warp-ai/wuerstchen", torch_dtype=torch.float16).to("cuda")

caption = "Anthropomorphic cat dressed as a firefighter"
images = pipeline(
    caption,
    height=1024,
    width=1536,
    prior_timesteps=DEFAULT_STAGE_C_TIMESTEPS,
    prior_guidance_scale=4.0,
    num_images_per_prompt=4,
).images
```

To learn more about the pipeline, check out the official documentation.

This pipeline was contributed by one of the authors of Würstchen, @dome272, with help from @kashif and @patrickvonplaten.

👉 Try out the model here: https://huggingface.co/spaces/warp-ai/Wuerstchen

T2I Adapters for Stable Diffusion XL (SDXL)

T2I-Adapter is an efficient plug-and-play model that provides extra guidance to pre-trained text-to-image models while freezing the original large text-to-image models.

In collaboration with the Tencent ARC researchers, we trained T2I Adapters on various conditions: sketch, canny, lineart, depth, and openpose.

Below is an example of how to use the StableDiffusionXLAdapterPipeline.

First, ensure that controlnet_aux is installed:

```bash
pip install -U controlnet_aux==0.0.7
```

Then we can initialize the pipeline:

```python
import torch
from controlnet_aux.lineart import LineartDetector
from diffusers import (AutoencoderKL, EulerAncestralDiscreteScheduler,
                       StableDiffusionXLAdapterPipeline, T2IAdapter)
from diffusers.utils import load_image, make_image_grid

# load adapter
adapter = T2IAdapter.from_pretrained(
    "TencentARC/t2i-adapter-lineart-sdxl-1.0", torch_dtype=torch.float16, variant="fp16"
).to("cuda")

# load pipeline
model_id = "stabilityai/stable-diffusion-xl-base-1.0"
euler_a = EulerAncestralDiscreteScheduler.from_pretrained(model_id, subfolder="scheduler")
vae = AutoencoderKL.from_pretrained("madebyollin/sdxl-vae-fp16-fix", torch_dtype=torch.float16)
pipe = StableDiffusionXLAdapterPipeline.from_pretrained(
    model_id, vae=vae, adapter=adapter, scheduler=euler_a, torch_dtype=torch.float16, variant="fp16",
).to("cuda")

# load lineart detector
line_detector = LineartDetector.from_pretrained("lllyasviel/Annotators").to("cuda")
```

We then load an image to compute the lineart conditionings:

```python
url = "https://huggingface.co/Adapter/t2iadapter/resolve/main/figs_SDXLV1.0/org_lin.jpg"
image = load_image(url)
image = line_detector(image, detect_resolution=384, image_resolution=1024)
```

Then we generate:

```python
prompt = "Ice dragon roar, 4k photo"
negative_prompt = "anime, cartoon, graphic, text, painting, crayon, graphite, abstract, glitch, deformed, mutated, ugly, disfigured"
gen_images = pipe(
    prompt=prompt,
    negative_prompt=negative_prompt,
    image=image,
    num_inference_steps=30,
    adapter_conditioning_scale=0.8,
    guidance_scale=7.5,
).images[0]
```

Refer to the official documentation to learn more about StableDiffusionXLAdapterPipeline.

This blog post summarizes our experiences and provides all the resources (including the pre-trained T2I Adapter checkpoints) to get started using T2I Adapters for SDXL.

We’re also releasing a training script for training your custom T2I Adapters on SDXL. Check out the documentation to learn more.

Thanks to @MC-E (one of the authors of T2I Adapters) for contributing the StableDiffusionXLAdapterPipeline in #4696.

Faster imports

We introduced “lazy imports” (#4829) to significantly improve the time it takes to import our modules (such as pipelines, models, and so on). Below is a comparison of the timings with and without lazy imports on import diffusers.

With lazy imports:

```bash
real    0m0.417s
user    0m0.714s
sys     0m0.499s
```

Without lazy imports:

```bash
real    0m5.391s
user    0m5.299s
sys     0m1.273s
```
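Lazy imports along these lines are typically built on PEP 562's module-level `__getattr__`, which defers the real import until an attribute is first accessed. Below is a minimal, self-contained sketch of the pattern; the `make_lazy_module` helper and `toy_pkg` name are illustrative, not diffusers' actual implementation.

```python
import importlib
import sys
import types


def make_lazy_module(name, attr_to_module):
    """Build a module whose listed attributes import their backing module
    only on first access (PEP 562 module-level __getattr__)."""
    mod = types.ModuleType(name)

    def __getattr__(attr):
        if attr in attr_to_module:
            value = importlib.import_module(attr_to_module[attr])
            setattr(mod, attr, value)  # cache so later accesses skip __getattr__
            return value
        raise AttributeError(f"module {name!r} has no attribute {attr!r}")

    mod.__getattr__ = __getattr__
    sys.modules[name] = mod
    return mod


# Nothing is imported until the attribute is first touched.
toy = make_lazy_module("toy_pkg", {"json": "json"})
print(toy.json.dumps({"ok": True}))  # prints {"ok": true}
```

Importing the top-level package then costs almost nothing; the heavy submodules (pipelines, models) pay their import price only when actually used.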

Faster LoRA loading

Previously, loading LoRA parameters using load_lora_weights() used to be time-consuming, as reported in #4975. To this end, we introduced a low_cpu_mem_usage argument to the load_lora_weights() method in #4994, which should speed up the loading time significantly. Just pass low_cpu_mem_usage=True to reap the benefits.

LoRA fusing

LoRA weights can now be fused into the model weights, allowing models that have loaded LoRA weights to run as fast as models without. It also makes it possible to fuse multiple LoRAs into the same model.

For more information, have a look at the documentation and the original PR: https://github.com/huggingface/diffusers/pull/4473.
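Numerically, fusing just folds the scaled low-rank product into the frozen weight matrix once, after which each forward pass costs the same as the base model. A minimal NumPy sketch of that arithmetic (variable names are illustrative, not the library's internals):

```python
import numpy as np

rng = np.random.default_rng(0)

# Frozen base weight and a rank-r LoRA update
d_out, d_in, r = 8, 8, 2
W = rng.standard_normal((d_out, d_in))
A = rng.standard_normal((r, d_in))   # LoRA "down" matrix
B = rng.standard_normal((d_out, r))  # LoRA "up" matrix
scale = 0.7

x = rng.standard_normal(d_in)

# Unfused: base forward plus the scaled low-rank branch (extra matmuls every call)
y_unfused = W @ x + scale * (B @ (A @ x))

# Fused: fold the update into the weight once; inference then matches base-model speed
W_fused = W + scale * (B @ A)
y_fused = W_fused @ x

assert np.allclose(y_unfused, y_fused)
```

Unfusing is the inverse step (subtracting `scale * (B @ A)` back out), which is why the feature can be toggled without reloading the model.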

More support for LoRAs

Almost all LoRA formats out there for SDXL are now supported. For more details, please check the documentation.

All commits

  • fix: lora sdxl tests by @sayakpaul in #4652
  • Support tiled encode/decode for AutoencoderTiny by @Isotr0py in #4627
  • Add SDXL long weighted prompt pipeline (replace pr:4629) by @xhinker in #4661
  • add config_file to from_single_file by @zuojianghua in #4614
  • Add AudioLDM 2 by @sanchit-gandhi in #4549
  • [docs] Add note in UniDiffusers Doc about PyTorch 1.X numerical stability issue by @dg845 in #4703
  • [Core] enable lora for sdxl controlnets too and add slow tests. by @sayakpaul in #4666
  • [LoRA] ensure different LoRA ranks for text encoders can be properly handled by @sayakpaul in #4669
  • [LoRA] default to None when fc alphas are not available. by @sayakpaul in #4706
  • Replaces DIFFUSERS_TEST_DEVICE backend list with trying device by @vvvm23 in #4673
  • add convert diffuser pipeline of XL to original stable diffusion by @realliujiaxu in #4596
  • Add reference_attn & reference_adain support for sdxl by @zideliu in #4502
  • [Docs] Fix docs controlnet missing /Tip by @patrickvonplaten in #4717
  • rename test file to run, so that examples tests do not fail by @patrickvonplaten in #4715
  • Revert "Move controlnet load local tests to nightly" by @patrickvonplaten in #4543
  • Fix all docs by @patrickvonplaten in #4721
  • fix bad error message when transformers is missing by @patrickvonplaten in #4714
  • Fix AutoencoderTiny encoder scaling convention by @madebyollin in #4682
  • [Examples] fix checkpointing and casting bugs in train_text_to_image_lora_sdxl.py by @sayakpaul in #4632
  • [AudioLDM Docs] Fix docs for output by @sanchit-gandhi in #4737
  • [docs] add variant="fp16" flag by @realliujiaxu in #4678
  • [AudioLDM Docs] Update docstring by @sanchit-gandhi in #4744
  • fix dummy import for AudioLDM2 by @patil-suraj in #4741
  • change validation scheduler for train_dreambooth.py when training IF by @wyz894272237 in #4333
  • add a step_index counter by @yiyixuxu in #4347
  • [AudioLDM2] Doc fixes by @sanchit-gandhi in #4739
  • Bugfix for SDXL model loading in low ram system. by @Symbiomatrix in #4628
  • Clean up flaky behaviour on Slow CUDA Pytorch Push Tests by @DN6 in #4759
  • [Tests] Fix paint by example by @patrickvonplaten in #4761
  • [fix] multi t2i adapter set total_downscale_factor by @williamberman in #4621
  • [Examples] Add madebyollin VAE to SDXL LoRA example, along with an explanation by @mnslarcher in #4762
  • [LoRA] relax lora loading logic by @sayakpaul in #4610
  • [Examples] fix sdxl dreambooth lora checkpointing. by @sayakpaul in #4749
  • fix sdxl_lwp empty neg_prompt error issue by @xhinker in #4743
  • improve setup.py by @sayakpaul in #4748
  • Torch device by @patrickvonplaten in #4755
  • [AudioLDM 2] Pipeline fixes by @sanchit-gandhi in #4738
  • Convert MusicLDM by @sanchit-gandhi in #4579
  • [WIP ] Proposal to address precision issues in CI by @DN6 in #4775
  • fix a bug in from_pretrained when load optional components by @yiyixuxu in #4745
  • fix bug of progress bar in clip guided images mixing by @scnuhealthy in #4729
  • Fixed broken link of CLIP doc in evaluation doc by @mayank2 in #4760
  • instance_prompt->class_prompt by @williamberman in #4784
  • refactor prepare_mask_and_masked_image with VaeImageProcessor by @yiyixuxu in #4444
  • Allow passing a checkpoint state_dict to convert_from_ckpt (instead of just a string path) by @cmdr2 in #4653
  • [SDXL] Add docs about forcing passed embeddings to be 0 by @patrickvonplaten in #4783
  • [Core] Support negative conditions in SDXL by @sayakpaul in #4774
  • Unet fix by @canberk17 in #4769
  • [Tests] Tighten up LoRA loading relaxation by @sayakpaul in #4787
  • [docs] Fix syntax for compel by @stevhliu in #4794
  • [Torch compile] Fix torch compile for controlnet by @patrickvonplaten in #4795
  • [SDXL Lora] Fix last ben sdxl lora by @patrickvonplaten in #4797
  • [LoRA Attn Processors] Refactor LoRA Attn Processors by @patrickvonplaten in #4765
  • Update loaders.py by @chillpixelfun in #4805
  • [WIP] Add Fabric by @shauray8 in #4201
  • Fix save_path bug in textual inversion training script by @Yead in #4710
  • [Examples] Save SDXL LoRA weights with chosen precision by @mnslarcher in #4791
  • Fix Disentangle ONNX and non-ONNX pipeline by @DN6 in #4656
  • fix bug in StableDiffusionXLControlNetPipeline when use guess_mode by @yiyixuxu in #4799
  • fix autopipeline: pass kwargs to load_config by @yiyixuxu in #4793
  • add StableDiffusionXLControlNetImg2ImgPipeline by @yiyixuxu in #4592
  • add models for T2I-Adapter-XL by @MC-E in #4696
  • Fuse loras by @patrickvonplaten in #4473
  • Fix convert_original_stable_diffusion_to_diffusers script by @wingrime in #4817
  • Support saving multiple t2i adapter models under one checkpoint by @VitjanZ in #4798
  • fix typo by @zideliu in #4822
  • VaeImageProcessor: Allow image resizing also for torch and numpy inputs by @gajendr-nikhil in #4832
  • [Core] refactor encode_prompt by @sayakpaul in #4617
  • Add loading ckpt from file for SDXL controlNet by @antigp in #4683
  • Fix Unfuse Lora by @patrickvonplaten in #4833
  • sketch inpaint from a1111 for non-inpaint models by @noskill in #4824
  • [docs] SDXL by @stevhliu in #4428
  • [Docs] improve the LoRA doc. by @sayakpaul in #4838
  • Fix potential type mismatch errors in SDXL pipelines by @hyk1996 in #4796
  • Fix image processor inputs width by @echarlaix in #4853
  • Remove warn with deprecate by @patrickvonplaten in #4850
  • [docs] ControlNet guide by @stevhliu in #4640
  • [SDXL Inpaint] Correct strength default by @patrickvonplaten in #4858
  • fix sdxl-inpaint fast test by @yiyixuxu in #4859
  • [docs] Add inpainting example for forcing the unmasked area to remain unchanged to the docs by @dg845 in #4536
  • Add GLIGEN Text Image implementation by @tuanh123789 in #4777
  • Test Cleanup Precision issues by @DN6 in #4812
  • Fix link from API to using-diffusers by @pcuenca in #4856
  • [Docs] Korean translation update by @Snailpong in #4684
  • fix a bug in sdxl-controlnet-img2img when using MultiControlNetModel by @yiyixuxu in #4862
  • support AutoPipeline.from_pipe between a pipeline and its ControlNet pipeline counterpart by @yiyixuxu in #4861
  • [WIP] masked_latent_inputs for inpainting pipeline by @yiyixuxu in #4819
  • [docs] DiffEdit guide by @stevhliu in #4722
  • [docs] Shap-E guide by @stevhliu in #4700
  • [ControlNet SDXL Inpainting] Support inpainting of ControlNet SDXL by @harutatsuakiyama in #4694
  • [Tests] Add combined pipeline tests by @patrickvonplaten in #4869
  • Retrieval Augmented Diffusion Models by @isamu-isozaki in #3297
  • check for unet_lora_layers in sdxl pipeline's save_lora_weights method by @ErwannMillon in #4821
  • Fix get_dummy_inputs for Stable Diffusion Inpaint Tests by @dg845 in #4845
  • allow passing components to connected pipelines when use the combined pipeline by @yiyixuxu in #4883
  • [Core] LoRA improvements pt. 3 by @sayakpaul in #4842
  • Add dropout parameter to UNet2DModel/UNet2DConditionModel by @dg845 in #4882
  • [Core] better support offloading when side loading is enabled. by @sayakpaul in #4855
  • Add --vae_precision option to the SDXL pix2pix script so that we have… by @bghira in #4881
  • [Test] Reduce CPU memory by @patrickvonplaten in #4897
  • fix a bug in StableDiffusionUpscalePipeline.run_safety_checker by @yiyixuxu in #4886
  • remove latent input for kandinsky prior_emb2emb pipeline by @yiyixuxu in #4887
  • [docs] Add stronger warning for SDXL height/width by @stevhliu in #4867
  • [Docs] add doc entry to explain lora fusion and use of different scales. by @sayakpaul in #4893
  • [Textual inversion] Relax loading textual inversion by @patrickvonplaten in #4903
  • [docs] Fix typo in Inpainting force unmasked area unchanged example by @dg845 in #4910
  • Würstchen model by @kashif in #3849
  • [InstructPix2Pix] Fix pipeline implementation and add docs by @sayakpaul in #4844
  • [StableDiffusionXLAdapterPipeline] add adapter_conditioning_factor by @patil-suraj in #4937
  • [StableDiffusionXLAdapterPipeline] allow negative micro conds by @patil-suraj in #4941
  • [examples] T2IAdapter training script by @patil-suraj in #4934
  • [Tests] add: tests for t2i adapter training. by @sayakpaul in #4947
  • guard save model hooks to only execute on main process by @williamberman in #4929
  • [Docs] add t2i adapter entry to overview of training scripts. by @sayakpaul in #4946
  • Temp Revert "[Core] better support offloading when side loading is enabled… by @williamberman in #4927
  • Revert revert and install accelerate main by @williamberman in #4963
  • [Docs] fix: minor formatting in the Würstchen docs by @sayakpaul in #4965
  • Lazy Import for Diffusers by @DN6 in #4829
  • [Core] Remove TF import checks by @patrickvonplaten in #4968
  • Make sure Flax pipelines can be loaded into PyTorch by @patrickvonplaten in #4971
  • Update README.md by @patrickvonplaten in #4973
  • Wuerstchen fixes by @kashif in #4942
  • Refactor model offload by @patrickvonplaten in #4514
  • [Bug Fix] Should pass the dtype instead of torch_dtype by @zhiqiang-canva in #4917
  • [Utils] Correct custom init sort by @patrickvonplaten in #4967
  • remove extra gligen in import by @DN6 in #4987
  • fix E721 Do not compare types, use isinstance() by @kashif in #4992
  • [Wuerstchen] fix combined pipeline's num_images_per_prompt by @kashif in #4989
  • fix image variation slow test by @DN6 in #4995
  • fix custom diffusion tests by @DN6 in #4996
  • [Lora] Speed up lora loading by @patrickvonplaten in #4994
  • [docs] Fix DiffusionPipeline.enable_sequential_cpu_offload docstring by @dg845 in #4952
  • Fix safety checker seq offload by @patrickvonplaten in #4998
  • Fix PR template by @stevhliu in #4984
  • examples fix t2i training by @patrickvonplaten in #5001

Significant community contributions

The following contributors have made significant changes to the library over the last release:

  • @xhinker
    • Add SDXL long weighted prompt pipeline (replace pr:4629) (#4661)
    • fix sdxl_lwp empty neg_prompt error issue (#4743)
  • @zideliu
    • Add reference_attn & reference_adain support for sdxl (#4502)
    • fix typo (#4822)
  • @shauray8
    • [WIP] Add Fabric (#4201)
  • @MC-E
    • add models for T2I-Adapter-XL (#4696)
  • @tuanh123789
    • Add GLIGEN Text Image implementation (#4777)
  • @Snailpong
    • [Docs] Korean translation update (#4684)
  • @harutatsuakiyama
    • [ControlNet SDXL Inpainting] Support inpainting of ControlNet SDXL (#4694)
  • @isamu-isozaki
    • Retrieval Augmented Diffusion Models (#3297)

- Python
Published by patrickvonplaten over 2 years ago

diffusers - Patch Release 0.20.2 - Correct SDXL Inpaint Strength Default

Stable Diffusion XL's strength default was accidentally set to 1.0 when creating the pipeline. The default should be set to 0.9999 instead. This patch release fixes that.

All commits

  • [SDXL Inpaint] Correct strength default by @patrickvonplaten in #4858

- Python
Published by patrickvonplaten over 2 years ago

diffusers - Patch Release: Fix `torch.compile()` support for ControlNets

https://github.com/huggingface/diffusers/commit/3eb498e7b4868bca7460d41cda52d33c3ede5502#r125606630 introduced a 🐛 that broke the torch.compile() support for ControlNets. This patch release fixes that.

All commits

  • [Docs] Fix docs controlnet missing /Tip by @patrickvonplaten in #4717
  • [Torch compile] Fix torch compile for controlnet by @patrickvonplaten in #4795

- Python
Published by sayakpaul over 2 years ago

diffusers - v0.20.0: SDXL ControlNets with MultiControlNet, GLIGEN, Tiny Autoencoder, SDXL DreamBooth LoRA in free-tier Colab, and more

SDXL ControlNets 🚀

The 🧨 diffusers team has trained two ControlNets on Stable Diffusion XL (SDXL):


You can find all the SDXL ControlNet checkpoints here, including some smaller ones (5 to 7x smaller).

To know more about how to use these ControlNets to perform inference, check out the respective model cards and the documentation. To train custom SDXL ControlNets, you can try out our training script.

MultiControlNet for SDXL

This release also introduces support for combining multiple ControlNets trained on SDXL and performing inference with them. Refer to the documentation to learn more.

GLIGEN

The GLIGEN model was developed by researchers and engineers from University of Wisconsin-Madison, Columbia University, and Microsoft. The StableDiffusionGLIGENPipeline can generate photorealistic images conditioned on grounding inputs. Along with text and bounding boxes, if input images are given, this pipeline can insert objects described by text at the region defined by bounding boxes. Otherwise, it’ll generate an image described by the caption/prompt and insert objects described by text at the region defined by bounding boxes. It’s trained on COCO2014D and COCO2014CD datasets, and the model uses a frozen CLIP ViT-L/14 text encoder to condition itself on grounding inputs.


(GIF from the official website)

Grounded inpainting

```python
import torch
from diffusers import StableDiffusionGLIGENPipeline
from diffusers.utils import load_image

# Insert objects described by text at the region defined by bounding boxes
pipe = StableDiffusionGLIGENPipeline.from_pretrained(
    "masterful/gligen-1-4-inpainting-text-box", variant="fp16", torch_dtype=torch.float16
)
pipe = pipe.to("cuda")

input_image = load_image(
    "https://hf.co/datasets/huggingface/documentation-images/resolve/main/diffusers/gligen/livingroom_modern.png"
)
prompt = "a birthday cake"
boxes = [[0.2676, 0.6088, 0.4773, 0.7183]]
phrases = ["a birthday cake"]

images = pipe(
    prompt=prompt,
    gligen_phrases=phrases,
    gligen_inpaint_image=input_image,
    gligen_boxes=boxes,
    gligen_scheduled_sampling_beta=1,
    output_type="pil",
    num_inference_steps=50,
).images
images[0].save("./gligen-1-4-inpainting-text-box.jpg")
```

Grounded generation

```python
import torch
from diffusers import StableDiffusionGLIGENPipeline
from diffusers.utils import load_image

# Generate an image described by the prompt and
# insert objects described by text at the region defined by bounding boxes
pipe = StableDiffusionGLIGENPipeline.from_pretrained(
    "masterful/gligen-1-4-generation-text-box", variant="fp16", torch_dtype=torch.float16
)
pipe = pipe.to("cuda")

prompt = "a waterfall and a modern high speed train running through the tunnel in a beautiful forest with fall foliage"
boxes = [[0.1387, 0.2051, 0.4277, 0.7090], [0.4980, 0.4355, 0.8516, 0.7266]]
phrases = ["a waterfall", "a modern high speed train running through the tunnel"]

images = pipe(
    prompt=prompt,
    gligen_phrases=phrases,
    gligen_boxes=boxes,
    gligen_scheduled_sampling_beta=1,
    output_type="pil",
    num_inference_steps=50,
).images
images[0].save("./gligen-1-4-generation-text-box.jpg")
```

Refer to the documentation to learn more.

Thanks to @nikhil-masterful for contributing GLIGEN in #4441.

Tiny Autoencoder

@madebyollin trained two Autoencoders (on Stable Diffusion and Stable Diffusion XL, respectively) to dramatically cut down the image decoding time. The effects are especially pronounced when working with larger-resolution images. You can use AutoencoderTiny to take advantage of it.

Here’s the example usage for Stable Diffusion:

```python
import torch
from diffusers import DiffusionPipeline, AutoencoderTiny

pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1-base", torch_dtype=torch.float16
)
pipe.vae = AutoencoderTiny.from_pretrained("madebyollin/taesd", torch_dtype=torch.float16)
pipe = pipe.to("cuda")

prompt = "slice of delicious New York-style berry cheesecake"
image = pipe(prompt, num_inference_steps=25).images[0]
image.save("cheesecake.png")
```

Refer to the documentation to learn more. Refer to this material to understand the implications of using this Autoencoder in terms of inference latency and memory footprint.

Fine-tuning Stable Diffusion XL with DreamBooth and LoRA on a free-tier Colab Notebook

Stable Diffusion XL’s (SDXL) high memory requirements often seem restrictive when it comes to using it for downstream applications. Even if one uses parameter-efficient fine-tuning techniques like LoRA, fine-tuning just the UNet component of SDXL can be quite memory-intensive. So, running it on a free-tier Colab Notebook (that usually has a 16 GB T4 GPU attached) seems impossible.

Now, with better support for gradient checkpointing and other recipes like 8 Bit Adam (via bitsandbytes), it is possible to fine-tune the UNet of SDXL with DreamBooth and LoRA on a free-tier Colab Notebook.

Check out the Colab Notebook to learn more.

Thanks to @ethansmith2000 for improving the gradient checkpointing support in #4474.
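A command-line sketch of that recipe with the train_dreambooth_lora_sdxl.py script from the diffusers examples; the dataset path and hyperparameter values below are illustrative, not recommendations:

```shell
# 8-bit Adam comes from bitsandbytes.
pip install bitsandbytes

accelerate launch train_dreambooth_lora_sdxl.py \
  --pretrained_model_name_or_path="stabilityai/stable-diffusion-xl-base-1.0" \
  --instance_data_dir="./dog-images" \
  --instance_prompt="a photo of sks dog" \
  --resolution=1024 \
  --train_batch_size=1 \
  --gradient_accumulation_steps=4 \
  --gradient_checkpointing \
  --use_8bit_adam \
  --mixed_precision="fp16" \
  --output_dir="lora-sdxl-dreambooth"
```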

Support of push_to_hub for models, schedulers, and pipelines

Our models, schedulers, and pipelines now support a push_to_hub argument in save_pretrained() and also come with a dedicated push_to_hub() method. Below are some examples of usage.

Models

```python
from diffusers import ControlNetModel

controlnet = ControlNetModel(
    block_out_channels=(32, 64),
    layers_per_block=2,
    in_channels=4,
    down_block_types=("DownBlock2D", "CrossAttnDownBlock2D"),
    cross_attention_dim=32,
    conditioning_embedding_out_channels=(16, 32),
)
controlnet.push_to_hub("my-controlnet-model")
# or controlnet.save_pretrained("my-controlnet-model", push_to_hub=True)
```

Schedulers

```python
from diffusers import DDIMScheduler

scheduler = DDIMScheduler(
    beta_start=0.00085,
    beta_end=0.012,
    beta_schedule="scaled_linear",
    clip_sample=False,
    set_alpha_to_one=False,
)
scheduler.push_to_hub("my-controlnet-scheduler")
```

Pipelines

```python
from diffusers import (
    UNet2DConditionModel,
    AutoencoderKL,
    DDIMScheduler,
    StableDiffusionPipeline,
)
from transformers import CLIPTextModel, CLIPTextConfig, CLIPTokenizer

unet = UNet2DConditionModel(
    block_out_channels=(32, 64),
    layers_per_block=2,
    sample_size=32,
    in_channels=4,
    out_channels=4,
    down_block_types=("DownBlock2D", "CrossAttnDownBlock2D"),
    up_block_types=("CrossAttnUpBlock2D", "UpBlock2D"),
    cross_attention_dim=32,
)

scheduler = DDIMScheduler(
    beta_start=0.00085,
    beta_end=0.012,
    beta_schedule="scaled_linear",
    clip_sample=False,
    set_alpha_to_one=False,
)

vae = AutoencoderKL(
    block_out_channels=[32, 64],
    in_channels=3,
    out_channels=3,
    down_block_types=["DownEncoderBlock2D", "DownEncoderBlock2D"],
    up_block_types=["UpDecoderBlock2D", "UpDecoderBlock2D"],
    latent_channels=4,
)

text_encoder_config = CLIPTextConfig(
    bos_token_id=0,
    eos_token_id=2,
    hidden_size=32,
    intermediate_size=37,
    layer_norm_eps=1e-05,
    num_attention_heads=4,
    num_hidden_layers=5,
    pad_token_id=1,
    vocab_size=1000,
)
text_encoder = CLIPTextModel(text_encoder_config)
tokenizer = CLIPTokenizer.from_pretrained("hf-internal-testing/tiny-random-clip")

components = {
    "unet": unet,
    "scheduler": scheduler,
    "vae": vae,
    "text_encoder": text_encoder,
    "tokenizer": tokenizer,
    "safety_checker": None,
    "feature_extractor": None,
}
pipeline = StableDiffusionPipeline(**components)
pipeline.push_to_hub("my-pipeline")
```

Refer to the documentation to know more.

Thanks to @Wauplin for his generous and constructive feedback on this feature (see #4218).

Better support for loading Kohya-trained LoRA checkpoints

Providing seamless support for loading Kohya-trained LoRA checkpoints from diffusers is important for us. This is why we continue to improve our load_lora_weights() method. Check out the documentation to know more about what’s currently supported and the current limitations.

Thanks to @isidentical for extending their help in improving this support.

Better documentation for prompt weighting

Prompt weighting provides a way to emphasize or de-emphasize certain parts of a prompt, allowing for more control over the generated image. compel provides an easy way to do prompt weighting compatible with diffusers. To this end, we have worked on an improved guide. Check it out here.

Defaulting to serialize with .safetensors

Starting with this release, we will default to using .safetensors as our preferred serialization method. This change is reflected in all the training examples that we officially support.

All commits

  • 0.20.0dev0 by @patrickvonplaten in #4299
  • update Kandinsky doc by @yiyixuxu in #4301
  • [Torch.compile] Fixes torch compile graph break by @patrickvonplaten in #4315
  • Fix SDXL conversion from original to diffusers by @duongna21 in #4280
  • fix a bug in StableDiffusionUpscalePipeline when prompt is None by @yiyixuxu in #4278
  • [Local loading] Correct bug with local files only by @patrickvonplaten in #4318
  • Fix typo documentation by @echarlaix in #4320
  • fix validation option for dreambooth training example by @xinyangli in #4317
  • [Tests] add test for pipeline import. by @sayakpaul in #4276
  • Honor the SDXL 1.0 licensing from the training scripts. by @sayakpaul in #4319
  • Update README_sdxl.md to correct the header by @sayakpaul in #4330
  • [SDXL Refiner] Fix refiner forward pass for batched input by @patrickvonplaten in #4327
  • correct doc string for default value of guidance_scale by @Tanupriya-Singh in #4339
  • [ONNX] Don't download ONNX model by default by @patrickvonplaten in #4338
  • Fix repeat of negative prompt by @kathath in #4335
  • [SDXL] Make watermarker optional under certain circumstances to improve usability of SDXL 1.0 by @patrickvonplaten in #4346
  • [Feat] Support SDXL Kohya-style LoRA by @sayakpaul in #4287
  • fix fp type in t2i adapter docs by @williamberman in #4350
  • Update README.md to have PyPI-friendly path by @sayakpaul in #4351
  • [SDXL-IP2P] Add gif for demonstrating training processes by @harutatsuakiyama in #4342
  • [SDXL] Fix dummy imports incorrect naming by @patrickvonplaten in #4370
  • Clean up duplicate lines in encode_prompt by @avoroshilov in #4369
  • minor doc fixes. by @sayakpaul in #4380
  • Update docs of unet_1d.py by @nishant42491 in #4394
  • [AutoPipeline] Correct naming by @patrickvonplaten in #4420
  • [ldm3d] documentation fixing typos by @estelleafl in #4284
  • Cleanup pass for flaky Slow Tests for Stable diffusion by @DN6 in #4415
  • support from_single_file for SDXL inpainting by @yiyixuxu in #4408
  • fix test_float16_inference by @yiyixuxu in #4412
  • train dreambooth fix pre encode class prompt by @williamberman in #4395
  • [docs] Fix SDXL docstring by @stevhliu in #4397
  • Update documentation by @echarlaix in #4422
  • remove mentions of textual inversion from sdxl. by @sayakpaul in #4404
  • [LoRA] Fix SDXL text encoder LoRAs by @sayakpaul in #4371
  • [docs] AutoPipeline tutorial by @stevhliu in #4273
  • [Pipelines] Add community pipeline for Zero123 by @kxhit in #4295
  • [Feat] add tiny Autoencoder for (almost) instant decoding by @sayakpaul in #4384
  • can call encode_prompt with out setting a text encoder instance variable by @williamberman in #4396
  • Accept pooled_prompt_embeds in the SDXL Controlnet pipeline. Fixes an error if prompt_embeds are passed. by @cmdr2 in #4309
  • Prevent online access when desired when using download_from_original_stable_diffusion_ckpt by @w4ffl35 in #4271
  • move tests to nightly by @DN6 in #4451
  • auto type conversion by @isNeil in #4270
  • Fix typerror in pipeline handling for MultiControlNets which only contain a single ControlNet by @Georgehe4 in #4454
  • Add rank argument to train_dreambooth_lora_sdxl.py by @levi in #4343
  • [docs] Distilled SD by @stevhliu in #4442
  • Allow controlnets to be loaded (from ckpt) in a parallel thread with a SD model (ckpt), and speed it up slightly by @cmdr2 in #4298
  • fix typo to ensure make test-examples work correctly by @statelesshz in #4329
  • Fix bug caused by typo by @HeliosZhao in #4357
  • Delete the duplicate code for the contolnet img 2 img by @VV-A-VV in #4411
  • Support different strength for Stable Diffusion TensorRT Inpainting pipeline by @jinwonkim93 in #4216
  • add sdxl to prompt weighting by @patrickvonplaten in #4439
  • a few fix for kandinsky combined pipeline by @yiyixuxu in #4352
  • fix-format by @yiyixuxu in #4458
  • Cleanup Pass on flaky slow tests for Stable Diffusion by @DN6 in #4455
  • Fixed multi-token textual inversion training by @manosplitsis in #4452
  • TensorRT Inpaint pipeline: minor fixes by @asfiyab-nvidia in #4457
  • [Tests] Adds integration tests for SDXL LoRAs by @sayakpaul in #4462
  • Update README_sdxl.md by @patrickvonplaten in #4472
  • [SDXL] Allow SDXL LoRA to be run with less than 16GB of VRAM by @patrickvonplaten in #4470
  • Add a data_dir parameter to the load_dataset method. by @AisingioroHao0 in #4482
  • [Examples] Support train_text_to_image_lora_sdxl.py by @okotaku in #4365
  • Log global_step instead of epoch to tensorboard by @mrlzla in #4493
  • Update lora.md to clarify SDXL support by @sayakpaul in #4503
  • [SDXL LoRA] fix batch size lora by @patrickvonplaten in #4509
  • Make sure fp16-fix is used as default by @patrickvonplaten in #4510
  • grad checkpointing by @ethansmith2000 in #4474
  • move pipeline only when running validation by @patrickvonplaten in #4515
  • Moving certain pipelines slow tests to nightly by @DN6 in #4469
  • add pipeline_class_name argument to Stable Diffusion conversion script by @yiyixuxu in #4461
  • Fix misc typos by @Georgehe4 in #4479
  • fix indexing issue in sd reference pipeline by @DN6 in #4531
  • Copy lora functions to XLPipelines by @wooyeolBaek in #4512
  • introduce minimalistic reimplementation of SDXL on the SDXL doc by @cloneofsimo in #4532
  • Fix push_to_hub in train_text_to_image_lora_sdxl.py example by @ra100 in #4535
  • Update README_sdxl.md to include the free-tier Colab Notebook by @sayakpaul in #4540
  • Changed code that converts tensors to PIL images in the write_your_own_pipeline notebook by @jere357 in #4489
  • Move slow tests to nightly by @DN6 in #4526
  • pin ruff version for quality checks by @DN6 in #4539
  • [docs] Clean scheduler api by @stevhliu in #4204
  • Move controlnet load local tests to nightly by @DN6 in #4543
  • Revert "introduce minimalistic reimplementation of SDXL on the SDXL doc" by @patrickvonplaten in #4548
  • fix some typo error by @VV-A-VV in #4546
  • improve controlnet sdxl docs now that we have a good checkpoint. by @sayakpaul in #4556
  • [Doc] update sdxl-controlnet repo name by @yiyixuxu in #4564
  • [docs] Expand prompt weighting by @stevhliu in #4516
  • [docs] Remove attention slicing by @stevhliu in #4518
  • [docs] Add safetensors flag by @stevhliu in #4245
  • Convert Stable Diffusion ControlNet to TensorRT by @dotieuthien in #4465
  • Remove code snippets containing is_safetensors_available() by @chiral-carbon in #4521
  • Fixing repo_id regex validation error on windows platforms by @Mystfit in #4358
  • [Examples] fix: network_alpha -> network_alphas by @sayakpaul in #4572
  • [docs] Fix ControlNet SDXL docstring by @stevhliu in #4582
  • [Utility] adds an image grid utility by @sayakpaul in #4576
  • Fixed invalid pipeline_class_name parameter. by @AisingioroHao0 in #4590
  • Fix git-lfs command typo in docs by @clairefro in #4586
  • [Examples] Update InstructPix2Pix README_sdxl.md to fix mentions by @sayakpaul in #4574
  • [Pipeline utils] feat: implement push_to_hub for standalone models, schedulers as well as pipelines by @sayakpaul in #4128
  • An invalid clerical error in sdxl finetune by @XDUWQ in #4608
  • [Docs] fix links in the controlling generation doc. by @sayakpaul in #4612
  • add: push_to_hub_mixin to pipelines and schedulers docs overview. by @sayakpaul in #4607
  • add: train to text image with sdxl script. by @sayakpaul in #4505
  • Add GLIGEN implementation by @nikhil-masterful in #4441
  • Update text2image.md to fix the links by @sayakpaul in #4626
  • Fix unipc use_karras_sigmas exception - fixes huggingface/diffusers#4580 by @reimager in #4581
  • [research_projects] SDXL controlnet script by @patil-suraj in #4633
  • [Core] feat: MultiControlNet support for SDXL ControlNet pipeline by @sayakpaul in #4597
  • [docs] PushToHubMixin by @stevhliu in #4622
  • [docs] MultiControlNet by @stevhliu in #4635
  • fix loading custom text encoder when using from_single_file by @DN6 in #4571
  • make things clear in the controlnet sdxl doc. by @sayakpaul in #4644
  • Fix UnboundLocalError during LoRA loading by @slessans in #4523
  • Support higher dimension LoRAs by @isidentical in #4625
  • [Safetensors] Make safetensors the default way of saving weights by @patrickvonplaten in #4235

Significant community contributions

The following contributors have made significant changes to the library over the last release:

  • @kxhit
    • [Pipelines] Add community pipeline for Zero123 (#4295)
  • @okotaku
    • [Examples] Support train_text_to_image_lora_sdxl.py (#4365)
  • @dotieuthien
    • Convert Stable Diffusion ControlNet to TensorRT (#4465)
  • @nikhil-masterful
    • Add GLIGEN implementation (#4441)

- Python
Published by sayakpaul over 2 years ago

diffusers - Patch release: Fix incorrect filenaming

0.19.3 is a patch release to make sure import diffusers works without transformers being installed.

It includes a fix of this issue.

All commits

[SDXL] Fix dummy imports incorrect naming by @patrickvonplaten in https://github.com/huggingface/diffusers/pull/4370

- Python
Published by patrickvonplaten over 2 years ago

diffusers - Patch Release: Support for SDXL Kohya-style LoRAs, Fix batched inference SDXL Img2Img, Improve watermarker

We still had some bugs 🐛 in 0.19.1, notably:

SDXL (Kohya-style) LoRA

The official SD-XL 1.0 LoRA (Kohya-styled) is now supported thanks to https://github.com/huggingface/diffusers/pull/4287. You can try it as follows:

```py
from diffusers import DiffusionPipeline
import torch

pipe = DiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16)
pipe.load_lora_weights("stabilityai/stable-diffusion-xl-base-1.0", weight_name="sd_xl_offset_example-lora_1.0.safetensors")
pipe.to(torch_dtype=torch.float16)
pipe.to("cuda")

prompt = "beautiful scenery nature glass bottle landscape, purple galaxy bottle"
negative_prompt = "text, watermark"

image = pipe(prompt, negative_prompt=negative_prompt, num_inference_steps=25).images[0]
```


In addition, a couple more SDXL LoRAs are now supported:

(SDXL 0.9:)

  • https://civitai.com/models/22279?modelVersionId=118556
  • https://civitai.com/models/104515/sdxlor30costumesrevue-starlight-saijoclaudine-lora
  • https://civitai.com/models/108448/daiton-sdxl-test
  • https://filebin.net/2ntfqqnapiu9q3zx/pixelbuildings128-v1.safetensors

For more details and the known limitations, please check out the documentation.

Thanks to @isidentical for their sincere help in the PR.

Batched inference

@bghira found that for SDXL Img2Img batched inference led to weird artifacts. That is fixed in: https://github.com/huggingface/diffusers/pull/4327.

Downloads

Under some circumstances, SD-XL 1.0 could download ONNX weights; this is corrected in https://github.com/huggingface/diffusers/pull/4338.

Improved SDXL behavior

https://github.com/huggingface/diffusers/pull/4346 allows the user to disable the watermarker under certain circumstances to improve the usability of SDXL.

All commits:

  • [SDXL Refiner] Fix refiner forward pass for batched input by @patrickvonplaten in #4327
  • [ONNX] Don't download ONNX model by default by @patrickvonplaten in #4338
  • [SDXL] Make watermarker optional under certain circumstances to improve usability of SDXL 1.0 by @patrickvonplaten in #4346
  • [Feat] Support SDXL Kohya-style LoRA by @sayakpaul in #4287

- Python
Published by patrickvonplaten over 2 years ago

diffusers - Patch Release: Fix torch compile and local_files_only

In 0.19.0 some bugs :bug: found their way into the release. We're very sorry about this :pray:

This patch release fixes all of them.

All commits

  • update Kandinsky doc by @yiyixuxu in #4301
  • [Torch.compile] Fixes torch compile graph break by @patrickvonplaten in #4315
  • Fix SDXL conversion from original to diffusers by @duongna21 in #4280
  • fix a bug in StableDiffusionUpscalePipeline when prompt is None by @yiyixuxu in #4278
  • [Local loading] Correct bug with local files only by @patrickvonplaten in #4318
  • Release: v0.19.1 by @patrickvonplaten (direct commit on v0.19.1-patch)

- Python
Published by patrickvonplaten over 2 years ago

diffusers - v0.19.0: SD-XL 1.0 (permissive license), AutoPipelines, Improved Kandinsky & Asymmetric VQGAN

SDXL 1.0

Stable Diffusion XL (SDXL) 1.0 with permissive CreativeML Open RAIL++-M License was released today. We provide full compatibility with SDXL in diffusers.

```py
from diffusers import DiffusionPipeline
import torch

pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16, variant="fp16", use_safetensors=True
)
pipe.to("cuda")

prompt = "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"
image = pipe(prompt=prompt).images[0]
image
```


Many additional cool features are released:

  • Img2Img and Inpainting pipelines
  • Torch compile support
  • Model offloading
  • Ensemble of Expert Denoisers (the E-Diffi approach) - thanks to @bghira @SytanSD @Birch-san @AmericanPresidentJimmyCarter

Refer to the documentation to know more.

New training scripts for SDXL

When there’s a new pipeline, there ought to be new training scripts. We added support for the following training scripts that build on top of SDXL:

Shoutout to @harutatsuakiyama for contributing the training script for InstructPix2Pix in #4079.

New pipelines for SDXL

The ControlNet and InstructPix2Pix training scripts also needed their respective pipelines. So, we added support for the following pipelines as well:

  • StableDiffusionXLControlNetPipeline
  • StableDiffusionXLInstructPix2PixPipeline

The ControlNet and InstructPix2Pix pipelines don’t have interesting checkpoints yet. We hope that the community will be able to leverage the training scripts from this release to help produce some.

Shoutout to @harutatsuakiyama for contributing the StableDiffusionXLInstructPix2PixPipeline in #4079.

The AutoPipeline API

We now support Auto APIs for the following tasks: text-to-image, image-to-image, and inpainting:

Here is how to use one:

```python
from diffusers import AutoPipelineForText2Image
import torch

pipe_t2i = AutoPipelineForText2Image.from_pretrained(
    "runwayml/stable-diffusion-v1-5", requires_safety_checker=False, torch_dtype=torch.float16
).to("cuda")

prompt = "photo of a majestic sunrise in the mountains, best quality, 4k"
image = pipe_t2i(prompt).images[0]
image.save("image.png")
```

Without any extra memory, you can then switch to image-to-image:

```python
from diffusers import AutoPipelineForImage2Image

pipe_i2i = AutoPipelineForImage2Image.from_pipe(pipe_t2i)

image = pipe_i2i("sunrise in snowy mountains", image=image, strength=0.75).images[0]
image.save("image.png")
```

Supported Pipelines: SDv1, SDv2, SDXL, Kandinsky, ControlNet, IF ... with more to come.

Refer to the documentation to know more.

A new “combined pipeline” for the Kandinsky series

We introduced a new “combined pipeline” for the Kandinsky series to make it easier to use the Kandinsky prior and decoder together. This eliminates the need to initialize and use multiple pipelines for Kandinsky to generate images. Here is an example:

```python
from diffusers import AutoPipelineForText2Image
import torch

pipe = AutoPipelineForText2Image.from_pretrained(
    "kandinsky-community/kandinsky-2-2-decoder", torch_dtype=torch.float16
)
pipe.enable_model_cpu_offload()

prompt = "A lion in galaxies, spirals, nebulae, stars, smoke, iridescent, intricate detail, octane render, 8k"
image = pipe(prompt=prompt, num_inference_steps=25).images[0]
image.save("image.png")
```

The following pipelines, which can be accessed via the "Auto" pipelines, were added:

To know more, check out the following pages:

🚨🚨🚨 Breaking change for Kandinsky Mask Inpainting 🚨🚨🚨

NOW: mask_image repaints white pixels and preserves black pixels.

Kandinsky was using an incorrect mask format. Instead of treating white pixels as the mask (as SD & IF do), the Kandinsky models were using black pixels. This needed to be corrected so that the diffusers API stays aligned; we cannot have different mask formats for different pipelines.

Important => This means that everyone who already uses Kandinsky inpainting in production / in a pipeline now needs to invert the mask:

```py
# For PIL input
import PIL.ImageOps
mask = PIL.ImageOps.invert(mask)

# For PyTorch and NumPy input
mask = 1 - mask
```
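As a quick sanity check of the inversion on a toy mask (pure Python, values in [0, 1], where 1 now means "repaint"):

```python
# Old Kandinsky convention: black (0) marked the region to repaint.
old_mask = [[0.0, 1.0], [1.0, 0.0]]

# New diffusers-wide convention: white (1) marks the region to repaint,
# so existing Kandinsky masks must be inverted element-wise.
new_mask = [[1 - px for px in row] for row in old_mask]
```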

Asymmetric VQGAN

Designing a Better Asymmetric VQGAN for StableDiffusion introduced a VQGAN that is particularly well-suited for inpainting tasks. This release brings support for this new VQGAN. Here is how it can be used:

```python
from io import BytesIO

import requests
from PIL import Image

from diffusers import AsymmetricAutoencoderKL, StableDiffusionInpaintPipeline


def download_image(url: str) -> Image.Image:
    response = requests.get(url)
    return Image.open(BytesIO(response.content)).convert("RGB")


prompt = "a photo of a person"
img_url = "https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/repaint/celeba_hq_256.png"
mask_url = "https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/repaint/mask_256.png"

image = download_image(img_url).resize((256, 256))
mask_image = download_image(mask_url).resize((256, 256))

pipe = StableDiffusionInpaintPipeline.from_pretrained("runwayml/stable-diffusion-inpainting")
pipe.vae = AsymmetricAutoencoderKL.from_pretrained("cross-attention/asymmetric-autoencoder-kl-x-1-5")
pipe.to("cuda")

image = pipe(prompt=prompt, image=image, mask_image=mask_image).images[0]
image.save("image.jpeg")
```

Refer to the documentation to know more.

Thanks to @cross-attention for contributing this model in #3956.

Improved support for loading Kohya-style LoRA checkpoints

We are committed to seamless interoperability with Kohya-trained checkpoints. To that end, we improved the existing support for loading them in diffusers. Users can expect further improvements in upcoming releases.

Thanks to @takuma104 and @isidentical for contributing the improvements in #4147.

All commits

  • 📝 Fix broken link to models documentation by @kadirnar in #4026
  • move to 0.19.0dev by @patrickvonplaten in #4048
  • [SDXL] Partial diffusion support for Text2Img and Img2Img Pipelines by @bghira in #4015
  • Correct sdxl docs by @patrickvonplaten in #4058
  • Add circular padding for artifact-free StableDiffusionPanoramaPipeline by @EvgenyKashin in #4025
  • Update train_unconditional.py by @hjmnbnb in #3899
  • Trigger CI on ci-* branches by @Wauplin in #3635
  • Fix kandinsky remove safety by @patrickvonplaten in #4065
  • Multiply lr scheduler steps by num_processes. by @eliphatfs in #3983
  • [Community] Implementation of the IADB community pipeline by @tchambon in #3996
  • add kandinsky to readme table by @yiyixuxu in #4081
  • [From Single File] Force accelerate to be installed by @patrickvonplaten in #4078
  • fix requirement in SDXL by @killah-t-cell in #4082
  • fix: minor things in the SDXL docs. by @sayakpaul in #4070
  • [Invisible watermark] Correct version by @patrickvonplaten in #4087
  • [Feat] add: utility for unloading lora. by @sayakpaul in #4034
  • [tests] use parent class for monkey patching to not break other tests by @patrickvonplaten in #4088
  • Allow low precision vae sd xl by @patrickvonplaten in #4083
  • [SD-XL] Add inpainting by @patrickvonplaten in #4098
  • [Stable Diffusion Inpaint ]Fix dtype inpaint by @patrickvonplaten in #4113
  • [From ckpt] replace with os path join by @patrickvonplaten in #3746
  • [From single file] Make accelerate optional by @patrickvonplaten in #4132
  • add noise_sampler_seed to StableDiffusionKDiffusionPipeline.__call__ by @sunhs in #3911
  • Make setup.py compatible with pipenv by @apoorvaeternity in #4121
  • 📝 Update doc with more descriptive title and filename for "IF" section by @kadirnar in #4049
  • t2i pipeline by @williamberman in #3932
  • [Docs] Korean translation update by @Snailpong in #4022
  • [Enhance] Add rank in dreambooth by @okotaku in #4112
  • Refactor execution device & cpu offload by @patrickvonplaten in #4114
  • Add Recent Timestep Scheduling Improvements to DDIM Inverse Scheduler by @clarencechen in #3865
  • [Core] add: controlnet support for SDXL by @sayakpaul in #4038
  • Docs/bentoml integration by @larme in #4090
  • Fixed SDXL single file loading to use the correct requested pipeline class by @Mystfit in #4142
  • feat: add act_fn param to OutValueFunctionBlock by @SauravMaheshkar in #3994
  • Add controlnet and vae from single file by @patrickvonplaten in #4084
  • fix incorrect attention head dimension in AttnProcessor2_0 by @zhvng in #4154
  • Fix bug in ControlNetPipelines with MultiControlNetModel of length 1 by @greentfrapp in #4032
  • Asymmetric vqgan by @cross-attention in #3956
  • Shap-E: add support for mesh output by @yiyixuxu in #4062
  • [From single file] Make sure that controlnet stays False for fromsinglefile by @patrickvonplaten in #4181
  • [ControlNet Training] Remove safety from controlnet by @patrickvonplaten in #4180
  • remove bentoml doc in favor of blogpost by @williamberman in #4182
  • Fix unloading of LoRAs when xformers attention procs are in use by @isidentical in #4179
  • [Safetensors] make safetensors a required dep by @patrickvonplaten in #4177
  • make enablesequentialcpu_offload more generic for third-party devices by @statelesshz in #4191
  • Allow passing different prompts to each text_encoder on stable_diffusion_xl pipelines by @apolinario in #4156
  • [SDXL ControlNet Training] Follow-up fixes by @sayakpaul in #4188
  • 📄 Renamed File for Better Understanding by @kadirnar in #4056
  • [docs] Clean up pipeline apis by @stevhliu in #3905
  • docs: Typo in dreambooth example README.md by @askulkarni2 in #4203
  • [fix] network_alpha when loading unet lora from old format by @Jackmin801 in #4221
  • fix no CFG for kandinsky pipelines by @yiyixuxu in #4193
  • fix a bug of prompt embeds in sdxl by @xiaohu2015 in #4099
  • Raise initial HTTPError if pipeline is not cached locally by @Wauplin in #4230
  • [SDXL] Fix sd xl encode prompt by @patrickvonplaten in #4237
  • [SD-XL] Fix sdxl controlnet inference by @patrickvonplaten in #4238
  • [docs] Changed path for ControlNet in docs by @rcmtcristian in #4215
  • Allow specifying denoisingstart and denoisingend as integers representing the discrete timesteps, fixing the XL ensemble not working for many schedulers by @AmericanPresidentJimmyCarter in #4115
  • [docs] Other modalities by @stevhliu in #4205
  • docs: Add missing import statement in textual_inversion inference example by @askulkarni2 in #4227
  • [Docs] Fix from pretrained docs by @patrickvonplaten in #4240
  • [ControlNet SDXL training] fixes in the training script by @sayakpaul in #4223
  • [SDXL DreamBooth LoRA] add support for text encoder fine-tuning by @sayakpaul in #4097
  • Resolve bf16 error as mentioned in this issue by @nupurkmr9 in #4214
  • do not pass list to accelerator.init_trackers by @williamberman in #4248
  • [From Single File] Allow vae to be loaded by @patrickvonplaten in #4242
  • [SDXL] Improve docs by @patrickvonplaten in #4196
  • [draft v2] AutoPipeline by @yiyixuxu in #4138
  • Update README_sdxl.md to change the note on default hyperparameters by @sayakpaul in #4258
  • [fromsinglefile] Fix circular import by @patrickvonplaten in #4259
  • Model path for sdxl wrong in dreambooth README by @rrva in #4261
  • [SDXL and IP2P]: instruction pix2pix XL training and pipeline by @harutatsuakiyama in #4079
  • [docs] Fix image in SDXL docs by @stevhliu in #4267
  • [SDXL DreamBooth LoRA] multiple fixes by @sayakpaul in #4262
  • Load Kohya-ss style LoRAs with auxilary states by @isidentical in #4147
  • Fix all missing optional import statements from pipeline folders by @patrickvonplaten in #4272
  • [Kandinsky] Add combined pipelines / Fix cpu model offload / Fix inpainting by @patrickvonplaten in #4207
  • Where did this 'x' come from, Elon? by @camenduru in #4277
  • add openvino and onnx runtime SD XL documentation by @echarlaix in #4285
  • Rename by @patrickvonplaten in #4294

Significant community contributions

The following contributors have made significant changes to the library over the last release:

  • @Snailpong
    • [Docs] Korean translation update (#4022)
  • @clarencechen
    • Add Recent Timestep Scheduling Improvements to DDIM Inverse Scheduler (#3865)
  • @cross-attention
    • Asymmetric vqgan (#3956)
  • @AmericanPresidentJimmyCarter
    • Allow specifying denoisingstart and denoisingend as integers representing the discrete timesteps, fixing the XL ensemble not working for many schedulers (#4115)
  • @harutatsuakiyama
    • [SDXL and IP2P]: instruction pix2pix XL training and pipeline (#4079)

- Python
Published by patrickvonplaten over 2 years ago

diffusers - Patch Release: v0.18.2

Patch release to fix:

  1. torch.compile for SD-XL for certain GPUs
  2. from_single_file for all SD models
  3. Fix broken ONNX export
  4. Fix incorrect VAE FP16 casting
  5. Deprecate loading variants that don't exist

Note:

Loading any stable diffusion safetensors or ckpt with StableDiffusionPipeline.from_single_file or StableDiffusionImg2ImgPipeline.from_single_file or StableDiffusionInpaintPipeline.from_single_file or StableDiffusionXLPipeline.from_single_file, ...

is now almost as fast as from_pretrained(...), and it's much better tested now.

All commits:

  • Make sure torch compile doesn't access unet config by @patrickvonplaten in #4008
  • [DiffusionPipeline] Deprecate not throwing error when loading non-existant variant by @patrickvonplaten in #4011
  • Correctly keep vae in float16 when using PyTorch 2 or xFormers by @pcuenca in #4019
  • minor improvements to the SDXL doc. by @sayakpaul in #3985
  • Remove remaining not in upscale pipeline by @pcuenca in #4020
  • FIX force_download in download utility by @Wauplin in #4036
  • Improve single loading file by @patrickvonplaten in #4041
  • keep usedefault_values as a list type by @oOraph in #4040

- Python
Published by patrickvonplaten over 2 years ago

diffusers - Patch Release for Stable Diffusion XL 0.9

Patch release 0.18.1: Stable Diffusion XL 0.9 Research Release

Stable Diffusion XL 0.9 is now fully supported under the SDXL 0.9 Research License.

Having received access to stabilityai/stable-diffusion-xl-base-0.9, you can easily use it with diffusers:

Text-to-Image

```py
from diffusers import StableDiffusionXLPipeline
import torch

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-0.9", torch_dtype=torch.float16, variant="fp16", use_safetensors=True
)
pipe.to("cuda")

prompt = "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"
image = pipe(prompt=prompt).images[0]
```

Refining the image output

```py
from diffusers import StableDiffusionXLPipeline, StableDiffusionXLImg2ImgPipeline
import torch

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-0.9", torch_dtype=torch.float16, variant="fp16", use_safetensors=True
)
pipe.to("cuda")

refiner = StableDiffusionXLImg2ImgPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-refiner-0.9", torch_dtype=torch.float16, use_safetensors=True, variant="fp16"
)
refiner.to("cuda")

prompt = "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"

use_refiner = True
image = pipe(prompt=prompt, output_type="latent" if use_refiner else "pil").images[0]
image = refiner(prompt=prompt, image=image[None, :]).images[0]
```

Loading single file checkpoints / original file format

```py
from diffusers import StableDiffusionXLPipeline, StableDiffusionXLImg2ImgPipeline
import torch

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-0.9", torch_dtype=torch.float16, variant="fp16", use_safetensors=True
)
pipe.to("cuda")

refiner = StableDiffusionXLImg2ImgPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-refiner-0.9", torch_dtype=torch.float16, use_safetensors=True, variant="fp16"
)
refiner.to("cuda")
```

Memory optimization via model offloading

```diff
- pipe.to("cuda")
+ pipe.enable_model_cpu_offload()
```

and

```diff
- refiner.to("cuda")
+ refiner.enable_model_cpu_offload()
```

Speed-up inference with torch.compile

```diff
+ pipe.unet = torch.compile(pipe.unet, mode="reduce-overhead", fullgraph=True)
+ refiner.unet = torch.compile(refiner.unet, mode="reduce-overhead", fullgraph=True)
```

Note: If you're running the model with torch < 2.0, please make sure to run:

```diff
+ pipe.enable_xformers_memory_efficient_attention()
+ refiner.enable_xformers_memory_efficient_attention()
```

For more details have a look at the official docs.

All commits

  • typo in safetensors (safetenstors) by @YoraiLevi in #3976
  • Fix code snippet for Audio Diffusion by @osanseviero in #3987
  • feat: add Dropout to Flax UNet by @SauravMaheshkar in #3894
  • Add 'rank' parameter to Dreambooth LoRA training script by @isidentical in #3945
  • Don't use bare prints in a library by @cmd410 in #3991
  • [Tests] Fix some slow tests by @patrickvonplaten in #3989
  • Add sdxl prompt embeddings by @patrickvonplaten in #3995

- Python
Published by patrickvonplaten over 2 years ago

diffusers - Shap-E, Consistency Models, Video2Video

Shap-E

Shap-E is a 3D image generation model from OpenAI introduced in Shap-E: Generating Conditional 3D Implicit Functions.

We provide support for text-to-3D generation and 2D-image-to-3D generation in diffusers.

Text to 3D

```py
import torch
from diffusers import ShapEPipeline
from diffusers.utils import export_to_gif

ckpt_id = "openai/shap-e"
pipe = ShapEPipeline.from_pretrained(ckpt_id).to("cuda")

guidance_scale = 15.0
prompt = "A birthday cupcake"
images = pipe(
    prompt,
    guidance_scale=guidance_scale,
    num_inference_steps=64,
    frame_size=256,
).images

gif_path = export_to_gif(images[0], "cake_3d.gif")
```

Image to 3D

```py
import torch
from diffusers import ShapEImg2ImgPipeline
from diffusers.utils import export_to_gif, load_image

ckpt_id = "openai/shap-e-img2img"
pipe = ShapEImg2ImgPipeline.from_pretrained(ckpt_id).to("cuda")

img_url = "https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/shap_e/burger_in.png"
image = load_image(img_url)

generator = torch.Generator(device="cuda").manual_seed(0)
batch_size = 4
guidance_scale = 3.0

images = pipe(
    image,
    num_images_per_prompt=batch_size,
    generator=generator,
    guidance_scale=guidance_scale,
    num_inference_steps=64,
    frame_size=256,
    output_type="pil",
).images

gif_path = export_to_gif(images[0], "burger_sampled_3d.gif")
```

For more details, check out the official documentation.

The model was contributed by @yiyixuxu in https://github.com/huggingface/diffusers/pull/3742.

Consistency models

Consistency models are diffusion models supporting fast one-step or few-step image generation. They were proposed by OpenAI in Consistency Models.

```python
import torch

from diffusers import ConsistencyModelPipeline

device = "cuda"

# Load the cd_imagenet64_l2 checkpoint.
model_id_or_path = "openai/diffusers-cd_imagenet64_l2"
pipe = ConsistencyModelPipeline.from_pretrained(model_id_or_path, torch_dtype=torch.float16)
pipe.to(device)

# One-step sampling
image = pipe(num_inference_steps=1).images[0]
image.save("consistency_model_onestep_sample.png")

# One-step sampling, class-conditional image generation
# ImageNet-64 class label 145 corresponds to king penguins
image = pipe(num_inference_steps=1, class_labels=145).images[0]
image.save("consistency_model_onestep_sample_penguin.png")

# Multistep sampling, class-conditional image generation
# Timesteps can be explicitly specified; the particular timesteps below are
# from the original GitHub repo:
# https://github.com/openai/consistency_models/blob/main/scripts/launch.sh#L77
image = pipe(timesteps=[22, 0], class_labels=145).images[0]
image.save("consistency_model_multistep_sample_penguin.png")
```

For more details, see the official docs.

The model was contributed by our community members @dg845 and @ayushtues in https://github.com/huggingface/diffusers/pull/3492.

Video-to-Video

Previous video generation pipelines tended to produce watermarks because those watermarks were present in their pretraining dataset. With the latest additions of the following checkpoints, we can now generate watermark-free videos:

```python
import torch
from diffusers import DiffusionPipeline
from diffusers.utils import export_to_video

pipe = DiffusionPipeline.from_pretrained("cerspense/zeroscope_v2_576w", torch_dtype=torch.float16)
pipe.enable_model_cpu_offload()

# memory optimization
pipe.unet.enable_forward_chunking(chunk_size=1, dim=1)
pipe.enable_vae_slicing()

prompt = "Darth Vader surfing a wave"
video_frames = pipe(prompt, num_frames=24).frames
video_path = export_to_video(video_frames)
```

For more details, check out the official docs.

It was contributed by @patrickvonplaten in https://github.com/huggingface/diffusers/pull/3900.

All commits

  • remove seed by @yiyixuxu in #3734
  • Correct Token to upload docs by @patrickvonplaten in #3744
  • Correct another push token by @patrickvonplaten in #3745
  • [Stable Diffusion Inpaint & ControlNet inpaint] Correct timestep inpaint by @patrickvonplaten in #3749
  • [Documentation] Replace dead link to Flax install guide by @JeLuF in #3739
  • [documentation] grammatical fixes in installation.mdx by @LiamSwayne in #3735
  • Text2video zero refinements by @19and99 in #3733
  • [Tests] Relax tolerance of flaky failing test by @patrickvonplaten in #3755
  • [MultiControlNet] Allow save and load by @patrickvonplaten in #3747
  • Update pipelineflaxstablediffusioncontrolnet.py by @jfozard in #3306
  • update conversion script for Kandinsky unet by @yiyixuxu in #3766
  • [docs] Fix Colab notebook cells by @stevhliu in #3777
  • [Bug Report template] modify the issue template to include core maintainers. by @sayakpaul in #3785
  • [Enhance] Update reference by @okotaku in #3723
  • Fix broken cpu-offloading in legacy inpainting SD pipeline by @cmdr2 in #3773
  • Fix some bad comment in training scripts by @patrickvonplaten in #3798
  • Added LoRA loading to StableDiffusionKDiffusionPipeline by @tripathiarpan20 in #3751
  • UnCLIP Image Interpolation -> Keep same initial noise across interpolation steps by @Abhinay1997 in #3782
  • feat: add PR template. by @sayakpaul in #3786
  • Ldm3d first PR by @estelleafl in #3668
  • Complete setattnprocessor for prior and vae by @patrickvonplaten in #3796
  • fix typo by @Isotr0py in #3800
  • manual check for checkpointstotallimit instead of using accelerate by @williamberman in #3681
  • [train text to image] add note to loading from checkpoint by @williamberman in #3806
  • device map legacy attention block weight conversion by @williamberman in #3804
  • [docs] Zero SNR by @stevhliu in #3776
  • [ldm3d] Fixed small typo by @estelleafl in #3820
  • [Examples] Improve the model card pushed from the train_text_to_image.py script by @sayakpaul in #3810
  • [Docs] add missing pipelines from the overview pages and minor fixes by @sayakpaul in #3795
  • [Pipeline] Add new pipeline for ParaDiGMS -- parallel sampling of diffusion models by @AndyShih12 in #3716
  • Update control_brightness.mdx by @dqueue in #3825
  • Support ControlNet models with different number of channels in control images by @JCBrouwer in #3815
  • Add ddpm kandinsky by @yiyixuxu in #3783
  • [docs] More API stuff by @stevhliu in #3835
  • relax tol attention conversion test by @williamberman in #3842
  • fix: random module seeding by @sayakpaul in #3846
  • fix audio_diffusion tests by @teticio in #3850
  • Correct bad attn naming by @patrickvonplaten in #3797
  • [Conversion] Small fixes by @patrickvonplaten in #3848
  • Fix some audio tests by @patrickvonplaten in #3841
  • [Docs] add: contributor note in the paradigms docs. by @sayakpaul in #3852
  • Update Habana Gaudi doc by @regisss in #3863
  • Add guidance start/stop by @holwech in #3770
  • feat: rename single-letter vars in resnet.py by @SauravMaheshkar in #3868
  • Fixing the global_step key not found by @VincentNeemie in #3844
  • Support for manual CLIP loading in StableDiffusionPipeline - txt2img. by @WadRex in #3832
  • fix sde add noise typo by @UranusITS in #3839
  • [Tests] add test for checking soft dependencies. by @sayakpaul in #3847
  • [Enhance] Add LoRA rank args in traintexttoimagelora by @okotaku in #3866
  • [docs] Model API by @stevhliu in #3562
  • fix/docs: Fix the broken doc links by @Aisuko in #3897
  • Add video img2img by @patrickvonplaten in #3900
  • fix/doc-code: Updating to the latest version parameters by @Aisuko in #3924
  • fix/doc: no import torch issue by @Aisuko in #3923
  • Correct controlnet out of list error by @patrickvonplaten in #3928
  • Adding better way to define multiple concepts and also validation capabilities. by @mauricio-repetto in #3807
  • [ldm3d] Update code to be functional with the new checkpoints by @estelleafl in #3875
  • Improve memory text to video by @patrickvonplaten in #3930
  • revert automatic chunking by @patrickvonplaten in #3934
  • avoid upcasting by assigning dtype to noise tensor by @prathikr in #3713
  • Fix failing np tests by @patrickvonplaten in #3942
  • Add timestep_spacing and steps_offset to schedulers by @pcuenca in #3947
  • Add Consistency Models Pipeline by @dg845 in #3492
  • Update consistency_models.mdx by @sayakpaul in #3961
  • Make UNet2DConditionOutput pickle-able by @prathikr in #3857
  • [Consistency Models] correct checkpoint url in the doc by @sayakpaul in #3962
  • [Text-to-video] Add torch.compile() compatibility by @sayakpaul in #3949
  • [SD-XL] Add new pipelines by @patrickvonplaten in #3859
  • Kandinsky 2.2 by @cene555 in #3903
  • Add Shap-E by @yiyixuxu in #3742
  • disable num attenion heads by @patrickvonplaten in #3969
  • Improve SD XL by @patrickvonplaten in #3968
  • fix/doc-code: import torch and fix the broken document address by @Aisuko in #3941

Significant community contributions

The following contributors have made significant changes to the library over the last release:

  • @estelleafl
    • Ldm3d first PR (#3668)
    • [ldm3d] Fixed small typo (#3820)
    • [ldm3d] Update code to be functional with the new checkpoints (#3875)
  • @AndyShih12
    • [Pipeline] Add new pipeline for ParaDiGMS -- parallel sampling of diffusion models (#3716)
  • @dg845
    • Add Consistency Models Pipeline (#3492)

- Python
Published by patrickvonplaten over 2 years ago

diffusers - Patch Release: v0.17.1

Patch release to fix the timestep for inpainting:

  • [Stable Diffusion Inpaint & ControlNet inpaint] Correct timestep inpaint by @patrickvonplaten in #3749

- Python
Published by patrickvonplaten over 2 years ago

diffusers - v0.17.0 Improved LoRA, Kandinsky 2.1, Torch Compile Speed-up & More

Kandinsky 2.1

Kandinsky 2.1 inherits best practices from DALL-E 2 and Latent Diffusion while introducing some new ideas.

Installation

```bash
pip install diffusers transformers accelerate
```

Code example

```python
from diffusers import DiffusionPipeline
import torch

pipe_prior = DiffusionPipeline.from_pretrained("kandinsky-community/kandinsky-2-1-prior", torch_dtype=torch.float16)
pipe_prior.to("cuda")

t2i_pipe = DiffusionPipeline.from_pretrained("kandinsky-community/kandinsky-2-1", torch_dtype=torch.float16)
t2i_pipe.to("cuda")

prompt = "A alien cheeseburger creature eating itself, claymation, cinematic, moody lighting"
negative_prompt = "low quality, bad quality"

generator = torch.Generator(device="cuda").manual_seed(12)
image_embeds, negative_image_embeds = pipe_prior(prompt, negative_prompt, guidance_scale=1.0, generator=generator).to_tuple()

image = t2i_pipe(prompt, negative_prompt=negative_prompt, image_embeds=image_embeds, negative_image_embeds=negative_image_embeds).images[0]
image.save("cheeseburger_monster.png")
```

To learn more about the Kandinsky pipelines, and more details about speed and memory optimizations, please have a look at the docs.

Thanks @ayushtues, for helping with the integration of Kandinsky 2.1!

UniDiffuser

UniDiffuser introduces a multimodal diffusion process that is capable of handling different generation tasks using a single unified approach:

  • Unconditional image and text generation
  • Joint image-text generation
  • Text-to-image generation
  • Image-to-text generation
  • Image variation
  • Text variation

Below is an example of how to use UniDiffuser for text-to-image generation:

```python
import torch
from diffusers import UniDiffuserPipeline

model_id_or_path = "thu-ml/unidiffuser-v1"
pipe = UniDiffuserPipeline.from_pretrained(model_id_or_path, torch_dtype=torch.float16)
pipe.to("cuda")

# This mode can be inferred from the input provided to the pipe.
pipe.set_text_to_image_mode()

prompt = "an elephant under the sea"
sample = pipe(prompt=prompt, num_inference_steps=20, guidance_scale=8.0).images[0]
sample.save("elephant.png")
```

Check out the UniDiffuser docs to know more.

UniDiffuser was added by @dg845 in this PR.

LoRA

We're happy to support the A1111 formatted CivitAI LoRA checkpoints in a limited capacity.

First, download a checkpoint. We’ll use this one for demonstration purposes.

```bash
wget https://civitai.com/api/download/models/15603 -O light_and_shadow.safetensors
```

Next, we initialize a DiffusionPipeline:

```python
import torch

from diffusers import StableDiffusionPipeline, DPMSolverMultistepScheduler

pipeline = StableDiffusionPipeline.from_pretrained(
    "gsdf/Counterfeit-V2.5", torch_dtype=torch.float16, safety_checker=None
).to("cuda")
pipeline.scheduler = DPMSolverMultistepScheduler.from_config(
    pipeline.scheduler.config, use_karras_sigmas=True
)
```

We then load the checkpoint downloaded from CivitAI:

```python
pipeline.load_lora_weights(".", weight_name="light_and_shadow.safetensors")
```

(If you’re loading a checkpoint in the safetensors format, please ensure you have safetensors installed.)

And then it’s time for running inference:

```python
prompt = "masterpiece, best quality, 1girl, at dusk"
negative_prompt = (
    "(low quality, worst quality:1.4), (bad anatomy), (inaccurate limb:1.2), "
    "bad composition, inaccurate eyes, extra digit, fewer digits, (extra arms:1.2), large breasts"
)

images = pipeline(
    prompt=prompt,
    negative_prompt=negative_prompt,
    width=512,
    height=768,
    num_inference_steps=15,
    num_images_per_prompt=4,
    generator=torch.manual_seed(0),
).images
```

Below is a comparison between the LoRA and the non-LoRA results:

Check out the docs to learn more.

Thanks to @takuma104 for contributing this feature via this PR.

Torch 2.0 Compile Speed-up

We introduced Torch 2.0 support for computing attention efficiently in 0.13.0. Since then, we have made a number of improvements to ensure the number of "graph breaks" in our models is reduced so that the models can be compiled with torch.compile(). As a result, we are happy to report massive improvements in the inference speed of our most popular pipelines. Check out this doc to know more.

Thanks to @Chillee for helping us with this. Thanks to @patrickvonplaten for fixing the problems stemming from "graph breaks" in this PR.

VAE pre-processing

We added a VaeImageProcessor class that provides a unified API for pipelines to prepare their image inputs, as well as to post-process their outputs. It supports resizing, normalization, and conversion between PIL images, PyTorch tensors, and NumPy arrays.

With that, all Stable Diffusion pipelines now accept image inputs as PyTorch tensors and NumPy arrays, in addition to PIL images, and can produce outputs in any of these three formats. Pipelines will also accept and return latents. This means you can take generated latents from one pipeline and pass them to another as inputs, without leaving the latent space. If you work with multiple pipelines, you can pass PyTorch tensors between them without converting to PIL images.

To learn more about the API, check out our doc here
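Conceptually, the preprocessing step maps pixel values from the [0, 1] image range into the [-1, 1] range the VAE operates on, and post-processing maps them back with clamping. A minimal pure-Python sketch of that round trip; the real `VaeImageProcessor` additionally handles resizing and PIL/NumPy/PyTorch conversion, and these helper names are illustrative:

```python
def normalize(x):
    # [0, 1] -> [-1, 1], applied before encoding with the VAE
    return 2.0 * x - 1.0

def denormalize(x):
    # [-1, 1] -> [0, 1], applied after decoding, clamped to valid pixel values
    return max(0.0, min(1.0, x / 2.0 + 0.5))

pixels = [0.0, 0.25, 1.0]
normalized = [normalize(p) for p in pixels]
restored = [denormalize(v) for v in normalized]
```

Keeping this normalization in one place is what lets latents travel between pipelines without repeated decode/encode round trips.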

ControlNet Img2Img & Inpainting

ControlNet is one of the most used diffusion models, and upon strong demand from the community we added ControlNet img2img and ControlNet inpaint pipelines. This allows any ControlNet checkpoint to be used both in the image-to-image setting and for inpainting.

:point_right: Inpaint: See the ControlNet inpaint model here
:point_right: Image-to-Image: Any ControlNet checkpoint can be used for image-to-image, e.g.:

```py
from diffusers import StableDiffusionControlNetImg2ImgPipeline, ControlNetModel, UniPCMultistepScheduler
from diffusers.utils import load_image
import numpy as np
import torch

import cv2
from PIL import Image

# download an image
image = load_image(
    "https://hf.co/datasets/huggingface/documentation-images/resolve/main/diffusers/input_image_vermeer.png"
)
np_image = np.array(image)

# get canny image
np_image = cv2.Canny(np_image, 100, 200)
np_image = np_image[:, :, None]
np_image = np.concatenate([np_image, np_image, np_image], axis=2)
canny_image = Image.fromarray(np_image)

# load control net and stable diffusion v1-5
controlnet = ControlNetModel.from_pretrained("lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16)
pipe = StableDiffusionControlNetImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
)

# speed up the diffusion process with a faster scheduler and memory optimization
pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config)
pipe.enable_model_cpu_offload()

# generate image
generator = torch.manual_seed(0)
image = pipe(
    "futuristic-looking woman",
    num_inference_steps=20,
    generator=generator,
    image=image,
    control_image=canny_image,
).images[0]
```

DiffEdit Zero-Shot Inpainting Pipeline

This pipeline (introduced in DiffEdit: Diffusion-based semantic image editing with mask guidance) allows for image editing with natural language. Below is an end-to-end example.

First, let’s load our pipeline:

```python
import torch
from diffusers import DDIMScheduler, DDIMInverseScheduler, StableDiffusionDiffEditPipeline

sd_model_ckpt = "stabilityai/stable-diffusion-2-1"
pipeline = StableDiffusionDiffEditPipeline.from_pretrained(
    sd_model_ckpt,
    torch_dtype=torch.float16,
    safety_checker=None,
)
pipeline.scheduler = DDIMScheduler.from_config(pipeline.scheduler.config)
pipeline.inverse_scheduler = DDIMInverseScheduler.from_config(pipeline.scheduler.config)
pipeline.enable_model_cpu_offload()
pipeline.enable_vae_slicing()
generator = torch.manual_seed(0)
```

Then, we load an input image to edit using our method:

```python
from diffusers.utils import load_image

img_url = "https://github.com/Xiang-cd/DiffEdit-stable-diffusion/raw/main/assets/origin.png"
raw_image = load_image(img_url).convert("RGB").resize((768, 768))
```

Then, we employ the source and target prompts to generate the editing mask:

```python
source_prompt = "a bowl of fruits"
target_prompt = "a basket of fruits"
mask_image = pipeline.generate_mask(
    image=raw_image,
    source_prompt=source_prompt,
    target_prompt=target_prompt,
    generator=generator,
)
```

Then, we employ the caption and the input image to get the inverted latents:

```python
inv_latents = pipeline.invert(prompt=source_prompt, image=raw_image, generator=generator).latents
```

Now, generate the image with the inverted latents and semantically generated mask:

```python
image = pipeline(
    prompt=target_prompt,
    mask_image=mask_image,
    image_latents=inv_latents,
    generator=generator,
    negative_prompt=source_prompt,
).images[0]
image.save("edited_image.png")
```

Check out the docs to learn more about this pipeline.

Thanks to @clarencechen for contributing this pipeline in this PR.

Docs

Apart from these, we have made multiple improvements to the overall quality-of-life of our docs.

Thanks to @stevhliu for leading the charge here.

Misc

  • xformers attention processor fix when using LoRA (PR by @takuma104)
  • Pytorch 2.0 SDPA implementation of the LoRA attention processor (PR)

All commits

  • Post release for 0.16.0 by @patrickvonplaten in #3244
  • [docs] only mention one stage by @pcuenca in #3246
  • Write model card in controlnet training script by @pcuenca in #3229
  • [2064]: Add stochastic sampler (sample_dpmpp_sde) by @nipunjindal in #3020
  • [Stochastic Sampler][Slow Test]: Cuda test fixes by @nipunjindal in #3257
  • Remove required from tracker_project_name by @pcuenca in #3260
  • adding required parameters while calling the get_up_block and get_down_block by @init-22 in #3210
  • [docs] Update interface in repaint.mdx by @ernestchu in #3119
  • Update IF name to XL by @apolinario in #3262
  • fix typo in score sde pipeline by @fecet in #3132
  • Fix typo in textual inversion JAX training script by @jairtrejo in #3123
  • AudioDiffusionPipeline - fix encode method after config changes by @teticio in #3114
  • Revert "Revert "[Community Pipelines] Update lpwstablediffusion pipeline"" by @patrickvonplaten in #3265
  • Fix community pipelines by @patrickvonplaten in #3266
  • update notebook by @yiyixuxu in #3259
  • [docs] add notes for stateful model changes by @williamberman in #3252
  • [LoRA] quality of life improvements in the loading semantics and docs by @sayakpaul in #3180
  • [Community Pipelines] EDICT pipeline implementation by @Joqsan in #3153
  • [Docs]zh translated docs update by @DrDavidS in #3245
  • Update logging.mdx by @standardAI in #2863
  • Add multiple conditions to StableDiffusionControlNetInpaintPipeline by @timegate in #3125
  • Let's make sure that dreambooth always uploads to the Hub by @patrickvonplaten in #3272
  • Diffedit Zero-Shot Inpainting Pipeline by @clarencechen in #2837
  • add constant learning rate with custom rule by @jason9075 in #3133
  • Allow disabling torch 2_0 attention by @patrickvonplaten in #3273
  • [doc] add link to training script by @yiyixuxu in #3271
  • temp disable spectogram diffusion tests by @williamberman in #3278
  • Changed sample[0] to images[0] by @IliaLarchenko in #3304
  • Typo in tutorial by @IliaLarchenko in #3295
  • Torch compile graph fix by @patrickvonplaten in #3286
  • Postprocessing refactor img2img by @yiyixuxu in #3268
  • [Torch 2.0 compile] Fix more torch compile breaks by @patrickvonplaten in #3313
  • fix: scale_lr and sync example readme and docs. by @sayakpaul in #3299
  • Update stable_diffusion.mdx by @mu94-csl in #3310
  • Fix missing variable assign in DeepFloyd-IF-II by @gitmylo in #3315
  • Correct doc build for patch releases by @patrickvonplaten in #3316
  • Add Stable Diffusion RePaint to community pipelines by @Markus-Pobitzer in #3320
  • Fix multistep dpmsolver for cosine schedule (suitable for deepfloyd-if) by @LuChengTHU in #3314
  • [docs] Improve LoRA docs by @stevhliu in #3311
  • Added input pretubation by @isamu-isozaki in #3292
  • Update write_own_pipeline.mdx by @csaybar in #3323
  • update controlling generation doc with latest goodies. by @sayakpaul in #3321
  • [Quality] Make style by @patrickvonplaten in #3341
  • Fix config dpm by @patrickvonplaten in #3343
  • Add the SDE variant of DPM-Solver and DPM-Solver++ by @LuChengTHU in #3344
  • Add upsample_size to AttnUpBlock2D, AttnDownBlock2D by @will-rice in #3275
  • Rename --only_save_embeds to --save_as_full_pipeline by @arrufat in #3206
  • [AudioLDM] Generalise conversion script by @sanchit-gandhi in #3328
  • Fix TypeError when using prompt_embeds and negative_prompt by @At-sushi in #2982
  • Fix pipeline class on README by @themrzmaster in #3345
  • Inpainting: typo in docs by @LysandreJik in #3331
  • Add use_Karras_sigmas to LMSDiscreteScheduler by @Isotr0py in #3351
  • Batched load of textual inversions by @pdoane in #3277
  • [docs] Fix docstring by @stevhliu in #3334
  • if dreambooth lora by @williamberman in #3360
  • Postprocessing refactor all others by @yiyixuxu in #3337
  • [docs] Improve safetensors docstring by @stevhliu in #3368
  • add: a warning message when using xformers in a PT 2.0 env. by @sayakpaul in #3365
  • StableDiffusionInpaintingPipeline - resize image w.r.t height and width by @rupertmenneer in #3322
  • [docs] Adapt a model by @stevhliu in #3326
  • [docs] Load safetensors by @stevhliu in #3333
  • [Docs] Fix stable_diffusion.mdx typo by @sudowind in #3398
  • Support ControlNet v1.1 shuffle properly by @takuma104 in #3340
  • [Tests] better determinism by @sayakpaul in #3374
  • [docs] Add transformers to install by @stevhliu in #3388
  • [deepspeed] partial ZeRO-3 support by @stas00 in #3076
  • Add omegaconf for tests by @patrickvonplaten in #3400
  • Fix various bugs with LoRA Dreambooth and Dreambooth script by @patrickvonplaten in #3353
  • Fix docker file by @patrickvonplaten in #3402
  • fix: deepseepd_plugin retrieval from accelerate state by @sayakpaul in #3410
  • [Docs] Add sigmoid beta_scheduler to docstrings of relevant Schedulers by @Laurent2916 in #3399
  • Don't install accelerate and transformers from source by @patrickvonplaten in #3415
  • Don't install transformers and accelerate from source by @patrickvonplaten in #3414
  • Improve fast tests by @patrickvonplaten in #3416
  • attention refactor: the trilogy by @williamberman in #3387
  • [Docs] update the PT 2.0 optimization doc with latest findings by @sayakpaul in #3370
  • Fix style rendering by @pcuenca in #3433
  • unCLIP scheduler do not use note by @williamberman in #3417
  • Replace deprecated command with environment file by @jongwooo in #3409
  • fix warning message pipeline loading by @patrickvonplaten in #3446
  • add stable diffusion tensorrt img2img pipeline by @asfiyab-nvidia in #3419
  • Refactor controlnet and add img2img and inpaint by @patrickvonplaten in #3386
  • [Scheduler] DPM-Solver (++) Inverse Scheduler by @clarencechen in #3335
  • [Docs] Fix incomplete docstring for resnet.py by @Laurent2916 in #3438
  • fix tiled vae blend extent range by @superlabs-dev in #3384
  • Small update to "Next steps" section by @pcuenca in #3443
  • Allow arbitrary aspect ratio in IFSuperResolutionPipeline by @devxpy in #3298
  • Adding 'strength' parameter to StableDiffusionInpaintingPipeline by @rupertmenneer in #3424
  • [WIP] Bugfix - Pipeline.from_pretrained is broken when the pipeline is partially downloaded by @vimarshc in #3448
  • Fix gradient checkpointing bugs in freezing part of models (requires_grad=False) by @7eu7d7 in #3404
  • Make dreambooth lora more robust to orig unet by @patrickvonplaten in #3462
  • Reduce peak VRAM by releasing large attention tensors (as soon as they're unnecessary) by @cmdr2 in #3463
  • Add min snr to text2img lora training script by @wfng92 in #3459
  • Add inpaint lora scale support by @Glaceon-Hyy in #3460
  • [From ckpt] Fix from_ckpt by @patrickvonplaten in #3466
  • Update full dreambooth script to work with IF by @williamberman in #3425
  • Add IF dreambooth docs by @williamberman in #3470
  • parameterize pass single args through tuple by @williamberman in #3477
  • attend and excite tests disable determinism on the class level by @williamberman in #3478
  • dreambooth docs torch.compile note by @williamberman in #3471
  • add: if entry in the dreambooth training docs. by @sayakpaul in #3472
  • [docs] Textual inversion inference by @stevhliu in #3473
  • [docs] Distributed inference by @stevhliu in #3376
  • [{Up,Down}sample1d] explicit view kernel size as number elements in flattened indices by @williamberman in #3479
  • mps & onnx tests rework by @pcuenca in #3449
  • [Attention processor] Better warning message when shifting to AttnProcessor2_0 by @sayakpaul in #3457
  • [Docs] add note on local directory path. by @sayakpaul in #3397
  • Refactor full determinism by @patrickvonplaten in #3485
  • Fix DPM single by @patrickvonplaten in #3413
  • Add use_Karras_sigmas to DPMSolverSinglestepScheduler by @Isotr0py in #3476
  • Adds local_files_only bool to prevent forced online connection by @w4ffl35 in #3486
  • [Docs] Korean translation (optimization, training) by @Snailpong in #3488
  • DataLoader respecting EXIF data in Training Images by @Ambrosiussen in #3465
  • feat: allow disk offload for diffuser models by @hari10599 in #3285
  • [Community] reference only control by @okotaku in #3435
  • Support for cross-attention bias / mask by @Birch-san in #2634
  • do not scale the initial global step by gradient accumulation steps when loading from checkpoint by @williamberman in #3506
  • Fix bug in panorama pipeline when using dpmsolver scheduler by @Isotr0py in #3499
  • [Community Pipelines]Accelerate inference of stable diffusion by IPEX on CPU by @yingjie-han in #3105
  • [Community] ControlNet Reference by @okotaku in #3508
  • Allow custom pipeline loading by @patrickvonplaten in #3504
  • Make sure Diffusers works even if Hub is down by @patrickvonplaten in #3447
  • Improve README by @patrickvonplaten in #3524
  • Update README.md by @patrickvonplaten in #3525
  • Run torch.compile tests in separate subprocesses by @pcuenca in #3503
  • fix attention mask pad check by @williamberman in #3531
  • explicit broadcasts for assignments by @williamberman in #3535
  • [Examples/DreamBooth] refactor save_model_card utility in dreambooth examples by @sayakpaul in #3543
  • Fix panorama to support all schedulers by @Isotr0py in #3546
  • Add open parti prompts to docs by @patrickvonplaten in #3549
  • Add Kandinsky 2.1 by @yiyixuxu @ayushtues in #3308
  • fix broken change for vq pipeline by @yiyixuxu in #3563
  • [Stable Diffusion Inpainting] Allow standard text-to-img checkpoints to be useable for SD inpainting by @patrickvonplaten in #3533
  • Fix loaded_token reference before definition by @eminn in #3523
  • renamed variable to input_ and output_ by @vikasmech in #3507
  • Correct inpainting controlnet docs by @patrickvonplaten in #3572
  • Fix controlnet guess mode euler by @patrickvonplaten in #3571
  • [docs] Add AttnProcessor to docs by @stevhliu in #3474
  • [WIP] Add UniDiffuser model and pipeline by @dg845 in #2963
  • Fix to apply LoRAXFormersAttnProcessor instead of LoRAAttnProcessor when xFormers is enabled by @takuma104 in #3556
  • fix dreambooth attention mask by @linbo0518 in #3541
  • [IF super res] correctly normalize PIL input by @williamberman in #3536
  • [docs] Maintenance by @stevhliu in #3552
  • [docs] update the broken links by @brandonJY in #3568
  • [docs] Working with different formats by @stevhliu in #3534
  • remove print statements from attention processor. by @sayakpaul in #3592
  • Fix temb attention by @patrickvonplaten in #3607
  • [docs] update the broken links by @kadirnar in #3577
  • [UniDiffuser Tests] Fix some tests by @sayakpaul in #3609
  • #3487 Fix inpainting strength for various samplers by @rupertmenneer in #3532
  • [Community] Support StableDiffusionTilingPipeline by @kadirnar in #3586
  • [Community, Enhancement] Add reference tricks in README by @okotaku in #3589
  • [Feat] Enable State Dict For Textual Inversion Loader by @ghunkins in #3439
  • [Community] CLIP Guided Images Mixing with Stable DIffusion Pipeline by @TheDenk in #3587
  • fix tests by @patrickvonplaten in #3614
  • Make sure we also change the config when setting encoder_hid_dim_type=="text_proj" and allow xformers by @patrickvonplaten in #3615
  • goodbye frog by @williamberman in #3617
  • update code to reflect latest changes as of May 30th by @prathikr in #3616
  • update dreambooth lora to work with IF stage II by @williamberman in #3560
  • Full Dreambooth IF stage II upscaling by @williamberman in #3561
  • [Docs] include the instruction-tuning blog link in the InstructPix2Pix docs by @sayakpaul in #3644
  • [Kandinsky] Improve kandinsky API a bit by @patrickvonplaten in #3636
  • Support Kohya-ss style LoRA file format (in a limited capacity) by @takuma104 in #3437
  • Iterate over unique tokens to avoid duplicate replacements for multivector embeddings by @lachlan-nicholson in #3588
  • fixed typo in example train_text_to_image.py by @kashif in #3608
  • fix inpainting pipeline when providing initial latents by @yiyixuxu in #3641
  • [Community Doc] Updated the filename and readme file. by @kadirnar in #3634
  • add Stable Diffusion TensorRT Inpainting pipeline by @asfiyab-nvidia in #3642
  • set config from original module but set compiled module on class by @williamberman in #3650
  • dreambooth if docs - stage II, more info by @williamberman in #3628
  • linting fix by @williamberman in #3653
  • Set step_rules correctly for piecewise_constant scheduler by @0x1355 in #3605
  • Allow setting num_cycles for cosine_with_restarts lr scheduler by @0x1355 in #3606
  • [docs] Load A1111 LoRA by @stevhliu in #3629
  • dreambooth upscaling fix added latents by @williamberman in #3659
  • Correct multi gpu dreambooth by @patrickvonplaten in #3673
  • Fix from_ckpt not working properly on windows by @LyubimovVladislav in #3666
  • Update Compel documentation for textual inversions by @pdoane in #3663
  • [UniDiffuser test] fix one test so that it runs correctly on V100 by @sayakpaul in #3675
  • [docs] More API fixes by @stevhliu in #3640
  • [WIP]Vae preprocessor refactor (PR1) by @yiyixuxu in #3557
  • small tweaks for parsing thibaudz controlnet checkpoints by @williamberman in #3657
  • move activation dispatches into helper function by @williamberman in #3656
  • [docs] Fix link to loader method by @stevhliu in #3680
  • Add function to remove monkey-patch for text encoder LoRA by @takuma104 in #3649
  • [LoRA] feat: add lora attention processor for pt 2.0. by @sayakpaul in #3594
  • refactor Image processor for x4 upscaler by @yiyixuxu in #3692
  • feat: when using PT 2.0 use LoRAAttnProcessor2_0 for text enc LoRA. by @sayakpaul in #3691
  • Fix the Kandinsky docstring examples by @freespirit in #3695
  • Support views batch for panorama by @Isotr0py in #3632
  • Fix from_ckpt for Stable Diffusion 2.x by @ctrysbita in #3662
  • Add draft for lora text encoder scale by @patrickvonplaten in #3626

Significant community contributions

The following contributors have made significant changes to the library over the last release:

  • @nipunjindal
    • [2064]: Add stochastic sampler (sample_dpmpp_sde) (#3020)
    • [Stochastic Sampler][Slow Test]: Cuda test fixes (#3257)
  • @clarencechen
    • Diffedit Zero-Shot Inpainting Pipeline (#2837)
    • [Scheduler] DPM-Solver (++) Inverse Scheduler (#3335)
  • @Markus-Pobitzer
    • Add Stable Diffusion RePaint to community pipelines (#3320)
  • @takuma104
    • Support ControlNet v1.1 shuffle properly (#3340)
    • Fix to apply LoRAXFormersAttnProcessor instead of LoRAAttnProcessor when xFormers is enabled (#3556)
    • Support Kohya-ss style LoRA file format (in a limited capacity) (#3437)
    • Add function to remove monkey-patch for text encoder LoRA (#3649)
  • @asfiyab-nvidia
    • add stable diffusion tensorrt img2img pipeline (#3419)
    • add Stable Diffusion TensorRT Inpainting pipeline (#3642)
  • @Snailpong
    • [Docs] Korean translation (optimization, training) (#3488)
  • @okotaku
    • [Community] reference only control (#3435)
    • [Community] ControlNet Reference (#3508)
    • [Community, Enhancement] Add reference tricks in README (#3589)
  • @Birch-san
    • Support for cross-attention bias / mask (#2634)
  • @yingjie-han
    • [Community Pipelines]Accelerate inference of stable diffusion by IPEX on CPU (#3105)
  • @dg845
    • [WIP] Add UniDiffuser model and pipeline (#2963)
  • @kadirnar
    • [docs] update the broken links (#3577)
    • [Community] Support StableDiffusionTilingPipeline (#3586)
    • [Community Doc] Updated the filename and readme file. (#3634)
  • @TheDenk
    • [Community] CLIP Guided Images Mixing with Stable DIffusion Pipeline (#3587)
  • @prathikr
    • update code to reflect latest changes as of May 30th (#3616)

- Python
Published by patrickvonplaten over 2 years ago

diffusers - Patch Release: v0.16.1

v0.16.1: Patch Release to fix IF naming, fix community pipeline versioning, and allow disabling PT 2.0 attention for the VAE

  • merge conflict by @apolinario (direct commit on v0.16.1-patch)
  • Fix community pipelines by @patrickvonplaten in #3266
  • Allow disabling torch 2_0 attention by @patrickvonplaten in #3273

- Python
Published by patrickvonplaten almost 3 years ago

diffusers - v0.16.0 DeepFloyd IF & ControlNet v1.1

DeepFloyd's IF: The open-sourced Imagen

IF

IF is a pixel-based text-to-image generation model and was released in late April 2023 by DeepFloyd.

The model architecture is strongly inspired by Google's closed-source Imagen. IF is a novel state-of-the-art open-source text-to-image model with a high degree of photorealism and language understanding.


Installation

```sh
pip install torch --upgrade  # diffusers' IF is optimized for torch 2.0
pip install diffusers --upgrade
```

Accept the License

Before you can use IF, you need to accept its usage conditions. To do so:

  1. Make sure to have a Hugging Face account and be logged in
  2. Accept the license on the model card of DeepFloyd/IF-I-XL-v1.0
  3. Log-in locally

```py
from huggingface_hub import login

login()
```

and enter your Hugging Face Hub access token.

Code example

```py
from diffusers import DiffusionPipeline
from diffusers.utils import pt_to_pil
import torch

# stage 1
stage_1 = DiffusionPipeline.from_pretrained("DeepFloyd/IF-I-XL-v1.0", variant="fp16", torch_dtype=torch.float16)
stage_1.enable_model_cpu_offload()

# stage 2
stage_2 = DiffusionPipeline.from_pretrained(
    "DeepFloyd/IF-II-L-v1.0", text_encoder=None, variant="fp16", torch_dtype=torch.float16
)
stage_2.enable_model_cpu_offload()

# stage 3
safety_modules = {
    "feature_extractor": stage_1.feature_extractor,
    "safety_checker": stage_1.safety_checker,
    "watermarker": stage_1.watermarker,
}
stage_3 = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-x4-upscaler", **safety_modules, torch_dtype=torch.float16
)
stage_3.enable_model_cpu_offload()

prompt = 'a photo of a kangaroo wearing an orange hoodie and blue sunglasses standing in front of the eiffel tower holding a sign that says "very deep learning"'
generator = torch.manual_seed(1)

# text embeds
prompt_embeds, negative_embeds = stage_1.encode_prompt(prompt)

# stage 1
image = stage_1(
    prompt_embeds=prompt_embeds, negative_prompt_embeds=negative_embeds, generator=generator, output_type="pt"
).images
pt_to_pil(image)[0].save("./if_stage_I.png")
```

```py
# stage 2
image = stage_2(
    image=image,
    prompt_embeds=prompt_embeds,
    negative_prompt_embeds=negative_embeds,
    generator=generator,
    output_type="pt",
).images
pt_to_pil(image)[0].save("./if_stage_II.png")
```

```py
# stage 3
image = stage_3(prompt=prompt, image=image, noise_level=100, generator=generator).images
image[0].save("./if_stage_III.png")
```

For more details about speed and memory optimizations, please have a look at the blog or docs below.

Useful links

👉 The official codebase
👉 Blog post
👉 Space Demo
👉 In-detail docs

ControlNet v1.1

Lvmin Zhang has released improved ControlNet checkpoints as well as a couple of new ones.

You can find all 🧨 Diffusers checkpoints here. Please have a look directly at the model cards on how to use the checkpoints:

Improved checkpoints:

| Model Name | Control Image Overview |
|---|---|
| lllyasviel/control_v11p_sd15_canny (trained with canny edge detection) | A monochrome image with white edges on a black background. |
| lllyasviel/control_v11p_sd15_mlsd (trained with multi-level line segment detection) | An image with annotated line segments. |
| lllyasviel/control_v11f1p_sd15_depth (trained with depth estimation) | An image with depth information, usually represented as a grayscale image. |
| lllyasviel/control_v11p_sd15_normalbae (trained with surface normal estimation) | An image with surface normal information, usually represented as a color-coded image. |
| lllyasviel/control_v11p_sd15_seg (trained with image segmentation) | An image with segmented regions, usually represented as a color-coded image. |
| lllyasviel/control_v11p_sd15_lineart (trained with line art generation) | An image with line art, usually black lines on a white background. |
| lllyasviel/control_v11p_sd15_openpose (trained with human pose estimation) | An image with human poses, usually represented as a set of keypoints or skeletons. |
| lllyasviel/control_v11p_sd15_scribble (trained with scribble-based image generation) | An image with scribbles, usually random or user-drawn strokes. |
| lllyasviel/control_v11p_sd15_softedge (trained with soft edge image generation) | An image with soft edges, usually to create a more painterly or artistic effect. |

New checkpoints:

| Model Name | Control Image Overview |
|---|---|
| lllyasviel/control_v11e_sd15_ip2p (trained with pixel-to-pixel instruction) | No condition. |
| lllyasviel/control_v11p_sd15_inpaint (trained with image inpainting) | No condition. |
| lllyasviel/control_v11e_sd15_shuffle (trained with image shuffling) | An image with shuffled patches or regions. |
| lllyasviel/control_v11p_sd15s2_lineart_anime (trained with anime line art generation) | An image with anime-style line art. |

All commits

  • [Tests] Speed up panorama tests by @sayakpaul in #3067
  • [Post release] v0.16.0dev by @patrickvonplaten in #3072
  • Adds profiling flags, computes train metrics average. by @andsteing in #3053
  • [Pipelines] Make sure that None functions are correctly not saved by @patrickvonplaten in #3080
  • doc string example remove from_pt by @yiyixuxu in #3083
  • [Tests] parallelize by @patrickvonplaten in #3078
  • Throw deprecation warning for return_cached_folder by @patrickvonplaten in #3092
  • Allow SD attend and excite pipeline to work with any size output images by @jcoffland in #2835
  • [docs] Update community pipeline docs by @stevhliu in #2989
  • Add to support Guess Mode for StableDiffusionControlnetPipleline by @takuma104 in #2998
  • fix default value for attend-and-excite by @yiyixuxu in #3099
  • remvoe one line as requested by gc team by @yiyixuxu in #3077
  • ddpm custom timesteps by @williamberman in #3007
  • Fix breaking change in pipeline_stable_diffusion_controlnet.py by @remorses in #3118
  • Add global pooling to controlnet by @patrickvonplaten in #3121
  • [Bug fix] Fix img2img processor with safety checker by @patrickvonplaten in #3127
  • [Bug fix] Make sure correct timesteps are chosen for img2img by @patrickvonplaten in #3128
  • Improve deprecation warnings by @patrickvonplaten in #3131
  • Fix config deprecation by @patrickvonplaten in #3129
  • feat: verfication of multi-gpu support for select examples. by @sayakpaul in #3126
  • speed up attend-and-excite fast tests by @yiyixuxu in #3079
  • Optimize log_validation in train_controlnet_flax by @cgarciae in #3110
  • make style by @patrickvonplaten (direct commit on main)
  • Correct textual inversion readme by @patrickvonplaten in #3145
  • Add unet act fn to other model components by @williamberman in #3136
  • class labels timestep embeddings projection dtype cast by @williamberman in #3137
  • [ckpt loader] Allow loading the Inpaint and Img2Img pipelines, while loading a ckpt model by @cmdr2 in #2705
  • add from_ckpt method as Mixin by @1lint in #2318
  • Add TensorRT SD/txt2img Community Pipeline to diffusers along with TensorRT utils by @asfiyab-nvidia in #2974
  • Correct Transformer2DModel.forward docstring by @off99555 in #3074
  • Update pipeline_stable_diffusion_inpaint_legacy.py by @hwuebben in #2903
  • Modified altdiffusion pipline to support altdiffusion-m18 by @superhero-7 in #2993
  • controlnet training resize inputs to multiple of 8 by @williamberman in #3135
  • adding custom diffusion training to diffusers examples by @nupurkmr9 in #3031
  • Update custom_diffusion.mdx by @mishig25 in #3165
  • Added distillation for quantization example on textual inversion. by @XinyuYe-Intel in #2760
  • make style by @patrickvonplaten (direct commit on main)
  • Merge branch 'main' of https://github.com/huggingface/diffusers by @patrickvonplaten (direct commit on main)
  • Update Noise Autocorrelation Loss Function for Pix2PixZero Pipeline by @clarencechen in #2942
  • [DreamBooth] add text encoder LoRA support in the DreamBooth training script by @sayakpaul in #3130
  • Update Habana Gaudi documentation by @regisss in #3169
  • Add model offload to x4 upscaler by @patrickvonplaten in #3187
  • [docs] Deterministic algorithms by @stevhliu in #3172
  • Update custom_diffusion.mdx to credit the author by @sayakpaul in #3163
  • Fix TensorRT community pipeline device set function by @asfiyab-nvidia in #3157
  • make from_flax work for controlnet by @yiyixuxu in #3161
  • [docs] Clarify training args by @stevhliu in #3146
  • Multi Vector Textual Inversion by @patrickvonplaten in #3144
  • Add Karras sigmas to HeunDiscreteScheduler by @youssefadr in #3160
  • [AudioLDM] Fix dtype of returned waveform by @sanchit-gandhi in #3189
  • Fix bug in train_dreambooth_lora by @crywang in #3183
  • [Community Pipelines] Update lpw_stable_diffusion pipeline by @SkyTNT in #3197
  • Make sure VAE attention works with Torch 2_0 by @patrickvonplaten in #3200
  • Revert "[Community Pipelines] Update lpw_stable_diffusion pipeline" by @williamberman in #3201
  • [Bug fix] Fix batch size attention head size mismatch by @patrickvonplaten in #3214
  • fix mixed precision training on train_dreambooth_inpaint_lora by @themrzmaster in #3138
  • adding enable_vae_tiling and disable_vae_tiling functions by @init-22 in #3225
  • Add ControlNet v1.1 docs by @patrickvonplaten in #3226
  • Fix issue in maybe_convert_prompt by @pdoane in #3188
  • Sync cache version check from transformers by @ychfan in #3179
  • Fix docs text inversion by @patrickvonplaten in #3166
  • add model by @patrickvonplaten in #3230
  • Allow return pt x4 by @patrickvonplaten in #3236
  • Allow fp16 attn for x4 upscaler by @patrickvonplaten in #3239
  • fix fast test by @patrickvonplaten in #3241
  • Adds a document on token merging by @sayakpaul in #3208
  • [AudioLDM] Update docs to use updated ckpt by @sanchit-gandhi in #3240
  • Release: v0.16.0 by @patrickvonplaten (direct commit on main)

Significant community contributions

The following contributors have made significant changes to the library over the last release:

  • @1lint
    • add from_ckpt method as Mixin (#2318)
  • @asfiyab-nvidia
    • Add TensorRT SD/txt2img Community Pipeline to diffusers along with TensorRT utils (#2974)
    • Fix TensorRT community pipeline device set function (#3157)
  • @nupurkmr9
    • adding custom diffusion training to diffusers examples (#3031)
  • @XinyuYe-Intel
    • Added distillation for quantization example on textual inversion. (#2760)
  • @SkyTNT
    • [Community Pipelines] Update lpw_stable_diffusion pipeline (#3197)

- Python
Published by patrickvonplaten almost 3 years ago

diffusers - v0.15.1: Patch Release to fix safety checker, config access and uneven scheduler

Fixes bugs related to missing global pooling in controlnet, img2img processor issue with safety checker, uneven timesteps and better config deprecation

  • [Bug fix] Add global pooling to controlnet by @patrickvonplaten in #3121
  • [Bug fix] Fix img2img processor with safety checker by @patrickvonplaten in #3127
  • [Bug fix] Make sure correct timesteps are chosen for img2img by @patrickvonplaten in #3128
  • [Bug fix] Fix config deprecation by @patrickvonplaten in #3129

- Python
Published by patrickvonplaten almost 3 years ago

diffusers - v0.15.0 Beyond Image Generation

Taking Diffusers Beyond Image Generation

We are very excited about this release! It brings new pipelines for video and audio to diffusers, showing that diffusion is a great choice for all sorts of generative tasks. The modular, pluggable approach of diffusers was crucial to integrate the new models intuitively and cohesively with the rest of the library. We hope you appreciate the consistency of the APIs and implementations, as our ultimate goal is to provide the best toolbox to help you solve the tasks you're interested in. Don't hesitate to get in touch if you use diffusers for other projects!

In addition to that, diffusers 0.15 includes a lot of new features and improvements. From performance and deployment improvements (faster pipeline loading) to increased flexibility for creative tasks (Karras sigmas, weight prompting, support for Automatic1111 textual inversion embeddings) to additional customization options (Multi-ControlNet) to training utilities (ControlNet, Min-SNR weighting). Read on for the details!

🎬 Text-to-Video

Text-guided video generation is not a fantasy anymore - it's as simple as spinning up a Colab and running either of the two powerful open-sourced video generation models.

Text-to-Video

Alibaba's DAMO Vision Intelligence Lab has open-sourced a first research-only video generation model that can generate video clips of up to a minute. To see Darth Vader riding a wave, simply copy-paste the following lines into your favorite Python interpreter:

```py
import torch
from diffusers import DiffusionPipeline, DPMSolverMultistepScheduler
from diffusers.utils import export_to_video

pipe = DiffusionPipeline.from_pretrained("damo-vilab/text-to-video-ms-1.7b", torch_dtype=torch.float16, variant="fp16")
pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)
pipe.enable_model_cpu_offload()

prompt = "Spiderman is surfing"
video_frames = pipe(prompt, num_inference_steps=25).frames
video_path = export_to_video(video_frames)
```


For more information, have a look at the damo-vilab/text-to-video-ms-1.7b model card.

Text-to-Video Zero

Text2Video-Zero is a zero-shot text-to-video synthesis method that enables low-cost yet consistent video generation using only a pre-trained text-to-image diffusion model, such as Stable Diffusion v1-5. Text2Video-Zero also naturally supports extensions of pre-trained text-to-image models such as Instruct Pix2Pix, ControlNet, and DreamBooth, enabling Video Instruct Pix2Pix, pose-conditional, edge-conditional, and DreamBooth-specialized applications.

https://user-images.githubusercontent.com/23423619/231516176-813133f9-1216-4845-8b49-4e062610f12c.mp4

For more information please have a look at PAIR/Text2Video-Zero

🔉 Audio Generation

Text-guided audio generation has made great progress over the last months with many advances being based on diffusion models. The 0.15.0 release includes two powerful audio diffusion models.

AudioLDM

Inspired by Stable Diffusion, AudioLDM is a text-to-audio latent diffusion model (LDM) that learns continuous audio representations from CLAP latents. AudioLDM takes a text prompt as input and predicts the corresponding audio. It can generate text-conditional sound effects, human speech and music.

```python
from diffusers import AudioLDMPipeline
import torch

repo_id = "cvssp/audioldm"
pipe = AudioLDMPipeline.from_pretrained(repo_id, torch_dtype=torch.float16)
pipe = pipe.to("cuda")

prompt = "Techno music with a strong, upbeat tempo and high melodic riffs"
audio = pipe(prompt, num_inference_steps=10, audio_length_in_s=5.0).audios[0]
```

The resulting audio output can be saved as a `.wav` file:

```python
import scipy

scipy.io.wavfile.write("techno.wav", rate=16000, data=audio)
```

For more information see cvssp/audioldm

Spectrogram Diffusion

This model from the Magenta team is a MIDI-to-audio generator. The pipeline takes a MIDI file as input and autoregressively generates 5-second spectrogram segments, which are concatenated at the end and decoded to audio via a spectrogram decoder.

```python
from diffusers import SpectrogramDiffusionPipeline, MidiProcessor

pipe = SpectrogramDiffusionPipeline.from_pretrained("google/music-spectrogram-diffusion")
pipe = pipe.to("cuda")
processor = MidiProcessor()

# Download MIDI from: wget http://www.piano-midi.de/midis/beethoven/beethoven_hammerklavier_2.mid
output = pipe(processor("beethoven_hammerklavier_2.mid"))

audio = output.audios[0]
```
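The autoregressive segment-and-concatenate structure described above can be sketched in miniature. This is purely illustrative; `generate_segment` is a hypothetical stand-in for one denoising pass, not the pipeline's internals:

```python
def generate_segment(notes, previous_segment):
    # Stand-in for one denoising pass: each segment is conditioned on the
    # MIDI notes for that window and on the previously generated segment.
    return [n + (previous_segment[-1] if previous_segment else 0) for n in notes]

def render(midi_chunks):
    audio = []
    previous = []
    for chunk in midi_chunks:
        segment = generate_segment(chunk, previous)
        audio.extend(segment)  # concatenate segments into the final waveform
        previous = segment
    return audio

print(render([[1, 2], [3, 4]]))  # -> [1, 2, 5, 6]
```

The key point is that each segment sees the previous one, which is what keeps the stitched-together audio coherent across segment boundaries.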

📗 New Docs

Documentation is crucially important for diffusers, as it's one of the first resources where people try to understand how everything works and fix any issues they are observing. We have spent a lot of time in this release reviewing all documents, adding new ones, reorganizing sections and bringing code examples up to date with the latest APIs. This effort has been led by @stevhliu (thanks a lot! 🙌) and @yiyixuxu, but many others have chimed in and contributed.

Check it out: https://huggingface.co/docs/diffusers/index

Don't hesitate to open PRs for fixes to the documentation, they are greatly appreciated as discussed in our (revised, of course) contribution guide.


🪄 Stable UnCLIP

Stable UnCLIP is the best open-sourced image variation model out there. Pass an initial image and optionally a prompt to generate variations of the image:

```python
from diffusers import DiffusionPipeline
from diffusers.utils import load_image
import torch

pipe = DiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-2-1-unclip-small", torch_dtype=torch.float16)
pipe.to("cuda")

# get image
url = "https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/stable_unclip/tarsila_do_amaral.png"
image = load_image(url)

# run image variation
image = pipe(image).images[0]
```

For more information, have a look at the stabilityai/stable-diffusion-2-1-unclip model card.

https://user-images.githubusercontent.com/23423619/231513081-ace66d77-39d4-4064-bb20-2db2ce6b000a.mp4

🚀 More ControlNet

ControlNet was released in diffusers in version 0.14.0, but this release brings some exciting developments: Multi-ControlNet, a training script, an upcoming event, and a community image-to-image pipeline contributed by @mikegarts!

Multi-ControlNet

Thanks to community member @takuma104, it's now possible to use several ControlNet conditioning models at once! It works with the same API as before, only supplying a list of ControlNets instead of just one:

```python
import torch
from diffusers import StableDiffusionControlNetPipeline, ControlNetModel

controlnet_canny = ControlNetModel.from_pretrained("lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16).to("cuda")
controlnet_pose = ControlNetModel.from_pretrained("lllyasviel/sd-controlnet-openpose", torch_dtype=torch.float16).to("cuda")

pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "example/a-sd15-variant-model",
    torch_dtype=torch.float16,
    controlnet=[controlnet_pose, controlnet_canny],
).to("cuda")

pose_image = ...
canny_image = ...
prompt = ...

image = pipe(prompt=prompt, image=[pose_image, canny_image]).images[0]
```

And this is an example of how this affects generation:

(Image table omitted: each row pairs a pose control image and/or a canny control image with the generated result; rows marked "(none)" use a single ControlNet.)

ControlNet Training

We have created a training script for ControlNet, and can't wait to see what new ideas the community may come up with! In fact, we are so pumped about it that we are organizing a JAX Diffusers sprint with a special focus on ControlNet, where participant teams will be assigned TPUs v4-8 to work on their projects :exploding_head:. Those are some mean machines, so make sure you join our discord to follow the event: https://discord.com/channels/879548962464493619/897387888663232554/1092751149217615902.

🐈‍⬛ Textual Inversion, Revisited

Several great contributors have been working on textual inversion to get the most out of it. @isamu-isozaki made it possible to perform multitoken training, and @piEsposito & @GuiyeC created an easy way to load textual inversion embeddings. These contributors are always a pleasure to work with 🙌, we feel honored and proud of this community 🙏

Loading textual inversion embeddings is compatible with the Automatic1111 format, so you can download embeddings from other services (such as civitai), and easily apply them in diffusers. Please check the updated documentation for details.

🏃 Faster loading of cached pipelines

We conducted a thorough investigation of the pipeline loading process to make it as fast as possible. This is the before and after:

Previous: 2.27 seconds. Now: 1.1 seconds.

Instead of performing 3 HTTP operations, we now get all we need with just one. That single call is necessary to check whether any of the components in the pipeline were updated – if that's the case, then we need to download the new files. This improvement also applies when you load individual models instead of pre-trained pipelines.

This may not sound like much, but many people use diffusers for user-facing services where models and pipelines have to be reused on demand. By minimizing latency, they can provide a better service to their users and minimize operating costs.

Loading time can be reduced further by forcing diffusers to just use the items on disk and never check for updates (pass `local_files_only=True` to `from_pretrained`). This is not recommended for most users, but can be interesting in production environments.
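The decision logic can be sketched as follows. This is a hypothetical helper for illustration only; the real logic lives in diffusers' loading code and huggingface_hub:

```python
def resolve_pipeline_files(local_revision, remote_revision, local_files_only=False):
    """Decide whether cached pipeline files can be reused or must be re-downloaded.

    A single metadata call fetches `remote_revision`; files are only downloaded
    when the remote commit differs from the cached one.
    """
    if local_files_only:
        return "use-cache"   # never hit the network
    if remote_revision == local_revision:
        return "use-cache"   # cache is up to date
    return "download"        # components changed upstream

print(resolve_pipeline_files("abc123", "abc123"))  # use-cache
print(resolve_pipeline_files("abc123", "def456"))  # download
print(resolve_pipeline_files("abc123", "def456", local_files_only=True))  # use-cache
```

The trade-off is exactly as described above: one cheap metadata request buys correctness, and skipping it entirely (`local_files_only=True`) buys the last bit of latency at the risk of serving stale files.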

🔩 Weight prompting using compel

Weight prompting is a popular method to increase the importance of some of the elements that appear in a text prompt, as a way to force image generation to obey those concepts. Because diffusers is used in a multitude of services and projects, we wanted to provide a very flexible way to adopt prompt weighting, so users can ultimately build the system they prefer. Our approach was to:

  • Make the Stable Diffusion pipelines accept raw prompt embeddings. You are free to create the embeddings however you see fit, so users can come up with new ideas to express weighting in their projects.
  • At the same time, we adopted compel, by @damian0815, as a higher-level library to create the weighted embeddings.

You don't have to use compel to create the embeddings, but if you do, this is an example of how it looks in practice:

```python
from diffusers import StableDiffusionPipeline, UniPCMultistepScheduler
from compel import Compel

pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4")
pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config)

compel_proc = Compel(tokenizer=pipe.tokenizer, text_encoder=pipe.text_encoder)
prompt = "a red cat playing with a ball++"
prompt_embeds = compel_proc(prompt)

image = pipe(prompt_embeds=prompt_embeds, num_inference_steps=20).images[0]
```


As you can see, we assign more weight to the word "ball" using compel-specific syntax (`ball++`). You can use other libraries (or your own) to create appropriate embeddings to pass to the pipeline.

You can read more details in the documentation.
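Conceptually, prompt weighting boils down to scaling the embedding vectors of emphasized tokens before handing them to the pipeline. A toy sketch of that idea (not compel's actual algorithm; `weight_embeddings` is a hypothetical helper):

```python
def weight_embeddings(token_embeddings, weights):
    # Scale each token's embedding vector by its weight: a weight > 1.0
    # emphasizes that token, < 1.0 de-emphasizes it.
    return [
        [value * weight for value in embedding]
        for embedding, weight in zip(token_embeddings, weights)
    ]

# Two tokens ("cat", "ball"); "ball" gets extra weight, as with "ball++".
embeddings = [[0.1, 0.2], [0.3, 0.4]]
weighted = weight_embeddings(embeddings, [1.0, 1.21])
print(weighted)
```

Since the pipelines now accept raw `prompt_embeds`, any scheme along these lines, however sophisticated, can be plugged in without changes to diffusers itself.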

🎲 Karras Sigmas for schedulers

Some diffusers schedulers now support Karras sigmas! Thanks @nipunjindal !

See Add Karras pattern to discrete euler in #2956 for more information.
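For context, the Karras et al. (2022) schedule spaces noise levels between `sigma_max` and `sigma_min` with the formula below. This is a minimal sketch of the published formula with its default `rho=7.0`, not diffusers' exact implementation:

```python
def karras_sigmas(n, sigma_min=0.1, sigma_max=10.0, rho=7.0):
    # sigma_i = (sigma_max^(1/rho) + i/(n-1) * (sigma_min^(1/rho) - sigma_max^(1/rho)))^rho
    min_inv = sigma_min ** (1 / rho)
    max_inv = sigma_max ** (1 / rho)
    return [
        (max_inv + i / (n - 1) * (min_inv - max_inv)) ** rho
        for i in range(n)
    ]

sigmas = karras_sigmas(10)
print(sigmas[0], sigmas[-1])  # approximately sigma_max and sigma_min
```

Compared to uniform spacing, this schedule concentrates more steps at low noise levels, which tends to improve sample quality at the same step count.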

All commits

  • Adding support for safetensors and LoRa. by @Narsil in #2448
  • [Post release] Push post release by @patrickvonplaten in #2546
  • Correct section docs by @patrickvonplaten in #2540
  • adds xformers support to train_unconditional.py by @vvvm23 in #2520
  • Bug Fix: Remove explicit message argument in deprecate by @alvanli in #2421
  • Update pipeline_stable_diffusion_inpaint_legacy.py resize to integer multiple of 8 instead of 32 for init image and mask by @Laveraaa in #2350
  • move test num_images_per_prompt to pipeline mixin by @williamberman in #2488
  • Training tutorial by @stevhliu in #2473
  • Fix regression introduced in #2448 by @Narsil in #2551
  • Fix for InstructPix2PixPipeline to allow for prompt embeds to be passed in without prompts. by @DN6 in #2456
  • [PipelineTesterMixin] Handle non-image outputs for attn slicing test by @sanchit-gandhi in #2504
  • [Community Pipeline] Unclip Image Interpolation by @Abhinay1997 in #2400
  • Fix: controlnet docs format by @vicoooo26 in #2559
  • ema step, don't empty cuda cache by @williamberman in #2563
  • Add custom vae (diffusers type) to onnx converter by @ForserX in #2325
  • add OnnxStableDiffusionUpscalePipeline pipeline by @ssube in #2158
  • Support convert LoRA safetensors into diffusers format by @haofanwang in #2403
  • [Unet1d] correct docs by @patrickvonplaten in #2565
  • [Training] Fix tensorboard typo by @patrickvonplaten in #2566
  • allow Attend-and-excite pipeline work with different image sizes by @yiyixuxu in #2476
  • Allow textual_inversion_flax script to use save_steps and revision flag by @haixinxu in #2075
  • add intermediate logging for dreambooth training script by @yiyixuxu in #2557
  • community controlnet inpainting pipelines by @williamberman in #2561
  • [docs] Move relevant code for text2image to docs by @stevhliu in #2537
  • [docs] Move DreamBooth training materials to docs by @stevhliu in #2547
  • [docs] Move text-to-image LoRA training from blog to docs by @stevhliu in #2527
  • Update quicktour by @stevhliu in #2463
  • Support revision in Flax text-to-image training by @pcuenca in #2567
  • fix the default value of doc by @xiaohu2015 in #2539
  • Added multitoken training for textual inversion. Issue 369 by @isamu-isozaki in #661
  • [Docs]Fix invalid link to Pokemons dataset by @zxypro1 in #2583
  • [Docs] Weight prompting using compel by @patrickvonplaten in #2574
  • community stablediffusion controlnet img2img pipeline by @mikegarts in #2584
  • Improve dynamic thresholding and extend to DDPM and DDIM Schedulers by @clarencechen in #2528
  • [docs] Move Textual Inversion training examples to docs by @stevhliu in #2576
  • add deps table check updated to ci by @williamberman in #2590
  • Add notebook doc img2img by @yiyixuxu in #2472
  • [docs] Build notebooks from Markdown by @stevhliu in #2570
  • [Docs] Fix link to colab by @patrickvonplaten in #2604
  • [docs] Update unconditional image generation docs by @stevhliu in #2592
  • Add OpenVINO documentation by @echarlaix in #2569
  • Support LoRA for text encoder by @haofanwang in #2588
  • fix: un-existing tmp config file in linux, avoid unnecessary disk IO by @knoopx in #2591
  • Fixed incorrect width/height assignment in StableDiffusionDepth2ImgPi… by @antoche in #2558
  • add flax pipelines to api doc + doc string examples by @yiyixuxu in #2600
  • Fix typos by @standardAI in #2608
  • Migrate blog content to docs by @stevhliu in #2477
  • Add cache_dir to docs by @patrickvonplaten in #2624
  • Make sure that DEIS, DPM and UniPC can correctly be switched in & out by @patrickvonplaten in #2595
  • Revert "[docs] Build notebooks from Markdown" by @patrickvonplaten in #2625
  • Up vesion at which we deprecate "revision='fp16'" since transformers is not released yet by @patrickvonplaten in #2623
  • [Tests] Split scheduler tests by @patrickvonplaten in #2630
  • Improve ddim scheduler and fix bug when prediction type is "sample" by @PeterL1n in #2094
  • update paint by example docs by @williamberman in #2598
  • [From pretrained] Speed-up loading from cache by @patrickvonplaten in #2515
  • add translated docs by @LolitaSian in #2587
  • [Dreambooth] Editable number of class images by @Mr-Philo in #2251
  • Update quicktour.mdx by @standardAI in #2637
  • Update basic_training.mdx by @standardAI in #2639
  • controlnet sd 2.1 checkpoint conversions by @williamberman in #2593
  • [docs] Update readme by @stevhliu in #2612
  • [Pipeline loading] Remove send_telemetry by @patrickvonplaten in #2640
  • [docs] Build Jax notebooks for real by @stevhliu in #2641
  • Update loading.mdx by @standardAI in #2642
  • Support non square image generation for StableDiffusionSAGPipeline by @AkiSakurai in #2629
  • Update schedulers.mdx by @standardAI in #2647
  • [attention] Fix attention by @patrickvonplaten in #2656
  • Add support for Multi-ControlNet to StableDiffusionControlNetPipeline by @takuma104 in #2627
  • [Tests] Adds a test suite for EMAModel by @sayakpaul in #2530
  • fix the in-place modification in unet condition when using controlnet by @andrehuang in #2586
  • image generation main process checks by @williamberman in #2631
  • [Hub] Upgrade to 0.13.2 by @patrickvonplaten in #2670
  • AutoencoderKL: clamp indices of blendh and blendv to input size by @kig in #2660
  • Update README.md by @qwjaskzxl in #2653
  • [Lora] correct lora saving & loading by @patrickvonplaten in #2655
  • Add ddim noise comparative analysis pipeline by @aengusng8 in #2665
  • Add support for different model prediction types in DDIMInverseScheduler by @clarencechen in #2619
  • controlnet integration tests num_inference_steps=3 by @williamberman in #2672
  • Controlnet training by @Ttl in #2545
  • [Docs] Adds a documentation page for evaluating diffusion models by @sayakpaul in #2516
  • [Tests] fix: slow serialization test by @sayakpaul in #2678
  • Update Dockerfile CUDA by @patrickvonplaten in #2682
  • T5Attention support for cross-attention by @kashif in #2654
  • Update custom_pipeline_overview.mdx by @standardAI in #2684
  • Update kerascv.mdx by @standardAI in #2685
  • Update img2img.mdx by @standardAI in #2688
  • Update conditional_image_generation.mdx by @standardAI in #2687
  • Update controlling_generation.mdx by @standardAI in #2690
  • Update unconditional_image_generation.mdx by @standardAI in #2686
  • Add image_processor by @yiyixuxu in #2617
  • [docs] Add overviews to each section by @stevhliu in #2657
  • [docs] Create better navigation on index by @stevhliu in #2658
  • [docs] Reorganize table of contents by @stevhliu in #2671
  • Rename attention by @patrickvonplaten in #2691
  • Adding use_safetensors argument to give more control to users by @Narsil in #2123
  • [docs] Add safety checker to ethical guidelines by @stevhliu in #2699
  • train_unconditional save restore unet parameters by @williamberman in #2706
  • Improve deprecation error message when using cross_attention import by @patrickvonplaten in #2710
  • fix image link in inpaint doc by @yiyixuxu in #2693
  • [docs] Update ONNX doc to use optimum by @sayakpaul in #2702
  • Enabling gradient checkpointing for VAE by @Pie31415 in #2536
  • [Tests] Correct PT2 by @patrickvonplaten in #2724
  • Update mps.mdx by @standardAI in #2749
  • Update torch2.0.mdx by @standardAI in #2748
  • Update fp16.mdx by @standardAI in #2746
  • Update dreambooth.mdx by @standardAI in #2742
  • Update philosophy.mdx by @standardAI in #2752
  • Update text_inversion.mdx by @standardAI in #2751
  • add: controlnet entry to training section in the docs. by @sayakpaul in #2677
  • Update numbers for Habana Gaudi in documentation by @regisss in #2734
  • Improve Contribution Doc by @patrickvonplaten in #2043
  • Fix typos by @apivovarov in #2715
  • [1929]: Add CLIP guidance for Img2Img stable diffusion pipeline by @nipunjindal in #2723
  • Add guidance start/end parameters to StableDiffusionControlNetImg2ImgPipeline by @hyowon-ha in #2731
  • Fix mps tests on torch 2.0 by @pcuenca in #2766
  • Add option to set dtype in pipeline.to() method by @1lint in #2317
  • stable diffusion depth batching fix by @williamberman in #2757
  • [docs] update torch 2 benchmark by @pcuenca in #2764
  • [docs] Clarify purpose of reproducibility docs by @stevhliu in #2756
  • [MS Text To Video] Add first text to video by @patrickvonplaten in #2738
  • mps: remove warmup passes by @pcuenca in #2771
  • Support for Offset Noise in examples by @haofanwang in #2753
  • add: section on multiple controlnets. by @sayakpaul in #2762
  • [Examples] InstructPix2Pix instruct training script by @sayakpaul in #2478
  • deduplicate training section in the docs. by @sayakpaul in #2788
  • [UNet3DModel] Fix with attn processor by @patrickvonplaten in #2790
  • [doc wip] literalinclude by @mishig25 in #2718
  • Rename 'CLIPFeatureExtractor' class to 'CLIPImageProcessor' by @ainoya in #2732
  • Music Spectrogram diffusion pipeline by @kashif in #1044
  • [2737]: Add DPMSolverMultistepScheduler to CLIP guided community pipeline by @nipunjindal in #2779
  • [Docs] small fixes to the text to video doc. by @sayakpaul in #2787
  • Update train_text_to_image_lora.py by @haofanwang in #2767
  • Skip mps in text-to-video tests by @pcuenca in #2792
  • Flax controlnet by @yiyixuxu in #2727
  • [docs] Add Colab notebooks and Spaces by @stevhliu in #2713
  • Add AudioLDM by @sanchit-gandhi in #2232
  • Update train_text_to_image_lora.py by @haofanwang in #2795
  • Add ModelEditing pipeline by @bahjat-kawar in #2721
  • Relax DiT test by @kashif in #2808
  • Update onnxruntime package candidates by @PeixuanZuo in #2666
  • [Stable UnCLIP] Finish Stable UnCLIP by @patrickvonplaten in #2814
  • [Docs] update docs (Stable unCLIP) to reflect the updated ckpts. by @sayakpaul in #2815
  • StableDiffusionModelEditingPipeline documentation by @bahjat-kawar in #2810
  • Update examples README.md to include the latest examples by @sayakpaul in #2839
  • Ruff: apply same rules as in transformers by @pcuenca in #2827
  • [Tests] Fix slow tests by @patrickvonplaten in #2846
  • Fix StableUnCLIPImg2ImgPipeline handling of explicitly passed image embeddings by @unishift in #2845
  • Helper function to disable custom attention processors by @pcuenca in #2791
  • improve stable unclip doc. by @sayakpaul in #2823
  • add: better warning messages when handling multiple conditionings. by @sayakpaul in #2804
  • [WIP]Flax training script for controlnet by @yiyixuxu in #2818
  • Make dynamo wrapped modules work with save_pretrained by @pcuenca in #2726
  • [Init] Make sure shape mismatches are caught early by @patrickvonplaten in #2847
  • updated onnx pndm test by @kashif in #2811
  • [Stable Diffusion] Allow users to disable Safety checker if loading model from checkpoint by @Stax124 in #2768
  • fix KarrasVePipeline bug by @junhsss in #2828
  • StableDiffusionLongPromptWeightingPipeline: Do not hardcode pad token by @AkiSakurai in #2832
  • Remove suggestion to use cuDNN benchmark in docs by @d1g1t in #2793
  • Remove duplicate sentence in docstrings by @qqaatw in #2834
  • Update the legacy inpainting SD pipeline, to allow calling it with only prompt_embeds (instead of always requiring a prompt) by @cmdr2 in #2842
  • Fix link to LoRA training guide in DreamBooth training guide by @ushuz in #2836
  • [WIP][Docs] Use DiffusionPipeline Instead of Child Classes when Loading Pipeline by @dg845 in #2809
  • Add last_epoch argument to optimization.get_scheduler by @felixblanke in #2850
  • [WIP] Check UNet shapes in StableDiffusionInpaintPipeline init by @dg845 in #2853
  • [2761]: Add documentation for extra_in_channels UNet1DModel by @nipunjindal in #2817
  • [Tests] Adds a test to check if image_embeds None case is handled properly in StableUnCLIPImg2ImgPipeline by @sayakpaul in #2861
  • Update evaluation.mdx by @standardAI in #2862
  • Update overview.mdx by @standardAI in #2864
  • Update alt_diffusion.mdx by @standardAI in #2865
  • Update paintbyexample.mdx by @standardAI in #2869
  • Update stablediffusionsafe.mdx by @standardAI in #2870
  • [Docs] Correct phrasing by @patrickvonplaten in #2873
  • [Examples] Add streaming support to the ControlNet training example in JAX by @sayakpaul in #2859
  • feat: allow offset_noise in dreambooth training example by @yamanahlawat in #2826
  • [docs] Performance tutorial by @stevhliu in #2773
  • [Docs] add an example use for StableUnCLIPPipeline in the pipeline docs by @sayakpaul in #2897
  • add flax requirement by @yiyixuxu in #2894
  • Support fp16 in conversion from original ckpt by @burgalon in #2733
  • img2img.multiple.controlnets.pipeline by @mikegarts in #2833
  • add load textual inversion embeddings to stable diffusion by @piEsposito in #2009
  • [docs] add the Stable diffusion with Jax/Flax Guide into the docs by @yiyixuxu in #2487
  • Add support Karras sigmas for StableDiffusionKDiffusionPipeline by @takuma104 in #2874
  • Fix textual inversion loading by @GuiyeC in #2914
  • Fix slow tests text inv by @patrickvonplaten in #2915
  • Fix check_inputs in upscaler pipeline to allow embeds by @d1g1t in #2892
  • Modify example with intel optimization by @mengfei25 in #2896
  • [2884]: Fix cross_attention_kwargs in StableDiffusionImg2ImgPipeline by @nipunjindal in #2902
  • [Tests] Speed up test by @patrickvonplaten in #2919
  • Have fix current pipeline link by @guspan-tanadi in #2910
  • Update image_variation.mdx by @standardAI in #2911
  • Update controlnet.mdx by @standardAI in #2912
  • Update pipelinestablediffusion_controlnet.py by @patrickvonplaten in #2917
  • Check for all different packages of opencv by @wfng92 in #2901
  • fix: norm group test for UNet3D. by @sayakpaul in #2959
  • Update euler_ancestral.mdx by @standardAI in #2932
  • Update unipc.mdx by @standardAI in #2936
  • Update score_sde_ve.mdx by @standardAI in #2937
  • Update score_sde_vp.mdx by @standardAI in #2938
  • Update ddim.mdx by @standardAI in #2926
  • Update ddpm.mdx by @standardAI in #2929
  • Removing explicit markdown extension by @guspan-tanadi in #2944
  • Ensure validation image RGB not RGBA by @ernestchu in #2945
  • Use upload_folder in training scripts by @Wauplin in #2934
  • allow use custom local dataset for controlnet training scripts by @yiyixuxu in #2928
  • fix post-processing by @yiyixuxu in #2968
  • [docs] Simplify loading guide by @stevhliu in #2694
  • update flax controlnet training script by @yiyixuxu in #2951
  • [Pipeline download] Improve pipeline download for index and passed co… by @patrickvonplaten in #2980
  • The variable name has been updated. by @kadirnar in #2970
  • Update the K-Diffusion SD pipeline, to allow calling it with only prompt_embeds (instead of always requiring a prompt) by @cmdr2 in #2962
  • [Examples] Add support for Min-SNR weighting strategy for better convergence by @sayakpaul in #2899
  • [scheduler] fix some scheduler dtype error by @furry-potato-maker in #2992
  • minor fix in controlnet flax example by @yiyixuxu in #2986
  • Explain how to install test dependencies by @pcuenca in #2983
  • docs: Link Navigation Path API Pipelines by @guspan-tanadi in #2976
  • add Min-SNR loss to Controlnet flax train script by @yiyixuxu in #3016
  • dynamic threshold sampling bug fixes and docs by @williamberman in #3003
  • Initial draft of Core ML docs by @pcuenca in #2987
  • [Pipeline] Add TextToVideoZeroPipeline by @19and99 in #2954
  • Small typo correction in comments by @rogerioagjr in #3012
  • mps: skip unstable test by @pcuenca in #3037
  • Update contribution.mdx by @mishig25 in #3054
  • fix report tool by @patrickvonplaten in #3047
  • Fix config prints and save, load of pipelines by @patrickvonplaten in #2849
  • [docs] Reusing components by @stevhliu in #3000
  • Fix imports for composablestablediffusion pipeline by @nthh in #3002
  • config fixes by @williamberman in #3060
  • accelerate min version for ProjectConfiguration import by @williamberman in #3042
  • AttentionProcessor.group_norm num_channels should be `query_dim` by @williamberman in #3046
  • Update documentation by @George-Ogden in #2996
  • Fix scheduler type mismatch by @pcuenca in #3041
  • Fix invocation of some slow Flax tests by @pcuenca in #3058
  • add only cross attention to simple attention blocks by @williamberman in #3011
  • Fix typo and format BasicTransformerBlock attributes by @off99555 in #2953
  • unet time embedding activation function by @williamberman in #3048
  • Attention processor cross attention norm group norm by @williamberman in #3021
  • Attn added kv processor torch 2.0 block by @williamberman in #3023
  • [Examples] Fix type-casting issue in the ControlNet training script by @sayakpaul in #2994
  • [LoRA] Enabling limited LoRA support for text encoder by @sayakpaul in #2918
  • fix slow tsets by @patrickvonplaten in #3066
  • Fix InstructPix2Pix training in multi-GPU mode by @sayakpaul in #2978
  • [Docs] update Self-Attention Guidance docs by @SusungHong in #2952
  • Flax memory efficient attention by @pcuenca in #2889
  • [WIP] implement rest of the test cases (LoRA tests) by @Pie31415 in #2824
  • fix pipeline setattr value == None by @williamberman in #3063
  • add support for pre-calculated prompt embeds to Stable Diffusion ONNX pipelines by @ssube in #2597
  • [2064]: Add Karras to DPMSolverMultistepScheduler by @nipunjindal in #3001
  • Finish docs textual inversion by @patrickvonplaten in #3068
  • [Docs] refactor text-to-video zero by @sayakpaul in #3049
  • Update Flax TPU tests by @pcuenca in #3069
  • Fix a bug of pano when not doing CFG by @ernestchu in #3030
  • Text2video zero refinements by @19and99 in #3070

Significant community contributions

The following contributors have made significant changes to the library over the last release:

  • @Abhinay1997
    • [Community Pipeline] Unclip Image Interpolation (#2400)
  • @ssube
    • add OnnxStableDiffusionUpscalePipeline pipeline (#2158)
    • add support for pre-calculated prompt embeds to Stable Diffusion ONNX pipelines (#2597)
  • @haofanwang
    • Support convert LoRA safetensors into diffusers format (#2403)
    • Support LoRA for text encoder (#2588)
    • Support for Offset Noise in examples (#2753)
    • Update train_text_to_image_lora.py (#2767)
    • Update train_text_to_image_lora.py (#2795)
  • @isamu-isozaki
    • Added multitoken training for textual inversion. Issue 369 (#661)
  • @mikegarts
    • community stablediffusion controlnet img2img pipeline (#2584)
    • img2img.multiple.controlnets.pipeline (#2833)
  • @LolitaSian
    • add translated docs (#2587)
  • @Ttl
    • Controlnet training (#2545)
  • @nipunjindal
    • [1929]: Add CLIP guidance for Img2Img stable diffusion pipeline (#2723)
    • [2737]: Add DPMSolverMultistepScheduler to CLIP guided community pipeline (#2779)
    • [2761]: Add documentation for extra_in_channels UNet1DModel (#2817)
    • [2884]: Fix cross_attention_kwargs in StableDiffusionImg2ImgPipeline (#2902)
    • [2905]: Add Karras pattern to discrete euler (#2956)
    • [2064]: Add Karras to DPMSolverMultistepScheduler (#3001)
  • @bahjat-kawar
    • Add ModelEditing pipeline (#2721)
    • StableDiffusionModelEditingPipeline documentation (#2810)
  • @piEsposito
    • add load textual inversion embeddings to stable diffusion (#2009)
  • @19and99
    • [Pipeline] Add TextToVideoZeroPipeline (#2954)
    • Text2video zero refinements (#3070)
  • @MuhHanif
    • Flax memory efficient attention (#2889)

- Python
Published by patrickvonplaten almost 3 years ago

diffusers - ControlNet, 8K VAE decoding

:rocket: ControlNet comes to 🧨 Diffusers!

Thanks to an amazing collaboration with community member @takuma104 🙌, diffusers fully supports ControlNet! All 8 control models from the paper are available for you to use: depth, scribbles, edges, and more. Best of all, you can take advantage of all the other goodies and optimizations that Diffusers provides out of the box, making this an ultra-fast implementation of ControlNet. Take it for a spin to see for yourself.

ControlNet works by training a copy of some of the layers of the original Stable Diffusion model on additional signals, such as depth maps or scribbles. After training, you can provide a depth map as a strong hint of the composition you want to achieve, and have Stable Diffusion fill in the details for you. For example:

Before After

Currently, there are 8 published control models, all of which were trained on runwayml/stable-diffusion-v1-5 (i.e., Stable Diffusion version 1.5). This is an example that uses the scribble controlnet model:

Before After

Or you can turn a cartoon into a realistic photo with incredible coherence:

ControlNet showing a photo generated from a cartoon frame

How do you use ControlNet in diffusers? Just like this (example for the canny edges control model):

```python
from diffusers import StableDiffusionControlNetPipeline, ControlNetModel
import torch

controlnet = ControlNetModel.from_pretrained("lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
)
```

As usual, you can use all the features in the diffusers toolbox: super-fast schedulers, memory-efficient attention, model offloading, etc. We think 🧨 Diffusers is the best way to iterate on your ControlNet experiments!

Please, refer to our blog post and documentation for details.

(And, coming soon, ControlNet training – stay tuned!)

:diamond_shape_with_a_dot_inside: VAE tiling for ultra-high resolution generation

Another community member, @kig, conceived, proposed, and fully implemented an amazing PR that allows generation of ultra-high resolution images without memory blowing up 🤯. They follow a tiling approach during the image decoding phase of the process, generating a piece of the image at a time and then stitching them all together. Tiles are blended carefully to avoid visible seams between them, and the final result is amazing. This is the additional code you need to use to enjoy high-resolution generations:

```python
pipe.vae.enable_tiling()
```

That's it!

For a complete example, refer to the PR or the code snippet we reproduce here for your convenience:

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", revision="fp16", torch_dtype=torch.float16)
pipe = pipe.to("cuda")
pipe.enable_xformers_memory_efficient_attention()
pipe.vae.enable_tiling()

prompt = "a beautiful landscape photo"
image = pipe(prompt, width=4096, height=2048, num_inference_steps=10).images[0]

image.save("4k_landscape.jpg")
```
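To get an intuition for why the blending matters, here is a toy 1-D sketch of linearly cross-fading two overlapping tiles. The real implementation blends 2-D latent tiles inside the VAE decoder; `blend_tiles` is purely illustrative:

```python
def blend_tiles(left, right, overlap):
    # Linearly cross-fade the overlapping region so no hard seam is visible:
    # the blend weight ramps from the left tile to the right tile.
    blended = left[:-overlap]
    for i in range(overlap):
        alpha = (i + 1) / (overlap + 1)
        blended.append(left[-overlap + i] * (1 - alpha) + right[i] * alpha)
    blended.extend(right[overlap:])
    return blended

# Two "tiles" of constant value with a 2-pixel overlap: the result ramps
# smoothly from 1.0 to 3.0 instead of jumping at the tile boundary.
print(blend_tiles([1.0, 1.0, 1.0, 1.0], [3.0, 3.0, 3.0, 3.0], overlap=2))
```

Without the cross-fade, the two tiles would meet in a hard step; with it, each tile's contribution fades out exactly as its neighbor's fades in.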

All commits

  • [Docs] Add a note on SDEdit by @sayakpaul in #2433
  • small bugfix at StableDiffusionDepth2ImgPipeline call to check_inputs and batch size calculation by @mikegarts in #2423
  • add demo by @yiyixuxu in #2436
  • fix: code snippet of instruct pix2pix from the docs. by @sayakpaul in #2446
  • Update train_text_to_image_lora.py by @haofanwang in #2464
  • mps test fixes by @pcuenca in #2470
  • Fix test train_unconditional by @pcuenca in #2481
  • add MultiDiffusion to controlling generation by @omerbt in #2490
  • image_noiser -> image_normalizer comment by @williamberman in #2496
  • [Safetensors] Make sure metadata is saved by @patrickvonplaten in #2506
  • Add 4090 benchmark (PyTorch 2.0) by @pcuenca in #2503
  • [Docs] Improve safetensors by @patrickvonplaten in #2508
  • Disable ONNX tests by @patrickvonplaten in #2509
  • attend and excite batch test causing timeouts by @williamberman in #2498
  • move pipeline based test skips out of pipeline mixin by @williamberman in #2486
  • pix2pix tests no write to fs by @williamberman in #2497
  • [Docs] Include more information in the "controlling generation" doc by @sayakpaul in #2434
  • Use "hub" directory for cache instead of "diffusers" by @pcuenca in #2005
  • Sequential cpu offload: require accelerate 0.14.0 by @pcuenca in #2517
  • is_safetensors_compatible refactor by @williamberman in #2499
  • [Copyright] 2023 by @patrickvonplaten in #2524
  • Bring Flax attention naming in sync with PyTorch by @pcuenca in #2511
  • [Tests] Fix slow tests by @patrickvonplaten in #2526
  • PipelineTesterMixin parameter configuration refactor by @williamberman in #2502
  • Add a ControlNet model & pipeline by @takuma104 in #2407
  • 8k Stable Diffusion with tiled VAE by @kig in #1441
  • Textual inv make save log both steps by @isamu-isozaki in #2178
  • Fix convert SD to diffusers error by @fkunn1326 in #1979
  • Small fixes for controlnet by @patrickvonplaten in #2542
  • Fix ONNX checkpoint loading by @anton-l in #2544
  • [Model offload] Add nice warning by @patrickvonplaten in #2543

Significant community contributions

The following contributors have made significant changes to the library over the last release:

  • @takuma104
    • Add a ControlNet model & pipeline (#2407)

New Contributors

  • @mikegarts made their first contribution in https://github.com/huggingface/diffusers/pull/2423
  • @fkunn1326 made their first contribution in https://github.com/huggingface/diffusers/pull/2529

Full Changelog: https://github.com/huggingface/diffusers/compare/v0.13.0...v0.14.0

- Python
Published by patrickvonplaten almost 3 years ago

diffusers - v0.13.1: Patch Release to fix warning when loading from `revision="fp16"`

  • fix transformers naming by @patrickvonplaten in #2430
  • remove author names. by @sayakpaul in #2428
  • Fix deprecation warning by @patrickvonplaten in #2426
  • fix the get_indices function by @yiyixuxu in #2418
  • Update pipeline_utils.py by @haofanwang in #2415

- Python
Published by patrickvonplaten about 3 years ago

diffusers - Controllable Generation: Pix2Pix0, Attend and Excite, SEGA, SAG, ...

:dart: Controlling Generation

There has been much recent work on fine-grained control of diffusion networks!

Diffusers now supports:

  1. Instruct Pix2Pix
  2. Pix2Pix 0, more details in docs
  3. Attend and excite, more details in docs
  4. Semantic guidance, more details in docs
  5. Self-attention guidance, more details in docs
  6. Depth2image
  7. MultiDiffusion panorama, more details in docs

See our doc on controlling image generation and the individual pipeline docs for more details on the individual methods.

:up: Latent Upscaler

Latent Upscaler is a diffusion model that is designed explicitly for Stable Diffusion. You can take the generated latent from Stable Diffusion and pass it into the upscaler before decoding with your standard VAE. Or you can take any image, encode it into the latent space, use the upscaler, and decode it. It is incredibly flexible and can work with any SD checkpoint.

The model was developed by Katherine Crowson in collaboration with Stability AI.

```python
from diffusers import StableDiffusionLatentUpscalePipeline, StableDiffusionPipeline
import torch

pipeline = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16)
pipeline.to("cuda")

upscaler = StableDiffusionLatentUpscalePipeline.from_pretrained("stabilityai/sd-x2-latent-upscaler", torch_dtype=torch.float16)
upscaler.to("cuda")

prompt = "a photo of an astronaut high resolution, unreal engine, ultra realistic"
generator = torch.manual_seed(33)

# We stay in latent space! Let's make sure that Stable Diffusion returns the image
# in latent space
low_res_latents = pipeline(prompt, generator=generator, output_type="latent").images

upscaled_image = upscaler(
    prompt=prompt,
    image=low_res_latents,
    num_inference_steps=20,
    guidance_scale=0,
    generator=generator,
).images[0]

# Let's save the upscaled image under "astronaut_1024.png"
upscaled_image.save("astronaut_1024.png")

# As a comparison: let's also save the low-res image
with torch.no_grad():
    image = pipeline.decode_latents(low_res_latents)
image = pipeline.numpy_to_pil(image)[0]

image.save("astronaut_512.png")
```

:zap: Optimization

In addition to new features and an increasing number of pipelines, diffusers cares a lot about performance. This release brings a number of optimizations that you can turn on easily.

xFormers

Memory efficient attention, as implemented by xFormers, has been available in diffusers for some time. The problem was that installing xFormers could be complicated because there were no official pip wheels (or they were outdated), and you had to resort to installing from source.

From xFormers 0.0.16, official pip wheels are now published with every release, so installing and using xFormers is now as simple as these two steps:

  1. pip install xformers in your terminal.
  2. pipe.enable_xformers_memory_efficient_attention() in your code to opt-in in your pipelines.

These actions will unlock dramatic memory savings, and usually faster inference too!

See more details in the documentation.

Torch 2.0

Speaking of memory-efficient attention, Accelerated PyTorch 2.0 Transformers now comes with built-in native support for it! When PyTorch 2.0 is released you'll no longer have to install xFormers or any third-party package to take advantage of it. In diffusers we are already preparing for that, and it works out of the box. So, if you happen to be using the latest "nightlies" of PyTorch 2.0 beta, then you're all set – diffusers will use Accelerated PyTorch 2.0 Transformers by default.

In our tests, the built-in PyTorch 2.0 implementation is usually as fast as xFormers', and sometimes even faster. Performance depends on the card you are using and whether you run your code in float16 or float32, so check our documentation for details.
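For reference, the operation both backends accelerate is plain scaled dot-product attention, softmax(QKᵀ/√d)·V. A toy pure-Python version of the math (illustrative only, not how either backend is implemented):

```python
import math

# Reference implementation of scaled dot-product attention,
# softmax(Q K^T / sqrt(d)) V, for tiny Python lists -- the same math that
# torch.nn.functional.scaled_dot_product_attention computes efficiently.

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def attention(Q, K, V):
    d = len(Q[0])
    out = []
    for q in Q:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in K]
        w = softmax(scores)  # attention weights over the keys
        out.append([sum(wi * v[j] for wi, v in zip(w, V)) for j in range(len(V[0]))])
    return out

Q = [[1.0, 0.0]]
K = [[1.0, 0.0], [0.0, 1.0]]
V = [[10.0, 0.0], [0.0, 10.0]]
print(attention(Q, K, V))  # the query attends more to the first key
```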

Coarse-grained CPU offload

Community member @keturn, with whom we have enjoyed thoughtful software design conversations, called our attention to the fact that enabling sequential cpu offloading via enable_sequential_cpu_offload worked great to save a lot of memory, but made inference much slower.

This is because enable_sequential_cpu_offload() is optimized for memory, and it recursively works across all the submodules contained in a model, moving them to GPU when they are needed and back to CPU when another submodule needs to run. These cpu-to-gpu-to-cpu transfers happen hundreds of times during the stable diffusion denoising loops, because the UNet runs multiple times and it consists of several PyTorch modules.

This release of diffusers introduces a coarser enable_model_cpu_offload() pipeline API, which copies whole models (not modules) to GPU and makes sure they stay there until another model needs to run. The consequences are:

  • Less memory savings than enable_sequential_cpu_offload, but
  • Almost as fast inference as when the pipeline is used without any type of offloading.
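The trade-off can be illustrated with a toy transfer-counting sketch (hypothetical helpers, not diffusers code): sequential offload pays per submodule per denoising step, while model-level offload pays once per model switch.

```python
# Toy bookkeeping sketch (not diffusers code) contrasting the two offload
# strategies. Sequential offload moves every submodule to the GPU and back
# on each forward pass; model offload moves a whole model once and keeps it
# resident until another model needs the GPU.

def sequential_offload_transfers(num_submodules, num_denoising_steps):
    # each submodule: one host->device and one device->host copy per step
    return num_submodules * num_denoising_steps * 2

def model_offload_transfers(models_in_run_order):
    transfers, resident = 0, None
    for model in models_in_run_order:
        if model != resident:  # swap only when a different model runs
            transfers += 1
            resident = model
    return transfers

# A UNet with 30 submodules over 50 denoising steps:
print(sequential_offload_transfers(30, 50))  # 3000 copies
# Model-level: text encoder, then the UNet 50 times, then the VAE:
print(model_offload_transfers(["text_encoder"] + ["unet"] * 50 + ["vae"]))  # 3 swaps
```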

Pix2Pix Zero

Remember the CycleGAN days when one would turn a horse into a zebra in an image while keeping the rest of the content almost untouched? Well, that day has arrived, but in the context of diffusion. Pix2Pix Zero allows users to edit a particular image (be it real or generated), targeting a source concept (horse, for example) and replacing it with a target concept (zebra, for example).


Pix2Pix Zero was proposed in Zero-shot Image-to-Image Translation. The StableDiffusionPix2PixZeroPipeline allows you to

  1. Edit an image generated from an input prompt
  2. Provide an input image and edit it

For the latter, it uses the newly introduced DDIMInverseScheduler to first obtain the inverted noise from the input image and use that in the subsequent generation process.

Both use cases leverage the idea of "edit directions", used to steer the generation gradually from the source concept toward the target concept. To know more, we recommend checking out the official documentation.

Attend and excite

Attend-and-Excite was proposed in Attend-and-Excite: Attention-Based Semantic Guidance for Text-to-Image Diffusion Models. It guides the generative model to modify the cross-attention values during the image synthesis process, producing images that more faithfully depict the input text prompt. Thanks to community contributor @evinpinar for leading the charge to add this pipeline!

  • Attend and excite 2 by @evinpinar @yiyixuxu #2369

Semantic guidance

Semantic Guidance for Diffusion Models was proposed in SEGA: Instructing Diffusion using Semantic Dimensions and provides strong semantic control over image generation. Small changes to the text prompt usually result in entirely different output images. With SEGA, however, a variety of changes to the image can be controlled easily and intuitively while staying true to the original image composition. Thanks to the lead author of SEGA, Manuel (@manuelbrack), who added the pipeline in #2223.

Here is a simple demo:

```py
import torch
from diffusers import SemanticStableDiffusionPipeline

pipe = SemanticStableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16)
pipe = pipe.to("cuda")

out = pipe(
    prompt="a photo of the face of a woman",
    num_images_per_prompt=1,
    guidance_scale=7,
    editing_prompt=[
        "smiling, smile",  # Concepts to apply
        "glasses, wearing glasses",
        "curls, wavy hair, curly hair",
        "beard, full beard, mustache",
    ],
    reverse_editing_direction=[False, False, False, False],  # Direction of guidance, i.e. increase all concepts
    edit_warmup_steps=[10, 10, 10, 10],  # Warmup period for each concept
    edit_guidance_scale=[4, 5, 5, 5.4],  # Guidance scale for each concept
    edit_threshold=[
        0.99,
        0.975,
        0.925,
        0.96,
    ],  # Threshold for each concept. Threshold equals the percentile of the latent space that will be discarded, i.e. threshold=0.99 uses 1% of the latent dimensions
    edit_momentum_scale=0.3,  # Momentum scale that will be added to the latent guidance
    edit_mom_beta=0.6,  # Momentum beta
    edit_weights=[1, 1, 1, 1, 1],  # Weights of the individual concepts against each other
)
```

Self-attention guidance

SAG was proposed in Improving Sample Quality of Diffusion Models Using Self-Attention Guidance. SAG works by extracting the intermediate attention map from a diffusion model at every iteration and selecting tokens above a certain attention score for masking and blurring, obtaining a partially blurred input. The dissimilarity between the noise predictions for the blurred and original inputs is then measured and leveraged as guidance. With this guidance, the authors observe apparent improvements in a wide range of diffusion models.

```python
import torch
from diffusers import StableDiffusionSAGPipeline
from accelerate.utils import set_seed

pipe = StableDiffusionSAGPipeline.from_pretrained("CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16)
pipe = pipe.to("cuda")

seed = 8978
prompt = "."
guidance_scale = 7.5
num_images_per_prompt = 1

sag_scale = 1.0

set_seed(seed)
images = pipe(
    prompt, num_images_per_prompt=num_images_per_prompt, guidance_scale=guidance_scale, sag_scale=sag_scale
).images
images[0].save("example.png")
```

SAG was contributed by @SusungHong (lead author of SAG) in https://github.com/huggingface/diffusers/pull/2193.

MultiDiffusion panorama

Proposed in MultiDiffusion: Fusing Diffusion Paths for Controlled Image Generation, it presents a new generation process, "MultiDiffusion", based on an optimization task that binds together multiple diffusion generation processes with a shared set of parameters or constraints.

```python
import torch
from diffusers import StableDiffusionPanoramaPipeline, DDIMScheduler

model_ckpt = "stabilityai/stable-diffusion-2-base"
scheduler = DDIMScheduler.from_pretrained(model_ckpt, subfolder="scheduler")
pipe = StableDiffusionPanoramaPipeline.from_pretrained(model_ckpt, scheduler=scheduler, torch_dtype=torch.float16)

pipe = pipe.to("cuda")

prompt = "a photo of the dolomites"
image = pipe(prompt).images[0]
image.save("dolomites.png")
```

The pipeline was contributed by @omerbt (lead author of MultiDiffusion Panorama) and @sayakpaul in #2393.

Ethical Guidelines

Diffusers is no stranger to the different opinions and perspectives about the challenges that generative technologies bring. Thanks to @giadilli, we have drafted our first Diffusers' Ethical Guidelines with which we hope to initiate a fruitful conversation with the community.

Keras Integration

Many practitioners find it easy to fine-tune the Stable Diffusion models shipped by KerasCV. At the same time, diffusers provides a lot of options for inference, deployment and optimization. We have made it possible to easily import and use KerasCV Stable Diffusion checkpoints in diffusers, read more about the process in our new guide.

:clock3: UniPC scheduler

UniPC is a new fast scheduler in diffusion town! UniPC is a training-free framework designed for the fast sampling of diffusion models, consisting of a corrector (UniC) and a predictor (UniP) that share a unified analytical form and support arbitrary orders. The original codebase can be found here. Thanks to @wl-zhao for the great work and for integrating UniPC into diffusers!

  • add the UniPC scheduler by @wl-zhao in #2373

:runner: Training: consistent EMA support

As part of 0.13.0 we improved the support for EMA in training. We added a common EMAModel in diffusers.training_utils which can be used by all scripts. The EMAModel was improved to support distributed training, to provide new methods for easily evaluating the EMA model during training, and to offer a consistent way to save and load the EMA model, similar to other models in diffusers.

  • Fix EMA for multi-gpu training in the unconditional example by @anton-l, @patil-suraj #1930
  • [Utils] Adds store() and restore() methods to EMAModel by @sayakpaul #2302
  • Use accelerate save & loading hooks to have better checkpoint structure by @patrickvonplaten #2048
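The underlying EMA update is simple; here is a toy sketch of the shadow-weight idea with `store()`/`restore()`-style helpers (illustrative only, not the diffusers EMAModel implementation):

```python
# Toy sketch of the EMA idea (not the diffusers EMAModel implementation):
# shadow parameters track a decayed average of the training parameters, and
# store()/restore()-style helpers let you temporarily evaluate with the
# EMA weights, then bring the training weights back.

class ToyEMA:
    def __init__(self, params, decay=0.999):
        self.decay = decay
        self.shadow = list(params)
        self._backup = None

    def step(self, params):
        # shadow <- decay * shadow + (1 - decay) * params
        self.shadow = [self.decay * s + (1 - self.decay) * p
                       for s, p in zip(self.shadow, params)]

    def store(self, params):  # remember the current training weights
        self._backup = list(params)

    def restore(self):        # bring the training weights back
        return self._backup

params = [1.0]
ema = ToyEMA(params, decay=0.5)
for _ in range(3):
    params = [params[0] + 1.0]  # "training" moves the weight
    ema.step(params)
print(params, ema.shadow)  # the EMA lags behind the raw parameter
```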

:dog: Ruff & black

We have replaced flake8 with ruff (much faster), and updated our version of black. These tools are now in sync with the ones used in transformers, so the contributing experience is now more consistent for people using both codebases :)

All commits

  • [lora] Fix bug with training without validation by @orenwang in #2106
  • [Bump version] 0.13.0dev0 & Deprecate predict_epsilon by @patrickvonplaten in #2109
  • [dreambooth] check the low-precision guard before preparing model by @patil-suraj in #2102
  • [textual inversion] Allow validation images by @pcuenca in #2077
  • Allow UNet2DModel to use arbitrary class embeddings by @pcuenca in #2080
  • make scaling factor a config arg of vae/vqvae by @patil-suraj in #1860
  • [Import Utils] Fix naming by @patrickvonplaten in #2118
  • Fix unable to save_pretrained when using pathlib by @Cyberes in #1972
  • fuse attention mask by @williamberman in #2111
  • Fix model card of LoRA by @hysts in #2114
  • [nit] torch_dtype used twice in doc string by @williamberman in #2126
  • [LoRA] Make sure LoRA can be disabled after it's run by @patrickvonplaten in #2128
  • remove redundant allow_patterns by @williamberman in #2130
  • Allow lora from pipeline by @patrickvonplaten in #2129
  • Fix typos in loaders.py by @kuotient in #2137
  • Typo fix: torwards -> towards by @RahulBhalley in #2134
  • Don't call the Hub if local_files_only is specified by @patrickvonplaten in #2119
  • [from_pretrained] only load config one time by @williamberman in #2131
  • Adding some safetensors docs. by @Narsil in #2122
  • Fix typo by @pcuenca in #2138
  • fix typo in EMAModel's load_state_dict() by @dasayan05 in #2151
  • [diffusers-cli] Fix typo in accelerate and transformers versions by @pcuenca in #2154
  • [Design philosopy] Create official doc by @patrickvonplaten in #2140
  • Section on using LoRA alpha / scale by @pcuenca in #2139
  • Don't copy when unwrapping model by @pcuenca in #2166
  • Add instance prompt to model card of lora dreambooth example by @hysts in #2112
  • [Bug]: fix DDPM scheduler arbitrary infer steps count. by @dudulightricks in #2076
  • [examples] Fix CLI argument in the launch script command for text2image with LoRA by @sayakpaul in #2171
  • [Breaking change] fix legacy inpaint noise and resize mask tensor by @1lint in #2147
  • Use requests instead of wget in convert_from_ckpt.py by @Abhishek-Varma in #2168
  • [Docs] Add components to docs by @patrickvonplaten in #2175
  • [Docs] remove license by @patrickvonplaten in #2188
  • Pass LoRA rank to LoRALinearLayer by @asadm in #2191
  • add: guide on kerascv conversion tool. by @sayakpaul in #2169
  • Fix a dimension bug in Transform2d by @lmxyy in #2144
  • [Loading] Better error message on missing keys by @patrickvonplaten in #2198
  • Update xFormers docs by @pcuenca in #2208
  • add CITATION.cff by @kashif in #2211
  • Create train_dreambooth_inpaint_lora.py by @thedarkzeno in #2205
  • Docs: short section on changing the scheduler in Flax by @pcuenca in #2181
  • [Bug] scheduling_ddpm: fix variance in the case of learned_range type. by @dudulightricks in #2090
  • refactor onnxruntime integration by @prathikr in #2042
  • Fix timestep dtype in legacy inpaint by @dymil in #2120
  • [nit] negative_prompt typo by @williamberman in #2227
  • removes ~s in favor of full-fledged links. by @sayakpaul in #2229
  • [LoRA] Make sure validation works in multi GPU setup by @patrickvonplaten in #2172
  • fix: flagged_images implementation by @justinmerrell in #1947
  • Hotfix textual inv logging by @isamu-isozaki in #2183
  • Fixes LoRAXFormersCrossAttnProcessor by @jorgemcgomes in #2207
  • Fix typo in StableDiffusionInpaintPipeline by @hutec in #2197
  • [Flax DDPM] Make key optional so default pipelines don't fail by @pcuenca in #2176
  • Show error when loading safety_checker `from_flax` by @pcuenca in #2187
  • Fix kdpm2 & kdpm2_a on MPS by @psychedelicious in #2241
  • Fix a typo: bfloa16 -> bfloat16 by @nickkolok in #2243
  • Mention training problems with xFormers 0.0.16 by @pcuenca in #2254
  • fix distributed init twice by @Fazziekey in #2252
  • Fixes prompt input checks in StableDiffusion img2img pipeline by @jorgemcgomes in #2206
  • Create convertvaepttodiffusers.py by @chavinlo in #2215
  • Stable Diffusion Latent Upscaler by @yiyixuxu in #2059
  • [Examples] Remove datasets important that is not needed by @patrickvonplaten in #2267
  • Make center crop and random flip as args for unconditional image generation by @wfng92 in #2259
  • [Tests] Fix slow tests by @patrickvonplaten in #2271
  • Fix torchvision.transforms and transforms function naming clash by @wfng92 in #2274
  • mps cross-attention hack: don't crash on fp16 by @pcuenca in #2258
  • Use accelerate save & loading hooks to have better checkpoint structure by @patrickvonplaten in #2048
  • Replace flake8 with ruff and update black by @patrickvonplaten in #2279
  • Textual inv save log memory by @isamu-isozaki in #2184
  • EMA: fix state_dict() and load_state_dict() & add cur_decay_value by @chenguolin in #2146
  • [Examples] Test all examples on CPU by @patrickvonplaten in #2289
  • fix pix2pix docs by @patrickvonplaten in #2290
  • misc fixes by @williamberman in #2282
  • Run same number of DDPM steps in inference as training by @bencevans in #2263
  • [LoRA] Freezing the model weights by @erkams in #2245
  • Fast CPU tests should also run on main by @patrickvonplaten in #2313
  • Correct fast tests by @patrickvonplaten in #2314
  • remove ddpm test_full_inference by @williamberman in #2291
  • convert ckpt script docstring fixes by @williamberman in #2293
  • [Community Pipeline] UnCLIP Text Interpolation Pipeline by @Abhinay1997 in #2257
  • [Tests] Refactor push tests by @patrickvonplaten in #2329
  • Add ethical guidelines by @giadilli in #2330
  • Fix running LoRA with xformers by @bddppq in #2286
  • Fix typo in load_pipeline_from_original_stable_diffusion_ckpt() method by @p1atdev in #2320
  • [Docs] Fix ethical guidelines docs by @patrickvonplaten in #2333
  • [Versatile Diffusion] Fix tests by @patrickvonplaten in #2336
  • [Latent Upscaling] Remove unused noise by @patrickvonplaten in #2298
  • [Tests] Remove unnecessary tests by @patrickvonplaten in #2337
  • karlo image variation use kakaobrain upload by @williamberman in #2338
  • github issue forum link by @williamberman in #2335
  • dreambooth checkpointing tests and docs by @williamberman in #2339
  • unet check length inputs by @williamberman in #2327
  • unCLIP variant by @williamberman in #2297
  • Log Unconditional Image Generation Samples to W&B by @bencevans in #2287
  • Fix callback type hints - no optional function argument by @patrickvonplaten in #2357
  • [Docs] initial docs about KarrasDiffusionSchedulers by @kashif in #2349
  • KarrasDiffusionSchedulers type note by @williamberman in #2365
  • [Tests] Add MPS skip decorator by @patrickvonplaten in #2362
  • Funky spacing issue by @meg-huggingface in #2368
  • schedulers add glide noising schedule by @williamberman in #2347
  • add total number checkpoints to training scripts by @williamberman in #2367
  • checkpointing_steps_total_limit->checkpoints_total_limit by @williamberman in #2374
  • Fix 3-way merging with the checkpoint_merger community pipeline by @damian0815 in #2355
  • [Variant] Add "variant" as input kwarg so to have better UX when downloading no_ema or fp16 weights by @patrickvonplaten in #2305
  • [Pipelines] Adds pix2pix zero by @sayakpaul in #2334
  • Add Self-Attention-Guided (SAG) Stable Diffusion pipeline by @SusungHong in #2193
  • [SchedulingPNDM] reset cur_model_output after each call by @patil-suraj in #2376
  • train_text_to_image EMAModel saving by @williamberman in #2341
  • [Utils] Adds store() and restore() methods to EMAModel by @sayakpaul in #2302
  • enable_model_cpu_offload by @pcuenca in #2285
  • add the UniPC scheduler by @wl-zhao in #2373
  • Replace torch.concat calls by torch.cat by @fxmarty in #2378
  • Make diffusers importable with transformers < 4.26 by @pcuenca in #2380
  • [Examples] Make sure EMA works with any device by @patrickvonplaten in #2382
  • [Dummy imports] Add missing if else statements for SD] by @patrickvonplaten in #2381
  • Attend and excite 2 by @yiyixuxu in #2369
  • [Pix2Pix0] Add utility function to get edit vector by @patrickvonplaten in #2383
  • Revert "[Pix2Pix0] Add utility function to get edit vector" by @patrickvonplaten in #2384
  • Fix stable diffusion onnx pipeline error when batch_size > 1 by @tianleiwu in #2366
  • [Docs] Fix UniPC docs by @wl-zhao in #2386
  • [Pix2Pix Zero] Fix slow tests by @sayakpaul in #2391
  • [Pix2Pix] Add utility function by @patrickvonplaten in #2385
  • Fix UniPC tests and remove some test warnings by @pcuenca in #2396
  • [Pipelines] Add a section on generating captions and embeddings for Pix2Pix Zero by @sayakpaul in #2395
  • Torch2.0 scaled_dot_product_attention processor by @patil-suraj in #2303
  • add: inversion to pix2pix zero docs. by @sayakpaul in #2398
  • Add semantic guidance pipeline by @manuelbrack in #2223
  • Add ddim inversion pix2pix by @patrickvonplaten in #2397
  • add MultiDiffusionPanorama pipeline by @omerbt in #2393
  • Fixing typos in documentation by @anagri in #2389
  • controlling generation docs by @williamberman in #2388
  • apply_forward_hook simply returns if no accelerate by @daquexian in #2387
  • Revert "Release: v0.13.0" by @williamberman in #2405
  • controlling generation doc nits by @williamberman in #2406
  • Fix typo in AttnProcessor2_0 symbol by @pcuenca in #2404
  • add index page by @yiyixuxu in #2401
  • add xformers 0.0.16 warning message by @williamberman in #2345

Significant community contributions

The following contributors have made significant changes to the library over the last release:

  • @thedarkzeno
    • Create train_dreambooth_inpaint_lora.py (#2205)
  • @prathikr
    • refactor onnxruntime integration (#2042)
  • @Abhinay1997
    • [Community Pipeline] UnCLIP Text Interpolation Pipeline (#2257)
  • @SusungHong
    • Add Self-Attention-Guided (SAG) Stable Diffusion pipeline (#2193)
  • @wl-zhao
    • add the UniPC scheduler (#2373)
    • [Docs] Fix UniPC docs (#2386)
  • @manuelbrack
    • Add semantic guidance pipeline (#2223)
  • @omerbt
    • add MultiDiffusionPanorama pipeline (#2393)

- Python
Published by patrickvonplaten about 3 years ago

diffusers - v0.12.1: Patch Release to fix local files only

Make sure cached models can be loaded in offline mode.

  • Don't call the Hub if local_files_only is specified by @patrickvonplaten in #2119

- Python
Published by patrickvonplaten about 3 years ago

diffusers - Instruct-Pix2Pix, DiT, LoRA

🪄 Instruct-Pix2Pix

Instruct-Pix2Pix is a Stable Diffusion model fine-tuned for editing images from human instructions. Given an input image and a written instruction that tells the model what to do, the model follows these instructions to edit the image.


The model was released with the paper InstructPix2Pix: Learning to Follow Image Editing Instructions. More information about the model can be found in the paper.

pip install diffusers transformers safetensors accelerate

```python
import PIL
import PIL.ImageOps
import requests
import torch
from diffusers import StableDiffusionInstructPix2PixPipeline

model_id = "timbrooks/instruct-pix2pix"
pipe = StableDiffusionInstructPix2PixPipeline.from_pretrained(model_id, torch_dtype=torch.float16).to("cuda")

url = "https://huggingface.co/datasets/diffusers/diffusers-images-docs/resolve/main/mountain.png"


def download_image(url):
    image = PIL.Image.open(requests.get(url, stream=True).raw)
    image = PIL.ImageOps.exif_transpose(image)
    image = image.convert("RGB")
    return image


image = download_image(url)

prompt = "make the mountains snowy"
edit = pipe(prompt, image=image, num_inference_steps=20, image_guidance_scale=1.5, guidance_scale=7).images[0]
edit.save("snowy_mountains.png")
```

  • Add InstructPix2Pix pipeline by @patil-suraj #2040

🤖 DiT

Diffusion Transformers (DiT) is a class-conditional latent diffusion model that replaces the commonly used U-Net backbone with a transformer operating on latent patches. The pretrained model is trained on the ImageNet-1K dataset and is able to generate class-conditional images of 256x256 or 512x512 pixels.


The model was released with the paper Scalable Diffusion Models with Transformers.

```python
import torch
from diffusers import DiTPipeline

model_id = "facebook/DiT-XL-2-256"
pipe = DiTPipeline.from_pretrained(model_id, torch_dtype=torch.float16).to("cuda")

# pick words that exist in ImageNet
words = ["white shark", "umbrella"]
class_ids = pipe.get_label_ids(words)

output = pipe(class_labels=class_ids)
image = output.images[0]  # label 'white shark'
```

⚡ LoRA

LoRA is a technique for performing parameter-efficient fine-tuning for large models. LoRA works by adding so-called "update matrices" to specific blocks of a pre-trained model. During fine-tuning, only these update matrices are updated while the pre-trained model parameters are kept frozen. This allows us to achieve greater memory efficiency as well as easier portability during fine-tuning.
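The update-matrix idea can be sketched with tiny matrices: the frozen weight W is augmented by a low-rank product B·A, and only B and A would be trained (a toy illustration, not the diffusers implementation):

```python
# Toy sketch of the LoRA idea (illustrative, not the diffusers implementation):
# the frozen weight W is augmented with a low-rank update B @ A, and only
# A and B are trained. For d x d weights, the update stores 2*d*r numbers
# instead of d*d.

def matmul(X, Y):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]

d, r = 4, 1                       # full dimension 4, LoRA rank 1
W = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]  # frozen
B = [[0.5], [0.0], [0.0], [0.0]]  # d x r, trainable
A = [[0.0, 1.0, 0.0, 0.0]]        # r x d, trainable

delta = matmul(B, A)              # low-rank update, rank <= r
W_eff = [[w + dw for w, dw in zip(wr, dr)] for wr, dr in zip(W, delta)]

print(sum(len(row) for row in W))  # 16 frozen numbers
print(d * r + r * d)               # only 8 trainable numbers
print(W_eff[0])                    # only row 0 changed: [1.0, 0.5, 0.0, 0.0]
```

This is why the fine-tuned checkpoints mentioned below stay so small: only the low-rank factors are saved.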

LoRA was proposed in LoRA: Low-Rank Adaptation of Large Language Models. In the original paper, the authors investigated LoRA for fine-tuning large language models like GPT-3. cloneofsimo was the first to try out LoRA training for Stable Diffusion in the popular lora GitHub repository.

Diffusers now supports LoRA! This means you can now fine-tune a model like Stable Diffusion using consumer GPUs like Tesla T4 or RTX 2080 Ti. LoRA support was added to UNet2DConditionModel and the DreamBooth training script by @patrickvonplaten in #1884.

By using LoRA, the fine-tuned checkpoints will be just 3 MBs in size. After fine-tuning, you can use the LoRA checkpoints like so:

```py
from diffusers import StableDiffusionPipeline
import torch

model_path = "sayakpaul/sd-model-finetuned-lora-t4"
pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16)
pipe.unet.load_attn_procs(model_path)
pipe.to("cuda")

prompt = "A pokemon with blue eyes."
image = pipe(prompt, num_inference_steps=30, guidance_scale=7.5).images[0]
image.save("pokemon.png")
```


You can follow these resources to know more about how to use LoRA in diffusers:

📐 Customizable Cross Attention

LoRA leverages a new method to customize the cross attention layers deep in the UNet. This can be useful for other creative approaches such as Prompt-to-Prompt, and it makes it easier to apply optimized attention implementations like xFormers. This new "attention processor" abstraction was created by @patrickvonplaten in #1639 after discussing the design with the community, and we have used it to rewrite our xFormers and attention slicing implementations!
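The pattern is essentially a strategy object: the attention layer delegates its forward computation to a swappable processor. A toy sketch with hypothetical names (not the library's actual signatures):

```python
# Minimal sketch of the "attention processor" pattern (hypothetical names,
# not the library's exact API): an attention layer delegates its forward
# computation to a swappable processor object, so xFormers, sliced, or
# LoRA variants can be plugged in without touching the layer itself.

class DefaultProcessor:
    def __call__(self, layer, hidden_states):
        return [h * layer.scale for h in hidden_states]

class CountingProcessor:
    """A stand-in for a custom variant (e.g. memory-efficient attention)."""
    def __init__(self):
        self.calls = 0
    def __call__(self, layer, hidden_states):
        self.calls += 1
        return [h * layer.scale for h in hidden_states]

class ToyAttention:
    def __init__(self, scale=0.5):
        self.scale = scale
        self.processor = DefaultProcessor()
    def set_processor(self, processor):  # the swap point
        self.processor = processor
    def forward(self, hidden_states):
        return self.processor(self, hidden_states)

attn = ToyAttention()
print(attn.forward([2.0, 4.0]))  # via the default processor
attn.set_processor(CountingProcessor())
print(attn.forward([2.0, 4.0]))  # same result through the custom processor
```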

🌿 Flax => PyTorch

A long requested feature, prolific community member @camenduru took up the gauntlet in #1900 and created a way to convert Flax model weights for PyTorch. This means that you can train or fine-tune models super fast using Google TPUs, and then convert the weights to PyTorch for everybody to use. Thanks @camenduru!

🌀 Flax Img2Img

Another community member, @dhruvrnaik, ported the image-to-image pipeline to Flax in #1355! Using a TPU v2-8 (available in Colab's free tier), you can generate 8 images at once in a few seconds!

🎲 DEIS Scheduler

DEIS (Diffusion Exponential Integrator Sampler) is a new fast multistep scheduler that can generate high-quality samples in fewer steps. The scheduler was introduced in the paper Fast Sampling of Diffusion Models with Exponential Integrator, where more information about it can be found.

```python
from diffusers import StableDiffusionPipeline, DEISMultistepScheduler
import torch

pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16)
pipe.scheduler = DEISMultistepScheduler.from_config(pipe.scheduler.config)
pipe = pipe.to("cuda")

prompt = "a photo of an astronaut riding a horse on mars"
generator = torch.Generator(device="cuda").manual_seed(0)
image = pipe(prompt, generator=generator, num_inference_steps=25).images[0]
```

  • feat : add log-rho deis multistep scheduler by @qsh-zh #1432

Reproducibility

One can now pass CPU generators to all pipelines even if the pipeline is on GPU. This ensures much better reproducibility across GPU hardware:

```python
import torch
from diffusers import DDIMPipeline
import numpy as np

model_id = "google/ddpm-cifar10-32"

# load model and scheduler
ddim = DDIMPipeline.from_pretrained(model_id)
ddim.to("cuda")

# create a generator for reproducibility
generator = torch.manual_seed(0)

# run pipeline for just two steps and return numpy tensor
image = ddim(num_inference_steps=2, output_type="np", generator=generator).images
print(np.abs(image).sum())
```

See: #1902 and https://huggingface.co/docs/diffusers/using-diffusers/reproducibility
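The determinism a CPU generator buys you can be sanity-checked without any pipeline at all. A minimal sketch (pure torch, no model download; the shapes are arbitrary):

```python
import torch

# Two independently seeded CPU generators produce bit-identical noise,
# regardless of which accelerator the tensors are later moved to.
def make_noise(seed: int) -> torch.Tensor:
    generator = torch.Generator(device="cpu").manual_seed(seed)
    return torch.randn(1, 4, 8, 8, generator=generator)

a = make_noise(0)
b = make_noise(0)
c = make_noise(1)

print(torch.equal(a, b))  # True: same seed -> identical latents
print(torch.equal(a, c))  # False: different seed -> different latents
```

Because the noise is drawn on the CPU, it no longer depends on the GPU model or driver, which is exactly why cross-hardware reproducibility improves.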

Important New Guides

  • Stable Diffusion 101: https://huggingface.co/docs/diffusers/stable_diffusion
  • Reproducibility: https://huggingface.co/docs/diffusers/using-diffusers/reproducibility
  • LoRA: https://huggingface.co/docs/diffusers/training/lora

Important Bug Fixes

  • Don't download safetensors if library is not installed: #2057
  • Make sure that save_pretrained(...) doesn't accidentally delete files: #2038
  • Fix CPU offload docs for maximum memory gain: #1968
  • Fix conversion for exotically sorted weight names: #1959
  • Fix intermediate checkpointing for textual inversion, thanks @lstein #2072

All commits

  • update composable diffusion for an updated diffuser library by @nanlliu in #1697
  • [Tests] Fix UnCLIP cpu offload tests by @anton-l in #1769
  • Bump to 0.12.0.dev0 by @anton-l in #1771
  • [Dreambooth] flax fixes by @pcuenca in #1765
  • update train_unconditional_ort.py by @prathikr in #1775
  • Only test for xformers when enabling them #1773 by @kig in #1776
  • expose polynomial:power and cosine_with_restarts:num_cycles params by @zetyquickly in #1737
  • [Flax] Stateless schedulers, fixes and refactors by @skirsten in #1661
  • Correct hf hub download by @patrickvonplaten in #1767
  • Dreambooth docs: minor fixes by @pcuenca in #1758
  • Fix num images per prompt unclip by @patil-suraj in #1787
  • Add Flax stable diffusion img2img pipeline by @dhruvrnaik in #1355
  • Refactor cross attention and allow mechanism to tweak cross attention function by @patrickvonplaten in #1639
  • Fix OOM when using PyTorch with JAX installed. by @pcuenca in #1795
  • reorder model wrap + bug fix by @prathikr in #1799
  • Remove hardcoded names from PT scripts by @patrickvonplaten in #1778
  • [textual_inversion] unwrap_model text encoder before accessing weights by @patil-suraj in #1816
  • fix small mistake in annotation: 32 -> 64 by @Line290 in #1780
  • Make safety_checker optional in more pipelines by @pcuenca in #1796
  • Device to use (e.g. cpu, cuda:0, cuda:1, etc.) by @camenduru in #1844
  • Avoid duplicating PyTorch + safetensors downloads. by @pcuenca in #1836
  • Width was typod as weight by @Helw150 in #1800
  • fix: resize transform now preserves aspect ratio by @parlance-zz in #1804
  • Make xformers optional even if it is available by @kn in #1753
  • Allow selecting precision to make Dreambooth class images by @kabachuha in #1832
  • unCLIP image variation by @williamberman in #1781
  • [Community Pipeline] MagicMix by @daspartho in #1839
  • [Versatile Diffusion] Fix cross_attention_kwargs by @patrickvonplaten in #1849
  • [Dtype] Align dtype casting behavior with Transformers and Accelerate by @patrickvonplaten in #1725
  • [StableDiffusionInpaint] Correct test by @patrickvonplaten in #1859
  • [textual inversion] add gradient checkpointing and small fixes. by @patil-suraj in #1848
  • Flax: Fix img2img and align with other pipeline by @skirsten in #1824
  • Make repo structure consistent by @patrickvonplaten in #1862
  • [Unclip] Make sure text_embeddings & image_embeddings can directly be passed to enable interpolation tasks. by @patrickvonplaten in #1858
  • Fix ema decay by @pcuenca in #1868
  • [Docs] Improve docs by @patrickvonplaten in #1870
  • [examples] update loss computation by @patil-suraj in #1861
  • [train_text_to_image] allow using non-ema weights for training by @patil-suraj in #1834
  • [Attention] Finish refactor attention file by @patrickvonplaten in #1879
  • Fix typo in train_dreambooth_inpaint by @pcuenca in #1885
  • Update ONNX Pipelines to use np.float64 instead of np.float by @agizmo in #1789
  • [examples] misc fixes by @patil-suraj in #1886
  • Fixes to the help for report_to in training scripts by @pcuenca in #1888
  • updated doc for stable diffusion pipelines by @yiyixuxu in #1770
  • Add UnCLIPImageVariationPipeline to dummy imports by @anton-l in #1897
  • Add accelerate and xformers versions to diffusers-cli env by @anton-l in #1898
  • [addresses issue #1642] add add_noise to scheduling-sde-ve by @aengusng8 in #1827
  • Add conditional generation to AudioDiffusionPipeline by @teticio in #1826
  • Fixes in comments in SD2 D2I by @neverix in #1903
  • [Deterministic torch randn] Allow tensors to be generated on CPU by @patrickvonplaten in #1902
  • [Docs] Remove duplicated API doc string by @patrickvonplaten in #1901
  • fix: DDPMScheduler.set_timesteps() by @Joqsan in #1912
  • Fix --resume_from_checkpoint step in train_text_to_image.py by @merfnad in #1914
  • Support training SD V2 with Flax by @yasyf in #1783
  • Fix lr-scaling store_true & default=True CLI argument for textual_inversion training. by @aredden in #1090
  • Various Fixes for Flax Dreambooth by @yasyf in #1782
  • Test ResnetBlock2D by @hchings in #1850
  • Init for korean docs by @seriousran in #1910
  • New Pipeline: Tiled-upscaling with depth perception to avoid blurry spots by @peterwilli in #1615
  • Improve reproducibility 2/3 by @patrickvonplaten in #1906
  • feat : add log-rho deis multistep scheduler by @qsh-zh in #1432
  • Feature/colossalai by @Fazziekey in #1793
  • [Docs] Add TRANSLATING.md file by @seriousran in #1920
  • [StableDiffusionImg2Img] validating input type by @Shubhamai in #1913
  • [dreambooth] low precision guard by @williamberman in #1916
  • [Stable Diffusion Guide] 101 Stable Diffusion Guide directly into the docs by @patrickvonplaten in #1927
  • [Conversion] Make sure ema weights are extracted correctly by @patrickvonplaten in #1937
  • fix path to logo by @vvssttkk in #1939
  • Add automatic doc sorting by @patrickvonplaten in #1940
  • update to latest colossalai by @Fazziekey in #1951
  • fix typo in imagicstablediffusion.py by @andreemic in #1956
  • [Conversion SD] Make sure weirdly sorted keys work as well by @patrickvonplaten in #1959
  • allow loading ddpm models into ddim by @patrickvonplaten in #1932
  • [Community] Correct checkpoint merger by @patrickvonplaten in #1965
  • Update CLIPGuidedStableDiffusion.feature_extractor.size to fix TypeError by @oxidase in #1938
  • [CPU offload] correct cpu offload by @patrickvonplaten in #1968
  • [Docs] Update README.md by @haofanwang in #1960
  • Research project multi subject dreambooth by @klopsahlong in #1948
  • Example tests by @patrickvonplaten in #1982
  • Fix slow tests by @patrickvonplaten in #1983
  • Fix unused upcast_attn flag in convert_original_stable_diffusion_to_diffusers script by @kn in #1942
  • Allow converting Flax to PyTorch by adding a "from_flax" keyword by @camenduru in #1900
  • Update docstring by @Warvito in #1971
  • [SD Img2Img] resize source images to multiple of 8 instead of 32 by @vvsotnikov in #1571
  • Update README.md to include our blog post by @sayakpaul in #1998
  • Fix a couple typos in Dreambooth readme by @pcuenca in #2004
  • Add tests for 2D UNet blocks by @hchings in #1945
  • [Conversion] Support convert diffusers to safetensors by @hua1995116 in #1996
  • [Community] Fix merger by @patrickvonplaten in #2006
  • [Conversion] Improve safetensors by @patrickvonplaten in #1989
  • [Black] Update black library by @patrickvonplaten in #2007
  • Fix typos in ColossalAI example by @haofanwang in #2001
  • Use pipeline tests mixin for UnCLIP pipeline tests + unCLIP MPS fixes by @williamberman in #1908
  • Change PNDMPipeline to use PNDMScheduler by @willdalh in #2003
  • [train_unconditional] fix LR scheduler init by @patil-suraj in #2010
  • [Docs] No more autocast by @patrickvonplaten in #2021
  • [Flax] Add Flax inpainting impl by @xvjiarui in #1966
  • Check k-diffusion version is at least 0.0.12 by @pcuenca in #2022
  • DiT Pipeline by @kashif in #1806
  • fix dit doc header by @patil-suraj in #2027
  • [LoRA] Add LoRA training script by @patrickvonplaten in #1884
  • [Dit] Fix dit tests by @patrickvonplaten in #2034
  • Fix typos and minor redundancies by @Joqsan in #2029
  • [Lora] Model card by @patrickvonplaten in #2032
  • [Save Pretrained] Remove dead code lines that can accidentally remove pytorch files by @patrickvonplaten in #2038
  • Fix EMA for multi-gpu training in the unconditional example by @anton-l in #1930
  • Minor fix in the documentation of LoRA by @hysts in #2045
  • Add InstructPix2Pix pipeline by @patil-suraj in #2040
  • Create repo before cloning in examples by @Wauplin in #2047
  • Remove modelcards dependency by @Wauplin in #2050
  • Module-ise "original stable diffusion to diffusers" conversion script by @damian0815 in #2019
  • [StableDiffusionInstructPix2Pix] use cpu generator in slow tests by @patil-suraj in #2051
  • [From pretrained] Don't download .safetensors files if safetensors is… by @patrickvonplaten in #2057
  • Correct Pix2Pix example by @patrickvonplaten in #2056
  • add community pipeline: StableUnCLIPPipeline by @budui in #2037
  • [LoRA] Adds example on text2image fine-tuning with LoRA by @sayakpaul in #2031
  • Safetensors loading in "convert_diffusers_to_original_stable_diffusion" by @cafeai in #2054
  • [examples] add dataloader_num_workers argument by @patil-suraj in #2070
  • Dreambooth: reduce VRAM usage by @gleb-akhmerov in #2039
  • [Paint by example] Fix cpu offload for paint by example by @patrickvonplaten in #2062
  • [textual_inversion] Fix resuming state when using gradient checkpointing by @pcuenca in #2072
  • [lora] Log images when using tensorboard by @pcuenca in #2078
  • Fix resume epoch for all training scripts except textual_inversion by @pcuenca in #2079
  • [dreambooth] fix multi on gpu. by @patil-suraj in #2088
  • Run inference on a specific condition and fix call of manual_seed() by @shirayu in #2074
  • [Feat] checkpoint_merger works on local models as well as ones that use safetensors by @lstein in #2060
  • xFormers attention op arg by @takuma104 in #2049
  • [docs] [dreambooth] note random crop by @williamberman in #2085
  • Remove wandb from text_to_image requirements.txt by @pcuenca in #2092
  • [doc] update example for pix2pix by @patil-suraj in #2101
  • Add lora tag to the model tags by @apolinario in #2103
  • [docs] Adds a doc on LoRA support for diffusers by @sayakpaul in #2086
  • Allow directly passing text embeddings to Stable Diffusion Pipeline for prompt weighting by @patrickvonplaten in #2071
  • Improve transformers versions handling by @patrickvonplaten in #2104
  • Reproducibility 3/3 by @patrickvonplaten in #1924

🙌 Significant community contributions 🙌

The following contributors have made significant changes to the library over the last release:

  • @nanlliu
    • update composable diffusion for an updated diffuser library (#1697)
  • @skirsten
    • [Flax] Stateless schedulers, fixes and refactors (#1661)
    • Flax: Fix img2img and align with other pipeline (#1824)
  • @hchings
    • Test ResnetBlock2D (#1850)
    • Add tests for 2D UNet blocks (#1945)
  • @seriousran
    • Init for korean docs (#1910)
    • [Docs] Add TRANSLATING.md file (#1920)
  • @qsh-zh
    • feat : add log-rho deis multistep scheduler (#1432)
  • @Fazziekey
    • Feature/colossalai (#1793)
    • update to latest colossalai (#1951)
  • @klopsahlong
    • Research project multi subject dreambooth (#1948)
  • @xvjiarui
    • [Flax] Add Flax inpainting impl (#1966)
  • @damian0815
    • Module-ise "original stable diffusion to diffusers" conversion script (#2019)
  • @camenduru
    • Allow converting Flax to PyTorch by adding a "from_flax" keyword (#1900)

- Python
Published by patrickvonplaten about 3 years ago

diffusers - v0.11.1: Patch release

This patch release fixes a bug with num_images_per_prompt in the UnCLIPPipeline.

* Fix num images per prompt unclip by @patil-suraj in #1787

- Python
Published by anton-l about 3 years ago

diffusers - v0.11.0: Karlo UnCLIP, safetensors, pipeline versions

:magic_wand: Karlo UnCLIP by Kakao Brain

Karlo is a text-conditional image generation model based on OpenAI's unCLIP architecture, with an improved super-resolution module that upscales from 64px to 256px, recovering high-frequency details in a small number of denoising steps.

This alpha version of Karlo is trained on 115M image-text pairs, including COYO-100M high-quality subset, CC3M, and CC12M. For more information about the architecture, see the Karlo repository: https://github.com/kakaobrain/karlo

image

pip install diffusers transformers safetensors accelerate

```python
import torch
from diffusers import UnCLIPPipeline

pipe = UnCLIPPipeline.from_pretrained("kakaobrain/karlo-v1-alpha", torch_dtype=torch.float16)
pipe = pipe.to("cuda")

prompt = "a high-resolution photograph of a big red frog on a green leaf."
image = pipe(prompt).images[0]
```

img

:octocat: Community pipeline versioning

The community pipelines hosted in diffusers/examples/community will now follow the installed version of the library.

E.g. if you have diffusers==0.9.0 installed, the pipelines from the v0.9.0 branch will be used: https://github.com/huggingface/diffusers/tree/v0.9.0/examples/community

If you've installed diffusers from source, e.g. with pip install git+https://github.com/huggingface/diffusers then the latest versions of the pipelines will be fetched from the main branch.

To change the custom pipeline version, set the custom_revision argument like so:

```python
pipeline = DiffusionPipeline.from_pretrained(
    "google/ddpm-cifar10-32", custom_pipeline="one_step_unet", custom_revision="0.10.2"
)
```

:safety_vest: safetensors

Many of the most important checkpoints now have safetensors (https://github.com/huggingface/safetensors) weights available. After installing safetensors with:

pip install safetensors

You will see a nice speed-up when loading your model :rocket:

Some of the most important checkpoints now have safetensors weights:

- https://huggingface.co/stabilityai/stable-diffusion-2
- https://huggingface.co/stabilityai/stable-diffusion-2-1
- https://huggingface.co/stabilityai/stable-diffusion-2-depth
- https://huggingface.co/stabilityai/stable-diffusion-2-inpainting

Batched generation bug fixes :bug:

  • Make sure all pipelines can run with batched input by @patrickvonplaten in #1669

We fixed a lot of bugs for batched generation. All pipelines should now correctly process batches of prompts and images :hugs: Also we made it much easier to tweak images with reproducible seeds: https://huggingface.co/docs/diffusers/using-diffusers/reusing_seeds
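The reusable-seeds idea rests on passing one generator per image in the batch, so a single image can be reproduced later on its own. A minimal sketch of the underlying mechanism (hypothetical latent shape, no pipeline involved):

```python
import torch

batch_size = 4
shape = (4, 8, 8)  # hypothetical latent shape

# One seeded CPU generator per image in the batch.
generators = [torch.Generator(device="cpu").manual_seed(i) for i in range(batch_size)]
batched = torch.stack([torch.randn(shape, generator=g) for g in generators])

# Re-drawing image 2 alone with its own seed reproduces exactly its slice of
# the batch, so single-image and batched runs stay comparable.
solo = torch.randn(shape, generator=torch.Generator(device="cpu").manual_seed(2))
print(torch.equal(batched[2], solo))  # True
```

This is why per-prompt generators make it possible to regenerate or tweak just one image from a batch with a known seed.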

:memo: Changelog

  • Remove spurious arg in training scripts by @pcuenca in #1644
  • dreambooth: fix #1566: maintain fp32 wrapper when saving a checkpoint to avoid crash when running fp16 by @timh in #1618
  • Allow k pipeline to generate > 1 images by @pcuenca in #1645
  • Remove unnecessary offset in img2img by @patrickvonplaten in #1653
  • Remove unnecessary kwargs in depth2img by @maruel in #1648
  • Add text encoder conversion by @lawfordp2017 in #1559
  • VersatileDiffusion: fix input processing by @LukasStruppek in #1568
  • tensor format ort bug fix by @prathikr in #1557
  • Deprecate init image correctly by @patrickvonplaten in #1649
  • fix bug if we don't do_classifier_free_guidance by @MKFMIKU in #1601
  • Handle missing global_step key in scripts/convert_original_stable_diffusion_to_diffusers.py by @Cyberes in #1612
  • [SD] Make sure scheduler is correct when converting by @patrickvonplaten in #1667
  • [Textual Inversion] Do not update other embeddings by @patrickvonplaten in #1665
  • Added Community pipeline for comparing Stable Diffusion v1.1-4 checkpoints by @suvadityamuk in #1584
  • Fix wrong type checking in convert_diffusers_to_original_stable_diffusion.py by @apolinario in #1681
  • [Version] Bump to 0.11.0.dev0 by @patrickvonplaten in #1682
  • Dreambooth: save / restore training state by @pcuenca in #1668
  • Disable telemetry when DISABLE_TELEMETRY is set by @w4ffl35 in #1686
  • Change one-step dummy pipeline for testing by @patrickvonplaten in #1690
  • [Community pipeline] Add github mechanism by @patrickvonplaten in #1680
  • Dreambooth: use warnings instead of logger in parse_args() by @pcuenca in #1688
  • manually update train_unconditional_ort by @prathikr in #1694
  • Remove all local telemetry by @anton-l in #1702
  • Update main docs by @patrickvonplaten in #1706
  • [Readme] Clarify package owners by @anton-l in #1707
  • Fix the bug that torch version less than 1.12 throws TypeError by @chinoll in #1671
  • RePaint fast tests and API conforming by @anton-l in #1701
  • Add state checkpointing to other training scripts by @pcuenca in #1687
  • Improve pipeline_stable_diffusion_inpaint_legacy.py by @cyber-meow in #1585
  • apply amp bf16 on textual inversion by @jiqing-feng in #1465
  • Add examples with Intel optimizations by @hshen14 in #1579
  • Added a README page for docs and a "schedulers" page by @yiyixuxu in #1710
  • Accept latents as optional input in Latent Diffusion pipeline by @daspartho in #1723
  • Fix ONNX img2img preprocessing and add fast tests coverage by @anton-l in #1727
  • Fix ldm tests on master by not running the CPU tests on GPU by @patrickvonplaten in #1729
  • Docs: recommend xformers by @pcuenca in #1724
  • Nightly integration tests by @anton-l in #1664
  • [Batched Generators] This PR adds generators that are useful to make batched generation fully reproducible by @patrickvonplaten in #1718
  • Fix ONNX img2img preprocessing by @peterto in #1736
  • Fix MPS fast test warnings by @anton-l in #1744
  • Fix/update the LDM pipeline and tests by @anton-l in #1743
  • kakaobrain unCLIP by @williamberman in #1428
  • [fix] pipeline_unclip generator by @williamberman in #1751
  • unCLIP docs by @williamberman in #1754
  • Correct help text for scheduler_type flag in scripts. by @msiedlarek in #1749
  • Add resnet_time_scale_shift to VD layers by @anton-l in #1757
  • Add attention mask to unCLIP by @patrickvonplaten in #1756
  • Support attn2==None for xformers by @anton-l in #1759
  • [UnCLIPPipeline] fix num_images_per_prompt by @patil-suraj in #1762
  • Add CPU offloading to UnCLIP by @anton-l in #1761
  • [Versatile] fix attention mask by @patrickvonplaten in #1763
  • [Revision] Don't recommend using revision by @patrickvonplaten in #1764
  • [Examples] Update train_unconditional.py to include logging argument for Wandb by @ash0ts in #1719
  • Transformers version req for UnCLIP by @anton-l in #1766

- Python
Published by anton-l about 3 years ago

diffusers - v0.10.2: Patch release

This patch removes the hard requirement for transformers>=4.25.1 in case external libraries were downgrading the library upon startup in a non-controllable way.

  • do not automatically enable xformers by @patrickvonplaten in #1640
  • Adapt to forced transformers version in some dependent libraries by @anton-l in #1638
  • Re-add xformers enable to UNet2DCondition by @patrickvonplaten in #1627

🚨🚨🚨 Note that xformers is not automatically enabled anymore 🚨🚨🚨

The reasons for this are given here: https://github.com/huggingface/diffusers/pull/1640#discussion_r1044651551:

We should not automatically enable xformers for three reasons:

1. It's not a PyTorch-like API. PyTorch doesn't enable all the fastest options available by default.
2. We allocate GPU memory before the user even does .to("cuda").
3. This behavior is not consistent with cases where xformers is not installed.

=> This means: if you relied on xformers being enabled automatically, please make sure to add the following now:

```python
from diffusers.utils.import_utils import is_xformers_available

unet = ...  # load unet

if is_xformers_available():
    try:
        unet.enable_xformers_memory_efficient_attention(True)
    except Exception as e:
        logger.warning(
            "Could not enable memory efficient attention. Make sure xformers is installed"
            f" correctly and a GPU is available: {e}"
        )
```

for the UNet (e.g. in dreambooth) or for the pipeline:

```py
from diffusers.utils.import_utils import is_xformers_available

pipe = ...  # load pipeline

if is_xformers_available():
    try:
        pipe.enable_xformers_memory_efficient_attention(True)
    except Exception as e:
        logger.warning(
            "Could not enable memory efficient attention. Make sure xformers is installed"
            f" correctly and a GPU is available: {e}"
        )
```

- Python
Published by anton-l about 3 years ago

diffusers - v0.10.1: Patch release

This patch returns enable_xformers_memory_efficient_attention() to UNet2DCondition to restore backward compatibility.

  • Re-add xformers enable to UNet2DCondition by @patrickvonplaten in #1627

- Python
Published by anton-l about 3 years ago

diffusers - v0.10.0: Depth Guidance and Safer Checkpoints

🐳 Depth-Guided Stable Diffusion and 2.1 checkpoints

The new depth-guided stable diffusion model is fully supported in this release. The model is conditioned on monocular depth estimates inferred via MiDaS and can be used for structure-preserving img2img and shape-conditional synthesis.

image

Installing the transformers library from source is required for the MiDaS model:

```bash
pip install --upgrade git+https://github.com/huggingface/transformers/
```

```python
import torch
import requests
from PIL import Image
from diffusers import StableDiffusionDepth2ImgPipeline

pipe = StableDiffusionDepth2ImgPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-depth",
    torch_dtype=torch.float16,
).to("cuda")

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
init_image = Image.open(requests.get(url, stream=True).raw)

prompt = "two tigers"
n_prompt = "bad, deformed, ugly, bad anatomy"
image = pipe(prompt=prompt, image=init_image, negative_prompt=n_prompt, strength=0.7).images[0]
```

The updated Stable Diffusion 2.1 checkpoints are also released and fully supported:

* https://huggingface.co/stabilityai/stable-diffusion-2-1
* https://huggingface.co/stabilityai/stable-diffusion-2-1-base

:safety_vest: Safe Tensors

We now support SafeTensors: a new simple format for storing tensors safely (as opposed to pickle) that is still fast (zero-copy).

* [Proposal] Support loading from safetensors if file is present. by @Narsil in #1357
* [Proposal] Support saving to safetensors by @MatthieuBizien in #1494

| Format | Safe | Zero-copy | Lazy loading | No file size limit | Layout control | Flexibility | Bfloat16 |
| ----------------------- | --- | --- | --- | --- | --- | --- | --- |
| pickle (PyTorch) | ✗ | ✗ | ✗ | ✓ | ✗ | ✓ | ✓ |
| H5 (Tensorflow) | ✓ | ✗ | ✓ | ✓ | ~ | ~ | ✗ |
| SavedModel (Tensorflow) | ✓ | ✗ | ✗ | ✓ | ✓ | ✗ | ✓ |
| MsgPack (flax) | ✓ | ✓ | ✗ | ✓ | ✗ | ✗ | ✓ |
| SafeTensors | ✓ | ✓ | ✓ | ✓ | ✓ | ✗ | ✓ |

More details about the comparison here: https://github.com/huggingface/safetensors#yet-another-format

```bash
pip install safetensors
```

```python
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-2-1")
pipe.save_pretrained("./safe-stable-diffusion-2-1", safe_serialization=True)

# you can also push this checkpoint to the HF Hub and load from there
safe_pipe = StableDiffusionPipeline.from_pretrained("./safe-stable-diffusion-2-1")
```

New Pipelines

:paintbrush: Paint-by-example

An implementation of Paint by Example: Exemplar-based Image Editing with Diffusion Models by Binxin Yang, Shuyang Gu, Bo Zhang, Ting Zhang, Xuejin Chen, Xiaoyan Sun, Dong Chen, Fang Wen

* Add paint by example by @patrickvonplaten in #1533

image

```python import PIL import requests import torch from io import BytesIO from diffusers import DiffusionPipeline

def download_image(url): response = requests.get(url) return PIL.Image.open(BytesIO(response.content)).convert("RGB")

imgurl = "https://raw.githubusercontent.com/Fantasy-Studio/Paint-by-Example/main/examples/image/example1.png" maskurl = "https://raw.githubusercontent.com/Fantasy-Studio/Paint-by-Example/main/examples/mask/example1.png" exampleurl = "https://raw.githubusercontent.com/Fantasy-Studio/Paint-by-Example/main/examples/reference/example1.jpg"

initimage = downloadimage(imgurl).resize((512, 512)) maskimage = downloadimage(maskurl).resize((512, 512)) exampleimage = downloadimage(example_url).resize((512, 512))

pipe = DiffusionPipeline.frompretrained("Fantasy-Studio/Paint-by-Example", torchdtype=torch.float16) pipe = pipe.to("cuda")

image = pipe(image=initimage, maskimage=maskimage, exampleimage=example_image).images[0] ```

Audio Diffusion and Latent Audio Diffusion

Audio Diffusion leverages the recent advances in image generation using diffusion models by converting audio samples to and from mel spectrogram images.

* add AudioDiffusionPipeline and LatentAudioDiffusionPipeline #1334 by @teticio in #1426

```python
from IPython.display import Audio
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained("teticio/audio-diffusion-ddim-256").to("cuda")

output = pipe()
display(output.images[0])
display(Audio(output.audios[0], rate=pipe.mel.get_sample_rate()))
```

[Experimental] K-Diffusion pipeline for Stable Diffusion

This pipeline is added to support the latest schedulers from @crowsonkb's k-diffusion. The purpose of this pipeline is to compare scheduler implementations and updates, so new features from other pipelines are unlikely to be supported!

  • [K Diffusion] Add k diffusion sampler natively by @patrickvonplaten in #1603

```bash
pip install k-diffusion
```

```python
from diffusers import StableDiffusionKDiffusionPipeline
import torch

pipe = StableDiffusionKDiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-2-1-base")
pipe = pipe.to("cuda")

pipe.set_scheduler("sample_heun")
image = pipe("astronaut riding horse", num_inference_steps=25).images[0]
```

New Schedulers

Heun scheduler inspired by Karras et al.

Algorithm 1 of Karras et al. Scheduler ported from @crowsonkb's k-diffusion.

  • Add 2nd order heun scheduler by @patrickvonplaten in #1336

```python
from diffusers import StableDiffusionPipeline, HeunDiscreteScheduler

pipe = StableDiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-2-1")
pipe.scheduler = HeunDiscreteScheduler.from_config(pipe.scheduler.config)
```

Single step DPM-Solver

Original paper can be found here and the improved version here. The original implementation can be found here.

* Add Singlestep DPM-Solver (singlestep high-order schedulers) by @LuChengTHU in #1442

```python
from diffusers import StableDiffusionPipeline, DPMSolverSinglestepScheduler

pipe = StableDiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-2-1")
pipe.scheduler = DPMSolverSinglestepScheduler.from_config(pipe.scheduler.config)
```

:memo: Changelog

  • [Proposal] Support loading from safetensors if file is present. by @Narsil in #1357
  • Hotfix for AttributeErrors in OnnxStableDiffusionInpaintPipelineLegacy by @anton-l in #1448
  • Speed up test and remove kwargs from call by @patrickvonplaten in #1446
  • v-prediction training support by @patil-suraj in #1455
  • Fix Flax from_pt by @pcuenca in #1436
  • Ensure Flax pipeline always returns numpy array by @pcuenca in #1435
  • Add 2nd order heun scheduler by @patrickvonplaten in #1336
  • fix slow tests by @patrickvonplaten in #1467
  • Flax support for Stable Diffusion 2 by @pcuenca in #1423
  • Updates Image to Image Inpainting community pipeline README by @vvvm23 in #1370
  • StableDiffusion: Decode latents separately to run larger batches by @kig in #1150
  • Fix bug in half precision for DPMSolverMultistepScheduler by @rtaori in #1349
  • [Train unconditional] Unwrap model before EMA by @anton-l in #1469
  • Add ort_nightly_directml to the onnxruntime candidates by @anton-l in #1458
  • Allow saving trained betas by @patrickvonplaten in #1468
  • Fix dtype model loading by @patrickvonplaten in #1449
  • [Dreambooth] Make compatible with alt diffusion by @patrickvonplaten in #1470
  • Add better docs xformers by @patrickvonplaten in #1487
  • Remove reminder comment by @pcuenca in #1489
  • Bump to 0.10.0.dev0 + deprecations by @anton-l in #1490
  • Add doc for Stable Diffusion on Habana Gaudi by @regisss in #1496
  • Replace deprecated hub utils in train_unconditional_ort by @anton-l in #1504
  • [Deprecate] Correct stacklevel by @patrickvonplaten in #1483
  • simplify AttentionBlock by @patil-suraj in #1492
  • Standardize on using image argument in all pipelines by @fboulnois in #1361
  • support v prediction in other schedulers by @patil-suraj in #1505
  • Fix Flax flip_sin_to_cos by @akashgokul in #1369
  • Add an explicit --image_size to the conversion script by @anton-l in #1509
  • fix heun scheduler by @patil-suraj in #1512
  • [docs] [dreambooth training] accelerate.utils.write_basic_config by @williamberman in #1513
  • [docs] [dreambooth training] num_class_images clarification by @williamberman in #1508
  • [From pretrained] Allow returning local path by @patrickvonplaten in #1450
  • Update conversion script to correctly handle SD 2 by @patrickvonplaten in #1511
  • [refactor] Making the xformers mem-efficient attention activation recursive by @blefaudeux in #1493
  • Do not use torch.long in mps by @pcuenca in #1488
  • Fix Imagic example by @dhruvrnaik in #1520
  • Fix training docs to install datasets by @pedrogengo in #1476
  • Finalize 2nd order schedulers by @patrickvonplaten in #1503
  • Fixed mask+masked_image in sd inpaint pipeline by @antoche in #1516
  • Create train_dreambooth_inpaint.py by @thedarkzeno in #1091
  • Update FlaxLMSDiscreteScheduler by @dzlab in #1474
  • [Proposal] Support saving to safetensors by @MatthieuBizien in #1494
  • Add xformers attention to VAE by @kig in #1507
  • [CI] Add slow MPS tests by @anton-l in #1104
  • [Stable Diffusion Inpaint] Allow tensor as input image & mask by @patrickvonplaten in #1527
  • Compute embedding distances with torch.cdist by @blefaudeux in #1459
  • [Upscaling] Fix batch size by @patrickvonplaten in #1525
  • Update bug-report.yml by @patrickvonplaten in #1548
  • [Community Pipeline] Checkpoint Merger based on Automatic1111 by @Abhinay1997 in #1472
  • [textual_inversion] Add an option for only saving the embeddings by @allo- in #781
  • [examples] use from_pretrained to load scheduler by @patil-suraj in #1549
  • fix mask discrepancies in train_dreambooth_inpaint by @thedarkzeno in #1529
  • [refactor] make set_attention_slice recursive by @patil-suraj in #1532
  • Research folder by @patrickvonplaten in #1553
  • add AudioDiffusionPipeline and LatentAudioDiffusionPipeline #1334 by @teticio in #1426
  • [Community download] Fix cache dir by @patrickvonplaten in #1555
  • [Docs] Correct docs by @patrickvonplaten in #1554
  • Fix typo by @pcuenca in #1558
  • [docs] [dreambooth training] default accelerate config by @williamberman in #1564
  • Mega community pipeline by @patrickvonplaten in #1561
  • [examples] add check_min_version by @patil-suraj in #1550
  • [dreambooth] make collate_fn global by @patil-suraj in #1547
  • Standardize fast pipeline tests with PipelineTestMixin by @anton-l in #1526
  • Add paint by example by @patrickvonplaten in #1533
  • [Community Pipeline] fix lpw_stable_diffusion by @SkyTNT in #1570
  • [Paint by Example] Better default for image width by @patrickvonplaten in #1587
  • Add from_pretrained telemetry by @anton-l in #1461
  • Correct order height & width in pipeline_paint_by_example.py by @Fantasy-Studio in #1589
  • Fix common tests for FP16 by @anton-l in #1588
  • [UNet2DConditionModel] add an option to upcast attention to fp32 by @patil-suraj in #1590
  • Flax: avoid recompilation when params change by @pcuenca in #1096
  • Add Singlestep DPM-Solver (singlestep high-order schedulers) by @LuChengTHU in #1442
  • fix upcast in slice attention by @patil-suraj in #1591
  • Update scheduling_repaint.py by @Randolph-zeng in #1582
  • Update RL docs for better sharing / adding models by @natolambert in #1563
  • Make cross-attention check more robust by @pcuenca in #1560
  • [ONNX] Fix flaky tests by @anton-l in #1593
  • Trivial fix for undefined symbol in train_dreambooth.py by @bcsherma in #1598
  • [K Diffusion] Add k diffusion sampler natively by @patrickvonplaten in #1603
  • [Versatile Diffusion] add upcast_attention by @patil-suraj in #1605
  • Fix PyCharm/VSCode static type checking for dummy objects by @anton-l in #1596

- Python
Published by anton-l about 3 years ago

diffusers - v0.9.0: Stable Diffusion 2

:art: Stable Diffusion 2 is here!

Installation

pip install diffusers[torch]==0.9 transformers

Stable Diffusion 2.0 is available in several flavors:

Stable Diffusion 2.0-V at 768x768

New stable diffusion model (Stable Diffusion 2.0-v) at 768x768 resolution. Same number of parameters in the U-Net as 1.5, but uses OpenCLIP-ViT/H as the text encoder and is trained from scratch. SD 2.0-v is a so-called v-prediction model.


```python
import torch
from diffusers import DiffusionPipeline, DPMSolverMultistepScheduler

repo_id = "stabilityai/stable-diffusion-2"
pipe = DiffusionPipeline.from_pretrained(repo_id, torch_dtype=torch.float16, revision="fp16")
pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)
pipe = pipe.to("cuda")

prompt = "High quality photo of an astronaut riding a horse in space"
image = pipe(prompt, guidance_scale=9, num_inference_steps=25).images[0]
image.save("astronaut.png")
```

Stable Diffusion 2.0-base at 512x512

The above model is finetuned from SD 2.0-base, which was trained as a standard noise-prediction model on 512x512 images and is also made available.


```python
import torch
from diffusers import DiffusionPipeline, DPMSolverMultistepScheduler

repo_id = "stabilityai/stable-diffusion-2-base"
pipe = DiffusionPipeline.from_pretrained(repo_id, torch_dtype=torch.float16, revision="fp16")
pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)
pipe = pipe.to("cuda")

prompt = "High quality photo of an astronaut riding a horse in space"
image = pipe(prompt, num_inference_steps=25).images[0]
image.save("astronaut.png")
```

Stable Diffusion 2.0 for Inpainting

This model for text-guided inpainting is finetuned from SD 2.0-base. It follows the mask-generation strategy presented in LaMa which, in combination with the latent VAE representations of the masked image, is used as additional conditioning.


```python
import PIL
import requests
import torch
from io import BytesIO
from diffusers import DiffusionPipeline, DPMSolverMultistepScheduler

def download_image(url):
    response = requests.get(url)
    return PIL.Image.open(BytesIO(response.content)).convert("RGB")

img_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo.png"
mask_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo_mask.png"
init_image = download_image(img_url).resize((512, 512))
mask_image = download_image(mask_url).resize((512, 512))

repo_id = "stabilityai/stable-diffusion-2-inpainting"
pipe = DiffusionPipeline.from_pretrained(repo_id, torch_dtype=torch.float16, revision="fp16")
pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)
pipe = pipe.to("cuda")

prompt = "Face of a yellow cat, high resolution, sitting on a park bench"
image = pipe(prompt=prompt, image=init_image, mask_image=mask_image, num_inference_steps=25).images[0]
image.save("yellow_cat.png")
```

Stable Diffusion X4 Upscaler

The model was trained on crops of size 512x512 and is a text-guided latent upscaling diffusion model. In addition to the textual input, it receives a noise_level as an input parameter, which can be used to add noise to the low-resolution input according to a predefined diffusion schedule.


```python
import requests
from PIL import Image
from io import BytesIO
from diffusers import StableDiffusionUpscalePipeline
import torch

model_id = "stabilityai/stable-diffusion-x4-upscaler"
pipeline = StableDiffusionUpscalePipeline.from_pretrained(model_id, revision="fp16", torch_dtype=torch.float16)
pipeline = pipeline.to("cuda")

url = "https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/sd2-upscale/low_res_cat.png"
response = requests.get(url)
low_res_img = Image.open(BytesIO(response.content)).convert("RGB")
low_res_img = low_res_img.resize((128, 128))

prompt = "a white cat"
upscaled_image = pipeline(prompt=prompt, image=low_res_img).images[0]
upscaled_image.save("upsampled_cat.png")
```

Saving & Loading is fixed for Versatile Diffusion

Previously there was a :bug: when saving & loading versatile diffusion - this is fixed now so that memory efficient saving & loading works as expected.

  • [Versatile Diffusion] Fix remaining tests by @patrickvonplaten in #1418

:memo: Changelog

  • add v prediction by @patil-suraj in #1386
  • Adapt UNet2D for supre-resolution by @patil-suraj in #1385
  • Version 0.9.0.dev0 by @anton-l in #1394
  • Make height and width optional by @patrickvonplaten in #1401
  • [Config] Add optional arguments by @patrickvonplaten in #1395
  • Upscaling fixed by @patrickvonplaten in #1402
  • Add the new SD2 attention params to the VD text unet by @anton-l in #1400
  • Deprecate sample size by @patrickvonplaten in #1406
  • Support SD2 attention slicing by @anton-l in #1397
  • Add SD2 inpainting integration tests by @anton-l in #1412
  • Fix sample size conversion script by @patrickvonplaten in #1408
  • fix clip guided by @patrickvonplaten in #1414
  • Fix all stable diffusion by @patrickvonplaten in #1415
  • [MPS] call contiguous after permute by @kashif in #1411
  • Deprecate predict_epsilon by @pcuenca in #1393
  • Fix ONNX conversion and inference by @anton-l in #1416
  • Allow to set config params directly in init by @patrickvonplaten in #1419
  • Add tests for Stable Diffusion 2 V-prediction 768x768 by @anton-l in #1420
  • StableDiffusionUpscalePipeline by @patil-suraj in #1396
  • added initial v-pred support to DPM-solver by @kashif in #1421
  • SD2 docs by @patrickvonplaten in #1424

- Python
Published by anton-l over 3 years ago

diffusers - v0.8.1: Patch release

This patch release fixes an error with CLIPVisionModelWithProjection imports on a non-git transformers installation.

:warning: Please upgrade with pip install --upgrade diffusers or pip install diffusers==0.8.1

  • [Bad dependencies] Fix imports (https://github.com/huggingface/diffusers/pull/1382) by @patrickvonplaten

- Python
Published by anton-l over 3 years ago

diffusers - v0.8.0: Versatile Diffusion - Text, Images and Variations All in One Diffusion Model

🙆‍♀️ New Models

VersatileDiffusion

VersatileDiffusion, released by SHI-Labs, is a unified multi-flow multimodal diffusion model capable of multiple tasks such as text2image, image variations, dual-guided (text + image) image generation, and image2text.

  • [Versatile Diffusion] Add versatile diffusion model by @patrickvonplaten @anton-l #1283

Make sure to install transformers from "main":

```bash
pip install git+https://github.com/huggingface/transformers
```

Then you can run:

```python
from diffusers import VersatileDiffusionPipeline
import torch
import requests
from io import BytesIO
from PIL import Image

pipe = VersatileDiffusionPipeline.from_pretrained("shi-labs/versatile-diffusion", torch_dtype=torch.float16)
pipe = pipe.to("cuda")

# initial image
url = "https://huggingface.co/datasets/diffusers/images/resolve/main/benz.jpg"
response = requests.get(url)
image = Image.open(BytesIO(response.content)).convert("RGB")

# prompt
prompt = "a red car"

# text to image
image = pipe.text_to_image(prompt).images[0]

# image variation
image = pipe.image_variation(image).images[0]

# dual-guided (text + image) generation
image = pipe.dual_guided(prompt, image).images[0]
```

More in-depth details can be found on:

  • Model card
  • Docs

AltDiffusion

AltDiffusion is a multilingual latent diffusion model that supports text-to-image generation for 9 different languages: English, Chinese, Spanish, French, Japanese, Korean, Arabic, Russian and Italian.

  • Add AltDiffusion by @patrickvonplaten @patil-suraj #1299

Stable Diffusion Image Variations

StableDiffusionImageVariationPipeline by @justinpinkney is a stable diffusion model that takes an image as an input and generates variations of that image. It is conditioned on CLIP image embeddings instead of text.

  • StableDiffusionImageVariationPipeline by @patil-suraj #1365

Safe Latent Diffusion

Safe Latent Diffusion (SLD), released by ml-research@TUDarmstadt group, is a new practical and sophisticated approach to prevent unsolicited content from being generated by diffusion models. One of the authors of the research contributed their implementation to diffusers.

  • Add Safe Stable Diffusion Pipeline by @manuelbrack #1244

VQ-Diffusion with classifier-free sampling

  • vq diffusion classifier free sampling by @williamberman #1294

LDM super resolution

LDM super resolution is a latent 4x super-resolution diffusion model released by CompVis.

  • Add LDM Super Resolution pipeline by @duongna21 #1116

CycleDiffusion

CycleDiffusion is a method that uses Text-to-Image Diffusion Models for Image-to-Image Editing. It is capable of

  1. Zero-shot image-to-image translation with text-to-image diffusion models such as Stable Diffusion.
  2. Traditional unpaired image-to-image translation with diffusion models trained on two related domains.
  • Add CycleDiffusion pipeline using Stable Diffusion by @ChenWu98 #888

CLIPSeg + StableDiffusionInpainting

Uses CLIPSeg to automatically generate a mask using segmentation, and then applies Stable Diffusion in-painting.

K-Diffusion wrapper

The K-Diffusion Pipeline is a community pipeline that allows using any sampler from K-diffusion with diffusers models.

  • [Community Pipelines] K-Diffusion Pipeline by @patrickvonplaten #1360

🌀New SOTA Scheduler

DPMSolverMultistepScheduler is the 🧨 diffusers implementation of DPM-Solver++, a state-of-the-art scheduler that was contributed by one of the authors of the paper. This scheduler is able to achieve great quality in as few as 20 steps. It's a drop-in replacement for the default Stable Diffusion scheduler, so you can use it to essentially halve generation times. It works so well that we adopted it for the Stable Diffusion demo Spaces: https://huggingface.co/spaces/stabilityai/stable-diffusion, https://huggingface.co/spaces/runwayml/stable-diffusion-v1-5.

You can use it like this:

```python
from diffusers import DiffusionPipeline, DPMSolverMultistepScheduler

repo_id = "runwayml/stable-diffusion-v1-5"
scheduler = DPMSolverMultistepScheduler.from_pretrained(repo_id, subfolder="scheduler")
stable_diffusion = DiffusionPipeline.from_pretrained(repo_id, scheduler=scheduler)
```

🌐 Better scheduler API

The example above also demonstrates how to load schedulers using a new API that is coherent with model loading and therefore more natural and intuitive.

You can load a scheduler using from_pretrained, as demonstrated above, or you can instantiate one from an existing scheduler configuration. This is a way to replace the scheduler of a pipeline that was previously loaded:

```python
from diffusers import DiffusionPipeline, DDIMScheduler

pipeline = DiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
pipeline.scheduler = DDIMScheduler.from_config(pipeline.scheduler.config)
```

Read more about these changes in the documentation. See also the community pipeline that allows using any of the K-diffusion samplers with diffusers, as mentioned above!

🎉 Performance

We work relentlessly to incorporate performance optimizations and memory reduction techniques into 🧨 diffusers. These are two of the most noteworthy incorporations in this release:

  • Enable memory-efficient attention by default if xFormers is installed.
  • Use batched-matmuls when possible.

🎁 Quality of Life improvements

  • Fix/Enable all schedulers for in-painting
  • Easier loading of local pipelines
  • cpu offloading: multi-GPU support

:memo: Changelog

  • Add multistep DPM-Solver discrete scheduler by @LuChengTHU in #1132
  • Remove warning about half precision on MPS by @pcuenca in #1163
  • Fix typo latens -> latents by @duongna21 in #1171
  • Fix community pipeline links by @pcuenca in #1162
  • [Docs] Add loading script by @patrickvonplaten in #1174
  • Fix dtype safety checker inpaint legacy by @patrickvonplaten in #1137
  • Community pipeline img2img inpainting by @vvvm23 in #1114
  • [Community Pipeline] Add multilingual stable diffusion to community pipelines by @juancopi81 in #1142
  • [Flax examples] Load text encoder from subfolder by @duongna21 in #1147
  • Link to Dreambooth blog post instead of W&B report by @pcuenca in #1180
  • Fix small typo by @pcuenca in #1178
  • [DDIMScheduler] fix noise device in ddim step by @patil-suraj in #1189
  • MPS schedulers: don't use float64 by @pcuenca in #1169
  • Warning for invalid options without "--with_prior_preservation" by @shirayu in #1065
  • [ONNX] Improve ONNXPipeline scheduler compatibility, fix safety_checker by @anton-l in #1173
  • Restore compatibility with deprecated StableDiffusionOnnxPipeline by @pcuenca in #1191
  • Update pr docs actions by @mishig25 in #1194
  • handle dtype xformers attention by @patil-suraj in #1196
  • [Scheduler] Move predict epsilon to init by @patrickvonplaten in #1155
  • add licenses to pipelines by @natolambert in #1201
  • Fix cpu offloading by @anton-l in #1177
  • Fix slow tests by @patrickvonplaten in #1210
  • [Flax] fix extra copy pasta 🍝 by @camenduru in #1187
  • [CLIPGuidedStableDiffusion] support DDIM scheduler by @patil-suraj in #1190
  • Fix layer names convert LDM script by @duongna21 in #1206
  • [Loading] Make sure loading edge cases work by @patrickvonplaten in #1192
  • Add LDM Super Resolution pipeline by @duongna21 in #1116
  • [Conversion] Improve conversion script by @patrickvonplaten in #1218
  • DDIM docs by @patrickvonplaten in #1219
  • apply repeat_interleave fix for mps to stable diffusion image2image pipeline by @jncasey in #1135
  • Flax tests: don't hardcode number of devices by @pcuenca in #1175
  • Improve documentation for the LPW pipeline by @exo-pla-net in #1182
  • Factor out encode text with Copied from by @patrickvonplaten in #1224
  • Match the generator device to the pipeline for DDPM and DDIM by @anton-l in #1222
  • [Tests] Fix mps+generator fast tests by @anton-l in #1230
  • [Tests] Adjust TPU test values by @anton-l in #1233
  • Add a reference to the name 'Sampler' by @apolinario in #1172
  • Fix Flax usage comments by @pcuenca in #1211
  • [Docs] improve img2img example by @ruanrz in #1193
  • [Stable Diffusion] Fix padding / truncation by @patrickvonplaten in #1226
  • Finalize stable diffusion refactor by @patrickvonplaten in #1269
  • Edited attention.py for older xformers by @Lime-Cakes in #1270
  • Fix wrong link in text2img fine-tuning documentation by @daspartho in #1282
  • [StableDiffusionInpaintPipeline] fix batch_size for mask and masked latents by @patil-suraj in #1279
  • Add UNet 1d for RL model for planning + colab by @natolambert in #105
  • Fix documentation typo for UNet2DModel and UNet2DConditionModel by @xenova in #1275
  • add source link to composable diffusion model by @nanliu1 in #1293
  • Fix incorrect link to Stable Diffusion notebook by @dhruvrnaik in #1291
  • [dreambooth] link to bitsandbytes readme for installation by @0xdevalias in #1229
  • Add Scheduler.from_pretrained and better scheduler changing by @patrickvonplaten in #1286
  • Add AltDiffusion by @patrickvonplaten in #1299
  • Better error message for transformers dummy by @patrickvonplaten in #1306
  • Revert "Update pr docs actions" by @mishig25 in #1307
  • [AltDiffusion] add tests by @patil-suraj in #1311
  • Add improved handling of pil by @patrickvonplaten in #1309
  • cpu offloading: mutli GPU support by @dblunk88 in #1143
  • vq diffusion classifier free sampling by @williamberman in #1294
  • doc string args shape fix by @kamalkraj in #1243
  • [Community Pipeline] CLIPSeg + StableDiffusionInpainting by @unography in #1250
  • Temporary local test for PIL_INTERPOLATION by @pcuenca in #1317
  • Fix gpu_id by @anton-l in #1326
  • integrate ort by @prathikr in #1110
  • [Custom pipeline] Easier loading of local pipelines by @patrickvonplaten in #1327
  • [ONNX] Support Euler schedulers by @anton-l in #1328
  • img2text Typo by @patrickvonplaten in #1329
  • add docs for multi-modal examples by @natolambert in #1227
  • [Flax] Fix loading scheduler from subfolder by @skirsten in #1319
  • Fix/Enable all schedulers for in-painting by @patrickvonplaten in #1331
  • Correct path to schedlure by @patrickvonplaten in #1322
  • Avoid nested fix-copies by @anton-l in #1332
  • Fix img2img speed with LMS-Discrete Scheduler by @NotNANtoN in #896
  • Fix the order of casts for onnx inpainting by @anton-l in #1338
  • Legacy Inpainting Pipeline for Onnx Models by @ctsims in #1237
  • Jax infer support negative prompt by @entrpn in #1337
  • Update README.md: IMAGIC example code snippet misspelling by @ki-arie in #1346
  • Update README.md: Minor change to Imagic code snippet, missing dir error by @ki-arie in #1347
  • Handle batches and Tensors in pipeline_stable_diffusion_inpaint.py:prepare_mask_and_masked_image by @vict0rsch in #1003
  • change the sample model by @shunxing1234 in #1352
  • Add bit diffusion [WIP] by @kingstut in #971
  • perf: prefer batched matmuls for attention by @Birch-san in #1203
  • [Community Pipelines] K-Diffusion Pipeline by @patrickvonplaten in #1360
  • Add Safe Stable Diffusion Pipeline by @manuelbrack in #1244
  • [examples] fix mixed_precision arg by @patil-suraj in #1359
  • use memory_efficient_attention by default by @patil-suraj in #1354
  • Replace logger.warn by logger.warning by @regisss in #1366
  • Fix using non-square images with UNet2DModel and DDIM/DDPM pipelines by @jenkspt in #1289
  • handle fp16 in UNet2DModel by @patil-suraj in #1216
  • StableDiffusionImageVariationPipeline by @patil-suraj in #1365

- Python
Published by patil-suraj over 3 years ago

diffusers - v0.7.2: Patch release

This patch release fixes a bug that broke Flax Stable Diffusion inference. Thanks a mille for spotting it @camenduru in https://github.com/huggingface/diffusers/issues/1145 and thanks a lot to @pcuenca and @kashif for fixing it in https://github.com/huggingface/diffusers/pull/1149

  • Flax: Flip sin to cos in time embeddings #1149 by @pcuenca

- Python
Published by patrickvonplaten over 3 years ago

diffusers - v0.7.1: Patch release

This patch release makes accelerate a soft dependency to avoid an error when installing diffusers alongside an existing torch installation.

  • Move accelerate to a soft-dependency #1134 by @patrickvonplaten

- Python
Published by anton-l over 3 years ago

diffusers - v0.7.0: Optimized for Apple Silicon, Improved Performance, Awesome Community

:heart: PyTorch + Accelerate

:warning: The PyTorch pipelines now require accelerate for improved model loading times! Install Diffusers with pip install --upgrade diffusers[torch] to get everything in a single command.

🍎 Apple Silicon support with PyTorch 1.13

PyTorch and Apple have been working on improving mps support in PyTorch 1.13, so Apple Silicon is now a first-class citizen in diffusers 0.7.0!

Requirements

  • Mac computer with Apple silicon (M1/M2) hardware.
  • macOS 12.6 or later (13.0 or later recommended, as support is even better).
  • arm64 version of Python.
  • PyTorch 1.13.0 official release, installed from pip or the conda channels.

Memory efficient generation

Memory management is crucial to achieve fast generation speed. We recommend always using attention slicing on Apple Silicon, as it drastically reduces memory pressure and prevents paging or swapping. This is especially important for computers with less than 64 GB of Unified RAM, and may be the difference between generating an image in seconds rather than minutes. Use it like this:

```python
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
pipe = pipe.to("mps")

# Recommended if your computer has < 64 GB of RAM
pipe.enable_attention_slicing()

prompt = "a photo of an astronaut riding a horse on mars"

# First-time "warmup" pass
_ = pipe(prompt, num_inference_steps=1)

image = pipe(prompt).images[0]
image.save("astronaut.png")
```

Continuous Integration

Our automated tests now include a full battery of tests on the mps device. This will be helpful to identify issues early and ensure the quality on Apple Silicon going forward.

See more details in the documentation.

💃 Dance Diffusion

diffusers goes audio 🎵 Dance Diffusion by Harmonai is the first audio model in 🧨Diffusers!

  • [Dance Diffusion] Add dance diffusion by @patrickvonplaten #803

Try it out to generate some random music:

```python
from diffusers import DiffusionPipeline
import scipy

model_id = "harmonai/maestro-150k"
pipeline = DiffusionPipeline.from_pretrained(model_id)
pipeline = pipeline.to("cuda")

audio = pipeline(audio_length_in_s=4.0).audios[0]

# To save locally
scipy.io.wavfile.write("maestro_test.wav", pipeline.unet.sample_rate, audio.transpose())
```

🎉 Euler schedulers

These are the Euler schedulers, from the paper Elucidating the Design Space of Diffusion-Based Generative Models by Karras et al. (2022). The diffusers implementation is based on the original k-diffusion implementation by Katherine Crowson. The Euler schedulers are fast, often generating really good outputs with 20-30 steps.

  • k-diffusion-euler by @hlky #1019

```python
import torch
from diffusers import StableDiffusionPipeline, EulerDiscreteScheduler

euler_scheduler = EulerDiscreteScheduler.from_config("runwayml/stable-diffusion-v1-5", subfolder="scheduler")
pipeline = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", scheduler=euler_scheduler, revision="fp16", torch_dtype=torch.float16
)
pipeline.to("cuda")

prompt = "a photo of an astronaut riding a horse on mars"
image = pipeline(prompt, num_inference_steps=25).images[0]
```

```python
import torch
from diffusers import StableDiffusionPipeline, EulerAncestralDiscreteScheduler

euler_ancestral_scheduler = EulerAncestralDiscreteScheduler.from_config("runwayml/stable-diffusion-v1-5", subfolder="scheduler")
pipeline = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", scheduler=euler_ancestral_scheduler, revision="fp16", torch_dtype=torch.float16
)
pipeline.to("cuda")

prompt = "a photo of an astronaut riding a horse on mars"
image = pipeline(prompt, num_inference_steps=25).images[0]
```

🔥 Up to 2x faster inference with memory_efficient_attention

Even faster and more memory-efficient Stable Diffusion using the flash attention implementation from xformers.

  • Up to 2x speedup on GPUs using memory efficient attention by @MatthieuTPHR #532

To leverage it just make sure you have:

  • PyTorch > 1.12
  • Cuda available
  • Installed the xformers library

```python
from diffusers import StableDiffusionPipeline
import torch

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    revision="fp16",
    torch_dtype=torch.float16,
).to("cuda")

pipe.enable_xformers_memory_efficient_attention()

with torch.inference_mode():
    sample = pipe("a small cat")

# optional: you can disable it via
# pipe.disable_xformers_memory_efficient_attention()
```

🚀 Much faster loading

Thanks to accelerate, pipeline loading is much, much faster. There are two parts to it:

  • First, when a model is created, PyTorch initializes its weights by default, which takes a good amount of time. With low_cpu_mem_usage (enabled by default), no initialization is performed.
  • Optionally, you can also use device_map="auto" to automatically select the best device(s) where the pre-trained weights will be initially sent to.

In our tests, loading time was more than halved on CUDA devices, and went down from 12s to 4s on an Apple M1 computer.

As a side effect, CPU usage will be greatly reduced during loading, because no temporary copies of the weights are necessary.

This feature requires PyTorch 1.9 or better and accelerate 0.8.0 or higher.

🎨 RePaint

RePaint allows reusing any pretrained DDPM model for free-form inpainting by adding restarts to the denoising schedule. Based on the paper RePaint: Inpainting using Denoising Diffusion Probabilistic Models by Andreas Lugmayr et al.

```python
import torch
from diffusers import RePaintPipeline, RePaintScheduler

# Load the RePaint scheduler and pipeline based on a pretrained DDPM model
scheduler = RePaintScheduler.from_config("google/ddpm-ema-celebahq-256")
pipe = RePaintPipeline.from_pretrained("google/ddpm-ema-celebahq-256", scheduler=scheduler)
pipe = pipe.to("cuda")

# original_image and mask_image are PIL images loaded beforehand
generator = torch.Generator(device="cuda").manual_seed(0)
output = pipe(
    original_image=original_image,
    mask_image=mask_image,
    num_inference_steps=250,
    eta=0.0,
    jump_length=10,
    jump_n_sample=10,
    generator=generator,
)
inpainted_image = output.images[0]
```


:earth_africa: Community Pipelines

Long Prompt Weighting Stable Diffusion

The pipeline lets you input a prompt without the 77-token length limit. You can increase a word's weighting by using "()" or decrease it by using "[]". The pipeline also lets you use the main use cases of the Stable Diffusion pipeline in a single class. For a code example, see Long Prompt Weighting Stable Diffusion * [Community Pipelines] Long Prompt Weighting Stable Diffusion Pipelines by @SkyTNT in #907

Speech to Image

Generate an image from an audio sample using pre-trained OpenAI whisper-small and Stable Diffusion. For a code example, see Speech to Image * [Examples] add speech to image pipeline example by @MikailINTech in https://github.com/huggingface/diffusers/pull/897

Wildcard Stable Diffusion

A minimal implementation that allows for users to add "wildcards", denoted by __wildcard__ to prompts that are used as placeholders for randomly sampled values given by either a dictionary or a .txt file. For a code example, see Wildcard Stable Diffusion * Wildcard stable diffusion pipeline by @shyamsn97 in #900

Composable Stable Diffusion

Use logic operators to do compositional generation. For a code example, see Composable Stable Diffusion * Add Composable diffusion to community pipeline examples by @MarkRich in #951

Imagic Stable Diffusion

Image editing with Stable Diffusion. For a code example, see Imagic Stable Diffusion * Add imagic to community pipelines by @MarkRich in #958

Seed Resizing

Allows generating a larger image while keeping the content of the original image. For a code example, see Seed Resizing * Add seed resizing to community pipelines by @MarkRich in #1011

:memo: Changelog

  • [Community Pipelines] Long Prompt Weighting Stable Diffusion Pipelines by @SkyTNT in #907
  • [Stable Diffusion] Add components function by @patrickvonplaten in #889
  • [PNDM Scheduler] Make sure list cannot grow forever by @patrickvonplaten in #882
  • [DiffusionPipeline.from_pretrained] add warning when passing unused k… by @patrickvonplaten in #870
  • DOC Dreambooth Add --sample_batch_size=1 to the 8 GB dreambooth example script by @leszekhanusz in #829
  • [Examples] add speech to image pipeline example by @MikailINTech in #897
  • [dreambooth] dont use safety check when generating prior images by @patil-suraj in #922
  • Dreambooth class image generation: using unique names to avoid overwriting existing image by @leszekhanusz in #847
  • fix test_components by @patil-suraj in #928
  • Fix Compatibility with Nvidia NGC Containers by @tasercake in #919
  • [Community Pipelines] Fix pad_tokens_and_weights in lpw_stable_diffusion by @SkyTNT in #925
  • Bump the version to 0.7.0.dev0 by @anton-l in #912
  • Introduce the copy mechanism by @anton-l in #924
  • [Tests] Move stable diffusion into their own files by @patrickvonplaten in #936
  • [Flax] dont warn for bf16 weights by @patil-suraj in #923
  • Support LMSDiscreteScheduler in LDMPipeline by @mkshing in #891
  • Wildcard stable diffusion pipeline by @shyamsn97 in #900
  • [MPS] fix mps failing tests by @kashif in #934
  • fix a small typo in pipeline_ddpm.py by @chenguolin in #948
  • Reorganize pipeline tests by @anton-l in #963
  • v1-5 docs updates by @apolinario in #921
  • add community pipeline docs; add minimal text to some empty doc pages by @natolambert in #930
  • Fix typo: torch_type -> torch_dtype by @pcuenca in #972
  • add num_inference_steps arg to DDPM by @tmabraham in #935
  • Add Composable diffusion to community pipeline examples by @MarkRich in #951
  • [Flax] added broadcast_to_shape_from_left helper and Scheduler tests by @kashif in #864
  • [Tests] Fix mps reproducibility issue when running with pytest-xdist by @anton-l in #976
  • mps changes for PyTorch 1.13 by @pcuenca in #926
  • [Onnx] support half-precision and fix bugs for onnx pipelines by @SkyTNT in #932
  • [Dance Diffusion] Add dance diffusion by @patrickvonplaten in #803
  • [Dance Diffusion] FP16 by @patrickvonplaten in #980
  • [Dance Diffusion] Better naming by @patrickvonplaten in #981
  • Fix typo in documentation title by @echarlaix in #975
  • Add --pretrained_model_name_revision option to train_dreambooth.py by @shirayu in #933
  • Do not use torch.float64 on the mps device by @pcuenca in #942
  • CompVis -> diffusers script - allow converting from merged checkpoint to either EMA or non-EMA by @patrickvonplaten in #991
  • fix a bug in the new version by @xiaohu2015 in #957
  • Fix typos by @shirayu in #978
  • Add missing import by @juliensimon in #979
  • minimal stable diffusion GPU memory usage with accelerate hooks by @piEsposito in #850
  • [inpaint pipeline] fix bug for multiple prompts inputs by @xiaohu2015 in #959
  • Enable multi-process DataLoader for dreambooth by @skirsten in #950
  • Small modification to enable usage by external scripts by @briancw in #956
  • [Flax] Add Textual Inversion by @duongna21 in #880
  • Continuation of #942: additional float64 failure by @pcuenca in #996
  • fix dreambooth script. by @patil-suraj in #1017
  • [Accelerate model loading] Fix meta device and super low memory usage by @patrickvonplaten in #1016
  • [Flax] Add finetune Stable Diffusion by @duongna21 in #999
  • [DreamBooth] Set train mode for text encoder by @duongna21 in #1012
  • [Flax] Add DreamBooth by @duongna21 in #1001
  • Deprecate init_git_repo, refactor train_unconditional.py by @anton-l in #1022
  • update readme for flax examples by @patil-suraj in #1026
  • Probably nicer to specify dependency on tensorboard in the training example by @lukovnikov in #998
  • Add --dataloader_num_workers to the DDPM training example by @anton-l in #1027
  • Document sequential CPU offload method on Stable Diffusion pipeline by @piEsposito in #1024
  • Support grayscale images in numpy_to_pil by @anton-l in #1025
  • [Flax SD finetune] Fix dtype by @duongna21 in #1038
  • fix F.interpolate() for large batch sizes by @NouamaneTazi in #1006
  • [Tests] Improve unet / vae tests by @patrickvonplaten in #1018
  • [Tests] Speed up slow tests by @patrickvonplaten in #1040
  • Fix some failing tests by @patrickvonplaten in #1041
  • [Tests] Better prints by @patrickvonplaten in #1043
  • [Tests] no random latents anymore by @patrickvonplaten in #1045
  • Update training and fine-tuning docs by @pcuenca in #1020
  • Fix speedup ratio in fp16.mdx by @mwbyeon in #837
  • clean incomplete pages by @natolambert in #1008
  • Add seed resizing to community pipelines by @MarkRich in #1011
  • Tests: upgrade PyTorch cuda to 11.7 to fix examples tests. by @pcuenca in #1048
  • Experimental: allow fp16 in mps by @pcuenca in #961
  • Move safety detection to model call in Flax safety checker by @jonatanklosko in #1023
  • Fix pipelines user_agent, ignore CI requests by @anton-l in #1058
  • [GitBot] Automatically close issues after inactivitiy by @patrickvonplaten in #1079
  • Allow safety_checker to be None when using CPU offload by @pcuenca in #1078
  • k-diffusion-euler by @hlky in #1019
  • [Better scheduler docs] Improve usage examples of schedulers by @patrickvonplaten in #890
  • [Tests] Fix slow tests by @patrickvonplaten in #1087
  • Remove nn sequential by @patrickvonplaten in #1086
  • Remove some unused parameter in CrossAttnUpBlock2D by @LaurentMazare in #1034
  • Add imagic to community pipelines by @MarkRich in #958
  • Up to 2x speedup on GPUs using memory efficient attention by @MatthieuTPHR in #532
  • [docs] add euler scheduler in docs, how to use differnet schedulers by @patil-suraj in #1089
  • Integration tests precision improvement for inpainting by @Lewington-pitsos in #1052
  • lpwstablediffusion: Add iscancelledcallback by @irgolic in #1053
  • Rename latent by @patrickvonplaten in #1102
  • fix typo in examples dreambooth README.md by @jorahn in #1073
  • fix model card url in text inversion readme. by @patil-suraj in #1103
  • [CI] Framework and hardware-specific CI tests by @anton-l in #997
  • Fix a small typo of a variable name by @omihub777 in #1063
  • Fix tests for equivalence of DDIM and DDPM pipelines by @sgrigory in #1069
  • Fix padding in dreambooth by @shirayu in #1030
  • [Flax] time embedding by @kashif in #1081
  • Training to predict x0 in training example by @lukovnikov in #1031
  • [Loading] Ignore unneeded files by @patrickvonplaten in #1107
  • Fix hub-dependent tests for PRs by @anton-l in #1119
  • Allow saving None pipeline components by @anton-l in #1118
  • feat: add repaint by @Revist in #974
  • Continuation of #1035 by @pcuenca in #1120
  • VQ-diffusion by @williamberman in #658

- Python
Published by patrickvonplaten over 3 years ago

diffusers - v0.6.0: Finetuned Stable Diffusion inpainting

:art: Finetuned Stable Diffusion inpainting

The first official stable diffusion checkpoint fine-tuned on inpainting has been released.

You can try it out in the official demo here

or code it up yourself :computer: :

```python
from io import BytesIO

import torch

import PIL
import requests
from diffusers import StableDiffusionInpaintPipeline


def download_image(url):
    response = requests.get(url)
    return PIL.Image.open(BytesIO(response.content)).convert("RGB")


img_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo.png"
mask_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo_mask.png"

image = download_image(img_url).resize((512, 512))
mask_image = download_image(mask_url).resize((512, 512))

pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting",
    revision="fp16",
    torch_dtype=torch.float16,
)
pipe.to("cuda")

prompt = "Face of a yellow cat, high resolution, sitting on a park bench"

output = pipe(prompt=prompt, image=image, mask_image=mask_image)
image = output.images[0]
```

gives:

| image | mask_image | prompt | | Output |
|:---:|:---:|:---|:---:|:---:|
| drawing | drawing | Face of a yellow cat, high resolution, sitting on a park bench | => | drawing |

:warning: This release deprecates the unsupervised noising-based inpainting pipeline into StableDiffusionInpaintPipelineLegacy. The new StableDiffusionInpaintPipeline is based on a Stable Diffusion model finetuned for the inpainting task: https://huggingface.co/runwayml/stable-diffusion-inpainting

Note: When loading StableDiffusionInpaintPipeline with a non-finetuned model (i.e., one saved with diffusers<=0.5.1), the pipeline will default to StableDiffusionInpaintPipelineLegacy to maintain backward compatibility :sparkles:

```python
from diffusers import StableDiffusionInpaintPipeline

pipe = StableDiffusionInpaintPipeline.from_pretrained("CompVis/stable-diffusion-v1-4")

assert pipe.__class__.__name__ == "StableDiffusionInpaintPipelineLegacy"
```

Context:

Why this change? When Stable Diffusion came out ~2 months ago, there were many unofficial in-painting demos using the original v1-4 checkpoint ("CompVis/stable-diffusion-v1-4"). These demos worked reasonably well, so we integrated an experimental StableDiffusionInpaintPipeline class into diffusers. Now that the official inpainting checkpoint has been released (https://github.com/runwayml/stable-diffusion), we have made it our official pipeline and moved the old, hacky one to StableDiffusionInpaintPipelineLegacy.

:rocket: ONNX pipelines for image2image and inpainting

Thanks to the contribution by @zledas (#552) this release supports OnnxStableDiffusionImg2ImgPipeline and OnnxStableDiffusionInpaintPipeline optimized for CPU inference:

```python
from diffusers import OnnxStableDiffusionImg2ImgPipeline, OnnxStableDiffusionInpaintPipeline

img_pipeline = OnnxStableDiffusionImg2ImgPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4", revision="onnx", provider="CPUExecutionProvider"
)

inpaint_pipeline = OnnxStableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting", revision="onnx", provider="CPUExecutionProvider"
)
```

:earth_africa: Community Pipelines

Two new community pipelines have been added to diffusers :fire:

Stable Diffusion Interpolation example

Interpolate the latent space of Stable Diffusion between different prompts/seeds. For more info see stable-diffusion-videos.

For a code example, see Stable Diffusion Interpolation

  • Add Stable Diffusion Interpolation Example by @nateraw in #862

Stable Diffusion Interpolation Mega

One Stable Diffusion Pipeline with all functionalities of Text2Image, Image2Image and Inpainting

For a code example, see Stable Diffusion Mega

  • All in one Stable Diffusion Pipeline by @patrickvonplaten in #821

:memo: Changelog

  • [Community] One step unet by @patrickvonplaten in #840
  • Remove unneeded use_auth_token by @osanseviero in #839
  • Bump to 0.6.0.dev0 by @anton-l in #831
  • Remove the last of ["sample"] by @anton-l in #842
  • Fix Flax pipeline: width and height are ignored #838 by @camenduru in #848
  • [DeviceMap] Make sure stable diffusion can be loaded from older trans… by @patrickvonplaten in #860
  • Fix small community pipeline import bug and finish README by @patrickvonplaten in #869
  • Fix training push_to_hub (unconditional image generation): models were not saved before pushing to hub by @pcuenca in #868
  • Fix table in community README.md by @nateraw in #879
  • Add generic inference example to community pipeline readme by @apolinario in #874
  • Rename frame filename in interpolation community example by @nateraw in #881
  • Add Apple M1 tests by @anton-l in #796
  • Fix autoencoder test by @pcuenca in #886
  • Rename StableDiffusionOnnxPipeline -> OnnxStableDiffusionPipeline by @anton-l in #887
  • Fix DDIM on Windows not using int64 for timesteps by @hafriedlander in #819
  • [dreambooth] allow fine-tuning text encoder by @patil-suraj in #883
  • Stable Diffusion image-to-image and inpaint using onnx. by @zledas in #552
  • Improve ONNX img2img numpy handling, temporarily fix the tests by @anton-l in #899
  • [Stable Diffusion Inpainting] Deprecate inpainting pipeline in favor of official one by @patrickvonplaten in #903
  • [Communit Pipeline] Make sure "mega" uses correct inpaint pipeline by @patrickvonplaten in #908
  • Stable diffusion inpainting by @patil-suraj in #904
  • ONNX supervised inpainting by @anton-l in #906

- Python
Published by anton-l over 3 years ago

diffusers - v0.5.1: Patch release

This patch release fixes a bug with the Flax NSFW safety checker in the pipeline.

https://github.com/huggingface/diffusers/pull/832 by @patil-suraj

- Python
Published by patrickvonplaten over 3 years ago

diffusers - v0.5.0: JAX/Flax and TPU support

:ear_of_rice: JAX/Flax integration for super-fast Stable Diffusion on TPUs.

We added JAX support for Stable Diffusion! You can now run Stable Diffusion on Colab TPUs (and GPUs too!) for faster inference.

Check out this TPU-ready Colab for a Stable Diffusion pipeline: Open In Colab. There is also a detailed blog post on Stable Diffusion and parallelism in JAX / Flax :hugs: https://huggingface.co/blog/stable_diffusion_jax

The most used models, schedulers and pipelines have been ported to JAX/Flax, namely:

  • Models: FlaxAutoencoderKL, FlaxUNet2DConditionModel
  • Schedulers: FlaxDDIMScheduler, FlaxDDPMScheduler, FlaxPNDMScheduler
  • Pipelines: FlaxStableDiffusionPipeline
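As a sketch of how these Flax components fit together on a multi-device host (the model ID, `revision="bf16"` branch, and sharding helpers are assumptions based on the blog post above; `make_prompt_batch` is a hypothetical helper added here for illustration):

```python
def make_prompt_batch(prompt, num_devices):
    # One copy of the prompt per device; each device renders one image.
    return [prompt] * num_devices


def main():
    # Heavy, device-dependent part; kept in a function so the module
    # imports cleanly without JAX or the model weights available.
    import jax
    import jax.numpy as jnp
    from flax.jax_utils import replicate
    from flax.training.common_utils import shard
    from diffusers import FlaxStableDiffusionPipeline

    pipeline, params = FlaxStableDiffusionPipeline.from_pretrained(
        "CompVis/stable-diffusion-v1-4", revision="bf16", dtype=jnp.bfloat16
    )
    num_devices = jax.device_count()
    prompt_ids = pipeline.prepare_inputs(
        make_prompt_batch("a photo of an astronaut riding a horse on mars", num_devices)
    )
    # Replicate the params and split the RNG so each device works in parallel.
    params = replicate(params)
    rng = jax.random.split(jax.random.PRNGKey(0), num_devices)
    images = pipeline(shard(prompt_ids), params, rng, jit=True).images


# main()  # uncomment on a TPU/GPU host with JAX and the Flax weights
```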

Changelog:

  • Implement FlaxModelMixin #493 by @mishig25, @patil-suraj, @patrickvonplaten, @pcuenca
  • Karras VE, DDIM and DDPM flax schedulers #508 by @kashif
  • initial flax pndm scheduler #492 by @kashif
  • FlaxDiffusionPipeline & FlaxStableDiffusionPipeline #559 by @mishig25, @patrickvonplaten, @pcuenca
  • Flax pipeline pndm #583 by @pcuenca
  • Add from_pt argument in .from_pretrained #527 by @younesbelkada
  • Make flax from_pretrained work with local subfolder #608 by @mishig25

:fire: DeepSpeed low-memory training

Thanks to the :hugs: accelerate integration with DeepSpeed, a few of our training examples became even more optimized in terms of VRAM and speed:

  • DreamBooth is now trainable on 8GB GPUs thanks to a contribution from @Ttl! Find out how to run it here.
  • The Text2Image finetuning example is also fully compatible with DeepSpeed.

:pencil2: Changelog

  • Revert "[v0.4.0] Temporarily remove Flax modules from the public API by @anton-l in #755)"
  • Fix push_to_hub for dreambooth and textual_inversion by @YaYaB in #748
  • Fix ONNX conversion script opset argument type by @justinchuby in #739
  • Add final latent slice checks to SD pipeline intermediate state tests by @jamestiotio in #731
  • fix(DDIM scheduler): use correct dtype for noise by @keturn in #742
  • [Tests] Fix tests by @patrickvonplaten in #774
  • debug an exception by @LowinLi in #638
  • Clean up resnet.py file by @natolambert in #780
  • add sigmoid betas by @natolambert in #777
  • [Low CPU memory] + device map by @patrickvonplaten in #772
  • Fix gradient checkpointing test by @patrickvonplaten in #797
  • fix typo docstring in unet2d by @natolambert in #798
  • DreamBooth DeepSpeed support for under 8 GB VRAM training by @Ttl in #735
  • support bf16 for stable diffusion by @patil-suraj in #792
  • stable diffusion fine-tuning by @patil-suraj in #356
  • Flax: Trickle down norm_num_groups by @akash5474 in #789
  • Eventually preserve this typo? :) by @spezialspezial in #804
  • Fix indentation in the code example by @osanseviero in #802
  • [Img2Img] Fix batch size mismatch prompts vs. init images by @patrickvonplaten in #793
  • Minor package fixes by @anton-l in #809
  • [Dummy imports] Better error message by @patrickvonplaten in #795
  • add or fix license formatting in models directory by @natolambert in #808
  • [train_text2image] Fix EMA and make it compatible with deepspeed. by @patil-suraj in #813
  • Fix fine-tuning compatibility with deepspeed by @pink-red in #816
  • Add diffusers version and pipeline class to the Hub UA by @anton-l in #814
  • [Flax] Add test by @patrickvonplaten in #824
  • update flax scheduler API by @patil-suraj in #822
  • Fix dreambooth loss type with prior_preservation and fp16 by @anton-l in #826
  • Fix type mismatch error, add tests for negative prompts by @anton-l in #823
  • Give more customizable options for safety checker by @patrickvonplaten in #815
  • Flax safety checker by @pcuenca in #825
  • Align PT and Flax API - allow loading checkpoint from PyTorch configs by @patrickvonplaten in #827

- Python
Published by anton-l over 3 years ago

diffusers - v0.4.2: Patch release

This patch release allows the img2img pipeline to be run on fp16 and fixes a bug with the "mps" device.

- Python
Published by patrickvonplaten over 3 years ago

diffusers - v0.4.1: Patch release

This patch release fixes a bug with incorrect module naming for community pipelines and an incorrect breaking change when moving pipelines in fp16 to "cpu" or "mps".

- Python
Published by patrickvonplaten over 3 years ago

diffusers - v0.4.0 Better, faster, stronger!

🚗 Faster

We have thoroughly profiled our codebase and applied a number of incremental improvements that, when combined, provide a speed improvement of almost 3x.

On top of that, we now default to using the float16 format. It's much faster than float32 and, according to our tests, produces images with no discernible difference in quality. This beats the use of autocast, so the resulting code is cleaner!

🔑 use_auth_token no more

The recently released version of huggingface-hub automatically uses your access token if you are logged in, so you don't need to put it everywhere in your code. All you need to do is authenticate once using huggingface-cli login in your terminal and you're all set.

```diff
- pipe = DiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4", use_auth_token=True)
+ pipe = DiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4")
```

We bumped huggingface-hub version to 0.10.0 in our dependencies to achieve this.

🎈More flexible APIs

  • Schedulers now use a common, simpler unified API design. This has allowed us to remove many conditionals and special cases in the rest of the code, including the pipelines. This is very important for us and for the users of 🧨 diffusers: we all gain clarity and a solid abstraction for schedulers. See the description in https://github.com/huggingface/diffusers/pull/719 for more details

Please update any custom Stable Diffusion pipelines accordingly:

```diff
- if isinstance(self.scheduler, LMSDiscreteScheduler):
-     latents = latents * self.scheduler.sigmas[0]
+ latents = latents * self.scheduler.init_noise_sigma
```

```diff
- if isinstance(self.scheduler, LMSDiscreteScheduler):
-     sigma = self.scheduler.sigmas[i]
-     latent_model_input = latent_model_input / ((sigma**2 + 1) ** 0.5)
+ latent_model_input = self.scheduler.scale_model_input(latent_model_input, t)
```

```diff
- if isinstance(self.scheduler, LMSDiscreteScheduler):
-     latents = self.scheduler.step(noise_pred, i, latents, **extra_step_kwargs).prev_sample
- else:
-     latents = self.scheduler.step(noise_pred, t, latents, **extra_step_kwargs).prev_sample
+ latents = self.scheduler.step(noise_pred, t, latents, **extra_step_kwargs).prev_sample
```

  • Pipeline callbacks. As a community project (h/t @jamestiotio!), diffusers pipelines can now invoke a callback function during generation, providing the latents at each step of the process. This makes it easier to perform tasks such as visualization, inspection, explainability and others the community may invent.
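As a minimal sketch of the callback mechanism: the pipeline calls back with the step index, the timestep, and the current latents. The model ID, the `callback_steps` frequency argument, and the generation call below are illustrative; `make_step_logger` is a hypothetical helper added here.

```python
def make_step_logger(log):
    # Build a callback with the signature the pipelines expect:
    # callback(step: int, timestep: int, latents) -> None
    def log_step(step, timestep, latents):
        log.append(step)
    return log_step


def main():
    # Heavy part kept in a function so the module imports without a GPU.
    import torch
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained(
        "CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16
    ).to("cuda")
    steps = []
    # callback_steps controls how often the callback fires (every step here).
    image = pipe(
        "a red cat sitting on a chair",
        callback=make_step_logger(steps),
        callback_steps=1,
    ).images[0]
    print(f"callback fired {len(steps)} times")


# main()  # uncomment on a machine with a GPU and access to the weights
```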

🛠️ More tasks

Building on top of the previous foundations, this release incorporates several new tasks that have been adapted from research papers or community projects. These include:

  • Textual inversion. Makes it possible to quickly train a new concept or style and incorporate it into the vocabulary of Stable Diffusion. Hundreds of people have already created theirs, and they can be shared and combined together. See the training Colab to get started.
  • Dreambooth. Similar goal to textual inversion, but instead of creating a new item in the vocabulary it fine-tunes the model to make it learn a new concept. Training Colab.
  • Negative prompts. Another community effort led by @shirayu. The Stable Diffusion pipeline can now receive both a positive prompt (the one you want to create), and a negative prompt (something you want to drive the model away from). This opens up a lot of creative possibilities!
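A short illustrative sketch of the new argument (the model ID and both prompt strings are placeholders, not from the release notes):

```python
# negative_prompt steers generation away from the listed concepts, while the
# positive prompt pulls the model toward them.
PROMPT = "a portrait photo of an astronaut, highly detailed"
NEGATIVE_PROMPT = "blurry, low resolution, deformed hands"


def main():
    # Heavy part kept in a function so the module imports without a GPU.
    import torch
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained(
        "CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16
    ).to("cuda")
    image = pipe(prompt=PROMPT, negative_prompt=NEGATIVE_PROMPT).images[0]
    image.save("astronaut.png")


# main()  # uncomment on a machine with a GPU and access to the weights
```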

🏃‍♀️ Under the hood changes to support better fine-tuning

Gradient checkpointing and 8-bit optimizers have been successfully applied to achieve Dreambooth fine-tuning in a Colab notebook! These updates will make it easier for diffusers to support general-purpose fine-tuning (coming soon!).

⚠️ Experimental: community pipelines

This is big, but it's still an experimental feature that may change in the future.

We are constantly amazed at the amount of imagination and creativity in the diffusers community, so we've made it easy to create custom pipelines and share them with others. You can write your own pipeline code, store it in 🤗 Hub, GitHub or your local filesystem and StableDiffusionPipeline.from_pretrained will be able to load and run it. Read more in the documentation.
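Loading a community pipeline can be sketched like this. The `custom_pipeline` value below ("clip_guided_stable_diffusion", matching the CLIP-guided example added in this release) should be treated as an assumption, and `community_pipeline_kwargs` is a hypothetical helper for illustration:

```python
def community_pipeline_kwargs(name):
    # custom_pipeline can point at a Hub repo, a local folder, or (by name)
    # a script under examples/community in the diffusers repository.
    return {"custom_pipeline": name}


def main():
    # Heavy part kept in a function so the module imports without a GPU.
    import torch
    from diffusers import DiffusionPipeline

    pipe = DiffusionPipeline.from_pretrained(
        "CompVis/stable-diffusion-v1-4",
        torch_dtype=torch.float16,
        **community_pipeline_kwargs("clip_guided_stable_diffusion"),
    ).to("cuda")


# main()  # uncomment on a machine with a GPU and access to the weights
```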

We can't wait to see what new tasks the community creates!

💪 Quality of life fixes

Bug fixing, improved documentation, better tests are all important to ensure diffusers is a high-quality codebase, and we always spend a lot of effort working on them. Several first-time contributors have helped here, and we are very grateful for their efforts!

🙏 Significant community contributions

The following people have made significant contributions to the library over the last release:

  • @Victarry – Add training example for DreamBooth (#554)
  • @jamestiotio – Add callback parameters for Stable Diffusion pipelines (#521)
  • @jachiam – Allow resolutions that are not multiples of 64 (#505)
  • @johnowhitaker – Adding pred_original_sample to SchedulerOutput for some samplers (#614).
  • @keturn – Interesting discussions and insights on many topics.

✏️ Change list

  • [Docs] Correct links by @patrickvonplaten in #432
  • [Black] Update black by @patrickvonplaten in #433
  • use torch.matmul instead of einsum in attnetion. by @patil-suraj in #445
  • Renamed variables from single letter to better naming by @daspartho in #449
  • Docs: fix installation typo by @daspartho in #453
  • fix table formatting for stable diffusion pipeline doc (add blank line) by @natolambert in #471
  • update expected results of slow tests by @kashif in #268
  • [Flax] Make room for more frameworks by @patrickvonplaten in #494
  • Fix disable_attention_slicing in pipelines by @pcuenca in #498
  • Rename test_scheduler_outputs_equivalence in model tests. by @pcuenca in #451
  • Scheduler docs update by @natolambert in #464
  • Fix scheduler inference steps error with power of 3 by @natolambert in #466
  • initial flax pndm schedular by @kashif in #492
  • Fix vae tests for cpu and gpu by @kashif in #480
  • [Docs] Add subfolder docs by @patrickvonplaten in #500
  • docs: bocken doc links for relative links by @jjmachan in #504
  • Removing .float() (autocast in fp16 will discard this (I think)). by @Narsil in #495
  • Fix MPS scheduler indexing when using mps by @pcuenca in #450
  • [CrossAttention] add different method for sliced attention by @patil-suraj in #446
  • Implement FlaxModelMixin by @mishig25 in #493
  • Karras VE, DDIM and DDPM flax schedulers by @kashif in #508
  • [UNet2DConditionModel, UNet2DModel] pass norm_num_groups to all the blocks by @patil-suraj in #442
  • Add init_weights method to FlaxMixin by @mishig25 in #513
  • UNet Flax with FlaxModelMixin by @pcuenca in #502
  • Stable diffusion text2img conversion script. by @patil-suraj in #154
  • [CI] Add stalebot by @anton-l in #481
  • Fix is_onnx_available by @SkyTNT in #440
  • [Tests] Test attention.py by @sidthekidder in #368
  • Finally fix the image-based SD tests by @anton-l in #509
  • Remove the usage of numpy in up/down sample_2d by @ydshieh in #503
  • Fix typos and add Typo check GitHub Action by @shirayu in #483
  • Quick fix for the img2img tests by @anton-l in #530
  • [Tests] Fix spatial transformer tests on GPU by @anton-l in #531
  • [StableDiffusionInpaintPipeline] accept tensors for init and mask image by @patil-suraj in #439
  • adding more typehints to DDIM scheduler by @vishnu-anirudh in #456
  • Revert "adding more typehints to DDIM scheduler" by @patrickvonplaten in #533
  • Add LMSDiscreteSchedulerTest by @sidthekidder in #467
  • [Download] Smart downloading by @patrickvonplaten in #512
  • [Hub] Update hub version by @patrickvonplaten in #538
  • Unify offset configuration in DDIM and PNDM schedulers by @jonatanklosko in #479
  • [Configuration] Better logging by @patrickvonplaten in #545
  • make fixup support by @younesbelkada in #546
  • FlaxUNet2DConditionOutput @flax.struct.dataclass by @mishig25 in #550
  • [Flax] fix Flax scheduler by @kashif in #564
  • JAX/Flax safety checker by @pcuenca in #558
  • Flax: ignore dtype for configuration by @pcuenca in #565
  • Remove check_tf_utils to avoid an unnecessary TF import for now by @anton-l in #566
  • Fix _upsample_2d by @ydshieh in #535
  • [Flax] Add Vae for Stable Diffusion by @patrickvonplaten in #555
  • [Flax] Solve problem with VAE by @patrickvonplaten in #574
  • [Tests] Upload custom test artifacts by @anton-l in #572
  • [Tests] Mark the ncsnpp model tests as slow by @anton-l in #575
  • [examples/community] add CLIPGuidedStableDiffusion by @patil-suraj in #561
  • Fix CrossAttention._sliced_attention by @ydshieh in #563
  • Fix typos by @shirayu in #568
  • Add from_pt argument in .from_pretrained by @younesbelkada in #527
  • [FlaxAutoencoderKL] rename weights to align with PT by @patil-suraj in #584
  • Fix BaseOutput initialization from dict by @anton-l in #570
  • Add the K-LMS scheduler to the inpainting pipeline + tests by @anton-l in #587
  • [flax safety checker] Use FlaxPreTrainedModel for saving/loading by @patil-suraj in #591
  • FlaxDiffusionPipeline & FlaxStableDiffusionPipeline by @mishig25 in #559
  • [Flax] Fix unet and ddim scheduler by @patrickvonplaten in #594
  • Fix params replication when using the dummy checker by @pcuenca in #602
  • Allow dtype to be specified in Flax pipeline by @pcuenca in #600
  • Fix flax from_pretrained pytorch weight check by @mishig25 in #603
  • Mv weights name consts to diffusers.utils by @mishig25 in #605
  • Replace dropout_prob by dropout in vae by @younesbelkada in #595
  • Add smoke tests for the training examples by @anton-l in #585
  • Add torchvision to training deps by @anton-l in #607
  • Return Flax scheduler state by @pcuenca in #601
  • [ONNX] Collate the external weights, speed up loading from the hub by @anton-l in #610
  • docs: fix Berkeley ref by @ryanrussell in #611
  • Handle the PIL.Image.Resampling deprecation by @anton-l in #588
  • Make flax from_pretrained work with local subfolder by @mishig25 in #608
  • [flax] 'dtype' should not be part of self._internal_dict by @mishig25 in #609
  • [UNet2DConditionModel] add gradient checkpointing by @patil-suraj in #461
  • docs: fix stochastic_karras_ve ref by @ryanrussell in #618
  • Adding pred_original_sample to SchedulerOutput for some samplers by @johnowhitaker in #614
  • docs: .md readability fixups by @ryanrussell in #619
  • Flax documentation by @younesbelkada in #589
  • fix docs: change sample to images by @AbdullahAlfaraj in #613
  • refactor: pipelines readability improvements by @ryanrussell in #622
  • Allow passing session_options for ORT backend by @cloudhan in #620
  • Fix breaking error: "ort is not defined" by @pcuenca in #626
  • docs: src/diffusers readability improvements by @ryanrussell in #629
  • Fix formula for noise levels in Karras scheduler and tests by @sgrigory in #627
  • [CI] Fix onnxruntime installation order by @anton-l in #633
  • Warning for too long prompts in DiffusionPipelines (Resolve #447) by @shirayu in #472
  • Fix docs link to train_unconditional.py by @AbdullahAlfaraj in #642
  • Remove deprecated torch_device kwarg by @pcuenca in #623
  • refactor: custom_init_isort readability fixups by @ryanrussell in #631
  • Remove inappropriate docstrings in LMS docstrings. by @pcuenca in #634
  • Flax pipeline pndm by @pcuenca in #583
  • Fix SpatialTransformer by @ydshieh in #578
  • Add training example for DreamBooth. by @Victarry in #554
  • [Pytorch] Pytorch only schedulers by @kashif in #534
  • [examples/dreambooth] don't pass tensor_format to scheduler. by @patil-suraj in #649
  • [dreambooth] update install section by @patil-suraj in #650
  • [DDIM, DDPM] fix add_noise by @patil-suraj in #648
  • [Pytorch] add dep. warning for pytorch schedulers by @kashif in #651
  • [CLIPGuidedStableDiffusion] remove set_format from pipeline by @patil-suraj in #653
  • Fix onnx tensor format by @anton-l in #654
  • Fix main: stable diffusion pipelines cannot be loaded by @pcuenca in #655
  • Fix the LMS pytorch regression by @anton-l in #664
  • Added script to save during textual inversion training. Issue 524 by @isamu-isozaki in #645
  • [CLIPGuidedStableDiffusion] take the correct text embeddings by @patil-suraj in #667
  • Update index.mdx by @tmabraham in #670
  • [examples] update transfomers version by @patil-suraj in #665
  • [gradient checkpointing] lower tolerance for test by @patil-suraj in #652
  • Flax from_pretrained: clean up mismatched_keys. by @pcuenca in #630
  • trained_betas ignored in some schedulers by @vishnu-anirudh in #635
  • Renamed x -> hidden_states in resnet.py by @daspartho in #676
  • Optimize Stable Diffusion by @NouamaneTazi in #371
  • Allow resolutions that are not multiples of 64 by @jachiam in #505
  • refactor: update ldm-bert config.json url closes #675 by @ryanrussell in #680
  • [docs] fix table in fp16.mdx by @NouamaneTazi in #683
  • Fix slow tests by @NouamaneTazi in #689
  • Fix BibText citation by @osanseviero in #693
  • Add callback parameters for Stable Diffusion pipelines by @jamestiotio in #521
  • [dreambooth] fix applying clip_grad_norm_ by @patil-suraj in #686
  • Flax: add shape argument to set_timesteps by @pcuenca in #690
  • Fix type annotations on StableDiffusionPipeline.call by @tasercake in #682
  • Fix import with Flax but without PyTorch by @pcuenca in #688
  • [Support PyTorch 1.8] Remove inference mode by @patrickvonplaten in #707
  • [CI] Speed up slow tests by @anton-l in #708
  • [Utils] Add deprecate function and move testing_utils under utils by @patrickvonplaten in #659
  • Checkpoint conversion script from Diffusers => Stable Diffusion (CompVis) by @jachiam in #701
  • [Docs] fix docstring for issue #709 by @kashif in #710
  • Update schedulers README.md by @tmabraham in #694
  • add accelerate to load models with smaller memory footprint by @piEsposito in #361
  • Fix typos by @shirayu in #718
  • Add an argument "negative_prompt" by @shirayu in #549
  • Fix import if PyTorch is not installed by @pcuenca in #715
  • Remove comments no longer appropriate by @pcuenca in #716
  • [train_unconditional] fix applying clip_grad_norm by @patil-suraj in #721
  • renamed x to meaningful variable in resnet.py by @i-am-epic in #677
  • [Tests] Add accelerate to testing by @patrickvonplaten in #729
  • [dreambooth] Using already created Path in dataset by @DrInfiniteExplorer in #681
  • Include CLIPTextModel parameters in conversion by @kanewallmann in #695
  • Avoid negative strides for tensors by @shirayu in #717
  • [Pytorch] pytorch only timesteps by @kashif in #724
  • [Scheduler design] The pragmatic approach by @anton-l in #719
  • Removing autocast for 35-25% speedup. (autocast considered harmful). by @Narsil in #511
  • No more use_auth_token=True by @patrickvonplaten in #733
  • remove use_auth_token from remaining places by @patil-suraj in #737
  • Replace messages that have empty backquotes by @pcuenca in #738
  • [Docs] Advertise fp16 instead of autocast by @patrickvonplaten in #740
  • remove use_auth_token for TI test by @patil-suraj in #747
  • allow multiple generations per prompt by @patil-suraj in #741
  • Add back-compatibility to LMS timesteps by @anton-l in #750
  • update the clip guided PR according to the new API by @patil-suraj in #751
  • Raise an error when moving an fp16 pipeline to CPU by @anton-l in #749
  • Better steps deprecation for LMS by @anton-l in #753

- Python
Published by patrickvonplaten over 3 years ago

diffusers - v0.3.0: New API, Stable Diffusion pipelines, low-memory inference, MPS backend, ONNX

:books: Shiny new docs!

Thanks to the community efforts for [Docs] and [Type Hints] we've started populating the Diffusers documentation pages with lots of helpful guides, links and API references.

:memo: New API & breaking changes

New API

Pipeline, Model, and Scheduler outputs can now be dataclasses, dicts, or tuples:

```python
image = pipe("The red cat is sitting on a chair")["sample"][0]
```

is now replaced by:

```python
image = pipe("The red cat is sitting on a chair").images[0]

# or
image = pipe("The red cat is sitting on a chair")["image"][0]

# or
image = pipe("The red cat is sitting on a chair")[0]
```

Similarly:

```python
sample = unet(...).sample
```

and

```python
prev_sample = scheduler(...).prev_sample
```

is now possible!

🚨🚨🚨 Breaking change 🚨🚨🚨

This PR introduces breaking changes for the following public-facing methods:

  • VQModel.encode -> we return a dict/dataclass instead of a single tensor. In the future it's very likely required to return more than just one tensor. Please make sure to change latents = model.encode(...) to latents = model.encode(...)[0] or latents = model.encode(...).latents
  • VQModel.decode -> we return a dict/dataclass instead of a single tensor. In the future it's very likely required to return more than just one tensor. Please make sure to change sample = model.decode(...) to sample = model.decode(...)[0] or sample = model.decode(...).sample
  • VQModel.forward -> we return a dict/dataclass instead of a single tensor. In the future it's very likely required to return more than just one tensor. Please make sure to change sample = model(...) to sample = model(...)[0] or sample = model(...).sample
  • AutoencoderKL.encode -> we return a dict/dataclass instead of a single tensor. In the future it's very likely required to return more than just one tensor. Please make sure to change latent_dist = model.encode(...) to latent_dist = model.encode(...)[0] or latent_dist = model.encode(...).latent_dist
  • AutoencoderKL.decode -> we return a dict/dataclass instead of a single tensor. In the future it's very likely required to return more than just one tensor. Please make sure to change sample = model.decode(...) to sample = model.decode(...)[0] or sample = model.decode(...).sample
  • AutoencoderKL.forward -> we return a dict/dataclass instead of a single tensor. In the future it's very likely required to return more than just one tensor. Please make sure to change sample = model(...) to sample = model(...)[0] or sample = model(...).sample

:art: New Stable Diffusion pipelines

A couple of new pipelines have been added to Diffusers! We invite you to experiment with them, and to take them as inspiration to create your cool new tasks. These are the new pipelines:

  • Image-to-image generation. In addition to using a text prompt, this pipeline lets you include an example image to be used as the initial state of the process. 🤗 Diffuse the Rest is a cool demo about it!
  • Inpainting (experimental). You can provide an image and a mask and ask Stable Diffusion to replace the mask.

For more details about how they work, please visit our new API documentation.

This is a summary of all the Stable Diffusion tasks that can be easily used with 🤗 Diffusers:

| Pipeline | Tasks | Colab | Demo |
|---|---|:---:|:---:|
| pipeline_stable_diffusion.py | Text-to-Image Generation | Open In Colab | 🤗 Stable Diffusion |
| pipeline_stable_diffusion_img2img.py | Image-to-Image Text-Guided Generation | Open In Colab | 🤗 Diffuse the Rest |
| pipeline_stable_diffusion_inpaint.py | Experimental: Text-Guided Image Inpainting | Open In Colab | Coming soon |
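The image-to-image task above can be sketched as follows (parameter names `init_image` and `strength` follow the 0.3.0 API; the input image URL and values are illustrative, and `clamp_strength` is a hypothetical helper added here):

```python
def clamp_strength(strength):
    # strength must lie in [0, 1]: 0 returns the init image unchanged,
    # 1 ignores it entirely.
    return min(1.0, max(0.0, strength))


def main():
    # Heavy part kept in a function so the module imports without a GPU.
    from io import BytesIO

    import requests
    import torch
    from PIL import Image
    from diffusers import StableDiffusionImg2ImgPipeline

    pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
        "CompVis/stable-diffusion-v1-4", revision="fp16", torch_dtype=torch.float16
    ).to("cuda")

    url = "https://raw.githubusercontent.com/CompVis/stable-diffusion/main/assets/stable-samples/img2img/sketch-mountains-input.jpg"
    init_image = Image.open(BytesIO(requests.get(url).content)).convert("RGB").resize((768, 512))

    image = pipe(
        prompt="A fantasy landscape, trending on artstation",
        init_image=init_image,
        strength=clamp_strength(0.75),
        guidance_scale=7.5,
    ).images[0]


# main()  # uncomment on a machine with a GPU and access to the weights
```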

:candy: Less memory usage for smaller GPUs

Now the diffusion models can take up significantly less VRAM (3.2 GB for Stable Diffusion) at the expense of about 10% slower inference, thanks to the optimizations discussed in https://github.com/basujindal/stable-diffusion/pull/117.

To make use of the attention optimization, just enable it with .enable_attention_slicing() after loading the pipeline:

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",
    revision="fp16",
    torch_dtype=torch.float16,
    use_auth_token=True,
)
pipe = pipe.to("cuda")
pipe.enable_attention_slicing()
```

This will allow many more users to play with Stable Diffusion on their own computers! We can't wait to see what new ideas and results will be created by the community!

:black_cat: Textual Inversion

Textual Inversion lets you personalize a Stable Diffusion model on your own images with just 3-5 samples.

  • GitHub: https://github.com/huggingface/diffusers/tree/main/examples/textual_inversion
  • Training: https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/sd_textual_inversion_training.ipynb
  • Inference: https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/stable_conceptualizer_inference.ipynb
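The core idea is compact enough to sketch in a few lines of toy code (an illustration only, not the actual training script): a new placeholder token is added to the vocabulary, and training updates only that token's embedding while the rest of the model stays frozen.

```python
# Toy vocabulary and embedding table standing in for the real tokenizer/text encoder.
vocab = {"a": 0, "photo": 1, "of": 2}
embeddings = {idx: [0.0, 0.0] for idx in vocab.values()}

placeholder = "<my-concept>"
vocab[placeholder] = len(vocab)               # analogous to tokenizer.add_tokens
embeddings[vocab[placeholder]] = [0.1, -0.1]  # the only trainable parameters

# During training, gradient updates would touch only embeddings[vocab[placeholder]];
# at inference, prompts like "a photo of <my-concept>" reuse the learned embedding.
```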

:apple: MPS backend for Apple Silicon

🤗 Diffusers is compatible with Apple silicon for Stable Diffusion inference, using the PyTorch mps device. You need to install PyTorch Preview (Nightly) on a Mac with M1 or M2 CPU, and then use the pipeline as usual:

```python
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4", use_auth_token=True)
pipe = pipe.to("mps")

prompt = "a photo of an astronaut riding a horse on mars"
image = pipe(prompt).images[0]
```

We are seeing great speedups (31s vs 214s on an M1 Max), but there are still a couple of limitations. We encourage you to read the documentation for the details.

:factory: Experimental ONNX exporter and pipeline for Stable Diffusion

We introduce a new (and experimental) Stable Diffusion pipeline compatible with the ONNX Runtime. This allows you to run Stable Diffusion on any hardware that supports ONNX (including a significant speedup on CPUs).

You need to use StableDiffusionOnnxPipeline instead of StableDiffusionPipeline. You also need to download the weights from the onnx branch of the repository, and indicate the runtime provider you want to use (CPU, in the following example):

```python
from diffusers import StableDiffusionOnnxPipeline

pipe = StableDiffusionOnnxPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",
    revision="onnx",
    provider="CPUExecutionProvider",
    use_auth_token=True,
)

prompt = "a photo of an astronaut riding a horse on mars"
image = pipe(prompt).images[0]
```

:warning: Warning: the script above takes a long time to download the external ONNX weights, so it will be faster to convert the checkpoint yourself (see below).

To convert your own checkpoint, run the conversion script locally:

```bash
python scripts/convert_stable_diffusion_checkpoint_to_onnx.py --model_path="CompVis/stable-diffusion-v1-4" --output_path="./stable_diffusion_onnx"
```

After that it can be loaded from the local path:

```python
pipe = StableDiffusionOnnxPipeline.from_pretrained("./stable_diffusion_onnx", provider="CPUExecutionProvider")
```

Improvements and bugfixes

  • Mark in painting experimental by @patrickvonplaten in #430
  • Add config docs by @patrickvonplaten in #429
  • [Docs] Models by @kashif in #416
  • [Docs] Using diffusers by @patrickvonplaten in #428
  • [Outputs] Improve syntax by @patrickvonplaten in #423
  • Initial ONNX doc (TODO: Installation) by @pcuenca in #426
  • [Tests] Correct image folder tests by @patrickvonplaten in #427
  • [MPS] Make sure it doesn't break torch < 1.12 by @patrickvonplaten in #425
  • [ONNX] Stable Diffusion exporter and pipeline by @anton-l in #399
  • [Tests] Make image-based SD tests reproducible with fixed datasets by @anton-l in #424
  • [Docs] Outputs.mdx by @patrickvonplaten in #422
  • [Docs] Fix scheduler docs by @patrickvonplaten in #421
  • [Docs] DiffusionPipeline by @patrickvonplaten in #418
  • Improve unconditional diffusers example by @satpalsr in #414
  • Improve latent diff example by @satpalsr in #413
  • Inference support for mps device by @pcuenca in #355
  • [Docs] Minor fixes in optimization section by @patrickvonplaten in #420
  • [Docs] Pipelines for inference by @satpalsr in #417
  • [Docs] Training docs by @patrickvonplaten in #415
  • Docs: fp16 page by @pcuenca in #404
  • Add typing to scheduling_sde_ve: init, set_timesteps, and set_sigmas function definitions by @danielpatrickhug in #412
  • Docs fix some typos by @natolambert in #408
  • [docs sprint] schedulers docs, will update by @natolambert in #376
  • Docs: fix undefined in toctree by @natolambert in #406
  • Attention slicing by @patrickvonplaten in #407
  • Rename variables from single letter to meaningful name fix by @rashmimarganiatgithub in #395
  • Docs: Stable Diffusion pipeline by @pcuenca in #386
  • Small changes to Philosophy by @pcuenca in #403
  • karras-ve docs by @kashif in #401
  • Score sde ve doc by @kashif in #400
  • [Docs] Finish Intro Section by @patrickvonplaten in #402
  • [Docs] Quicktour by @patrickvonplaten in #397
  • ddim docs by @kashif in #396
  • Docs: optimization / special hardware by @pcuenca in #390
  • added pndm docs by @kashif in #391
  • Update text_inversion.mdx by @johnowhitaker in #393
  • [Docs] Logging by @patrickvonplaten in #394
  • [Pipeline Docs] ddpm docs for sprint by @kashif in #382
  • [Pipeline Docs] Unconditional Latent Diffusion by @satpalsr in #388
  • Docs: Conceptual section by @pcuenca in #392
  • [Pipeline Docs] Latent Diffusion by @patrickvonplaten in #377
  • [textual-inversion] fix saving embeds by @patil-suraj in #387
  • [Docs] Let's go by @patrickvonplaten in #385
  • Add colab links to textual inversion by @apolinario in #375
  • Efficient Attention by @patrickvonplaten in #366
  • Use expand instead of ones to broadcast tensor by @pcuenca in #373
  • [Tests] Fix SD slow tests by @anton-l in #364
  • [Type Hint] VAE models by @daspartho in #365
  • [Type hint] scheduling lms discrete by @santiviquez in #360
  • [Type hint] scheduling karras ve by @santiviquez in #359
  • type hints: models/vae.py by @shepherd1530 in #346
  • [Type Hints] DDIM pipelines by @sidthekidder in #345
  • [ModelOutputs] Replace dict outputs with Dict/Dataclass and allow to return tuples by @patrickvonplaten in #334
  • package version on main should have .dev0 suffix by @mishig25 in #354
  • [textual_inversion] use tokenizer.add_tokens to add placeholder_token by @patil-suraj in #357
  • [Type hint] scheduling ddim by @santiviquez in #343
  • [Type Hints] VAE models by @daspartho in #344
  • [Type Hint] DDPM schedulers by @daspartho in #349
  • [Type hint] PNDM schedulers by @daspartho in #335
  • Fix typo in unet_blocks.py by @da03 in #353
  • [Commands] Add env command by @patrickvonplaten in #352
  • Add transformers and scipy to dependency table by @patrickvonplaten in #348
  • [Type Hint] Unet Models by @sidthekidder in #330
  • [Img2Img2] Re-add K LMS scheduler by @patrickvonplaten in #340
  • Use ONNX / Core ML compatible method to broadcast by @pcuenca in #310
  • [Type hint] PNDM pipeline by @daspartho in #327
  • [Type hint] Latent Diffusion Uncond pipeline by @santiviquez in #333
  • Add contributions to README and re-order a bit by @patrickvonplaten in #316
  • [CI] try to fix GPU OOMs between tests and excessive tqdm logging by @anton-l in #323
  • README: stable diffusion version v1-3 -> v1-4 by @pcuenca in #331
  • Textual inversion by @patil-suraj in #266
  • [Type hint] Score SDE VE pipeline by @santiviquez in #325
  • [CI] Cancel pending jobs for PRs on new commits by @anton-l in #324
  • [train_unconditional] fix gradient accumulation. by @patil-suraj in #308
  • Fix nondeterministic tests for GPU runs by @anton-l in #314
  • Improve README to show how to use SD without an access token by @patrickvonplaten in #315
  • Fix flake8 F401 imported but unused by @anton-l in #317
  • Allow downloading of revisions for models. by @okalldal in #303
  • Fix more links by @python273 in #312
  • Changed variable name from "h" to "hidden_states" by @JC-swEng in #285
  • Fix stable-diffusion-seeds.ipynb link by @python273 in #309
  • [Tests] Add fast pipeline tests by @patrickvonplaten in #302
  • Improve README by @patrickvonplaten in #301
  • [Refactor] Remove set_seed by @patrickvonplaten in #289
  • [Stable Diffusion] Hotfix by @patrickvonplaten in #299
  • Check dummy file by @patrickvonplaten in #297
  • Add missing auth tokens for two SD tests by @anton-l in #296
  • Fix GPU tests (token + single-process) by @anton-l in #294
  • [PNDM Scheduler] format timesteps attrs to np arrays by @NouamaneTazi in #273
  • Fix link by @python273 in #286
  • [Type hint] Karras VE pipeline by @patrickvonplaten in #288
  • Add datasets + transformers + scipy to test deps by @anton-l in #279
  • Easily understandable error if inference steps not set before using scheduler by @samedii in #263
  • [Docs] Add some guides by @patrickvonplaten in #276
  • [README] Add readme for SD by @patrickvonplaten in #274
  • Refactor Pipelines / Community pipelines and add better explanations. by @patrickvonplaten in #257
  • Refactor progress bar by @hysts in #242
  • Support K-LMS in img2img by @anton-l in #270
  • [BugFix]: Fixed add_noise in LMSDiscreteScheduler by @nicolas-dufour in #253
  • [Tests] Make sure tests are on GPU by @patrickvonplaten in #269
  • Adds missing torch imports to inpainting and image_to_image example by @PulkitMishra in #265
  • Fix typo in README.md by @webel in #260
  • Fix inpainting script by @patil-suraj in #258
  • Initialize CI for code quality and testing by @anton-l in #256
  • add inpainting example script by @nagolinc in #241
  • Update README.md with examples by @natolambert in #252
  • Reproducible images by supplying latents to pipeline by @pcuenca in #247
  • Style the scripts directory by @anton-l in #250
  • Pin black==22.3 to keep a stable --preview flag by @anton-l in #249
  • [Clean up] Clean unused code by @patrickvonplaten in #245
  • added test workflow and fixed failing test by @kashif in #237
  • split tests_modeling_utils by @kashif in #223
  • [example/image2image] raise error if strength is not in desired range by @patil-suraj in #238
  • Add image2image example script. by @patil-suraj in #231
  • Remove dead code in resnet.py by @ydshieh in #218

Significant community contributions

The following contributors have made significant changes to the library over the last release:

  • @kashif
    • [Docs] Models (#416)
    • karras-ve docs (#401)
    • Score sde ve doc (#400)
    • ddim docs (#396)
    • added pndm docs (#391)
    • [Pipeline Docs] ddpm docs for sprint (#382)
    • added test workflow and fixed failing test (#237)
    • split tests_modeling_utils (#223)

- Python
Published by anton-l over 3 years ago

diffusers - v0.2.4: Patch release

This patch release allows the Stable Diffusion pipelines to be loaded with float16 precision:

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",
    revision="fp16",
    torch_dtype=torch.float16,
    use_auth_token=True,
)
pipe = pipe.to("cuda")
```

The resulting models take up less than 6900 MiB of GPU memory.

  • [Loading] allow modules to be loaded in fp16 by @patrickvonplaten in #230

- Python
Published by anton-l over 3 years ago

diffusers - v0.2.3: Stable Diffusion public release

:art: Stable Diffusion public release

The Stable Diffusion checkpoints are now public and can be loaded by anyone! :partying_face:

Make sure to accept the license terms on the model page first (requires login): https://huggingface.co/CompVis/stable-diffusion-v1-4

Install the required packages: pip install diffusers==0.2.3 transformers scipy

And log in on your machine using the huggingface-cli login command.

```python
from torch import autocast
from diffusers import StableDiffusionPipeline, LMSDiscreteScheduler

# this will substitute the default PNDM scheduler for K-LMS
lms = LMSDiscreteScheduler(
    beta_start=0.00085,
    beta_end=0.012,
    beta_schedule="scaled_linear",
)

pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",
    scheduler=lms,
    use_auth_token=True,
).to("cuda")

prompt = "a photo of an astronaut riding a horse on mars"
with autocast("cuda"):
    image = pipe(prompt)["sample"][0]

image.save("astronaut_rides_horse.png")
```

The safety checker

Following the model authors' guidelines and code, the Stable Diffusion inference results will now be filtered to exclude unsafe content. Any images classified as unsafe will be returned as blank. To check if the safety module is triggered programmatically, check the nsfw_content_detected flag like so:

```python
outputs = pipe(prompt)
image = outputs["sample"][0]
if any(outputs["nsfw_content_detected"]):
    print("Potential unsafe content was detected in one or more images. Try again with a different prompt and/or seed.")
```

Improvements and bugfixes

  • add add_noise method in LMSDiscreteScheduler, PNDMScheduler by @patil-suraj in #227
  • hotfix for pdnm test by @natolambert in #220
  • Restore is_modelcards_available in .utils by @pcuenca in #224
  • Update README for 0.2.3 release by @pcuenca in #225
  • Pipeline to device by @pcuenca in #210
  • fix safety check by @patil-suraj in #217
  • Add safety module by @patil-suraj in #213
  • Support one-string prompts and custom image size in LDM by @anton-l in #212
  • Add is_torch_available, is_flax_available by @anton-l in #204
  • Revive make quality by @anton-l in #203
  • [StableDiffusionPipeline] use default params in call by @patil-suraj in #196
  • fix test_from_pretrained_hub_pass_model by @patil-suraj in #194
  • Match params with official Stable Diffusion lib by @apolinario in #192

Full Changelog: https://github.com/huggingface/diffusers/compare/v0.2.2...v0.2.3

- Python
Published by anton-l over 3 years ago

diffusers - v0.2.2

This patch release fixes an import of the StableDiffusionPipeline.

[K-LMS Scheduler] fix import by @patrickvonplaten in #191

- Python
Published by patrickvonplaten over 3 years ago

diffusers - v0.2.1 Patch release

This patch release fixes a small bug in the StableDiffusionPipeline.

  • [Stable diffusion] Hot fix by @patrickvonplaten in 50a9ae

- Python
Published by patrickvonplaten over 3 years ago

diffusers - v0.2.0: Stable Diffusion early access, K-LMS sampling

Stable Diffusion

Stable Diffusion is a text-to-image latent diffusion model created by the researchers and engineers from CompVis, Stability AI and LAION. It's trained on 512x512 images from a subset of the LAION-5B database. This model uses a frozen CLIP ViT-L/14 text encoder to condition the model on text prompts. With its 860M UNet and 123M text encoder, the model is relatively lightweight and runs on a GPU with at least 10GB VRAM. See the model card for more information.

The Stable Diffusion weights are currently only available to universities, academics, research institutions, and independent researchers. Please request access by applying via this form.

```python
from torch import autocast
from diffusers import StableDiffusionPipeline

# make sure you're logged in with `huggingface-cli login`
pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-3-diffusers", use_auth_token=True)

prompt = "a photograph of an astronaut riding a horse"
with autocast("cuda"):
    image = pipe(prompt, guidance_scale=7)["sample"][0]  # image here is in PIL format

image.save("astronaut_rides_horse.png")
```

K-LMS sampling

The new LMSDiscreteScheduler is a port of k-lms from k-diffusion by Katherine Crowson. The scheduler can be easily swapped into existing pipelines like so:

```python
from diffusers import StableDiffusionPipeline, LMSDiscreteScheduler

model_id = "CompVis/stable-diffusion-v1-3-diffusers"

# Use the K-LMS scheduler here instead
scheduler = LMSDiscreteScheduler(
    beta_start=0.00085,
    beta_end=0.012,
    beta_schedule="scaled_linear",
    num_train_timesteps=1000,
)
pipe = StableDiffusionPipeline.from_pretrained(model_id, scheduler=scheduler, use_auth_token=True)
```

Integration test with text-to-image script of Stable-Diffusion

#182 and #186 make sure that the DDIM and PNDM/PLMS schedulers yield exactly the same results as Stable Diffusion.

Try it out yourself:

In Stable-Diffusion:

```bash
python scripts/txt2img.py --prompt "a photograph of an astronaut riding a horse" --n_samples 4 --n_iter 1 --fixed_code --plms
```

or

```bash
python scripts/txt2img.py --prompt "a photograph of an astronaut riding a horse" --n_samples 4 --n_iter 1 --fixed_code
```

In diffusers:

```py
from time import time

import numpy as np
import torch
from torch import autocast
from torchvision.utils import make_grid
from einops import rearrange
from PIL import Image

from diffusers import StableDiffusionPipeline, DDIMScheduler

torch.manual_seed(42)

prompt = "a photograph of an astronaut riding a horse"
# prompt = "a photograph of the eiffel tower on the moon"
# prompt = "an oil painting of a futuristic forest gives"

# uncomment to use DDIM
# scheduler = DDIMScheduler(beta_start=0.00085, beta_end=0.012, beta_schedule="scaled_linear", clip_sample=False, set_alpha_to_one=False)
# pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-3-diffusers", use_auth_token=True, scheduler=scheduler)  # make sure you're logged in with `huggingface-cli login`

pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-3-diffusers", use_auth_token=True)  # make sure you're logged in with `huggingface-cli login`

all_images = []
num_rows = 1
num_columns = 4
for _ in range(num_rows):
    with autocast("cuda"):
        images = pipe(num_columns * [prompt], guidance_scale=7.5, output_type="np")["sample"]  # images here are NumPy arrays
    all_images.append(torch.from_numpy(images))

# additionally, save as grid
grid = torch.stack(all_images, 0)
grid = rearrange(grid, 'n b h w c -> (n b) h w c')
grid = rearrange(grid, 'n h w c -> n c h w')
grid = make_grid(grid, nrow=num_rows)

# to image
grid = 255. * rearrange(grid, 'c h w -> h w c').cpu().numpy()
image = Image.fromarray(grid.astype(np.uint8))

image.save(f"./images/diffusers/{'_'.join(prompt.split())}_{round(time())}.png")
```

Improvements and bugfixes

  • Allow passing non-default modules to pipeline by @pcuenca in #188
  • Add K-LMS scheduler from k-diffusion by @anton-l in #185
  • [Naming] correct config naming of DDIM pipeline by @patrickvonplaten in #187
  • [PNDM] Stable diffusion by @patrickvonplaten in #186
  • [Half precision] Make sure half-precision is correct by @patrickvonplaten in #182
  • allow custom height, width in StableDiffusionPipeline by @patil-suraj in #179
  • add tests for stable diffusion pipeline by @patil-suraj in #178
  • Stable diffusion pipeline by @patil-suraj in #168
  • [LDM pipeline] fix eta condition. by @patil-suraj in #171
  • [PNDM in LDM pipeline] use inspect in pipeline instead of unused kwargs by @patil-suraj in #167
  • allow pndm scheduler to be used with ldm pipeline by @patil-suraj in #165
  • add scaled_linear schedule in PNDM and DDPM by @patil-suraj in #164
  • add attention up/down blocks for VAE by @patil-suraj in #161
  • Add an alternative Karras et al. stochastic scheduler for VE models by @anton-l in #160
  • [LDMTextToImagePipeline] make text model generic by @patil-suraj in #162
  • Minor typos by @pcuenca in #159
  • Fix arg key for dataset_name in create_model_card by @pcuenca in #158
  • [VAE] fix the downsample block in Encoder. by @patil-suraj in #156
  • [UNet2DConditionModel] add crossattentiondim as an argument by @patil-suraj in #155
  • Added diffusers to conda-forge and updated README for installation instruction by @sugatoray in #129
  • Add issue templates for feature requests and bug reports by @osanseviero in #153
  • Support training with a local image folder by @anton-l in #152
  • Allow DDPM scheduler to use model's predicated variance by @eyalmazuz in #132

Full Changelog: https://github.com/huggingface/diffusers/compare/0.1.3...v0.2.0

- Python
Published by anton-l over 3 years ago

diffusers - 0.1.3 Patch release

This patch release refactors the model architecture of VQModel and AutoencoderKL, including the weight naming. Therefore, the official weights of the CompVis organization have been re-uploaded, see:

  • https://huggingface.co/CompVis/ldm-celebahq-256/commit/63b33cf3bbdd833de32080a8ba55ba4d0b111859
  • https://huggingface.co/CompVis/ldm-celebahq-256/commit/03978f22272a3c2502da709c3940e227c9714bdd
  • https://huggingface.co/CompVis/ldm-text2im-large-256/commit/31ff4edafd3ee09656d2068d05a4d5338129d592
  • https://huggingface.co/CompVis/ldm-text2im-large-256/commit/9bd2b48d2d45e6deb6fb5a03eb2a601e4b95bd91

Corresponding PR: https://github.com/huggingface/diffusers/pull/137

Please make sure to upgrade diffusers to have those models running correctly: pip install --upgrade diffusers

Bug fixes

  • Fix FileNotFoundError: 'model_card_template.md' https://github.com/huggingface/diffusers/pull/136

- Python
Published by patrickvonplaten over 3 years ago

diffusers - Initial release of 🧨 Diffusers

These are the release notes of the 🧨 Diffusers library

Introducing Hugging Face's new library for diffusion models.

Diffusion models have proven very effective at artificial synthesis, even beating GANs for image generation. Because of that, they have gained traction in the machine learning community and play an important role in systems like DALL-E 2 and Imagen, which generate photorealistic images from text prompts.

While the most prolific successes of diffusion models have been in the computer vision community, these models have also achieved remarkable results in other domains, such as:

and more.

Goals

The goals of diffusers are:

  • to centralize the research of diffusion models from independent repositories to a clear and maintained project,
  • to reproduce high-impact machine learning systems such as DALL-E and Imagen in a manner that is accessible to the public, and
  • to create an easy-to-use API that enables one to train their own models or re-use checkpoints from other repositories for inference.

Release overview

Quickstart: - For a light walk-through of the library, please have a look at the Official 🧨 Diffusers Notebook. - To directly jump into training a diffusion model yourself, please have a look at the Training Diffusers Notebook

Diffusers aims to be a modular toolbox for diffusion techniques, with a focus on the following categories:

:bullettrain_side: Inference pipelines

Inference pipelines are a collection of end-to-end diffusion systems that can be used out-of-the-box. The goal is for them to stick as close as possible to their original implementation, and they can include components of other libraries (such as text encoders).

The original release contains the following pipelines:

We are currently working on enabling other pipelines for different modalities. The following pipelines are expected to land in a subsequent release:

  • BDDMPipeline for spectrogram-to-sound vocoding
  • GLIDEPipeline to support OpenAI's GLIDE model
  • Grad-TTS for text to audio generation / conditional audio generation
  • A reinforcement learning pipeline (happening in https://github.com/huggingface/diffusers/pull/105)

:alarm_clock: Schedulers

  • Schedulers are the algorithms to use diffusion models in inference as well as for training. They include the noise schedules and define algorithm-specific diffusion steps.
  • Schedulers can be used interchangeably between diffusion models in inference to find the preferred trade-off between speed and generation quality.
  • Schedulers are available in numpy, but can easily be transformed into PyTorch.

The goal is for each scheduler to provide one or more step() functions that should be called iteratively to unroll the diffusion loop during the forward pass. They are framework agnostic, but offer conversion methods which should allow easy conversion to PyTorch utilities.
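The iterative `step()` pattern can be illustrated with a toy scheduler (toy code only; the real schedulers implement specific algorithms such as DDPM or PNDM):

```python
class ToyScheduler:
    """A stand-in scheduler exposing the timesteps + step() interface."""

    def __init__(self, num_steps):
        self.timesteps = list(range(num_steps - 1, -1, -1))

    def step(self, model_output, timestep, sample):
        # Move the sample a fraction of the way toward the denoised estimate.
        return sample - model_output / (timestep + 1)


def toy_model(sample, timestep):
    # A stand-in "model": pretends half the sample is noise.
    return sample * 0.5


scheduler = ToyScheduler(num_steps=10)
sample = 8.0  # start from "pure noise"
for t in scheduler.timesteps:
    noise_pred = toy_model(sample, t)
    sample = scheduler.step(noise_pred, t, sample)
# `sample` has been driven toward zero, one step() call per timestep.
```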

The initial release contains the following schedulers:

:factory: Models

Models are hosted in the src/diffusers/models folder.

For the initial release, you'll get to see a few building blocks, as well as some resulting models:

  • UNet2DModel implements the UNet architecture presented in recent diffusion papers. It is the unconditional version of the UNet model, as opposed to the conditional version that follows below.
  • UNet2DConditionModel is similar to the UNet2DModel, but is conditional: it uses the cross-attention mechanism in its downsample and upsample layers, so conditioning signals from other models can be fed in. An example of a pipeline using a conditional UNet model is the latent diffusion pipeline.
  • AutoencoderKL and VQModel are still experimental models that are prone to breaking changes in the near future. However, they can already be used as part of the Latent Diffusion pipelines.
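To make the cross-attention conditioning concrete, here is a toy NumPy sketch (not the actual layer implementation) of how image features can attend to text-encoder states:

```python
import numpy as np

rng = np.random.default_rng(0)
image_tokens = rng.standard_normal((16, 8))  # (latent positions, channels)
text_states = rng.standard_normal((77, 8))   # (text tokens, channels)

# Scaled dot-product attention: every image position attends to every text token.
scores = image_tokens @ text_states.T / np.sqrt(8)
weights = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)
conditioned = weights @ text_states          # image features mixed with text info
```

In the real model the queries come from the UNet's intermediate feature maps and the keys/values from the text encoder's hidden states, with learned projections in between.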

:page_with_curl: Training example

The first release contains a dataset-agnostic unconditional example and a training notebook:

Credits

This library concretizes previous work by many different authors and would not have been possible without their great research and implementations. We'd like to thank, in particular, the following implementations which have helped us in our development and without which the API could not have been as polished today:

  • @CompVis' latent diffusion models library, available here
  • @hojonathanho's original DDPM implementation, available here, as well as the extremely useful translation into PyTorch by @pesser, available here
  • @ermongroup's DDIM implementation, available here.
  • @yang-song's Score-VE and Score-VP implementations, available here

We also want to thank @heejkoo for the very helpful overview of papers, code and resources on diffusion models, available here.

- Python
Published by LysandreJik over 3 years ago