Recent Releases of diffusers

diffusers - v0.35.1 for improvements in Qwen-Image Edit

Thanks to @naykun for the following PRs that improve Qwen-Image Edit:

  • https://github.com/huggingface/diffusers/pull/12188
  • https://github.com/huggingface/diffusers/pull/12190

Published by sayakpaul 6 months ago

diffusers - Diffusers 0.35.0: Qwen Image pipelines, Flux Kontext, Wan 2.2, and more

This release comes packed with new image generation and editing pipelines, a new video pipeline, new training scripts, quality-of-life improvements, and much more. Read the release notes in full so you don't miss out on the fun stuff.

New pipelines 🧨

We welcomed new pipelines in this release:

  • Wan 2.2
  • Flux-Kontext
  • Qwen-Image
  • Qwen-Image-Edit

Wan 2.2 📹

This update to Wan provides significant improvements in video fidelity, prompt adherence, and style. Please check out the official doc to learn more.

Flux-Kontext 🎇

Flux-Kontext is a 12-billion-parameter rectified flow transformer capable of editing images based on text instructions. Please check out the official doc to learn more about it.

Qwen-Image 🌅

After a successful run of delivering language models and vision-language models, the Qwen team is back with an image generation model, which is Apache-2.0 licensed! It achieves significant advances in complex text rendering and precise image editing. To learn more about this powerful model, refer to our docs.

Thanks to @naykun for contributing both Qwen-Image and Qwen-Image-Edit via this PR and this PR.

New training scripts 🎛️

Make these newly added models your own with our training scripts.

Single-file modeling implementations

Following the 🤗 Transformers’ philosophy of single-file modeling implementations, we have started implementing modeling code in single and self-contained files. The Flux Transformer code is one example of this.

Attention refactor

We have massively refactored how we do attention in the models. This allows us to provide support for different attention backends (such as PyTorch native scaled_dot_product_attention, Flash Attention 3, SAGE attention, etc.) in the library seamlessly.

Having attention supported this way also allows us to integrate different parallelization mechanisms, which we’re actively working on. Follow this PR if you’re interested.

Users shouldn’t be affected at all by these changes. Please open an issue if you face any problems.
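As a minimal sketch of how a backend can be switched, assuming the `set_attention_backend` helper exposed by the attention dispatcher and the `"flash"` backend name (the corresponding kernel library must be installed):

```python
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
).to("cuda")

# Route the transformer's attention through Flash Attention instead of the
# default PyTorch-native scaled_dot_product_attention.
pipe.transformer.set_attention_backend("flash")

image = pipe("photo of a dog").images[0]
```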

Regional compilation

Regional compilation trims cold-start latency by only compiling the small and frequently-repeated block(s) of a model - typically a transformer layer - and enables reusing compiled artifacts for every subsequent occurrence. For many diffusion architectures, this delivers the same runtime speedups as full-graph compilation and reduces compile time by 8–10x. Refer to this doc to learn more.
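A minimal sketch of regional compilation, assuming the `compile_repeated_blocks` helper described in the doc:

```python
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
).to("cuda")

# Compile only the repeated transformer blocks; every subsequent occurrence
# reuses the compiled artifact instead of triggering a fresh compile.
pipe.transformer.compile_repeated_blocks(fullgraph=True)

image = pipe("photo of a dog").images[0]
```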

Thanks to @anijain2305 for contributing this feature in this PR.

We have also authored a number of posts centered around the use of torch.compile.

Faster pipeline loading ⚡️

Users can now load pipelines directly onto an accelerator device, leading to significantly faster load times. This becomes particularly evident when loading large pipelines like Wan and Qwen-Image.

```diff
from diffusers import DiffusionPipeline
import torch

ckpt_id = "Qwen/Qwen-Image"
pipe = DiffusionPipeline.from_pretrained(
-    ckpt_id, torch_dtype=torch.bfloat16
-).to("cuda")
+    ckpt_id, torch_dtype=torch.bfloat16, device_map="cuda"
+)
```

You can speed up loading even more by enabling parallelized loading of state dict shards. This is particularly helpful when you’re working with large models like Wan and Qwen-Image, where the model state dicts are typically sharded across multiple files.

```python
import os

os.environ["HF_ENABLE_PARALLEL_LOADING"] = "yes"

# rest of the loading code
...
```

Better GGUF integration

@Isotr0py contributed support for native GGUF CUDA kernels in this PR. This should provide an approximately 10% improvement in inference speed.

We have also worked on a tool for converting regular checkpoints to GGUF, letting the community easily share their GGUF checkpoints. Learn more here.

We now support loading of Diffusers format GGUF checkpoints.

You can learn more about all of this in our GGUF official docs.
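For reference, a GGUF checkpoint can be loaded through `from_single_file` with a `GGUFQuantizationConfig`; the community checkpoint URL below is illustrative:

```python
import torch
from diffusers import FluxPipeline, FluxTransformer2DModel, GGUFQuantizationConfig

# Illustrative community GGUF checkpoint for Flux.1-dev.
ckpt_path = "https://huggingface.co/city96/FLUX.1-dev-gguf/blob/main/flux1-dev-Q2_K.gguf"

transformer = FluxTransformer2DModel.from_single_file(
    ckpt_path,
    quantization_config=GGUFQuantizationConfig(compute_dtype=torch.bfloat16),
    torch_dtype=torch.bfloat16,
)
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", transformer=transformer, torch_dtype=torch.bfloat16
)
pipe.enable_model_cpu_offload()
```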

Modular Diffusers (Experimental)

Modular Diffusers is a system for building diffusion pipelines from individual pipeline blocks. It is highly customizable, with blocks that can be mixed and matched to adapt or create a pipeline for a specific workflow or multiple workflows.

The API is currently in active development and is being released as an experimental feature. Learn more in our docs.

All commits

  • [tests] skip instead of returning. by @sayakpaul in #11793
  • adjust to get CI test cases passed on XPU by @kaixuanliu in #11759
  • fix deprecation in lora after 0.34.0 release by @sayakpaul in #11802
  • [chore] post release v0.34.0 by @sayakpaul in #11800
  • Follow up for Group Offload to Disk by @DN6 in #11760
  • [rfc][compile] compile method for DiffusionPipeline by @anijain2305 in #11705
  • [tests] add a test on torch compile for varied resolutions by @sayakpaul in #11776
  • adjust tolerance criteria for test_float16_inference in unit test by @kaixuanliu in #11809
  • Flux Kontext by @a-r-r-o-w in #11812
  • Kontext training by @sayakpaul in #11813
  • Kontext fixes by @a-r-r-o-w in #11815
  • remove syncs before denoising in Kontext by @sayakpaul in #11818
  • [CI] disable onnx, mps, flax from the CI by @sayakpaul in #11803
  • TorchAO compile + offloading tests by @a-r-r-o-w in #11697
  • Support dynamically loading/unloading loras with group offloading by @a-r-r-o-w in #11804
  • [lora] fix: lora unloading behvaiour by @sayakpaul in #11822
  • [lora]feat: use exclude modules to loraconfig. by @sayakpaul in #11806
  • ENH: Improve speed of function expanding LoRA scales by @BenjaminBossan in #11834
  • Remove print statement in SCM Scheduler by @a-r-r-o-w in #11836
  • [tests] add test for hotswapping + compilation on resolution changes by @sayakpaul in #11825
  • reset deterministic in tearDownClass by @jiqing-feng in #11785
  • [tests] Fix failing float16 cuda tests by @a-r-r-o-w in #11835
  • [single file] Cosmos by @a-r-r-o-w in #11801
  • [docs] fix single_file example. by @sayakpaul in #11847
  • Use real-valued instead of complex tensors in Wan2.1 RoPE by @mjkvaak-amd in #11649
  • [docs] Batch generation by @stevhliu in #11841
  • [docs] Deprecated pipelines by @stevhliu in #11838
  • fix norm not training in traincontrollora_flux.py by @Luo-Yihang in #11832
  • [From Single File] support from_single_file method for WanVACE3DTransformer by @J4BEZ in #11807
  • [lora] tests for exclude_modules with Wan VACE by @sayakpaul in #11843
  • update: FluxKontextInpaintPipeline support by @vuongminh1907 in #11820
  • [Flux Kontext] Support Fal Kontext LoRA by @linoytsaban in #11823
  • [docs] Add a note of _keep_in_fp32_modules by @a-r-r-o-w in #11851
  • [benchmarks] overhaul benchmarks by @sayakpaul in #11565
  • FIX setloradevice when target layers differ by @BenjaminBossan in #11844
  • Fix Wan AccVideo/CausVid fuse_lora by @a-r-r-o-w in #11856
  • [chore] deprecate blip controlnet pipeline. by @sayakpaul in #11877
  • [docs] fix references in flux pipelines. by @sayakpaul in #11857
  • [tests] remove tests for deprecated pipelines. by @sayakpaul in #11879
  • [docs] LoRA metadata by @stevhliu in #11848
  • [training ] add Kontext i2i training by @sayakpaul in #11858
  • [CI] Fix big GPU test marker by @DN6 in #11786
  • First Block Cache by @a-r-r-o-w in #11180
  • [tests] annotate compilation test classes with bnb by @sayakpaul in #11715
  • Update chroma.md by @shm4r7 in #11891
  • [CI] Speed up GPU PR Tests by @DN6 in #11887
  • Pin k-diffusion for CI by @sayakpaul in #11894
  • [Docker] update doc builder dockerfile to include quant libs. by @sayakpaul in #11728
  • [tests] Remove more deprecated tests by @sayakpaul in #11895
  • [tests] mark the wanvace lora tester flaky by @sayakpaul in #11883
  • [tests] add compile + offload tests for GGUF. by @sayakpaul in #11740
  • feat: add multiple input image support in Flux Kontext by @Net-Mist in #11880
  • Fix unique memory address when doing group-offloading with disk by @sayakpaul in #11767
  • [SD3] CFG Cutoff fix and official callback by @asomoza in #11890
  • The Modular Diffusers by @yiyixuxu in #9672
  • [quant] QoL improvements for pipeline-level quant config by @sayakpaul in #11876
  • Bump torch from 2.4.1 to 2.7.0 in /examples/server by @dependabot[bot] in #11429
  • [LoRA] fix: disabling hooks when loading loras. by @sayakpaul in #11896
  • [utils] account for MPS when available in get_device(). by @sayakpaul in #11905
  • [ControlnetUnion] Multiple Fixes by @asomoza in #11888
  • Avoid creating tensor in CosmosAttnProcessor2_0 by @chenxiao111222 in #11761
  • [tests] Unify compilation + offloading tests in quantization by @sayakpaul in #11910
  • Speedup model loading by 4-5x ⚡ by @a-r-r-o-w in #11904
  • [docs] torch.compile blog post by @stevhliu in #11837
  • Flux: pass jointattentionkwargs when using gradient_checkpointing by @piercus in #11814
  • Fix: Align VAE processing in ControlNet SD3 training with inference by @Henry-Bi in #11909
  • Bump aiohttp from 3.10.10 to 3.12.14 in /examples/server by @dependabot[bot] in #11924
  • [tests] Improve Flux tests by @a-r-r-o-w in #11919
  • Remove device synchronization when loading weights by @a-r-r-o-w in #11927
  • Remove forced float64 from onnx stable diffusion pipelines by @lostdisc in #11054
  • Fixed bug: Uncontrolled recursive calls that caused an infinite loop when loading certain pipelines containing Transformer2DModel by @lengmo1996 in #11923
  • [ControlnetUnion] Propagate #11888 to img2img by @asomoza in #11929
  • enable flux pipeline compatible with unipc and dpm-solver by @gameofdimension in #11908
  • [training] add an offload utility that can be used as a context manager. by @sayakpaul in #11775
  • Add SkyReels V2: Infinite-Length Film Generative Model by @tolgacangoz in #11518
  • [refactor] Flux/Chroma single file implementation + Attention Dispatcher by @a-r-r-o-w in #11916
  • [docs] clarify the mapping between Transformer2DModel and finegrained variants. by @sayakpaul in #11947
  • [Modular] Updates for Custom Pipeline Blocks by @DN6 in #11940
  • [docs] Update toctree by @stevhliu in #11936
  • [docs] include bp link. by @sayakpaul in #11952
  • Fix kontext finetune issue when batch size >1 by @mymusise in #11921
  • [tests] Add test slices for Hunyuan Video by @a-r-r-o-w in #11954
  • [tests] Add test slices for Cosmos by @a-r-r-o-w in #11955
  • [tests] Add fast test slices for HiDream-Image by @a-r-r-o-w in #11953
  • [Modular] update the collection behavior by @yiyixuxu in #11963
  • fix "Expected all tensors to be on the same device, but found at least two devices" error by @yao-matrix in #11690
  • Remove logger warnings for attention backends and hard error during runtime instead by @a-r-r-o-w in #11967
  • [Examples] Uniform notations in trainfluxlora by @tomguluson92 in #10011
  • fix style by @yiyixuxu in #11975
  • [tests] Add test slices for Wan by @a-r-r-o-w in #11920
  • [docs] update guidance_scale docstring for guidance_distilled models. by @sayakpaul in #11935
  • [tests] enforce torch version in the compilation tests. by @sayakpaul in #11979
  • [modular diffusers] Wan by @a-r-r-o-w in #11913
  • [compile] logger statements create unnecessary guards during dynamo tracing by @a-r-r-o-w in #11987
  • enable quantcompile test on xpu by @yao-matrix in #11988
  • [WIP] Wan2.2 by @yiyixuxu in #12004
  • [refactor] some shared parts between hooks + docs by @a-r-r-o-w in #11968
  • [refactor] Wan single file implementation by @a-r-r-o-w in #11918
  • Fix huggingface-hub failing tests by @asomoza in #11994
  • feat: add flux kontext by @jlonge4 in #11985
  • [modular] add Modular flux for text-to-image by @sayakpaul in #11995
  • [docs] include lora fast post. by @sayakpaul in #11993
  • [docs] quant_kwargs by @stevhliu in #11712
  • [docs] Fix link by @stevhliu in #12018
  • [wan2.2] add 5b i2v by @yiyixuxu in #12006
  • wan2.2 i2v FirstBlockCache fix by @okaris in #12013
  • [core] support attention backends for LTX by @sayakpaul in #12021
  • [docs] Update index by @stevhliu in #12020
  • [Fix] huggingface-cli to hf missed files by @asomoza in #12008
  • [training-scripts] Make pytorch examples UV-compatible by @sayakpaul in #12000
  • [wan2.2] fix vae patches by @yiyixuxu in #12041
  • Allow SD pipeline to use newer schedulers, eg: FlowMatch by @ppbrown in #12015
  • [LoRA] support lightx2v lora in wan by @sayakpaul in #12040
  • Fix type of force_upcast to bool by @BerndDoser in #12046
  • Update autoencoderklcosmos.py by @tanuj-rai in #12045
  • Qwen-Image by @naykun in #12055
  • [wan2.2] follow-up by @yiyixuxu in #12024
  • tests + minor refactor for QwenImage by @a-r-r-o-w in #12057
  • Cross attention module to Wan Attention by @samuelt0 in #12058
  • fix(qwen-image): update vae license by @naykun in #12063
  • CI fixing by @paulinebm in #12059
  • enable all gpus when running ci. by @sayakpaul in #12062
  • fix the rest for all GPUs in CI by @sayakpaul in #12064
  • [docs] Install by @stevhliu in #12026
  • [wip] feat: support lora in qwen image and training script by @sayakpaul in #12056
  • [docs] small corrections to the example in the Qwen docs by @sayakpaul in #12068
  • [tests] Fix Qwen test_inference slices by @a-r-r-o-w in #12070
  • [tests] deal with the failing AudioLDM2 tests by @sayakpaul in #12069
  • optimize QwenImagePipeline to reduce unnecessary CUDA synchronization by @chengzeyi in #12072
  • Add cuda kernel support for GGUF inference by @Isotr0py in #11869
  • fix input shape for WanGGUFTexttoVideoSingleFileTests by @jiqing-feng in #12081
  • [refactor] condense group offloading by @a-r-r-o-w in #11990
  • Fix group offloading synchronization bug for parameter-only GroupModule's by @a-r-r-o-w in #12077
  • Helper functions to return skip-layer compatible layers by @a-r-r-o-w in #12048
  • Make prompt_2 optional in Flux Pipelines by @DN6 in #12073
  • [tests] tighten compilation tests for quantization by @sayakpaul in #12002
  • Implement Frequency-Decoupled Guidance (FDG) as a Guider by @dg845 in #11976
  • fix flux type hint by @DefTruth in #12089
  • [qwen] device typo by @yiyixuxu in #12099
  • [lora] adapt new LoRA config injection method by @sayakpaul in #11999
  • loraconversionutils: replace lora up/down with a/b even if transformer. in key by @Beinsezii in #12101
  • [tests] device placement for non-denoiser components in group offloading LoRA tests by @sayakpaul in #12103
  • [Modular] Fast Tests by @yiyixuxu in #11937
  • [GGUF] feat: support loading diffusers format gguf checkpoints. by @sayakpaul in #11684
  • [docs] diffusers gguf checkpoints by @sayakpaul in #12092
  • [core] add modular support for Flux I2I by @sayakpaul in #12086
  • [lora] support loading loras from lightx2v/Qwen-Image-Lightning by @sayakpaul in #12119
  • [Modular] More Updates for Custom Code Loading by @DN6 in #11969
  • enable compilation in qwen image. by @sayakpaul in #12061
  • [tests] Add inference test slices for SD3 and remove unnecessary tests by @a-r-r-o-w in #12106
  • [chore] complete the licensing statement. by @sayakpaul in #12001
  • [docs] Cache link by @stevhliu in #12105
  • [Modular] Add experimental feature warning for Modular Diffusers by @DN6 in #12127
  • Add lowcpumemusage option to fromsinglefile to align with frompretrained by @IrisRainbowNeko in #12114
  • [docs] Modular diffusers by @stevhliu in #11931
  • [Bugfix] typo fix in NPU FA by @leisuzz in #12129
  • Add QwenImage Inpainting and Img2Img pipeline by @Trgtuan10 in #12117
  • [core] parallel loading of shards by @sayakpaul in #12028
  • try to use deepseek with an agent to auto i18n to zh by @SamYuan1990 in #12032
  • [docs] Refresh effective and efficient doc by @stevhliu in #12134
  • Fix bf15/fp16 for pipelinewanvace.py by @SlimRG in #12143
  • make parallel loading flag a part of constants. by @sayakpaul in #12137
  • [docs] Parallel loading of shards by @stevhliu in #12135
  • feat: cuda device_map for pipelines. by @sayakpaul in #12122
  • [core] respect local_files_only=True when using sharded checkpoints by @sayakpaul in #12005
  • support hf_quantizer in cache warmup. by @sayakpaul in #12043
  • make test_gguf all pass on xpu by @yao-matrix in #12158
  • [docs] Quickstart by @stevhliu in #12128
  • Qwen Image Edit Support by @naykun in #12164
  • remove silu for CogView4 by @lambertwjh in #12150
  • [qwen] Qwen image edit followups by @sayakpaul in #12166
  • Minor modification to support DC-AE-turbo by @chenjy2003 in #12169
  • [Docs] typo error in qwen image by @leisuzz in #12144
  • fix: caching allocator behaviour for quantization. by @sayakpaul in #12172
  • fix(training_utils): wrap device in list for DiffusionPipeline by @MengAiDev in #12178
  • [docs] Clarify guidance scale in Qwen pipelines by @sayakpaul in #12181
  • [LoRA] feat: support more Qwen LoRAs from the community. by @sayakpaul in #12170
  • Update README.md by @Taechai in #12182
  • [chore] add lora button to qwenimage docs by @sayakpaul in #12183
  • [Wan 2.2 LoRA] add support for 2nd transformer lora loading + wan 2.2 lightx2v lora by @linoytsaban in #12074
  • Release: v0.35.0 by @sayakpaul (direct commit on v0.35.0-release)

Significant community contributions

The following contributors have made significant changes to the library over the last release:

  • @vuongminh1907
    • update: FluxKontextInpaintPipeline support (#11820)
  • @Net-Mist
    • feat: add multiple input image support in Flux Kontext (#11880)
  • @tolgacangoz
    • Add SkyReels V2: Infinite-Length Film Generative Model (#11518)
  • @naykun
    • Qwen-Image (#12055)
    • fix(qwen-image): update vae license (#12063)
    • Qwen Image Edit Support (#12164)
  • @Trgtuan10
    • Add QwenImage Inpainting and Img2Img pipeline (#12117)
  • @SamYuan1990
    • try to use deepseek with an agent to auto i18n to zh (#12032)

Published by sayakpaul 6 months ago

diffusers - Diffusers 0.34.0: New Image and Video Models, Better torch.compile Support, and more

📹 New video generation pipelines

Wan VACE

Wan VACE supports various generation techniques which achieve controllable video generation. It comes in two variants: a 1.3B model for fast iteration & prototyping, and a 14B model for high-quality generation. Some of the capabilities include:

  • Control to Video (Depth, Pose, Sketch, Flow, Grayscale, Scribble, Layout, Boundary Box, etc.). Recommended library for preprocessing videos to obtain control videos: huggingface/controlnet_aux
  • Image/Video to Video (first frame, last frame, starting clip, ending clip, random clips)
  • Inpainting and Outpainting
  • Subject to Video (faces, object, characters, etc.)
  • Composition to Video (reference anything, animate anything, swap anything, expand anything, move anything, etc.)

The code snippets available in this pull request demonstrate some examples of how videos can be generated with controllability signals.

Check out the docs to learn more.

Cosmos Predict2 Video2World

Cosmos-Predict2 is a key branch of the Cosmos World Foundation Models (WFMs) ecosystem for Physical AI, specializing in future state prediction through advanced world modeling. It offers two powerful capabilities: text-to-image generation for creating high-quality images from text descriptions, and video-to-world generation for producing visual simulations from video inputs.

The Video2World model comes in a 2B and 14B variant. Check out the docs to learn more.

LTX 0.9.7 and Distilled

LTX 0.9.7 and its distilled variants are the latest in the family of models released by Lightricks.

Check out the docs to learn more.

Hunyuan Video Framepack and F1

Framepack is a novel method for enabling long video generation. There are two released variants of Hunyuan Video trained using this technique. Check out the docs to learn more.

FusionX

The FusionX family of models and LoRAs, built on top of Wan2.1-14B, should already be supported. To load the model, use from_single_file():

```python
import torch
from diffusers import AutoModel

transformer = AutoModel.from_single_file(
    "https://huggingface.co/vrgamedevgirl84/Wan14BT2VFusioniX/blob/main/Wan14Bi2vFusioniX_fp16.safetensors",
    torch_dtype=torch.bfloat16,
)
```

To load the LoRAs, use load_lora_weights():

```python
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "Wan-AI/Wan2.1-T2V-14B-Diffusers", torch_dtype=torch.bfloat16
).to("cuda")
pipe.load_lora_weights(
    "vrgamedevgirl84/Wan14BT2VFusioniX",
    weight_name="FusionX_LoRa/Wan2.1_T2V_14B_FusionX_LoRA.safetensors",
)
```

AccVideo and CausVid (only LoRAs)

AccVideo and CausVid are two novel distillation techniques that speed up the generation time of video diffusion models while preserving quality. Diffusers supports loading their extracted LoRAs with their respective models.
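Loading such a LoRA follows the usual `load_lora_weights` flow; the repository and file names below are placeholders for an extracted AccVideo/CausVid LoRA matched to the base model:

```python
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "Wan-AI/Wan2.1-T2V-14B-Diffusers", torch_dtype=torch.bfloat16
).to("cuda")

# Placeholder repo/file names: point these at an extracted CausVid or
# AccVideo LoRA for the matching base model.
pipe.load_lora_weights("<org>/<wan-distill-lora>", weight_name="<lora>.safetensors")

# Distillation LoRAs typically allow far fewer inference steps.
video = pipe("a cat walking on grass", num_inference_steps=8).frames[0]
```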

🌠 New image generation pipelines

Cosmos Predict2 Text2Image

Text-to-image models from the Cosmos-Predict2 release. The models come in 2B and 14B variants. Check out the docs to learn more.

Chroma

Chroma is an 8.9B-parameter model based on FLUX.1-schnell. It’s fully Apache 2.0 licensed, ensuring that anyone can use, modify, and build on top of it. Check out the docs to learn more.

Thanks to @Ednaordinary for contributing it in this PR!

VisualCloze

VisualCloze: A Universal Image Generation Framework via Visual In-Context Learning is an innovative universal image generation framework built on in-context learning that offers key capabilities:

  1. Support for various in-domain tasks
  2. Generalization to unseen tasks through in-context learning
  3. Unify multiple tasks into one step and generate both target image and intermediate results
  4. Support reverse-engineering conditions from target images

Check out the docs to learn more. Thanks to @lzyhha for contributing this in this PR!

Better torch.compile support

We have worked with the PyTorch team to improve how we provide torch.compile() compatibility throughout the library. More specifically, we now test the widely used models like Flux for any recompilation and graph break issues which can get in the way of fully realizing torch.compile() benefits. Refer to the following links to learn more:

  • https://github.com/huggingface/diffusers/pull/11085
  • https://github.com/huggingface/diffusers/issues/11430

Additionally, users can combine offloading with compilation to get a better speed-memory trade-off. Below is an example:

```py
import torch
from diffusers import DiffusionPipeline

torch._dynamo.config.cache_size_limit = 10000

pipeline = DiffusionPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-schnell", torch_dtype=torch.bfloat16
)
pipeline.enable_model_cpu_offload()

# Compile.
pipeline.transformer.compile()

image = pipeline(
    prompt="An astronaut riding a horse on Mars",
    guidance_scale=0.,
    height=768,
    width=1360,
    num_inference_steps=4,
    max_sequence_length=256,
).images[0]
print(f"Max memory allocated: {torch.cuda.max_memory_allocated() / 1024**3:.2f} GB")
```

This is compatible with group offloading, too. Interested readers can check out the concerned PRs below:

  • https://github.com/huggingface/diffusers/pull/11605
  • https://github.com/huggingface/diffusers/pull/11670

You can substantially reduce memory requirements by combining quantization with offloading and then improving speed with torch.compile(). Below is an example:

```py
import torch
from diffusers import AutoModel, FluxPipeline
from diffusers import BitsAndBytesConfig as DiffusersBitsAndBytesConfig
from transformers import BitsAndBytesConfig as TransformersBitsAndBytesConfig
from transformers import T5EncoderModel

torch._dynamo.config.recompile_limit = 1000
torch_dtype = torch.bfloat16

quant_kwargs = {
    "load_in_4bit": True,
    "bnb_4bit_compute_dtype": torch_dtype,
    "bnb_4bit_quant_type": "nf4",
}
text_encoder_2_quant_config = TransformersBitsAndBytesConfig(**quant_kwargs)
dit_quant_config = DiffusersBitsAndBytesConfig(**quant_kwargs)

ckpt_id = "black-forest-labs/FLUX.1-dev"
text_encoder_2 = T5EncoderModel.from_pretrained(
    ckpt_id,
    subfolder="text_encoder_2",
    quantization_config=text_encoder_2_quant_config,
    torch_dtype=torch_dtype,
)
transformer = AutoModel.from_pretrained(
    ckpt_id,
    subfolder="transformer",
    quantization_config=dit_quant_config,
    torch_dtype=torch_dtype,
)
pipe = FluxPipeline.from_pretrained(
    ckpt_id,
    transformer=transformer,
    text_encoder_2=text_encoder_2,
    torch_dtype=torch_dtype,
)
pipe.enable_model_cpu_offload()
pipe.transformer.compile()

image = pipe(
    prompt="An astronaut riding a horse on Mars",
    guidance_scale=3.5,
    height=768,
    width=1360,
    num_inference_steps=28,
    max_sequence_length=512,
).images[0]
```

Starting from bitsandbytes==0.46.0 onwards, bnb-quantized models should be fully compatible with torch.compile() without graph-breaks. This means that when compiling a bnb-quantized model, users can do: model.compile(fullgraph=True). This can significantly improve speed while still providing memory benefits. The figure below provides a comparison with Flux.1-Dev. Refer to this benchmarking script to learn more.

(Figure: speed comparison for bnb-quantized Flux.1-Dev with torch.compile.)

Note that for 4-bit bnb models, you currently need to install a PyTorch nightly build if fullgraph=True is specified during compilation.

Huge shoutout to @anijain2305 and @StrongerXi from the PyTorch team for the incredible support.

PipelineQuantizationConfig

Users can now provide a quantization config while initializing a pipeline:

```python
import torch
from diffusers import DiffusionPipeline
from diffusers.quantizers import PipelineQuantizationConfig

pipeline_quant_config = PipelineQuantizationConfig(
    quant_backend="bitsandbytes_4bit",
    quant_kwargs={
        "load_in_4bit": True,
        "bnb_4bit_quant_type": "nf4",
        "bnb_4bit_compute_dtype": torch.bfloat16,
    },
    components_to_quantize=["transformer", "text_encoder_2"],
)
pipe = DiffusionPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    quantization_config=pipeline_quant_config,
    torch_dtype=torch.bfloat16,
).to("cuda")

image = pipe("photo of a cute dog").images[0]
```

This lowers the barrier to entry for users who want to use quantization without having to write much code. Refer to the documentation to learn more about the different configurations allowed through PipelineQuantizationConfig.

Group offloading with disk

In the previous release, we shipped “group offloading” which lets you offload blocks/nodes within a model, optimizing its memory consumption. It also lets you overlap this offloading with computation, providing a good speed-memory trade-off, especially in low VRAM environments.

However, you still need a considerable amount of system RAM to make offloading work effectively. So, low VRAM and low RAM environments would still not work.

Starting this release, users will additionally have the option to offload to disk instead of RAM, further lowering memory consumption. Set the offload_to_disk_path to enable this feature.

```python
pipeline.transformer.enable_group_offload(
    onload_device="cuda",
    offload_device="cpu",
    offload_type="leaf_level",
    offload_to_disk_path="path/to/disk",
)
```

Refer to these two tables to compare the speed and memory trade-offs.

LoRA metadata parsing

It is beneficial to include the LoraConfig in a LoRA state dict that was used to train the LoRA. In its absence, users were restricted to using the same LoRA alpha as the LoRA rank. We have modified the most popular training scripts to allow passing custom lora_alpha through the CLI. Refer to this thread for more updates. Refer to this comment for some extended clarifications.
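For example, with one of the modified DreamBooth LoRA scripts (the script name and remaining flags here are illustrative), a custom alpha can now be passed alongside the rank:

```shell
accelerate launch train_dreambooth_lora_flux.py \
  --pretrained_model_name_or_path="black-forest-labs/FLUX.1-dev" \
  --instance_data_dir="./dog" \
  --instance_prompt="a photo of sks dog" \
  --output_dir="flux-lora" \
  --rank=16 \
  --lora_alpha=32
```

The saved state dict then carries the alpha in its metadata, so loaders no longer have to assume the alpha equals the rank.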

New training scripts

  • We now have a capable training script for training robust timestep-distilled models through the SANA Sprint framework. Check out this resource for more details. Thanks to @scxue and @lawrence-cj for contributing it in this PR.
  • HiDream LoRA DreamBooth training script (docs). The script supports training with quantization. HiDream is an MIT-licensed model. So, make it yours with this training script.

Updates on educational materials on quantization

We have worked on a two-part series discussing the support of quantization in Diffusers.

All commits

  • [LoRA] support musubi wan loras. by @sayakpaul in #11243
  • fix testvanillafunetuning failure on XPU and A100 by @yao-matrix in #11263
  • make teststablediffusioninpaintfp16 pass on XPU by @yao-matrix in #11264
  • make testdicttupleoutputsequivalent pass on XPU by @yao-matrix in #11265
  • add onnxruntime-qnn & onnxruntime-cann by @xieofxie in #11269
  • make testinstantstylemultiplemasks pass on XPU by @yao-matrix in #11266
  • [BUG] Fix convertvaepttodiffusers bug by @lavinal712 in #11078
  • Fix LTX 0.9.5 single file by @hlky in #11271
  • [Tests] Cleanup lora tests utils by @sayakpaul in #11276
  • [CI] relax tolerance for unclip further by @sayakpaul in #11268
  • do not use DIFFUSERS_REQUEST_TIMEOUT for notification bot by @sayakpaul in #11273
  • Fix incorrect tilelatentmin_width calculation in AutoencoderKLMochi by @kuantuna in #11294
  • HiDream Image by @hlky in #11231
  • flow matching lcm scheduler by @quickjkee in #11170
  • Update autoencoderkl_allegro.md by @Forbu in #11303
  • Hidream refactoring follow ups by @a-r-r-o-w in #11299
  • Fix incorrect tilelatentmin_width calculations by @kuantuna in #11305
  • [ControlNet] Adds controlnet for SanaTransformer by @ishan-modi in #11040
  • make KandinskyV22PipelineInpaintCombinedFastTests::testfloat16inference pass on XPU by @yao-matrix in #11308
  • make teststablediffusionkarrassigmas pass on XPU by @yao-matrix in #11310
  • make KolorsPipelineFastTests::test_inference_batch_single_identical pass on XPU by @faaany in #11313
  • [LoRA] support more SDXL loras. by @sayakpaul in #11292
  • [HiDream] code example by @linoytsaban in #11317
  • import for FlowMatchLCMScheduler by @asomoza in #11318
  • Use float32 on mps or npu in transformerhidreamimage's rope by @hlky in #11316
  • Add skrample section to community_projects.md by @Beinsezii in #11319
  • [docs] Promote AutoModel usage by @sayakpaul in #11300
  • [LoRA] Add LoRA support to AuraFlow by @hameerabbasi in #10216
  • Fix vae.Decoder prevoutputchannel by @hlky in #11280
  • fix CPU offloading related fail cases on XPU by @yao-matrix in #11288
  • [docs] fix hidream docstrings. by @sayakpaul in #11325
  • Rewrite AuraFlowPatchEmbed.peselectionindexbasedon_dim to be torch.compile compatible by @AstraliteHeart in #11297
  • post release 0.33.0 by @sayakpaul in #11255
  • another fix for FlowMatchLCMScheduler forgotten import by @asomoza in #11330
  • Fix Hunyuan I2V for transformers>4.47.1 by @DN6 in #11293
  • unpin torch versions for onnx Dockerfile by @sayakpaul in #11290
  • [single file] enable telemetry for single file loading when using GGUF. by @sayakpaul in #11284
  • [docs] add a snippet for compilation in the auraflow docs. by @sayakpaul in #11327
  • Hunyuan I2V fast tests fix by @DN6 in #11341
  • [BUG] fixed _toctree.yml alphabetical ordering by @ishan-modi in #11277
  • Fix wrong dtype argument name as torch_dtype by @nPeppon in #11346
  • [chore] fix lora docs utils by @sayakpaul in #11338
  • [docs] add note about useduckshape in auraflow docs. by @sayakpaul in #11348
  • [LoRA] Propagate hotswap better by @sayakpaul in #11333
  • [Hi Dream] follow-up by @yiyixuxu in #11296
  • [bitsandbytes] improve dtype mismatch handling for bnb + lora. by @sayakpaul in #11270
  • Update controlnet_flux.py by @haofanwang in #11350
  • enable 2 test cases on XPU by @yao-matrix in #11332
  • [BNB] Fix testmovingtocputhrows_warning by @SunMarc in #11356
  • support Wan-FLF2V by @yiyixuxu in #11353
  • Fix: StableDiffusionXLControlNetAdapterInpaintPipeline incorrectly inherited StableDiffusionLoraLoaderMixin by @Kazuki-Yoda in #11357
  • update output for Hidream transformer by @yiyixuxu in #11366
  • [Wan2.1-FLF2V] update conversion script by @yiyixuxu in #11365
  • [Flux LoRAs] fix lr scheduler bug in distributed scenarios by @linoytsaban in #11242
  • [train_dreambooth_lora_sdxl.py] Fix the LR Schedulers when num_train_epochs is passed in a distributed training env by @kghamilton89 in #11240
  • fix issue that training flux controlnet was unstable and validation r… by @PromeAIpro in #11373
  • Fix Wan I2V prepare_latents dtype by @a-r-r-o-w in #11371
  • [BUG] fixes in kadinsky pipeline by @ishan-modi in #11080
  • Add Serialized Type Name kwarg in Model Output by @anzr299 in #10502
  • [cogview4][feat] Support attention mechanism with variable-length support and batch packing by @OleehyO in #11349
  • Support different-length pos/neg prompts for FLUX.1-schnell variants like Chroma by @josephrocca in #11120
  • [Refactor] Minor Improvement for import utils by @ishan-modi in #11161
  • Add stochastic sampling to FlowMatchEulerDiscreteScheduler by @apolinario in #11369
  • [LoRA] add LoRA support to HiDream and fine-tuning script by @linoytsaban in #11281
  • Update modeling imports by @a-r-r-o-w in #11129
  • [HiDream] move deprecation to 0.35.0 by @yiyixuxu in #11384
  • Update README_hidream.md by @AMEERAZAM08 in #11386
  • Fix group offloading with block_level and use_stream=True by @a-r-r-o-w in #11375
  • [train_dreambooth_flux] Add LANCZOS as the default interpolation mode for image resizing by @ishandutta0098 in #11395
  • [Feature] Added Xlab Controlnet support by @ishan-modi in #11249
  • Kolors additional pipelines, community contrib by @Teriks in #11372
  • [HiDream LoRA] optimizations + small updates by @linoytsaban in #11381
  • Fix Flux IP adapter argument in the pipeline example by @AeroDEmi in #11402
  • [BUG] fixed WAN docstring by @ishan-modi in #11226
  • Fix typos in strings and comments by @co63oc in #11407
  • [train_dreambooth_lora.py] Set LANCZOS as default interpolation mode for resizing by @merterbak in #11421
  • [tests] add tests to check for graph breaks, recompilation, cuda syncs in pipelines during torch.compile() by @sayakpaul in #11085
  • enable group_offload cases and quanto cases on XPU by @yao-matrix in #11405
  • enable test_layerwise_casting_memory cases on XPU by @yao-matrix in #11406
  • [tests] fix import. by @sayakpaul in #11434
  • [train_text_to_image] Better image interpolation in training scripts follow up by @tongyu0924 in #11426
  • [train_text_to_image_lora] Better image interpolation in training scripts follow up by @tongyu0924 in #11427
  • enable 28 GGUF test cases on XPU by @yao-matrix in #11404
  • [Hi-Dream LoRA] fix bug in validation by @linoytsaban in #11439
  • Fixing missing provider options argument by @urpetkov-amd in #11397
  • Set LANCZOS as the default interpolation for image resizing in ControlNet training by @YoulunPeng in #11449
  • Raise warning instead of error for block offloading with streams by @a-r-r-o-w in #11425
  • enable marigold_intrinsics cases on XPU by @yao-matrix in #11445
  • torch.compile fullgraph compatibility for Hunyuan Video by @a-r-r-o-w in #11457
  • enable consistency test cases on XPU, all passed by @yao-matrix in #11446
  • enable unidiffuser test cases on xpu by @yao-matrix in #11444
  • Add generic support for Intel Gaudi accelerator (hpu device) by @dsocek in #11328
  • Add StableDiffusion3InstructPix2PixPipeline by @xduzhangjiayu in #11378
  • make safe diffusion test cases pass on XPU and A100 by @yao-matrix in #11458
  • [test_models_transformer_hunyuan_video] help us test torch.compile() for impactful models by @tongyu0924 in #11431
  • Add LANCZOS as default interplotation mode. by @Va16hav07 in #11463
  • make autoencoders. controlnet_flux and wan_transformer_3d_single_file pass on xpu by @yao-matrix in #11461
  • [WAN] fix recompilation issues by @sayakpaul in #11475
  • Fix typos in docs and comments by @co63oc in #11416
  • [tests] xfail recent pipeline tests for specific methods. by @sayakpaul in #11469
  • cache packages_distributions by @vladmandic in #11453
  • [docs] Memory optims by @stevhliu in #11385
  • [docs] Adapters by @stevhliu in #11331
  • [train_dreambooth_lora_sdxl_advanced] Add LANCZOS as the default interpolation mode for image resizing by @yuanjua in #11471
  • [train_dreambooth_lora_flux_advanced] Add LANCZOS as the default interpolation mode for image resizing by @ysurs in #11472
  • enable semantic diffusion and stable diffusion panorama cases on XPU by @yao-matrix in #11459
  • [Feature] Implement tiled VAE encoding/decoding for Wan model. by @c8ef in #11414
  • [train_text_to_image_sdxl] Add LANCZOS as default interpolation mode for image resizing by @ParagEkbote in #11455
  • [train_dreambooth_lora_sdxl] Add --image_interpolation_mode option for image resizing (default to lanczos) by @MinJu-Ha in #11490
  • [train_dreambooth_lora_lumina2] Add LANCZOS as the default interpolation mode for image resizing by @cjfghk5697 in #11491
  • [training] feat: enable quantization for hidream lora training. by @sayakpaul in #11494
  • Set LANCZOS as the default interpolation method for image resizing. by @yijun-lee in #11492
  • Update training script for txt to img sdxl with lora supp with new interpolation. by @RogerSinghChugh in #11496
  • Fix torchao docs typo for fp8 granular quantization by @a-r-r-o-w in #11473
  • Update setup.py to pin min version of peft by @sayakpaul in #11502
  • update dep table. by @sayakpaul in #11504
  • [LoRA] use removeprefix to preserve sanity. by @sayakpaul in #11493
  • Hunyuan Video Framepack by @a-r-r-o-w in #11428
  • enable lora cases on XPU by @yao-matrix in #11506
  • [lora_conversion] Enhance key handling for OneTrainer components in LORA conversion utility by @iamwavecut in #11441
  • [docs] minor updates to bitsandbytes docs. by @sayakpaul in #11509
  • Cosmos by @a-r-r-o-w in #10660
  • clean up the Init for stable_diffusion by @yiyixuxu in #11500
  • fix audioldm by @sayakpaul (direct commit on v0.34.0-release)
  • Revert "fix audioldm" by @sayakpaul (direct commit on v0.34.0-release)
  • [LoRA] make lora alpha and dropout configurable by @linoytsaban in #11467
  • Add cross attention type for Sana-Sprint training in diffusers. by @scxue in #11514
  • Conditionally import torchvision in Cosmos transformer by @a-r-r-o-w in #11524
  • [tests] fix audioldm2 for transformers main. by @sayakpaul in #11522
  • feat: pipeline-level quantization config by @sayakpaul in #11130
  • [Tests] Enable more general testing for torch.compile() with LoRA hotswapping by @sayakpaul in #11322
  • [LoRA] support non-diffusers hidream loras by @sayakpaul in #11532
  • enable 7 cases on XPU by @yao-matrix in #11503
  • [LTXPipeline] Update latents dtype to match VAE dtype by @james-p-xu in #11533
  • enable dit integration cases on xpu by @yao-matrix in #11523
  • enable print_env on xpu by @yao-matrix in #11507
  • Change Framepack transformer layer initialization order by @a-r-r-o-w in #11535
  • [tests] add tests for framepack transformer model. by @sayakpaul in #11520
  • Hunyuan Video Framepack F1 by @a-r-r-o-w in #11534
  • enable several pipeline integration tests on XPU by @yao-matrix in #11526
  • [test_models_transformer_ltx.py] help us test torch.compile() for impactful models by @cjfghk5697 in #11512
  • Add VisualCloze by @lzyhha in #11377
  • Fix typo in train_diffusion_orpo_sdxl_lora_wds.py by @Meeex2 in #11541
  • fix: remove torch_dtype="auto" option from docstrings by @johannaSommer in #11513
  • [train_dreambooth.py] Fix the LR Schedulers when num_train_epochs is passed in a distributed training env by @kghamilton89 in #11239
  • [LoRA] small change to support Hunyuan LoRA Loading for FramePack by @linoytsaban in #11546
  • LTX Video 0.9.7 by @a-r-r-o-w in #11516
  • [tests] Enable testing for HiDream transformer by @sayakpaul in #11478
  • Update pipeline_flux_img2img.py to add missing vae_slicing and vae_tiling calls. by @Meatfucker in #11545
  • Fix deprecation warnings in test_ltx_image2video.py by @AChowdhury1211 in #11538
  • [tests] Add torch.compile test for UNet2DConditionModel by @olccihyeon in #11537
  • [Single File] GGUF/Single File Support for HiDream by @DN6 in #11550
  • [gguf] Refactor torch_function to avoid unnecessary computation by @anijain2305 in #11551
  • [tests] add tests for combining layerwise upcasting and groupoffloading. by @sayakpaul in #11558
  • [docs] Regional compilation docs by @sayakpaul in #11556
  • enhance value guard of device_agnostic_dispatch by @yao-matrix in #11553
  • Doc update by @Player256 in #11531
  • Revert error to warning when loading LoRA from repo with multiple weights by @apolinario in #11568
  • [docs] tip for group offloding + quantization by @sayakpaul in #11576
  • [LoRA] support non-diffusers LTX-Video loras by @linoytsaban in #11572
  • [WIP][LoRA] start supporting kijai wan lora. by @sayakpaul in #11579
  • [Single File] Fix loading for LTX 0.9.7 transformer by @DN6 in #11578
  • Use HF Papers by @qgallouedec in #11567
  • LTX 0.9.7-distilled; documentation improvements by @a-r-r-o-w in #11571
  • [LoRA] kijai wan lora support for I2V by @linoytsaban in #11588
  • docs: fix invalid links by @osrm in #11505
  • [docs] Remove fast diffusion tutorial by @stevhliu in #11583
  • RegionalPrompting: Inherit from Stable Diffusion by @b-sai in #11525
  • [chore] allow string device to be passed to randn_tensor. by @sayakpaul in #11559
  • Type annotation fix by @DN6 in #11597
  • [LoRA] minor fix for load_lora_weights() for Flux and a test by @sayakpaul in #11595
  • Update Intel Gaudi doc by @regisss in #11479
  • enable pipeline test cases on xpu by @yao-matrix in #11527
  • [Feature] AutoModel can load components using model_index.json by @ishan-modi in #11401
  • [docs] Pipeline-level quantization by @stevhliu in #11604
  • Fix bug when variant and safetensor file does not match by @kaixuanliu in #11587
  • [tests] Changes to the torch.compile() CI and tests by @sayakpaul in #11508
  • Fix mixed variant downloading by @DN6 in #11611
  • fix security issue in build docker ci by @sayakpaul in #11614
  • Make group offloading compatible with torch.compile() by @sayakpaul in #11605
  • [training docs] smol update to README files by @linoytsaban in #11616
  • Adding NPU for get device function by @leisuzz in #11617
  • [LoRA] improve LoRA fusion tests by @sayakpaul in #11274
  • [Sana Sprint] add image-to-image pipeline by @linoytsaban in #11602
  • [CI] fix the filename for displaying failures in lora ci. by @sayakpaul in #11600
  • [docs] PyTorch 2.0 by @stevhliu in #11618
  • [textual_inversion_sdxl.py] fix lr scheduler steps count by @yuanjua in #11557
  • Fix wrong indent for examples of controlnet script by @Justin900429 in #11632
  • removing unnecessary else statement by @YanivDorGalron in #11624
  • enable group_offloading and PipelineDeviceAndDtypeStabilityTests on XPU, all passed by @yao-matrix in #11620
  • Bug: Fixed Image 2 Image example by @vltmedia in #11619
  • typo fix in pipeline_flux.py by @YanivDorGalron in #11623
  • Fix typos in strings and comments by @co63oc in #11476
  • [docs] update torchao doc link by @sayakpaul in #11634
  • Use float32 RoPE freqs in Wan with MPS backends by @hvaara in #11643
  • [chore] misc changes in the bnb tests for consistency. by @sayakpaul in #11355
  • [tests] chore: rename lora model-level tests. by @sayakpaul in #11481
  • [docs] Caching methods by @stevhliu in #11625
  • [docs] Model cards by @stevhliu in #11112
  • [CI] Some improvements to Nightly reports summaries by @DN6 in #11166
  • [chore] bring PipelineQuantizationConfig at the top of the import chain. by @sayakpaul in #11656
  • [examples] flux-control: use num_training_steps_for_scheduler by @Markus-Pobitzer in #11662
  • use deterministic to get stable result by @jiqing-feng in #11663
  • [tests] add test for torch.compile + group offloading by @sayakpaul in #11670
  • Wan VACE by @a-r-r-o-w in #11582
  • fixed axes_dims_rope init (huggingface#11641) by @sofinvalery in #11678
  • [tests] Fix how compiler mixin classes are used by @sayakpaul in #11680
  • Introduce DeprecatedPipelineMixin to simplify pipeline deprecation process by @DN6 in #11596
  • Add community class StableDiffusionXL_T5Pipeline by @ppbrown in #11626
  • Update pipeline_flux_inpaint.py to fix padding_mask_crop returning only the inpainted area by @Meatfucker in #11658
  • Allow remote code repo names to contain "." by @akasharidas in #11652
  • [LoRA] support Flux Control LoRA with bnb 8bit. by @sayakpaul in #11655
  • [Wan] Fix VAE sampling mode in WanVideoToVideoPipeline by @tolgacangoz in #11639
  • enable torchao test cases on XPU and switch to device agnostic APIs for test cases by @yao-matrix in #11654
  • [tests] tests for compilation + quantization (bnb) by @sayakpaul in #11672
  • [tests] model-level device_map clarifications by @sayakpaul in #11681
  • Improve Wan docstrings by @a-r-r-o-w in #11689
  • Set torch_version to N/A if torch is disabled. by @rasmi in #11645
  • Avoid DtoH sync from access of nonzero() item in scheduler by @jbschlosser in #11696
  • Apply Occam's Razor in position embedding calculation by @tolgacangoz in #11562
  • [docs] add compilation bits to the bitsandbytes docs. by @sayakpaul in #11693
  • swap out token for style bot. by @sayakpaul in #11701
  • [docs] mention fp8 benefits on supported hardware. by @sayakpaul in #11699
  • Support Wan AccVideo lora by @a-r-r-o-w in #11704
  • [LoRA] parse metadata from LoRA and save metadata by @sayakpaul in #11324
  • Cosmos Predict2 by @a-r-r-o-w in #11695
  • Chroma Pipeline by @Ednaordinary in #11698
  • [LoRA ]fix flux lora loader when return_metadata is true for non-diffusers by @sayakpaul in #11716
  • [training] show how metadata stuff should be incorporated in training scripts. by @sayakpaul in #11707
  • Fix misleading comment by @carlthome in #11722
  • Add Pruna optimization framework documentation by @davidberenstein1957 in #11688
  • Support more Wan loras (VACE) by @a-r-r-o-w in #11726
  • [LoRA training] update metadata use for lora alpha + README by @linoytsaban in #11723
  • ⚡️ Speed up method AutoencoderKLWan.clear_cache by 886% by @misrasaurabh1 in #11665
  • [training] add ds support to lora hidream by @leisuzz in #11737
  • [tests] device_map tests for all models. by @sayakpaul in #11708
  • [chore] change to 2025 licensing for remaining by @sayakpaul in #11741
  • Chroma Follow Up by @DN6 in #11725
  • [Quantizers] add is_compileable property to quantizers. by @sayakpaul in #11736
  • Update more licenses to 2025 by @a-r-r-o-w in #11746
  • Add missing HiDream license by @a-r-r-o-w in #11747
  • Bump urllib3 from 2.2.3 to 2.5.0 in /examples/server by @dependabot[bot] in #11748
  • [LoRA] refactor lora loading at the model-level by @sayakpaul in #11719
  • [CI] Fix WAN VACE tests by @DN6 in #11757
  • [CI] Fix SANA tests by @DN6 in #11756
  • Fix HiDream pipeline test module by @DN6 in #11754
  • make group offloading work with disk/nvme transfers by @sayakpaul in #11682
  • Update Chroma Docs by @DN6 in #11753
  • fix invalid component handling behaviour in PipelineQuantizationConfig by @sayakpaul in #11750
  • Fix failing cpu offload test for LTX Latent Upscale by @DN6 in #11755
  • [docs] Quantization + torch.compile + offloading by @stevhliu in #11703
  • [docs] device_map by @stevhliu in #11711
  • [docs] LoRA scale scheduling by @stevhliu in #11727
  • Fix dimensionalities in apply_rotary_emb functions' comments by @tolgacangoz in #11717
  • enable deterministic in bnb 4 bit tests by @jiqing-feng in #11738
  • enable cpu offloading of new pipelines on XPU & use device agnostic empty to make pipelines work on XPU by @yao-matrix in #11671
  • [tests] properly skip tests instead of return by @sayakpaul in #11771
  • [CI] Skip ONNX Upscale tests by @DN6 in #11774
  • [Wan] Fix mask padding in Wan VACE pipeline. by @bennyguo in #11778
  • Add --lora_alpha and metadata handling to train_dreambooth_lora_sana.py by @imbr92 in #11744
  • [docs] minor cleanups in the lora docs. by @sayakpaul in #11770
  • [lora] only remove hooks that we add back by @yiyixuxu in #11768
  • [tests] Fix HunyuanVideo Framepack device tests by @a-r-r-o-w in #11789
  • [chore] raise as early as possible in group offloading by @sayakpaul in #11792
  • [tests] Fix group offloading and layerwise casting test interaction by @a-r-r-o-w in #11796
  • guard omnigen processor. by @sayakpaul in #11799
  • Release: v0.34.0 by @sayakpaul (direct commit on v0.34.0-release)

Significant community contributions

The following contributors have made significant changes to the library over the last release:

  • @yao-matrix
    • fix test_vanilla_funetuning failure on XPU and A100 (#11263)
    • make test_stable_diffusion_inpaint_fp16 pass on XPU (#11264)
    • make test_dict_tuple_outputs_equivalent pass on XPU (#11265)
    • make test_instant_style_multiple_masks pass on XPU (#11266)
    • make KandinskyV22PipelineInpaintCombinedFastTests::test_float16_inference pass on XPU (#11308)
    • make test_stable_diffusion_karras_sigmas pass on XPU (#11310)
    • fix CPU offloading related fail cases on XPU (#11288)
    • enable 2 test cases on XPU (#11332)
    • enable group_offload cases and quanto cases on XPU (#11405)
    • enable test_layerwise_casting_memory cases on XPU (#11406)
    • enable 28 GGUF test cases on XPU (#11404)
    • enable marigold_intrinsics cases on XPU (#11445)
    • enable consistency test cases on XPU, all passed (#11446)
    • enable unidiffuser test cases on xpu (#11444)
    • make safe diffusion test cases pass on XPU and A100 (#11458)
    • make autoencoders. controlnet_flux and wan_transformer_3d_single_file pass on xpu (#11461)
    • enable semantic diffusion and stable diffusion panorama cases on XPU (#11459)
    • enable lora cases on XPU (#11506)
    • enable 7 cases on XPU (#11503)
    • enable dit integration cases on xpu (#11523)
    • enable print_env on xpu (#11507)
    • enable several pipeline integration tests on XPU (#11526)
    • enhance value guard of device_agnostic_dispatch (#11553)
    • enable pipeline test cases on xpu (#11527)
    • enable group_offloading and PipelineDeviceAndDtypeStabilityTests on XPU, all passed (#11620)
    • enable torchao test cases on XPU and switch to device agnostic APIs for test cases (#11654)
    • enable cpu offloading of new pipelines on XPU & use device agnostic empty to make pipelines work on XPU (#11671)
  • @hlky
    • Fix LTX 0.9.5 single file (#11271)
    • HiDream Image (#11231)
    • Use float32 on mps or npu in transformer_hidream_image's rope (#11316)
    • Fix vae.Decoder prev_output_channel (#11280)
  • @quickjkee
    • flow matching lcm scheduler (#11170)
  • @ishan-modi
    • [ControlNet] Adds controlnet for SanaTransformer (#11040)
    • [BUG] fixed _toctree.yml alphabetical ordering (#11277)
    • [BUG] fixes in kadinsky pipeline (#11080)
    • [Refactor] Minor Improvement for import utils (#11161)
    • [Feature] Added Xlab Controlnet support (#11249)
    • [BUG] fixed WAN docstring (#11226)
    • [Feature] AutoModel can load components using model_index.json (#11401)
  • @linoytsaban
    • [HiDream] code example (#11317)
    • [Flux LoRAs] fix lr scheduler bug in distributed scenarios (#11242)
    • [LoRA] add LoRA support to HiDream and fine-tuning script (#11281)
    • [HiDream LoRA] optimizations + small updates (#11381)
    • [Hi-Dream LoRA] fix bug in validation (#11439)
    • [LoRA] make lora alpha and dropout configurable (#11467)
    • [LoRA] small change to support Hunyuan LoRA Loading for FramePack (#11546)
    • [LoRA] support non-diffusers LTX-Video loras (#11572)
    • [LoRA] kijai wan lora support for I2V (#11588)
    • [training docs] smol update to README files (#11616)
    • [Sana Sprint] add image-to-image pipeline (#11602)
    • [LoRA training] update metadata use for lora alpha + README (#11723)
  • @hameerabbasi
    • [LoRA] Add LoRA support to AuraFlow (#10216)
  • @DN6
    • Fix Hunyuan I2V for transformers>4.47.1 (#11293)
    • Hunyuan I2V fast tests fix (#11341)
    • [Single File] GGUF/Single File Support for HiDream (#11550)
    • [Single File] Fix loading for LTX 0.9.7 transformer (#11578)
    • Type annotation fix (#11597)
    • Fix mixed variant downloading (#11611)
    • [CI] Some improvements to Nightly reports summaries (#11166)
    • Introduce DeprecatedPipelineMixin to simplify pipeline deprecation process (#11596)
    • Chroma Follow Up (#11725)
    • [CI] Fix WAN VACE tests (#11757)
    • [CI] Fix SANA tests (#11756)
    • Fix HiDream pipeline test module (#11754)
    • Update Chroma Docs (#11753)
    • Fix failing cpu offload test for LTX Latent Upscale (#11755)
    • [CI] Skip ONNX Upscale tests (#11774)
  • @yiyixuxu
    • [Hi Dream] follow-up (#11296)
    • support Wan-FLF2V (#11353)
    • update output for Hidream transformer (#11366)
    • [Wan2.1-FLF2V] update conversion script (#11365)
    • [HiDream] move deprecation to 0.35.0 (#11384)
    • clean up the Init for stable_diffusion (#11500)
    • [lora] only remove hooks that we add back (#11768)
  • @Teriks
    • Kolors additional pipelines, community contrib (#11372)
  • @co63oc
    • Fix typos in strings and comments (#11407)
    • Fix typos in docs and comments (#11416)
    • Fix typos in strings and comments (#11476)
  • @xduzhangjiayu
    • Add StableDiffusion3InstructPix2PixPipeline (#11378)
  • @scxue
    • Add cross attention type for Sana-Sprint training in diffusers. (#11514)
  • @lzyhha
    • Add VisualCloze (#11377)
  • @b-sai
    • RegionalPrompting: Inherit from Stable Diffusion (#11525)
  • @Ednaordinary
    • Chroma Pipeline (#11698)

- Python
Published by sayakpaul 8 months ago

diffusers - v0.33.1: fix ftfy import

All commits

  • fix ftfy import for wan pipelines by @yiyixuxu in #11262

- Python
Published by yiyixuxu 11 months ago

diffusers - Diffusers 0.33.0: New Image and Video Models, Memory Optimizations, Caching Methods, Remote VAEs, New Training Scripts, and more

New Pipelines for Video Generation

Wan 2.1

Wan2.1 is a comprehensive and open suite of video foundation models that pushes the boundaries of video generation. The release includes four model variants and three pipelines: text-to-video, image-to-video, and video-to-video.

  • Wan-AI/Wan2.1-T2V-1.3B-Diffusers
  • Wan-AI/Wan2.1-T2V-14B-Diffusers
  • Wan-AI/Wan2.1-I2V-14B-480P-Diffusers
  • Wan-AI/Wan2.1-I2V-14B-720P-Diffusers

Check out the docs here to learn more.

LTX Video 0.9.5

LTX Video 0.9.5 is the updated version of the super-fast LTX Video model series. The latest model introduces additional conditioning options, such as keyframe-based animation and video extension (both forward and backward).

To support these additional conditioning inputs, we’ve introduced the LTXConditionPipeline and LTXVideoCondition object.

To learn more about the usage, check out the docs here.

Hunyuan Image to Video

Hunyuan utilizes a pre-trained Multimodal Large Language Model (MLLM) with a Decoder-Only architecture as the text encoder. The input image is processed by the MLLM to generate semantic image tokens. These tokens are then concatenated with the video latent tokens, enabling comprehensive full-attention computation across the combined data and seamlessly integrating information from both the image and its associated caption.

To learn more, check out the docs here.


New Pipelines for Image Generation

Sana-Sprint

SANA-Sprint is an efficient diffusion model for ultra-fast text-to-image generation. SANA-Sprint is built on a pre-trained foundation model and augmented with hybrid distillation, dramatically reducing inference steps from 20 to 1-4, rivaling the quality of models like Flux.

Shoutout to @lawrence-cj for their help and guidance on this PR.

Check out the pipeline docs of SANA-Sprint to learn more.

Lumina2

Lumina-Image-2.0 is a 2B parameter flow-based diffusion transformer for text-to-image generation released under the Apache 2.0 license.

Check out the docs to learn more. Thanks to @zhuole1025 for contributing this through this PR.

One can also LoRA fine-tune Lumina2, taking advantage of its Apache-2.0 licensing. Check out the guide for more details.

Omnigen

OmniGen is a unified image generation model that can handle multiple tasks including text-to-image, image editing, subject-driven generation, and various computer vision tasks within a single framework. The model consists of a VAE, and a single transformer based on Phi-3 that handles text and image encoding as well as the diffusion process.

Check out the docs to learn more about OmniGen. Thanks to @staoxiao for contributing OmniGen in this PR.

Others

  • CogView4 (thanks to @zRzRzRzRzRzRzR for contributing CogView4 in this PR)

New Memory Optimizations

Layerwise Casting

PyTorch supports torch.float8_e4m3fn and torch.float8_e5m2 as weight storage dtypes, but they can’t be used for computation on many devices due to unimplemented kernel support.

However, you can still use these dtypes to store model weights in FP8 precision and upcast them to a widely supported dtype such as torch.float16 or torch.bfloat16 on-the-fly when the layers are used in the forward pass. This is known as layerwise weight-casting. This can potentially cut down the VRAM requirements of a model by 50%.  

Code

```py
import torch
from diffusers import CogVideoXPipeline, CogVideoXTransformer3DModel
from diffusers.utils import export_to_video

model_id = "THUDM/CogVideoX-5b"

# Load the model in bfloat16 and enable layerwise casting
transformer = CogVideoXTransformer3DModel.from_pretrained(model_id, subfolder="transformer", torch_dtype=torch.bfloat16)
transformer.enable_layerwise_casting(storage_dtype=torch.float8_e4m3fn, compute_dtype=torch.bfloat16)

# Load the pipeline
pipe = CogVideoXPipeline.from_pretrained(model_id, transformer=transformer, torch_dtype=torch.bfloat16)
pipe.to("cuda")

prompt = (
    "A panda, dressed in a small, red jacket and a tiny hat, sits on a wooden stool in a serene bamboo forest. "
    "The panda's fluffy paws strum a miniature acoustic guitar, producing soft, melodic tunes. Nearby, a few other "
    "pandas gather, watching curiously and some clapping in rhythm. Sunlight filters through the tall bamboo, "
    "casting a gentle glow on the scene. The panda's face is expressive, showing concentration and joy as it plays. "
    "The background includes a small, flowing stream and vibrant green foliage, enhancing the peaceful and magical "
    "atmosphere of this unique musical performance."
)
video = pipe(prompt=prompt, guidance_scale=6, num_inference_steps=50).frames[0]
export_to_video(video, "output.mp4", fps=8)
```

Group Offloading

Group offloading is the middle ground between sequential and model offloading. It works by offloading groups of internal layers (either torch.nn.ModuleList or torch.nn.Sequential), which uses less memory than model-level offloading. It is also faster than sequential-level offloading because the number of device synchronizations is reduced.

On CUDA devices, we also have the option to enable layer prefetching with CUDA streams. The next layer to be executed is loaded onto the accelerator device while the current layer is being executed, which makes inference substantially faster while still keeping VRAM requirements very low. In effect, computation is overlapped with data transfer.

One thing to note is that using CUDA streams can cause a considerable spike in CPU RAM usage. Please ensure that the available CPU RAM is 2 times the size of the model if you choose to set use_stream=True. You can reduce CPU RAM usage by setting low_cpu_mem_usage=True. This should limit the CPU RAM used to be roughly the same as the size of the model, but will introduce slight latency in the inference process.

You can also use record_stream=True when using use_stream=True to obtain more speedups at the expense of slightly increased memory usage.

Code

```py
import torch
from diffusers import CogVideoXPipeline
from diffusers.utils import export_to_video

# Load the pipeline
onload_device = torch.device("cuda")
offload_device = torch.device("cpu")
pipe = CogVideoXPipeline.from_pretrained("THUDM/CogVideoX-5b", torch_dtype=torch.bfloat16)

# We can utilize the enable_group_offload method for Diffusers model implementations
pipe.transformer.enable_group_offload(
    onload_device=onload_device,
    offload_device=offload_device,
    offload_type="leaf_level",
    use_stream=True,
)

prompt = (
    "A panda, dressed in a small, red jacket and a tiny hat, sits on a wooden stool in a serene bamboo forest. "
    "The panda's fluffy paws strum a miniature acoustic guitar, producing soft, melodic tunes. Nearby, a few other "
    "pandas gather, watching curiously and some clapping in rhythm. Sunlight filters through the tall bamboo, "
    "casting a gentle glow on the scene. The panda's face is expressive, showing concentration and joy as it plays. "
    "The background includes a small, flowing stream and vibrant green foliage, enhancing the peaceful and magical "
    "atmosphere of this unique musical performance."
)
video = pipe(prompt=prompt, guidance_scale=6, num_inference_steps=50).frames[0]

# This utilized about 14.79 GB. It can be further reduced by using tiling and
# using leaf_level offloading throughout the pipeline.
print(f"Max memory reserved: {torch.cuda.max_memory_allocated() / 1024**3:.2f} GB")
export_to_video(video, "output.mp4", fps=8)
```

Group offloading can also be applied to non-Diffusers models such as text encoders from the transformers library.

Code

```py
import torch
from diffusers import CogVideoXPipeline
from diffusers.hooks import apply_group_offloading

# Load the pipeline
onload_device = torch.device("cuda")
offload_device = torch.device("cpu")
pipe = CogVideoXPipeline.from_pretrained("THUDM/CogVideoX-5b", torch_dtype=torch.bfloat16)

# For any other model implementations, the apply_group_offloading function can be used
apply_group_offloading(
    pipe.text_encoder,
    onload_device=onload_device,
    offload_type="block_level",
    num_blocks_per_group=2,
)
```

Remote Components

Remote components are an experimental feature designed to offload memory-intensive steps of the inference pipeline to remote endpoints. The initial implementation focuses primarily on VAE decoding operations. Below are the currently supported model endpoints:

| Model | Endpoint | VAE model |
|---|---|---|
| Stable Diffusion v1 | https://q1bj3bpq6kzilnsu.us-east-1.aws.endpoints.huggingface.cloud | stabilityai/sd-vae-ft-mse |
| Stable Diffusion XL | https://x2dmsqunjd6k9prw.us-east-1.aws.endpoints.huggingface.cloud | madebyollin/sdxl-vae-fp16-fix |
| Flux | https://whhx50ex1aryqvw6.us-east-1.aws.endpoints.huggingface.cloud | black-forest-labs/FLUX.1-schnell |
| HunyuanVideo | https://o7ywnmrahorts457.us-east-1.aws.endpoints.huggingface.cloud | hunyuanvideo-community/HunyuanVideo |

This is an example of using remote decoding with the Hunyuan Video pipeline:

Code

```py
import torch
from diffusers import HunyuanVideoPipeline, HunyuanVideoTransformer3DModel
from diffusers.utils.remote_utils import remote_decode

model_id = "hunyuanvideo-community/HunyuanVideo"
transformer = HunyuanVideoTransformer3DModel.from_pretrained(
    model_id, subfolder="transformer", torch_dtype=torch.bfloat16
)
pipe = HunyuanVideoPipeline.from_pretrained(
    model_id, transformer=transformer, vae=None, torch_dtype=torch.float16
).to("cuda")

latent = pipe(
    prompt="A cat walks on the grass, realistic",
    height=320,
    width=512,
    num_frames=61,
    num_inference_steps=30,
    output_type="latent",
).frames

video = remote_decode(
    endpoint="https://o7ywnmrahorts457.us-east-1.aws.endpoints.huggingface.cloud/",
    tensor=latent,
    output_type="mp4",
)
if isinstance(video, bytes):
    with open("video.mp4", "wb") as f:
        f.write(video)
```

Check out the docs to learn more.

Introducing Cached Inference for DiTs

Cached Inference for Diffusion Transformer models is a performance optimization that significantly accelerates the denoising process by caching intermediate values. This technique reduces redundant computations across timesteps, resulting in faster generation with a slight dip in output quality.

Check out the docs to learn more about the available caching methods.

Pyramid Attention Broadcast

```py
import torch
from diffusers import CogVideoXPipeline, PyramidAttentionBroadcastConfig

pipe = CogVideoXPipeline.from_pretrained("THUDM/CogVideoX-5b", torch_dtype=torch.bfloat16)
pipe.to("cuda")

config = PyramidAttentionBroadcastConfig(
    spatial_attention_block_skip_range=2,
    spatial_attention_timestep_skip_range=(100, 800),
    current_timestep_callback=lambda: pipe.current_timestep,
)
pipe.transformer.enable_cache(config)
```

FasterCache

```py
import torch
from diffusers import CogVideoXPipeline, FasterCacheConfig

pipe = CogVideoXPipeline.from_pretrained("THUDM/CogVideoX-5b", torch_dtype=torch.bfloat16)
pipe.to("cuda")

config = FasterCacheConfig(
    spatial_attention_block_skip_range=2,
    spatial_attention_timestep_skip_range=(-1, 901),
    unconditional_batch_skip_range=2,
    attention_weight_callback=lambda _: 0.5,
    is_guidance_distilled=True,
)
pipe.transformer.enable_cache(config)
```

Quantization

Quanto Backend

Diffusers now has support for the Quanto quantization backend, which provides float8, int8, int4, and int2 quantization dtypes.

```python
import torch
from diffusers import FluxTransformer2DModel, QuantoConfig

model_id = "black-forest-labs/FLUX.1-dev"
quantization_config = QuantoConfig(weights_dtype="float8")
transformer = FluxTransformer2DModel.from_pretrained(
    model_id,
    subfolder="transformer",
    quantization_config=quantization_config,
    torch_dtype=torch.bfloat16,
)
```

Quanto int8 models are also compatible with torch.compile:

```py
import torch
from diffusers import FluxTransformer2DModel, QuantoConfig

model_id = "black-forest-labs/FLUX.1-dev"
quantization_config = QuantoConfig(weights_dtype="int8")
transformer = FluxTransformer2DModel.from_pretrained(
    model_id,
    subfolder="transformer",
    quantization_config=quantization_config,
    torch_dtype=torch.bfloat16,
)
transformer.compile()
```

Improved loading for uintx TorchAO checkpoints with torch>=2.6

TorchAO checkpoints currently have to be serialized using pickle. For some quantization dtypes using the uintx format, such as uint4wo, this involves saving subclassed TorchAO Tensor objects in the model file. This made loading the models directly with Diffusers tricky, since we do not allow deserializing arbitrary Python objects from pickle files.

Torch 2.6 allows adding the expected Tensor classes to torch's safe globals, which lets us directly load TorchAO checkpoints containing these objects.

```diff
- state_dict = torch.load("/path/to/flux_uint4wo/diffusion_pytorch_model.bin", weights_only=False, map_location="cpu")
- with init_empty_weights():
-     transformer = FluxTransformer2DModel.from_config("/path/to/flux_uint4wo/config.json")
- transformer.load_state_dict(state_dict, strict=True, assign=True)
+ transformer = FluxTransformer2DModel.from_pretrained("/path/to/flux_uint4wo/")
```

LoRAs

We have shipped a couple of improvements on the LoRA front in this release.

🚨 Improved coverage for loading non-diffusers LoRA checkpoints for Flux

Take note of the breaking change introduced in this PR. 🚨 We suggest upgrading your peft installation to the latest version (pip install -U peft), especially when dealing with Flux LoRAs.

torch.compile() support when hotswapping LoRAs without triggering recompilation

A common use case when serving multiple adapters is to load one adapter first, generate images, load another adapter, generate more images, and so on. This workflow normally requires calling load_lora_weights(), set_adapters(), and possibly delete_adapters() to save memory. Moreover, if the model is compiled with torch.compile, performing these steps triggers recompilation, which takes time.

To better support this common workflow, you can “hotswap” a LoRA adapter to avoid accumulating memory and, in some cases, recompilation. Hotswapping requires an adapter to already be loaded; the new adapter's weights are then swapped in place for the existing ones.
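As a sketch of that workflow (the LoRA file names and the target_rank value are placeholders; method names follow the hotswapping docs):

```python
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
).to("cuda")

# Reserve capacity for the largest-rank adapter you plan to swap in
# (64 here is a placeholder value).
pipe.enable_lora_hotswap(target_rank=64)

pipe.load_lora_weights("lora_a.safetensors", adapter_name="default")
pipe.transformer.compile()  # compile once

image_a = pipe("a photo of a cat").images[0]

# Swap the weights in place: no memory accumulation and, with matching
# ranks, no recompilation.
pipe.load_lora_weights("lora_b.safetensors", adapter_name="default", hotswap=True)
image_b = pipe("a photo of a dog").images[0]
```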

Check out the docs to learn more about this feature.

The other major change is support for loading LoRAs into quantized model checkpoints.
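For example, a sketch of loading a LoRA on top of a bitsandbytes 4-bit quantized Flux transformer (the LoRA file name is a placeholder):

```python
import torch
from diffusers import BitsAndBytesConfig, FluxPipeline, FluxTransformer2DModel

quant_config = BitsAndBytesConfig(
    load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16
)
transformer = FluxTransformer2DModel.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    subfolder="transformer",
    quantization_config=quant_config,
    torch_dtype=torch.bfloat16,
)
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", transformer=transformer, torch_dtype=torch.bfloat16
).to("cuda")

# The LoRA is loaded on top of the 4-bit quantized transformer
pipe.load_lora_weights("flux_lora.safetensors")
```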

dtype Maps for Pipelines

Since various pipelines require their components to run in different compute dtypes, we now support passing a dtype map when initializing a pipeline:

```python
import torch
from diffusers import HunyuanVideoPipeline

pipe = HunyuanVideoPipeline.from_pretrained(
    "hunyuanvideo-community/HunyuanVideo",
    torch_dtype={"transformer": torch.bfloat16, "default": torch.float16},
)
print(pipe.transformer.dtype, pipe.vae.dtype)  # (torch.bfloat16, torch.float16)
```

AutoModel

This release includes an AutoModel object similar to the one found in transformers that automatically fetches the appropriate model class for the provided repo.

```python
from diffusers import AutoModel

unet = AutoModel.from_pretrained("runwayml/stable-diffusion-v1-5", subfolder="unet")
```

All commits

  • [Sana 4K] Add vae tiling option to avoid OOM by @leisuzz in #10583
  • IP-Adapter for StableDiffusion3Img2ImgPipeline by @guiyrt in #10589
  • [DC-AE, SANA] fix SanaMultiscaleLinearAttention apply_quadratic_attention bf16 by @chenjy2003 in #10595
  • Move buffers to device by @hlky in #10523
  • [Docs] Update SD3 ip_adapter model_id to diffusers checkpoint by @guiyrt in #10597
  • Scheduling fixes on MPS by @hlky in #10549
  • [Docs] Add documentation about using ParaAttention to optimize FLUX and HunyuanVideo by @chengzeyi in #10544
  • NPU adaption for RMSNorm by @leisuzz in #10534
  • implementing flux on TPUs with ptxla by @entrpn in #10515
  • [core] ConsisID by @SHYuanBest in #10140
  • [training] set rest of the blocks with requires_grad False. by @sayakpaul in #10607
  • chore: remove redundant words by @sunxunle in #10609
  • bugfix for npu not support float64 by @baymax591 in #10123
  • [chore] change licensing to 2025 from 2024. by @sayakpaul in #10615
  • Enable dreambooth lora finetune example on other devices by @jiqing-feng in #10602
  • Remove the FP32 Wrapper when evaluating by @lmxyy in #10617
  • [tests] make tests device-agnostic (part 3) by @faaany in #10437
  • fix offload gpu tests etc by @yiyixuxu in #10366
  • Remove cache migration script by @Wauplin in #10619
  • [core] Layerwise Upcasting by @a-r-r-o-w in #10347
  • Improve TorchAO error message by @a-r-r-o-w in #10627
  • [CI] Update HF_TOKEN in all workflows by @DN6 in #10613
  • add onnxruntime-migraphx as part of check for onnxruntime in import_utils.py by @kahmed10 in #10624
  • [Tests] modify the test slices for the failing flax test by @sayakpaul in #10630
  • [docs] fix image path in para attention docs by @sayakpaul in #10632
  • [docs] uv installation by @stevhliu in #10622
  • width and height are mixed-up by @raulc0399 in #10629
  • Add IP-Adapter example to Flux docs by @hlky in #10633
  • removing redundant requires_grad = False by @YanivDorGalron in #10628
  • [chore] add a script to extract loras from full fine-tuned models by @sayakpaul in #10631
  • Add pipeline_stable_diffusion_xl_attentive_eraser by @Anonym0u3 in #10579
  • NPU Adaption for Sanna by @leisuzz in #10409
  • Add sigmoid scheduler in scheduling_ddpm.py docs by @JacobHelwig in #10648
  • create a script to train autoencoderkl by @lavinal712 in #10605
  • Add community pipeline for semantic guidance for FLUX by @Marlon154 in #10610
  • ControlNet Union controlnet_conditioning_scale for multiple control inputs by @hlky in #10666
  • [training] Convert to ImageFolder script by @hlky in #10664
  • Add provider_options to OnnxRuntimeModel by @hlky in #10661
  • fix check_inputs func in LuminaText2ImgPipeline by @victolee0 in #10651
  • SDXL ControlNet Union pipelines, make control_image argument immutible by @Teriks in #10663
  • Revert RePaint scheduler 'fix' by @GiusCat in #10644
  • [core] Pyramid Attention Broadcast by @a-r-r-o-w in #9562
  • [fix] refer use_framewise_encoding on AutoencoderKLHunyuanVideo._encode by @hanchchch in #10600
  • Refactor gradient checkpointing by @a-r-r-o-w in #10611
  • [Tests] conditionally check fp8_e4m3_bf16_max_memory < fp8_e4m3_fp32_max_memory by @sayakpaul in #10669
  • Fix pipeline dtype unexpected change when using SDXL reference community pipelines in float16 mode by @dimitribarbot in #10670
  • [tests] update llamatokenizer in hunyuanvideo tests by @sayakpaul in #10681
  • support StableDiffusionAdapterPipeline.from_single_file by @Teriks in #10552
  • fix(hunyuan-video): typo in height and width input check by @badayvedat in #10684
  • [FIX] check_inputs function in Auraflow Pipeline by @SahilCarterr in #10678
  • Fix enable memory efficient attention on ROCm by @tenpercent in #10564
  • Fix inconsistent random transform in instruct pix2pix by @Luvata in #10698
  • feat(training-utils): support device and dtype params in compute_density_for_timestep_sampling by @badayvedat in #10699
  • Fixed grammar in "write_own_pipeline" readme by @N0-Flux-given in #10706
  • Fix Documentation about Image-to-Image Pipeline by @ParagEkbote in #10704
  • [bitsandbytes] Simplify bnb int8 dequant by @sayakpaul in #10401
  • Fix train_text_to_image.py --help by @nkthiebaut in #10711
  • Notebooks for Community Scripts-6 by @ParagEkbote in #10713
  • [Fix] Type Hint in from_pretrained() to Ensure Correct Type Inference by @SahilCarterr in #10714
  • add provider_options in from_pretrained by @xieofxie in #10719
  • [Community] Enhanced Model Search by @suzukimain in #10417
  • [bugfix] NPU Adaption for Sana by @leisuzz in #10724
  • Quantized Flux with IP-Adapter by @hlky in #10728
  • EDMEulerScheduler accept sigmas, add final_sigmas_type by @hlky in #10734
  • [LoRA] fix peft state dict parsing by @sayakpaul in #10532
  • Add Self type hint to ModelMixin's from_pretrained by @hlky in #10742
  • [Tests] Test layerwise casting with training by @sayakpaul in #10765
  • speedup hunyuan encoder causal mask generation by @dabeschte in #10764
  • [CI] Fix Truffle Hog failure by @DN6 in #10769
  • Add OmniGen by @staoxiao in #10148
  • feat: new community mixture_tiling_sdxl pipeline for SDXL by @elismasilva in #10759
  • Add support for lumina2 by @zhuole1025 in #10642
  • Refactor OmniGen by @a-r-r-o-w in #10771
  • Faster set_adapters by @Luvata in #10777
  • [Single File] Add Single File support for Lumina Image 2.0 Transformer by @DN6 in #10781
  • Fix use_lu_lambdas and use_karras_sigmas with beta_schedule=squaredcos_cap_v2 in DPMSolverMultistepScheduler by @hlky in #10740
  • MultiControlNetUnionModel on SDXL by @guiyrt in #10747
  • fix: [Community pipeline] Fix flattened elements on image by @elismasilva in #10774
  • make tensors contiguous before passing to safetensors by @faaany in #10761
  • Disable PEFT input autocast when using fp8 layerwise casting by @a-r-r-o-w in #10685
  • Update FlowMatch docstrings to mention correct output classes by @a-r-r-o-w in #10788
  • Refactor CogVideoX transformer forward by @a-r-r-o-w in #10789
  • Module Group Offloading by @a-r-r-o-w in #10503
  • Update Custom Diffusion Documentation for Multiple Concept Inference to resolve issue #10791 by @puhuk in #10792
  • [FIX] check_inputs function in lumina2 by @SahilCarterr in #10784
  • follow-up refactor on lumina2 by @yiyixuxu in #10776
  • CogView4 (supports different length c and uc) by @zRzRzRzRzRzRzR in #10649
  • typo fix by @YanivDorGalron in #10802
  • Extend Support for callback_on_step_end for AuraFlow and LuminaText2Img Pipelines by @ParagEkbote in #10746
  • [chore] update notes generation spaces by @sayakpaul in #10592
  • [LoRA] improve lora support for flux. by @sayakpaul in #10810
  • Fix max_shift value in flux and related functions to 1.15 (issue #10675) by @puhuk in #10807
  • [docs] add missing entries to the lora docs. by @sayakpaul in #10819
  • DiffusionPipeline mixin to+FromOriginalModelMixin/FromSingleFileMixin from_single_file type hint by @hlky in #10811
  • [LoRA] make set_adapters() robust on silent failures. by @sayakpaul in #9618
  • [FEAT] Model loading refactor by @SunMarc in #10604
  • [misc] feat: introduce a style bot. by @sayakpaul in #10274
  • Remove print statements by @a-r-r-o-w in #10836
  • [tests] use proper gemma class and config in lumina2 tests. by @sayakpaul in #10828
  • [LoRA] add LoRA support to Lumina2 and fine-tuning script by @sayakpaul in #10818
  • [Utils] add utilities for checking if certain utilities are properly documented by @sayakpaul in #7763
  • Add missing isinstance for arg checks in GGUFParameter by @AstraliteHeart in #10834
  • [tests] test encode_prompt() in isolation by @sayakpaul in #10438
  • store activation cls instead of function by @SunMarc in #10832
  • fix: support transformer models' generation_config in pipeline by @JeffersonQin in #10779
  • Notebooks for Community Scripts-7 by @ParagEkbote in #10846
  • [CI] install accelerate transformers from main by @sayakpaul in #10289
  • [CI] run fast gpu tests conditionally on pull requests. by @sayakpaul in #10310
  • SD3 IP-Adapter runtime checkpoint conversion by @guiyrt in #10718
  • Some consistency-related fixes for HunyuanVideo by @a-r-r-o-w in #10835
  • SkyReels Hunyuan T2V & I2V by @a-r-r-o-w in #10837
  • fix: run tests from a pr workflow. by @sayakpaul in #9696
  • [chore] template for remote vae. by @sayakpaul in #10849
  • fix remote vae template by @sayakpaul in #10852
  • [CI] Fix incorrectly named test module for Hunyuan DiT by @DN6 in #10854
  • [CI] Update always test Pipelines list in Pipeline fetcher by @DN6 in #10856
  • device_map in load_model_dict_into_meta by @hlky in #10851
  • [Fix] Docs overview.md by @SahilCarterr in #10858
  • remove format check for safetensors file by @SunMarc in #10864
  • [docs] LoRA support by @stevhliu in #10844
  • Comprehensive type checking for from_pretrained kwargs by @guiyrt in #10758
  • Fix torch_dtype in Kolors text encoder with transformers v4.49 by @hlky in #10816
  • [LoRA] restrict certain keys to be checked for peft config update. by @sayakpaul in #10808
  • Add SD3 ControlNet to AutoPipeline by @hlky in #10888
  • [docs] Update prompt weighting docs by @stevhliu in #10843
  • [docs] Flux group offload by @stevhliu in #10847
  • [Fix] fp16 unscaling in train_dreambooth_lora_sdxl by @SahilCarterr in #10889
  • [docs] Add CogVideoX Schedulers by @a-r-r-o-w in #10885
  • [chore] correct qk norm list. by @sayakpaul in #10876
  • [Docs] Fix toctree sorting by @DN6 in #10894
  • [refactor] SD3 docs & remove additional code by @a-r-r-o-w in #10882
  • [refactor] Remove additional Flux code by @a-r-r-o-w in #10881
  • [CI] Improvements to conditional GPU PR tests by @DN6 in #10859
  • Multi IP-Adapter for Flux pipelines by @guiyrt in #10867
  • Fix Callback Tensor Inputs of the SDXL Controlnet Inpaint and Img2img Pipelines are missing "controlnet_image". by @CyberVy in #10880
  • Security fix by @ydshieh in #10905
  • Marigold Update: v1-1 models, Intrinsic Image Decomposition pipeline, documentation by @toshas in #10884
  • [Tests] fix: lumina2 lora fuse_nan test by @sayakpaul in #10911
  • Fix Callback Tensor Inputs of the SD Controlnet Pipelines are missing some elements. by @CyberVy in #10907
  • [CI] Fix Fast GPU tests on PR by @DN6 in #10912
  • [CI] Fix for failing IP Adapter test in Fast GPU PR tests by @DN6 in #10915
  • Experimental per control type scale for ControlNet Union by @hlky in #10723
  • [style bot] improve security for the stylebot. by @sayakpaul in #10908
  • [CI] Update Stylebot Permissions by @DN6 in #10931
  • [Alibaba Wan Team] continue on #10921 Wan2.1 by @yiyixuxu in #10922
  • Support IPAdapter for more Flux pipelines by @hlky in #10708
  • Add remote_decode to remote_utils by @hlky in #10898
  • Update VAE Decode endpoints by @hlky in #10939
  • [chore] fix-copies to flux pipelines by @sayakpaul in #10941
  • [Tests] Remove more encode prompts tests by @sayakpaul in #10942
  • Add EasyAnimateV5.1 text-to-video, image-to-video, control-to-video generation model by @bubbliiiing in #10626
  • Fix SD2.X clip single file load projection_dim by @Teriks in #10770
  • add from_single_file to animatediff by @ in #10924
  • Add Example of IPAdapterScaleCutoffCallback to Docs by @ParagEkbote in #10934
  • Update pipeline_cogview4.py by @zRzRzRzRzRzRzR in #10944
  • Fix redundant prev_output_channel assignment in UNet2DModel by @ahmedbelgacem in #10945
  • Improve load_ip_adapter RAM Usage by @CyberVy in #10948
  • [tests] make tests device-agnostic (part 4) by @faaany in #10508
  • Update evaluation.md by @sayakpaul in #10938
  • [LoRA] feat: support non-diffusers lumina2 LoRAs. by @sayakpaul in #10909
  • [Quantization] support pass MappingType for TorchAoConfig by @a120092009 in #10927
  • Fix the missing parentheses when calling is_torchao_available in quantization_config.py. by @CyberVy in #10961
  • [LoRA] Support Wan by @a-r-r-o-w in #10943
  • Fix incorrect seed initialization when args.seed is 0 by @azolotenkov in #10964
  • feat: add Mixture-of-Diffusers ControlNet Tile upscaler Pipeline for SDXL by @elismasilva in #10951
  • [Docs] CogView4 comment fix by @zRzRzRzRzRzRzR in #10957
  • update check_input for cogview4 by @yiyixuxu in #10966
  • Add VAE Decode endpoint slow test by @hlky in #10946
  • [flux lora training] fix t5 training bug by @linoytsaban in #10845
  • use style bot GH Action from huggingface_hub by @hanouticelina in #10970
  • [train_dreambooth_lora.py] Fix the LR Schedulers when num_train_epochs is passed in a distributed training env by @flyxiv in #10973
  • [tests] fix tests for save load components by @sayakpaul in #10977
  • Fix loading OneTrainer Flux LoRA by @hlky in #10978
  • fix default values of Flux guidance_scale in docstrings by @catwell in #10982
  • [CI] remove synchornized. by @sayakpaul in #10980
  • Bump jinja2 from 3.1.5 to 3.1.6 in /examples/research_projects/realfill by @dependabot[bot] in #10984
  • Fix Flux Controlnet Pipeline callback_tensor_inputs Missing Some Elements by @CyberVy in #10974
  • [Single File] Add user agent to SF download requests. by @DN6 in #10979
  • Add CogVideoX DDIM Inversion to Community Pipelines by @LittleNyima in #10956
  • fix wan i2v pipeline bugs by @yupeng1111 in #10975
  • Hunyuan I2V by @a-r-r-o-w in #10983
  • Fix Graph Breaks When Compiling CogView4 by @chengzeyi in #10959
  • Wan VAE move scaling to pipeline by @hlky in #10998
  • [LoRA] remove full key prefix from peft. by @sayakpaul in #11004
  • [Single File] Add single file support for Wan T2V/I2V by @DN6 in #10991
  • Add STG to community pipelines by @kinam0252 in #10960
  • [LoRA] Improve copied from comments in the LoRA loader classes by @sayakpaul in #10995
  • Fix for fetching variants only by @DN6 in #10646
  • [Quantization] Add Quanto backend by @DN6 in #10756
  • [Single File] Add single file loading for SANA Transformer by @ishan-modi in #10947
  • [LoRA] Improve warning messages when LoRA loading becomes a no-op by @sayakpaul in #10187
  • [LoRA] CogView4 by @a-r-r-o-w in #10981
  • [Tests] improve quantization tests by additionally measuring the inference memory savings by @sayakpaul in #11021
  • [Research Project] Add AnyText: Multilingual Visual Text Generation And Editing by @tolgacangoz in #8998
  • [Quantization] Allow loading TorchAO serialized Tensor objects with torch>=2.6 by @DN6 in #11018
  • fix: mixture tiling sdxl pipeline - adjust gerating time_ids & embeddings by @elismasilva in #11012
  • [LoRA] support wan i2v loras from the world. by @sayakpaul in #11025
  • Fix SD3 IPAdapter feature extractor by @hlky in #11027
  • chore: fix help messages in advanced diffusion examples by @wonderfan in #10923
  • Fix missing **kwargs in lora_pipeline.py by @CyberVy in #11011
  • Fix for multi-GPU WAN inference by @AmericanPresidentJimmyCarter in #10997
  • [Refactor] Clean up import utils boilerplate by @DN6 in #11026
  • Use output_size in repeat_interleave by @hlky in #11030
  • [hybrid inference 🍯🐝] Add VAE encode by @hlky in #11017
  • Wan Pipeline scaling fix, type hint warning, multi generator fix by @hlky in #11007
  • [LoRA] change to warning from info when notifying the users about a LoRA no-op by @sayakpaul in #11044
  • Rename Lumina(2)Text2ImgPipeline -> Lumina(2)Pipeline by @hlky in #10827
  • making formatted_images initialization compact by @YanivDorGalron in #10801
  • Fix aclnnRepeatInterleaveIntWithDim error on NPU for get_1d_rotary_pos_embed by @ZhengKai91 in #10820
  • [Tests] restrict memory tests for quanto for certain schemes. by @sayakpaul in #11052
  • [LoRA] feat: support non-diffusers wan t2v loras. by @sayakpaul in #11059
  • [examples/controlnet/train_controlnet_sd3.py] Fixes #11050 - Cast prompt_embeds and pooled_prompt_embeds to weight_dtype to prevent dtype mismatch by @andjoer in #11051
  • reverts accidental change that removes attn_mask in attn. Improves fl… by @entrpn in #11065
  • Fix deterministic issue when getting pipeline dtype and device by @dimitribarbot in #10696
  • [Tests] add requires peft decorator. by @sayakpaul in #11037
  • CogView4 Control Block by @zRzRzRzRzRzRzR in #10809
  • [CI] pin transformers version for benchmarking. by @sayakpaul in #11067
  • Fix Wan I2V Quality by @chengzeyi in #11087
  • LTX 0.9.5 by @a-r-r-o-w in #10968
  • make PR GPU tests conditioned on styling. by @sayakpaul in #11099
  • Group offloading improvements by @a-r-r-o-w in #11094
  • Fix pipeline_flux_controlnet.py by @co63oc in #11095
  • update readme instructions. by @entrpn in #11096
  • Resolve stride mismatch in UNet's ResNet to support Torch DDP by @jinc7461 in #11098
  • Fix Group offloading behaviour when using streams by @a-r-r-o-w in #11097
  • Quality options in export_to_video by @hlky in #11090
  • [CI] uninstall deps properly from pr gpu tests. by @sayakpaul in #11102
  • [BUG] Fix Autoencoderkl train script by @lavinal712 in #11113
  • [Wan LoRAs] make T2V LoRAs compatible with Wan I2V by @linoytsaban in #11107
  • [tests] enable bnb tests on xpu by @faaany in #11001
  • [fix bug] PixArt inference_steps=1 by @lawrence-cj in #11079
  • Flux with Remote Encode by @hlky in #11091
  • [tests] make cuda only tests device-agnostic by @faaany in #11058
  • Provide option to reduce CPU RAM usage in Group Offload by @DN6 in #11106
  • remove F.rms_norm for now by @yiyixuxu in #11126
  • Notebooks for Community Scripts-8 by @ParagEkbote in #11128
  • fix callback_tensor_inputs of sd controlnet inpaint pipeline missing some elements by @CyberVy in #11073
  • [core] FasterCache by @a-r-r-o-w in #10163
  • add sana-sprint by @yiyixuxu in #11074
  • Don't override torch_dtype and don't use when quantization_config is set by @hlky in #11039
  • Update README and example code for AnyText usage by @tolgacangoz in #11028
  • Modify the implementation of retrieve_timesteps in CogView4-Control. by @zRzRzRzRzRzRzR in #11125
  • [fix SANA-Sprint] by @lawrence-cj in #11142
  • New HunyuanVideo-I2V by @a-r-r-o-w in #11066
  • [doc] Fix Korean Controlnet Train doc by @flyxiv in #11141
  • Improve information about group offloading and layerwise casting by @a-r-r-o-w in #11101
  • add a timestep scale for sana-sprint teacher model by @lawrence-cj in #11150
  • [Quantization] dtype fix for GGUF + fix BnB tests by @DN6 in #11159
  • Set self.hf_peft_config_loaded to True when LoRA is loaded using load_lora_adapter in PeftAdapterMixin class by @kentdan3msu in #11155
  • WanI2V encode_image by @hlky in #11164
  • [Docs] Update Wan Docs with memory optimizations by @DN6 in #11089
  • Fix LatteTransformer3DModel dtype mismatch with enable_temporal_attentions by @hlky in #11139
  • Raise warning and round down if Wan num_frames is not 4k + 1 by @a-r-r-o-w in #11167
  • [Docs] Fix environment variables in installation.md by @remarkablemark in #11179
  • Add latents_mean and latents_std to SDXLLongPromptWeightingPipeline by @hlky in #11034
  • Bug fix in LTXImageToVideoPipeline.prepare_latents() when latents is already set by @kakukakujirori in #10918
  • [tests] no hard-coded cuda by @faaany in #11186
  • [WIP] Add Wan Video2Video by @DN6 in #11053
  • map BACKEND_RESET_MAX_MEMORY_ALLOCATED to reset_peak_memory_stats on XPU by @yao-matrix in #11191
  • fix autocast by @jiqing-feng in #11190
  • fix: for checking mandatory and optional pipeline components by @elismasilva in #11189
  • remove unnecessary call to F.pad by @bm-synth in #10620
  • allow models to run with a user-provided dtype map instead of a single dtype by @hlky in #10301
  • [tests] HunyuanDiTControlNetPipeline inference precision issue on XPU by @faaany in #11197
  • Revert save_model in ModelMixin save_pretrained and use safe_serialization=False in test by @hlky in #11196
  • [docs] torch_dtype map by @hlky in #11194
  • Fix enable_sequential_cpu_offload in CogView4Pipeline by @hlky in #11195
  • SchedulerMixin from_pretrained and ConfigMixin Self type annotation by @hlky in #11192
  • Update import_utils.py by @Lakshaysharma048 in #10329
  • Add CacheMixin to Wan and LTX Transformers by @DN6 in #11187
  • feat: [Community Pipeline] - FaithDiff Stable Diffusion XL Pipeline by @elismasilva in #11188
  • [Model Card] standardize advanced diffusion training sdxl lora by @chiral-carbon in #7615
  • Change KolorsPipeline LoRA Loader to StableDiffusion by @BasileLewan in #11198
  • Update Style Bot workflow by @hanouticelina in #11202
  • Fixed requests.get function call by adding timeout parameter. by @kghamilton89 in #11156
  • Fix Single File loading for LTX VAE by @DN6 in #11200
  • [feat]Add strength in flux_fill pipeline (denoising strength for fluxfill) by @Suprhimp in #10603
  • [LTX0.9.5] Refactor LTXConditionPipeline for text-only conditioning by @tolgacangoz in #11174
  • Add Wan with STG as a community pipeline by @Ednaordinary in #11184
  • Add missing MochiEncoder3D.gradient_checkpointing attribute by @mjkvaak-amd in #11146
  • enable 1 case on XPU by @yao-matrix in #11219
  • ensure dtype match between diffused latents and vae weights by @heyalexchoi in #8391
  • [docs] MPS update by @stevhliu in #11212
  • Add support to pass image embeddings to the WAN I2V pipeline. by @goiri in #11175
  • [train_controlnet.py] Fix the LR schedulers when num_train_epochs is passed in a distributed training env by @Bhavay-2001 in #8461
  • [Training] Better image interpolation in training scripts by @asomoza in #11206
  • [LoRA] Implement hot-swapping of LoRA by @BenjaminBossan in #9453
  • introduce compute arch specific expectations and fix test_sd3_img2img_inference failure by @yao-matrix in #11227
  • [Flux LoRA] fix issues in flux lora scripts by @linoytsaban in #11111
  • Flux quantized with lora by @hlky in #10990
  • [feat] implement record_stream when using CUDA streams during group offloading by @sayakpaul in #11081
  • [bitsandbytes] improve replacement warnings for bnb by @sayakpaul in #11132
  • minor update to sana sprint docs. by @sayakpaul in #11236
  • [docs] minor updates to dtype map docs. by @sayakpaul in #11237
  • [LoRA] support more comyui loras for Flux 🚨 by @sayakpaul in #10985
  • fix: SD3 ControlNet validation so that it runs on a A100. by @sayakpaul in #11238
  • AudioLDM2 Fixes by @hlky in #11244
  • AutoModel by @hlky in #11115
  • fix FluxReduxSlowTests::test_flux_redux_inference case failure on XPU by @yao-matrix in #11245
  • [docs] AutoModel by @hlky in #11250
  • Update Ruff to latest Version by @DN6 in #10919
  • fix flux controlnet bug by @free001style in #11152
  • fix timeout constant by @sayakpaul in #11252
  • fix consisid imports by @sayakpaul in #11254
  • Release: v0.33.0 by @sayakpaul (direct commit on v0.33.0-release)

Significant community contributions

The following contributors have made significant changes to the library over the last release:

  • @guiyrt
    • IP-Adapter for StableDiffusion3Img2ImgPipeline (#10589)
    • [Docs] Update SD3 ip_adapter model_id to diffusers checkpoint (#10597)
    • MultiControlNetUnionModel on SDXL (#10747)
    • SD3 IP-Adapter runtime checkpoint conversion (#10718)
    • Comprehensive type checking for from_pretrained kwargs (#10758)
    • Multi IP-Adapter for Flux pipelines (#10867)
  • @chengzeyi
    • [Docs] Add documentation about using ParaAttention to optimize FLUX and HunyuanVideo (#10544)
    • Fix Graph Breaks When Compiling CogView4 (#10959)
    • Fix Wan I2V Quality (#11087)
  • @entrpn
    • implementing flux on TPUs with ptxla (#10515)
    • reverts accidental change that removes attn_mask in attn. Improves fl… (#11065)
    • update readme instructions. (#11096)
  • @SHYuanBest
    • [core] ConsisID (#10140)
  • @faaany
    • [tests] make tests device-agnostic (part 3) (#10437)
    • make tensors contiguous before passing to safetensors (#10761)
    • [tests] make tests device-agnostic (part 4) (#10508)
    • [tests] enable bnb tests on xpu (#11001)
    • [tests] make cuda only tests device-agnostic (#11058)
    • [tests] no hard-coded cuda (#11186)
    • [tests] HunyuanDiTControlNetPipeline inference precision issue on XPU (#11197)
  • @yiyixuxu
    • fix offload gpu tests etc (#10366)
    • follow-up refactor on lumina2 (#10776)
    • [Alibaba Wan Team] continue on #10921 Wan2.1 (#10922)
    • update check_input for cogview4 (#10966)
    • remove F.rms_norm for now (#11126)
    • add sana-sprint (#11074)
  • @DN6
    • [CI] Update HF_TOKEN in all workflows (#10613)
    • [CI] Fix Truffle Hog failure (#10769)
    • [Single File] Add Single File support for Lumina Image 2.0 Transformer (#10781)
    • [CI] Fix incorrectly named test module for Hunyuan DiT (#10854)
    • [CI] Update always test Pipelines list in Pipeline fetcher (#10856)
    • [Docs] Fix toctree sorting (#10894)
    • [CI] Improvements to conditional GPU PR tests (#10859)
    • [CI] Fix Fast GPU tests on PR (#10912)
    • [CI] Fix for failing IP Adapter test in Fast GPU PR tests (#10915)
    • [CI] Update Stylebot Permissions (#10931)
    • [Single File] Add user agent to SF download requests. (#10979)
    • [Single File] Add single file support for Wan T2V/I2V (#10991)
    • Fix for fetching variants only (#10646)
    • [Quantization] Add Quanto backend (#10756)
    • [Quantization] Allow loading TorchAO serialized Tensor objects with torch>=2.6 (#11018)
    • [Refactor] Clean up import utils boilerplate (#11026)
    • Provide option to reduce CPU RAM usage in Group Offload (#11106)
    • [Quantization] dtype fix for GGUF + fix BnB tests (#11159)
    • [Docs] Update Wan Docs with memory optimizations (#11089)
    • [WIP] Add Wan Video2Video (#11053)
    • Add CacheMixin to Wan and LTX Transformers (#11187)
    • Fix Single File loading for LTX VAE (#11200)
    • Update Ruff to latest Version (#10919)
  • @Anonym0u3
    • Add pipeline_stable_diffusion_xl_attentive_eraser (#10579)
  • @lavinal712
    • create a script to train autoencoderkl (#10605)
    • [BUG] Fix Autoencoderkl train script (#11113)
  • @Marlon154
    • Add community pipeline for semantic guidance for FLUX (#10610)
  • @ParagEkbote
    • Fix Documentation about Image-to-Image Pipeline (#10704)
    • Notebooks for Community Scripts-6 (#10713)
    • Extend Support for callback_on_step_end for AuraFlow and LuminaText2Img Pipelines (#10746)
    • Notebooks for Community Scripts-7 (#10846)
    • Add Example of IPAdapterScaleCutoffCallback to Docs (#10934)
    • Notebooks for Community Scripts-8 (#11128)
  • @suzukimain
    • [Community] Enhanced Model Search (#10417)
  • @staoxiao
    • Add OmniGen (#10148)
  • @elismasilva
    • feat: new community mixture_tiling_sdxl pipeline for SDXL (#10759)
    • fix: [Community pipeline] Fix flattened elements on image (#10774)
    • feat: add Mixture-of-Diffusers ControlNet Tile upscaler Pipeline for SDXL (#10951)
    • fix: mixture tiling sdxl pipeline - adjust gerating time_ids & embeddings (#11012)
    • fix: for checking mandatory and optional pipeline components (#11189)
    • feat: [Community Pipeline] - FaithDiff Stable Diffusion XL Pipeline (#11188)
  • @zhuole1025
    • Add support for lumina2 (#10642)
  • @zRzRzRzRzRzRzR
    • CogView4 (supports different length c and uc) (#10649)
    • Update pipeline_cogview4.py (#10944)
    • [Docs] CogView4 comment fix (#10957)
    • CogView4 Control Block (#10809)
    • Modify the implementation of retrieve_timesteps in CogView4-Control. (#11125)
  • @toshas
    • Marigold Update: v1-1 models, Intrinsic Image Decomposition pipeline, documentation (#10884)
  • @bubbliiiing
    • Add EasyAnimateV5.1 text-to-video, image-to-video, control-to-video generation model (#10626)
  • @LittleNyima
    • Add CogVideoX DDIM Inversion to Community Pipelines (#10956)
  • @kinam0252
    • Add STG to community pipelines (#10960)
  • @tolgacangoz
    • [Research Project] Add AnyText: Multilingual Visual Text Generation And Editing (#8998)
    • Update README and example code for AnyText usage (#11028)
    • [LTX0.9.5] Refactor LTXConditionPipeline for text-only conditioning (#11174)
  • @Ednaordinary
    • Add Wan with STG as a community pipeline (#11184)

- Python
Published by sayakpaul 11 months ago

diffusers - v0.32.2

Fixes for Flux Single File loading, LoRA loading for 4bit BnB Flux, Hunyuan Video

This patch release:

  • Fixes a regression in loading ComfyUI-format single file checkpoints for Flux
  • Fixes a regression in loading LoRAs with bitsandbytes 4bit quantized Flux models
  • Adds unload_lora_weights for Flux Control
  • Fixes a bug that prevents Hunyuan Video from running with batch size > 1
  • Allows Hunyuan Video to load LoRAs created with the original repository code

All commits

  • [Single File] Fix loading Flux Dev finetunes with Comfy Prefix by @DN6 in #10545
  • [CI] Update HF Token on Fast GPU Model Tests by @DN6 #10570
  • [CI] Update HF Token in Fast GPU Tests by @DN6 #10568
  • Fix batch > 1 in HunyuanVideo by @hlky in #10548
  • Fix HunyuanVideo produces NaN on PyTorch<2.5 by @hlky in #10482
  • Fix hunyuan video attention mask dim by @a-r-r-o-w in #10454
  • [LoRA] Support original format loras for HunyuanVideo by @a-r-r-o-w in #10376
  • [LoRA] feat: support loading loras into 4bit quantized Flux models. by @sayakpaul in #10578
  • [LoRA] clean up load_lora_into_text_encoder() and fuse_lora() copied from by @sayakpaul in #10495
  • [LoRA] feat: support unload_lora_weights() for Flux Control. by @sayakpaul in #10206
  • Fix Flux multiple Lora loading bug by @maxs-kan in #10388
  • [LoRA] fix: lora unloading when using expanded Flux LoRAs. by @sayakpaul in #10397

- Python
Published by DN6 about 1 year ago

diffusers - v0.32.1

TorchAO Quantizer fixes

This patch release fixes a few bugs related to the TorchAO Quantizer introduced in v0.32.0.

  • Importing Diffusers would raise an error in PyTorch versions lower than 2.3.0. This should no longer be a problem.
  • Device maps did not work as expected when using the quantizer, so we now raise an error if one is used. Support for using device maps with different quantization backends will be added in the near future.
  • Quantization was not performed due to faulty logic. This is now fixed and better tested.

Refer to our documentation to learn more about how to use different quantization backends.
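As a minimal sketch of the configuration side of the TorchAO path this patch hardens (assuming `diffusers>=0.32` with `torchao` installed; `"int8wo"` is one example quant type, see the quantization docs for the supported list):

```python
from diffusers import TorchAoConfig

# int8 weight-only quantization; "int8wo" is one of the quant_type
# strings accepted by the TorchAO backend.
quantization_config = TorchAoConfig("int8wo")
```

The config is then passed via the `quantization_config` argument of a model's `from_pretrained` call (for example when loading a transformer), at which point the weights are quantized on load.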

All commits

  • make style for https://github.com/huggingface/diffusers/pull/10368 by @yiyixuxu in #10370
  • fix test pypi installation in the release workflow by @sayakpaul in #10360
  • Fix TorchAO related bugs; revert device_map changes by @a-r-r-o-w in #10371

- Python
Published by a-r-r-o-w about 1 year ago

diffusers - Diffusers 0.32.0: New video pipelines, new image pipelines, new quantization backends, new training scripts, and more

https://github.com/user-attachments/assets/34d5f7ca-8e33-4401-8109-5c245ce7595f

This release took a while, but it has many exciting updates. It contains several new pipelines for image and video generation, new quantization backends, and more.

Going forward, to provide more transparency to the community about ongoing developments and releases in Diffusers, we will be making use of a roadmap tracker.

New Video Generation Pipelines 📹

Open video generation models are on the rise, and we’re pleased to provide comprehensive integration support for all of them. The following video pipelines are bundled in this release:

Check out this section to learn more about the fine-tuning options available for these new video models.

New Image Generation Pipelines

Important Note about the new Flux Models

We can combine regular Flux.1 Dev LoRAs with Flux Control LoRAs, Flux Control, and Flux Fill. For example, you can enable few-step inference with Flux Fill using:

```python
from diffusers import FluxFillPipeline
from diffusers.utils import load_image
import torch

pipe = FluxFillPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-Fill-dev", torch_dtype=torch.bfloat16
).to("cuda")

adapter_id = "alimama-creative/FLUX.1-Turbo-Alpha"
pipe.load_lora_weights(adapter_id)

image = load_image("https://huggingface.co/datasets/diffusers/diffusers-images-docs/resolve/main/cup.png")
mask = load_image("https://huggingface.co/datasets/diffusers/diffusers-images-docs/resolve/main/cup_mask.png")

image = pipe(
    prompt="a white paper cup",
    image=image,
    mask_image=mask,
    height=1632,
    width=1232,
    guidance_scale=30,
    num_inference_steps=8,
    max_sequence_length=512,
    generator=torch.Generator("cpu").manual_seed(0),
).images[0]
image.save("flux-fill-dev.png")
```

To learn more, check out the documentation.

[!NOTE]
SANA is a small model compared to others like Flux: Sana-0.6B can be deployed on a 16GB laptop GPU and takes less than a second to generate a 1024×1024 image. We also support LoRA fine-tuning of SANA. Check out this section for more details.

Acknowledgements

  • Shoutout to @lawrence-cj and @chenjy2003 for contributing SANA in this PR. SANA also features a Deep Compression Autoencoder, which was contributed by @lawrence-cj in this PR.
  • Shoutout to @guiyrt for contributing SD3.5 IP Adapter in this PR.

New Quantization Backends

Please be aware of the following caveats:

  • TorchAO quantized checkpoints cannot be serialized in safetensors currently. This may change in the future.
  • In this release, GGUF only supports loading pre-quantized checkpoints into models. Support for saving models with GGUF quantization will be added in the future.
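
As a sketch of the load-only GGUF flow, the only choice to make is a config fragment (assuming `diffusers` and the `gguf` package are installed):

```python
import torch
from diffusers import GGUFQuantizationConfig

# GGUF checkpoints ship pre-quantized; the config only selects the dtype
# that quantized weights are dequantized to during compute.
quantization_config = GGUFQuantizationConfig(compute_dtype=torch.bfloat16)
```

This config, together with a path or URL to a `.gguf` checkpoint, goes to a model's `from_single_file` method; as noted above, saving back to GGUF is not supported in this release.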

New training scripts

This release features many new training scripts for the community to play with:

All commits

  • post-release 0.31.0 by @sayakpaul in #9742
  • fix bug in require_accelerate_version_greater by @faaany in #9746
  • [Official callbacks] SDXL Controlnet CFG Cutoff by @asomoza in #9311
  • [SD3-5 dreambooth lora] update model cards by @linoytsaban in #9749
  • config attribute not found error for FluxImagetoImage Pipeline for multi controlnet solved by @rshah240 in #9586
  • Some minor updates to the nightly and push workflows by @sayakpaul in #9759
  • [Docs] fix docstring typo in SD3 pipeline by @shenzhiy21 in #9765
  • [bugfix] bugfix for npu free memory by @leisuzz in #9640
  • [research_projects] add flux training script with quantization by @sayakpaul in #9754
  • Add a doc for AWS Neuron in Diffusers by @JingyaHuang in #9766
  • [refactor] enhance readability of flux related pipelines by @Luciennnnnnn in #9711
  • Added Support of Xlabs controlnet to FluxControlNetInpaintPipeline by @SahilCarterr in #9770
  • [research_projects] Update README.md to include a note about NF4 T5-xxl by @sayakpaul in #9775
  • [Fix] train_dreambooth_lora_flux_advanced ValueError: unexpected save model by @rootonchair in #9777
  • [Fix] remove setting lr for T5 text encoder when using prodigy in flux dreambooth lora script by @biswaroop1547 in #9473
  • [SD 3.5 Dreambooth LoRA] support configurable training block & layers by @linoytsaban in #9762
  • [flux dreambooth lora training] make LoRA target modules configurable + small bug fix by @linoytsaban in #9646
  • adds the pipeline for pixart alpha controlnet by @raulc0399 in #8857
  • [core] Allegro T2V by @a-r-r-o-w in #9736
  • Allegro VAE fix by @a-r-r-o-w in #9811
  • [CI] add new runner for testing by @sayakpaul in #9699
  • [training] fixes to the quantization training script and add AdEMAMix optimizer as an option by @sayakpaul in #9806
  • [training] use the lr when using 8bit adam. by @sayakpaul in #9796
  • [Tests] clean up and refactor gradient checkpointing tests by @sayakpaul in #9494
  • [CI] add a big GPU marker to run memory-intensive tests separately on CI by @sayakpaul in #9691
  • [LoRA] fix: lora loading when using with a device_mapped model. by @sayakpaul in #9449
  • Revert "[LoRA] fix: lora loading when using with a device_mapped mode… by @yiyixuxu in #9823
  • [Model Card] standardize advanced diffusion training sd15 lora by @chiral-carbon in #7613
  • NPU Adaption for FLUX by @leisuzz in #9751
  • Fixes EMAModel "from_pretrained" method by @SahilCarterr in #9779
  • Update train_controlnet_flux.py, fix size mismatch issue in validation by @ScilenceForest in #9679
  • Handling mixed precision for dreambooth flux lora training by @icsl-Jeon in #9565
  • Reduce Memory Cost in Flux Training by @leisuzz in #9829
  • Add Diffusion Policy for Reinforcement Learning by @DorsaRoh in #9824
  • [feat] add load_lora_adapter() for compatible models by @sayakpaul in #9712
  • Refactor training_utils.py by @RogerSinghChugh in #9815
  • [core] Mochi T2V by @a-r-r-o-w in #9769
  • [Fix] Test of sd3 lora by @SahilCarterr in #9843
  • Fix: Remove duplicated comma in distributed_inference.md by @vahidaskari in #9868
  • Add new community pipeline for 'Adaptive Mask Inpainting', introduced in [ECCV2024] ComA by @jellyheadandrew in #9228
  • Updated encode_prompt_with_clip and encode_prompt in train_dreambooth_sd3 by @SahilCarterr in #9800
  • [Core] introduce controlnet module by @sayakpaul in #8768
  • [Flux] reduce explicit device transfers and typecasting in flux. by @sayakpaul in #9817
  • Improve downloads of sharded variants by @DN6 in #9869
  • [fix] Replaced shutil.copy with shutil.copyfile by @SahilCarterr in #9885
  • Enabling gradient checkpointing in eval() mode by @MikeTkachuk in #9878
  • [FIX] Fix TypeError in DreamBooth SDXL when use_dora is False by @SahilCarterr in #9879
  • [Advanced LoRA v1.5] fix: gradient unscaling problem by @sayakpaul in #7018
  • Revert "[Flux] reduce explicit device transfers and typecasting in flux." by @sayakpaul in #9896
  • Feature IP Adapter Xformers Attention Processor by @elismasilva in #9881
  • Notebooks for Community Scripts Examples by @ParagEkbote in #9905
  • Fix Progress Bar Updates in SD 1.5 PAG Img2Img pipeline by @painebenjamin in #9925
  • Update pipeline_flux_img2img.py by @example-git in #9928
  • add depth controlnet sd3 pre-trained checkpoints to docs by @pureexe in #9937
  • Move Wuerstchen Dreambooth to research_projects by @ParagEkbote in #9935
  • Update ip_adapter.py by @mkknightr in #8882
  • Modify apply_overlay for inpainting with padding_mask_crop (Inpainting area: "Only Masked") by @clarkkent0618 in #8793
  • Correct pipeline_output.py to the type Mochi by @twobob in #9945
  • Add all AttnProcessor classes in AttentionProcessor type by @Prgckwb in #9909
  • Fixed Nits in Docs and Example Script by @ParagEkbote in #9940
  • Add server example by @thealmightygrant in #9918
  • CogVideoX 1.5 by @zRzRzRzRzRzRzR in #9877
  • Notebooks for Community Scripts-2 by @ParagEkbote in #9952
  • [advanced flux training] bug fix + reduce memory cost as in #9829 by @linoytsaban in #9838
  • [LoRA] feat: save_lora_adapter() by @sayakpaul in #9862
  • Make CogVideoX RoPE implementation consistent by @a-r-r-o-w in #9963
  • [CI] Unpin torch<2.5 in CI by @DN6 in #9961
  • Move IP Adapter Scripts to research project by @ParagEkbote in #9960
  • add skip_layers argument to SD3 transformer model class by @bghira in #9880
  • Fix beta and exponential sigmas + add tests by @hlky in #9954
  • Flux latents fix by @DN6 in #9929
  • [LoRA] enable LoRA for Mochi-1 by @sayakpaul in #9943
  • Improve control net block index for sd3 by @linjiapro in #9758
  • Update handle single blocks on convert_xlabs_flux_lora_to_diffusers by @raulmosa in #9915
  • fix controlnet module refactor by @yiyixuxu in #9968
  • Fix prepare latent image ids and vae sample generators for flux by @a-r-r-o-w in #9981
  • [Tests] skip nan lora tests on PyTorch 2.5.1 CPU. by @sayakpaul in #9975
  • make pipelines tests device-agnostic (part1) by @faaany in #9399
  • ControlNet from_single_file when already converted by @hlky in #9978
  • Flux Fill, Canny, Depth, Redux by @a-r-r-o-w in #9985
  • [SD3 dreambooth lora] smol fix to checkpoint saving by @linoytsaban in #9993
  • [Docs] add: missing pipelines from the spec. by @sayakpaul in #10005
  • Add prompt about wandb in examples/dreambooth/readme. by @SkyCol in #10014
  • [docs] Fix CogVideoX table by @a-r-r-o-w in #10008
  • Notebooks for Community Scripts-3 by @ParagEkbote in #10032
  • Sd35 controlnet by @yiyixuxu in #10020
  • Add beta, exponential and karras sigmas to FlowMatchEulerDiscreteScheduler by @hlky in #10001
  • Update sdxl reference pipeline to latest sdxl pipeline by @dimitribarbot in #9938
  • [Community Pipeline] Add some feature for regional prompting pipeline by @cjkangme in #9874
  • Add sdxl controlnet reference community pipeline by @dimitribarbot in #9893
  • Change image_gen_aux repository URL by @asomoza in #10048
  • make pipelines tests device-agnostic (part2) by @faaany in #9400
  • [Mochi-1] ensuring to compute the fourier features in FP32 in Mochi encoder by @sayakpaul in #10031
  • [Fix] Syntax error by @SahilCarterr in #10068
  • [CI] Add quantization by @sayakpaul in #9832
  • Add sigmas to Flux pipelines by @hlky in #10081
  • Fixed Nits in Evaluation Docs by @ParagEkbote in #10063
  • fix link in the docs by @coding-famer in #10058
  • fix offloading for sd3.5 controlnets by @yiyixuxu in #10072
  • [Single File] Fix SD3.5 single file loading by @DN6 in #10077
  • Fix num_images_per_prompt>1 with Skip Guidance Layers in StableDiffusion3Pipeline by @hlky in #10086
  • [Single File] Pass token when fetching interpreted config by @DN6 in #10082
  • Interpolate fix on cuda for large output tensors by @pcuenca in #10067
  • Convert sigmas to np.array in FlowMatch set_timesteps by @hlky in #10088
  • fix: missing AutoencoderKL lora adapter by @beniz in #9807
  • Let server decide default repo visibility by @Wauplin in #10047
  • Fix some documentation in ./src/diffusers/models/embeddings.py for demo by @DTG2005 in #9579
  • Don't stale close-to-merge by @pcuenca in #10096
  • Add StableDiffusion3PAGImg2Img Pipeline + Fix SD3 Unconditional PAG by @painebenjamin in #9932
  • Notebooks for Community Scripts-4 by @ParagEkbote in #10094
  • Fix Broken Link in Optimization Docs by @ParagEkbote in #10105
  • DPM++ third order fixes by @StAlKeR7779 in #9104
  • update by @aihao2000 in #7067
  • Avoid compiling a progress bar. by @lsb in #10098
  • [Bug fix] "previous_timestep()" in DDPM scheduling compatible with "trailing" and "linspace" options by @AnandK27 in #9384
  • Fix multi-prompt inference by @hlky in #10103
  • Test skip_guidance_layers in SD3 pipeline by @hlky in #10102
  • Use parameters + buffers when deciding upscale_dtype by @universome in #9882
  • [tests] refactor vae tests by @sayakpaul in #9808
  • add torch_xla support in pipeline_stable_audio.py by @ in #10109
  • Fix pipeline_stable_audio formatting by @hlky in #10114
  • [bitsandbytes] allow directly CUDA placements of pipelines loaded with bnb components by @sayakpaul in #9840
  • Fix Broken Links in ReadMe by @ParagEkbote in #10117
  • Add sigmas to pipelines using FlowMatch by @hlky in #10116
  • [Flux Redux] add prompt & multiple image input by @linoytsaban in #10056
  • Fix a bug in the state dict judgment in ip_adapter.py. by @zhangp365 in #10095
  • Fix a bug for SD35 control net training and improve control net block index by @linjiapro in #10065
  • pass attn mask arg for flux by @yiyixuxu in #10122
  • [docs] load_lora_adapter by @stevhliu in #10119
  • Use torch.device instead of current device index for BnB quantizer by @a-r-r-o-w in #10069
  • [Tests] fix condition argument in xfail. by @sayakpaul in #10099
  • [Tests] xfail incompatible SD configs. by @sayakpaul in #10127
  • [FIX] Bug in FluxPosEmbed by @SahilCarterr in #10115
  • [Guide] Quantize your Diffusion Models with bnb by @ariG23498 in #10012
  • Remove duplicate checks for len(generator) != batch_size when generator is a list by @a-r-r-o-w in #10134
  • [community] Load Models from Sources like Civitai into Existing Pipelines by @suzukimain in #9986
  • [DC-AE] Add the official Deep Compression Autoencoder code (32x, 64x, 128x compression ratios) by @lawrence-cj in #9708
  • fixed a dtype bfloat16 bug in torch_utils.py by @zhangp365 in #10125
  • [LoRA] deprecate save_attn_procs() by @sayakpaul in #10126
  • Update ptxla training by @entrpn in #9864
  • support sd3.5 for controlnet example by @DavyMorgan in #9860
  • [Single file] Support revision argument when loading single file config by @a-r-r-o-w in #10168
  • [community pipeline] Add RF-inversion Flux pipeline by @linoytsaban in #9816
  • Improve post-processing performance by @soof-golan in #10170
  • Use torch in get_3d_rotary_pos_embed/_allegro by @hlky in #10161
  • Flux Control LoRA by @a-r-r-o-w in #9999
  • Add PAG Support for Stable Diffusion Inpaint Pipeline by @darshil0805 in #9386
  • [community pipeline rf-inversion] - fix example in doc by @linoytsaban in #10179
  • Fix Nonetype attribute error when loading multiple Flux loras by @jonathanyin12 in #10182
  • Added error when len(gligen_images) is not equal to len(gligen_phrases) in StableDiffusionGLIGENTextImagePipeline by @SahilCarterr in #10176
  • [Single File] Add single file support for AutoencoderDC by @DN6 in #10183
  • Add ControlNetUnion by @hlky in #10131
  • fix min-snr implementation by @ethansmith2000 in #8466
  • Add support for XFormers in SD3 by @CanvaChen in #8583
  • [LoRA] add a test to ensure set_adapters() and attn kwargs outs match by @sayakpaul in #10110
  • [CI] merge peft pr workflow into the main pr workflow. by @sayakpaul in #10042
  • [WIP][Training] Flux Control LoRA training script by @sayakpaul in #10130
  • [core] LTX Video by @a-r-r-o-w in #10021
  • Ci update tpu by @paulinebm in #10197
  • Remove negative_* from SDXL callback by @hlky in #10203
  • refactor StableDiffusionXLControlNetUnion by @hlky in #10200
  • Update StableDiffusion3Img2ImgPipeline: add image size validation by @ZHJ19970917 in #10166
  • Remove mps workaround for fp16 GELU, which is now supported natively by @skotapati in #10133
  • [RF inversion community pipeline] add eta_decay by @linoytsaban in #10199
  • Allow image resolutions multiple of 8 instead of 64 in SVD pipeline by @mlfarinha in #6646
  • Use torch in get_2d_sincos_pos_embed and get_3d_sincos_pos_embed by @hlky in #10156
  • add reshape to fix usememoryefficient_attention in flax by @entrpn in #7918
  • Add offload option in flux-control training by @Adenialzz in #10225
  • Test error raised when loading normal and expanding loras together in Flux by @a-r-r-o-w in #10188
  • [Sana] Add Sana, including SanaPipeline, SanaPAGPipeline, LinearAttentionProcessor, Flow-based DPM-Solver and so on. by @lawrence-cj in #9982
  • [Tests] update always test pipelines list. by @sayakpaul in #10143
  • Update sana.md with minor corrections by @sayakpaul in #10232
  • [docs] minor stuff to ltx video docs. by @sayakpaul in #10229
  • Fix format issue in push_test yml by @DN6 in #10235
  • [core] Hunyuan Video by @a-r-r-o-w in #10136
  • Update pipeline_controlnet.py: add support for PyTorch XLA by @ in #10222
  • [Docs] add rest of the lora loader mixins to the docs. by @sayakpaul in #10230
  • Use t instead of timestep in _apply_perturbed_attention_guidance by @hlky in #10243
  • Add dynamic_shifting to SD3 by @hlky in #10236
  • Fix use_flow_sigmas by @hlky in #10242
  • Fix ControlNetUnion callback_tensor_inputs by @hlky in #10218
  • Use non-human subject in StableDiffusion3ControlNetPipeline example by @hlky in #10214
  • Add enable_vae_tiling to AllegroPipeline, fix example by @hlky in #10212
  • Fix checkpoint in CogView3PlusPipeline example by @hlky in #10211
  • Fix RePaint Scheduler by @hlky in #10185
  • Add ControlNetUnion to AutoPipeline from_pretrained by @hlky in #10219
  • fix downsample bug in MidResTemporalBlock1D by @holmosaint in #10250
  • [core] TorchAO Quantizer by @a-r-r-o-w in #10009
  • [docs] Add missing AttnProcessors by @stevhliu in #10246
  • [chore] add contribution note for lawrence. by @sayakpaul in #10253
  • Fix copied from comment in Mochi lora loader by @a-r-r-o-w in #10255
  • [LoRA] Support LTX Video by @a-r-r-o-w in #10228
  • [docs] Clarify dtypes for Sana by @a-r-r-o-w in #10248
  • [Single File] Add GGUF support by @DN6 in #9964
  • Fix Mochi Quality Issues by @DN6 in #10033
  • [tests] Remove/rename unsupported quantization torchao type by @a-r-r-o-w in #10263
  • [docs] delete_adapters() by @stevhliu in #10245
  • [Community Pipeline] Fix typo that cause error on regional prompting pipeline by @cjkangme in #10251
  • Add set_shift to FlowMatchEulerDiscreteScheduler by @hlky in #10269
  • [LoRA] feat: lora support for SANA. by @sayakpaul in #10234
  • [chore] fix: licensing headers in mochi and ltx by @sayakpaul in #10275
  • Use torch in get_2d_rotary_pos_embed by @hlky in #10155
  • [chore] fix: reamde -> readme by @sayakpaul in #10276
  • Make time_embed_dim of UNet2DModel changeable by @Bichidian in #10262
  • Support pass kwargs to sd3 custom attention processor by @Matrix53 in #9818
  • Flux Control(Depth/Canny) + Inpaint by @affromero in #10192
  • Fix sigma_last with use_flow_sigmas by @hlky in #10267
  • Fix Doc links in GGUF and Quantization overview docs by @DN6 in #10279
  • Make zeroing prompt embeds for Mochi Pipeline configurable by @DN6 in #10284
  • [Single File] Add single file support for Flux Canny, Depth and Fill by @DN6 in #10288
  • [tests] Fix broken cuda, nightly and lora tests on main for CogVideoX by @a-r-r-o-w in #10270
  • Rename Mochi integration test correctly by @a-r-r-o-w in #10220
  • [tests] remove nullop import checks from lora tests by @a-r-r-o-w in #10273
  • [chore] Update README_sana.md to update the default model by @sayakpaul in #10285
  • Hunyuan VAE tiling fixes and transformer docs by @a-r-r-o-w in #10295
  • Add Flux Control to AutoPipeline by @hlky in #10292
  • Update lora_conversion_utils.py by @zhaowendao30 in #9980
  • Check correct model type is passed to from_pretrained by @hlky in #10189
  • [LoRA] Support HunyuanVideo by @SHYuanBest in #10254
  • [Single File] Add single file support for Mochi Transformer by @DN6 in #10268
  • Allow Mochi Transformer to be split across multiple GPUs by @DN6 in #10300
  • Fix local_files_only for checkpoints with shards by @hlky in #10294
  • Fix failing lora tests after HunyuanVideo lora by @a-r-r-o-w in #10307
  • unet's sample_size attribute is to accept tuple(h, w) in StableDiffusionPipeline by @Foundsheep in #10181
  • Enable Gradient Checkpointing for UNet2DModel (New) by @dg845 in #7201
  • [WIP] SD3.5 IP-Adapter Pipeline Integration by @guiyrt in #9987
  • Add support for sharded models when TorchAO quantization is enabled by @a-r-r-o-w in #10256
  • Make tensors in ResNet contiguous for Hunyuan VAE by @a-r-r-o-w in #10309
  • [Single File] Add GGUF support for LTX by @DN6 in #10298
  • [LoRA] feat: support loading regular Flux LoRAs into Flux Control, and Fill by @sayakpaul in #10259
  • [Tests] add integration tests for lora expansion stuff in Flux. by @sayakpaul in #10318
  • Mochi docs by @DN6 in #9934
  • [Docs] Update ltx_video.md to remove generator from `from_pretrained()` by @sayakpaul in #10316
  • docs: fix a mistake in docstring by @Leojc in #10319
  • [BUG FIX] [Stable Audio Pipeline] Resolve torch.Tensor.new_zeros() TypeError in function prepare_latents caused by audio_vae_length by @syntaxticsugr in #10306
  • [docs] Fix quantization links by @stevhliu in #10323
  • [Sana]add 2K related model for Sana by @lawrence-cj in #10322
  • [Docs] Update gguf.md to remove generator from the pipeline from_pretrained by @sayakpaul in #10299
  • Fix push_tests_mps.yml by @hlky in #10326
  • Fix EMAModel test_from_pretrained by @hlky in #10325
  • Support Flux IP Adapter by @hlky in #10261
  • flux controlnet inpaint config bug by @yigitozgenc in #10291
  • Community hosted weights for diffusers format HunyuanVideo weights by @a-r-r-o-w in #10344
  • Fix enable_sequential_cpu_offload in test_kandinsky_combined by @hlky in #10324
  • update get_parameter_dtype by @yiyixuxu in #10342
  • [Single File] Add Single File support for HunYuan video by @DN6 in #10320
  • [Sana bug] bug fix for 2K model config by @lawrence-cj in #10340
  • .from_single_file() - Add missing .shape by @gau-nernst in #10332
  • Bump minimum TorchAO version to 0.7.0 by @a-r-r-o-w in #10293
  • [docs] fix: torchao example. by @sayakpaul in #10278
  • [tests] Refactor TorchAO serialization fast tests by @a-r-r-o-w in #10271
  • [SANA LoRA] sana lora training tests and misc. by @sayakpaul in #10296
  • [Single File] Fix loading by @DN6 in #10349
  • [Tests] QoL improvements to the LoRA test suite by @sayakpaul in #10304
  • Fix FluxIPAdapterTesterMixin by @hlky in #10354
  • Fix failing CogVideoX LoRA fuse test by @a-r-r-o-w in #10352
  • Rename LTX blocks and docs title by @a-r-r-o-w in #10213
  • [LoRA] test fix by @sayakpaul in #10351
  • [Tests] Fix more tests sayak by @sayakpaul in #10359
  • [core] LTX Video 0.9.1 by @a-r-r-o-w in #10330
  • Release: v0.32.0 by @sayakpaul (direct commit on v0.32.0-release)

Significant community contributions

The following contributors have made significant changes to the library over the last release:

  • @faaany
    • fix bug in require_accelerate_version_greater (#9746)
    • make pipelines tests device-agnostic (part1) (#9399)
    • make pipelines tests device-agnostic (part2) (#9400)
  • @linoytsaban
    • [SD3-5 dreambooth lora] update model cards (#9749)
    • [SD 3.5 Dreambooth LoRA] support configurable training block & layers (#9762)
    • [flux dreambooth lora training] make LoRA target modules configurable + small bug fix (#9646)
    • [advanced flux training] bug fix + reduce memory cost as in #9829 (#9838)
    • [SD3 dreambooth lora] smol fix to checkpoint saving (#9993)
    • [Flux Redux] add prompt & multiple image input (#10056)
    • [community pipeline] Add RF-inversion Flux pipeline (#9816)
    • [community pipeline rf-inversion] - fix example in doc (#10179)
    • [RF inversion community pipeline] add eta_decay (#10199)
  • @raulc0399
    • adds the pipeline for pixart alpha controlnet (#8857)
  • @yiyixuxu
    • Revert "[LoRA] fix: lora loading when using with a device_mapped mode… (#9823)
    • fix controlnet module refactor (#9968)
    • Sd35 controlnet (#10020)
    • fix offloading for sd3.5 controlnets (#10072)
    • pass attn mask arg for flux (#10122)
    • update get_parameter_dtype (#10342)
  • @jellyheadandrew
    • Add new community pipeline for 'Adaptive Mask Inpainting', introduced in [ECCV2024] ComA (#9228)
  • @DN6
    • Improve downloads of sharded variants (#9869)
    • [CI] Unpin torch<2.5 in CI (#9961)
    • Flux latents fix (#9929)
    • [Single File] Fix SD3.5 single file loading (#10077)
    • [Single File] Pass token when fetching interpreted config (#10082)
    • [Single File] Add single file support for AutoencoderDC (#10183)
    • Fix format issue in push_test yml (#10235)
    • [Single File] Add GGUF support (#9964)
    • Fix Mochi Quality Issues (#10033)
    • Fix Doc links in GGUF and Quantization overview docs (#10279)
    • Make zeroing prompt embeds for Mochi Pipeline configurable (#10284)
    • [Single File] Add single file support for Flux Canny, Depth and Fill (#10288)
    • [Single File] Add single file support for Mochi Transformer (#10268)
    • Allow Mochi Transformer to be split across multiple GPUs (#10300)
    • [Single File] Add GGUF support for LTX (#10298)
    • Mochi docs (#9934)
    • [Single File] Add Single File support for HunYuan video (#10320)
    • [Single File] Fix loading (#10349)
  • @ParagEkbote
    • Notebooks for Community Scripts Examples (#9905)
    • Move Wuerstchen Dreambooth to research_projects (#9935)
    • Fixed Nits in Docs and Example Script (#9940)
    • Notebooks for Community Scripts-2 (#9952)
    • Move IP Adapter Scripts to research project (#9960)
    • Notebooks for Community Scripts-3 (#10032)
    • Fixed Nits in Evaluation Docs (#10063)
    • Notebooks for Community Scripts-4 (#10094)
    • Fix Broken Link in Optimization Docs (#10105)
    • Fix Broken Links in ReadMe (#10117)
  • @painebenjamin
    • Fix Progress Bar Updates in SD 1.5 PAG Img2Img pipeline (#9925)
    • Add StableDiffusion3PAGImg2Img Pipeline + Fix SD3 Unconditional PAG (#9932)
  • @hlky
    • Fix beta and exponential sigmas + add tests (#9954)
    • ControlNet from_single_file when already converted (#9978)
    • Add beta, exponential and karras sigmas to FlowMatchEulerDiscreteScheduler (#10001)
    • Add sigmas to Flux pipelines (#10081)
    • Fix num_images_per_prompt>1 with Skip Guidance Layers in StableDiffusion3Pipeline (#10086)
    • Convert sigmas to np.array in FlowMatch set_timesteps (#10088)
    • Fix multi-prompt inference (#10103)
    • Test skip_guidance_layers in SD3 pipeline (#10102)
    • Fix pipeline_stable_audio formatting (#10114)
    • Add sigmas to pipelines using FlowMatch (#10116)
    • Use torch in get_3d_rotary_pos_embed/_allegro (#10161)
    • Add ControlNetUnion (#10131)
    • Remove negative_* from SDXL callback (#10203)
    • refactor StableDiffusionXLControlNetUnion (#10200)
    • Use torch in get_2d_sincos_pos_embed and get_3d_sincos_pos_embed (#10156)
    • Use t instead of timestep in _apply_perturbed_attention_guidance (#10243)
    • Add dynamic_shifting to SD3 (#10236)
    • Fix use_flow_sigmas (#10242)
    • Fix ControlNetUnion callback_tensor_inputs (#10218)
    • Use non-human subject in StableDiffusion3ControlNetPipeline example (#10214)
    • Add enable_vae_tiling to AllegroPipeline, fix example (#10212)
    • Fix checkpoint in CogView3PlusPipeline example (#10211)
    • Fix RePaint Scheduler (#10185)
    • Add ControlNetUnion to AutoPipeline from_pretrained (#10219)
    • Add set_shift to FlowMatchEulerDiscreteScheduler (#10269)
    • Use torch in get_2d_rotary_pos_embed (#10155)
    • Fix sigma_last with use_flow_sigmas (#10267)
    • Add Flux Control to AutoPipeline (#10292)
    • Check correct model type is passed to from_pretrained (#10189)
    • Fix local_files_only for checkpoints with shards (#10294)
    • Fix push_tests_mps.yml (#10326)
    • Fix EMAModel test_from_pretrained (#10325)
    • Support Flux IP Adapter (#10261)
    • Fix enable_sequential_cpu_offload in test_kandinsky_combined (#10324)
    • Fix FluxIPAdapterTesterMixin (#10354)
  • @dimitribarbot
    • Update sdxl reference pipeline to latest sdxl pipeline (#9938)
    • Add sdxl controlnet reference community pipeline (#9893)
  • @suzukimain
    • [community] Load Models from Sources like Civitai into Existing Pipelines (#9986)
  • @lawrence-cj
    • [DC-AE] Add the official Deep Compression Autoencoder code (32x, 64x, 128x compression ratios) (#9708)
    • [Sana] Add Sana, including SanaPipeline, SanaPAGPipeline, LinearAttentionProcessor, Flow-based DPM-Solver and so on. (#9982)
    • [Sana]add 2K related model for Sana (#10322)
    • [Sana bug] bug fix for 2K model config (#10340)
  • @darshil0805
    • Add PAG Support for Stable Diffusion Inpaint Pipeline (#9386)
  • @affromero
    • Flux Control(Depth/Canny) + Inpaint (#10192)
  • @SHYuanBest
    • [LoRA] Support HunyuanVideo (#10254)
  • @guiyrt
    • [WIP] SD3.5 IP-Adapter Pipeline Integration (#9987)

- Python
Published by sayakpaul about 1 year ago

diffusers - v0.31.0

v0.31.0: Stable Diffusion 3.5 Large, CogView3, Quantization, Training Scripts, and more

Stable Diffusion 3.5 Large

Stability AI’s latest text-to-image generation model is Stable Diffusion 3.5 Large, the next iteration of Stable Diffusion 3. It comes with two checkpoints, both with 8B parameters:

  • A regular one
  • A timestep-distilled one enabling few-step inference

Make sure to fill out the form on the model page, and then run `huggingface-cli login` before running the code below.

```python
# Make sure to update diffusers first: `pip install -U diffusers`
import torch
from diffusers import StableDiffusion3Pipeline

pipe = StableDiffusion3Pipeline.from_pretrained(
    "stabilityai/stable-diffusion-3.5-large", torch_dtype=torch.bfloat16
).to("cuda")

image = pipe(
    prompt="a photo of a cat holding a sign that says hello world",
    negative_prompt="",
    num_inference_steps=40,
    height=1024,
    width=1024,
    guidance_scale=4.5,
).images[0]

image.save("sd3_hello_world.png")
```

Follow the documentation to know more.

CogView3-Plus

We added a new text-to-image model, CogView3-Plus, from the THUDM team! The model is DiT-based and supports image generation from 512 to 2048px. Thanks to @zRzRzRzRzRzRzR for contributing it!

```python
import torch
from diffusers import CogView3PlusPipeline

pipe = CogView3PlusPipeline.from_pretrained("THUDM/CogView3-Plus-3B", torch_dtype=torch.float16).to("cuda")

# Enable these optimizations to reduce GPU memory usage
pipe.enable_model_cpu_offload()
pipe.vae.enable_slicing()
pipe.vae.enable_tiling()

prompt = "A vibrant cherry red sports car sits proudly under the gleaming sun, its polished exterior smooth and flawless, casting a mirror-like reflection. The car features a low, aerodynamic body, angular headlights that gaze forward like predatory eyes, and a set of black, high-gloss racing rims that contrast starkly with the red. A subtle hint of chrome embellishes the grille and exhaust, while the tinted windows suggest a luxurious and private interior. The scene conveys a sense of speed and elegance, the car appearing as if it's about to burst into a sprint along a coastal road, with the ocean's azure waves crashing in the background."

image = pipe(
    prompt=prompt,
    guidance_scale=7.0,
    num_images_per_prompt=1,
    num_inference_steps=50,
    width=1024,
    height=1024,
).images[0]

image.save("cogview3.png")
```

Refer to the documentation to know more.

Quantization

We have landed native quantization support in Diffusers, starting with bitsandbytes as its first quantization backend. With this, we hope to see large diffusion models becoming much more accessible to run on consumer hardware.

The example below shows how to run Flux.1 Dev with the NF4 data-type. Make sure you install the libraries:

```bash
pip install -Uq git+https://github.com/huggingface/transformers@main
pip install -Uq bitsandbytes
pip install -Uq diffusers
```

```python
import torch
from diffusers import BitsAndBytesConfig, FluxTransformer2DModel

ckpt_id = "black-forest-labs/FLUX.1-dev"
nf4_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model_nf4 = FluxTransformer2DModel.from_pretrained(
    ckpt_id,
    subfolder="transformer",
    quantization_config=nf4_config,
    torch_dtype=torch.bfloat16,
)
```

Then, we use `model_nf4` to instantiate the `FluxPipeline`:

```python
from diffusers import FluxPipeline

pipeline = FluxPipeline.from_pretrained(
    ckpt_id, transformer=model_nf4, torch_dtype=torch.bfloat16
)
pipeline.enable_model_cpu_offload()

prompt = "A whimsical and creative image depicting a hybrid creature that is a mix of a waffle and a hippopotamus, basking in a river of melted butter amidst a breakfast-themed landscape. It features the distinctive, bulky body shape of a hippo. However, instead of the usual grey skin, the creature's body resembles a golden-brown, crispy waffle fresh off the griddle. The skin is textured with the familiar grid pattern of a waffle, each square filled with a glistening sheen of syrup. The environment combines the natural habitat of a hippo with elements of a breakfast table setting, a river of warm, melted butter, with oversized utensils or plates peeking out from the lush, pancake-like foliage in the background, a towering pepper mill standing in for a tree. As the sun rises in this fantastical world, it casts a warm, buttery glow over the scene. The creature, content in its butter river, lets out a yawn. Nearby, a flock of birds take flight"

image = pipeline(
    prompt=prompt,
    num_inference_steps=50,
    guidance_scale=4.5,
    max_sequence_length=512,
).images[0]
image.save("whimsical.png")
```

Follow the documentation here to know more. Additionally, check out this Colab Notebook that runs Flux.1 Dev in an end-to-end manner with NF4 quantization.

Training scripts

We have a fresh bucket of training scripts with this release:

Video model fine-tuning can be quite expensive. So, we have worked on a repository, cogvideox-factory, which provides memory-optimized scripts to fine-tune the Cog family of models.

Misc

  • We now support the loading of different kinds of Flux LoRAs, including Kohya, TheLastBen, and Xlabs.
  • Loading of Xlabs Flux ControlNets is also now supported. Thanks to @Anghellia for contributing it!

All commits

  • Feature flux controlnet img2img and inpaint pipeline by @ighoshsubho in #9408
  • Remove CogVideoX mentions from single file docs; Test updates by @a-r-r-o-w in #9444
  • set maxshardsize to None for pipeline save_pretrained by @a-r-r-o-w in #9447
  • adapt masked im2im pipeline for SDXL by @noskill in #7790
  • [Flux] add lora integration tests. by @sayakpaul in #9353
  • [training] CogVideoX Lora by @a-r-r-o-w in #9302
  • Several fixes to Flux ControlNet pipelines by @vladmandic in #9472
  • [refactor] LoRA tests by @a-r-r-o-w in #9481
  • [CI] fix nightly model tests by @sayakpaul in #9483
  • [Cog] some minor fixes and nits by @sayakpaul in #9466
  • [Tests] Reduce the model size in the lumina test by @saqlain2204 in #8985
  • Fix the bug of sd3 controlnet training when using gradient checkpointing. by @pibbo88 in #9498
  • [Schedulers] Add exponential sigmas / exponential noise schedule by @hlky in #9499
  • Allow DDPMPipeline half precision by @sbinnee in #9222
  • Add Noise Schedule/Schedule Type to Schedulers Overview documentation by @hlky in #9504
  • fix bugs for sd3 controlnet training by @xduzhangjiayu in #9489
  • [Doc] Fix path and and also import imageio by @LukeLIN-web in #9506
  • [CI] allow faster downloads from the Hub in CI. by @sayakpaul in #9478
  • a few fix for SingleFile tests by @yiyixuxu in #9522
  • Add exponential sigmas to other schedulers and update docs by @hlky in #9518
  • [Community Pipeline] Batched implementation of Flux with CFG by @sayakpaul in #9513
  • Update community_projects.md by @lee101 in #9266
  • [docs] Model sharding by @stevhliu in #9521
  • update get_parameter_dtype by @yiyixuxu in #9526
  • [Doc] Improved level of clarity for latents_to_rgb. by @LagPixelLOL in #9529
  • [Schedulers] Add beta sigmas / beta noise schedule by @hlky in #9509
  • flux controlnet fix (control_modes batch & others) by @yiyixuxu in #9507
  • [Tests] Fix ChatGLMTokenizer by @asomoza in #9536
  • [bug] Precedence of operations in VAE should be slicing -> tiling by @a-r-r-o-w in #9342
  • [LoRA] make set_adapters() method more robust. by @sayakpaul in #9535
  • [examples] add train flux-controlnet scripts in example. by @PromeAIpro in #9324
  • [Tests] [LoRA] clean up the serialization stuff. by @sayakpaul in #9512
  • [Core] fix variant-identification. by @sayakpaul in #9253
  • [refactor] remove conv_cache from CogVideoX VAE by @a-r-r-o-w in #9524
  • [train_instruct_pix2pix.py] Fix the LR schedulers when num_train_epochs is passed in a distributed training env by @AnandK27 in #9316
  • [chore] fix: retain memory utility. by @sayakpaul in #9543
  • [LoRA] support Kohya Flux LoRAs that have text encoders as well by @sayakpaul in #9542
  • Add beta sigmas to other schedulers and update docs by @hlky in #9538
  • Add PAG support to StableDiffusionControlNetPAGInpaintPipeline by @juancopi81 in #8875
  • Support bfloat16 for Upsample2D by @darhsu in #9480
  • fix cogvideox autoencoder decode by @Xiang-cd in #9569
  • [sd3] make sure height and size are divisible by 16 by @yiyixuxu in #9573
  • fix xlabs FLUX lora conversion typo by @Clement-Lelievre in #9581
  • [Chore] add a note on the versions in Flux LoRA integration tests by @sayakpaul in #9598
  • fix vae dtype when accelerate config using --mixed_precision="fp16" by @xduzhangjiayu in #9601
  • refac: docstrings in import_utils.py by @yijun-lee in #9583
  • Fix for use_safetensors parameters, allow use of parameter on loading submodels by @elismasilva in #9576
  • Update distributed_inference.md to include `transformer.device_map` by @sayakpaul in #9553
  • fix: CogVideox train dataset preprocess_data crop video by @glide-the in #9574
  • [LoRA] Handle DoRA better by @sayakpaul in #9547
  • Fixed noise_pred_text referenced before assignment. by @LagPixelLOL in #9537
  • Fix the bug that joint_attention_kwargs is not passed to the FLUX's transformer attention processors by @HorizonWind2004 in #9517
  • refac/pipeline_output by @yijun-lee in #9582
  • [LoRA] allow loras to be loaded with low_cpu_mem_usage. by @sayakpaul in #9510
  • add PAG support for SD Img2Img by @SahilCarterr in #9463
  • make controlnet support interrupt by @pureexe in #9620
  • [LoRA] fix dora test to catch the warning properly. by @sayakpaul in #9627
  • flux controlnet control_guidance_start and control_guidance_end implement by @ighoshsubho in #9571
  • fix IsADirectoryError when running the training code for sd3_dreambooth_lora_16gb.ipynb by @alaister123 in #9634
  • Add Differential Diffusion to Kolors by @saqlain2204 in #9423
  • FluxMultiControlNetModel by @hlky in #9647
  • [CI] replace ubuntu version to 22.04. by @sayakpaul in #9656
  • [docs] Fix xDiT doc image damage by @Eigensystem in #9655
  • [Tests] increase transformers version in test_low_cpu_mem_usage_with_loading by @sayakpaul in #9662
  • Flux - soft inpainting via differential diffusion by @ryanlyn in #9268
  • CogView3Plus DiT by @zRzRzRzRzRzRzR in #9570
  • Improve the performance and suitable for NPU computing by @leisuzz in #9642
  • [Community Pipeline] Add 🪆Matryoshka Diffusion Models by @tolgacangoz in #9157
  • Added Lora Support to SD3 Img2Img Pipeline by @SahilCarterr in #9659
  • Add pred_original_sample to if not return_dict path by @hlky in #9649
  • Convert list/tuple of SD3ControlNetModel to SD3MultiControlNetModel by @hlky in #9652
  • Convert list/tuple of HunyuanDiT2DControlNetModel to HunyuanDiT2DMultiControlNetModel by @hlky in #9651
  • Refactor SchedulerOutput and add pred_original_sample in DPMSolverSDE, Heun, KDPM2Ancestral and KDPM2 by @hlky in #9650
  • Slight performance improvement to Euler, EDMEuler, FlowMatchHeun, KDPM2Ancestral by @hlky in #9616
  • [Fix] when run load pretrained with local_files_only, local variable 'cached_folder' referenced before assignment by @RobinXL in #9376
  • [Chore] fix import of EntryNotFoundError. by @sayakpaul in #9676
  • Dreambooth lora flux bug 3dtensor to 2dtensor by @0x-74 in #9653
  • refactor image_processor.py file by @charchit7 in #9608
  • [doc] Fix some docstrings in src/diffusers/training_utils.py by @mreraser in #9606
  • [docs] refactoring docstrings in community/hd_painter.py by @Jwaminju in #9593
  • [docs] refactoring docstrings in models/embeddings_flax.py by @Jwaminju in #9592
  • Fix some documentation in ./src/diffusers/models/adapter.py by @ahnjj in #9591
  • [training] CogVideoX-I2V LoRA by @a-r-r-o-w in #9482
  • [authored by @Anghellia) Add support of Xlabs Controlnets #9638 by @yiyixuxu in #9687
  • Docs: CogVideoX by @glide-the in #9578
  • Resolves [BUG] 'GatheredParameters' object is not callable by @charchit7 in #9614
  • [LoRA] log a warning when there are missing keys in the LoRA loading. by @sayakpaul in #9622
  • [SD3 dreambooth-lora training] small updates + bug fixes by @linoytsaban in #9682
  • [peft] simple update when unscale by @sweetcocoa in #9689
  • [pipeline] CogVideoX-Fun Control by @a-r-r-o-w in #9671
  • [core] improve VAE encode/decode framewise batching by @a-r-r-o-w in #9684
  • [tests] fix name and unskip CogI2V integration test by @a-r-r-o-w in #9683
  • [Flux] Add advanced training script + support textual inversion inference by @linoytsaban in #9434
  • [refactor] DiffusionPipeline.download by @a-r-r-o-w in #9557
  • [advanced flux lora script] minor updates to readme by @linoytsaban in #9705
  • Fix bug in Textual Inversion Unloading by @bonlime in #9304
  • Add prompt scheduling callback to community scripts by @hlky in #9718
  • [CI] pin max torch version to fix CI errors by @a-r-r-o-w in #9709
  • [Docker] pin torch versions in the dockerfiles. by @sayakpaul in #9721
  • make deps_table_update to fix CI tests by @a-r-r-o-w in #9720
  • [Quantization] Add quantization support for bitsandbytes by @sayakpaul in #9213
  • Fix typo in cogvideo pipeline by @lichenyu20 in #9722
  • [Docs] docs to xlabs controlnets. by @sayakpaul in #9688
  • [docs] add docstrings in pipline_stable_diffusion.py by @jeongiin in #9590
  • minor doc/test update by @yiyixuxu in #9734
  • [bugfix] reduce float value error when adding noise by @gameofdimension in #9004
  • fix singlestep dpm tests by @yiyixuxu in #9716
  • Fix schedule_shifted_power usage in 🪆Matryoshka Diffusion Models by @tolgacangoz in #9723
  • Update sd3 controlnet example by @DavyMorgan in #9735
  • [Fix] Using sharded checkpoints with gated repositories by @asomoza in #9737
  • [bitsandbbytes] follow-ups by @sayakpaul in #9730
  • Fix typos by @DN6 in #9739
  • is_safetensors_compatible fix by @DN6 in #9741
  • Release: v0.31.0 by @sayakpaul (direct commit on v0.31.0-release)

Significant community contributions

The following contributors have made significant changes to the library over the last release:

  • @ighoshsubho
    • Feature flux controlnet img2img and inpaint pipeline (#9408)
    • flux controlnet control_guidance_start and control_guidance_end implement (#9571)
  • @noskill
    • adapt masked im2im pipeline for SDXL (#7790)
  • @saqlain2204
    • [Tests] Reduce the model size in the lumina test (#8985)
    • Add Differential Diffusion to Kolors (#9423)
  • @hlky
    • [Schedulers] Add exponential sigmas / exponential noise schedule (#9499)
    • Add Noise Schedule/Schedule Type to Schedulers Overview documentation (#9504)
    • Add exponential sigmas to other schedulers and update docs (#9518)
    • [Schedulers] Add beta sigmas / beta noise schedule (#9509)
    • Add beta sigmas to other schedulers and update docs (#9538)
    • FluxMultiControlNetModel (#9647)
    • Add pred_original_sample to if not return_dict path (#9649)
    • Convert list/tuple of SD3ControlNetModel to SD3MultiControlNetModel (#9652)
    • Convert list/tuple of HunyuanDiT2DControlNetModel to HunyuanDiT2DMultiControlNetModel (#9651)
    • Refactor SchedulerOutput and add pred_original_sample in DPMSolverSDE, Heun, KDPM2Ancestral and KDPM2 (#9650)
    • Slight performance improvement to Euler, EDMEuler, FlowMatchHeun, KDPM2Ancestral (#9616)
    • Add prompt scheduling callback to community scripts (#9718)
  • @yiyixuxu
    • a few fix for SingleFile tests (#9522)
    • update get_parameter_dtype (#9526)
    • flux controlnet fix (control_modes batch & others) (#9507)
    • [sd3] make sure height and size are divisible by 16 (#9573)
    • [authored by @Anghellia) Add support of Xlabs Controlnets #9638 (#9687)
    • minor doc/test update (#9734)
    • fix singlestep dpm tests (#9716)
  • @PromeAIpro
    • [examples] add train flux-controlnet scripts in example. (#9324)
  • @juancopi81
    • Add PAG support to StableDiffusionControlNetPAGInpaintPipeline (#8875)
  • @glide-the
    • fix: CogVideox train dataset preprocess_data crop video (#9574)
    • Docs: CogVideoX (#9578)
  • @SahilCarterr
    • add PAG support for SD Img2Img (#9463)
    • Added Lora Support to SD3 Img2Img Pipeline (#9659)
  • @ryanlyn
    • Flux - soft inpainting via differential diffusion (#9268)
  • @zRzRzRzRzRzRzR
    • CogView3Plus DiT (#9570)
  • @tolgacangoz
    • [Community Pipeline] Add 🪆Matryoshka Diffusion Models (#9157)
    • Fix schedule_shifted_power usage in 🪆Matryoshka Diffusion Models (#9723)
  • @linoytsaban
    • [SD3 dreambooth-lora training] small updates + bug fixes (#9682)
    • [Flux] Add advanced training script + support textual inversion inference (#9434)
    • [advanced flux lora script] minor updates to readme (#9705)

- Python
Published by sayakpaul over 1 year ago

diffusers - v0.30.3: CogVideoX Image-to-Video and Video-to-Video

This patch release adds Diffusers support for the upcoming CogVideoX-5B-I2V release (an Image-to-Video generation model)! The model weights will be available by end of the week on the HF Hub at THUDM/CogVideoX-5b-I2V (Link). Stay tuned for the release!

This release features two new pipelines:

  • CogVideoXImageToVideoPipeline
  • CogVideoXVideoToVideoPipeline

Additionally, we now have support for tiled encoding in the CogVideoX VAE. This can be enabled by calling the vae.enable_tiling() method, and it is used in the new Video-to-Video pipeline to encode sample videos to latents in a memory-efficient manner.

CogVideoXImageToVideoPipeline

The code below demonstrates how to use the new image-to-video pipeline:

```python
import torch
from diffusers import CogVideoXImageToVideoPipeline
from diffusers.utils import export_to_video, load_image

pipe = CogVideoXImageToVideoPipeline.from_pretrained("THUDM/CogVideoX-5b-I2V", torch_dtype=torch.bfloat16)
pipe.to("cuda")

# Optionally, enable memory optimizations.
# If enabling CPU offloading, remember to remove `pipe.to("cuda")` above.
pipe.enable_model_cpu_offload()
pipe.vae.enable_tiling()

prompt = "An astronaut hatching from an egg, on the surface of the moon, the darkness and depth of space realised in the background. High quality, ultrarealistic detail and breath-taking movie-like camera shot."
image = load_image(
    "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/astronaut.jpg"
)
video = pipe(image, prompt, use_dynamic_cfg=True)
export_to_video(video.frames[0], "output.mp4", fps=8)
```

CogVideoXVideoToVideoPipeline

The code below demonstrates how to use the new video-to-video pipeline:

```python
import torch
from diffusers import CogVideoXDPMScheduler, CogVideoXVideoToVideoPipeline
from diffusers.utils import export_to_video, load_video

# Models: "THUDM/CogVideoX-2b" or "THUDM/CogVideoX-5b"
pipe = CogVideoXVideoToVideoPipeline.from_pretrained("THUDM/CogVideoX-5b-trial", torch_dtype=torch.bfloat16)
pipe.scheduler = CogVideoXDPMScheduler.from_config(pipe.scheduler.config)
pipe.to("cuda")

input_video = load_video(
    "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/hiker.mp4"
)
prompt = (
    "An astronaut stands triumphantly at the peak of a towering mountain. Panorama of rugged peaks and "
    "valleys. Very futuristic vibe and animated aesthetic. Highlights of purple and golden colors in "
    "the scene. The sky looks like an animated/cartoonish dream of galaxies, nebulae, stars, planets, "
    "moons, but the remainder of the scene is mostly realistic."
)

video = pipe(
    video=input_video, prompt=prompt, strength=0.8, guidance_scale=6, num_inference_steps=50
).frames[0]
export_to_video(video, "output.mp4", fps=8)
```

Shoutout to @tin2tin for the awesome demonstration!

Refer to our documentation to learn more about it.

All commits

  • [core] Support VideoToVideo with CogVideoX by @a-r-r-o-w in #9333
  • [core] CogVideoX memory optimizations in VAE encode by @a-r-r-o-w in #9340
  • [CI] Quick fix for Cog Video Test by @DN6 in #9373
  • [refactor] move positional embeddings to patch embed layer for CogVideoX by @a-r-r-o-w in #9263
  • CogVideoX-5b-I2V support by @zRzRzRzRzRzRzR in #9418

- Python
Published by a-r-r-o-w over 1 year ago

diffusers - v0.30.2: Update from single file default repository

All commits

  • update runway repo for single_file by @yiyixuxu in #9323
  • Fix Flux CLIP prompt embeds repeat for num_images_per_prompt > 1 by @DN6 in #9280
  • [IP Adapter] Fix cache_dir and local_files_only for image encoder by @asomoza in #9272

- Python
Published by asomoza over 1 year ago

diffusers - V0.30.1: CogVideoX-5B & Bug fixes

CogVideoX-5B

This patch release adds diffusers support for the upcoming CogVideoX-5B release! The model weights will be available next week on the Hugging Face Hub at THUDM/CogVideoX-5b. Stay tuned for the release!

Additionally, we have implemented VAE tiling, which reduces the memory requirements of the CogVideoX models. With this update, the total memory requirement is now 12GB for CogVideoX-2B and 21GB for CogVideoX-5B (with CPU offloading). To enable this feature, simply call enable_tiling() on the VAE.

The code below shows how to generate a video with CogVideoX-5B:

```python
import torch
from diffusers import CogVideoXPipeline
from diffusers.utils import export_to_video

prompt = "Tracking shot, late afternoon light casting long shadows, a cyclist in athletic gear pedaling down a scenic mountain road, winding path with trees and a lake in the background, invigorating and adventurous atmosphere."

pipe = CogVideoXPipeline.from_pretrained(
    "THUDM/CogVideoX-5b", torch_dtype=torch.bfloat16
)

pipe.enable_model_cpu_offload()
pipe.vae.enable_tiling()

video = pipe(
    prompt=prompt,
    num_videos_per_prompt=1,
    num_inference_steps=50,
    num_frames=49,
    guidance_scale=6,
).frames[0]

export_to_video(video, "output.mp4", fps=8)
```

https://github.com/user-attachments/assets/c2d4f7e8-ef86-4da6-8085-cb9f83f47f34

Refer to our documentation to learn more about it.

All commits

  • Update Video Loading/Export to use imageio by @DN6 in #9094
  • [refactor] CogVideoX followups + tiled decoding support by @a-r-r-o-w in #9150
  • Add Learned PE selection for Auraflow by @cloneofsimo in #9182
  • [Single File] Fix configuring scheduler via legacy kwargs by @DN6 in #9229
  • [Flux LoRA] support parsing alpha from a flux lora state dict. by @sayakpaul in #9236
  • [tests] fix broken xformers tests by @a-r-r-o-w in #9206
  • Cogvideox-5B Model adapter change by @zRzRzRzRzRzRzR in #9203
  • [Single File] Support loading Comfy UI Flux checkpoints by @DN6 in #9243

- Python
Published by yiyixuxu over 1 year ago

diffusers - v0.30.0: New Pipelines (Flux, Stable Audio, Kolors, CogVideoX, Latte, and more), New Methods (FreeNoise, SparseCtrl), and New Refactors

New pipelines


Image taken from Lumina’s GitHub.

This release features many new pipelines. Below, we provide a list:

Audio pipelines 🎼

Video pipelines 📹

  • Latte (thanks to @maxin-cn for the contribution through #8404)
  • CogVideoX (thanks to @zRzRzRzRzRzRzR for the contribution through #9082)

Image pipelines 🎇

Be sure to check out the respective docs to know more about these pipelines. Some additional pointers are below for curious minds:

  • Lumina introduces a new DiT architecture that is multilingual in nature.
  • Kolors is inspired by SDXL and is also multilingual in nature.
  • Flux introduces the largest (more than 12B parameters!) open-sourced DiT variant available to date. For efficient DreamBooth + LoRA training, we recommend @bghira’s guide here.
  • We have worked on a guide that shows how to quantize these large pipelines for memory efficiency with optimum.quanto. Check it out here.
  • CogVideoX introduces a novel and truly 3D VAE into Diffusers.

Perturbed Attention Guidance (PAG)

| Without PAG | With PAG |
|-------------|----------|

We already had community pipelines for PAG, but given its usefulness, we decided to make it a first-class citizen of the library. We have a central usage guide for PAG here, which should be the entry point for a user interested in understanding and using PAG for their use cases. We currently support the following pipelines with PAG:

  • StableDiffusionPAGPipeline
  • StableDiffusion3PAGPipeline
  • StableDiffusionControlNetPAGPipeline
  • StableDiffusionXLPAGPipeline
  • StableDiffusionXLPAGImg2ImgPipeline
  • StableDiffusionXLPAGInpaintPipeline
  • StableDiffusionXLControlNetPAGPipeline
  • PixArtSigmaPAGPipeline
  • HunyuanDiTPAGPipeline
  • AnimateDiffPAGPipeline
  • KolorsPAGPipeline

If you’re interested in helping us extend our PAG support for other pipelines, please check out this thread. Special thanks to Ahn Donghoon (@sunovivid), the author of PAG, for helping us with the integration and adding PAG support to SD3.

AnimateDiff with SparseCtrl

SparseCtrl introduces controllability into text-to-video diffusion models by leveraging signals such as line/edge sketches, depth maps, and RGB images. Inspired by ControlNet, it incorporates an additional condition encoder to process these signals within the AnimateDiff framework. It can be applied to a diverse set of applications, such as interpolation or video prediction (filling in the gaps between a sequence of images for animation), personalized image animation, sketch-to-video, depth-to-video, and more. It was introduced in SparseCtrl: Adding Sparse Controls to Text-to-Video Diffusion Models.

There are two SparseCtrl-specific checkpoints and a Motion LoRA made available by the authors.

Scribble Interpolation Example:


```python
import torch

from diffusers import AnimateDiffSparseControlNetPipeline, AutoencoderKL, MotionAdapter, SparseControlNetModel
from diffusers.schedulers import DPMSolverMultistepScheduler
from diffusers.utils import export_to_gif, load_image

device = "cuda"
motion_adapter = MotionAdapter.from_pretrained("guoyww/animatediff-motion-adapter-v1-5-3", torch_dtype=torch.float16).to(device)
controlnet = SparseControlNetModel.from_pretrained("guoyww/animatediff-sparsectrl-scribble", torch_dtype=torch.float16).to(device)
vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-mse", torch_dtype=torch.float16).to(device)
pipe = AnimateDiffSparseControlNetPipeline.from_pretrained(
    "SG161222/Realistic_Vision_V5.1_noVAE",
    motion_adapter=motion_adapter,
    controlnet=controlnet,
    vae=vae,
    torch_dtype=torch.float16,
).to(device)
pipe.scheduler = DPMSolverMultistepScheduler.from_config(
    pipe.scheduler.config, beta_schedule="linear", algorithm_type="dpmsolver++", use_karras_sigmas=True
)
pipe.load_lora_weights("guoyww/animatediff-motion-lora-v1-5-3", adapter_name="motion_lora")
pipe.fuse_lora(lora_scale=1.0)

prompt = "an aerial view of a cyberpunk city, night time, neon lights, masterpiece, high quality"
negative_prompt = "low quality, worst quality, letterboxed"

image_files = [
    "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/animatediff-scribble-1.png",
    "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/animatediff-scribble-2.png",
    "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/animatediff-scribble-3.png",
]
condition_frame_indices = [0, 8, 15]
conditioning_frames = [load_image(img_file) for img_file in image_files]

video = pipe(
    prompt=prompt,
    negative_prompt=negative_prompt,
    num_inference_steps=25,
    conditioning_frames=conditioning_frames,
    controlnet_conditioning_scale=1.0,
    controlnet_frame_indices=condition_frame_indices,
    generator=torch.Generator().manual_seed(1337),
).frames[0]
export_to_gif(video, "output.gif")
```

📜 Check out the docs here.

FreeNoise for AnimateDiff

FreeNoise is a training-free method that allows extending the generative capabilities of pretrained video diffusion models beyond their existing context/frame limits.

Instead of initializing noise for all frames at once, FreeNoise reschedules a sequence of noises for long-range correlation and performs temporal attention over them using a window-based function. We have added FreeNoise to the AnimateDiff family of models in Diffusers, allowing them to generate videos beyond their default 32-frame limit.
 

```python
import torch
from diffusers import AnimateDiffPipeline, EulerAncestralDiscreteScheduler, MotionAdapter
from diffusers.utils import export_to_gif

adapter = MotionAdapter.from_pretrained("guoyww/animatediff-motion-adapter-v1-5-2", torch_dtype=torch.float16)
pipe = AnimateDiffPipeline.from_pretrained(
    "SG161222/Realistic_Vision_V6.0_B1_noVAE", motion_adapter=adapter, torch_dtype=torch.float16
)
pipe.scheduler = EulerAncestralDiscreteScheduler(
    beta_schedule="linear",
    beta_start=0.00085,
    beta_end=0.012,
)

pipe.enable_free_noise()
pipe.vae.enable_slicing()

pipe.enable_model_cpu_offload()
frames = pipe(
    "An astronaut riding a horse on Mars.",
    num_frames=64,
    num_inference_steps=20,
    guidance_scale=7.0,
    decode_chunk_size=2,
).frames[0]

export_to_gif(frames, "freenoise-64.gif")
```

LoRA refactor

We have significantly refactored the loader classes associated with LoRA. Going forward, this will help in adding LoRA support for new pipelines and models. We now have a LoraBaseMixin class which is subclassed by the different pipeline-level LoRA loading classes such as StableDiffusionXLLoraLoaderMixin. This document provides an overview of the available classes.

Additionally, we have increased the coverage of methods within the PeftAdapterMixin class. This refactoring allows all the supported models to share common LoRA functionalities such as set_adapter(), add_adapter(), and so on.

To learn more details, please follow this PR. If you see any LoRA-related issues stemming from these refactors, please open an issue.

🚨 Fixing attention projection fusion

We discovered that the implementation of fuse_qkv_projections() was broken. This was fixed in this PR. Additionally, this PR added fusion support to AuraFlow and PixArt Sigma. An explanation of where this kind of fusion might be useful is available here.

All commits

  • [Release notification] add some info when there is an error. by @sayakpaul in #8718
  • Modify FlowMatch Scale Noise by @asomoza in #8678
  • Fix json WindowsPath crash by @vincedovy in #8662
  • Motion Model / Adapter versatility by @Arlaz in #8301
  • [Chore] perform better deprecation for vqmodeloutput by @sayakpaul in #8719
  • [Advanced dreambooth lora] adjustments to align with canonical script by @linoytsaban in #8406
  • [Tests] Fix precision related issues in slow pipeline tests by @DN6 in #8720
  • fix: ValueError when using FromOriginalModelMixin in subclasses #8440 by @fkcptlst in #8454
  • [Community pipeline] SD3 Differential Diffusion Img2Img Pipeline by @asomoza in #8679
  • Benchmarking workflow fix by @sayakpaul in #8389
  • add PAG support for SD architecture by @shauray8 in #8725
  • shift cache in benchmarking. by @sayakpaul in #8740
  • [train_controlnet_sdxl.py] Fix the LR schedulers when num_train_epochs is passed in a distributed training env by @Bhavay-2001 in #8476
  • fix the LR schedulers for dreambooth_lora by @WenheLI in #8510
  • [Tencent Hunyuan Team] Add HunyuanDiT-v1.2 Support by @gnobitab in #8747
  • Always raise from previous error by @Wauplin in #8751
  • [doc] add a tip about using SDXL refiner with hunyuan-dit and pixart by @yiyixuxu in #8735
  • Remove legacy single file model loading mixins by @DN6 in #8754
  • Allow from_transformer in SD3ControlNetModel by @haofanwang in #8749
  • [SD3 LoRA Training] Fix errors when not training text encoders by @asomoza in #8743
  • [Tests] add test suite for SD3 DreamBooth by @sayakpaul in #8650
  • [hunyuan-dit] refactor HunyuanCombinedTimestepTextSizeStyleEmbedding by @yiyixuxu in #8761
  • Enforce ordering when running Pipeline slow tests by @DN6 in #8763
  • Fix warning in UNetMotionModel by @DN6 in #8756
  • Fix indent in dreambooth lora advanced SD 15 script by @DN6 in #8753
  • Fix mistake in Single File Docs page by @DN6 in #8765
  • Reflect few contributions on philosophy.md that were not reflected on #8294 by @mreraser in #8690
  • correct attention_head_dim for JointTransformerBlock by @yiyixuxu in #8608
  • [LoRA] introduce LoraBaseMixin to promote reusability. by @sayakpaul in #8670
  • Revert "[LoRA] introduce LoraBaseMixin to promote reusability." by @sayakpaul in #8773
  • Allow SD3 DreamBooth LoRA fine-tuning on a free-tier Colab by @sayakpaul in #8762
  • Update README.md to include Colab link by @sayakpaul in #8775
  • [Chore] add dummy lora attention processors to prevent failures in other libs by @sayakpaul in #8777
  • [advanced dreambooth lora] add clip_skip arg by @linoytsaban in #8715
  • [Tencent Hunyuan Team] Add checkpoint conversion scripts and changed controlnet by @gnobitab in #8783
  • Fix minor bug in SD3 img2img test by @a-r-r-o-w in #8779
  • [Tests] fix sharding tests by @sayakpaul in #8764
  • Add vae_roundtrip.py example by @thomaseding in #7104
  • [Single File] Allow loading T5 encoder in mixed precision by @DN6 in #8778
  • Fix saving text encoder weights and kohya weights in advanced dreambooth lora script by @DN6 in #8766
  • Improve model card for push_to_hub trainers by @apolinario in #8697
  • fix loading sharded checkpoints from subfolder by @yiyixuxu in #8798
  • [Alpha-VLLM Team] Add Lumina-T2X to diffusers by @PommesPeter in #8652
  • Fix static typing and doc typos by @zhuoqun-chen in #8807
  • Remove unnecessary lines by @tolgacangoz in #8569
  • Add pipeline_stable_diffusion_3_inpaint.py for SD3 Inference by @IrohXu in #8709
  • [Tests] fix more sharding tests by @sayakpaul in #8797
  • Reformat docstring for get_timestep_embedding by @alanhdu in #8811
  • Latte: Latent Diffusion Transformer for Video Generation by @maxin-cn in #8404
  • [Core] Add Kolors by @asomoza in #8812
  • [Core] Add AuraFlow by @sayakpaul in #8796
  • Add VAE tiling option for SD3 by @DN6 in #8791
  • Add single file loading support for AnimateDiff by @DN6 in #8819
  • [Docs] add AuraFlow docs by @sayakpaul in #8851
  • [Community Pipelines] Accelerate inference of AnimateDiff by IPEX on CPU by @ustcuna in #8643
  • add PAG support sd15 controlnet by @tuanh123789 in #8820
  • [tests] fix typo in pag tests by @a-r-r-o-w in #8845
  • [Docker] include python3.10 dev and solve header missing problem by @sayakpaul in #8865
  • [Cont'd] Add the SDE variant of ~~DPM-Solver~~ and DPM-Solver++ to DPM Single Step by @tolgacangoz in #8269
  • modify pocs. by @sayakpaul in #8867
  • [Core] fix: shard loading and saving when variant is provided. by @sayakpaul in #8869
  • [Chore] allow auraflow latest to be torch compile compatible. by @sayakpaul in #8859
  • Add AuraFlowPipeline and KolorsPipeline to auto map by @Beinsezii in #8849
  • Fix multi-gpu case for train_cm_ct_unconditional.py by @tolgacangoz in #8653
  • [docs] pipeline docs for latte by @a-r-r-o-w in #8844
  • [Chore] add disable forward chunking to SD3 transformer. by @sayakpaul in #8838
  • [Core] remove resume_download from Hub related stuff by @sayakpaul in #8648
  • Add option to SSH into CPU runner. by @DN6 in #8884
  • SSH into cpu runner fix by @DN6 in #8888
  • SSH into cpu runner additional fix by @DN6 in #8893
  • [SDXL] Fix uncaught error with image to image by @asomoza in #8856
  • fix loop bug in SlicedAttnProcessor by @shinetzh in #8836
  • [fix code annotation] Adjust the dimensions of the rotary positional embedding. by @wangqixun in #8890
  • allow tensors in several schedulers step() call by @catwell in #8905
  • Use model_info.id instead of model_info.modelId by @Wauplin in #8912
  • [Training] SD3 training fixes by @sayakpaul in #8917
  • 🌐 [i18n-KO] Translated docs to Korean (added 7 docs and etc) by @Snailpong in #8804
  • [Docs] small fixes to pag guide. by @sayakpaul in #8920
  • Reflect few contributions on ethical_guidelines.md that were not reflected on #8294 by @mreraser in #8914
  • [Tests] proper skipping of request caching test by @sayakpaul in #8908
  • Add attentionless VAE support by @Gothos in #8769
  • [Benchmarking] check if runner helps to restore benchmarking by @sayakpaul in #8929
  • Update pipeline test fetcher by @DN6 in #8931
  • [Tests] reduce the model size in the audioldm2 fast test by @ariG23498 in #7846
  • fix: checkpoint save issue in advanced dreambooth lora sdxl script by @akbaig in #8926
  • [Tests] Improve transformers model test suite coverage - Temporal Transformer by @rootonchair in #8932
  • Fix Colab and Notebook checks for diffusers-cli env by @tolgacangoz in #8408
  • Fix name when saving text inversion embeddings in dreambooth advanced scripts by @DN6 in #8927
  • [Core] fix QKV fusion for attention by @sayakpaul in #8829
  • remove residual i from auraflow. by @sayakpaul in #8949
  • [CI] Skip flaky download tests in PR CI by @DN6 in #8945
  • [AuraFlow] fix long prompt handling by @sayakpaul in #8937
  • Added Code for Gradient Accumulation to work for basic_training by @RandomGamingDev in #8961
  • [AudioLDM2] Fix cache pos for GPT-2 generation by @sanchit-gandhi in #8964
  • [Tests] fix slices of 26 tests (first half) by @sayakpaul in #8959
  • [CI] Slow Test Updates by @DN6 in #8870
  • [tests] speed up animatediff tests by @a-r-r-o-w in #8846
  • [LoRA] introduce LoraBaseMixin to promote reusability. by @sayakpaul in #8774
  • Update TensorRT img2img community pipeline by @asfiyab-nvidia in #8899
  • Enable CivitAI SDXL Inpainting Models Conversion by @mazharosama in #8795
  • Revert "[LoRA] introduce LoraBaseMixin to promote reusability." by @yiyixuxu in #8976
  • fix guidance_scale value not equal to the value in comments by @efwfe in #8941
  • [Chore] remove all is from auraflow. by @sayakpaul in #8980
  • [Chore] add LoraLoaderMixin to the inits by @sayakpaul in #8981
  • Added accelerator based gradient accumulation for basic_example by @RandomGamingDev in #8966
  • [CI] Fix parallelism in nightly tests by @DN6 in #8983
  • [CI] Nightly Test Runner explicitly set runner for Setup Pipeline Matrix by @DN6 in #8986
  • [fix] FreeInit step index out of bounds by @a-r-r-o-w in #8969
  • [core] AnimateDiff SparseCtrl by @a-r-r-o-w in #8897
  • remove unused code from pag attn procs by @a-r-r-o-w in #8928
  • [Kolors] Add IP Adapter by @asomoza in #8901
  • [CI] Update runner configuration for setup and nightly tests by @XciD in #9005
  • [Docs] credit where it's due for Lumina and Latte. by @sayakpaul in #9000
  • handle lora scale and clip skip in lpw sd and sdxl community pipelines by @noskill in #8988
  • [LoRA] fix: animate diff lora stuff. by @sayakpaul in #8995
  • Stable Audio integration by @ylacombe in #8716
  • [core] Move community AnimateDiff ControlNet to core by @a-r-r-o-w in #8972
  • Fix Stable Audio repository id by @ylacombe in #9016
  • PAG variant for AnimateDiff by @a-r-r-o-w in #8789
  • Updates deps for pipeline test fetcher by @DN6 in #9033
  • fix load sharded checkpoint from a subfolder (local path) by @yiyixuxu in #8913
  • [docs] fix pia example by @a-r-r-o-w in #9015
  • Flux pipeline by @sayakpaul in #9043
  • [Core] Add PAG support for PixArtSigma by @sayakpaul in #8921
  • [Flux] allow tests to run by @sayakpaul in #9050
  • Fix Nightly Deps by @DN6 in #9036
  • Update transformer_flux.py by @haofanwang in #9060
  • Errata: Fix typos & \s+$ by @tolgacangoz in #9008
  • [refactor] create modeling blocks specific to AnimateDiff by @a-r-r-o-w in #8979
  • Fix grammar mistake. by @prideout in #9072
  • [Flux] minor documentation fixes for flux. by @sayakpaul in #9048
  • Update TensorRT txt2img and inpaint community pipelines by @asfiyab-nvidia in #9037
  • type get_attention_scores as optional in get_attention_scores by @psychedelicious in #9075
  • [refactor] apply qk norm in attention processors by @a-r-r-o-w in #9071
  • [FLUX] support LoRA by @sayakpaul in #9057
  • [Tests] Improve transformers model test suite coverage - Latte by @rootonchair in #8919
  • PAG variant for HunyuanDiT, PAG refactor by @a-r-r-o-w in #8936
  • [Docs] add stable cascade unet doc. by @sayakpaul in #9066
  • add sentencepiece as a soft dependency by @yiyixuxu in #9065
  • Fix typos by @omahs in #9077
  • Update CLIPFeatureExtractor to CLIPImageProcessor and DPTFeatureExtractor to DPTImageProcessor by @tolgacangoz in #9002
  • [Core] add QKV fusion to AuraFlow and PixArt Sigma by @sayakpaul in #8952
  • [bug] remove unreachable norm_type=ada_norm_continuous from norm3 initialization conditions by @a-r-r-o-w in #9006
  • [Tests] Improve transformers model test suite coverage - Hunyuan DiT by @rootonchair in #8916
  • update by @DN6 (direct commit on v0.30.0-release)
  • [Docs] Add community projects section to docs by @DN6 in #9013
  • add PAG support for Stable Diffusion 3 by @sunovivid in #8861
  • Fix loading sharded checkpoints when we have variants by @SunMarc in #9061
  • [Single File] Add single file support for Flux Transformer by @DN6 in #9083
  • [Kolors] Add PAG by @asomoza in #8934
  • fix train_dreambooth_lora_sd3.py loading hook by @sayakpaul in #9107
  • [core] FreeNoise by @a-r-r-o-w in #8948
  • Flux fp16 inference fix by @latentCall145 in #9097
  • [feat] allow sparsectrl to be loaded from single file by @a-r-r-o-w in #9073
  • Freenoise change vae_batch_size to decode_chunk_size by @DN6 in #9110
  • Add CogVideoX text-to-video generation model by @zRzRzRzRzRzRzR in #9082
  • Release: v0.30.0 by @sayakpaul (direct commit on v0.30.0-release)

Significant community contributions

The following contributors have made significant changes to the library over the last release:

  • @DN6
    • [Tests] Fix precision related issues in slow pipeline tests (#8720)
    • Remove legacy single file model loading mixins (#8754)
    • Enforce ordering when running Pipeline slow tests (#8763)
    • Fix warning in UNetMotionModel (#8756)
    • Fix indent in dreambooth lora advanced SD 15 script (#8753)
    • Fix mistake in Single File Docs page (#8765)
    • [Single File] Allow loading T5 encoder in mixed precision (#8778)
    • Fix saving text encoder weights and kohya weights in advanced dreambooth lora script (#8766)
    • Add VAE tiling option for SD3 (#8791)
    • Add single file loading support for AnimateDiff (#8819)
    • Add option to SSH into CPU runner. (#8884)
    • SSH into cpu runner fix (#8888)
    • SSH into cpu runner additional fix (#8893)
    • Update pipeline test fetcher (#8931)
    • Fix name when saving text inversion embeddings in dreambooth advanced scripts (#8927)
    • [CI] Skip flaky download tests in PR CI (#8945)
    • [CI] Slow Test Updates (#8870)
    • [CI] Fix parallelism in nightly tests (#8983)
    • [CI] Nightly Test Runner explicitly set runner for Setup Pipeline Matrix (#8986)
    • Updates deps for pipeline test fetcher (#9033)
    • Fix Nightly Deps (#9036)
    • update
    • [Docs] Add community projects section to docs (#9013)
    • [Single File] Add single file support for Flux Transformer (#9083)
    • Freenoise change vae_batch_size to decode_chunk_size (#9110)
  • @shauray8
    • add PAG support for SD architecture (#8725)
  • @gnobitab
    • [Tencent Hunyuan Team] Add HunyuanDiT-v1.2 Support (#8747)
    • [Tencent Hunyuan Team] Add checkpoint conversion scripts and changed controlnet (#8783)
  • @yiyixuxu
    • [doc] add a tip about using SDXL refiner with hunyuan-dit and pixart (#8735)
    • [hunyuan-dit] refactor HunyuanCombinedTimestepTextSizeStyleEmbedding (#8761)
    • correct attention_head_dim for JointTransformerBlock (#8608)
    • fix loading sharded checkpoints from subfolder (#8798)
    • Revert "[LoRA] introduce LoraBaseMixin to promote reusability." (#8976)
    • fix load sharded checkpoint from a subfolder (local path) (#8913)
    • add sentencepiece as a soft dependency (#9065)
  • @PommesPeter
    • [Alpha-VLLM Team] Add Lumina-T2X to diffusers (#8652)
  • @IrohXu
    • Add pipeline_stable_diffusion_3_inpaint.py for SD3 Inference (#8709)
  • @maxin-cn
    • Latte: Latent Diffusion Transformer for Video Generation (#8404)
  • @ustcuna
    • [Community Pipelines] Accelerate inference of AnimateDiff by IPEX on CPU (#8643)
  • @tuanh123789
    • add PAG support sd15 controlnet (#8820)
  • @Snailpong
    • 🌐 [i18n-KO] Translated docs to Korean (added 7 docs and etc) (#8804)
  • @asfiyab-nvidia
    • Update TensorRT img2img community pipeline (#8899)
    • Update TensorRT txt2img and inpaint community pipelines (#9037)
  • @ylacombe
    • Stable Audio integration (#8716)
    • Fix Stable Audio repository id (#9016)
  • @sunovivid
    • add PAG support for Stable Diffusion 3 (#8861)
  • @zRzRzRzRzRzRzR
    • Add CogVideoX text-to-video generation model (#9082)

- Python
Published by sayakpaul over 1 year ago

diffusers - v0.29.2: fix deprecation and LoRA bugs 🐞

All commits

  • [SD3] Fix mis-matched shape when num_images_per_prompt > 1 using without T5 (text_encoder_3=None) by @Dalanke in #8558
  • [LoRA] refactor lora conversion utility. by @sayakpaul in #8295
  • [LoRA] fix conversion utility so that lora dora loads correctly by @sayakpaul in #8688
  • [Chore] remove deprecation from transformer2d regarding the output class. by @sayakpaul in #8698
  • [LoRA] fix vanilla fine-tuned lora loading. by @sayakpaul in #8691
  • Release: v0.29.2 by @sayakpaul (direct commit on v0.29.2-patch)

- Python
Published by sayakpaul over 1 year ago

diffusers - v0.29.1: SD3 ControlNet, Expanded SD3 `from_single_file` support, Using long Prompts with T5 Text Encoder & Bug fixes

SD3 ControlNet


```python
import torch
from diffusers import StableDiffusion3ControlNetPipeline
from diffusers.models import SD3ControlNetModel, SD3MultiControlNetModel
from diffusers.utils import load_image

controlnet = SD3ControlNetModel.from_pretrained("InstantX/SD3-Controlnet-Canny", torch_dtype=torch.float16)

pipe = StableDiffusion3ControlNetPipeline.from_pretrained(
    "stabilityai/stable-diffusion-3-medium-diffusers",
    controlnet=controlnet,
    torch_dtype=torch.float16,
)
pipe.to("cuda")
control_image = load_image("https://huggingface.co/InstantX/SD3-Controlnet-Canny/resolve/main/canny.jpg")
prompt = "A girl holding a sign that says InstantX"
image = pipe(prompt, control_image=control_image, controlnet_conditioning_scale=0.7).images[0]
image.save("sd3.png")
```

📜 Refer to the official docs here to learn more about it.

Thanks to @haofanwang @wangqixun from the @ResearcherXman team for contributing this pipeline!

Expanded single file support

We now support all available single-file checkpoints for SD3 in diffusers! To load a single-file checkpoint that includes the T5 text encoder:

```python
import torch
from diffusers import StableDiffusion3Pipeline

pipe = StableDiffusion3Pipeline.from_single_file(
    "https://huggingface.co/stabilityai/stable-diffusion-3-medium/blob/main/sd3_medium_incl_clips_t5xxlfp8.safetensors",
    torch_dtype=torch.float16,
)
pipe.enable_model_cpu_offload()

image = pipe("a picture of a cat holding a sign that says hello world").images[0]
image.save("sd3-single-file-t5-fp8.png")
```

Using Long Prompts with the T5 Text Encoder

We increased the default sequence length for the T5 Text Encoder from a maximum of 77 to 256! It can be adjusted to accept fewer or more tokens by setting the max_sequence_length to a maximum of 512. Keep in mind that longer sequences require additional resources and will result in longer generation times. This effect is particularly noticeable during batch inference.

```python
prompt = "A whimsical and creative image depicting a hybrid creature that is a mix of a waffle and a hippopotamus. This imaginative creature features the distinctive, bulky body of a hippo, but with a texture and appearance resembling a golden-brown, crispy waffle. The creature might have elements like waffle squares across its skin and a syrup-like sheen. It’s set in a surreal environment that playfully combines a natural water habitat of a hippo with elements of a breakfast table setting, possibly including oversized utensils or plates in the background. The image should evoke a sense of playful absurdity and culinary fantasy."

image = pipe(
    prompt=prompt,
    negative_prompt="",
    num_inference_steps=28,
    guidance_scale=4.5,
    max_sequence_length=512,
).images[0]
```

(Comparison images: before vs. max_sequence_length=256 vs. max_sequence_length=512 — not shown here.)

All commits

  • Release: v0.29.0 by @sayakpaul (direct commit on v0.29.1-patch)
  • prepare for patch release by @yiyixuxu (direct commit on v0.29.1-patch)
  • fix warning log for Transformer SD3 by @sayakpaul in #8496
  • Add SD3 AutoPipeline mappings by @Beinsezii in #8489
  • Add Hunyuan AutoPipe mapping by @Beinsezii in #8505
  • Expand Single File support in SD3 Pipeline by @DN6 in #8517
  • [Single File Loading] Handle unexpected keys in CLIP models when accelerate isn't installed. by @DN6 in #8462
  • Fix sharding when no device_map is passed by @SunMarc in #8531
  • [SD3 Inference] T5 Token limit by @asomoza in #8506
  • Fix gradient checkpointing issue for Stable Diffusion 3 by @Carolinabanana in #8542
  • Support SD3 ControlNet and Multi-ControlNet. by @wangqixun in #8566
  • fix from_single_file for checkpoints with t5 by @yiyixuxu in #8631
  • [SD3] Fix mis-matched shape when num_images_per_prompt > 1 using without T5 (text_encoder_3=None) by @Dalanke in #8558

Significant community contributions

The following contributors have made significant changes to the library over the last release:

  • @wangqixun
    • Support SD3 ControlNet and Multi-ControlNet. (#8566)

- Python
Published by yiyixuxu over 1 year ago

diffusers - v0.29.0: Stable Diffusion 3

This release emphasizes Stable Diffusion 3, Stability AI’s latest iteration of the Stable Diffusion family of models. It was introduced in Scaling Rectified Flow Transformers for High-Resolution Image Synthesis by Patrick Esser, Sumith Kulal, Andreas Blattmann, Rahim Entezari, Jonas Müller, Harry Saini, Yam Levi, Dominik Lorenz, Axel Sauer, Frederic Boesel, Dustin Podell, Tim Dockhorn, Zion English, Kyle Lacey, Alex Goodwin, Yannik Marek, and Robin Rombach.

As the model is gated, before using it with diffusers, you first need to go to the Stable Diffusion 3 Medium Hugging Face page, fill in the form and accept the gate. Once you are in, you need to log in so that your system knows you’ve accepted the gate.

```bash
huggingface-cli login
```

The code below shows how to perform text-to-image generation with SD3:

```python
import torch
from diffusers import StableDiffusion3Pipeline

pipe = StableDiffusion3Pipeline.from_pretrained("stabilityai/stable-diffusion-3-medium-diffusers", torch_dtype=torch.float16)
pipe = pipe.to("cuda")

image = pipe(
    "A cat holding a sign that says hello world",
    negative_prompt="",
    num_inference_steps=28,
    guidance_scale=7.0,
).images[0]
image
```


Refer to our documentation for learning all the optimizations you can apply to SD3 as well as the image-to-image pipeline.

Additionally, we support DreamBooth + LoRA fine-tuning of Stable Diffusion 3 through rectified flow. Check out this directory for more details.

- Python
Published by sayakpaul over 1 year ago

diffusers - v0.28.2: fix `from_single_file` clip model checkpoint key error 🐞

  • Change checkpoint key used to identify CLIP models in single file checkpoints by @DN6 in #8319

- Python
Published by yiyixuxu over 1 year ago

diffusers - v0.28.1: HunyuanDiT and Transformer2D model class variants

This patch release primarily introduces the Hunyuan DiT pipeline from the Tencent team.

Hunyuan DiT


Hunyuan DiT is a transformer-based diffusion pipeline, introduced in the Hunyuan-DiT: A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding paper by the Tencent Hunyuan team.

```python
import torch
from diffusers import HunyuanDiTPipeline

pipe = HunyuanDiTPipeline.from_pretrained(
    "Tencent-Hunyuan/HunyuanDiT-Diffusers", torch_dtype=torch.float16
)
pipe.to("cuda")

# You may also use an English prompt, as HunyuanDiT supports both English and Chinese
# prompt = "An astronaut riding a horse"
prompt = "一个宇航员在骑马"
image = pipe(prompt).images[0]
```

🧠 This pipeline has support for multi-linguality.

📜 Refer to the official docs here to learn more about it.

Thanks to @gnobitab, for contributing Hunyuan DiT in #8240.

All commits

  • Release: v0.28.0 by @sayakpaul (direct commit on v0.28.1-patch)
  • [Core] Introduce class variants for Transformer2DModel by @sayakpaul in #7647
  • resolve comflicts by @toshas (direct commit on v0.28.1-patch)
  • Tencent Hunyuan Team: add HunyuanDiT related updates by @gnobitab in #8240
  • Tencent Hunyuan Team - Updated Doc for HunyuanDiT by @gnobitab in #8383
  • [Transformer2DModel] Handle norm_type safely while remapping by @sayakpaul in #8370
  • Release: v0.28.1 by @sayakpaul (direct commit on v0.28.1-patch)

Significant community contributions

The following contributors have made significant changes to the library over the last release:

  • @gnobitab
    • Tencent Hunyuan Team: add HunyuanDiT related updates (#8240)
    • Tencent Hunyuan Team - Updated Doc for HunyuanDiT (#8383)

- Python
Published by sayakpaul over 1 year ago

diffusers - v0.28.0: Marigold, PixArt Sigma, AnimateDiff SDXL, InstantStyle, VQGAN Training Script, and more

Diffusion models are known for their abilities in the space of generative modeling. This release of diffusers introduces the first official pipeline (Marigold) for discriminative tasks such as depth estimation and surface normal estimation!

Starting this release, we will also highlight the changes and features from the library that make it easy to integrate community checkpoints, features, and so on. Read on!

Marigold

Proposed in Marigold: Repurposing Diffusion-Based Image Generators for Monocular Depth Estimation, Marigold introduces a diffusion model and an associated fine-tuning protocol for monocular depth estimation. It can also be extended to perform surface normal estimation.


(Image taken from the official repository)

The code snippet below shows how to use this pipeline for depth estimation:

```python
import diffusers
import torch

pipe = diffusers.MarigoldDepthPipeline.from_pretrained(
    "prs-eth/marigold-depth-lcm-v1-0", variant="fp16", torch_dtype=torch.float16
).to("cuda")

image = diffusers.utils.load_image("https://marigoldmonodepth.github.io/images/einstein.jpg")
depth = pipe(image)

vis = pipe.image_processor.visualize_depth(depth.prediction)
vis[0].save("einstein_depth.png")

depth_16bit = pipe.image_processor.export_depth_to_16bit_png(depth.prediction)
depth_16bit[0].save("einstein_depth_16bit.png")
```

Check out the API documentation here. We also have a detailed guide about the pipeline here.

Thanks to @toshas, one of the authors of Marigold, who contributed this in #7847.

🌀 Massive Refactor of from_single_file 🌀

We have further refactored from_single_file to align its logic more closely to the from_pretrained method. The biggest benefit of doing this is that it allows us to expand single file loading support beyond Stable Diffusion-like pipelines and models. It also makes it easier to load models that are saved and shared in their original format.

Some of the changes introduced in this refactor:

  1. When loading a single file checkpoint, we will attempt to use the keys present in the checkpoint to infer a model repository on the Hugging Face Hub that we can use to configure the pipeline. For example, if you are using a single file checkpoint based on SD 1.5, we would use the configuration files in the runwayml/stable-diffusion-v1-5 repository to configure the model components and pipeline.
  2. Suppose this inferred configuration isn’t appropriate for your checkpoint. In that case, you can override it using the config argument and pass in either a path to a local model repo or a repo id on the Hugging Face Hub.

```python
pipe = StableDiffusionPipeline.from_single_file("...", config=<model repo id or local repo path>)
```

  3. Deprecation of model configuration arguments for the from_single_file method in Pipelines, such as num_in_channels, scheduler_type, image_size, and upcast_attention. This is an anti-pattern that we supported in previous versions of the library, when we assumed it would only be relevant to Stable Diffusion-based models. However, given the demand to support other model types, we feel it is necessary for single-file loading behavior to adhere to the conventions set in our other loading methods. Configuring individual model components through a pipeline loading method is not something we support in from_pretrained, and therefore we will be deprecating support for this behavior in from_single_file as well.
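The config-inference idea described in point 1 can be pictured with a toy sketch: look for a "signature" key that only appears in one model family, and map it to a config repository. The key names and repo mapping below are illustrative examples only, not the actual heuristics diffusers uses internally.

```python
# Toy illustration of key-based config inference for single-file checkpoints.
# The signature keys and repo ids here are hypothetical, chosen for illustration.
SIGNATURE_KEY_TO_REPO = {
    "model.diffusion_model.joint_blocks.0.context_block.attn.qkv.weight": "stabilityai/stable-diffusion-3-medium-diffusers",
    "model.diffusion_model.input_blocks.0.0.weight": "runwayml/stable-diffusion-v1-5",
}

def infer_config_repo(state_dict_keys):
    """Return the first config repo whose signature key appears in the checkpoint."""
    for signature, repo in SIGNATURE_KEY_TO_REPO.items():
        if signature in state_dict_keys:
            return repo
    return None  # fall back to an explicit `config=` argument

keys = {"model.diffusion_model.input_blocks.0.0.weight", "alphas_cumprod"}
print(infer_config_repo(keys))  # runwayml/stable-diffusion-v1-5
```

When no signature matches, the user-supplied config argument shown above takes over.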

PixArt Sigma

PixArt Sigma is the successor to PixArt Alpha. PixArt Sigma is capable of directly generating images at 4K resolution. It can also produce images of markedly higher fidelity and improved alignment with text prompts. It comes with a massive sequence length of 300 (for reference, PixArt Alpha has a maximum sequence length of 120)!


(Taken from the project website.)


```python
import torch
from diffusers import PixArtSigmaPipeline

# You can replace the checkpoint id with "PixArt-alpha/PixArt-Sigma-XL-2-512-MS" too.
pipe = PixArtSigmaPipeline.from_pretrained(
    "PixArt-alpha/PixArt-Sigma-XL-2-1024-MS", torch_dtype=torch.float16
)

# Enable memory optimizations.
pipe.enable_model_cpu_offload()

prompt = "A small cactus with a happy face in the Sahara desert."
image = pipe(prompt).images[0]
```

📃 Refer to the documentation here to learn more about PixArt Sigma.

Thanks to @lawrence-cj, one of the authors of PixArt Sigma, who contributed this in #7857.

AnimateDiff SDXL

@a-r-r-o-w contributed the Stable Diffusion XL (SDXL) version of AnimateDiff in #6721. However, note that this is currently an experimental feature, as only a beta release of the motion adapter checkpoint is available.

```python
import torch
from diffusers.models import MotionAdapter
from diffusers import AnimateDiffSDXLPipeline, DDIMScheduler
from diffusers.utils import export_to_gif

adapter = MotionAdapter.from_pretrained("guoyww/animatediff-motion-adapter-sdxl-beta", torch_dtype=torch.float16)

model_id = "stabilityai/stable-diffusion-xl-base-1.0"
scheduler = DDIMScheduler.from_pretrained(
    model_id,
    subfolder="scheduler",
    clip_sample=False,
    beta_schedule="linear",
    steps_offset=1,
)
pipe = AnimateDiffSDXLPipeline.from_pretrained(
    model_id,
    motion_adapter=adapter,
    scheduler=scheduler,
    torch_dtype=torch.float16,
    variant="fp16",
)
pipe.enable_model_cpu_offload()

# enable memory savings
pipe.enable_vae_slicing()
pipe.enable_vae_tiling()

output = pipe(
    prompt="a panda surfing in the ocean, realistic, high quality",
    negative_prompt="low quality, worst quality",
    num_inference_steps=20,
    guidance_scale=8,
    width=1024,
    height=1024,
    num_frames=16,
)

frames = output.frames[0]
export_to_gif(frames, "animation.gif")
```

📜 Refer to the documentation to learn more.

Block-wise LoRA

@UmerHA contributed support for controlling the scales of different LoRA blocks in a granular manner in #7352. Depending on the LoRA checkpoint being used, this granular control can significantly impact the quality of the generated outputs. The following code block shows how this feature can be used during inference:

```python
...

adapter_weight_scales = {"unet": {"down": 0, "mid": 1, "up": 0}}
pipe.set_adapters("pixel", adapter_weight_scales)
image = pipe(
    prompt, num_inference_steps=30, generator=torch.manual_seed(0)
).images[0]
```

✍️ Refer to our documentation for more details and a full-fledged example.

InstantStyle

Granular control of scale extends to IP-Adapters too. @DannHuang contributed support for InstantStyle, aka granular control of IP-Adapter scales, in #7668. The following code block shows how this feature can be used when performing inference with IP-Adapters:

```python
...

scale = {
    "down": {"block_2": [0.0, 1.0]},
    "up": {"block_0": [0.0, 1.0, 0.0]},
}
pipeline.set_ip_adapter_scale(scale)
```

This way, one can generate images that follow only the style or only the layout of the image prompt, with significantly improved diversity. This is achieved by activating the IP-Adapter only in specific parts of the model.

Check out the documentation here.

ControlNetXS

ControlNet-XS was introduced in ControlNet-XS by Denis Zavadski and Carsten Rother. It is based on the observation that the control model in the original ControlNet can be made much smaller and still produce good results. ControlNet-XS generates images comparable to a regular ControlNet, but it is 20-25% faster (see the benchmark with StableDiffusion-XL) and uses ~45% less memory.

ControlNet-XS is supported for both Stable Diffusion and Stable Diffusion XL.

Thanks to @UmerHA for contributing ControlNet-XS in #5827 and #6772.

Custom Timesteps

We introduced custom timesteps support for some of our pipelines and schedulers. You can now set your scheduler with a list of arbitrary timesteps. For example, you can use the AYS timesteps schedule to achieve very nice results with only 10 denoising steps.

```python
from diffusers.schedulers import AysSchedules

sampling_schedule = AysSchedules["StableDiffusionXLTimesteps"]
pipe = StableDiffusionXLPipeline.from_pretrained(
    "SG161222/RealVisXL_V4.0",
    torch_dtype=torch.float16,
    variant="fp16",
).to("cuda")

pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config, algorithm_type="sde-dpmsolver++")
prompt = "A cinematic shot of a cute little rabbit wearing a jacket and doing a thumbs up"
image = pipe(prompt=prompt, timesteps=sampling_schedule).images[0]
```

Check out the documentation here

device_map in Pipelines 🧪

We have introduced experimental support for device_map in our pipelines. This feature becomes relevant when you have multiple accelerators to distribute the components of a pipeline. Currently, we support only “balanced” device_map. However, we plan to support other device mapping strategies relevant to diffusion models in the future.

```python
from diffusers import DiffusionPipeline
import torch

pipeline = DiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16,
    device_map="balanced",
)
image = pipeline("a dog").images[0]
```

In cases where you are limited to low-VRAM accelerators, you can use device_map to benefit from them. Below, we simulate a situation where we have access to two GPUs, each with only 1GB of VRAM (through the max_memory argument).

```python
from diffusers import DiffusionPipeline
import torch

max_memory = {0: "1GB", 1: "1GB"}
pipeline = DiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16,
    use_safetensors=True,
    device_map="balanced",
    max_memory=max_memory,
)
image = pipeline("a dog").images[0]
```

📜 Refer to the documentation to learn more about it.

VQGAN Training Script 📈

VQGAN, proposed in Taming Transformers for High-Resolution Image Synthesis, is a crucial component in the modern generative image modeling toolbox. Once it is trained, its encoder can be leveraged to compute general-purpose tokens from input images.

Thanks to @isamu-isozaki, who contributed a script and related utilities to train VQGANs in #5483. For details, refer to the official training directory.
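The token-computation idea can be sketched in a few lines: a trained encoder maps an image to latent vectors, and each vector is replaced by the index of its nearest codebook entry. The NumPy sketch below illustrates only this quantization step with a random codebook; it is not the trained VQGAN itself.

```python
import numpy as np

def quantize(latents, codebook):
    """Map each latent vector to the index of its nearest codebook entry.

    latents:  (N, D) array of encoder outputs
    codebook: (K, D) array of learned codebook vectors
    returns:  (N,) array of integer token ids
    """
    # Squared Euclidean distance between every latent and every codebook entry.
    dists = ((latents[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    return dists.argmin(axis=1)

rng = np.random.default_rng(0)
codebook = rng.normal(size=(16, 4))   # K=16 entries of dimension D=4
latents = codebook[[3, 7, 7]] + 0.01  # latents lying near entries 3, 7, 7
tokens = quantize(latents, codebook)
print(tokens)  # [3 7 7]
```

In a real VQGAN, the codebook is learned jointly with the encoder and decoder, and the resulting token ids are what downstream transformers consume.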

VideoProcessor Class

Similar to the VaeImageProcessor class, we have introduced a VideoProcessor to help make the preprocessing and postprocessing of videos easier and a little more streamlined across the pipelines. Refer to the documentation to learn more.
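To make the preprocessing/postprocessing symmetry concrete, here is a minimal NumPy sketch of the kind of work such a processor performs: stacking frames into a batched tensor normalized to [-1, 1], and inverting that back to displayable frames. This is an illustration only; the actual VideoProcessor API in diffusers may differ.

```python
import numpy as np

def preprocess_video(frames):
    """Stack uint8 (H, W, 3) frames into a normalized (1, C, T, H, W) batch.

    Illustrates typical video preprocessing; not the exact diffusers API.
    """
    video = np.stack(frames).astype(np.float32) / 255.0  # (T, H, W, C) in [0, 1]
    video = video * 2.0 - 1.0                            # scale to [-1, 1]
    return video.transpose(3, 0, 1, 2)[None]             # (1, C, T, H, W)

def postprocess_video(video):
    """Invert the preprocessing back to a list of uint8 frames."""
    video = (video[0].transpose(1, 2, 3, 0) + 1.0) / 2.0  # (T, H, W, C) in [0, 1]
    return [np.round(f * 255.0).astype(np.uint8) for f in video]

frames = [np.full((8, 8, 3), v, dtype=np.uint8) for v in (0, 128, 255)]
batch = preprocess_video(frames)
print(batch.shape)  # (1, 3, 3, 8, 8)
```

The round trip is lossless up to rounding, which is why a shared processor keeps pipelines consistent.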

New Guides 📑

Starting with this release, we provide guides and tutorials to help users get started with some of the most frequently used tasks in image and video generation. For this release, we have a series of three guides about outpainting with different techniques:

Official Callbacks

We introduced official callbacks that you can conveniently plug into your pipeline. For example, you can turn off classifier-free guidance after a chosen fraction of the denoising steps with SDXLCFGCutoffCallback.

```python
import torch
from diffusers import StableDiffusionXLPipeline
from diffusers.callbacks import SDXLCFGCutoffCallback

callback = SDXLCFGCutoffCallback(cutoff_step_ratio=0.4)
pipeline = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
    variant="fp16",
).to("cuda")
prompt = "a sports car at the road, best quality, high quality, high detail, 8k resolution"
out = pipeline(
    prompt=prompt,
    num_inference_steps=25,
    callback_on_step_end=callback,
)
```

Read more on our documentation 📜

Community Pipelines and from_pipe API

Starting with this release note, we will highlight new community pipelines! More and more of our pipelines are added as community pipelines first and graduate to official pipelines once people start using them a lot. Community pipelines are not required to follow diffusers’ coding style, so contributing one is the easiest way to contribute to diffusers 😊 

We also introduced a from_pipe API that’s very useful for community pipelines that share checkpoints with our official pipelines and improve generation quality in some way :) You can use from_pipe(...) to load many community pipelines without additional memory overhead. With this API, you can easily switch between different pipelines to apply different techniques.

Read more about from_pipe API in our documentation 📃.

Here are four new community pipelines since our last release.

BoxDiff

BoxDiff lets you use bounding box coordinates for more controlled generation. Here is an example of how you can apply this technique to a Stable Diffusion pipeline you have already created (i.e., pipe_sd in the example below):

```python
pipe_box = DiffusionPipeline.from_pipe(
    pipe_sd,
    custom_pipeline="pipeline_stable_diffusion_boxdiff",
)
pipe_box.enable_model_cpu_offload()

phrases = ["aurora", "reindeer", "meadow", "lake", "mountain"]
boxes = [[1, 3, 512, 202], [75, 344, 421, 495], [1, 327, 508, 507], [2, 217, 507, 341], [1, 135, 509, 242]]
boxes = [[x / 512 for x in box] for box in boxes]

generator = torch.Generator(device="cpu").manual_seed(42)
images = pipe_box(
    prompt,
    boxdiff_phrases=phrases,
    boxdiff_boxes=boxes,
    boxdiff_kwargs={"attention_res": 16, "normalize_eot": True},
    num_inference_steps=50,
    generator=generator,
).images
```

Check out this community pipeline here

HD-Painter

HD-Painter can enhance inpainting pipelines with improved prompt faithfulness and higher-resolution generation (up to 2K). You can switch from BoxDiff to HD-Painter like this:

```python
pipe = DiffusionPipeline.from_pipe(
    pipe_box,
    custom_pipeline="hd_painter",
)
pipe.scheduler = DDIMScheduler.from_config(pipe.scheduler.config)

prompt = "wooden boat"
init_image = load_image("https://raw.githubusercontent.com/Picsart-AI-Research/HD-Painter/main/assets/samples/images/2.jpg")
mask_image = load_image("https://raw.githubusercontent.com/Picsart-AI-Research/HD-Painter/main/assets/samples/masks/2.png")

image = pipe(
    prompt,
    init_image,
    mask_image,
    use_rasg=True,
    use_painta=True,
    generator=torch.manual_seed(12345),
).images[0]
```

Check out this community pipeline here

Differential Diffusion

Differential Diffusion enables customization of the amount of change per pixel or per image region. It’s very effective in inpainting and outpainting.

```python
pipeline = DiffusionPipeline.from_pipe(
    pipe_sdxl,
    custom_pipeline="pipeline_stable_diffusion_xl_differential_img2img",
).to("cuda")
pipeline.scheduler = DPMSolverMultistepScheduler.from_config(pipeline.scheduler.config, use_karras_sigmas=True)

prompt = "a green pear"
negative_prompt = "blurry"

image = pipeline(
    prompt=prompt,
    negative_prompt=negative_prompt,
    guidance_scale=7.5,
    num_inference_steps=25,
    original_image=image,
    image=image,
    strength=1.0,
    map=mask,
).images[0]
```
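The map argument is just a grayscale change map: per-pixel values close to 0 keep the region fixed, while values close to 1 allow full change. A toy sketch of building such a map with plain Python lists (a real pipeline would expect an image or tensor instead):

```python
def gradient_change_map(width, height):
    # Left edge stays unchanged (0.0); right edge may change fully (1.0).
    return [[x / (width - 1) for x in range(width)] for _ in range(height)]

change_map = gradient_change_map(4, 2)
# change_map[0] ramps from 0.0 on the left up to 1.0 on the right
```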

Check out this community pipeline here.

FRESCO

FRESCO, introduced in FRESCO: Spatial-Temporal Correspondence for Zero-Shot Video Translation, enables zero-shot video-to-video translation. Learn more about it from here.

All Commits

  • clean dep installation step in push_tests by @sayakpaul in #7382
  • [LoRA test suite] refactor the test suite and cleanse it by @sayakpaul in #7316
  • [Custom Pipelines with Custom Components] fix multiple things by @sayakpaul in #7304
  • Fix typos by @standardAI in #7411
  • fix: enable unet_3d_condition to support time_cond_proj_dim by @yhZhai in #7364
  • add: space within docs to calculate mememory usage. by @sayakpaul (direct commit on v0.28.0-release)
  • Revert "add: space within docs to calculate mememory usage." by @sayakpaul (direct commit on v0.28.0-release)
  • [Docs] add missing output image by @sayakpaul in #7425
  • add a "Community Scripts" section by @yiyixuxu in #7358
  • add: space for calculating memory usagee. by @sayakpaul in #7414
  • [refactor] Fix FreeInit behaviour by @a-r-r-o-w in #7410
  • Remove distutils by @sayakpaul in #7455
  • [IP-Adapter] Fix IP-Adapter Support and Refactor Callback for StableDiffusionPanoramaPipeline by @standardAI in #7262
  • [Research Projects] ORPO diffusion for alignment by @sayakpaul in #7423
  • Additional Memory clean up for slow tests by @DN6 in #7436
  • Fix for str_to_bool definition in testing utils by @DN6 in #7461
  • [Docs] Fix typos by @standardAI in #7451
  • Fixed minor error in test_lora_layers_peft.py by @UmerHA in #7394
  • Small ldm3d fix by @estelleafl in #7464
  • [tests] skip dynamo tests when python is 3.12. by @sayakpaul in #7458
  • feat: support DoRA LoRA from community by @sayakpaul in #7371
  • Fix broken link by @salcc in #7472
  • Update train_dreambooth_lora_sd15_advanced.py by @ernestchu in #7433
  • [Training utils] add kohya conversion dict. by @sayakpaul in #7435
  • Fix Tiling in ConsistencyDecoderVAE by @standardAI in #7290
  • diffusers#7426 fix stable diffusion xl inference on MPS when dtypes shift unexpectedly due to pytorch bugs by @bghira in #7446
  • Fix missing raise statements in check_inputs by @TonyLianLong in #7473
  • Add device arg to offloading with combined pipelines by @Disty0 in #7471
  • fix torch.compile for multi-controlnet of sdxl inpaint by @yiyixuxu in #7476
  • [chore] make the instructions on fetching all commits clearer. by @sayakpaul in #7474
  • Skip test_lora_fuse_nan on mps by @UmerHA in #7481
  • [Chore] Fix Colab notebook links in README.md by @thliang01 in #7495
  • [Modeling utils chore] import load_model_dict_into_meta only once by @sayakpaul in #7437
  • Improve nightly tests by @sayakpaul in #7385
  • add: a helpful message when quality and repo consistency checks fail. by @sayakpaul in #7475
  • apple mps: training support for SDXL (ControlNet, LoRA, Dreambooth, T2I) by @bghira in #7447
  • cpu_offload: remove all hooks before offload by @yiyixuxu in #7448
  • Bug fix for controlnetpipeline check_image by @Fantast616 in #7103
  • fix OOM for test_vae_tiling by @yiyixuxu in #7510
  • [Tests] Speed up some fast pipeline tests by @sayakpaul in #7477
  • Memory clean up on all Slow Tests by @DN6 in #7514
  • Implements Blockwise lora by @UmerHA in #7352
  • Quick-Fix for #7352 block-lora by @UmerHA in #7523
  • add Instant id sdxl image2image pipeline by @linoytsaban in #7507
  • Perturbed-Attention Guidance by @HyoungwonCho in #7512
  • Add final_sigma_zero to UniPCMultistep by @Beinsezii in #7517
  • Fix IP Adapter Support for SAG Pipeline by @Stepheni12 in #7260
  • [Community pipeline] Marigold depth estimation update -- align with marigold v0.1.5 by @markkua in #7524
  • Fix typo in CPU offload test by @DN6 in #7542
  • Fix SVD bug (shape of time_context) by @KimbingNg in #7268
  • fix the cpu offload tests by @yiyixuxu in #7544
  • add HD-Painter pipeline by @haikmanukyan in #7520
  • add a from_pipe method to DiffusionPipeline by @yiyixuxu in #7241
  • [Community pipeline] SDXL Differential Diffusion Img2Img Pipeline by @asomoza in #7550
  • Fix FreeU tests by @DN6 in #7540
  • [Release tests] make nightly workflow dispatchable. by @sayakpaul in #7541
  • [Chore] remove class assignments for linear and conv. by @sayakpaul in #7553
  • [Tests] Speed up fast pipelines part II by @sayakpaul in #7521
  • 7529 do not disable autocast for cuda devices by @bghira in #7530
  • add: utility to format our docs too 📜 by @sayakpaul in #7314
  • UniPC Multistep fix tensor dtype/device on order=3 by @Beinsezii in #7532
  • UniPC Multistep add rescale_betas_zero_snr by @Beinsezii in #7531
  • [Core] refactor transformers 2d into multiple init variants. by @sayakpaul in #7491
  • [Chore] increase number of workers for the tests. by @sayakpaul in #7558
  • Update pipeline_animatediff_video2video.py by @AbhinavGopal in #7457
  • Skip test_freeu_enabled on MPS by @UmerHA in #7570
  • [Tests] reduce block sizes of UNet and VAE tests by @sayakpaul in #7560
  • [IF] add set_begin_index for all IF pipelines by @yiyixuxu in #7577
  • Add AudioLDM2 TTS by @tuanh123789 in #5381
  • Allow more arguments to be passed to convert_from_ckpt by @w4ffl35 in #7222
  • [Docs] fix bugs in callback docs by @Adenialzz in #7594
  • Add missing restore() EMA call in train SDXL script by @christopher-beckham in #7599
  • disable test_conversion_when_using_device_map by @yiyixuxu in #7620
  • Multi-image masking for single IP Adapter by @fabiorigano in #7499
  • add utilities for updating diffusers pipeline metadata. by @sayakpaul in #7573
  • [Core] refactor transformer_2d forward logic into meaningful conditions. by @sayakpaul in #7489
  • [Workflows] remove installation of libsndfile1-dev and libgl1 from workflows by @sayakpaul in #7543
  • [Core] add "balanced" device_map support to pipelines by @sayakpaul in #6857
  • add the option of upsample function for tiny vae by @IDKiro in #7604
  • [docs] remove duplicate tip block. by @sayakpaul in #7625
  • Modularize instruct_pix2pix SD inferencing during and after training in examples by @satani99 in #7603
  • [Tests] reduce the model sizes in the SD fast tests by @sayakpaul in #7580
  • [docs] Prompt enhancer by @stevhliu in #7565
  • [docs] T2I by @stevhliu in #7623
  • Fix cpu offload related slow tests by @yiyixuxu in #7618
  • [Core] fix img2img pipeline for Playground by @sayakpaul in #7627
  • Skip PEFT LoRA Scaling if the scale is 1.0 by @stevenjlm in #7576
  • LCM Distill Scripts Fix Bug when Initializing Target U-Net by @dg845 in #6848
  • Fixed YAML loading. by @YiqinZhao in #7579
  • fix: Replaced deprecated logger.warn with logger.warning by @Sai-Suraj-27 in #7643
  • FIX Setting device for DoRA parameters by @BenjaminBossan in #7655
  • Add (Scheduled) Pseudo-Huber Loss training scripts to research projects by @kabachuha in #7527
  • make docker-buildx mandatory. by @sayakpaul in #7652
  • fix: metadata token by @sayakpaul in #7631
  • don't install peft from the source with uv for now. by @sayakpaul in #7679
  • Fixing implementation of ControlNet-XS by @UmerHA in #6772
  • [Core] is_cosxl_edit arg in SDXL ip2p. by @sayakpaul in #7650
  • [Docs] Add TGATE in section optimization by @WentianZhang-ML in #7639
  • fix: Updated ruff configuration to avoid deprecated configuration warning by @Sai-Suraj-27 in #7637
  • Don't install PEFT with UV in slow tests by @DN6 in #7697
  • [Workflows] remove installation of redundant modules from flax PR tests by @sayakpaul in #7662
  • [Docs] Update TGATE in section optimization. by @WentianZhang-ML in #7698
  • [docs] Pipeline loading by @stevhliu in #7684
  • Add tailscale action to push_test by @glegendre01 in #7709
  • Move IP Adapter Face ID to core by @fabiorigano in #7186
  • adding back test_conversion_when_using_device_map by @yiyixuxu in #7704
  • Cast height, width to int inside prepare latents by @DN6 in #7691
  • Cleanup ControlnetXS by @DN6 in #7701
  • fix: Fixed type annotations for compatability with python 3.8 by @Sai-Suraj-27 in #7648
  • fix/add tailscale key in case of failure by @glegendre01 in #7719
  • Animatediff Controlnet Community Pipeline IP Adapter Fix by @AbhinavGopal in #7413
  • Update Wuerschten Test by @DN6 in #7700
  • Fix Kandinksy V22 tests by @DN6 in #7699
  • [docs] AutoPipeline by @stevhliu in #7714
  • Remove redundant lines by @philipbutler in #7396
  • Support InstantStyle by @DannHuang in #7668
  • Restore AttnProcessor2_0 in unload_ip_adapter by @fabiorigano in #7727
  • fix: Fixed a wrong decorator by modifying it to @classmethod by @Sai-Suraj-27 in #7653
  • [Metadata utils] fix: json lines ordering. by @sayakpaul in #7744
  • [docs] Clean up toctree by @stevhliu in #7715
  • Fix failing VAE tiling test by @DN6 in #7747
  • Fix test for consistency decoder. by @DN6 in #7746
  • PixArt-Sigma Implementation by @lawrence-cj in #7654
  • [PixArt] fix small nits in pixart sigma by @sayakpaul in #7767
  • [Tests] mark UNetControlNetXSModelTests::test_forward_no_control to be flaky by @sayakpaul in #7771
  • Fix lora device test by @sayakpaul in #7738
  • [docs] Reproducible pipelines by @stevhliu in #7769
  • [docs] Refactor image quality docs by @stevhliu in #7758
  • Convert RGB to BGR for the SDXL watermark encoder by @btlorch in #7013
  • [docs] Fix AutoPipeline docstring by @stevhliu in #7779
  • Add PixArtSigmaPipeline to AutoPipeline mapping by @Beinsezii in #7783
  • [Docs] Update image masking and face id example by @fabiorigano in #7780
  • Add DREAM training by @AmericanPresidentJimmyCarter in #6381
  • [Scheduler] introduce sigma schedule. by @sayakpaul in #7649
  • Update InstantStyle usage in IP-Adapter documentation by @DannHuang in #7806
  • Check for latents, before calling prepare_latents - sdxlImg2Img by @nileshkokane01 in #7582
  • Add debugging workflow by @DN6 in #7778
  • [Pipeline] Fix error of SVD pipeline when num_videos_per_prompt > 1 by @wuyushuwys in #7786
  • Safetensor loading in AnimateDiff conversion scripts by @DN6 in #7764
  • Adding TextualInversionLoaderMixin for the controlnet_inpaint_sd_xl pipeline by @jschoormans in #7288
  • Added get_velocity function to EulerDiscreteScheduler. by @RuiningLi in #7733
  • Set main_input_name in StableDiffusionSafetyChecker to "clip_input" by @clinty in #7500
  • [Tests] reduce the model size in the ddim fast test by @ariG23498 in #7803
  • [Tests] reduce the model size in the ddpm fast test by @ariG23498 in #7797
  • [Tests] reduce the model size in the amused fast test by @ariG23498 in #7804
  • [Core] introduce nosplit_modules to ModelMixin by @sayakpaul in #6396
  • Add B-Lora training option to the advanced dreambooth lora script by @linoytsaban in #7741
  • SSH Runner Workflow Update by @DN6 in #7822
  • Fix CPU offload in docstring by @standardAI in #7827
  • [docs] Community pipelines by @stevhliu in #7819
  • Fix for pipeline slow test fetcher by @DN6 in #7824
  • [Tests] fix: device map tests for models by @sayakpaul in #7825
  • update the logic of is_sequential_cpu_offload by @yiyixuxu in #7788
  • [ip-adapter] fix ip-adapter for StableDiffusionInstructPix2PixPipeline by @yiyixuxu in #7820
  • [Tests] reduce the model size in the audioldm fast test by @ariG23498 in #7833
  • Fix key error for dictionary with randomized order in convert_ldm_unet_checkpoint by @yunseongcho in #7680
  • Fix hanging pipeline fetching by @DN6 in #7837
  • Update download diff format tests by @DN6 in #7831
  • Update CI cache by @DN6 in #7832
  • move to new runners by @glegendre01 in #7839
  • Change GPU Runners by @glegendre01 in #7840
  • Update deps for pipe test fetcher by @DN6 in #7838
  • [Tests] reduce the model size in the blipdiffusion fast test by @ariG23498 in #7849
  • Respect resume_download deprecation by @Wauplin in #7843
  • Remove installing python again in container by @DN6 in #7852
  • Add Ascend NPU support for SDXL fine-tuning and fix the model saving bug when using DeepSpeed. by @HelloWorldBeginner in #7816
  • [docs] LCM by @stevhliu in #7829
  • Ci - change cache folder by @glegendre01 in #7867
  • [docs] Distilled inference by @stevhliu in #7834
  • Fix for "no lora weight found module" with some loras by @asomoza in #7875
  • 7879 - adjust documentation to use naruto dataset, since pokemon is now gated by @bghira in #7880
  • Modification on the PAG community pipeline (re) by @HyoungwonCho in #7876
  • Fix image upcasting by @standardAI in #7858
  • Check shape and remove deprecated APIs in scheduling_ddpm_flax.py by @ppham27 in #7703
  • [Pipeline] AnimateDiff SDXL by @a-r-r-o-w in #6721
  • fix offload test by @yiyixuxu in #7868
  • Allow users to save SDXL LoRA weights for only one text encoder by @dulacp in #7607
  • Remove dead code and fix f-string issue by @standardAI in #7720
  • Fix several imports by @standardAI in #7712
  • [Refactor] Better align from_single_file logic with from_pretrained by @DN6 in #7496
  • [Tests] fix things after #7013 by @sayakpaul in #7899
  • Set max parallel jobs on slow test runners by @DN6 in #7878
  • fix _optional_components in StableCascadeCombinedPipeline by @yiyixuxu in #7894
  • [scheduler] support custom timesteps and sigmas by @yiyixuxu in #7817
  • upgrade to python 3.10 in the Dockerfiles by @sayakpaul in #7893
  • add missing image processors to the docs by @sayakpaul in #7910
  • [Core] introduce videoprocessor. by @sayakpaul in #7776
  • #7535 Update FloatTensor type hints to Tensor by @vanakema in #7883
  • fix bugs when using deepspeed in sdxl by @HelloWorldBeginner in #7917
  • add custom sigmas and timesteps for StableDiffusionXLControlNet pipeline by @neuron-party in #7913
  • fix: Fixed a wrong link to supported python versions in contributing.md file by @Sai-Suraj-27 in #7638
  • [Core] fix offload behaviour when device_map is enabled. by @sayakpaul in #7919
  • Add Ascend NPU support for SDXL. by @HelloWorldBeginner in #7916
  • Official callbacks by @asomoza in #7761
  • fix AnimateDiff creation with a unet loaded with IP Adapter by @fabiorigano in #7791
  • [LoRA] Fix LoRA tests (side effects of RGB ordering) part ii by @sayakpaul in #7932
  • fix multicontrolnet save_pretrained logic for compatibility by @rebel-kblee in #7821
  • Update requirements.txt for text_to_image by @ktakita1011 in #7892
  • Bump transformers from 4.36.0 to 4.38.0 in /examples/research_projects/realfill by @dependabot[bot] in #7635
  • fix VAE loading issue in train_dreambooth by @bssrdf in #7632
  • Expansion proposal of diffusers-cli env by @standardAI in #7403
  • update to use hf-workflows for reporting the Docker build statuses by @sayakpaul in #7938
  • [Core] separate the loading utilities in modeling similar to pipelines. by @sayakpaul in #7943
  • Fix added_cond_kwargs when using IP-Adapter in StableDiffusionXLControlNetInpaintPipeline by @detkov in #7924
  • [Pipeline] Adding BoxDiff to community examples by @zjysteven in #7947
  • [tests] decorate StableDiffusion21PipelineSingleFileSlowTests with slow. by @sayakpaul in #7941
  • Adding VQGAN Training script by @isamu-isozaki in #5483
  • move to GH hosted M1 runner by @glegendre01 in #7949
  • [Workflows] add a workflow that can be manually triggered on a PR. by @sayakpaul in #7942
  • refactor: Refactored code by Merging isinstance calls by @Sai-Suraj-27 in #7710
  • Fix the text tokenizer name in logger warning of PixArt pipelines by @liang-hou in #7912
  • Fix AttributeError in train_lcm_distill_lora_sdxl_wds.py by @jainalphin in #7923
  • Consistent SDXL Controlnet callback tensor inputs by @asomoza in #7958
  • remove unsafe workflow. by @sayakpaul in #7967
  • [tests] fix Pixart Sigma tests by @sayakpaul in #7966
  • Fix typo in "attention" by @jacobmarks in #7977
  • Update pipeline_controlnet_inpaint_sd_xl.py by @detkov in #7983
  • [docs] add doc for PixArtSigmaPipeline by @lawrence-cj in #7857
  • Passing cross_attention_kwargs to StableDiffusionInstructPix2PixPipeline by @AlexeyZhuravlev in #7961
  • fix: Fixed few docstrings according to the Google Style Guide by @Sai-Suraj-27 in #7717
  • Make VAE compatible to torch.compile() by @rootonchair in #7984
  • [docs] VideoProcessor by @stevhliu in #7965
  • Use HF_TOKEN env var in CI by @Wauplin in #7993
  • fix: Attribute error in Logger object (logger.warning) by @AMohamedAakhil in #8183
  • Remove unnecessary single file tests for SD Cascade UNet by @DN6 in #7996
  • Fix resize issue in SVD pipeline with VideoProcessor by @DN6 in #8229
  • Create custom container for doc builder by @DN6 in #8263
  • Use freedesktop_os_release() in diffusers cli for Python >=3.10 by @DN6 in #8235
  • [Community Pipeline] FRESCO: Spatial-Temporal Correspondence for Zero-Shot Video Translation by @SingleZombie in #8239
  • [Chore] run the documentation workflow in a custom container. by @sayakpaul in #8266
  • Respect resume_download deprecation V2 by @Wauplin in #8267
  • Clean up from_single_file docs by @DN6 in #8268
  • sampling bug fix in diffusers tutorial "basic_training.md" by @yue-here in #8223
  • Fix a grammatical error in the raise messages by @standardAI in #8272
  • Fix CPU Offloading Usage & Typos by @standardAI in #8230
  • Add details about 1-stage implementation in I2VGen-XL docs by @dhaivat1729 in #8282
  • [Workflows] add a more secure way to run tests from a PR. by @sayakpaul in #7969
  • Add zip package to doc builder image by @DN6 in #8284
  • [Pipeline] Marigold depth and normals estimation by @toshas in #7847
  • Release: v0.28.0 by @sayakpaul (direct commit on v0.28.0-release)

Significant community contributions

The following contributors have made significant changes to the library over the last release:

  • @standardAI
    • Fix typos (#7411)
    • [IP-Adapter] Fix IP-Adapter Support and Refactor Callback for StableDiffusionPanoramaPipeline (#7262)
    • [Docs] Fix typos (#7451)
    • Fix Tiling in ConsistencyDecoderVAE (#7290)
    • Fix CPU offload in docstring (#7827)
    • Fix image upcasting (#7858)
    • Remove dead code and fix f-string issue (#7720)
    • Fix several imports (#7712)
    • Expansion proposal of diffusers-cli env (#7403)
    • Fix a grammatical error in the raise messages (#8272)
    • Fix CPU Offloading Usage & Typos (#8230)
  • @a-r-r-o-w
    • [refactor] Fix FreeInit behaviour (#7410)
    • [Pipeline] AnimateDiff SDXL (#6721)
  • @UmerHA
    • Fixed minor error in test_lora_layers_peft.py (#7394)
    • Skip test_lora_fuse_nan on mps (#7481)
    • Implements Blockwise lora (#7352)
    • Quick-Fix for #7352 block-lora (#7523)
    • Skip test_freeu_enabled on MPS (#7570)
    • Fixing implementation of ControlNet-XS (#6772)
  • @bghira
    • diffusers#7426 fix stable diffusion xl inference on MPS when dtypes shift unexpectedly due to pytorch bugs (#7446)
    • apple mps: training support for SDXL (ControlNet, LoRA, Dreambooth, T2I) (#7447)
    • 7529 do not disable autocast for cuda devices (#7530)
    • 7879 - adjust documentation to use naruto dataset, since pokemon is now gated (#7880)
  • @HyoungwonCho
    • Perturbed-Attention Guidance (#7512)
    • Modification on the PAG community pipeline (re) (#7876)
  • @haikmanukyan
    • add HD-Painter pipeline (#7520)
  • @fabiorigano
    • Multi-image masking for single IP Adapter (#7499)
    • Move IP Adapter Face ID to core (#7186)
    • Restore AttnProcessor2_0 in unload_ip_adapter (#7727)
    • [Docs] Update image masking and face id example (#7780)
    • fix AnimateDiff creation with a unet loaded with IP Adapter (#7791)
  • @kabachuha
    • Add (Scheduled) Pseudo-Huber Loss training scripts to research projects (#7527)
  • @lawrence-cj
    • PixArt-Sigma Implementation (#7654)
    • [docs] add doc for PixArtSigmaPipeline (#7857)
  • @vanakema
    • #7535 Update FloatTensor type hints to Tensor (#7883)
  • @zjysteven
    • [Pipeline] Adding BoxDiff to community examples (#7947)
  • @isamu-isozaki
    • Adding VQGAN Training script (#5483)
  • @SingleZombie
    • [Community Pipeline] FRESCO: Spatial-Temporal Correspondence for Zero-Shot Video Translation (#8239)
  • @toshas
    • [Pipeline] Marigold depth and normals estimation (#7847)

- Python
Published by sayakpaul almost 2 years ago

diffusers - v0.27.2: Fix scheduler `add_noise` 🐞, embeddings in StableCascade, `scale` when using LoRA

All commits

  • [scheduler] fix a bug in add_noise by @yiyixuxu in https://github.com/huggingface/diffusers/pull/7386
  • [LoRA] fix cross_attention_kwargs problems and tighten tests by @sayakpaul in https://github.com/huggingface/diffusers/pull/7388
  • Fix issue with prompt embeds and latents in SD Cascade Decoder with multiple image embeddings for a single prompt. by @DN6 in https://github.com/huggingface/diffusers/pull/7381

- Python
Published by sayakpaul almost 2 years ago

diffusers - v0.27.1: Clear `scale` argument confusion for LoRA

All commits

  • Release: v0.27.0 by @DN6 (direct commit on v0.27.1-patch)
  • [LoRA] pop the LoRA scale so that it doesn't get propagated to the weights by @sayakpaul in #7338
  • Release: 0.27.1-patch by @sayakpaul (direct commit on v0.27.1-patch)

- Python
Published by sayakpaul almost 2 years ago

diffusers - v0.27.0: Stable Cascade, Playground v2.5, EDM-style training, IP-Adapter image embeds, and more

Stable Cascade

We are adding support for a new text-to-image model building on Würstchen called Stable Cascade, which comes with a non-commercial license. The Stable Cascade line of pipelines differs from Stable Diffusion in that it is built upon three distinct models and allows for hierarchical compression of image latents, achieving remarkable outputs.

```python
from diffusers import StableCascadePriorPipeline, StableCascadeDecoderPipeline
import torch

prior = StableCascadePriorPipeline.from_pretrained(
    "stabilityai/stable-cascade-prior",
    torch_dtype=torch.bfloat16,
).to("cuda")

prompt = "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"
image_emb = prior(prompt=prompt).image_embeddings[0]

decoder = StableCascadeDecoderPipeline.from_pretrained(
    "stabilityai/stable-cascade",
    torch_dtype=torch.bfloat16,
).to("cuda")

image = decoder(image_embeddings=image_emb, prompt=prompt).images[0]
image
```

📜 Check out the docs here to know more about the model.

Note: You will need torch>=2.2.0 to use the torch.bfloat16 data type with the Stable Cascade pipeline.

Playground v2.5

PlaygroundAI released a new v2.5 model (playgroundai/playground-v2.5-1024px-aesthetic), which particularly excels at aesthetics. The model closely follows the architecture of Stable Diffusion XL, except for a few tweaks. This release comes with support for this model:

```python
from diffusers import DiffusionPipeline
import torch

pipe = DiffusionPipeline.from_pretrained(
    "playgroundai/playground-v2.5-1024px-aesthetic",
    torch_dtype=torch.float16,
    variant="fp16",
).to("cuda")

prompt = "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"
image = pipe(prompt=prompt, num_inference_steps=50, guidance_scale=3).images[0]
image
```

Loading from the original single-file checkpoint is also supported:

```python
from diffusers import StableDiffusionXLPipeline, EDMDPMSolverMultistepScheduler
import torch

url = "https://huggingface.co/playgroundai/playground-v2.5-1024px-aesthetic/blob/main/playground-v2.5-1024px-aesthetic.safetensors"
pipeline = StableDiffusionXLPipeline.from_single_file(url)
pipeline.to(device="cuda", dtype=torch.float16)

prompt = "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"
image = pipeline(prompt=prompt, guidance_scale=3.0).images[0]
image.save("playground_test_image.png")
```

You can also perform LoRA DreamBooth training with the playgroundai/playground-v2.5-1024px-aesthetic checkpoint:

```bash
accelerate launch train_dreambooth_lora_sdxl.py \
  --pretrained_model_name_or_path="playgroundai/playground-v2.5-1024px-aesthetic" \
  --instance_data_dir="dog" \
  --output_dir="dog-playground-lora" \
  --mixed_precision="fp16" \
  --instance_prompt="a photo of sks dog" \
  --resolution=1024 \
  --train_batch_size=1 \
  --gradient_accumulation_steps=4 \
  --learning_rate=1e-4 \
  --use_8bit_adam \
  --report_to="wandb" \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --max_train_steps=500 \
  --validation_prompt="A photo of sks dog in a bucket" \
  --validation_epochs=25 \
  --seed="0" \
  --push_to_hub
```

To know more, follow the instructions here.

EDM-style training support

EDM refers to the training and sampling techniques introduced in the following paper: Elucidating the Design Space of Diffusion-Based Generative Models. We have introduced support for training using the EDM formulation in our train_dreambooth_lora_sdxl.py script.

To train stabilityai/stable-diffusion-xl-base-1.0 using the EDM formulation, you just have to specify the --do_edm_style_training flag in your training command, and voila 🤗

If you’re interested in extending this formulation to other training scripts, we refer you to this PR.

New schedulers with the EDM formulation

To better support the Playground v2.5 model and EDM-style training in general, we are bringing support for EDMDPMSolverMultistepScheduler and EDMEulerScheduler. These support the EDM formulations of the DPMSolverMultistepScheduler and EulerDiscreteScheduler, respectively.
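For intuition, the EDM formulation spaces its noise levels by interpolating linearly in sigma ** (1/rho) space, following Karras et al. A minimal sketch of that schedule (the default sigma_min, sigma_max, and rho values follow the EDM paper and are assumptions, not necessarily the schedulers' exact defaults):

```python
def edm_sigmas(num_steps, sigma_min=0.002, sigma_max=80.0, rho=7.0):
    # Interpolate linearly in sigma**(1/rho) space, then raise back to the rho power.
    min_inv, max_inv = sigma_min ** (1 / rho), sigma_max ** (1 / rho)
    ramp = [i / (num_steps - 1) for i in range(num_steps)]
    return [(max_inv + r * (min_inv - max_inv)) ** rho for r in ramp]

sigmas = edm_sigmas(10)  # decreases from ~80.0 down to ~0.002
```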

Trajectory Consistency Distillation

Trajectory Consistency Distillation (TCD) enables a model to generate higher quality and more detailed images with fewer steps. Moreover, owing to the effective error mitigation during the distillation process, TCD demonstrates superior performance even under conditions of large inference steps. It was proposed in Trajectory Consistency Distillation.

This release comes with support for a TCDScheduler that enables this kind of fast sampling. Much like LCM-LoRA, TCD requires an additional adapter for the acceleration. The code snippet below shows example usage:

```python
import torch
from diffusers import StableDiffusionXLPipeline, TCDScheduler

device = "cuda"
base_model_id = "stabilityai/stable-diffusion-xl-base-1.0"
tcd_lora_id = "h1t/TCD-SDXL-LoRA"

pipe = StableDiffusionXLPipeline.from_pretrained(base_model_id, torch_dtype=torch.float16, variant="fp16").to(device)
pipe.scheduler = TCDScheduler.from_config(pipe.scheduler.config)

pipe.load_lora_weights(tcd_lora_id)
pipe.fuse_lora()

prompt = "Painting of the orange cat Otto von Garfield, Count of Bismarck-Schönhausen, Duke of Lauenburg, Minister-President of Prussia. Depicted wearing a Prussian Pickelhaube and eating his favorite meal - lasagna."

image = pipe(
    prompt=prompt,
    num_inference_steps=4,
    guidance_scale=0,
    eta=0.3,
    generator=torch.Generator(device=device).manual_seed(0),
).images[0]
```


📜 Check out the docs here to know more about TCD.

Many thanks to @mhh0318 for contributing the TCDScheduler in #7174 and the guide in #7259.

IP-Adapter image embeddings and masking

All the pipelines supporting IP-Adapter accept an ip_adapter_image_embeds argument. If you need to run the IP-Adapter multiple times with the same image, you can encode the image once, save the embedding to disk, and reuse it. This saves computation time and is especially useful when building UIs. Additionally, ComfyUI image embeddings for IP-Adapters are fully compatible with Diffusers and should work out of the box.
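The save-once-reuse-later pattern does not depend on diffusers at all; here is a minimal stdlib sketch of disk-caching embeddings keyed by image (all names are hypothetical, and a real cache would store actual tensors rather than lists):

```python
import os
import pickle
import tempfile

def cached_embeds(image_id, encode, cache_path):
    # Load the on-disk cache if present, encode only on a miss, then persist.
    cache = {}
    if os.path.exists(cache_path):
        with open(cache_path, "rb") as f:
            cache = pickle.load(f)
    if image_id not in cache:
        cache[image_id] = encode(image_id)
        with open(cache_path, "wb") as f:
            pickle.dump(cache, f)
    return cache[image_id]

calls = []
def fake_encode(image_id):
    calls.append(image_id)
    return [0.1, 0.2, 0.3]  # stand-in for a real image embedding

path = os.path.join(tempfile.mkdtemp(), "embeds.pkl")
first = cached_embeds("cat.png", fake_encode, path)
second = cached_embeds("cat.png", fake_encode, path)  # served from disk, no re-encode
```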

We have also introduced support for providing binary masks to specify which portion of the output image should be assigned to an IP-Adapter. For each input IP-Adapter image, a binary mask and an IP-Adapter must be provided. Thanks to @fabiorigano for contributing this feature through #6847.

📜 To know about the exact usage of both of the above, refer to our official guide.

We thank our community members, @fabiorigano, @asomoza, and @cubiq, for their guidance and input on these features.

Guide on merging LoRAs

Merging LoRAs can be a fun and creative way to create new and unique images. Diffusers provides merging support with the set_adapters method which concatenates the weights of the LoRAs to merge.

Now, Diffusers also supports the add_weighted_adapter method from the PEFT library, unlocking more efficient merging methods like TIES, DARE, linear, and even combinations of these, such as dare_ties.
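In the simplest (linear) case, merging boils down to a weighted sum of the adapters' weights. A toy sketch with plain lists standing in for tensors (not PEFT's actual implementation, which also handles pruning strategies like TIES and DARE):

```python
def merge_linear(lora_a, lora_b, weight_a=0.5, weight_b=0.5):
    # Weighted elementwise sum over matching parameter names.
    return {
        name: [weight_a * a + weight_b * b for a, b in zip(lora_a[name], lora_b[name])]
        for name in lora_a
    }

merged = merge_linear({"up.weight": [2.0, 4.0]}, {"up.weight": [0.0, 2.0]})
print(merged)  # {'up.weight': [1.0, 3.0]}
```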

📜 Take a look at the Merge LoRAs guide to learn more about merging in Diffusers.

LEDITS++

We are adding support for the real-image editing technique LEDITS++: Limitless Image Editing using Text-to-Image Models, a parameter-free method requiring no fine-tuning or optimization. To edit real images, the LEDITS++ pipelines first invert the image using the DPM-solver++ scheduler, which enables editing with as few as 20 total diffusion steps for inversion and inference combined. LEDITS++ guidance is defined so that it reflects both the direction of the edit (whether to push away from or towards the edit concept) and the strength of the effect. The guidance also includes a masking term focused on relevant image regions, which, especially for multiple edits, ensures that the corresponding guidance terms for each concept remain mostly isolated, limiting interference.

The code snippet below shows example usage:

```python
import torch
import requests
from io import BytesIO

from PIL import Image
from diffusers import LEditsPPPipelineStableDiffusionXL, AutoencoderKL

device = "cuda"
base_model_id = "stabilityai/stable-diffusion-xl-base-1.0"

vae = AutoencoderKL.from_pretrained("madebyollin/sdxl-vae-fp16-fix", torch_dtype=torch.float16)

pipe = LEditsPPPipelineStableDiffusionXL.from_pretrained(
    base_model_id, vae=vae, torch_dtype=torch.float16
).to(device)


def download_image(url):
    response = requests.get(url)
    return Image.open(BytesIO(response.content)).convert("RGB")


img_url = "https://www.aiml.informatik.tu-darmstadt.de/people/mbrack/tennis.jpg"
image = download_image(img_url)

# Invert the real image first; editing then happens in the inverted latent space.
_ = pipe.invert(
    image=image,
    num_inversion_steps=50,
    skip=0.2,
)

edited_image = pipe(
    editing_prompt=["tennis ball", "tomato"],
    reverse_editing_direction=[True, False],
    edit_guidance_scale=[5.0, 10.0],
    edit_threshold=[0.9, 0.85],
).images[0]
```


📜 Check out the docs here to learn more about LEDITS++.

Thanks to @manuelbrack for contributing this in #6074.

All commits

  • Fix flaky IP Adapter test by @DN6 in #6960
  • Move SDXL T2I Adapter lora test into PEFT workflow by @DN6 in #6965
  • Allow passing config_file argument to ControlNetModel when using from_single_file by @DN6 in #6959
  • [PEFT / docs] Add a note about torch.compile by @younesbelkada in #6864
  • [Core] Harmonize single file ckpt model loading by @sayakpaul in #6971
  • fix: controlnet inpaint single file. by @sayakpaul in #6975
  • [docs] IP-Adapter by @stevhliu in #6897
  • fix IPAdapter unload_ip_adapter test by @yiyixuxu in #6972
  • [advanced sdxl lora script] - fix #6967 bug when using prior preservation loss by @linoytsaban in #6968
  • [IP Adapters] feat: allow low_cpu_mem_usage in ip adapter loading by @sayakpaul in #6946
  • Fix diffusers import prompt2prompt by @ihkap11 in #6927
  • add: peft to the benchmark workflow by @sayakpaul in #6989
  • Fix procecss process by @co63oc in #6591
  • Standardize model card for textual inversion sdxl by @Stepheni12 in #6963
  • Update textual_inversion.py by @Bhavay-2001 in #6952
  • [docs] Fix callout by @stevhliu in #6998
  • [docs] Video generation by @stevhliu in #6701
  • start deprecation cycle for lora_attention_proc 👋 by @sayakpaul in #7007
  • Add documentation for strength parameter in Controlnet_img2img pipelines by @tlpss in #6951
  • Fixed typos in dosctrings of init() and in forward() of Unet3DConditionModel by @MK-2012 in #6663
  • [SVD] fix a bug when passing image as tensor by @yiyixuxu in #6999
  • Fix deprecation warning for torch.utils._pytree._register_pytree_node in PyTorch 2.2 by @zyinghua in #7008
  • [IP2P] Make text encoder truly optional in InstructPix2Pix by @sayakpaul in #6995
  • IP-Adapter attention masking by @fabiorigano in #6847
  • Fix Pixart Slow Tests by @DN6 in #6962
  • [from_single_file] pass torch_dtype to set_module_tensor_to_device by @yiyixuxu in #6994
  • [Refactor] FreeInit for AnimateDiff based pipelines by @DN6 in #6874
  • [Community Pipelines]Accelerate inference of stable diffusion xl (SDXL) by IPEX on CPU by @ustcuna in #6683
  • Add section on AnimateLCM to docs by @DN6 in #7024
  • IP-Adapter support for StableDiffusionXLControlNetInpaintPipeline by @rootonchair in #6941
  • Support IP Adapter weight loading in StableDiffusionXLControlNetInpaintPipeline by @tontan2545 in #7031
  • Fix alt text and image links in AnimateLCM docs by @DN6 in #7029
  • Update ControlNet Inpaint single file test by @DN6 in #7022
  • Fix load_model_dict_into_meta for ControlNet from_single_file by @DN6 in #7034
  • Remove disable_full_determinism from StableVideoDiffusion xformers test. by @DN6 in #7039
  • update header by @pravdomil in #6596
  • fix doc example for from_single_file by @yiyixuxu in #7015
  • Fix typos in text_to_image examples by @standardAI in #7050
  • Update checkpoint_merger pipeline to pass the "variant" argument by @lstein in #6670
  • allow explicit tokenizer & text_encoder in unload_textual_inversion by @H3zi in #6977
  • re-add unet refactor PR by @yiyixuxu in #7044
  • IPAdapterTesterMixin by @a-r-r-o-w in #6862
  • [Refactor] save_model_card function in text_to_image examples by @standardAI in #7051
  • Fix typos by @standardAI in #7068
  • Fix docstring of community pipeline imagic by @chongdashu in #7062
  • Change images to image. The variable images is not used anywhere by @bimsarapathiraja in #7074
  • fix: TensorRTStableDiffusionPipeline cannot set guidance_scale by @caiyueliang in #7065
  • [Refactor] StableDiffusionReferencePipeline inheriting from DiffusionPipeline by @standardAI in #7071
  • Fix truthy-ness condition in pipelines that use denoising_start by @a-r-r-o-w in #6912
  • Fix head_to_batch_dim for IPAdapterAttnProcessor by @fabiorigano in #7077
  • [docs] Minor updates by @stevhliu in #7063
  • Modularize Dreambooth LoRA SD inferencing during and after training by @rootonchair in #6654
  • Modularize Dreambooth LoRA SDXL inferencing during and after training by @rootonchair in #6655
  • [Community] Bug fix + Latest IP-Adapter impl. for AnimateDiff img2vid/controlnet by @a-r-r-o-w in #7086
  • Pass use_linear_projection parameter to mid block in UNetMotionModel by @Stepheni12 in #7035
  • Resize image before crop by @jiqing-feng in #7095
  • Small change to download in dance diffusion convert script by @DN6 in #7070
  • Fix EMA in train_text_to_image_sdxl.py by @standardAI in #7048
  • Make LoRACompatibleConv padding_mode work. by @jinghuan-Chen in #6031
  • [Easy] edit issue and PR templates by @sayakpaul in #7092
  • FIX [PEFT / Core] Copy the state dict when passing it to load_lora_weights by @younesbelkada in #7058
  • [Core] pass revision in the loading_kwargs. by @sayakpaul in #7019
  • [Examples] Multiple enhancements to the ControlNet training scripts by @sayakpaul in #7096
  • move to uv in the Dockerfiles. by @sayakpaul in #7094
  • Add tests to check configs when using single file loading by @DN6 in #7099
  • denormalize latents with the mean and std if available by @patil-suraj in #7111
  • [Dockerfile] remove uv from docker jax tpu by @sayakpaul in #7115
  • Add EDMEulerScheduler by @patil-suraj in #7109
  • add DPM scheduler with EDM formulation by @patil-suraj in #7120
  • [Docs] Fix typos by @standardAI in #7118
  • DPMSolverMultistep add rescale_betas_zero_snr by @Beinsezii in #7097
  • [Tests] make test steps dependent on certain things and general cleanup of the workflows by @sayakpaul in #7026
  • fix kwarg in the SDXL LoRA DreamBooth by @sayakpaul in #7124
  • [Diffusers CI] Switch slow test runners by @DN6 in #7123
  • [stalebot] don't close the issue if the stale label is removed by @yiyixuxu in #7106
  • refactor: move model helper function in pipeline to a mixin class by @ultranity in #6571
  • [docs] unet type hints by @a-r-r-o-w in #7134
  • use uv for installing stuff in the workflows. by @sayakpaul in #7116
  • limit documentation workflow runs for relevant changes. by @sayakpaul in #7125
  • add: support for notifying the maintainers about the docker ci status. by @sayakpaul in #7113
  • Fix setting fp16 dtype in AnimateDiff convert script. by @DN6 in #7127
  • [Docs] Fix typos by @standardAI in #7131
  • [ip-adapter] refactor prepare_ip_adapter_image_embeds and skip load image_encoder by @yiyixuxu in #7016
  • [CI] fix path filtering in the documentation workflows by @sayakpaul in #7153
  • [Urgent][Docker CI] pin uv version for now and a minor change in the Slack notification by @sayakpaul in #7155
  • Fix LCM benchmark test by @sayakpaul in #7158
  • [CI] Remove max parallel flag on slow test runners by @DN6 in #7162
  • Fix vae_encodings_fn hash in train_text_to_image_sdxl.py by @lhoestq in #7171
  • fix: loading problem for sdxl lora dreambooth by @sayakpaul in #7166
  • Map speedup by @kopyl in #6745
  • [stalebot] fix a bug by @yiyixuxu in #7156
  • Support EDM-style training in DreamBooth LoRA SDXL script by @sayakpaul in #7126
  • Fix PixArt 256px inference by @lawrence-cj in #6789
  • [ip-adapter] fix problem using embeds with the plus version of ip adapters by @asomoza in #7189
  • feat: add ip adapter benchmark by @sayakpaul in #6936
  • [Docs] more elaborate example for peft torch.compile by @sayakpaul in #7161
  • adding callback_on_step_end for StableDiffusionLDM3DPipeline by @rootonchair in #7149
  • Update requirements.txt to remove huggingface-cli by @sayakpaul in #7202
  • [advanced dreambooth lora sdxl] add DoRA training feature by @linoytsaban in #7072
  • FIx torch and cuda version in ONNX tests by @DN6 in #7164
  • [training scripts] add tags of diffusers-training by @linoytsaban in #7206
  • fix a bug in from_config by @yiyixuxu in #7192
  • Fix: UNet2DModel::init type hints; fixes issue #4806 by @fpgaminer in #7175
  • Fix typos by @standardAI in #7181
  • Enable PyTorch's FakeTensorMode for EulerDiscreteScheduler scheduler by @thiagocrepaldi in #7151
  • [docs] Improve SVD pipeline docs by @a-r-r-o-w in #7087
  • [Docs] Update callback.md code example by @rootonchair in #7150
  • [Core] errors should be caught as soon as possible. by @sayakpaul in #7203
  • [Community] PromptDiffusion Pipeline by @iczaw in #6752
  • add TCD Scheduler by @mhh0318 in #7174
  • SDXL Turbo support and example launch by @bram-w in #6473
  • [bug] Fix float/int guidance scale not working in StableVideoDiffusionPipeline by @JinayJain in #7143
  • [Pipeline] Wuerstchen v3 aka Stable Cascade pipeline by @kashif in #6487
  • Update train_dreambooth_lora_sdxl_advanced.py by @landmann in #7227
  • [Core] move out the utilities from pipeline_utils.py by @sayakpaul in #7234
  • Refactor Prompt2Prompt: Inherit from DiffusionPipeline by @ihkap11 in #7211
  • add DoRA training feature to sdxl dreambooth lora script by @linoytsaban in #7235
  • fix: remove duplicated code in TemporalBasicTransformerBlock. by @AsakusaRinne in #7212
  • [Examples] fix: prior preservation setting in DreamBooth LoRA SDXL script. by @sayakpaul in #7242
  • fix: support for loading playground v2.5 single file checkpoint. by @sayakpaul in #7230
  • Raise an error when trying to use SD Cascade Decoder with dtype bfloat16 and torch < 2.2 by @DN6 in #7244
  • Remove the line. Using it create wrong output by @bimsarapathiraja in #7075
  • [docs] Merge LoRAs by @stevhliu in #7213
  • use self.device by @pravdomil in #6595
  • [docs] Community tips by @stevhliu in #7137
  • [Core] throw error when patch inputs and layernorm are provided for Transformers2D by @sayakpaul in #7200
  • [Tests] fix: VAE tiling tests when setting the right device by @sayakpaul in #7246
  • [Utils] Improve " # Copied from ..." statements in the pipelines by @sayakpaul in #6917
  • [Easy] fix: save_model_card utility of the DreamBooth SDXL LoRA script by @sayakpaul in #7258
  • Make mid block optional for flax UNet by @mar-muel in #7083
  • Solve missing clip_sample implementation in FlaxDDIMScheduler. by @hi-sushanta in #7017
  • [Tests] fix config checking tests by @sayakpaul in #7247
  • [docs] IP-Adapter image embedding by @stevhliu in #7226
  • Adds denoising_end parameter to ControlNetPipeline for SDXL by @UmerHA in #6175
  • Add npu support by @MengqingCao in #7144
  • [Community Pipeline] Skip Marigold depth_colored with color_map=None by @qqii in #7170
  • update the signature of fromsinglefile by @yiyixuxu in #7216
  • [UNetSpatioTemporalCondition] fix default num_attention_heads in unet_spatio_temporal_condition by @Wang-Xiaodong1899 in #7205
  • [docs/nits] Fix return values based on return_dict and minor doc updates by @a-r-r-o-w in #7105
  • [Chore] remove tf mention by @sayakpaul in #7245
  • Fix gmflow_dir by @pravdomil in #6583
  • Support latents_mean and latents_std by @haofanwang in #7132
  • Inline InputPadder by @pravdomil in #6582
  • [Dockerfiles] add: a workflow to check if docker containers can be built in case of modifications by @sayakpaul in #7129
  • instruct pix2pix pipeline: remove sigma scaling when computing classifier free guidance by @erliding in #7006
  • Change export_to_video default by @DN6 in #6990
  • [Chore] switch to logger.warning by @sayakpaul in #7289
  • [LoRA] use the PyTorch classes wherever needed and start deprecation cycles by @sayakpaul in #7204
  • Add single file support for Stable Cascade by @DN6 in #7274
  • Fix passing pooled prompt embeds to Cascade Decoder and Combined Pipeline by @DN6 in #7287
  • Fix loading Img2Img refiner components in from_single_file by @DN6 in #7282
  • [Chore] clean residue from copy-pasting in the UNet single file loader by @sayakpaul in #7295
  • Update Cascade documentation by @DN6 in #7257
  • Update Stable Cascade Conversion Scripts by @DN6 in #7271
  • [Pipeline] Add LEDITS++ pipelines by @manuelbrack in #6074
  • [PyPI publishing] feat: automate the process of pypi publication to some extent. by @sayakpaul in #7270
  • add: support for notifying maintainers about the nightly test status by @sayakpaul in #7117
  • Fix Wrong Text-encoder Grad Setting in Custom_Diffusion Training by @Rbrq03 in #7302
  • Add Intro page of TCD by @mhh0318 in #7259
  • Fix typos in UNet2DConditionModel documentation by @alexanderbonnet in #7291
  • Change step_offset scheduler docstrings by @Beinsezii in #7128
  • update get_order_list if statement by @kghamilton89 in #7309
  • add: pytest log installation by @sayakpaul in #7313
  • [Tests] Fix incorrect constant in VAE scaling test. by @DN6 in #7301
  • log loss per image by @noskill in #7278
  • add edm schedulers in doc by @patil-suraj in #7319
  • [Advanced DreamBooth LoRA SDXL] Support EDM-style training (follow up of #7126) by @linoytsaban in #7182
  • Update Cascade Tests by @DN6 in #7324
  • Release: v0.27.0 by @DN6 (direct commit on v0.27.0-release)

Significant community contributions

The following contributors have made significant changes to the library over the last release:

  • @ihkap11
    • Fix diffusers import prompt2prompt (#6927)
    • Refactor Prompt2Prompt: Inherit from DiffusionPipeline (#7211)
  • @ustcuna
    • [Community Pipelines]Accelerate inference of stable diffusion xl (SDXL) by IPEX on CPU (#6683)
  • @rootonchair
    • IP-Adapter support for StableDiffusionXLControlNetInpaintPipeline (#6941)
    • Modularize Dreambooth LoRA SD inferencing during and after training (#6654)
    • Modularize Dreambooth LoRA SDXL inferencing during and after training (#6655)
    • adding callback_on_step_end for StableDiffusionLDM3DPipeline (#7149)
    • [Docs] Update callback.md code example (#7150)
  • @standardAI
    • Fix typos in text_to_image examples (#7050)
    • [Refactor] save_model_card function in text_to_image examples (#7051)
    • Fix typos (#7068)
    • [Refactor] StableDiffusionReferencePipeline inheriting from DiffusionPipeline (#7071)
    • Fix EMA in train_text_to_image_sdxl.py (#7048)
    • [Docs] Fix typos (#7118)
    • [Docs] Fix typos (#7131)
    • Fix typos (#7181)
  • @a-r-r-o-w
    • IPAdapterTesterMixin (#6862)
    • Fix truthy-ness condition in pipelines that use denoising_start (#6912)
    • [Community] Bug fix + Latest IP-Adapter impl. for AnimateDiff img2vid/controlnet (#7086)
    • [docs] unet type hints (#7134)
    • [docs] Improve SVD pipeline docs (#7087)
    • [docs/nits] Fix return values based on return_dict and minor doc updates (#7105)
  • @ultranity
    • refactor: move model helper function in pipeline to a mixin class (#6571)
  • @iczaw
    • [Community] PromptDiffusion Pipeline (#6752)
  • @mhh0318
    • add TCD Scheduler (#7174)
    • Add Intro page of TCD (#7259)
  • @manuelbrack
    • [Pipeline] Add LEDITS++ pipelines (#6074)

- Python
Published by sayakpaul almost 2 years ago

diffusers - v0.26.3: Patch release to fix DPMSolverSinglestepScheduler and configuring VAE from single file mixin

All commits

  • Fix configuring VAE from single file mixin by @DN6 in #6950
  • [DPMSolverSinglestepScheduler] correct get_order_list for solver_order=2 and lower_order_final=True by @yiyixuxu in #6953

- Python
Published by yiyixuxu about 2 years ago

diffusers - v0.26.2: Patch fix for adding `self.use_ada_layer_norm_*` params back to `BasicTransformerBlock`

In v0.26.0, we introduced a bug 🐛 in the BasicTransformerBlock by removing some boolean flags. This caused many popular libraries, such as tomesd, to break. We have fixed that in this release. Thanks to @vladmandic for bringing this to our attention.

All commits

  • add self.use_ada_layer_norm_* params back to BasicTransformerBlock by @yiyixuxu in #6841

- Python
Published by sayakpaul about 2 years ago

diffusers - v0.26.1: Patch release to fix `torchvision` dependency

In the v0.26.0 release, we slipped in the torchvision library as a required library, which shouldn't have been the case. This is now fixed.

All commits

  • add is_torchvision_available by @yiyixuxu in #6800

- Python
Published by sayakpaul about 2 years ago

diffusers - v0.26.0: New video pipelines, single-file checkpoint revamp, multi IP-Adapter inference with multiple images

This new release comes with two new video pipelines, a more unified and consistent experience for single-file checkpoint loading, support for inference with multiple IP-Adapters and multiple reference images, and more.

I2VGenXL

I2VGenXL is an image-to-video pipeline, proposed in I2VGen-XL: High-Quality Image-to-Video Synthesis via Cascaded Diffusion Models.

```python
import torch

from diffusers import I2VGenXLPipeline
from diffusers.utils import export_to_gif, load_image

repo_id = "ali-vilab/i2vgen-xl"
pipeline = I2VGenXLPipeline.from_pretrained(repo_id, torch_dtype=torch.float16).to("cuda")
pipeline.enable_model_cpu_offload()

image_url = "https://huggingface.co/datasets/diffusers/docs-images/resolve/main/i2vgen_xl_images/img_0001.jpg"
image = load_image(image_url).convert("RGB")
prompt = "A green frog floats on the surface of the water on green lotus leaves, with several pink lotus flowers, in a Chinese painting style."
negative_prompt = "Distorted, discontinuous, Ugly, blurry, low resolution, motionless, static, disfigured, disconnected limbs, Ugly faces, incomplete arms"
generator = torch.manual_seed(8888)

frames = pipeline(
    prompt=prompt,
    image=image,
    num_inference_steps=50,
    negative_prompt=negative_prompt,
    generator=generator,
).frames
export_to_gif(frames[0], "i2v.gif")
```


📜 Check out the docs here.

PIA

PIA is a Personalized Image Animator, that aligns with condition images, controls motion by text, and is compatible with various T2I models without specific tuning. PIA uses a base T2I model with temporal alignment layers for image animation. A key component of PIA is the condition module, which transfers appearance information for individual frame synthesis in the latent space, thus allowing a stronger focus on motion alignment. PIA was introduced in PIA: Your Personalized Image Animator via Plug-and-Play Modules in Text-to-Image Models.

```python
import torch

from diffusers import EulerDiscreteScheduler, MotionAdapter, PIAPipeline
from diffusers.utils import export_to_gif, load_image

adapter = MotionAdapter.from_pretrained("openmmlab/PIA-condition-adapter")
pipe = PIAPipeline.from_pretrained(
    "SG161222/Realistic_Vision_V6.0_B1_noVAE", motion_adapter=adapter, torch_dtype=torch.float16
)

pipe.scheduler = EulerDiscreteScheduler.from_config(pipe.scheduler.config)
pipe.enable_model_cpu_offload()
pipe.enable_vae_slicing()

image = load_image(
    "https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/pix2pix/cat_6.png?download=true"
)
image = image.resize((512, 512))
prompt = "cat in a field"
negative_prompt = "wrong white balance, dark, sketches, worst quality, low quality"

generator = torch.Generator("cpu").manual_seed(0)
output = pipe(image=image, prompt=prompt, generator=generator)
frames = output.frames[0]
export_to_gif(frames, "pia-animation.gif")
```


📜 Check out the docs here.

Multiple IP-Adapters + Multiple reference images support (“Instant LoRA” Feature)

IP-Adapters are becoming quite popular, so we have added support for performing inference with multiple IP-Adapters and multiple reference images! Thanks to @asomoza for their help. Get started with the code below:

```python
import torch

from diffusers import AutoPipelineForText2Image, DDIMScheduler
from diffusers.utils import load_image
from transformers import CLIPVisionModelWithProjection

image_encoder = CLIPVisionModelWithProjection.from_pretrained(
    "h94/IP-Adapter",
    subfolder="models/image_encoder",
    torch_dtype=torch.float16,
)

pipeline = AutoPipelineForText2Image.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
    image_encoder=image_encoder,
)
pipeline.scheduler = DDIMScheduler.from_config(pipeline.scheduler.config)

# Load two IP-Adapters: one for style, one for the face.
pipeline.load_ip_adapter(
    "h94/IP-Adapter",
    subfolder="sdxl_models",
    weight_name=["ip-adapter-plus_sdxl_vit-h.safetensors", "ip-adapter-plus-face_sdxl_vit-h.safetensors"],
)
pipeline.set_ip_adapter_scale([0.7, 0.3])

pipeline.enable_model_cpu_offload()

face_image = load_image("https://huggingface.co/datasets/YiYiXu/testing-images/resolve/main/women_input.png")

style_folder = "https://huggingface.co/datasets/YiYiXu/testing-images/resolve/main/style_ziggy"
style_images = [load_image(f"{style_folder}/img{i}.png") for i in range(10)]

generator = torch.Generator(device="cpu").manual_seed(0)

image = pipeline(
    prompt="wonderwoman",
    ip_adapter_image=[style_images, face_image],
    negative_prompt="monochrome, lowres, bad anatomy, worst quality, low quality",
    num_inference_steps=50,
    generator=generator,
).images[0]
```

(Images: the reference style images, the reference face image, and the output image.)

📜 Check out the docs here.

Single-file checkpoint loading

The from_single_file() utility has been refactored for better readability and to follow semantics similar to from_pretrained(). Support for loading single-file checkpoints and configs from URLs has also been added.

DPM scheduler fix

We introduced a fix for the DPM schedulers, so you can now use them with SDXL to generate high-quality images in fewer steps than with the Euler scheduler.

Apart from these, we have done a myriad of refactoring to improve the library design and will continue to do so in the coming days.

All commits

  • [docs] Fix missing API function by @stevhliu in #6604
  • Fix failing tests due to Posix Path by @DN6 in #6627
  • Update convert_from_ckpt.py / read checkpoint config yaml contents by @spezialspezial in #6633
  • [Community] Experimental AnimateDiff Image to Video (open to improvements) by @a-r-r-o-w in #6509
  • refactor: extract init/forward function in UNet2DConditionModel by @ultranity in #6478
  • Modularize InstructPix2Pix SDXL inferencing during and after training in examples by @sang-k in #6569
  • Fixed the bug related to saving DeepSpeed models. by @HelloWorldBeginner in #6628
  • fix DPM Scheduler with use_karras_sigmas option by @yiyixuxu in #6477
  • fix SDXL-kdiffusion tests by @yiyixuxu in #6647
  • add padding_mask_crop to all inpaint pipelines by @rootonchair in #6360
  • add Sa-Solver by @lawrence-cj in #5975
  • Add tearDown method to LoRA tests. by @DN6 in #6660
  • [Diffusion DPO] apply fixes from #6547 by @sayakpaul in #6668
  • Update README by @standardAI in #6669
  • [Big refactor] move unets to unets module 🦋 by @sayakpaul in #6630
  • Standardise outputs for video pipelines by @DN6 in #6626
  • fix dpm related slow test failure by @yiyixuxu in #6680
  • [Tests] Test for passing local config file to from_single_file() by @sayakpaul in #6638
  • [Refactor] Update from single file by @DN6 in #6428
  • [WIP][Community Pipeline] InstaFlow! One-Step Stable Diffusion with Rectified Flow by @ayushtues in #6057
  • Add InstantID Pipeline by @haofanwang in #6673
  • [Docs] update: tutorials ja | AUTOPIPELINE.md by @YasunaCoffee in #6629
  • [Fix bugs] pipeline_controlnet_sd_xl.py by @haofanwang in #6653
  • SD 1.5 Support For Advanced Lora Training (train_dreambooth_lora_sdxl_advanced.py) by @brandostrong in #6449
  • AnimateDiff Video to Video by @a-r-r-o-w in #6328
  • [docs] UViT2D by @stevhliu in #6643
  • Correct sigmas cpu settings by @patrickvonplaten in #6708
  • [docs] AnimateDiff Video-to-Video by @a-r-r-o-w in #6712
  • fix community README by @a-r-r-o-w in #6645
  • fix custom diffusion training with concept list by @AIshutin in #6710
  • Add IP Adapters to slow tests by @DN6 in #6714
  • Move tests for SD inference variant pipelines into their own modules by @DN6 in #6707
  • Add Community Example Consistency Training Script by @dg845 in #6717
  • Add UFOGenScheduler to Community Examples by @dg845 in #6650
  • [Hub] feat: explicitly tag to diffusers when using push_to_hub by @sayakpaul in #6678
  • Correct SNR weighted loss in v-prediction case by only adding 1 to SNR on the denominator by @thuliu-yt16 in #6307
  • changed to posix unet by @gzguevara in #6719
  • Change os.path to pathlib Path by @Stepheni12 in #6737
  • correct hflip arg by @sayakpaul in #6743
  • Add unload_textual_inversion method by @fabiorigano in #6656
  • [Core] move transformer scripts to transformers modules by @sayakpaul in #6747
  • Update lora.md with a more accurate description of rank by @xhedit in #6724
  • Fix mixed precision fine-tuning for text-to-image-lora-sdxl example. by @sajadn in #6751
  • udpate ip-adapter slow tests by @yiyixuxu in #6760
  • Update export to video to support new tensor_to_vid function in video pipelines by @DN6 in #6715
  • [DDPMScheduler] Load alpha_cumprod to device to avoid redundant data movement. by @woshiyyya in #6704
  • Fix bug in ResnetBlock2D.forward where LoRA Scale gets Overwritten by @dg845 in #6736
  • add note about serialization by @sayakpaul in #6764
  • Update traindiffusiondpo.py by @viettmab in #6754
  • Pin torch < 2.2.0 in test runners by @DN6 in #6780
  • [Kandinsky tests] add is_flaky to test_model_cpu_offload_forward_pass by @sayakpaul in #6762
  • add ipo, hinge and cpo loss to dpo trainer by @kashif in #6788
  • Fix setting scaling factor in VAE config by @DN6 in #6779
  • Add PIA Model/Pipeline by @DN6 in #6698
  • [docs] Add missing parameter by @stevhliu in #6775
  • [IP-Adapter] Support multiple IP-Adapters by @yiyixuxu in #6573
  • [sdxl k-diffusion pipeline]move sigma to device by @yiyixuxu in #6757
  • [Feat] add I2VGenXL for image-to-video generation by @sayakpaul in #6665
  • Release: v0.26.0 by @ (direct commit on v0.26.0-release)
  • fix torchvision import by @patrickvonplaten in #6796

Significant community contributions

The following contributors have made significant changes to the library over the last release:

  • @a-r-r-o-w
    • [Community] Experimental AnimateDiff Image to Video (open to improvements) (#6509)
    • AnimateDiff Video to Video (#6328)
    • [docs] AnimateDiff Video-to-Video (#6712)
    • fix community README (#6645)
  • @ultranity
    • refactor: extract init/forward function in UNet2DConditionModel (#6478)
  • @lawrence-cj
    • add Sa-Solver (#5975)
  • @ayushtues
    • [WIP][Community Pipeline] InstaFlow! One-Step Stable Diffusion with Rectified Flow (#6057)
  • @haofanwang
    • Add InstantID Pipeline (#6673)
    • [Fix bugs] pipeline_controlnet_sd_xl.py (#6653)
  • @brandostrong
    • SD 1.5 Support For Advanced Lora Training (train_dreambooth_lora_sdxl_advanced.py) (#6449)
  • @dg845
    • Add Community Example Consistency Training Script (#6717)
    • Add UFOGenScheduler to Community Examples (#6650)
    • Fix bug in ResnetBlock2D.forward where LoRA Scale gets Overwritten (#6736)

- Python
Published by sayakpaul about 2 years ago

diffusers - Patch release

Make sure diffusers can correctly be used in offline mode again: https://github.com/huggingface/diffusers/pull/1767#issuecomment-1896194917

  • Respect offline mode when loading pipeline by @Wauplin in #6456
  • Fix offline mode import by @Wauplin in #6467

- Python
Published by patrickvonplaten about 2 years ago

diffusers - v0.25.0: aMUSEd, faster SDXL, interruptable pipelines

aMUSEd

collage_full

aMUSEd is a lightweight text-to-image model based on the MUSE architecture. It is particularly useful in applications that require a lightweight and fast model, such as generating many images quickly at once. aMUSEd is currently a research release.

aMUSEd is a VQVAE token-based transformer that can generate an image in fewer forward passes than many diffusion models. In contrast to MUSE, it uses the smaller text encoder CLIP-L/14 instead of T5-XXL. Due to its small parameter count and few-forward-pass generation process, aMUSEd can generate many images quickly. This benefit is particularly pronounced at larger batch sizes.

Text-to-image generation

```python
import torch

from diffusers import AmusedPipeline

pipe = AmusedPipeline.from_pretrained(
    "amused/amused-512", variant="fp16", torch_dtype=torch.float16
)
pipe = pipe.to("cuda")

prompt = "cowboy"
image = pipe(prompt, generator=torch.manual_seed(8)).images[0]
image.save("text2image_512.png")
```

Image-to-image generation

```python
import torch

from diffusers import AmusedImg2ImgPipeline
from diffusers.utils import load_image

pipe = AmusedImg2ImgPipeline.from_pretrained(
    "amused/amused-512", variant="fp16", torch_dtype=torch.float16
)
pipe = pipe.to("cuda")

prompt = "apple watercolor"
input_image = (
    load_image(
        "https://huggingface.co/amused/amused-512/resolve/main/assets/image2image_256_orig.png"
    )
    .resize((512, 512))
    .convert("RGB")
)

image = pipe(prompt, input_image, strength=0.7, generator=torch.manual_seed(3)).images[0]
image.save("image2image_512.png")
```

Inpainting

```python
import torch

from diffusers import AmusedInpaintPipeline
from diffusers.utils import load_image

pipe = AmusedInpaintPipeline.from_pretrained(
    "amused/amused-512", variant="fp16", torch_dtype=torch.float16
)
pipe = pipe.to("cuda")

prompt = "a man with glasses"
input_image = (
    load_image(
        "https://huggingface.co/amused/amused-512/resolve/main/assets/inpainting_256_orig.png"
    )
    .resize((512, 512))
    .convert("RGB")
)
mask = (
    load_image(
        "https://huggingface.co/amused/amused-512/resolve/main/assets/inpainting_256_mask.png"
    )
    .resize((512, 512))
    .convert("L")
)

image = pipe(prompt, input_image, mask, generator=torch.manual_seed(3)).images[0]
image.save("inpainting_512.png")
```

📜 Docs: https://huggingface.co/docs/diffusers/main/en/api/pipelines/amused

🛠️ Models:

Faster SDXL

We’re excited to present an array of optimization techniques that can be used to accelerate the inference latency of text-to-image diffusion models. All of these can be done in native PyTorch without requiring additional C++ code.

SDXL_Batch_Size__1_Steps__30

These techniques are not specific to Stable Diffusion XL (SDXL) and can be used to improve other text-to-image diffusion models too. Starting from default fp32 precision, we can achieve a 3x speed improvement by applying different PyTorch optimization techniques. We encourage you to check out the detailed docs provided below.

Note: Compared to the default way most people use Diffusers (fp16 + SDPA), applying all the optimizations explained in the blog below yields a 30% speed-up.

📜 Docs: https://huggingface.co/docs/diffusers/main/en/tutorials/fast_diffusion 🌠 PyTorch blog post: https://pytorch.org/blog/accelerating-generative-ai-3/

Interruptible pipelines

Interrupting the diffusion process is particularly useful when building UIs that work with Diffusers because it allows users to stop the generation process if they're unhappy with the intermediate results. You can incorporate this into your pipeline with a callback.

This callback function should take the following arguments: pipe, i, t, and callback_kwargs (this must be returned). Set the pipeline's _interrupt attribute to True to stop the diffusion process after a certain number of steps. You are also free to implement your own custom stopping logic inside the callback.

In this example, the diffusion process is stopped after 10 steps even though num_inference_steps is set to 50.

```python
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
pipe.enable_model_cpu_offload()
num_inference_steps = 50


def interrupt_callback(pipe, i, t, callback_kwargs):
    stop_idx = 10
    if i == stop_idx:
        pipe._interrupt = True
    return callback_kwargs


pipe(
    "A photo of a cat",
    num_inference_steps=num_inference_steps,
    callback_on_step_end=interrupt_callback,
)
```

📜 Docs: https://huggingface.co/docs/diffusers/main/en/using-diffusers/callback

peft in our LoRA training examples

We incorporated peft in all the officially supported training examples concerning LoRA. This greatly simplifies the code and improves readability. LoRA training has never been easier, thanks to peft!

More memory-friendly version of LCM LoRA SDXL training

We incorporated best practices from peft to make LCM LoRA training for SDXL more memory-friendly. As such, you don't have to initialize two UNets (teacher and student) anymore. This version also integrates with the datasets library for quick experimentation. Check out this section for more details.

All commits

  • [docs] Fix video link by @stevhliu in #5986
  • Fix LLMGroundedDiffusionPipeline super class arguments by @KristianMischke in #5993
  • Remove a duplicated line? by @sweetcocoa in #6010
  • [examples/advanced_diffusion_training] bug fixes and improvements for LoRA Dreambooth SDXL advanced training script by @linoytsaban in #5935
  • [advanced_dreambooth_lora_sdxl_training_script] readme fix by @linoytsaban in #6019
  • [docs] Fix SVD video by @stevhliu in #6004
  • [Easy] minor edits to setup.py by @sayakpaul in #5996
  • [From Single File] Allow Text Encoder to be passed by @patrickvonplaten in #6020
  • [Community Pipeline] Regional Prompting Pipeline by @hako-mikan in #6015
  • [logging] Fix assertion bug by @standardAI in #6012
  • [Docs] Update a link by @standardAI in #6014
  • added attention_head_dim, attention_type, resolution_idx by @charchit7 in #6011
  • fix style by @patrickvonplaten (direct commit on v0.25.0)
  • [Kandinsky 3.0] Follow-up TODOs by @yiyixuxu in #5944
  • [schedulers] create self.sigmas during init by @yiyixuxu in #6006
  • Post Release: v0.24.0 by @patrickvonplaten in #5985
  • LLMGroundedDiffusionPipeline: inherit from DiffusionPipeline and fix peft by @TonyLianLong in #6023
  • adapt PixArtAlphaPipeline for pixart-lcm model by @lawrence-cj in #5974
  • [PixArt Tests] remove fast tests from slow suite by @sayakpaul in #5945
  • [LoRA serialization] fix: duplicate unet prefix problem. by @sayakpaul in #5991
  • [advanced dreambooth lora sdxl training script] improve help tags by @linoytsaban in #6035
  • fix StableDiffusionTensorRT super args error by @gujingit in #6009
  • Update value_guided_sampling.py by @Parth38 in #6027
  • Update Tests Fetcher by @DN6 in #5950
  • Add variant argument to dreambooth lora sdxl advanced by @levi in #6021
  • [Feature] Support IP-Adapter Plus by @okotaku in #5915
  • [Community Pipeline] DemoFusion: Democratising High-Resolution Image Generation With No $$$ by @RuoyiDu in #6022
  • [advanced dreambooth lora training script][bugfix] change token_abstraction type to str by @linoytsaban in #6040
  • [docs] Add Kandinsky 3 by @stevhliu in #5988
  • [docs] #Copied from mechanism by @stevhliu in #6007
  • Move kandinsky convert script by @DN6 in #6047
  • Pin Ruff Version by @DN6 in #6059
  • Ldm unet convert fix by @DN6 in #6038
  • Fix demofusion by @radames in #6049
  • [From single file] remove depr warning by @patrickvonplaten in #6043
  • [advanced_dreambooth_lora_sdxl_training_script] save embeddings locally fix by @apolinario in #6058
  • Device agnostic testing by @arsalanu in #5612
  • [feat] allow SDXL pipeline to run with fused QKV projections by @sayakpaul in #6030
  • fix by @DN6 (direct commit on v0.25.0)
  • Use CC12M for LCM WDS training example by @pcuenca in #5908
  • Disable Tests Fetcher by @DN6 in #6060
  • [Advanced Diffusion Training] Cache latents to avoid VAE passes for every training step by @apolinario in #6076
  • [Euler Discrete] Fix sigma by @patrickvonplaten in #6078
  • Harmonize HF environment variables + deprecate use_auth_token by @Wauplin in #6066
  • [docs] SDXL Turbo by @stevhliu in #6065
  • Add ControlNet-XS support by @UmerHA in #5827
  • Fix typing inconsistency in Euler discrete scheduler by @iabaldwin in #6052
  • [PEFT] Adapt example scripts to use PEFT by @younesbelkada in #5388
  • Fix clearing backend cache from device agnostic testing by @DN6 in #6075
  • [Community] AnimateDiff + Controlnet Pipeline by @a-r-r-o-w in #5928
  • EulerDiscreteScheduler add rescale_betas_zero_snr by @Beinsezii in #6024
  • Add support for IPAdapterFull by @fabiorigano in #5911
  • Fix a bug in add_noise function by @yiyixuxu in #6085
  • [Advanced Diffusion Script] Add Widget default text by @apolinario in #6100
  • [Advanced Training Script] Fix pipe example by @apolinario in #6106
  • IP-Adapter for StableDiffusionControlNetImg2ImgPipeline by @charchit7 in #5901
  • IP adapter support for most pipelines by @a-r-r-o-w in #5900
  • Correct type annotation for VaeImageProcessor.numpy_to_pil by @edwardwli in #6111
  • [Docs] Fix typos by @standardAI in #6122
  • [feat: Benchmarking Workflow] add stuff for a benchmarking workflow by @sayakpaul in #5839
  • [Community] Add SDE Drag pipeline by @Monohydroxides in #6105
  • [docs] IP-Adapter API doc by @stevhliu in #6140
  • Add missing subclass docs, Fix broken example in SD_safe by @a-r-r-o-w in #6116
  • [advanced dreambooth lora sdxl training script] load pipeline for inference only if validation prompt is used by @linoytsaban in #6171
  • [docs] Add missing \ in lora.md by @pierd in #6174
  • [Sigmas] Keep sigmas on CPU by @patrickvonplaten in #6173
  • LoRA test fixes by @DN6 in #6163
  • Add PEFT to training deps by @DN6 in #6148
  • Clean Up Comments in LCM(-LoRA) Distillation Scripts. by @dg845 in #6145
  • Compile test fix by @DN6 in #6104
  • [LoRA] add an error message when dealing with best_guess_weight_name offline by @sayakpaul in #6184
  • [Core] feat: enable fused attention projections for other SD and SDXL pipelines by @sayakpaul in #6179
  • [Benchmarks] fix: lcm benchmarking reporting by @sayakpaul in #6198
  • [Refactor autoencoders] feat: introduce autoencoders module by @sayakpaul in #6129
  • Fix the test script in examples/text_to_image/README.md by @krahets in #6209
  • Nit fix to training params by @osanseviero in #6200
  • [Training] remove deprecated method from lora scripts. by @sayakpaul in #6207
  • Fix SDXL Inpainting from single file with Refiner Model by @DN6 in #6147
  • Fix possible re-conversion issues after extracting from safetensors by @d8ahazard in #6097
  • Fix t2i. blog url by @abinthomasonline in #6205
  • [Text-to-Video] Clean up pipeline by @patrickvonplaten in #6213
  • [Torch Compile] Fix torch compile for svd vae by @patrickvonplaten in #6217
  • Deprecate Pipelines by @DN6 in #6169
  • Update README.md by @TilmannR in #6191
  • Support img2img and inpaint in lpw-xl by @a-r-r-o-w in #6114
  • Update train_text_to_image_lora.py by @haofanwang in #6144
  • [SVD] Fix guidance scale by @patrickvonplaten in #6002
  • Slow Test for Pipelines minor fixes by @DN6 in #6221
  • Add converter method for ip adapters by @fabiorigano in #6150
  • offload the optional module image_encoder by @yiyixuxu in #6151
  • fix: init for vae during pixart tests by @sayakpaul in #6215
  • [T2I LoRA training] fix: unscale fp16 gradient problem by @sayakpaul in #6119
  • ControlNetXS fixes. by @DN6 in #6228
  • add peft dependency to fast push tests by @sayakpaul in #6229
  • [refactor embeddings]pixart-alpha by @yiyixuxu in #6212
  • [Docs] Fix a code example in the ControlNet Inpainting documentation by @raven38 in #6236
  • [docs] Batched seeds by @stevhliu in #6237
  • [Fix] Fix Regional Prompting Pipeline by @hako-mikan in #6188
  • EulerAncestral add rescale_betas_zero_snr by @Beinsezii in #6187
  • [Refactor upsamplers and downsamplers] separate out upsamplers and downsamplers. by @sayakpaul in #6128
  • Bump transformers from 4.34.0 to 4.36.0 in /examples/research_projects/realfill by @dependabot[bot] in #6255
  • fix: unscale fp16 gradient problem & potential error by @lvzii in #6086
  • [Refactor] move diffedit out of stable_diffusion by @sayakpaul in #6260
  • move attend and excite out of stable_diffusion by @sayakpaul (direct commit on v0.25.0)
  • Revert "move attend and excite out of stable_diffusion" by @sayakpaul (direct commit on v0.25.0)
  • [Training] remove deprecated method from lora scripts again by @Yimi81 in #6266
  • [Refactor] move k diffusion out of stable_diffusion by @sayakpaul in #6267
  • [Refactor] move gligen out of stable diffusion. by @sayakpaul in #6265
  • [Refactor] move sag out of stable_diffusion by @sayakpaul in #6264
  • TST Fix LoRA test that fails with PEFT >= 0.7.0 by @BenjaminBossan in #6216
  • [Refactor] move attend and excite out of stable_diffusion. by @sayakpaul in #6261
  • [Refactor] move panorama out of stable_diffusion by @sayakpaul in #6262
  • [Deprecated pipelines] remove pix2pix zero from init by @sayakpaul in #6268
  • [Refactor] move ldm3d out of stable_diffusion. by @sayakpaul in #6263
  • open muse by @williamberman in #5437
  • Remove ONNX inpaint legacy by @DN6 in #6269
  • Remove peft tests from old lora backend tests by @DN6 in #6273
  • Allow diffusers to load with Flax, w/o PyTorch by @pcuenca in #6272
  • [Community Pipeline] Add Marigold Monocular Depth Estimation by @markkua in #6249
  • Fix Prodigy optimizer in SDXL Dreambooth script by @apolinario in #6290
  • [LoRA PEFT] fix LoRA loading so that correct alphas are parsed by @sayakpaul in #6225
  • LoRA Unfusion test fix by @DN6 in #6291
  • Fix typos in the ValueError for a nested image list as StableDiffusionControlNetPipeline input. by @celestialphineas in #6286
  • fix RuntimeError: Input type (float) and bias type (c10::Half) should be the same in train_text_to_image_lora.py by @mwkldeveloper in #6259
  • fix: t2i adapter paper link by @sayakpaul in #6314
  • fix: lora peft dummy components by @sayakpaul in #6308
  • [Tests] Speed up example tests by @sayakpaul in #6319
  • fix: cannot set guidance_scale by @Jannchie in #6326
  • Change LCM-LoRA README Script Example Learning Rates to 1e-4 by @dg845 in #6304
  • [Peft] fix saving / loading when unet is not "unet" by @kashif in #6046
  • [Wuerstchen] fix fp16 training and correct lora args by @kashif in #6245
  • [docs] fix: animatediff docs by @sayakpaul in #6339
  • [Training] Add datasets version of LCM LoRA SDXL by @sayakpaul in #5778
  • [Peft / Lora] Add adapter_names in fuse_lora by @younesbelkada in #5823
  • [Diffusion fast] add doc for diffusion fast by @sayakpaul in #6311
  • Add rescale_betas_zero_snr Argument to DDPMScheduler by @dg845 in #6305
  • Interruptable Pipelines by @DN6 in #5867
  • Update Animatediff docs by @DN6 in #6341
  • Add AnimateDiff conversion scripts by @DN6 in #6340
  • amused other pipelines docs by @williamberman in #6343
  • [Docs] fix: video rendering on svd. by @sayakpaul in #6330
  • [SDXL-IP2P] Update README_sdxl, Replace the link for wandb log with the correct run by @priprapre in #6270
  • adding auto1111 features to inpainting pipeline by @yiyixuxu in #6072
  • Remove unused parameters and fixed FutureWarning by @Justin900429 in #6317
  • amused update links to new repo by @williamberman in #6344
  • [LoRA] make LoRAs trained with peft loadable when peft isn't installed by @sayakpaul in #6306
  • Move ControlNetXS into Community Folder by @DN6 in #6316
  • fix: use retrieve_latents by @Jannchie in #6337
  • Fix LCM distillation bug when creating the guidance scale embeddings using multiple GPUs. by @dg845 in #6279
  • Fix "push_to_hub only create repo in consistency model lora SDXL training script" by @aandyw in #6102
  • Fix chunking in SVD by @DN6 in #6350
  • Add PEFT to advanced training script by @apolinario in #6294
  • Release: v0.25.0 by @sayakpaul (direct commit on v0.25.0)

Significant community contributions

The following contributors have made significant changes to the library over the last release:

  • @hako-mikan
    • [Community Pipeline] Regional Prompting Pipeline (#6015)
    • [Fix] Fix Regional Prompting Pipeline (#6188)
  • @TonyLianLong
    • LLMGroundedDiffusionPipeline: inherit from DiffusionPipeline and fix peft (#6023)
  • @okotaku
    • [Feature] Support IP-Adapter Plus (#5915)
  • @RuoyiDu
    • [Community Pipeline] DemoFusion: Democratising High-Resolution Image Generation With No $$$ (#6022)
  • @UmerHA
    • Add ControlNet-XS support (#5827)
  • @a-r-r-o-w
    • [Community] AnimateDiff + Controlnet Pipeline (#5928)
    • IP adapter support for most pipelines (#5900)
    • Add missing subclass docs, Fix broken example in SD_safe (#6116)
    • Support img2img and inpaint in lpw-xl (#6114)
  • @Monohydroxides
    • [Community] Add SDE Drag pipeline (#6105)
  • @dg845
    • Clean Up Comments in LCM(-LoRA) Distillation Scripts. (#6145)
    • Change LCM-LoRA README Script Example Learning Rates to 1e-4 (#6304)
    • Add rescale_betas_zero_snr Argument to DDPMScheduler (#6305)
    • Fix LCM distillation bug when creating the guidance scale embeddings using multiple GPUs. (#6279)
  • @markkua
    • [Community Pipeline] Add Marigold Monocular Depth Estimation (#6249)

- Python
Published by sayakpaul about 2 years ago

diffusers - v0.24.0: IP Adapters, Kandinsky 3.0, Stable Video Diffusion, SDXL Turbo

Stable Video Diffusion, SDXL Turbo, IP Adapters, Kandinsky 3.0

Stable Video Diffusion

Stable Video Diffusion is a powerful image-to-video generation model that can generate high-resolution (576x1024), 2-4 second videos conditioned on an input image.

Image to Video Generation

There are two variants of SVD: SVD and SVD-XT. The SVD checkpoint is trained to generate 14 frames, and the SVD-XT checkpoint is further finetuned to generate 25 frames.

You need to condition the generation on an initial image, as follows:

```python
import torch

from diffusers import StableVideoDiffusionPipeline
from diffusers.utils import load_image, export_to_video

pipe = StableVideoDiffusionPipeline.from_pretrained(
    "stabilityai/stable-video-diffusion-img2vid-xt", torch_dtype=torch.float16, variant="fp16"
)
pipe.enable_model_cpu_offload()

# Load the conditioning image
image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/svd/rocket.png?download=true")
image = image.resize((1024, 576))

generator = torch.manual_seed(42)
frames = pipe(image, decode_chunk_size=8, generator=generator).frames[0]

export_to_video(frames, "generated.mp4", fps=7)
```

Since generating videos is more memory intensive, we can use the decode_chunk_size argument to control how many frames are decoded at once. This will reduce the memory usage. It's recommended to tweak this value based on your GPU memory. Setting decode_chunk_size=1 will decode one frame at a time and will use the least amount of memory, but the video might have some flickering.
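The chunking idea itself is simple. As an illustrative sketch in plain Python (not the actual diffusers implementation), splitting the frames into chunks of at most decode_chunk_size bounds how many frames sit in decoder memory at once:

```python
def chunk_indices(num_frames, decode_chunk_size):
    """Split frame indices into consecutive chunks of at most decode_chunk_size."""
    frames = list(range(num_frames))
    return [frames[i:i + decode_chunk_size] for i in range(0, num_frames, decode_chunk_size)]

# 14 frames (the SVD checkpoint) with decode_chunk_size=8 -> two decoder passes: 8 frames, then 6
print([len(c) for c in chunk_indices(14, 8)])  # [8, 6]

# decode_chunk_size=1 -> 14 single-frame passes: lowest memory, but slowest and may flicker
print(len(chunk_indices(14, 1)))  # 14
```

Fewer, larger chunks are faster and more temporally consistent; smaller chunks trade that for lower peak memory.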

Additionally, we also use model cpu offloading to reduce the memory usage.

rocket_generated

SDXL Turbo

SDXL Turbo is an adversarial time-distilled Stable Diffusion XL (SDXL) model capable of running inference in as little as 1 step. Also, it does not use classifier-free guidance, further increasing its speed. On a good consumer GPU, you can now generate an image in just 100ms.

Text-to-Image

For text-to-image, pass a text prompt. By default, SDXL Turbo generates a 512x512 image, and that resolution gives the best results. You can try setting the height and width parameters to 768x768 or 1024x1024, but you should expect quality degradations when doing so.

Make sure to set guidance_scale to 0.0 to disable it, as the model was trained without it. A single inference step is enough to generate high quality images. Increasing the number of steps to 2, 3 or 4 should improve image quality.

```py
from diffusers import AutoPipelineForText2Image
import torch

pipeline_text2image = AutoPipelineForText2Image.from_pretrained("stabilityai/sdxl-turbo", torch_dtype=torch.float16, variant="fp16")
pipeline_text2image = pipeline_text2image.to("cuda")

prompt = "A cinematic shot of a baby racoon wearing an intricate italian priest robe."

image = pipeline_text2image(prompt=prompt, guidance_scale=0.0, num_inference_steps=1).images[0]
image
```

generated image of a racoon in a robe

Image-to-image

For image-to-image generation, make sure that num_inference_steps * strength is larger or equal to 1. The image-to-image pipeline will run for int(num_inference_steps * strength) steps, e.g. 0.5 * 2.0 = 1 step in our example below.
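The step arithmetic can be made concrete with a tiny helper (the name effective_steps is ours for illustration, not a diffusers API):

```python
def effective_steps(num_inference_steps, strength):
    """Number of denoising steps an image-to-image run actually executes."""
    return int(num_inference_steps * strength)

print(effective_steps(2, 0.5))  # 1  -> valid: at least one step runs
print(effective_steps(1, 0.5))  # 0  -> invalid: num_inference_steps * strength < 1
```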

```py
from diffusers import AutoPipelineForImage2Image
from diffusers.utils import load_image, make_image_grid

# use from_pipe to avoid consuming additional memory when loading a checkpoint
pipeline = AutoPipelineForImage2Image.from_pipe(pipeline_text2image).to("cuda")

init_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/cat.png")
init_image = init_image.resize((512, 512))

prompt = "cat wizard, gandalf, lord of the rings, detailed, fantasy, cute, adorable, Pixar, Disney, 8k"

image = pipeline(prompt, image=init_image, strength=0.5, guidance_scale=0.0, num_inference_steps=2).images[0]
make_image_grid([init_image, image], rows=1, cols=2)
```

Image-to-image generation sample using SDXL Turbo

IP Adapters

IP Adapters have proven remarkably powerful at generating images conditioned on other images.

Thanks to @okotaku, we have added IP adapters to the most important pipelines, allowing you to combine them for a variety of different workflows, e.g. they work with Img2Img, ControlNet, and LCM-LoRA out of the box.

LCM-LoRA

```python
from diffusers import DiffusionPipeline, LCMScheduler
import torch
from diffusers.utils import load_image

model_id = "sd-dreambooth-library/herge-style"
lcm_lora_id = "latent-consistency/lcm-lora-sdv1-5"

pipe = DiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16)

pipe.load_ip_adapter("h94/IP-Adapter", subfolder="models", weight_name="ip-adapter_sd15.bin")
pipe.load_lora_weights(lcm_lora_id)
pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)
pipe.enable_model_cpu_offload()

prompt = "best quality, high quality"
image = load_image("https://user-images.githubusercontent.com/24734142/266492875-2d50d223-8475-44f0-a7c6-08b51cb53572.png")
images = pipe(
    prompt=prompt,
    ip_adapter_image=image,
    num_inference_steps=4,
    guidance_scale=1,
).images[0]
```

yiyi_test_2_out

ControlNet

```py
from diffusers import StableDiffusionControlNetPipeline, ControlNetModel
import torch
from diffusers.utils import load_image

controlnet_model_path = "lllyasviel/control_v11f1p_sd15_depth"
controlnet = ControlNetModel.from_pretrained(controlnet_model_path, torch_dtype=torch.float16)

pipeline = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16)
pipeline.to("cuda")

image = load_image("https://huggingface.co/datasets/YiYiXu/testing-images/resolve/main/statue.png")
depth_map = load_image("https://huggingface.co/datasets/YiYiXu/testing-images/resolve/main/depth.png")

pipeline.load_ip_adapter("h94/IP-Adapter", subfolder="models", weight_name="ip-adapter_sd15.bin")

generator = torch.Generator(device="cpu").manual_seed(33)
images = pipeline(
    prompt='best quality, high quality',
    image=depth_map,
    ip_adapter_image=image,
    negative_prompt="monochrome, lowres, bad anatomy, worst quality, low quality",
    num_inference_steps=50,
    generator=generator,
).images
images[0].save("yiyi_test_2_out.png")
```

| ip_image | condition | output |
| :-: | :-: | :-: |
| statue | depth | yiyi_test_2_out |

For more information: :point_right: https://huggingface.co/docs/diffusers/main/en/using-diffusers/loading_adapters#ip-adapter

Kandinsky 3.0

The Kandinsky team has released the third version, which features much improved text-to-image alignment thanks to using Flan-T5 as the text encoder.

Text-to-Image

```py
from diffusers import AutoPipelineForText2Image
import torch

pipe = AutoPipelineForText2Image.from_pretrained("kandinsky-community/kandinsky-3", variant="fp16", torch_dtype=torch.float16)
pipe.enable_model_cpu_offload()

prompt = "A photograph of the inside of a subway train. There are raccoons sitting on the seats. One of them is reading a newspaper. The window shows the city in the background."

generator = torch.Generator(device="cpu").manual_seed(0)
image = pipe(prompt, num_inference_steps=25, generator=generator).images[0]
```

Image-to-Image

```py
from diffusers import AutoPipelineForImage2Image
from diffusers.utils import load_image
import torch

pipe = AutoPipelineForImage2Image.from_pretrained("kandinsky-community/kandinsky-3", variant="fp16", torch_dtype=torch.float16)
pipe.enable_model_cpu_offload()

prompt = "A painting of the inside of a subway train with tiny raccoons."
image = load_image("https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/kandinsky3/t2i.png")

generator = torch.Generator(device="cpu").manual_seed(0)
image = pipe(prompt, image=image, strength=0.75, num_inference_steps=25, generator=generator).images[0]
```

Check it out: - :point_right: https://huggingface.co/docs/diffusers/main/en/api/pipelines/kandinsky3#kandinsky-3

All commits

  • LCM-LoRA docs by @patil-suraj in #5782
  • [Docs] Update and make improvements by @standardAI in #5819
  • [docs] Fix title by @stevhliu in #5831
  • Improve setup.py and add dependency check by @patrickvonplaten in #5826
  • [Docs] add: japanese sdxl as a reference by @sayakpaul in #5844
  • Set usedforsecurity=False in hashlib methods (FIPS compliance) by @Wauplin in #5790
  • fix memory consistency decoder test by @williamberman in #5828
  • [PEFT] Unpin peft by @patrickvonplaten in #5850
  • Speed up the peft lora unload by @pacman100 in #5741
  • [Tests/LoRA/PEFT] Test also on PEFT / transformers / accelerate latest by @younesbelkada in #5820
  • UnboundLocalError in SDXLInpaint.prepare_latents() by @a-r-r-o-w in #5648
  • [ControlNet] fix import in single file loading by @sayakpaul in #5834
  • [Styling] stylify using ruff by @kashif in #5841
  • [Community] [WIP] LCM Interpolation Pipeline by @a-r-r-o-w in #5767
  • [JAX] Replace uses of jax.devices("cpu") with jax.local_devices(backend="cpu") by @hvaara in #5864
  • [test / peft] Fix silent behaviour on PR tests by @younesbelkada in #5852
  • fix an issue that ipex occupy too much memory, it will not impact per… by @linlifan in #5625
  • Update LCMScheduler Inference Timesteps to be More Evenly Spaced by @dg845 in #5836
  • Revert "[Docs] Update and make improvements" by @standardAI in #5858
  • [docs] Loader APIs by @stevhliu in #5813
  • Update README.md by @co63oc in #5855
  • Add tests fetcher by @DN6 in #5848
  • Addition of new callbacks to controlnets by @a-r-r-o-w in #5812
  • [docs] MusicLDM by @stevhliu in #5854
  • Add features to the Dreambooth LoRA SDXL training script by @linoytsaban in #5508
  • [feat] IP Adapters (author @okotaku ) by @yiyixuxu in #5713
  • [Lora] Seperate logic by @patrickvonplaten in #5809
  • ControlNet+Adapter pipeline, and ControlNet+Adapter+Inpaint pipeline by @affromero in #5869
  • Adds an advanced version of the SD-XL DreamBooth LoRA training script supporting pivotal tuning by @linoytsaban in #5883
  • [bug fix] fix small bug in readme template of sdxl lora training script by @linoytsaban in #5906
  • [bug fix] fix small bug in readme template of sdxl lora training script by @linoytsaban in #5914
  • [Docs] add: 8bit inference with pixart alpha by @sayakpaul in #5814
  • [@cene555][Kandinsky 3.0] Add Kandinsky 3.0 by @patrickvonplaten in #5913
  • [Examples] Allow downloading variant model files by @patrickvonplaten in #5531
  • [Fix: pixart-alpha] random 512px resolution bug by @lawrence-cj in #5842
  • [Core] add support for gradient checkpointing in transformer_2d by @sayakpaul in #5943
  • Deprecate KarrasVeScheduler and ScoreSdeVpScheduler by @a-r-r-o-w in #5269
  • Add Custom Timesteps Support to LCMScheduler and Supported Pipelines by @dg845 in #5874
  • set the model to train state before accelerator prepare by @sywangyi in #5099
  • Avoid computing min() that is expensive when do_normalize is False in the image processor by @ivanprado in #5896
  • Fix LCM Stable Diffusion distillation bug related to parsing unet_time_cond_proj_dim by @dg845 in #5893
  • add LoRA weights load and fuse support for IPEX pipeline by @linlifan in #5920
  • Replace multiple variables with one variable. by @hi-sushanta in #5715
  • fix: error on device for lpw_stable_diffusion_xl pipeline if pipe.enable_sequential_cpu_offload() enabled by @VicGrygorchyk in #5885
  • [Vae] Make sure all vae's work with latent diffusion models by @patrickvonplaten in #5880
  • [Tests] Make sure that we don't run tests multiple times by @patrickvonplaten in #5949
  • [Community Pipeline] Diffusion Posterior Sampling for General Noisy Inverse Problems by @tongdaxu in #5939
  • [From_pretrained] Fix warning by @patrickvonplaten in #5948
  • [load_textual_inversion]: allow multiple tokens by @yiyixuxu in #5837
  • [docs] Fix space by @stevhliu in #5898
  • fix: minor typo in docstring by @soumik12345 in #5961
  • [ldm3d] Ldm3d upscaler to community pipeline by @estelleafl in #5870
  • [docs] Update pipeline list by @stevhliu in #5952
  • [Tests] Refactor test_examples.py for better readability by @sayakpaul in #5946
  • added doc for Kandinsky3.0 by @charchit7 in #5937
  • [bug fix] Inpainting for MultiAdapter by @affromero in #5922
  • Rename output_dir argument by @linhqyy in #5916
  • [LoRA refactor] move several state dict conversion utils out of lora.py by @sayakpaul in #5955
  • Support of ip-adapter to the StableDiffusionControlNetInpaintPipeline by @juancopi81 in #5887
  • [docs] LCM training by @stevhliu in #5796
  • Controlnet ssd 1b support by @MarkoKostiv in #5779
  • [Pipeline] Add TextToVideoZeroSDXLPipeline by @vahramtadevosyan in #4695
  • [Wuerstchen] Adapt lora training example scripts to use PEFT by @kashif in #5959
  • Fixed custom module importing on Windows by @PENGUINLIONG in #5891
  • Add SVD by @patil-suraj in #5895
  • [SDXL Turbo] Add some docs by @patrickvonplaten in #5982
  • Fix SVD doc by @patil-suraj in #5983

Significant community contributions

The following contributors have made significant changes to the library over the last release:

  • @a-r-r-o-w
    • UnboundLocalError in SDXLInpaint.prepare_latents() (#5648)
    • [Community] [WIP] LCM Interpolation Pipeline (#5767)
    • Addition of new callbacks to controlnets (#5812)
    • Deprecate KarrasVeScheduler and ScoreSdeVpScheduler (#5269)
  • @dg845
    • Update LCMScheduler Inference Timesteps to be More Evenly Spaced (#5836)
    • Add Custom Timesteps Support to LCMScheduler and Supported Pipelines (#5874)
    • Fix LCM Stable Diffusion distillation bug related to parsing unet_time_cond_proj_dim (#5893)
  • @affromero
    • ControlNet+Adapter pipeline, and ControlNet+Adapter+Inpaint pipeline (#5869)
    • [bug fix] Inpainting for MultiAdapter (#5922)
  • @tongdaxu
    • [Community Pipeline] Diffusion Posterior Sampling for General Noisy Inverse Problems (#5939)
  • @estelleafl
    • [ldm3d] Ldm3d upscaler to community pipeline (#5870)
  • @vahramtadevosyan
    • [Pipeline] Add TextToVideoZeroSDXLPipeline (#4695)

- Python
Published by patrickvonplaten about 2 years ago

diffusers - [Patch release] Make sure we install correct PEFT version

Small patch release to make sure the correct PEFT version is installed.

All commits

  • Improve setup.py and add dependency check by @patrickvonplaten in #5826

- Python
Published by patrickvonplaten over 2 years ago

diffusers - v0.23.0: LCM LoRA, SDXL LCM, Consistency Decoder from DALL-E 3

LCM LoRA, LCM SDXL, Consistency Decoder

LCM LoRA

Latent Consistency Models (LCM) made quite the mark in the Stable Diffusion community by enabling ultra-fast inference. LCM author @luosiallen, alongside @patil-suraj and @dg845, managed to extend the LCM support for Stable Diffusion XL (SDXL) and pack everything into a LoRA.

The approach is called LCM LoRA.

Below is an example of using LCM LoRA, taking just 4 inference steps:

```python
from diffusers import DiffusionPipeline, LCMScheduler
import torch

model_id = "stabilityai/stable-diffusion-xl-base-1.0"
lcm_lora_id = "latent-consistency/lcm-lora-sdxl"

pipe = DiffusionPipeline.from_pretrained(model_id, variant="fp16", torch_dtype=torch.float16).to("cuda")

pipe.load_lora_weights(lcm_lora_id)
pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)

prompt = "close-up photography of old man standing in the rain at night, in a street lit by lamps, leica 35mm summilux"
image = pipe(
    prompt=prompt,
    num_inference_steps=4,
    guidance_scale=1,
).images[0]
```

You can combine the LoRA with Img2Img, Inpaint, ControlNet, ...

as well as with other LoRAs 🤯

image (31)

👉 Checkpoints 📜 Docs

If you want to learn more about the approach, please have a look at the following:

LCM SDXL

Continuing the work of Latent Consistency Models (LCM), we've applied the approach to SDXL as well and give you SSD-1B and SDXL fine-tuned checkpoints.

```python
from diffusers import DiffusionPipeline, UNet2DConditionModel, LCMScheduler
import torch

unet = UNet2DConditionModel.from_pretrained(
    "latent-consistency/lcm-sdxl",
    torch_dtype=torch.float16,
    variant="fp16",
)
pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", unet=unet, torch_dtype=torch.float16
).to("cuda")
pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)

prompt = "Self-portrait oil painting, a beautiful cyborg with golden hair, 8k"

generator = torch.manual_seed(0)
image = pipe(
    prompt=prompt, num_inference_steps=4, generator=generator, guidance_scale=1.0
).images[0]
```

👉 Checkpoints 📜 Docs

Consistency Decoder

OpenAI open-sourced the consistency decoder used in DALL-E 3. It improves the decoding part in the Stable Diffusion v1 family of models.

```python
import torch
from diffusers import StableDiffusionPipeline, ConsistencyDecoderVAE

vae = ConsistencyDecoderVAE.from_pretrained("openai/consistency-decoder", torch_dtype=torch.float16)
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", vae=vae, torch_dtype=torch.float16
).to("cuda")

pipe("horse", generator=torch.manual_seed(0)).images
```

Find the documentation here to learn more.

All commits

  • [Custom Pipelines] Make sure that community pipelines can use repo revision by @patrickvonplaten in #5659
  • post release (v0.22.0) by @sayakpaul in #5658
  • Add Pixart to AUTO_TEXT2IMAGE_PIPELINES_MAPPING by @Beinsezii in #5664
  • Update custom diffusion attn processor by @DN6 in #5663
  • Model tests xformers fixes by @DN6 in #5679
  • Update free model hooks by @DN6 in #5680
  • Fix Basic Transformer Block by @DN6 in #5683
  • Explicit torch/flax dependency check by @DN6 in #5673
  • [PixArt-Alpha] fix mask_feature so that precomputed embeddings work with a batch size > 1 by @sayakpaul in #5677
  • Make sure DDPM and diffusers can be used without Transformers by @sayakpaul in #5668
  • [PixArt-Alpha] Support non-square images by @sayakpaul in #5672
  • Improve LCMScheduler by @dg845 in #5681
  • [Docs] Fix typos, improve, update at Using Diffusers' Task page by @standardAI in #5611
  • Replacing the nn.Mish activation function with a get_activation function. by @hi-sushanta in #5651
  • speed up Shap-E fast test by @yiyixuxu in #5686
  • Fix the misaligned pipeline usage in dreamshaper docstrings by @kirill-fedyanin in #5700
  • Fixed is_safetensors_compatible() handling of windows path separators by @PhilLab in #5650
  • [LCM] Fix img2img by @patrickvonplaten in #5698
  • [PixArt-Alpha] fix mask feature condition. by @sayakpaul in #5695
  • Fix styling issues by @patrickvonplaten in #5699
  • Add adapter fusing + PEFT to the docs by @apolinario in #5662
  • Fix prompt bug in AnimateDiff by @DN6 in #5702
  • [Bugfix] fix error of peft lora when xformers enabled by @okotaku in #5697
  • Install accelerate from PyPI in PR test runner by @DN6 in #5721
  • consistency decoder by @williamberman in #5694
  • Correct consist dec by @patrickvonplaten in #5722
  • LCM Add Tests by @patrickvonplaten in #5707
  • [LCM] add: lcm docs. by @sayakpaul in #5723
  • Add LCM Scripts by @patil-suraj in #5727

- Python
Published by sayakpaul over 2 years ago

diffusers - v0.22.3: Fix PixArtAlpha and LCM Image-to-Image pipelines

🐛 There were some sneaky bugs in the PixArt-Alpha and LCM Image-to-Image pipelines which have been fixed in this release.

All commits

  • [LCM] Fix img2img by @patrickvonplaten in #5698
  • [PixArt-Alpha] fix mask feature condition. by @sayakpaul in #5695

- Python
Published by sayakpaul over 2 years ago

diffusers - Patch Release v0.22.2: Fix Animate Diff, fix DDPM import, Pixart various

  • Fix Basic Transformer Block by @DN6 in #5683
  • [PixArt-Alpha] fix mask_feature so that precomputed embeddings work with a batch size > 1 by @sayakpaul in #5677
  • Make sure DDPM and diffusers can be used without Transformers by @sayakpaul in #5668
  • [PixArt-Alpha] Support non-square images by @sayakpaul in #5672

- Python
Published by patrickvonplaten over 2 years ago

diffusers - Patch Release: Fix community vs. hub pipelines revision

  • [Custom Pipelines] Make sure that community pipelines can use repo revision by @patrickvonplaten

- Python
Published by patrickvonplaten over 2 years ago

diffusers - v0.22.0: LCM, PixArt-Alpha, AnimateDiff, PEFT integration for LoRA, and more

Latent Consistency Models (LCM)


LCMs enable significantly faster inference for diffusion models: they require far fewer inference steps to produce high-resolution images without compromising image quality too much. Below is a usage example:

```python
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained("SimianLuo/LCM_Dreamshaper_v7", torch_dtype=torch.float32)

# To save GPU memory, torch.float16 can be used, but it may compromise image quality.
pipe.to(torch_device="cuda", torch_dtype=torch.float32)

prompt = "Self-portrait oil painting, a beautiful cyborg with golden hair, 8k"

# Can be set to 1~50 steps. LCM supports fast inference even with <= 4 steps. Recommended: 1~8 steps.
num_inference_steps = 4

images = pipe(prompt=prompt, num_inference_steps=num_inference_steps, guidance_scale=8.0).images
```

Refer to the documentation to learn more.

LCM comes with both text-to-image and image-to-image pipelines and they were contributed by @luosiallen, @nagolinc, and @dg845.

PixArt-Alpha


PixArt-Alpha is a Transformer-based text-to-image diffusion model that rivals the quality of the existing state-of-the-art ones, such as Stable Diffusion XL, Imagen, and DALL-E 2, while being more efficient.

It was trained on T5 text embeddings and has a maximum sequence length of 120 tokens. Thus, it allows for more detailed prompt inputs, unlocking better-quality generations.

Despite the large text encoder, with model offloading it takes a little under 11 GB of VRAM to run the PixArtAlphaPipeline:

```python
import torch
from diffusers import PixArtAlphaPipeline

pipeline_id = "PixArt-alpha/PixArt-XL-2-1024-MS"
pipeline = PixArtAlphaPipeline.from_pretrained(pipeline_id, torch_dtype=torch.float16)
pipeline.enable_model_cpu_offload()

prompt = "A small cactus with a happy face in the Sahara desert."
image = pipeline(prompt).images[0]
image.save("sahara.png")
```

Check out the docs to learn more.

AnimateDiff


AnimateDiff is a modelling framework that allows you to create videos using pre-existing Stable Diffusion text-to-image models. It achieves this by inserting motion module layers into a frozen text-to-image model and training it on video clips to extract a motion prior.

These motion modules are applied after the ResNet and Attention blocks in the Stable Diffusion UNet. Their purpose is to introduce coherent motion across image frames. To support these modules, we introduce the concepts of a MotionAdapter and a UNetMotionModel. These serve as a convenient way to use these motion modules with existing Stable Diffusion models.

The following example demonstrates how you can utilize the motion modules with an existing Stable Diffusion text-to-image model.

```python
import torch
from diffusers import MotionAdapter, AnimateDiffPipeline, DDIMScheduler
from diffusers.utils import export_to_gif

# Load the motion adapter
adapter = MotionAdapter.from_pretrained("guoyww/animatediff-motion-adapter-v1-5-2")

# Load a SD 1.5 based finetuned model
model_id = "SG161222/Realistic_Vision_V5.1_noVAE"
pipe = AnimateDiffPipeline.from_pretrained(model_id, motion_adapter=adapter)
scheduler = DDIMScheduler.from_pretrained(
    model_id, subfolder="scheduler", clip_sample=False, timestep_spacing="linspace", steps_offset=1
)
pipe.scheduler = scheduler

# Enable memory savings
pipe.enable_vae_slicing()
pipe.enable_model_cpu_offload()

output = pipe(
    prompt=(
        "masterpiece, bestquality, highlydetailed, ultradetailed, sunset, "
        "orange sky, warm lighting, fishing boats, ocean waves seagulls, "
        "rippling water, wharf, silhouette, serene atmosphere, dusk, evening glow, "
        "golden hour, coastal landscape, seaside scenery"
    ),
    negative_prompt="bad quality, worse quality",
    num_frames=16,
    guidance_scale=7.5,
    num_inference_steps=25,
    generator=torch.Generator("cpu").manual_seed(42),
)
frames = output.frames[0]
export_to_gif(frames, "animation.gif")
```

You can convert an existing 2D UNet into a UNetMotionModel:

```python
from diffusers import MotionAdapter, UNetMotionModel, UNet2DConditionModel

unet = UNetMotionModel()

# Load from an existing 2D UNet and MotionAdapter
unet2D = UNet2DConditionModel.from_pretrained("SG161222/Realistic_Vision_V5.1_noVAE", subfolder="unet")
motion_adapter = MotionAdapter.from_pretrained("guoyww/animatediff-motion-adapter-v1-5-2")

# Load the motion adapter at init
unet_motion = UNetMotionModel.from_unet2d(unet2D, motion_adapter=motion_adapter)

# Or load motion modules after init
unet_motion.load_motion_modules(motion_adapter)

# Freeze all 2D UNet layers except for the motion modules for finetuning
unet_motion.freeze_unet2d_params()

# Save only the motion modules
unet_motion.save_motion_modules("path/to/motion_modules", push_to_hub=True)
```

AnimateDiff also comes with motion LoRA modules, letting you control subtleties:

```python
import torch
from diffusers import MotionAdapter, AnimateDiffPipeline, DDIMScheduler
from diffusers.utils import export_to_gif

# Load the motion adapter
adapter = MotionAdapter.from_pretrained("guoyww/animatediff-motion-adapter-v1-5-2")

# Load a SD 1.5 based finetuned model, then a motion LoRA
model_id = "SG161222/Realistic_Vision_V5.1_noVAE"
pipe = AnimateDiffPipeline.from_pretrained(model_id, motion_adapter=adapter)
pipe.load_lora_weights("guoyww/animatediff-motion-lora-zoom-out", adapter_name="zoom-out")

scheduler = DDIMScheduler.from_pretrained(
    model_id, subfolder="scheduler", clip_sample=False, timestep_spacing="linspace", steps_offset=1
)
pipe.scheduler = scheduler

# Enable memory savings
pipe.enable_vae_slicing()
pipe.enable_model_cpu_offload()

output = pipe(
    prompt=(
        "masterpiece, bestquality, highlydetailed, ultradetailed, sunset, "
        "orange sky, warm lighting, fishing boats, ocean waves seagulls, "
        "rippling water, wharf, silhouette, serene atmosphere, dusk, evening glow, "
        "golden hour, coastal landscape, seaside scenery"
    ),
    negative_prompt="bad quality, worse quality",
    num_frames=16,
    guidance_scale=7.5,
    num_inference_steps=25,
    generator=torch.Generator("cpu").manual_seed(42),
)
frames = output.frames[0]
export_to_gif(frames, "animation.gif")
```

Check out the documentation to learn more.

PEFT 🤝 Diffusers

There are many adapters (LoRA, for example) trained in different styles to achieve different effects. You can even combine multiple adapters to create new and unique images. With the 🤗 PEFT integration in 🤗 Diffusers, it is really easy to load and manage adapters for inference.

Here is an example of combining multiple LoRAs using this new integration:

```python
from diffusers import DiffusionPipeline
import torch

pipe_id = "stabilityai/stable-diffusion-xl-base-1.0"
pipe = DiffusionPipeline.from_pretrained(pipe_id, torch_dtype=torch.float16).to("cuda")

# Load LoRA 1.
pipe.load_lora_weights("CiroN2022/toy-face", weight_name="toy_face_sdxl.safetensors", adapter_name="toy")

# Load LoRA 2.
pipe.load_lora_weights("nerijs/pixel-art-xl", weight_name="pixel-art-xl.safetensors", adapter_name="pixel")

# Combine the adapters.
pipe.set_adapters(["pixel", "toy"], adapter_weights=[0.5, 1.0])

# Perform inference.
prompt = "toy_face of a hacker with a hoodie, pixel art"
image = pipe(
    prompt, num_inference_steps=30, cross_attention_kwargs={"scale": 1.0}, generator=torch.manual_seed(0)
).images[0]
image
```


Refer to the documentation to learn more.

Community components with community pipelines

We have had support for community pipelines for a while now. This enables fast integration for pipelines we cannot directly integrate within the core codebase of the library. However, community pipelines always rely on the building blocks from Diffusers, which can be restrictive for advanced use cases.

To address this, we’re elevating community pipelines with community components starting this release 🤗 By specifying trust_remote_code=True and structuring the pipeline repository in a specific way, users can customize their pipeline and component code as flexibly as possible:

```python
from diffusers import DiffusionPipeline
import torch

pipeline = DiffusionPipeline.from_pretrained(
    "<username>/<repo_id>", trust_remote_code=True, torch_dtype=torch.float16
).to("cuda")

prompt = "hello"

# Text embeds
prompt_embeds, negative_embeds = pipeline.encode_prompt(prompt)

# Keyframes generation (8x64x40, 2fps)
video_frames = pipeline(
    prompt_embeds=prompt_embeds,
    negative_prompt_embeds=negative_embeds,
    num_frames=8,
    height=40,
    width=64,
    num_inference_steps=2,
    guidance_scale=9.0,
    output_type="pt",
).frames
```

Refer to the documentation to learn more.

Dynamic callbacks

Most 🤗 Diffusers pipelines now accept a callback_on_step_end argument that lets you change the default behavior of the denoising loop with custom-defined functions. Here is an example of a callback function that disables classifier-free guidance after 40% of the inference steps to save compute with a minimal tradeoff in quality.

```python
def callback_dynamic_cfg(pipe, step_index, timestep, callback_kwargs):
    # adjust the batch_size of prompt_embeds according to guidance_scale
    if step_index == int(pipe.num_timesteps * 0.4):
        prompt_embeds = callback_kwargs["prompt_embeds"]
        prompt_embeds = prompt_embeds.chunk(2)[-1]

        # update guidance_scale and prompt_embeds
        pipe._guidance_scale = 0.0
        callback_kwargs["prompt_embeds"] = prompt_embeds
    return callback_kwargs
```

Here’s how you can use it:

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16)
pipe = pipe.to("cuda")

prompt = "a photo of an astronaut riding a horse on mars"

generator = torch.Generator(device="cuda").manual_seed(1)
out = pipe(
    prompt,
    generator=generator,
    callback_on_step_end=callback_dynamic_cfg,
    callback_on_step_end_tensor_inputs=["prompt_embeds"],
)

out.images[0].save("out_custom_cfg.png")
```

Check out the docs to learn more.
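The contract is simple: at the end of each denoising step, the pipeline calls your function and adopts the tensors it returns. The framework-free sketch below illustrates that control flow; the loop and function names here are illustrative stand-ins, not diffusers internals.

```python
def run_toy_denoise_loop(num_steps, callback_on_step_end=None, state=None):
    """Toy denoising loop: at the end of every step, a callback may mutate state."""
    state = dict(state or {})
    for step_index, timestep in enumerate(range(num_steps, 0, -1)):
        # ... one denoising step would happen here ...
        if callback_on_step_end is not None:
            state = callback_on_step_end(step_index, timestep, state)
    return state


def drop_cfg_after_40pct(step_index, timestep, state):
    # mirror the pattern above: switch guidance off at 40% of the steps
    if step_index == int(state["num_steps"] * 0.4):
        state["guidance_scale"] = 0.0
    return state


final = run_toy_denoise_loop(10, drop_cfg_after_40pct, {"num_steps": 10, "guidance_scale": 7.5})
print(final["guidance_scale"])  # 0.0
```

In the real pipelines, `callback_on_step_end_tensor_inputs` plays the role of the mutable state: it names which tensors are passed into, and read back from, the callback.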

All commits

  • [PEFT / LoRA ] Fix text encoder scaling by @younesbelkada in #5204
  • Fix doc KO unconditional_image_generation.md by @mishig25 in #5236
  • Flax: Ignore PyTorch, ONNX files when they coexist with Flax weights by @pcuenca in #5237
  • Fixed constants.py not using hugging face hub environment variable by @Zanz2 in #5222
  • Compile test fixes by @DN6 in #5235
  • [PEFT warnings] Only sure deprecation warnings in the future by @patrickvonplaten in #5240
  • Add docstrings in forward methods of adapter model by @Nandika-A in #5253
  • make style by @patrickvonplaten (direct commit on main)
  • [WIP] Refactor UniDiffuser Pipeline and Tests by @dg845 in #4948
  • fix: how print training resume logs. by @sayakpaul in #5117
  • Add docstring for the AutoencoderKL's decode by @freespirit in #5242
  • Add a docstring for the AutoencoderKL's encode by @freespirit in #5239
  • Update UniPC to support 1D diffusion. by @leng-yue in #5199
  • [Schedulers] Fix callback steps by @patrickvonplaten in #5261
  • make fix copies by @patrickvonplaten (direct commit on main)
  • [Research folder] Add SDXL example by @patrickvonplaten in #5275
  • Fix UniPC scheduler for 1D by @patrickvonplaten in #5276
  • New Pipeline Slow Test runners by @DN6 in #5131
  • handle case when controlnet is list or tuple by @noskill in #5179
  • make style by @patrickvonplaten (direct commit on main)
  • Zh doc by @WADreaming in #4807
  • ✨ [Core] Add FreeU mechanism by @kadirnar in #5164
  • pin torch version by @DN6 in #5297
  • add: entry for DDPO support. by @sayakpaul in #5250
  • Min-SNR Gamma: correct the fix for SNR weighted loss in v-prediction … by @bghira in #5238
  • Update bug-report.yml by @patrickvonplaten (direct commit on main)
  • Bump tolerance on shape test by @DN6 in #5289
  • Add from single file to StableDiffusionUpscalePipeline and StableDiffusionLatentUpscalePipeline by @DN6 in #5194
  • [LoRA] fix: torch.compile() for lora conv by @sayakpaul in #5298
  • [docs] Improved inpaint docs by @stevhliu in #5210
  • Minor fixes by @TimothyAlexisVass in #5309
  • [Hacktoberfest]Fixing issues #5241 by @jgyfutub in #5255
  • Update README.md by @ShubhamJagtap2000 in #5267
  • fix typo in train dreambooth lora description by @themez in #5332
  • Fix [core/GLIGEN]: TypeError when iterating over 0-d tensor with In-painting mode when EulerAncestralDiscreteScheduler is used by @rchuzh99 in #5305
  • fix inference in custom diffusion by @caopulan in #5329
  • Improve performance of fast test by reducing down blocks by @sepal in #5290
  • make-fast-test-for-StableDiffusionControlNetPipeline-faster by @m0saan in #5292
  • Improve typehints and docs in diffusers/models by @a-r-r-o-w in #5299
  • Add py.typed for PEP 561 compliance by @byarbrough in #5326
  • [HacktoberFest] Add missing docstrings to diffusers/models by @a-r-r-o-w in #5248
  • make style by @patrickvonplaten (direct commit on main)
  • Fix links in docs to adapter code by @johnowhitaker in #5323
  • replace references to deprecated KeyArray & PRNGKeyArray by @jakevdp in #5324
  • Fix loading broken LoRAs that could give NaN by @patrickvonplaten in #5316
  • [JAX] Replace uses of jnp.array in types with jnp.ndarray. by @hvaara in #4719
  • Add missing dependency in requirements file by @juliensimon in #5345
  • fix problem of 'accelerator.is_main_process' to run in multiple GPUs by @jiaqiw09 in #5340
  • [docs] Create a mask for inpainting by @stevhliu in #5322
  • Adding PyTorch XLA support for sdxl inference by @ssusie in #5273
  • [Examples] use lora_linear instead of deprecated lora attn procs. by @sayakpaul in #5331
  • Improve typehints and docs in diffusers/models by @a-r-r-o-w in #5312
  • Fix StableDiffusionXLImg2ImgPipeline creation in sdxl tutorial by @soumik12345 in #5367
  • I Added Doc-String Into The class. by @hi-sushanta in #5293
  • make style by @patrickvonplaten (direct commit on main)
  • [docs] Minor fixes by @stevhliu in #5369
  • New xformers test runner by @DN6 in #5349
  • [Core] Add FreeU to all the core pipelines and their (mostly-used) derivatives by @sayakpaul in #5376
  • [core / PEFT / LoRA] Integrate PEFT into Unet by @younesbelkada in #5151
  • [Bot] FIX stale.py uses timezone-aware datetime by @sayakpaul in #5396
  • [Examples] fix unconditioning generation training example for mixed-precision training by @sayakpaul in #5407
  • [Wuerstchen] text to image training script by @kashif in #5052
  • [Docs] add docs on peft diffusers integration by @sayakpaul in #5359
  • chore: fix typos by @afuetterer in #5386
  • [Examples] Update with HFApi by @sayakpaul in #5393
  • Add ability to mix usage of T2I-Adapter(s) and ControlNet(s). by @GreggHelt2 in #5362
  • make style by @patrickvonplaten (direct commit on main)
  • [Core] Fix/pipeline without text encoders for SDXL by @sayakpaul in #5301
  • [Examples] Follow up of #5393 by @sayakpaul in #5420
  • changed channel parameters for UNET and VAE. Changed configs parameters of CLIPText by @aeros29 in #5370
  • Chore: Typo fixed in multiple files by @SusheelThapa in #5422
  • Update base image for slow CUDA tests by @DN6 in #5426
  • Fix pipe fetcher for slow tests by @DN6 in #5424
  • make fix copies by @patrickvonplaten (direct commit on main)
  • Merge branch 'main' of https://github.com/huggingface/diffusers by @patrickvonplaten (direct commit on main)
  • [from_single_file()]fix: local single file loading. by @sayakpaul in #5440
  • Add latent consistency by @patrickvonplaten in #5438
  • Update-DeepFloyd-IF-Pipelines-Docstrings by @m0saan in #5304
  • style(sdxl): remove identity assignments by @liang-hou in #5418
  • Fix the order of width and height of original size in SDXL training script by @linjiapro in #5382
  • make style by @patrickvonplaten (direct commit on main)
  • Beautiful Doc string added into the UNetMidBlock2D class. by @hi-sushanta in #5389
  • make style by @patrickvonplaten (direct commit on main)
  • fix une2td ignoring class_labels by @kesimeg in #5401
  • Added support to create asymmetrical U-Net structures by @Gothos in #5400
  • [PEFT] Fix scale unscale with LoRA adapters by @younesbelkada in #5417
  • Make T2I-Adapter downscale padding match the UNet by @RyanJDick in #5435
  • Update README.md by @anvilarth in #5497
  • fixed SDXL text encoder training bug #5016 by @shyammarjit in #5078
  • make style by @patrickvonplaten (direct commit on main)
  • [torch.compile] fix graph break problems partially by @sayakpaul in #5453
  • Fix Slow Tests by @DN6 in #5469
  • Fix typo in controlnet docs by @MrSyee in #5486
  • [BUG] in transformer_temporal Fix Bugs by @zideliu in #5496
  • [docs] Fix links by @stevhliu in #5499
  • fix a few issues in controlnet inpaint pipelines by @yiyixuxu in #5470
  • Fixed autoencoder typo by @abhisharsinha in #5500
  • [Core] Refactor activation and normalization layers by @sayakpaul in #5493
  • Register BaseOutput subclasses as supported torch.utils._pytree nodes by @BowenBao in #5459
  • Japanese docs by @isamu-isozaki in #5478
  • [docs] General updates by @stevhliu in #5378
  • Add Latent Consistency Models Pipeline by @dg845 in #5448
  • fix typo by @mymusise in #5505
  • fix error of peft lora when xformers enabled by @AnyISalIn in #5506
  • fix a bug in 2nd order schedulers when using in ensemble of experts config by @yiyixuxu in #5511
  • [Schedulers] Fix 2nd order other than heun by @patrickvonplaten in #5526
  • Add a new community pipeline by @nagolinc in #5477
  • make style by @patrickvonplaten (direct commit on main)
  • Improve typehints and docs in diffusers/models by @a-r-r-o-w in #5391
  • make fix-copies by @patrickvonplaten (direct commit on main)
  • Fix missing punctuation in PHILOSOPHY.md by @RampagingSloth in #5530
  • fix a bug on torch_dtype argument in from_single_file of ControlNetModel by @xuyxu in #5528
  • [docs] Loader docs by @stevhliu in #5473
  • Add from_pt flag to enable model from PT by @RissyRan in #5501
  • Remove multiple if-else statement in the get_activation function. by @hi-sushanta in #5446
  • [Tests] Speed up expert of mixture tests by @patrickvonplaten in #5533
  • [Tests] Optimize test configurations for faster execution by @p1kit in #5535
  • [Remote code] Add functionality to run remote models, schedulers, pipelines by @patrickvonplaten in #5472
  • Update train_dreambooth.py - fix typos by @nickkolok in #5539
  • correct checkpoint in kandinsky2.2 doc page by @yiyixuxu in #5550
  • [Core] fix FreeU disable method by @sayakpaul in #5552
  • [docs] Internal classes API by @stevhliu in #5513
  • fix error reported 'find_unused_parameters' running in multiple GPUs by @jiaqiw09 in #5355
  • docs: initial pt translation by @SirMonteiro in #5549
  • Fix moved expandmask function by @patrickvonplaten in #5581
  • [PEFT / Tests] Add peft slow tests on push by @younesbelkada in #5419
  • Add realfill by @thuanz123 in #5456
  • add fix to be able use StableDiffusionXLAdapterPipeline.fromsinglefile by @pshtif in #5547
  • Stabilize DPM++, especially for SDXL and SDE-DPM++ by @LuChengTHU in #5541
  • Fix incorrect loading of custom pipeline by @a-r-r-o-w in #5568
  • [core / PEFT ]Bump transformers min version for PEFT integration by @younesbelkada in #5579
  • Fix divide by zero RuntimeWarning by @TimothyAlexisVass in #5543
  • [Community Pipelines] add textual inversion support for stablediffusionipex by @miaojinc in #5571
  • fix a mistake in text2image training script for kandinsky2.2 by @yiyixuxu in #5244
  • Update docker image for xformers by @DN6 in #5597
  • [Docs] Fix typos by @standardAI in #5583
  • [Docs] Fix typos, improve, update at Tutorials page by @standardAI in #5586
  • [docs] Lu lambdas by @stevhliu in #5602
  • Update final CPU offloading code for more diffusion pipelines by @clarencechen in #5589
  • [Core] enable lora for sdxl adapters too and add slow tests. by @ilisparrow in #5555
  • fix by @patrickvonplaten (direct commit on main)
  • Remove Redundant Variables from Encoder and Decoder by @hi-sushanta in #5569
  • Revert "Fix the order of width and height of original size in SDXL training script" by @patrickvonplaten in #5614
  • [PEFT / LoRA] Fix civitai bug when network alpha is an empty dict by @younesbelkada in #5608
  • [Docs] Fix typos, improve, update at Get Started page by @standardAI in #5587
  • [SDXL Adapter] Revert load lora by @patrickvonplaten in #5615
  • [docs] Kandinsky guide by @stevhliu in #4555
  • [remote code] document trust remote code. by @sayakpaul in #5620
  • [Tests] Fix cpu offload test by @patrickvonplaten in #5626
  • [Docs] Fix typos, improve, update at Conceptual Guides page by @standardAI in #5585
  • Animatediff Proposal by @DN6 in #5413
  • [Docs] Fix typos, improve, update at Using Diffusers' Loading & Hub page by @standardAI in #5584
  • [LCM] Make sure img2img works by @patrickvonplaten in #5632
  • Update animatediff docs to include section on Motion LoRAs by @DN6 in #5639
  • [Easy] Minor AnimateDiff Doc nits by @sayakpaul in #5640
  • fix a bug in AutoPipeline.from_pipe() when creating a controlnet pipeline from an existing controlnet by @yiyixuxu in #5638
  • [Easy] clean up the LCM docstrings. by @sayakpaul in #5637
  • Model loading speed optimization by @RyanJDick in #5635
  • Clean up LCM Pipeline and Test Code. by @dg845 in #5641
  • [Docs] Fix typos, improve, update at Using Diffusers' Tecniques page by @standardAI in #5627
  • [Core] support for tiny autoencoder in img2img by @sayakpaul in #5636
  • Remove the redundant line from the adapter.py file. by @hi-sushanta in #5618
  • add callbacks to denoising step by @yiyixuxu in #5427
  • [Feat] PixArt-Alpha by @sayakpaul in #5642
  • correct pipeline class name by @sayakpaul in #5652

Significant community contributions

The following contributors have made significant changes to the library over the last release:

  • @dg845
    • [WIP] Refactor UniDiffuser Pipeline and Tests (#4948)
    • Add Latent Consistency Models Pipeline (#5448)
    • Clean up LCM Pipeline and Test Code. (#5641)
  • @kadirnar
    • ✨ [Core] Add FreeU mechanism (#5164)
  • @a-r-r-o-w
    • Improve typehints and docs in diffusers/models (#5299)
    • [HacktoberFest] Add missing docstrings to diffusers/models (#5248)
    • Improve typehints and docs in diffusers/models (#5312)
    • Improve typehints and docs in diffusers/models (#5391)
    • Fix incorrect loading of custom pipeline (#5568)
  • @isamu-isozaki
    • Japanese docs (#5478)
  • @nagolinc
    • Add a new community pipeline (#5477)
  • @SirMonteiro
    • docs: initial pt translation (#5549)
  • @thuanz123
    • Add realfill (#5456)
  • @standardAI
    • [Docs] Fix typos (#5583)
    • [Docs] Fix typos, improve, update at Tutorials page (#5586)
    • [Docs] Fix typos, improve, update at Get Started page (#5587)
    • [Docs] Fix typos, improve, update at Conceptual Guides page (#5585)
    • [Docs] Fix typos, improve, update at Using Diffusers' Loading & Hub page (#5584)
    • [Docs] Fix typos, improve, update at Using Diffusers' Tecniques page (#5627)

- Python
Published by patrickvonplaten over 2 years ago

diffusers - Patch Release: Fix Lora fusing/unfusing

  • [Lora] fix lora fuse unfuse in #5003 by @patrickvonplaten

- Python
Published by patrickvonplaten over 2 years ago

diffusers - Patch Release: Fix LoRA attention processor for xformers.

  • [LoRA, Xformers] Fix xformers lora by @patrickvonplaten in https://github.com/huggingface/diffusers/pull/5201

- Python
Published by sayakpaul over 2 years ago

diffusers - Patch Release: CPU offloading + Lora load/Text inv load & Multi Adapter

  • [Textual inversion] Refactor textual inversion to make it cleaner by @patrickvonplaten in #5076
  • t2i Adapter community member fix by @williamberman in #5090
  • remove unused adapter weights in constructor by @williamberman in #5088
  • [LoRA] don't break offloading for incompatible lora ckpts. by @sayakpaul in #5085

- Python
Published by patrickvonplaten over 2 years ago

diffusers - Patch Release v0.21.1: Fix import and config loading for `from_single_file`

  • Fix model offload bug when key isn't present by @DN6 in #5030
  • [Import] Don't force transformers to be installed by @patrickvonplaten in #5035
  • allow loading of sd models from safetensors without online lookups using local config files by @vladmandic in #5019
  • [Import] Add missing settings / Correct some dummy imports by @patrickvonplaten in #5036

- Python
Published by patrickvonplaten over 2 years ago

diffusers - v0.21.0: Würstchen, Faster LoRA loading, Faster imports, T2I Adapters for SDXL, and more

Würstchen

Würstchen is a diffusion model, whose text-conditional model works in a highly compressed latent space of images, allowing cheaper and faster inference.

Here is how to use Würstchen as a pipeline:

```python
import torch
from diffusers import AutoPipelineForText2Image
from diffusers.pipelines.wuerstchen import DEFAULT_STAGE_C_TIMESTEPS

pipeline = AutoPipelineForText2Image.from_pretrained("warp-ai/wuerstchen", torch_dtype=torch.float16).to("cuda")

caption = "Anthropomorphic cat dressed as a firefighter"
images = pipeline(
    caption,
    height=1024,
    width=1536,
    prior_timesteps=DEFAULT_STAGE_C_TIMESTEPS,
    prior_guidance_scale=4.0,
    num_images_per_prompt=4,
).images
```

To learn more about the pipeline, check out the official documentation.

This pipeline was contributed by one of the authors of Würstchen, @dome272, with help from @kashif and @patrickvonplaten.

👉 Try out the model here: https://huggingface.co/spaces/warp-ai/Wuerstchen

T2I Adapters for Stable Diffusion XL (SDXL)

T2I-Adapter is an efficient plug-and-play model that provides extra guidance to pre-trained text-to-image models while freezing the original large text-to-image models.

In collaboration with the Tencent ARC researchers, we trained T2I Adapters on various conditions: sketch, canny, lineart, depth, and openpose.

Below is an example of how to use the StableDiffusionXLAdapterPipeline.

First, ensure that controlnet_aux is installed:

```bash
pip install -U controlnet_aux==0.0.7
```

Then we can initialize the pipeline:

```python
import torch
from controlnet_aux.lineart import LineartDetector
from diffusers import (AutoencoderKL, EulerAncestralDiscreteScheduler,
                       StableDiffusionXLAdapterPipeline, T2IAdapter)
from diffusers.utils import load_image, make_image_grid

# load adapter
adapter = T2IAdapter.from_pretrained(
    "TencentARC/t2i-adapter-lineart-sdxl-1.0", torch_dtype=torch.float16, variant="fp16"
).to("cuda")

# load pipeline
model_id = "stabilityai/stable-diffusion-xl-base-1.0"
euler_a = EulerAncestralDiscreteScheduler.from_pretrained(model_id, subfolder="scheduler")
vae = AutoencoderKL.from_pretrained("madebyollin/sdxl-vae-fp16-fix", torch_dtype=torch.float16)
pipe = StableDiffusionXLAdapterPipeline.from_pretrained(
    model_id, vae=vae, adapter=adapter, scheduler=euler_a, torch_dtype=torch.float16, variant="fp16",
).to("cuda")

# load lineart detector
line_detector = LineartDetector.from_pretrained("lllyasviel/Annotators").to("cuda")
```

We then load an image to compute the lineart conditionings:

```python
url = "https://huggingface.co/Adapter/t2iadapter/resolve/main/figs_SDXLV1.0/org_lin.jpg"
image = load_image(url)
image = line_detector(image, detect_resolution=384, image_resolution=1024)
```

Then we generate:

```python
prompt = "Ice dragon roar, 4k photo"
negative_prompt = "anime, cartoon, graphic, text, painting, crayon, graphite, abstract, glitch, deformed, mutated, ugly, disfigured"
gen_images = pipe(
    prompt=prompt,
    negative_prompt=negative_prompt,
    image=image,
    num_inference_steps=30,
    adapter_conditioning_scale=0.8,
    guidance_scale=7.5,
).images[0]
```

Refer to the official documentation to learn more about StableDiffusionXLAdapterPipeline.

This blog post summarizes our experiences and provides all the resources (including the pre-trained T2I Adapter checkpoints) to get started using T2I Adapters for SDXL.

We’re also releasing a training script for training your custom T2I Adapters on SDXL. Check out the documentation to learn more.

Thanks to @MC-E (one of the authors of T2I Adapters) for contributing the StableDiffusionXLAdapterPipeline in #4696.

Faster imports

We introduced “lazy imports” (#4829) to significantly improve the time it takes to import our modules (such as pipelines, models, and so on). Below is a comparison of the timings with and without lazy imports on import diffusers.

With lazy imports:

```bash
real    0m0.417s
user    0m0.714s
sys     0m0.499s
```

Without lazy imports:

```bash
real    0m5.391s
user    0m5.299s
sys     0m1.273s
```
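Lazy imports along these lines are typically built on PEP 562's module-level `__getattr__`, which defers the real import until an attribute is first accessed. Below is a minimal, self-contained sketch of the pattern; the `make_lazy_module` helper and `toy_pkg` name are illustrative, not diffusers' actual implementation.

```python
import importlib
import sys
import types


def make_lazy_module(name, attr_to_module):
    """Build a module whose listed attributes import their backing module
    only on first access (PEP 562 module-level __getattr__)."""
    mod = types.ModuleType(name)

    def __getattr__(attr):
        if attr in attr_to_module:
            value = importlib.import_module(attr_to_module[attr])
            setattr(mod, attr, value)  # cache so later accesses skip __getattr__
            return value
        raise AttributeError(f"module {name!r} has no attribute {attr!r}")

    mod.__getattr__ = __getattr__
    sys.modules[name] = mod
    return mod


# Nothing is imported until the attribute is first touched.
toy = make_lazy_module("toy_pkg", {"json": "json"})
print(toy.json.dumps({"ok": True}))  # prints {"ok": true}
```

Importing the top-level package then costs almost nothing; the heavy submodules (pipelines, models) pay their import price only when actually used.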

Faster LoRA loading

Previously, loading LoRA parameters using load_lora_weights() used to be time-consuming, as reported in #4975. To this end, we introduced a low_cpu_mem_usage argument to the load_lora_weights() method in #4994, which should speed up the loading time significantly. Just pass low_cpu_mem_usage=True to reap the benefits.

LoRA fusing

LoRA weights can now be fused into the model weights, allowing models that have loaded LoRA weights to run as fast as models without. It also makes it possible to fuse multiple LoRAs into the same model.

For more information, have a look at the documentation and the original PR: https://github.com/huggingface/diffusers/pull/4473.
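Numerically, fusing just folds the scaled low-rank product into the frozen weight matrix once, after which each forward pass costs the same as the base model. A minimal NumPy sketch of that arithmetic (variable names are illustrative, not the library's internals):

```python
import numpy as np

rng = np.random.default_rng(0)

# Frozen base weight and a rank-r LoRA update
d_out, d_in, r = 8, 8, 2
W = rng.standard_normal((d_out, d_in))
A = rng.standard_normal((r, d_in))   # LoRA "down" matrix
B = rng.standard_normal((d_out, r))  # LoRA "up" matrix
scale = 0.7

x = rng.standard_normal(d_in)

# Unfused: base forward plus the scaled low-rank branch (extra matmuls every call)
y_unfused = W @ x + scale * (B @ (A @ x))

# Fused: fold the update into the weight once; inference then matches base-model speed
W_fused = W + scale * (B @ A)
y_fused = W_fused @ x

assert np.allclose(y_unfused, y_fused)
```

Unfusing is the inverse step (subtracting `scale * (B @ A)` back out), which is why the feature can be toggled without reloading the model.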

More support for LoRAs

Almost all LoRA formats out there for SDXL are now supported. For more details, please check the documentation.

All commits

  • fix: lora sdxl tests by @sayakpaul in #4652
  • Support tiled encode/decode for AutoencoderTiny by @Isotr0py in #4627
  • Add SDXL long weighted prompt pipeline (replace pr:4629) by @xhinker in #4661
  • add config_file to from_single_file by @zuojianghua in #4614
  • Add AudioLDM 2 by @sanchit-gandhi in #4549
  • [docs] Add note in UniDiffusers Doc about PyTorch 1.X numerical stability issue by @dg845 in #4703
  • [Core] enable lora for sdxl controlnets too and add slow tests. by @sayakpaul in #4666
  • [LoRA] ensure different LoRA ranks for text encoders can be properly handled by @sayakpaul in #4669
  • [LoRA] default to None when fc alphas are not available. by @sayakpaul in #4706
  • Replaces DIFFUSERS_TEST_DEVICE backend list with trying device by @vvvm23 in #4673
  • add convert diffuser pipeline of XL to original stable diffusion by @realliujiaxu in #4596
  • Add reference_attn & reference_adain support for sdxl by @zideliu in #4502
  • [Docs] Fix docs controlnet missing /Tip by @patrickvonplaten in #4717
  • rename test file to run, so that examples tests do not fail by @patrickvonplaten in #4715
  • Revert "Move controlnet load local tests to nightly" by @patrickvonplaten in #4543
  • Fix all docs by @patrickvonplaten in #4721
  • fix bad error message when transformers is missing by @patrickvonplaten in #4714
  • Fix AutoencoderTiny encoder scaling convention by @madebyollin in #4682
  • [Examples] fix checkpointing and casting bugs in train_text_to_image_lora_sdxl.py by @sayakpaul in #4632
  • [AudioLDM Docs] Fix docs for output by @sanchit-gandhi in #4737
  • [docs] add variant="fp16" flag by @realliujiaxu in #4678
  • [AudioLDM Docs] Update docstring by @sanchit-gandhi in #4744
  • fix dummy import for AudioLDM2 by @patil-suraj in #4741
  • change validation scheduler for train_dreambooth.py when training IF by @wyz894272237 in #4333
  • add a step_index counter by @yiyixuxu in #4347
  • [AudioLDM2] Doc fixes by @sanchit-gandhi in #4739
  • Bugfix for SDXL model loading in low ram system. by @Symbiomatrix in #4628
  • Clean up flaky behaviour on Slow CUDA Pytorch Push Tests by @DN6 in #4759
  • [Tests] Fix paint by example by @patrickvonplaten in #4761
  • [fix] multi t2i adapter set total_downscale_factor by @williamberman in #4621
  • [Examples] Add madebyollin VAE to SDXL LoRA example, along with an explanation by @mnslarcher in #4762
  • [LoRA] relax lora loading logic by @sayakpaul in #4610
  • [Examples] fix sdxl dreambooth lora checkpointing. by @sayakpaul in #4749
  • fix sdxl_lwp empty neg_prompt error issue by @xhinker in #4743
  • improve setup.py by @sayakpaul in #4748
  • Torch device by @patrickvonplaten in #4755
  • [AudioLDM 2] Pipeline fixes by @sanchit-gandhi in #4738
  • Convert MusicLDM by @sanchit-gandhi in #4579
  • [WIP ] Proposal to address precision issues in CI by @DN6 in #4775
  • fix a bug in from_pretrained when load optional components by @yiyixuxu in #4745
  • fix bug of progress bar in clip guided images mixing by @scnuhealthy in #4729
  • Fixed broken link of CLIP doc in evaluation doc by @mayank2 in #4760
  • instance_prompt->class_prompt by @williamberman in #4784
  • refactor prepare_mask_and_masked_image with VaeImageProcessor by @yiyixuxu in #4444
  • Allow passing a checkpoint state_dict to convert_from_ckpt (instead of just a string path) by @cmdr2 in #4653
  • [SDXL] Add docs about forcing passed embeddings to be 0 by @patrickvonplaten in #4783
  • [Core] Support negative conditions in SDXL by @sayakpaul in #4774
  • Unet fix by @canberk17 in #4769
  • [Tests] Tighten up LoRA loading relaxation by @sayakpaul in #4787
  • [docs] Fix syntax for compel by @stevhliu in #4794
  • [Torch compile] Fix torch compile for controlnet by @patrickvonplaten in #4795
  • [SDXL Lora] Fix last ben sdxl lora by @patrickvonplaten in #4797
  • [LoRA Attn Processors] Refactor LoRA Attn Processors by @patrickvonplaten in #4765
  • Update loaders.py by @chillpixelfun in #4805
  • [WIP] Add Fabric by @shauray8 in #4201
  • Fix save_path bug in textual inversion training script by @Yead in #4710
  • [Examples] Save SDXL LoRA weights with chosen precision by @mnslarcher in #4791
  • Fix Disentangle ONNX and non-ONNX pipeline by @DN6 in #4656
  • fix bug in StableDiffusionXLControlNetPipeline when use guess_mode by @yiyixuxu in #4799
  • fix autopipeline: pass kwargs to load_config by @yiyixuxu in #4793
  • add StableDiffusionXLControlNetImg2ImgPipeline by @yiyixuxu in #4592
  • add models for T2I-Adapter-XL by @MC-E in #4696
  • Fuse loras by @patrickvonplaten in #4473
  • Fix convert_original_stable_diffusion_to_diffusers script by @wingrime in #4817
  • Support saving multiple t2i adapter models under one checkpoint by @VitjanZ in #4798
  • fix typo by @zideliu in #4822
  • VaeImageProcessor: Allow image resizing also for torch and numpy inputs by @gajendr-nikhil in #4832
  • [Core] refactor encode_prompt by @sayakpaul in #4617
  • Add loading ckpt from file for SDXL controlNet by @antigp in #4683
  • Fix Unfuse Lora by @patrickvonplaten in #4833
  • sketch inpaint from a1111 for non-inpaint models by @noskill in #4824
  • [docs] SDXL by @stevhliu in #4428
  • [Docs] improve the LoRA doc. by @sayakpaul in #4838
  • Fix potential type mismatch errors in SDXL pipelines by @hyk1996 in #4796
  • Fix image processor inputs width by @echarlaix in #4853
  • Remove warn with deprecate by @patrickvonplaten in #4850
  • [docs] ControlNet guide by @stevhliu in #4640
  • [SDXL Inpaint] Correct strength default by @patrickvonplaten in #4858
  • fix sdxl-inpaint fast test by @yiyixuxu in #4859
  • [docs] Add inpainting example for forcing the unmasked area to remain unchanged to the docs by @dg845 in #4536
  • Add GLIGEN Text Image implementation by @tuanh123789 in #4777
  • Test Cleanup Precision issues by @DN6 in #4812
  • Fix link from API to using-diffusers by @pcuenca in #4856
  • [Docs] Korean translation update by @Snailpong in #4684
  • fix a bug in sdxl-controlnet-img2img when using MultiControlNetModel by @yiyixuxu in #4862
  • support AutoPipeline.from_pipe between a pipeline and its ControlNet pipeline counterpart by @yiyixuxu in #4861
  • [WIP] masked_latent_inputs for inpainting pipeline by @yiyixuxu in #4819
  • [docs] DiffEdit guide by @stevhliu in #4722
  • [docs] Shap-E guide by @stevhliu in #4700
  • [ControlNet SDXL Inpainting] Support inpainting of ControlNet SDXL by @harutatsuakiyama in #4694
  • [Tests] Add combined pipeline tests by @patrickvonplaten in #4869
  • Retrieval Augmented Diffusion Models by @isamu-isozaki in #3297
  • check for unet_lora_layers in sdxl pipeline's save_lora_weights method by @ErwannMillon in #4821
  • Fix get_dummy_inputs for Stable Diffusion Inpaint Tests by @dg845 in #4845
  • allow passing components to connected pipelines when use the combined pipeline by @yiyixuxu in #4883
  • [Core] LoRA improvements pt. 3 by @sayakpaul in #4842
  • Add dropout parameter to UNet2DModel/UNet2DConditionModel by @dg845 in #4882
  • [Core] better support offloading when side loading is enabled. by @sayakpaul in #4855
  • Add --vae_precision option to the SDXL pix2pix script so that we have… by @bghira in #4881
  • [Test] Reduce CPU memory by @patrickvonplaten in #4897
  • fix a bug in StableDiffusionUpscalePipeline.run_safety_checker by @yiyixuxu in #4886
  • remove latent input for kandinsky prior_emb2emb pipeline by @yiyixuxu in #4887
  • [docs] Add stronger warning for SDXL height/width by @stevhliu in #4867
  • [Docs] add doc entry to explain lora fusion and use of different scales. by @sayakpaul in #4893
  • [Textual inversion] Relax loading textual inversion by @patrickvonplaten in #4903
  • [docs] Fix typo in Inpainting force unmasked area unchanged example by @dg845 in #4910
  • Würstchen model by @kashif in #3849
  • [InstructPix2Pix] Fix pipeline implementation and add docs by @sayakpaul in #4844
  • [StableDiffusionXLAdapterPipeline] add adapter_conditioning_factor by @patil-suraj in #4937
  • [StableDiffusionXLAdapterPipeline] allow negative micro conds by @patil-suraj in #4941
  • [examples] T2IAdapter training script by @patil-suraj in #4934
  • [Tests] add: tests for t2i adapter training. by @sayakpaul in #4947
  • guard save model hooks to only execute on main process by @williamberman in #4929
  • [Docs] add t2i adapter entry to overview of training scripts. by @sayakpaul in #4946
  • Temp Revert "[Core] better support offloading when side loading is enabled… by @williamberman in #4927
  • Revert revert and install accelerate main by @williamberman in #4963
  • [Docs] fix: minor formatting in the Würstchen docs by @sayakpaul in #4965
  • Lazy Import for Diffusers by @DN6 in #4829
  • [Core] Remove TF import checks by @patrickvonplaten in #4968
  • Make sure Flax pipelines can be loaded into PyTorch by @patrickvonplaten in #4971
  • Update README.md by @patrickvonplaten in #4973
  • Wuerstchen fixes by @kashif in #4942
  • Refactor model offload by @patrickvonplaten in #4514
  • [Bug Fix] Should pass the dtype instead of torch_dtype by @zhiqiang-canva in #4917
  • [Utils] Correct custom init sort by @patrickvonplaten in #4967
  • remove extra gligen in import by @DN6 in #4987
  • fix E721 Do not compare types, use isinstance() by @kashif in #4992
  • [Wuerstchen] fix combined pipeline's num_images_per_prompt by @kashif in #4989
  • fix image variation slow test by @DN6 in #4995
  • fix custom diffusion tests by @DN6 in #4996
  • [Lora] Speed up lora loading by @patrickvonplaten in #4994
  • [docs] Fix DiffusionPipeline.enable_sequential_cpu_offload docstring by @dg845 in #4952
  • Fix safety checker seq offload by @patrickvonplaten in #4998
  • Fix PR template by @stevhliu in #4984
  • examples fix t2i training by @patrickvonplaten in #5001

Significant community contributions

The following contributors have made significant changes to the library over the last release:

  • @xhinker
    • Add SDXL long weighted prompt pipeline (replace pr:4629) (#4661)
    • fix sdxl_lwp empty neg_prompt error issue (#4743)
  • @zideliu
    • Add reference_attn & reference_adain support for sdxl (#4502)
    • fix typo (#4822)
  • @shauray8
    • [WIP] Add Fabric (#4201)
  • @MC-E
    • add models for T2I-Adapter-XL (#4696)
  • @tuanh123789
    • Add GLIGEN Text Image implementation (#4777)
  • @Snailpong
    • [Docs] Korean translation update (#4684)
  • @harutatsuakiyama
    • [ControlNet SDXL Inpainting] Support inpainting of ControlNet SDXL (#4694)
  • @isamu-isozaki
    • Retrieval Augmented Diffusion Models (#3297)

- Python
Published by patrickvonplaten over 2 years ago

diffusers - Patch Release 0.20.2 - Correct SDXL Inpaint Strength Default

Stable Diffusion XL's strength default was accidentally set to 1.0 when creating the pipeline. The default should be set to 0.9999 instead. This patch release fixes that.

All commits

  • [SDXL Inpaint] Correct strength default by @patrickvonplaten in #4858

- Python
Published by patrickvonplaten over 2 years ago

diffusers - Patch Release: Fix `torch.compile()` support for ControlNets

https://github.com/huggingface/diffusers/commit/3eb498e7b4868bca7460d41cda52d33c3ede5502#r125606630 introduced a 🐛 that broke the torch.compile() support for ControlNets. This patch release fixes that.

All commits

  • [Docs] Fix docs controlnet missing /Tip by @patrickvonplaten in #4717
  • [Torch compile] Fix torch compile for controlnet by @patrickvonplaten in #4795

- Python
Published by sayakpaul over 2 years ago

diffusers - v0.20.0: SDXL ControlNets with MultiControlNet, GLIGEN, Tiny Autoencoder, SDXL DreamBooth LoRA in free-tier Colab, and more

SDXL ControlNets 🚀

The 🧨 diffusers team has trained two ControlNets on Stable Diffusion XL (SDXL):


You can find all the SDXL ControlNet checkpoints here, including some smaller ones (5 to 7x smaller).

To know more about how to use these ControlNets to perform inference, check out the respective model cards and the documentation. To train custom SDXL ControlNets, you can try out our training script.

MultiControlNet for SDXL

This release also introduces support for combining multiple ControlNets trained on SDXL and performing inference with them. Refer to the documentation to learn more.

GLIGEN

The GLIGEN model was developed by researchers and engineers from University of Wisconsin-Madison, Columbia University, and Microsoft. The StableDiffusionGLIGENPipeline can generate photorealistic images conditioned on grounding inputs. Along with text and bounding boxes, if input images are given, this pipeline can insert objects described by text at the region defined by bounding boxes. Otherwise, it’ll generate an image described by the caption/prompt and insert objects described by text at the region defined by bounding boxes. It’s trained on COCO2014D and COCO2014CD datasets, and the model uses a frozen CLIP ViT-L/14 text encoder to condition itself on grounding inputs.


(GIF from the official website)

Grounded inpainting

```python
import torch
from diffusers import StableDiffusionGLIGENPipeline
from diffusers.utils import load_image

# Insert objects described by text at the region defined by bounding boxes
pipe = StableDiffusionGLIGENPipeline.from_pretrained(
    "masterful/gligen-1-4-inpainting-text-box", variant="fp16", torch_dtype=torch.float16
)
pipe = pipe.to("cuda")

input_image = load_image(
    "https://hf.co/datasets/huggingface/documentation-images/resolve/main/diffusers/gligen/livingroom_modern.png"
)
prompt = "a birthday cake"
boxes = [[0.2676, 0.6088, 0.4773, 0.7183]]
phrases = ["a birthday cake"]

images = pipe(
    prompt=prompt,
    gligen_phrases=phrases,
    gligen_inpaint_image=input_image,
    gligen_boxes=boxes,
    gligen_scheduled_sampling_beta=1,
    output_type="pil",
    num_inference_steps=50,
).images
images[0].save("./gligen-1-4-inpainting-text-box.jpg")
```

Grounded generation

```python
import torch
from diffusers import StableDiffusionGLIGENPipeline
from diffusers.utils import load_image

# Generate an image described by the prompt and
# insert objects described by text at the region defined by bounding boxes
pipe = StableDiffusionGLIGENPipeline.from_pretrained(
    "masterful/gligen-1-4-generation-text-box", variant="fp16", torch_dtype=torch.float16
)
pipe = pipe.to("cuda")

prompt = "a waterfall and a modern high speed train running through the tunnel in a beautiful forest with fall foliage"
boxes = [[0.1387, 0.2051, 0.4277, 0.7090], [0.4980, 0.4355, 0.8516, 0.7266]]
phrases = ["a waterfall", "a modern high speed train running through the tunnel"]

images = pipe(
    prompt=prompt,
    gligen_phrases=phrases,
    gligen_boxes=boxes,
    gligen_scheduled_sampling_beta=1,
    output_type="pil",
    num_inference_steps=50,
).images
images[0].save("./gligen-1-4-generation-text-box.jpg")
```

Refer to the documentation to learn more.

Thanks to @nikhil-masterful for contributing GLIGEN in #4441.

Tiny Autoencoder

@madebyollin trained two Autoencoders (on Stable Diffusion and Stable Diffusion XL, respectively) to dramatically cut down the image decoding time. The effects are especially pronounced when working with larger-resolution images. You can use AutoencoderTiny to take advantage of it.

Here’s the example usage for Stable Diffusion:

```python
import torch
from diffusers import DiffusionPipeline, AutoencoderTiny

pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1-base", torch_dtype=torch.float16
)
pipe.vae = AutoencoderTiny.from_pretrained("madebyollin/taesd", torch_dtype=torch.float16)
pipe = pipe.to("cuda")

prompt = "slice of delicious New York-style berry cheesecake"
image = pipe(prompt, num_inference_steps=25).images[0]
image.save("cheesecake.png")
```

Refer to the documentation to learn more. Refer to this material to understand the implications of using this Autoencoder in terms of inference latency and memory footprint.

Fine-tuning Stable Diffusion XL with DreamBooth and LoRA on a free-tier Colab Notebook

Stable Diffusion XL’s (SDXL) high memory requirements often seem restrictive when it comes to using it for downstream applications. Even if one uses parameter-efficient fine-tuning techniques like LoRA, fine-tuning just the UNet component of SDXL can be quite memory-intensive. So, running it on a free-tier Colab Notebook (that usually has a 16 GB T4 GPU attached) seems impossible.

Now, with better support for gradient checkpointing and other recipes like 8 Bit Adam (via bitsandbytes), it is possible to fine-tune the UNet of SDXL with DreamBooth and LoRA on a free-tier Colab Notebook.

Check out the Colab Notebook to learn more.

Thanks to @ethansmith2000 for improving the gradient checkpointing support in #4474.
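A command-line sketch of that recipe with the train_dreambooth_lora_sdxl.py script from the diffusers examples; the dataset path and hyperparameter values below are illustrative, not recommendations:

```shell
# 8-bit Adam comes from bitsandbytes.
pip install bitsandbytes

accelerate launch train_dreambooth_lora_sdxl.py \
  --pretrained_model_name_or_path="stabilityai/stable-diffusion-xl-base-1.0" \
  --instance_data_dir="./dog-images" \
  --instance_prompt="a photo of sks dog" \
  --resolution=1024 \
  --train_batch_size=1 \
  --gradient_accumulation_steps=4 \
  --gradient_checkpointing \
  --use_8bit_adam \
  --mixed_precision="fp16" \
  --output_dir="lora-sdxl-dreambooth"
```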

Support of push_to_hub for models, schedulers, and pipelines

Our models, schedulers, and pipelines now support a push_to_hub argument in save_pretrained() and also come with a dedicated push_to_hub() method. Below are some examples of usage.

Models

```python
from diffusers import ControlNetModel

controlnet = ControlNetModel(
    block_out_channels=(32, 64),
    layers_per_block=2,
    in_channels=4,
    down_block_types=("DownBlock2D", "CrossAttnDownBlock2D"),
    cross_attention_dim=32,
    conditioning_embedding_out_channels=(16, 32),
)
controlnet.push_to_hub("my-controlnet-model")
# or controlnet.save_pretrained("my-controlnet-model", push_to_hub=True)
```

Schedulers

```python
from diffusers import DDIMScheduler

scheduler = DDIMScheduler(
    beta_start=0.00085,
    beta_end=0.012,
    beta_schedule="scaled_linear",
    clip_sample=False,
    set_alpha_to_one=False,
)
scheduler.push_to_hub("my-controlnet-scheduler")
```

Pipelines

```python
from diffusers import (
    UNet2DConditionModel,
    AutoencoderKL,
    DDIMScheduler,
    StableDiffusionPipeline,
)
from transformers import CLIPTextModel, CLIPTextConfig, CLIPTokenizer

unet = UNet2DConditionModel(
    block_out_channels=(32, 64),
    layers_per_block=2,
    sample_size=32,
    in_channels=4,
    out_channels=4,
    down_block_types=("DownBlock2D", "CrossAttnDownBlock2D"),
    up_block_types=("CrossAttnUpBlock2D", "UpBlock2D"),
    cross_attention_dim=32,
)

scheduler = DDIMScheduler(
    beta_start=0.00085,
    beta_end=0.012,
    beta_schedule="scaled_linear",
    clip_sample=False,
    set_alpha_to_one=False,
)

vae = AutoencoderKL(
    block_out_channels=[32, 64],
    in_channels=3,
    out_channels=3,
    down_block_types=["DownEncoderBlock2D", "DownEncoderBlock2D"],
    up_block_types=["UpDecoderBlock2D", "UpDecoderBlock2D"],
    latent_channels=4,
)

text_encoder_config = CLIPTextConfig(
    bos_token_id=0,
    eos_token_id=2,
    hidden_size=32,
    intermediate_size=37,
    layer_norm_eps=1e-05,
    num_attention_heads=4,
    num_hidden_layers=5,
    pad_token_id=1,
    vocab_size=1000,
)
text_encoder = CLIPTextModel(text_encoder_config)
tokenizer = CLIPTokenizer.from_pretrained("hf-internal-testing/tiny-random-clip")

components = {
    "unet": unet,
    "scheduler": scheduler,
    "vae": vae,
    "text_encoder": text_encoder,
    "tokenizer": tokenizer,
    "safety_checker": None,
    "feature_extractor": None,
}
pipeline = StableDiffusionPipeline(**components)
pipeline.push_to_hub("my-pipeline")
```

Refer to the documentation to know more.

Thanks to @Wauplin for his generous and constructive feedback on this feature (see #4218).

Better support for loading Kohya-trained LoRA checkpoints

Providing seamless support for loading Kohya-trained LoRA checkpoints from diffusers is important for us. This is why we continue to improve our load_lora_weights() method. Check out the documentation to know more about what’s currently supported and the current limitations.

Thanks to @isidentical for extending their help in improving this support.

Better documentation for prompt weighting

Prompt weighting provides a way to emphasize or de-emphasize certain parts of a prompt, allowing for more control over the generated image. compel provides an easy way to do prompt weighting compatible with diffusers. To this end, we have worked on an improved guide. Check it out here.

Defaulting to serialize with .safetensors

Starting with this release, we will default to using .safetensors as our preferred serialization method. This change is reflected in all the training examples that we officially support.

All commits

  • 0.20.0dev0 by @patrickvonplaten in #4299
  • update Kandinsky doc by @yiyixuxu in #4301
  • [Torch.compile] Fixes torch compile graph break by @patrickvonplaten in #4315
  • Fix SDXL conversion from original to diffusers by @duongna21 in #4280
  • fix a bug in StableDiffusionUpscalePipeline when prompt is None by @yiyixuxu in #4278
  • [Local loading] Correct bug with local files only by @patrickvonplaten in #4318
  • Fix typo documentation by @echarlaix in #4320
  • fix validation option for dreambooth training example by @xinyangli in #4317
  • [Tests] add test for pipeline import. by @sayakpaul in #4276
  • Honor the SDXL 1.0 licensing from the training scripts. by @sayakpaul in #4319
  • Update README_sdxl.md to correct the header by @sayakpaul in #4330
  • [SDXL Refiner] Fix refiner forward pass for batched input by @patrickvonplaten in #4327
  • correct doc string for default value of guidance_scale by @Tanupriya-Singh in #4339
  • [ONNX] Don't download ONNX model by default by @patrickvonplaten in #4338
  • Fix repeat of negative prompt by @kathath in #4335
  • [SDXL] Make watermarker optional under certain circumstances to improve usability of SDXL 1.0 by @patrickvonplaten in #4346
  • [Feat] Support SDXL Kohya-style LoRA by @sayakpaul in #4287
  • fix fp type in t2i adapter docs by @williamberman in #4350
  • Update README.md to have PyPI-friendly path by @sayakpaul in #4351
  • [SDXL-IP2P] Add gif for demonstrating training processes by @harutatsuakiyama in #4342
  • [SDXL] Fix dummy imports incorrect naming by @patrickvonplaten in #4370
  • Clean up duplicate lines in encode_prompt by @avoroshilov in #4369
  • minor doc fixes. by @sayakpaul in #4380
  • Update docs of unet_1d.py by @nishant42491 in #4394
  • [AutoPipeline] Correct naming by @patrickvonplaten in #4420
  • [ldm3d] documentation fixing typos by @estelleafl in #4284
  • Cleanup pass for flaky Slow Tests for Stable diffusion by @DN6 in #4415
  • support from_single_file for SDXL inpainting by @yiyixuxu in #4408
  • fix test_float16_inference by @yiyixuxu in #4412
  • train dreambooth fix pre encode class prompt by @williamberman in #4395
  • [docs] Fix SDXL docstring by @stevhliu in #4397
  • Update documentation by @echarlaix in #4422
  • remove mentions of textual inversion from sdxl. by @sayakpaul in #4404
  • [LoRA] Fix SDXL text encoder LoRAs by @sayakpaul in #4371
  • [docs] AutoPipeline tutorial by @stevhliu in #4273
  • [Pipelines] Add community pipeline for Zero123 by @kxhit in #4295
  • [Feat] add tiny Autoencoder for (almost) instant decoding by @sayakpaul in #4384
  • can call encode_prompt with out setting a text encoder instance variable by @williamberman in #4396
  • Accept pooled_prompt_embeds in the SDXL Controlnet pipeline. Fixes an error if prompt_embeds are passed. by @cmdr2 in #4309
  • Prevent online access when desired when using download_from_original_stable_diffusion_ckpt by @w4ffl35 in #4271
  • move tests to nightly by @DN6 in #4451
  • auto type conversion by @isNeil in #4270
  • Fix typerror in pipeline handling for MultiControlNets which only contain a single ControlNet by @Georgehe4 in #4454
  • Add rank argument to train_dreambooth_lora_sdxl.py by @levi in #4343
  • [docs] Distilled SD by @stevhliu in #4442
  • Allow controlnets to be loaded (from ckpt) in a parallel thread with a SD model (ckpt), and speed it up slightly by @cmdr2 in #4298
  • fix typo to ensure make test-examples work correctly by @statelesshz in #4329
  • Fix bug caused by typo by @HeliosZhao in #4357
  • Delete the duplicate code for the contolnet img 2 img by @VV-A-VV in #4411
  • Support different strength for Stable Diffusion TensorRT Inpainting pipeline by @jinwonkim93 in #4216
  • add sdxl to prompt weighting by @patrickvonplaten in #4439
  • a few fix for kandinsky combined pipeline by @yiyixuxu in #4352
  • fix-format by @yiyixuxu in #4458
  • Cleanup Pass on flaky slow tests for Stable Diffusion by @DN6 in #4455
  • Fixed multi-token textual inversion training by @manosplitsis in #4452
  • TensorRT Inpaint pipeline: minor fixes by @asfiyab-nvidia in #4457
  • [Tests] Adds integration tests for SDXL LoRAs by @sayakpaul in #4462
  • Update README_sdxl.md by @patrickvonplaten in #4472
  • [SDXL] Allow SDXL LoRA to be run with less than 16GB of VRAM by @patrickvonplaten in #4470
  • Add a data_dir parameter to the load_dataset method. by @AisingioroHao0 in #4482
  • [Examples] Support train_text_to_image_lora_sdxl.py by @okotaku in #4365
  • Log global_step instead of epoch to tensorboard by @mrlzla in #4493
  • Update lora.md to clarify SDXL support by @sayakpaul in #4503
  • [SDXL LoRA] fix batch size lora by @patrickvonplaten in #4509
  • Make sure fp16-fix is used as default by @patrickvonplaten in #4510
  • grad checkpointing by @ethansmith2000 in #4474
  • move pipeline only when running validation by @patrickvonplaten in #4515
  • Moving certain pipelines slow tests to nightly by @DN6 in #4469
  • add pipeline_class_name argument to Stable Diffusion conversion script by @yiyixuxu in #4461
  • Fix misc typos by @Georgehe4 in #4479
  • fix indexing issue in sd reference pipeline by @DN6 in #4531
  • Copy lora functions to XLPipelines by @wooyeolBaek in #4512
  • introduce minimalistic reimplementation of SDXL on the SDXL doc by @cloneofsimo in #4532
  • Fix push_to_hub in train_text_to_image_lora_sdxl.py example by @ra100 in #4535
  • Update README_sdxl.md to include the free-tier Colab Notebook by @sayakpaul in #4540
  • Changed code that converts tensors to PIL images in the write_your_own_pipeline notebook by @jere357 in #4489
  • Move slow tests to nightly by @DN6 in #4526
  • pin ruff version for quality checks by @DN6 in #4539
  • [docs] Clean scheduler api by @stevhliu in #4204
  • Move controlnet load local tests to nightly by @DN6 in #4543
  • Revert "introduce minimalistic reimplementation of SDXL on the SDXL doc" by @patrickvonplaten in #4548
  • fix some typo error by @VV-A-VV in #4546
  • improve controlnet sdxl docs now that we have a good checkpoint. by @sayakpaul in #4556
  • [Doc] update sdxl-controlnet repo name by @yiyixuxu in #4564
  • [docs] Expand prompt weighting by @stevhliu in #4516
  • [docs] Remove attention slicing by @stevhliu in #4518
  • [docs] Add safetensors flag by @stevhliu in #4245
  • Convert Stable Diffusion ControlNet to TensorRT by @dotieuthien in #4465
  • Remove code snippets containing is_safetensors_available() by @chiral-carbon in #4521
  • Fixing repo_id regex validation error on windows platforms by @Mystfit in #4358
  • [Examples] fix: network_alpha -> network_alphas by @sayakpaul in #4572
  • [docs] Fix ControlNet SDXL docstring by @stevhliu in #4582
  • [Utility] adds an image grid utility by @sayakpaul in #4576
  • Fixed invalid pipeline_class_name parameter. by @AisingioroHao0 in #4590
  • Fix git-lfs command typo in docs by @clairefro in #4586
  • [Examples] Update InstructPix2Pix README_sdxl.md to fix mentions by @sayakpaul in #4574
  • [Pipeline utils] feat: implement push_to_hub for standalone models, schedulers as well as pipelines by @sayakpaul in #4128
  • An invalid clerical error in sdxl finetune by @XDUWQ in #4608
  • [Docs] fix links in the controlling generation doc. by @sayakpaul in #4612
  • add: push_to_hub_mixin to pipelines and schedulers docs overview. by @sayakpaul in #4607
  • add: train to text image with sdxl script. by @sayakpaul in #4505
  • Add GLIGEN implementation by @nikhil-masterful in #4441
  • Update text2image.md to fix the links by @sayakpaul in #4626
  • Fix unipc use_karras_sigmas exception - fixes huggingface/diffusers#4580 by @reimager in #4581
  • [research_projects] SDXL controlnet script by @patil-suraj in #4633
  • [Core] feat: MultiControlNet support for SDXL ControlNet pipeline by @sayakpaul in #4597
  • [docs] PushToHubMixin by @stevhliu in #4622
  • [docs] MultiControlNet by @stevhliu in #4635
  • fix loading custom text encoder when using from_single_file by @DN6 in #4571
  • make things clear in the controlnet sdxl doc. by @sayakpaul in #4644
  • Fix UnboundLocalError during LoRA loading by @slessans in #4523
  • Support higher dimension LoRAs by @isidentical in #4625
  • [Safetensors] Make safetensors the default way of saving weights by @patrickvonplaten in #4235

Significant community contributions

The following contributors have made significant changes to the library over the last release:

  • @kxhit
    • [Pipelines] Add community pipeline for Zero123 (#4295)
  • @okotaku
    • [Examples] Support train_text_to_image_lora_sdxl.py (#4365)
  • @dotieuthien
    • Convert Stable Diffusion ControlNet to TensorRT (#4465)
  • @nikhil-masterful
    • Add GLIGEN implementation (#4441)

- Python
Published by sayakpaul over 2 years ago

diffusers - Patch release: Fix incorrect filenaming

0.19.3 is a patch release to make sure import diffusers works without transformers being installed.

It includes a fix of this issue.

All commits

[SDXL] Fix dummy imports incorrect naming by @patrickvonplaten in https://github.com/huggingface/diffusers/pull/4370

- Python
Published by patrickvonplaten over 2 years ago

diffusers - Patch Release: Support for SDXL Kohya-style LoRAs, Fix batched inference SDXL Img2Img, Improve watermarker

We still had some bugs 🐛 in 0.19.1, notably:

SDXL (Kohya-style) LoRA

The official SD-XL 1.0 LoRA (Kohya-styled) is now supported thanks to https://github.com/huggingface/diffusers/pull/4287. You can try it as follows:

```py
from diffusers import DiffusionPipeline
import torch

pipe = DiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16)
pipe.load_lora_weights("stabilityai/stable-diffusion-xl-base-1.0", weight_name="sd_xl_offset_example-lora_1.0.safetensors")
pipe.to(torch_dtype=torch.float16)
pipe.to("cuda")

prompt = "beautiful scenery nature glass bottle landscape, purple galaxy bottle"
negative_prompt = "text, watermark"

image = pipe(prompt, negative_prompt=negative_prompt, num_inference_steps=25).images[0]
```


In addition, a couple more SDXL LoRAs are now supported:

(SDXL 0.9:)

  • https://civitai.com/models/22279?modelVersionId=118556
  • https://civitai.com/models/104515/sdxlor30costumesrevue-starlight-saijoclaudine-lora
  • https://civitai.com/models/108448/daiton-sdxl-test
  • https://filebin.net/2ntfqqnapiu9q3zx/pixelbuildings128-v1.safetensors

For more details and the known limitations, please check out the documentation.

Thanks to @isidentical for their sincere help in the PR.

Batched inference

@bghira found that for SDXL Img2Img batched inference led to weird artifacts. That is fixed in: https://github.com/huggingface/diffusers/pull/4327.

Downloads

Under some circumstances, SD-XL 1.0 could download ONNX weights; this is corrected in https://github.com/huggingface/diffusers/pull/4338.

Improved SDXL behavior

https://github.com/huggingface/diffusers/pull/4346 allows the user to disable the watermarker under certain circumstances to improve the usability of SDXL.

All commits:

  • [SDXL Refiner] Fix refiner forward pass for batched input by @patrickvonplaten in #4327
  • [ONNX] Don't download ONNX model by default by @patrickvonplaten in #4338
  • [SDXL] Make watermarker optional under certain circumstances to improve usability of SDXL 1.0 by @patrickvonplaten in #4346
  • [Feat] Support SDXL Kohya-style LoRA by @sayakpaul in #4287

- Python
Published by patrickvonplaten over 2 years ago

diffusers - Patch Release: Fix torch compile and local_files_only

In 0.19.0 some bugs :bug: found their way into the release. We're very sorry about this :pray:

This patch release fixes all of them.

All commits

  • update Kandinsky doc by @yiyixuxu in #4301
  • [Torch.compile] Fixes torch compile graph break by @patrickvonplaten in #4315
  • Fix SDXL conversion from original to diffusers by @duongna21 in #4280
  • fix a bug in StableDiffusionUpscalePipeline when prompt is None by @yiyixuxu in #4278
  • [Local loading] Correct bug with local files only by @patrickvonplaten in #4318
  • Release: v0.19.1 by @patrickvonplaten (direct commit on v0.19.1-patch)

- Python
Published by patrickvonplaten over 2 years ago

diffusers - v0.19.0: SD-XL 1.0 (permissive license), AutoPipelines, Improved Kandinsky & Asymmetric VQGAN

SDXL 1.0

Stable Diffusion XL (SDXL) 1.0 with permissive CreativeML Open RAIL++-M License was released today. We provide full compatibility with SDXL in diffusers.

```py
from diffusers import DiffusionPipeline
import torch

pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16, variant="fp16", use_safetensors=True
)
pipe.to("cuda")

prompt = "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"
image = pipe(prompt=prompt).images[0]
image
```


Many additional cool features are released:

  • Img2Img and Inpainting pipelines
  • Torch compile support
  • Model offloading
  • Ensemble of Expert Denoisers (the E-Diffi approach) - thanks to @bghira @SytanSD @Birch-san @AmericanPresidentJimmyCarter

Refer to the documentation to know more.

New training scripts for SDXL

When there’s a new pipeline, there ought to be new training scripts. We added support for the following training scripts that build on top of SDXL:

Shoutout to @harutatsuakiyama for contributing the training script for InstructPix2Pix in #4079.

New pipelines for SDXL

The ControlNet and InstructPix2Pix training scripts also needed their respective pipelines. So, we added support for the following pipelines as well:

  • StableDiffusionXLControlNetPipeline
  • StableDiffusionXLInstructPix2PixPipeline

The ControlNet and InstructPix2Pix pipelines don’t have interesting checkpoints yet. We hope that the community will be able to leverage the training scripts from this release to help produce some.

Shoutout to @harutatsuakiyama for contributing the StableDiffusionXLInstructPix2PixPipeline in #4079.

The AutoPipeline API

We now support Auto APIs for the following tasks: text-to-image, image-to-image, and inpainting:

Here is how to use one:

```python
from diffusers import AutoPipelineForText2Image
import torch

pipe_t2i = AutoPipelineForText2Image.from_pretrained(
    "runwayml/stable-diffusion-v1-5", requires_safety_checker=False, torch_dtype=torch.float16
).to("cuda")

prompt = "photo of a majestic sunrise in the mountains, best quality, 4k"
image = pipe_t2i(prompt).images[0]
image.save("image.png")
```

Without any extra memory, you can then switch to image-to-image:

```python
from diffusers import AutoPipelineForImage2Image

pipe_i2i = AutoPipelineForImage2Image.from_pipe(pipe_t2i)

image = pipe_i2i("sunrise in snowy mountains", image=image, strength=0.75).images[0]
image.save("image.png")
```

Supported Pipelines: SDv1, SDv2, SDXL, Kandinsky, ControlNet, IF ... with more to come.

Refer to the documentation to know more.

A new “combined pipeline” for the Kandinsky series

We introduced a new “combined pipeline” for the Kandinsky series to make it easier to use the Kandinsky prior and decoder together. This eliminates the need to initialize and use multiple pipelines for Kandinsky to generate images. Here is an example:

```python
from diffusers import AutoPipelineForText2Image
import torch

pipe = AutoPipelineForText2Image.from_pretrained(
    "kandinsky-community/kandinsky-2-2-decoder", torch_dtype=torch.float16
)
pipe.enable_model_cpu_offload()

prompt = "A lion in galaxies, spirals, nebulae, stars, smoke, iridescent, intricate detail, octane render, 8k"
image = pipe(prompt=prompt, num_inference_steps=25).images[0]
image.save("image.png")
```

The following pipelines, which can be accessed via the "Auto" pipelines, were added:

To know more, check out the following pages:

🚨🚨🚨 Breaking change for Kandinsky Mask Inpainting 🚨🚨🚨

NOW: mask_image repaints white pixels and preserves black pixels.

Kandinsky was using an incorrect mask format. Instead of treating white pixels as the mask (as SD & IF do), the Kandinsky models were using black pixels. This needed to be corrected so that the diffusers API stays aligned; we cannot have different mask formats for different pipelines.

Important => This means that everyone who already uses Kandinsky inpainting in production / in a pipeline now needs to invert the mask:

```py
# For PIL input
import PIL.ImageOps
mask = PIL.ImageOps.invert(mask)

# For PyTorch and NumPy input
mask = 1 - mask
```
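As a quick sanity check of the inversion on a toy mask (pure Python, values in [0, 1], where 1 now means "repaint"):

```python
# Old Kandinsky convention: black (0) marked the region to repaint.
old_mask = [[0.0, 1.0], [1.0, 0.0]]

# New diffusers-wide convention: white (1) marks the region to repaint,
# so existing Kandinsky masks must be inverted element-wise.
new_mask = [[1 - px for px in row] for row in old_mask]
```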

Asymmetric VQGAN

Designing a Better Asymmetric VQGAN for StableDiffusion introduced a VQGAN that is particularly well-suited for inpainting tasks. This release brings support for this new VQGAN. Here is how it can be used:

```python
from io import BytesIO

import requests
from PIL import Image

from diffusers import AsymmetricAutoencoderKL, StableDiffusionInpaintPipeline


def download_image(url: str) -> Image.Image:
    response = requests.get(url)
    return Image.open(BytesIO(response.content)).convert("RGB")


prompt = "a photo of a person"
img_url = "https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/repaint/celeba_hq_256.png"
mask_url = "https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/repaint/mask_256.png"

image = download_image(img_url).resize((256, 256))
mask_image = download_image(mask_url).resize((256, 256))

pipe = StableDiffusionInpaintPipeline.from_pretrained("runwayml/stable-diffusion-inpainting")
pipe.vae = AsymmetricAutoencoderKL.from_pretrained("cross-attention/asymmetric-autoencoder-kl-x-1-5")
pipe.to("cuda")

image = pipe(prompt=prompt, image=image, mask_image=mask_image).images[0]
image.save("image.jpeg")
```

Refer to the documentation to know more.

Thanks to @cross-attention for contributing this model in #3956.

Improved support for loading Kohya-style LoRA checkpoints

We are committed to seamless interoperability with Kohya-trained checkpoints. To that end, we improved the existing support for loading them in diffusers. Users can expect further improvements in upcoming releases.

Thanks to @takuma104 and @isidentical for contributing the improvements in #4147.

All commits

  • 📝 Fix broken link to models documentation by @kadirnar in #4026
  • move to 0.19.0dev by @patrickvonplaten in #4048
  • [SDXL] Partial diffusion support for Text2Img and Img2Img Pipelines by @bghira in #4015
  • Correct sdxl docs by @patrickvonplaten in #4058
  • Add circular padding for artifact-free StableDiffusionPanoramaPipeline by @EvgenyKashin in #4025
  • Update train_unconditional.py by @hjmnbnb in #3899
  • Trigger CI on ci-* branches by @Wauplin in #3635
  • Fix kandinsky remove safety by @patrickvonplaten in #4065
  • Multiply lr scheduler steps by num_processes. by @eliphatfs in #3983
  • [Community] Implementation of the IADB community pipeline by @tchambon in #3996
  • add kandinsky to readme table by @yiyixuxu in #4081
  • [From Single File] Force accelerate to be installed by @patrickvonplaten in #4078
  • fix requirement in SDXL by @killah-t-cell in #4082
  • fix: minor things in the SDXL docs. by @sayakpaul in #4070
  • [Invisible watermark] Correct version by @patrickvonplaten in #4087
  • [Feat] add: utility for unloading lora. by @sayakpaul in #4034
  • [tests] use parent class for monkey patching to not break other tests by @patrickvonplaten in #4088
  • Allow low precision vae sd xl by @patrickvonplaten in #4083
  • [SD-XL] Add inpainting by @patrickvonplaten in #4098
  • [Stable Diffusion Inpaint ]Fix dtype inpaint by @patrickvonplaten in #4113
  • [From ckpt] replace with os path join by @patrickvonplaten in #3746
  • [From single file] Make accelerate optional by @patrickvonplaten in #4132
  • add noise_sampler_seed to StableDiffusionKDiffusionPipeline.__call__ by @sunhs in #3911
  • Make setup.py compatible with pipenv by @apoorvaeternity in #4121
  • 📝 Update doc with more descriptive title and filename for "IF" section by @kadirnar in #4049
  • t2i pipeline by @williamberman in #3932
  • [Docs] Korean translation update by @Snailpong in #4022
  • [Enhance] Add rank in dreambooth by @okotaku in #4112
  • Refactor execution device & cpu offload by @patrickvonplaten in #4114
  • Add Recent Timestep Scheduling Improvements to DDIM Inverse Scheduler by @clarencechen in #3865
  • [Core] add: controlnet support for SDXL by @sayakpaul in #4038
  • Docs/bentoml integration by @larme in #4090
  • Fixed SDXL single file loading to use the correct requested pipeline class by @Mystfit in #4142
  • feat: add act_fn param to OutValueFunctionBlock by @SauravMaheshkar in #3994
  • Add controlnet and vae from single file by @patrickvonplaten in #4084
  • fix incorrect attention head dimension in AttnProcessor2_0 by @zhvng in #4154
  • Fix bug in ControlNetPipelines with MultiControlNetModel of length 1 by @greentfrapp in #4032
  • Asymmetric vqgan by @cross-attention in #3956
  • Shap-E: add support for mesh output by @yiyixuxu in #4062
  • [From single file] Make sure that controlnet stays False for fromsinglefile by @patrickvonplaten in #4181
  • [ControlNet Training] Remove safety from controlnet by @patrickvonplaten in #4180
  • remove bentoml doc in favor of blogpost by @williamberman in #4182
  • Fix unloading of LoRAs when xformers attention procs are in use by @isidentical in #4179
  • [Safetensors] make safetensors a required dep by @patrickvonplaten in #4177
  • make enablesequentialcpu_offload more generic for third-party devices by @statelesshz in #4191
  • Allow passing different prompts to each text_encoder on stable_diffusion_xl pipelines by @apolinario in #4156
  • [SDXL ControlNet Training] Follow-up fixes by @sayakpaul in #4188
  • 📄 Renamed File for Better Understanding by @kadirnar in #4056
  • [docs] Clean up pipeline apis by @stevhliu in #3905
  • docs: Typo in dreambooth example README.md by @askulkarni2 in #4203
  • [fix] network_alpha when loading unet lora from old format by @Jackmin801 in #4221
  • fix no CFG for kandinsky pipelines by @yiyixuxu in #4193
  • fix a bug of prompt embeds in sdxl by @xiaohu2015 in #4099
  • Raise initial HTTPError if pipeline is not cached locally by @Wauplin in #4230
  • [SDXL] Fix sd xl encode prompt by @patrickvonplaten in #4237
  • [SD-XL] Fix sdxl controlnet inference by @patrickvonplaten in #4238
  • [docs] Changed path for ControlNet in docs by @rcmtcristian in #4215
  • Allow specifying denoisingstart and denoisingend as integers representing the discrete timesteps, fixing the XL ensemble not working for many schedulers by @AmericanPresidentJimmyCarter in #4115
  • [docs] Other modalities by @stevhliu in #4205
  • docs: Add missing import statement in textual_inversion inference example by @askulkarni2 in #4227
  • [Docs] Fix from pretrained docs by @patrickvonplaten in #4240
  • [ControlNet SDXL training] fixes in the training script by @sayakpaul in #4223
  • [SDXL DreamBooth LoRA] add support for text encoder fine-tuning by @sayakpaul in #4097
  • Resolve bf16 error as mentioned in this issue by @nupurkmr9 in #4214
  • do not pass list to accelerator.init_trackers by @williamberman in #4248
  • [From Single File] Allow vae to be loaded by @patrickvonplaten in #4242
  • [SDXL] Improve docs by @patrickvonplaten in #4196
  • [draft v2] AutoPipeline by @yiyixuxu in #4138
  • Update README_sdxl.md to change the note on default hyperparameters by @sayakpaul in #4258
  • [fromsinglefile] Fix circular import by @patrickvonplaten in #4259
  • Model path for sdxl wrong in dreambooth README by @rrva in #4261
  • [SDXL and IP2P]: instruction pix2pix XL training and pipeline by @harutatsuakiyama in #4079
  • [docs] Fix image in SDXL docs by @stevhliu in #4267
  • [SDXL DreamBooth LoRA] multiple fixes by @sayakpaul in #4262
  • Load Kohya-ss style LoRAs with auxilary states by @isidentical in #4147
  • Fix all missing optional import statements from pipeline folders by @patrickvonplaten in #4272
  • [Kandinsky] Add combined pipelines / Fix cpu model offload / Fix inpainting by @patrickvonplaten in #4207
  • Where did this 'x' come from, Elon? by @camenduru in #4277
  • add openvino and onnx runtime SD XL documentation by @echarlaix in #4285
  • Rename by @patrickvonplaten in #4294

Significant community contributions

The following contributors have made significant changes to the library over the last release:

  • @Snailpong
    • [Docs] Korean translation update (#4022)
  • @clarencechen
    • Add Recent Timestep Scheduling Improvements to DDIM Inverse Scheduler (#3865)
  • @cross-attention
    • Asymmetric vqgan (#3956)
  • @AmericanPresidentJimmyCarter
    • Allow specifying denoisingstart and denoisingend as integers representing the discrete timesteps, fixing the XL ensemble not working for many schedulers (#4115)
  • @harutatsuakiyama
    • [SDXL and IP2P]: instruction pix2pix XL training and pipeline (#4079)

- Python
Published by patrickvonplaten over 2 years ago

diffusers - Patch Release: v0.18.2

Patch release to fix:

  1. torch.compile for SD-XL for certain GPUs
  2. from_single_file for all SD models
  3. Fix broken ONNX export
  4. Fix incorrect VAE FP16 casting
  5. Deprecate loading variants that don't exist

Note:

Loading any stable diffusion safetensors or ckpt with StableDiffusionPipeline.from_single_file or StableDiffusionImg2ImgPipeline.from_single_file or StableDiffusionInpaintPipeline.from_single_file or StableDiffusionXLPipeline.from_single_file, ...

is now almost as fast as from_pretrained(...), and it's much better tested now.

All commits:

  • Make sure torch compile doesn't access unet config by @patrickvonplaten in #4008
  • [DiffusionPipeline] Deprecate not throwing error when loading non-existant variant by @patrickvonplaten in #4011
  • Correctly keep vae in float16 when using PyTorch 2 or xFormers by @pcuenca in #4019
  • minor improvements to the SDXL doc. by @sayakpaul in #3985
  • Remove remaining not in upscale pipeline by @pcuenca in #4020
  • FIX force_download in download utility by @Wauplin in #4036
  • Improve single loading file by @patrickvonplaten in #4041
  • keep usedefault_values as a list type by @oOraph in #4040

- Python
Published by patrickvonplaten over 2 years ago

diffusers - Patch Release for Stable Diffusion XL 0.9

Patch release 0.18.1: Stable Diffusion XL 0.9 Research Release

Stable Diffusion XL 0.9 is now fully supported under the SDXL 0.9 Research License.

Having received access to stabilityai/stable-diffusion-xl-base-0.9, you can easily use it with diffusers:

Text-to-Image

```py
from diffusers import StableDiffusionXLPipeline
import torch

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-0.9", torch_dtype=torch.float16, variant="fp16", use_safetensors=True
)
pipe.to("cuda")

prompt = "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"
image = pipe(prompt=prompt).images[0]
```

Refining the image output

```py
from diffusers import StableDiffusionXLPipeline, StableDiffusionXLImg2ImgPipeline
import torch

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-0.9", torch_dtype=torch.float16, variant="fp16", use_safetensors=True
)
pipe.to("cuda")

refiner = StableDiffusionXLImg2ImgPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-refiner-0.9", torch_dtype=torch.float16, use_safetensors=True, variant="fp16"
)
refiner.to("cuda")

prompt = "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"

use_refiner = True
image = pipe(prompt=prompt, output_type="latent" if use_refiner else "pil").images[0]
image = refiner(prompt=prompt, image=image[None, :]).images[0]
```

Loading single file checkpoints / original file format

```py
from diffusers import StableDiffusionXLPipeline, StableDiffusionXLImg2ImgPipeline
import torch

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-0.9", torch_dtype=torch.float16, variant="fp16", use_safetensors=True
)
pipe.to("cuda")

refiner = StableDiffusionXLImg2ImgPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-refiner-0.9", torch_dtype=torch.float16, use_safetensors=True, variant="fp16"
)
refiner.to("cuda")
```

Memory optimization via model offloading

```diff
- pipe.to("cuda")
+ pipe.enable_model_cpu_offload()
```

and

```diff
- refiner.to("cuda")
+ refiner.enable_model_cpu_offload()
```

Speed-up inference with torch.compile

```diff
+ pipe.unet = torch.compile(pipe.unet, mode="reduce-overhead", fullgraph=True)
+ refiner.unet = torch.compile(refiner.unet, mode="reduce-overhead", fullgraph=True)
```

Note: If you're running the model with torch < 2.0, please make sure to run:

```diff
+ pipe.enable_xformers_memory_efficient_attention()
+ refiner.enable_xformers_memory_efficient_attention()
```

For more details have a look at the official docs.

All commits

  • typo in safetensors (safetenstors) by @YoraiLevi in #3976
  • Fix code snippet for Audio Diffusion by @osanseviero in #3987
  • feat: add Dropout to Flax UNet by @SauravMaheshkar in #3894
  • Add 'rank' parameter to Dreambooth LoRA training script by @isidentical in #3945
  • Don't use bare prints in a library by @cmd410 in #3991
  • [Tests] Fix some slow tests by @patrickvonplaten in #3989
  • Add sdxl prompt embeddings by @patrickvonplaten in #3995

- Python
Published by patrickvonplaten over 2 years ago

diffusers - Shap-E, Consistency Models, Video2Video

Shap-E

Shap-E is a 3D image generation model from OpenAI introduced in Shap-E: Generating Conditional 3D Implicit Functions.

We provide support for text-to-3D generation and 2D-image-to-3D generation in diffusers.

Text to 3D

```py
import torch
from diffusers import ShapEPipeline
from diffusers.utils import export_to_gif

ckpt_id = "openai/shap-e"
pipe = ShapEPipeline.from_pretrained(ckpt_id).to("cuda")

guidance_scale = 15.0
prompt = "A birthday cupcake"
images = pipe(
    prompt,
    guidance_scale=guidance_scale,
    num_inference_steps=64,
    frame_size=256,
).images

gif_path = export_to_gif(images[0], "cake_3d.gif")
```

Image to 3D

```py
import torch
from diffusers import ShapEImg2ImgPipeline
from diffusers.utils import export_to_gif, load_image

ckpt_id = "openai/shap-e-img2img"
pipe = ShapEImg2ImgPipeline.from_pretrained(ckpt_id).to("cuda")

img_url = "https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/shap_e/burger_in.png"
image = load_image(img_url)

generator = torch.Generator(device="cuda").manual_seed(0)
batch_size = 4
guidance_scale = 3.0

images = pipe(
    image,
    num_images_per_prompt=batch_size,
    generator=generator,
    guidance_scale=guidance_scale,
    num_inference_steps=64,
    frame_size=256,
    output_type="pil",
).images

gif_path = export_to_gif(images[0], "burger_sampled_3d.gif")
```

For more details, check out the official documentation.

The model was contributed by @yiyixuxu in https://github.com/huggingface/diffusers/pull/3742.

Consistency models

Consistency models are diffusion models supporting fast one-step or few-step image generation. They were proposed by OpenAI in Consistency Models.

```python
import torch

from diffusers import ConsistencyModelPipeline

device = "cuda"

# Load the cd_imagenet64_l2 checkpoint.
model_id_or_path = "openai/diffusers-cd_imagenet64_l2"
pipe = ConsistencyModelPipeline.from_pretrained(model_id_or_path, torch_dtype=torch.float16)
pipe.to(device)

# One-step sampling
image = pipe(num_inference_steps=1).images[0]
image.save("consistency_model_onestep_sample.png")

# One-step sampling, class-conditional image generation
# ImageNet-64 class label 145 corresponds to king penguins
image = pipe(num_inference_steps=1, class_labels=145).images[0]
image.save("consistency_model_onestep_sample_penguin.png")

# Multistep sampling, class-conditional image generation
# Timesteps can be explicitly specified; the particular timesteps below are
# from the original GitHub repo:
# https://github.com/openai/consistency_models/blob/main/scripts/launch.sh#L77
image = pipe(timesteps=[22, 0], class_labels=145).images[0]
image.save("consistency_model_multistep_sample_penguin.png")
```

For more details, see the official docs.

The model was contributed by our community members @dg845 and @ayushtues in https://github.com/huggingface/diffusers/pull/3492.

Video-to-Video

Previous video generation pipelines tended to produce watermarks because those watermarks were present in their pretraining dataset. With the latest additions of the following checkpoints, we can now generate watermark-free videos:

```python
import torch
from diffusers import DiffusionPipeline
from diffusers.utils import export_to_video

pipe = DiffusionPipeline.from_pretrained("cerspense/zeroscope_v2_576w", torch_dtype=torch.float16)
pipe.enable_model_cpu_offload()

# memory optimization
pipe.unet.enable_forward_chunking(chunk_size=1, dim=1)
pipe.enable_vae_slicing()

prompt = "Darth Vader surfing a wave"
video_frames = pipe(prompt, num_frames=24).frames
video_path = export_to_video(video_frames)
```

For more details, check out the official docs.

It was contributed by @patrickvonplaten in https://github.com/huggingface/diffusers/pull/3900.

All commits

  • remove seed by @yiyixuxu in #3734
  • Correct Token to upload docs by @patrickvonplaten in #3744
  • Correct another push token by @patrickvonplaten in #3745
  • [Stable Diffusion Inpaint & ControlNet inpaint] Correct timestep inpaint by @patrickvonplaten in #3749
  • [Documentation] Replace dead link to Flax install guide by @JeLuF in #3739
  • [documentation] grammatical fixes in installation.mdx by @LiamSwayne in #3735
  • Text2video zero refinements by @19and99 in #3733
  • [Tests] Relax tolerance of flaky failing test by @patrickvonplaten in #3755
  • [MultiControlNet] Allow save and load by @patrickvonplaten in #3747
  • Update pipelineflaxstablediffusioncontrolnet.py by @jfozard in #3306
  • update conversion script for Kandinsky unet by @yiyixuxu in #3766
  • [docs] Fix Colab notebook cells by @stevhliu in #3777
  • [Bug Report template] modify the issue template to include core maintainers. by @sayakpaul in #3785
  • [Enhance] Update reference by @okotaku in #3723
  • Fix broken cpu-offloading in legacy inpainting SD pipeline by @cmdr2 in #3773
  • Fix some bad comment in training scripts by @patrickvonplaten in #3798
  • Added LoRA loading to StableDiffusionKDiffusionPipeline by @tripathiarpan20 in #3751
  • UnCLIP Image Interpolation -> Keep same initial noise across interpolation steps by @Abhinay1997 in #3782
  • feat: add PR template. by @sayakpaul in #3786
  • Ldm3d first PR by @estelleafl in #3668
  • Complete setattnprocessor for prior and vae by @patrickvonplaten in #3796
  • fix typo by @Isotr0py in #3800
  • manual check for checkpointstotallimit instead of using accelerate by @williamberman in #3681
  • [train text to image] add note to loading from checkpoint by @williamberman in #3806
  • device map legacy attention block weight conversion by @williamberman in #3804
  • [docs] Zero SNR by @stevhliu in #3776
  • [ldm3d] Fixed small typo by @estelleafl in #3820
  • [Examples] Improve the model card pushed from the train_text_to_image.py script by @sayakpaul in #3810
  • [Docs] add missing pipelines from the overview pages and minor fixes by @sayakpaul in #3795
  • [Pipeline] Add new pipeline for ParaDiGMS -- parallel sampling of diffusion models by @AndyShih12 in #3716
  • Update control_brightness.mdx by @dqueue in #3825
  • Support ControlNet models with different number of channels in control images by @JCBrouwer in #3815
  • Add ddpm kandinsky by @yiyixuxu in #3783
  • [docs] More API stuff by @stevhliu in #3835
  • relax tol attention conversion test by @williamberman in #3842
  • fix: random module seeding by @sayakpaul in #3846
  • fix audio_diffusion tests by @teticio in #3850
  • Correct bad attn naming by @patrickvonplaten in #3797
  • [Conversion] Small fixes by @patrickvonplaten in #3848
  • Fix some audio tests by @patrickvonplaten in #3841
  • [Docs] add: contributor note in the paradigms docs. by @sayakpaul in #3852
  • Update Habana Gaudi doc by @regisss in #3863
  • Add guidance start/stop by @holwech in #3770
  • feat: rename single-letter vars in resnet.py by @SauravMaheshkar in #3868
  • Fixing the global_step key not found by @VincentNeemie in #3844
  • Support for manual CLIP loading in StableDiffusionPipeline - txt2img. by @WadRex in #3832
  • fix sde add noise typo by @UranusITS in #3839
  • [Tests] add test for checking soft dependencies. by @sayakpaul in #3847
  • [Enhance] Add LoRA rank args in traintexttoimagelora by @okotaku in #3866
  • [docs] Model API by @stevhliu in #3562
  • fix/docs: Fix the broken doc links by @Aisuko in #3897
  • Add video img2img by @patrickvonplaten in #3900
  • fix/doc-code: Updating to the latest version parameters by @Aisuko in #3924
  • fix/doc: no import torch issue by @Aisuko in #3923
  • Correct controlnet out of list error by @patrickvonplaten in #3928
  • Adding better way to define multiple concepts and also validation capabilities. by @mauricio-repetto in #3807
  • [ldm3d] Update code to be functional with the new checkpoints by @estelleafl in #3875
  • Improve memory text to video by @patrickvonplaten in #3930
  • revert automatic chunking by @patrickvonplaten in #3934
  • avoid upcasting by assigning dtype to noise tensor by @prathikr in #3713
  • Fix failing np tests by @patrickvonplaten in #3942
  • Add timestep_spacing and steps_offset to schedulers by @pcuenca in #3947
  • Add Consistency Models Pipeline by @dg845 in #3492
  • Update consistency_models.mdx by @sayakpaul in #3961
  • Make UNet2DConditionOutput pickle-able by @prathikr in #3857
  • [Consistency Models] correct checkpoint url in the doc by @sayakpaul in #3962
  • [Text-to-video] Add torch.compile() compatibility by @sayakpaul in #3949
  • [SD-XL] Add new pipelines by @patrickvonplaten in #3859
  • Kandinsky 2.2 by @cene555 in #3903
  • Add Shap-E by @yiyixuxu in #3742
  • disable num attenion heads by @patrickvonplaten in #3969
  • Improve SD XL by @patrickvonplaten in #3968
  • fix/doc-code: import torch and fix the broken document address by @Aisuko in #3941

Significant community contributions

The following contributors have made significant changes to the library over the last release:

  • @estelleafl
    • Ldm3d first PR (#3668)
    • [ldm3d] Fixed small typo (#3820)
    • [ldm3d] Update code to be functional with the new checkpoints (#3875)
  • @AndyShih12
    • [Pipeline] Add new pipeline for ParaDiGMS -- parallel sampling of diffusion models (#3716)
  • @dg845
    • Add Consistency Models Pipeline (#3492)

- Python
Published by patrickvonplaten over 2 years ago

diffusers - Patch Release: v0.17.1

Patch release to fix the timestep for inpainting:

  • [Stable Diffusion Inpaint & ControlNet inpaint] Correct timestep inpaint by @patrickvonplaten in #3749

- Python
Published by patrickvonplaten over 2 years ago

diffusers - v0.17.0 Improved LoRA, Kandinsky 2.1, Torch Compile Speed-up & More

Kandinsky 2.1

Kandinsky 2.1 inherits best practices from DALL-E 2 and Latent Diffusion while introducing some new ideas.

Installation

```bash
pip install diffusers transformers accelerate
```

Code example

```python
from diffusers import DiffusionPipeline
import torch

pipe_prior = DiffusionPipeline.from_pretrained("kandinsky-community/kandinsky-2-1-prior", torch_dtype=torch.float16)
pipe_prior.to("cuda")

t2i_pipe = DiffusionPipeline.from_pretrained("kandinsky-community/kandinsky-2-1", torch_dtype=torch.float16)
t2i_pipe.to("cuda")

prompt = "A alien cheeseburger creature eating itself, claymation, cinematic, moody lighting"
negative_prompt = "low quality, bad quality"

generator = torch.Generator(device="cuda").manual_seed(12)
image_embeds, negative_image_embeds = pipe_prior(prompt, negative_prompt, guidance_scale=1.0, generator=generator).to_tuple()

image = t2i_pipe(prompt, negative_prompt=negative_prompt, image_embeds=image_embeds, negative_image_embeds=negative_image_embeds).images[0]
image.save("cheeseburger_monster.png")
```

To learn more about the Kandinsky pipelines, and more details about speed and memory optimizations, please have a look at the docs.

Thanks @ayushtues, for helping with the integration of Kandinsky 2.1!

UniDiffuser

UniDiffuser introduces a multimodal diffusion process that is capable of handling different generation tasks using a single unified approach:

  • Unconditional image and text generation
  • Joint image-text generation
  • Text-to-image generation
  • Image-to-text generation
  • Image variation
  • Text variation

Below is an example of how to use UniDiffuser for text-to-image generation:

```python
import torch
from diffusers import UniDiffuserPipeline

model_id_or_path = "thu-ml/unidiffuser-v1"
pipe = UniDiffuserPipeline.from_pretrained(model_id_or_path, torch_dtype=torch.float16)
pipe.to("cuda")

# This mode can be inferred from the input provided to the pipe.
pipe.set_text_to_image_mode()

prompt = "an elephant under the sea"
sample = pipe(prompt=prompt, num_inference_steps=20, guidance_scale=8.0).images[0]
sample.save("elephant.png")
```

Check out the UniDiffuser docs to know more.

UniDiffuser was added by @dg845 in this PR.

LoRA

We're happy to support the A1111 formatted CivitAI LoRA checkpoints in a limited capacity.

First, download a checkpoint. We’ll use this one for demonstration purposes.

```bash
wget https://civitai.com/api/download/models/15603 -O light_and_shadow.safetensors
```

Next, we initialize a DiffusionPipeline:

```python
import torch

from diffusers import StableDiffusionPipeline, DPMSolverMultistepScheduler

pipeline = StableDiffusionPipeline.from_pretrained(
    "gsdf/Counterfeit-V2.5", torch_dtype=torch.float16, safety_checker=None
).to("cuda")
pipeline.scheduler = DPMSolverMultistepScheduler.from_config(
    pipeline.scheduler.config, use_karras_sigmas=True
)
```

We then load the checkpoint downloaded from CivitAI:

```python
pipeline.load_lora_weights(".", weight_name="light_and_shadow.safetensors")
```

(If you’re loading a checkpoint in the safetensors format, please ensure you have safetensors installed.)

And then it’s time for running inference:

```python
prompt = "masterpiece, best quality, 1girl, at dusk"
negative_prompt = (
    "(low quality, worst quality:1.4), (bad anatomy), (inaccurate limb:1.2), "
    "bad composition, inaccurate eyes, extra digit, fewer digits, (extra arms:1.2), large breasts"
)

images = pipeline(
    prompt=prompt,
    negative_prompt=negative_prompt,
    width=512,
    height=768,
    num_inference_steps=15,
    num_images_per_prompt=4,
    generator=torch.manual_seed(0),
).images
```

Below is a comparison between the LoRA and the non-LoRA results:

Check out the docs to learn more.

Thanks to @takuma104 for contributing this feature via this PR.

Torch 2.0 Compile Speed-up

We introduced Torch 2.0 support for computing attention efficiently in 0.13.0. Since then, we have made a number of improvements to ensure the number of "graph breaks" in our models is reduced so that the models can be compiled with torch.compile(). As a result, we are happy to report massive improvements in the inference speed of our most popular pipelines. Check out this doc to know more.

Thanks to @Chillee for helping us with this. Thanks to @patrickvonplaten for fixing the problems stemming from "graph breaks" in this PR.

VAE pre-processing

We added a VaeImageProcessor class that provides a unified API for pipelines to prepare their image inputs, as well as to post-process their outputs. It supports resizing, normalization, and conversion between PIL images, PyTorch tensors, and NumPy arrays.

With that, all Stable Diffusion pipelines now accept image inputs as PyTorch tensors and NumPy arrays, in addition to PIL images, and can produce outputs in any of these three formats. Pipelines will also accept and return latents. This means you can take generated latents from one pipeline and pass them to another as inputs, without leaving the latent space. If you work with multiple pipelines, you can pass PyTorch tensors between them without converting to PIL images.

To learn more about the API, check out our doc here
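Conceptually, the preprocessing step maps pixel values from the [0, 1] image range into the [-1, 1] range the VAE operates on, and post-processing maps them back with clamping. A minimal pure-Python sketch of that round trip; the real `VaeImageProcessor` additionally handles resizing and PIL/NumPy/PyTorch conversion, and these helper names are illustrative:

```python
def normalize(x):
    # [0, 1] -> [-1, 1], applied before encoding with the VAE
    return 2.0 * x - 1.0

def denormalize(x):
    # [-1, 1] -> [0, 1], applied after decoding, clamped to valid pixel values
    return max(0.0, min(1.0, x / 2.0 + 0.5))

pixels = [0.0, 0.25, 1.0]
normalized = [normalize(p) for p in pixels]
restored = [denormalize(v) for v in normalized]
```

Keeping this normalization in one place is what lets latents travel between pipelines without repeated decode/encode round trips.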

ControlNet Img2Img & Inpainting

ControlNet is one of the most used diffusion models, and upon strong demand from the community we added ControlNet img2img and ControlNet inpaint pipelines. This allows any ControlNet checkpoint to be used both in the image-to-image setting and for inpainting.

:point_right: Inpaint: See the ControlNet inpaint model here
:point_right: Image-to-Image: Any ControlNet checkpoint can be used for image-to-image, e.g.:

```py
from diffusers import StableDiffusionControlNetImg2ImgPipeline, ControlNetModel, UniPCMultistepScheduler
from diffusers.utils import load_image
import numpy as np
import torch

import cv2
from PIL import Image

# download an image
image = load_image(
    "https://hf.co/datasets/huggingface/documentation-images/resolve/main/diffusers/input_image_vermeer.png"
)
np_image = np.array(image)

# get canny image
np_image = cv2.Canny(np_image, 100, 200)
np_image = np_image[:, :, None]
np_image = np.concatenate([np_image, np_image, np_image], axis=2)
canny_image = Image.fromarray(np_image)

# load control net and stable diffusion v1-5
controlnet = ControlNetModel.from_pretrained("lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16)
pipe = StableDiffusionControlNetImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
)

# speed up the diffusion process with a faster scheduler and memory optimization
pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config)
pipe.enable_model_cpu_offload()

# generate image
generator = torch.manual_seed(0)
image = pipe(
    "futuristic-looking woman",
    num_inference_steps=20,
    generator=generator,
    image=image,
    control_image=canny_image,
).images[0]
```

DiffEdit Zero-Shot Inpainting Pipeline

This pipeline (introduced in DiffEdit: Diffusion-based semantic image editing with mask guidance) allows for image editing with natural language. Below is an end-to-end example.

First, let’s load our pipeline:

```python
import torch
from diffusers import DDIMScheduler, DDIMInverseScheduler, StableDiffusionDiffEditPipeline

sd_model_ckpt = "stabilityai/stable-diffusion-2-1"
pipeline = StableDiffusionDiffEditPipeline.from_pretrained(
    sd_model_ckpt,
    torch_dtype=torch.float16,
    safety_checker=None,
)
pipeline.scheduler = DDIMScheduler.from_config(pipeline.scheduler.config)
pipeline.inverse_scheduler = DDIMInverseScheduler.from_config(pipeline.scheduler.config)
pipeline.enable_model_cpu_offload()
pipeline.enable_vae_slicing()
generator = torch.manual_seed(0)
```

Then, we load an input image to edit using our method:

```python
from diffusers.utils import load_image

img_url = "https://github.com/Xiang-cd/DiffEdit-stable-diffusion/raw/main/assets/origin.png"
raw_image = load_image(img_url).convert("RGB").resize((768, 768))
```

Then, we employ the source and target prompts to generate the editing mask:

```python
source_prompt = "a bowl of fruits"
target_prompt = "a basket of fruits"
mask_image = pipeline.generate_mask(
    image=raw_image,
    source_prompt=source_prompt,
    target_prompt=target_prompt,
    generator=generator,
)
```

Then, we employ the caption and the input image to get the inverted latents:

```python
inv_latents = pipeline.invert(prompt=source_prompt, image=raw_image, generator=generator).latents
```

Now, generate the image with the inverted latents and semantically generated mask:

```python
image = pipeline(
    prompt=target_prompt,
    mask_image=mask_image,
    image_latents=inv_latents,
    generator=generator,
    negative_prompt=source_prompt,
).images[0]
image.save("edited_image.png")
```

Check out the docs to learn more about this pipeline.

Thanks to @clarencechen for contributing this pipeline in this PR.

Docs

Apart from these, we have made multiple improvements to the overall quality-of-life of our docs.

Thanks to @stevhliu for leading the charge here.

Misc

  • xformers attention processor fix when using LoRA (PR by @takuma104)
  • Pytorch 2.0 SDPA implementation of the LoRA attention processor (PR)

All commits

  • Post release for 0.16.0 by @patrickvonplaten in #3244
  • [docs] only mention one stage by @pcuenca in #3246
  • Write model card in controlnet training script by @pcuenca in #3229
  • [2064]: Add stochastic sampler (sample_dpmpp_sde) by @nipunjindal in #3020
  • [Stochastic Sampler][Slow Test]: Cuda test fixes by @nipunjindal in #3257
  • Remove required from tracker_project_name by @pcuenca in #3260
  • adding required parameters while calling the get_up_block and get_down_block by @init-22 in #3210
  • [docs] Update interface in repaint.mdx by @ernestchu in #3119
  • Update IF name to XL by @apolinario in #3262
  • fix typo in score sde pipeline by @fecet in #3132
  • Fix typo in textual inversion JAX training script by @jairtrejo in #3123
  • AudioDiffusionPipeline - fix encode method after config changes by @teticio in #3114
  • Revert "Revert "[Community Pipelines] Update lpwstablediffusion pipeline"" by @patrickvonplaten in #3265
  • Fix community pipelines by @patrickvonplaten in #3266
  • update notebook by @yiyixuxu in #3259
  • [docs] add notes for stateful model changes by @williamberman in #3252
  • [LoRA] quality of life improvements in the loading semantics and docs by @sayakpaul in #3180
  • [Community Pipelines] EDICT pipeline implementation by @Joqsan in #3153
  • [Docs]zh translated docs update by @DrDavidS in #3245
  • Update logging.mdx by @standardAI in #2863
  • Add multiple conditions to StableDiffusionControlNetInpaintPipeline by @timegate in #3125
  • Let's make sure that dreambooth always uploads to the Hub by @patrickvonplaten in #3272
  • Diffedit Zero-Shot Inpainting Pipeline by @clarencechen in #2837
  • add constant learning rate with custom rule by @jason9075 in #3133
  • Allow disabling torch 2_0 attention by @patrickvonplaten in #3273
  • [doc] add link to training script by @yiyixuxu in #3271
  • temp disable spectogram diffusion tests by @williamberman in #3278
  • Changed sample[0] to images[0] by @IliaLarchenko in #3304
  • Typo in tutorial by @IliaLarchenko in #3295
  • Torch compile graph fix by @patrickvonplaten in #3286
  • Postprocessing refactor img2img by @yiyixuxu in #3268
  • [Torch 2.0 compile] Fix more torch compile breaks by @patrickvonplaten in #3313
  • fix: scale_lr and sync example readme and docs. by @sayakpaul in #3299
  • Update stable_diffusion.mdx by @mu94-csl in #3310
  • Fix missing variable assign in DeepFloyd-IF-II by @gitmylo in #3315
  • Correct doc build for patch releases by @patrickvonplaten in #3316
  • Add Stable Diffusion RePaint to community pipelines by @Markus-Pobitzer in #3320
  • Fix multistep dpmsolver for cosine schedule (suitable for deepfloyd-if) by @LuChengTHU in #3314
  • [docs] Improve LoRA docs by @stevhliu in #3311
  • Added input pretubation by @isamu-isozaki in #3292
  • Update write_own_pipeline.mdx by @csaybar in #3323
  • update controlling generation doc with latest goodies. by @sayakpaul in #3321
  • [Quality] Make style by @patrickvonplaten in #3341
  • Fix config dpm by @patrickvonplaten in #3343
  • Add the SDE variant of DPM-Solver and DPM-Solver++ by @LuChengTHU in #3344
  • Add upsample_size to AttnUpBlock2D, AttnDownBlock2D by @will-rice in #3275
  • Rename --only_save_embeds to --save_as_full_pipeline by @arrufat in #3206
  • [AudioLDM] Generalise conversion script by @sanchit-gandhi in #3328
  • Fix TypeError when using prompt_embeds and negative_prompt by @At-sushi in #2982
  • Fix pipeline class on README by @themrzmaster in #3345
  • Inpainting: typo in docs by @LysandreJik in #3331
  • Add use_Karras_sigmas to LMSDiscreteScheduler by @Isotr0py in #3351
  • Batched load of textual inversions by @pdoane in #3277
  • [docs] Fix docstring by @stevhliu in #3334
  • if dreambooth lora by @williamberman in #3360
  • Postprocessing refactor all others by @yiyixuxu in #3337
  • [docs] Improve safetensors docstring by @stevhliu in #3368
  • add: a warning message when using xformers in a PT 2.0 env. by @sayakpaul in #3365
  • StableDiffusionInpaintingPipeline - resize image w.r.t height and width by @rupertmenneer in #3322
  • [docs] Adapt a model by @stevhliu in #3326
  • [docs] Load safetensors by @stevhliu in #3333
  • [Docs] Fix stable_diffusion.mdx typo by @sudowind in #3398
  • Support ControlNet v1.1 shuffle properly by @takuma104 in #3340
  • [Tests] better determinism by @sayakpaul in #3374
  • [docs] Add transformers to install by @stevhliu in #3388
  • [deepspeed] partial ZeRO-3 support by @stas00 in #3076
  • Add omegaconf for tests by @patrickvonplaten in #3400
  • Fix various bugs with LoRA Dreambooth and Dreambooth script by @patrickvonplaten in #3353
  • Fix docker file by @patrickvonplaten in #3402
  • fix: deepseepd_plugin retrieval from accelerate state by @sayakpaul in #3410
  • [Docs] Add sigmoid beta_scheduler to docstrings of relevant Schedulers by @Laurent2916 in #3399
  • Don't install accelerate and transformers from source by @patrickvonplaten in #3415
  • Don't install transformers and accelerate from source by @patrickvonplaten in #3414
  • Improve fast tests by @patrickvonplaten in #3416
  • attention refactor: the trilogy by @williamberman in #3387
  • [Docs] update the PT 2.0 optimization doc with latest findings by @sayakpaul in #3370
  • Fix style rendering by @pcuenca in #3433
  • unCLIP scheduler do not use note by @williamberman in #3417
  • Replace deprecated command with environment file by @jongwooo in #3409
  • fix warning message pipeline loading by @patrickvonplaten in #3446
  • add stable diffusion tensorrt img2img pipeline by @asfiyab-nvidia in #3419
  • Refactor controlnet and add img2img and inpaint by @patrickvonplaten in #3386
  • [Scheduler] DPM-Solver (++) Inverse Scheduler by @clarencechen in #3335
  • [Docs] Fix incomplete docstring for resnet.py by @Laurent2916 in #3438
  • fix tiled vae blend extent range by @superlabs-dev in #3384
  • Small update to "Next steps" section by @pcuenca in #3443
  • Allow arbitrary aspect ratio in IFSuperResolutionPipeline by @devxpy in #3298
  • Adding 'strength' parameter to StableDiffusionInpaintingPipeline by @rupertmenneer in #3424
  • [WIP] Bugfix - Pipeline.from_pretrained is broken when the pipeline is partially downloaded by @vimarshc in #3448
  • Fix gradient checkpointing bugs in freezing part of models (requires_grad=False) by @7eu7d7 in #3404
  • Make dreambooth lora more robust to orig unet by @patrickvonplaten in #3462
  • Reduce peak VRAM by releasing large attention tensors (as soon as they're unnecessary) by @cmdr2 in #3463
  • Add min snr to text2img lora training script by @wfng92 in #3459
  • Add inpaint lora scale support by @Glaceon-Hyy in #3460
  • [From ckpt] Fix from_ckpt by @patrickvonplaten in #3466
  • Update full dreambooth script to work with IF by @williamberman in #3425
  • Add IF dreambooth docs by @williamberman in #3470
  • parameterize pass single args through tuple by @williamberman in #3477
  • attend and excite tests disable determinism on the class level by @williamberman in #3478
  • dreambooth docs torch.compile note by @williamberman in #3471
  • add: if entry in the dreambooth training docs. by @sayakpaul in #3472
  • [docs] Textual inversion inference by @stevhliu in #3473
  • [docs] Distributed inference by @stevhliu in #3376
  • [{Up,Down}sample1d] explicit view kernel size as number elements in flattened indices by @williamberman in #3479
  • mps & onnx tests rework by @pcuenca in #3449
  • [Attention processor] Better warning message when shifting to AttnProcessor2_0 by @sayakpaul in #3457
  • [Docs] add note on local directory path. by @sayakpaul in #3397
  • Refactor full determinism by @patrickvonplaten in #3485
  • Fix DPM single by @patrickvonplaten in #3413
  • Add use_Karras_sigmas to DPMSolverSinglestepScheduler by @Isotr0py in #3476
  • Adds local_files_only bool to prevent forced online connection by @w4ffl35 in #3486
  • [Docs] Korean translation (optimization, training) by @Snailpong in #3488
  • DataLoader respecting EXIF data in Training Images by @Ambrosiussen in #3465
  • feat: allow disk offload for diffuser models by @hari10599 in #3285
  • [Community] reference only control by @okotaku in #3435
  • Support for cross-attention bias / mask by @Birch-san in #2634
  • do not scale the initial global step by gradient accumulation steps when loading from checkpoint by @williamberman in #3506
  • Fix bug in panorama pipeline when using dpmsolver scheduler by @Isotr0py in #3499
  • [Community Pipelines]Accelerate inference of stable diffusion by IPEX on CPU by @yingjie-han in #3105
  • [Community] ControlNet Reference by @okotaku in #3508
  • Allow custom pipeline loading by @patrickvonplaten in #3504
  • Make sure Diffusers works even if Hub is down by @patrickvonplaten in #3447
  • Improve README by @patrickvonplaten in #3524
  • Update README.md by @patrickvonplaten in #3525
  • Run torch.compile tests in separate subprocesses by @pcuenca in #3503
  • fix attention mask pad check by @williamberman in #3531
  • explicit broadcasts for assignments by @williamberman in #3535
  • [Examples/DreamBooth] refactor save_model_card utility in dreambooth examples by @sayakpaul in #3543
  • Fix panorama to support all schedulers by @Isotr0py in #3546
  • Add open parti prompts to docs by @patrickvonplaten in #3549
  • Add Kandinsky 2.1 by @yiyixuxu @ayushtues in #3308
  • fix broken change for vq pipeline by @yiyixuxu in #3563
  • [Stable Diffusion Inpainting] Allow standard text-to-img checkpoints to be useable for SD inpainting by @patrickvonplaten in #3533
  • Fix loaded_token reference before definition by @eminn in #3523
  • renamed variable to input_ and output_ by @vikasmech in #3507
  • Correct inpainting controlnet docs by @patrickvonplaten in #3572
  • Fix controlnet guess mode euler by @patrickvonplaten in #3571
  • [docs] Add AttnProcessor to docs by @stevhliu in #3474
  • [WIP] Add UniDiffuser model and pipeline by @dg845 in #2963
  • Fix to apply LoRAXFormersAttnProcessor instead of LoRAAttnProcessor when xFormers is enabled by @takuma104 in #3556
  • fix dreambooth attention mask by @linbo0518 in #3541
  • [IF super res] correctly normalize PIL input by @williamberman in #3536
  • [docs] Maintenance by @stevhliu in #3552
  • [docs] update the broken links by @brandonJY in #3568
  • [docs] Working with different formats by @stevhliu in #3534
  • remove print statements from attention processor. by @sayakpaul in #3592
  • Fix temb attention by @patrickvonplaten in #3607
  • [docs] update the broken links by @kadirnar in #3577
  • [UniDiffuser Tests] Fix some tests by @sayakpaul in #3609
  • #3487 Fix inpainting strength for various samplers by @rupertmenneer in #3532
  • [Community] Support StableDiffusionTilingPipeline by @kadirnar in #3586
  • [Community, Enhancement] Add reference tricks in README by @okotaku in #3589
  • [Feat] Enable State Dict For Textual Inversion Loader by @ghunkins in #3439
  • [Community] CLIP Guided Images Mixing with Stable DIffusion Pipeline by @TheDenk in #3587
  • fix tests by @patrickvonplaten in #3614
  • Make sure we also change the config when setting encoder_hid_dim_type=="text_proj" and allow xformers by @patrickvonplaten in #3615
  • goodbye frog by @williamberman in #3617
  • update code to reflect latest changes as of May 30th by @prathikr in #3616
  • update dreambooth lora to work with IF stage II by @williamberman in #3560
  • Full Dreambooth IF stage II upscaling by @williamberman in #3561
  • [Docs] include the instruction-tuning blog link in the InstructPix2Pix docs by @sayakpaul in #3644
  • [Kandinsky] Improve kandinsky API a bit by @patrickvonplaten in #3636
  • Support Kohya-ss style LoRA file format (in a limited capacity) by @takuma104 in #3437
  • Iterate over unique tokens to avoid duplicate replacements for multivector embeddings by @lachlan-nicholson in #3588
  • fixed typo in example train_text_to_image.py by @kashif in #3608
  • fix inpainting pipeline when providing initial latents by @yiyixuxu in #3641
  • [Community Doc] Updated the filename and readme file. by @kadirnar in #3634
  • add Stable Diffusion TensorRT Inpainting pipeline by @asfiyab-nvidia in #3642
  • set config from original module but set compiled module on class by @williamberman in #3650
  • dreambooth if docs - stage II, more info by @williamberman in #3628
  • linting fix by @williamberman in #3653
  • Set step_rules correctly for piecewise_constant scheduler by @0x1355 in #3605
  • Allow setting num_cycles for cosine_with_restarts lr scheduler by @0x1355 in #3606
  • [docs] Load A1111 LoRA by @stevhliu in #3629
  • dreambooth upscaling fix added latents by @williamberman in #3659
  • Correct multi gpu dreambooth by @patrickvonplaten in #3673
  • Fix from_ckpt not working properly on windows by @LyubimovVladislav in #3666
  • Update Compel documentation for textual inversions by @pdoane in #3663
  • [UniDiffuser test] fix one test so that it runs correctly on V100 by @sayakpaul in #3675
  • [docs] More API fixes by @stevhliu in #3640
  • [WIP]Vae preprocessor refactor (PR1) by @yiyixuxu in #3557
  • small tweaks for parsing thibaudz controlnet checkpoints by @williamberman in #3657
  • move activation dispatches into helper function by @williamberman in #3656
  • [docs] Fix link to loader method by @stevhliu in #3680
  • Add function to remove monkey-patch for text encoder LoRA by @takuma104 in #3649
  • [LoRA] feat: add lora attention processor for pt 2.0. by @sayakpaul in #3594
  • refactor Image processor for x4 upscaler by @yiyixuxu in #3692
  • feat: when using PT 2.0 use LoRAAttnProcessor2_0 for text enc LoRA. by @sayakpaul in #3691
  • Fix the Kandinsky docstring examples by @freespirit in #3695
  • Support views batch for panorama by @Isotr0py in #3632
  • Fix from_ckpt for Stable Diffusion 2.x by @ctrysbita in #3662
  • Add draft for lora text encoder scale by @patrickvonplaten in #3626

Significant community contributions

The following contributors have made significant changes to the library over the last release:

  • @nipunjindal
    • [2064]: Add stochastic sampler (sample_dpmpp_sde) (#3020)
    • [Stochastic Sampler][Slow Test]: Cuda test fixes (#3257)
  • @clarencechen
    • Diffedit Zero-Shot Inpainting Pipeline (#2837)
    • [Scheduler] DPM-Solver (++) Inverse Scheduler (#3335)
  • @Markus-Pobitzer
    • Add Stable Diffusion RePaint to community pipelines (#3320)
  • @takuma104
    • Support ControlNet v1.1 shuffle properly (#3340)
    • Fix to apply LoRAXFormersAttnProcessor instead of LoRAAttnProcessor when xFormers is enabled (#3556)
    • Support Kohya-ss style LoRA file format (in a limited capacity) (#3437)
    • Add function to remove monkey-patch for text encoder LoRA (#3649)
  • @asfiyab-nvidia
    • add stable diffusion tensorrt img2img pipeline (#3419)
    • add Stable Diffusion TensorRT Inpainting pipeline (#3642)
  • @Snailpong
    • [Docs] Korean translation (optimization, training) (#3488)
  • @okotaku
    • [Community] reference only control (#3435)
    • [Community] ControlNet Reference (#3508)
    • [Community, Enhancement] Add reference tricks in README (#3589)
  • @Birch-san
    • Support for cross-attention bias / mask (#2634)
  • @yingjie-han
    • [Community Pipelines]Accelerate inference of stable diffusion by IPEX on CPU (#3105)
  • @dg845
    • [WIP] Add UniDiffuser model and pipeline (#2963)
  • @kadirnar
    • [docs] update the broken links (#3577)
    • [Community] Support StableDiffusionTilingPipeline (#3586)
    • [Community Doc] Updated the filename and readme file. (#3634)
  • @TheDenk
    • [Community] CLIP Guided Images Mixing with Stable DIffusion Pipeline (#3587)
  • @prathikr
    • update code to reflect latest changes as of May 30th (#3616)

- Python
Published by patrickvonplaten over 2 years ago

diffusers - Patch Release: v0.16.1

v0.16.1: Patch Release to fix IF naming, fix community pipeline versioning, and allow disabling PT 2.0 attention for the VAE

  • merge conflict by @apolinario (direct commit on v0.16.1-patch)
  • Fix community pipelines by @patrickvonplaten in #3266
  • Allow disabling torch 2_0 attention by @patrickvonplaten in #3273

- Python
Published by patrickvonplaten almost 3 years ago

diffusers - v0.16.0 DeepFloyd IF & ControlNet v1.1

DeepFloyd's IF: The open-sourced Imagen

IF

IF is a pixel-based text-to-image generation model and was released in late April 2023 by DeepFloyd.

The model architecture is strongly inspired by Google's closed-source Imagen. IF is a novel state-of-the-art open-source text-to-image model with a high degree of photorealism and language understanding.


Installation

```sh
pip install torch --upgrade  # diffusers' IF is optimized for torch 2.0
pip install diffusers --upgrade
```

Accept the License

Before you can use IF, you need to accept its usage conditions. To do so:

  1. Make sure to have a Hugging Face account and be logged in
  2. Accept the license on the model card of DeepFloyd/IF-I-XL-v1.0
  3. Log-in locally

```py
from huggingface_hub import login

login()
```

and enter your Hugging Face Hub access token.

Code example

```py
from diffusers import DiffusionPipeline
from diffusers.utils import pt_to_pil
import torch

# stage 1
stage_1 = DiffusionPipeline.from_pretrained("DeepFloyd/IF-I-XL-v1.0", variant="fp16", torch_dtype=torch.float16)
stage_1.enable_model_cpu_offload()

# stage 2
stage_2 = DiffusionPipeline.from_pretrained(
    "DeepFloyd/IF-II-L-v1.0", text_encoder=None, variant="fp16", torch_dtype=torch.float16
)
stage_2.enable_model_cpu_offload()

# stage 3
safety_modules = {
    "feature_extractor": stage_1.feature_extractor,
    "safety_checker": stage_1.safety_checker,
    "watermarker": stage_1.watermarker,
}
stage_3 = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-x4-upscaler", **safety_modules, torch_dtype=torch.float16
)
stage_3.enable_model_cpu_offload()

prompt = 'a photo of a kangaroo wearing an orange hoodie and blue sunglasses standing in front of the eiffel tower holding a sign that says "very deep learning"'
generator = torch.manual_seed(1)

# text embeds
prompt_embeds, negative_embeds = stage_1.encode_prompt(prompt)

# stage 1
image = stage_1(
    prompt_embeds=prompt_embeds, negative_prompt_embeds=negative_embeds, generator=generator, output_type="pt"
).images
pt_to_pil(image)[0].save("./if_stage_I.png")
```

```py
# stage 2
image = stage_2(
    image=image,
    prompt_embeds=prompt_embeds,
    negative_prompt_embeds=negative_embeds,
    generator=generator,
    output_type="pt",
).images
pt_to_pil(image)[0].save("./if_stage_II.png")
```

```py
# stage 3
image = stage_3(prompt=prompt, image=image, noise_level=100, generator=generator).images
image[0].save("./if_stage_III.png")
```

For more details about speed and memory optimizations, please have a look at the blog or docs below.

Useful links

👉 The official codebase
👉 Blog post
👉 Space Demo
👉 In-detail docs

ControlNet v1.1

Lvmin Zhang has released improved ControlNet checkpoints as well as a couple of new ones.

You can find all 🧨 Diffusers checkpoints here. Please have a look directly at the model cards on how to use the checkpoints:

Improved checkpoints:

| Model Name | Control Image Overview |
|---|---|
| lllyasviel/control_v11p_sd15_canny (trained with canny edge detection) | A monochrome image with white edges on a black background. |
| lllyasviel/control_v11p_sd15_mlsd (trained with multi-level line segment detection) | An image with annotated line segments. |
| lllyasviel/control_v11f1p_sd15_depth (trained with depth estimation) | An image with depth information, usually represented as a grayscale image. |
| lllyasviel/control_v11p_sd15_normalbae (trained with surface normal estimation) | An image with surface normal information, usually represented as a color-coded image. |
| lllyasviel/control_v11p_sd15_seg (trained with image segmentation) | An image with segmented regions, usually represented as a color-coded image. |
| lllyasviel/control_v11p_sd15_lineart (trained with line art generation) | An image with line art, usually black lines on a white background. |
| lllyasviel/control_v11p_sd15_openpose (trained with human pose estimation) | An image with human poses, usually represented as a set of keypoints or skeletons. |
| lllyasviel/control_v11p_sd15_scribble (trained with scribble-based image generation) | An image with scribbles, usually random or user-drawn strokes. |
| lllyasviel/control_v11p_sd15_softedge (trained with soft edge image generation) | An image with soft edges, usually to create a more painterly or artistic effect. |

New checkpoints:

| Model Name | Control Image Overview |
|---|---|
| lllyasviel/control_v11e_sd15_ip2p (trained with pixel-to-pixel instruction) | No condition. |
| lllyasviel/control_v11p_sd15_inpaint (trained with image inpainting) | No condition. |
| lllyasviel/control_v11e_sd15_shuffle (trained with image shuffling) | An image with shuffled patches or regions. |
| lllyasviel/control_v11p_sd15s2_lineart_anime (trained with anime line art generation) | An image with anime-style line art. |

All commits

  • [Tests] Speed up panorama tests by @sayakpaul in #3067
  • [Post release] v0.16.0dev by @patrickvonplaten in #3072
  • Adds profiling flags, computes train metrics average. by @andsteing in #3053
  • [Pipelines] Make sure that None functions are correctly not saved by @patrickvonplaten in #3080
  • doc string example remove from_pt by @yiyixuxu in #3083
  • [Tests] parallelize by @patrickvonplaten in #3078
  • Throw deprecation warning for return_cached_folder by @patrickvonplaten in #3092
  • Allow SD attend and excite pipeline to work with any size output images by @jcoffland in #2835
  • [docs] Update community pipeline docs by @stevhliu in #2989
  • Add to support Guess Mode for StableDiffusionControlnetPipleline by @takuma104 in #2998
  • fix default value for attend-and-excite by @yiyixuxu in #3099
  • remvoe one line as requested by gc team by @yiyixuxu in #3077
  • ddpm custom timesteps by @williamberman in #3007
  • Fix breaking change in pipeline_stable_diffusion_controlnet.py by @remorses in #3118
  • Add global pooling to controlnet by @patrickvonplaten in #3121
  • [Bug fix] Fix img2img processor with safety checker by @patrickvonplaten in #3127
  • [Bug fix] Make sure correct timesteps are chosen for img2img by @patrickvonplaten in #3128
  • Improve deprecation warnings by @patrickvonplaten in #3131
  • Fix config deprecation by @patrickvonplaten in #3129
  • feat: verfication of multi-gpu support for select examples. by @sayakpaul in #3126
  • speed up attend-and-excite fast tests by @yiyixuxu in #3079
  • Optimize log_validation in train_controlnet_flax by @cgarciae in #3110
  • make style by @patrickvonplaten (direct commit on main)
  • Correct textual inversion readme by @patrickvonplaten in #3145
  • Add unet act fn to other model components by @williamberman in #3136
  • class labels timestep embeddings projection dtype cast by @williamberman in #3137
  • [ckpt loader] Allow loading the Inpaint and Img2Img pipelines, while loading a ckpt model by @cmdr2 in #2705
  • add from_ckpt method as Mixin by @1lint in #2318
  • Add TensorRT SD/txt2img Community Pipeline to diffusers along with TensorRT utils by @asfiyab-nvidia in #2974
  • Correct Transformer2DModel.forward docstring by @off99555 in #3074
  • Update pipeline_stable_diffusion_inpaint_legacy.py by @hwuebben in #2903
  • Modified altdiffusion pipline to support altdiffusion-m18 by @superhero-7 in #2993
  • controlnet training resize inputs to multiple of 8 by @williamberman in #3135
  • adding custom diffusion training to diffusers examples by @nupurkmr9 in #3031
  • Update custom_diffusion.mdx by @mishig25 in #3165
  • Added distillation for quantization example on textual inversion. by @XinyuYe-Intel in #2760
  • make style by @patrickvonplaten (direct commit on main)
  • Merge branch 'main' of https://github.com/huggingface/diffusers by @patrickvonplaten (direct commit on main)
  • Update Noise Autocorrelation Loss Function for Pix2PixZero Pipeline by @clarencechen in #2942
  • [DreamBooth] add text encoder LoRA support in the DreamBooth training script by @sayakpaul in #3130
  • Update Habana Gaudi documentation by @regisss in #3169
  • Add model offload to x4 upscaler by @patrickvonplaten in #3187
  • [docs] Deterministic algorithms by @stevhliu in #3172
  • Update custom_diffusion.mdx to credit the author by @sayakpaul in #3163
  • Fix TensorRT community pipeline device set function by @asfiyab-nvidia in #3157
  • make from_flax work for controlnet by @yiyixuxu in #3161
  • [docs] Clarify training args by @stevhliu in #3146
  • Multi Vector Textual Inversion by @patrickvonplaten in #3144
  • Add Karras sigmas to HeunDiscreteScheduler by @youssefadr in #3160
  • [AudioLDM] Fix dtype of returned waveform by @sanchit-gandhi in #3189
  • Fix bug in train_dreambooth_lora by @crywang in #3183
  • [Community Pipelines] Update lpw_stable_diffusion pipeline by @SkyTNT in #3197
  • Make sure VAE attention works with Torch 2_0 by @patrickvonplaten in #3200
  • Revert "[Community Pipelines] Update lpw_stable_diffusion pipeline" by @williamberman in #3201
  • [Bug fix] Fix batch size attention head size mismatch by @patrickvonplaten in #3214
  • fix mixed precision training on train_dreambooth_inpaint_lora by @themrzmaster in #3138
  • adding enable_vae_tiling and disable_vae_tiling functions by @init-22 in #3225
  • Add ControlNet v1.1 docs by @patrickvonplaten in #3226
  • Fix issue in maybe_convert_prompt by @pdoane in #3188
  • Sync cache version check from transformers by @ychfan in #3179
  • Fix docs text inversion by @patrickvonplaten in #3166
  • add model by @patrickvonplaten in #3230
  • Allow return pt x4 by @patrickvonplaten in #3236
  • Allow fp16 attn for x4 upscaler by @patrickvonplaten in #3239
  • fix fast test by @patrickvonplaten in #3241
  • Adds a document on token merging by @sayakpaul in #3208
  • [AudioLDM] Update docs to use updated ckpt by @sanchit-gandhi in #3240
  • Release: v0.16.0 by @patrickvonplaten (direct commit on main)

Significant community contributions

The following contributors have made significant changes to the library over the last release:

  • @1lint
    • add from_ckpt method as Mixin (#2318)
  • @asfiyab-nvidia
    • Add TensorRT SD/txt2img Community Pipeline to diffusers along with TensorRT utils (#2974)
    • Fix TensorRT community pipeline device set function (#3157)
  • @nupurkmr9
    • adding custom diffusion training to diffusers examples (#3031)
  • @XinyuYe-Intel
    • Added distillation for quantization example on textual inversion. (#2760)
  • @SkyTNT
    • [Community Pipelines] Update lpw_stable_diffusion pipeline (#3197)

- Python
Published by patrickvonplaten almost 3 years ago

diffusers - v0.15.1: Patch Release to fix safety checker, config access and uneven scheduler

Fixes bugs related to missing global pooling in controlnet, img2img processor issue with safety checker, uneven timesteps and better config deprecation

  • [Bug fix] Add global pooling to controlnet by @patrickvonplaten in #3121
  • [Bug fix] Fix img2img processor with safety checker by @patrickvonplaten in #3127
  • [Bug fix] Make sure correct timesteps are chosen for img2img by @patrickvonplaten in #3128
  • [Bug fix] Fix config deprecation by @patrickvonplaten in #3129

- Python
Published by patrickvonplaten almost 3 years ago

diffusers - v0.15.0 Beyond Image Generation

Taking Diffusers Beyond Image Generation

We are very excited about this release! It brings new pipelines for video and audio to diffusers, showing that diffusion is a great choice for all sorts of generative tasks. The modular, pluggable approach of diffusers was crucial to integrate the new models intuitively and cohesively with the rest of the library. We hope you appreciate the consistency of the APIs and implementations, as our ultimate goal is to provide the best toolbox to help you solve the tasks you're interested in. Don't hesitate to get in touch if you use diffusers for other projects!

In addition to that, diffusers 0.15 includes a lot of new features and improvements. From performance and deployment improvements (faster pipeline loading) to increased flexibility for creative tasks (Karras sigmas, weight prompting, support for Automatic1111 textual inversion embeddings) to additional customization options (Multi-ControlNet) to training utilities (ControlNet, Min-SNR weighting). Read on for the details!

🎬 Text-to-Video

Text-guided video generation is not a fantasy anymore - it's as simple as spinning up a Colab and running either of the two powerful open-sourced video generation models.

Text-to-Video

Alibaba's DAMO Vision Intelligence Lab has open-sourced a first research-only video generation model that can generate video clips of up to a minute. To see Darth Vader riding a wave, simply copy-paste the following lines into your favorite Python interpreter:

```py
import torch
from diffusers import DiffusionPipeline, DPMSolverMultistepScheduler
from diffusers.utils import export_to_video

pipe = DiffusionPipeline.from_pretrained("damo-vilab/text-to-video-ms-1.7b", torch_dtype=torch.float16, variant="fp16")
pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)
pipe.enable_model_cpu_offload()

prompt = "Spiderman is surfing"
video_frames = pipe(prompt, num_inference_steps=25).frames
video_path = export_to_video(video_frames)
```


For more information, have a look at the damo-vilab/text-to-video-ms-1.7b model card.

Text-to-Video Zero

Text2Video-Zero is a zero-shot text-to-video synthesis method that enables low-cost yet consistent video generation using only a pre-trained text-to-image diffusion model, such as Stable Diffusion v1-5. Text2Video-Zero also naturally supports extensions of pre-trained text-to-image models such as Instruct Pix2Pix, ControlNet, and DreamBooth, enabling Video Instruct Pix2Pix, pose-conditional, edge-conditional, and DreamBooth-specialized applications.

https://user-images.githubusercontent.com/23423619/231516176-813133f9-1216-4845-8b49-4e062610f12c.mp4

For more information please have a look at PAIR/Text2Video-Zero

🔉 Audio Generation

Text-guided audio generation has made great progress over the last months with many advances being based on diffusion models. The 0.15.0 release includes two powerful audio diffusion models.

AudioLDM

Inspired by Stable Diffusion, AudioLDM is a text-to-audio latent diffusion model (LDM) that learns continuous audio representations from CLAP latents. AudioLDM takes a text prompt as input and predicts the corresponding audio. It can generate text-conditional sound effects, human speech and music.

```python
from diffusers import AudioLDMPipeline
import torch

repo_id = "cvssp/audioldm"
pipe = AudioLDMPipeline.from_pretrained(repo_id, torch_dtype=torch.float16)
pipe = pipe.to("cuda")

prompt = "Techno music with a strong, upbeat tempo and high melodic riffs"
audio = pipe(prompt, num_inference_steps=10, audio_length_in_s=5.0).audios[0]
```

The resulting audio output can be saved as a `.wav` file:

```python
import scipy

scipy.io.wavfile.write("techno.wav", rate=16000, data=audio)
```

For more information see cvssp/audioldm

Spectrogram Diffusion

This model from the Magenta team is a MIDI-to-audio generator. The pipeline takes a MIDI file as input and autoregressively generates 5-second spectrogram segments, which are concatenated at the end and decoded to audio via a spectrogram decoder.

```python
from diffusers import SpectrogramDiffusionPipeline, MidiProcessor

pipe = SpectrogramDiffusionPipeline.from_pretrained("google/music-spectrogram-diffusion")
pipe = pipe.to("cuda")
processor = MidiProcessor()

# Download MIDI from: wget http://www.piano-midi.de/midis/beethoven/beethoven_hammerklavier_2.mid
output = pipe(processor("beethoven_hammerklavier_2.mid"))

audio = output.audios[0]
```
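The autoregressive segment-and-concatenate structure described above can be sketched in miniature. This is purely illustrative; `generate_segment` is a hypothetical stand-in for one denoising pass, not the pipeline's internals:

```python
def generate_segment(notes, previous_segment):
    # Stand-in for one denoising pass: each segment is conditioned on the
    # MIDI notes for that window and on the previously generated segment.
    return [n + (previous_segment[-1] if previous_segment else 0) for n in notes]

def render(midi_chunks):
    audio = []
    previous = []
    for chunk in midi_chunks:
        segment = generate_segment(chunk, previous)
        audio.extend(segment)  # concatenate segments into the final waveform
        previous = segment
    return audio

print(render([[1, 2], [3, 4]]))  # -> [1, 2, 5, 6]
```

The key point is that each segment sees the previous one, which is what keeps the stitched-together audio coherent across segment boundaries.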

📗 New Docs

Documentation is crucially important for diffusers, as it's one of the first resources where people try to understand how everything works and fix any issues they are observing. We have spent a lot of time in this release reviewing all documents, adding new ones, reorganizing sections and bringing code examples up to date with the latest APIs. This effort has been led by @stevhliu (thanks a lot! 🙌) and @yiyixuxu, but many others have chimed in and contributed.

Check it out: https://huggingface.co/docs/diffusers/index

Don't hesitate to open PRs for fixes to the documentation, they are greatly appreciated as discussed in our (revised, of course) contribution guide.


🪄 Stable UnCLIP

Stable UnCLIP is the best open-sourced image variation model out there. Pass an initial image and optionally a prompt to generate variations of the image:

```python
from diffusers import DiffusionPipeline
from diffusers.utils import load_image
import torch

pipe = DiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-2-1-unclip-small", torch_dtype=torch.float16)
pipe.to("cuda")

# get image
url = "https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/stable_unclip/tarsila_do_amaral.png"
image = load_image(url)

# run image variation
image = pipe(image).images[0]
```

For more information, have a look at the stabilityai/stable-diffusion-2-1-unclip model card.

https://user-images.githubusercontent.com/23423619/231513081-ace66d77-39d4-4064-bb20-2db2ce6b000a.mp4

🚀 More ControlNet

ControlNet was released in diffusers in version 0.14.0, but this release brings some exciting developments: Multi-ControlNet, a training script, an upcoming event, and a community image-to-image pipeline contributed by @mikegarts!

Multi-ControlNet

Thanks to community member @takuma104, it's now possible to use several ControlNet conditioning models at once! It works with the same API as before, only supplying a list of ControlNets instead of just one:

```python
import torch
from diffusers import StableDiffusionControlNetPipeline, ControlNetModel

controlnet_canny = ControlNetModel.from_pretrained("lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16).to("cuda")
controlnet_pose = ControlNetModel.from_pretrained("lllyasviel/sd-controlnet-openpose", torch_dtype=torch.float16).to("cuda")

pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "example/a-sd15-variant-model",
    torch_dtype=torch.float16,
    controlnet=[controlnet_pose, controlnet_canny],
).to("cuda")

pose_image = ...
canny_image = ...
prompt = ...

image = pipe(prompt=prompt, image=[pose_image, canny_image]).images[0]
```

And this is an example of how this affects generation:

(Image table omitted: each row pairs a pose control image and/or a canny control image with the generated result; rows marked "(none)" use a single ControlNet.)

ControlNet Training

We have created a training script for ControlNet, and can't wait to see what new ideas the community may come up with! In fact, we are so pumped about it that we are organizing a JAX Diffusers sprint with a special focus on ControlNet, where participant teams will be assigned TPUs v4-8 to work on their projects :exploding_head:. Those are some mean machines, so make sure you join our discord to follow the event: https://discord.com/channels/879548962464493619/897387888663232554/1092751149217615902.

🐈‍⬛ Textual Inversion, Revisited

Several great contributors have been working on textual inversion to get the most out of it. @isamu-isozaki made it possible to perform multitoken training, and @piEsposito & @GuiyeC created an easy way to load textual inversion embeddings. These contributors are always a pleasure to work with 🙌, we feel honored and proud of this community 🙏

Loading textual inversion embeddings is compatible with the Automatic1111 format, so you can download embeddings from other services (such as civitai), and easily apply them in diffusers. Please check the updated documentation for details.

🏃 Faster loading of cached pipelines

We conducted a thorough investigation of the pipeline loading process to make it as fast as possible. This is the before and after:

Previous: 2.27 seconds. Now: 1.1 seconds.

Instead of performing 3 HTTP operations, we now get all we need with just one. That single call is necessary to check whether any of the components in the pipeline were updated – if that's the case, then we need to download the new files. This improvement also applies when you load individual models instead of pre-trained pipelines.

This may not sound like much, but many people use diffusers for user-facing services where models and pipelines have to be reused on demand. By minimizing latency, they can provide a better service to their users and minimize operating costs.

Loading time can be reduced further by forcing diffusers to just use the items on disk and never check for updates (pass `local_files_only=True` to `from_pretrained`). This is not recommended for most users, but can be interesting in production environments.
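The decision logic can be sketched as follows. This is a hypothetical helper for illustration only; the real logic lives in diffusers' loading code and huggingface_hub:

```python
def resolve_pipeline_files(local_revision, remote_revision, local_files_only=False):
    """Decide whether cached pipeline files can be reused or must be re-downloaded.

    A single metadata call fetches `remote_revision`; files are only downloaded
    when the remote commit differs from the cached one.
    """
    if local_files_only:
        return "use-cache"   # never hit the network
    if remote_revision == local_revision:
        return "use-cache"   # cache is up to date
    return "download"        # components changed upstream

print(resolve_pipeline_files("abc123", "abc123"))  # use-cache
print(resolve_pipeline_files("abc123", "def456"))  # download
print(resolve_pipeline_files("abc123", "def456", local_files_only=True))  # use-cache
```

The trade-off is exactly as described above: one cheap metadata request buys correctness, and skipping it entirely (`local_files_only=True`) buys the last bit of latency at the risk of serving stale files.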

🔩 Weight prompting using compel

Weight prompting is a popular method to increase the importance of some of the elements that appear in a text prompt, as a way to force image generation to obey those concepts. Because diffusers is used in a multitude of services and projects, we wanted to provide a very flexible way to adopt prompt weighting, so users can ultimately build the system they prefer. Our approach was to:

  • Make the Stable Diffusion pipelines accept raw prompt embeddings. You are free to create the embeddings however you see fit, so users can come up with new ideas to express weighting in their projects.
  • At the same time, we adopted compel, by @damian0815, as a higher-level library to create the weighted embeddings.

You don't have to use compel to create the embeddings, but if you do, this is an example of how it looks in practice:

```python
from diffusers import StableDiffusionPipeline, UniPCMultistepScheduler
from compel import Compel

pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4")
pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config)

compel_proc = Compel(tokenizer=pipe.tokenizer, text_encoder=pipe.text_encoder)
prompt = "a red cat playing with a ball++"
prompt_embeds = compel_proc(prompt)

image = pipe(prompt_embeds=prompt_embeds, num_inference_steps=20).images[0]
```


As you can see, we assign more weight to the word "ball" using compel-specific syntax (`ball++`). You can use other libraries (or your own) to create appropriate embeddings to pass to the pipeline.

You can read more details in the documentation.
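Conceptually, prompt weighting boils down to scaling the embedding vectors of emphasized tokens before handing them to the pipeline. A toy sketch of that idea (not compel's actual algorithm; `weight_embeddings` is a hypothetical helper):

```python
def weight_embeddings(token_embeddings, weights):
    # Scale each token's embedding vector by its weight: a weight > 1.0
    # emphasizes that token, < 1.0 de-emphasizes it.
    return [
        [value * weight for value in embedding]
        for embedding, weight in zip(token_embeddings, weights)
    ]

# Two tokens ("cat", "ball"); "ball" gets extra weight, as with "ball++".
embeddings = [[0.1, 0.2], [0.3, 0.4]]
weighted = weight_embeddings(embeddings, [1.0, 1.21])
print(weighted)
```

Since the pipelines now accept raw `prompt_embeds`, any scheme along these lines, however sophisticated, can be plugged in without changes to diffusers itself.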

🎲 Karras Sigmas for schedulers

Some diffusers schedulers now support Karras sigmas! Thanks @nipunjindal !

See Add Karras pattern to discrete euler in #2956 for more information.
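For context, the Karras et al. (2022) schedule spaces noise levels between `sigma_max` and `sigma_min` with the formula below. This is a minimal sketch of the published formula with its default `rho=7.0`, not diffusers' exact implementation:

```python
def karras_sigmas(n, sigma_min=0.1, sigma_max=10.0, rho=7.0):
    # sigma_i = (sigma_max^(1/rho) + i/(n-1) * (sigma_min^(1/rho) - sigma_max^(1/rho)))^rho
    min_inv = sigma_min ** (1 / rho)
    max_inv = sigma_max ** (1 / rho)
    return [
        (max_inv + i / (n - 1) * (min_inv - max_inv)) ** rho
        for i in range(n)
    ]

sigmas = karras_sigmas(10)
print(sigmas[0], sigmas[-1])  # approximately sigma_max and sigma_min
```

Compared to uniform spacing, this schedule concentrates more steps at low noise levels, which tends to improve sample quality at the same step count.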

All commits

  • Adding support for safetensors and LoRa. by @Narsil in #2448
  • [Post release] Push post release by @patrickvonplaten in #2546
  • Correct section docs by @patrickvonplaten in #2540
  • adds xformers support to train_unconditional.py by @vvvm23 in #2520
  • Bug Fix: Remove explicit message argument in deprecate by @alvanli in #2421
  • Update pipeline_stable_diffusion_inpaint_legacy.py resize to integer multiple of 8 instead of 32 for init image and mask by @Laveraaa in #2350
  • move test num_images_per_prompt to pipeline mixin by @williamberman in #2488
  • Training tutorial by @stevhliu in #2473
  • Fix regression introduced in #2448 by @Narsil in #2551
  • Fix for InstructPix2PixPipeline to allow for prompt embeds to be passed in without prompts. by @DN6 in #2456
  • [PipelineTesterMixin] Handle non-image outputs for attn slicing test by @sanchit-gandhi in #2504
  • [Community Pipeline] Unclip Image Interpolation by @Abhinay1997 in #2400
  • Fix: controlnet docs format by @vicoooo26 in #2559
  • ema step, don't empty cuda cache by @williamberman in #2563
  • Add custom vae (diffusers type) to onnx converter by @ForserX in #2325
  • add OnnxStableDiffusionUpscalePipeline pipeline by @ssube in #2158
  • Support convert LoRA safetensors into diffusers format by @haofanwang in #2403
  • [Unet1d] correct docs by @patrickvonplaten in #2565
  • [Training] Fix tensorboard typo by @patrickvonplaten in #2566
  • allow Attend-and-excite pipeline work with different image sizes by @yiyixuxu in #2476
  • Allow textual_inversion_flax script to use save_steps and revision flag by @haixinxu in #2075
  • add intermediate logging for dreambooth training script by @yiyixuxu in #2557
  • community controlnet inpainting pipelines by @williamberman in #2561
  • [docs] Move relevant code for text2image to docs by @stevhliu in #2537
  • [docs] Move DreamBooth training materials to docs by @stevhliu in #2547
  • [docs] Move text-to-image LoRA training from blog to docs by @stevhliu in #2527
  • Update quicktour by @stevhliu in #2463
  • Support revision in Flax text-to-image training by @pcuenca in #2567
  • fix the default value of doc by @xiaohu2015 in #2539
  • Added multitoken training for textual inversion. Issue 369 by @isamu-isozaki in #661
  • [Docs]Fix invalid link to Pokemons dataset by @zxypro1 in #2583
  • [Docs] Weight prompting using compel by @patrickvonplaten in #2574
  • community stablediffusion controlnet img2img pipeline by @mikegarts in #2584
  • Improve dynamic thresholding and extend to DDPM and DDIM Schedulers by @clarencechen in #2528
  • [docs] Move Textual Inversion training examples to docs by @stevhliu in #2576
  • add deps table check updated to ci by @williamberman in #2590
  • Add notebook doc img2img by @yiyixuxu in #2472
  • [docs] Build notebooks from Markdown by @stevhliu in #2570
  • [Docs] Fix link to colab by @patrickvonplaten in #2604
  • [docs] Update unconditional image generation docs by @stevhliu in #2592
  • Add OpenVINO documentation by @echarlaix in #2569
  • Support LoRA for text encoder by @haofanwang in #2588
  • fix: un-existing tmp config file in linux, avoid unnecessary disk IO by @knoopx in #2591
  • Fixed incorrect width/height assignment in StableDiffusionDepth2ImgPi… by @antoche in #2558
  • add flax pipelines to api doc + doc string examples by @yiyixuxu in #2600
  • Fix typos by @standardAI in #2608
  • Migrate blog content to docs by @stevhliu in #2477
  • Add cache_dir to docs by @patrickvonplaten in #2624
  • Make sure that DEIS, DPM and UniPC can correctly be switched in & out by @patrickvonplaten in #2595
  • Revert "[docs] Build notebooks from Markdown" by @patrickvonplaten in #2625
  • Up vesion at which we deprecate "revision='fp16'" since transformers is not released yet by @patrickvonplaten in #2623
  • [Tests] Split scheduler tests by @patrickvonplaten in #2630
  • Improve ddim scheduler and fix bug when prediction type is "sample" by @PeterL1n in #2094
  • update paint by example docs by @williamberman in #2598
  • [From pretrained] Speed-up loading from cache by @patrickvonplaten in #2515
  • add translated docs by @LolitaSian in #2587
  • [Dreambooth] Editable number of class images by @Mr-Philo in #2251
  • Update quicktour.mdx by @standardAI in #2637
  • Update basic_training.mdx by @standardAI in #2639
  • controlnet sd 2.1 checkpoint conversions by @williamberman in #2593
  • [docs] Update readme by @stevhliu in #2612
  • [Pipeline loading] Remove send_telemetry by @patrickvonplaten in #2640
  • [docs] Build Jax notebooks for real by @stevhliu in #2641
  • Update loading.mdx by @standardAI in #2642
  • Support non square image generation for StableDiffusionSAGPipeline by @AkiSakurai in #2629
  • Update schedulers.mdx by @standardAI in #2647
  • [attention] Fix attention by @patrickvonplaten in #2656
  • Add support for Multi-ControlNet to StableDiffusionControlNetPipeline by @takuma104 in #2627
  • [Tests] Adds a test suite for EMAModel by @sayakpaul in #2530
  • fix the in-place modification in unet condition when using controlnet by @andrehuang in #2586
  • image generation main process checks by @williamberman in #2631
  • [Hub] Upgrade to 0.13.2 by @patrickvonplaten in #2670
  • AutoencoderKL: clamp indices of blendh and blendv to input size by @kig in #2660
  • Update README.md by @qwjaskzxl in #2653
  • [Lora] correct lora saving & loading by @patrickvonplaten in #2655
  • Add ddim noise comparative analysis pipeline by @aengusng8 in #2665
  • Add support for different model prediction types in DDIMInverseScheduler by @clarencechen in #2619
  • controlnet integration tests num_inference_steps=3 by @williamberman in #2672
  • Controlnet training by @Ttl in #2545
  • [Docs] Adds a documentation page for evaluating diffusion models by @sayakpaul in #2516
  • [Tests] fix: slow serialization test by @sayakpaul in #2678
  • Update Dockerfile CUDA by @patrickvonplaten in #2682
  • T5Attention support for cross-attention by @kashif in #2654
  • Update custom_pipeline_overview.mdx by @standardAI in #2684
  • Update kerascv.mdx by @standardAI in #2685
  • Update img2img.mdx by @standardAI in #2688
  • Update conditional_image_generation.mdx by @standardAI in #2687
  • Update controlling_generation.mdx by @standardAI in #2690
  • Update unconditional_image_generation.mdx by @standardAI in #2686
  • Add image_processor by @yiyixuxu in #2617
  • [docs] Add overviews to each section by @stevhliu in #2657
  • [docs] Create better navigation on index by @stevhliu in #2658
  • [docs] Reorganize table of contents by @stevhliu in #2671
  • Rename attention by @patrickvonplaten in #2691
  • Adding use_safetensors argument to give more control to users by @Narsil in #2123
  • [docs] Add safety checker to ethical guidelines by @stevhliu in #2699
  • train_unconditional save restore unet parameters by @williamberman in #2706
  • Improve deprecation error message when using cross_attention import by @patrickvonplaten in #2710
  • fix image link in inpaint doc by @yiyixuxu in #2693
  • [docs] Update ONNX doc to use optimum by @sayakpaul in #2702
  • Enabling gradient checkpointing for VAE by @Pie31415 in #2536
  • [Tests] Correct PT2 by @patrickvonplaten in #2724
  • Update mps.mdx by @standardAI in #2749
  • Update torch2.0.mdx by @standardAI in #2748
  • Update fp16.mdx by @standardAI in #2746
  • Update dreambooth.mdx by @standardAI in #2742
  • Update philosophy.mdx by @standardAI in #2752
  • Update text_inversion.mdx by @standardAI in #2751
  • add: controlnet entry to training section in the docs. by @sayakpaul in #2677
  • Update numbers for Habana Gaudi in documentation by @regisss in #2734
  • Improve Contribution Doc by @patrickvonplaten in #2043
  • Fix typos by @apivovarov in #2715
  • [1929]: Add CLIP guidance for Img2Img stable diffusion pipeline by @nipunjindal in #2723
  • Add guidance start/end parameters to StableDiffusionControlNetImg2ImgPipeline by @hyowon-ha in #2731
  • Fix mps tests on torch 2.0 by @pcuenca in #2766
  • Add option to set dtype in pipeline.to() method by @1lint in #2317
  • stable diffusion depth batching fix by @williamberman in #2757
  • [docs] update torch 2 benchmark by @pcuenca in #2764
  • [docs] Clarify purpose of reproducibility docs by @stevhliu in #2756
  • [MS Text To Video] Add first text to video by @patrickvonplaten in #2738
  • mps: remove warmup passes by @pcuenca in #2771
  • Support for Offset Noise in examples by @haofanwang in #2753
  • add: section on multiple controlnets. by @sayakpaul in #2762
  • [Examples] InstructPix2Pix instruct training script by @sayakpaul in #2478
  • deduplicate training section in the docs. by @sayakpaul in #2788
  • [UNet3DModel] Fix with attn processor by @patrickvonplaten in #2790
  • [doc wip] literalinclude by @mishig25 in #2718
  • Rename 'CLIPFeatureExtractor' class to 'CLIPImageProcessor' by @ainoya in #2732
  • Music Spectrogram diffusion pipeline by @kashif in #1044
  • [2737]: Add DPMSolverMultistepScheduler to CLIP guided community pipeline by @nipunjindal in #2779
  • [Docs] small fixes to the text to video doc. by @sayakpaul in #2787
  • Update train_text_to_image_lora.py by @haofanwang in #2767
  • Skip mps in text-to-video tests by @pcuenca in #2792
  • Flax controlnet by @yiyixuxu in #2727
  • [docs] Add Colab notebooks and Spaces by @stevhliu in #2713
  • Add AudioLDM by @sanchit-gandhi in #2232
  • Update train_text_to_image_lora.py by @haofanwang in #2795
  • Add ModelEditing pipeline by @bahjat-kawar in #2721
  • Relax DiT test by @kashif in #2808
  • Update onnxruntime package candidates by @PeixuanZuo in #2666
  • [Stable UnCLIP] Finish Stable UnCLIP by @patrickvonplaten in #2814
  • [Docs] update docs (Stable unCLIP) to reflect the updated ckpts. by @sayakpaul in #2815
  • StableDiffusionModelEditingPipeline documentation by @bahjat-kawar in #2810
  • Update examples README.md to include the latest examples by @sayakpaul in #2839
  • Ruff: apply same rules as in transformers by @pcuenca in #2827
  • [Tests] Fix slow tests by @patrickvonplaten in #2846
  • Fix StableUnCLIPImg2ImgPipeline handling of explicitly passed image embeddings by @unishift in #2845
  • Helper function to disable custom attention processors by @pcuenca in #2791
  • improve stable unclip doc. by @sayakpaul in #2823
  • add: better warning messages when handling multiple conditionings. by @sayakpaul in #2804
  • [WIP]Flax training script for controlnet by @yiyixuxu in #2818
  • Make dynamo wrapped modules work with save_pretrained by @pcuenca in #2726
  • [Init] Make sure shape mismatches are caught early by @patrickvonplaten in #2847
  • updated onnx pndm test by @kashif in #2811
  • [Stable Diffusion] Allow users to disable Safety checker if loading model from checkpoint by @Stax124 in #2768
  • fix KarrasVePipeline bug by @junhsss in #2828
  • StableDiffusionLongPromptWeightingPipeline: Do not hardcode pad token by @AkiSakurai in #2832
  • Remove suggestion to use cuDNN benchmark in docs by @d1g1t in #2793
  • Remove duplicate sentence in docstrings by @qqaatw in #2834
  • Update the legacy inpainting SD pipeline, to allow calling it with only prompt_embeds (instead of always requiring a prompt) by @cmdr2 in #2842
  • Fix link to LoRA training guide in DreamBooth training guide by @ushuz in #2836
  • [WIP][Docs] Use DiffusionPipeline Instead of Child Classes when Loading Pipeline by @dg845 in #2809
  • Add last_epoch argument to optimization.get_scheduler by @felixblanke in #2850
  • [WIP] Check UNet shapes in StableDiffusionInpaintPipeline init by @dg845 in #2853
  • [2761]: Add documentation for extra_in_channels UNet1DModel by @nipunjindal in #2817
  • [Tests] Adds a test to check if image_embeds None case is handled properly in StableUnCLIPImg2ImgPipeline by @sayakpaul in #2861
  • Update evaluation.mdx by @standardAI in #2862
  • Update overview.mdx by @standardAI in #2864
  • Update alt_diffusion.mdx by @standardAI in #2865
  • Update paintbyexample.mdx by @standardAI in #2869
  • Update stablediffusionsafe.mdx by @standardAI in #2870
  • [Docs] Correct phrasing by @patrickvonplaten in #2873
  • [Examples] Add streaming support to the ControlNet training example in JAX by @sayakpaul in #2859
  • feat: allow offset_noise in dreambooth training example by @yamanahlawat in #2826
  • [docs] Performance tutorial by @stevhliu in #2773
  • [Docs] add an example use for StableUnCLIPPipeline in the pipeline docs by @sayakpaul in #2897
  • add flax requirement by @yiyixuxu in #2894
  • Support fp16 in conversion from original ckpt by @burgalon in #2733
  • img2img.multiple.controlnets.pipeline by @mikegarts in #2833
  • add load textual inversion embeddings to stable diffusion by @piEsposito in #2009
  • [docs] add the Stable diffusion with Jax/Flax Guide into the docs by @yiyixuxu in #2487
  • Add support Karras sigmas for StableDiffusionKDiffusionPipeline by @takuma104 in #2874
  • Fix textual inversion loading by @GuiyeC in #2914
  • Fix slow tests text inv by @patrickvonplaten in #2915
  • Fix check_inputs in upscaler pipeline to allow embeds by @d1g1t in #2892
  • Modify example with intel optimization by @mengfei25 in #2896
  • [2884]: Fix cross_attention_kwargs in StableDiffusionImg2ImgPipeline by @nipunjindal in #2902
  • [Tests] Speed up test by @patrickvonplaten in #2919
  • Have fix current pipeline link by @guspan-tanadi in #2910
  • Update image_variation.mdx by @standardAI in #2911
  • Update controlnet.mdx by @standardAI in #2912
  • Update pipelinestablediffusion_controlnet.py by @patrickvonplaten in #2917
  • Check for all different packages of opencv by @wfng92 in #2901
  • fix: norm group test for UNet3D. by @sayakpaul in #2959
  • Update euler_ancestral.mdx by @standardAI in #2932
  • Update unipc.mdx by @standardAI in #2936
  • Update score_sde_ve.mdx by @standardAI in #2937
  • Update score_sde_vp.mdx by @standardAI in #2938
  • Update ddim.mdx by @standardAI in #2926
  • Update ddpm.mdx by @standardAI in #2929
  • Removing explicit markdown extension by @guspan-tanadi in #2944
  • Ensure validation image RGB not RGBA by @ernestchu in #2945
  • Use upload_folder in training scripts by @Wauplin in #2934
  • allow use custom local dataset for controlnet training scripts by @yiyixuxu in #2928
  • fix post-processing by @yiyixuxu in #2968
  • [docs] Simplify loading guide by @stevhliu in #2694
  • update flax controlnet training script by @yiyixuxu in #2951
  • [Pipeline download] Improve pipeline download for index and passed co… by @patrickvonplaten in #2980
  • The variable name has been updated. by @kadirnar in #2970
  • Update the K-Diffusion SD pipeline, to allow calling it with only prompt_embeds (instead of always requiring a prompt) by @cmdr2 in #2962
  • [Examples] Add support for Min-SNR weighting strategy for better convergence by @sayakpaul in #2899
  • [scheduler] fix some scheduler dtype error by @furry-potato-maker in #2992
  • minor fix in controlnet flax example by @yiyixuxu in #2986
  • Explain how to install test dependencies by @pcuenca in #2983
  • docs: Link Navigation Path API Pipelines by @guspan-tanadi in #2976
  • add Min-SNR loss to Controlnet flax train script by @yiyixuxu in #3016
  • dynamic threshold sampling bug fixes and docs by @williamberman in #3003
  • Initial draft of Core ML docs by @pcuenca in #2987
  • [Pipeline] Add TextToVideoZeroPipeline by @19and99 in #2954
  • Small typo correction in comments by @rogerioagjr in #3012
  • mps: skip unstable test by @pcuenca in #3037
  • Update contribution.mdx by @mishig25 in #3054
  • fix report tool by @patrickvonplaten in #3047
  • Fix config prints and save, load of pipelines by @patrickvonplaten in #2849
  • [docs] Reusing components by @stevhliu in #3000
  • Fix imports for composablestablediffusion pipeline by @nthh in #3002
  • config fixes by @williamberman in #3060
  • accelerate min version for ProjectConfiguration import by @williamberman in #3042
  • AttentionProcessor.group_norm num_channels should be `query_dim` by @williamberman in #3046
  • Update documentation by @George-Ogden in #2996
  • Fix scheduler type mismatch by @pcuenca in #3041
  • Fix invocation of some slow Flax tests by @pcuenca in #3058
  • add only cross attention to simple attention blocks by @williamberman in #3011
  • Fix typo and format BasicTransformerBlock attributes by @off99555 in #2953
  • unet time embedding activation function by @williamberman in #3048
  • Attention processor cross attention norm group norm by @williamberman in #3021
  • Attn added kv processor torch 2.0 block by @williamberman in #3023
  • [Examples] Fix type-casting issue in the ControlNet training script by @sayakpaul in #2994
  • [LoRA] Enabling limited LoRA support for text encoder by @sayakpaul in #2918
  • fix slow tsets by @patrickvonplaten in #3066
  • Fix InstructPix2Pix training in multi-GPU mode by @sayakpaul in #2978
  • [Docs] update Self-Attention Guidance docs by @SusungHong in #2952
  • Flax memory efficient attention by @pcuenca in #2889
  • [WIP] implement rest of the test cases (LoRA tests) by @Pie31415 in #2824
  • fix pipeline setattr value == None by @williamberman in #3063
  • add support for pre-calculated prompt embeds to Stable Diffusion ONNX pipelines by @ssube in #2597
  • [2064]: Add Karras to DPMSolverMultistepScheduler by @nipunjindal in #3001
  • Finish docs textual inversion by @patrickvonplaten in #3068
  • [Docs] refactor text-to-video zero by @sayakpaul in #3049
  • Update Flax TPU tests by @pcuenca in #3069
  • Fix a bug of pano when not doing CFG by @ernestchu in #3030
  • Text2video zero refinements by @19and99 in #3070

Significant community contributions

The following contributors have made significant changes to the library over the last release:

  • @Abhinay1997
    • [Community Pipeline] Unclip Image Interpolation (#2400)
  • @ssube
    • add OnnxStableDiffusionUpscalePipeline pipeline (#2158)
    • add support for pre-calculated prompt embeds to Stable Diffusion ONNX pipelines (#2597)
  • @haofanwang
    • Support convert LoRA safetensors into diffusers format (#2403)
    • Support LoRA for text encoder (#2588)
    • Support for Offset Noise in examples (#2753)
    • Update train_text_to_image_lora.py (#2767)
    • Update train_text_to_image_lora.py (#2795)
  • @isamu-isozaki
    • Added multitoken training for textual inversion. Issue 369 (#661)
  • @mikegarts
    • community stablediffusion controlnet img2img pipeline (#2584)
    • img2img.multiple.controlnets.pipeline (#2833)
  • @LolitaSian
    • add translated docs (#2587)
  • @Ttl
    • Controlnet training (#2545)
  • @nipunjindal
    • [1929]: Add CLIP guidance for Img2Img stable diffusion pipeline (#2723)
    • [2737]: Add DPMSolverMultistepScheduler to CLIP guided community pipeline (#2779)
    • [2761]: Add documentation for extra_in_channels UNet1DModel (#2817)
    • [2884]: Fix cross_attention_kwargs in StableDiffusionImg2ImgPipeline (#2902)
    • [2905]: Add Karras pattern to discrete euler (#2956)
    • [2064]: Add Karras to DPMSolverMultistepScheduler (#3001)
  • @bahjat-kawar
    • Add ModelEditing pipeline (#2721)
    • StableDiffusionModelEditingPipeline documentation (#2810)
  • @piEsposito
    • add load textual inversion embeddings to stable diffusion (#2009)
  • @19and99
    • [Pipeline] Add TextToVideoZeroPipeline (#2954)
    • Text2video zero refinements (#3070)
  • @MuhHanif
    • Flax memory efficient attention (#2889)

- Python
Published by patrickvonplaten almost 3 years ago

diffusers - ControlNet, 8K VAE decoding

:rocket: ControlNet comes to 🧨 Diffusers!

Thanks to an amazing collaboration with community member @takuma104 🙌, diffusers fully supports ControlNet! All 8 control models from the paper are available for you to use: depth, scribbles, edges, and more. Best of all, you can take advantage of all the other goodies and optimizations that Diffusers provides out of the box, making this an ultra-fast implementation of ControlNet. Take it for a spin to see for yourself.

ControlNet works by training a copy of some of the layers of the original Stable Diffusion model on additional signals, such as depth maps or scribbles. After training, you can provide a depth map as a strong hint of the composition you want to achieve, and have Stable Diffusion fill in the details for you. For example:

Before After

Currently, there are 8 published control models, all of which were trained on runwayml/stable-diffusion-v1-5 (i.e., Stable Diffusion version 1.5). This is an example that uses the scribble controlnet model:

Before After

Or you can turn a cartoon into a realistic photo with incredible coherence:

ControlNet showing a photo generated from a cartoon frame

How do you use ControlNet in diffusers? Just like this (example for the canny edges control model):

```python
from diffusers import StableDiffusionControlNetPipeline, ControlNetModel
import torch

controlnet = ControlNetModel.from_pretrained("lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
)
```

As usual, you can use all the features in the diffusers toolbox: super-fast schedulers, memory-efficient attention, model offloading, etc. We think 🧨 Diffusers is the best way to iterate on your ControlNet experiments!

Please, refer to our blog post and documentation for details.

(And, coming soon, ControlNet training – stay tuned!)

:diamond_shape_with_a_dot_inside: VAE tiling for ultra-high resolution generation

Another community member, @kig, conceived, proposed, and fully implemented an amazing PR that allows generation of ultra-high resolution images without memory blowing up 🤯. They follow a tiling approach during the image decoding phase of the process, generating a piece of the image at a time and then stitching them all together. Tiles are blended carefully to avoid visible seams between them, and the final result is amazing. This is the additional code you need to use to enjoy high-resolution generations:

```python
pipe.vae.enable_tiling()
```

That's it!

For a complete example, refer to the PR or the code snippet we reproduce here for your convenience:

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", revision="fp16", torch_dtype=torch.float16)
pipe = pipe.to("cuda")
pipe.enable_xformers_memory_efficient_attention()
pipe.vae.enable_tiling()

prompt = "a beautiful landscape photo"
image = pipe(prompt, width=4096, height=2048, num_inference_steps=10).images[0]

image.save("4k_landscape.jpg")
```
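To get an intuition for why the blending matters, here is a toy 1-D sketch of linearly cross-fading two overlapping tiles. The real implementation blends 2-D latent tiles inside the VAE decoder; `blend_tiles` is purely illustrative:

```python
def blend_tiles(left, right, overlap):
    # Linearly cross-fade the overlapping region so no hard seam is visible:
    # the blend weight ramps from the left tile to the right tile.
    blended = left[:-overlap]
    for i in range(overlap):
        alpha = (i + 1) / (overlap + 1)
        blended.append(left[-overlap + i] * (1 - alpha) + right[i] * alpha)
    blended.extend(right[overlap:])
    return blended

# Two "tiles" of constant value with a 2-pixel overlap: the result ramps
# smoothly from 1.0 to 3.0 instead of jumping at the tile boundary.
print(blend_tiles([1.0, 1.0, 1.0, 1.0], [3.0, 3.0, 3.0, 3.0], overlap=2))
```

Without the cross-fade, the two tiles would meet in a hard step; with it, each tile's contribution fades out exactly as its neighbor's fades in.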

All commits

  • [Docs] Add a note on SDEdit by @sayakpaul in #2433
  • small bugfix at StableDiffusionDepth2ImgPipeline call to check_inputs and batch size calculation by @mikegarts in #2423
  • add demo by @yiyixuxu in #2436
  • fix: code snippet of instruct pix2pix from the docs. by @sayakpaul in #2446
  • Update train_text_to_image_lora.py by @haofanwang in #2464
  • mps test fixes by @pcuenca in #2470
  • Fix test train_unconditional by @pcuenca in #2481
  • add MultiDiffusion to controlling generation by @omerbt in #2490
  • image_noiser -> image_normalizer comment by @williamberman in #2496
  • [Safetensors] Make sure metadata is saved by @patrickvonplaten in #2506
  • Add 4090 benchmark (PyTorch 2.0) by @pcuenca in #2503
  • [Docs] Improve safetensors by @patrickvonplaten in #2508
  • Disable ONNX tests by @patrickvonplaten in #2509
  • attend and excite batch test causing timeouts by @williamberman in #2498
  • move pipeline based test skips out of pipeline mixin by @williamberman in #2486
  • pix2pix tests no write to fs by @williamberman in #2497
  • [Docs] Include more information in the "controlling generation" doc by @sayakpaul in #2434
  • Use "hub" directory for cache instead of "diffusers" by @pcuenca in #2005
  • Sequential cpu offload: require accelerate 0.14.0 by @pcuenca in #2517
  • is_safetensors_compatible refactor by @williamberman in #2499
  • [Copyright] 2023 by @patrickvonplaten in #2524
  • Bring Flax attention naming in sync with PyTorch by @pcuenca in #2511
  • [Tests] Fix slow tests by @patrickvonplaten in #2526
  • PipelineTesterMixin parameter configuration refactor by @williamberman in #2502
  • Add a ControlNet model & pipeline by @takuma104 in #2407
  • 8k Stable Diffusion with tiled VAE by @kig in #1441
  • Textual inv make save log both steps by @isamu-isozaki in #2178
  • Fix convert SD to diffusers error by @fkunn1326 in #1979
  • Small fixes for controlnet by @patrickvonplaten in #2542
  • Fix ONNX checkpoint loading by @anton-l in #2544
  • [Model offload] Add nice warning by @patrickvonplaten in #2543

Significant community contributions

The following contributors have made significant changes to the library over the last release:

  • @takuma104
    • Add a ControlNet model & pipeline (#2407)

New Contributors

  • @mikegarts made their first contribution in https://github.com/huggingface/diffusers/pull/2423
  • @fkunn1326 made their first contribution in https://github.com/huggingface/diffusers/pull/2529

Full Changelog: https://github.com/huggingface/diffusers/compare/v0.13.0...v0.14.0

- Python
Published by patrickvonplaten almost 3 years ago

diffusers - v0.13.1: Patch Release to fix warning when loading from `revision="fp16"`

  • fix transformers naming by @patrickvonplaten in #2430
  • remove author names. by @sayakpaul in #2428
  • Fix deprecation warning by @patrickvonplaten in #2426
  • fix the get_indices function by @yiyixuxu in #2418
  • Update pipeline_utils.py by @haofanwang in #2415

- Python
Published by patrickvonplaten about 3 years ago

diffusers - Controllable Generation: Pix2Pix0, Attend and Excite, SEGA, SAG, ...

:dart: Controlling Generation

There has been much recent work on fine-grained control of diffusion networks!

Diffusers now supports:

  1. Instruct Pix2Pix
  2. Pix2Pix 0, more details in docs
  3. Attend and excite, more details in docs
  4. Semantic guidance, more details in docs
  5. Self-attention guidance, more details in docs
  6. Depth2image
  7. MultiDiffusion panorama, more details in docs

See our doc on controlling image generation and the individual pipeline docs for more details on the individual methods.

:up: Latent Upscaler

Latent Upscaler is a diffusion model that is designed explicitly for Stable Diffusion. You can take the generated latent from Stable Diffusion and pass it into the upscaler before decoding with your standard VAE. Or you can take any image, encode it into the latent space, use the upscaler, and decode it. It is incredibly flexible and can work with any SD checkpoint.

The model was developed by Katherine Crowson in collaboration with Stability AI.

```python
from diffusers import StableDiffusionLatentUpscalePipeline, StableDiffusionPipeline
import torch

pipeline = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16)
pipeline.to("cuda")

upscaler = StableDiffusionLatentUpscalePipeline.from_pretrained("stabilityai/sd-x2-latent-upscaler", torch_dtype=torch.float16)
upscaler.to("cuda")

prompt = "a photo of an astronaut high resolution, unreal engine, ultra realistic"
generator = torch.manual_seed(33)

# We stay in latent space! Let's make sure that Stable Diffusion returns the image
# in latent space
low_res_latents = pipeline(prompt, generator=generator, output_type="latent").images

upscaled_image = upscaler(
    prompt=prompt,
    image=low_res_latents,
    num_inference_steps=20,
    guidance_scale=0,
    generator=generator,
).images[0]

# Let's save the upscaled image under "astronaut_1024.png"
upscaled_image.save("astronaut_1024.png")

# As a comparison: let's also save the low-res image
with torch.no_grad():
    image = pipeline.decode_latents(low_res_latents)
image = pipeline.numpy_to_pil(image)[0]

image.save("astronaut_512.png")
```

:zap: Optimization

In addition to new features and an increasing number of pipelines, diffusers cares a lot about performance. This release brings a number of optimizations that you can turn on easily.

xFormers

Memory efficient attention, as implemented by xFormers, has been available in diffusers for some time. The problem was that installing xFormers could be complicated because there were no official pip wheels (or they were outdated), and you had to resort to installing from source.

From xFormers 0.0.16, official pip wheels are now published with every release, so installing and using xFormers is now as simple as these two steps:

  1. pip install xformers in your terminal.
  2. pipe.enable_xformers_memory_efficient_attention() in your code to opt-in in your pipelines.

These actions will unlock dramatic memory savings, and usually faster inference too!

See more details in the documentation.

Torch 2.0

Speaking of memory-efficient attention, Accelerated PyTorch 2.0 Transformers now comes with built-in native support for it! When PyTorch 2.0 is released you'll no longer have to install xFormers or any third-party package to take advantage of it. In diffusers we are already preparing for that, and it works out of the box. So, if you happen to be using the latest "nightlies" of PyTorch 2.0 beta, then you're all set – diffusers will use Accelerated PyTorch 2.0 Transformers by default.

In our tests, the built-in PyTorch 2.0 implementation is usually as fast as xFormers', and sometimes even faster. Performance depends on the card you are using and whether you run your code in float16 or float32, so check our documentation for details.
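For reference, the operation both backends accelerate is plain scaled dot-product attention, softmax(QKᵀ/√d)·V. A toy pure-Python version of the math (illustrative only, not how either backend is implemented):

```python
import math

# Reference implementation of scaled dot-product attention,
# softmax(Q K^T / sqrt(d)) V, for tiny Python lists -- the same math that
# torch.nn.functional.scaled_dot_product_attention computes efficiently.

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def attention(Q, K, V):
    d = len(Q[0])
    out = []
    for q in Q:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in K]
        w = softmax(scores)  # attention weights over the keys
        out.append([sum(wi * v[j] for wi, v in zip(w, V)) for j in range(len(V[0]))])
    return out

Q = [[1.0, 0.0]]
K = [[1.0, 0.0], [0.0, 1.0]]
V = [[10.0, 0.0], [0.0, 10.0]]
print(attention(Q, K, V))  # the query attends more to the first key
```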

Coarse-grained CPU offload

Community member @keturn, with whom we have enjoyed thoughtful software design conversations, called our attention to the fact that enabling sequential cpu offloading via enable_sequential_cpu_offload worked great to save a lot of memory, but made inference much slower.

This is because enable_sequential_cpu_offload() is optimized for memory, and it recursively works across all the submodules contained in a model, moving them to GPU when they are needed and back to CPU when another submodule needs to run. These cpu-to-gpu-to-cpu transfers happen hundreds of times during the stable diffusion denoising loops, because the UNet runs multiple times and it consists of several PyTorch modules.

This release of diffusers introduces a coarser enable_model_cpu_offload() pipeline API, which copies whole models (not modules) to GPU and makes sure they stay there until another model needs to run. The consequences are:

  • Less memory savings than enable_sequential_cpu_offload, but
  • Almost as fast inference as when the pipeline is used without any type of offloading.
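The trade-off can be illustrated with a toy transfer-counting sketch (hypothetical helpers, not diffusers code): sequential offload pays per submodule per denoising step, while model-level offload pays once per model switch.

```python
# Toy bookkeeping sketch (not diffusers code) contrasting the two offload
# strategies. Sequential offload moves every submodule to the GPU and back
# on each forward pass; model offload moves a whole model once and keeps it
# resident until another model needs the GPU.

def sequential_offload_transfers(num_submodules, num_denoising_steps):
    # each submodule: one host->device and one device->host copy per step
    return num_submodules * num_denoising_steps * 2

def model_offload_transfers(models_in_run_order):
    transfers, resident = 0, None
    for model in models_in_run_order:
        if model != resident:  # swap only when a different model runs
            transfers += 1
            resident = model
    return transfers

# A UNet with 30 submodules over 50 denoising steps:
print(sequential_offload_transfers(30, 50))  # 3000 copies
# Model-level: text encoder, then the UNet 50 times, then the VAE:
print(model_offload_transfers(["text_encoder"] + ["unet"] * 50 + ["vae"]))  # 3 swaps
```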

Pix2Pix Zero

Remember the CycleGAN days when one would turn a horse into a zebra in an image while keeping the rest of the content almost untouched? Well, that day has arrived, but in the context of diffusion. Pix2Pix Zero allows users to edit a particular image (be it real or generated), targeting a source concept (horse, for example) and replacing it with a target concept (zebra, for example).


Pix2Pix Zero was proposed in Zero-shot Image-to-Image Translation. The StableDiffusionPix2PixZeroPipeline allows you to

  1. Edit an image generated from an input prompt
  2. Provide an input image and edit it

For the latter, it uses the newly introduced DDIMInverseScheduler to first obtain the inverted noise from the input image and use that in the subsequent generation process.

Both use cases leverage the idea of "edit directions", used to steer the generation gradually from the source concept toward the target concept. To know more, we recommend checking out the official documentation.

Attend and excite

Attend-and-Excite was proposed in Attend-and-Excite: Attention-Based Semantic Guidance for Text-to-Image Diffusion Models. It guides the generative model to modify the cross-attention values during the image synthesis process, producing images that more faithfully depict the input text prompt. Thanks to community contributor @evinpinar for leading the charge to add this pipeline!

  • Attend and excite 2 by @evinpinar @yiyixuxu #2369

Semantic guidance

Semantic Guidance for Diffusion Models was proposed in SEGA: Instructing Diffusion using Semantic Dimensions and provides strong semantic control over image generation. Small changes to the text prompt usually result in entirely different output images. With SEGA, however, a variety of changes to the image can be controlled easily and intuitively while staying true to the original image composition. Thanks to the lead author of SEGA, Manuel (@manuelbrack), who added the pipeline in #2223.

Here is a simple demo:

```py
import torch
from diffusers import SemanticStableDiffusionPipeline

pipe = SemanticStableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16)
pipe = pipe.to("cuda")

out = pipe(
    prompt="a photo of the face of a woman",
    num_images_per_prompt=1,
    guidance_scale=7,
    editing_prompt=[
        "smiling, smile",  # Concepts to apply
        "glasses, wearing glasses",
        "curls, wavy hair, curly hair",
        "beard, full beard, mustache",
    ],
    reverse_editing_direction=[False, False, False, False],  # Direction of guidance, i.e. increase all concepts
    edit_warmup_steps=[10, 10, 10, 10],  # Warmup period for each concept
    edit_guidance_scale=[4, 5, 5, 5.4],  # Guidance scale for each concept
    edit_threshold=[
        0.99,
        0.975,
        0.925,
        0.96,
    ],  # Threshold for each concept. Threshold equals the percentile of the latent space that will be discarded, i.e. threshold=0.99 uses 1% of the latent dimensions
    edit_momentum_scale=0.3,  # Momentum scale that will be added to the latent guidance
    edit_mom_beta=0.6,  # Momentum beta
    edit_weights=[1, 1, 1, 1, 1],  # Weights of the individual concepts against each other
)
```

Self-attention guidance

SAG was proposed in Improving Sample Quality of Diffusion Models Using Self-Attention Guidance. SAG works by extracting the intermediate attention map from a diffusion model at every iteration and selecting tokens above a certain attention score for masking and blurring, obtaining a partially blurred input. The dissimilarity between the noise predictions for the blurred and original inputs is then measured and leveraged as guidance. With this guidance, the authors observe apparent improvements in a wide range of diffusion models.

```python
import torch
from diffusers import StableDiffusionSAGPipeline
from accelerate.utils import set_seed

pipe = StableDiffusionSAGPipeline.from_pretrained("CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16)
pipe = pipe.to("cuda")

seed = 8978
prompt = "."
guidance_scale = 7.5
num_images_per_prompt = 1

sag_scale = 1.0

set_seed(seed)
images = pipe(
    prompt, num_images_per_prompt=num_images_per_prompt, guidance_scale=guidance_scale, sag_scale=sag_scale
).images
images[0].save("example.png")
```

SAG was contributed by @SusungHong (lead author of SAG) in https://github.com/huggingface/diffusers/pull/2193.

MultiDiffusion panorama

Proposed in MultiDiffusion: Fusing Diffusion Paths for Controlled Image Generation, it presents a new generation process, "MultiDiffusion", based on an optimization task that binds together multiple diffusion generation processes with a shared set of parameters or constraints.

```python
import torch
from diffusers import StableDiffusionPanoramaPipeline, DDIMScheduler

model_ckpt = "stabilityai/stable-diffusion-2-base"
scheduler = DDIMScheduler.from_pretrained(model_ckpt, subfolder="scheduler")
pipe = StableDiffusionPanoramaPipeline.from_pretrained(model_ckpt, scheduler=scheduler, torch_dtype=torch.float16)

pipe = pipe.to("cuda")

prompt = "a photo of the dolomites"
image = pipe(prompt).images[0]
image.save("dolomites.png")
```

The pipeline was contributed by @omerbt (lead author of MultiDiffusion Panorama) and @sayakpaul in #2393.

Ethical Guidelines

Diffusers is no stranger to the different opinions and perspectives about the challenges that generative technologies bring. Thanks to @giadilli, we have drafted our first Diffusers' Ethical Guidelines with which we hope to initiate a fruitful conversation with the community.

Keras Integration

Many practitioners find it easy to fine-tune the Stable Diffusion models shipped by KerasCV. At the same time, diffusers provides a lot of options for inference, deployment and optimization. We have made it possible to easily import and use KerasCV Stable Diffusion checkpoints in diffusers, read more about the process in our new guide.

:clock3: UniPC scheduler

UniPC is a new fast scheduler in diffusion town! UniPC is a training-free framework designed for the fast sampling of diffusion models, consisting of a corrector (UniC) and a predictor (UniP) that share a unified analytical form and support arbitrary orders. The original codebase can be found here. Thanks to @wl-zhao for the great work and for integrating UniPC into diffusers!

  • add the UniPC scheduler by @wl-zhao in #2373

:runner: Training: consistent EMA support

As part of 0.13.0 we improved the support for EMA in training. We added a common EMAModel in diffusers.training_utils which can be used by all scripts. The EMAModel was improved to support distributed training, to provide new methods for easily evaluating the EMA model during training, and to offer a consistent way to save and load the EMA model, similar to other models in diffusers.

  • Fix EMA for multi-gpu training in the unconditional example by @anton-l, @patil-suraj #1930
  • [Utils] Adds store() and restore() methods to EMAModel by @sayakpaul #2302
  • Use accelerate save & loading hooks to have better checkpoint structure by @patrickvonplaten #2048
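The underlying EMA update is simple; here is a toy sketch of the shadow-weight idea with `store()`/`restore()`-style helpers (illustrative only, not the diffusers EMAModel implementation):

```python
# Toy sketch of the EMA idea (not the diffusers EMAModel implementation):
# shadow parameters track a decayed average of the training parameters, and
# store()/restore()-style helpers let you temporarily evaluate with the
# EMA weights, then bring the training weights back.

class ToyEMA:
    def __init__(self, params, decay=0.999):
        self.decay = decay
        self.shadow = list(params)
        self._backup = None

    def step(self, params):
        # shadow <- decay * shadow + (1 - decay) * params
        self.shadow = [self.decay * s + (1 - self.decay) * p
                       for s, p in zip(self.shadow, params)]

    def store(self, params):  # remember the current training weights
        self._backup = list(params)

    def restore(self):        # bring the training weights back
        return self._backup

params = [1.0]
ema = ToyEMA(params, decay=0.5)
for _ in range(3):
    params = [params[0] + 1.0]  # "training" moves the weight
    ema.step(params)
print(params, ema.shadow)  # the EMA lags behind the raw parameter
```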

:dog: Ruff & black

We have replaced flake8 with ruff (much faster), and updated our version of black. These tools are now in sync with the ones used in transformers, so the contributing experience is now more consistent for people using both codebases :)

All commits

  • [lora] Fix bug with training without validation by @orenwang in #2106
  • [Bump version] 0.13.0dev0 & Deprecate predict_epsilon by @patrickvonplaten in #2109
  • [dreambooth] check the low-precision guard before preparing model by @patil-suraj in #2102
  • [textual inversion] Allow validation images by @pcuenca in #2077
  • Allow UNet2DModel to use arbitrary class embeddings by @pcuenca in #2080
  • make scaling factor a config arg of vae/vqvae by @patil-suraj in #1860
  • [Import Utils] Fix naming by @patrickvonplaten in #2118
  • Fix unable to save_pretrained when using pathlib by @Cyberes in #1972
  • fuse attention mask by @williamberman in #2111
  • Fix model card of LoRA by @hysts in #2114
  • [nit] torch_dtype used twice in doc string by @williamberman in #2126
  • [LoRA] Make sure LoRA can be disabled after it's run by @patrickvonplaten in #2128
  • remove redundant allow_patterns by @williamberman in #2130
  • Allow lora from pipeline by @patrickvonplaten in #2129
  • Fix typos in loaders.py by @kuotient in #2137
  • Typo fix: torwards -> towards by @RahulBhalley in #2134
  • Don't call the Hub if local_files_only is specified by @patrickvonplaten in #2119
  • [from_pretrained] only load config one time by @williamberman in #2131
  • Adding some safetensors docs. by @Narsil in #2122
  • Fix typo by @pcuenca in #2138
  • fix typo in EMAModel's load_state_dict() by @dasayan05 in #2151
  • [diffusers-cli] Fix typo in accelerate and transformers versions by @pcuenca in #2154
  • [Design philosopy] Create official doc by @patrickvonplaten in #2140
  • Section on using LoRA alpha / scale by @pcuenca in #2139
  • Don't copy when unwrapping model by @pcuenca in #2166
  • Add instance prompt to model card of lora dreambooth example by @hysts in #2112
  • [Bug]: fix DDPM scheduler arbitrary infer steps count. by @dudulightricks in #2076
  • [examples] Fix CLI argument in the launch script command for text2image with LoRA by @sayakpaul in #2171
  • [Breaking change] fix legacy inpaint noise and resize mask tensor by @1lint in #2147
  • Use requests instead of wget in convert_from_ckpt.py by @Abhishek-Varma in #2168
  • [Docs] Add components to docs by @patrickvonplaten in #2175
  • [Docs] remove license by @patrickvonplaten in #2188
  • Pass LoRA rank to LoRALinearLayer by @asadm in #2191
  • add: guide on kerascv conversion tool. by @sayakpaul in #2169
  • Fix a dimension bug in Transform2d by @lmxyy in #2144
  • [Loading] Better error message on missing keys by @patrickvonplaten in #2198
  • Update xFormers docs by @pcuenca in #2208
  • add CITATION.cff by @kashif in #2211
  • Create train_dreambooth_inpaint_lora.py by @thedarkzeno in #2205
  • Docs: short section on changing the scheduler in Flax by @pcuenca in #2181
  • [Bug] scheduling_ddpm: fix variance in the case of learned_range type. by @dudulightricks in #2090
  • refactor onnxruntime integration by @prathikr in #2042
  • Fix timestep dtype in legacy inpaint by @dymil in #2120
  • [nit] negative_prompt typo by @williamberman in #2227
  • removes ~s in favor of full-fledged links. by @sayakpaul in #2229
  • [LoRA] Make sure validation works in multi GPU setup by @patrickvonplaten in #2172
  • fix: flagged_images implementation by @justinmerrell in #1947
  • Hotfix textual inv logging by @isamu-isozaki in #2183
  • Fixes LoRAXFormersCrossAttnProcessor by @jorgemcgomes in #2207
  • Fix typo in StableDiffusionInpaintPipeline by @hutec in #2197
  • [Flax DDPM] Make key optional so default pipelines don't fail by @pcuenca in #2176
  • Show error when loading safety_checker `from_flax` by @pcuenca in #2187
  • Fix kdpm2 & kdpm2_a on MPS by @psychedelicious in #2241
  • Fix a typo: bfloa16 -> bfloat16 by @nickkolok in #2243
  • Mention training problems with xFormers 0.0.16 by @pcuenca in #2254
  • fix distributed init twice by @Fazziekey in #2252
  • Fixes prompt input checks in StableDiffusion img2img pipeline by @jorgemcgomes in #2206
  • Create convertvaepttodiffusers.py by @chavinlo in #2215
  • Stable Diffusion Latent Upscaler by @yiyixuxu in #2059
  • [Examples] Remove datasets important that is not needed by @patrickvonplaten in #2267
  • Make center crop and random flip as args for unconditional image generation by @wfng92 in #2259
  • [Tests] Fix slow tests by @patrickvonplaten in #2271
  • Fix torchvision.transforms and transforms function naming clash by @wfng92 in #2274
  • mps cross-attention hack: don't crash on fp16 by @pcuenca in #2258
  • Use accelerate save & loading hooks to have better checkpoint structure by @patrickvonplaten in #2048
  • Replace flake8 with ruff and update black by @patrickvonplaten in #2279
  • Textual inv save log memory by @isamu-isozaki in #2184
  • EMA: fix state_dict() and load_state_dict() & add cur_decay_value by @chenguolin in #2146
  • [Examples] Test all examples on CPU by @patrickvonplaten in #2289
  • fix pix2pix docs by @patrickvonplaten in #2290
  • misc fixes by @williamberman in #2282
  • Run same number of DDPM steps in inference as training by @bencevans in #2263
  • [LoRA] Freezing the model weights by @erkams in #2245
  • Fast CPU tests should also run on main by @patrickvonplaten in #2313
  • Correct fast tests by @patrickvonplaten in #2314
  • remove ddpm test_full_inference by @williamberman in #2291
  • convert ckpt script docstring fixes by @williamberman in #2293
  • [Community Pipeline] UnCLIP Text Interpolation Pipeline by @Abhinay1997 in #2257
  • [Tests] Refactor push tests by @patrickvonplaten in #2329
  • Add ethical guidelines by @giadilli in #2330
  • Fix running LoRA with xformers by @bddppq in #2286
  • Fix typo in load_pipeline_from_original_stable_diffusion_ckpt() method by @p1atdev in #2320
  • [Docs] Fix ethical guidelines docs by @patrickvonplaten in #2333
  • [Versatile Diffusion] Fix tests by @patrickvonplaten in #2336
  • [Latent Upscaling] Remove unused noise by @patrickvonplaten in #2298
  • [Tests] Remove unnecessary tests by @patrickvonplaten in #2337
  • karlo image variation use kakaobrain upload by @williamberman in #2338
  • github issue forum link by @williamberman in #2335
  • dreambooth checkpointing tests and docs by @williamberman in #2339
  • unet check length inputs by @williamberman in #2327
  • unCLIP variant by @williamberman in #2297
  • Log Unconditional Image Generation Samples to W&B by @bencevans in #2287
  • Fix callback type hints - no optional function argument by @patrickvonplaten in #2357
  • [Docs] initial docs about KarrasDiffusionSchedulers by @kashif in #2349
  • KarrasDiffusionSchedulers type note by @williamberman in #2365
  • [Tests] Add MPS skip decorator by @patrickvonplaten in #2362
  • Funky spacing issue by @meg-huggingface in #2368
  • schedulers add glide noising schedule by @williamberman in #2347
  • add total number checkpoints to training scripts by @williamberman in #2367
  • checkpointing_steps_total_limit->checkpoints_total_limit by @williamberman in #2374
  • Fix 3-way merging with the checkpoint_merger community pipeline by @damian0815 in #2355
  • [Variant] Add "variant" as input kwarg so to have better UX when downloading no_ema or fp16 weights by @patrickvonplaten in #2305
  • [Pipelines] Adds pix2pix zero by @sayakpaul in #2334
  • Add Self-Attention-Guided (SAG) Stable Diffusion pipeline by @SusungHong in #2193
  • [SchedulingPNDM] reset cur_model_output after each call by @patil-suraj in #2376
  • train_text_to_image EMAModel saving by @williamberman in #2341
  • [Utils] Adds store() and restore() methods to EMAModel by @sayakpaul in #2302
  • enable_model_cpu_offload by @pcuenca in #2285
  • add the UniPC scheduler by @wl-zhao in #2373
  • Replace torch.concat calls by torch.cat by @fxmarty in #2378
  • Make diffusers importable with transformers < 4.26 by @pcuenca in #2380
  • [Examples] Make sure EMA works with any device by @patrickvonplaten in #2382
  • [Dummy imports] Add missing if else statements for SD] by @patrickvonplaten in #2381
  • Attend and excite 2 by @yiyixuxu in #2369
  • [Pix2Pix0] Add utility function to get edit vector by @patrickvonplaten in #2383
  • Revert "[Pix2Pix0] Add utility function to get edit vector" by @patrickvonplaten in #2384
  • Fix stable diffusion onnx pipeline error when batch_size > 1 by @tianleiwu in #2366
  • [Docs] Fix UniPC docs by @wl-zhao in #2386
  • [Pix2Pix Zero] Fix slow tests by @sayakpaul in #2391
  • [Pix2Pix] Add utility function by @patrickvonplaten in #2385
  • Fix UniPC tests and remove some test warnings by @pcuenca in #2396
  • [Pipelines] Add a section on generating captions and embeddings for Pix2Pix Zero by @sayakpaul in #2395
  • Torch2.0 scaled_dot_product_attention processor by @patil-suraj in #2303
  • add: inversion to pix2pix zero docs. by @sayakpaul in #2398
  • Add semantic guidance pipeline by @manuelbrack in #2223
  • Add ddim inversion pix2pix by @patrickvonplaten in #2397
  • add MultiDiffusionPanorama pipeline by @omerbt in #2393
  • Fixing typos in documentation by @anagri in #2389
  • controlling generation docs by @williamberman in #2388
  • apply_forward_hook simply returns if no accelerate by @daquexian in #2387
  • Revert "Release: v0.13.0" by @williamberman in #2405
  • controlling generation doc nits by @williamberman in #2406
  • Fix typo in AttnProcessor2_0 symbol by @pcuenca in #2404
  • add index page by @yiyixuxu in #2401
  • add xformers 0.0.16 warning message by @williamberman in #2345

Significant community contributions

The following contributors have made significant changes to the library over the last release:

  • @thedarkzeno
    • Create train_dreambooth_inpaint_lora.py (#2205)
  • @prathikr
    • refactor onnxruntime integration (#2042)
  • @Abhinay1997
    • [Community Pipeline] UnCLIP Text Interpolation Pipeline (#2257)
  • @SusungHong
    • Add Self-Attention-Guided (SAG) Stable Diffusion pipeline (#2193)
  • @wl-zhao
    • add the UniPC scheduler (#2373)
    • [Docs] Fix UniPC docs (#2386)
  • @manuelbrack
    • Add semantic guidance pipeline (#2223)
  • @omerbt
    • add MultiDiffusionPanorama pipeline (#2393)

- Python
Published by patrickvonplaten about 3 years ago

diffusers - v0.12.1: Patch Release to fix local files only

Make sure cached models can be loaded in offline mode.

  • Don't call the Hub if local_files_only is specified by @patrickvonplaten in #2119

- Python
Published by patrickvonplaten about 3 years ago

diffusers - Instruct-Pix2Pix, DiT, LoRA

🪄 Instruct-Pix2Pix

Instruct-Pix2Pix is a Stable Diffusion model fine-tuned for editing images from human instructions. Given an input image and a written instruction that tells the model what to do, the model follows these instructions to edit the image.


The model was released with the paper InstructPix2Pix: Learning to Follow Image Editing Instructions. More information about the model can be found in the paper.

pip install diffusers transformers safetensors accelerate

```python
import PIL
import PIL.ImageOps
import requests
import torch
from diffusers import StableDiffusionInstructPix2PixPipeline

model_id = "timbrooks/instruct-pix2pix"
pipe = StableDiffusionInstructPix2PixPipeline.from_pretrained(model_id, torch_dtype=torch.float16).to("cuda")

url = "https://huggingface.co/datasets/diffusers/diffusers-images-docs/resolve/main/mountain.png"


def download_image(url):
    image = PIL.Image.open(requests.get(url, stream=True).raw)
    image = PIL.ImageOps.exif_transpose(image)
    image = image.convert("RGB")
    return image


image = download_image(url)

prompt = "make the mountains snowy"
edit = pipe(prompt, image=image, num_inference_steps=20, image_guidance_scale=1.5, guidance_scale=7).images[0]
edit.save("snowy_mountains.png")
```

  • Add InstructPix2Pix pipeline by @patil-suraj #2040

🤖 DiT

Diffusion Transformers (DiT) is a class-conditional latent diffusion model that replaces the commonly used U-Net backbone with a transformer operating on latent patches. The pretrained model is trained on the ImageNet-1K dataset and is able to generate class-conditional images of 256x256 or 512x512 pixels.


The model was released with the paper Scalable Diffusion Models with Transformers.

```python
import torch
from diffusers import DiTPipeline

model_id = "facebook/DiT-XL-2-256"
pipe = DiTPipeline.from_pretrained(model_id, torch_dtype=torch.float16).to("cuda")

# pick words that exist in ImageNet
words = ["white shark", "umbrella"]
class_ids = pipe.get_label_ids(words)

output = pipe(class_labels=class_ids)
image = output.images[0]  # label 'white shark'
```

⚡ LoRA

LoRA is a technique for performing parameter-efficient fine-tuning for large models. LoRA works by adding so-called "update matrices" to specific blocks of a pre-trained model. During fine-tuning, only these update matrices are updated while the pre-trained model parameters are kept frozen. This allows us to achieve greater memory efficiency as well as easier portability during fine-tuning.
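The update-matrix idea can be sketched with tiny matrices: the frozen weight W is augmented by a low-rank product B·A, and only B and A would be trained (a toy illustration, not the diffusers implementation):

```python
# Toy sketch of the LoRA idea (illustrative, not the diffusers implementation):
# the frozen weight W is augmented with a low-rank update B @ A, and only
# A and B are trained. For d x d weights, the update stores 2*d*r numbers
# instead of d*d.

def matmul(X, Y):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]

d, r = 4, 1                       # full dimension 4, LoRA rank 1
W = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]  # frozen
B = [[0.5], [0.0], [0.0], [0.0]]  # d x r, trainable
A = [[0.0, 1.0, 0.0, 0.0]]        # r x d, trainable

delta = matmul(B, A)              # low-rank update, rank <= r
W_eff = [[w + dw for w, dw in zip(wr, dr)] for wr, dr in zip(W, delta)]

print(sum(len(row) for row in W))  # 16 frozen numbers
print(d * r + r * d)               # only 8 trainable numbers
print(W_eff[0])                    # only row 0 changed: [1.0, 0.5, 0.0, 0.0]
```

This is why the fine-tuned checkpoints mentioned below stay so small: only the low-rank factors are saved.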

LoRA was proposed in LoRA: Low-Rank Adaptation of Large Language Models. In the original paper, the authors investigated LoRA for fine-tuning large language models like GPT-3. cloneofsimo was the first to try out LoRA training for Stable Diffusion in the popular lora GitHub repository.

Diffusers now supports LoRA! This means you can now fine-tune a model like Stable Diffusion using consumer GPUs like Tesla T4 or RTX 2080 Ti. LoRA support was added to UNet2DConditionModel and the DreamBooth training script by @patrickvonplaten in #1884.

By using LoRA, the fine-tuned checkpoints will be just 3 MBs in size. After fine-tuning, you can use the LoRA checkpoints like so:

```py
from diffusers import StableDiffusionPipeline
import torch

model_path = "sayakpaul/sd-model-finetuned-lora-t4"
pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16)
pipe.unet.load_attn_procs(model_path)
pipe.to("cuda")

prompt = "A pokemon with blue eyes."
image = pipe(prompt, num_inference_steps=30, guidance_scale=7.5).images[0]
image.save("pokemon.png")
```


You can follow these resources to know more about how to use LoRA in diffusers:

📐 Customizable Cross Attention

LoRA leverages a new method to customize the cross attention layers deep in the UNet. This can be useful for other creative approaches such as Prompt-to-Prompt, and it makes it easier to apply optimized attention implementations like xFormers. This new "attention processor" abstraction was created by @patrickvonplaten in #1639 after discussing the design with the community, and we have used it to rewrite our xFormers and attention slicing implementations!
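The pattern is essentially a strategy object: the attention layer delegates its forward computation to a swappable processor. A toy sketch with hypothetical names (not the library's actual signatures):

```python
# Minimal sketch of the "attention processor" pattern (hypothetical names,
# not the library's exact API): an attention layer delegates its forward
# computation to a swappable processor object, so xFormers, sliced, or
# LoRA variants can be plugged in without touching the layer itself.

class DefaultProcessor:
    def __call__(self, layer, hidden_states):
        return [h * layer.scale for h in hidden_states]

class CountingProcessor:
    """A stand-in for a custom variant (e.g. memory-efficient attention)."""
    def __init__(self):
        self.calls = 0
    def __call__(self, layer, hidden_states):
        self.calls += 1
        return [h * layer.scale for h in hidden_states]

class ToyAttention:
    def __init__(self, scale=0.5):
        self.scale = scale
        self.processor = DefaultProcessor()
    def set_processor(self, processor):  # the swap point
        self.processor = processor
    def forward(self, hidden_states):
        return self.processor(self, hidden_states)

attn = ToyAttention()
print(attn.forward([2.0, 4.0]))  # via the default processor
attn.set_processor(CountingProcessor())
print(attn.forward([2.0, 4.0]))  # same result through the custom processor
```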

🌿 Flax => PyTorch

A long requested feature, prolific community member @camenduru took up the gauntlet in #1900 and created a way to convert Flax model weights for PyTorch. This means that you can train or fine-tune models super fast using Google TPUs, and then convert the weights to PyTorch for everybody to use. Thanks @camenduru!

🌀 Flax Img2Img

Another community member, @dhruvrnaik, ported the image-to-image pipeline to Flax in #1355! Using a TPU v2-8 (available in Colab's free tier), you can generate 8 images at once in a few seconds!

🎲 DEIS Scheduler

DEIS (Diffusion Exponential Integrator Sampler) is a new fast multistep scheduler that can generate high-quality samples in fewer steps. The scheduler was introduced in the paper Fast Sampling of Diffusion Models with Exponential Integrator, where more information about it can be found.

```python
from diffusers import StableDiffusionPipeline, DEISMultistepScheduler
import torch

pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16)
pipe.scheduler = DEISMultistepScheduler.from_config(pipe.scheduler.config)
pipe = pipe.to("cuda")

prompt = "a photo of an astronaut riding a horse on mars"
generator = torch.Generator(device="cuda").manual_seed(0)
image = pipe(prompt, generator=generator, num_inference_steps=25).images[0]
```

  • feat : add log-rho deis multistep scheduler by @qsh-zh #1432

Reproducibility

One can now pass CPU generators to all pipelines even if the pipeline is on GPU. This ensures much better reproducibility across GPU hardware:

```python
import torch
from diffusers import DDIMPipeline
import numpy as np

model_id = "google/ddpm-cifar10-32"

# load model and scheduler
ddim = DDIMPipeline.from_pretrained(model_id)
ddim.to("cuda")

# create a generator for reproducibility
generator = torch.manual_seed(0)

# run pipeline for just two steps and return numpy tensor
image = ddim(num_inference_steps=2, output_type="np", generator=generator).images
print(np.abs(image).sum())
```

See: #1902 and https://huggingface.co/docs/diffusers/using-diffusers/reproducibility
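The determinism a CPU generator buys you can be sanity-checked without any pipeline at all. A minimal sketch (pure torch, no model download; the shapes are arbitrary):

```python
import torch

# Two independently seeded CPU generators produce bit-identical noise,
# regardless of which accelerator the tensors are later moved to.
def make_noise(seed: int) -> torch.Tensor:
    generator = torch.Generator(device="cpu").manual_seed(seed)
    return torch.randn(1, 4, 8, 8, generator=generator)

a = make_noise(0)
b = make_noise(0)
c = make_noise(1)

print(torch.equal(a, b))  # True: same seed -> identical latents
print(torch.equal(a, c))  # False: different seed -> different latents
```

Because the noise is drawn on the CPU, it no longer depends on the GPU model or driver, which is exactly why cross-hardware reproducibility improves.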

Important New Guides

  • Stable Diffusion 101: https://huggingface.co/docs/diffusers/stable_diffusion
  • Reproducibility: https://huggingface.co/docs/diffusers/using-diffusers/reproducibility
  • LoRA: https://huggingface.co/docs/diffusers/training/lora

Important Bug Fixes

  • Don't download safetensors if library is not installed: #2057
  • Make sure that save_pretrained(...) doesn't accidentally delete files: #2038
  • Fix CPU offload docs for maximum memory gain: #1968
  • Fix conversion for exotically sorted weight names: #1959
  • Fix intermediate checkpointing for textual inversion, thanks @lstein #2072

All commits

  • update composable diffusion for an updated diffuser library by @nanlliu in #1697
  • [Tests] Fix UnCLIP cpu offload tests by @anton-l in #1769
  • Bump to 0.12.0.dev0 by @anton-l in #1771
  • [Dreambooth] flax fixes by @pcuenca in #1765
  • update train_unconditional_ort.py by @prathikr in #1775
  • Only test for xformers when enabling them #1773 by @kig in #1776
  • expose polynomial:power and cosine_with_restarts:num_cycles params by @zetyquickly in #1737
  • [Flax] Stateless schedulers, fixes and refactors by @skirsten in #1661
  • Correct hf hub download by @patrickvonplaten in #1767
  • Dreambooth docs: minor fixes by @pcuenca in #1758
  • Fix num images per prompt unclip by @patil-suraj in #1787
  • Add Flax stable diffusion img2img pipeline by @dhruvrnaik in #1355
  • Refactor cross attention and allow mechanism to tweak cross attention function by @patrickvonplaten in #1639
  • Fix OOM when using PyTorch with JAX installed. by @pcuenca in #1795
  • reorder model wrap + bug fix by @prathikr in #1799
  • Remove hardcoded names from PT scripts by @patrickvonplaten in #1778
  • [textual_inversion] unwrap_model text encoder before accessing weights by @patil-suraj in #1816
  • fix small mistake in annotation: 32 -> 64 by @Line290 in #1780
  • Make safety_checker optional in more pipelines by @pcuenca in #1796
  • Device to use (e.g. cpu, cuda:0, cuda:1, etc.) by @camenduru in #1844
  • Avoid duplicating PyTorch + safetensors downloads. by @pcuenca in #1836
  • Width was typod as weight by @Helw150 in #1800
  • fix: resize transform now preserves aspect ratio by @parlance-zz in #1804
  • Make xformers optional even if it is available by @kn in #1753
  • Allow selecting precision to make Dreambooth class images by @kabachuha in #1832
  • unCLIP image variation by @williamberman in #1781
  • [Community Pipeline] MagicMix by @daspartho in #1839
  • [Versatile Diffusion] Fix cross_attention_kwargs by @patrickvonplaten in #1849
  • [Dtype] Align dtype casting behavior with Transformers and Accelerate by @patrickvonplaten in #1725
  • [StableDiffusionInpaint] Correct test by @patrickvonplaten in #1859
  • [textual inversion] add gradient checkpointing and small fixes. by @patil-suraj in #1848
  • Flax: Fix img2img and align with other pipeline by @skirsten in #1824
  • Make repo structure consistent by @patrickvonplaten in #1862
  • [Unclip] Make sure text_embeddings & image_embeddings can directly be passed to enable interpolation tasks. by @patrickvonplaten in #1858
  • Fix ema decay by @pcuenca in #1868
  • [Docs] Improve docs by @patrickvonplaten in #1870
  • [examples] update loss computation by @patil-suraj in #1861
  • [train_text_to_image] allow using non-ema weights for training by @patil-suraj in #1834
  • [Attention] Finish refactor attention file by @patrickvonplaten in #1879
  • Fix typo in train_dreambooth_inpaint by @pcuenca in #1885
  • Update ONNX Pipelines to use np.float64 instead of np.float by @agizmo in #1789
  • [examples] misc fixes by @patil-suraj in #1886
  • Fixes to the help for report_to in training scripts by @pcuenca in #1888
  • updated doc for stable diffusion pipelines by @yiyixuxu in #1770
  • Add UnCLIPImageVariationPipeline to dummy imports by @anton-l in #1897
  • Add accelerate and xformers versions to diffusers-cli env by @anton-l in #1898
  • [addresses issue #1642] add add_noise to scheduling-sde-ve by @aengusng8 in #1827
  • Add conditional generation to AudioDiffusionPipeline by @teticio in #1826
  • Fixes in comments in SD2 D2I by @neverix in #1903
  • [Deterministic torch randn] Allow tensors to be generated on CPU by @patrickvonplaten in #1902
  • [Docs] Remove duplicated API doc string by @patrickvonplaten in #1901
  • fix: DDPMScheduler.set_timesteps() by @Joqsan in #1912
  • Fix --resume_from_checkpoint step in train_text_to_image.py by @merfnad in #1914
  • Support training SD V2 with Flax by @yasyf in #1783
  • Fix lr-scaling store_true & default=True CLI argument for textual_inversion training. by @aredden in #1090
  • Various Fixes for Flax Dreambooth by @yasyf in #1782
  • Test ResnetBlock2D by @hchings in #1850
  • Init for korean docs by @seriousran in #1910
  • New Pipeline: Tiled-upscaling with depth perception to avoid blurry spots by @peterwilli in #1615
  • Improve reproducibility 2/3 by @patrickvonplaten in #1906
  • feat : add log-rho deis multistep scheduler by @qsh-zh in #1432
  • Feature/colossalai by @Fazziekey in #1793
  • [Docs] Add TRANSLATING.md file by @seriousran in #1920
  • [StableDiffusionImg2Img] validating input type by @Shubhamai in #1913
  • [dreambooth] low precision guard by @williamberman in #1916
  • [Stable Diffusion Guide] 101 Stable Diffusion Guide directly into the docs by @patrickvonplaten in #1927
  • [Conversion] Make sure ema weights are extracted correctly by @patrickvonplaten in #1937
  • fix path to logo by @vvssttkk in #1939
  • Add automatic doc sorting by @patrickvonplaten in #1940
  • update to latest colossalai by @Fazziekey in #1951
  • fix typo in imagicstablediffusion.py by @andreemic in #1956
  • [Conversion SD] Make sure weirdly sorted keys work as well by @patrickvonplaten in #1959
  • allow loading ddpm models into ddim by @patrickvonplaten in #1932
  • [Community] Correct checkpoint merger by @patrickvonplaten in #1965
  • Update CLIPGuidedStableDiffusion.feature_extractor.size to fix TypeError by @oxidase in #1938
  • [CPU offload] correct cpu offload by @patrickvonplaten in #1968
  • [Docs] Update README.md by @haofanwang in #1960
  • Research project multi subject dreambooth by @klopsahlong in #1948
  • Example tests by @patrickvonplaten in #1982
  • Fix slow tests by @patrickvonplaten in #1983
  • Fix unused upcast_attn flag in convert_original_stable_diffusion_to_diffusers script by @kn in #1942
  • Allow converting Flax to PyTorch by adding a "from_flax" keyword by @camenduru in #1900
  • Update docstring by @Warvito in #1971
  • [SD Img2Img] resize source images to multiple of 8 instead of 32 by @vvsotnikov in #1571
  • Update README.md to include our blog post by @sayakpaul in #1998
  • Fix a couple typos in Dreambooth readme by @pcuenca in #2004
  • Add tests for 2D UNet blocks by @hchings in #1945
  • [Conversion] Support convert diffusers to safetensors by @hua1995116 in #1996
  • [Community] Fix merger by @patrickvonplaten in #2006
  • [Conversion] Improve safetensors by @patrickvonplaten in #1989
  • [Black] Update black library by @patrickvonplaten in #2007
  • Fix typos in ColossalAI example by @haofanwang in #2001
  • Use pipeline tests mixin for UnCLIP pipeline tests + unCLIP MPS fixes by @williamberman in #1908
  • Change PNDMPipeline to use PNDMScheduler by @willdalh in #2003
  • [train_unconditional] fix LR scheduler init by @patil-suraj in #2010
  • [Docs] No more autocast by @patrickvonplaten in #2021
  • [Flax] Add Flax inpainting impl by @xvjiarui in #1966
  • Check k-diffusion version is at least 0.0.12 by @pcuenca in #2022
  • DiT Pipeline by @kashif in #1806
  • fix dit doc header by @patil-suraj in #2027
  • [LoRA] Add LoRA training script by @patrickvonplaten in #1884
  • [Dit] Fix dit tests by @patrickvonplaten in #2034
  • Fix typos and minor redundancies by @Joqsan in #2029
  • [Lora] Model card by @patrickvonplaten in #2032
  • [Save Pretrained] Remove dead code lines that can accidentally remove pytorch files by @patrickvonplaten in #2038
  • Fix EMA for multi-gpu training in the unconditional example by @anton-l in #1930
  • Minor fix in the documentation of LoRA by @hysts in #2045
  • Add InstructPix2Pix pipeline by @patil-suraj in #2040
  • Create repo before cloning in examples by @Wauplin in #2047
  • Remove modelcards dependency by @Wauplin in #2050
  • Module-ise "original stable diffusion to diffusers" conversion script by @damian0815 in #2019
  • [StableDiffusionInstructPix2Pix] use cpu generator in slow tests by @patil-suraj in #2051
  • [From pretrained] Don't download .safetensors files if safetensors is… by @patrickvonplaten in #2057
  • Correct Pix2Pix example by @patrickvonplaten in #2056
  • add community pipeline: StableUnCLIPPipeline by @budui in #2037
  • [LoRA] Adds example on text2image fine-tuning with LoRA by @sayakpaul in #2031
  • Safetensors loading in "convert_diffusers_to_original_stable_diffusion" by @cafeai in #2054
  • [examples] add dataloader_num_workers argument by @patil-suraj in #2070
  • Dreambooth: reduce VRAM usage by @gleb-akhmerov in #2039
  • [Paint by example] Fix cpu offload for paint by example by @patrickvonplaten in #2062
  • [textual_inversion] Fix resuming state when using gradient checkpointing by @pcuenca in #2072
  • [lora] Log images when using tensorboard by @pcuenca in #2078
  • Fix resume epoch for all training scripts except textual_inversion by @pcuenca in #2079
  • [dreambooth] fix multi on gpu. by @patil-suraj in #2088
  • Run inference on a specific condition and fix call of manual_seed() by @shirayu in #2074
  • [Feat] checkpoint_merger works on local models as well as ones that use safetensors by @lstein in #2060
  • xFormers attention op arg by @takuma104 in #2049
  • [docs] [dreambooth] note random crop by @williamberman in #2085
  • Remove wandb from text_to_image requirements.txt by @pcuenca in #2092
  • [doc] update example for pix2pix by @patil-suraj in #2101
  • Add lora tag to the model tags by @apolinario in #2103
  • [docs] Adds a doc on LoRA support for diffusers by @sayakpaul in #2086
  • Allow directly passing text embeddings to Stable Diffusion Pipeline for prompt weighting by @patrickvonplaten in #2071
  • Improve transformers versions handling by @patrickvonplaten in #2104
  • Reproducibility 3/3 by @patrickvonplaten in #1924

🙌 Significant community contributions 🙌

The following contributors have made significant changes to the library over the last release:

  • @nanlliu
    • update composable diffusion for an updated diffuser library (#1697)
  • @skirsten
    • [Flax] Stateless schedulers, fixes and refactors (#1661)
    • Flax: Fix img2img and align with other pipeline (#1824)
  • @hchings
    • Test ResnetBlock2D (#1850)
    • Add tests for 2D UNet blocks (#1945)
  • @seriousran
    • Init for korean docs (#1910)
    • [Docs] Add TRANSLATING.md file (#1920)
  • @qsh-zh
    • feat : add log-rho deis multistep scheduler (#1432)
  • @Fazziekey
    • Feature/colossalai (#1793)
    • update to latest colossalai (#1951)
  • @klopsahlong
    • Research project multi subject dreambooth (#1948)
  • @xvjiarui
    • [Flax] Add Flax inpainting impl (#1966)
  • @damian0815
    • Module-ise "original stable diffusion to diffusers" conversion script (#2019)
  • @camenduru
    • Allow converting Flax to PyTorch by adding a "from_flax" keyword (#1900)

- Python
Published by patrickvonplaten about 3 years ago

diffusers - v0.11.1: Patch release

This patch release fixes a bug with num_images_per_prompt in the UnCLIPPipeline.

* Fix num images per prompt unclip by @patil-suraj in #1787

- Python
Published by anton-l about 3 years ago

diffusers - v0.11.0: Karlo UnCLIP, safetensors, pipeline versions

:magic_wand: Karlo UnCLIP by Kakao Brain

Karlo is a text-conditional image generation model based on OpenAI's unCLIP architecture, with an improved super-resolution module that upscales from 64px to 256px, recovering high-frequency details in a small number of denoising steps.

This alpha version of Karlo is trained on 115M image-text pairs, including COYO-100M high-quality subset, CC3M, and CC12M. For more information about the architecture, see the Karlo repository: https://github.com/kakaobrain/karlo

image

pip install diffusers transformers safetensors accelerate

```python
import torch
from diffusers import UnCLIPPipeline

pipe = UnCLIPPipeline.from_pretrained("kakaobrain/karlo-v1-alpha", torch_dtype=torch.float16)
pipe = pipe.to("cuda")

prompt = "a high-resolution photograph of a big red frog on a green leaf."
image = pipe(prompt).images[0]
```

img

:octocat: Community pipeline versioning

The community pipelines hosted in diffusers/examples/community will now follow the installed version of the library.

E.g. if you have diffusers==0.9.0 installed, the pipelines from the v0.9.0 branch will be used: https://github.com/huggingface/diffusers/tree/v0.9.0/examples/community

If you've installed diffusers from source, e.g. with pip install git+https://github.com/huggingface/diffusers then the latest versions of the pipelines will be fetched from the main branch.

To change the custom pipeline version, set the custom_revision argument like so:

```python
pipeline = DiffusionPipeline.from_pretrained(
    "google/ddpm-cifar10-32", custom_pipeline="one_step_unet", custom_revision="0.10.2"
)
```

:safety_vest: safetensors

Many of the most important checkpoints now have safetensors (https://github.com/huggingface/safetensors) weights available. After installing safetensors with:

pip install safetensors

You will see a nice speed-up when loading your model :rocket:

Some of the most important checkpoints now have safetensors weights:

- https://huggingface.co/stabilityai/stable-diffusion-2
- https://huggingface.co/stabilityai/stable-diffusion-2-1
- https://huggingface.co/stabilityai/stable-diffusion-2-depth
- https://huggingface.co/stabilityai/stable-diffusion-2-inpainting

Batched generation bug fixes :bug:

  • Make sure all pipelines can run with batched input by @patrickvonplaten in #1669

We fixed a lot of bugs for batched generation. All pipelines should now correctly process batches of prompts and images :hugs: Also we made it much easier to tweak images with reproducible seeds: https://huggingface.co/docs/diffusers/using-diffusers/reusing_seeds
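The reusable-seeds idea rests on passing one generator per image in the batch, so a single image can be reproduced later on its own. A minimal sketch of the underlying mechanism (hypothetical latent shape, no pipeline involved):

```python
import torch

batch_size = 4
shape = (4, 8, 8)  # hypothetical latent shape

# One seeded CPU generator per image in the batch.
generators = [torch.Generator(device="cpu").manual_seed(i) for i in range(batch_size)]
batched = torch.stack([torch.randn(shape, generator=g) for g in generators])

# Re-drawing image 2 alone with its own seed reproduces exactly its slice of
# the batch, so single-image and batched runs stay comparable.
solo = torch.randn(shape, generator=torch.Generator(device="cpu").manual_seed(2))
print(torch.equal(batched[2], solo))  # True
```

This is why per-prompt generators make it possible to regenerate or tweak just one image from a batch with a known seed.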

:memo: Changelog

  • Remove spurious arg in training scripts by @pcuenca in #1644
  • dreambooth: fix #1566: maintain fp32 wrapper when saving a checkpoint to avoid crash when running fp16 by @timh in #1618
  • Allow k pipeline to generate > 1 images by @pcuenca in #1645
  • Remove unnecessary offset in img2img by @patrickvonplaten in #1653
  • Remove unnecessary kwargs in depth2img by @maruel in #1648
  • Add text encoder conversion by @lawfordp2017 in #1559
  • VersatileDiffusion: fix input processing by @LukasStruppek in #1568
  • tensor format ort bug fix by @prathikr in #1557
  • Deprecate init image correctly by @patrickvonplaten in #1649
  • fix bug if we don't do_classifier_free_guidance by @MKFMIKU in #1601
  • Handle missing global_step key in scripts/convert_original_stable_diffusion_to_diffusers.py by @Cyberes in #1612
  • [SD] Make sure scheduler is correct when converting by @patrickvonplaten in #1667
  • [Textual Inversion] Do not update other embeddings by @patrickvonplaten in #1665
  • Added Community pipeline for comparing Stable Diffusion v1.1-4 checkpoints by @suvadityamuk in #1584
  • Fix wrong type checking in convert_diffusers_to_original_stable_diffusion.py by @apolinario in #1681
  • [Version] Bump to 0.11.0.dev0 by @patrickvonplaten in #1682
  • Dreambooth: save / restore training state by @pcuenca in #1668
  • Disable telemetry when DISABLE_TELEMETRY is set by @w4ffl35 in #1686
  • Change one-step dummy pipeline for testing by @patrickvonplaten in #1690
  • [Community pipeline] Add github mechanism by @patrickvonplaten in #1680
  • Dreambooth: use warnings instead of logger in parse_args() by @pcuenca in #1688
  • manually update train_unconditional_ort by @prathikr in #1694
  • Remove all local telemetry by @anton-l in #1702
  • Update main docs by @patrickvonplaten in #1706
  • [Readme] Clarify package owners by @anton-l in #1707
  • Fix the bug that torch version less than 1.12 throws TypeError by @chinoll in #1671
  • RePaint fast tests and API conforming by @anton-l in #1701
  • Add state checkpointing to other training scripts by @pcuenca in #1687
  • Improve pipeline_stable_diffusion_inpaint_legacy.py by @cyber-meow in #1585
  • apply amp bf16 on textual inversion by @jiqing-feng in #1465
  • Add examples with Intel optimizations by @hshen14 in #1579
  • Added a README page for docs and a "schedulers" page by @yiyixuxu in #1710
  • Accept latents as optional input in Latent Diffusion pipeline by @daspartho in #1723
  • Fix ONNX img2img preprocessing and add fast tests coverage by @anton-l in #1727
  • Fix ldm tests on master by not running the CPU tests on GPU by @patrickvonplaten in #1729
  • Docs: recommend xformers by @pcuenca in #1724
  • Nightly integration tests by @anton-l in #1664
  • [Batched Generators] This PR adds generators that are useful to make batched generation fully reproducible by @patrickvonplaten in #1718
  • Fix ONNX img2img preprocessing by @peterto in #1736
  • Fix MPS fast test warnings by @anton-l in #1744
  • Fix/update the LDM pipeline and tests by @anton-l in #1743
  • kakaobrain unCLIP by @williamberman in #1428
  • [fix] pipeline_unclip generator by @williamberman in #1751
  • unCLIP docs by @williamberman in #1754
  • Correct help text for scheduler_type flag in scripts. by @msiedlarek in #1749
  • Add resnet_time_scale_shift to VD layers by @anton-l in #1757
  • Add attention mask to unCLIP by @patrickvonplaten in #1756
  • Support attn2==None for xformers by @anton-l in #1759
  • [UnCLIPPipeline] fix num_images_per_prompt by @patil-suraj in #1762
  • Add CPU offloading to UnCLIP by @anton-l in #1761
  • [Versatile] fix attention mask by @patrickvonplaten in #1763
  • [Revision] Don't recommend using revision by @patrickvonplaten in #1764
  • [Examples] Update train_unconditional.py to include logging argument for Wandb by @ash0ts in #1719
  • Transformers version req for UnCLIP by @anton-l in #1766

- Python
Published by anton-l about 3 years ago

diffusers - v0.10.2: Patch release

This patch removes the hard requirement for transformers>=4.25.1 in case external libraries were downgrading the library upon startup in a non-controllable way.

  • do not automatically enable xformers by @patrickvonplaten in #1640
  • Adapt to forced transformers version in some dependent libraries by @anton-l in #1638
  • Re-add xformers enable to UNet2DCondition by @patrickvonplaten in #1627

🚨🚨🚨 Note that xformers is not automatically enabled anymore 🚨🚨🚨

The reasons for this are given here: https://github.com/huggingface/diffusers/pull/1640#discussion_r1044651551:

We should not automatically enable xformers for three reasons:

1. It's not a PyTorch-like API. PyTorch doesn't enable all the fastest options available by default.
2. We allocate GPU memory before the user even does .to("cuda").
3. This behavior is not consistent with cases where xformers is not installed.

=> This means: if you relied on xformers being enabled automatically, please make sure to add the following now:

```python
from diffusers.utils.import_utils import is_xformers_available

unet = ...  # load unet

if is_xformers_available():
    try:
        unet.enable_xformers_memory_efficient_attention(True)
    except Exception as e:
        logger.warning(
            "Could not enable memory efficient attention. Make sure xformers is installed"
            f" correctly and a GPU is available: {e}"
        )
```

for the UNet (e.g. in dreambooth) or for the pipeline:

```py
from diffusers.utils.import_utils import is_xformers_available

pipe = ...  # load pipeline

if is_xformers_available():
    try:
        pipe.enable_xformers_memory_efficient_attention(True)
    except Exception as e:
        logger.warning(
            "Could not enable memory efficient attention. Make sure xformers is installed"
            f" correctly and a GPU is available: {e}"
        )
```

- Python
Published by anton-l about 3 years ago

diffusers - v0.10.1: Patch release

This patch returns enable_xformers_memory_efficient_attention() to UNet2DCondition to restore backward compatibility.

  • Re-add xformers enable to UNet2DCondition by @patrickvonplaten in #1627

- Python
Published by anton-l about 3 years ago

diffusers - v0.10.0: Depth Guidance and Safer Checkpoints

🐳 Depth-Guided Stable Diffusion and 2.1 checkpoints

The new depth-guided stable diffusion model is fully supported in this release. The model is conditioned on monocular depth estimates inferred via MiDaS and can be used for structure-preserving img2img and shape-conditional synthesis.

image

Installing the transformers library from source is required for the MiDaS model:

```bash
pip install --upgrade git+https://github.com/huggingface/transformers/
```

```python
import torch
import requests
from PIL import Image
from diffusers import StableDiffusionDepth2ImgPipeline

pipe = StableDiffusionDepth2ImgPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-depth",
    torch_dtype=torch.float16,
).to("cuda")

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
init_image = Image.open(requests.get(url, stream=True).raw)

prompt = "two tigers"
n_prompt = "bad, deformed, ugly, bad anatomy"
image = pipe(prompt=prompt, image=init_image, negative_prompt=n_prompt, strength=0.7).images[0]
```

The updated Stable Diffusion 2.1 checkpoints are also released and fully supported:

* https://huggingface.co/stabilityai/stable-diffusion-2-1
* https://huggingface.co/stabilityai/stable-diffusion-2-1-base

:safety_vest: Safe Tensors

We now support SafeTensors: a new simple format for storing tensors safely (as opposed to pickle) that is still fast (zero-copy).

* [Proposal] Support loading from safetensors if file is present. by @Narsil in #1357
* [Proposal] Support saving to safetensors by @MatthieuBizien in #1494

| Format | Safe | Zero-copy | Lazy loading | No file size limit | Layout control | Flexibility | Bfloat16 |
| ----------------------- | --- | --- | --- | --- | --- | --- | --- |
| pickle (PyTorch) | ✗ | ✗ | ✗ | ✓ | ✗ | ✓ | ✓ |
| H5 (Tensorflow) | ✓ | ✗ | ✓ | ✓ | ~ | ~ | ✗ |
| SavedModel (Tensorflow) | ✓ | ✗ | ✗ | ✓ | ✓ | ✗ | ✓ |
| MsgPack (flax) | ✓ | ✓ | ✗ | ✓ | ✗ | ✗ | ✓ |
| SafeTensors | ✓ | ✓ | ✓ | ✓ | ✓ | ✗ | ✓ |

More details about the comparison here: https://github.com/huggingface/safetensors#yet-another-format

```bash
pip install safetensors
```

```python
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-2-1")
pipe.save_pretrained("./safe-stable-diffusion-2-1", safe_serialization=True)

# you can also push this checkpoint to the HF Hub and load from there
safe_pipe = StableDiffusionPipeline.from_pretrained("./safe-stable-diffusion-2-1")
```

New Pipelines

:paintbrush: Paint-by-example

An implementation of Paint by Example: Exemplar-based Image Editing with Diffusion Models by Binxin Yang, Shuyang Gu, Bo Zhang, Ting Zhang, Xuejin Chen, Xiaoyan Sun, Dong Chen, Fang Wen

* Add paint by example by @patrickvonplaten in #1533

image

```python import PIL import requests import torch from io import BytesIO from diffusers import DiffusionPipeline

def download_image(url): response = requests.get(url) return PIL.Image.open(BytesIO(response.content)).convert("RGB")

imgurl = "https://raw.githubusercontent.com/Fantasy-Studio/Paint-by-Example/main/examples/image/example1.png" maskurl = "https://raw.githubusercontent.com/Fantasy-Studio/Paint-by-Example/main/examples/mask/example1.png" exampleurl = "https://raw.githubusercontent.com/Fantasy-Studio/Paint-by-Example/main/examples/reference/example1.jpg"

initimage = downloadimage(imgurl).resize((512, 512)) maskimage = downloadimage(maskurl).resize((512, 512)) exampleimage = downloadimage(example_url).resize((512, 512))

pipe = DiffusionPipeline.frompretrained("Fantasy-Studio/Paint-by-Example", torchdtype=torch.float16) pipe = pipe.to("cuda")

image = pipe(image=initimage, maskimage=maskimage, exampleimage=example_image).images[0] ```

Audio Diffusion and Latent Audio Diffusion

Audio Diffusion leverages the recent advances in image generation using diffusion models by converting audio samples to and from mel spectrogram images.

* add AudioDiffusionPipeline and LatentAudioDiffusionPipeline #1334 by @teticio in #1426

```python
from IPython.display import Audio
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained("teticio/audio-diffusion-ddim-256").to("cuda")

output = pipe()
display(output.images[0])
display(Audio(output.audios[0], rate=pipe.mel.get_sample_rate()))
```

[Experimental] K-Diffusion pipeline for Stable Diffusion

This pipeline is added to support the latest schedulers from @crowsonkb's k-diffusion. The purpose of this pipeline is to compare scheduler implementations and updates, so new features from other pipelines are unlikely to be supported!

  • [K Diffusion] Add k diffusion sampler natively by @patrickvonplaten in #1603

```bash
pip install k-diffusion
```

```python
from diffusers import StableDiffusionKDiffusionPipeline
import torch

pipe = StableDiffusionKDiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-2-1-base")
pipe = pipe.to("cuda")

pipe.set_scheduler("sample_heun")
image = pipe("astronaut riding horse", num_inference_steps=25).images[0]
```

New Schedulers

Heun scheduler inspired by Karras et al.

Algorithm 1 of Karras et al. Scheduler ported from @crowsonkb's k-diffusion.

  • Add 2nd order heun scheduler by @patrickvonplaten in #1336

```python
from diffusers import StableDiffusionPipeline, HeunDiscreteScheduler

pipe = StableDiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-2-1")
pipe.scheduler = HeunDiscreteScheduler.from_config(pipe.scheduler.config)
```

Single step DPM-Solver

Original paper can be found here and the improved version here. The original implementation can be found here.

* Add Singlestep DPM-Solver (singlestep high-order schedulers) by @LuChengTHU in #1442

```python
from diffusers import StableDiffusionPipeline, DPMSolverSinglestepScheduler

pipe = StableDiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-2-1")
pipe.scheduler = DPMSolverSinglestepScheduler.from_config(pipe.scheduler.config)
```

:memo: Changelog

  • [Proposal] Support loading from safetensors if file is present. by @Narsil in #1357
  • Hotfix for AttributeErrors in OnnxStableDiffusionInpaintPipelineLegacy by @anton-l in #1448
  • Speed up test and remove kwargs from call by @patrickvonplaten in #1446
  • v-prediction training support by @patil-suraj in #1455
  • Fix Flax from_pt by @pcuenca in #1436
  • Ensure Flax pipeline always returns numpy array by @pcuenca in #1435
  • Add 2nd order heun scheduler by @patrickvonplaten in #1336
  • fix slow tests by @patrickvonplaten in #1467
  • Flax support for Stable Diffusion 2 by @pcuenca in #1423
  • Updates Image to Image Inpainting community pipeline README by @vvvm23 in #1370
  • StableDiffusion: Decode latents separately to run larger batches by @kig in #1150
  • Fix bug in half precision for DPMSolverMultistepScheduler by @rtaori in #1349
  • [Train unconditional] Unwrap model before EMA by @anton-l in #1469
  • Add ort_nightly_directml to the onnxruntime candidates by @anton-l in #1458
  • Allow saving trained betas by @patrickvonplaten in #1468
  • Fix dtype model loading by @patrickvonplaten in #1449
  • [Dreambooth] Make compatible with alt diffusion by @patrickvonplaten in #1470
  • Add better docs xformers by @patrickvonplaten in #1487
  • Remove reminder comment by @pcuenca in #1489
  • Bump to 0.10.0.dev0 + deprecations by @anton-l in #1490
  • Add doc for Stable Diffusion on Habana Gaudi by @regisss in #1496
  • Replace deprecated hub utils in train_unconditional_ort by @anton-l in #1504
  • [Deprecate] Correct stacklevel by @patrickvonplaten in #1483
  • simplify AttentionBlock by @patil-suraj in #1492
  • Standardize on using image argument in all pipelines by @fboulnois in #1361
  • support v prediction in other schedulers by @patil-suraj in #1505
  • Fix Flax flip_sin_to_cos by @akashgokul in #1369
  • Add an explicit --image_size to the conversion script by @anton-l in #1509
  • fix heun scheduler by @patil-suraj in #1512
  • [docs] [dreambooth training] accelerate.utils.write_basic_config by @williamberman in #1513
  • [docs] [dreambooth training] num_class_images clarification by @williamberman in #1508
  • [From pretrained] Allow returning local path by @patrickvonplaten in #1450
  • Update conversion script to correctly handle SD 2 by @patrickvonplaten in #1511
  • [refactor] Making the xformers mem-efficient attention activation recursive by @blefaudeux in #1493
  • Do not use torch.long in mps by @pcuenca in #1488
  • Fix Imagic example by @dhruvrnaik in #1520
  • Fix training docs to install datasets by @pedrogengo in #1476
  • Finalize 2nd order schedulers by @patrickvonplaten in #1503
  • Fixed mask+masked_image in sd inpaint pipeline by @antoche in #1516
  • Create train_dreambooth_inpaint.py by @thedarkzeno in #1091
  • Update FlaxLMSDiscreteScheduler by @dzlab in #1474
  • [Proposal] Support saving to safetensors by @MatthieuBizien in #1494
  • Add xformers attention to VAE by @kig in #1507
  • [CI] Add slow MPS tests by @anton-l in #1104
  • [Stable Diffusion Inpaint] Allow tensor as input image & mask by @patrickvonplaten in #1527
  • Compute embedding distances with torch.cdist by @blefaudeux in #1459
  • [Upscaling] Fix batch size by @patrickvonplaten in #1525
  • Update bug-report.yml by @patrickvonplaten in #1548
  • [Community Pipeline] Checkpoint Merger based on Automatic1111 by @Abhinay1997 in #1472
  • [textual_inversion] Add an option for only saving the embeddings by @allo- in #781
  • [examples] use from_pretrained to load scheduler by @patil-suraj in #1549
  • fix mask discrepancies in train_dreambooth_inpaint by @thedarkzeno in #1529
  • [refactor] make set_attention_slice recursive by @patil-suraj in #1532
  • Research folder by @patrickvonplaten in #1553
  • add AudioDiffusionPipeline and LatentAudioDiffusionPipeline #1334 by @teticio in #1426
  • [Community download] Fix cache dir by @patrickvonplaten in #1555
  • [Docs] Correct docs by @patrickvonplaten in #1554
  • Fix typo by @pcuenca in #1558
  • [docs] [dreambooth training] default accelerate config by @williamberman in #1564
  • Mega community pipeline by @patrickvonplaten in #1561
  • [examples] add check_min_version by @patil-suraj in #1550
  • [dreambooth] make collate_fn global by @patil-suraj in #1547
  • Standardize fast pipeline tests with PipelineTestMixin by @anton-l in #1526
  • Add paint by example by @patrickvonplaten in #1533
  • [Community Pipeline] fix lpw_stable_diffusion by @SkyTNT in #1570
  • [Paint by Example] Better default for image width by @patrickvonplaten in #1587
  • Add from_pretrained telemetry by @anton-l in #1461
  • Correct order height & width in pipeline_paint_by_example.py by @Fantasy-Studio in #1589
  • Fix common tests for FP16 by @anton-l in #1588
  • [UNet2DConditionModel] add an option to upcast attention to fp32 by @patil-suraj in #1590
  • Flax: avoid recompilation when params change by @pcuenca in #1096
  • Add Singlestep DPM-Solver (singlestep high-order schedulers) by @LuChengTHU in #1442
  • fix upcast in slice attention by @patil-suraj in #1591
  • Update scheduling_repaint.py by @Randolph-zeng in #1582
  • Update RL docs for better sharing / adding models by @natolambert in #1563
  • Make cross-attention check more robust by @pcuenca in #1560
  • [ONNX] Fix flaky tests by @anton-l in #1593
  • Trivial fix for undefined symbol in train_dreambooth.py by @bcsherma in #1598
  • [K Diffusion] Add k diffusion sampler natively by @patrickvonplaten in #1603
  • [Versatile Diffusion] add upcast_attention by @patil-suraj in #1605
  • Fix PyCharm/VSCode static type checking for dummy objects by @anton-l in #1596

- Python
Published by anton-l about 3 years ago

diffusers - v0.9.0: Stable Diffusion 2

:art: Stable Diffusion 2 is here!

Installation

pip install diffusers[torch]==0.9 transformers

Stable Diffusion 2.0 is available in several flavors:

Stable Diffusion 2.0-V at 768x768

New stable diffusion model (Stable Diffusion 2.0-v) at 768x768 resolution. Same number of parameters in the U-Net as 1.5, but uses OpenCLIP-ViT/H as the text encoder and is trained from scratch. SD 2.0-v is a so-called v-prediction model.


```python
import torch
from diffusers import DiffusionPipeline, DPMSolverMultistepScheduler

repo_id = "stabilityai/stable-diffusion-2"
pipe = DiffusionPipeline.from_pretrained(repo_id, torch_dtype=torch.float16, revision="fp16")
pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)
pipe = pipe.to("cuda")

prompt = "High quality photo of an astronaut riding a horse in space"
image = pipe(prompt, guidance_scale=9, num_inference_steps=25).images[0]
image.save("astronaut.png")
```

Stable Diffusion 2.0-base at 512x512

The above model is finetuned from SD 2.0-base, which was trained as a standard noise-prediction model on 512x512 images and is also made available.


```python
import torch
from diffusers import DiffusionPipeline, DPMSolverMultistepScheduler

repo_id = "stabilityai/stable-diffusion-2-base"
pipe = DiffusionPipeline.from_pretrained(repo_id, torch_dtype=torch.float16, revision="fp16")
pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)
pipe = pipe.to("cuda")

prompt = "High quality photo of an astronaut riding a horse in space"
image = pipe(prompt, num_inference_steps=25).images[0]
image.save("astronaut.png")
```

Stable Diffusion 2.0 for Inpainting

This model for text-guided inpainting is finetuned from SD 2.0-base. It follows the mask-generation strategy presented in LaMa which, in combination with the latent VAE representations of the masked image, is used as additional conditioning.


```python
import PIL
import requests
import torch
from io import BytesIO
from diffusers import DiffusionPipeline, DPMSolverMultistepScheduler

def download_image(url):
    response = requests.get(url)
    return PIL.Image.open(BytesIO(response.content)).convert("RGB")

img_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo.png"
mask_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo_mask.png"
init_image = download_image(img_url).resize((512, 512))
mask_image = download_image(mask_url).resize((512, 512))

repo_id = "stabilityai/stable-diffusion-2-inpainting"
pipe = DiffusionPipeline.from_pretrained(repo_id, torch_dtype=torch.float16, revision="fp16")
pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)
pipe = pipe.to("cuda")

prompt = "Face of a yellow cat, high resolution, sitting on a park bench"
image = pipe(prompt=prompt, image=init_image, mask_image=mask_image, num_inference_steps=25).images[0]
image.save("yellow_cat.png")
```

Stable Diffusion X4 Upscaler

The model was trained on crops of size 512x512 and is a text-guided latent upscaling diffusion model. In addition to the textual input, it receives a noise_level as an input parameter, which can be used to add noise to the low-resolution input according to a predefined diffusion schedule.


```python
import requests
from PIL import Image
from io import BytesIO
from diffusers import StableDiffusionUpscalePipeline
import torch

model_id = "stabilityai/stable-diffusion-x4-upscaler"
pipeline = StableDiffusionUpscalePipeline.from_pretrained(model_id, revision="fp16", torch_dtype=torch.float16)
pipeline = pipeline.to("cuda")

url = "https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/sd2-upscale/low_res_cat.png"
response = requests.get(url)
low_res_img = Image.open(BytesIO(response.content)).convert("RGB")
low_res_img = low_res_img.resize((128, 128))

prompt = "a white cat"
upscaled_image = pipeline(prompt=prompt, image=low_res_img).images[0]
upscaled_image.save("upsampled_cat.png")
```

Saving & Loading is fixed for Versatile Diffusion

Previously there was a :bug: when saving & loading versatile diffusion - this is fixed now so that memory efficient saving & loading works as expected.

  • [Versatile Diffusion] Fix remaining tests by @patrickvonplaten in #1418

:memo: Changelog

  • add v prediction by @patil-suraj in #1386
  • Adapt UNet2D for supre-resolution by @patil-suraj in #1385
  • Version 0.9.0.dev0 by @anton-l in #1394
  • Make height and width optional by @patrickvonplaten in #1401
  • [Config] Add optional arguments by @patrickvonplaten in #1395
  • Upscaling fixed by @patrickvonplaten in #1402
  • Add the new SD2 attention params to the VD text unet by @anton-l in #1400
  • Deprecate sample size by @patrickvonplaten in #1406
  • Support SD2 attention slicing by @anton-l in #1397
  • Add SD2 inpainting integration tests by @anton-l in #1412
  • Fix sample size conversion script by @patrickvonplaten in #1408
  • fix clip guided by @patrickvonplaten in #1414
  • Fix all stable diffusion by @patrickvonplaten in #1415
  • [MPS] call contiguous after permute by @kashif in #1411
  • Deprecate predict_epsilon by @pcuenca in #1393
  • Fix ONNX conversion and inference by @anton-l in #1416
  • Allow to set config params directly in init by @patrickvonplaten in #1419
  • Add tests for Stable Diffusion 2 V-prediction 768x768 by @anton-l in #1420
  • StableDiffusionUpscalePipeline by @patil-suraj in #1396
  • added initial v-pred support to DPM-solver by @kashif in #1421
  • SD2 docs by @patrickvonplaten in #1424

- Python
Published by anton-l over 3 years ago

diffusers - v0.8.1: Patch release

This patch release fixes an error with CLIPVisionModelWithProjection imports on a non-git transformers installation.

:warning: Please upgrade with pip install --upgrade diffusers or pip install diffusers==0.8.1

  • [Bad dependencies] Fix imports (https://github.com/huggingface/diffusers/pull/1382) by @patrickvonplaten

- Python
Published by anton-l over 3 years ago

diffusers - v0.8.0: Versatile Diffusion - Text, Images and Variations All in One Diffusion Model

🙆‍♀️ New Models

VersatileDiffusion

VersatileDiffusion, released by SHI-Labs, is a unified multi-flow multimodal diffusion model capable of multiple tasks such as text2image, image variations, dual-guided (text + image) image generation, and image2text.

  • [Versatile Diffusion] Add versatile diffusion model by @patrickvonplaten @anton-l #1283

Make sure to install transformers from "main":

```bash
pip install git+https://github.com/huggingface/transformers
```

Then you can run:

```python
from diffusers import VersatileDiffusionPipeline
import torch
import requests
from io import BytesIO
from PIL import Image

pipe = VersatileDiffusionPipeline.from_pretrained("shi-labs/versatile-diffusion", torch_dtype=torch.float16)
pipe = pipe.to("cuda")

# initial image
url = "https://huggingface.co/datasets/diffusers/images/resolve/main/benz.jpg"
response = requests.get(url)
image = Image.open(BytesIO(response.content)).convert("RGB")

# prompt
prompt = "a red car"

# text to image
image = pipe.text_to_image(prompt).images[0]

# image variation
image = pipe.image_variation(image).images[0]

# dual-guided (text + image) generation
image = pipe.dual_guided(prompt, image).images[0]
```

More in-depth details can be found on:

  • Model card
  • Docs

AltDiffusion

AltDiffusion is a multilingual latent diffusion model that supports text-to-image generation for 9 different languages: English, Chinese, Spanish, French, Japanese, Korean, Arabic, Russian and Italian.

  • Add AltDiffusion by @patrickvonplaten @patil-suraj #1299

Stable Diffusion Image Variations

StableDiffusionImageVariationPipeline by @justinpinkney is a stable diffusion model that takes an image as an input and generates variations of that image. It is conditioned on CLIP image embeddings instead of text.

  • StableDiffusionImageVariationPipeline by @patil-suraj #1365

Safe Latent Diffusion

Safe Latent Diffusion (SLD), released by ml-research@TUDarmstadt group, is a new practical and sophisticated approach to prevent unsolicited content from being generated by diffusion models. One of the authors of the research contributed their implementation to diffusers.

  • Add Safe Stable Diffusion Pipeline by @manuelbrack #1244

VQ-Diffusion with classifier-free sampling

  • vq diffusion classifier free sampling by @williamberman #1294

LDM super resolution

LDM super resolution is a latent 4x super-resolution diffusion model released by CompVis.

  • Add LDM Super Resolution pipeline by @duongna21 #1116

CycleDiffusion

CycleDiffusion is a method that uses Text-to-Image Diffusion Models for Image-to-Image Editing. It is capable of

  1. Zero-shot image-to-image translation with text-to-image diffusion models such as Stable Diffusion.
  2. Traditional unpaired image-to-image translation with diffusion models trained on two related domains.
  • Add CycleDiffusion pipeline using Stable Diffusion by @ChenWu98 #888

CLIPSeg + StableDiffusionInpainting

Uses CLIPSeg to automatically generate a mask using segmentation, and then applies Stable Diffusion in-painting.

K-Diffusion wrapper

The K-Diffusion Pipeline is a community pipeline that allows using any sampler from K-diffusion with diffusers models.

  • [Community Pipelines] K-Diffusion Pipeline by @patrickvonplaten #1360

🌀New SOTA Scheduler

DPMSolverMultistepScheduler is the 🧨 diffusers implementation of DPM-Solver++, a state-of-the-art scheduler that was contributed by one of the authors of the paper. This scheduler is able to achieve great quality in as few as 20 steps. It's a drop-in replacement for the default Stable Diffusion scheduler, so you can use it to essentially halve generation times. It works so well that we adopted it for the Stable Diffusion demo Spaces: https://huggingface.co/spaces/stabilityai/stable-diffusion, https://huggingface.co/spaces/runwayml/stable-diffusion-v1-5.

You can use it like this:

```python
from diffusers import DiffusionPipeline, DPMSolverMultistepScheduler

repo_id = "runwayml/stable-diffusion-v1-5"
scheduler = DPMSolverMultistepScheduler.from_pretrained(repo_id, subfolder="scheduler")
stable_diffusion = DiffusionPipeline.from_pretrained(repo_id, scheduler=scheduler)
```

🌐 Better scheduler API

The example above also demonstrates how to load schedulers using a new API that is coherent with model loading and therefore more natural and intuitive.

You can load a scheduler using from_pretrained, as demonstrated above, or you can instantiate one from an existing scheduler configuration. This is a way to replace the scheduler of a pipeline that was previously loaded:

```python
from diffusers import DiffusionPipeline, DDIMScheduler

pipeline = DiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
pipeline.scheduler = DDIMScheduler.from_config(pipeline.scheduler.config)
```

Read more about these changes in the documentation. See also the community pipeline that allows using any of the K-diffusion samplers with diffusers, as mentioned above!

🎉 Performance

We work relentlessly to incorporate performance optimizations and memory reduction techniques into 🧨 diffusers. These are two of the most noteworthy incorporations in this release:

  • Enable memory-efficient attention by default if xFormers is installed.
  • Use batched-matmuls when possible.

🎁 Quality of Life improvements

  • Fix/Enable all schedulers for in-painting
  • Easier loading of local pipelines
  • cpu offloading: multi-GPU support

:memo: Changelog

  • Add multistep DPM-Solver discrete scheduler by @LuChengTHU in #1132
  • Remove warning about half precision on MPS by @pcuenca in #1163
  • Fix typo latens -> latents by @duongna21 in #1171
  • Fix community pipeline links by @pcuenca in #1162
  • [Docs] Add loading script by @patrickvonplaten in #1174
  • Fix dtype safety checker inpaint legacy by @patrickvonplaten in #1137
  • Community pipeline img2img inpainting by @vvvm23 in #1114
  • [Community Pipeline] Add multilingual stable diffusion to community pipelines by @juancopi81 in #1142
  • [Flax examples] Load text encoder from subfolder by @duongna21 in #1147
  • Link to Dreambooth blog post instead of W&B report by @pcuenca in #1180
  • Fix small typo by @pcuenca in #1178
  • [DDIMScheduler] fix noise device in ddim step by @patil-suraj in #1189
  • MPS schedulers: don't use float64 by @pcuenca in #1169
  • Warning for invalid options without "--with_prior_preservation" by @shirayu in #1065
  • [ONNX] Improve ONNXPipeline scheduler compatibility, fix safety_checker by @anton-l in #1173
  • Restore compatibility with deprecated StableDiffusionOnnxPipeline by @pcuenca in #1191
  • Update pr docs actions by @mishig25 in #1194
  • handle dtype xformers attention by @patil-suraj in #1196
  • [Scheduler] Move predict epsilon to init by @patrickvonplaten in #1155
  • add licenses to pipelines by @natolambert in #1201
  • Fix cpu offloading by @anton-l in #1177
  • Fix slow tests by @patrickvonplaten in #1210
  • [Flax] fix extra copy pasta 🍝 by @camenduru in #1187
  • [CLIPGuidedStableDiffusion] support DDIM scheduler by @patil-suraj in #1190
  • Fix layer names convert LDM script by @duongna21 in #1206
  • [Loading] Make sure loading edge cases work by @patrickvonplaten in #1192
  • Add LDM Super Resolution pipeline by @duongna21 in #1116
  • [Conversion] Improve conversion script by @patrickvonplaten in #1218
  • DDIM docs by @patrickvonplaten in #1219
  • apply repeat_interleave fix for mps to stable diffusion image2image pipeline by @jncasey in #1135
  • Flax tests: don't hardcode number of devices by @pcuenca in #1175
  • Improve documentation for the LPW pipeline by @exo-pla-net in #1182
  • Factor out encode text with Copied from by @patrickvonplaten in #1224
  • Match the generator device to the pipeline for DDPM and DDIM by @anton-l in #1222
  • [Tests] Fix mps+generator fast tests by @anton-l in #1230
  • [Tests] Adjust TPU test values by @anton-l in #1233
  • Add a reference to the name 'Sampler' by @apolinario in #1172
  • Fix Flax usage comments by @pcuenca in #1211
  • [Docs] improve img2img example by @ruanrz in #1193
  • [Stable Diffusion] Fix padding / truncation by @patrickvonplaten in #1226
  • Finalize stable diffusion refactor by @patrickvonplaten in #1269
  • Edited attention.py for older xformers by @Lime-Cakes in #1270
  • Fix wrong link in text2img fine-tuning documentation by @daspartho in #1282
  • [StableDiffusionInpaintPipeline] fix batch_size for mask and masked latents by @patil-suraj in #1279
  • Add UNet 1d for RL model for planning + colab by @natolambert in #105
  • Fix documentation typo for UNet2DModel and UNet2DConditionModel by @xenova in #1275
  • add source link to composable diffusion model by @nanliu1 in #1293
  • Fix incorrect link to Stable Diffusion notebook by @dhruvrnaik in #1291
  • [dreambooth] link to bitsandbytes readme for installation by @0xdevalias in #1229
  • Add Scheduler.from_pretrained and better scheduler changing by @patrickvonplaten in #1286
  • Add AltDiffusion by @patrickvonplaten in #1299
  • Better error message for transformers dummy by @patrickvonplaten in #1306
  • Revert "Update pr docs actions" by @mishig25 in #1307
  • [AltDiffusion] add tests by @patil-suraj in #1311
  • Add improved handling of pil by @patrickvonplaten in #1309
  • cpu offloading: mutli GPU support by @dblunk88 in #1143
  • vq diffusion classifier free sampling by @williamberman in #1294
  • doc string args shape fix by @kamalkraj in #1243
  • [Community Pipeline] CLIPSeg + StableDiffusionInpainting by @unography in #1250
  • Temporary local test for PIL_INTERPOLATION by @pcuenca in #1317
  • Fix gpu_id by @anton-l in #1326
  • integrate ort by @prathikr in #1110
  • [Custom pipeline] Easier loading of local pipelines by @patrickvonplaten in #1327
  • [ONNX] Support Euler schedulers by @anton-l in #1328
  • img2text Typo by @patrickvonplaten in #1329
  • add docs for multi-modal examples by @natolambert in #1227
  • [Flax] Fix loading scheduler from subfolder by @skirsten in #1319
  • Fix/Enable all schedulers for in-painting by @patrickvonplaten in #1331
  • Correct path to schedlure by @patrickvonplaten in #1322
  • Avoid nested fix-copies by @anton-l in #1332
  • Fix img2img speed with LMS-Discrete Scheduler by @NotNANtoN in #896
  • Fix the order of casts for onnx inpainting by @anton-l in #1338
  • Legacy Inpainting Pipeline for Onnx Models by @ctsims in #1237
  • Jax infer support negative prompt by @entrpn in #1337
  • Update README.md: IMAGIC example code snippet misspelling by @ki-arie in #1346
  • Update README.md: Minor change to Imagic code snippet, missing dir error by @ki-arie in #1347
  • Handle batches and Tensors in pipeline_stable_diffusion_inpaint.py:prepare_mask_and_masked_image by @vict0rsch in #1003
  • change the sample model by @shunxing1234 in #1352
  • Add bit diffusion [WIP] by @kingstut in #971
  • perf: prefer batched matmuls for attention by @Birch-san in #1203
  • [Community Pipelines] K-Diffusion Pipeline by @patrickvonplaten in #1360
  • Add Safe Stable Diffusion Pipeline by @manuelbrack in #1244
  • [examples] fix mixed_precision arg by @patil-suraj in #1359
  • use memory_efficient_attention by default by @patil-suraj in #1354
  • Replace logger.warn by logger.warning by @regisss in #1366
  • Fix using non-square images with UNet2DModel and DDIM/DDPM pipelines by @jenkspt in #1289
  • handle fp16 in UNet2DModel by @patil-suraj in #1216
  • StableDiffusionImageVariationPipeline by @patil-suraj in #1365

- Python
Published by patil-suraj over 3 years ago

diffusers - v0.7.2: Patch release

This patch release fixes a bug that broke Flax Stable Diffusion inference. Thanks a mille for spotting it @camenduru in https://github.com/huggingface/diffusers/issues/1145 and thanks a lot to @pcuenca and @kashif for fixing it in https://github.com/huggingface/diffusers/pull/1149

  • Flax: Flip sin to cos in time embeddings #1149 by @pcuenca

- Python
Published by patrickvonplaten over 3 years ago

diffusers - v0.7.1: Patch release

This patch release makes accelerate a soft dependency to avoid an error when installing diffusers alongside an existing torch installation.

  • Move accelerate to a soft-dependency #1134 by @patrickvonplaten

- Python
Published by anton-l over 3 years ago

diffusers - v0.7.0: Optimized for Apple Silicon, Improved Performance, Awesome Community

:heart: PyTorch + Accelerate

:warning: The PyTorch pipelines now require accelerate for improved model loading times! Install Diffusers with pip install --upgrade diffusers[torch] to get everything in a single command.

🍎 Apple Silicon support with PyTorch 1.13

PyTorch and Apple have been working on improving mps support in PyTorch 1.13, so Apple Silicon is now a first-class citizen in diffusers 0.7.0!

Requirements

  • Mac computer with Apple silicon (M1/M2) hardware.
  • macOS 12.6 or later (13.0 or later recommended, as support is even better).
  • arm64 version of Python.
  • PyTorch 1.13.0 official release, installed from pip or the conda channels.

Memory efficient generation

Memory management is crucial to achieve fast generation speed. We recommend always using attention slicing on Apple Silicon, as it drastically reduces memory pressure and prevents paging or swapping. This is especially important for computers with less than 64 GB of Unified RAM, and may be the difference between generating an image in seconds rather than minutes. Use it like this:

```python
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
pipe = pipe.to("mps")

# Recommended if your computer has < 64 GB of RAM
pipe.enable_attention_slicing()

prompt = "a photo of an astronaut riding a horse on mars"

# First-time "warmup" pass
_ = pipe(prompt, num_inference_steps=1)

image = pipe(prompt).images[0]
image.save("astronaut.png")
```

Continuous Integration

Our automated tests now include a full battery of tests on the mps device. This will be helpful to identify issues early and ensure the quality on Apple Silicon going forward.

See more details in the documentation.

💃 Dance Diffusion

diffusers goes audio 🎵 Dance Diffusion by Harmonai is the first audio model in 🧨Diffusers!

  • [Dance Diffusion] Add dance diffusion by @patrickvonplaten #803

Try it out to generate some random music:

```python
from diffusers import DiffusionPipeline
import scipy

model_id = "harmonai/maestro-150k"
pipeline = DiffusionPipeline.from_pretrained(model_id)
pipeline = pipeline.to("cuda")

audio = pipeline(audio_length_in_s=4.0).audios[0]

# To save locally
scipy.io.wavfile.write("maestro_test.wav", pipeline.unet.sample_rate, audio.transpose())
```

🎉 Euler schedulers

These are the Euler schedulers, from the paper Elucidating the Design Space of Diffusion-Based Generative Models by Karras et al. (2022). The diffusers implementation is based on the original k-diffusion implementation by Katherine Crowson. The Euler schedulers are fast, often generating really good outputs with 20-30 steps.

  • k-diffusion-euler by @hlky #1019

```python
import torch
from diffusers import StableDiffusionPipeline, EulerDiscreteScheduler

euler_scheduler = EulerDiscreteScheduler.from_config("runwayml/stable-diffusion-v1-5", subfolder="scheduler")
pipeline = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", scheduler=euler_scheduler, revision="fp16", torch_dtype=torch.float16
)
pipeline.to("cuda")

prompt = "a photo of an astronaut riding a horse on mars"
image = pipeline(prompt, num_inference_steps=25).images[0]
```

```python
import torch
from diffusers import StableDiffusionPipeline, EulerAncestralDiscreteScheduler

euler_ancestral_scheduler = EulerAncestralDiscreteScheduler.from_config("runwayml/stable-diffusion-v1-5", subfolder="scheduler")
pipeline = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", scheduler=euler_ancestral_scheduler, revision="fp16", torch_dtype=torch.float16
)
pipeline.to("cuda")

prompt = "a photo of an astronaut riding a horse on mars"
image = pipeline(prompt, num_inference_steps=25).images[0]
```

🔥 Up to 2x faster inference with memory_efficient_attention

Even faster and more memory-efficient Stable Diffusion using the flash attention implementation from xformers.

  • Up to 2x speedup on GPUs using memory efficient attention by @MatthieuTPHR #532

To leverage it just make sure you have:

  • PyTorch > 1.12
  • Cuda available
  • Installed the xformers library

```python
from diffusers import StableDiffusionPipeline
import torch

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    revision="fp16",
    torch_dtype=torch.float16,
).to("cuda")

pipe.enable_xformers_memory_efficient_attention()

with torch.inference_mode():
    sample = pipe("a small cat")

# optional: you can disable it via
# pipe.disable_xformers_memory_efficient_attention()
```

🚀 Much faster loading

Thanks to accelerate, pipeline loading is much, much faster. There are two parts to it:

  • First, when a model is created, PyTorch initializes its weights by default, which takes a good amount of time. With low_cpu_mem_usage (enabled by default), no initialization is performed.
  • Optionally, you can also use device_map="auto" to automatically select the best device(s) where the pre-trained weights will be initially sent to.

In our tests, loading time was more than halved on CUDA devices, and went down from 12s to 4s on an Apple M1 computer.

As a side effect, CPU usage will be greatly reduced during loading, because no temporary copies of the weights are necessary.

This feature requires PyTorch 1.9 or better and accelerate 0.8.0 or higher.

🎨 RePaint

RePaint allows reusing any pretrained DDPM model for free-form inpainting by adding restarts to the denoising schedule. Based on the paper RePaint: Inpainting using Denoising Diffusion Probabilistic Models by Andreas Lugmayr et al.

```python
import torch
from diffusers import RePaintPipeline, RePaintScheduler

# Load the RePaint scheduler and pipeline based on a pretrained DDPM model
scheduler = RePaintScheduler.from_config("google/ddpm-ema-celebahq-256")
pipe = RePaintPipeline.from_pretrained("google/ddpm-ema-celebahq-256", scheduler=scheduler)
pipe = pipe.to("cuda")

# original_image and mask_image are PIL images loaded beforehand
generator = torch.Generator(device="cuda").manual_seed(0)
output = pipe(
    original_image=original_image,
    mask_image=mask_image,
    num_inference_steps=250,
    eta=0.0,
    jump_length=10,
    jump_n_sample=10,
    generator=generator,
)
inpainted_image = output.images[0]
```


:earth_africa: Community Pipelines

Long Prompt Weighting Stable Diffusion

The pipeline lets you input a prompt without the 77-token length limit. You can increase a word's weighting by using "()" or decrease it by using "[]". The pipeline also lets you use the main use cases of the Stable Diffusion pipeline in a single class. For a code example, see Long Prompt Weighting Stable Diffusion * [Community Pipelines] Long Prompt Weighting Stable Diffusion Pipelines by @SkyTNT in #907

Speech to Image

Generate an image from an audio sample using pre-trained OpenAI whisper-small and Stable Diffusion. For a code example, see Speech to Image * [Examples] add speech to image pipeline example by @MikailINTech in https://github.com/huggingface/diffusers/pull/897

Wildcard Stable Diffusion

A minimal implementation that allows for users to add "wildcards", denoted by __wildcard__ to prompts that are used as placeholders for randomly sampled values given by either a dictionary or a .txt file. For a code example, see Wildcard Stable Diffusion * Wildcard stable diffusion pipeline by @shyamsn97 in #900

Composable Stable Diffusion

Use logic operators to do compositional generation. For a code example, see Composable Stable Diffusion * Add Composable diffusion to community pipeline examples by @MarkRich in #951

Imagic Stable Diffusion

Image editing with Stable Diffusion. For a code example, see Imagic Stable Diffusion * Add imagic to community pipelines by @MarkRich in #958

Seed Resizing

Allows generating a larger image while keeping the content of the original image. For a code example, see Seed Resizing * Add seed resizing to community pipelines by @MarkRich in #1011

:memo: Changelog

  • [Community Pipelines] Long Prompt Weighting Stable Diffusion Pipelines by @SkyTNT in #907
  • [Stable Diffusion] Add components function by @patrickvonplaten in #889
  • [PNDM Scheduler] Make sure list cannot grow forever by @patrickvonplaten in #882
  • [DiffusionPipeline.from_pretrained] add warning when passing unused k… by @patrickvonplaten in #870
  • DOC Dreambooth Add --sample_batch_size=1 to the 8 GB dreambooth example script by @leszekhanusz in #829
  • [Examples] add speech to image pipeline example by @MikailINTech in #897
  • [dreambooth] dont use safety check when generating prior images by @patil-suraj in #922
  • Dreambooth class image generation: using unique names to avoid overwriting existing image by @leszekhanusz in #847
  • fix test_components by @patil-suraj in #928
  • Fix Compatibility with Nvidia NGC Containers by @tasercake in #919
  • [Community Pipelines] Fix pad_tokens_and_weights in lpw_stable_diffusion by @SkyTNT in #925
  • Bump the version to 0.7.0.dev0 by @anton-l in #912
  • Introduce the copy mechanism by @anton-l in #924
  • [Tests] Move stable diffusion into their own files by @patrickvonplaten in #936
  • [Flax] dont warn for bf16 weights by @patil-suraj in #923
  • Support LMSDiscreteScheduler in LDMPipeline by @mkshing in #891
  • Wildcard stable diffusion pipeline by @shyamsn97 in #900
  • [MPS] fix mps failing tests by @kashif in #934
  • fix a small typo in pipeline_ddpm.py by @chenguolin in #948
  • Reorganize pipeline tests by @anton-l in #963
  • v1-5 docs updates by @apolinario in #921
  • add community pipeline docs; add minimal text to some empty doc pages by @natolambert in #930
  • Fix typo: torch_type -> torch_dtype by @pcuenca in #972
  • add num_inference_steps arg to DDPM by @tmabraham in #935
  • Add Composable diffusion to community pipeline examples by @MarkRich in #951
  • [Flax] added broadcast_to_shape_from_left helper and Scheduler tests by @kashif in #864
  • [Tests] Fix mps reproducibility issue when running with pytest-xdist by @anton-l in #976
  • mps changes for PyTorch 1.13 by @pcuenca in #926
  • [Onnx] support half-precision and fix bugs for onnx pipelines by @SkyTNT in #932
  • [Dance Diffusion] Add dance diffusion by @patrickvonplaten in #803
  • [Dance Diffusion] FP16 by @patrickvonplaten in #980
  • [Dance Diffusion] Better naming by @patrickvonplaten in #981
  • Fix typo in documentation title by @echarlaix in #975
  • Add --pretrained_model_name_revision option to train_dreambooth.py by @shirayu in #933
  • Do not use torch.float64 on the mps device by @pcuenca in #942
  • CompVis -> diffusers script - allow converting from merged checkpoint to either EMA or non-EMA by @patrickvonplaten in #991
  • fix a bug in the new version by @xiaohu2015 in #957
  • Fix typos by @shirayu in #978
  • Add missing import by @juliensimon in #979
  • minimal stable diffusion GPU memory usage with accelerate hooks by @piEsposito in #850
  • [inpaint pipeline] fix bug for multiple prompts inputs by @xiaohu2015 in #959
  • Enable multi-process DataLoader for dreambooth by @skirsten in #950
  • Small modification to enable usage by external scripts by @briancw in #956
  • [Flax] Add Textual Inversion by @duongna21 in #880
  • Continuation of #942: additional float64 failure by @pcuenca in #996
  • fix dreambooth script. by @patil-suraj in #1017
  • [Accelerate model loading] Fix meta device and super low memory usage by @patrickvonplaten in #1016
  • [Flax] Add finetune Stable Diffusion by @duongna21 in #999
  • [DreamBooth] Set train mode for text encoder by @duongna21 in #1012
  • [Flax] Add DreamBooth by @duongna21 in #1001
  • Deprecate init_git_repo, refactor train_unconditional.py by @anton-l in #1022
  • update readme for flax examples by @patil-suraj in #1026
  • Probably nicer to specify dependency on tensorboard in the training example by @lukovnikov in #998
  • Add --dataloader_num_workers to the DDPM training example by @anton-l in #1027
  • Document sequential CPU offload method on Stable Diffusion pipeline by @piEsposito in #1024
  • Support grayscale images in numpy_to_pil by @anton-l in #1025
  • [Flax SD finetune] Fix dtype by @duongna21 in #1038
  • fix F.interpolate() for large batch sizes by @NouamaneTazi in #1006
  • [Tests] Improve unet / vae tests by @patrickvonplaten in #1018
  • [Tests] Speed up slow tests by @patrickvonplaten in #1040
  • Fix some failing tests by @patrickvonplaten in #1041
  • [Tests] Better prints by @patrickvonplaten in #1043
  • [Tests] no random latents anymore by @patrickvonplaten in #1045
  • Update training and fine-tuning docs by @pcuenca in #1020
  • Fix speedup ratio in fp16.mdx by @mwbyeon in #837
  • clean incomplete pages by @natolambert in #1008
  • Add seed resizing to community pipelines by @MarkRich in #1011
  • Tests: upgrade PyTorch cuda to 11.7 to fix examples tests. by @pcuenca in #1048
  • Experimental: allow fp16 in mps by @pcuenca in #961
  • Move safety detection to model call in Flax safety checker by @jonatanklosko in #1023
  • Fix pipelines user_agent, ignore CI requests by @anton-l in #1058
  • [GitBot] Automatically close issues after inactivitiy by @patrickvonplaten in #1079
  • Allow safety_checker to be None when using CPU offload by @pcuenca in #1078
  • k-diffusion-euler by @hlky in #1019
  • [Better scheduler docs] Improve usage examples of schedulers by @patrickvonplaten in #890
  • [Tests] Fix slow tests by @patrickvonplaten in #1087
  • Remove nn sequential by @patrickvonplaten in #1086
  • Remove some unused parameter in CrossAttnUpBlock2D by @LaurentMazare in #1034
  • Add imagic to community pipelines by @MarkRich in #958
  • Up to 2x speedup on GPUs using memory efficient attention by @MatthieuTPHR in #532
  • [docs] add euler scheduler in docs, how to use differnet schedulers by @patil-suraj in #1089
  • Integration tests precision improvement for inpainting by @Lewington-pitsos in #1052
  • lpwstablediffusion: Add iscancelledcallback by @irgolic in #1053
  • Rename latent by @patrickvonplaten in #1102
  • fix typo in examples dreambooth README.md by @jorahn in #1073
  • fix model card url in text inversion readme. by @patil-suraj in #1103
  • [CI] Framework and hardware-specific CI tests by @anton-l in #997
  • Fix a small typo of a variable name by @omihub777 in #1063
  • Fix tests for equivalence of DDIM and DDPM pipelines by @sgrigory in #1069
  • Fix padding in dreambooth by @shirayu in #1030
  • [Flax] time embedding by @kashif in #1081
  • Training to predict x0 in training example by @lukovnikov in #1031
  • [Loading] Ignore unneeded files by @patrickvonplaten in #1107
  • Fix hub-dependent tests for PRs by @anton-l in #1119
  • Allow saving None pipeline components by @anton-l in #1118
  • feat: add repaint by @Revist in #974
  • Continuation of #1035 by @pcuenca in #1120
  • VQ-diffusion by @williamberman in #658

- Python
Published by patrickvonplaten over 3 years ago

diffusers - v0.6.0: Finetuned Stable Diffusion inpainting

:art: Finetuned Stable Diffusion inpainting

The first official stable diffusion checkpoint fine-tuned on inpainting has been released.

You can try it out in the official demo here

or code it up yourself :computer: :

```python
from io import BytesIO

import torch

import PIL
import requests
from diffusers import StableDiffusionInpaintPipeline


def download_image(url):
    response = requests.get(url)
    return PIL.Image.open(BytesIO(response.content)).convert("RGB")


img_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo.png"
mask_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo_mask.png"

image = download_image(img_url).resize((512, 512))
mask_image = download_image(mask_url).resize((512, 512))

pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting",
    revision="fp16",
    torch_dtype=torch.float16,
)
pipe.to("cuda")

prompt = "Face of a yellow cat, high resolution, sitting on a park bench"

output = pipe(prompt=prompt, image=image, mask_image=mask_image)
image = output.images[0]
```

gives:

| image | mask_image | prompt | | Output |
|:---:|:---:|:---|:---:|:---:|
| drawing | drawing | Face of a yellow cat, high resolution, sitting on a park bench | => | drawing |

:warning: This release deprecates the unsupervised noising-based inpainting pipeline into StableDiffusionInpaintPipelineLegacy. The new StableDiffusionInpaintPipeline is based on a Stable Diffusion model finetuned for the inpainting task: https://huggingface.co/runwayml/stable-diffusion-inpainting

Note: When loading StableDiffusionInpaintPipeline with a non-finetuned model (i.e., one saved with diffusers<=0.5.1), the pipeline will default to StableDiffusionInpaintPipelineLegacy to maintain backward compatibility :sparkles:

```python
from diffusers import StableDiffusionInpaintPipeline

pipe = StableDiffusionInpaintPipeline.from_pretrained("CompVis/stable-diffusion-v1-4")

assert pipe.__class__.__name__ == "StableDiffusionInpaintPipelineLegacy"
```

Context:

Why this change? When Stable Diffusion came out ~2 months ago, there were many unofficial in-painting demos using the original v1-4 checkpoint ("CompVis/stable-diffusion-v1-4"). These demos worked reasonably well, so we integrated an experimental StableDiffusionInpaintPipeline class into diffusers. Now that the official inpainting checkpoint has been released (https://github.com/runwayml/stable-diffusion), we have made it our official pipeline and moved the old, hacky one to StableDiffusionInpaintPipelineLegacy.

:rocket: ONNX pipelines for image2image and inpainting

Thanks to the contribution by @zledas (#552) this release supports OnnxStableDiffusionImg2ImgPipeline and OnnxStableDiffusionInpaintPipeline optimized for CPU inference:

```python
from diffusers import OnnxStableDiffusionImg2ImgPipeline, OnnxStableDiffusionInpaintPipeline

img_pipeline = OnnxStableDiffusionImg2ImgPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4", revision="onnx", provider="CPUExecutionProvider"
)

inpaint_pipeline = OnnxStableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting", revision="onnx", provider="CPUExecutionProvider"
)
```

:earth_africa: Community Pipelines

Two new community pipelines have been added to diffusers :fire:

Stable Diffusion Interpolation example

Interpolate the latent space of Stable Diffusion between different prompts/seeds. For more info see stable-diffusion-videos.

For a code example, see Stable Diffusion Interpolation

  • Add Stable Diffusion Interpolation Example by @nateraw in #862

Stable Diffusion Interpolation Mega

One Stable Diffusion Pipeline with all functionalities of Text2Image, Image2Image and Inpainting

For a code example, see Stable Diffusion Mega

  • All in one Stable Diffusion Pipeline by @patrickvonplaten in #821

:memo: Changelog

  • [Community] One step unet by @patrickvonplaten in #840
  • Remove unneeded use_auth_token by @osanseviero in #839
  • Bump to 0.6.0.dev0 by @anton-l in #831
  • Remove the last of ["sample"] by @anton-l in #842
  • Fix Flax pipeline: width and height are ignored #838 by @camenduru in #848
  • [DeviceMap] Make sure stable diffusion can be loaded from older trans… by @patrickvonplaten in #860
  • Fix small community pipeline import bug and finish README by @patrickvonplaten in #869
  • Fix training push_to_hub (unconditional image generation): models were not saved before pushing to hub by @pcuenca in #868
  • Fix table in community README.md by @nateraw in #879
  • Add generic inference example to community pipeline readme by @apolinario in #874
  • Rename frame filename in interpolation community example by @nateraw in #881
  • Add Apple M1 tests by @anton-l in #796
  • Fix autoencoder test by @pcuenca in #886
  • Rename StableDiffusionOnnxPipeline -> OnnxStableDiffusionPipeline by @anton-l in #887
  • Fix DDIM on Windows not using int64 for timesteps by @hafriedlander in #819
  • [dreambooth] allow fine-tuning text encoder by @patil-suraj in #883
  • Stable Diffusion image-to-image and inpaint using onnx. by @zledas in #552
  • Improve ONNX img2img numpy handling, temporarily fix the tests by @anton-l in #899
  • [Stable Diffusion Inpainting] Deprecate inpainting pipeline in favor of official one by @patrickvonplaten in #903
  • [Communit Pipeline] Make sure "mega" uses correct inpaint pipeline by @patrickvonplaten in #908
  • Stable diffusion inpainting by @patil-suraj in #904
  • ONNX supervised inpainting by @anton-l in #906

- Python
Published by anton-l over 3 years ago

diffusers - v0.5.1: Patch release

This patch release fixes a bug with the Flax NSFW safety checker in the pipeline.

https://github.com/huggingface/diffusers/pull/832 by @patil-suraj

- Python
Published by patrickvonplaten over 3 years ago

diffusers - v0.5.0: JAX/Flax and TPU support

:ear_of_rice: JAX/Flax integration for super-fast Stable Diffusion on TPUs.

We added JAX support for Stable Diffusion! You can now run Stable Diffusion on Colab TPUs (and GPUs too!) for faster inference.

Check out this TPU-ready Colab for a Stable Diffusion pipeline: Open In Colab. There is also a detailed blog post on Stable Diffusion and parallelism in JAX / Flax :hugs: https://huggingface.co/blog/stable_diffusion_jax

The most used models, schedulers and pipelines have been ported to JAX/Flax, namely:

  • Models: FlaxAutoencoderKL, FlaxUNet2DConditionModel
  • Schedulers: FlaxDDIMScheduler, FlaxDDPMScheduler, FlaxPNDMScheduler
  • Pipelines: FlaxStableDiffusionPipeline
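As a sketch of how these Flax components fit together on a multi-device host (the model ID, `revision="bf16"` branch, and sharding helpers are assumptions based on the blog post above; `make_prompt_batch` is a hypothetical helper added here for illustration):

```python
def make_prompt_batch(prompt, num_devices):
    # One copy of the prompt per device; each device renders one image.
    return [prompt] * num_devices


def main():
    # Heavy, device-dependent part; kept in a function so the module
    # imports cleanly without JAX or the model weights available.
    import jax
    import jax.numpy as jnp
    from flax.jax_utils import replicate
    from flax.training.common_utils import shard
    from diffusers import FlaxStableDiffusionPipeline

    pipeline, params = FlaxStableDiffusionPipeline.from_pretrained(
        "CompVis/stable-diffusion-v1-4", revision="bf16", dtype=jnp.bfloat16
    )
    num_devices = jax.device_count()
    prompt_ids = pipeline.prepare_inputs(
        make_prompt_batch("a photo of an astronaut riding a horse on mars", num_devices)
    )
    # Replicate the params and split the RNG so each device works in parallel.
    params = replicate(params)
    rng = jax.random.split(jax.random.PRNGKey(0), num_devices)
    images = pipeline(shard(prompt_ids), params, rng, jit=True).images


# main()  # uncomment on a TPU/GPU host with JAX and the Flax weights
```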

Changelog:

  • Implement FlaxModelMixin #493 by @mishig25, @patil-suraj, @patrickvonplaten, @pcuenca
  • Karras VE, DDIM and DDPM flax schedulers #508 by @kashif
  • initial flax pndm scheduler #492 by @kashif
  • FlaxDiffusionPipeline & FlaxStableDiffusionPipeline #559 by @mishig25, @patrickvonplaten, @pcuenca
  • Flax pipeline pndm #583 by @pcuenca
  • Add from_pt argument in .from_pretrained #527 by @younesbelkada
  • Make flax from_pretrained work with local subfolder #608 by @mishig25

:fire: DeepSpeed low-memory training

Thanks to the :hugs: accelerate integration with DeepSpeed, a few of our training examples became even more optimized in terms of VRAM and speed:

  • DreamBooth is now trainable on 8GB GPUs thanks to a contribution from @Ttl! Find out how to run it here.
  • The Text2Image finetuning example is also fully compatible with DeepSpeed.

:pencil2: Changelog

  • Revert "[v0.4.0] Temporarily remove Flax modules from the public API by @anton-l in #755)"
  • Fix push_to_hub for dreambooth and textual_inversion by @YaYaB in #748
  • Fix ONNX conversion script opset argument type by @justinchuby in #739
  • Add final latent slice checks to SD pipeline intermediate state tests by @jamestiotio in #731
  • fix(DDIM scheduler): use correct dtype for noise by @keturn in #742
  • [Tests] Fix tests by @patrickvonplaten in #774
  • debug an exception by @LowinLi in #638
  • Clean up resnet.py file by @natolambert in #780
  • add sigmoid betas by @natolambert in #777
  • [Low CPU memory] + device map by @patrickvonplaten in #772
  • Fix gradient checkpointing test by @patrickvonplaten in #797
  • fix typo docstring in unet2d by @natolambert in #798
  • DreamBooth DeepSpeed support for under 8 GB VRAM training by @Ttl in #735
  • support bf16 for stable diffusion by @patil-suraj in #792
  • stable diffusion fine-tuning by @patil-suraj in #356
  • Flax: Trickle down norm_num_groups by @akash5474 in #789
  • Eventually preserve this typo? :) by @spezialspezial in #804
  • Fix indentation in the code example by @osanseviero in #802
  • [Img2Img] Fix batch size mismatch prompts vs. init images by @patrickvonplaten in #793
  • Minor package fixes by @anton-l in #809
  • [Dummy imports] Better error message by @patrickvonplaten in #795
  • add or fix license formatting in models directory by @natolambert in #808
  • [train_text2image] Fix EMA and make it compatible with deepspeed. by @patil-suraj in #813
  • Fix fine-tuning compatibility with deepspeed by @pink-red in #816
  • Add diffusers version and pipeline class to the Hub UA by @anton-l in #814
  • [Flax] Add test by @patrickvonplaten in #824
  • update flax scheduler API by @patil-suraj in #822
  • Fix dreambooth loss type with prior_preservation and fp16 by @anton-l in #826
  • Fix type mismatch error, add tests for negative prompts by @anton-l in #823
  • Give more customizable options for safety checker by @patrickvonplaten in #815
  • Flax safety checker by @pcuenca in #825
  • Align PT and Flax API - allow loading checkpoint from PyTorch configs by @patrickvonplaten in #827

- Python
Published by anton-l over 3 years ago

diffusers - v0.4.2: Patch release

This patch release allows the img2img pipeline to be run on fp16 and fixes a bug with the "mps" device.

- Python
Published by patrickvonplaten over 3 years ago

diffusers - v0.4.1: Patch release

This patch release fixes a bug with incorrect module naming for community pipelines and an incorrect breaking change when moving pipelines in fp16 to "cpu" or "mps".

- Python
Published by patrickvonplaten over 3 years ago

diffusers - v0.4.0 Better, faster, stronger!

🚗 Faster

We have thoroughly profiled our codebase and applied a number of incremental improvements that, when combined, provide a speed improvement of almost 3x.

On top of that, we now default to using the float16 format. It's much faster than float32 and, according to our tests, produces images with no discernible difference in quality. This beats the use of autocast, so the resulting code is cleaner!

🔑 use_auth_token no more

The recently released version of huggingface-hub automatically uses your access token if you are logged in, so you don't need to put it everywhere in your code. All you need to do is authenticate once using huggingface-cli login in your terminal and you're all set.

```diff
- pipe = DiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4", use_auth_token=True)
+ pipe = DiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4")
```

We bumped huggingface-hub version to 0.10.0 in our dependencies to achieve this.

🎈More flexible APIs

  • Schedulers now use a common, simpler unified API design. This has allowed us to remove many conditionals and special cases in the rest of the code, including the pipelines. This is very important for us and for the users of 🧨 diffusers: we all gain clarity and a solid abstraction for schedulers. See the description in https://github.com/huggingface/diffusers/pull/719 for more details

Please update any custom Stable Diffusion pipelines accordingly:

```diff
- if isinstance(self.scheduler, LMSDiscreteScheduler):
-     latents = latents * self.scheduler.sigmas[0]
+ latents = latents * self.scheduler.init_noise_sigma
```

```diff
- if isinstance(self.scheduler, LMSDiscreteScheduler):
-     sigma = self.scheduler.sigmas[i]
-     latent_model_input = latent_model_input / ((sigma**2 + 1) ** 0.5)
+ latent_model_input = self.scheduler.scale_model_input(latent_model_input, t)
```

```diff
- if isinstance(self.scheduler, LMSDiscreteScheduler):
-     latents = self.scheduler.step(noise_pred, i, latents, **extra_step_kwargs).prev_sample
- else:
-     latents = self.scheduler.step(noise_pred, t, latents, **extra_step_kwargs).prev_sample
+ latents = self.scheduler.step(noise_pred, t, latents, **extra_step_kwargs).prev_sample
```

  • Pipeline callbacks. As a community project (h/t @jamestiotio!), diffusers pipelines can now invoke a callback function during generation, providing the latents at each step of the process. This makes it easier to perform tasks such as visualization, inspection, explainability and others the community may invent.
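As a minimal sketch of the callback mechanism: the pipeline calls back with the step index, the timestep, and the current latents. The model ID, the `callback_steps` frequency argument, and the generation call below are illustrative; `make_step_logger` is a hypothetical helper added here.

```python
def make_step_logger(log):
    # Build a callback with the signature the pipelines expect:
    # callback(step: int, timestep: int, latents) -> None
    def log_step(step, timestep, latents):
        log.append(step)
    return log_step


def main():
    # Heavy part kept in a function so the module imports without a GPU.
    import torch
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained(
        "CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16
    ).to("cuda")
    steps = []
    # callback_steps controls how often the callback fires (every step here).
    image = pipe(
        "a red cat sitting on a chair",
        callback=make_step_logger(steps),
        callback_steps=1,
    ).images[0]
    print(f"callback fired {len(steps)} times")


# main()  # uncomment on a machine with a GPU and access to the weights
```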

🛠️ More tasks

Building on top of the previous foundations, this release incorporates several new tasks that have been adapted from research papers or community projects. These include:

  • Textual inversion. Makes it possible to quickly train a new concept or style and incorporate it into the vocabulary of Stable Diffusion. Hundreds of people have already created theirs, and they can be shared and combined together. See the training Colab to get started.
  • Dreambooth. Similar goal to textual inversion, but instead of creating a new item in the vocabulary it fine-tunes the model to make it learn a new concept. Training Colab.
  • Negative prompts. Another community effort led by @shirayu. The Stable Diffusion pipeline can now receive both a positive prompt (the one you want to create), and a negative prompt (something you want to drive the model away from). This opens up a lot of creative possibilities!
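A short illustrative sketch of the new argument (the model ID and both prompt strings are placeholders, not from the release notes):

```python
# negative_prompt steers generation away from the listed concepts, while the
# positive prompt pulls the model toward them.
PROMPT = "a portrait photo of an astronaut, highly detailed"
NEGATIVE_PROMPT = "blurry, low resolution, deformed hands"


def main():
    # Heavy part kept in a function so the module imports without a GPU.
    import torch
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained(
        "CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16
    ).to("cuda")
    image = pipe(prompt=PROMPT, negative_prompt=NEGATIVE_PROMPT).images[0]
    image.save("astronaut.png")


# main()  # uncomment on a machine with a GPU and access to the weights
```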

🏃‍♀️ Under the hood changes to support better fine-tuning

Gradient checkpointing and 8-bit optimizers have been successfully applied to achieve Dreambooth fine-tuning in a Colab notebook! These updates will make it easier for diffusers to support general-purpose fine-tuning (coming soon!).

⚠️ Experimental: community pipelines

This is big, but it's still an experimental feature that may change in the future.

We are constantly amazed at the amount of imagination and creativity in the diffusers community, so we've made it easy to create custom pipelines and share them with others. You can write your own pipeline code, store it in 🤗 Hub, GitHub or your local filesystem and StableDiffusionPipeline.from_pretrained will be able to load and run it. Read more in the documentation.
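Loading a community pipeline can be sketched like this. The `custom_pipeline` value below ("clip_guided_stable_diffusion", matching the CLIP-guided example added in this release) should be treated as an assumption, and `community_pipeline_kwargs` is a hypothetical helper for illustration:

```python
def community_pipeline_kwargs(name):
    # custom_pipeline can point at a Hub repo, a local folder, or (by name)
    # a script under examples/community in the diffusers repository.
    return {"custom_pipeline": name}


def main():
    # Heavy part kept in a function so the module imports without a GPU.
    import torch
    from diffusers import DiffusionPipeline

    pipe = DiffusionPipeline.from_pretrained(
        "CompVis/stable-diffusion-v1-4",
        torch_dtype=torch.float16,
        **community_pipeline_kwargs("clip_guided_stable_diffusion"),
    ).to("cuda")


# main()  # uncomment on a machine with a GPU and access to the weights
```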

We can't wait to see what new tasks the community creates!

💪 Quality of life fixes

Bug fixing, improved documentation, better tests are all important to ensure diffusers is a high-quality codebase, and we always spend a lot of effort working on them. Several first-time contributors have helped here, and we are very grateful for their efforts!

🙏 Significant community contributions

The following people have made significant contributions to the library over the last release:

  • @Victarry – Add training example for DreamBooth (#554)
  • @jamestiotio – Add callback parameters for Stable Diffusion pipelines (#521)
  • @jachiam – Allow resolutions that are not multiples of 64 (#505)
  • @johnowhitaker – Adding pred_original_sample to SchedulerOutput for some samplers (#614).
  • @keturn – Interesting discussions and insights on many topics.

✏️ Change list

  • [Docs] Correct links by @patrickvonplaten in #432
  • [Black] Update black by @patrickvonplaten in #433
  • use torch.matmul instead of einsum in attnetion. by @patil-suraj in #445
  • Renamed variables from single letter to better naming by @daspartho in #449
  • Docs: fix installation typo by @daspartho in #453
  • fix table formatting for stable diffusion pipeline doc (add blank line) by @natolambert in #471
  • update expected results of slow tests by @kashif in #268
  • [Flax] Make room for more frameworks by @patrickvonplaten in #494
  • Fix disable_attention_slicing in pipelines by @pcuenca in #498
  • Rename test_scheduler_outputs_equivalence in model tests. by @pcuenca in #451
  • Scheduler docs update by @natolambert in #464
  • Fix scheduler inference steps error with power of 3 by @natolambert in #466
  • initial flax pndm schedular by @kashif in #492
  • Fix vae tests for cpu and gpu by @kashif in #480
  • [Docs] Add subfolder docs by @patrickvonplaten in #500
  • docs: bocken doc links for relative links by @jjmachan in #504
  • Removing .float() (autocast in fp16 will discard this (I think)). by @Narsil in #495
  • Fix MPS scheduler indexing when using mps by @pcuenca in #450
  • [CrossAttention] add different method for sliced attention by @patil-suraj in #446
  • Implement FlaxModelMixin by @mishig25 in #493
  • Karras VE, DDIM and DDPM flax schedulers by @kashif in #508
  • [UNet2DConditionModel, UNet2DModel] pass norm_num_groups to all the blocks by @patil-suraj in #442
  • Add init_weights method to FlaxMixin by @mishig25 in #513
  • UNet Flax with FlaxModelMixin by @pcuenca in #502
  • Stable diffusion text2img conversion script. by @patil-suraj in #154
  • [CI] Add stalebot by @anton-l in #481
  • Fix is_onnx_available by @SkyTNT in #440
  • [Tests] Test attention.py by @sidthekidder in #368
  • Finally fix the image-based SD tests by @anton-l in #509
  • Remove the usage of numpy in up/down sample_2d by @ydshieh in #503
  • Fix typos and add Typo check GitHub Action by @shirayu in #483
  • Quick fix for the img2img tests by @anton-l in #530
  • [Tests] Fix spatial transformer tests on GPU by @anton-l in #531
  • [StableDiffusionInpaintPipeline] accept tensors for init and mask image by @patil-suraj in #439
  • adding more typehints to DDIM scheduler by @vishnu-anirudh in #456
  • Revert "adding more typehints to DDIM scheduler" by @patrickvonplaten in #533
  • Add LMSDiscreteSchedulerTest by @sidthekidder in #467
  • [Download] Smart downloading by @patrickvonplaten in #512
  • [Hub] Update hub version by @patrickvonplaten in #538
  • Unify offset configuration in DDIM and PNDM schedulers by @jonatanklosko in #479
  • [Configuration] Better logging by @patrickvonplaten in #545
  • make fixup support by @younesbelkada in #546
  • FlaxUNet2DConditionOutput @flax.struct.dataclass by @mishig25 in #550
  • [Flax] fix Flax scheduler by @kashif in #564
  • JAX/Flax safety checker by @pcuenca in #558
  • Flax: ignore dtype for configuration by @pcuenca in #565
  • Remove check_tf_utils to avoid an unnecessary TF import for now by @anton-l in #566
  • Fix _upsample_2d by @ydshieh in #535
  • [Flax] Add Vae for Stable Diffusion by @patrickvonplaten in #555
  • [Flax] Solve problem with VAE by @patrickvonplaten in #574
  • [Tests] Upload custom test artifacts by @anton-l in #572
  • [Tests] Mark the ncsnpp model tests as slow by @anton-l in #575
  • [examples/community] add CLIPGuidedStableDiffusion by @patil-suraj in #561
  • Fix CrossAttention._sliced_attention by @ydshieh in #563
  • Fix typos by @shirayu in #568
  • Add from_pt argument in .from_pretrained by @younesbelkada in #527
  • [FlaxAutoencoderKL] rename weights to align with PT by @patil-suraj in #584
  • Fix BaseOutput initialization from dict by @anton-l in #570
  • Add the K-LMS scheduler to the inpainting pipeline + tests by @anton-l in #587
  • [flax safety checker] Use FlaxPreTrainedModel for saving/loading by @patil-suraj in #591
  • FlaxDiffusionPipeline & FlaxStableDiffusionPipeline by @mishig25 in #559
  • [Flax] Fix unet and ddim scheduler by @patrickvonplaten in #594
  • Fix params replication when using the dummy checker by @pcuenca in #602
  • Allow dtype to be specified in Flax pipeline by @pcuenca in #600
  • Fix flax from_pretrained pytorch weight check by @mishig25 in #603
  • Mv weights name consts to diffusers.utils by @mishig25 in #605
  • Replace dropout_prob by dropout in vae by @younesbelkada in #595
  • Add smoke tests for the training examples by @anton-l in #585
  • Add torchvision to training deps by @anton-l in #607
  • Return Flax scheduler state by @pcuenca in #601
  • [ONNX] Collate the external weights, speed up loading from the hub by @anton-l in #610
  • docs: fix Berkeley ref by @ryanrussell in #611
  • Handle the PIL.Image.Resampling deprecation by @anton-l in #588
  • Make flax from_pretrained work with local subfolder by @mishig25 in #608
  • [flax] 'dtype' should not be part of self._internal_dict by @mishig25 in #609
  • [UNet2DConditionModel] add gradient checkpointing by @patil-suraj in #461
  • docs: fix stochastic_karras_ve ref by @ryanrussell in #618
  • Adding pred_original_sample to SchedulerOutput for some samplers by @johnowhitaker in #614
  • docs: .md readability fixups by @ryanrussell in #619
  • Flax documentation by @younesbelkada in #589
  • fix docs: change sample to images by @AbdullahAlfaraj in #613
  • refactor: pipelines readability improvements by @ryanrussell in #622
  • Allow passing session_options for ORT backend by @cloudhan in #620
  • Fix breaking error: "ort is not defined" by @pcuenca in #626
  • docs: src/diffusers readability improvements by @ryanrussell in #629
  • Fix formula for noise levels in Karras scheduler and tests by @sgrigory in #627
  • [CI] Fix onnxruntime installation order by @anton-l in #633
  • Warning for too long prompts in DiffusionPipelines (Resolve #447) by @shirayu in #472
  • Fix docs link to train_unconditional.py by @AbdullahAlfaraj in #642
  • Remove deprecated torch_device kwarg by @pcuenca in #623
  • refactor: custom_init_isort readability fixups by @ryanrussell in #631
  • Remove inappropriate docstrings in LMS docstrings. by @pcuenca in #634
  • Flax pipeline pndm by @pcuenca in #583
  • Fix SpatialTransformer by @ydshieh in #578
  • Add training example for DreamBooth. by @Victarry in #554
  • [Pytorch] Pytorch only schedulers by @kashif in #534
  • [examples/dreambooth] don't pass tensor_format to scheduler. by @patil-suraj in #649
  • [dreambooth] update install section by @patil-suraj in #650
  • [DDIM, DDPM] fix add_noise by @patil-suraj in #648
  • [Pytorch] add dep. warning for pytorch schedulers by @kashif in #651
  • [CLIPGuidedStableDiffusion] remove set_format from pipeline by @patil-suraj in #653
  • Fix onnx tensor format by @anton-l in #654
  • Fix main: stable diffusion pipelines cannot be loaded by @pcuenca in #655
  • Fix the LMS pytorch regression by @anton-l in #664
  • Added script to save during textual inversion training. Issue 524 by @isamu-isozaki in #645
  • [CLIPGuidedStableDiffusion] take the correct text embeddings by @patil-suraj in #667
  • Update index.mdx by @tmabraham in #670
  • [examples] update transfomers version by @patil-suraj in #665
  • [gradient checkpointing] lower tolerance for test by @patil-suraj in #652
  • Flax from_pretrained: clean up mismatched_keys. by @pcuenca in #630
  • trained_betas ignored in some schedulers by @vishnu-anirudh in #635
  • Renamed x -> hidden_states in resnet.py by @daspartho in #676
  • Optimize Stable Diffusion by @NouamaneTazi in #371
  • Allow resolutions that are not multiples of 64 by @jachiam in #505
  • refactor: update ldm-bert config.json url closes #675 by @ryanrussell in #680
  • [docs] fix table in fp16.mdx by @NouamaneTazi in #683
  • Fix slow tests by @NouamaneTazi in #689
  • Fix BibText citation by @osanseviero in #693
  • Add callback parameters for Stable Diffusion pipelines by @jamestiotio in #521
  • [dreambooth] fix applying clip_grad_norm_ by @patil-suraj in #686
  • Flax: add shape argument to set_timesteps by @pcuenca in #690
  • Fix type annotations on StableDiffusionPipeline.call by @tasercake in #682
  • Fix import with Flax but without PyTorch by @pcuenca in #688
  • [Support PyTorch 1.8] Remove inference mode by @patrickvonplaten in #707
  • [CI] Speed up slow tests by @anton-l in #708
  • [Utils] Add deprecate function and move testing_utils under utils by @patrickvonplaten in #659
  • Checkpoint conversion script from Diffusers => Stable Diffusion (CompVis) by @jachiam in #701
  • [Docs] fix docstring for issue #709 by @kashif in #710
  • Update schedulers README.md by @tmabraham in #694
  • add accelerate to load models with smaller memory footprint by @piEsposito in #361
  • Fix typos by @shirayu in #718
  • Add an argument "negative_prompt" by @shirayu in #549
  • Fix import if PyTorch is not installed by @pcuenca in #715
  • Remove comments no longer appropriate by @pcuenca in #716
  • [train_unconditional] fix applying clip_grad_norm by @patil-suraj in #721
  • renamed x to meaningful variable in resnet.py by @i-am-epic in #677
  • [Tests] Add accelerate to testing by @patrickvonplaten in #729
  • [dreambooth] Using already created Path in dataset by @DrInfiniteExplorer in #681
  • Include CLIPTextModel parameters in conversion by @kanewallmann in #695
  • Avoid negative strides for tensors by @shirayu in #717
  • [Pytorch] pytorch only timesteps by @kashif in #724
  • [Scheduler design] The pragmatic approach by @anton-l in #719
  • Removing autocast for 35-25% speedup. (autocast considered harmful). by @Narsil in #511
  • No more use_auth_token=True by @patrickvonplaten in #733
  • remove use_auth_token from remaining places by @patil-suraj in #737
  • Replace messages that have empty backquotes by @pcuenca in #738
  • [Docs] Advertise fp16 instead of autocast by @patrickvonplaten in #740
  • remove use_auth_token for TI test by @patil-suraj in #747
  • allow multiple generations per prompt by @patil-suraj in #741
  • Add back-compatibility to LMS timesteps by @anton-l in #750
  • update the clip guided PR according to the new API by @patil-suraj in #751
  • Raise an error when moving an fp16 pipeline to CPU by @anton-l in #749
  • Better steps deprecation for LMS by @anton-l in #753

- Python
Published by patrickvonplaten over 3 years ago

diffusers - v0.3.0: New API, Stable Diffusion pipelines, low-memory inference, MPS backend, ONNX

:books: Shiny new docs!

Thanks to the community efforts for [Docs] and [Type Hints] we've started populating the Diffusers documentation pages with lots of helpful guides, links and API references.

:memo: New API & breaking changes

New API

Pipeline, Model, and Scheduler outputs can now be dataclasses, dicts, or tuples:

```python
image = pipe("The red cat is sitting on a chair")["sample"][0]
```

is now replaced by:

```python
image = pipe("The red cat is sitting on a chair").images[0]

# or
image = pipe("The red cat is sitting on a chair")["image"][0]

# or
image = pipe("The red cat is sitting on a chair")[0]
```

Similarly:

```python
sample = unet(...).sample
```

and

```python
prev_sample = scheduler(...).prev_sample
```

is now possible!

🚨🚨🚨 Breaking change 🚨🚨🚨

This PR introduces breaking changes for the following public-facing methods:

  • VQModel.encode -> we return a dict/dataclass instead of a single tensor. In the future it's very likely required to return more than just one tensor. Please make sure to change latents = model.encode(...) to latents = model.encode(...)[0] or latents = model.encode(...).latents
  • VQModel.decode -> we return a dict/dataclass instead of a single tensor. In the future it's very likely required to return more than just one tensor. Please make sure to change sample = model.decode(...) to sample = model.decode(...)[0] or sample = model.decode(...).sample
  • VQModel.forward -> we return a dict/dataclass instead of a single tensor. In the future it's very likely required to return more than just one tensor. Please make sure to change sample = model(...) to sample = model(...)[0] or sample = model(...).sample
  • AutoencoderKL.encode -> we return a dict/dataclass instead of a single tensor. In the future it's very likely required to return more than just one tensor. Please make sure to change latent_dist = model.encode(...) to latent_dist = model.encode(...)[0] or latent_dist = model.encode(...).latent_dist
  • AutoencoderKL.decode -> we return a dict/dataclass instead of a single tensor. In the future it's very likely required to return more than just one tensor. Please make sure to change sample = model.decode(...) to sample = model.decode(...)[0] or sample = model.decode(...).sample
  • AutoencoderKL.forward -> we return a dict/dataclass instead of a single tensor. In the future it's very likely required to return more than just one tensor. Please make sure to change sample = model(...) to sample = model(...)[0] or sample = model(...).sample

:art: New Stable Diffusion pipelines

A couple of new pipelines have been added to Diffusers! We invite you to experiment with them, and to take them as inspiration to create your cool new tasks. These are the new pipelines:

  • Image-to-image generation. In addition to using a text prompt, this pipeline lets you include an example image to be used as the initial state of the process. 🤗 Diffuse the Rest is a cool demo about it!
  • Inpainting (experimental). You can provide an image and a mask and ask Stable Diffusion to replace the mask.

For more details about how they work, please visit our new API documentation.

This is a summary of all the Stable Diffusion tasks that can be easily used with 🤗 Diffusers:

| Pipeline | Tasks | Colab | Demo |
|---|---|:---:|:---:|
| pipeline_stable_diffusion.py | Text-to-Image Generation | Open In Colab | 🤗 Stable Diffusion |
| pipeline_stable_diffusion_img2img.py | Image-to-Image Text-Guided Generation | Open In Colab | 🤗 Diffuse the Rest |
| pipeline_stable_diffusion_inpaint.py | Experimental: Text-Guided Image Inpainting | Open In Colab | Coming soon |
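The image-to-image task above can be sketched as follows (parameter names `init_image` and `strength` follow the 0.3.0 API; the input image URL and values are illustrative, and `clamp_strength` is a hypothetical helper added here):

```python
def clamp_strength(strength):
    # strength must lie in [0, 1]: 0 returns the init image unchanged,
    # 1 ignores it entirely.
    return min(1.0, max(0.0, strength))


def main():
    # Heavy part kept in a function so the module imports without a GPU.
    from io import BytesIO

    import requests
    import torch
    from PIL import Image
    from diffusers import StableDiffusionImg2ImgPipeline

    pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
        "CompVis/stable-diffusion-v1-4", revision="fp16", torch_dtype=torch.float16
    ).to("cuda")

    url = "https://raw.githubusercontent.com/CompVis/stable-diffusion/main/assets/stable-samples/img2img/sketch-mountains-input.jpg"
    init_image = Image.open(BytesIO(requests.get(url).content)).convert("RGB").resize((768, 512))

    image = pipe(
        prompt="A fantasy landscape, trending on artstation",
        init_image=init_image,
        strength=clamp_strength(0.75),
        guidance_scale=7.5,
    ).images[0]


# main()  # uncomment on a machine with a GPU and access to the weights
```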

:candy: Less memory usage for smaller GPUs

Now the diffusion models can take up significantly less VRAM (3.2 GB for Stable Diffusion) at the expense of about 10% slower inference, thanks to the optimizations discussed in https://github.com/basujindal/stable-diffusion/pull/117.

To make use of the attention optimization, just enable it with .enable_attention_slicing() after loading the pipeline:

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",
    revision="fp16",
    torch_dtype=torch.float16,
    use_auth_token=True,
)
pipe = pipe.to("cuda")
pipe.enable_attention_slicing()
```

This will allow many more users to play with Stable Diffusion on their own computers! We can't wait to see what new ideas and results will be created by the community!

:black_cat: Textual Inversion

Textual Inversion lets you personalize a Stable Diffusion model on your own images with just 3-5 samples.

  • GitHub: https://github.com/huggingface/diffusers/tree/main/examples/textual_inversion
  • Training: https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/sd_textual_inversion_training.ipynb
  • Inference: https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/stable_conceptualizer_inference.ipynb
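The core idea is compact enough to sketch in a few lines of toy code (an illustration only, not the actual training script): a new placeholder token is added to the vocabulary, and training updates only that token's embedding while the rest of the model stays frozen.

```python
# Toy vocabulary and embedding table standing in for the real tokenizer/text encoder.
vocab = {"a": 0, "photo": 1, "of": 2}
embeddings = {idx: [0.0, 0.0] for idx in vocab.values()}

placeholder = "<my-concept>"
vocab[placeholder] = len(vocab)               # analogous to tokenizer.add_tokens
embeddings[vocab[placeholder]] = [0.1, -0.1]  # the only trainable parameters

# During training, gradient updates would touch only embeddings[vocab[placeholder]];
# at inference, prompts like "a photo of <my-concept>" reuse the learned embedding.
```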

:apple: MPS backend for Apple Silicon

🤗 Diffusers is compatible with Apple silicon for Stable Diffusion inference, using the PyTorch mps device. You need to install PyTorch Preview (Nightly) on a Mac with M1 or M2 CPU, and then use the pipeline as usual:

```python
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4", use_auth_token=True)
pipe = pipe.to("mps")

prompt = "a photo of an astronaut riding a horse on mars"
image = pipe(prompt).images[0]
```

We are seeing great speedups (31s vs 214s on an M1 Max), but there are still a couple of limitations. We encourage you to read the documentation for the details.

:factory: Experimental ONNX exporter and pipeline for Stable Diffusion

We introduce a new (and experimental) Stable Diffusion pipeline compatible with the ONNX Runtime. This allows you to run Stable Diffusion on any hardware that supports ONNX (including a significant speedup on CPUs).

You need to use StableDiffusionOnnxPipeline instead of StableDiffusionPipeline. You also need to download the weights from the onnx branch of the repository, and indicate the runtime provider you want to use (CPU, in the following example):

```python
from diffusers import StableDiffusionOnnxPipeline

pipe = StableDiffusionOnnxPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",
    revision="onnx",
    provider="CPUExecutionProvider",
    use_auth_token=True,
)

prompt = "a photo of an astronaut riding a horse on mars"
image = pipe(prompt).images[0]
```

:warning: Warning: the script above takes a long time to download the external ONNX weights, so it will be faster to convert the checkpoint yourself (see below).

To convert your own checkpoint, run the conversion script locally:

```bash
python scripts/convert_stable_diffusion_checkpoint_to_onnx.py --model_path="CompVis/stable-diffusion-v1-4" --output_path="./stable_diffusion_onnx"
```

After that it can be loaded from the local path:

```python
pipe = StableDiffusionOnnxPipeline.from_pretrained("./stable_diffusion_onnx", provider="CPUExecutionProvider")
```

Improvements and bugfixes

  • Mark in painting experimental by @patrickvonplaten in #430
  • Add config docs by @patrickvonplaten in #429
  • [Docs] Models by @kashif in #416
  • [Docs] Using diffusers by @patrickvonplaten in #428
  • [Outputs] Improve syntax by @patrickvonplaten in #423
  • Initial ONNX doc (TODO: Installation) by @pcuenca in #426
  • [Tests] Correct image folder tests by @patrickvonplaten in #427
  • [MPS] Make sure it doesn't break torch < 1.12 by @patrickvonplaten in #425
  • [ONNX] Stable Diffusion exporter and pipeline by @anton-l in #399
  • [Tests] Make image-based SD tests reproducible with fixed datasets by @anton-l in #424
  • [Docs] Outputs.mdx by @patrickvonplaten in #422
  • [Docs] Fix scheduler docs by @patrickvonplaten in #421
  • [Docs] DiffusionPipeline by @patrickvonplaten in #418
  • Improve unconditional diffusers example by @satpalsr in #414
  • Improve latent diff example by @satpalsr in #413
  • Inference support for mps device by @pcuenca in #355
  • [Docs] Minor fixes in optimization section by @patrickvonplaten in #420
  • [Docs] Pipelines for inference by @satpalsr in #417
  • [Docs] Training docs by @patrickvonplaten in #415
  • Docs: fp16 page by @pcuenca in #404
  • Add typing to scheduling_sde_ve: init, set_timesteps, and set_sigmas function definitions by @danielpatrickhug in #412
  • Docs fix some typos by @natolambert in #408
  • [docs sprint] schedulers docs, will update by @natolambert in #376
  • Docs: fix undefined in toctree by @natolambert in #406
  • Attention slicing by @patrickvonplaten in #407
  • Rename variables from single letter to meaningful name fix by @rashmimarganiatgithub in #395
  • Docs: Stable Diffusion pipeline by @pcuenca in #386
  • Small changes to Philosophy by @pcuenca in #403
  • karras-ve docs by @kashif in #401
  • Score sde ve doc by @kashif in #400
  • [Docs] Finish Intro Section by @patrickvonplaten in #402
  • [Docs] Quicktour by @patrickvonplaten in #397
  • ddim docs by @kashif in #396
  • Docs: optimization / special hardware by @pcuenca in #390
  • added pndm docs by @kashif in #391
  • Update text_inversion.mdx by @johnowhitaker in #393
  • [Docs] Logging by @patrickvonplaten in #394
  • [Pipeline Docs] ddpm docs for sprint by @kashif in #382
  • [Pipeline Docs] Unconditional Latent Diffusion by @satpalsr in #388
  • Docs: Conceptual section by @pcuenca in #392
  • [Pipeline Docs] Latent Diffusion by @patrickvonplaten in #377
  • [textual-inversion] fix saving embeds by @patil-suraj in #387
  • [Docs] Let's go by @patrickvonplaten in #385
  • Add colab links to textual inversion by @apolinario in #375
  • Efficient Attention by @patrickvonplaten in #366
  • Use expand instead of ones to broadcast tensor by @pcuenca in #373
  • [Tests] Fix SD slow tests by @anton-l in #364
  • [Type Hint] VAE models by @daspartho in #365
  • [Type hint] scheduling lms discrete by @santiviquez in #360
  • [Type hint] scheduling karras ve by @santiviquez in #359
  • type hints: models/vae.py by @shepherd1530 in #346
  • [Type Hints] DDIM pipelines by @sidthekidder in #345
  • [ModelOutputs] Replace dict outputs with Dict/Dataclass and allow to return tuples by @patrickvonplaten in #334
  • package version on main should have .dev0 suffix by @mishig25 in #354
  • [textual_inversion] use tokenizer.add_tokens to add placeholder_token by @patil-suraj in #357
  • [Type hint] scheduling ddim by @santiviquez in #343
  • [Type Hints] VAE models by @daspartho in #344
  • [Type Hint] DDPM schedulers by @daspartho in #349
  • [Type hint] PNDM schedulers by @daspartho in #335
  • Fix typo in unet_blocks.py by @da03 in #353
  • [Commands] Add env command by @patrickvonplaten in #352
  • Add transformers and scipy to dependency table by @patrickvonplaten in #348
  • [Type Hint] Unet Models by @sidthekidder in #330
  • [Img2Img2] Re-add K LMS scheduler by @patrickvonplaten in #340
  • Use ONNX / Core ML compatible method to broadcast by @pcuenca in #310
  • [Type hint] PNDM pipeline by @daspartho in #327
  • [Type hint] Latent Diffusion Uncond pipeline by @santiviquez in #333
  • Add contributions to README and re-order a bit by @patrickvonplaten in #316
  • [CI] try to fix GPU OOMs between tests and excessive tqdm logging by @anton-l in #323
  • README: stable diffusion version v1-3 -> v1-4 by @pcuenca in #331
  • Textual inversion by @patil-suraj in #266
  • [Type hint] Score SDE VE pipeline by @santiviquez in #325
  • [CI] Cancel pending jobs for PRs on new commits by @anton-l in #324
  • [train_unconditional] fix gradient accumulation. by @patil-suraj in #308
  • Fix nondeterministic tests for GPU runs by @anton-l in #314
  • Improve README to show how to use SD without an access token by @patrickvonplaten in #315
  • Fix flake8 F401 imported but unused by @anton-l in #317
  • Allow downloading of revisions for models. by @okalldal in #303
  • Fix more links by @python273 in #312
  • Changed variable name from "h" to "hidden_states" by @JC-swEng in #285
  • Fix stable-diffusion-seeds.ipynb link by @python273 in #309
  • [Tests] Add fast pipeline tests by @patrickvonplaten in #302
  • Improve README by @patrickvonplaten in #301
  • [Refactor] Remove set_seed by @patrickvonplaten in #289
  • [Stable Diffusion] Hotfix by @patrickvonplaten in #299
  • Check dummy file by @patrickvonplaten in #297
  • Add missing auth tokens for two SD tests by @anton-l in #296
  • Fix GPU tests (token + single-process) by @anton-l in #294
  • [PNDM Scheduler] format timesteps attrs to np arrays by @NouamaneTazi in #273
  • Fix link by @python273 in #286
  • [Type hint] Karras VE pipeline by @patrickvonplaten in #288
  • Add datasets + transformers + scipy to test deps by @anton-l in #279
  • Easily understandable error if inference steps not set before using scheduler by @samedii in #263
  • [Docs] Add some guides by @patrickvonplaten in #276
  • [README] Add readme for SD by @patrickvonplaten in #274
  • Refactor Pipelines / Community pipelines and add better explanations. by @patrickvonplaten in #257
  • Refactor progress bar by @hysts in #242
  • Support K-LMS in img2img by @anton-l in #270
  • [BugFix]: Fixed add_noise in LMSDiscreteScheduler by @nicolas-dufour in #253
  • [Tests] Make sure tests are on GPU by @patrickvonplaten in #269
  • Adds missing torch imports to inpainting and image_to_image example by @PulkitMishra in #265
  • Fix typo in README.md by @webel in #260
  • Fix inpainting script by @patil-suraj in #258
  • Initialize CI for code quality and testing by @anton-l in #256
  • add inpainting example script by @nagolinc in #241
  • Update README.md with examples by @natolambert in #252
  • Reproducible images by supplying latents to pipeline by @pcuenca in #247
  • Style the scripts directory by @anton-l in #250
  • Pin black==22.3 to keep a stable --preview flag by @anton-l in #249
  • [Clean up] Clean unused code by @patrickvonplaten in #245
  • added test workflow and fixed failing test by @kashif in #237
  • split tests_modeling_utils by @kashif in #223
  • [example/image2image] raise error if strength is not in desired range by @patil-suraj in #238
  • Add image2image example script. by @patil-suraj in #231
  • Remove dead code in resnet.py by @ydshieh in #218

Significant community contributions

The following contributors have made significant changes to the library over the last release:

  • @kashif
    • [Docs] Models (#416)
    • karras-ve docs (#401)
    • Score sde ve doc (#400)
    • ddim docs (#396)
    • added pndm docs (#391)
    • [Pipeline Docs] ddpm docs for sprint (#382)
    • added test workflow and fixed failing test (#237)
    • split tests_modeling_utils (#223)

- Python
Published by anton-l over 3 years ago

diffusers - v0.2.4: Patch release

This patch release allows the Stable Diffusion pipelines to be loaded with float16 precision:

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",
    revision="fp16",
    torch_dtype=torch.float16,
    use_auth_token=True,
)
pipe = pipe.to("cuda")
```

The resulting models take up less than 6900 MiB of GPU memory.

  • [Loading] allow modules to be loaded in fp16 by @patrickvonplaten in #230

- Python
Published by anton-l over 3 years ago

diffusers - v0.2.3: Stable Diffusion public release

:art: Stable Diffusion public release

The Stable Diffusion checkpoints are now public and can be loaded by anyone! :partying_face:

Make sure to accept the license terms on the model page first (requires login): https://huggingface.co/CompVis/stable-diffusion-v1-4

Install the required packages: pip install diffusers==0.2.3 transformers scipy

And log in on your machine using the huggingface-cli login command.

```python
from torch import autocast
from diffusers import StableDiffusionPipeline, LMSDiscreteScheduler

# this will substitute the default PNDM scheduler for K-LMS
lms = LMSDiscreteScheduler(
    beta_start=0.00085,
    beta_end=0.012,
    beta_schedule="scaled_linear",
)

pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",
    scheduler=lms,
    use_auth_token=True,
).to("cuda")

prompt = "a photo of an astronaut riding a horse on mars"
with autocast("cuda"):
    image = pipe(prompt)["sample"][0]

image.save("astronaut_rides_horse.png")
```

The safety checker

Following the model authors' guidelines and code, the Stable Diffusion inference results will now be filtered to exclude unsafe content. Any images classified as unsafe will be returned as blank. To check if the safety module is triggered programmatically, check the nsfw_content_detected flag like so:

```python
outputs = pipe(prompt)
image = outputs["sample"][0]
if any(outputs["nsfw_content_detected"]):
    print("Potential unsafe content was detected in one or more images. Try again with a different prompt and/or seed.")
```

Improvements and bugfixes

  • add add_noise method in LMSDiscreteScheduler, PNDMScheduler by @patil-suraj in #227
  • hotfix for pdnm test by @natolambert in #220
  • Restore is_modelcards_available in .utils by @pcuenca in #224
  • Update README for 0.2.3 release by @pcuenca in #225
  • Pipeline to device by @pcuenca in #210
  • fix safety check by @patil-suraj in #217
  • Add safety module by @patil-suraj in #213
  • Support one-string prompts and custom image size in LDM by @anton-l in #212
  • Add is_torch_available, is_flax_available by @anton-l in #204
  • Revive make quality by @anton-l in #203
  • [StableDiffusionPipeline] use default params in call by @patil-suraj in #196
  • fix test_from_pretrained_hub_pass_model by @patil-suraj in #194
  • Match params with official Stable Diffusion lib by @apolinario in #192

Full Changelog: https://github.com/huggingface/diffusers/compare/v0.2.2...v0.2.3

- Python
Published by anton-l over 3 years ago

diffusers - v0.2.2

This patch release fixes an import of the StableDiffusionPipeline.

[K-LMS Scheduler] fix import by @patrickvonplaten in #191

- Python
Published by patrickvonplaten over 3 years ago

diffusers - v0.2.1 Patch release

This patch release fixes a small bug in the StableDiffusionPipeline.

  • [Stable diffusion] Hot fix by @patrickvonplaten in 50a9ae

- Python
Published by patrickvonplaten over 3 years ago

diffusers - v0.2.0: Stable Diffusion early access, K-LMS sampling

Stable Diffusion

Stable Diffusion is a text-to-image latent diffusion model created by the researchers and engineers from CompVis, Stability AI and LAION. It's trained on 512x512 images from a subset of the LAION-5B database. This model uses a frozen CLIP ViT-L/14 text encoder to condition the model on text prompts. With its 860M UNet and 123M text encoder, the model is relatively lightweight and runs on a GPU with at least 10GB VRAM. See the model card for more information.

The Stable Diffusion weights are currently only available to universities, academics, research institutions, and independent researchers. Please request access by applying via this form.

```python
from torch import autocast
from diffusers import StableDiffusionPipeline

# make sure you're logged in with `huggingface-cli login`
pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-3-diffusers", use_auth_token=True)

prompt = "a photograph of an astronaut riding a horse"
with autocast("cuda"):
    image = pipe(prompt, guidance_scale=7)["sample"][0]  # image here is in PIL format

image.save("astronaut_rides_horse.png")
```

K-LMS sampling

The new LMSDiscreteScheduler is a port of k-lms from k-diffusion by Katherine Crowson. The scheduler can be easily swapped into existing pipelines like so:

```python
from diffusers import StableDiffusionPipeline, LMSDiscreteScheduler

model_id = "CompVis/stable-diffusion-v1-3-diffusers"

# Use the K-LMS scheduler here instead
scheduler = LMSDiscreteScheduler(
    beta_start=0.00085,
    beta_end=0.012,
    beta_schedule="scaled_linear",
    num_train_timesteps=1000,
)
pipe = StableDiffusionPipeline.from_pretrained(model_id, scheduler=scheduler, use_auth_token=True)
```

Integration test with text-to-image script of Stable-Diffusion

#182 and #186 make sure that the DDIM and PNDM/PLMS schedulers yield exactly the same results as Stable Diffusion.

Try it out yourself:

In Stable-Diffusion:

```bash
python scripts/txt2img.py --prompt "a photograph of an astronaut riding a horse" --n_samples 4 --n_iter 1 --fixed_code --plms
```

or

```bash
python scripts/txt2img.py --prompt "a photograph of an astronaut riding a horse" --n_samples 4 --n_iter 1 --fixed_code
```

In diffusers:

```py
from time import time

import numpy as np
import torch
from torch import autocast
from torchvision.utils import make_grid
from einops import rearrange
from PIL import Image

from diffusers import StableDiffusionPipeline, DDIMScheduler

torch.manual_seed(42)

prompt = "a photograph of an astronaut riding a horse"
# prompt = "a photograph of the eiffel tower on the moon"
# prompt = "an oil painting of a futuristic forest gives"

# uncomment to use DDIM
# scheduler = DDIMScheduler(beta_start=0.00085, beta_end=0.012, beta_schedule="scaled_linear", clip_sample=False, set_alpha_to_one=False)
# pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-3-diffusers", use_auth_token=True, scheduler=scheduler)  # make sure you're logged in with `huggingface-cli login`

pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-3-diffusers", use_auth_token=True)  # make sure you're logged in with `huggingface-cli login`

all_images = []
num_rows = 1
num_columns = 4
for _ in range(num_rows):
    with autocast("cuda"):
        images = pipe(num_columns * [prompt], guidance_scale=7.5, output_type="np")["sample"]  # images here are NumPy arrays
    all_images.append(torch.from_numpy(images))

# additionally, save as grid
grid = torch.stack(all_images, 0)
grid = rearrange(grid, 'n b h w c -> (n b) h w c')
grid = rearrange(grid, 'n h w c -> n c h w')
grid = make_grid(grid, nrow=num_rows)

# to image
grid = 255. * rearrange(grid, 'c h w -> h w c').cpu().numpy()
image = Image.fromarray(grid.astype(np.uint8))

image.save(f"./images/diffusers/{'_'.join(prompt.split())}_{round(time())}.png")
```

Improvements and bugfixes

  • Allow passing non-default modules to pipeline by @pcuenca in #188
  • Add K-LMS scheduler from k-diffusion by @anton-l in #185
  • [Naming] correct config naming of DDIM pipeline by @patrickvonplaten in #187
  • [PNDM] Stable diffusion by @patrickvonplaten in #186
  • [Half precision] Make sure half-precision is correct by @patrickvonplaten in #182
  • allow custom height, width in StableDiffusionPipeline by @patil-suraj in #179
  • add tests for stable diffusion pipeline by @patil-suraj in #178
  • Stable diffusion pipeline by @patil-suraj in #168
  • [LDM pipeline] fix eta condition. by @patil-suraj in #171
  • [PNDM in LDM pipeline] use inspect in pipeline instead of unused kwargs by @patil-suraj in #167
  • allow pndm scheduler to be used with ldm pipeline by @patil-suraj in #165
  • add scaled_linear schedule in PNDM and DDPM by @patil-suraj in #164
  • add attention up/down blocks for VAE by @patil-suraj in #161
  • Add an alternative Karras et al. stochastic scheduler for VE models by @anton-l in #160
  • [LDMTextToImagePipeline] make text model generic by @patil-suraj in #162
  • Minor typos by @pcuenca in #159
  • Fix arg key for dataset_name in create_model_card by @pcuenca in #158
  • [VAE] fix the downsample block in Encoder. by @patil-suraj in #156
  • [UNet2DConditionModel] add crossattentiondim as an argument by @patil-suraj in #155
  • Added diffusers to conda-forge and updated README for installation instruction by @sugatoray in #129
  • Add issue templates for feature requests and bug reports by @osanseviero in #153
  • Support training with a local image folder by @anton-l in #152
  • Allow DDPM scheduler to use model's predicated variance by @eyalmazuz in #132

Full Changelog: https://github.com/huggingface/diffusers/compare/0.1.3...v0.2.0

- Python
Published by anton-l over 3 years ago

diffusers - 0.1.3 Patch release

This patch release refactors the model architecture of VQModel and AutoencoderKL, including the weight naming. Therefore, the official weights of the CompVis organization have been re-uploaded, see:

  • https://huggingface.co/CompVis/ldm-celebahq-256/commit/63b33cf3bbdd833de32080a8ba55ba4d0b111859
  • https://huggingface.co/CompVis/ldm-celebahq-256/commit/03978f22272a3c2502da709c3940e227c9714bdd
  • https://huggingface.co/CompVis/ldm-text2im-large-256/commit/31ff4edafd3ee09656d2068d05a4d5338129d592
  • https://huggingface.co/CompVis/ldm-text2im-large-256/commit/9bd2b48d2d45e6deb6fb5a03eb2a601e4b95bd91

Corresponding PR: https://github.com/huggingface/diffusers/pull/137

Please make sure to upgrade diffusers to have those models running correctly: pip install --upgrade diffusers

Bug fixes

  • Fix FileNotFoundError: 'model_card_template.md' https://github.com/huggingface/diffusers/pull/136

- Python
Published by patrickvonplaten over 3 years ago

diffusers - Initial release of 🧨 Diffusers

These are the release notes of the 🧨 Diffusers library

Introducing Hugging Face's new library for diffusion models.

Diffusion models have proven very effective at artificial synthesis, even beating GANs for image generation. Because of that, they have gained traction in the machine learning community and play an important role in systems like DALL-E 2 and Imagen, which generate photorealistic images from text prompts.

While the most prolific successes of diffusion models have been in the computer vision community, these models have also achieved remarkable results in other domains, such as:

and more.

Goals

The goals of diffusers are:

  • to centralize the research of diffusion models from independent repositories to a clear and maintained project,
  • to reproduce high-impact machine learning systems such as DALL-E and Imagen in a manner that is accessible to the public, and
  • to create an easy-to-use API that enables one to train their own models or re-use checkpoints from other repositories for inference.

Release overview

Quickstart: - For a light walk-through of the library, please have a look at the Official 🧨 Diffusers Notebook. - To directly jump into training a diffusion model yourself, please have a look at the Training Diffusers Notebook

Diffusers aims to be a modular toolbox for diffusion techniques, with a focus on the following categories:

:bullettrain_side: Inference pipelines

Inference pipelines are a collection of end-to-end diffusion systems that can be used out-of-the-box. The goal is for them to stick as close as possible to their original implementation, and they can include components of other libraries (such as text encoders).

The original release contains the following pipelines:

We are currently working on enabling other pipelines for different modalities. The following pipelines are expected to land in a subsequent release:

  • BDDMPipeline for spectrogram-to-sound vocoding
  • GLIDEPipeline to support OpenAI's GLIDE model
  • Grad-TTS for text to audio generation / conditional audio generation
  • A reinforcement learning pipeline (happening in https://github.com/huggingface/diffusers/pull/105)

:alarm_clock: Schedulers

  • Schedulers are the algorithms to use diffusion models in inference as well as for training. They include the noise schedules and define algorithm-specific diffusion steps.
  • Schedulers can be used interchangeably between diffusion models in inference to find the preferred trade-off between speed and generation quality.
  • Schedulers are available in numpy, but can easily be transformed into PyTorch.

The goal is for each scheduler to provide one or more step() functions that should be called iteratively to unroll the diffusion loop during the forward pass. They are framework agnostic, but offer conversion methods which should allow easy conversion to PyTorch utilities.
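The iterative `step()` pattern can be illustrated with a toy scheduler (toy code only; the real schedulers implement specific algorithms such as DDPM or PNDM):

```python
class ToyScheduler:
    """A stand-in scheduler exposing the timesteps + step() interface."""

    def __init__(self, num_steps):
        self.timesteps = list(range(num_steps - 1, -1, -1))

    def step(self, model_output, timestep, sample):
        # Move the sample a fraction of the way toward the denoised estimate.
        return sample - model_output / (timestep + 1)


def toy_model(sample, timestep):
    # A stand-in "model": pretends half the sample is noise.
    return sample * 0.5


scheduler = ToyScheduler(num_steps=10)
sample = 8.0  # start from "pure noise"
for t in scheduler.timesteps:
    noise_pred = toy_model(sample, t)
    sample = scheduler.step(noise_pred, t, sample)
# `sample` has been driven toward zero, one step() call per timestep.
```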

The initial release contains the following schedulers:

:factory: Models

Models are hosted in the src/diffusers/models folder.

For the initial release, you'll get to see a few building blocks, as well as some resulting models:

  • UNet2DModel implements the UNet architecture presented in recent diffusion papers. It is the unconditional version of the UNet model, as opposed to the conditional version that follows below.
  • UNet2DConditionModel is similar to the UNet2DModel, but is conditional: it uses the cross-attention mechanism in its downsample and upsample layers, so conditioning signals from other models can be fed in. An example of a pipeline using a conditional UNet model is the latent diffusion pipeline.
  • AutoencoderKL and VQModel are still experimental models that are prone to breaking changes in the near future. However, they can already be used as part of the Latent Diffusion pipelines.
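To make the cross-attention conditioning concrete, here is a toy NumPy sketch (not the actual layer implementation) of how image features can attend to text-encoder states:

```python
import numpy as np

rng = np.random.default_rng(0)
image_tokens = rng.standard_normal((16, 8))  # (latent positions, channels)
text_states = rng.standard_normal((77, 8))   # (text tokens, channels)

# Scaled dot-product attention: every image position attends to every text token.
scores = image_tokens @ text_states.T / np.sqrt(8)
weights = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)
conditioned = weights @ text_states          # image features mixed with text info
```

In the real model the queries come from the UNet's intermediate feature maps and the keys/values from the text encoder's hidden states, with learned projections in between.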

:page_with_curl: Training example

The first release contains a dataset-agnostic unconditional example and a training notebook:

Credits

This library concretizes previous work by many different authors and would not have been possible without their great research and implementations. We'd like to thank, in particular, the following implementations which have helped us in our development and without which the API could not have been as polished today:

  • @CompVis' latent diffusion models library, available here
  • @hojonathanho's original DDPM implementation, available here, as well as the extremely useful translation into PyTorch by @pesser, available here
  • @ermongroup's DDIM implementation, available here.
  • @yang-song's Score-VE and Score-VP implementations, available here

We also want to thank @heejkoo for the very helpful overview of papers, code and resources on diffusion models, available here.

- Python
Published by LysandreJik over 3 years ago