collection-of-stable-diffusion-test-time-plugins

Performing Layout-Free Spatial Compositions for Text-to-Image Diffusion Models

https://github.com/maitreyapatel/collection-of-stable-diffusion-test-time-plugins

Repository

Basic Info

Host: GitHub
Owner: Maitreyapatel
License: mit
Language: Python
Default Branch: main
Size: 229 KB

Statistics

Stars: 3
Watchers: 2
Forks: 0
Open Issues: 2
Releases: 1

Created about 3 years ago · Last pushed almost 3 years ago

Collection of Stable Diffusion Test-time Plugins

Stable Diffusion struggles with compositional prompts: it exhibits attribute leakage, missing objects, and weak spatial understanding. Several works introduce test-time plugins (e.g., Attend-and-Excite, Layout-Guidance) to control image generation more effectively, while another line of work adds new layers to control Stable Diffusion (e.g., ControlNet, GLIGEN).

However, the question remains: how can Stable Diffusion be improved without any additional information? This repository therefore focuses first on understanding the limitations of the current pre-training method and then on introducing a new pre-training strategy. Currently, it contains several test-time baseline methods along with an object-proposal-based LoRA fine-tuning strategy.

Contributions are welcome! If interested, reach out to Maitreya via mpatel57@asu.edu.

Note: HuggingFace diffusers is a great library that already includes many of the pipelines presented here. However, it is difficult for researchers to modify its pipelines internally to understand what is going on behind the scenes. This repository aims to bridge that gap and to inspire further research on Stable Diffusion plugins.

Supported Features

Baselines and repository setup related features:

[x] Setup initial attention store
[x] Add Attend-and-Excite
[x] Add Composable Diffusion Models
[x] Add training-free layout guided inference with attention aggregation methods - <aggregate_attention, all_attention, aggregate_layer_attention>
[x] Add CAR+SAR based layout guided inference
[ ] Add support for LLM-based layout generation

Additional training-time features:

[x] Add biased sampling -- COSINE
[x] Fine-tune whole UNet
[x] LoRA based fine-tuning
[ ] Orthogonal fine-tuning

How to run the experiments?

Installation

```bash
conda create -n LSDGen python=3.8
conda activate LSDGen

pip install -r requirements.txt
```

Run baseline experiment:

For more details on the "Attend & Excite" and "Layout Guidance" config requirements, visit: Config

```bash
# for attend-and-excite
python main.py --exp_name=aae --aae.prompt="a dog and a cat" --aae.token_indices [2,5] --aae.seeds [42]

# for composable-diffusion-models
python main.py --exp_name=cdm --cdm.prompt="a dog and a cat" --cdm.prompt_a="a dog" --cdm.prompt_b="a cat" --cdm.seeds [42]

# for layout-guidance
python main.py --exp_name=lg --lg.seeds=[42] --lg.prompt="an apple to the right of the dog." --lg.phrases="dog;apple" --lg.bounding_box="[[[0.1, 0.2, 0.5, 0.8]],[[0.75, 0.6, 0.95, 0.8]]]" --lg.attention_aggregation_method="aggregate_attention"

# for attention refocus
python main.py --exp_name=af --af.seeds=[42] --af.prompt="an apple to the right of the dog." --af.phrases="dog;apple" --af.bounding_box="[[[0.1, 0.2, 0.5, 0.8]],[[0.75, 0.6, 0.95, 0.8]]]"
```
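The token indices passed to Attend-and-Excite above (`[2,5]` for "a dog and a cat") can be worked out by hand. The sketch below is only an illustration and makes two assumptions: the tokenizer places a start-of-text token at index 0, and every word maps to exactly one token. The helper `token_indices` is hypothetical and not part of this repository; for rare or compound words, check the indices against the actual tokenizer.

```python
def token_indices(prompt, targets):
    """Return the positions of the target words in the prompt.

    Assumption: index 0 is taken by the tokenizer's start-of-text token,
    so the first word of the prompt sits at index 1.
    Assumption: each word becomes exactly one token.
    """
    words = prompt.lower().replace(".", " ").split()
    return [i + 1 for i, word in enumerate(words) if word in targets]


print(token_indices("a dog and a cat", {"dog", "cat"}))  # [2, 5]
```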

Custom trainer

To fine-tune the Stable Diffusion model, run one of the following commands (under development):

```bash
# bash script defining all parameters
bash ./scripts/train.sh

# bash script for LoRA-based fine-tuning
bash ./scripts/train_lora.sh

# Alternatively, define the parameters manually
export MODEL_NAME="CompVis/stable-diffusion-v1-4"
export PKL_PATH="data/cocodata.pkl" # a pre-processed sample pickle file (reach out for access)
export INSTANCE_DIR="/data/data/matt/datasets/VGENOME"
export OUTPUT_DIR="logs/masktrain_10k"

# Change the CUDA device as needed.
# !!!! The current version only supports a batch size of 1.
CUDA_VISIBLE_DEVICES=0 python main.py --exp_name="train" \
  --train.pretrained_model_name_or_path=$MODEL_NAME \
  --train.instance_pkl_path=$PKL_PATH \
  --train.instance_data_dir=$INSTANCE_DIR \
  --train.output_dir=$OUTPUT_DIR \
  --train.train_text_encoder=False \
  --train.resolution=512 \
  --train.train_batch_size=1 \
  --train.gradient_accumulation_steps=1 \
  --train.learning_rate=5e-6 \
  --train.lr_scheduler="constant" \
  --train.lr_warmup_steps=0 \
  --train.max_train_steps=10000 \
  --train.checkpointing_steps=5000 \
  --train.regularizer="lg" \
  --train.regularizer_weight=5.0 \
  --debugme=True # only pass if you want to perform debugging
```
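The dotted flags used above (for example `--train.resolution=512`) follow a section-then-option pattern. The requirements list includes pyrallis, which builds such flags from nested dataclasses. The sketch below imitates that mapping with the standard library only, to show how a dotted flag reaches a nested field. All class and field names here are hypothetical stand-ins, and `apply_override` is not how the repository or pyrallis parses arguments.

```python
import ast
from dataclasses import dataclass, field


@dataclass
class AaeConfig:  # hypothetical stand-in for the "aae" section
    prompt: str = ""
    seeds: list = field(default_factory=lambda: [42])


@dataclass
class TrainConfig:  # hypothetical stand-in for the "train" section
    resolution: int = 512
    regularizer: str = "lg"


@dataclass
class RunConfig:
    aae: AaeConfig = field(default_factory=AaeConfig)
    train: TrainConfig = field(default_factory=TrainConfig)


def apply_override(cfg, flag):
    """Apply one '--section.option=value' flag to a nested dataclass in place."""
    path, raw = flag.lstrip("-").split("=", 1)
    *parents, leaf = path.split(".")
    target = cfg
    for name in parents:  # walk down to the section that owns the option
        target = getattr(target, name)
    if not hasattr(target, leaf):
        raise AttributeError(f"unknown option: {path}")
    try:
        value = ast.literal_eval(raw)  # numbers, lists, booleans
    except (ValueError, SyntaxError):
        value = raw  # anything else is kept as a plain string
    setattr(target, leaf, value)
    return cfg


cfg = RunConfig()
apply_override(cfg, "--train.resolution=256")
apply_override(cfg, "--aae.prompt=a dog and a cat")
print(cfg.train.resolution, cfg.aae.prompt)
```

Keeping one dataclass per task means each experiment only needs to set the section named after it, which matches how the commands above prefix every option with the task name.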

Currently supported tasks:

Attend-and-Excite ("aae")
Layout Guided inference ("lg") -- attention aggregation methods - <aggregate_attention, all_attention, aggregate_layer_attention>
Attention Refocus ("af")
Composable Diffusion Models ("cdm")
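For the layout-guided tasks, each phrase is paired with a bounding box such as `[0.1, 0.2, 0.5, 0.8]`. The sketch below illustrates the general idea behind cross-attention layout guidance: turn the box into a mask over the attention grid, then penalize attention mass that falls outside it with an energy of the form (1 - inside / total)^2, as described in Training-Free Layout Control with Cross-Attention Guidance. It is not the repository's implementation. The coordinate order `[x_min, y_min, x_max, y_max]`, the normalization to [0, 1], and the 16x16 grid are assumptions, and the function names are hypothetical.

```python
def box_mask(box, size):
    """Boolean size x size grid that is True inside a normalized box.

    Assumption: box = [x_min, y_min, x_max, y_max] with values in [0, 1].
    """
    x0, y0, x1, y1 = (int(round(v * size)) for v in box)
    return [[x0 <= x < x1 and y0 <= y < y1 for x in range(size)]
            for y in range(size)]


def layout_energy(attn, mask):
    """(1 - attention mass inside the box / total attention mass) ** 2."""
    total = sum(sum(row) for row in attn)
    inside = sum(a for attn_row, mask_row in zip(attn, mask)
                 for a, m in zip(attn_row, mask_row) if m)
    return (1.0 - inside / total) ** 2


size = 16  # assumed attention resolution
mask = box_mask([0.1, 0.2, 0.5, 0.8], size)

# Attention spread evenly over the image: most of it lies outside the box.
uniform = [[1.0] * size for _ in range(size)]
# Attention concentrated entirely inside the box.
focused = [[1.0 if m else 0.0 for m in row] for row in mask]

print(layout_energy(uniform, mask))  # large: attention ignores the layout
print(layout_energy(focused, mask))  # 0.0: attention respects the layout
```

During guided sampling, an energy like this is typically differentiated with respect to the latent so that each denoising step nudges the attention for a phrase toward its box.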

Acknowledgement

This repository is built on diffusers, Attend-and-Excite, and Training-Free Layout Control with Cross-Attention Guidance.

Owner

Name: Maitreya Patel
Login: Maitreyapatel
Kind: user
Location: Tempe, Arizona, USA

Website: maitreyapatel.com
Twitter: patelmaitreya
Repositories: 5
Profile: https://github.com/Maitreyapatel

Vision ⇌ Language

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
- family-names: "Patel"
  given-names: "Maitreya"
- family-names: "Vengurlekar"
  given-names: "Omkar"
title: "Collection of Stable Diffusion Test time Plugins "
version: 0.0.1
doi: 10.5281/zenodo.8329771
date-released: 2023-09-08
url: "https://github.com/Maitreyapatel/Collection-of-Stable-Diffusion-Test-time-Plugins"

Dependencies

requirements.txt pypi

accelerate *
coloredlogs *
diffusers ==0.19.1
ftfy *
ipywidgets *
matplotlib *
opencv-python *
pyrallis *
torch ==1.13.1
transformers *
wandb *
