collection-of-stable-diffusion-test-time-plugins
Performing Layout-Free Spatial Compositions for Text-to-Image Diffusion Models
https://github.com/maitreyapatel/collection-of-stable-diffusion-test-time-plugins
Science Score: 54.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ○ DOI references
- ✓ Academic publication links: Links to: zenodo.org
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (13.3%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: Maitreyapatel
- License: mit
- Language: Python
- Default Branch: main
- Size: 229 KB
Statistics
- Stars: 3
- Watchers: 2
- Forks: 0
- Open Issues: 2
- Releases: 1
Metadata Files
README.md
Collection of Stable Diffusion Test-time Plugins
Stable Diffusion struggles with compositional prompts: it exhibits attribute leakage, missing objects, and poor spatial understanding. Several works have introduced test-time plugins (e.g., Attend-and-Excite, Layout-Guidance) to control image generation more effectively, while another line of work adds new layers to control Stable Diffusion (e.g., ControlNet, GLIGEN).
However, the question remains: how can Stable Diffusion be improved without any additional information? This repository therefore focuses on first understanding the limitations of the current pre-training method and then introducing a new pre-training strategy. Currently, the repository contains several test-time baseline methodologies along with an object-proposal-based LoRA fine-tuning strategy.
Contributions are welcome! If interested, reach out to Maitreya via mpatel57@asu.edu.
Note: HuggingFace diffusers is a great library that already includes many of the pipelines presented here. However, it is difficult for researchers to modify the existing pipelines internally to understand what is going on behind the scenes. This repository aims to bridge this gap and to inspire further research on Stable Diffusion plugins.
Supported Features
Baselines and repository setup related features:
- [x] Setup initial attention store
- [x] Add Attend-and-Excite
- [x] Add Composable Diffusion Models
- [x] Add training-free layout guided inference with attention aggregation methods: <aggregate_attention, all_attention, aggregate_layer_attention>
- [x] Add CAR+SAR based layout guided inference
- [ ] Add support to LLM-based layout generation
Additional training-time features:
- [x] Add biased sampling -- COSINE
- [x] Fine-tune whole UNet
- [x] LoRA based fine-tuning
- [ ] Orthogonal fine-tuning
How to run the experiments?
Installation
```bash
conda create -n LSDGen python=3.8
conda activate LSDGen
pip install -r requirements.txt
```
Run baseline experiment:
For more details on the "Attend & Excite" and "Layout Guidance" config requirements, visit: Config
```bash
# for attend-and-excite
python main.py --exp_name=aae --aae.prompt="a dog and a cat" --aae.token_indices [2,5] --aae.seeds [42]

# for composable-diffusion-models
python main.py --exp_name=cdm --cdm.prompt="a dog and a cat" --cdm.prompt_a="a dog" --cdm.prompt_b="a cat" --cdm.seeds [42]

# for layout-guidance
python main.py --exp_name=lg --lg.seeds=[42] --lg.prompt="an apple to the right of the dog." --lg.phrases="dog;apple" --lg.bounding_box="[[[0.1, 0.2, 0.5, 0.8]],[[0.75, 0.6, 0.95, 0.8]]]" --lg.attention_aggregation_method="aggregate_attention"

# for attention refocus
python main.py --exp_name=af --af.seeds=[42] --af.prompt="an apple to the right of the dog." --af.phrases="dog;apple" --af.bounding_box="[[[0.1, 0.2, 0.5, 0.8]],[[0.75, 0.6, 0.95, 0.8]]]"
```
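The token indices passed to Attend-and-Excite (`[2,5]` in the command above) are positions in the tokenized prompt. As a rough sketch, assuming each whitespace-separated word maps to exactly one token and that index 0 is taken by a start-of-text token, the indices for simple prompts can be estimated as shown below. The helper name `approx_token_indices` is hypothetical and not part of this repository; for prompts with rare or multi-token words, inspect the real tokenizer output instead.

```python
from typing import List


def approx_token_indices(prompt: str, words: List[str]) -> List[int]:
    """Estimate the token positions of `words` inside `prompt`.

    Assumption: one token per whitespace-separated word, plus a
    start-of-text token at index 0. This only holds for simple prompts.
    """
    tokens = prompt.lower().split()
    indices = []
    for word in words:
        # +1 skips the assumed start-of-text token at position 0.
        # list.index raises ValueError if the word is not in the prompt.
        indices.append(tokens.index(word.lower()) + 1)
    return indices


print(approx_token_indices("a dog and a cat", ["dog", "cat"]))  # [2, 5]
```

Under these assumptions, "dog" and "cat" land on positions 2 and 5, which matches the indices used in the Attend-and-Excite example.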
Custom trainer
To fine-tune the Stable Diffusion model, run one of the following commands (under development):
```bash
# bash script defining all parameters
bash ./scripts/train.sh

# bash script for LoRA-based fine-tuning
bash ./scripts/train_lora.sh

# Alternatively define the parameters manually
export MODEL_NAME="CompVis/stable-diffusion-v1-4"
export PKL_PATH="data/cocodata.pkl"  # a pre-processed sample pickle file (reach out for access)
export INSTANCE_DIR="/data/data/matt/datasets/VGENOME"
export OUTPUT_DIR="logs/masktrain_10k"

# Change Cuda device as needed
# !!!! The current version only supports single-batch size (train_batch_size=1)
# Only pass --debugme=True if you want to perform debugging
CUDA_VISIBLE_DEVICES=0 python main.py --exp_name="train" \
  --train.pretrained_model_name_or_path=$MODEL_NAME \
  --train.instance_pkl_path=$PKL_PATH \
  --train.instance_data_dir=$INSTANCE_DIR \
  --train.output_dir=$OUTPUT_DIR \
  --train.train_text_encoder=False \
  --train.resolution=512 \
  --train.train_batch_size=1 \
  --train.gradient_accumulation_steps=1 \
  --train.learning_rate=5e-6 \
  --train.lr_scheduler="constant" \
  --train.lr_warmup_steps=0 \
  --train.max_train_steps=10000 \
  --train.checkpointing_steps=5000 \
  --train.regularizer="lg" \
  --train.regularizer_weight=5.0 \
  --debugme=True
```
Currently supported tasks:
- Attend-and-Excite ("aae")
- Layout Guided inference ("lg"), with attention aggregation methods: <aggregate_attention, all_attention, aggregate_layer_attention>
- Attention Refocus ("af")
- Composable Diffusion Models ("cdm")
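The layout-based tasks ("lg" and "af") take `;`-separated phrases and one list of boxes per phrase. The sketch below shows one way to pair them up and scale the boxes to pixel coordinates. It assumes boxes are normalized `[x_min, y_min, x_max, y_max]` values in `[0, 1]`, which is inferred from the example commands (the apple box lies to the right of the dog box) rather than confirmed by the repository. The helper name `parse_layout` and the default `size` of 512 are hypothetical.

```python
import json
from typing import Dict, List, Tuple

Box = Tuple[int, int, int, int]


def parse_layout(phrases: str, bounding_box: str, size: int = 512) -> Dict[str, List[Box]]:
    """Pair each phrase with its boxes, scaled to pixel coordinates.

    Assumption: boxes are normalized [x_min, y_min, x_max, y_max].
    """
    names = [p.strip() for p in phrases.split(";")]
    boxes_per_phrase = json.loads(bounding_box)
    if len(names) != len(boxes_per_phrase):
        raise ValueError("need exactly one list of boxes per phrase")
    layout: Dict[str, List[Box]] = {}
    for name, boxes in zip(names, boxes_per_phrase):
        scaled = []
        for x0, y0, x1, y1 in boxes:
            # Reject boxes that are empty, inverted, or outside [0, 1].
            if not (0.0 <= x0 < x1 <= 1.0 and 0.0 <= y0 < y1 <= 1.0):
                raise ValueError(f"invalid box for {name!r}: {[x0, y0, x1, y1]}")
            scaled.append(tuple(round(v * size) for v in (x0, y0, x1, y1)))
        layout[name] = scaled
    return layout


layout = parse_layout(
    "dog;apple",
    "[[[0.1, 0.2, 0.5, 0.8]],[[0.75, 0.6, 0.95, 0.8]]]",
)
print(layout)
```

Validating the phrase and box counts up front catches the most common mistake with this argument format, a mismatch between the number of phrases and the number of box lists, before any image generation starts.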
Acknowledgement
This repository is built on diffusers, Attend-and-Excite, and Training-Free Layout Control with Cross-Attention Guidance.
Owner
- Name: Maitreya Patel
- Login: Maitreyapatel
- Kind: user
- Location: Tempe, Arizona, USA
- Website: maitreyapatel.com
- Twitter: patelmaitreya
- Repositories: 5
- Profile: https://github.com/Maitreyapatel
Vision ⇌ Language
Citation (CITATION.cff)
cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
  - family-names: "Patel"
    given-names: "Maitreya"
  - family-names: "Vengurlekar"
    given-names: "Omkar"
title: "Collection of Stable Diffusion Test time Plugins "
version: 0.0.1
doi: 10.5281/zenodo.8329771
date-released: 2023-09-08
url: "https://github.com/Maitreyapatel/Collection-of-Stable-Diffusion-Test-time-Plugins"
Dependencies
- accelerate *
- coloredlogs *
- diffusers ==0.19.1
- ftfy *
- ipywidgets *
- matplotlib *
- opencv-python *
- pyrallis *
- torch ==1.13.1
- transformers *
- wandb *