collection-of-stable-diffusion-test-time-plugins

Performing Layout-Free Spatial Compositions for Text-to-Image Diffusion Models

https://github.com/maitreyapatel/collection-of-stable-diffusion-test-time-plugins

Science Score: 54.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: zenodo.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (13.3%) to scientific vocabulary

Repository

Basic Info
  • Host: GitHub
  • Owner: Maitreyapatel
  • License: mit
  • Language: Python
  • Default Branch: main
  • Size: 229 KB
Statistics
  • Stars: 3
  • Watchers: 2
  • Forks: 0
  • Open Issues: 2
  • Releases: 1
Created about 3 years ago · Last pushed almost 3 years ago
Metadata Files
Readme License Citation

README.md

Collection of Stable Diffusion Test-time Plugins

Stable Diffusion performs poorly on many compositional prompts: it exhibits attribute leakage, missing objects, and weak spatial understanding. Several works introduce test-time plugins (e.g., Attend-And-Excite, Layout-Guidance) to control image generation more effectively, while another line of work adds new layers to control Stable Diffusion (e.g., ControlNet, GLIGEN).

However, the question remains: how can Stable Diffusion be improved without any additional information? This repository therefore focuses on first understanding the limitations of the current pre-training method and then introducing a new pre-training strategy. Currently, it contains several test-time baseline methodologies along with an object-proposal-based LoRA fine-tuning strategy.

Contributions are welcome! If interested, reach out to Maitreya via mpatel57@asu.edu.

Note: HuggingFace diffusers is a great library that already includes many of the presented pipelines. However, it is difficult for researchers to modify the existing pipelines in the backend to understand what is going on behind the scenes. This repository aims to bridge this gap and to inspire research on Stable Diffusion plugins.

Supported Features

Baselines and repository setup related features:

  • [x] Setup initial attention store
  • [x] Add Attend-and-Excite
  • [x] Add Composable Diffusion Models
  • [x] Add training-free layout guided inference with attention aggregation methods - <aggregate_attention, all_attention, aggregate_layer_attention>
  • [x] Add CAR+SAR based layout guided inference
  • [ ] Add support to LLM-based layout generation
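To give a rough idea of what the stored attention maps are used for, here is a minimal sketch of the Attend-and-Excite objective: for each subject token, take the strongest value in its cross-attention map, and penalize the token that is attended to the least. This is an illustration under stated assumptions, not this repository's code: the maps are plain nested lists rather than tensors, they are assumed to be already aggregated into one 2D map per token, the Gaussian smoothing used by the original method is omitted, and `attend_and_excite_loss` and `toy_maps` are hypothetical names.

```python
from typing import Dict, List

AttentionMap = List[List[float]]  # one 2D cross-attention map for a single prompt token


def attend_and_excite_loss(maps: Dict[int, AttentionMap], token_indices: List[int]) -> float:
    """Return the maximum over subject tokens of (1 - strongest attention value)."""
    per_token_losses = []
    for idx in token_indices:
        strongest = max(max(row) for row in maps[idx])
        per_token_losses.append(1.0 - strongest)
    # The most neglected token dominates the loss.
    return max(per_token_losses)


# Toy 2x2 maps keyed by token index (hypothetical values).
toy_maps = {
    2: [[0.1, 0.9], [0.0, 0.2]],  # token 2 is strongly attended at one location
    5: [[0.1, 0.2], [0.3, 0.1]],  # token 5 is weak everywhere, so it drives the loss
}
loss = attend_and_excite_loss(toy_maps, [2, 5])
print(f"loss = {loss:.2f}")
```

In the actual method, the gradient of this loss with respect to the latent is used to nudge the latent at each denoising step; that update is not shown here.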

Additional training-time features:

  • [x] Add biased sampling -- COSINE
  • [x] Fine-tune whole UNet
  • [x] LoRA based fine-tuning
  • [ ] Orthogonal fine-tuning
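The practical difference between fine-tuning the whole UNet and LoRA-based fine-tuning is the number of trainable parameters. LoRA freezes a weight matrix of shape `d_out x d_in` and learns a low-rank update `B @ A`, with `B` of shape `d_out x r` and `A` of shape `r x d_in`. The sketch below only does this arithmetic; the layer size and rank are hypothetical and are not taken from this repository's configuration.

```python
from typing import Tuple


def lora_param_counts(d_out: int, d_in: int, rank: int) -> Tuple[int, int]:
    """Return (full fine-tuning params, LoRA params) for one linear layer without bias."""
    full = d_out * d_in            # every entry of W is trainable
    lora = rank * (d_in + d_out)   # entries of A (rank x d_in) plus B (d_out x rank)
    return full, lora


# Hypothetical attention projection of size 320 x 320 with rank 4.
full, lora = lora_param_counts(320, 320, 4)
print(f"full: {full}, LoRA: {lora}, ratio: {lora / full:.3%}")
```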

How to run the experiments?

Installation

```bash
conda create -n LSDGen python=3.8
conda activate LSDGen

pip install -r requirements.txt
```

Run baseline experiment:

For more details on the "Attend & Excite" and "Layout Guidance" config requirements, visit: Config

```bash
# for attend-and-excite
python main.py --exp_name=aae --aae.prompt="a dog and a cat" --aae.token_indices [2,5] --aae.seeds [42]

# for composable-diffusion-models
python main.py --exp_name=cdm --cdm.prompt="a dog and a cat" --cdm.prompt_a="a dog" --cdm.prompt_b="a cat" --cdm.seeds [42]

# for layout-guidance
python main.py --exp_name=lg --lg.seeds=[42] --lg.prompt="an apple to the right of the dog." --lg.phrases="dog;apple" --lg.bounding_box="[[[0.1, 0.2, 0.5, 0.8]],[[0.75, 0.6, 0.95, 0.8]]]" --lg.attention_aggregation_method="aggregate_attention"

# for attention refocus
python main.py --exp_name=af --af.seeds=[42] --af.prompt="an apple to the right of the dog." --af.phrases="dog;apple" --af.bounding_box="[[[0.1, 0.2, 0.5, 0.8]],[[0.75, 0.6, 0.95, 0.8]]]"
```
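The baseline commands above take two inputs that are easy to get wrong: the token indices for Attend-and-Excite and the phrase/bounding-box pairs for the layout-based methods. The following sketch checks both before launching a run. It rests on assumptions that are not confirmed by this README: the tokenizer is assumed to prepend one start-of-text token (so the first word has index 1) and to map each word to a single token, and each box is assumed to be `[x_min, y_min, x_max, y_max]` in normalized image coordinates. `token_indices` and `parse_layout` are hypothetical helpers, not part of the repository.

```python
import json
from typing import Dict, List


def token_indices(prompt: str, words: List[str]) -> List[int]:
    """Approximate token positions by whitespace splitting (assumes one token per word)."""
    # Index 0 is reserved for the assumed start-of-text token.
    tokens = ["<start>"] + prompt.lower().split()
    return [tokens.index(word) for word in words]  # first occurrence of each word


def parse_layout(phrases: str, boxes_json: str) -> Dict[str, List[List[float]]]:
    """Pair each ';'-separated phrase with its list of boxes and validate the coordinates."""
    names = [p.strip() for p in phrases.split(";")]
    boxes = json.loads(boxes_json)
    if len(names) != len(boxes):
        raise ValueError("need exactly one list of boxes per phrase")
    for group in boxes:
        for x_min, y_min, x_max, y_max in group:
            if not (0.0 <= x_min < x_max <= 1.0 and 0.0 <= y_min < y_max <= 1.0):
                raise ValueError(f"invalid box: {[x_min, y_min, x_max, y_max]}")
    return dict(zip(names, boxes))


def center_x(box: List[float]) -> float:
    """Horizontal center of a box, assuming x grows to the right."""
    return (box[0] + box[2]) / 2.0


indices = token_indices("a dog and a cat", ["dog", "cat"])
layout = parse_layout("dog;apple", "[[[0.1, 0.2, 0.5, 0.8]],[[0.75, 0.6, 0.95, 0.8]]]")
apple_is_right_of_dog = center_x(layout["apple"][0]) > center_x(layout["dog"][0])
print(indices, apple_is_right_of_dog)
```

For real prompts, the indices are best read off the actual tokenizer output, since a word may be split into several sub-word tokens and punctuation occupies its own positions.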

Custom trainer

To fine-tune the Stable Diffusion model, run one of the following commands (under development):

```bash
# bash script defining all parameters
bash ./scripts/train.sh

# bash script for LoRA-based fine-tuning
bash ./scripts/train_lora.sh

# Alternatively, define the parameters manually
export MODEL_NAME="CompVis/stable-diffusion-v1-4"
export PKL_PATH="data/cocodata.pkl"  # a pre-processed sample pickle file (reach out for access)
export INSTANCE_DIR="/data/data/matt/datasets/VGENOME"
export OUTPUT_DIR="logs/masktrain_10k"

# Change the CUDA device as needed.
# !!!! The current version only supports a batch size of 1 (--train.train_batch_size=1).
# Pass --debugme=True only if you want to perform debugging.
CUDA_VISIBLE_DEVICES=0 python main.py --exp_name="train" \
  --train.pretrained_model_name_or_path=$MODEL_NAME \
  --train.instance_pkl_path=$PKL_PATH \
  --train.instance_data_dir=$INSTANCE_DIR \
  --train.output_dir=$OUTPUT_DIR \
  --train.train_text_encoder=False \
  --train.resolution=512 \
  --train.train_batch_size=1 \
  --train.gradient_accumulation_steps=1 \
  --train.learning_rate=5e-6 \
  --train.lr_scheduler="constant" \
  --train.lr_warmup_steps=0 \
  --train.max_train_steps=10000 \
  --train.checkpointing_steps=5000 \
  --train.regularizer="lg" \
  --train.regularizer_weight=5.0 \
  --debugme=True
```
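To make the training settings above concrete, the sketch below works out what they imply for the run. It assumes, as in typical diffusers training scripts but not confirmed here, that the maximum number of training steps counts optimizer updates and that a checkpoint is written every time the step count is a multiple of the checkpointing interval. `training_schedule` is a hypothetical helper.

```python
from typing import List, Tuple


def training_schedule(
    max_train_steps: int,
    checkpointing_steps: int,
    train_batch_size: int,
    gradient_accumulation_steps: int,
    num_devices: int = 1,
) -> Tuple[int, int, List[int]]:
    """Return (effective batch size, samples seen, steps at which checkpoints are saved)."""
    effective_batch = train_batch_size * gradient_accumulation_steps * num_devices
    samples_seen = max_train_steps * effective_batch
    checkpoints = list(range(checkpointing_steps, max_train_steps + 1, checkpointing_steps))
    return effective_batch, samples_seen, checkpoints


# Values from the manual command: batch size 1, no accumulation, 10000 steps, interval 5000.
effective_batch, samples_seen, checkpoints = training_schedule(10000, 5000, 1, 1)
print(effective_batch, samples_seen, checkpoints)
```

Because only a batch size of 1 is currently supported, gradient accumulation is the only knob in this command that would raise the effective batch size on a single device.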

Currently supported tasks:

  • Attend-and-Excite ("aae")
  • Layout Guided inference ("lg") -- attention aggregation methods - <aggregate_attention, all_attention, aggregate_layer_attention>
  • Attention Refocus ("af")
  • Composable Diffusion Models ("cdm")
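Of the tasks listed above, Composable Diffusion Models is the simplest to state in code: the noise predictions for each sub-prompt (for example "a dog" and "a cat") are combined around the unconditional prediction, each with its own guidance weight. The sketch below shows that conjunction rule on plain Python lists. It is an illustration of the published formula, not this repository's pipeline; `compose_noise`, the toy vectors, and the weight of 7.5 are hypothetical.

```python
from typing import List, Sequence


def compose_noise(
    eps_uncond: Sequence[float],
    eps_conds: Sequence[Sequence[float]],
    weights: Sequence[float],
) -> List[float]:
    """Compute eps_uncond + sum_i w_i * (eps_cond_i - eps_uncond), element-wise."""
    composed = list(eps_uncond)
    for eps_cond, weight in zip(eps_conds, weights):
        for i, (cond, uncond) in enumerate(zip(eps_cond, eps_uncond)):
            composed[i] += weight * (cond - uncond)
    return composed


# Two-element stand-ins for the UNet noise predictions (hypothetical values).
eps_uncond = [0.0, 1.0]
eps_prompt_a = [1.0, 1.0]  # prediction conditioned on the first sub-prompt
eps_prompt_b = [0.0, 2.0]  # prediction conditioned on the second sub-prompt
eps = compose_noise(eps_uncond, [eps_prompt_a, eps_prompt_b], [7.5, 7.5])
print(eps)
```

With a single sub-prompt, this reduces to ordinary classifier-free guidance.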

Acknowledgement

This repository is built on diffusers, Attend-and-Excite, and Training-Free Layout Control with Cross-Attention Guidance.

Owner

  • Name: Maitreya Patel
  • Login: Maitreyapatel
  • Kind: user
  • Location: Tempe, Arizona, USA

Vision ⇌ Language

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
- family-names: "Patel"
  given-names: "Maitreya"
- family-names: "Vengurlekar"
  given-names: "Omkar"
title: "Collection of Stable Diffusion Test time Plugins "
version: 0.0.1
doi: 10.5281/zenodo.8329771
date-released: 2023-09-08
url: "https://github.com/Maitreyapatel/Collection-of-Stable-Diffusion-Test-time-Plugins"

Dependencies

requirements.txt pypi
  • accelerate *
  • coloredlogs *
  • diffusers ==0.19.1
  • ftfy *
  • ipywidgets *
  • matplotlib *
  • opencv-python *
  • pyrallis *
  • torch ==1.13.1
  • transformers *
  • wandb *
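Only two of the dependencies above are pinned to exact versions (`diffusers ==0.19.1` and `torch ==1.13.1`), so an environment that drifts from those pins is a likely source of breakage. The sketch below compares installed versions against the pins using the standard library's `importlib.metadata`. `find_mismatches` is a hypothetical helper, and stripping a local suffix such as `+cu117` from the installed version is an assumption about how PyTorch builds are commonly labeled.

```python
from importlib import metadata
from typing import Callable, Dict, Optional

# Exact pins taken from the dependency list above.
PINNED = {"torch": "1.13.1", "diffusers": "0.19.1"}


def installed_version(name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is not installed."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(
    pinned: Dict[str, str],
    lookup: Callable[[str], Optional[str]] = installed_version,
) -> Dict[str, Optional[str]]:
    """Map each package whose installed version differs from its pin to what was found."""
    mismatches: Dict[str, Optional[str]] = {}
    for name, wanted in pinned.items():
        found = lookup(name)
        # Ignore a local build label, e.g. "1.13.1+cu117" counts as "1.13.1".
        base = found.split("+")[0] if found is not None else None
        if base != wanted:
            mismatches[name] = found
    return mismatches


if __name__ == "__main__":
    for name, found in find_mismatches(PINNED).items():
        print(f"{name}: expected {PINNED[name]}, found {found}")
```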