tomebrush

Adjusting the token merging ratio according to the mask in inpainting

https://github.com/pinht126/tomebrush

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

✓ CITATION.cff file: Found CITATION.cff file
✓ codemeta.json file: Found codemeta.json file
✓ .zenodo.json file: Found .zenodo.json file
○ DOI references
○ Academic publication links
○ Academic email domains
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: Low similarity (12.5%) to scientific vocabulary

Last synced: 11 months ago

Repository

Adjusting the token merging ratio according to the mask in inpainting

Basic Info

Host: GitHub
Owner: pinht126
License: other
Language: Python
Default Branch: main
Homepage:
Size: 22.9 MB

Statistics

Stars: 1
Watchers: 1
Forks: 0
Open Issues: 0
Releases: 0

Created over 1 year ago · Last pushed about 1 year ago

Metadata Files

Readme Contributing License Code of conduct Citation

Token Merging for BrushNet

BrushNet is a diffusion-based, text-guided image inpainting model whose design follows the ControlNet architecture. The inpainting model generates new content within the masked regions based on the text input, while keeping the non-masked regions identical to the input image.

In practice, the model first generates the entire image such that the non-masked regions closely resemble the input, and then overlays the inpainted result on top of the original image using the mask (the blending operation).
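The blending operation described above can be illustrated with a minimal sketch. It assumes a hard binary mask in which 1 marks the region to inpaint, and single-channel images stored as nested lists; the function name `blend` is hypothetical, and the actual pipeline may blend differently (for example with a softened mask).

```python
def blend(original, generated, mask):
    """Per-pixel composite: mask == 1 takes the generated pixel,
    mask == 0 keeps the original pixel."""
    return [
        [m * g + (1 - m) * o for o, g, m in zip(orow, grow, mrow)]
        for orow, grow, mrow in zip(original, generated, mask)
    ]


# Toy 2x2 grayscale example (values are illustrative only).
original = [[10, 10], [10, 10]]
generated = [[99, 98], [97, 96]]
mask = [[1, 0], [0, 1]]

blended = blend(original, generated, mask)
print(blended)  # [[99, 10], [10, 96]]
```

Because the non-masked pixels are copied back from the input, whatever the model generated there is discarded at this step, which is what motivates the question that follows.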

This raises the question: is generating the non-masked regions necessary?

While it may seem redundant, generating the non-masked regions is not pointless: changes in these areas can affect the overall composition and coherence of the final image. However, they are less critical than the masked regions.

Therefore, we aim to reduce the computational overhead for the non-masked regions using Token Merging for Stable Diffusion. Token Merging with a 50% merging ratio has been shown to reduce computation without sacrificing generation quality. By applying a higher merging ratio to the non-masked regions, we can significantly reduce the overall computational load.
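To make the potential saving concrete, here is a back-of-the-envelope sketch. It assumes a 64x64 latent token grid (a 512x512 image at 8x downsampling), that self-attention cost grows roughly with the square of the token count, and hypothetical merging ratios and mask coverage; none of these numbers are measurements from this repository.

```python
def remaining_tokens(n_tokens, masked_fraction, r_masked, r_unmasked):
    """Tokens left after merging a fraction r_masked of the masked tokens
    and a fraction r_unmasked of the non-masked tokens."""
    n_masked = round(n_tokens * masked_fraction)
    n_unmasked = n_tokens - n_masked
    kept_masked = n_masked - int(n_masked * r_masked)
    kept_unmasked = n_unmasked - int(n_unmasked * r_unmasked)
    return kept_masked + kept_unmasked


def relative_attention_cost(kept, n_tokens):
    """Quadratic approximation of self-attention cost relative to no merging."""
    return (kept / n_tokens) ** 2


n = 64 * 64  # 4096 tokens (assumed latent grid)

# Uniform 50% merging everywhere.
uniform = remaining_tokens(n, masked_fraction=0.25, r_masked=0.5, r_unmasked=0.5)

# Hypothetical region-aware setting: merge less inside the mask, more outside.
region = remaining_tokens(n, masked_fraction=0.25, r_masked=0.25, r_unmasked=0.75)

print(uniform, relative_attention_cost(uniform, n))  # 2048 0.25
print(region, relative_attention_cost(region, n))    # 1536 0.140625
```

Under these assumptions, the region-aware setting keeps more tokens inside the mask (768 instead of 512) while still lowering the total token count, because the non-masked area covers most of the image.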

Based on this idea, we apply a region-aware token merging strategy to the inpainting model, assigning different merging ratios to masked and non-masked areas.
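One way such a region-aware strategy could be organized is sketched below. This is an illustration under stated assumptions, not the repository's implementation: the helper names `mask_to_token_grid` and `merge_budget` are hypothetical, a token is treated as masked if any pixel in its patch is masked, and the ratios are placeholders.

```python
def mask_to_token_grid(mask, patch):
    """Downsample a binary pixel mask to a token grid.
    A token is marked 1 if any pixel inside its patch is masked."""
    h, w = len(mask), len(mask[0])
    return [
        [
            int(any(
                mask[y][x]
                for y in range(ty, min(ty + patch, h))
                for x in range(tx, min(tx + patch, w))
            ))
            for tx in range(0, w, patch)
        ]
        for ty in range(0, h, patch)
    ]


def merge_budget(token_mask, r_masked, r_unmasked):
    """Number of tokens to merge in each region for the given ratios."""
    flat = [t for row in token_mask for t in row]
    n_masked = sum(flat)
    n_unmasked = len(flat) - n_masked
    return {
        "masked": int(n_masked * r_masked),
        "unmasked": int(n_unmasked * r_unmasked),
    }


# 4x4 pixel mask with one masked pixel, mapped to a 2x2 token grid.
pixel_mask = [
    [0, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
token_mask = mask_to_token_grid(pixel_mask, patch=2)
budget = merge_budget(token_mask, r_masked=0.0, r_unmasked=0.5)

print(token_mask)  # [[1, 0], [0, 0]]
print(budget)      # {'masked': 0, 'unmasked': 1}
```

Marking a token as masked when any of its pixels is masked is a conservative choice: it keeps boundary tokens in the low-merging group, at the cost of slightly less compute saving.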

🚀 Getting Started

Environment Requirement 🌍

The environment of ToMeBrush is largely similar to that of BrushNet.

BrushNet has been implemented and tested on PyTorch 1.12.1 with Python 3.9.

Clone the repo:

git clone https://github.com/TencentARC/BrushNet.git

We recommend that you first use conda to create a virtual environment and install PyTorch following the official instructions. For example:

conda create -n diffusers python=3.9 -y
conda activate diffusers
python -m pip install --upgrade pip
pip install torch==1.12.1+cu116 torchvision==0.13.1+cu116 torchaudio==0.12.1 --extra-index-url https://download.pytorch.org/whl/cu116

Then, you can install diffusers (implemented in this repo) with:

pip install -e .

After that, you can install the required packages through:

cd examples/brushnet/
pip install -r requirements.txt
pip install tomesd

Data Download ⬇️

Dataset

We use the same dataset as BrushNet.

You can download BrushData and BrushBench here (as well as the EditBench we re-processed), which are used for training and testing BrushNet. By downloading the data, you agree to the terms and conditions of the license. The data structure should look like:

Note: We only provide part of BrushData on Google Drive due to the space limit. random123123 has helped upload the full dataset to Hugging Face here. Thanks for his help!

Checkpoints

Checkpoints of BrushNet can be downloaded from here. The ckpt folder contains:

BrushNet pretrained checkpoints for Stable Diffusion v1.5 (segmentation_mask_brushnet_ckpt and random_mask_brushnet_ckpt)
pretrained Stable Diffusion v1.5 checkpoint (e.g., realisticVisionV60B1v51VAE from Civitai). You can use `scripts/convert_original_stable_diffusion_to_diffusers.py` to process other models downloaded from Civitai.
BrushNet pretrained checkpoints for Stable Diffusion XL (segmentation_mask_brushnet_ckpt_sdxl_v1 and random_mask_brushnet_ckpt_sdxl_v0). A better version will be released shortly by yuanhang. Please stay tuned!
pretrained Stable Diffusion XL checkpoint (e.g., juggernautXLjuggernautX from Civitai). You can use `StableDiffusionXLPipeline.from_single_file("path of safetensors").save_pretrained("path to save", safe_serialization=False)` to process other models downloaded from Civitai.

The data structure should look like:

The checkpoints in segmentation_mask_brushnet_ckpt and segmentation_mask_brushnet_ckpt_sdxl_v0 are trained on BrushData, which has a segmentation prior (masks have the same shape as the objects). The random_mask_brushnet_ckpt and random_mask_brushnet_ckpt_sdxl checkpoints are more general and handle random mask shapes.

🏃🏼 Running Scripts

Inference 📜

You can run inference with the script:

```
# sd v1.5
python examples/brushnet/test_brushnet.py
```

Since BrushNet is trained on LAION, it can only guarantee performance in general scenarios. We recommend that you train on your own data (e.g., product exhibition, virtual try-on) if you have high-quality industrial application requirements. We would also appreciate it if you contributed your trained model!

You can also run inference through the Gradio demo:

```
# sd v1.5
python examples/brushnet/app_brushnet.py
```

Evaluation 📏

You can evaluate using the script:

python examples/brushnet/evaluate_brushnet.py \
  --brushnet_ckpt_path data/ckpt/segmentation_mask_brushnet_ckpt \
  --image_save_path runs/evaluation_result/BrushBench/brushnet_segmask/inside \
  --mapping_file data/BrushBench/mapping_file.json \
  --base_dir data/BrushBench \
  --mask_key inpainting_mask

The --mask_key option indicates which kind of mask to use: inpainting_mask for inside inpainting and outpainting_mask for outside inpainting. The evaluation results (images and metrics) will be saved in --image_save_path.

Note that you need to ignore the NSFW detector in src/diffusers/pipelines/brushnet/pipeline_brushnet.py#1261 to get correct evaluation results. Moreover, we find that different machines may generate different images, so we provide the results from our machine here.

Result (ToMeBrush)

Comparison of results without applying the blending operation.

Our model demonstrated superior performance compared to existing models on BrushBench in all metrics except LPIPS. Under the SDXL framework, higher resolutions exhibited better efficiency in terms of memory usage and speed. Compared to BrushNet, ToMeBrush achieves a 5.86% improvement in inference speed, reduces the average processing time by 0.29 seconds, and lowers memory usage by 0.06 GB.
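For readers who want to reproduce this kind of comparison on their own hardware, the sketch below shows one common way to turn two average timings into "seconds saved" and "percent faster" figures. The helper name `compare_runs` and the timings are hypothetical; they are not the measurements reported above, and the text does not state which definition of speed improvement was used.

```python
def compare_runs(baseline_s, candidate_s):
    """Summarize how much faster the candidate is than the baseline,
    given average per-image processing times in seconds."""
    saved = baseline_s - candidate_s
    return {
        "seconds_saved": saved,
        "percent_faster": 100.0 * saved / baseline_s,
    }


# Hypothetical timings, for illustration only.
result = compare_runs(baseline_s=5.0, candidate_s=4.5)
print(result)  # {'seconds_saved': 0.5, 'percent_faster': 10.0}
```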

Owner

Name: Jiwon Heo
Login: pinht126
Kind: user

Repositories: 1
Profile: https://github.com/pinht126

Citation (CITATION.cff)

cff-version: 1.2.0
title: 'Diffusers: State-of-the-art diffusion models'
message: >-
  If you use this software, please cite it using the
  metadata from this file.
type: software
authors:
  - given-names: Patrick
    family-names: von Platen
  - given-names: Suraj
    family-names: Patil
  - given-names: Anton
    family-names: Lozhkov
  - given-names: Pedro
    family-names: Cuenca
  - given-names: Nathan
    family-names: Lambert
  - given-names: Kashif
    family-names: Rasul
  - given-names: Mishig
    family-names: Davaadorj
  - given-names: Thomas
    family-names: Wolf
repository-code: 'https://github.com/huggingface/diffusers'
abstract: >-
  Diffusers provides pretrained diffusion models across
  multiple modalities, such as vision and audio, and serves
  as a modular toolbox for inference and training of
  diffusion models.
keywords:
  - deep-learning
  - pytorch
  - image-generation
  - hacktoberfest
  - diffusion
  - text2image
  - image2image
  - score-based-generative-modeling
  - stable-diffusion
  - stable-diffusion-diffusers
license: Apache-2.0
version: 0.12.1

GitHub Events

Total

Watch event: 1
Push event: 3
Public event: 1

Last Year

Watch event: 1
Push event: 3
Public event: 1

Dependencies

.github/actions/setup-miniconda/action.yml actions

actions/cache v2 composite

.github/workflows/benchmark.yml actions

actions/checkout v3 composite
actions/upload-artifact v2 composite

.github/workflows/build_docker_images.yml actions

actions/checkout v3 composite
docker/build-push-action v3 composite
docker/login-action v2 composite
slackapi/slack-github-action 6c661ce58804a1a20f6dc5fbee7f0381b469e001 composite

.github/workflows/build_documentation.yml actions

.github/workflows/build_pr_documentation.yml actions

.github/workflows/nightly_tests.yml actions

./.github/actions/setup-miniconda * composite
actions/checkout v3 composite
actions/upload-artifact v2 composite

.github/workflows/pr_dependency_test.yml actions

actions/checkout v3 composite
actions/setup-python v4 composite

.github/workflows/pr_flax_dependency_test.yml actions

actions/checkout v3 composite
actions/setup-python v4 composite

.github/workflows/pr_test_fetcher.yml actions

actions/checkout v3 composite
actions/upload-artifact v3 composite
actions/upload-artifact v2 composite

.github/workflows/pr_test_peft_backend.yml actions

actions/checkout v3 composite
actions/setup-python v4 composite

.github/workflows/pr_tests.yml actions

actions/checkout v3 composite
actions/setup-python v4 composite
actions/upload-artifact v2 composite

.github/workflows/pr_torch_dependency_test.yml actions

actions/checkout v3 composite
actions/setup-python v4 composite

.github/workflows/push_tests.yml actions

actions/checkout v3 composite
actions/upload-artifact v2 composite

.github/workflows/push_tests_fast.yml actions

actions/checkout v3 composite
actions/upload-artifact v2 composite

.github/workflows/push_tests_mps.yml actions

./.github/actions/setup-miniconda * composite
actions/checkout v3 composite
actions/upload-artifact v2 composite

.github/workflows/stale.yml actions

actions/checkout v2 composite
actions/setup-python v1 composite

.github/workflows/typos.yml actions

actions/checkout v3 composite
crate-ci/typos v1.12.4 composite

.github/workflows/upload_pr_documentation.yml actions

docker/diffusers-flax-cpu/Dockerfile docker

ubuntu 20.04 build

docker/diffusers-flax-tpu/Dockerfile docker

ubuntu 20.04 build

docker/diffusers-onnxruntime-cpu/Dockerfile docker

ubuntu 20.04 build

docker/diffusers-onnxruntime-cuda/Dockerfile docker

nvidia/cuda 12.1.0-runtime-ubuntu20.04 build

docker/diffusers-pytorch-compile-cuda/Dockerfile docker

nvidia/cuda 12.1.0-runtime-ubuntu20.04 build

docker/diffusers-pytorch-cpu/Dockerfile docker

ubuntu 20.04 build

docker/diffusers-pytorch-cuda/Dockerfile docker

nvidia/cuda 12.1.0-runtime-ubuntu20.04 build

docker/diffusers-pytorch-xformers-cuda/Dockerfile docker

nvidia/cuda 12.1.0-runtime-ubuntu20.04 build

examples/advanced_diffusion_training/requirements.txt pypi

Jinja2 *
accelerate >=0.16.0
ftfy *
peft ==0.7.0
tensorboard *
torchvision *
transformers >=4.25.1

examples/brushnet/requirements.txt pypi

Pillow ==9.5.0
accelerate ==0.20.3
clip *
datasets *
ftfy *
gradio ==3.50.0
hpsv2 *
image-reward *
imgaug *
open-clip-torch *
opencv-python *
segment_anything *
tensorboard *
torchmetrics *
torchvision *
transformers >=4.25.1

examples/consistency_distillation/requirements.txt pypi

Jinja2 *
accelerate >=0.16.0
ftfy *
tensorboard *
torchvision *
transformers >=4.25.1
webdataset *

examples/controlnet/requirements.txt pypi

accelerate >=0.16.0
datasets *
ftfy *
tensorboard *
torchvision *
transformers >=4.25.1

examples/controlnet/requirements_flax.txt pypi

Jinja2 *
datasets *
flax *
ftfy *
optax *
tensorboard *
torch *
torchvision *
transformers >=4.25.1

examples/controlnet/requirements_sdxl.txt pypi

Jinja2 *
accelerate >=0.16.0
datasets *
ftfy *
tensorboard *
torchvision *
transformers >=4.25.1
wandb *

examples/custom_diffusion/requirements.txt pypi

Jinja2 *
accelerate *
ftfy *
tensorboard *
torchvision *
transformers >=4.25.1

examples/instruct_pix2pix/requirements.txt pypi

accelerate >=0.16.0
datasets *
ftfy *
tensorboard *
torchvision *
transformers >=4.25.1

examples/kandinsky2_2/text_to_image/requirements.txt pypi

Jinja2 *
accelerate >=0.16.0
datasets *
ftfy *
tensorboard *
torchvision *
transformers >=4.25.1

examples/research_projects/colossalai/requirement.txt pypi

Jinja2 *
diffusers *
ftfy *
tensorboard *
torch *
torchvision *
transformers *

examples/research_projects/consistency_training/requirements.txt pypi

Jinja2 *
accelerate >=0.16.0
ftfy *
tensorboard *
torchvision *
transformers >=4.25.1

examples/research_projects/diffusion_dpo/requirements.txt pypi

Jinja2 *
accelerate >=0.16.0
ftfy *
peft *
tensorboard *
torchvision *
transformers >=4.25.1
wandb *

examples/research_projects/dreambooth_inpaint/requirements.txt pypi

Jinja2 *
accelerate >=0.16.0
diffusers ==0.9.0
ftfy *
tensorboard *
torchvision *
transformers >=4.21.0

examples/research_projects/intel_opts/textual_inversion/requirements.txt pypi

Jinja2 *
accelerate >=0.16.0
ftfy *
intel_extension_for_pytorch >=1.13
tensorboard *
torchvision *
transformers >=4.21.0

examples/research_projects/intel_opts/textual_inversion_dfq/requirements.txt pypi

accelerate *
ftfy *
modelcards *
neural-compressor *
tensorboard *
torchvision *
transformers >=4.25.0

examples/research_projects/lora/requirements.txt pypi

Jinja2 *
accelerate >=0.16.0
datasets *
ftfy *
tensorboard *
torchvision *
transformers >=4.25.1

examples/research_projects/multi_subject_dreambooth/requirements.txt pypi

Jinja2 *
accelerate >=0.16.0
ftfy *
tensorboard *
torchvision *
transformers >=4.25.1

examples/research_projects/multi_subject_dreambooth_inpainting/requirements.txt pypi

Jinja2 *
accelerate >=0.16.0
datasets >=2.16.0
ftfy *
tensorboard *
torchvision *
transformers >=4.25.1
wandb >=0.16.1

examples/research_projects/multi_token_textual_inversion/requirements.txt pypi

Jinja2 *
accelerate >=0.16.0
ftfy *
tensorboard *
torchvision *
transformers >=4.25.1

examples/research_projects/multi_token_textual_inversion/requirements_flax.txt pypi

Jinja2 *
flax *
ftfy *
optax *
tensorboard *
torch *
torchvision *
transformers >=4.25.1

examples/research_projects/onnxruntime/text_to_image/requirements.txt pypi

accelerate >=0.16.0
datasets *
ftfy *
modelcards *
tensorboard *
torchvision *
transformers >=4.25.1

examples/research_projects/onnxruntime/textual_inversion/requirements.txt pypi

accelerate >=0.16.0
ftfy *
modelcards *
tensorboard *
torchvision *
transformers >=4.25.1

examples/research_projects/onnxruntime/unconditional_image_generation/requirements.txt pypi

accelerate >=0.16.0
datasets *
tensorboard *
torchvision *

examples/research_projects/realfill/requirements.txt pypi

Jinja2 ==3.1.3
accelerate ==0.23.0
diffusers ==0.20.1
ftfy ==6.1.1
peft ==0.5.0
tensorboard ==2.14.0
torch ==2.0.1
torchvision >=0.16
transformers ==4.36.0

examples/t2i_adapter/requirements.txt pypi

accelerate >=0.16.0
datasets *
ftfy *
safetensors *
tensorboard *
torchvision *
transformers >=4.25.1
wandb *

examples/text_to_image/requirements.txt pypi

Jinja2 *
accelerate >=0.16.0
datasets *
ftfy *
peft ==0.7.0
tensorboard *
torchvision *
transformers >=4.25.1

examples/text_to_image/requirements_flax.txt pypi

Jinja2 *
datasets *
flax *
ftfy *
optax *
tensorboard *
torch *
torchvision *
transformers >=4.25.1

examples/text_to_image/requirements_sdxl.txt pypi

Jinja2 *
accelerate >=0.22.0
datasets *
ftfy *
peft ==0.7.0
tensorboard *
torchvision *
transformers >=4.25.1

examples/textual_inversion/requirements.txt pypi

Jinja2 *
accelerate >=0.16.0
ftfy *
tensorboard *
torchvision *
transformers >=4.25.1

examples/textual_inversion/requirements_flax.txt pypi

Jinja2 *
flax *
ftfy *
optax *
tensorboard *
torch *
torchvision *
transformers >=4.25.1

examples/unconditional_image_generation/requirements.txt pypi

accelerate >=0.16.0
datasets *
torchvision *

examples/wuerstchen/text_to_image/requirements.txt pypi

accelerate >=0.16.0
bitsandbytes *
deepspeed *
peft >=0.6.0
torchvision *
transformers >=4.25.1
wandb *

pyproject.toml pypi

setup.py pypi

deps *
