brushedit

[TPAMI under review] The official implementation of paper "BrushEdit: All-In-One Image Inpainting and Editing"

https://github.com/tencentarc/brushedit

Science Score: 46.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: arxiv.org
  • Committers with academic emails
    1 of 1 committers (100.0%) from academic institutions
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (10.0%) to scientific vocabulary

Keywords

diffusion-models image-editing image-inpainting
Last synced: 6 months ago

Repository

[TPAMI under review] The official implementation of paper "BrushEdit: All-In-One Image Inpainting and Editing"

Basic Info
Statistics
  • Stars: 567
  • Watchers: 7
  • Forks: 27
  • Open Issues: 11
  • Releases: 0
Topics
diffusion-models image-editing image-inpainting
Created about 1 year ago · Last pushed about 1 year ago
Metadata Files
Readme Contributing License Code of conduct Citation

README.md

BrushEdit

**Please check out our latest DiT-based image customization project IC-Custom, which provides powerful ID-consistent editing capabilities!**

This repository contains the implementation of "BrushEdit: All-In-One Image Inpainting and Editing".

Keywords: Image Inpainting, Image Generation, Image Editing, Diffusion Models, MLLM Agent, Instruction-based Editing

TL;DR: BrushEdit is an advanced, unified AI agent for image inpainting and editing.
Main Elements: Fully automated / Interactive editing.

Yaowei Li¹, Yuxuan Bian³, Xuan Ju³, Zhaoyang Zhang², Junhao Zhuang⁴, Ying Shan², Yuexian Zou¹, Qiang Xu³
¹Peking University  ²ARC Lab, Tencent PCG  ³The Chinese University of Hong Kong  ⁴Tsinghua University

Project Page | arXiv | Video | Hugging Face Demo | Hugging Face Model

https://github.com/user-attachments/assets/fde82f21-8b36-4584-8460-c109c195e614

4K HD Introduction Video: YouTube.


TODO

  • [X] Release the code of BrushEdit. (MLLM-driven Agent for Image Editing and Inpainting)
  • [X] Release the paper and webpage. More info: BrushEdit
  • [X] Release the BrushNetX checkpoint (a more powerful BrushNet).
  • [X] Release gradio demo.

Pipeline Overview

BrushEdit consists of four main steps: (i) editing-category classification: determine the type of editing required; (ii) identification of the primary editing object: identify the main object to be edited; (iii) acquisition of the editing mask and target caption: generate the editing mask and the corresponding target caption; (iv) image inpainting: perform the actual image editing. Steps (i) to (iii) use pre-trained MLLMs and detection models to ascertain the editing type, target object, editing mask, and target caption. Step (iv) performs the editing with a dual-branch inpainting model, an improved BrushNet, which inpaints the target areas based on the target caption and editing mask, leveraging the generative potential and background-preservation capabilities of inpainting models.
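As a rough illustration of this flow, here is a minimal Python sketch of the four steps. The callables are hypothetical placeholders, not this repository's API:

```python
# A high-level sketch of the agent pipeline described above. The four
# injected components are illustrative placeholders, not the repo's API.
def brushedit(image, instruction, mllm, detector, segmenter, inpainter):
    # (i) Classify the editing category (e.g., add, remove, replace).
    category = mllm.classify_edit_type(image, instruction)
    # (ii) Identify the primary object to be edited.
    target = mllm.find_target_object(image, instruction, category)
    # (iii) Detection + segmentation give the editing mask; the MLLM
    #       writes the target caption for the masked region.
    box = detector.detect(image, target)      # e.g., GroundingDINO
    mask = segmenter.segment(image, box)      # e.g., SAM
    caption = mllm.target_caption(image, instruction, target)
    # (iv) Dual-branch inpainting (BrushNetX) fills the masked region from
    #      the caption while preserving the unmasked background.
    return inpainter(image, mask, caption)
```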

teaser

Getting Started

Environment Requirement

BrushEdit has been implemented and tested with CUDA 11.8, PyTorch 2.0.1, and Python 3.10.6.

Clone the repo:

git clone https://github.com/TencentARC/BrushEdit.git

We recommend first creating a virtual environment with conda and installing PyTorch following the official instructions. For example:

conda create -n brushedit python=3.10.6 -y
conda activate brushedit
python -m pip install --upgrade pip
pip install torch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2 --index-url https://download.pytorch.org/whl/cu118

Then, you can install diffusers (the version included in this repo) with:

pip install -e .

After that, you can install the required packages through:

pip install -r app/requirements.txt
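If you want to confirm the environment matches the tested setup, a quick check (standard PyTorch calls, nothing BrushEdit-specific) is:

```python
# Verify the installed versions against the tested CUDA 11.8 / PyTorch 2.0.1 setup.
import torch

print(torch.__version__)          # expect 2.0.1
print(torch.version.cuda)         # expect 11.8
print(torch.cuda.is_available())  # should be True on a GPU machine
```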

Download Checkpoints

Checkpoints of BrushEdit can be downloaded using the following command.

sh app/down_load_brushedit.sh

The downloaded checkpoint folder contains:

  • BrushNetX pretrained checkpoint for Stable Diffusion v1.5 (brushnetX)
  • A pretrained Stable Diffusion v1.5 checkpoint (e.g., realisticVisionV60B1_v51VAE from Civitai). You can use `scripts/convert_original_stable_diffusion_to_diffusers.py` to process other models downloaded from Civitai; see the loading sketch after this list.
  • The pretrained GroundingDINO checkpoint from the official release.
  • The pretrained SAM checkpoint from the official release.
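As an alternative to the conversion script, recent diffusers releases can load a single-file Civitai checkpoint directly and re-save it in the folder layout shown below. A hedged sketch, with placeholder local paths:

```python
# Load a single-file Civitai checkpoint with diffusers and save it in the
# multi-folder format expected under models/base_model/ (paths are placeholders).
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_single_file(
    "realisticVisionV60B1_v51VAE.safetensors"  # checkpoint downloaded from Civitai
)
pipe.save_pretrained("models/base_model/realisticVisionV60B1_v51VAE")
```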

The checkpoint structure should be like:

```
|-- models
    |-- base_model
        |-- realisticVisionV60B1_v51VAE
            |-- model_index.json
            |-- vae
            |-- ...
        |-- dreamshaper_8
            |-- ...
        |-- epicrealism_naturalSinRC1VAE
            |-- ...
        |-- meinamix_meinaV11
            |-- ...
        |-- ...
    |-- brushnetX
        |-- config.json
        |-- diffusion_pytorch_model.safetensors
    |-- grounding_dino
        |-- groundingdino_swint_ogc.pth
    |-- sam
        |-- sam_vit_h_4b8939.pth
    |-- vlm
        |-- llava-v1.6-mistral-7b-hf
            |-- ...
        |-- llava-v1.6-vicuna-13b-hf
            |-- ...
        |-- Qwen2-VL-7B-Instruct
            |-- ...
        |-- ...
```

We provide five base diffusion models:

  • Dreamshaper_8 is a versatile model that can generate impressive portraits and landscape images.
  • Epicrealism_naturalSinRC1VAE is a realistic-style model that excels at generating portraits.
  • HenmixReal_v5c is a model that specializes in generating realistic images of women.
  • Meinamix_meinaV11 is a model that excels at generating images in an animated style.
  • RealisticVisionV60B1_v51VAE is a highly generalized realistic style model.

The BrushNetX checkpoint represents an enhanced version of BrushNet, having been trained on a more diverse dataset to improve its editing capabilities, such as deletion and replacement.

We provide two local VLM models: Qwen2-VL-7B-Instruct and llama3-llava-next-8b-hf. We strongly recommend using GPT-4o for reasoning. After selecting GPT-4o as the VLM model, enter the API KEY and click the Submit and Verify button. If the output is success, you can use GPT-4o normally. As a second choice, we recommend the Qwen2-VL model.

You can also download more pretrained VLM models from Qwen2-VL and LLaVA-NeXT, e.g., with `hf_hub_download` or `snapshot_download` from `huggingface_hub`.
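For instance, a minimal sketch that fetches an additional VLM into the folder layout shown above (the repo id is the public Hugging Face one; the local path follows the checkpoint structure earlier):

```python
# Download an additional VLM into the expected models/vlm/ layout.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="Qwen/Qwen2-VL-7B-Instruct",
    local_dir="models/vlm/Qwen2-VL-7B-Instruct",
)
```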

Running Scripts

BrushEdit Demo

You can run the demo using the script:

sh app/run_app.sh

Demo Features

demo_vis

Fundamental Features:

  • Aspect Ratio: Select the aspect ratio of the image. To prevent OOM, 1024px is the maximum resolution.
  • VLM Model: Select the VLM model. We use preloaded models to save time. To use other VLM models, download them and uncomment the relevant lines in vlm_template.py in our GitHub repo.
  • Generate Mask: Generate a mask for the area that may need to be edited, according to the input instructions.
  • Square/Circle Mask: Derive square or circle masks from the existing mask. (A coarse-grained mask leaves more room for editing imagination.)
  • Invert Mask: Invert the mask to generate a new mask.
  • Dilation/Erosion Mask: Expand or shrink the mask to include or exclude more area; see the sketch after this list.
  • Move Mask: Move the mask to a new position.
  • Generate Target Prompt: Generate a target prompt based on the input instructions.
  • Target Prompt: The description of the masked area; you can enter or modify it manually when the VLM's output does not meet expectations.
  • Blending: Blend BrushNet's output with the original input to preserve the original image details in unedited areas. (Turning this off works better for removal.)
  • Control length: The intensity of editing and inpainting.
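A minimal sketch of the invert/dilation/erosion operations above, using OpenCV and NumPy (illustrative only, not the demo's exact code):

```python
# Hedged mask transforms for binary 0/255 masks, as in the demo's controls.
import cv2
import numpy as np

def invert_mask(mask: np.ndarray) -> np.ndarray:
    return 255 - mask  # swap edited and preserved regions

def dilate_mask(mask: np.ndarray, size: int = 15) -> np.ndarray:
    kernel = np.ones((size, size), np.uint8)
    return cv2.dilate(mask, kernel, iterations=1)  # include more area

def erode_mask(mask: np.ndarray, size: int = 15) -> np.ndarray:
    kernel = np.ones((size, size), np.uint8)
    return cv2.erode(mask, kernel, iterations=1)  # exclude border pixels
```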

Advanced Features:

  • Base Model: Select the base diffusion model. We use preloaded models to save time. To use other base models, download them and register them in the analogous base-model template file in our GitHub repo.
  • Blending: Blend BrushNet's output with the original input to preserve the original image details in unedited areas; see the sketch after this list. (Turning this off works better for removal.)
  • Control length: The intensity of editing and inpainting.
  • Num samples: The number of samples to generate.
  • Negative prompt: The negative prompt for classifier-free guidance.
  • Guidance scale: The guidance scale for classifier-free guidance.
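Conceptually, the Blending option pastes the inpainted result back only inside a softened mask so unedited areas keep the original pixels. A hedged sketch, again illustrative rather than the repo's exact code:

```python
# Blend the inpainted output with the original image under a blurred mask.
import cv2
import numpy as np

def blend(original: np.ndarray, inpainted: np.ndarray,
          mask: np.ndarray, blur: int = 21) -> np.ndarray:
    # Soften the 0/255 mask edge (kernel size must be odd) so the
    # paste-back seam is less visible.
    soft = cv2.GaussianBlur(mask, (blur, blur), 0).astype(np.float32) / 255.0
    soft = soft[..., None]  # broadcast over the RGB channels
    return (inpainted * soft + original * (1.0 - soft)).astype(np.uint8)
```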

Cite Us

```
@misc{li2024brushedit,
  title={BrushEdit: All-In-One Image Inpainting and Editing},
  author={Yaowei Li and Yuxuan Bian and Xuan Ju and Zhaoyang Zhang and Junhao Zhuang and Ying Shan and Yuexian Zou and Qiang Xu},
  year={2024},
  eprint={2412.10316},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}
```

Acknowledgement

Our code is modified from diffusers and BrushNet; thanks to all the contributors!

Contact

For any questions, feel free to email liyaowei01@gmail.com.

Star History

Star History Chart

Owner

  • Name: ARC Lab, Tencent PCG
  • Login: TencentARC
  • Kind: organization
  • Email: arc@tencent.com

GitHub Events

Total
  • Commit comment event: 1
  • Issues event: 41
  • Watch event: 539
  • Issue comment event: 46
  • Push event: 20
  • Public event: 2
  • Fork event: 27
Last Year
  • Commit comment event: 1
  • Issues event: 41
  • Watch event: 539
  • Issue comment event: 46
  • Push event: 20
  • Public event: 2
  • Fork event: 27

Committers

Last synced: 8 months ago

All Time
  • Total Commits: 20
  • Total Committers: 1
  • Avg Commits per committer: 20.0
  • Development Distribution Score (DDS): 0.0
Past Year
  • Commits: 20
  • Committers: 1
  • Avg Commits per committer: 20.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
liyaowei-stu y****l@s****n 20
Committer Domains (Top 20 + Academic)

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 24
  • Total pull requests: 0
  • Average time to close issues: 5 days
  • Average time to close pull requests: N/A
  • Total issue authors: 19
  • Total pull request authors: 0
  • Average comments per issue: 1.75
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 24
  • Pull requests: 0
  • Average time to close issues: 5 days
  • Average time to close pull requests: N/A
  • Issue authors: 19
  • Pull request authors: 0
  • Average comments per issue: 1.75
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • mrlihellohorld (3)
  • robbxu (2)
  • gecade (2)
  • 1091492188 (2)
  • yincangshiwei (1)
  • kada0720 (1)
  • rishipandey125 (1)
  • juxingyiwan (1)
  • shellin-star (1)
  • bengen-y (1)
  • hdjsjyl (1)
  • Twinkle-ce (1)
  • shikasensei-dev (1)
  • Jandown (1)
  • Fuuuuuuge (1)
Pull Request Authors
Top Labels
Issue Labels
Pull Request Labels

Dependencies

app/requirements.txt pypi
  • Pillow ==9.5.0
  • accelerate ==0.26.0
  • clip *
  • datasets ==3.1.0
  • fastapi ==0.112.4
  • ftfy ==6.1.1
  • gradio ==4.38.1
  • hpsv2 *
  • huggingface_hub ==0.23.2
  • image-reward *
  • imgaug ==0.4.0
  • open-clip-torch *
  • openai *
  • opencv-python ==4.8.1.78
  • qwen_vl_utils *
  • segment_anything *
  • tensorboard *
  • transformers ==4.46.3
examples/advanced_diffusion_training/requirements.txt pypi
  • Jinja2 *
  • accelerate >=0.16.0
  • ftfy *
  • peft ==0.7.0
  • tensorboard *
  • torchvision *
  • transformers >=4.25.1
examples/brushnet/requirements.txt pypi
  • Pillow ==9.5.0
  • accelerate ==0.20.3
  • clip *
  • datasets *
  • ftfy *
  • gradio ==4.44.1
  • hpsv2 *
  • image-reward *
  • imgaug *
  • open-clip-torch *
  • opencv-python *
  • segment_anything *
  • tensorboard *
  • torchmetrics *
  • torchvision *
  • transformers >=4.25.1
examples/consistency_distillation/requirements.txt pypi
  • Jinja2 *
  • accelerate >=0.16.0
  • ftfy *
  • tensorboard *
  • torchvision *
  • transformers >=4.25.1
  • webdataset *
examples/controlnet/requirements.txt pypi
  • accelerate >=0.16.0
  • datasets *
  • ftfy *
  • tensorboard *
  • torchvision *
  • transformers >=4.25.1
examples/controlnet/requirements_flax.txt pypi
  • Jinja2 *
  • datasets *
  • flax *
  • ftfy *
  • optax *
  • tensorboard *
  • torch *
  • torchvision *
  • transformers >=4.25.1
examples/controlnet/requirements_sdxl.txt pypi
  • Jinja2 *
  • accelerate >=0.16.0
  • datasets *
  • ftfy *
  • tensorboard *
  • torchvision *
  • transformers >=4.25.1
  • wandb *
examples/custom_diffusion/requirements.txt pypi
  • Jinja2 *
  • accelerate *
  • ftfy *
  • tensorboard *
  • torchvision *
  • transformers >=4.25.1
examples/dreambooth/requirements.txt pypi
  • Jinja2 *
  • accelerate >=0.16.0
  • ftfy *
  • peft ==0.7.0
  • tensorboard *
  • torchvision *
  • transformers >=4.25.1
examples/dreambooth/requirements_flax.txt pypi
  • Jinja2 *
  • flax *
  • ftfy *
  • optax *
  • tensorboard *
  • torch *
  • torchvision *
  • transformers >=4.25.1
examples/dreambooth/requirements_sdxl.txt pypi
  • Jinja2 *
  • accelerate >=0.16.0
  • ftfy *
  • peft ==0.7.0
  • tensorboard *
  • torchvision *
  • transformers >=4.25.1
examples/instruct_pix2pix/requirements.txt pypi
  • accelerate >=0.16.0
  • datasets *
  • ftfy *
  • tensorboard *
  • torchvision *
  • transformers >=4.25.1
examples/kandinsky2_2/text_to_image/requirements.txt pypi
  • Jinja2 *
  • accelerate >=0.16.0
  • datasets *
  • ftfy *
  • tensorboard *
  • torchvision *
  • transformers >=4.25.1
examples/research_projects/colossalai/requirement.txt pypi
  • Jinja2 *
  • diffusers *
  • ftfy *
  • tensorboard *
  • torch *
  • torchvision *
  • transformers *
examples/research_projects/consistency_training/requirements.txt pypi
  • Jinja2 *
  • accelerate >=0.16.0
  • ftfy *
  • tensorboard *
  • torchvision *
  • transformers >=4.25.1
examples/research_projects/diffusion_dpo/requirements.txt pypi
  • Jinja2 *
  • accelerate >=0.16.0
  • ftfy *
  • peft *
  • tensorboard *
  • torchvision *
  • transformers >=4.25.1
  • wandb *
examples/research_projects/dreambooth_inpaint/requirements.txt pypi
  • Jinja2 *
  • accelerate >=0.16.0
  • diffusers ==0.9.0
  • ftfy *
  • tensorboard *
  • torchvision *
  • transformers >=4.21.0
examples/research_projects/intel_opts/textual_inversion/requirements.txt pypi
  • Jinja2 *
  • accelerate >=0.16.0
  • ftfy *
  • intel_extension_for_pytorch >=1.13
  • tensorboard *
  • torchvision *
  • transformers >=4.21.0
examples/research_projects/intel_opts/textual_inversion_dfq/requirements.txt pypi
  • accelerate *
  • ftfy *
  • modelcards *
  • neural-compressor *
  • tensorboard *
  • torchvision *
  • transformers >=4.25.0
examples/research_projects/lora/requirements.txt pypi
  • Jinja2 *
  • accelerate >=0.16.0
  • datasets *
  • ftfy *
  • tensorboard *
  • torchvision *
  • transformers >=4.25.1
examples/research_projects/multi_subject_dreambooth/requirements.txt pypi
  • Jinja2 *
  • accelerate >=0.16.0
  • ftfy *
  • tensorboard *
  • torchvision *
  • transformers >=4.25.1
examples/research_projects/multi_subject_dreambooth_inpainting/requirements.txt pypi
  • Jinja2 *
  • accelerate >=0.16.0
  • datasets >=2.16.0
  • ftfy *
  • tensorboard *
  • torchvision *
  • transformers >=4.25.1
  • wandb >=0.16.1
examples/research_projects/multi_token_textual_inversion/requirements.txt pypi
  • Jinja2 *
  • accelerate >=0.16.0
  • ftfy *
  • tensorboard *
  • torchvision *
  • transformers >=4.25.1
examples/research_projects/multi_token_textual_inversion/requirements_flax.txt pypi
  • Jinja2 *
  • flax *
  • ftfy *
  • optax *
  • tensorboard *
  • torch *
  • torchvision *
  • transformers >=4.25.1
examples/research_projects/onnxruntime/text_to_image/requirements.txt pypi
  • accelerate >=0.16.0
  • datasets *
  • ftfy *
  • modelcards *
  • tensorboard *
  • torchvision *
  • transformers >=4.25.1
examples/research_projects/onnxruntime/textual_inversion/requirements.txt pypi
  • accelerate >=0.16.0
  • ftfy *
  • modelcards *
  • tensorboard *
  • torchvision *
  • transformers >=4.25.1
examples/research_projects/onnxruntime/unconditional_image_generation/requirements.txt pypi
  • accelerate >=0.16.0
  • datasets *
  • tensorboard *
  • torchvision *
examples/research_projects/realfill/requirements.txt pypi
  • Jinja2 ==3.1.3
  • accelerate ==0.23.0
  • diffusers ==0.20.1
  • ftfy ==6.1.1
  • peft ==0.5.0
  • tensorboard ==2.14.0
  • torch ==2.0.1
  • torchvision >=0.16
  • transformers ==4.36.0
examples/t2i_adapter/requirements.txt pypi
  • accelerate >=0.16.0
  • datasets *
  • ftfy *
  • safetensors *
  • tensorboard *
  • torchvision *
  • transformers >=4.25.1
  • wandb *
examples/text_to_image/requirements.txt pypi
  • Jinja2 *
  • accelerate >=0.16.0
  • datasets *
  • ftfy *
  • peft ==0.7.0
  • tensorboard *
  • torchvision *
  • transformers >=4.25.1
examples/text_to_image/requirements_flax.txt pypi
  • Jinja2 *
  • datasets *
  • flax *
  • ftfy *
  • optax *
  • tensorboard *
  • torch *
  • torchvision *
  • transformers >=4.25.1
examples/text_to_image/requirements_sdxl.txt pypi
  • Jinja2 *
  • accelerate >=0.22.0
  • datasets *
  • ftfy *
  • peft ==0.7.0
  • tensorboard *
  • torchvision *
  • transformers >=4.25.1
examples/textual_inversion/requirements.txt pypi
  • Jinja2 *
  • accelerate >=0.16.0
  • ftfy *
  • tensorboard *
  • torchvision *
  • transformers >=4.25.1
examples/textual_inversion/requirements_flax.txt pypi
  • Jinja2 *
  • flax *
  • ftfy *
  • optax *
  • tensorboard *
  • torch *
  • torchvision *
  • transformers >=4.25.1
examples/unconditional_image_generation/requirements.txt pypi
  • accelerate >=0.16.0
  • datasets *
  • torchvision *
examples/wuerstchen/text_to_image/requirements.txt pypi
  • accelerate >=0.16.0
  • bitsandbytes *
  • deepspeed *
  • peft >=0.6.0
  • torchvision *
  • transformers >=4.25.1
  • wandb *
pyproject.toml pypi
setup.py pypi
  • deps *
docker/diffusers-flax-cpu/Dockerfile docker
  • ubuntu 20.04 build
docker/diffusers-flax-tpu/Dockerfile docker
  • ubuntu 20.04 build
docker/diffusers-onnxruntime-cpu/Dockerfile docker
  • ubuntu 20.04 build
docker/diffusers-onnxruntime-cuda/Dockerfile docker
  • nvidia/cuda 12.1.0-runtime-ubuntu20.04 build
docker/diffusers-pytorch-compile-cuda/Dockerfile docker
  • nvidia/cuda 12.1.0-runtime-ubuntu20.04 build
docker/diffusers-pytorch-cpu/Dockerfile docker
  • ubuntu 20.04 build
docker/diffusers-pytorch-cuda/Dockerfile docker
  • nvidia/cuda 12.1.0-runtime-ubuntu20.04 build
docker/diffusers-pytorch-xformers-cuda/Dockerfile docker
  • nvidia/cuda 12.1.0-runtime-ubuntu20.04 build