blobctrl

[Arxiv'25] BlobCtrl: A Unified and Flexible Framework for Element-level Image Generation and Editing

https://github.com/tencentarc/blobctrl

Science Score: 64.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: arxiv.org
  • Committers with academic emails
    1 of 1 committers (100.0%) from academic institutions
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (10.3%) to scientific vocabulary

Keywords

aigc image-editing
Last synced: 6 months ago

Repository

Basic Info
Statistics
  • Stars: 90
  • Watchers: 9
  • Forks: 2
  • Open Issues: 1
  • Releases: 0
Topics
aigc image-editing
Created 12 months ago · Last pushed 12 months ago
Metadata Files
  • Readme
  • Contributing
  • License
  • Code of conduct
  • Citation

README.md

BlobCtrl

😃 This repository contains the implementation of "BlobCtrl: A Unified and Flexible Framework for Element-level Image Generation and Editing".

Keywords: Image Generation, Image Editing, Diffusion Models, Element-level

TL;DR: BlobCtrl enables precise, user-friendly multi-round element-level visual manipulation.
Main Features: 🦉Element-level Add/Remove/Move/Replace/Enlarge/Shrink.

Yaowei Li 1, Lingen Li 3, Zhaoyang Zhang 2‡, Xiaoyu Li 2, Guangzhi Wang 2, Hongxiang Li 1, Xiaodong Cun 2, Ying Shan 2, Yuexian Zou 1✉
1 Peking University, 2 ARC Lab, Tencent PCG, 3 The Chinese University of Hong Kong; ‡ Project Lead, ✉ Corresponding Author

🌐Project Page | 📜Arxiv | 📹Video | 🤗Hugging Face Demo | 🤗Hugging Face Model

🤗Hugging Face Data (TBD) | 🤗Hugging Face Benchmark (TBD)

https://github.com/user-attachments/assets/ec5fab3c-fa84-4f5d-baf9-1e744f577515

YouTube introduction video: YouTube.

🔥 Update Logs

  • [TBD] Release the data preprocessing code.
  • [TBD] Release the BlobData and BlobBench.
  • [TBD] Release the training code.
  • [X] [20/03/2025] Release the inference code.
  • [X] [17/03/2025] Release the paper, webpage, and Gradio demo.

🛠️ Method Overview

We introduce BlobCtrl, a framework that unifies element-level generation and editing using a probabilistic blob-based representation. By employing blobs as visual primitives, our approach effectively decouples and represents spatial location, semantic content, and identity information, enabling precise element-level manipulation. Our key contributions include: 1) a dual-branch diffusion architecture with hierarchical feature fusion for seamless foreground-background integration; 2) a self-supervised training paradigm with tailored data augmentation and score functions; and 3) controllable dropout strategies to balance fidelity and diversity. To support further research, we introduce BlobData for large-scale training and BlobBench for systematic evaluation. Experiments show that BlobCtrl excels in various element-level manipulation tasks, offering a practical solution for precise and flexible visual content creation.
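The blob idea can be made concrete with a small sketch. Assuming a blob is modeled as a 2D Gaussian with a center and a 2×2 covariance in normalized image coordinates (one natural reading of "probabilistic blob-based representation", not necessarily BlobCtrl's exact parameterization), the hypothetical helpers `blob_opacity`, `move_blob`, and `scale_blob` below render a blob to a soft opacity map and show how Move and Enlarge/Shrink reduce to edits of the blob parameters:

```python
import numpy as np


def blob_opacity(center, cov, height, width):
    """Render a 2D Gaussian blob as a soft opacity map with values in [0, 1].

    center: (cx, cy) in normalized [0, 1] image coordinates (assumption).
    cov: 2x2 covariance matrix in the same normalized units (assumption).
    """
    ys, xs = np.mgrid[0:height, 0:width]
    # Map pixel centers to normalized (x, y) coordinates.
    pts = np.stack([(xs + 0.5) / width, (ys + 0.5) / height], axis=-1)
    diff = pts - np.asarray(center, dtype=float)
    inv = np.linalg.inv(np.asarray(cov, dtype=float))
    # Squared Mahalanobis distance from every pixel to the blob center.
    d2 = np.einsum("...i,ij,...j->...", diff, inv, diff)
    # Unnormalized Gaussian, so opacity peaks at 1 at the center.
    return np.exp(-0.5 * d2)


def move_blob(center, cov, dx, dy):
    """Move: translate the center; the shape (covariance) is unchanged."""
    return (center[0] + dx, center[1] + dy), cov


def scale_blob(center, cov, factor):
    """Enlarge (factor > 1) or shrink (factor < 1) about the center.

    Scaling lengths by `factor` scales the covariance by factor**2.
    """
    return center, np.asarray(cov, dtype=float) * factor**2


# Example: an axis-aligned elliptical blob in the middle of a 64x64 canvas.
cov = np.array([[0.01, 0.0], [0.0, 0.02]])
mask = blob_opacity((0.5, 0.5), cov, 64, 64)
moved = blob_opacity(*move_blob((0.5, 0.5), cov, 0.2, 0.0), 64, 64)
bigger = blob_opacity(*scale_blob((0.5, 0.5), cov, 1.5), 64, 64)
```

Because location and extent live in a handful of parameters rather than in pixels, moving or resizing an element becomes a cheap parameter update. How BlobCtrl feeds blobs into its dual-branch diffusion architecture is outside the scope of this sketch.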

🚀 Getting Started

Environment Requirements 🌍
BlobCtrl has been implemented and tested with CUDA 12.1, PyTorch 2.2.0, and Python 3.10.15.

Clone the repo:
```
git clone git@github.com:TencentARC/BlobCtrl.git
cd BlobCtrl
```
We recommend first using `conda` to create a virtual environment and then installing the required libraries. For example:
```
conda create -n blobctrl python=3.10.15 -y
conda activate blobctrl
python -m pip install --upgrade pip
pip install torch==2.2.0 torchvision==0.17.0 torchaudio==2.2.0 --index-url https://download.pytorch.org/whl/cu121
pip install xformers torch==2.2.0 --index-url https://download.pytorch.org/whl/cu121
pip install -r requirements.txt
```
Then install the diffusers package bundled in this repo in editable mode:
```
pip install -e .
```
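After installing, it can help to confirm that the pinned PyTorch stack is what actually ended up in the environment, since later installs (xformers, `requirements.txt`) may pull in a different torch build. A minimal sketch using only the standard library's `importlib.metadata`; the `check_versions` helper and the `EXPECTED` mapping are illustrative and not part of the repository:

```python
import sys
from importlib import metadata

# Versions pinned in the install commands for this project's torch stack.
EXPECTED = {
    "torch": "2.2.0",
    "torchvision": "0.17.0",
    "torchaudio": "2.2.0",
}


def check_versions(expected):
    """Return {package: (expected, found)} for every mismatch.

    `found` is None when the package is not installed.
    """
    problems = {}
    for name, want in expected.items():
        try:
            have = metadata.version(name)
        except metadata.PackageNotFoundError:
            have = None
        # CUDA wheels report local versions such as "2.2.0+cu121";
        # compare only the public part before "+".
        if have is None or have.split("+")[0] != want:
            problems[name] = (want, have)
    return problems


if __name__ == "__main__":
    print("python", sys.version.split()[0])
    for name, (want, have) in check_versions(EXPECTED).items():
        print(f"{name}: expected {want}, found {have}")
```

An empty result means every pinned package matches; the Python version is printed separately because it is not a pip-installed distribution.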
Download Model Checkpoints 💾
Download the corresponding BlobCtrl checkpoints:
```
sh examples/blobctrl/scripts/download_models.sh
```
**The ckpt folder contains**
- Our provided [BlobCtrl](https://huggingface.co/Yw22/BlobCtrl) checkpoints (`UNet LoRA` + `BlobNet`).
- Pretrained [SD-v1.5](https://huggingface.co/stable-diffusion-v1-5/stable-diffusion-v1-5) checkpoint.
- Pretrained [DINOv2](https://huggingface.co/facebook/dinov2-large) checkpoint.
- Pretrained [SAM](https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth) checkpoint.

The checkpoint structure should look like this:
```
|-- models
    |-- blobnet
        |-- config.json
        |-- diffusion_pytorch_model.safetensors
    |-- dinov2-large
        |-- config.json
        |-- model.safetensors
        ...
    |-- sam
        |-- sam_vit_h_4b8939.pth
    |-- unet_lora
        |-- pytorch_lora_weights.safetensors
```
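To verify the download before launching the demo, a small check can compare the `models` directory against the files named in the checkpoint tree. This sketch assumes the tree is rooted at `models/` relative to the working directory and checks only the files explicitly listed; the elided `...` entries and the SD-v1.5 files, which the tree does not enumerate, are skipped. `missing_checkpoints` is a hypothetical helper:

```python
from pathlib import Path

# Files explicitly named in the checkpoint tree; "..." entries are not checked.
EXPECTED_FILES = [
    "blobnet/config.json",
    "blobnet/diffusion_pytorch_model.safetensors",
    "dinov2-large/config.json",
    "dinov2-large/model.safetensors",
    "sam/sam_vit_h_4b8939.pth",
    "unet_lora/pytorch_lora_weights.safetensors",
]


def missing_checkpoints(models_dir="models"):
    """Return the expected files that do not exist under `models_dir`."""
    root = Path(models_dir)
    return [rel for rel in EXPECTED_FILES if not (root / rel).is_file()]


if __name__ == "__main__":
    missing = missing_checkpoints()
    print("all checkpoints found" if not missing else f"missing: {missing}")
```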

🏃🏼 Running Scripts

BlobCtrl demo 🤗
You can run the demo using the script:
```
sh examples/blobctrl/scripts/run_app.sh
```
BlobCtrl Inference 🌠
You can run inference using the script:
```
sh examples/blobctrl/scripts/inference.sh
```

🤝🏼 Cite Us

@misc{li2025blobctrl,
  title={BlobCtrl: A Unified and Flexible Framework for Element-level Image Generation and Editing},
  author={Yaowei Li and Lingen Li and Zhaoyang Zhang and Xiaoyu Li and Guangzhi Wang and Hongxiang Li and Xiaodong Cun and Ying Shan and Yuexian Zou},
  year={2025},
  eprint={2503.13434},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}

💖 Acknowledgement

Our implementation builds upon the diffusers library. We extend our sincere gratitude to all the contributors of the diffusers project!

We also acknowledge the BlobGAN project for providing valuable insights and inspiration for our blob-based representation approach.

❓ Contact

For any questions, feel free to email liyaowei01@gmail.com.

Owner

  • Name: ARC Lab, Tencent PCG
  • Login: TencentARC
  • Kind: organization
  • Email: arc@tencent.com

Citation (CITATION.cff, inherited from the upstream diffusers repository)

cff-version: 1.2.0
title: 'Diffusers: State-of-the-art diffusion models'
message: >-
  If you use this software, please cite it using the
  metadata from this file.
type: software
authors:
  - given-names: Patrick
    family-names: von Platen
  - given-names: Suraj
    family-names: Patil
  - given-names: Anton
    family-names: Lozhkov
  - given-names: Pedro
    family-names: Cuenca
  - given-names: Nathan
    family-names: Lambert
  - given-names: Kashif
    family-names: Rasul
  - given-names: Mishig
    family-names: Davaadorj
  - given-names: Dhruv
    family-names: Nair
  - given-names: Sayak
    family-names: Paul
  - given-names: Steven
    family-names: Liu
  - given-names: William
    family-names: Berman
  - given-names: Yiyi
    family-names: Xu
  - given-names: Thomas
    family-names: Wolf
repository-code: 'https://github.com/huggingface/diffusers'
abstract: >-
  Diffusers provides pretrained diffusion models across
  multiple modalities, such as vision and audio, and serves
  as a modular toolbox for inference and training of
  diffusion models.
keywords:
  - deep-learning
  - pytorch
  - image-generation
  - hacktoberfest
  - diffusion
  - text2image
  - image2image
  - score-based-generative-modeling
  - stable-diffusion
  - stable-diffusion-diffusers
license: Apache-2.0
version: 0.12.1

GitHub Events

Total
  • Issues event: 8
  • Watch event: 85
  • Issue comment event: 7
  • Public event: 1
  • Push event: 7
  • Fork event: 3
Last Year
  • Issues event: 8
  • Watch event: 85
  • Issue comment event: 7
  • Public event: 1
  • Push event: 7
  • Fork event: 3

Committers

Last synced: 9 months ago

All Time
  • Total Commits: 7
  • Total Committers: 1
  • Avg Commits per committer: 7.0
  • Development Distribution Score (DDS): 0.0
Past Year
  • Commits: 7
  • Committers: 1
  • Avg Commits per committer: 7.0
  • Development Distribution Score (DDS): 0.0
Top Committers
  • liyaowei-stu (y****l@s****n): 7 commits
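The DDS of 0.0 follows directly from the commit counts. Assuming the common definition, DDS = 1 - (commits by the top committer / total commits), a project with a single committer always scores 0.0. A small worked example; the function names are illustrative:

```python
def development_distribution_score(commits_per_committer):
    """DDS = 1 - (top committer's commits / total commits).

    0.0 means one person made every commit; values move toward 1.0
    as commits are spread across more contributors.
    """
    total = sum(commits_per_committer)
    if total == 0:
        return 0.0
    return 1 - max(commits_per_committer) / total


def avg_commits_per_committer(commits_per_committer):
    """Mean number of commits per committer (0.0 for no committers)."""
    if not commits_per_committer:
        return 0.0
    return sum(commits_per_committer) / len(commits_per_committer)


# This repository: a single committer with 7 commits.
print(development_distribution_score([7]))  # 0.0
print(avg_commits_per_committer([7]))  # 7.0
```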

Issues and Pull Requests

Last synced: 9 months ago

All Time
  • Total issues: 5
  • Total pull requests: 0
  • Average time to close issues: 15 days
  • Average time to close pull requests: N/A
  • Total issue authors: 4
  • Total pull request authors: 0
  • Average comments per issue: 1.6
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 5
  • Pull requests: 0
  • Average time to close issues: 15 days
  • Average time to close pull requests: N/A
  • Issue authors: 4
  • Pull request authors: 0
  • Average comments per issue: 1.6
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • zyx1213271098 (2)
  • NielsRogge (1)
  • GPAIPAI (1)
  • Looperswag (1)

Dependencies

docker/diffusers-doc-builder/Dockerfile docker
  • ubuntu 20.04 build
docker/diffusers-flax-cpu/Dockerfile docker
  • ubuntu 20.04 build
docker/diffusers-flax-tpu/Dockerfile docker
  • ubuntu 20.04 build
docker/diffusers-onnxruntime-cpu/Dockerfile docker
  • ubuntu 20.04 build
docker/diffusers-onnxruntime-cuda/Dockerfile docker
  • nvidia/cuda 12.1.0-runtime-ubuntu20.04 build
docker/diffusers-pytorch-compile-cuda/Dockerfile docker
  • nvidia/cuda 12.1.0-runtime-ubuntu20.04 build
docker/diffusers-pytorch-cpu/Dockerfile docker
  • ubuntu 20.04 build
docker/diffusers-pytorch-cuda/Dockerfile docker
  • nvidia/cuda 12.1.0-runtime-ubuntu20.04 build
docker/diffusers-pytorch-xformers-cuda/Dockerfile docker
  • nvidia/cuda 12.1.0-runtime-ubuntu20.04 build
examples/advanced_diffusion_training/requirements.txt pypi
  • Jinja2 *
  • accelerate >=0.16.0
  • ftfy *
  • peft ==0.7.0
  • tensorboard *
  • torchvision *
  • transformers >=4.25.1
examples/consistency_distillation/requirements.txt pypi
  • Jinja2 *
  • accelerate >=0.16.0
  • ftfy *
  • tensorboard *
  • torchvision *
  • transformers >=4.25.1
  • webdataset *
examples/controlnet/requirements.txt pypi
  • accelerate >=0.16.0
  • datasets *
  • ftfy *
  • tensorboard *
  • torchvision *
  • transformers >=4.25.1
examples/controlnet/requirements_flax.txt pypi
  • Jinja2 *
  • datasets *
  • flax *
  • ftfy *
  • optax *
  • tensorboard *
  • torch *
  • torchvision *
  • transformers >=4.25.1
examples/controlnet/requirements_sdxl.txt pypi
  • Jinja2 *
  • accelerate >=0.16.0
  • datasets *
  • ftfy *
  • tensorboard *
  • torchvision *
  • transformers >=4.25.1
  • wandb *
examples/custom_diffusion/requirements.txt pypi
  • Jinja2 *
  • accelerate *
  • ftfy *
  • tensorboard *
  • torchvision *
  • transformers >=4.25.1
examples/dreambooth/requirements.txt pypi
  • Jinja2 *
  • accelerate >=0.16.0
  • ftfy *
  • peft ==0.7.0
  • tensorboard *
  • torchvision *
  • transformers >=4.25.1
examples/dreambooth/requirements_flax.txt pypi
  • Jinja2 *
  • flax *
  • ftfy *
  • optax *
  • tensorboard *
  • torch *
  • torchvision *
  • transformers >=4.25.1
examples/dreambooth/requirements_sd3.txt pypi
  • Jinja2 *
  • accelerate >=0.31.0
  • ftfy *
  • peft ==0.11.1
  • sentencepiece *
  • tensorboard *
  • torchvision *
  • transformers >=4.41.2
examples/dreambooth/requirements_sdxl.txt pypi
  • Jinja2 *
  • accelerate >=0.16.0
  • ftfy *
  • peft ==0.7.0
  • tensorboard *
  • torchvision *
  • transformers >=4.25.1
examples/instruct_pix2pix/requirements.txt pypi
  • accelerate >=0.16.0
  • datasets *
  • ftfy *
  • tensorboard *
  • torchvision *
  • transformers >=4.25.1
examples/kandinsky2_2/text_to_image/requirements.txt pypi
  • Jinja2 *
  • accelerate >=0.16.0
  • datasets *
  • ftfy *
  • tensorboard *
  • torchvision *
  • transformers >=4.25.1
examples/research_projects/colossalai/requirement.txt pypi
  • Jinja2 *
  • diffusers *
  • ftfy *
  • tensorboard *
  • torch *
  • torchvision *
  • transformers *
examples/research_projects/consistency_training/requirements.txt pypi
  • Jinja2 *
  • accelerate >=0.16.0
  • ftfy *
  • tensorboard *
  • torchvision *
  • transformers >=4.25.1
examples/research_projects/diffusion_dpo/requirements.txt pypi
  • Jinja2 *
  • accelerate >=0.16.0
  • ftfy *
  • peft *
  • tensorboard *
  • torchvision *
  • transformers >=4.25.1
  • wandb *
examples/research_projects/diffusion_orpo/requirements.txt pypi
  • accelerate *
  • datasets *
  • peft *
  • torchvision *
  • transformers *
  • wandb *
  • webdataset *
examples/research_projects/dreambooth_inpaint/requirements.txt pypi
  • Jinja2 *
  • accelerate >=0.16.0
  • diffusers ==0.9.0
  • ftfy *
  • tensorboard *
  • torchvision *
  • transformers >=4.21.0
examples/research_projects/gligen/requirements.txt pypi
  • Jinja2 *
  • accelerate >=0.16.0
  • diffusers *
  • fairscale *
  • ftfy *
  • scipy *
  • tensorboard *
  • timm *
  • torchvision *
  • transformers >=4.25.1
  • wandb *
examples/research_projects/intel_opts/textual_inversion/requirements.txt pypi
  • Jinja2 *
  • accelerate >=0.16.0
  • ftfy *
  • intel_extension_for_pytorch >=1.13
  • tensorboard *
  • torchvision *
  • transformers >=4.21.0
examples/research_projects/intel_opts/textual_inversion_dfq/requirements.txt pypi
  • accelerate *
  • ftfy *
  • modelcards *
  • neural-compressor *
  • tensorboard *
  • torchvision *
  • transformers >=4.25.0
examples/research_projects/lora/requirements.txt pypi
  • Jinja2 *
  • accelerate >=0.16.0
  • datasets *
  • ftfy *
  • tensorboard *
  • torchvision *
  • transformers >=4.25.1
examples/research_projects/multi_subject_dreambooth/requirements.txt pypi
  • Jinja2 *
  • accelerate >=0.16.0
  • ftfy *
  • tensorboard *
  • torchvision *
  • transformers >=4.25.1
examples/research_projects/multi_subject_dreambooth_inpainting/requirements.txt pypi
  • Jinja2 *
  • accelerate >=0.16.0
  • datasets >=2.16.0
  • ftfy *
  • tensorboard *
  • torchvision *
  • transformers >=4.25.1
  • wandb >=0.16.1
examples/research_projects/multi_token_textual_inversion/requirements.txt pypi
  • Jinja2 *
  • accelerate >=0.16.0
  • ftfy *
  • tensorboard *
  • torchvision *
  • transformers >=4.25.1
examples/research_projects/multi_token_textual_inversion/requirements_flax.txt pypi
  • Jinja2 *
  • flax *
  • ftfy *
  • optax *
  • tensorboard *
  • torch *
  • torchvision *
  • transformers >=4.25.1
examples/research_projects/onnxruntime/text_to_image/requirements.txt pypi
  • accelerate >=0.16.0
  • datasets *
  • ftfy *
  • modelcards *
  • tensorboard *
  • torchvision *
  • transformers >=4.25.1
examples/research_projects/onnxruntime/textual_inversion/requirements.txt pypi
  • accelerate >=0.16.0
  • ftfy *
  • modelcards *
  • tensorboard *
  • torchvision *
  • transformers >=4.25.1
examples/research_projects/onnxruntime/unconditional_image_generation/requirements.txt pypi
  • accelerate >=0.16.0
  • datasets *
  • tensorboard *
  • torchvision *
examples/research_projects/realfill/requirements.txt pypi
  • Jinja2 ==3.1.3
  • accelerate ==0.23.0
  • diffusers ==0.20.1
  • ftfy ==6.1.1
  • peft ==0.5.0
  • tensorboard ==2.14.0
  • torch ==2.0.1
  • torchvision >=0.16
  • transformers ==4.38.0
examples/t2i_adapter/requirements.txt pypi
  • accelerate >=0.16.0
  • datasets *
  • ftfy *
  • safetensors *
  • tensorboard *
  • torchvision *
  • transformers >=4.25.1
  • wandb *
examples/text_to_image/requirements.txt pypi
  • Jinja2 *
  • accelerate >=0.16.0
  • datasets >=2.19.1
  • ftfy *
  • peft ==0.7.0
  • tensorboard *
  • torchvision *
  • transformers >=4.25.1
examples/text_to_image/requirements_flax.txt pypi
  • Jinja2 *
  • datasets *
  • flax *
  • ftfy *
  • optax *
  • tensorboard *
  • torch *
  • torchvision *
  • transformers >=4.25.1
examples/text_to_image/requirements_sdxl.txt pypi
  • Jinja2 *
  • accelerate >=0.22.0
  • datasets *
  • ftfy *
  • peft ==0.7.0
  • tensorboard *
  • torchvision *
  • transformers >=4.25.1
examples/textual_inversion/requirements.txt pypi
  • Jinja2 *
  • accelerate >=0.16.0
  • ftfy *
  • tensorboard *
  • torchvision *
  • transformers >=4.25.1
examples/textual_inversion/requirements_flax.txt pypi
  • Jinja2 *
  • flax *
  • ftfy *
  • optax *
  • tensorboard *
  • torch *
  • torchvision *
  • transformers >=4.25.1
examples/unconditional_image_generation/requirements.txt pypi
  • accelerate >=0.16.0
  • datasets *
  • torchvision *
examples/vqgan/requirements.txt pypi
  • accelerate >=0.16.0
  • datasets *
  • numpy *
  • tensorboard *
  • timm *
  • torchvision *
  • tqdm *
  • transformers >=4.25.1
examples/wuerstchen/text_to_image/requirements.txt pypi
  • accelerate >=0.16.0
  • bitsandbytes *
  • deepspeed *
  • peft >=0.6.0
  • torchvision *
  • transformers >=4.25.1
  • wandb *
pyproject.toml pypi
requirements.txt pypi
  • accelerate ==1.5.2
  • einops ==0.8.1
  • gradio ==5.21.0
  • huggingface_hub ==0.29.3
  • ipdb *
  • matplotlib ==3.10.1
  • numpy ==1.26.0
  • opencv-python ==4.8.1.78
  • peft ==0.14.0
  • segment_anything *
  • transformers ==4.49.0
setup.py pypi
  • deps *