https://github.com/bytedance/uno

[ICCV 2025] 🔥🔥 UNO: A Universal Customization Method for Both Single and Multi-Subject Conditioning

Science Score: 36.0%

This score indicates how likely this project is to be science-related based on various indicators:

○ CITATION.cff file
✓ codemeta.json file: Found codemeta.json file
✓ .zenodo.json file: Found .zenodo.json file
○ DOI references
✓ Academic publication links: Links to arxiv.org
○ Academic email domains
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: Low similarity (9.5%) to scientific vocabulary

Keywords

diffusion diffusion-transformer flux image-generation in-context-learning subject-driven-generation text-to-image universal-image-generation

Last synced: 5 months ago

Repository


Basic Info

Host: GitHub
Owner: bytedance
License: apache-2.0
Language: Python
Default Branch: main
Homepage: https://bytedance.github.io/UNO/
Size: 39.4 MB

Statistics

Stars: 1,219
Watchers: 13
Forks: 75
Open Issues: 24
Releases: 0

Topics

diffusion diffusion-transformer flux image-generation in-context-learning subject-driven-generation text-to-image universal-image-generation

Created 11 months ago · Last pushed 6 months ago

Metadata Files

Readme License

Less-to-More Generalization:
Unlocking More Controllability by In-Context Generation

Shaojin Wu, Mengqi Huang^*, Wenxu Wu, Yufeng Cheng, Fei Ding⁺, Qian He
Intelligent Creation Team, ByteDance

🔥 News

2025.08.29 🔥 We are excited to share our new open-source project USO, which can freely combine any subjects with any styles in any scenarios while ensuring photorealistic results. 🔥

You can visit our project page or try the live demo for more examples.
2025.08.18 ✨ We open-sourced the UNO-1M dataset, which is a large and high-quality dataset (~1M paired images). We hope it can further benefit research.
2025.06.26 🎉 Congratulations! UNO has been accepted by ICCV 2025!
2025.04.16 🔥 Our companion project RealCustom is released.
2025.04.10 🔥 Added fp8 mode as the primary low-VRAM option, a gift for consumer-grade GPU users. Peak VRAM usage is now ~16GB. We may try further inference optimization later.
2025.04.03 🔥 The demo of UNO is released.
2025.04.03 🔥 The training code, inference code, and model of UNO are released.
2025.04.02 🔥 The project page of UNO is created.
2025.04.02 🔥 The arXiv paper of UNO is released.

📖 Introduction

In this study, we propose a highly-consistent data synthesis pipeline to tackle this challenge. This pipeline harnesses the intrinsic in-context generation capabilities of diffusion transformers and generates high-consistency multi-subject paired data. Additionally, we introduce UNO, which consists of progressive cross-modal alignment and universal rotary position embedding. It is a multi-image conditioned subject-to-image model iteratively trained from a text-to-image model. Extensive experiments show that our method can achieve high consistency while ensuring controllability in both single-subject and multi-subject driven generation.

⚡️ Quick Start

🔧 Requirements and Installation

Install the requirements:

```bash
# Legacy installation command:
# pip install -r requirements.txt

# Create a virtual environment with Python >= 3.10 and <= 3.12, for example:
python -m venv uno_env
source uno_env/bin/activate
# or, with conda:
# conda create -n uno_env python=3.10 -y
# conda activate uno_env

# !!! If you are using an AMD GPU, an NVIDIA RTX 50 series GPU, or macOS MPS,
# !!! install the correct torch version yourself first, then run the install command.

# Then install the requirements you need:
pip install -e .            # to run the demo/inference only
pip install -e ".[train]"   # to also train the model
```
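The Python range stated in the installation notes (>= 3.10 and <= 3.12) can be checked before installing. This is a minimal sketch; the helper name `is_supported_python` is hypothetical and is not part of the UNO codebase.

```python
import sys

# Bounds taken from the installation notes: Python >= 3.10 and <= 3.12.
MIN_VERSION = (3, 10)
MAX_VERSION = (3, 12)


def is_supported_python(version_info=None):
    """Return True if the (major, minor) version is within the stated range.

    Hypothetical helper for illustration; not part of the UNO codebase.
    """
    info = sys.version_info if version_info is None else version_info
    major_minor = tuple(info[:2])
    return MIN_VERSION <= major_minor <= MAX_VERSION


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    status = "supported" if is_supported_python() else "outside the stated range"
    print(f"Python {current}: {status}")
```

Run it with the interpreter you intend to use for the virtual environment, since that is the version the check reports on.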

Then download the checkpoints in one of the following ways:

1. Run the inference scripts directly. The checkpoints are downloaded automatically by the `hf_hub_download` function in the code to your `$HF_HOME` (default: `~/.cache/huggingface`).
2. Use `huggingface-cli download <repo name>` to download `black-forest-labs/FLUX.1-dev`, `xlabs-ai/xflux_text_encoders`, `openai/clip-vit-large-patch14`, and `bytedance-research/UNO`, then run the inference scripts. To speed up setup and save disk space, you can download only the checkpoints you need. For example, for `black-forest-labs/FLUX.1-dev`, use `huggingface-cli download black-forest-labs/FLUX.1-dev flux1-dev.safetensors` and `huggingface-cli download black-forest-labs/FLUX.1-dev ae.safetensors`, ignoring the text encoder in the `black-forest-labs/FLUX.1-dev` model repo (it is there for diffusers calls). All of the checkpoints together take 37 GB of disk space.
3. Use `huggingface-cli download <repo name> --local-dir <LOCAL_DIR>` to download all the checkpoints mentioned in 2 to directories of your choice. Then set the environment variables `AE`, `FLUX_DEV` (or `FLUX_DEV_FP8` if you use fp8 mode), `T5`, `CLIP`, and `LORA` to the corresponding paths. Finally, run the inference scripts.
4. If you already have some of the checkpoints, set the environment variables `AE`, `FLUX_DEV`, `T5`, `CLIP`, and `LORA` to the corresponding paths. Finally, run the inference scripts.
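Before launching inference, it can help to see which checkpoint variables are set and which ones will fall back to the automatic download. The sketch below only inspects the environment variable names listed above. The function names are hypothetical, and how the UNO scripts read these variables internally is an assumption.

```python
import os
from pathlib import Path

# Environment variable names taken from the download instructions above.
CHECKPOINT_VARS = ("AE", "FLUX_DEV", "T5", "CLIP", "LORA")


def hf_cache_dir(env=None):
    """Return $HF_HOME if set, otherwise the default ~/.cache/huggingface."""
    env = os.environ if env is None else env
    return Path(env.get("HF_HOME", Path.home() / ".cache" / "huggingface"))


def checkpoint_overrides(env=None, fp8=False):
    """Map each checkpoint variable to its configured path, or None if unset.

    In fp8 mode the FLUX weights are looked up under FLUX_DEV_FP8
    instead of FLUX_DEV, as described in option 3 above.
    """
    env = os.environ if env is None else env
    names = ["FLUX_DEV_FP8" if (fp8 and name == "FLUX_DEV") else name
             for name in CHECKPOINT_VARS]
    return {name: env.get(name) for name in names}


def report(env=None, fp8=False):
    """Describe, per variable, whether a local path or the HF cache applies."""
    lines = []
    for name, path in checkpoint_overrides(env, fp8).items():
        if path is None:
            # Assumption: unset variables fall back to the automatic download.
            lines.append(f"{name}: not set, expected to download into {hf_cache_dir(env)}")
        elif Path(path).exists():
            lines.append(f"{name}: using local path {path}")
        else:
            lines.append(f"{name}: set to {path}, but that path does not exist")
    return lines


if __name__ == "__main__":
    print("\n".join(report()))
```

Passing an explicit `env` dictionary makes the check easy to try without touching the real environment.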

🌟 Gradio Demo

```bash
python app.py
```

For low VRAM usage, pass the --offload and --name flux-dev-fp8 args. Peak memory usage will be 16GB. For reference, end-to-end inference takes 40s to 1min on an RTX 3090 in fp8 and offload mode.

```bash
python app.py --offload --name flux-dev-fp8
```

✍️ Inference

Start from the examples below to explore and spark your creativity. ✨

```bash
python inference.py --prompt "A clock on the beach is under a red sun umbrella" --image_paths "assets/clock.png" --width 704 --height 704
python inference.py --prompt "The figurine is in the crystal ball" --image_paths "assets/figurine.png" "assets/crystal_ball.png" --width 704 --height 704
python inference.py --prompt "The logo is printed on the cup" --image_paths "assets/cat_cafe.png" "assets/cup.png" --width 704 --height 704
```
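To run several prompts in a row, the same command line can be assembled programmatically. This is a minimal sketch that uses only the flags shown in the examples above (`--prompt`, `--image_paths`, `--width`, `--height`). The function name `build_inference_command` is hypothetical.

```python
import shlex
import sys


def build_inference_command(prompt, image_paths, width=704, height=704,
                            python=sys.executable):
    """Build the argument list for one inference.py call.

    Only flags that appear in the README examples are used.
    """
    return [
        python, "inference.py",
        "--prompt", prompt,
        "--image_paths", *image_paths,
        "--width", str(width),
        "--height", str(height),
    ]


# Prompt / reference-image pairs copied from the examples above.
JOBS = [
    ("A clock on the beach is under a red sun umbrella", ["assets/clock.png"]),
    ("The figurine is in the crystal ball",
     ["assets/figurine.png", "assets/crystal_ball.png"]),
    ("The logo is printed on the cup", ["assets/cat_cafe.png", "assets/cup.png"]),
]

if __name__ == "__main__":
    for prompt, images in JOBS:
        # Print the command instead of running it. To execute, pass the list
        # to subprocess.run(cmd, check=True) from the repository root.
        print(shlex.join(build_inference_command(prompt, images)))
```

Building the command as a list rather than a single string avoids quoting problems with prompts that contain spaces.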

Optional preparation: If you want to run inference on dreambench for the first time, clone the dreambench submodule to download the dataset.

```bash
git submodule update --init
```

Then run the following scripts:

```bash
# inference on dreambench
## for single-subject
python inference.py --eval_json_path ./datasets/dreambench_singleip.json
## for multi-subject
python inference.py --eval_json_path ./datasets/dreambench_multiip.json
```

🔍 Evaluation

```bash
# evaluated on dreambench
## for single-subject
python eval/evaluate_clip_dino_score_single_subject.py --result_root <your_image_result_save_path> --save_dir <the_evaluation_result_save_path>
## for multi-subject
python eval/evaluate_clip_dino_score_multi_subject.py --result_root <your_image_result_save_path> --save_dir <the_evaluation_result_save_path>
```

🚄 Training

If you want to train on UNO-1M, download the dataset from HuggingFace, extract it, and put it in ./datasets/UNO-1M. The directory will look like:

```bash
├── datasets
│   └── UNO-1M
│       ├── images
│       │   ├── split1
│       │   │   ├── object365_w1024_h1536_split_Bread_0_0_1_725x1024.png
│       │   │   ├── object365_w1024_h1536_split_Bread_0_0_2_811x1024.png
│       │   │   └── ...
│       │   └── ...
│       └── uno_1m_total_labels.json
```

Then run the training script:

```bash
# filter and format the dataset
python uno/utils/filter_uno_1m_dataset.py ./datasets/UNO-1M/uno_1m_total_labels.json ./datasets/UNO-1M/uno_1m_total_labels_convert.json 4

# train
accelerate launch train.py --train_data_json ./datasets/UNO-1M/uno_1m_total_labels_convert.json
```
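Before running the filter and training steps, the extracted dataset can be checked against the directory layout shown above. This is a minimal sketch; the function name `check_uno_1m_layout` is hypothetical, and it only tests for the `images` directory, at least one split subdirectory, and the label file.

```python
from pathlib import Path

# File and directory names taken from the directory tree shown above.
IMAGES_DIR = "images"
LABELS_NAME = "uno_1m_total_labels.json"


def check_uno_1m_layout(root="./datasets/UNO-1M"):
    """Return a list of layout problems; an empty list means the layout matches.

    Hypothetical helper for illustration; not part of the UNO codebase.
    """
    root = Path(root)
    problems = []
    images = root / IMAGES_DIR
    labels = root / LABELS_NAME
    if not images.is_dir():
        problems.append(f"missing directory: {images}")
    elif not any(entry.is_dir() for entry in images.iterdir()):
        problems.append(f"no split directories found under {images}")
    if not labels.is_file():
        problems.append(f"missing label file: {labels}")
    return problems


if __name__ == "__main__":
    issues = check_uno_1m_layout()
    print("layout looks fine" if not issues else "\n".join(issues))
```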

📌 Tips and Notes

We integrate single-subject and multi-subject generation within a unified model. For single-subject scenarios, the longest side of the reference image is set to 512 by default, while for multi-subject scenarios, it is set to 320. UNO demonstrates remarkable flexibility across various aspect ratios, thanks to its training on a multi-scale dataset. Despite being trained within 512 buckets, it can handle higher resolutions, including 512, 568, and 704, among others.
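The reference-image sizing rule above (longest side 512 for a single subject, 320 for multiple subjects) can be sketched as a size computation. This is an illustration under assumptions: the function names are hypothetical, and the actual preprocessing in the repository may also crop or snap dimensions to a fixed multiple, which is not modeled here.

```python
# Defaults taken from the note above.
SINGLE_SUBJECT_LONGEST_SIDE = 512
MULTI_SUBJECT_LONGEST_SIDE = 320


def fit_longest_side(width, height, longest_side):
    """Scale (width, height) so the longer side equals longest_side,
    keeping the aspect ratio (rounded to whole pixels)."""
    scale = longest_side / max(width, height)
    return max(1, round(width * scale)), max(1, round(height * scale))


def reference_size(width, height, num_references):
    """Pick the default longest side from the number of reference images.

    Hypothetical helper; the repository's own resizing logic may differ.
    """
    longest = (SINGLE_SUBJECT_LONGEST_SIDE if num_references == 1
               else MULTI_SUBJECT_LONGEST_SIDE)
    return fit_longest_side(width, height, longest)


if __name__ == "__main__":
    print(reference_size(1024, 768, num_references=1))
    print(reference_size(1024, 768, num_references=2))
```

The smaller default for multi-subject inputs keeps the combined reference images compact when several are supplied at once.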

UNO excels in subject-driven generation but has room for improvement in generalization due to dataset constraints. We are actively developing an enhanced model—stay tuned for updates. Your feedback is valuable, so please feel free to share any suggestions.

🎨 Application Scenarios

📄 Disclaimer

We open-source this project for academic research. The vast majority of images used in this project are either generated or licensed. If you have any concerns, please contact us, and we will promptly remove any inappropriate content. Our code is released under the Apache 2.0 License. Any used base model must adhere to the original licensing terms.

This research aims to advance the field of generative AI. Users are free to create images using this tool, provided they comply with local laws and exercise responsible usage. The developers are not liable for any misuse of the tool by users.

🚀 Updates

For the purpose of fostering research and the open-source community, we plan to open-source the entire project, encompassing training, inference, weights, etc. Thank you for your patience and support! 🌟

- [x] Release github repo.
- [x] Release inference code.
- [x] Release training code.
- [x] Release model checkpoints.
- [x] Release arXiv paper.
- [x] Release huggingface space demo.
- [x] Release in-context data generation pipelines (instructions provided in ./template).
- [x] Release dataset (UNO-1M).

Related resources

ComfyUI

https://github.com/jax-explorer/ComfyUI-UNO a ComfyUI node implementation of UNO by jax-explorer.
https://github.com/HM-RunningHub/ComfyUIRHUNO a ComfyUI node implementation of UNO by HM-RunningHub.
https://github.com/ShmuelRonen/ComfyUI-UNO-Wrapper a ComfyUI node implementation of UNO by ShmuelRonen.
https://github.com/Yuan-ManX/ComfyUI-UNO a ComfyUI node implementation of UNO by Yuan-ManX.
https://github.com/QijiTec/ComfyUI-RED-UNO a ComfyUI node implementation of UNO by QijiTec.

We thank the passionate community contributors. We have received many requests about ComfyUI, but we do not have enough time to make so many adaptations ourselves. If you want to try our work in ComfyUI, you can try the repos above. They differ slightly, so you may need some trial and error to find the one that suits you best.

Citation

If UNO is helpful, please help to ⭐ the repo.

If you find this project useful for your research, please consider citing our paper:

```bibtex
@article{wu2025less,
  title={Less-to-More Generalization: Unlocking More Controllability by In-Context Generation},
  author={Wu, Shaojin and Huang, Mengqi and Wu, Wenxu and Cheng, Yufeng and Ding, Fei and He, Qian},
  journal={arXiv preprint arXiv:2504.02160},
  year={2025}
}
```

Owner

Name: Bytedance Inc.
Login: bytedance
Kind: organization
Location: Singapore

Website: https://opensource.bytedance.com
Twitter: ByteDanceOSS
Repositories: 255
Profile: https://github.com/bytedance

Issues and Pull Requests

Last synced: 6 months ago

All Time

Total issues: 58
Total pull requests: 19
Average time to close issues: 6 days
Average time to close pull requests: about 13 hours
Total issue authors: 55
Total pull request authors: 9
Average comments per issue: 1.5
Average comments per pull request: 0.21
Merged pull requests: 9
Bot issues: 0
Bot pull requests: 0

Past Year

Issues: 58
Pull requests: 19
Average time to close issues: 6 days
Average time to close pull requests: about 13 hours
Issue authors: 55
Pull request authors: 9
Average comments per issue: 1.5
Average comments per pull request: 0.21
Merged pull requests: 9
Bot issues: 0
Bot pull requests: 0


Top Authors

Issue Authors

mokby (2)
bstarker33 (2)
MaargoGysarova (2)
asvilesov (1)
softicelee2 (1)
Morganismine (1)
CccccchenJD (1)
SlZeroth (1)
Sakura-hub47 (1)
mike-srp (1)
DsnTgr (1)
Alarm1673 (1)
dingangui (1)
PlanPersisitentPatient (1)
xilai0715 (1)

Pull Request Authors

ValMystletainn (7)
fenfenfenfan (5)
rphmeier (1)
yuyou-dev (1)
adityanagachandra (1)
regikono (1)
mpuels (1)
eltociear (1)
cb1cyf (1)

Top Labels

Issue Labels

Pull Request Labels

Dependencies

requirements.txt pypi

accelerate ==1.1.1
datasets ==2.21.0
deepspeed ==0.14.4
diffusers ==0.30.1
einops ==0.8.0
gradio ==5.22.0
httpx ==0.23.3
huggingface-hub ==0.24.5
matplotlib ==3.9.2
omegaconf ==2.3.0
onnxruntime ==1.19.0
opencv-python ==4.10.0.84
optimum-quanto ==0.2.4
pycocotools ==2.0.8
sentencepiece ==0.2.0
timm ==1.0.9
torch ==2.4.0
torchaudio ==2.4.0
torchvision ==0.19.0
transformers ==4.43.3

pyproject.toml pypi

diffusers >=0.30.1
einops >=0.8.0
gradio >=5.22.0
huggingface-hub *
sentencepiece ==0.2.0
torch >=2.4.0
torchvision >=0.19.0
transformers >=4.43.3
