https://github.com/bytedance/latentunfold

Implementation of paper: Flux Already Knows – Activating Subject-Driven Image Generation without Training

Science Score: 36.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file (found)
  • .zenodo.json file (found)
  • DOI references
  • Academic publication links (links to arxiv.org)
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity (low similarity, 16.6%, to scientific vocabulary)

Keywords

data-free diffusion-models flux-dev in-context-prompting subject-driven-generation training-free
Last synced: 5 months ago

Repository

Implementation of paper: Flux Already Knows – Activating Subject-Driven Image Generation without Training

Basic Info
  • Host: GitHub
  • Owner: bytedance
  • License: apache-2.0
  • Language: Python
  • Default Branch: main
  • Homepage:
  • Size: 15.5 MB
Statistics
  • Stars: 12
  • Watchers: 3
  • Forks: 1
  • Open Issues: 1
  • Releases: 0
Topics
data-free diffusion-models flux-dev in-context-prompting subject-driven-generation training-free
Created 11 months ago · Last pushed 9 months ago
Metadata Files
Readme License

README.md

LatentUnfold

Flux Already Knows – Activating Subject-Driven Image Generation without Training
Hao Kang, Stathi Fotiadis, Liming Jiang, Qing Yan, Yumin Jia, Zichuan Liu, Min Jin Chong, and Xin Lu
Bytedance Intelligent Creation

Abstract
We propose a simple yet effective zero-shot framework for subject-driven image generation using a vanilla Flux model. By framing the task as grid-based image completion and simply replicating the subject image(s) in a mosaic layout, we activate strong identity-preserving capabilities without any additional data, training, or inference-time fine-tuning. This “free lunch” approach is further strengthened by a novel cascade attention design and meta prompting technique, boosting fidelity and versatility. Experimental results show that our method outperforms baselines across multiple key metrics in benchmarks and human preference studies, with trade-offs in certain aspects. Additionally, it supports diverse edits, including logo insertion, virtual try-on, and subject replacement or insertion. These results demonstrate that a pre-trained foundational text-to-image model can enable high-quality, resource-efficient subject-driven generation, opening new possibilities for lightweight customization in downstream applications.
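The mosaic step described above can be pictured with a short, hypothetical tiling sketch. The following Python/PIL snippet is only an illustration under assumed parameters (a 2x2 grid, 512-pixel cells, and the bottom-right cell left blank as the region the model is asked to complete); it is not the repository's actual layout code, which lives in run_latent_unfold.py.

```python
# Minimal sketch of the mosaic/grid idea described in the abstract.
# Assumptions: a 2x2 grid, square cells, and the bottom-right cell left
# blank as the region the model is asked to "complete" with the subject.
from PIL import Image


def build_mosaic(subject: Image.Image, cell: int = 512) -> Image.Image:
    """Tile the subject image into a 2x2 grid, leaving one cell empty."""
    subject = subject.convert("RGB").resize((cell, cell))
    canvas = Image.new("RGB", (2 * cell, 2 * cell), color="white")
    # Fill three cells with copies of the subject; the fourth stays blank.
    for x, y in [(0, 0), (cell, 0), (0, cell)]:
        canvas.paste(subject, (x, y))
    return canvas


if __name__ == "__main__":
    mosaic = build_mosaic(Image.open("subject.png"))
    mosaic.save("mosaic_input.png")
```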


Quick Start

  1. Environment setup (modify bootstrap.sh as needed):

```bash
source bootstrap.sh
```

  2. Run an example:

```bash
# Basic call
python3 run_latent_unfold.py

# Gradio demo
python3 app.py
```
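For orientation, the vanilla FLUX.1-dev model that the method builds on can be loaded through the diffusers library listed in requirements.txt. The snippet below is only a generic text-to-image call to show the underlying pipeline; the grid-completion, cascade attention, and meta prompting logic is implemented in run_latent_unfold.py and is not reproduced here. The prompt and sampling parameters are illustrative.

```python
# Generic FLUX.1-dev text-to-image call via diffusers; this shows the
# vanilla pipeline only, not the repository's subject-driven method.
import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
)
pipe.enable_model_cpu_offload()  # reduces VRAM use on smaller GPUs

image = pipe(
    "a photo of a corgi wearing a party hat",
    height=1024,
    width=1024,
    guidance_scale=3.5,
    num_inference_steps=28,
).images[0]
image.save("flux_sample.png")
```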

License

This repository is licensed under the Apache 2.0 License.


Acknowledgement

We would like to express our gratitude to the authors of the following repositories, from which we referenced code, models, or assets:
https://github.com/huggingface/diffusers
https://github.com/wooyeolbaek/attention-map-diffusers
https://github.com/Yuanshi9815/OminiControl
https://github.com/google/dreambooth
https://huggingface.co/briaai/RMBG-2.0
https://huggingface.co/black-forest-labs/FLUX.1-dev


Citation

If you find this work useful in your research, please consider citing:

```bibtex
@article{kang2025latentunfold,
  title={Flux Already Knows - Activating Subject-Driven Image Generation without Training},
  author={Kang, Hao and Fotiadis, Stathi and Jiang, Liming and Yan, Qing and Jia, Yumin and Liu, Zichuan and Chong, Min Jin and Lu, Xin},
  journal={arXiv preprint},
  volume={arXiv:2504.11478},
  year={2025}
}
```

Owner

  • Name: Bytedance Inc.
  • Login: bytedance
  • Kind: organization
  • Location: Singapore

GitHub Events

Total
  • Issues event: 4
  • Watch event: 33
  • Issue comment event: 2
  • Member event: 1
  • Push event: 7
  • Public event: 1
  • Fork event: 7
  • Create event: 1
Last Year
  • Issues event: 4
  • Watch event: 33
  • Issue comment event: 2
  • Member event: 1
  • Push event: 7
  • Public event: 1
  • Fork event: 7
  • Create event: 1

Dependencies

requirements.txt pypi
  • accelerate *
  • diffusers *
  • einops *
  • gradio *
  • kornia *
  • openai *
  • opencv-python *
  • pyyaml *
  • sentencepiece *
  • tenacity *
  • timm *
  • torch ==2.4.1
  • torchvision ==0.19.1
  • transformers *
  • xformers ==0.0.28.post1