sana-fork
Science Score: 54.0%
This score indicates how likely this project is to be science-related, based on the following indicators:
- ✓ CITATION.cff file: found
- ✓ codemeta.json file: found
- ✓ .zenodo.json file: found
- ○ DOI references
- ✓ Academic publication links: links to arxiv.org
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (9.5%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: tahahah
- License: other
- Language: Python
- Default Branch: main
- Size: 5.63 MB
Statistics
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 2
- Releases: 0
Metadata Files
README.md
⚡️Sana: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformer
💡 Introduction
We introduce Sana, a text-to-image framework that can efficiently generate images up to 4096 × 4096 resolution. Sana can synthesize high-resolution, high-quality images with strong text-image alignment at remarkably fast speed, deployable on a laptop GPU. Core designs include:

1. **DC-AE**: unlike traditional AEs, which compress images only 8×, we trained an AE that can compress images 32×, effectively reducing the number of latent tokens.
2. **Linear DiT**: we replace all vanilla attention in DiT with linear attention, which is more efficient at high resolutions without sacrificing quality.
3. **Decoder-only text encoder**: we replaced T5 with a modern decoder-only small LLM as the text encoder and designed complex human instructions with in-context learning to enhance image-text alignment.
4. **Efficient training and sampling**: we propose Flow-DPM-Solver to reduce sampling steps, with efficient caption labeling and selection to accelerate convergence.
As a result, Sana-0.6B is highly competitive with modern giant diffusion models (e.g., Flux-12B), while being 20× smaller and 100+× faster in measured throughput. Moreover, Sana-0.6B can be deployed on a 16GB laptop GPU, taking less than 1 second to generate a 1024 × 1024 resolution image. Sana enables content creation at low cost.
🔥🔥 News
- (🔥 New) [2024/11/27] Sana Replicate API is launching at Sana-API.
- (🔥 New) [2024/11/27] Sana code-base license changed to Apache 2.0.
- (🔥 New) [2024/11/26] 1.6B multilingual Sana models are released. Multi-language prompts (emoji, Chinese, and English) are supported.
- (🔥 New) [2024/11] 1.6B Sana models are released.
- (🔥 New) [2024/11] Training & Inference & Metrics code are released.
- (🔥 New) [2024/11] Working on diffusers.
- [2024/10] Demo is released.
- [2024/10] DC-AE Code and weights are released!
- [2024/10] Paper is on Arxiv!
Performance
| Methods (1024x1024) | Throughput (samples/s) | Latency (s) | Params (B) | Speedup | FID 👆 | CLIP 👆 | GenEval 👆 | DPG 👆 |
|------------------------------|------------------------|-------------|------------|-----------|-------------|--------------|-------------|-------------|
| FLUX-dev | 0.04 | 23.0 | 12.0 | 1.0× | 10.15 | 27.47 | 0.67 | 84.0 |
| Sana-0.6B | 1.7 | 0.9 | 0.6 | 39.5× | 5.81 | 28.36 | 0.64 | 83.6 |
| Sana-1.6B | 1.0 | 1.2 | 1.6 | 23.3× | 5.76 | 28.67 | 0.66 | 84.8 |
Click to show all
| Methods | Throughput (samples/s) | Latency (s) | Params (B) | Speedup | FID 👆 | CLIP 👆 | GenEval 👆 | DPG 👆 |
|------------------------------|------------------------|-------------|------------|-----------|-------------|--------------|-------------|-------------|
| _**512 × 512 resolution**_ | | | | | | | | |
| PixArt-α | 1.5 | 1.2 | 0.6 | 1.0× | 6.14 | 27.55 | 0.48 | 71.6 |
| PixArt-Σ | 1.5 | 1.2 | 0.6 | 1.0× | _6.34_ | _27.62_ | 0.52 | _79.5_ |
| **Sana-0.6B** | 6.7 | 0.8 | 0.6 | 5.0× | 5.67 | 27.92 | _0.64_ | 84.3 |
| **Sana-1.6B** | 3.8 | 0.6 | 1.6 | 2.5× | **5.16** | **28.19** | **0.66** | **85.5** |
| _**1024 × 1024 resolution**_ | | | | | | | | |
| LUMINA-Next | 0.12 | 9.1 | 2.0 | 2.8× | 7.58 | 26.84 | 0.46 | 74.6 |
| SDXL | 0.15 | 6.5 | 2.6 | 3.5× | 6.63 | _29.03_ | 0.55 | 74.7 |
| PlayGroundv2.5 | 0.21 | 5.3 | 2.6 | 4.9× | _6.09_ | **29.13** | 0.56 | 75.5 |
| Hunyuan-DiT | 0.05 | 18.2 | 1.5 | 1.2× | 6.54 | 28.19 | 0.63 | 78.9 |
| PixArt-Σ | 0.4 | 2.7 | 0.6 | 9.3× | 6.15 | 28.26 | 0.54 | 80.5 |
| DALLE3 | - | - | - | - | - | - | _0.67_ | 83.5 |
| SD3-medium | 0.28 | 4.4 | 2.0 | 6.5× | 11.92 | 27.83 | 0.62 | 84.1 |
| FLUX-dev | 0.04 | 23.0 | 12.0 | 1.0× | 10.15 | 27.47 | _0.67_ | _84.0_ |
| FLUX-schnell | 0.5 | 2.1 | 12.0 | 11.6× | 7.94 | 28.14 | **0.71** | **84.8** |
| **Sana-0.6B** | 1.7 | 0.9 | 0.6 | **39.5×** | 5.81 | 28.36 | 0.64 | 83.6 |
| **Sana-1.6B** | 1.0 | 1.2 | 1.6 | **23.3×** | **5.76** | 28.67 | 0.66 | **84.8** |
Contents
🔧 1. Dependencies and Installation
- Python >= 3.10.0 (we recommend Anaconda or Miniconda)
- PyTorch >= 2.0.1+cu12.1
```bash
git clone https://github.com/NVlabs/Sana.git
cd Sana
./environment_setup.sh sana
# or install each component step by step following environment_setup.sh
```
💻 2. How to Play with Sana (Inference)
💰Hardware requirement
- 9GB VRAM is required for the 0.6B model and 12GB VRAM for the 1.6B model. Our upcoming quantized version will require less than 8GB for inference.
- All tests were done on A100 GPUs; results may differ on other GPUs.
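The VRAM figures above can be checked programmatically before picking a model. The sketch below is illustrative only: the `models_that_fit` helper and its thresholds come from the numbers in this README, not from the Sana codebase, and the `torch` query is optional so the helper also works without a GPU.

```python
# Rough VRAM check against the stated requirements (9GB for the 0.6B model,
# 12GB for the 1.6B model); thresholds taken from the hardware notes above.
REQUIRED_GB = {"Sana-0.6B": 9, "Sana-1.6B": 12}

def models_that_fit(total_vram_bytes):
    """Return the Sana models whose stated VRAM requirement fits in the given memory."""
    total_gb = total_vram_bytes / 1024**3
    return [name for name, need in REQUIRED_GB.items() if total_gb >= need]

try:
    import torch
    vram = (torch.cuda.get_device_properties(0).total_memory
            if torch.cuda.is_available() else 0)
except ImportError:
    vram = 0  # no torch installed: treat as no GPU memory available

print(models_that_fit(vram))          # depends on your hardware
print(models_that_fit(16 * 1024**3))  # a 16GB laptop GPU fits both models
```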
🔛 Quick start with Gradio
```bash
# official online demo
DEMO_PORT=15432 \
python app/app_sana.py \
    --share \
    --config=configs/sana_config/1024ms/Sana_1600M_img1024.yaml \
    --model_path=hf://Efficient-Large-Model/Sana_1600M_1024px/checkpoints/Sana_1600M_1024px.pth
```
```python
import torch
from app.sana_pipeline import SanaPipeline
from torchvision.utils import save_image

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
generator = torch.Generator(device=device).manual_seed(42)

sana = SanaPipeline("configs/sana_config/1024ms/Sana_1600M_img1024.yaml")
sana.from_pretrained("hf://Efficient-Large-Model/Sana_1600M_1024px/checkpoints/Sana_1600M_1024px.pth")
prompt = 'a cyberpunk cat with a neon sign that says "Sana"'

image = sana(
    prompt=prompt,
    height=1024,
    width=1024,
    guidance_scale=5.0,
    pag_guidance_scale=2.0,
    num_inference_steps=18,
    generator=generator,
)
save_image(image, 'output/sana.png', nrow=1, normalize=True, value_range=(-1, 1))
```
Run Sana (Inference) with Docker
```bash
# Pull related models
huggingface-cli download google/gemma-2b-it
huggingface-cli download google/shieldgemma-2b
huggingface-cli download mit-han-lab/dc-ae-f32c32-sana-1.0
huggingface-cli download Efficient-Large-Model/Sana_1600M_1024px
# Run with docker
docker build . -t sana
docker run --gpus all --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 \
-v ~/.cache:/root/.cache \
sana
```
🔛 Run inference with TXT or JSON files
```bash
# Run samples in a txt file
python scripts/inference.py \
    --config=configs/sana_config/1024ms/Sana_1600M_img1024.yaml \
    --model_path=hf://Efficient-Large-Model/Sana_1600M_1024px/checkpoints/Sana_1600M_1024px.pth \
    --txt_file=asset/samples_mini.txt

# Run samples in a json file
python scripts/inference.py \
    --config=configs/sana_config/1024ms/Sana_1600M_img1024.yaml \
    --model_path=hf://Efficient-Large-Model/Sana_1600M_1024px/checkpoints/Sana_1600M_1024px.pth \
    --json_file=asset/samples_mini.json
```
where each line of asset/samples_mini.txt contains one prompt to generate.
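The expected txt layout (one prompt per line) can be prepared and sanity-checked with a short helper. This is an illustrative sketch: the `load_prompts` function and the demo file name are assumptions, not part of the repository.

```python
from pathlib import Path

def load_prompts(path):
    """Read one prompt per line, skipping blank lines, as --txt_file expects."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return [line.strip() for line in lines if line.strip()]

# Write a tiny example prompt file and read it back.
sample = Path("samples_demo.txt")
sample.write_text("a cyberpunk cat\na watercolor mountain at dawn\n", encoding="utf-8")
prompts = load_prompts(sample)
print(prompts)  # ['a cyberpunk cat', 'a watercolor mountain at dawn']
```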
🔥 3. How to Train Sana
💰Hardware requirement
- 32GB VRAM is required to train both the 0.6B and 1.6B models
We provide a training example here; you can also select a config file from the config files directory that matches your data structure.
To launch Sana training, first prepare your data in the following format:
```bash
asset/example_data
├── AAA.txt
├── AAA.png
├── BCC.txt
├── BCC.png
├── ......
├── CCC.txt
└── CCC.png
```
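Since each image must be paired with a same-named caption file, a quick check before launching training can catch missing captions. The `check_pairs` helper below is an illustrative sketch under that pairing assumption, not part of the Sana codebase.

```python
from pathlib import Path

def check_pairs(data_dir):
    """Return images in the dataset dir that lack a matching caption .txt file."""
    data_dir = Path(data_dir)
    return sorted(p.name for p in data_dir.glob("*.png")
                  if not p.with_suffix(".txt").exists())

# Build a tiny layout like asset/example_data and check it.
root = Path("example_data_demo")
root.mkdir(exist_ok=True)
(root / "AAA.png").write_bytes(b"")
(root / "AAA.txt").write_text("a caption for AAA", encoding="utf-8")
(root / "BCC.png").write_bytes(b"")  # deliberately missing BCC.txt
print(check_pairs(root))  # ['BCC.png']
```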
Then Sana's training can be launched via
```bash
# Example of training Sana 0.6B with 512x512 resolution from scratch
bash train_scripts/train.sh \
    configs/sana_config/512ms/Sana_600M_img512.yaml \
    --data.data_dir="[asset/example_data]" \
    --data.type=SanaImgDataset \
    --model.multi_scale=false \
    --train.train_batch_size=32

# Example of fine-tuning Sana 1.6B with 1024x1024 resolution
bash train_scripts/train.sh \
    configs/sana_config/1024ms/Sana_1600M_img1024.yaml \
    --data.data_dir="[asset/example_data]" \
    --data.type=SanaImgDataset \
    --model.load_from=hf://Efficient-Large-Model/Sana_1600M_1024px/checkpoints/Sana_1600M_1024px.pth \
    --model.multi_scale=false \
    --train.train_batch_size=8
```
💻 4. Metric toolkit
Refer to Toolkit Manual.
💪To-Do List
We will try our best to release
- [x] Training code
- [x] Inference code
- [x] Model zoo
- [ ] Diffusers integration (https://github.com/huggingface/diffusers/pull/9982)
- [ ] ComfyUI
- [ ] Laptop development
🤗Acknowledgements
- Thanks to PixArt-α, PixArt-Σ and Efficient-ViT for their wonderful work and codebase!
📖BibTeX
@misc{xie2024sana,
title={Sana: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformer},
author={Enze Xie and Junsong Chen and Junyu Chen and Han Cai and Haotian Tang and Yujun Lin and Zhekai Zhang and Muyang Li and Ligeng Zhu and Yao Lu and Song Han},
year={2024},
eprint={2410.10629},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2410.10629},
}
Owner
- Name: Taha Ansari
- Login: tahahah
- Kind: user
- Repositories: 1
- Profile: https://github.com/tahahah
Citation (CITATION.bib)
@misc{xie2024sana,
title={Sana: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformer},
author={Enze Xie and Junsong Chen and Junyu Chen and Han Cai and Haotian Tang and Yujun Lin and Zhekai Zhang and Muyang Li and Ligeng Zhu and Yao Lu and Song Han},
year={2024},
eprint={2410.10629},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2410.10629},
}
GitHub Events
Total
- Issues event: 1
- Watch event: 1
- Issue comment event: 1
- Push event: 159
- Pull request review comment event: 1
- Pull request review event: 2
- Pull request event: 2
- Create event: 11
Last Year
- Issues event: 1
- Watch event: 1
- Issue comment event: 1
- Push event: 159
- Pull request review comment event: 1
- Pull request review event: 2
- Pull request event: 2
- Create event: 11
Issues and Pull Requests
Last synced: 6 months ago
All Time
- Total issues: 0
- Total pull requests: 2
- Average time to close issues: N/A
- Average time to close pull requests: 5 minutes
- Total issue authors: 0
- Total pull request authors: 1
- Average comments per issue: 0
- Average comments per pull request: 0.5
- Merged pull requests: 1
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 2
- Average time to close issues: N/A
- Average time to close pull requests: 5 minutes
- Issue authors: 0
- Pull request authors: 1
- Average comments per issue: 0
- Average comments per pull request: 0.5
- Merged pull requests: 1
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
Pull Request Authors
- tahahah (2)
Top Labels
Issue Labels
Pull Request Labels
Dependencies
- nvcr.io/nvidia/pytorch 24.06-py3 build
- accelerate *
- beautifulsoup4 *
- bs4 *
- came-pytorch *
- clip @git+https://github.com/openai/CLIP.git
- diffusers @git+https://github.com/huggingface/diffusers
- einops *
- ftfy *
- gradio *
- image-reward *
- ipdb *
- matplotlib *
- mmcv ==1.7.2
- omegaconf *
- opencv-python *
- optimum *
- patch_conv *
- peft *
- pre-commit *
- protobuf *
- pyrallis *
- pytorch-fid *
- regex *
- sentencepiece *
- spaces *
- tensorboard *
- tensorboardX *
- termcolor *
- timm *
- torchaudio ==2.4.0
- torchvision ==0.19
- transformers *
- triton ==3.0.0
- wandb *
- webdataset *
- xformers ==0.0.27.post2
- yapf *
- ftfy *
- numpy *
- pillow *
- regex *
- torch >=1.7.1
- torchvision >=0.8.2
- tqdm *
- accelerate *
- addict *
- cloudpickle *
- datasets ==2.21.0
- decord >=0.6.0
- diffusers *
- ftfy >=6.0.3
- librosa ==0.10.1
- modelscope *
- numpy *
- opencv-python *
- oss2 *
- pandas *
- pillow *
- rapidfuzz *
- rouge_score <=0.0.4
- safetensors *
- simplejson *
- sortedcontainers *
- soundfile *
- taming-transformers-rom1504 *
- tiktoken *
- timm *
- tokenizers *
- torchvision *
- tqdm *
- transformers *
- transformers_stream_generator *
- unicodedata2 *
- wandb *
- zhconv *
- numpy *
- _libgcc_mutex 0.1
- _openmp_mutex 5.1
- blas 1.0
- brotlipy 0.7.0
- bzip2 1.0.8
- ca-certificates 2023.11.17
- certifi 2023.11.17
- cffi 1.15.1
- charset-normalizer 2.0.4
- colorama 0.4.6
- cryptography 39.0.1
- cuda-nvcc 11.3.58
- cudatoolkit 11.3.1
- diffusers 0.24.0
- ffmpeg 4.3
- freetype 2.12.1
- giflib 5.2.1
- gmp 6.2.1
- gnutls 3.6.15
- huggingface_hub 0.19.4
- idna 3.4
- intel-openmp 2021.4.0
- jpeg 9e
- lame 3.100
- lcms2 2.12
- ld_impl_linux-64 2.38
- lerc 3.0
- libdeflate 1.17
- libffi 3.4.4
- libgcc-ng 11.2.0
- libgomp 11.2.0
- libiconv 1.16
- libidn2 2.3.4
- libpng 1.6.39
- libstdcxx-ng 11.2.0
- libtasn1 4.19.0
- libtiff 4.5.0
- libunistring 0.9.10
- libwebp 1.2.4
- libwebp-base 1.2.4
- lz4-c 1.9.4
- mkl 2021.4.0
- mkl-service 2.4.0
- mkl_fft 1.3.1
- mkl_random 1.2.2
- ncurses 6.4
- nettle 3.7.3
- numpy 1.23.1
- numpy-base 1.23.1
- openh264 2.1.1
- openssl 1.1.1w
- pillow 9.4.0
- pip 20.3.3
- pycparser 2.21
- pyopenssl 23.0.0
- pysocks 1.7.1
- python 3.9.16
- python_abi 3.9
- pytorch 1.12.1
- pytorch-mutex 1.0
- pyyaml 6.0
- readline 8.2
- requests 2.29.0
- setuptools 66.0.0
- six 1.16.0
- sqlite 3.41.2
- tk 8.6.12
- torchvision 0.13.1
- typing-extensions 4.5.0
- typing_extensions 4.5.0
- urllib3 1.26.15
- wheel 0.38.4
- xz 5.4.2
- yaml 0.2.5
- zlib 1.2.13
- zstd 1.5.5