
SANA: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformer

https://github.com/nvlabs/sana

Science Score: 54.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: arxiv.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (9.5%) to scientific vocabulary

Keywords

diffusion dit pytorch sana text-to-image-generation transformers
Last synced: 6 months ago

Repository

SANA: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformer

Basic Info
Statistics
  • Stars: 4,443
  • Watchers: 75
  • Forks: 292
  • Open Issues: 69
  • Releases: 2
Topics
diffusion dit pytorch sana text-to-image-generation transformers
Created over 1 year ago · Last pushed 6 months ago
Metadata Files
Readme License Citation

README.md


⚡️Sana: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformer

ICLR 2025 Oral Presentation


💡 TLDR: Explore everything you want here!

🚶 Basic:

Demo: SANA-1.5 | SANA-ControlNet | SANA-4bit | SANA-Sprint | SANA-Sprint (HF)
ComfyUI: ComfyUI Guidance
Model Zoo: Model Card Collects All Models
Env Preparation: One-Click Env Install
Inference:
     1) diffusers:SanaPipeline
     2) diffusers:SanaPAGPipeline
     3) Ours:SanaPipeline
     4) Inference with Docker
     5) Inference with TXT or JSON Files
Training and Data:
     1) Image-Text Pairs
     2) Multi-Scale Webdataset
     3) TAR File Multi-Scale Webdataset
     4) FSDP Launch
     5) LoRA Training
     6) SANA-Sprint Diffusers Training

🏃 Applications:

2K & 4K Resolution Generation: SANA Can Generate 2K & 4K Images (Only 8GB)
ControlNet: Train&Inference Guidance | Model Zoo | Demo
Dreambooth / LoRA Training: Train&Inference Guidance
Quantization: Inference with 8bit | Inference with 4bit (8GB) | 4bit Model | 4bit Demo | 4bit Demo2
8bit Optimizer: How to Config
Inference Scaling: SANA Generates, VILA Picks
Metrics: Metric Toolkit: (FID, CLIP-Score, GenEval, DPG-Bench)

🚗 Advanced:

SANA-Sprint: One-Step Diffusion: Arxiv | Train&Inference Guidance | Model Zoo | HF Weights
SANA-1.5: Efficient Model Scaling: Arxiv | Model Zoo | HF Weights

🚀 Future:

Mission: TODO

🔥🔥 News

  • (🔥 New) [2025/8/20] We release a new DC-AE-Lite for faster inference and a smaller memory footprint. [How to config] | [diffusers PR] | Weight
  • (🔥 New) [2025/6/25] SANA-Sprint was accepted to ICCV'25 🏖️
  • (🔥 New) [2025/6/4] SANA-Sprint ComfyUI Node is released [Example] | [PR].
  • (🔥 New) [2025/5/8] SANA-Sprint (One-step diffusion) diffusers training code is released [Guidance].
  • (🔥 New) [2025/5/4] SANA-1.5 (Inference-time scaling) is accepted by ICML-2025. 🎉🎉🎉
  • (🔥 New) [2025/3/22] 🔥SANA-Sprint demo is hosted on Huggingface, try it! 🎉 [Demo Link]
  • (🔥 New) [2025/3/22] 🔥SANA-1.5 is supported in ComfyUI! 🎉: ComfyUI Guidance | ComfyUI Work Flow SANA-1.5 4.8B
  • (🔥 New) [2025/3/22] 🔥SANA-Sprint code & weights are released! 🎉 Include: Training & Inference code and Weights / HF are all released. [Guidance]
  • (🔥 New) [2025/3/21] 🚀Sana + Inference Scaling is released. [Guidance]
  • (🔥 New) [2025/3/16] 🔥SANA-1.5 code & weights are released! 🎉 Include: DDP/FSDP | TAR file WebDataset | Multi-Scale Training code and Weights | HF are all released.
  • (🔥 New) [2025/3/14] 🏃SANA-Sprint is coming out! 🎉 A new one/few-step generator of Sana. 0.1s per 1024px image on H100, 0.3s on RTX 4090. Find out more details: [Page] | [Arxiv]. Code is coming very soon along with diffusers
  • (🔥 New) [2025/2/10] 🚀Sana + ControlNet is released. [Guidance] | [Model] | [Demo]
  • (🔥 New) [2025/1/30] Release CAME-8bit optimizer code. Saving more GPU memory during training. [How to config]
  • (🔥 New) [2025/1/29] 🎉 🎉 🎉SANA 1.5 is out! Figure out how to do efficient training & inference scaling! 🚀[Tech Report]
  • (🔥 New) [2025/1/24] 4bit-Sana is released, powered by SVDQuant and Nunchaku inference engine. Now run your Sana within 8GB GPU VRAM [Guidance] [Demo] [Model]
  • (🔥 New) [2025/1/24] DCAE-1.1 is released with better reconstruction quality. [Model] [diffusers]
  • (🔥 New) [2025/1/23] Sana is accepted as Oral by ICLR-2025. 🎉🎉🎉
Click to show all updates

  • (🔥 New) [2025/1/12] DC-AE tiling makes Sana-4K inference of 4096x4096px images fit within 22GB GPU memory. With model offload and 8bit/4bit quantization, 4K Sana runs within **8GB** GPU VRAM. [[Guidance]](asset/docs/model_zoo.md#-3-2k--4k-models)
  • (🔥 New) [2025/1/11] Sana code-base license changed to Apache 2.0.
  • (🔥 New) [2025/1/10] Inference Sana with 8bit quantization. [[Guidance]](asset/docs/quantize/8bit_sana.md#quantization)
  • (🔥 New) [2025/1/8] 4K resolution [Sana models](asset/docs/model_zoo.md) are supported in [Sana-ComfyUI](https://github.com/Efficient-Large-Model/ComfyUI_ExtraModels) and a [work flow](asset/docs/ComfyUI/Sana_FlowEuler_4K.json) is also prepared. [[4K guidance]](asset/docs/ComfyUI/comfyui.md)
  • (🔥 New) [2025/1/8] 1.6B 4K resolution [Sana models](asset/docs/model_zoo.md) are released: [[BF16 pth]](https://huggingface.co/Efficient-Large-Model/Sana_1600M_4Kpx_BF16) or [[BF16 diffusers]](https://huggingface.co/Efficient-Large-Model/Sana_1600M_4Kpx_BF16_diffusers). 🚀 Get your 4096x4096 resolution images within 20 seconds! Find more samples on the [Sana page](https://nvlabs.github.io/Sana/). Thanks to [SUPIR](https://github.com/Fanghua-Yu/SUPIR) for their wonderful work and support.
  • (🔥 New) [2025/1/2] Bug in the `diffusers` pipeline is solved. [Solved PR](https://github.com/huggingface/diffusers/pull/10431)
  • (🔥 New) [2025/1/2] 2K resolution [Sana models](asset/docs/model_zoo.md) are supported in [Sana-ComfyUI](https://github.com/Efficient-Large-Model/ComfyUI_ExtraModels) and a [work flow](asset/docs/ComfyUI/Sana_FlowEuler_2K.json) is also prepared.
  • ✅ [2024/12] 1.6B 2K resolution [Sana models](asset/docs/model_zoo.md) are released: [[BF16 pth]](https://huggingface.co/Efficient-Large-Model/Sana_1600M_2Kpx_BF16) or [[BF16 diffusers]](https://huggingface.co/Efficient-Large-Model/Sana_1600M_2Kpx_BF16_diffusers). 🚀 Get your 2K resolution images within 4 seconds! Find more samples on the [Sana page](https://nvlabs.github.io/Sana/). Thanks to [SUPIR](https://github.com/Fanghua-Yu/SUPIR) for their wonderful work and support.
  • ✅ [2024/12] `diffusers` supports Sana-LoRA fine-tuning! Sana-LoRA's training and convergence speed is super fast. [[Guidance]](asset/docs/sana_lora_dreambooth.md) or [[diffusers docs]](https://github.com/huggingface/diffusers/blob/main/examples/dreambooth/README_sana.md).
  • ✅ [2024/12] `diffusers` has Sana! [All Sana models in diffusers safetensors](https://huggingface.co/collections/Efficient-Large-Model/sana-673efba2a57ed99843f11f9e) are released, and the diffusers pipelines `SanaPipeline`, `SanaPAGPipeline`, and `DPMSolverMultistepScheduler` (with FlowMatching) are all supported now. We prepared a [Model Card](asset/docs/model_zoo.md) for you to choose from.
  • ✅ [2024/12] 1.6B BF16 [Sana model](https://huggingface.co/Efficient-Large-Model/Sana_1600M_1024px_BF16) is released for stable fine-tuning.
  • ✅ [2024/12] We release the [ComfyUI node](https://github.com/Efficient-Large-Model/ComfyUI_ExtraModels) for Sana. [[Guidance]](asset/docs/ComfyUI/comfyui.md)
  • ✅ [2024/11] All multilingual (Emoji & Chinese & English) SFT models are released: [1.6B-512px](https://huggingface.co/Efficient-Large-Model/Sana_1600M_512px_MultiLing), [1.6B-1024px](https://huggingface.co/Efficient-Large-Model/Sana_1600M_1024px_MultiLing), [600M-512px](https://huggingface.co/Efficient-Large-Model/Sana_600M_512px), [600M-1024px](https://huggingface.co/Efficient-Large-Model/Sana_600M_1024px). The metric performance is shown [here](#performance).
  • ✅ [2024/11] Sana Replicate API is launched at [Sana-API](https://replicate.com/chenxwh/sana).
  • ✅ [2024/11] 1.6B [Sana models](https://huggingface.co/collections/Efficient-Large-Model/sana-673efba2a57ed99843f11f9e) are released.
  • ✅ [2024/11] Training & Inference & Metrics code are released.
  • ✅ [2024/11] Working on [`diffusers`](https://github.com/huggingface/diffusers/pull/9982).
  • [2024/10] [Demo](https://nv-sana.mit.edu/) is released.
  • [2024/10] [DC-AE Code](https://github.com/mit-han-lab/efficientvit/blob/master/applications/dc_ae/README.md) and [weights](https://huggingface.co/collections/mit-han-lab/dc-ae-670085b9400ad7197bb1009b) are released!
  • [2024/10] [Paper](https://arxiv.org/abs/2410.10629) is on Arxiv!

💡 Introduction

We introduce Sana, a text-to-image framework that can efficiently generate images up to 4096 × 4096 resolution. Sana can synthesize high-resolution, high-quality images with strong text-image alignment at remarkably fast speed, and it is deployable on a laptop GPU. Core designs include:

(1) DC-AE: unlike traditional AEs, which compress images only 8×, we trained an AE that can compress images 32×, effectively reducing the number of latent tokens.
(2) Linear DiT: we replace all vanilla attention in DiT with linear attention, which is more efficient at high resolutions without sacrificing quality.
(3) Decoder-only text encoder: we replaced T5 with a modern decoder-only small LLM as the text encoder and designed complex human instructions with in-context learning to enhance image-text alignment.
(4) Efficient training and sampling: we propose Flow-DPM-Solver to reduce sampling steps, with efficient caption labeling and selection to accelerate convergence.
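For intuition on designs (1) and (2), the back-of-the-envelope sketch below (our own illustration, not code from this repository) shows how 32× compression shrinks the latent token count for a 1024 × 1024 image, and why linear attention matters at that token count:

```python
# Illustrative arithmetic only (not repo code): latent token counts and rough
# attention cost for a 1024x1024 image under 8x vs 32x spatial compression.

def latent_tokens(image_px: int, compression: int) -> int:
    side = image_px // compression  # side length of the latent grid
    return side * side              # number of latent tokens

n_ae8 = latent_tokens(1024, 8)    # traditional 8x AE -> 128*128 = 16384 tokens
n_dcae = latent_tokens(1024, 32)  # DC-AE 32x         -> 32*32   = 1024 tokens

# Vanilla self-attention scales as O(N^2) in token count; linear attention as O(N).
print(f"8x AE: {n_ae8} tokens, quadratic attention ~ {n_ae8**2:.2e} pairwise interactions")
print(f"DC-AE: {n_dcae} tokens, linear attention   ~ {n_dcae:.2e} token operations")
```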

As a result, Sana-0.6B is very competitive with modern giant diffusion models (e.g. Flux-12B), being 20 times smaller and 100+ times faster in measured throughput. Moreover, Sana-0.6B can be deployed on a 16GB laptop GPU, taking less than 1 second to generate a 1024 × 1024 resolution image. Sana enables content creation at low cost.


Performance

| Methods (1024x1024) | Throughput (samples/s) | Latency (s) | Params (B) | Speedup | FID 👇 | CLIP 👆 | GenEval 👆 | DPG 👆 |
|---------------------|------------------------|-------------|------------|---------|--------|---------|------------|--------|
| FLUX-dev            | 0.04                   | 23.0        | 12.0       | 1.0×    | 10.15  | 27.47   | 0.67       | 84.0   |
| Sana-0.6B           | 1.7                    | 0.9         | 0.6        | 39.5×   | 5.81   | 28.36   | 0.64       | 83.6   |
| Sana-0.6B           | 1.7                    | 0.9         | 0.6        | 39.5×   | 5.61   | 28.80   | 0.68       | 84.2   |
| Sana-1.6B           | 1.0                    | 1.2         | 1.6        | 23.3×   | 5.92   | 28.94   | 0.69       | 84.5   |
| Sana-1.5 1.6B       | 1.0                    | 1.2         | 1.6        | 23.3×   | 5.70   | 29.12   | 0.82       | 84.5   |
| Sana-1.5 4.8B       | 0.26                   | 4.2         | 4.8        | 6.5×    | 5.99   | 29.23   | 0.81       | 84.7   |

Click to show all performance

| Methods | Throughput (samples/s) | Latency (s) | Params (B) | Speedup | FID 👇 | CLIP 👆 | GenEval 👆 | DPG 👆 |
|------------------------------|------|------|------|-----------|----------|-----------|----------|----------|
| _**512 × 512 resolution**_   |      |      |      |           |          |           |          |          |
| PixArt-α                     | 1.5  | 1.2  | 0.6  | 1.0×      | 6.14     | 27.55     | 0.48     | 71.6     |
| PixArt-Σ                     | 1.5  | 1.2  | 0.6  | 1.0×      | _6.34_   | _27.62_   | 0.52     | _79.5_   |
| **Sana-0.6B**                | 6.7  | 0.8  | 0.6  | 5.0×      | 5.67     | 27.92     | _0.64_   | 84.3     |
| **Sana-1.6B**                | 3.8  | 0.6  | 1.6  | 2.5×      | **5.16** | **28.19** | **0.66** | **85.5** |
| _**1024 × 1024 resolution**_ |      |      |      |           |          |           |          |          |
| LUMINA-Next                  | 0.12 | 9.1  | 2.0  | 2.8×      | 7.58     | 26.84     | 0.46     | 74.6     |
| SDXL                         | 0.15 | 6.5  | 2.6  | 3.5×      | 6.63     | _29.03_   | 0.55     | 74.7     |
| PlayGroundv2.5               | 0.21 | 5.3  | 2.6  | 4.9×      | _6.09_   | **29.13** | 0.56     | 75.5     |
| Hunyuan-DiT                  | 0.05 | 18.2 | 1.5  | 1.2×      | 6.54     | 28.19     | 0.63     | 78.9     |
| PixArt-Σ                     | 0.4  | 2.7  | 0.6  | 9.3×      | 6.15     | 28.26     | 0.54     | 80.5     |
| DALLE3                       | -    | -    | -    | -         | -        | -         | _0.67_   | 83.5     |
| SD3-medium                   | 0.28 | 4.4  | 2.0  | 6.5×      | 11.92    | 27.83     | 0.62     | 84.1     |
| FLUX-dev                     | 0.04 | 23.0 | 12.0 | 1.0×      | 10.15    | 27.47     | _0.67_   | _84.0_   |
| FLUX-schnell                 | 0.5  | 2.1  | 12.0 | 11.6×     | 7.94     | 28.14     | **0.71** | **84.8** |
| **Sana-0.6B**                | 1.7  | 0.9  | 0.6  | **39.5×** | 5.81     | 28.36     | 0.64     | 83.6     |
| **Sana-1.6B**                | 1.0  | 1.2  | 1.6  | **23.3×** | **5.76** | 28.67     | 0.66     | **84.8** |

Contents

🔧 1. Dependencies and Installation

```bash
git clone https://github.com/NVlabs/Sana.git
cd Sana

./environment_setup.sh sana
# or you can install each component step by step following environment_setup.sh
```

💻 2. How to Play with Sana (Inference)

💰Hardware requirement

  • 9GB VRAM is required for the 0.6B model and 12GB VRAM for the 1.6B model. Our quantized versions will require less than 8GB for inference.
  • All tests were done on A100 GPUs; performance may differ on other GPUs (a quick VRAM check is sketched below).
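As a quick sanity check against the figures above, you can query the GPU's total memory with a generic PyTorch snippet (ours, not part of this repo):

```python
# Generic PyTorch VRAM check (not repo code): compare total GPU memory against
# the rough 9GB (0.6B) / 12GB (1.6B) inference requirements listed above.
import torch

if torch.cuda.is_available():
    total_gb = torch.cuda.get_device_properties(0).total_memory / 1024**3
    print(f"GPU VRAM: {total_gb:.1f} GB")
    if total_gb >= 12:
        print("Enough for the 1.6B model.")
    elif total_gb >= 9:
        print("Enough for the 0.6B model.")
    else:
        print("Consider the 8bit/4bit quantized variants.")
else:
    print("No CUDA GPU detected.")
```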

🔛 Choose your model: Model card

🔛 Quick start with Gradio

```bash
# official online demo
DEMO_PORT=15432 \
python app/app_sana.py \
    --share \
    --config=configs/sana_config/1024ms/Sana_1600M_img1024.yaml \
    --model_path=hf://Efficient-Large-Model/Sana_1600M_1024px_BF16/checkpoints/Sana_1600M_1024px_BF16.pth \
    --image_size=1024
```

1. How to use SanaPipeline with 🧨diffusers

[!IMPORTANT] Upgrade to diffusers>=0.32.0.dev to make the SanaPipeline and SanaPAGPipeline available!

```bash
pip install git+https://github.com/huggingface/diffusers
```

Make sure to set pipe.transformer to the default torch_dtype and variant according to the Model Card.

Set pipe.text_encoder to BF16 and pipe.vae to FP32 or BF16. For more info, the docs are here.

```python
# run `pip install git+https://github.com/huggingface/diffusers` before using Sana in diffusers
import torch
from diffusers import SanaPipeline

pipe = SanaPipeline.from_pretrained(
    "Efficient-Large-Model/SANA1.5_1.6B_1024px_diffusers",
    torch_dtype=torch.bfloat16,
)
pipe.to("cuda")

pipe.vae.to(torch.bfloat16)
pipe.text_encoder.to(torch.bfloat16)

prompt = 'a cyberpunk cat with a neon sign that says "Sana"'
image = pipe(
    prompt=prompt,
    height=1024,
    width=1024,
    guidance_scale=4.5,
    num_inference_steps=20,
    generator=torch.Generator(device="cuda").manual_seed(42),
)[0]

image[0].save("sana.png")
```

2. How to use SanaPAGPipeline with 🧨diffusers

Click to show all

```python
# run `pip install git+https://github.com/huggingface/diffusers` before using Sana in diffusers
import torch
from diffusers import SanaPAGPipeline

pipe = SanaPAGPipeline.from_pretrained(
    "Efficient-Large-Model/SANA1.5_1.6B_1024px_diffusers",
    torch_dtype=torch.bfloat16,
    pag_applied_layers="transformer_blocks.8",
)
pipe.to("cuda")

pipe.text_encoder.to(torch.bfloat16)
pipe.vae.to(torch.bfloat16)

prompt = 'a cyberpunk cat with a neon sign that says "Sana"'
image = pipe(
    prompt=prompt,
    guidance_scale=5.0,
    pag_scale=2.0,
    num_inference_steps=20,
    generator=torch.Generator(device="cuda").manual_seed(42),
)[0]
image[0].save('sana.png')
```

3. How to use Sana in this repo

Click to show all

```python
import torch
from app.sana_pipeline import SanaPipeline
from torchvision.utils import save_image

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
generator = torch.Generator(device=device).manual_seed(42)

sana = SanaPipeline("configs/sana1-5_config/1024ms/Sana_1600M_1024px_allqknorm_bf16_lr2e5.yaml")
sana.from_pretrained("hf://Efficient-Large-Model/SANA1.5_1.6B_1024px/checkpoints/SANA1.5_1.6B_1024px.pth")

prompt = 'a cyberpunk cat with a neon sign that says "Sana"'
image = sana(
    prompt=prompt,
    height=1024,
    width=1024,
    guidance_scale=4.5,
    pag_guidance_scale=1.0,
    num_inference_steps=20,
    generator=generator,
)
save_image(image, 'output/sana.png', nrow=1, normalize=True, value_range=(-1, 1))
```

4. Run Sana (Inference) with Docker

Click to show all

```bash
# Pull related models
huggingface-cli download google/gemma-2b-it
huggingface-cli download google/shieldgemma-2b
huggingface-cli download mit-han-lab/dc-ae-f32c32-sana-1.1
huggingface-cli download Efficient-Large-Model/Sana_1600M_1024px

# Run with docker
docker build . -t sana
docker run --gpus all --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 \
    -v ~/.cache:/root/.cache \
    sana
```

5. Run inference with TXT or JSON files

```bash
# Run samples in a txt file
python scripts/inference.py \
    --config=configs/sana_config/1024ms/Sana_1600M_img1024.yaml \
    --model_path=hf://Efficient-Large-Model/Sana_1600M_1024px/checkpoints/Sana_1600M_1024px.pth \
    --txt_file=asset/samples/samples_mini.txt

# Run samples in a json file
python scripts/inference.py \
    --config=configs/sana_config/1024ms/Sana_1600M_img1024.yaml \
    --model_path=hf://Efficient-Large-Model/Sana_1600M_1024px/checkpoints/Sana_1600M_1024px.pth \
    --json_file=asset/samples/samples_mini.json
```

Each line of asset/samples/samples_mini.txt contains a prompt to generate.
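If you want to build your own prompt file, the txt mode just needs one prompt per line. A minimal sketch (the prompts below are placeholders; for the JSON format, mirror the schema of asset/samples/samples_mini.json rather than inventing one):

```python
# Minimal sketch: write a one-prompt-per-line txt file for --txt_file inference.
# The prompts are placeholders; for --json_file, copy the schema used by
# asset/samples/samples_mini.json instead of inventing one.
prompts = [
    'a cyberpunk cat with a neon sign that says "Sana"',
    "a watercolor painting of a lighthouse at dawn",
]
with open("my_prompts.txt", "w") as f:
    f.write("\n".join(prompts) + "\n")
```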

🔥 3. How to Train Sana

💰Hardware requirement

  • 32GB VRAM is required for training both the 0.6B and 1.6B models.

1). Train with image-text pairs in directory

We provide a training example here and you can also select your desired config file from config files dir based on your data structure.

To launch Sana training, you will first need to prepare data in the following formats. Here is an example for the data structure for reference.

```bash
asset/example_data
├── AAA.txt
├── AAA.png
├── BCC.txt
├── BCC.png
├── ......
├── CCC.txt
└── CCC.png
```
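Before launching, it can be worth checking that every image in the directory has a same-named caption file; here is a small sketch of such a check (ours, assuming the layout above):

```python
# Sketch: verify every .png under the data dir has a matching .txt caption,
# following the image-text pair layout shown above.
from pathlib import Path

data_dir = Path("asset/example_data")
missing = [p for p in data_dir.glob("*.png") if not p.with_suffix(".txt").exists()]
print(f"{len(missing)} image(s) without captions")
for p in missing:
    print(" ", p.name)
```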

Then Sana's training can be launched via

```bash
# Example of training Sana 0.6B with 512x512 resolution from scratch
bash train_scripts/train.sh \
    configs/sana_config/512ms/Sana_600M_img512.yaml \
    --data.data_dir="[asset/example_data]" \
    --data.type=SanaImgDataset \
    --model.multi_scale=false \
    --train.train_batch_size=32

# Example of fine-tuning Sana 1.6B with 1024x1024 resolution
bash train_scripts/train.sh \
    configs/sana_config/1024ms/Sana_1600M_img1024.yaml \
    --data.data_dir="[asset/example_data]" \
    --data.type=SanaImgDataset \
    --model.load_from=hf://Efficient-Large-Model/Sana_1600M_1024px/checkpoints/Sana_1600M_1024px.pth \
    --model.multi_scale=false \
    --train.train_batch_size=8
```

2). Train with Multi-Scale WebDataset

We also provide conversion scripts to convert your data to the required format. You can refer to the data conversion scripts for more details.

```bash
python tools/convert_ImgDataset_to_WebDatasetMS_format.py
```

Then Sana's training can be launched via

```bash
# Example of training Sana 0.6B with 512x512 resolution from scratch
bash train_scripts/train.sh \
    configs/sana_config/512ms/Sana_600M_img512.yaml \
    --data.data_dir="[asset/example_data_tar]" \
    --data.type=SanaWebDatasetMS \
    --model.multi_scale=true \
    --train.train_batch_size=32
```

3). Train with TAR file

We prepared a toy TAR dataset containing 100 random images from Journey-DB, duplicated for testing purposes. Note that this dataset is not intended for training.

```bash
huggingface-cli download Efficient-Large-Model/toy_data --repo-type dataset --local-dir ./data/toy_data --local-dir-use-symlinks False
```

Then, you are ready to run with FSDP or DDP:

```bash
# DDP
# Example of training Sana 1.6B with 512x512 resolution from scratch
bash train_scripts/train.sh \
    configs/sana1-5_config/1024ms/Sana_1600M_1024px_allqknorm_bf16_lr2e5.yaml \
    --data.data_dir="[data/toy_data]" \
    --data.type=SanaWebDatasetMS \
    --model.multi_scale=true \
    --data.load_vae_feat=true \
    --train.train_batch_size=2
```

```bash
# FSDP
# Example of training Sana 1.6B with 512x512 resolution from scratch
bash train_scripts/train.sh \
    configs/sana1-5_config/1024ms/Sana_1600M_1024px_AdamW_fsdp.yaml \
    --data.data_dir="[data/toy_data]" \
    --data.type=SanaWebDatasetMS \
    --model.multi_scale=true \
    --data.load_vae_feat=true \
    --train.use_fsdp=true \
    --train.train_batch_size=2
```

💻 4. Metric toolkit

Refer to the Toolkit Manual.
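For orientation on what these metrics measure, a CLIP-similarity check in its simplest form looks like the sketch below. This is a generic transformers-based illustration, not the repo's toolkit; the exact CLIP-Score protocol (model choice, scaling, averaging) lives in the manual.

```python
# Hedged sketch of a CLIP-similarity measurement (not the repo's toolkit):
# cosine similarity between CLIP image and text embeddings for one pair.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-large-patch14")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-large-patch14")

image = Image.open("sana.png")
prompt = 'a cyberpunk cat with a neon sign that says "Sana"'

inputs = processor(text=[prompt], images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    out = model(**inputs)

img_emb = out.image_embeds / out.image_embeds.norm(dim=-1, keepdim=True)
txt_emb = out.text_embeds / out.text_embeds.norm(dim=-1, keepdim=True)
print("CLIP image-text similarity:", (img_emb @ txt_emb.T).item())
```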

🚀 5. Inference Scaling

We trained a specialized NVILA-2B model to score images, which we named VISA (VIla as SAna verifier). By selecting the top 4 images from 2,048 candidates, we improved the GenEval score of SD1.5 from 42 to 87 and that of SANA-1.5 4.8B v2 from 81 to 96. For details, refer to the Inference Scaling Manual.

| Method              | Overall | Single | Two  | Counting | Colors | Position | Color Attribution |
|---------------------|---------|--------|------|----------|--------|----------|-------------------|
| SD1.5               | 0.42    | 0.98   | 0.39 | 0.31     | 0.72   | 0.04     | 0.06              |
| + Inference Scaling | 0.87    | 1.00   | 0.97 | 0.93     | 0.96   | 0.75     | 0.62              |
| SANA-1.5 4.8B v2    | 0.81    | 0.99   | 0.86 | 0.86     | 0.84   | 0.59     | 0.65              |
| + Inference Scaling | 0.96    | 1.00   | 1.00 | 0.97     | 0.94   | 0.96     | 0.87              |
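Conceptually, this is best-of-N selection: generate many candidates, score each with the verifier, and keep the top k. A schematic sketch (our illustration, not the repo's implementation; generate_image and verifier_score are hypothetical stand-ins for the Sana pipeline and the VISA verifier):

```python
# Schematic best-of-N selection (our illustration, not the repo's code).
# `generate_image` and `verifier_score` are hypothetical stand-ins for the
# Sana pipeline and the VISA (NVILA-2B) verifier described above.

def best_of_n(prompt, generate_image, verifier_score, n=2048, k=4):
    # Sample n candidate images with distinct seeds.
    candidates = [generate_image(prompt, seed=i) for i in range(n)]
    # Rank by verifier score and keep the k highest-scoring images.
    ranked = sorted(candidates, key=verifier_score, reverse=True)
    return ranked[:k]
```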

🏃 6. SANA-Sprint

Our SANA-Sprint models focus on timestep distillation, achieving high-quality generation in 1-4 inference steps. Refer to the SANA-Sprint Manual for more details.
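For a flavor of few-step inference, here is a hedged sketch using the diffusers SanaSprintPipeline; the checkpoint id is our assumption from the naming scheme, so confirm the exact name in the SANA-Sprint model zoo before use:

```python
# Hedged sketch: few-step inference with SANA-Sprint via diffusers.
# Assumes a recent diffusers release that ships SanaSprintPipeline; the
# checkpoint id below is a guess from the naming scheme, so confirm it
# in the SANA-Sprint model zoo before use.
import torch
from diffusers import SanaSprintPipeline

pipe = SanaSprintPipeline.from_pretrained(
    "Efficient-Large-Model/Sana_Sprint_1.6B_1024px_diffusers",
    torch_dtype=torch.bfloat16,
).to("cuda")

prompt = 'a cyberpunk cat with a neon sign that says "Sana"'
image = pipe(prompt=prompt, num_inference_steps=2).images[0]  # 1-4 steps
image.save("sana_sprint.png")
```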

Demo Video of SANA-Sprint

💪To-Do List

We will try our best to achieve:

  • [✅] Training code
  • [✅] Inference code
  • [✅] Model zoo
  • [✅] ComfyUI Nodes (SANA, SANA-1.5, SANA-Sprint)
  • [✅] DC-AE Diffusers
  • [✅] Sana merged in Diffusers (https://github.com/huggingface/diffusers/pull/9982)
  • [✅] LoRA training by @paul (diffusers: https://github.com/huggingface/diffusers/pull/10234)
  • [✅] 2K/4K resolution models (thanks to @SUPIR for providing a 4K super-resolution model)
  • [✅] 8bit / 4bit Laptop development
  • [✅] ControlNet (train & inference & models)
  • [✅] FSDP Training
  • [✅] SANA-1.5 (Larger model size / Inference Scaling)
  • [✅] SANA-Sprint: Few-step generator
  • [🚀] Video Generation

🤗Acknowledgements

Thanks to the following open-source codebases for their wonderful work!

Contribution

Thanks goes to these wonderful contributors:

🌟 Star History

Star History Chart

📖BibTeX

```bibtex
@misc{xie2024sana,
  title={Sana: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformer},
  author={Enze Xie and Junsong Chen and Junyu Chen and Han Cai and Haotian Tang and Yujun Lin and Zhekai Zhang and Muyang Li and Ligeng Zhu and Yao Lu and Song Han},
  year={2024},
  eprint={2410.10629},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2410.10629},
}

@misc{xie2025sana,
  title={SANA 1.5: Efficient Scaling of Training-Time and Inference-Time Compute in Linear Diffusion Transformer},
  author={Xie, Enze and Chen, Junsong and Zhao, Yuyang and Yu, Jincheng and Zhu, Ligeng and Lin, Yujun and Zhang, Zhekai and Li, Muyang and Chen, Junyu and Cai, Han and others},
  year={2025},
  eprint={2501.18427},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2501.18427},
}

@misc{chen2025sanasprint,
  title={SANA-Sprint: One-Step Diffusion with Continuous-Time Consistency Distillation},
  author={Junsong Chen and Shuchen Xue and Yuyang Zhao and Jincheng Yu and Sayak Paul and Junyu Chen and Han Cai and Song Han and Enze Xie},
  year={2025},
  eprint={2503.09641},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2503.09641},
}
```

Owner

  • Name: NVIDIA Research Projects
  • Login: NVlabs
  • Kind: organization

Citation (CITATION.cff)

cff-version: 1.2.0
title: 'SANA: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformer'
message: >-
  If you use this software or research, please cite it using the
  metadata from this file.
type: misc
authors:
  - given-names: Enze
    family-names: Xie
  - given-names: Junsong
    family-names: Chen
  - given-names: Junyu
    family-names: Chen
  - given-names: Han
    family-names: Cai
  - given-names: Haotian
    family-names: Tang
  - given-names: Yujun
    family-names: Lin
  - given-names: Zhekai
    family-names: Zhang
  - given-names: Muyang
    family-names: Li
  - given-names: Ligeng
    family-names: Zhu
  - given-names: Yao
    family-names: Lu
  - given-names: Song
    family-names: Han
repository-code: 'https://github.com/NVlabs/Sana'
abstract: >-
  SANA proposes an efficient linear Diffusion Transformer (DiT) for high-resolution
  image synthesis, featuring a depth-growth paradigm, model pruning techniques,
  and inference-time scaling strategies to reduce training costs while maintaining
  generation quality. SANA-Sprint also achieves one-step generation of high-resolution images
keywords:
  - deep-learning
  - diffusion-models
  - transformer
  - image-generation
  - text-to-image
  - efficient-training
  - distillation
license: Apache-2.0
version: 1.5.0
doi: 10.48550/arXiv.2410.10629
date-released: 2024-10-16

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 131
  • Total pull requests: 45
  • Average time to close issues: about 1 month
  • Average time to close pull requests: 3 days
  • Total issue authors: 99
  • Total pull request authors: 17
  • Average comments per issue: 2.28
  • Average comments per pull request: 0.11
  • Merged pull requests: 33
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 131
  • Pull requests: 45
  • Average time to close issues: about 1 month
  • Average time to close pull requests: 3 days
  • Issue authors: 99
  • Pull request authors: 17
  • Average comments per issue: 2.28
  • Average comments per pull request: 0.11
  • Merged pull requests: 33
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • AfterHAL (9)
  • nitinmukesh (8)
  • Pevernow (6)
  • FurkanGozukara (6)
  • lawrence-cj (6)
  • dargma (4)
  • win10ogod (3)
  • haruharu-1105 (3)
  • Willian7004 (3)
  • xin-ran-w (3)
  • VeteranXT (3)
  • shaun-ba (3)
  • jacklishufan (3)
  • chri002 (2)
  • KhoiDOO (2)
Pull Request Authors
  • lawrence-cj (41)
  • yujincheng08 (4)
  • eltociear (2)
  • recoilme (1)
  • srikarym (1)
  • chenjy2003 (1)
  • nitinmukesh (1)
  • frutiemax92 (1)
  • suruoxi (1)
  • sihyeong671 (1)
  • odusseys (1)
  • CharlesCNorton (1)
  • hills-code (1)
  • Muinez (1)
  • lavinal712 (1)
Top Labels
Issue Labels
Answered (59) fixed (12) bug (5) Announcement (4) working (1) question (1) documentation (1) pachage version bug (1)
Pull Request Labels