hotshot-xl
✨ Hotshot-XL: State-of-the-art AI text-to-GIF model trained to work alongside Stable Diffusion XL
Science Score: 44.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (10.9%) to scientific vocabulary
Keywords
Repository
Basic Info
- Host: GitHub
- Owner: hotshotco
- License: apache-2.0
- Language: Python
- Default Branch: main
- Homepage: https://hotshot.co
- Size: 96.7 KB
Statistics
- Stars: 1,099
- Watchers: 14
- Forks: 92
- Open Issues: 22
- Releases: 0
Topics
Metadata Files
README.md
Hotshot-XL
Hotshot-XL is an AI text-to-GIF model trained to work alongside Stable Diffusion XL.
Hotshot-XL can generate GIFs with any fine-tuned SDXL model. This means two things:
1. You’ll be able to make GIFs with any existing or newly fine-tuned SDXL model you may want to use.
2. If you'd like to make GIFs of personalized subjects, you can load your own SDXL-based LORAs without having to fine-tune Hotshot-XL. This is awesome because it’s usually much easier to find suitable images for training data than it is to find videos. It also hopefully fits into everyone's existing LORA workflows :) See the Text-to-GIF with personalized LORAs section below.
Hotshot-XL is compatible with SDXL ControlNet to make GIFs in the composition/layout you’d like. See the ControlNet section below.
Hotshot-XL was trained to generate 1 second GIFs at 8 FPS.
Hotshot-XL was trained on various aspect ratios. For best results with the base Hotshot-XL model, we recommend using it with an SDXL model that has been fine-tuned with 512x512 images. You can find an SDXL model we fine-tuned for 512x512 resolutions at https://huggingface.co/hotshotco/SDXL-512.
🌐 Try It
Try Hotshot-XL yourself here: https://www.hotshot.co
Or, if you'd like to run Hotshot-XL yourself locally, continue on to the sections below.
Running Hotshot-XL yourself gives you much more flexibility and control over the model. As a simple example, you can change the sampler. We’ve seen the best results with Euler-A so far, but you may find interesting results with others.
🔧 Setup
Environment Setup
```
pip install virtualenv --upgrade
virtualenv -p $(which python3) venv
source venv/bin/activate
pip install -r requirements.txt
```
Download the Hotshot-XL Weights
```
# Make sure you have git-lfs installed (https://git-lfs.com)
git lfs install
git clone https://huggingface.co/hotshotco/Hotshot-XL
```
or visit https://huggingface.co/hotshotco/Hotshot-XL
Download our fine-tuned SDXL model (or BYOSDXL)
- Note: To maximize data and training efficiency, Hotshot-XL was trained at various aspect ratios around 512x512 resolution. For best results with the base Hotshot-XL model, we recommend using it with an SDXL model that has been fine-tuned with images around the 512x512 resolution. You can download an SDXL model we trained with images at 512x512 resolution below, or bring your own SDXL base model.
```
# Make sure you have git-lfs installed (https://git-lfs.com)
git lfs install
git clone https://huggingface.co/hotshotco/SDXL-512
```
or visit https://huggingface.co/hotshotco/SDXL-512
🔮 Inference
Text-to-GIF
```
python inference.py \
  --prompt="a bulldog in the captains chair of a spaceship, hd, high quality" \
  --output="output.gif"
```
What to Expect:
| Prompt | Sasquatch scuba diving | a camel smoking a cigarette | Ronald McDonald sitting at a vanity mirror putting on lipstick | drake licking his lips and staring through a window at a cupcake |
|-----------|----------|----------|----------|----------|
| Output | (GIF not shown) | (GIF not shown) | (GIF not shown) | (GIF not shown) |
Text-to-GIF with personalized LORAs
```
python inference.py \
  --prompt="a bulldog in the captains chair of a spaceship, hd, high quality" \
  --output="output.gif" \
  --spatial_unet_base="path/to/stabilityai/stable-diffusion-xl-base-1.0/unet" \
  --lora="path/to/lora"
```
What to Expect:
Note: The outputs below use the DDIMScheduler.
| Prompt | sks person screaming at a capri sun | sks person kissing kermit the frog | sks person wearing a tuxedo holding up a glass of champagne, fireworks in background, hd, high quality, 4K |
|-----------|----------|----------|----------|
| Output | (GIF not shown) | (GIF not shown) | (GIF not shown) |
Text-to-GIF with ControlNet
```
python inference.py \
  --prompt="a girl jumping up and down and pumping her fist, hd, high quality" \
  --output="output.gif" \
  --control_type="depth" \
  --gif="https://media1.giphy.com/media/v1.Y2lkPTc5MGI3NjExbXNneXJicG1mOHJ2dzQ2Y2JteDY1ZWlrdjNjMjl3ZWxyeWFxY2EzdyZlcD12MV9pbnRlcm5hbF9naWZfYnlfaWQmY3Q9Zw/YOTAoXBgMCmFeQQzuZ/giphy.gif"
```
By default, Hotshot-XL creates key frames from your source GIF using 8 equally spaced frames and crops the key frames to the default aspect ratio. For finer-grained control, see the Varying Aspect Ratios and Varying frame rates & lengths sections below.
Hotshot-XL currently supports the use of one ControlNet model at a time; supporting Multi-ControlNet would be exciting.
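As a rough illustration of the "8 equally spaced frames" default described above, the sketch below picks evenly spaced frame indices from a source clip. The function name `keyframe_indices` and the rounding rule are assumptions made for this example; the selection logic in `inference.py` may differ in detail.

```python
def keyframe_indices(total_frames: int, num_keyframes: int = 8) -> list[int]:
    """Return `num_keyframes` equally spaced frame indices for a source clip.

    Hypothetical helper for illustration only, not taken from the repository.
    """
    if total_frames < 1 or num_keyframes < 1:
        raise ValueError("total_frames and num_keyframes must be positive")
    # Integer arithmetic keeps every index inside [0, total_frames - 1].
    return [(i * total_frames) // num_keyframes for i in range(num_keyframes)]


print(keyframe_indices(16))  # [0, 2, 4, 6, 8, 10, 12, 14]
print(keyframe_indices(24))  # [0, 3, 6, 9, 12, 15, 18, 21]
```

If the source GIF has fewer frames than requested, this rule repeats some indices rather than failing.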
What to Expect:
| Prompt | pixar style girl putting two thumbs up, happy, high quality, 8k, 3d, animated disney render | keanu reaves holding a sign that says "HELP", hd, high quality | a woman laughing, hd, high quality | barack obama making a rainbow with their hands, the word "MAGIC" in front of them, wearing a blue and white striped hoodie, hd, high quality |
|-----------|----------|----------|----------|----------|
| Output | (GIF not shown) | (GIF not shown) | (GIF not shown) | (GIF not shown) |
| Control | (GIF not shown) | (GIF not shown) | (GIF not shown) | (GIF not shown) |
Varying Aspect Ratios
- Note: The base SDXL model is trained to best create images around 1024x1024 resolution. To maximize data and training efficiency, Hotshot-XL was trained at aspect ratios around 512x512 resolution. Please see Additional Notes for a list of aspect ratios the base Hotshot-XL model was trained with.
Like SDXL, Hotshot-XL was trained at various aspect ratios with aspect ratio bucketing, and includes support for SDXL parameters like target-size and original-size. This means you can create GIFs at several different aspect ratios and resolutions, just with the base Hotshot-XL model.
```
python inference.py \
  --prompt="a bulldog in the captains chair of a spaceship, hd, high quality" \
  --output="output.gif" \
  --width=<WIDTH> \
  --height=<HEIGHT>
```
What to Expect:
| | 512x512 | 672x384 | 384x672 |
|-----------|----------|----------|----------|
| a monkey playing guitar, nature footage, hd, high quality | (GIF not shown) | (GIF not shown) | (GIF not shown) |
Varying frame rates & lengths (Experimental)
Hotshot-XL was trained to generate GIFs that are 1 second long at 8 FPS. If you'd like to experiment with other frame rates and durations, you can try the parameters video_length and video_duration.
video_length sets the number of frames. The default value is 8.
video_duration sets the runtime of the output gif in milliseconds. The default value is 1000.
Please note that you should expect unstable/"jittery" results when modifying these parameters as the model was only trained with 1s videos @ 8fps. You'll be able to improve the stability of results for different time lengths and frame rates by fine-tuning Hotshot-XL. Please let us know if you do!
```
python inference.py \
  --prompt="a bulldog in the captains chair of a spaceship, hd, high quality" \
  --output="output.gif" \
  --video_length=16 \
  --video_duration=2000
```
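To see how video_length and video_duration relate to frame rate, the following sketch computes the per-frame display time and the resulting FPS. It assumes the frames are spread evenly across the duration; `frame_timing` is a name made up for this example.

```python
def frame_timing(video_length: int = 8, video_duration: int = 1000) -> tuple[float, float]:
    """Return (milliseconds per frame, frames per second).

    Hypothetical helper; assumes frames are evenly spaced over the duration.
    """
    if video_length < 1 or video_duration <= 0:
        raise ValueError("video_length and video_duration must be positive")
    ms_per_frame = video_duration / video_length
    fps = 1000.0 * video_length / video_duration
    return ms_per_frame, fps


print(frame_timing())          # (125.0, 8.0)  the trained default
print(frame_timing(16, 2000))  # (125.0, 8.0)  twice as long, same frame rate
print(frame_timing(16, 1000))  # (62.5, 16.0)  same length, double frame rate
```

Under that assumption, the command above (16 frames over 2000 ms) keeps the trained 8 FPS and doubles the clip length.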
Spatial Layers Only
Hotshot-XL is trained to generate GIFs alongside SDXL. If you'd like to generate just an image, you can simply set video_length=1 in your inference call and the Hotshot-XL temporal layers will be ignored, as you'd expect.
```
python inference.py \
  --prompt="a bulldog in the captains chair of a spaceship, hd, high quality" \
  --output="output.jpg" \
  --video_length=1
```
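If you want to drive the inference commands in this section from a script (for example, to loop over several prompts), one possible approach is to assemble the argument list in Python. This is a minimal sketch: `build_inference_command` is a hypothetical helper, and it only formats flags in the `--name=value` form used in the commands above.

```python
import shlex


def build_inference_command(prompt: str, output: str, **options) -> list[str]:
    """Build an argv list for inference.py (hypothetical helper)."""
    command = ["python", "inference.py", f"--prompt={prompt}", f"--output={output}"]
    for name, value in options.items():
        if value is None:
            continue  # skip options that were left unset
        command.append(f"--{name}={value}")
    return command


cmd = build_inference_command(
    "a bulldog in the captains chair of a spaceship, hd, high quality",
    "output.gif",
    width=672,
    height=384,
    video_length=8,
    video_duration=1000,
)
# Print a shell-quoted version of the command for inspection.
print(shlex.join(cmd))
```

To actually run the command from the repository root, pass the list to `subprocess.run(cmd, check=True)`.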
Additional Notes
Supported Aspect Ratios
Hotshot-XL was trained at the following aspect ratios; to reliably generate GIFs outside the range of these aspect ratios, you will want to fine-tune Hotshot-XL with videos at the resolution of your desired aspect ratio.
| Aspect Ratio | Size |
|--------------|------|
| 0.42 | 320 x 768 |
| 0.57 | 384 x 672 |
| 0.68 | 416 x 608 |
| 1.00 | 512 x 512 |
| 1.46 | 608 x 416 |
| 1.75 | 672 x 384 |
| 2.40 | 768 x 320 |
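When your target output has an arbitrary shape, one way to stay within the trained range is to snap to the closest size in the table above. The sketch below does that by comparing width/height ratios; `nearest_supported_size` is a hypothetical helper and is not part of the repository.

```python
# (width, height) pairs copied from the supported aspect ratio table.
SUPPORTED_SIZES = [
    (320, 768), (384, 672), (416, 608), (512, 512),
    (608, 416), (672, 384), (768, 320),
]


def nearest_supported_size(width: int, height: int) -> tuple[int, int]:
    """Return the supported (width, height) whose aspect ratio is closest."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    target = width / height
    return min(SUPPORTED_SIZES, key=lambda wh: abs(wh[0] / wh[1] - target))


print(nearest_supported_size(1920, 1080))  # (672, 384)
print(nearest_supported_size(1080, 1920))  # (384, 672)
print(nearest_supported_size(1024, 1024))  # (512, 512)
```

The returned pair can then be passed as the --width and --height values shown in the Varying Aspect Ratios section.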
💪 Fine-Tuning
The following section relates to fine-tuning the Hotshot-XL temporal model with additional text/video pairs. If you're trying to generate GIFs of personalized concepts/subjects, we'd recommend not fine-tuning Hotshot-XL, but instead training your own SDXL based LORAs and just loading those.
Fine-Tuning Hotshot-XL
Dataset Preparation
The fine_tune.py script expects your samples to be structured like this:
```
fine_tune_dataset
├── sample_001
│  ├── 0.jpg
│  ├── 1.jpg
│  ├── 2.jpg
...
...
│  ├── n.jpg
│  └── prompt.txt
```
Each sample directory should contain your n key frames and a prompt.txt file which contains the prompt.
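Before launching a training run, it can help to check that every sample follows the layout described above. The sketch below is one possible check, not part of the repository: the helper names are made up, and it assumes frames are numbered contiguously from 0 with a `.jpg` extension.

```python
import tempfile
from pathlib import Path


def check_sample(sample_dir: Path) -> list[str]:
    """Return a list of problems found in one sample directory (empty if none)."""
    problems = []
    prompt_file = sample_dir / "prompt.txt"
    if not prompt_file.is_file():
        problems.append("missing prompt.txt")
    elif not prompt_file.read_text(encoding="utf-8").strip():
        problems.append("prompt.txt is empty")
    # Assumption: frames are named 0.jpg, 1.jpg, ... with no gaps.
    frame_numbers = sorted(
        int(p.stem) for p in sample_dir.glob("*.jpg") if p.stem.isdecimal()
    )
    if not frame_numbers:
        problems.append("no numbered .jpg frames")
    elif frame_numbers != list(range(len(frame_numbers))):
        problems.append("frame numbering has gaps or does not start at 0")
    return problems


def check_dataset(data_dir: str) -> dict[str, list[str]]:
    """Map each sample directory name to its list of problems."""
    root = Path(data_dir)
    return {d.name: check_sample(d) for d in sorted(root.iterdir()) if d.is_dir()}


# Demo on a throwaway dataset containing one well-formed sample.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample_001"
    sample.mkdir()
    for i in range(8):
        (sample / f"{i}.jpg").touch()
    (sample / "prompt.txt").write_text("a bulldog in a spaceship", encoding="utf-8")
    print(check_dataset(tmp))  # {'sample_001': []}
```

For a real dataset, call `check_dataset("fine_tune_dataset")` and fix any sample whose problem list is not empty.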
The final checkpoint will be saved to output_dir.
We've found it useful to send validation GIFs to Weights & Biases every so often. If you choose to use validation with Weights & Biases, you can set how often this runs with the validate_every_steps parameter.
```
accelerate launch fine_tune.py \
  --output_dir="<OUTPUT_DIR>" \
  --data_dir="fine_tune_dataset" \
  --report_to="wandb" \
  --run_validation_at_start \
  --resolution=512 \
  --mixed_precision=fp16 \
  --train_batch_size=4 \
  --learning_rate=1.25e-05 \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --max_train_steps=1000 \
  --save_n_steps=20 \
  --validate_every_steps=50 \
  --vae_b16 \
  --gradient_checkpointing \
  --noise_offset=0.05 \
  --snr_gamma \
  --test_prompts="man sits at a table in a cafe, he greets another man with a smile and a handshakes"
```
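To estimate how many checkpoints and validation runs the settings above produce, the sketch below lists the steps at which a periodic action would fire. It assumes an action triggers whenever the step count is a multiple of its interval, which is a guess about how `fine_tune.py` interprets save_n_steps and validate_every_steps; the actual script may count differently.

```python
def trigger_steps(max_train_steps: int, every: int) -> list[int]:
    """Steps in 1..max_train_steps that are multiples of `every`.

    Hypothetical helper; the modulo rule is an assumption, not confirmed
    against fine_tune.py.
    """
    if every < 1:
        raise ValueError("every must be at least 1")
    return list(range(every, max_train_steps + 1, every))


saves = trigger_steps(1000, 20)        # --save_n_steps=20
validations = trigger_steps(1000, 50)  # --validate_every_steps=50
print(len(saves), saves[:3])              # 50 [20, 40, 60]
print(len(validations), validations[:3])  # 20 [50, 100, 150]
```

With --run_validation_at_start set, expect one additional validation run before training begins.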
📝 Further work
There are lots of ways we are excited about improving Hotshot-XL. For example:
- [ ] Fine-Tuning Hotshot-XL at larger frame rates to create longer/higher frame-rate GIFs
- [ ] Fine-Tuning Hotshot-XL at larger resolutions to create higher resolution GIFs
- [ ] Training temporal layers for a latent upscaler to produce higher resolution GIFs
- [ ] Training an image conditioned "frame prediction" model for more coherent, longer GIFs
- [ ] Training temporal layers for a VAE to mitigate flickering/dithering in outputs
- [ ] Supporting Multi-ControlNet for greater control over GIF generation
- [ ] Training & integrating different ControlNet models for further control over GIF generation (finer facial expression control would be very cool)
- [ ] Moving Hotshot-XL into AITemplate for faster inference times
We 💗 contributions from the open-source community! Please let us know in the issues or PRs if you're interested in working on these improvements or anything else!
📚 BibTeX
```
@software{Mullan_Hotshot-XL_2023,
  author = {Mullan, John and Crawbuck, Duncan and Sastry, Aakash},
  license = {Apache-2.0},
  month = oct,
  title = {{Hotshot-XL}},
  url = {https://github.com/hotshotco/hotshot-xl},
  version = {1.0.0},
  year = {2023}
}
```
🙏 Acknowledgements
Text-to-Video models are improving quickly and the development of Hotshot-XL has been greatly inspired by the following amazing works and teams:
We hope that releasing this model/codebase helps the community to continue pushing these creative tools forward in an open and responsible way.
Owner
- Name: Hotshot
- Login: hotshotco
- Kind: organization
- Email: hello@hotshot.co
- Location: United States of America
- Website: https://www.hotshot.co
- Twitter: hotshotsupport
- Repositories: 2
- Profile: https://github.com/hotshotco
The Camera for Your Imagination ✨
Citation (CITATION.cff)
```
# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!
cff-version: 1.2.0
title: Hotshot-XL
message: Personalized GIF Generation with Diffusion Models
type: software
authors:
  - given-names: John
    family-names: Mullan
    email: john@codename.app
    affiliation: 'Natural Synthetics, Inc.'
  - given-names: Duncan
    family-names: Crawbuck
    email: duncan@codename.app
    affiliation: 'Natural Synthetics, Inc.'
  - given-names: Aakash
    family-names: Sastry
    email: aakash@codename.app
    affiliation: 'Natural Synthetics, Inc.'
identifiers:
  - type: url
    value: 'https://hotshot.co'
    description: Hotshot Website
repository-code: 'https://github.com/hotshotco/hotshot-xl'
url: 'https://hotshot.co'
repository-artifact: 'https://huggingface.co/hotshotco/Hotshot-XL'
abstract: >-
  Hotshot-XL is an AI text-to-GIF model trained to work
  alongside Stable Diffusion XL. Hotshot-XL can generate
  GIFs with any fine-tuned SDXL model.
  Hotshot-XL is able to make GIFs with any existing or newly
  fine-tuned SDXL model you may want to use. If you'd like
  to make GIFs of personalized subjects, you can load your
  own SDXL based LORAs, and not have to worry about
  fine-tuning Hotshot-XL. This is awesome because it’s
  usually much easier to find suitable images for training
  data than it is to find videos.
  Hotshot-XL is compatible with SDXL ControlNet to make GIFs
  in the composition/layout you’d like.
  Hotshot-XL was trained to generate 1 second GIFs at 8 FPS.
  Hotshot-XL was trained on various aspect ratios. To
  achieve more efficient training + inference, we fine tuned
  SDXL at/around 512 resolution prior to training
  Hotshot-XL. We also publish our fine tuned SDXL spatial
  model for use among the research community.
keywords:
  - ai
  - text-to-video
  - sdxl
  - text-to-video-generation
  - text-to-gif
  - hotshot-xl
  - hotshot
license: Apache-2.0
commit: 16f99c4e8cbf8cebd038a282173767d609836889
version: 1.0.0
date-released: '2023-10-03'
```
GitHub Events
Total
- Issues event: 2
- Watch event: 62
- Issue comment event: 3
- Pull request event: 11
- Fork event: 11
Last Year
- Issues event: 2
- Watch event: 62
- Issue comment event: 3
- Pull request event: 11
- Fork event: 11
Issues and Pull Requests
Last synced: 6 months ago
All Time
- Total issues: 41
- Total pull requests: 15
- Average time to close issues: 2 days
- Average time to close pull requests: 3 days
- Total issue authors: 37
- Total pull request authors: 6
- Average comments per issue: 1.02
- Average comments per pull request: 0.07
- Merged pull requests: 9
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 3
- Average time to close issues: N/A
- Average time to close pull requests: 5 days
- Issue authors: 0
- Pull request authors: 1
- Average comments per issue: 0
- Average comments per pull request: 0.0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- xiefan233 (3)
- RamboRogers (2)
- howardgriffin (2)
- lindongyue7 (1)
- unbeatablered (1)
- Kevin-1342 (1)
- Number18-tong (1)
- guoqincode (1)
- wangyong860401 (1)
- ruoshiliu (1)
- eli-byers (1)
- My12123 (1)
- billzhao9 (1)
- julkaztwittera (1)
- jinga-lala (1)
Pull Request Authors
- johnmullan (8)
- Kirtanpatel11 (4)
- saisreesatyassss (2)
- painebenjamin (2)
- paul-lupu (1)
- deforum (1)
Top Labels
Issue Labels
Pull Request Labels
Dependencies
- GitPython ==3.1.37
- Jinja2 ==3.1.2
- MarkupSafe ==2.1.3
- Pillow ==10.0.1
- PyYAML ==6.0.1
- accelerate ==0.23.0
- appdirs ==1.4.4
- certifi ==2023.7.22
- charset-normalizer ==3.3.0
- click ==8.1.7
- cmake ==3.27.6
- decorator ==4.4.2
- diffusers ==0.21.4
- docker-pycreds ==0.4.0
- einops ==0.7.0
- filelock ==3.12.4
- fsspec ==2023.9.2
- gitdb ==4.0.10
- huggingface-hub ==0.16.4
- idna ==3.4
- imageio ==2.31.5
- imageio-ffmpeg ==0.4.9
- importlib-metadata ==6.8.0
- lit ==17.0.2
- moviepy ==1.0.3
- mpmath ==1.3.0
- networkx ==3.1
- numpy ==1.26.0
- nvidia-cublas-cu11 ==11.10.3.66
- nvidia-cuda-cupti-cu11 ==11.7.101
- nvidia-cuda-nvrtc-cu11 ==11.7.99
- nvidia-cuda-runtime-cu11 ==11.7.99
- nvidia-cudnn-cu11 ==8.5.0.96
- nvidia-cufft-cu11 ==10.9.0.58
- nvidia-curand-cu11 ==10.2.10.91
- nvidia-cusolver-cu11 ==11.4.0.1
- nvidia-cusparse-cu11 ==11.7.4.91
- nvidia-nccl-cu11 ==2.14.3
- nvidia-nvtx-cu11 ==11.7.91
- packaging ==23.2
- pathtools ==0.1.2
- proglog ==0.1.10
- protobuf ==4.24.3
- psutil ==5.9.5
- regex ==2023.10.3
- requests ==2.31.0
- safetensors ==0.3.3
- sentry-sdk ==1.31.0
- setproctitle ==1.3.3
- six ==1.16.0
- smmap ==5.0.1
- sympy ==1.12
- tokenizers ==0.14.0
- torch ==2.0.1
- torchvision ==0.15.2
- tqdm ==4.66.1
- transformers ==4.34.0
- triton ==2.0.0
- typing_extensions ==4.8.0
- urllib3 ==2.0.6
- wandb ==0.15.11
- zipp ==3.17.0
- diffusers >=0.21.4
- einops *
- torch >=2.0.1
- torchvision >=0.15.2
- transformers >=4.33.3
- pytorch/pytorch 2.0.1-cuda11.7-cudnn8-runtime build
- accelerate ==0.23.0
- diffusers ==0.21.4
- einops ==0.7.0
- imageio ==2.31.5
- moviepy ==1.0.3
- transformers ==4.34.0
- wandb ==0.15.11
- xformers ==0.0.22