https://github.com/ai-forever/kandinsky-4

Text and image to video generation: Kandinsky 4.0 (2024)

Science Score: 36.0%

This score indicates how likely this project is to be science-related based on various indicators:

○ CITATION.cff file
✓ codemeta.json file: Found codemeta.json file
✓ .zenodo.json file: Found .zenodo.json file
○ DOI references
✓ Academic publication links: Links to arxiv.org
○ Committers with academic emails
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: Low similarity (6.5%) to scientific vocabulary

Keywords

distillation image-to-video kandinsky text-to-video video video-distillation video-generation video-to-audio

Keywords from Contributors

latent-diffusion

Last synced: 11 months ago · JSON representation

Repository

Basic Info

Host: GitHub
Owner: ai-forever
License: apache-2.0
Language: Python
Default Branch: main
Homepage: https://ai-forever.github.io/Kandinsky-4/K40
Size: 402 MB

Statistics

Stars: 145
Watchers: 11
Forks: 11
Open Issues: 2
Releases: 0

Created over 1 year ago · Last pushed over 1 year ago

Metadata Files

Readme License

README.md

Kandinsky 4.0: A family of diffusion models for Video generation

This repository provides a family of diffusion models that generate video from a text prompt or an image (coming soon), a distilled model for faster generation, and a video-to-audio generation model.

Project Updates

🔥 Source: 2024/12/13: We have open-sourced Kandinsky 4.0 T2V Flash, a distilled version of the Kandinsky 4.0 T2V text-to-video generation model.
🔥 Source: 2024/12/13: We have open-sourced Kandinsky 4.0 V2A, a video-to-audio generation model.

Kandinsky 4.0 T2V: A text-to-video model - Coming Soon
Kandinsky 4.0 T2V Flash: A distilled version of Kandinsky 4.0 T2V 480p.
Kandinsky 4.0 I2V: An image-to-video model - Coming Soon
Kandinsky 4.0 V2A: A video-to-audio model.

Kandinsky 4.0 T2V

Coming Soon 🤗

Examples:

Kandinsky 4.0 T2V Flash

Kandinsky 4.0 is a text-to-video generation model that uses latent diffusion to produce videos at both 480p and HD resolutions. We also introduce Kandinsky 4.0 T2V Flash, a distilled version of the model that generates a 12-second 480p video in 11 seconds on a single NVIDIA H100 GPU. The pipeline combines the 3D causal CogVideoX VAE, the T5-V1.1-XXL text embedder, and our custom-trained MMDiT-like transformer. Kandinsky 4.0 T2V Flash was trained with Latent Adversarial Diffusion Distillation (LADD), an approach first described by Stability AI for distilling image generation models.
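To give a feel for what "latent diffusion" means for tensor sizes, here is a minimal sketch that estimates the latent shape the transformer would operate on. It assumes the compression factors of the published CogVideoX VAE (4x temporal, 8x spatial, 16 latent channels) and the usual causal-VAE convention that the first frame is encoded on its own. The function name `latent_shape` and the 49-frame example are hypothetical, and the actual frame count used by this repository is not stated here.

```python
def latent_shape(num_frames, height, width,
                 temporal_factor=4, spatial_factor=8, latent_channels=16):
    """Estimate the (C, T, H, W) latent shape for a causal 3D VAE.

    The default factors are assumptions taken from the published
    CogVideoX VAE configuration, not values read from this repository.
    """
    if num_frames < 1:
        raise ValueError("num_frames must be at least 1")
    if height % spatial_factor or width % spatial_factor:
        raise ValueError("height and width must be divisible by spatial_factor")
    # A causal VAE encodes the first frame separately, then groups the
    # remaining frames in chunks of `temporal_factor`.
    latent_frames = 1 + (num_frames - 1) // temporal_factor
    return (latent_channels, latent_frames,
            height // spatial_factor, width // spatial_factor)


# 672x384 is the resolution used in the inference example; 49 frames is illustrative.
print(latent_shape(num_frames=49, height=384, width=672))  # (16, 13, 48, 84)
```

Under these assumptions, the diffusion transformer sees a tensor that is 64 times smaller per frame spatially than the pixel video, which is a large part of why generation is tractable on a single GPU.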

The following scheme describes the overall generation pipeline:

Inference

```python
import torch
from IPython.display import Video
from kandinsky import get_T2V_pipeline

# Place each pipeline component on a device; here everything shares one GPU.
device_map = {
    "dit": torch.device('cuda:0'),
    "vae": torch.device('cuda:0'),
    "text_embedder": torch.device('cuda:0')
}

pipe = get_T2V_pipeline(device_map)

# Generate a 12-second 672x384 video and save it to disk.
images = pipe(
    seed=42,
    time_length=12,
    width=672,
    height=384,
    save_path="./test.mp4",
    text="Several giant wooly mammoths approach treading through a snowy meadow, their long wooly fur lightly blows in the wind as they walk, snow covered trees and dramatic snow capped mountains in the distance",
)

# Display the result inline in a notebook.
Video("./test.mp4")
```

Please refer to the examples.ipynb notebook for more usage details.

Distributed Inference

For faster inference, we also provide the capability to run inference in a distributed way:

    NUMBER_OF_NODES=1
    NUMBER_OF_DEVICES_PER_NODE=8
    python -m torch.distributed.launch --nnodes $NUMBER_OF_NODES --nproc-per-node $NUMBER_OF_DEVICES_PER_NODE run_inference_distil.py
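If you prefer to start the distributed run from Python, for example from a job script, the same command can be assembled programmatically. This is a minimal sketch: the helper name `build_launch_command` is hypothetical, and it only reproduces the flags shown in the command above.

```python
import shlex


def build_launch_command(num_nodes=1, devices_per_node=8,
                         script="run_inference_distil.py"):
    """Build the argv list for the distributed launch command shown above."""
    if num_nodes < 1 or devices_per_node < 1:
        raise ValueError("num_nodes and devices_per_node must be positive")
    return [
        "python", "-m", "torch.distributed.launch",
        "--nnodes", str(num_nodes),
        "--nproc-per-node", str(devices_per_node),
        script,
    ]


cmd = build_launch_command(num_nodes=1, devices_per_node=8)
# Print the command for inspection; pass `cmd` to subprocess.run to execute it.
print(shlex.join(cmd))
```

Building the command as a list rather than a single string avoids shell quoting issues when it is handed to `subprocess.run`.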

Kandinsky 4.0 I2V (image-to-video)

Coming Soon 🤗

Examples:

Examples T2I + I2V:

Kandinsky 4.0 V2A

The video-to-audio pipeline consists of a visual encoder, a text encoder, a UNet diffusion model that generates a spectrogram, and the Griffin-Lim algorithm, which converts the spectrogram into audio. The visual and text encoders share the same multimodal visual language decoder (cogvlm2-video-llama3-chat).

Our UNet diffusion model is fine-tuned from the music generation model riffusion. We modified the architecture to condition on video frames and to improve synchronization between video and audio. We also replaced the text encoder with the decoder of cogvlm2-video-llama3-chat.
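For readers unfamiliar with Griffin-Lim, its standard iteration can be written as below. This is the textbook form of the algorithm, not necessarily the exact variant or settings used in this repository. The notation is introduced here for illustration: $S$ is the magnitude spectrogram produced by the diffusion model, $x^{(k)}$ is the waveform estimate at iteration $k$, and $x^{(0)}$ is typically obtained by pairing $S$ with random phase.

```latex
X^{(k)} = \mathrm{STFT}\big(x^{(k)}\big), \qquad
x^{(k+1)} = \mathrm{ISTFT}\!\left( S \odot \frac{X^{(k)}}{\lvert X^{(k)} \rvert} \right)
```

Each step keeps the phase of the current estimate, replaces its magnitude with the target $S$, and inverts back to a waveform, so the estimate gradually becomes consistent with the given spectrogram.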

Inference

```python
import torch
import torchvision

from kandinsky4_video2audio.video2audio_pipe import Video2AudioPipeline
from kandinsky4_video2audio.utils import load_video, create_video

device = 'cuda:0'

pipe = Video2AudioPipeline(
    "ai-forever/kandinsky-4-v2a",
    torch_dtype=torch.float16,
    device=device
)

# Read the input video; the third return value holds metadata such as the frame rate.
video_path = 'assets/inputs/1.mp4'
video, _, fps = torchvision.io.read_video(video_path)

prompt = "clean. clear. good quality."
negative_prompt = "hissing noise. drumming rythm. saying. poor quality."
video_input, video_complete, duration_sec = load_video(video, fps['video_fps'], num_frames=96, max_duration_sec=12)

# Generate audio conditioned on the video frames and the prompts.
out = pipe(
    video_input,
    prompt,
    negative_prompt=negative_prompt,
    duration_sec=duration_sec,
)[0]

# Combine the generated audio with the original video and save the result.
save_path = 'assets/outputs/1.mp4'
create_video(
    out,
    video_complete,
    display_video=True,
    save_path=save_path,
    device=device
)
```

Examples:

Authors

Project Leader: Denis Dimitrov.
Scientific Advisors: Andrey Kuznetsov, Sergey Markov.
Training Pipeline & Model Pretrain & Model Distillation: Vladimir Arkhipkin, Lev Novitskiy, Maria Kovaleva.
Model Architecture: Vladimir Arkhipkin, Maria Kovaleva, Zein Shaheen, Arsen Kuzhamuratov, Nikolay Gerasimenko, Mikhail Zhirnov, Alexander Gambashidze, Konstantin Sobolev.
Data Pipeline: Ivan Kirillov, Andrei Shutkin, Kirill Chernishev, Julia Agafonova, Elizaveta Dakhova, Denis Parkhomenko.
Video-to-audio model: Zein Shaheen, Arseniy Shakhmatov, Denis Parkhomenko.
Quality Assessment: Nikolay Gerasimenko, Anna Averchenkova, Victor Panshin, Vladislav Veselov, Pavel Perminov, Vladislav Rodionov, Sergey Skachkov, Stepan Ponomarev.
Other Contributors: Viacheslav Vasilev, Andrei Filatov, Gregory Leleytner.

Owner

Name: AI Forever
Login: ai-forever
Kind: organization
Location: Armenia

Repositories: 60
Profile: https://github.com/ai-forever

Creating ML for the future. AI projects you already know. We are a non-profit organization with members from all over the world.

GitHub Events

Total

Create event: 3
Issues event: 1
Watch event: 140
Delete event: 2
Issue comment event: 1
Member event: 2
Public event: 1
Push event: 54
Pull request event: 3
Fork event: 10

Last Year

Create event: 3
Issues event: 1
Watch event: 140
Delete event: 2
Issue comment event: 1
Member event: 2
Public event: 1
Push event: 54
Pull request event: 3
Fork event: 10

Committers

Last synced: about 1 year ago

All Time

Total Commits: 119
Total Committers: 9
Avg Commits per committer: 13.222
Development Distribution Score (DDS): 0.672

Past Year

Commits: 119
Committers: 9
Avg Commits per committer: 13.222
Development Distribution Score (DDS): 0.672

Top Committers

Name	Email	Commits
Lev Novitskiy	5****f	39
Viacheslav Vasilev	3****v	19
Zein Shaheen	z**e@g**m	15
MarKovka20	6****0	15
Denis	d**v@g**m	15
Arkhipkin Vladimir	3****e	11
Andrei Filatov	4****h	3
nihao88	3****8	1
Konstantin Sobolev	s**t@g**m	1
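
The average and the Development Distribution Score reported above can be reproduced from this table. The sketch below assumes DDS is defined as one minus the top committer's share of all commits. That definition matches the reported value, but the page itself does not state the formula.

```python
# Commit counts copied from the Top Committers table above.
commits = {
    "Lev Novitskiy": 39,
    "Viacheslav Vasilev": 19,
    "Zein Shaheen": 15,
    "MarKovka20": 15,
    "Denis": 15,
    "Arkhipkin Vladimir": 11,
    "Andrei Filatov": 3,
    "nihao88": 1,
    "Konstantin Sobolev": 1,
}

total_commits = sum(commits.values())
avg_per_committer = total_commits / len(commits)

# Assumed definition: DDS = 1 - (commits by the top committer / total commits).
dds = 1 - max(commits.values()) / total_commits

print(total_commits, round(avg_per_committer, 3), round(dds, 3))  # 119 13.222 0.672
```

A DDS near 0 would mean a single person wrote almost every commit; 0.672 indicates that work is spread across several contributors.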

Issues and Pull Requests

Last synced: about 1 year ago

All Time

Total issues: 2
Total pull requests: 4
Average time to close issues: N/A
Average time to close pull requests: 3 minutes
Total issue authors: 2
Total pull request authors: 2
Average comments per issue: 0.5
Average comments per pull request: 0.0
Merged pull requests: 3
Bot issues: 0
Bot pull requests: 0

Past Year

Issues: 2
Pull requests: 4
Average time to close issues: N/A
Average time to close pull requests: 3 minutes
Issue authors: 2
Pull request authors: 2
Average comments per issue: 0.5
Average comments per pull request: 0.0
Merged pull requests: 3
Bot issues: 0
Bot pull requests: 0
