https://github.com/cyberagentailab/tango

[ICLR 2025 Oral] TANGO: Co-Speech Gesture Video Reenactment with Hierarchical Audio-Motion Embedding and Diffusion Interpolation

Science Score: 36.0%

This score indicates how likely this project is to be science-related based on various indicators:

○ CITATION.cff file
✓ codemeta.json file: Found codemeta.json file
✓ .zenodo.json file: Found .zenodo.json file
○ DOI references
✓ Academic publication links: Links to arxiv.org
○ Academic email domains
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: Low similarity (10.8%) to scientific vocabulary

Last synced: 10 months ago

Repository

[ICLR 2025 Oral] TANGO: Co-Speech Gesture Video Reenactment with Hierarchical Audio-Motion Embedding and Diffusion Interpolation

Basic Info

Host: GitHub
Owner: CyberAgentAILab
License: other
Language: Python
Default Branch: main
Homepage: https://pantomatrix.github.io/TANGO/
Size: 155 MB

Statistics

Stars: 1,061
Watchers: 28
Forks: 136
Open Issues: 38
Releases: 0

Created over 1 year ago · Last pushed about 1 year ago

Metadata Files

Readme License

TANGO: Co-Speech Gesture Video Reenactment with Hierarchical Audio-Motion Embedding and Diffusion Interpolation

News

Contributors are welcome! Feel free to submit pull requests.

[2024/10] Try TANGO on our Hugging Face Space!
[2024/10] Code for creating the gesture graph is available.
[2024/10] Video data is available for download from Google Drive (show-oliver and harward business).

Results Videos

demo0 demo1 demo2

demo3 demo5 demo6

demo7 demo8 demo9

Demo Video (on YouTube)

Release Plans

[x] Training code for AuMoClip
[x] Processed YouTube Business Video data (very small, around 15 mins)
[x] Scripts for creating the gesture graph
[x] Inference code with AuMoClip and pretrained weights

Installation

Clone the repository

    git clone https://github.com/CyberAgentAILab/TANGO.git
    cd TANGO

Build Environment

For inference and for training the CLIP part, we recommend Python 3.10.16 and CUDA 11.8. The Hugging Face Space version also runs on Python 3.10:

```shell
# [Optional] Create a virtual env
conda create -n tangopy310 python==3.10.16
conda activate tangopy310

# Install with pip:
python -m pip install -r ./pre-requirements.txt
python -m pip install -r ./requirements.txt
```
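Since this README recommends a different Python version for each workflow (3.10 for inference and CLIP training, 3.9 for building the motion graph), a quick interpreter check can catch an environment mix-up early. The following is a minimal sketch, not part of the repository; the names `RECOMMENDED` and `python_matches` are hypothetical and introduced only for illustration.

```python
import sys

# Major.minor versions recommended in this README for each workflow.
# The dictionary keys are hypothetical labels, not names used by the repository.
RECOMMENDED = {"inference": (3, 10), "graph": (3, 9)}


def python_matches(workflow, version_info=sys.version_info):
    """Return True if the interpreter's major.minor matches the recommendation."""
    return tuple(version_info[:2]) == RECOMMENDED[workflow]


if __name__ == "__main__":
    current = f"{sys.version_info.major}.{sys.version_info.minor}"
    if python_matches("inference"):
        print(f"Python {current} matches the recommended version for inference.")
    else:
        print(f"Python {current} differs from the recommended 3.10 for inference.")
```

The check compares only the major and minor version, on the assumption that patch-level differences matter less than the 3.9 versus 3.10 split.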

Training and Inference

Inference

Run the inference script below from <your root>/TANGO/. Generating two 8-second videos takes around 3 minutes. You can inspect the results by watching the output videos directly, or by loading the result .npz files in Blender with our Blender add-on from EMAGE.

Necessary checkpoints and pre-computed graphs will be automatically downloaded during the first run. Please ensure that at least 10GB of disk space is available.
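Because the first run downloads checkpoints and graphs automatically, it can help to confirm the free disk space beforehand. Below is a minimal sketch using the standard library's `shutil.disk_usage`; the function name `has_free_space` and the choice to check the current directory are assumptions for illustration, not part of the repository.

```python
import shutil

REQUIRED_GB = 10  # figure stated in this README


def has_free_space(path=".", required_gb=REQUIRED_GB):
    """Return True if the filesystem holding `path` has at least `required_gb` GiB free."""
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= required_gb * 1024**3


if __name__ == "__main__":
    if has_free_space():
        print("At least 10 GB of disk space is available.")
    else:
        print("Less than 10 GB free; the automatic download may fail.")
```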

```shell
# inference
python inference.py --audiopath ./datasets/cachedaudio/examplemalevoice9seconds.wav --charactername ./datasets/cachedaudio/speaker9o7Ik1OB4TaE00-00-38.15_00-00-42.33.mp4

# start gradio app like hugging face space
python app.py
```
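If you want to look at a result .npz file without Blender, NumPy can list what it contains. The sketch below only reports array names, shapes, and dtypes. It makes no assumption about which keys the inference script writes, and the helper name `summarize_npz` is hypothetical. It also assumes the file holds plain numeric arrays; object arrays would need `allow_pickle=True`.

```python
import sys

import numpy as np


def summarize_npz(path):
    """Return {array_name: (shape, dtype_string)} for every array stored in the file."""
    with np.load(path) as data:
        return {
            name: (data[name].shape, str(data[name].dtype))
            for name in data.files
        }


if __name__ == "__main__":
    # Usage: python summarize_npz.py <path to a result .npz file>
    if len(sys.argv) > 1:
        for name, (shape, dtype) in summarize_npz(sys.argv[1]).items():
            print(f"{name}: shape={shape}, dtype={dtype}")
```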

Training JointEmbedding (CLIP)

```shell
# download the training data from https://drive.google.com/file/d/11ZQI8mB7mP8OtlIdcjtxKvg7OxVZ4t7d/view?usp=drive_link
torchrun --nproc_per_node=1 trainhighenv0.py --config ./configs/baselinehighenv0.yaml
```

Create the graph for custom character

For building a motion graph, we recommend Python 3.9.20 and CUDA 11.8 for compatibility with mmcv and mmpose.

```shell
# [Optional] Create a virtual env
conda create -n tangopy39 python==3.9.20
conda activate tangopy39

# Install with pip:
python -m pip install -r ./pre-requirementspy39.txt
python -m pip install -r ./requirementspy39.txt
```

```shell
# set up the py39
python create_graph.py
```

Copyright Information

We thank the open-source projects Wav2Lip, FiLM, and SMPLerX.

Check out our previous work on co-speech 3D motion generation: DisCo, BEAT, and EMAGE.

This project is for research and educational purposes only and is not freely available for commercial use or redistribution. The script is available only under the terms of the Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) license.

Owner

Name: CyberAgent AI Lab
Login: CyberAgentAILab
Kind: organization
Location: Japan

Website: https://cyberagent.ai/ailab/
Twitter: cyberagent_ai
Repositories: 7
Profile: https://github.com/CyberAgentAILab

GitHub Events

Total

Issues event: 61
Watch event: 1,103
Issue comment event: 120
Member event: 1
Push event: 3
Public event: 1
Pull request event: 3
Fork event: 129

Last Year

Issues event: 61
Watch event: 1,103
Issue comment event: 120
Member event: 1
Push event: 3
Public event: 1
Pull request event: 3
Fork event: 129

Issues and Pull Requests

Last synced: 10 months ago

All Time

Total issues: 13
Total pull requests: 3
Average time to close issues: about 1 month
Average time to close pull requests: 1 day
Total issue authors: 12
Total pull request authors: 2
Average comments per issue: 0.15
Average comments per pull request: 0.0
Merged pull requests: 1
Bot issues: 0
Bot pull requests: 0

Past Year

Issues: 13
Pull requests: 3
Average time to close issues: about 1 month
Average time to close pull requests: 1 day
Issue authors: 12
Pull request authors: 2
Average comments per issue: 0.15
Average comments per pull request: 0.0
Merged pull requests: 1
Bot issues: 0
Bot pull requests: 0


Top Authors

Issue Authors

luffy-git (3)
MrEdwards007 (2)
VangengLab (2)
AymaBA (2)
olonely (2)
Patientrookie (1)
JerryDaHeLian (1)
wxydaydayup (1)
TangtangJiujiu (1)
Hyper-lgy (1)
wangaocheng (1)
latiaoge (1)
gg22mm (1)
tsxxdw (1)
csimlinger (1)
