https://github.com/academic-hammer/talkingface-toolkit
Science Score: 10.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ○ codemeta.json file
- ○ .zenodo.json file
- ○ DOI references
- ✓ Academic publication links
  Links to: arxiv.org, researchgate.net
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity
  Low similarity (5.6%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: Academic-Hammer
- Language: Python
- Default Branch: main
- Size: 448 MB
Statistics
- Stars: 9
- Watchers: 2
- Forks: 40
- Open Issues: 25
- Releases: 0
Metadata Files
README.md
talkingface-toolkit
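Framework overview
checkpoints
Mainly holds the additional pretrained models needed to train and evaluate the models; the README in each subfolder gives more detail.
dataset
Holds the datasets and their preprocessed data; see the README in dataset for details.
saved
Holds the model checkpoints saved during training; created automatically when a model is first saved.
talkingface
The main functional modules, including all core code.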
config
Automatically generates all configuration for the model, dataset, training, and evaluation from the model and dataset names.
```
config/
├── configurator.py
```
data
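- dataprocess: model-specific data processing code (for example, audio feature extraction or inference-time data processing implemented in the original model's repository). If the model you implement needs this, create a corresponding file.
- dataset: every model must subclass torch.utils.data.Dataset to load its data. Each model needs a file named model_name+'_dataset.py'. The return value of __getitem__() should be a dictionary. (Core part)
```
data/
├── dataprocess
| ├── wav2lip_process.py
| ├── xxxx_process.py
├── dataset
| ├── wav2lip_dataset.py
| ├── xxx_dataset.py
```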
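The dictionary convention for `__getitem__()` can be sketched as below. This is only an illustration: `ToyDataset`, the sample layout, and the keys `frame_path` and `audio_path` are hypothetical and not taken from the toolkit. The import falls back to `object` so the sketch still runs where PyTorch is not installed.

```python
try:
    from torch.utils.data import Dataset
except ImportError:  # fallback so the sketch runs without PyTorch installed
    Dataset = object


class ToyDataset(Dataset):
    """Hypothetical dataset; a real one would live in data/dataset/<model_name>_dataset.py."""

    def __init__(self, samples):
        # samples: list of (frame_path, audio_path) pairs (placeholder layout)
        self.samples = list(samples)

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, index):
        frame_path, audio_path = self.samples[index]
        # Return a dict so downstream code can look fields up by name.
        return {"frame_path": frame_path, "audio_path": audio_path}


dataset = ToyDataset([("a/0.jpg", "a/audio.wav"), ("b/0.jpg", "b/audio.wav")])
item = dataset[1]
```

Returning a dictionary rather than a tuple lets a shared training loop pick out the fields it needs by key, whatever extra fields a particular model adds.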
evaluate
Code for model evaluation. The LSE metric takes a list of generated videos. The SSIM metric takes a list of generated videos and a list of real videos.
model
Networks of the implemented models and their methods. (Core part)
There are three main categories:
- audio-driven
- image-driven
- nerf-based (methods based on neural radiance fields)
```
model/
├── audiodriventalkingface
| ├── wav2lip.py
├── imagedriventalkingface
| ├── xxxx.py
├── nerfbasedtalkingface
| ├── xxxx.py
├── abstract_talkingface.py
```
properties
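Holds the default configuration files, including:
- dataset configuration files
- model configuration files
- the general configuration file
Add configuration files for your model and dataset as needed; the general configuration file overall.yaml is usually left unchanged.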
```
properties/
├── dataset
| ├── xxx.yaml
├── model
| ├── xxx.yaml
├── overall.yaml
```
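How the three kinds of configuration might be combined can be sketched with plain dictionaries. This is an assumption for illustration only: the precedence order (general, then dataset, then model, then command-line overrides), the function `merge_configs`, and every key name below are hypothetical, and the toolkit's configurator may behave differently.

```python
def merge_configs(*layers):
    """Merge dicts left to right; later layers override earlier ones."""
    merged = {}
    for layer in layers:
        merged.update(layer)
    return merged


# Stand-ins for values that would be loaded from the YAML files.
overall = {"epochs": 10, "batch_size": 16, "device": "cuda"}  # overall.yaml
dataset_cfg = {"data_root": "dataset/xxx", "batch_size": 8}   # dataset/xxx.yaml
model_cfg = {"learning_rate": 1e-4}                           # model/xxx.yaml
cli_overrides = {"epochs": 2}                                 # command line

config = merge_configs(overall, dataset_cfg, model_cfg, cli_overrides)
```

Under this assumed ordering, a value in overall.yaml acts as a default that a dataset file, a model file, or the command line can replace without overall.yaml itself being edited.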
quick_start
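The general entry point. It configures the dataset and model automatically from the arguments passed in, then trains and evaluates (usually no changes needed).
```
quick_start/
├── quick_start.py
```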
trainer
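The main classes for training and evaluation. If the base class Trainer covers everything your model needs, you do not have to write a new one. If training has model-specific parts, override Trainer. The parts to override are most likely _train_epoch() and _valid_epoch(). The overriding class should be named {model_name}Trainer.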
```
trainer/
├── trainer.py
```
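The override pattern described for the trainer can be sketched as follows. The `Trainer` class here is a heavily simplified stand-in, not the toolkit's real base class, and `Wav2LipTrainer`, `toy_model`, and the loss averaging are all hypothetical choices made to keep the example short.

```python
class Trainer:
    """Stand-in for the toolkit's base Trainer (hypothetical, heavily simplified)."""

    def __init__(self, model):
        self.model = model

    def _train_epoch(self, train_data, epoch_idx):
        # Default behaviour: sum the loss the model reports for each batch.
        return sum(self.model(batch) for batch in train_data)

    def fit(self, train_data, epochs):
        return [self._train_epoch(train_data, epoch) for epoch in range(epochs)]


class Wav2LipTrainer(Trainer):
    """Named {model_name}Trainer; overrides only the model-specific step."""

    def _train_epoch(self, train_data, epoch_idx):
        # Example of a model-specific change: average the loss instead of summing it.
        total = super()._train_epoch(train_data, epoch_idx)
        return total / max(len(train_data), 1)


def toy_model(batch):
    return float(batch)  # pretend the batch value is its loss


losses = Wav2LipTrainer(toy_model).fit([1, 2, 3], epochs=2)
```

Only `_train_epoch()` changes in the subclass; the outer loop in `fit()` is inherited, which is the reason for overriding individual steps rather than rewriting the whole trainer.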
utils
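Shared utilities, including s3fd face detection, video frame extraction, and audio extraction from video, plus helpers that look up the matching model class, dataset class, and so on from the configuration.
These usually need no changes, but you may add data processing files that are necessary and fairly general.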
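One way a helper could find the right class from a model name is a lookup with a fallback to the base class. The sketch below is an assumption, not the toolkit's code: `find_class`, `get_trainer_class`, and the throwaway module are hypothetical, and a real package would more likely load the module with `importlib.import_module`.

```python
import types


def find_class(module, class_name, default=None):
    """Return module.<class_name> if it exists, otherwise the default."""
    return getattr(module, class_name, default)


# Build a throwaway module so the sketch is self-contained.
trainer_module = types.ModuleType("toy_trainer")


class Trainer:
    """Stand-in base class."""


class Wav2LipTrainer(Trainer):
    """Stand-in model-specific trainer."""


trainer_module.Trainer = Trainer
trainer_module.Wav2LipTrainer = Wav2LipTrainer


def get_trainer_class(model_name, module=trainer_module):
    # Follow the {model_name}Trainer naming rule; fall back to the base Trainer.
    return find_class(module, f"{model_name}Trainer", default=module.Trainer)
```

With this layout, a model that defines no trainer of its own gets the base `Trainer`, which matches the rule that a new trainer is only written when the base class is not enough.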
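Usage
Requirements
- python=3.8
- torch==1.13.1+cu116 (GPU build; use the CPU build if your device does not support CUDA)
- numpy==1.20.3
- librosa==0.10.1
Keep the versions of the packages above as close to these as possible.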
Two ways are provided to set up the rest of the environment:
```
pip install -r requirements.txt
or
conda env create -f environment.yml
```
A conda virtual environment is recommended.
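After installing, the pinned versions can be checked with the standard library. This is a minimal sketch: `PINNED` and `find_mismatches` are hypothetical names, and dropping a local build tag such as `+cu116` before comparing is a choice made for this example.

```python
from importlib import metadata

# Versions listed in the requirements above.
PINNED = {"torch": "1.13.1", "numpy": "1.20.3", "librosa": "0.10.1"}


def find_mismatches(pinned, get_version=metadata.version):
    """Return {package: (wanted, found)} for packages whose version differs."""
    mismatches = {}
    for name, wanted in pinned.items():
        try:
            found = get_version(name)
        except metadata.PackageNotFoundError:
            found = None  # package is not installed
        # Ignore local build tags such as "+cu116" when comparing.
        if found is None or found.split("+")[0] != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    for name, (wanted, found) in find_mismatches(PINNED).items():
        print(f"{name}: expected {wanted}, found {found}")
```

Run as a script, it prints one line per package that is missing or at a different version, and nothing when the environment matches.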
Training and evaluation
```bash
python run_talkingface.py --model=xxxx --dataset=xxxx (--other_parameters=xxxxxx)
```
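The command-line shape above (two required names plus optional extra parameters) could be parsed along these lines. This is a sketch with argparse, not the actual contents of run_talkingface.py; `parse_args`, the dataset name `lrs2`, and the `--epochs` parameter are hypothetical.

```python
import argparse


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="Train and evaluate a talking-face model.")
    parser.add_argument("--model", required=True, help="model name, e.g. wav2lip")
    parser.add_argument("--dataset", required=True, help="dataset name")
    # Anything else (e.g. --epochs=5) is collected as an extra override.
    args, extras = parser.parse_known_args(argv)
    overrides = {}
    for token in extras:
        if token.startswith("--") and "=" in token:
            key, value = token[2:].split("=", 1)
            overrides[key] = value
    return args.model, args.dataset, overrides


model, dataset, overrides = parse_args(["--model=wav2lip", "--dataset=lrs2", "--epochs=5"])
```

`parse_known_args` leaves unrecognised options in a separate list instead of raising an error, so extra parameters pass through as string overrides that later code can convert and apply.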
Weight files
Candidate papers:
Audio_driven talkingface
| Model | Paper | Code |
|:--------:|:--------:|:--------:|
| MakeItTalk | paper | code |
| MEAD | paper | code |
| RhythmicHead | paper | code |
| PC-AVS | paper | code |
| EVP | paper | code |
| LSP | paper | code |
| EAMM | paper | code |
| DiffTalk | paper | code |
| TalkLip | paper | code |
| EmoGen | paper | code |
| SadTalker | paper | code |
| HyperLips | paper | code |
| PHADTF | paper | code |
| VideoReTalking | paper | code |
Image_driven talkingface
| Model | Paper | Code |
|:--------:|:--------:|:--------:|
| PIRenderer | paper | code |
| StyleHEAT | paper | code |
| MetaPortrait | paper | code |
Nerf-based talkingface
| Model | Paper | Code |
|:--------:|:--------:|:--------:|
| AD-NeRF | paper | code |
| GeneFace | paper | code |
| DFRF | paper | code |
texttospeech
| Model | Paper | Code |
|:--------:|:--------:|:--------:|
| VITS | paper | code |
| Glow TTS | paper | code |
| FastSpeech2 | paper | code |
| StyleTTS2 | paper | code |
| Grad-TTS | paper | code |
| FastSpeech | paper | code |
voice_conversion
| Model | Paper | Code |
|:--------:|:--------:|:--------:|
| StarGAN-VC | paper | code |
| Emo-StarGAN | paper | code |
| adaptive-VC | paper | code |
| DiffVC | paper | code |
| Assem-VC | paper | code |
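Assignment requirements
- Make sure training and validation can be run by entering only the model and dataset names on the command line. (If the original repository provides no training code, training can be skipped.)
- Each group must submit a README file describing the implemented features, screenshots of the final training and validation runs, the dependencies used, and the division of work among members.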
Owner
- Name: DataHammer
- Login: Academic-Hammer
- Kind: organization
- Repositories: 12
- Profile: https://github.com/Academic-Hammer
GitHub Events
Total
- Watch event: 6
- Pull request event: 1
- Fork event: 1
Last Year
- Watch event: 6
- Pull request event: 1
- Fork event: 1
Issues and Pull Requests
Last synced: over 1 year ago
All Time
- Total issues: 2
- Total pull requests: 36
- Average time to close issues: 13 days
- Average time to close pull requests: 3 days
- Total issue authors: 2
- Total pull request authors: 27
- Average comments per issue: 0.5
- Average comments per pull request: 0.08
- Merged pull requests: 2
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 1
- Pull requests: 1
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 1
- Pull request authors: 1
- Average comments per issue: 0.0
- Average comments per pull request: 1.0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- Hujiazeng (1)
- Klayand (1)
Pull Request Authors
- Kline-song (7)
- 18pwp81 (6)
- chuyi369 (4)
- Atlus99 (3)
- Aquariuslyh (3)
- happy-fishingman (2)
- zhouchushu03 (2)
- Abstractjkc (2)
- huanranchen (2)
- LynxPeng (2)
- FFFXX0319 (2)
- ShenShuo137 (2)
- Pissohappy (2)
- yang-kun-long (2)
- vhthree (2)
Top Labels
Issue Labels
Pull Request Labels
Dependencies
- absl-py ==2.0.0
- addict ==2.4.0
- aiosignal ==1.3.1
- appdirs ==1.4.4
- attrs ==23.1.0
- audioread ==3.0.1
- basicsr ==1.3.4.7
- cachetools ==5.3.2
- certifi ==2020.12.5
- cffi ==1.16.0
- charset-normalizer ==3.3.2
- click ==8.1.7
- cloudpickle ==3.0.0
- colorama ==0.4.6
- colorlog ==6.7.0
- contourpy ==1.1.1
- cycler ==0.12.1
- decorator ==5.1.1
- dlib ==19.22.1
- docker-pycreds ==0.4.0
- face-alignment ==1.3.5
- ffmpeg ==1.4
- filelock ==3.13.1
- fonttools ==4.44.0
- frozenlist ==1.4.0
- future ==0.18.3
- gitdb ==4.0.11
- gitpython ==3.1.40
- glob2 ==0.7
- google-auth ==2.23.4
- google-auth-oauthlib ==0.4.6
- grpcio ==1.59.2
- hyperopt ==0.2.5
- idna ==3.4
- imageio ==2.9.0
- imageio-ffmpeg ==0.4.5
- importlib-metadata ==6.8.0
- importlib-resources ==6.1.0
- joblib ==1.3.2
- jsonschema ==4.19.2
- jsonschema-specifications ==2023.7.1
- kiwisolver ==1.4.5
- kornia ==0.5.5
- lazy-loader ==0.3
- librosa ==0.10.1
- llvmlite ==0.37.0
- lmdb ==1.2.1
- lws ==1.2.7
- markdown ==3.5.1
- markupsafe ==2.1.3
- matplotlib ==3.6.3
- msgpack ==1.0.7
- networkx ==3.1
- numba ==0.54.1
- numpy ==1.20.3
- oauthlib ==3.2.2
- opencv-python ==3.4.9.33
- packaging ==23.2
- pandas ==1.3.4
- pathtools ==0.1.2
- pillow ==6.2.1
- pkgutil-resolve-name ==1.3.10
- platformdirs ==3.11.0
- plotly ==5.18.0
- pooch ==1.8.0
- protobuf ==4.25.0
- psutil ==5.9.6
- pyasn1 ==0.5.0
- pyasn1-modules ==0.3.0
- pycparser ==2.21
- pyparsing ==3.1.1
- python-dateutil ==2.8.2
- python-speech-features ==0.6
- pytorch-fid ==0.3.0
- pytz ==2023.3.post1
- pywavelets ==1.4.1
- pyyaml ==5.3.1
- ray ==2.6.3
- referencing ==0.30.2
- requests ==2.31.0
- requests-oauthlib ==1.3.1
- rpds-py ==0.12.0
- rsa ==4.9
- scikit-image ==0.16.2
- scikit-learn ==1.3.2
- scipy ==1.5.0
- sentry-sdk ==1.34.0
- setproctitle ==1.3.3
- six ==1.16.0
- smmap ==5.0.1
- soundfile ==0.12.1
- soxr ==0.3.7
- tabulate ==0.9.0
- tb-nightly ==2.12.0a20230126
- tenacity ==8.2.3
- tensorboard ==2.7.0
- tensorboard-data-server ==0.6.1
- tensorboard-plugin-wit ==1.8.1
- texttable ==1.7.0
- thop ==0.1.1
- threadpoolctl ==3.2.0
- tomli ==2.0.1
- torch ==1.13.1
- torchaudio ==0.13.1
- torchvision ==0.14.1
- tqdm ==4.66.1
- trimesh ==3.9.20
- typing-extensions ==4.8.0
- tzdata ==2023.3
- urllib3 ==2.0.7
- wandb ==0.15.12
- werkzeug ==3.0.1
- yapf ==0.40.2
- zipp ==3.17.0
- GitPython ==3.1.40
- Markdown ==3.5.1
- MarkupSafe ==2.1.3
- Pillow ==6.2.1
- PyWavelets ==1.4.1
- PyYAML ==5.3.1
- Werkzeug ==3.0.1
- absl-py ==2.0.0
- addict ==2.4.0
- aiosignal ==1.3.1
- appdirs ==1.4.4
- attrs ==23.1.0
- audioread ==3.0.1
- basicsr ==1.3.4.7
- cachetools ==5.3.2
- certifi ==2020.12.5
- cffi ==1.16.0
- charset-normalizer ==3.3.2
- click ==8.1.7
- cloudpickle ==3.0.0
- colorama ==0.4.6
- colorlog ==6.7.0
- contourpy ==1.1.1
- cycler ==0.12.1
- decorator ==5.1.1
- dlib ==19.22.1
- docker-pycreds ==0.4.0
- face-alignment ==1.3.5
- ffmpeg ==1.4
- filelock ==3.13.1
- fonttools ==4.44.0
- frozenlist ==1.4.0
- future ==0.18.3
- gitdb ==4.0.11
- glob2 ==0.7
- google-auth ==2.23.4
- google-auth-oauthlib ==0.4.6
- grpcio ==1.59.2
- hyperopt ==0.2.5
- idna ==3.4
- imageio ==2.9.0
- imageio-ffmpeg ==0.4.5
- importlib-metadata ==6.8.0
- importlib-resources ==6.1.0
- joblib ==1.3.2
- jsonschema ==4.19.2
- jsonschema-specifications ==2023.7.1
- kiwisolver ==1.4.5
- kornia ==0.5.5
- lazy_loader ==0.3
- librosa ==0.10.1
- llvmlite ==0.37.0
- lmdb ==1.2.1
- lws ==1.2.7
- matplotlib ==3.6.3
- msgpack ==1.0.7
- networkx ==3.1
- numba ==0.54.1
- numpy ==1.20.3
- oauthlib ==3.2.2
- opencv-python ==3.4.9.33
- packaging ==23.2
- pandas ==1.3.4
- pathtools ==0.1.2
- pkgutil_resolve_name ==1.3.10
- platformdirs ==3.11.0
- plotly ==5.18.0
- pooch ==1.8.0
- protobuf ==4.25.0
- psutil ==5.9.6
- pyasn1 ==0.5.0
- pyasn1-modules ==0.3.0
- pycparser ==2.21
- pyparsing ==3.1.1
- python-dateutil ==2.8.2
- python-speech-features ==0.6
- pytorch-fid ==0.3.0
- pytz ==2023.3.post1
- ray ==2.6.3
- referencing ==0.30.2
- requests ==2.31.0
- requests-oauthlib ==1.3.1
- rpds-py ==0.12.0
- rsa ==4.9
- scikit-image ==0.16.2
- scikit-learn ==1.3.2
- scipy ==1.5.0
- sentry-sdk ==1.34.0
- setproctitle ==1.3.3
- six ==1.16.0
- smmap ==5.0.1
- soundfile ==0.12.1
- soxr ==0.3.7
- tabulate ==0.9.0
- tb-nightly ==2.12.0a20230126
- tenacity ==8.2.3
- tensorboard ==2.7.0
- tensorboard-data-server ==0.6.1
- tensorboard-plugin-wit ==1.8.1
- texttable ==1.7.0
- thop ==0.1.1.post2209072238
- threadpoolctl ==3.2.0
- tomli ==2.0.1
- torch ==1.13.1
- torchaudio ==0.13.1
- torchvision ==0.14.1
- tqdm ==4.66.1
- trimesh ==3.9.20
- typing_extensions ==4.8.0
- tzdata ==2023.3
- urllib3 ==2.0.7
- wandb ==0.15.12
- yapf ==0.40.2
- zipp ==3.17.0