https://github.com/academic-hammer/talkingface-toolkit

Science Score: 10.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
  • .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: arxiv.org, researchgate.net
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (5.6%) to scientific vocabulary
Last synced: 9 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: Academic-Hammer
  • Language: Python
  • Default Branch: main
  • Size: 448 MB
Statistics
  • Stars: 9
  • Watchers: 2
  • Forks: 40
  • Open Issues: 25
  • Releases: 0
Created over 2 years ago · Last pushed about 2 years ago
Metadata Files
Readme

README.md

talkingface-toolkit

Framework overview

checkpoints

Stores the additional pretrained models needed to train and evaluate models. The README in each subfolder describes them in more detail.

dataset

Stores the datasets and their preprocessed data. See the README in dataset for details.

saved

Stores the model checkpoints saved during training. The directory is created automatically when a model is first saved.

talkingface

The main functional module, containing all of the core code.

config

Automatically generates all configuration (model, dataset, training, evaluation, etc.) from the model and dataset names.

```
config/
├── configurator.py
```

data

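  • dataprocess: model-specific data processing code (for example, the original repository's own audio feature extraction, or data processing at inference time). If the model you implement needs this, create a corresponding file.
  • dataset: every model must subclass torch.utils.data.Dataset to load its data, in a file named model_name + '_dataset.py'. The return value of __getitem__() must be a dict. (Core part)

```
data/
├── dataprocess
|   ├── wav2lip_process.py
|   ├── xxxx_process.py
├── dataset
|   ├── wav2lip_dataset.py
|   ├── xxx_dataset.py
```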
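A minimal sketch of a dataset file that follows the rules above. It assumes a constructor that takes a config dict and a list of per-sample metadata dicts. The class name `ExampleDataset`, the constructor signature and the dict keys are all hypothetical, not the toolkit's actual interface. The only point being illustrated is that `__getitem__()` returns a dict.

```python
from typing import Dict, List

try:
    from torch.utils.data import Dataset
except ImportError:  # fallback so this sketch also runs where PyTorch is not installed
    class Dataset:  # minimal stand-in for torch.utils.data.Dataset
        pass


class ExampleDataset(Dataset):
    """Hypothetical xxx_dataset.py: one sample per (video, audio) pair."""

    def __init__(self, config: Dict, datasplit: List[Dict]):
        self.config = config      # merged configuration dict (assumed)
        self.samples = datasplit  # list of per-sample metadata dicts (assumed)

    def __len__(self) -> int:
        return len(self.samples)

    def __getitem__(self, idx: int) -> Dict:
        item = self.samples[idx]
        # Return a dict rather than a tuple, as the toolkit requires.
        # A real dataset would load frames / audio features here.
        return {
            "video_path": item["video_path"],
            "audio_path": item["audio_path"],
            "index": idx,
        }


samples = [{"video_path": "a.mp4", "audio_path": "a.wav"},
           {"video_path": "b.mp4", "audio_path": "b.wav"}]
ds = ExampleDataset({"fps": 25}, samples)
print(len(ds), ds[1]["video_path"])  # prints: 2 b.mp4
```

Returning a dict lets a shared trainer pick fields by name instead of relying on each model's tuple ordering.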

evaluate

Code for model evaluation. The LSE metric takes a list of generated videos; the SSIM metric takes a list of generated videos and the list of corresponding real videos.
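To show why SSIM needs paired generated and real videos while LSE does not, here is a simplified sketch. It computes SSIM over each whole grayscale frame (a single global window) and averages it across paired videos. This is an assumption made for brevity. Standard SSIM (for example scikit-image's `structural_similarity`) uses a sliding Gaussian window, and the toolkit's own implementation is not shown here.

```python
from typing import List, Sequence

Frame = Sequence[float]  # one grayscale frame, flattened to pixel values


def ssim_global(x: Frame, y: Frame, data_range: float = 255.0) -> float:
    """SSIM over the whole frame (no sliding window)."""
    if len(x) != len(y) or not x:
        raise ValueError("frames must be non-empty and the same size")
    n = len(x)
    c1 = (0.01 * data_range) ** 2  # stabilising constants from the SSIM definition
    c2 = (0.03 * data_range) ** 2
    mu_x = sum(x) / n
    mu_y = sum(y) / n
    var_x = sum((a - mu_x) ** 2 for a in x) / n
    var_y = sum((b - mu_y) ** 2 for b in y) / n
    cov_xy = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n
    num = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    den = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return num / den


def mean_ssim(generated: List[List[Frame]], real: List[List[Frame]]) -> float:
    """Average SSIM over paired videos; each video is a list of frames."""
    if len(generated) != len(real):
        raise ValueError("need one real video per generated video")
    # zip() pairs frames and stops at the shorter video of each pair.
    scores = [ssim_global(g, r)
              for gen_vid, real_vid in zip(generated, real)
              for g, r in zip(gen_vid, real_vid)]
    if not scores:
        raise ValueError("no frame pairs to compare")
    return sum(scores) / len(scores)


frame = [0.0, 64.0, 128.0, 255.0]
print(mean_ssim([[frame]], [[frame]]))  # identical frames give 1.0
```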

model

The networks and methods of the implemented models. (Core part)

Models fall into three categories:

  • audio-driven
  • image-driven
  • nerf-based (methods based on neural radiance fields)

```
model/
├── audiodriventalkingface
|   ├── wav2lip.py
├── imagedriventalkingface
|   ├── xxxx.py
├── nerfbasedtalkingface
|   ├── xxxx.py
├── abstract_talkingface.py
```
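The tree above pairs one abstract base class with concrete models in three category folders. A minimal sketch of that relationship uses Python's `abc` module. The method names `calculate_loss` and `generate_batch`, and the class `ToyAudioDriven`, are hypothetical placeholders rather than the actual contents of `abstract_talkingface.py`. A real model would also subclass `torch.nn.Module`.

```python
from abc import ABC, abstractmethod
from typing import Dict


class AbstractTalkingFace(ABC):
    """Hypothetical stand-in for abstract_talkingface.py."""

    @abstractmethod
    def calculate_loss(self, batch: Dict) -> Dict:
        """Return a dict of named losses for one training batch."""

    @abstractmethod
    def generate_batch(self, batch: Dict) -> Dict:
        """Return generated outputs for one evaluation batch."""


class ToyAudioDriven(AbstractTalkingFace):
    """Would live under model/audiodriventalkingface/ in the real layout."""

    def calculate_loss(self, batch: Dict) -> Dict:
        # Placeholder: a real model would run its network on the batch here.
        return {"loss": float(len(batch))}

    def generate_batch(self, batch: Dict) -> Dict:
        return {"generated_video": batch.get("video_path")}


model = ToyAudioDriven()
print(model.calculate_loss({"video_path": "a.mp4"}))  # {'loss': 1.0}
```

Because the base class is abstract, forgetting to implement one of its methods fails at construction time rather than midway through training.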

properties

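Stores the default configuration files, including:

  • dataset configuration files
  • model configuration files
  • the general configuration file

Add the configuration files for your model and dataset. The general configuration file overall.yaml usually does not need to be modified.

```
properties/
├── dataset
|   ├── xxx.yaml
├── model
|   ├── xxx.yaml
├── overall.yaml
```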
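One plausible way the configurator could combine these files is shown below: resolve the three YAML paths from the model and dataset names, then merge them with later layers overriding earlier ones. The precedence order (overall < model < dataset < command line) is an assumption, as are the function names and the example dataset name `lrs2`. In practice each layer would come from `yaml.safe_load` (PyYAML is in the dependency list); plain dicts keep this sketch self-contained.

```python
import os
from typing import Dict, List


def config_file_list(model: str, dataset: str, root: str = "properties") -> List[str]:
    """YAML files to load, lowest priority first (assumed order)."""
    return [
        os.path.join(root, "overall.yaml"),
        os.path.join(root, "model", f"{model}.yaml"),
        os.path.join(root, "dataset", f"{dataset}.yaml"),
    ]


def merge_configs(*layers: Dict) -> Dict:
    """Shallow-merge dicts; keys in later layers override earlier ones."""
    merged: Dict = {}
    for layer in layers:
        merged.update(layer)
    return merged


overall = {"epochs": 100, "batch_size": 16}
model_cfg = {"batch_size": 8, "lr": 1e-4}
dataset_cfg = {"fps": 25}
cmd_line = {"epochs": 5}
print(config_file_list("wav2lip", "lrs2"))
print(merge_configs(overall, model_cfg, dataset_cfg, cmd_line))
# {'epochs': 5, 'batch_size': 8, 'lr': 0.0001, 'fps': 25}
```

A shallow merge is enough for flat YAML files. Nested sections would need a recursive merge.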

quick_start

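The generic launcher. It configures the dataset and model from the arguments it receives, then trains and evaluates. (Usually no changes are needed.)

```
quick_start/
├── quick_start.py
```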

trainer

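The main class for training and evaluation. If the base class Trainer can handle everything, you do not need to write a new one. If the model's training has model-specific parts, subclass Trainer; the overrides will mostly be in _train_epoch() and _valid_epoch(). The subclass should be named {model_name}Trainer.

```
trainer/
├── trainer.py
```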
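The override rule above can be sketched as follows. A subclass named `{model_name}Trainer` replaces only `_train_epoch()` and `_valid_epoch()`, and a lookup falls back to the base `Trainer` when no subclass exists. The base class body, the `fit()` method and `get_trainer()` are hypothetical stand-ins written for this sketch; only the two method names and the naming convention come from the text.

```python
from typing import Dict, Type


class Trainer:
    """Hypothetical stand-in for the base class in trainer/trainer.py."""

    def __init__(self, config: Dict, model):
        self.config = config
        self.model = model

    def _train_epoch(self, train_data, epoch_idx: int) -> float:
        return 0.0  # the real base class runs a generic training loop

    def _valid_epoch(self, valid_data) -> Dict:
        return {}

    def fit(self, train_data, valid_data, epochs: int) -> Dict:
        history = {}
        for epoch in range(epochs):
            loss = self._train_epoch(train_data, epoch)
            history[epoch] = {"train_loss": loss, **self._valid_epoch(valid_data)}
        return history


class Wav2LipTrainer(Trainer):
    """Named {model_name}Trainer; overrides only the model-specific parts."""

    def _train_epoch(self, train_data, epoch_idx: int) -> float:
        # e.g. alternate generator / discriminator steps would go here
        return float(len(train_data))

    def _valid_epoch(self, valid_data) -> Dict:
        return {"sync_loss": 0.5}  # placeholder metric


def get_trainer(model_name: str, namespace: Dict) -> Type[Trainer]:
    """Return {model_name}Trainer if it is defined, else the base Trainer."""
    return namespace.get(f"{model_name}Trainer", Trainer)


print(get_trainer("Wav2Lip", globals()).__name__)    # Wav2LipTrainer
print(get_trainer("SadTalker", globals()).__name__)  # Trainer
```

Keeping `fit()` in the base class means every model shares the same outer loop, checkpointing and logging. Subclasses touch only the per-epoch logic.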

utils

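Shared utilities, including s3fd face detection and methods for extracting frames and audio from videos. Also includes methods that locate the model class, dataset class, etc. from the configuration. Usually no changes are needed, but you may add data-processing files that are necessary and reasonably general.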
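Finding a model class from its name can be sketched with `importlib.import_module`, searching the three category packages from the model/ tree. The assumptions here are that the module file is the lowercased model name (as with `wav2lip.py`) and that the class inside it has exactly the given name. The real lookup in utils may follow a different convention.

```python
import importlib
from typing import List

# Category sub-packages taken from the model/ tree.
MODEL_SUBMODULES = [
    "audiodriventalkingface",
    "imagedriventalkingface",
    "nerfbasedtalkingface",
]


def candidate_module_paths(model_name: str) -> List[str]:
    """Possible import paths, e.g. talkingface.model.audiodriventalkingface.wav2lip."""
    file_name = model_name.lower()  # assumed: file name is the lowercased model name
    return [f"talkingface.model.{sub}.{file_name}" for sub in MODEL_SUBMODULES]


def get_model_class(model_name: str):
    """Import the first matching module and return the class named model_name."""
    for path in candidate_module_paths(model_name):
        try:
            module = importlib.import_module(path)
        except ModuleNotFoundError:
            # Note: this also hides a missing dependency inside the model file.
            continue
        if hasattr(module, model_name):
            return getattr(module, model_name)
    raise ValueError(f"model {model_name!r} not found in {MODEL_SUBMODULES}")


print(candidate_module_paths("Wav2Lip")[0])
# talkingface.model.audiodriventalkingface.wav2lip
```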

Usage

Requirements

  • python=3.8
  • torch==1.13.1+cu116 (GPU build; if your device does not support CUDA, use the CPU build)
  • numpy==1.20.3
  • librosa==0.10.1

Keep the versions of the packages above as close to these as possible.
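A quick way to check the pinned versions is `importlib.metadata`, available from Python 3.8. The script below is an illustrative helper, not part of the repository. It strips local build tags such as `+cu116` before comparing, so both GPU and CPU builds of torch 1.13.1 count as a match.

```python
import sys
from importlib.metadata import PackageNotFoundError, version  # Python 3.8+
from typing import Dict

# Pinned versions from the requirements list.
EXPECTED = {
    "torch": "1.13.1",
    "numpy": "1.20.3",
    "librosa": "0.10.1",
}


def check_versions(expected: Dict[str, str]) -> Dict[str, str]:
    """Return {package: problem} for every package that does not match."""
    problems = {}
    for pkg, want in expected.items():
        try:
            have = version(pkg)
        except PackageNotFoundError:
            problems[pkg] = "not installed"
            continue
        # Ignore local build tags such as "+cu116" when comparing.
        if have.split("+")[0] != want:
            problems[pkg] = f"found {have}, expected {want}"
    return problems


if __name__ == "__main__":
    print("python", sys.version.split()[0], "(3.8 expected)")
    for pkg, problem in check_versions(EXPECTED).items():
        print(f"{pkg}: {problem}")
```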

Two ways to set up the remaining dependencies are provided:

```bash
pip install -r requirements.txt

# or

conda env create -f environment.yml
```

Using a conda virtual environment is strongly recommended!

Training and evaluation

```bash
python run_talkingface.py --model=xxxx --dataset=xxxx [--other_parameters=xxxxxx]
```
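One way such a command line could be split into the two required names plus free-form overrides is `argparse.ArgumentParser.parse_known_args`. This is a sketch. `parse_run_args` is a hypothetical helper, `lrs2` is only an example dataset name, and `run_talkingface.py` may parse its arguments differently. Here override values stay strings, and only the `--key=value` form is collected.

```python
import argparse
from typing import Dict, List, Optional


def parse_run_args(argv: Optional[List[str]] = None) -> Dict:
    """Split argv into --model/--dataset plus free-form --key=value overrides."""
    # allow_abbrev=False stops e.g. "--d=..." from being taken as "--dataset".
    parser = argparse.ArgumentParser(allow_abbrev=False)
    parser.add_argument("--model", required=True)
    parser.add_argument("--dataset", required=True)
    known, unknown = parser.parse_known_args(argv)
    overrides = {}
    for arg in unknown:
        if arg.startswith("--") and "=" in arg:
            key, value = arg[2:].split("=", 1)
            overrides[key] = value  # left as a string; casting is up to the config layer
    return {"model": known.model, "dataset": known.dataset, "overrides": overrides}


print(parse_run_args(["--model=wav2lip", "--dataset=lrs2", "--epochs=5"]))
# {'model': 'wav2lip', 'dataset': 'lrs2', 'overrides': {'epochs': '5'}}
```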

Weight files

Candidate papers:

Audio_driven talkingface

| Model | Paper | Code |
|:--------:|:--------:|:--------:|
| MakeItTalk | paper | code |
| MEAD | paper | code |
| RhythmicHead | paper | code |
| PC-AVS | paper | code |
| EVP | paper | code |
| LSP | paper | code |
| EAMM | paper | code |
| DiffTalk | paper | code |
| TalkLip | paper | code |
| EmoGen | paper | code |
| SadTalker | paper | code |
| HyperLips | paper | code |
| PHADTF | paper | code |
| VideoReTalking | paper | code |

Image_driven talkingface

| Model | Paper | Code |
|:--------:|:--------:|:--------:|
| PIRenderer | paper | code |
| StyleHEAT | paper | code |
| MetaPortrait | paper | code |

Nerf-based talkingface

| Model | Paper | Code |
|:--------:|:--------:|:--------:|
| AD-NeRF | paper | code |
| GeneFace | paper | code |
| DFRF | paper | code |

texttospeech

| Model | Paper | Code |
|:--------:|:--------:|:--------:|
| VITS | paper | code |
| Glow TTS | paper | code |
| FastSpeech2 | paper | code |
| StyleTTS2 | paper | code |
| Grad-TTS | paper | code |
| FastSpeech | paper | code |

voice_conversion

| Model | Paper | Code |
|:--------:|:--------:|:--------:|
| StarGAN-VC | paper | code |
| Emo-StarGAN | paper | code |
| adaptive-VC | paper | code |
| DiffVC | paper | code |
| Assem-VC | paper | code |

Assignment requirements

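  • Make sure a model can be trained and validated by passing only the model and dataset names on the command line. (Models whose original repositories do not provide training code need not be trained.)
  • Every group must submit a README describing the features implemented, screenshots of the final training and validation runs, the dependencies used, and how the work was divided among members.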

Owner

  • Name: DataHammer
  • Login: Academic-Hammer
  • Kind: organization

GitHub Events

Total
  • Watch event: 6
  • Pull request event: 1
  • Fork event: 1
Last Year
  • Watch event: 6
  • Pull request event: 1
  • Fork event: 1

Issues and Pull Requests

Last synced: over 1 year ago

All Time
  • Total issues: 2
  • Total pull requests: 36
  • Average time to close issues: 13 days
  • Average time to close pull requests: 3 days
  • Total issue authors: 2
  • Total pull request authors: 27
  • Average comments per issue: 0.5
  • Average comments per pull request: 0.08
  • Merged pull requests: 2
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 1
  • Pull requests: 1
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 1
  • Pull request authors: 1
  • Average comments per issue: 0.0
  • Average comments per pull request: 1.0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • Hujiazeng (1)
  • Klayand (1)
Pull Request Authors
  • Kline-song (7)
  • 18pwp81 (6)
  • chuyi369 (4)
  • Atlus99 (3)
  • Aquariuslyh (3)
  • happy-fishingman (2)
  • zhouchushu03 (2)
  • Abstractjkc (2)
  • huanranchen (2)
  • LynxPeng (2)
  • FFFXX0319 (2)
  • ShenShuo137 (2)
  • Pissohappy (2)
  • yang-kun-long (2)
  • vhthree (2)

Dependencies

environment.yml pypi
  • absl-py ==2.0.0
  • addict ==2.4.0
  • aiosignal ==1.3.1
  • appdirs ==1.4.4
  • attrs ==23.1.0
  • audioread ==3.0.1
  • basicsr ==1.3.4.7
  • cachetools ==5.3.2
  • certifi ==2020.12.5
  • cffi ==1.16.0
  • charset-normalizer ==3.3.2
  • click ==8.1.7
  • cloudpickle ==3.0.0
  • colorama ==0.4.6
  • colorlog ==6.7.0
  • contourpy ==1.1.1
  • cycler ==0.12.1
  • decorator ==5.1.1
  • dlib ==19.22.1
  • docker-pycreds ==0.4.0
  • face-alignment ==1.3.5
  • ffmpeg ==1.4
  • filelock ==3.13.1
  • fonttools ==4.44.0
  • frozenlist ==1.4.0
  • future ==0.18.3
  • gitdb ==4.0.11
  • gitpython ==3.1.40
  • glob2 ==0.7
  • google-auth ==2.23.4
  • google-auth-oauthlib ==0.4.6
  • grpcio ==1.59.2
  • hyperopt ==0.2.5
  • idna ==3.4
  • imageio ==2.9.0
  • imageio-ffmpeg ==0.4.5
  • importlib-metadata ==6.8.0
  • importlib-resources ==6.1.0
  • joblib ==1.3.2
  • jsonschema ==4.19.2
  • jsonschema-specifications ==2023.7.1
  • kiwisolver ==1.4.5
  • kornia ==0.5.5
  • lazy-loader ==0.3
  • librosa ==0.10.1
  • llvmlite ==0.37.0
  • lmdb ==1.2.1
  • lws ==1.2.7
  • markdown ==3.5.1
  • markupsafe ==2.1.3
  • matplotlib ==3.6.3
  • msgpack ==1.0.7
  • networkx ==3.1
  • numba ==0.54.1
  • numpy ==1.20.3
  • oauthlib ==3.2.2
  • opencv-python ==3.4.9.33
  • packaging ==23.2
  • pandas ==1.3.4
  • pathtools ==0.1.2
  • pillow ==6.2.1
  • pkgutil-resolve-name ==1.3.10
  • platformdirs ==3.11.0
  • plotly ==5.18.0
  • pooch ==1.8.0
  • protobuf ==4.25.0
  • psutil ==5.9.6
  • pyasn1 ==0.5.0
  • pyasn1-modules ==0.3.0
  • pycparser ==2.21
  • pyparsing ==3.1.1
  • python-dateutil ==2.8.2
  • python-speech-features ==0.6
  • pytorch-fid ==0.3.0
  • pytz ==2023.3.post1
  • pywavelets ==1.4.1
  • pyyaml ==5.3.1
  • ray ==2.6.3
  • referencing ==0.30.2
  • requests ==2.31.0
  • requests-oauthlib ==1.3.1
  • rpds-py ==0.12.0
  • rsa ==4.9
  • scikit-image ==0.16.2
  • scikit-learn ==1.3.2
  • scipy ==1.5.0
  • sentry-sdk ==1.34.0
  • setproctitle ==1.3.3
  • six ==1.16.0
  • smmap ==5.0.1
  • soundfile ==0.12.1
  • soxr ==0.3.7
  • tabulate ==0.9.0
  • tb-nightly ==2.12.0a20230126
  • tenacity ==8.2.3
  • tensorboard ==2.7.0
  • tensorboard-data-server ==0.6.1
  • tensorboard-plugin-wit ==1.8.1
  • texttable ==1.7.0
  • thop ==0.1.1
  • threadpoolctl ==3.2.0
  • tomli ==2.0.1
  • torch ==1.13.1
  • torchaudio ==0.13.1
  • torchvision ==0.14.1
  • tqdm ==4.66.1
  • trimesh ==3.9.20
  • typing-extensions ==4.8.0
  • tzdata ==2023.3
  • urllib3 ==2.0.7
  • wandb ==0.15.12
  • werkzeug ==3.0.1
  • yapf ==0.40.2
  • zipp ==3.17.0
requirements.txt pypi
  • GitPython ==3.1.40
  • Markdown ==3.5.1
  • MarkupSafe ==2.1.3
  • Pillow ==6.2.1
  • PyWavelets ==1.4.1
  • PyYAML ==5.3.1
  • Werkzeug ==3.0.1
  • absl-py ==2.0.0
  • addict ==2.4.0
  • aiosignal ==1.3.1
  • appdirs ==1.4.4
  • attrs ==23.1.0
  • audioread ==3.0.1
  • basicsr ==1.3.4.7
  • cachetools ==5.3.2
  • certifi ==2020.12.5
  • cffi ==1.16.0
  • charset-normalizer ==3.3.2
  • click ==8.1.7
  • cloudpickle ==3.0.0
  • colorama ==0.4.6
  • colorlog ==6.7.0
  • contourpy ==1.1.1
  • cycler ==0.12.1
  • decorator ==5.1.1
  • dlib ==19.22.1
  • docker-pycreds ==0.4.0
  • face-alignment ==1.3.5
  • ffmpeg ==1.4
  • filelock ==3.13.1
  • fonttools ==4.44.0
  • frozenlist ==1.4.0
  • future ==0.18.3
  • gitdb ==4.0.11
  • glob2 ==0.7
  • google-auth ==2.23.4
  • google-auth-oauthlib ==0.4.6
  • grpcio ==1.59.2
  • hyperopt ==0.2.5
  • idna ==3.4
  • imageio ==2.9.0
  • imageio-ffmpeg ==0.4.5
  • importlib-metadata ==6.8.0
  • importlib-resources ==6.1.0
  • joblib ==1.3.2
  • jsonschema ==4.19.2
  • jsonschema-specifications ==2023.7.1
  • kiwisolver ==1.4.5
  • kornia ==0.5.5
  • lazy_loader ==0.3
  • librosa ==0.10.1
  • llvmlite ==0.37.0
  • lmdb ==1.2.1
  • lws ==1.2.7
  • matplotlib ==3.6.3
  • msgpack ==1.0.7
  • networkx ==3.1
  • numba ==0.54.1
  • numpy ==1.20.3
  • oauthlib ==3.2.2
  • opencv-python ==3.4.9.33
  • packaging ==23.2
  • pandas ==1.3.4
  • pathtools ==0.1.2
  • pkgutil_resolve_name ==1.3.10
  • platformdirs ==3.11.0
  • plotly ==5.18.0
  • pooch ==1.8.0
  • protobuf ==4.25.0
  • psutil ==5.9.6
  • pyasn1 ==0.5.0
  • pyasn1-modules ==0.3.0
  • pycparser ==2.21
  • pyparsing ==3.1.1
  • python-dateutil ==2.8.2
  • python-speech-features ==0.6
  • pytorch-fid ==0.3.0
  • pytz ==2023.3.post1
  • ray ==2.6.3
  • referencing ==0.30.2
  • requests ==2.31.0
  • requests-oauthlib ==1.3.1
  • rpds-py ==0.12.0
  • rsa ==4.9
  • scikit-image ==0.16.2
  • scikit-learn ==1.3.2
  • scipy ==1.5.0
  • sentry-sdk ==1.34.0
  • setproctitle ==1.3.3
  • six ==1.16.0
  • smmap ==5.0.1
  • soundfile ==0.12.1
  • soxr ==0.3.7
  • tabulate ==0.9.0
  • tb-nightly ==2.12.0a20230126
  • tenacity ==8.2.3
  • tensorboard ==2.7.0
  • tensorboard-data-server ==0.6.1
  • tensorboard-plugin-wit ==1.8.1
  • texttable ==1.7.0
  • thop ==0.1.1.post2209072238
  • threadpoolctl ==3.2.0
  • tomli ==2.0.1
  • torch ==1.13.1
  • torchaudio ==0.13.1
  • torchvision ==0.14.1
  • tqdm ==4.66.1
  • trimesh ==3.9.20
  • typing_extensions ==4.8.0
  • tzdata ==2023.3
  • urllib3 ==2.0.7
  • wandb ==0.15.12
  • yapf ==0.40.2
  • zipp ==3.17.0