https://github.com/cvi-szu/degstalk

[ICASSP'25] DEGSTalk: Decomposed Per-Embedding Gaussian Fields for Hair-Preserving Talking Face Synthesis

Repository

Basic Info
  • Host: GitHub
  • Owner: CVI-SZU
  • Language: Python
  • Default Branch: main
  • Size: 15 MB
Statistics
  • Stars: 43
  • Watchers: 2
  • Forks: 2
  • Open Issues: 3
  • Releases: 0
Created over 1 year ago · Last pushed over 1 year ago
Metadata Files
Readme

README.md

DEGSTalk: Decomposed Per-Embedding Gaussian Fields for Hair-Preserving Talking Face Synthesis

This is the official repository for our ICASSP 2025 paper, DEGSTalk: Decomposed Per-Embedding Gaussian Fields for Hair-Preserving Talking Face Synthesis.

Installation

Tested on Ubuntu 20.04, CUDA 11.8, PyTorch 2.0.1

```bash
git clone https://github.com/CVI-SZU/DEGSTalk.git --recursive
cd DEGSTalk

conda create -n degstalk python=3.9.19
conda activate degstalk
pip install torch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2 --index-url https://download.pytorch.org/whl/cu118
pip install -r requirements.txt
pip install "git+https://github.com/facebookresearch/pytorch3d.git"
pip install tensorflow-gpu==2.8.1
```

If you encounter installation problems with diff-gaussian-rasterization or gridencoder, please refer to gaussian-splatting and torch-ngp.

If you encounter problems installing PyTorch3D, you can use the following command to install it:

```bash
python ./scripts/install_pytorch3d.py
```
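After installation, it can help to compare the active environment with the versions listed as tested above. The following is a minimal sketch, not part of the repository: the helper names `base_version` and `env_mismatches` are hypothetical, and the only PyTorch attributes it relies on are `torch.__version__` and `torch.version.cuda`.

```python
# Versions the README reports as tested.
TESTED = {"torch": "2.0.1", "cuda": "11.8"}


def base_version(version):
    """Strip a local build suffix such as '+cu118' from a version string."""
    return version.split("+", 1)[0]


def env_mismatches(torch_version, cuda_version):
    """Return human-readable differences from the tested versions.

    cuda_version may be None, which is what torch.version.cuda reports
    for CPU-only builds; that counts as a mismatch here.
    """
    problems = []
    if base_version(torch_version) != TESTED["torch"]:
        problems.append(f"PyTorch {torch_version} (tested: {TESTED['torch']})")
    if cuda_version != TESTED["cuda"]:
        problems.append(f"CUDA {cuda_version} (tested: {TESTED['cuda']})")
    return problems
```

With PyTorch installed, `env_mismatches(torch.__version__, torch.version.cuda)` returns an empty list when the environment matches the tested setup. A mismatch does not necessarily mean the code will fail, only that the combination is untested.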

Preparation

  • Prepare face-parsing model and the 3DMM model for head pose estimation.

```bash
bash scripts/prepare.sh
```

```bash
# 1. copy 01_MorphableModel.mat to data_utils/face_tracking/3DMM/
# 2. run the following
cd data_utils/face_tracking
python convert_BFM.py
```

```bash
# prepare mmcv
conda activate degstalk
pip install -U openmim
mim install mmcv-full==1.7.1
# pip install mmcv-full==1.7.1 -f https://download.openmmlab.com/mmcv/dist/cu118/torch2.0/index.html

# download model weight
cd data_utils/easyportrait
wget "https://rndml-team-cv.obs.ru-moscow-1.hc.sbercloud.ru/datasets/easyportrait/experiments/models/fpn-fp-512.pth"
```

  • Prepare the smirk model from SMIRK:

```bash
pip install -U gdown
bash scripts/smirk_quick_install.sh
```

The above installation includes downloading the FLAME model. This requires registration. If you do not have an account you can register at https://flame.is.tue.mpg.de/

This command will also download the SMIRK pretrained model which can also be found on Google Drive.

Usage

Important Notice

  • This code is provided for research purposes only. The author makes no warranties, express or implied, as to the accuracy, completeness, or fitness for a particular purpose of the code. Use this code at your own risk.
  • The author explicitly prohibits the use of this code for any malicious or illegal activities. By using this code, you agree to comply with all applicable laws and regulations, and you agree not to use it to harm others or to perform any actions that would be considered unethical or illegal.
  • The author will not be responsible for any damages, losses, or issues that arise from the use of this code.
  • Users are encouraged to use this code responsibly and ethically.

Pre-processing Training Video

  • Put training video under data/<ID>/<ID>.mp4.

The video must be 25 FPS, and every frame must contain the talking person. The resolution should be about 512x512 and the duration about 1-5 minutes.
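The requirements above can be checked before running the processing script. This is a minimal sketch under stated assumptions: `video_spec_problems` is a hypothetical helper, and the `size_tolerance` default is only one possible reading of "about 512x512", since no exact bound is given. The frame rate, frame size, and duration are passed in as plain numbers, which could be read from a tool such as ffprobe or OpenCV.

```python
def video_spec_problems(fps, width, height, duration_s, size_tolerance=0.25):
    """List ways a clip departs from the stated training-video requirements.

    size_tolerance is an assumption: each side may differ from 512 by
    this fraction before it is reported.
    """
    problems = []
    if abs(fps - 25.0) > 1e-3:
        problems.append(f"frame rate is {fps:g} FPS, expected 25")
    for name, side in (("width", width), ("height", height)):
        if abs(side - 512) > 512 * size_tolerance:
            problems.append(f"{name} is {side}px, expected about 512")
    # "about 1-5 min" is interpreted as 60 to 300 seconds.
    if not 60 <= duration_s <= 300:
        problems.append(f"duration is {duration_s:g}s, expected about 60-300")
    return problems
```

An empty list means the clip is within the assumed bounds. The check cannot confirm that every frame contains the talking person; that still needs a visual inspection.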

  • Obtain Action Units

Run FeatureExtraction in OpenFace, rename and move the output CSV file to data/<ID>/au.csv.

export OPENFACE_PATH="Path for FeatureExtraction"
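The rename-and-move step for the OpenFace output can be scripted. The sketch below is an illustration only: `place_au_csv` is a hypothetical helper, and it copies the file rather than moving it so that the original OpenFace output is kept.

```python
import shutil
from pathlib import Path


def place_au_csv(openface_csv, data_root, video_id):
    """Copy an OpenFace FeatureExtraction CSV to <data_root>/<ID>/au.csv."""
    src = Path(openface_csv)
    if not src.is_file():
        raise FileNotFoundError(f"OpenFace output not found: {src}")
    dest = Path(data_root) / video_id / "au.csv"
    dest.parent.mkdir(parents=True, exist_ok=True)
    # Copy instead of move: a deliberate departure from "rename and move".
    shutil.copyfile(src, dest)
    return dest
```

For example, `place_au_csv("processed/Obama.csv", "data", "Obama")` would write `data/Obama/au.csv`, where `processed/Obama.csv` stands in for wherever FeatureExtraction wrote its output.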

  • Run script to process the video.

```bash
python data_utils/process.py data/<ID>/<ID>.mp4

# Example
mkdir -p data/Obama
wget "https://github.com/YudongGuo/AD-NeRF/blob/master/dataset/vids/Obama.mp4?raw=true" -O data/Obama/Obama.mp4
python data_utils/process.py data/Obama/Obama.mp4
```

Audio Pre-process

In our paper, we use DeepSpeech features for evaluation.

  • DeepSpeech

```bash
python data_utils/deepspeech_features/extract_ds_features.py --input data/<name>.wav
# saved to data/<name>.npy
```

  • HuBERT

Similar to ER-NeRF, HuBERT is also available.

Specify --audio_extractor hubert when training and testing.

```bash
python data_utils/hubert.py --wav data/<name>.wav
# saved to data/<name>_hu.npy
```
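Because the two extractors write to different file names, a small helper can compute the expected feature path for a given audio file. This is a sketch based only on the output names stated above; `feature_path` and the key `"deepspeech"` are hypothetical labels chosen for illustration (`hubert` is the value passed to `--audio_extractor`).

```python
from pathlib import Path

# Output suffix per extractor, taken from the comments on the two commands.
FEATURE_SUFFIX = {"deepspeech": ".npy", "hubert": "_hu.npy"}


def feature_path(wav_path, extractor="deepspeech"):
    """Return where the extractor is expected to write features for wav_path."""
    if extractor not in FEATURE_SUFFIX:
        raise ValueError(f"unknown extractor: {extractor!r}")
    wav = Path(wav_path)
    # Replace the .wav extension with the extractor-specific suffix.
    return wav.with_name(wav.stem + FEATURE_SUFFIX[extractor])
```

Checking `feature_path(...).exists()` before training or inference is a cheap way to catch a missing or mismatched feature file.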

Train

```bash
# If resources are sufficient, parts of the training can run in parallel to speed it up. See the script.
bash scripts/train_xx.sh data/<ID> output/<project_name> <GPU_ID>

# Example
bash scripts/train_xx.sh data/Obama output/Obama 0
```

Test

```bash
# saved to output/<project_name>/test/ours_None/renders
python synthesize_fuse.py -S data/<ID> -M output/<project_name> --eval
```

Inference with target audio

```bash
python synthesize_fuse.py -S data/<ID> -M output/<project_name> --use_train --audio <preprocessed_audio_feature>.npy
```
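When testing and inference are launched from a script, the two invocations can be assembled by one function. The sketch below uses only flags that appear in this README (`-S`, `-M`, `--eval`, `--use_train`, `--audio`, `--audio_extractor`); the function name `synthesize_command` and its parameters are hypothetical, and the flags' exact semantics should be confirmed against `synthesize_fuse.py`.

```python
def synthesize_command(video_id, project_name, audio_feature=None, extractor=None):
    """Build the argument list for synthesize_fuse.py from README flags only."""
    cmd = ["python", "synthesize_fuse.py",
           "-S", f"data/{video_id}", "-M", f"output/{project_name}"]
    if audio_feature is None:
        # No target audio: mirror the command from the Test section.
        cmd.append("--eval")
    else:
        # Target audio given: mirror the inference command.
        cmd += ["--use_train", "--audio", str(audio_feature)]
    if extractor is not None:
        # For example "hubert", as described in the audio pre-processing section.
        cmd += ["--audio_extractor", extractor]
    return cmd
```

The returned list can be passed to `subprocess.run(cmd, check=True)` from the repository root. Passing a list avoids shell quoting problems when paths contain spaces.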

Acknowledgement

This code is developed based on TalkingGaussian, gaussian-splatting with simple-knn, and a modified diff-gaussian-rasterization. It also integrates partial code from RAD-NeRF, GeneFace, DFRF, DFA-NeRF, AD-NeRF, and Deep3DFaceRecon_pytorch. Additionally, the teeth mask is sourced from EasyPortrait, and expression coefficients are from Smirk. We extend our gratitude to these outstanding projects for their valuable contributions.

Citation

Consider citing as below if you find this repository helpful to your project:

```
@inproceedings{deng2025degstalk,
  title     = {DEGSTalk: Decomposed Per-Embedding Gaussian Fields for Hair-Preserving Talking Face Synthesis},
  author    = {Kaijun Deng and Dezhi Zheng and Jindong Xie and Jinbao Wang and Weicheng Xie and Linlin Shen and Siyang Song},
  year      = {2025},
  booktitle = {ICASSP 2025}
}
```

Owner

  • Name: Computer Vision Institute, SZU
  • Login: CVI-SZU
  • Kind: organization
  • Location: Shenzhen University, Shenzhen, China

Computer Vision Institute, Shenzhen University

GitHub Events

Total
  • Issues event: 4
  • Watch event: 47
  • Issue comment event: 2
  • Member event: 1
  • Push event: 4
  • Fork event: 3
  • Create event: 2
Last Year
  • Issues event: 4
  • Watch event: 47
  • Issue comment event: 2
  • Member event: 1
  • Push event: 4
  • Fork event: 3
  • Create event: 2

Issues and Pull Requests

Last synced: 10 months ago

All Time
  • Total issues: 3
  • Total pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Total issue authors: 3
  • Total pull request authors: 0
  • Average comments per issue: 0.67
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 3
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 3
  • Pull request authors: 0
  • Average comments per issue: 0.67
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • linhcentrio (1)
  • nitinmukesh (1)
  • MahlerMozart (1)