lip2speech

A pipeline to read lips and generate speech for the read content, i.e., Lip to Speech Synthesis.

https://github.com/chris10m/lip2speech

Science Score: 54.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: arxiv.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (9.8%) to scientific vocabulary

Keywords

deep-learning lip-reading lipreading liptospeech pytorch real-time speaker-embedding speech-synthesis
Last synced: 6 months ago

Repository

A pipeline to read lips and generate speech for the read content, i.e., Lip to Speech Synthesis.

Basic Info
  • Host: GitHub
  • Owner: Chris10M
  • License: mit
  • Language: Python
  • Default Branch: main
  • Homepage:
  • Size: 12.2 MB
Statistics
  • Stars: 86
  • Watchers: 3
  • Forks: 21
  • Open Issues: 2
  • Releases: 0
Topics
deep-learning lip-reading lipreading liptospeech pytorch real-time speaker-embedding speech-synthesis
Created over 4 years ago · Last pushed 7 months ago
Metadata Files
Readme License Citation

README.md

Lip2Speech [PDF]

A pipeline for lip-reading a silent speaking face in a video and generating speech for the lip-read content, i.e., Lip to Speech Synthesis.

Figure: pipeline overview — Video Input | Processed Input | Speech Output (example media not shown).

Architecture Overview

Figure: architecture of the method (image not shown).

LRW

Figure: Alignment Plot | Mel-spectrogram Output (example media not shown).

Usage

Demo

The pretrained model is available here [265.12 MB]

Download the pretrained model and place it inside the savedmodels directory. To visualize the results, run demo.py:

python3 demo.py

Default arguments

  • dataset: LRW (10 Samples)
  • root: Datasets/SAMPLE_LRW
  • model_path: savedmodels/lip2speech_final.pth
  • encoding: voice

Evaluate

Evaluates the ESTOI score for the given Lip2Speech model (higher is better).

python3 evaluate.py --dataset LRW --root Datasets/LRW --model_path savedmodels/lip2speech_final.pth

Train

To train the model, run train.py:

python3 train.py --dataset LRW --root Datasets/LRW --finetune_model_path savedmodels/lip2speech_final.pth

  • finetune_model_path - base model to fine-tune on the dataset (optional)

Acknowledgement

tacotron2

Citation

If you use this research in your work, please cite it using the following metadata.

```
@misc{millerdurai2022faceilltellspeak,
  title={Show Me Your Face, And I'll Tell You How You Speak},
  author={Christen Millerdurai and Lotfy Abdel Khaliq and Timon Ulrich},
  year={2022},
  eprint={2206.14009},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2206.14009},
}

@software{MillerduraiLip2Speech2021,
  author = {Millerdurai, Christen and Abdel Khaliq, Lotfy and Ulrich, Timon},
  month = {8},
  title = {{Lip2Speech}},
  url = {https://github.com/Chris10M/Lip2Speech},
  version = {1.0.0},
  year = {2021}
}
```

Owner

  • Name: Christen Millerdurai
  • Login: Chris10M
  • Kind: user

PhD & Researcher @ AV DFKI-Kaiserslautern.

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
- family-names: "Millerdurai"
  given-names: "Christen"
- family-names: "Abdel Khaliq"
  given-names: "Lotfy"
  orcid: "https://orcid.org/0000-0000-0000-0000"
- family-names: "Ulrich"
  given-names: "Timon"
  orcid: "https://orcid.org/0000-0000-0000-0000"
  
title: "Lip2Speech"
version: 1.0.0
date-released: 2021-08-26
url: "https://github.com/Chris10M/Lip2Speech"

GitHub Events

Total
  • Watch event: 13
  • Push event: 1
Last Year
  • Watch event: 13
  • Push event: 1

Dependencies

requirements.txt pypi
  • Cython ==0.29.23
  • Pillow ==8.3.1
  • Pygments ==2.2.0
  • SoundFile ==0.10.3.post1
  • apex ==0.1
  • caffe2 ==0.8.1
  • dlib ==19.22.0
  • face_alignment ==1.3.4
  • facenet_pytorch ==2.5.2
  • fairseq ==1.0.0a0
  • ffmpeg ==1.4
  • ffmpeg_python ==0.2.0
  • google_api_python_client ==2.18.0
  • google_auth_oauthlib ==0.4.4
  • imutils ==0.5.4
  • ipython ==7.26.0
  • jupyterlab_pygments ==0.1.2
  • librosa ==0.8.0
  • matplotlib ==2.2.5
  • numpy ==1.19.5
  • onnxruntime ==1.8.1
  • onnxruntime_gpu ==1.8.0
  • opencv_contrib_python ==4.5.1.48
  • pyppeteer ==0.2.6
  • pystoi ==0.3.3
  • sounddevice ==0.4.1
  • torch ==1.9.0
  • torchaudio ==0.9.0
  • torchstat ==0.0.7
  • torchvision ==0.10.0
  • tqdm ==4.61.1
  • transformers ==4.4.0
  • youtube_dl ==2021.6.6