lip2speech

A pipeline to read lips and generate speech for the read content, i.e., Lip to Speech Synthesis.

https://github.com/chris10m/lip2speech

Science Score: 54.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: arxiv.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (9.8%) to scientific vocabulary

Keywords

deep-learning lip-reading lipreading liptospeech pytorch real-time speaker-embedding speech-synthesis
Last synced: 6 months ago

Repository

A pipeline to read lips and generate speech for the read content, i.e., Lip to Speech Synthesis.

Basic Info
  • Host: GitHub
  • Owner: Chris10M
  • License: mit
  • Language: Python
  • Default Branch: main
  • Homepage:
  • Size: 12.2 MB
Statistics
  • Stars: 86
  • Watchers: 3
  • Forks: 21
  • Open Issues: 2
  • Releases: 0
Topics
deep-learning lip-reading lipreading liptospeech pytorch real-time speaker-embedding speech-synthesis
Created over 4 years ago · Last pushed 7 months ago
Metadata Files
Readme License Citation

README.md

Lip2Speech [PDF]

A pipeline for lip-reading a silent speaking face in a video and generating speech for the lip-read content, i.e., Lip to Speech Synthesis.

Figure: pipeline overview — Video Input | Processed Input | Speech Output (example media not shown).

Architecture Overview

Figure: architecture of the method (image not shown).

LRW

Figure: Alignment Plot | Mel-spectrogram Output (example media not shown).

Usage

Demo

The pretrained model is available here [265.12 MB]

Download the pretrained model and place it inside the savedmodels directory. To visualize the results, run demo.py:

python3 demo.py

Default arguments

  • dataset: LRW (10 Samples)
  • root: Datasets/SAMPLE_LRW
  • model_path: savedmodels/lip2speech_final.pth
  • encoding: voice

Evaluate

Evaluates the ESTOI score for the given Lip2Speech model (higher is better).

python3 evaluate.py --dataset LRW --root Datasets/LRW --model_path savedmodels/lip2speech_final.pth

Train

To train the model, run train.py:

python3 train.py --dataset LRW --root Datasets/LRW --finetune_model_path savedmodels/lip2speech_final.pth

  • finetune_model_path - base model to fine-tune on the dataset (optional)

Acknowledgement

tacotron2

Citation

If you use this research in your work, please cite it using the following metadata.

```
@misc{millerdurai2022faceilltellspeak,
  title={Show Me Your Face, And I'll Tell You How You Speak},
  author={Christen Millerdurai and Lotfy Abdel Khaliq and Timon Ulrich},
  year={2022},
  eprint={2206.14009},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2206.14009},
}

@software{MillerduraiLip2Speech2021,
  author = {Millerdurai, Christen and Abdel Khaliq, Lotfy and Ulrich, Timon},
  month = {8},
  title = {{Lip2Speech}},
  url = {https://github.com/Chris10M/Lip2Speech},
  version = {1.0.0},
  year = {2021}
}
```

Owner

  • Name: Christen Millerdurai
  • Login: Chris10M
  • Kind: user

PhD & Researcher @ AV DFKI-Kaiserslautern.

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
- family-names: "Millerdurai"
  given-names: "Christen"
- family-names: "Abdel Khaliq"
  given-names: "Lotfy"
  orcid: "https://orcid.org/0000-0000-0000-0000"
- family-names: "Ulrich"
  given-names: "Timon"
  orcid: "https://orcid.org/0000-0000-0000-0000"
  
title: "Lip2Speech"
version: 1.0.0
date-released: 2021-08-26
url: "https://github.com/Chris10M/Lip2Speech"

GitHub Events

Total
  • Watch event: 13
  • Push event: 1
Last Year
  • Watch event: 13
  • Push event: 1

Dependencies

requirements.txt pypi
  • Cython ==0.29.23
  • Pillow ==8.3.1
  • Pygments ==2.2.0
  • SoundFile ==0.10.3.post1
  • apex ==0.1
  • caffe2 ==0.8.1
  • dlib ==19.22.0
  • face_alignment ==1.3.4
  • facenet_pytorch ==2.5.2
  • fairseq ==1.0.0a0
  • ffmpeg ==1.4
  • ffmpeg_python ==0.2.0
  • google_api_python_client ==2.18.0
  • google_auth_oauthlib ==0.4.4
  • imutils ==0.5.4
  • ipython ==7.26.0
  • jupyterlab_pygments ==0.1.2
  • librosa ==0.8.0
  • matplotlib ==2.2.5
  • numpy ==1.19.5
  • onnxruntime ==1.8.1
  • onnxruntime_gpu ==1.8.0
  • opencv_contrib_python ==4.5.1.48
  • pyppeteer ==0.2.6
  • pystoi ==0.3.3
  • sounddevice ==0.4.1
  • torch ==1.9.0
  • torchaudio ==0.9.0
  • torchstat ==0.0.7
  • torchvision ==0.10.0
  • tqdm ==4.61.1
  • transformers ==4.4.0
  • youtube_dl ==2021.6.6