lip2speech
A pipeline to read lips and generate speech for the read content, i.e. Lip to Speech Synthesis.
Science Score: 54.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: found
- ✓ codemeta.json file: found
- ✓ .zenodo.json file: found
- ○ DOI references
- ✓ Academic publication links: arxiv.org
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (9.8%) to scientific vocabulary
Keywords
Repository
A pipeline to read lips and generate speech for the read content, i.e. Lip to Speech Synthesis.
Basic Info
Statistics
- Stars: 86
- Watchers: 3
- Forks: 21
- Open Issues: 2
- Releases: 0
Topics
Metadata Files
README.md
Lip2Speech [PDF]
A pipeline that lip-reads a silently speaking face in a video and generates speech for the lip-read content, i.e. Lip to Speech Synthesis.
[Demo media: Video Input | Processed Input | Speech Output]
Architecture Overview
LRW
[Figures: Alignment Plot | Mel-spectrogram Output]
Usage
Demo
The pretrained model is available here [265.12 MB].
Download the pretrained model and place it inside the savedmodels directory. To visualize the results, run demo.py:
python3 demo.py
Default arguments
- dataset: LRW (10 Samples)
- root: Datasets/SAMPLE_LRW
- model_path: savedmodels/lip2speech_final.pth
- encoding: voice
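The defaults above can be summarized as an argparse sketch. The flag names here are assumptions inferred from the listed defaults and the evaluate/train commands below; demo.py may spell them differently:

```python
import argparse

# Hedged sketch: an argument parser consistent with the demo defaults listed
# above. Flag names are assumptions, not copied from demo.py.
def build_demo_parser():
    parser = argparse.ArgumentParser(description="Lip2Speech demo")
    parser.add_argument("--dataset", default="LRW",
                        help="dataset name (sample set of 10 clips)")
    parser.add_argument("--root", default="Datasets/SAMPLE_LRW",
                        help="dataset root directory")
    parser.add_argument("--model_path",
                        default="savedmodels/lip2speech_final.pth",
                        help="pretrained checkpoint to load")
    parser.add_argument("--encoding", default="voice",
                        help="speaker-encoding mode")
    return parser

# Running with no flags reproduces the defaults listed above.
args = build_demo_parser().parse_args([])
print(args.dataset, args.root, args.model_path, args.encoding)
```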
Evaluate
Evaluates the ESTOI score for the given Lip2Speech model (higher is better).
python3 evaluate.py --dataset LRW --root Datasets/LRW --model_path savedmodels/lip2speech_final.pth
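ESTOI scores intelligibility by correlating short-time spectro-temporal envelopes of reference and generated speech; the repo's pinned pystoi dependency provides the actual implementation. The toy, dependency-free sketch below only illustrates the envelope-correlation idea (and why higher is better); it is not the real ESTOI algorithm:

```python
import numpy as np

# Toy proxy for (E)STOI-style scoring: frame both signals, take magnitude
# spectra as envelopes, correlate per frame, and average. NOT the real ESTOI
# (the repo uses pystoi for that); this only shows the underlying idea.
def envelope_correlation(clean, degraded, frame=256):
    n = min(len(clean), len(degraded)) // frame * frame
    c = np.abs(np.fft.rfft(clean[:n].reshape(-1, frame), axis=1))
    d = np.abs(np.fft.rfft(degraded[:n].reshape(-1, frame), axis=1))
    # Center each frame's envelope, then compute cosine similarity per frame.
    c = c - c.mean(axis=1, keepdims=True)
    d = d - d.mean(axis=1, keepdims=True)
    num = (c * d).sum(axis=1)
    den = np.linalg.norm(c, axis=1) * np.linalg.norm(d, axis=1) + 1e-12
    return float(np.mean(num / den))

rng = np.random.default_rng(0)
clean = rng.standard_normal(16000)          # 1 s placeholder "speech" at 16 kHz
noisy = clean + 0.1 * rng.standard_normal(16000)
print(envelope_correlation(clean, clean))   # identical signals -> ~1.0
print(envelope_correlation(clean, noisy))   # degraded signal -> lower score
```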
Train
To train the model, we run train.py
python3 train.py --dataset LRW --root Datasets/LRW --finetune_model_path savedmodels/lip2speech_final.pth
- finetune_model_path: checkpoint used as the base model for fine-tuning on the dataset (optional)
Acknowledgement
Citation
If you use this research in your work, please cite it using the following metadata.
```
@misc{millerdurai2022faceilltellspeak,
  title={Show Me Your Face, And I'll Tell You How You Speak},
  author={Christen Millerdurai and Lotfy Abdel Khaliq and Timon Ulrich},
  year={2022},
  eprint={2206.14009},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2206.14009},
}

@software{MillerduraiLip2Speech2021,
  author = {Millerdurai, Christen and Abdel Khaliq, Lotfy and Ulrich, Timon},
  month = {8},
  title = {{Lip2Speech}},
  url = {https://github.com/Chris10M/Lip2Speech},
  version = {1.0.0},
  year = {2021}
}
```
Owner
- Name: Christen Millerdurai
- Login: Chris10M
- Kind: user
- Repositories: 25
- Profile: https://github.com/Chris10M
PhD & Researcher @ AV DFKI-Kaiserslautern.
Citation (CITATION.cff)
```
cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
  - family-names: "Millerdurai"
    given-names: "Christen"
  - family-names: "Abdel Khaliq"
    given-names: "Lotfy"
    orcid: "https://orcid.org/0000-0000-0000-0000"
  - family-names: "Ulrich"
    given-names: "Timon"
    orcid: "https://orcid.org/0000-0000-0000-0000"
title: "Lip2Speech"
version: 1.0.0
date-released: 2021-08-26
url: "https://github.com/Chris10M/Lip2Speech"
```
GitHub Events
Total
- Watch event: 13
- Push event: 1
Last Year
- Watch event: 13
- Push event: 1
Dependencies
- Cython ==0.29.23
- Pillow ==8.3.1
- Pygments ==2.2.0
- SoundFile ==0.10.3.post1
- apex ==0.1
- caffe2 ==0.8.1
- dlib ==19.22.0
- face_alignment ==1.3.4
- facenet_pytorch ==2.5.2
- fairseq ==1.0.0a0
- ffmpeg ==1.4
- ffmpeg_python ==0.2.0
- google_api_python_client ==2.18.0
- google_auth_oauthlib ==0.4.4
- imutils ==0.5.4
- ipython ==7.26.0
- jupyterlab_pygments ==0.1.2
- librosa ==0.8.0
- matplotlib ==2.2.5
- numpy ==1.19.5
- onnxruntime ==1.8.1
- onnxruntime_gpu ==1.8.0
- opencv_contrib_python ==4.5.1.48
- pyppeteer ==0.2.6
- pystoi ==0.3.3
- sounddevice ==0.4.1
- torch ==1.9.0
- torchaudio ==0.9.0
- torchstat ==0.0.7
- torchvision ==0.10.0
- tqdm ==4.61.1
- transformers ==4.4.0
- youtube_dl ==2021.6.6