https://github.com/astorfi/deep-speaker
Deep Speaker: an End-to-End Neural Speaker Embedding System https://arxiv.org/pdf/1705.02304.pdf
Science Score: 10.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ○ codemeta.json file
- ○ .zenodo.json file
- ○ DOI references
- ✓ Academic publication links (links to: arxiv.org)
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (11.8%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: astorfi
- License: apache-2.0
- Language: Python
- Default Branch: master
- Size: 76.2 MB
Statistics
- Stars: 0
- Watchers: 3
- Forks: 1
- Open Issues: 0
- Releases: 0
Fork of philipperemy/deep-speaker
Created almost 8 years ago · Last pushed about 8 years ago
https://github.com/astorfi/deep-speaker/blob/master/
# Deep Speaker from Baidu Research
[License](https://github.com/philipperemy/keras-attention-mechanism/blob/master/LICENSE)
[Keras](https://keras.io/)
[TensorFlow](https://www.tensorflow.org/)
Deep Speaker: an End-to-End Neural Speaker Embedding System https://arxiv.org/pdf/1705.02304.pdf
## Call for contributors
This code is not functional yet! I'm calling for contributors to help build a great implementation. The basics are already in place. Thanks!
Work accomplished so far:
- [x] Triplet loss
- [x] Triplet loss test
- [x] Model implementation
- [x] Data pipeline implementation. We're going to use the [LibriSpeech dataset](http://www.openslr.org/12/) with 2300+ different speakers.
- [ ] Train the models
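
As a reference for the triplet loss item above, here is a minimal sketch of a triplet loss on cosine similarities, written as `max(0, s_an - s_ap + alpha)`. It is not the code from this repository. The function names and the default margin `alpha=0.1` are assumptions made for illustration, and it uses plain Python lists rather than Keras tensors.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def triplet_loss(anchor, positive, negative, alpha=0.1):
    """Hinge loss on cosine similarities for one triplet.

    alpha is an assumed margin, not a value taken from this repository.
    The loss is zero once the anchor is closer to the positive than to
    the negative by at least alpha.
    """
    s_ap = cosine_similarity(anchor, positive)  # same speaker
    s_an = cosine_similarity(anchor, negative)  # different speaker
    return max(0.0, s_an - s_ap + alpha)
```

A well-separated triplet such as `triplet_loss([1.0, 0.0], [1.0, 0.0], [0.0, 1.0])` gives a loss of zero, while swapping the positive and the negative gives a positive loss.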
## Get started!
Run the following commands:
```
git clone https://github.com/philipperemy/deep-speaker.git
cd deep-speaker
pip3 install -r requirements.txt
cd audio/
./convert_flac_2_wav.sh # make sure ffmpeg is installed!
cd ..
python3 models_train.py
```
Prerequisites:
* TensorFlow installed: https://www.tensorflow.org/install/install_linux
* `sudo apt-get install python3-tk ffmpeg`
* About 6 GB of memory
## Setup on Windows
* Install [ffmpeg](http://ffmpeg.zeranoe.com/builds/) and add it to your PATH.
* Use Git Bash to run: `cd audio; ./convert_flac_2_wav.sh`
* The other steps are the same as above.
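
If running the shell script is inconvenient, the conversion can also be driven from Python. The sketch below is not a port of `convert_flac_2_wav.sh` (its exact behaviour is not shown here). It only assumes that each `.flac` file should become a `.wav` file next to it, and the helper names `ffmpeg_command` and `convert_tree` are hypothetical.

```python
from pathlib import Path


def ffmpeg_command(flac_path):
    """Build the ffmpeg argument list that converts one .flac file to .wav."""
    src = Path(flac_path)
    dst = src.with_suffix(".wav")
    # -y overwrites an existing output file; -i names the input file.
    return ["ffmpeg", "-y", "-i", str(src), str(dst)]


def convert_tree(root, run):
    """Call run(cmd) for every .flac file found under root.

    run is injected so the caller decides how commands are executed,
    for example with subprocess.run.
    """
    commands = [ffmpeg_command(p) for p in sorted(Path(root).rglob("*.flac"))]
    for cmd in commands:
        run(cmd)
    return commands
```

To actually convert files, pass a runner such as `lambda cmd: subprocess.run(cmd, check=True)` after `import subprocess`. ffmpeg must be on the PATH.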
## Contributing
Please message me if you want to contribute. I'll be happy to hear your ideas. The paper leaves many details unspecified, such as:
- Input size to the network? Which inputs exactly?
- How many filter banks do we use?
- What sample rate?
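
To make these open questions concrete, the sketch below shows how the input size follows from whatever values are chosen. LibriSpeech audio is sampled at 16 kHz. Everything else here (25 ms windows, 10 ms hop, 64 filter banks, the function names) is an assumption for illustration, not a setting confirmed by the paper or by this repository.

```python
def num_frames(num_samples, sample_rate=16000, window_ms=25, hop_ms=10):
    """Number of full analysis windows that fit in a signal (no padding)."""
    window = sample_rate * window_ms // 1000  # samples per window
    hop = sample_rate * hop_ms // 1000        # samples between window starts
    if num_samples < window:
        return 0
    return 1 + (num_samples - window) // hop


def input_shape(duration_s, n_filters=64, sample_rate=16000):
    """(frames, filter banks) shape of one utterance under the assumed settings."""
    frames = num_frames(int(duration_s * sample_rate), sample_rate)
    return (frames, n_filters)
```

Under these assumptions, one second of audio yields 98 frames, so `input_shape(1.0)` is `(98, 64)`. Changing any of the three parameters changes the network input size accordingly.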
## LibriSpeech Dataset
Available here: http://www.openslr.org/12/
List of possible other datasets: http://kaldi-asr.org/doc/examples.html
Extract of this dataset:
```
filenames chapter_id speaker_id dataset_id
0 /Volumes/Transcend/data-set/LibriSpeech/dev-clean/1272/128104/1272-128104-0000.wav 128104 1272 dev-clean
1 /Volumes/Transcend/data-set/LibriSpeech/dev-clean/1272/128104/1272-128104-0001.wav 128104 1272 dev-clean
2 /Volumes/Transcend/data-set/LibriSpeech/dev-clean/1272/128104/1272-128104-0002.wav 128104 1272 dev-clean
3 /Volumes/Transcend/data-set/LibriSpeech/dev-clean/1272/128104/1272-128104-0003.wav 128104 1272 dev-clean
4 /Volumes/Transcend/data-set/LibriSpeech/dev-clean/1272/128104/1272-128104-0004.wav 128104 1272 dev-clean
5 /Volumes/Transcend/data-set/LibriSpeech/dev-clean/1272/128104/1272-128104-0005.wav 128104 1272 dev-clean
6 /Volumes/Transcend/data-set/LibriSpeech/dev-clean/1272/128104/1272-128104-0006.wav 128104 1272 dev-clean
7 /Volumes/Transcend/data-set/LibriSpeech/dev-clean/1272/128104/1272-128104-0007.wav 128104 1272 dev-clean
8 /Volumes/Transcend/data-set/LibriSpeech/dev-clean/1272/128104/1272-128104-0008.wav 128104 1272 dev-clean
9 /Volumes/Transcend/data-set/LibriSpeech/dev-clean/1272/128104/1272-128104-0009.wav 128104 1272 dev-clean
```
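
The columns in the extract above can be derived from the file path alone, because LibriSpeech stores files as `<dataset>/<speaker>/<chapter>/<speaker>-<chapter>-<utterance>`. The helper below is a small sketch of that parsing. Its name is hypothetical and it is not the data pipeline used in this repository.

```python
from pathlib import PurePosixPath


def parse_librispeech_path(filename):
    """Split a LibriSpeech file path into the columns shown in the extract."""
    path = PurePosixPath(filename)
    # The file stem looks like "1272-128104-0000": speaker, chapter, utterance.
    speaker_id, chapter_id, _utterance = path.stem.split("-")
    return {
        "filenames": filename,
        "chapter_id": chapter_id,
        "speaker_id": speaker_id,
        # The subset folder (e.g. dev-clean) sits three levels above the file.
        "dataset_id": path.parts[-4],
    }
```

The IDs are returned as strings. Convert them with `int(...)` if numeric columns are needed.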
## Training example on GPU
Training on the GPU.
## Visualization of anchors
Visualization of a possible triplet (Anchor, Positive, Negative) in the cosine similarity space
Owner
- Name: Sina Torfi
- Login: astorfi
- Kind: user
- Location: San Jose
- Company: Meta
- Website: https://astorfi.github.io/
- Repositories: 196
- Profile: https://github.com/astorfi
PhD & Developer working on Deep Learning, Computer Vision & NLP