https://github.com/awslabs/speech-representations
Code for DeCoAR (ICASSP 2020) and BERTphone (Odyssey 2020)
Science Score: 10.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ○ codemeta.json file
- ○ .zenodo.json file
- ○ DOI references
- ✓ Academic publication links (links to: arxiv.org)
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (12.1%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: awslabs
- License: apache-2.0
- Language: Python
- Default Branch: master
- Homepage: https://arxiv.org/abs/1912.01679
- Size: 35.2 KB
Statistics
- Stars: 103
- Watchers: 16
- Forks: 14
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
Speech Representations
Models and code for deep learning representations developed by the AWS AI Speech team:
- DeCoAR (self-supervised contextual representations for speech recognition)
- BERTphone (phonetically-aware acoustic BERT for speaker and language recognition)
- DeCoAR 2.0 (deep contextualized acoustic representation with vector quantization)
NOTE: This repo is not actively maintained. For future experiments with DeCoAR and DeCoAR 2.0, we suggest using the S3PRL speech toolkit, which has active and standardized featurizer/upstream/downstream wrappers for these models.
Installation
We provide a library and CLI to featurize speech utterances. We hope to release training/fine-tuning code in the future.
Kaldi should be installed to kaldi/, or $KALDI_ROOT should be set.
We expect Python 3.6+. The BERTphone model is defined in MXNet and our DeCoAR models are defined in PyTorch. Clone this repository, then:
```sh
pip install -e .
# For DeCoAR
pip install torch fairseq
# For BERTphone
pip install mxnet-mkl~=1.6.0  # ...or mxnet-cu102mkl for GPU w/ CUDA 10.2, etc.
pip install gluonnlp  # optional; for featurizing with bertphone
```
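Before featurizing, it can help to confirm that Kaldi is discoverable in one of the two locations mentioned above. The following is a minimal sketch, not part of this package: the helper name `find_kaldi_root` is hypothetical, and giving `$KALDI_ROOT` precedence over a local `kaldi/` directory is an assumption.

```python
import os
from pathlib import Path
from typing import Optional


def find_kaldi_root(repo_dir: str = ".") -> Optional[Path]:
    """Return a Kaldi directory, or None if neither location exists.

    Assumption: $KALDI_ROOT is checked first, then <repo_dir>/kaldi.
    """
    env_root = os.environ.get("KALDI_ROOT")
    if env_root and Path(env_root).is_dir():
        return Path(env_root)
    local = Path(repo_dir) / "kaldi"
    if local.is_dir():
        return local
    return None


if __name__ == "__main__":
    root = find_kaldi_root()
    print(f"Kaldi found at: {root}" if root else "Kaldi not found")
```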
Pre-trained models
First, download the model weights:
```sh
mkdir artifacts
cd artifacts
# For DeCoAR trained on LibriSpeech (257M)
wget https://github.com/awslabs/speech-representations/releases/download/decoar/checkpoint_decoar.pt
# For BERTphone 8KHz (λ=0.2) trained on Fisher
wget https://github.com/awslabs/speech-representations/releases/download/bertphone/bertphonefisher02-87159543.params
# For DeCoAR 2.0
wget https://github.com/awslabs/speech-representations/releases/download/decoar2/checkpoint_decoar2.pt
```
We support featurizing individual files with the CLI:
```sh
speech-reps featurize --model {decoar,bertphone,decoar2} --in-wav <wav-file>
# --params <file>: load custom weights (otherwise use artifacts/)
# --gpu <id>: use GPU (otherwise use CPU)
```
or in code:
```python
from speech_reps.featurize import DeCoARFeaturizer

# Load the model on GPU 0
featurizer = DeCoARFeaturizer('artifacts/checkpoint_decoar.pt', gpu=0)
# Returns a (time, feature) NumPy array
data = featurizer.filetofeats('mywavfile.wav')
```
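When calling the featurizers from code, you may want the same fallback the CLI describes: use custom weights if given, otherwise look in `artifacts/`. The sketch below illustrates that rule under stated assumptions. `resolve_weights` and `DEFAULT_WEIGHTS` are hypothetical names, not part of `speech_reps`, and the file names are copied from the download commands above, so adjust them if your downloaded files are named differently.

```python
from pathlib import Path
from typing import Optional

# Assumption: default file names mirror the download commands in this README.
DEFAULT_WEIGHTS = {
    "decoar": "checkpoint_decoar.pt",
    "bertphone": "bertphonefisher02-87159543.params",
    "decoar2": "checkpoint_decoar2.pt",
}


def resolve_weights(model: str,
                    params: Optional[str] = None,
                    artifacts_dir: str = "artifacts") -> Path:
    """Pick custom weights if given, else the default file under artifacts_dir."""
    if model not in DEFAULT_WEIGHTS:
        raise ValueError(f"unknown model {model!r}; expected one of {sorted(DEFAULT_WEIGHTS)}")
    path = Path(params) if params else Path(artifacts_dir) / DEFAULT_WEIGHTS[model]
    if not path.is_file():
        raise FileNotFoundError(f"weights not found: {path}")
    return path
```

The returned path can then be passed to a featurizer constructor, for example `DeCoARFeaturizer(str(resolve_weights("decoar")), gpu=0)`.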
We plan to support Kaldi .scp and .ark files soon. For now, batches can be processed with the underlying featurizer._model.
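Until `.scp` support lands, one possible workaround is to parse a Kaldi-style `wav.scp` yourself and call the single-file featurizer once per utterance. This is only a sketch: `read_wav_scp` and `featurize_scp` are hypothetical helpers, not part of this package, and entries that are piped commands (ending in `|`) are skipped rather than executed. It processes files one at a time and does not use `featurizer._model` for true batching.

```python
from typing import Any, Callable, Dict, Iterable


def read_wav_scp(lines: Iterable[str]) -> Dict[str, str]:
    """Parse 'utt_id path' lines; skip blank lines and piped-command entries."""
    entries: Dict[str, str] = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        parts = line.split(maxsplit=1)
        if len(parts) != 2:
            raise ValueError(f"malformed scp line: {line!r}")
        utt_id, target = parts
        if target.endswith("|"):
            continue  # a command pipeline, not a plain file path
        entries[utt_id] = target
    return entries


def featurize_scp(lines: Iterable[str],
                  file_to_feats: Callable[[str], Any]) -> Dict[str, Any]:
    """Apply a single-file featurizer to every plain-path entry."""
    return {utt: file_to_feats(path) for utt, path in read_wav_scp(lines).items()}
```

For example, with the featurizer from the snippet above: `with open('wav.scp') as f: feats = featurize_scp(f, featurizer.filetofeats)`.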
References
If you found our package or pre-trained models useful, please cite the relevant work:
DeCoAR
@inproceedings{decoar,
author = {Shaoshi Ling and Yuzong Liu and Julian Salazar and Katrin Kirchhoff},
title = {Deep Contextualized Acoustic Representations For Semi-Supervised Speech Recognition},
booktitle = {{ICASSP}},
pages = {6429--6433},
publisher = {{IEEE}},
year = {2020}
}
BERTphone
@inproceedings{bertphone,
author = {Shaoshi Ling and Julian Salazar and Yuzong Liu and Katrin Kirchhoff},
title = {BERTphone: Phonetically-aware Encoder Representations for Speaker and Language Recognition},
booktitle = {{Speaker Odyssey}},
publisher = {{ISCA}},
year = {2020}
}
DeCoAR 2.0
@misc{ling2020decoar,
title={DeCoAR 2.0: Deep Contextualized Acoustic Representations with Vector Quantization},
author={Shaoshi Ling and Yuzong Liu},
year={2020},
eprint={2012.06659},
archivePrefix={arXiv},
primaryClass={eess.AS}
}
Owner
- Name: Amazon Web Services - Labs
- Login: awslabs
- Kind: organization
- Location: Seattle, WA
- Website: http://amazon.com/aws/
- Repositories: 914
- Profile: https://github.com/awslabs
AWS Labs
Issues and Pull Requests
Last synced: about 2 years ago
All Time
- Total issues: 3
- Total pull requests: 0
- Average time to close issues: 7 months
- Average time to close pull requests: N/A
- Total issue authors: 3
- Total pull request authors: 0
- Average comments per issue: 0.67
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 0
- Pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- jonaskratochvil (1)
- roger-tseng (1)
- leo19941227 (1)
Dependencies
- gluonnlp *
- kaldi_io *
- numpy *
- soundfile *