https://github.com/awslabs/speech-representations

Code for DeCoAR (ICASSP 2020) and BERTphone (Odyssey 2020)

Science Score: 10.0%

This score indicates how likely this project is to be science-related based on various indicators:

○ CITATION.cff file
○ codemeta.json file
○ .zenodo.json file
○ DOI references
✓ Academic publication links (links to: arxiv.org)
○ Academic email domains
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity (low similarity, 12.1%, to scientific vocabulary)

Keywords

deep-learning nlp speech-recognition

Last synced: 10 months ago

Repository

Basic Info

Host: GitHub
Owner: awslabs
License: apache-2.0
Language: Python
Default Branch: master
Homepage: https://arxiv.org/abs/1912.01679
Size: 35.2 KB

Statistics

Stars: 103
Watchers: 16
Forks: 14
Open Issues: 0
Releases: 0

Created about 6 years ago · Last pushed over 3 years ago

Metadata Files

Readme Contributing License Code of conduct

Speech Representations

Models and code for deep learning representations developed by the AWS AI Speech team: DeCoAR, BERTphone, and DeCoAR 2.0.

NOTE: This repo is not actively maintained. For future experiments with DeCoAR and DeCoAR 2.0, we suggest using the S3PRL speech toolkit, which has active and standardized featurizer/upstream/downstream wrappers for these models.

Installation

We provide a library and CLI to featurize speech utterances. We hope to release training/fine-tuning code in the future.

Kaldi should be installed to kaldi/, or $KALDI_ROOT should be set.

We expect Python 3.6+. The BERTphone model is defined in MXNet and our DeCoAR models are defined in PyTorch. Clone this repository, then:
```sh
pip install -e .
# For DeCoAR
pip install torch fairseq
# For BERTphone
pip install mxnet-mkl~=1.6.0  # ...or mxnet-cu102mkl for GPU w/ CUDA 10.2, etc.
pip install gluonnlp  # optional; for featurizing with bertphone
```
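As a quick sanity check after installation, the following minimal sketch reports whether Kaldi can be located and which backend modules are importable. The helper names `find_kaldi_root` and `missing_modules` are hypothetical and not part of this package, and checking `$KALDI_ROOT` before a local `kaldi/` directory is an assumption about precedence.

```python
import importlib.util
import os
from pathlib import Path


def find_kaldi_root(repo_dir="."):
    """Return the Kaldi directory, or None if neither location exists.

    Assumption: $KALDI_ROOT is checked before a local kaldi/ directory.
    """
    env_root = os.environ.get("KALDI_ROOT")
    if env_root and Path(env_root).is_dir():
        return Path(env_root)
    local = Path(repo_dir) / "kaldi"
    return local if local.is_dir() else None


def missing_modules(names):
    """Return the top-level module names from `names` that cannot be found."""
    return [n for n in names if importlib.util.find_spec(n) is None]


# Import names of the packages installed above.
DECOAR_MODULES = ["torch", "fairseq"]
BERTPHONE_MODULES = ["mxnet", "gluonnlp"]

if __name__ == "__main__":
    print("Kaldi root:", find_kaldi_root())
    print("Missing for DeCoAR:", missing_modules(DECOAR_MODULES))
    print("Missing for BERTphone:", missing_modules(BERTPHONE_MODULES))
```

An empty list for a model family means its Python dependencies were found; it does not verify that the installed versions are compatible.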

Pre-trained models

First, download the model weights:
```sh
mkdir artifacts
cd artifacts
# For DeCoAR trained on LibriSpeech (257M)
wget https://github.com/awslabs/speech-representations/releases/download/decoar/checkpoint_decoar.pt
# For BERTphone 8 kHz (λ=0.2) trained on Fisher
wget https://github.com/awslabs/speech-representations/releases/download/bertphone/bertphonefisher02-87159543.params
# For DeCoAR 2.0
wget https://github.com/awslabs/speech-representations/releases/download/decoar2/checkpoint_decoar2.pt
```

We support featurizing individual files with the CLI:
```sh
speech-reps featurize --model {decoar,bertphone,decoar2} --in-wav .wav --out-npy .npy
# --params : load custom weights (otherwise use `artifacts/`)
# --gpu : use GPU (otherwise use CPU)
```

or in code:
```python
from speech_reps.featurize import DeCoARFeaturizer

# Load the model on GPU 0
featurizer = DeCoARFeaturizer('artifacts/checkpoint_decoar.pt', gpu=0)

# Returns a (time, feature) NumPy array
data = featurizer.filetofeats('mywavfile.wav')
```
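Before featurizing, it can help to confirm which of the weight files listed above are actually present. The sketch below is illustrative only: `DEFAULT_CHECKPOINTS` and `available_models` are hypothetical names, and the pairing of each `--model` value with a file name is an assumption based on the download URLs, not a documented mapping.

```python
from pathlib import Path

# Assumed mapping from the CLI's --model names to the downloaded files.
DEFAULT_CHECKPOINTS = {
    "decoar": "checkpoint_decoar.pt",
    "bertphone": "bertphonefisher02-87159543.params",
    "decoar2": "checkpoint_decoar2.pt",
}


def available_models(artifacts_dir="artifacts"):
    """Return the model names whose weight file exists in artifacts_dir."""
    root = Path(artifacts_dir)
    return [
        name
        for name, filename in DEFAULT_CHECKPOINTS.items()
        if (root / filename).is_file()
    ]


if __name__ == "__main__":
    print("Models with weights in artifacts/:", available_models())
```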

We plan to support Kaldi .scp and .ark files soon. For now, batches can be processed with the underlying featurizer._model.
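Until batch input is supported, one simple workaround is to call the CLI once per file. The sketch below builds one `speech-reps featurize` command per `.wav` file in a directory, using only the `--model`, `--in-wav`, and `--out-npy` flags shown above. The helper names `build_commands` and `run_all` and the directory names in the usage note are hypothetical.

```python
import subprocess
from pathlib import Path


def build_commands(wav_dir, out_dir, model="decoar"):
    """Build one `speech-reps featurize` command per .wav file in wav_dir."""
    out = Path(out_dir)
    commands = []
    for wav in sorted(Path(wav_dir).glob("*.wav")):
        npy = out / (wav.stem + ".npy")
        commands.append([
            "speech-reps", "featurize",
            "--model", model,
            "--in-wav", str(wav),
            "--out-npy", str(npy),
        ])
    return commands


def run_all(commands):
    """Run each command in turn, stopping at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

For example, `run_all(build_commands("wavs", "feats"))` would featurize every file in a `wavs/` directory, provided `feats/` already exists. Each invocation starts a new process and reloads the model, so this is slower than true batching.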

References

If you found our package or pre-trained models useful, please cite the relevant work:

DeCoAR

    @inproceedings{decoar,
      author    = {Shaoshi Ling and Yuzong Liu and Julian Salazar and Katrin Kirchhoff},
      title     = {Deep Contextualized Acoustic Representations For Semi-Supervised Speech Recognition},
      booktitle = {{ICASSP}},
      pages     = {6429--6433},
      publisher = {{IEEE}},
      year      = {2020}
    }

BERTphone

    @inproceedings{bertphone,
      author    = {Shaoshi Ling and Julian Salazar and Yuzong Liu and Katrin Kirchhoff},
      title     = {BERTphone: Phonetically-aware Encoder Representations for Speaker and Language Recognition},
      booktitle = {{Speaker Odyssey}},
      publisher = {{ISCA}},
      year      = {2020}
    }

DeCoAR 2.0

    @misc{ling2020decoar,
      title         = {DeCoAR 2.0: Deep Contextualized Acoustic Representations with Vector Quantization},
      author        = {Shaoshi Ling and Yuzong Liu},
      year          = {2020},
      eprint        = {2012.06659},
      archivePrefix = {arXiv},
      primaryClass  = {eess.AS}
    }

Owner

Name: Amazon Web Services - Labs
Login: awslabs
Kind: organization
Location: Seattle, WA

Website: http://amazon.com/aws/
Repositories: 914
Profile: https://github.com/awslabs

AWS Labs

Issues and Pull Requests

Last synced: about 2 years ago

All Time

Total issues: 3
Total pull requests: 0
Average time to close issues: 7 months
Average time to close pull requests: N/A
Total issue authors: 3
Total pull request authors: 0
Average comments per issue: 0.67
Average comments per pull request: 0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0

Past Year

Issues: 0
Pull requests: 0
Average time to close issues: N/A
Average time to close pull requests: N/A
Issue authors: 0
Pull request authors: 0
Average comments per issue: 0
Average comments per pull request: 0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0
