https://github.com/awslabs/speech-representations

Code for DeCoAR (ICASSP 2020) and BERTphone (Odyssey 2020)

Science Score: 10.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
  • .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: arxiv.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (12.1%) to scientific vocabulary

Keywords

deep-learning nlp speech-recognition
Last synced: 9 months ago

Repository

Basic Info
Statistics
  • Stars: 103
  • Watchers: 16
  • Forks: 14
  • Open Issues: 0
  • Releases: 0
Topics
deep-learning nlp speech-recognition
Created about 6 years ago · Last pushed over 3 years ago
Metadata Files
Readme Contributing License Code of conduct

README.md

Speech Representations

Models and code for deep learning representations developed by the AWS AI Speech team: DeCoAR, BERTphone, and DeCoAR 2.0.

NOTE: This repo is not actively maintained. For future experiments with DeCoAR and DeCoAR 2.0, we suggest using the S3PRL speech toolkit, which has active and standardized featurizer/upstream/downstream wrappers for these models.

Installation

We provide a library and CLI to featurize speech utterances. We hope to release training/fine-tuning code in the future.

Kaldi should be installed to kaldi/, or $KALDI_ROOT should be set.
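
A quick way to check the Kaldi requirement before featurizing is sketched below. It assumes `$KALDI_ROOT` is consulted first and a local `kaldi/` directory second; the README does not state the lookup order, and the helper name `find_kaldi_root` is hypothetical, not part of this package.

```python
import os
from pathlib import Path


def find_kaldi_root(repo_dir="."):
    """Return a Kaldi directory as a Path, or None if none is found.

    Assumption: $KALDI_ROOT takes precedence over <repo_dir>/kaldi.
    Illustrative helper only; not part of speech_reps.
    """
    candidates = []
    env_root = os.environ.get("KALDI_ROOT")
    if env_root:
        candidates.append(Path(env_root))
    candidates.append(Path(repo_dir) / "kaldi")
    for path in candidates:
        if path.is_dir():
            return path
    return None


if __name__ == "__main__":
    root = find_kaldi_root()
    if root is None:
        print("Kaldi not found: install it to kaldi/ or set $KALDI_ROOT")
    else:
        print(f"Kaldi found at: {root}")
```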

We expect Python 3.6+. The BERTphone model is defined in MXNet and our DeCoAR models are defined in PyTorch. Clone this repository, then:

```sh
pip install -e .

# For DeCoAR
pip install torch fairseq

# For BERTphone
pip install mxnet-mkl~=1.6.0  # ...or mxnet-cu102mkl for GPU w/ CUDA 10.2, etc.
pip install gluonnlp  # optional; for featurizing with bertphone
```

Pre-trained models

First, download the model weights:

```sh
mkdir artifacts
cd artifacts

# For DeCoAR trained on LibriSpeech (257M)
wget https://github.com/awslabs/speech-representations/releases/download/decoar/checkpoint_decoar.pt

# For BERTphone 8 kHz (λ=0.2) trained on Fisher
wget https://github.com/awslabs/speech-representations/releases/download/bertphone/bertphonefisher02-87159543.params

# For DeCoAR 2.0
wget https://github.com/awslabs/speech-representations/releases/download/decoar2/checkpoint_decoar2.pt
```

We support featurizing individual files with the CLI:

```sh
speech-reps featurize --model {decoar,bertphone,decoar2} --in-wav <input>.wav --out-npy <output>.npy
# --params <file>: load custom weights (otherwise use artifacts/)
# --gpu <int>: use GPU (otherwise use CPU)
```

or in code:

```python
from speech_reps.featurize import DeCoARFeaturizer

# Load the model on GPU 0
featurizer = DeCoARFeaturizer('artifacts/checkpoint_decoar.pt', gpu=0)

# Returns a (time, feature) NumPy array
data = featurizer.filetofeats('mywavfile.wav')
```
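
The download step can also be scripted from Python. The sketch below uses only the standard library and the three release URLs exactly as listed in the download commands; the `WEIGHT_URLS` table and the `weight_path` and `download_weights` helpers are hypothetical names for illustration, not part of `speech_reps`.

```python
import urllib.request
from pathlib import Path

# URLs copied from the wget commands in this README; the keys mirror the
# CLI's --model choices. This table is illustrative, not a package API.
_BASE = "https://github.com/awslabs/speech-representations/releases/download"
WEIGHT_URLS = {
    "decoar": f"{_BASE}/decoar/checkpoint_decoar.pt",
    "bertphone": f"{_BASE}/bertphone/bertphonefisher02-87159543.params",
    "decoar2": f"{_BASE}/decoar2/checkpoint_decoar2.pt",
}


def weight_path(model, artifacts_dir="artifacts"):
    """Local path where the checkpoint for `model` would be stored."""
    filename = WEIGHT_URLS[model].rsplit("/", 1)[-1]
    return Path(artifacts_dir) / filename


def download_weights(model, artifacts_dir="artifacts"):
    """Download the checkpoint unless it already exists; return its path."""
    target = weight_path(model, artifacts_dir)
    target.parent.mkdir(parents=True, exist_ok=True)
    if not target.exists():
        urllib.request.urlretrieve(WEIGHT_URLS[model], str(target))
    return target
```

Calling `download_weights("decoar")` would fetch the DeCoAR checkpoint into `artifacts/` on first use and reuse the local copy afterwards.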

We plan to support Kaldi .scp and .ark files soon. For now, batches can be processed with the underlying featurizer._model.
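
Until `.scp` and `.ark` support is available, one workaround is to call the CLI once per file. The sketch below only builds the command lines, using the flags documented above; `build_featurize_cmd` and `batch_commands` are hypothetical helpers, and it assumes `--gpu` takes a device index, as the `gpu=0` argument in the Python example suggests. Each CLI call reloads the model, so this is likely slower than batching through `featurizer._model`.

```python
from pathlib import Path

MODELS = ("decoar", "bertphone", "decoar2")


def build_featurize_cmd(model, in_wav, out_npy, params=None, gpu=None):
    """Build one `speech-reps featurize` command as an argument list."""
    if model not in MODELS:
        raise ValueError(f"unknown model {model!r}; expected one of {MODELS}")
    cmd = [
        "speech-reps", "featurize",
        "--model", model,
        "--in-wav", str(in_wav),
        "--out-npy", str(out_npy),
    ]
    if params is not None:
        cmd += ["--params", str(params)]  # custom weights file
    if gpu is not None:
        cmd += ["--gpu", str(gpu)]  # assumed to be a device index
    return cmd


def batch_commands(model, wav_dir, out_dir, **kwargs):
    """One command per .wav file in wav_dir, sorted by file name."""
    out_dir = Path(out_dir)
    return [
        build_featurize_cmd(model, wav, out_dir / (wav.stem + ".npy"), **kwargs)
        for wav in sorted(Path(wav_dir).glob("*.wav"))
    ]

# Usage (runs the CLI, so speech-reps must be installed):
#   import subprocess
#   for cmd in batch_commands("decoar", "wavs", "feats", gpu=0):
#       subprocess.run(cmd, check=True)
```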

References

If you found our package or pre-trained models useful, please cite the relevant work:

DeCoAR

    @inproceedings{decoar,
      author    = {Shaoshi Ling and Yuzong Liu and Julian Salazar and Katrin Kirchhoff},
      title     = {Deep Contextualized Acoustic Representations For Semi-Supervised Speech Recognition},
      booktitle = {{ICASSP}},
      pages     = {6429--6433},
      publisher = {{IEEE}},
      year      = {2020}
    }

BERTphone

    @inproceedings{bertphone,
      author    = {Shaoshi Ling and Julian Salazar and Yuzong Liu and Katrin Kirchhoff},
      title     = {BERTphone: Phonetically-aware Encoder Representations for Speaker and Language Recognition},
      booktitle = {{Speaker Odyssey}},
      publisher = {{ISCA}},
      year      = {2020}
    }

DeCoAR 2.0

    @misc{ling2020decoar,
      title         = {DeCoAR 2.0: Deep Contextualized Acoustic Representations with Vector Quantization},
      author        = {Shaoshi Ling and Yuzong Liu},
      year          = {2020},
      eprint        = {2012.06659},
      archivePrefix = {arXiv},
      primaryClass  = {eess.AS}
    }

Owner

  • Name: Amazon Web Services - Labs
  • Login: awslabs
  • Kind: organization
  • Location: Seattle, WA

AWS Labs

Issues and Pull Requests

Last synced: about 2 years ago

All Time
  • Total issues: 3
  • Total pull requests: 0
  • Average time to close issues: 7 months
  • Average time to close pull requests: N/A
  • Total issue authors: 3
  • Total pull request authors: 0
  • Average comments per issue: 0.67
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • jonaskratochvil (1)
  • roger-tseng (1)
  • leo19941227 (1)

Dependencies

setup.py pypi
  • gluonnlp *
  • kaldi_io *
  • numpy *
  • soundfile *