264-cass-nat-ctc-alignment-based-single-step-non-autoregressive-transformer-for-speech-recognition

https://github.com/szu-advtech-2024/264-cass-nat-ctc-alignment-based-single-step-non-autoregressive-transformer-for-speech-recognition

# Speech-transformer (Auto-regressive and Non-autoregressive)


This is the implementation of our work "Using CTC alignments as latent variables for Non-autoregressive speech-transformer". Some of the code is borrowed from [ESPnet](https://github.com/espnet/espnet) and the [transformer implementation from the Harvard NLP group](https://nlp.seas.harvard.edu/2018/04/03/attention.html).
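
For readers new to the term, a CTC alignment is a frame-level label sequence in which labels may repeat and a blank symbol fills the gaps. The sketch below is a generic illustration of the standard CTC collapse rule and of reading per-token frame spans off an alignment. It is not code from this repository; the blank symbol `_` and the helper names `collapse` and `token_spans` are assumptions made for the example.

```python
from itertools import groupby

BLANK = "_"  # assumed blank symbol, chosen only for this illustration


def collapse(alignment):
    """Standard CTC collapse: merge repeated labels, then drop blanks."""
    return [label for label, _ in groupby(alignment) if label != BLANK]


def token_spans(alignment):
    """Return (label, first_frame, last_frame) for every non-blank run."""
    spans = []
    frame = 0
    for label, run in groupby(alignment):
        length = len(list(run))
        if label != BLANK:
            spans.append((label, frame, frame + length - 1))
        frame += length
    return spans


if __name__ == "__main__":
    frames = list("__hh_e_ll_l_oo_")  # toy alignment, one label per frame
    print("".join(collapse(frames)))
    print(token_spans(frames))
```

The spans show how an alignment ties each output token to a contiguous group of acoustic frames, which is the kind of information a non-autoregressive decoder can use to produce all tokens in one pass.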

## 1. Requirements

- Python 3.7
- PyTorch 1.2
- Kaldi

We have not tested with newer versions of Python or PyTorch. The other required Python packages are listed in requirements.txt. You can install them using:
```
pip install -r requirements.txt
```
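
Since only Python 3.7 and PyTorch 1.2 were tested, it can help to confirm the environment before running anything. The following is a minimal sketch, not part of the repository; the helper names `major_minor` and `matches_tested` are hypothetical.

```python
import sys

# Versions listed under Requirements; newer ones are untested, not known to fail.
TESTED_PYTHON = (3, 7)
TESTED_TORCH = (1, 2)


def major_minor(version):
    """Turn a string such as '1.2.0+cu92' into (1, 2)."""
    parts = version.split("+")[0].split(".")
    return (int(parts[0]), int(parts[1]))


def matches_tested(python_version, torch_version):
    """True only when both versions equal the tested major.minor pair."""
    return (major_minor(python_version) == TESTED_PYTHON
            and major_minor(torch_version) == TESTED_TORCH)


if __name__ == "__main__":
    py = "{}.{}".format(*sys.version_info[:2])
    try:
        import torch
        print("matches tested versions:", matches_tested(py, torch.__version__))
    except ImportError:
        print("PyTorch is not installed")
```

A `False` result only means the combination is untested here, so treat it as a warning rather than an error.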

## 2. Example: running the Librispeech recipe

1. Go to egs/librispeech. Modify path.sh to specify the Kaldi path (used for feature extraction, etc.).
2. Run ./run\_prepare.sh to prepare the Librispeech data (for the 100h experiment).
3. Check conf/transformer.yaml and revise the hyperparameters if you like.
4. Run ./run\_art.sh for the autoregressive transformer. I suggest running the script step by step.
5. Run ./run\_cassnat.sh for the non-autoregressive model. You can run this step directly if you want to skip the autoregressive transformer.
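
The order of the scripts listed above can be captured in a small driver. This is a sketch under assumptions: the script names come from the list, but the `skip_art` switch, the helper names, and the idea of calling each script with no arguments are hypothetical. It also assumes data preparation is still needed when the autoregressive model is skipped. Running each script by hand, stage by stage, remains the safer route.

```python
import subprocess
from pathlib import Path


def plan_commands(skip_art=False):
    """Return the recipe scripts in the order the steps describe."""
    commands = ["./run_prepare.sh"]
    if not skip_art:
        commands.append("./run_art.sh")  # autoregressive transformer
    commands.append("./run_cassnat.sh")  # non-autoregressive model
    return commands


def run_plan(recipe_dir, skip_art=False):
    """Run each script from inside the recipe directory, stopping on failure."""
    for command in plan_commands(skip_art):
        subprocess.run([command], cwd=Path(recipe_dir), check=True)


if __name__ == "__main__":
    # Print the plan only; call run_plan("egs/librispeech") to execute it.
    print(plan_commands(skip_art=True))
```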

All the Python code is under src/. Some of the code may not be well organized, since this work is still in the experimental stage.

## 3. Results

- Librispeech (WER)

| Methods |  LM  | dev-clean | test-clean | dev-other | test-other | RTF(s) |
|   :-:   |  :-: |    :-:    |     :-:    |    :-:    |    :-:     | :-:    |
|   AST   |  no  |    3.4    |     3.6    |    8.5    |    8.5     | 0.562  |
|   -     |  yes |    2.5    |     2.7    |    5.7    |    5.8     |   -    |
|   NAST  |  no  |    3.7    |     3.8    |    9.2    |    9.1     | 0.011  |
|   -     |  yes |    3.3    |     3.3    |    8.0    |    8.1     |   -    |
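
To read the accuracy versus speed trade-off from the rows without a language model, the short calculation below derives the RTF ratio and the relative WER increase of NAST over AST. The numbers are copied from the table. The helper names are hypothetical, and the reading of RTF as the usual real-time factor (processing time divided by audio duration) is an assumption.

```python
# Values copied from the Librispeech table, rows without a language model.
AST = {"dev-clean": 3.4, "test-clean": 3.6, "dev-other": 8.5, "test-other": 8.5, "rtf": 0.562}
NAST = {"dev-clean": 3.7, "test-clean": 3.8, "dev-other": 9.2, "test-other": 9.1, "rtf": 0.011}


def speedup(slow_rtf, fast_rtf):
    """How many times smaller the faster system's RTF is."""
    return slow_rtf / fast_rtf


def relative_increase(baseline, value):
    """Relative change from baseline to value, in percent."""
    return 100.0 * (value - baseline) / baseline


if __name__ == "__main__":
    print("RTF ratio (AST / NAST): {:.1f}".format(speedup(AST["rtf"], NAST["rtf"])))
    for split in ("dev-clean", "test-clean", "dev-other", "test-other"):
        change = relative_increase(AST[split], NAST[split])
        print("{}: {:+.1f}% relative WER".format(split, change))
```

With these numbers the NAST RTF is about 51 times lower than the AST RTF, while the relative WER increase stays between roughly 5.6% and 8.8% across the four sets.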

- Aishell1 (CER)

| Methods |  LM  | dev  | test  | 
|   :-:   |  :-: | :-:  | :-:   | 
|   AST   |  no  | 5.4  |  5.9  |
|   NAST  |  no  | 5.3  |  5.8  |
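
The two tables report word error rate (WER) and character error rate (CER). Both are conventionally the edit distance between reference and hypothesis divided by the reference length, computed over words or characters. The sketch below shows that standard definition. It is not the scoring code used for these results, and the function names are hypothetical.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def error_rate(ref_tokens, hyp_tokens):
    """Edit distance as a percentage of the reference length (must be non-empty)."""
    return 100.0 * edit_distance(ref_tokens, hyp_tokens) / len(ref_tokens)


def wer(ref, hyp):
    """Word error rate: tokens are whitespace-separated words."""
    return error_rate(ref.split(), hyp.split())


def cer(ref, hyp):
    """Character error rate: tokens are characters, spaces ignored."""
    return error_rate(list(ref.replace(" ", "")), list(hyp.replace(" ", "")))


if __name__ == "__main__":
    print(round(wer("the cat sat on the mat", "the cat sit on mat"), 2))
    print(cer("你好世界", "你好视界"))
```

In the first example one substitution and one deletion against six reference words give a WER of about 33.33%. In the second, one wrong character out of four gives a CER of 25%.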



