https://github.com/bagustris/speechain
Toolkit for ASR, TTS, and both
Repository
Basic Info
- Host: GitHub
- Owner: bagustris
- License: apache-2.0
- Language: Python
- Default Branch: main
- Homepage: https://bagustris.github.io/speechain
- Size: 15.6 MB
Statistics
- Stars: 1
- Watchers: 0
- Forks: 0
- Open Issues: 2
- Releases: 0
Fork of heli-qi/SpeeChain
Created almost 2 years ago
· Last pushed about 1 year ago
# SpeeChain: A PyTorch-based Machine Speech Chain Toolkit for ASR and TTS
_SpeeChain_ is an open-source PyTorch-based speech and language processing toolkit produced by the [_AHC lab_](https://ahcweb01.naist.jp/en/) at Nara Institute of Science and Technology (NAIST).
This toolkit is designed to simplify research pipelines for the machine speech chain,
i.e., the joint model of automatic speech recognition (ASR) and text-to-speech synthesis (TTS).
_SpeeChain is currently in beta._ Contributions to this toolkit are warmly welcome at any time!
If you find our toolkit helpful for your research, we sincerely hope that you can give us a star!
If you encounter problems while using our toolkit, please don't hesitate to open an issue!
## Table of Contents
1. [**Machine Speech Chain**](https://github.com/bagustris/SpeeChain#machine-speech-chain)
2. [**Toolkit Characteristics**](https://github.com/bagustris/SpeeChain#toolkit-characteristics)
3. [**Get a Quick Start**](https://github.com/bagustris/SpeeChain#get-a-quick-start)
## Machine Speech Chain
* Offline TTS→ASR Chain
## Toolkit Characteristics
Below are the most important features of SpeeChain. For more detail, see the [DeepWiki](https://deepwiki.com/bagustris/speechain), which is generated by AI (Devin).
* **Data Processing:**
    * On-the-fly Log-Mel Spectrogram Extraction
    * On-the-fly SpecAugment
    * On-the-fly Feature Normalization
* **Model Training:**
    * Multi-GPU Model Distribution based on _torch.nn.parallel.DistributedDataParallel_
    * Real-time status reporting by online _Tensorboard_ and offline _Matplotlib_
    * Real-time learning dynamics visualization (attention visualization, spectrogram visualization)
* **Data Loading:**
    * On-the-fly mixture of multiple datasets in a single dataloader
    * On-the-fly data selection for each dataloader to filter out undesired data samples
    * Multi-dataloader batch generation to form training batches from multiple datasets
* **Optimization:**
    * Model training can use multiple optimizers, each responsible for a specific part of the model parameters
    * Gradient accumulation to approximate large-batch gradients from several small batches
    * Easy-to-set finetuning factor to scale down the learning rates without modifying the scheduler configuration
* **Model Evaluation:**
    * Multi-level _.md_ evaluation reports (overall-level, group-level, and sample-level) without any layout misplacement
    * Histogram visualization for the distribution of evaluation metrics
    * TopN bad case analysis for better model diagnosis
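
To make the gradient accumulation feature above more concrete, here is a minimal, framework-free sketch of the arithmetic behind it. It is not SpeeChain's implementation: the function names (`mse_gradient`, `accumulated_gradient`), the toy linear model, and the data are all hypothetical and chosen only for illustration. It shows that averaging the gradients of equal-sized small batches reproduces the gradient of one large batch.

```python
# Illustrative sketch only; names and data are hypothetical, not SpeeChain APIs.

def mse_gradient(w, b, batch):
    """Gradient of the mean squared error of y_hat = w * x + b over (x, y) pairs."""
    n = len(batch)
    grad_w = sum(2.0 * (w * x + b - y) * x for x, y in batch) / n
    grad_b = sum(2.0 * (w * x + b - y) for x, y in batch) / n
    return grad_w, grad_b


def accumulated_gradient(w, b, data, micro_batch_size):
    """Accumulate gradients over equal-sized micro-batches before a single update."""
    assert len(data) % micro_batch_size == 0, "sketch assumes equal-sized micro-batches"
    micro_batches = [
        data[i:i + micro_batch_size] for i in range(0, len(data), micro_batch_size)
    ]
    accum_steps = len(micro_batches)
    acc_w, acc_b = 0.0, 0.0
    for batch in micro_batches:
        g_w, g_b = mse_gradient(w, b, batch)
        # Scale each contribution by 1 / accum_steps, analogous to dividing
        # the loss by the number of accumulation steps before backpropagation.
        acc_w += g_w / accum_steps
        acc_b += g_b / accum_steps
    return acc_w, acc_b


# Toy data drawn from y = 2x + 1.
data = [(float(x), 2.0 * x + 1.0) for x in range(8)]
full = mse_gradient(0.5, 0.0, data)
accumulated = accumulated_gradient(0.5, 0.0, data, micro_batch_size=2)
print("large batch:", full)
print("accumulated:", accumulated)
```

The equivalence holds exactly only when every micro-batch has the same size and the loss is a per-sample mean; with uneven batches, each contribution would need to be weighted by its sample count.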
[Back to the table of contents](https://github.com/bagustris/SpeeChain#table-of-contents)
## Get a Quick Start
The simplest recipe is Mini LibriSpeech. It takes about 2 hours to train a model on a single GPU.
We recommend installing *Anaconda* on your machine before using our toolkit.
After installing *Anaconda*, please follow the steps below to deploy our toolkit on your machine:
1. Find a location with enough disk space (e.g., at least 500GB if you want to use the _LibriSpeech_ or _LibriTTS_ datasets).
2. Clone our toolkit with `git clone https://github.com/bagustris/SpeeChain.git`.
3. Go to the root path of our toolkit with `cd SpeeChain`.
4. Run `source envir_preparation.sh` to build the environment for the SpeeChain toolkit.
   After execution, a virtual environment named `speechain` will be created and two environment variables, `SPEECHAIN_ROOT` and `SPEECHAIN_PYTHON`, will be initialized in your `~/.bashrc`.
   **Note:** The script must be executed from the root path `SpeeChain` with the `source` command rather than as `./envir_preparation.sh`.
5. Run `conda activate speechain` in your terminal to verify that the Conda environment was installed.
   If the `speechain` environment is not activated successfully, run `conda env create -f environment.yaml`, `conda activate speechain`, and `pip install -e ./` to install it manually.
6. Run `echo ${SPEECHAIN_ROOT}` and `echo ${SPEECHAIN_PYTHON}` in your terminal to check the environment variables.
   If either one is empty, add it to your `~/.bashrc` manually with `export SPEECHAIN_ROOT=xxx` or `export SPEECHAIN_PYTHON=xxx`, then apply the change with `source ~/.bashrc`.
    1. `SPEECHAIN_ROOT` should be the absolute path of the `SpeeChain` folder you have just cloned (i.e. `/xxx/SpeeChain`, where `/xxx/` is the parent directory);
    2. `SPEECHAIN_PYTHON` should be the absolute path of the Python interpreter in the `speechain` environment folder (i.e. `/xxx/anaconda3/envs/speechain/bin/python3.X`, where `/xxx/` is where your `anaconda3` is placed and `X` depends on `environment.yaml`).
7. Read the [handbook](https://github.com/bagustris/SpeeChain/blob/main/handbook.md#speechain-handbook) and start your journey in SpeeChain!
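
As a complement to the `echo` checks in step 6, the short Python sketch below reports whether the two environment variables are set, absolute, and pointing at existing paths. The helper name `check_speechain_env` is hypothetical and not part of the toolkit; only the variable names `SPEECHAIN_ROOT` and `SPEECHAIN_PYTHON` come from the steps above.

```python
import os
from pathlib import Path


def check_speechain_env(environ=None):
    """Return a dict mapping each variable name to a short status string.

    `environ` defaults to the current process environment; passing a plain
    dict makes the helper easy to try out without touching your shell.
    """
    environ = os.environ if environ is None else environ
    report = {}
    for name in ("SPEECHAIN_ROOT", "SPEECHAIN_PYTHON"):
        value = environ.get(name, "")
        if not value:
            report[name] = "unset"
        elif not Path(value).is_absolute():
            report[name] = "not an absolute path"
        elif not Path(value).exists():
            report[name] = "path does not exist"
        else:
            report[name] = "ok"
    return report


if __name__ == "__main__":
    for name, status in check_speechain_env().items():
        print(f"{name}: {status}")
```

Run it from a new terminal after `source ~/.bashrc`; any status other than `ok` suggests revisiting step 6.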
## Citation
If you use this toolkit, please cite the reference below:
> Qi, H., Novitasari, S., Tjandra, A., Sakti, S., & Nakamura, S. (2023). SpeeChain: A Speech Toolkit for Large-Scale Machine Speech Chain. http://arxiv.org/abs/2301.02966
Owner
- Name: Bagus Tris Atmaja
- Login: bagustris
- Kind: user
- Location: Tsukuba
- Company: AIST
- Website: http://www.bagustris.blogspot.com
- Twitter: btatmaja
- Repositories: 221
- Profile: https://github.com/bagustris
Researcher @aistairc @VibrasticLab
GitHub Events
Total
- Issues event: 3
- Watch event: 2
- Issue comment event: 5
- Push event: 93
- Pull request review comment event: 10
- Pull request review event: 6
- Pull request event: 6
- Fork event: 1
- Create event: 3