tacotron-cli

Command-line interface to train Tacotron 2 using .wav <=> .TextGrid pairs.

https://github.com/stefantaubert/tacotron-cli

Science Score: 67.0%

This score indicates how likely this project is to be science-related based on various indicators:

✓ CITATION.cff file: Found CITATION.cff file
✓ codemeta.json file: Found codemeta.json file
✓ .zenodo.json file: Found .zenodo.json file
✓ DOI references: Found 15 DOI reference(s) in README
✓ Academic publication links: Links to ieee.org, zenodo.org
○ Academic email domains
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: Low similarity (9.6%) to scientific vocabulary

Keywords

linguistics speech speech-synthesis tts

Last synced: 6 months ago

Repository

Command-line interface to train Tacotron 2 using .wav <=> .TextGrid pairs.

Basic Info

Host: GitHub
Owner: stefantaubert
License: mit
Language: Python
Default Branch: master
Homepage:
Size: 1.33 MB

Statistics

Stars: 6
Watchers: 1
Forks: 2
Open Issues: 0
Releases: 5

Created almost 5 years ago · Last pushed almost 2 years ago

Metadata Files

Readme, Changelog, Contributing, License, Code of conduct, Citation

tacotron-cli

Command-line interface (CLI) to train Tacotron 2 using .wav <=> .TextGrid pairs.

Features

train phoneme stress separately (ARPAbet/IPA)
train phoneme tone separately (IPA)
train phoneme duration separately (IPA)
train single/multi-speaker
train/synthesize on CPU or GPU
synthesis of paragraphs
copy embeddings from one checkpoint to another
train using embeddings or one-hot encodings
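
As an illustration of what treating stress, tone, and duration as separate properties could look like, the sketch below splits an IPA symbol into its base phoneme and its marks. It is not taken from tacotron-cli: the function name `split_symbol` is hypothetical, and the mark sets are assumptions based on the tone and duration characters listed under Training plus the two IPA stress marks.

```python
from typing import Dict

# Assumed mark sets (not read from tacotron-cli itself).
STRESS_MARKS = {"ˈ", "ˌ"}
TONE_MARKS = set("˥˦˧˨˩")
DURATION_MARKS = set("˘ˑː")


def split_symbol(symbol: str) -> Dict[str, str]:
    """Separate a symbol into base phoneme, stress, tone, and duration marks."""
    parts = {"phoneme": "", "stress": "", "tone": "", "duration": ""}
    for char in symbol:
        if char in STRESS_MARKS:
            parts["stress"] += char
        elif char in TONE_MARKS:
            parts["tone"] += char
        elif char in DURATION_MARKS:
            parts["duration"] += char
        else:
            # Everything else (including tie bars) stays with the phoneme.
            parts["phoneme"] += char
    return parts


for example in ["ˈɛ", "a˥˩", "ʌː"]:
    print(example, split_symbol(example))
```

With such a split, a model could in principle learn one representation for the phoneme and separate ones for each mark instead of a distinct symbol for every combination.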

Installation

pip install tacotron-cli --user

Usage

```txt
usage: tacotron-cli [-h] [-v] {create-mels,train,continue-train,validate,synthesize,synthesize-grids,analyze,add-missing-symbols} ...

Command-line interface (CLI) to train Tacotron 2 using .wav <=> .TextGrid pairs.

positional arguments:
  {create-mels,train,continue-train,validate,synthesize,synthesize-grids,analyze,add-missing-symbols}
                        description
    create-mels         create mel-spectrograms from audio files
    train               start training
    continue-train      continue training from a checkpoint
    validate            validate checkpoint(s)
    synthesize          synthesize lines from a file
    synthesize-grids    synthesize .TextGrid files
    analyze             analyze checkpoint
    add-missing-symbols
                        copy missing symbols from one checkpoint to another

options:
  -h, --help            show this help message and exit
  -v, --version         show program's version number and exit
```

Training

The dataset structure needs to follow the generic format of speech-dataset-parser, i.e., each TextGrid needs to contain a tier in which all phonemes are separated into single intervals, e.g., T|h|i|s| |i|s| |a| |t|e|x|t|.

Tips:

place the stress mark directly on the vowel of the syllable, e.g., b|ˈo|d|i instead of ˈb|o|d|i (body)
place the tone mark directly on the vowel of the syllable, e.g., ʈʂʰ|w|a˥˩|n instead of ʈʂʰ|w|a|n˥˩ (串)
- tone-characters which are considered: ˥ ˦ ˧ ˨ ˩, e.g., ɑ˥˩
duration-characters which are considered: ˘ ˑ ː, e.g., ʌː
normalize the text, e.g., numbers should be written out
substitute each space with SIL0, SIL1, or SIL2, depending on the duration of the pause
- use SIL0 for no pause
- use SIL1 for a short pause, for example after a comma: ...|v|i|ˈɛ|n|ʌ|,|SIL1|ˈɔ|s|t|ɹ|i|ʌ|...
- use SIL2 for a longer pause, for example after a sentence: ...|ˈɝ|θ|.|SIL2
Note: only phonemes that occur in the TextGrids (on the selected tier) can be synthesized
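
The pause-marker tip can be sketched as a small helper. This is only an illustration under stated assumptions: `insert_pause_markers` is a hypothetical name, not part of tacotron-cli, and the rule "comma gives SIL1, full stop gives SIL2, everything else gives SIL0" is inferred from the examples above rather than a fixed specification. In real data the marker should reflect the actual pause duration.

```python
from typing import List

# Assumed mapping from the preceding punctuation to the pause marker.
SHORT_PAUSE_AFTER = {","}
LONG_PAUSE_AFTER = {"."}


def insert_pause_markers(symbols: List[str]) -> List[str]:
    """Replace each space with SIL0, SIL1, or SIL2 based on the preceding symbol."""
    result: List[str] = []
    for symbol in symbols:
        if symbol != " ":
            result.append(symbol)
            continue
        previous = result[-1] if result else ""
        if previous in SHORT_PAUSE_AFTER:
            result.append("SIL1")
        elif previous in LONG_PAUSE_AFTER:
            result.append("SIL2")
        else:
            result.append("SIL0")
    # A sentence-final full stop is followed by a long pause.
    if result and result[-1] in LONG_PAUSE_AFTER:
        result.append("SIL2")
    return result


fragment = ["ˈʌ", "p", ",", " ", "ɪ", "t", " ", "ˈɔ", "l", "s", "oʊ", "."]
print("|".join(insert_pause_markers(fragment)))
```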

Synthesis

To prepare a text for synthesis, consider the following:

each line in the text file will be synthesized as a single file, therefore it is recommended to place each sentence onto a single line
paragraphs can be separated by a blank line
symbols can be separated by a separator such as |, e.g., s|ˌɪ|ɡ|ɝ|ˈɛ|t
- this is useful if the model contains phonemes/symbols that consist of multiple characters, e.g., ˈɛ
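
The layout rules above (one utterance per line, blank lines between paragraphs, an optional separator between symbols) can be sketched as a small parser. This is an assumption-laden illustration, not the parser used by tacotron-cli; the name `parse_synthesis_text` is hypothetical.

```python
from typing import List


def parse_synthesis_text(text: str, sep: str = "|") -> List[List[List[str]]]:
    """Return paragraphs, each a list of lines, each a list of symbols."""
    paragraphs: List[List[List[str]]] = []
    current: List[List[str]] = []
    for line in text.splitlines():
        if line.strip() == "":
            # A blank line closes the current paragraph.
            if current:
                paragraphs.append(current)
                current = []
            continue
        current.append(line.strip().split(sep))
    if current:
        paragraphs.append(current)
    return paragraphs


example = "s|ˌɪ|ɡ|ɝ|ˈɛ|t\nð|ʌ\n\nɪ|t"
for paragraph_no, paragraph in enumerate(parse_synthesis_text(example), start=1):
    for line_no, symbols in enumerate(paragraph, start=1):
        print(paragraph_no, line_no, symbols)
```

Splitting on the separator keeps multi-character symbols such as ˈɛ intact, which would not be possible when splitting a line character by character.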

Example valid sentence: "As the overlying plate lifts up, it also forms mountain ranges." => ˈæ|z|SIL0|ð|ʌ|SIL0|ˌoʊ|v|ɝ|l|ˈaɪ|ɪ|ŋ|SIL0|p|l|ˈeɪ|t|SIL0|l|ˈɪ|f|t|s|SIL0|ˈʌ|p|,|SIL1|ɪ|t|SIL0|ˈɔ|l|s|oʊ|SIL0|f|ˈɔ|ɹ|m|z|SIL0|m|ˈaʊ|n|t|ʌ|n|SIL0|ɹ|ˈeɪ|n|d͡ʒ|ʌ|z|.|SIL2

Example invalid sentence: "Digestion is a vital process which involves the breakdown of food into smaller and smaller components, until they can be absorbed and assimilated into the body." => daɪˈʤɛsʧʌn ɪz ʌ ˈvaɪtʌl ˈpɹɑˌsɛs wɪʧ ɪnˈvɑlvz ðʌ ˈbɹeɪkˌdaʊn ʌv fud ˈɪntu ˈsmɔlɝ ænd ˈsmɔlɝ kʌmˈpoʊnʌnts, ʌnˈtɪl ðeɪ kæn bi ʌbˈzɔɹbd ænd ʌˈsɪmʌˌleɪtɪd ˈɪntu ðʌ ˈbɑdi.
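
A quick way to check a line before synthesis is to compare its symbols against the symbols the model was trained on. The sketch below is a hypothetical helper (`find_unknown_symbols` is not part of tacotron-cli), and the symbol set is a made-up excerpt. It also shows one plausible reason, stated here as an assumption, why an unseparated transcription is invalid: with | as separator, the whole line is read as a single unknown symbol.

```python
from typing import List, Set


def find_unknown_symbols(line: str, known_symbols: Set[str], sep: str = "|") -> List[str]:
    """Return the symbols of a line that are missing from the known symbol set."""
    unknown: List[str] = []
    for symbol in line.split(sep):
        if symbol not in known_symbols and symbol not in unknown:
            unknown.append(symbol)
    return unknown


# Hypothetical excerpt of a model's symbol set.
known = {"ð", "ʌ", "b", "ˈɑ", "d", "i", ".", "SIL0", "SIL2"}

print(find_unknown_symbols("ð|ʌ|SIL0|b|ˈɑ|d|i|.|SIL2", known))
print(find_unknown_symbols("ðʌ ˈbɑdi.", known))
```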

Pretrained Models

Audio Example

"The North Wind and the Sun were disputing which was the stronger, when a traveler came along wrapped in a warm cloak." Listen here (headphones recommended)

Example Synthesis

To reproduce the audio example from above, you can use the following commands:

```sh
# Create example directory
mkdir ~/example

# Download pre-trained Tacotron model checkpoint
wget https://tuc.cloud/index.php/s/xxFCDMgEk8dZKbp/download/LJS-IPA-101500.pt -O ~/example/checkpoint-tacotron.pt

# Download pre-trained Waveglow model checkpoint
wget https://tuc.cloud/index.php/s/yBRaWz5oHrFwigf/download/LJS-v3-580000.pt -O ~/example/checkpoint-waveglow.pt

# Create text containing phonetic transcription of: "The North Wind and the Sun were disputing which was the stronger, when a traveler came along wrapped in a warm cloak."
cat > ~/example/text.txt << EOF
ð|ʌ|SIL0|n|ˈɔ|ɹ|θ|SIL0|w|ˈɪ|n|d|SIL0|ˈæ|n|d|SIL0|ð|ʌ|SIL0|s|ˈʌ|n|SIL0|w|ɝ|SIL0|d|ɪ|s|p|j|ˈu|t|ɪ|ŋ|SIL0|h|w|ˈɪ|t͡ʃ|SIL0|w|ˈɑ|z|SIL0|ð|ʌ|SIL0|s|t|ɹ|ˈɔ|ŋ|ɝ|,|SIL1|h|w|ˈɛ|n|SIL0|ʌ|SIL0|t|ɹ|ˈæ|v|ʌ|l|ɝ|SIL0|k|ˈeɪ|m|SIL0|ʌ|l|ˈɔ|ŋ|SIL0|ɹ|ˈæ|p|t|SIL0|ɪ|n|SIL0|ʌ|SIL0|w|ˈɔ|ɹ|m|SIL0|k|l|ˈoʊ|k|.|SIL2
EOF

# Synthesize text to mel-spectrogram
tacotron-cli synthesize \
  ~/example/checkpoint-tacotron.pt \
  ~/example/text.txt \
  --sep "|"

# Install waveglow-cli for synthesis of mel-spectrograms
pip install waveglow-cli --user

# Synthesize mel-spectrogram to wav
waveglow-cli synthesize \
  ~/example/checkpoint-waveglow.pt \
  ~/example/text -o

# Resulting wav is written to: ~/example/text/1-1.npy.wav
```
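
The two synthesis steps above can also be driven from Python. The following is a minimal sketch, assuming both CLIs are installed and the checkpoints and text file already exist in the example directory; `build_commands` and `run_pipeline` are hypothetical helper names, and only the arguments shown in the shell example are used.

```python
import subprocess
from pathlib import Path
from typing import List


def build_commands(example_dir: Path) -> List[List[str]]:
    """Build the argument lists for the two synthesis steps of the example."""
    return [
        # Step 1: text -> mel-spectrogram
        [
            "tacotron-cli", "synthesize",
            str(example_dir / "checkpoint-tacotron.pt"),
            str(example_dir / "text.txt"),
            "--sep", "|",
        ],
        # Step 2: mel-spectrogram -> wav
        [
            "waveglow-cli", "synthesize",
            str(example_dir / "checkpoint-waveglow.pt"),
            str(example_dir / "text"),
            "-o",
        ],
    ]


def run_pipeline(example_dir: Path) -> None:
    """Run both steps in order and stop at the first failing command."""
    for command in build_commands(example_dir):
        subprocess.run(command, check=True)
```

Calling `run_pipeline(Path.home() / "example")` would then execute both commands. Passing the arguments as a list avoids shell quoting issues with the `|` separator.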

Roadmap

Outsource method to convert audio files to mel-spectrograms before training
Better logging
Provide more pre-trained models
Adding tests

Development setup

```sh
# update
sudo apt update

# install Python 3.8-3.11 for ensuring that tests can be run
sudo apt install python3-pip \
  python3.8 python3.8-dev python3.8-distutils python3.8-venv \
  python3.9 python3.9-dev python3.9-distutils python3.9-venv \
  python3.10 python3.10-dev python3.10-distutils python3.10-venv \
  python3.11 python3.11-dev python3.11-distutils python3.11-venv

# install pipenv for creation of virtual environments
python3.8 -m pip install pipenv --user

# check out repo
git clone https://github.com/stefantaubert/tacotron.git
cd tacotron

# create virtual environment
python3.8 -m pipenv install --dev
```

Running the tests

```sh
# first install the tool like in "Development setup"
# then, navigate into the directory of the repo (if not already done)
cd tacotron

# activate environment
python3.8 -m pipenv shell

# run tests
tox
```

Final lines of test result output:

py38: commands succeeded
py39: commands succeeded
py310: commands succeeded
py311: commands succeeded
congratulations :)

License

MIT License

Acknowledgments

Model code adapted from Nvidia.

Papers:

Funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) – Project-ID 416228727 – CRC 1410

Citation

If you want to cite this repo, you can use the BibTeX-entry generated by GitHub (see About => Cite this repository).

Taubert, S. (2024). tacotron-cli (Version 0.0.5) [Computer software]. https://doi.org/10.5281/zenodo.10568731

Cited by

Taubert, S., Sternkopf, J., Kahl, S., & Eibl, M. (2022). A Comparison of Text Selection Algorithms for Sequence-to-Sequence Neural TTS. 2022 IEEE International Conference on Signal Processing, Communications and Computing (ICSPCC), 1–6. https://doi.org/10.1109/ICSPCC55723.2022.9984283
Albrecht, S., Tamboli, R., Taubert, S., Eibl, M., Rey, G. D., & Schmied, J. (2022). Towards a Vowel Formant Based Quality Metric for Text-to-Speech Systems: Measuring Monophthong Naturalness. 2022 IEEE 9th International Conference on Computational Intelligence and Virtual Environments for Measurement Systems and Applications (CIVEMSA), 1–6. https://doi.org/10.1109/CIVEMSA53371.2022.9853712

Owner

Name: Stefan Taubert
Login: stefantaubert
Kind: user
Location: Chemnitz, Germany
Company: Chemnitz University of Technology

Website: https://stefantaubert.com
Twitter: Stefan_Taubert
Repositories: 75
Profile: https://github.com/stefantaubert

Currently I am working on my PhD about the topic of speech synthesis at Chemnitz University of Technology.

Citation (CITATION.cff)

cff-version: 1.2.0
title: tacotron-cli
abstract: Command-line interface (CLI) to train Tacotron 2 using .wav <=> .TextGrid pairs.
message: >-
  If you use this software, please cite it using the
  metadata from this file.
type: software
authors:
  - email: github@stefantaubert.com
    given-names: Stefan
    family-names: Taubert
    affiliation: Chemnitz University of Technology
    orcid: 'https://orcid.org/0000-0002-4932-2874'
    website: 'https://stefantaubert.com/'
version: 0.0.5
date-released: 2024-01-25
license: MIT
url: https://github.com/stefantaubert/tacotron
doi: 10.5281/zenodo.10568731

Issues and Pull Requests

Last synced: 8 months ago

All Time

Total issues: 3
Total pull requests: 1
Average time to close issues: 7 months
Average time to close pull requests: 4 days
Total issue authors: 3
Total pull request authors: 1
Average comments per issue: 1.0
Average comments per pull request: 0.0
Merged pull requests: 1
Bot issues: 0
Bot pull requests: 0

Past Year

Issues: 0
Pull requests: 0
Average time to close issues: N/A
Average time to close pull requests: N/A
Issue authors: 0
Pull request authors: 0
Average comments per issue: 0
Average comments per pull request: 0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0

Top Authors

Issue Authors

stefantaubert (1)

Pull Request Authors

jasminsternkopf (1)

Top Labels

Issue Labels

enhancement (1)

Pull Request Labels

Dependencies

Pipfile pypi

autoflake * develop
autopep8 * develop
isort * develop
pycodestyle * develop
pylint * develop
pytest * develop
rope * develop
tacotron * develop
tox * develop
twine * develop
librosa *
matplotlib *
mel-cepstral-distance >=0.0.1
numpy *
ordered-set >=4.1.0
pandas *
plotly *
scikit-image *
scikit-learn *
scipy *
speech-dataset-parser >=0.0.1
torch *
tqdm *

Pipfile.lock pypi

137 dependencies

pyproject.toml pypi

librosa *
matplotlib *
mel-cepstral-distance >=0.0.2
numpy *
ordered_set >=4.1.0
pandas *
plotly *
scikit-image *
scikit-learn *
scipy *
speech-dataset-parser >=0.0.4
torch *
tqdm *
