https://github.com/ai4bharat/indic-tts

Text-to-Speech for languages of India

Science Score: 23.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: arxiv.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (5.7%) to scientific vocabulary
Last synced: 9 months ago

Repository

Text-to-Speech for languages of India

Basic Info
  • Host: GitHub
  • Owner: AI4Bharat
  • License: mit
  • Language: Jupyter Notebook
  • Default Branch: master
  • Homepage:
  • Size: 570 KB
Statistics
  • Stars: 151
  • Watchers: 9
  • Forks: 35
  • Open Issues: 28
  • Releases: 1
Created over 3 years ago · Last pushed over 1 year ago
Metadata Files
  • Readme
  • License

README.md

AI4Bharat Indic-TTS

Towards Building Text-To-Speech Systems for the Next Billion Users

🎉 Accepted at ICASSP 2023

Deep learning based text-to-speech (TTS) systems have been evolving rapidly with advances in model architectures, training methodologies, and generalization across speakers and languages. However, these advances have not been thoroughly investigated for Indian language speech synthesis. Such investigation is computationally expensive given the number and diversity of Indian languages, relatively lower resource availability, and the diverse set of advances in neural TTS that remain untested. In this paper, we evaluate the choice of acoustic models, vocoders, supplementary loss functions, training schedules, and speaker and language diversity for Dravidian and Indo-Aryan languages. Based on this, we identify monolingual models with FastPitch and HiFi-GAN V1, trained jointly on male and female speakers to perform the best. With this setup, we train and evaluate TTS models for 13 languages and find our models to significantly improve upon existing models in all languages as measured by mean opinion scores. We open-source all models on the Bhashini platform.

TL;DR: We open-source SOTA Text-To-Speech models for 13 Indian languages: Assamese, Bengali, Bodo, Gujarati, Hindi, Kannada, Malayalam, Manipuri, Marathi, Odia, Rajasthani, Tamil and Telugu.

Authors: Gokul Karthik Kumar, Praveen S V, Pratyush Kumar, Mitesh M. Khapra, Karthik Nandakumar

[ArXiv Preprint] [Audio Samples] [Try It Live] [Video]

Unified architecture of our TTS system

Results

Setup:

Environment Setup:

```
# 1. Create environment
sudo apt-get install libsndfile1-dev ffmpeg enchant
conda create -n tts-env
conda activate tts-env

# 2. Setup PyTorch
pip3 install -U torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu113

# 3. Setup Trainer
git clone https://github.com/gokulkarthik/Trainer

cd Trainer
pip3 install -e .[all]
cd ..
# [or] patch an existing Trainer installation instead:
#   - copy Trainer/trainer/logging/wandb_logger.py to the local Trainer installation (fixed wandb logger)
#   - copy Trainer/trainer/trainer.py to the local Trainer installation (fixed model.module.test_log and added code to log epoch)
#   - add `gpus = [str(gpu) for gpu in gpus]` in line 53 of trainer/distribute.py

# 4. Setup TTS
git clone https://github.com/gokulkarthik/TTS

cd TTS
pip3 install -e .[all]
cd ..
# [or] patch an existing TTS installation instead:
#   - copy TTS/TTS/bin/synthesize.py to the local TTS installation (added multiple output support for TTS.bin.synthesis)

# 5. Install other requirements
pip3 install -r requirements.txt
```
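After the environment setup, a quick import check can confirm that the packages are visible to the active interpreter. The following is a minimal sketch, not part of the repository: the helper name `missing_modules` is hypothetical, and the module names assume that the Trainer fork installs as `trainer` and the TTS fork as `TTS`.

```python
import importlib.util

# Assumed top-level module names for the packages installed in steps 2-4.
REQUIRED_MODULES = ["torch", "torchaudio", "trainer", "TTS"]


def missing_modules(names):
    """Return the top-level module names that cannot be located."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        # Only import torch once we know it is installed.
        import torch

        print("All modules found; CUDA available:", torch.cuda.is_available())
```

If `torch.cuda.is_available()` reports `False` on a GPU machine, the installed PyTorch wheel may not match the local CUDA driver.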

Data Setup:

  1. Format IndicTTS dataset in LJSpeech format using preprocessing/FormatDatasets.ipynb
  2. Analyze IndicTTS dataset to check TTS suitability using preprocessing/AnalyzeDataset.ipynb

Training Steps:

  1. Set the configuration with main.py, vocoder.py, configs and run.sh. Make sure to update CUDA_VISIBLE_DEVICES in all these files.
  2. Train and test by executing sh run.sh
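`CUDA_VISIBLE_DEVICES` is a comma-separated list of GPU indices read when CUDA is initialized. As a rough illustration of how a script could set it, here is a small sketch; the helper `set_visible_gpus` is hypothetical and not taken from `main.py` or `vocoder.py`.

```python
import os


def set_visible_gpus(gpus):
    """Restrict CUDA to the given GPU indices and return the value that was set."""
    # Environment values must be strings, hence the explicit str() conversion.
    value = ",".join(str(gpu) for gpu in gpus)
    os.environ["CUDA_VISIBLE_DEVICES"] = value
    return value


if __name__ == "__main__":
    # Set this before CUDA is initialized (in practice, before importing torch).
    print(set_visible_gpus([0, 1]))
```

Inside the process, the selected devices are renumbered from zero, so physical GPUs 2 and 3 would appear as `cuda:0` and `cuda:1`.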

Inference:

Trained model weights and config files can be downloaded at this link.

    python3 -m TTS.bin.synthesize --text <TEXT> \
        --model_path <LANG>/fastpitch/best_model.pth \
        --config_path <LANG>/config.json \
        --vocoder_path <LANG>/hifigan/best_model.pth \
        --vocoder_config_path <LANG>/hifigan/config.json \
        --out_path <OUT_PATH>
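When synthesizing many sentences, it can be convenient to assemble the same command from Python. The sketch below assumes the downloaded files follow the `<LANG>/...` layout shown in the command above; the helper names `build_synthesize_command` and `synthesize` are hypothetical, and only the flags shown above are used.

```python
import subprocess
import sys
from pathlib import Path


def build_synthesize_command(text, lang_dir, out_path):
    """Build the argument list for `python -m TTS.bin.synthesize`."""
    lang_dir = Path(lang_dir)
    return [
        sys.executable, "-m", "TTS.bin.synthesize",
        "--text", text,
        "--model_path", str(lang_dir / "fastpitch" / "best_model.pth"),
        "--config_path", str(lang_dir / "config.json"),
        "--vocoder_path", str(lang_dir / "hifigan" / "best_model.pth"),
        "--vocoder_config_path", str(lang_dir / "hifigan" / "config.json"),
        "--out_path", str(out_path),
    ]


def synthesize(text, lang_dir, out_path):
    """Run the synthesis command; raises CalledProcessError if it fails."""
    # Passing a list (no shell) keeps the text free of quoting issues.
    subprocess.run(build_synthesize_command(text, lang_dir, out_path), check=True)
```

For example, `synthesize("<TEXT>", "<LANG>", "<OUT_PATH>")` would run the same invocation as the shell command, with the placeholders replaced by real values.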


Code Reference: https://github.com/coqui-ai/TTS

Owner

  • Name: AI4Bhārat
  • Login: AI4Bharat
  • Kind: organization
  • Email: opensource@ai4bharat.org
  • Location: India

Artificial-Intelligence-For-Bhārat : Building open-source AI solutions for India!

GitHub Events

Total
  • Issues event: 13
  • Watch event: 120
  • Member event: 1
  • Issue comment event: 15
  • Push event: 3
  • Fork event: 33
Last Year
  • Issues event: 13
  • Watch event: 120
  • Member event: 1
  • Issue comment event: 15
  • Push event: 3
  • Fork event: 33

Issues and Pull Requests

Last synced: 10 months ago

All Time
  • Total issues: 12
  • Total pull requests: 0
  • Average time to close issues: 15 minutes
  • Average time to close pull requests: N/A
  • Total issue authors: 11
  • Total pull request authors: 0
  • Average comments per issue: 0.08
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 12
  • Pull requests: 0
  • Average time to close issues: 15 minutes
  • Average time to close pull requests: N/A
  • Issue authors: 11
  • Pull request authors: 0
  • Average comments per issue: 0.08
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • ban1989ban (2)
  • Horopter (1)
  • ProtoSol (1)
  • vigyanabikshu1 (1)
  • rthoke (1)
  • moh1tx (1)
  • iamshreeji-copy2 (1)
  • SafwanGanz (1)
  • RamakrishnaChaitanya (1)
  • Excurl (1)
  • sreerajrenjith (1)
  • anandh-c (1)
  • Rakshith12-pixel (1)
  • sachin7695 (1)
  • AIBappa (1)
Pull Request Authors
  • mysxs (1)
Top Labels
Issue Labels
Pull Request Labels

Dependencies

inference/Dockerfile docker
  • ${BASE_IMAGE} latest build
inference/socket_proxy/front_end/package.json npm
  • http-server ^14.1.1
  • socket.io ^4.5.2
inference/requirements-ml.txt pypi
  • TTS *
  • ai4bharat-transliteration *
  • asteroid *
  • numba ==0.56.2
  • numpy >=1.23.0
  • protobuf ==3.20
inference/requirements-server.txt pypi
  • fastapi *
  • gunicorn *
  • uvicorn *
inference/requirements-utils.txt pypi
  • aksharamukha ==1.9.7
  • ffmpeg-python *
  • indic-numtowords *
  • librosa *
  • nemo-text-processing *
  • nltk *
  • num2words *
  • pyenchant *
  • regex *
  • soundfile *
  • translators *
inference/socket_proxy/requirements.txt pypi
  • TTS *
  • fastapi *
  • gunicorn *
  • uvicorn *
inference/triton_server/azure_ml/environment.yml pypi
requirements.txt pypi
  • TTS *
  • jupyter *
  • librosa *
  • pandas *
  • pytorch_lightning *
  • scikit-learn *
  • seaborn *
  • soundfile *
  • tensorboard *
  • tqdm *
  • wandb *