vits_text_to_speech
Science Score: 44.0%
This score indicates how likely this project is to be science-related, based on the following indicators:
- ✓ CITATION.cff file (found)
- ✓ codemeta.json file (found)
- ✓ .zenodo.json file (found)
- ○ DOI references (not found)
- ○ Academic publication links (not found)
- ○ Academic email domains (not found)
- ○ Institutional organization owner (not found)
- ○ JOSS paper metadata (not found)
- ○ Scientific vocabulary similarity: low similarity (12.8%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: Ishank56
- License: mpl-2.0
- Language: Python
- Default Branch: master
- Size: 133 MB
Statistics
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
Colab link for training the Marathi model:
https://colab.research.google.com/drive/10T9VFTJ5uCg679Dg7IBQCIr7EQtMn_ZN?usp=sharing
Dataset used (VCTK format):
https://www.openslr.org/103/
Training VITS for Hindi TTS
Introduction
This repository contains code for training a Text-to-Speech (TTS) model for the Hindi language using the VITS architecture, which is known for its high-quality speech synthesis.
Installation
To get started with training the Hindi TTS model, follow these steps:
1. Clone the repository and install it in editable mode:
```shell
git clone https://github.com/Ishank56/vits_using_coqui.tts.git
cd vits_using_coqui.tts
pip install -e .
```
2. Ensure that the Hindi dataset is available inside the Dataset folder. The data should be formatted like the LJSpeech-1.1 dataset for compatibility. Dataset I used for training: https://keithito.com/LJ-Speech-Dataset/
3. Install all libraries required for phonemizing Hindi text; the espeak library is particularly useful for this purpose. When using the VITS model, config.json needs to be set accordingly for your files.
4. Adjust the parameters in the code to your requirements; the remaining parameters should already be set for the Hindi dataset.
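The config.json adjustments from step 3 can be sketched as a small script. This is a minimal sketch, not the repository's own tooling: the keys `use_phonemes`, `phoneme_language`, and `datasets` are assumed from the usual Coqui TTS config schema, so check the keys in your actual config.json before applying.

```python
import json

def update_config(path, dataset_path, language="hi"):
    """Point a Coqui-style TTS config at a Hindi dataset (key names assumed)."""
    with open(path, encoding="utf-8") as f:
        cfg = json.load(f)
    cfg["use_phonemes"] = True          # phonemize Hindi text (e.g. via espeak)
    cfg["phoneme_language"] = language  # espeak language code for Hindi
    cfg["datasets"] = [{
        "formatter": "ljspeech",        # LJSpeech-style metadata.csv layout
        "path": dataset_path,
    }]
    with open(path, "w", encoding="utf-8") as f:
        json.dump(cfg, f, indent=2, ensure_ascii=False)
    return cfg
```

Run it once against the generated config before launching training, e.g. `update_config("config.json", "Dataset/")`.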
Dataset
The dataset provided in the Dataset folder contains text data and corresponding WAV files in Hindi. It follows the same format as the LJSpeech-1.1 dataset for consistency. Ensure that the dataset is properly formatted and organized before proceeding with training.
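Following the LJSpeech-1.1 layout means a `metadata.csv` whose pipe-delimited rows map a WAV id to its transcription. The loader below is a minimal sketch for checking that layout before training; the helper name and the choice to prefer the last (normalized) column are mine, not from this repository.

```python
import csv
from pathlib import Path

# LJSpeech-1.1 layout: <root>/wavs/<id>.wav plus <root>/metadata.csv with rows
#   <id>|<raw transcription>|<normalized transcription>
def load_metadata(root):
    rows = []
    with open(Path(root) / "metadata.csv", encoding="utf-8", newline="") as f:
        for parts in csv.reader(f, delimiter="|", quoting=csv.QUOTE_NONE):
            if len(parts) < 2:
                raise ValueError(f"malformed metadata row: {parts!r}")
            wav_id, text = parts[0], parts[-1]  # use normalized text when present
            rows.append((Path(root) / "wavs" / f"{wav_id}.wav", text))
    return rows
```

Calling `load_metadata("Dataset")` returns (wav path, text) pairs, which makes it easy to assert that every referenced WAV file actually exists before starting a long training run.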
Training
To train the TTS model for Hindi, run the train.py file. It contains the code for training the model with the VITS architecture. Make sure all dependencies are installed and the dataset is properly configured before initiating training.
```shell
nvidia-smi  # check the available GPUs
CUDA_VISIBLE_DEVICES="5" python train.py  # select the GPU and run the training script
```
Inference
```shell
pip install TTS
tts --text "यह अपनत्व और उत्कर्ष गुलज़ार की पूरी ज़िदगी और उनके अनेक अन्य कार्यों में आसानी से लक्षित हो जा सकती है." \
    --model_path path/to/model.pth \
    --config_path path/to/config.json \
    --out_path folder/to/save/output.wav
```
or
```shell
python testing.py
```
Contributing
The project is working and generates good synthesized output. Contributions are welcome! If you have any improvements or suggestions, feel free to open an issue or submit a pull request.
License
This project is licensed under MPL-2.0, inherited from Coqui TTS. Credits to https://github.com/coqui-ai/TTS and the dataset contributors mentioned above.
Owner
- Login: Ishank56
- Kind: user
- Repositories: 1
- Profile: https://github.com/Ishank56
Citation (CITATION.cff)
cff-version: 1.2.0
message: "If you want to cite 🐸💬, feel free to use this (but only if you loved it 😊)"
title: "Coqui TTS"
abstract: "A deep learning toolkit for Text-to-Speech, battle-tested in research and production"
date-released: 2021-01-01
authors:
- family-names: "Eren"
given-names: "Gölge"
- name: "The Coqui TTS Team"
version: 1.4
doi: 10.5281/zenodo.6334862
license: "MPL-2.0"
url: "https://www.coqui.ai"
repository-code: "https://github.com/coqui-ai/TTS"
keywords:
- machine learning
- deep learning
- artificial intelligence
- text to speech
- TTS
Dependencies
- actions/checkout v2, v3 composite
- actions/setup-python v2, v4 composite
- actions/download-artifact v2 composite
- actions/upload-artifact v2 composite
- docker/build-push-action v2 composite
- docker/login-action v1 composite
- docker/setup-buildx-action v1 composite
- docker/setup-qemu-action v1 composite
- ${BASE} latest build
- ubuntu 22.04 build
- faster_whisper ==0.9.0
- gradio ==4.7.1
- numpy >=1.17.0
- umap-learn *
- furo *
- linkify-it-py *
- myst-parser ==2.0.0
- sphinx ==7.2.5
- sphinx_copybutton *
- sphinx_inline_tabs *
- black * development
- coverage * development
- isort * development
- nose2 * development
- pylint ==2.10.2 development
- cutlet *
- mecab-python3 ==1.0.6
- unidic-lite ==1.0.8
- bokeh ==1.4.0
- aiohttp >=3.8.1
- anyascii >=0.3.0
- bangla *
- bnnumerizer *
- bnunicodenormalizer *
- coqpit >=0.0.16
- cython >=0.29.30
- einops >=0.6.0
- encodec >=0.1.1
- flask >=2.0.1
- fsspec >=2023.6.0
- g2pkk >=0.1.1
- gruut ==2.2.3
- hangul_romanize *
- inflect >=5.6.0
- jamo *
- jieba *
- librosa >=0.10.0
- matplotlib >=3.7.0
- mutagen ==1.47.0
- nltk *
- num2words *
- numba >=0.57.0
- numba ==0.55.1
- numpy >=1.24.3
- numpy ==1.22.0
- packaging >=23.1
- pandas >=1.4,<2.0
- pypinyin *
- pysbd >=0.3.4
- pyyaml >=6.0
- scikit-learn >=1.3.0
- scipy >=1.11.2
- soundfile >=0.12.0
- spacy >=3
- torch >=2.1
- torchaudio *
- tqdm >=4.64.1
- trainer >=0.0.36
- transformers >=4.33.0
- umap-learn >=0.5.1
- unidecode >=1.3.2