Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (12.7%) to scientific vocabulary
Last synced: 6 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: ktheindifferent
  • License: mpl-2.0
  • Language: Python
  • Default Branch: dev
  • Size: 126 MB
Statistics
  • Stars: 0
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created over 3 years ago · Last pushed 7 months ago
Metadata Files
Readme Contributing License Code of conduct Citation

README.md

SamTTS - Multi-Backend Offline Text-to-Speech API

A unified HTTP REST API providing access to multiple offline Text-to-Speech (TTS) engines through a single interface. Built on top of 🐸Coqui TTS and other leading TTS technologies.

🚀 Features

  • Unified API - Single HTTP interface for multiple TTS backends
  • Offline Operation - No internet connectivity required
  • Multiple Engines - Support for Coqui TTS, eSpeak, eSpeak-NG, MaryTTS, pyttsx3, and Festival
  • Auto-Detection - Automatic detection of available TTS engines
  • Streaming & Batch - Real-time streaming and batch synthesis support
  • Voice & Language Support - Multiple voices and languages per backend
  • Adjustable Parameters - Control speed, pitch, and other speech parameters
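The auto-detection feature can be illustrated with a small sketch that probes `PATH` for each engine's command-line binary. The binary names below and the `PATH`-probing approach are illustrative assumptions, not the project's actual detection logic:

```python
import shutil

# Hypothetical map of backend name -> command-line binary to probe.
# The real project may detect engines differently (e.g. by importing
# Python packages such as pyttsx3 or the TTS library).
ENGINE_BINARIES = {
    "espeak": "espeak",
    "espeak_ng": "espeak-ng",
    "festival": "festival",
    "marytts": "java",  # MaryTTS runs on the JVM
}

def detect_backends():
    """Return a mapping of backend name -> availability on this system."""
    return {name: shutil.which(binary) is not None
            for name, binary in ENGINE_BINARIES.items()}

print(detect_backends())
```

Probing at startup lets the server advertise only working backends from `GET /backends` instead of failing at synthesis time.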

🎯 Quick Start

Start the API Server

```bash
cd multi_tts_api
python -m uvicorn api:app --host 0.0.0.0 --port 8000
```

Synthesize Speech

```bash
curl -X POST http://localhost:8000/synthesize \
  -H "Content-Type: application/json" \
  -d '{"text": "Hello world", "backend": "espeak"}' \
  --output speech.wav
```

📋 Supported TTS Backends

| Backend | Quality | Speed | Languages | Features |
|---------|---------|-------|-----------|----------|
| Coqui TTS | Excellent | Medium | 20+ | Neural models, voice cloning |
| eSpeak | Fair | Very Fast | 50+ | Lightweight, reliable |
| eSpeak-NG | Good | Very Fast | 100+ | Improved quality |
| MaryTTS | Good | Medium | Multiple | Modular, customizable |
| pyttsx3 | Varies | Fast | Varies | Cross-platform wrapper |
| Festival | Good | Medium | Multiple | Highly configurable |

📖 API Documentation

Core Endpoints

List Available Backends

```http
GET /backends
```

Get Backend Information

```http
GET /backends/{backend_id}
```

Synthesize Speech

```http
POST /synthesize
Content-Type: application/json

{
  "text": "Hello, this is a test",
  "backend": "espeak",
  "language": "en",
  "speed": 1.0,
  "pitch": 1.0,
  "format": "wav"
}
```

Streaming Synthesis

```http
POST /synthesize/stream
```

Batch Synthesis

```http
POST /synthesize/batch
```
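For illustration, a streaming response could be consumed chunk by chunk; this sketch assumes the stream endpoint accepts the same JSON body as `/synthesize` and returns chunked audio, which is an assumption rather than something the docs here confirm:

```python
import json
import urllib.request

def stream_synthesize(base_url, text, backend="espeak", chunk_size=4096):
    """Yield raw audio chunks from the (assumed) streaming endpoint."""
    req = urllib.request.Request(
        f"{base_url}/synthesize/stream",
        data=json.dumps({"text": text, "backend": backend}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        # Read fixed-size chunks until the server closes the stream
        while True:
            chunk = resp.read(chunk_size)
            if not chunk:
                break
            yield chunk

# Usage (with the server running):
# with open("stream.wav", "wb") as f:
#     for chunk in stream_synthesize("http://localhost:8000", "Hello!"):
#         f.write(chunk)
```

Reading in fixed-size chunks keeps memory use flat regardless of how long the synthesized audio is.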

For detailed API documentation, see multi_tts_api/README.md.

🛠️ Installation

Prerequisites

Install system dependencies for the TTS backends you want to use:

```bash
# For eSpeak
sudo apt-get install espeak

# For eSpeak-NG
sudo apt-get install espeak-ng

# For Festival
sudo apt-get install festival

# For MaryTTS (requires Java)
sudo apt-get install default-jre
```

Python Dependencies

```bash
# Clone the repository
git clone [your-repo-url]
cd SamTTS

# Install Python dependencies
cd multi_tts_api
pip install -r requirements.txt
```

🎭 Backend Details

Coqui TTS

Based on the original 🐸TTS library with support for:

  • High-performance deep learning models (Tacotron2, Glow-TTS, VITS, YourTTS)
  • Neural vocoders (HiFiGAN, MelGAN, WaveRNN)
  • Multi-speaker synthesis and voice cloning
  • 20+ languages with pretrained models

Other Backends

  • eSpeak/eSpeak-NG: Lightweight, rule-based synthesis
  • MaryTTS: Modular Java-based platform
  • pyttsx3: Cross-platform TTS wrapper
  • Festival: Configurable speech synthesis system

💻 Usage Examples

Python Client

```python
import requests

# Start the API server first:
#   python -m uvicorn multi_tts_api.api:app --host 0.0.0.0 --port 8000

base_url = "http://localhost:8000"

# List available backends
response = requests.get(f"{base_url}/backends")
backends = response.json()
print("Available backends:", backends)

# Synthesize with eSpeak
tts_request = {
    "text": "Hello from SamTTS!",
    "backend": "espeak",
    "language": "en",
    "speed": 1.2,
}

response = requests.post(f"{base_url}/synthesize", json=tts_request)
with open("output.wav", "wb") as f:
    f.write(response.content)
```

Command Line Interface

```bash
# List available backends
curl http://localhost:8000/backends

# Synthesize speech with different backends
curl -X POST http://localhost:8000/synthesize \
  -H "Content-Type: application/json" \
  -d '{"text": "Fast synthesis", "backend": "espeak"}' \
  --output fast.wav

curl -X POST http://localhost:8000/synthesize \
  -H "Content-Type: application/json" \
  -d '{"text": "High quality synthesis", "backend": "coqui"}' \
  --output quality.wav

# Batch synthesis
curl -X POST http://localhost:8000/synthesize/batch \
  -H "Content-Type: application/json" \
  -d '[
        {"text": "First sentence", "backend": "espeak"},
        {"text": "Second sentence", "backend": "festival"}
      ]' \
  --output batch.zip
```

📁 Project Structure

```
├── multi_tts_api/           # Main API application
│   ├── api.py               # FastAPI application and endpoints
│   ├── backend_manager.py   # Backend management and orchestration
│   ├── backends/            # TTS backend implementations
│   │   ├── base.py          # Abstract base class for backends
│   │   ├── coqui.py         # Coqui TTS backend
│   │   ├── espeak.py        # eSpeak backend
│   │   ├── espeak_ng.py     # eSpeak-NG backend
│   │   ├── festival.py      # Festival backend
│   │   ├── marytts.py       # MaryTTS backend
│   │   └── pyttsx3.py       # pyttsx3 backend
│   ├── requirements.txt     # Python dependencies
│   ├── run_server.py        # Server startup script
│   ├── test_api.py          # API test suite
│   └── README.md            # Detailed API documentation
├── TTS/                     # Original Coqui TTS library
└── README.md                # This file
```

🤝 Contributing

Contributions are welcome! Areas for improvement:

  • New TTS backend implementations
  • Performance optimizations
  • Additional audio format support
  • Better error handling and logging
  • Extended language support
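The project structure shows an abstract base class in `multi_tts_api/backends/base.py`, so a new backend presumably subclasses it. Its actual interface is not documented here, so the sketch below is a hypothetical illustration of the pattern; the real method names and signatures may differ:

```python
from abc import ABC, abstractmethod

# Hypothetical sketch of a backend interface; the project's actual base
# class lives in multi_tts_api/backends/base.py and may differ.
class TTSBackend(ABC):
    name: str

    @abstractmethod
    def is_available(self) -> bool:
        """Report whether the engine is usable on this system."""

    @abstractmethod
    def synthesize(self, text: str, language: str = "en",
                   speed: float = 1.0) -> bytes:
        """Return synthesized audio as bytes."""

class EchoBackend(TTSBackend):
    """Toy backend used only to show the subclassing pattern."""
    name = "echo"

    def is_available(self) -> bool:
        return True

    def synthesize(self, text, language="en", speed=1.0):
        return text.encode("utf-8")  # placeholder, not real audio

backend = EchoBackend()
print(backend.name, backend.is_available())
```

Keeping availability checks and synthesis behind one abstract interface is what lets the backend manager treat all engines uniformly.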

📄 License

This project builds upon multiple open-source TTS libraries:

  • Coqui TTS: Mozilla Public License 2.0
  • eSpeak/eSpeak-NG: GPL v3
  • MaryTTS: LGPL v3
  • Festival: Custom license

See individual backend documentation for specific license requirements.

Owner

  • Login: ktheindifferent
  • Kind: user

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you want to cite 🐸💬, feel free to use this (but only if you loved it 😊)"
title: "Coqui TTS"
abstract: "A deep learning toolkit for Text-to-Speech, battle-tested in research and production"
date-released: 2021-01-01
authors:
  - family-names: "Eren"
    given-names: "Gölge"
  - name: "The Coqui TTS Team"
version: 1.4
doi: 10.5281/zenodo.6334862
license: "MPL-2.0"
url: "https://www.coqui.ai"
repository-code: "https://github.com/coqui-ai/TTS"
keywords:
  - machine learning
  - deep learning
  - artificial intelligence
  - text to speech
  - TTS

GitHub Events

Total
  • Delete event: 1
  • Issue comment event: 2
  • Push event: 4
  • Pull request event: 11
  • Create event: 4
Last Year
  • Delete event: 1
  • Issue comment event: 2
  • Push event: 4
  • Pull request event: 11
  • Create event: 4

Dependencies

.github/workflows/aux_tests.yml actions
  • actions/checkout v2 composite
  • coqui-ai/setup-python pip-cache-key-py-ver composite
.github/workflows/data_tests.yml actions
  • actions/checkout v2 composite
  • coqui-ai/setup-python pip-cache-key-py-ver composite
.github/workflows/docker.yaml actions
  • actions/checkout v2 composite
  • docker/build-push-action v2 composite
  • docker/login-action v1 composite
  • docker/setup-buildx-action v1 composite
  • docker/setup-qemu-action v1 composite
.github/workflows/inference_tests.yml actions
  • actions/checkout v2 composite
  • coqui-ai/setup-python pip-cache-key-py-ver composite
.github/workflows/pypi-release.yml actions
  • actions/checkout v2 composite
  • actions/download-artifact v2 composite
  • actions/setup-python v2 composite
  • actions/upload-artifact v2 composite
.github/workflows/style_check.yml actions
  • actions/checkout v2 composite
  • coqui-ai/setup-python pip-cache-key-py-ver composite
.github/workflows/text_tests.yml actions
  • actions/checkout v2 composite
  • coqui-ai/setup-python pip-cache-key-py-ver composite
.github/workflows/tts_tests.yml actions
  • actions/checkout v2 composite
  • coqui-ai/setup-python pip-cache-key-py-ver composite
.github/workflows/vocoder_tests.yml actions
  • actions/checkout v2 composite
  • coqui-ai/setup-python pip-cache-key-py-ver composite
.github/workflows/zoo_tests.yml actions
  • actions/checkout v2 composite
  • coqui-ai/setup-python pip-cache-key-py-ver composite
Dockerfile docker
  • ${BASE} latest build
TTS/encoder/requirements.txt pypi
  • numpy >=1.17.0
  • umap-learn *
docs/requirements.txt pypi
  • furo *
  • linkify-it-py *
  • myst-parser ==0.15.1
  • sphinx ==4.0.2
  • sphinx_copybutton *
  • sphinx_inline_tabs *
requirements.dev.txt pypi
  • black * development
  • coverage * development
  • isort * development
  • nose2 * development
  • pylint ==2.10.2 development
requirements.notebooks.txt pypi
  • bokeh ==1.4.0
requirements.txt pypi
  • anyascii *
  • coqpit >=0.0.16
  • cython ==0.29.28
  • flask *
  • fsspec >=2021.04.0
  • g2pkk >=0.1.1
  • gruut ==2.2.3
  • gunicorn ==20.1.0
  • inflect ==5.6.0
  • jamo *
  • jieba *
  • librosa ==0.8.0
  • matplotlib *
  • mecab-python3 ==1.0.5
  • nltk *
  • numba ==0.55.2
  • numba ==0.55.1
  • numpy ==1.22.4
  • numpy ==1.21.6
  • pandas *
  • pypinyin *
  • pysbd *
  • pyyaml *
  • scipy >=1.4.0
  • soundfile *
  • torch >=1.7
  • torchaudio *
  • tqdm *
  • trainer *
  • umap-learn ==0.5.1
  • unidic-lite ==1.0.8