Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (12.7%) to scientific vocabulary
Last synced: 6 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: ktheindifferent
  • License: mpl-2.0
  • Language: Python
  • Default Branch: dev
  • Size: 126 MB
Statistics
  • Stars: 0
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created over 3 years ago · Last pushed 7 months ago
Metadata Files
Readme Contributing License Code of conduct Citation

README.md

SamTTS - Multi-Backend Offline Text-to-Speech API

A unified HTTP REST API providing access to multiple offline Text-to-Speech (TTS) engines through a single interface. Built on top of 🐸Coqui TTS and other leading TTS technologies.

🚀 Features

  • Unified API - Single HTTP interface for multiple TTS backends
  • Offline Operation - No internet connectivity required
  • Multiple Engines - Support for Coqui TTS, eSpeak, eSpeak-NG, MaryTTS, pyttsx3, and Festival
  • Auto-Detection - Automatic detection of available TTS engines
  • Streaming & Batch - Real-time streaming and batch synthesis support
  • Voice & Language Support - Multiple voices and languages per backend
  • Adjustable Parameters - Control speed, pitch, and other speech parameters
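The auto-detection feature can be illustrated with a small sketch that probes `PATH` for each engine's command-line binary. The binary names below and the `PATH`-probing approach are illustrative assumptions, not the project's actual detection logic:

```python
import shutil

# Hypothetical map of backend name -> command-line binary to probe.
# The real project may detect engines differently (e.g. by importing
# Python packages such as pyttsx3 or the TTS library).
ENGINE_BINARIES = {
    "espeak": "espeak",
    "espeak_ng": "espeak-ng",
    "festival": "festival",
    "marytts": "java",  # MaryTTS runs on the JVM
}

def detect_backends():
    """Return a mapping of backend name -> availability on this system."""
    return {name: shutil.which(binary) is not None
            for name, binary in ENGINE_BINARIES.items()}

print(detect_backends())
```

Probing at startup lets the server advertise only working backends from `GET /backends` instead of failing at synthesis time.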

🎯 Quick Start

Start the API Server

```bash
cd multi_tts_api
python -m uvicorn api:app --host 0.0.0.0 --port 8000
```

Synthesize Speech

```bash
curl -X POST http://localhost:8000/synthesize \
  -H "Content-Type: application/json" \
  -d '{"text": "Hello world", "backend": "espeak"}' \
  --output speech.wav
```

📋 Supported TTS Backends

| Backend | Quality | Speed | Languages | Features |
|---------|---------|-------|-----------|----------|
| Coqui TTS | Excellent | Medium | 20+ | Neural models, voice cloning |
| eSpeak | Fair | Very Fast | 50+ | Lightweight, reliable |
| eSpeak-NG | Good | Very Fast | 100+ | Improved quality |
| MaryTTS | Good | Medium | Multiple | Modular, customizable |
| pyttsx3 | Varies | Fast | Varies | Cross-platform wrapper |
| Festival | Good | Medium | Multiple | Highly configurable |

📖 API Documentation

Core Endpoints

List Available Backends

```http
GET /backends
```

Get Backend Information

```http
GET /backends/{backend_id}
```

Synthesize Speech

```http
POST /synthesize
Content-Type: application/json

{
  "text": "Hello, this is a test",
  "backend": "espeak",
  "language": "en",
  "speed": 1.0,
  "pitch": 1.0,
  "format": "wav"
}
```

Streaming Synthesis

```http
POST /synthesize/stream
```

Batch Synthesis

```http
POST /synthesize/batch
```
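For illustration, a streaming response could be consumed chunk by chunk; this sketch assumes the stream endpoint accepts the same JSON body as `/synthesize` and returns chunked audio, which is an assumption rather than something the docs here confirm:

```python
import json
import urllib.request

def stream_synthesize(base_url, text, backend="espeak", chunk_size=4096):
    """Yield raw audio chunks from the (assumed) streaming endpoint."""
    req = urllib.request.Request(
        f"{base_url}/synthesize/stream",
        data=json.dumps({"text": text, "backend": backend}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        # Read fixed-size chunks until the server closes the stream
        while True:
            chunk = resp.read(chunk_size)
            if not chunk:
                break
            yield chunk

# Usage (with the server running):
# with open("stream.wav", "wb") as f:
#     for chunk in stream_synthesize("http://localhost:8000", "Hello!"):
#         f.write(chunk)
```

Reading in fixed-size chunks keeps memory use flat regardless of how long the synthesized audio is.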

For detailed API documentation, see multi_tts_api/README.md.

🛠️ Installation

Prerequisites

Install system dependencies for the TTS backends you want to use:

```bash
# For eSpeak
sudo apt-get install espeak

# For eSpeak-NG
sudo apt-get install espeak-ng

# For Festival
sudo apt-get install festival

# For MaryTTS (requires Java)
sudo apt-get install default-jre
```

Python Dependencies

```bash
# Clone the repository
git clone [your-repo-url]
cd SamTTS

# Install Python dependencies
cd multi_tts_api
pip install -r requirements.txt
```

🎭 Backend Details

Coqui TTS

Based on the original 🐸TTS library with support for:

  • High-performance deep learning models (Tacotron2, Glow-TTS, VITS, YourTTS)
  • Neural vocoders (HiFiGAN, MelGAN, WaveRNN)
  • Multi-speaker synthesis and voice cloning
  • 20+ languages with pretrained models

Other Backends

  • eSpeak/eSpeak-NG: Lightweight, rule-based synthesis
  • MaryTTS: Modular Java-based platform
  • pyttsx3: Cross-platform TTS wrapper
  • Festival: Configurable speech synthesis system

💻 Usage Examples

Python Client

```python
import requests

# Start the API server first:
#   python -m uvicorn multi_tts_api.api:app --host 0.0.0.0 --port 8000

base_url = "http://localhost:8000"

# List available backends
response = requests.get(f"{base_url}/backends")
backends = response.json()
print("Available backends:", backends)

# Synthesize with eSpeak
tts_request = {
    "text": "Hello from SamTTS!",
    "backend": "espeak",
    "language": "en",
    "speed": 1.2,
}

response = requests.post(f"{base_url}/synthesize", json=tts_request)
with open("output.wav", "wb") as f:
    f.write(response.content)
```

Command Line Interface

```bash
# List available backends
curl http://localhost:8000/backends

# Synthesize speech with different backends
curl -X POST http://localhost:8000/synthesize \
  -H "Content-Type: application/json" \
  -d '{"text": "Fast synthesis", "backend": "espeak"}' \
  --output fast.wav

curl -X POST http://localhost:8000/synthesize \
  -H "Content-Type: application/json" \
  -d '{"text": "High quality synthesis", "backend": "coqui"}' \
  --output quality.wav

# Batch synthesis
curl -X POST http://localhost:8000/synthesize/batch \
  -H "Content-Type: application/json" \
  -d '[
        {"text": "First sentence", "backend": "espeak"},
        {"text": "Second sentence", "backend": "festival"}
      ]' \
  --output batch.zip
```

📁 Project Structure

```
├── multi_tts_api/           # Main API application
│   ├── api.py               # FastAPI application and endpoints
│   ├── backend_manager.py   # Backend management and orchestration
│   ├── backends/            # TTS backend implementations
│   │   ├── base.py          # Abstract base class for backends
│   │   ├── coqui.py         # Coqui TTS backend
│   │   ├── espeak.py        # eSpeak backend
│   │   ├── espeak_ng.py     # eSpeak-NG backend
│   │   ├── festival.py      # Festival backend
│   │   ├── marytts.py       # MaryTTS backend
│   │   └── pyttsx3.py       # pyttsx3 backend
│   ├── requirements.txt     # Python dependencies
│   ├── run_server.py        # Server startup script
│   ├── test_api.py          # API test suite
│   └── README.md            # Detailed API documentation
├── TTS/                     # Original Coqui TTS library
└── README.md                # This file
```

🤝 Contributing

Contributions are welcome! Areas for improvement:

  • New TTS backend implementations
  • Performance optimizations
  • Additional audio format support
  • Better error handling and logging
  • Extended language support
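The project structure shows an abstract base class in `multi_tts_api/backends/base.py`, so a new backend presumably subclasses it. Its actual interface is not documented here, so the sketch below is a hypothetical illustration of the pattern; the real method names and signatures may differ:

```python
from abc import ABC, abstractmethod

# Hypothetical sketch of a backend interface; the project's actual base
# class lives in multi_tts_api/backends/base.py and may differ.
class TTSBackend(ABC):
    name: str

    @abstractmethod
    def is_available(self) -> bool:
        """Report whether the engine is usable on this system."""

    @abstractmethod
    def synthesize(self, text: str, language: str = "en",
                   speed: float = 1.0) -> bytes:
        """Return synthesized audio as bytes."""

class EchoBackend(TTSBackend):
    """Toy backend used only to show the subclassing pattern."""
    name = "echo"

    def is_available(self) -> bool:
        return True

    def synthesize(self, text, language="en", speed=1.0):
        return text.encode("utf-8")  # placeholder, not real audio

backend = EchoBackend()
print(backend.name, backend.is_available())
```

Keeping availability checks and synthesis behind one abstract interface is what lets the backend manager treat all engines uniformly.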

📄 License

This project builds upon multiple open-source TTS libraries:

  • Coqui TTS: Mozilla Public License 2.0
  • eSpeak/eSpeak-NG: GPL v3
  • MaryTTS: LGPL v3
  • Festival: Custom license

See individual backend documentation for specific license requirements.

Owner

  • Login: ktheindifferent
  • Kind: user

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you want to cite 🐸💬, feel free to use this (but only if you loved it 😊)"
title: "Coqui TTS"
abstract: "A deep learning toolkit for Text-to-Speech, battle-tested in research and production"
date-released: 2021-01-01
authors:
  - family-names: "Eren"
    given-names: "Gölge"
  - name: "The Coqui TTS Team"
version: 1.4
doi: 10.5281/zenodo.6334862
license: "MPL-2.0"
url: "https://www.coqui.ai"
repository-code: "https://github.com/coqui-ai/TTS"
keywords:
  - machine learning
  - deep learning
  - artificial intelligence
  - text to speech
  - TTS

GitHub Events

Total
  • Delete event: 1
  • Issue comment event: 2
  • Push event: 4
  • Pull request event: 11
  • Create event: 4
Last Year
  • Delete event: 1
  • Issue comment event: 2
  • Push event: 4
  • Pull request event: 11
  • Create event: 4

Dependencies

.github/workflows/aux_tests.yml actions
  • actions/checkout v2 composite
  • coqui-ai/setup-python pip-cache-key-py-ver composite
.github/workflows/data_tests.yml actions
  • actions/checkout v2 composite
  • coqui-ai/setup-python pip-cache-key-py-ver composite
.github/workflows/docker.yaml actions
  • actions/checkout v2 composite
  • docker/build-push-action v2 composite
  • docker/login-action v1 composite
  • docker/setup-buildx-action v1 composite
  • docker/setup-qemu-action v1 composite
.github/workflows/inference_tests.yml actions
  • actions/checkout v2 composite
  • coqui-ai/setup-python pip-cache-key-py-ver composite
.github/workflows/pypi-release.yml actions
  • actions/checkout v2 composite
  • actions/download-artifact v2 composite
  • actions/setup-python v2 composite
  • actions/upload-artifact v2 composite
.github/workflows/style_check.yml actions
  • actions/checkout v2 composite
  • coqui-ai/setup-python pip-cache-key-py-ver composite
.github/workflows/text_tests.yml actions
  • actions/checkout v2 composite
  • coqui-ai/setup-python pip-cache-key-py-ver composite
.github/workflows/tts_tests.yml actions
  • actions/checkout v2 composite
  • coqui-ai/setup-python pip-cache-key-py-ver composite
.github/workflows/vocoder_tests.yml actions
  • actions/checkout v2 composite
  • coqui-ai/setup-python pip-cache-key-py-ver composite
.github/workflows/zoo_tests.yml actions
  • actions/checkout v2 composite
  • coqui-ai/setup-python pip-cache-key-py-ver composite
Dockerfile docker
  • ${BASE} latest build
TTS/encoder/requirements.txt pypi
  • numpy >=1.17.0
  • umap-learn *
docs/requirements.txt pypi
  • furo *
  • linkify-it-py *
  • myst-parser ==0.15.1
  • sphinx ==4.0.2
  • sphinx_copybutton *
  • sphinx_inline_tabs *
requirements.dev.txt pypi
  • black * development
  • coverage * development
  • isort * development
  • nose2 * development
  • pylint ==2.10.2 development
requirements.notebooks.txt pypi
  • bokeh ==1.4.0
requirements.txt pypi
  • anyascii *
  • coqpit >=0.0.16
  • cython ==0.29.28
  • flask *
  • fsspec >=2021.04.0
  • g2pkk >=0.1.1
  • gruut ==2.2.3
  • gunicorn ==20.1.0
  • inflect ==5.6.0
  • jamo *
  • jieba *
  • librosa ==0.8.0
  • matplotlib *
  • mecab-python3 ==1.0.5
  • nltk *
  • numba ==0.55.2
  • numba ==0.55.1
  • numpy ==1.22.4
  • numpy ==1.21.6
  • pandas *
  • pypinyin *
  • pysbd *
  • pyyaml *
  • scipy >=1.4.0
  • soundfile *
  • torch >=1.7
  • torchaudio *
  • tqdm *
  • trainer *
  • umap-learn ==0.5.1
  • unidic-lite ==1.0.8