https://github.com/areid987/voice-ai-f5-mlx

Science Score: 23.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: arxiv.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (10.5%) to scientific vocabulary
Last synced: 9 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: AReid987
  • License: mit
  • Language: Python
  • Default Branch: main
  • Size: 721 KB
Statistics
  • Stars: 0
  • Watchers: 1
  • Forks: 0
  • Open Issues: 1
  • Releases: 0
Created about 1 year ago · Last pushed about 1 year ago
Metadata Files
Readme License

README.md

F5 TTS diagram

F5 TTS — MLX

Implementation of F5-TTS, with the MLX framework.

F5 TTS is a non-autoregressive, zero-shot text-to-speech system using a flow-matching mel spectrogram generator with a diffusion transformer (DiT).
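As a rough illustration of what flow-matching generation means, the toy sketch below integrates a velocity field from a noise vector toward a target with fixed Euler steps. It is not this repository's code: `euler_sample`, `make_toy_velocity`, and the three-element vectors are hypothetical stand-ins. In the real system the velocity would come from the DiT, conditioned on text and reference audio, and the state would be a mel spectrogram rather than a short list.

```python
def euler_sample(velocity, x0, steps=8):
    """Integrate dx/dt = velocity(x, t) from t=0 to t=1 with fixed Euler steps."""
    x = list(x0)
    dt = 1.0 / steps
    for k in range(steps):
        t = k * dt
        v = velocity(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


def make_toy_velocity(target):
    """Closed-form velocity for a straight path from any start point to `target`.

    Hypothetical stand-in for a learned network: on the linear path
    x_t = (1 - t) * x0 + t * target, the velocity is (target - x_t) / (1 - t).
    """
    def velocity(x, t):
        return [(ti - xi) / (1.0 - t) for xi, ti in zip(x, target)]
    return velocity


target = [0.5, -1.0, 2.0]   # pretend this is the "data" sample
noise = [0.1, 0.7, -0.3]    # pretend this is the initial noise
sample = euler_sample(make_toy_velocity(target), noise, steps=8)
```

Because every step updates the whole vector at once, generation is non-autoregressive: the cost scales with the number of integration steps rather than with the output length.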

You can listen to a sample here that was generated in ~11 seconds on an M3 Max MacBook Pro.

F5 is an evolution of E2 TTS and improves performance with ConvNeXt V2 blocks for the learned text alignment. This repository is based on the original PyTorch implementation available here.

Installation

```bash
pip install f5-tts-mlx
```

Usage

```bash
python -m f5_tts_mlx.generate --text "The quick brown fox jumped over the lazy dog."
```

If you want to use your own reference audio sample, make sure it's a mono, 24 kHz WAV file of around 5-10 seconds:

```bash
python -m f5_tts_mlx.generate \
  --text "The quick brown fox jumped over the lazy dog." \
  --ref-audio /path/to/audio.wav \
  --ref-text "This is the caption for the reference audio."
```
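Before passing a file as reference audio, it can help to confirm that it meets the requirements above. The following is a minimal sketch using only Python's standard `wave` module. The helper name `check_reference_wav` and its default thresholds are assumptions for illustration, not part of this package, and `wave` reads only uncompressed PCM WAV files.

```python
import wave


def check_reference_wav(path, expected_rate=24000, min_seconds=5.0, max_seconds=10.0):
    """Return a list of problems found; an empty list means the file looks fine."""
    problems = []
    with wave.open(path, "rb") as wav:
        channels = wav.getnchannels()
        rate = wav.getframerate()
        seconds = wav.getnframes() / float(rate)
    if channels != 1:
        problems.append(f"expected mono, found {channels} channels")
    if rate != expected_rate:
        problems.append(f"expected {expected_rate} Hz, found {rate} Hz")
    if not (min_seconds <= seconds <= max_seconds):
        problems.append(
            f"expected {min_seconds}-{max_seconds} s, found {seconds:.1f} s"
        )
    return problems
```

Calling `check_reference_wav("/path/to/audio.wav")` returns the mismatches as plain strings, so a script can print them and stop before starting generation.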

You can convert an audio file to the correct format with ffmpeg like this:

```bash
ffmpeg -i /path/to/audio.wav -ac 1 -ar 24000 -sample_fmt s16 -t 10 /path/to/output_audio.wav
```
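If the conversion needs to run from a Python script, the same ffmpeg invocation can be wrapped with `subprocess`. This is a sketch that assumes `ffmpeg` is installed and on the `PATH`. The helper names `build_ffmpeg_command` and `convert_reference_audio` are hypothetical, and the flags are copied from the command above.

```python
import subprocess


def build_ffmpeg_command(src, dst, max_seconds=10):
    """Build the argument list for the ffmpeg conversion shown above."""
    return [
        "ffmpeg",
        "-i", src,               # input file
        "-ac", "1",              # downmix to mono
        "-ar", "24000",          # resample to 24 kHz
        "-sample_fmt", "s16",    # 16-bit samples
        "-t", str(max_seconds),  # keep at most this many seconds
        dst,
    ]


def convert_reference_audio(src, dst, max_seconds=10):
    """Run ffmpeg; raises subprocess.CalledProcessError if the conversion fails."""
    subprocess.run(build_ffmpeg_command(src, dst, max_seconds), check=True)
```

Passing the arguments as a list avoids shell quoting problems with paths that contain spaces.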

See here for more options to customize generation.

You can load a pretrained model from Python like this:

```python
from f5_tts_mlx.generate import generate

audio = generate(text = "Hello world.", ...)
```

Pretrained model weights are also available on Hugging Face.

Appreciation

Yushen Chen for the original PyTorch implementation of F5 TTS and pretrained model.

Phil Wang for the E2 TTS implementation that this model is based on.

Citations

```bibtex
@article{chen-etal-2024-f5tts,
  title={F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching},
  author={Yushen Chen and Zhikang Niu and Ziyang Ma and Keqi Deng and Chunhui Wang and Jian Zhao and Kai Yu and Xie Chen},
  journal={arXiv preprint arXiv:2410.06885},
  year={2024},
}
```

```bibtex
@inproceedings{Eskimez2024E2TE,
  title  = {E2 TTS: Embarrassingly Easy Fully Non-Autoregressive Zero-Shot TTS},
  author = {Sefik Emre Eskimez and Xiaofei Wang and Manthan Thakker and Canrun Li and Chung-Hsien Tsai and Zhen Xiao and Hemin Yang and Zirun Zhu and Min Tang and Xu Tan and Yanqing Liu and Sheng Zhao and Naoyuki Kanda},
  year   = {2024},
  url    = {https://api.semanticscholar.org/CorpusID:270738197}
}
```

License

The code in this repository is released under the MIT license as found in the LICENSE file.

Owner

  • Name: Antonio Reid
  • Login: AReid987
  • Kind: user
  • Location: Austin, Texas

GitHub Events

Total
  • Push event: 1
  • Pull request event: 1
  • Create event: 2
Last Year
  • Push event: 1
  • Pull request event: 1
  • Create event: 2

Dependencies

.github/workflows/python-publish.yml actions
  • actions/checkout v2 composite
  • actions/setup-python v2 composite
  • pypa/gh-action-pypi-publish 27b31702a0e7fc50959f5ad993c78deac1bdfc29 composite
pyproject.toml pypi
  • einops *
  • einx *
  • huggingface_hub *
  • jieba *
  • mlx >=0.18.1
  • numpy *
  • pypinyin *
  • setuptools *
  • soundfile *
  • vocos-mlx *
requirements.txt pypi
  • einops *
  • einx *
  • huggingface_hub *
  • jieba *
  • mlx *
  • numpy *
  • pypinyin *
  • setuptools *
  • soundfile *
  • vocos-mlx *
uv.lock pypi
  • certifi 2024.8.30
  • cffi 1.17.1
  • charset-normalizer 3.4.0
  • colorama 0.4.6
  • einops 0.8.0
  • einx 0.3.0
  • f5-tts-mlx 0.1.7
  • filelock 3.16.1
  • frozendict 2.4.6
  • fsspec 2024.10.0
  • huggingface-hub 0.26.2
  • idna 3.10
  • jieba 0.42.1
  • mlx 0.19.3
  • mpmath 1.3.0
  • numpy 2.0.2
  • packaging 24.1
  • pycparser 2.22
  • pypinyin 0.53.0
  • pyyaml 6.0.2
  • requests 2.32.3
  • setuptools 75.3.0
  • soundfile 0.12.1
  • sympy 1.13.3
  • tqdm 4.66.6
  • typing-extensions 4.12.2
  • urllib3 2.2.3
  • vocos-mlx 0.0.7