https://github.com/devidw/dswav

Tooling to build datasets for audio model training

Science Score: 26.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (9.8%) to scientific vocabulary
Last synced: 6 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: devidw
  • License: unlicense
  • Language: Python
  • Default Branch: main
  • Homepage:
  • Size: 516 KB
Statistics
  • Stars: 16
  • Watchers: 2
  • Forks: 0
  • Open Issues: 5
  • Releases: 0
Created over 2 years ago · Last pushed about 2 years ago
Metadata Files
Readme License

README.md

dswav

Tool to build datasets for audio model training

Includes a series of helpers for dataset work, such as:

  • transcribing audio source into a dataset of segments of text & audio pairs
  • combining different data sources
  • bulk lengthening audio samples
  • bulk conversion of mp3s to wav at a given sample rate
  • building metadata files that can be used for training
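
As an illustration of the bulk mp3-to-wav step listed above, here is a minimal sketch that shells out to ffmpeg. It is not taken from dswav's source: the function names, the 24000 Hz default, and the mono downmix (`-ac 1`) are assumptions made for the example.

```python
import subprocess
from pathlib import Path


def ffmpeg_wav_command(src: Path, dst: Path, sample_rate: int = 24000) -> list[str]:
    """Build an ffmpeg command that converts `src` to a mono WAV at `sample_rate` Hz."""
    return [
        "ffmpeg",
        "-y",                     # overwrite the output file if it already exists
        "-i", str(src),           # input file
        "-ar", str(sample_rate),  # output sample rate in Hz
        "-ac", "1",               # downmix to one channel (assumption, not from the README)
        str(dst),
    ]


def convert_folder(src_dir: Path, dst_dir: Path, sample_rate: int = 24000) -> list[Path]:
    """Convert every .mp3 in `src_dir` to a .wav with the same stem in `dst_dir`."""
    dst_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for mp3 in sorted(src_dir.glob("*.mp3")):
        wav = dst_dir / (mp3.stem + ".wav")
        # check=True raises CalledProcessError if ffmpeg exits with an error
        subprocess.run(ffmpeg_wav_command(mp3, wav, sample_rate), check=True)
        written.append(wav)
    return written
```

Calling `convert_folder(Path("mp3s"), Path("wavs"), 22050)` would write one wav per mp3, provided ffmpeg is on the `PATH`.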

Mostly focused on tooling for StyleTTS2 datasets, but can also be used for other kinds of models and libraries, such as coqui

Usage

```bash
docker run \
  -p 7860:7860 \
  -v ./projects:/app/projects \
  ghcr.io/devidw/dswav:main
```

TTS, LJSpeech

https://tts.readthedocs.io/en/latest/formattingyourdataset.html

Supports output in the LJSpeech dataset format (metadata.csv, wavs/), which the TTS Python package can use to train models such as xtts2
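
For orientation, the original LJSpeech `metadata.csv` is pipe-delimited with three fields per row: id, transcript, normalized transcript. The sketch below renders rows in that layout from samples shaped like the `index.json` entries described under "Data sources". `ljspeech_rows` is a hypothetical helper, and reusing the raw transcript as the normalized one is an assumption; the README does not say which columns dswav itself writes.

```python
def ljspeech_rows(samples: list[dict]) -> list[str]:
    """Render samples as LJSpeech-style rows: id|text|normalized text."""
    rows = []
    for sample in samples:
        # Collapse newlines and repeated spaces so each sample stays on one row.
        text = " ".join(sample["content"].split())
        if "|" in text:
            # The pipe is the field delimiter, so it cannot appear in a transcript.
            raise ValueError(f"transcript of {sample['id']!r} contains '|'")
        # Assumption: no separate text normalization, so the transcript is repeated.
        rows.append(f"{sample['id']}|{text}|{text}")
    return rows
```

Writing `"\n".join(rows)` to `metadata.csv` next to a `wavs/` folder gives the layout mentioned above.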

StyleTTS2

https://github.com/yl4579/StyleTTS2

Also supports output format for StyleTTS2

  • train_list.txt 99 %
  • val_list.txt 1 %
  • wavs/
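
The 99 % / 1 % split can be sketched as below. StyleTTS2's data lists use the line format `filename.wav|transcription|speaker`, with phonemized transcriptions. The helper names, the fixed seed, and the rule of holding out at least one line are assumptions for illustration, and phonemization is left out.

```python
import random


def styletts2_line(sample_id: str, phonemes: str, speaker_id: str = "0") -> str:
    """Format one list entry as filename.wav|transcription|speaker."""
    return f"{sample_id}.wav|{phonemes}|{speaker_id}"


def split_train_val(lines: list[str], val_fraction: float = 0.01, seed: int = 0):
    """Shuffle deterministically and hold out `val_fraction` of the lines for validation."""
    shuffled = lines[:]
    random.Random(seed).shuffle(shuffled)
    # Hold out at least one line whenever there is any data (assumption).
    n_val = max(1, round(len(shuffled) * val_fraction)) if shuffled else 0
    return shuffled[n_val:], shuffled[:n_val]
```

The two returned lists would be written to `train_list.txt` and `val_list.txt` respectively, one entry per line.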

Data sources

In order to import other data sources they must follow this structure:

  • /your/path/index.json
  • /your/path/wavs/[id].wav

```ts
{
  id: string // unique identifier for each sample, should match file name in `./wavs/[id].wav` folder
  content: string // the transcript
  speaker_id?: string // optional when building for multi-speaker, unique on a per voice speaker basis
}[]
```
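
Before importing a source, it can help to check it against this structure. The following is a minimal sketch of such a check, not part of dswav: `check_entries` and `check_source` are hypothetical names, and the exact rules (non-empty string ids, no duplicates) are assumptions drawn from the type above.

```python
import json
from pathlib import Path


def check_entries(entries: list[dict], wav_stems: set[str]) -> list[str]:
    """Return human-readable problems found in index.json entries."""
    problems = []
    seen = set()
    for i, entry in enumerate(entries):
        sample_id = entry.get("id")
        if not isinstance(sample_id, str) or not sample_id:
            problems.append(f"entry {i}: missing or non-string id")
            continue
        if sample_id in seen:
            problems.append(f"{sample_id}: duplicate id")
        seen.add(sample_id)
        if not isinstance(entry.get("content"), str):
            problems.append(f"{sample_id}: missing or non-string content")
        if "speaker_id" in entry and not isinstance(entry["speaker_id"], str):
            problems.append(f"{sample_id}: speaker_id must be a string")
        if sample_id not in wav_stems:
            problems.append(f"{sample_id}: no wavs/{sample_id}.wav")
    return problems


def check_source(root: Path) -> list[str]:
    """Load `root/index.json` and compare it with the files in `root/wavs/`."""
    entries = json.loads((root / "index.json").read_text(encoding="utf-8"))
    stems = {p.stem for p in (root / "wavs").glob("*.wav")}
    return check_entries(entries, stems)
```

An empty list from `check_source(Path("/your/path"))` means every entry has an id, a transcript, and a matching wav file.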

Development

  • requires ffmpeg, espeak, whisper

```bash
git clone https://github.com/devidw/dswav
cd dswav

poetry install

make dev
```

notes

  • segments are currently split at sentence boundaries rather than at silence, which sometimes leaves artifacts at the end of a clip; detecting silence instead would give cleaner examples
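
One way the silence-based approach from this note could look is to move each sentence-based cut to the middle of the nearest quiet stretch. This is a rough sketch on raw integer samples, not the project's implementation: the function names, the amplitude threshold of 500, and the default minimum run of 2400 samples (0.1 s at 24 kHz) are all assumptions.

```python
def silent_runs(samples: list[int], threshold: int = 500, min_len: int = 2400):
    """Return (start, end) index pairs of runs of at least `min_len` quiet samples."""
    runs = []
    start = None
    for i, value in enumerate(samples):
        if abs(value) < threshold:
            if start is None:
                start = i  # a quiet run begins here
        else:
            if start is not None and i - start >= min_len:
                runs.append((start, i))
            start = None
    # Close a quiet run that extends to the end of the signal.
    if start is not None and len(samples) - start >= min_len:
        runs.append((start, len(samples)))
    return runs


def snap_cut_to_silence(samples: list[int], cut: int, threshold: int = 500, min_len: int = 2400) -> int:
    """Move a proposed cut index to the midpoint of the nearest silent run."""
    runs = silent_runs(samples, threshold, min_len)
    if not runs:
        return cut  # no silence found, keep the sentence-based cut
    midpoints = [(start + end) // 2 for start, end in runs]
    return min(midpoints, key=lambda m: abs(m - cut))
```

A fixed amplitude threshold is the simplest choice; real recordings with background noise would likely need an energy measure relative to the clip's overall level.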

Owner

  • Name: David Wolf
  • Login: devidw
  • Kind: user
  • Location: The Zone

// uncommenting

GitHub Events

Total
  • Watch event: 1
Last Year
  • Watch event: 1

Committers

Last synced: about 1 year ago

All Time
  • Total Commits: 13
  • Total Committers: 1
  • Avg Commits per committer: 13.0
  • Development Distribution Score (DDS): 0.0
Past Year
  • Commits: 0
  • Committers: 0
  • Avg Commits per committer: 0.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
David Wolf 6****w 13

Issues and Pull Requests

Last synced: about 1 year ago

All Time
  • Total issues: 5
  • Total pull requests: 3
  • Average time to close issues: 3 days
  • Average time to close pull requests: N/A
  • Total issue authors: 4
  • Total pull request authors: 1
  • Average comments per issue: 1.6
  • Average comments per pull request: 0.0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 3
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • fakerybakery (2)
  • MariasStory (1)
  • ArrowM (1)
Pull Request Authors
  • dependabot[bot] (4)
Top Labels
Issue Labels
Pull Request Labels
dependencies (4)

Dependencies

poetry.lock pypi
  • aiofiles 23.2.1
  • altair 5.1.2
  • annotated-types 0.6.0
  • anyio 3.7.1
  • attrs 23.1.0
  • babel 2.13.1
  • certifi 2023.11.17
  • charset-normalizer 3.3.2
  • click 8.1.7
  • clldutils 3.20.0
  • colorama 0.4.6
  • colorlog 6.7.0
  • contourpy 1.2.0
  • csvw 3.2.1
  • cycler 0.12.1
  • dlinfo 1.2.1
  • fastapi 0.104.1
  • ffmpy 0.3.1
  • filelock 3.13.1
  • fonttools 4.44.3
  • fsspec 2023.10.0
  • gradio 4.4.1
  • gradio-client 0.7.0
  • h11 0.14.0
  • httpcore 1.0.2
  • httpx 0.25.1
  • huggingface-hub 0.19.4
  • idna 3.4
  • importlib-resources 6.1.1
  • isodate 0.6.1
  • jinja2 3.1.2
  • joblib 1.3.2
  • jsonschema 4.20.0
  • jsonschema-specifications 2023.11.1
  • kiwisolver 1.4.5
  • language-tags 1.2.0
  • lxml 4.9.3
  • markdown 3.5.1
  • markdown-it-py 3.0.0
  • markupsafe 2.1.3
  • matplotlib 3.8.2
  • mdurl 0.1.2
  • numpy 1.26.2
  • orjson 3.9.10
  • packaging 23.2
  • pandas 2.1.3
  • phonemizer 3.2.1
  • pillow 10.1.0
  • pydantic 2.5.1
  • pydantic-core 2.14.3
  • pydub 0.25.1
  • pygments 2.17.0
  • pylatexenc 2.10
  • pyparsing 3.1.1
  • python-dateutil 2.8.2
  • python-multipart 0.0.6
  • pytz 2023.3.post1
  • pyyaml 6.0.1
  • rdflib 7.0.0
  • referencing 0.31.0
  • regex 2023.10.3
  • requests 2.31.0
  • rfc3986 1.5.0
  • rich 13.7.0
  • rpds-py 0.13.0
  • segments 2.2.1
  • semantic-version 2.10.0
  • setuptools 68.2.2
  • shellingham 1.5.4
  • six 1.16.0
  • sniffio 1.3.0
  • starlette 0.27.0
  • tabulate 0.9.0
  • tomlkit 0.12.0
  • toolz 0.12.0
  • tqdm 4.66.1
  • typer 0.9.0
  • typing-extensions 4.8.0
  • tzdata 2023.3
  • uritemplate 4.1.1
  • urllib3 2.1.0
  • uvicorn 0.24.0.post1
  • websockets 11.0.3
pyproject.toml pypi
  • gradio ^4.4.1
  • phonemizer ^3.2.1
  • python 3.11.6
  • setuptools ^68.2.2