https://github.com/devidw/dswav
Tooling to build datasets for audio model training
Science Score: 26.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: found
- ✓ .zenodo.json file: found
- ○ DOI references
- ○ Academic publication links
- ○ Committers with academic emails
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (9.8%) to scientific vocabulary
Repository
Statistics
- Stars: 16
- Watchers: 2
- Forks: 0
- Open Issues: 5
- Releases: 0
Metadata Files
README.md
dswav
Tool to build datasets for audio model training
Includes a series of helpers for dataset work, such as:
- transcribing an audio source into a dataset of paired text and audio segments
- combining different data sources
- bulk lengthening audio samples
- bulk conversion of MP3s to WAV at a given sample rate
- building metadata files that can be used for training
Mostly focused on tooling for StyleTTS2 datasets, but it can also be used for other models and libraries, such as Coqui TTS.
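Combining data sources mostly comes down to keeping sample ids unique, since each id is also a file name. The sketch below is a minimal illustration under stated assumptions: it uses the `index.json` entry shape from the Data sources section, and the `mergeSources` function, the `Source` type, and the "prefix ids with a source name" strategy are all hypothetical, not dswav's actual implementation.

```ts
// Hypothetical sketch: combine several index.json-style sample lists into one.
// The Sample shape follows the README's index.json schema; namespacing ids with
// a source name is an assumption, not necessarily what dswav does.
interface Sample {
  id: string;
  content: string;
  speaker_id?: string;
}

interface Source {
  name: string; // hypothetical label used to keep ids unique across sources
  samples: Sample[];
}

function mergeSources(sources: Source[]): Sample[] {
  const merged: Sample[] = [];
  const seen = new Set<string>();
  for (const source of sources) {
    for (const sample of source.samples) {
      // Prefix ids so two sources can both contain, e.g., "0001"
      const id = `${source.name}_${sample.id}`;
      if (seen.has(id)) {
        throw new Error(`duplicate id after merge: ${id}`);
      }
      seen.add(id);
      // Copy the sample (content and any speaker_id) under the new id
      merged.push({ ...sample, id });
    }
  }
  return merged;
}
```

Because ids change, the matching audio files would also need to be copied to `wavs/[new id].wav`.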
Usage
```bash
docker run \
  -p 7860:7860 \
  -v ./projects:/app/projects \
  ghcr.io/devidw/dswav:main
```
TTS, LJSpeech
https://tts.readthedocs.io/en/latest/formatting_your_dataset.html
Supports output in the LJSpeech dataset format (`metadata.csv`, `wavs/`), which can be used with the TTS Python package to train models such as XTTS v2.
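For illustration, here is a minimal sketch of what writing `metadata.csv` could look like. It assumes the pipe-delimited layout of the original LJSpeech release (`id|transcript|normalized transcript`, no header row, audio at `wavs/[id].wav`); the `toLjspeechMetadata` function is hypothetical, and dswav's actual output may differ in detail.

```ts
// Hypothetical sketch: build the text of an LJSpeech-style metadata.csv.
// Assumes the pipe-delimited layout of the original LJSpeech release
// (id|transcript|normalized transcript), no header row, and audio at wavs/[id].wav.
interface Sample {
  id: string;
  content: string;
}

function toLjspeechMetadata(samples: Sample[]): string {
  const lines = samples.map((s) => {
    // Collapse whitespace (including newlines) so each transcript stays on one line
    const text = s.content.replace(/\s+/g, " ").trim();
    if (s.id.includes("|") || text.includes("|")) {
      throw new Error(`sample ${s.id}: "|" is the column separator and cannot appear in fields`);
    }
    // No text normalization is done here, so the transcript fills both text columns
    return `${s.id}|${text}|${text}`;
  });
  return lines.length > 0 ? lines.join("\n") + "\n" : "";
}
```

Coqui TTS's built-in `ljspeech` formatter takes the file id from the first column and the text from the third, so repeating the transcript keeps both columns populated.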
StyleTTS2
https://github.com/yl4579/StyleTTS2
Also supports the StyleTTS2 output format:
- `train_list.txt` (99% of samples)
- `val_list.txt` (1% of samples)
- `wavs/`
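The 99% / 1% split can be made reproducible with a seeded shuffle. This is a sketch under assumptions, not dswav's code: it assumes list lines of the form `[id].wav|[phonemized text]|[speaker]`, as in the list files shipped with the StyleTTS2 repository, and that transcripts were already phonemized (for example with espeak). StyleTTS2's data loader appears to parse the third column as an integer, so string `speaker_id`s are mapped to indices here. `splitStyleTts2`, `PhonemizedSample`, and the `phonemes` field are hypothetical names.

```ts
// Hypothetical sketch: a reproducible 99% / 1% split into StyleTTS2-style list lines.
// Assumes lines of the form "[id].wav|[phonemized text]|[integer speaker]" and that
// transcripts were already phonemized (e.g. with espeak); neither step is dswav's code.
interface PhonemizedSample {
  id: string;
  phonemes: string; // hypothetical field: transcript already converted to phonemes
  speaker_id?: string;
}

// mulberry32: a tiny seeded PRNG, so the same seed always gives the same split
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

function splitStyleTts2(
  samples: PhonemizedSample[],
  valFraction = 0.01,
  seed = 42,
): { train: string[]; val: string[] } {
  // Map string speaker ids to integer indices; samples without one share a speaker
  const speakers = Array.from(new Set(samples.map((s) => s.speaker_id ?? ""))).sort();
  const speakerIndex = new Map<string, number>(
    speakers.map((name, i) => [name, i] as [string, number]),
  );
  const toLine = (s: PhonemizedSample): string =>
    `${s.id}.wav|${s.phonemes}|${speakerIndex.get(s.speaker_id ?? "")}`;

  // Fisher-Yates shuffle on a copy, driven by the seeded PRNG
  const rand = mulberry32(seed);
  const shuffled = [...samples];
  for (let i = shuffled.length - 1; i > 0; i--) {
    const j = Math.floor(rand() * (i + 1));
    [shuffled[i], shuffled[j]] = [shuffled[j], shuffled[i]];
  }

  // Keep at least one validation sample whenever there are two or more samples
  const valCount =
    shuffled.length > 1 ? Math.max(1, Math.round(shuffled.length * valFraction)) : 0;
  return {
    val: shuffled.slice(0, valCount).map(toLine),
    train: shuffled.slice(valCount).map(toLine),
  };
}
```

Writing `train.join("\n")` and `val.join("\n")` to `train_list.txt` and `val_list.txt` would then produce the two list files.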
Data sources
Other data sources can be imported if they follow this structure:
- /your/path/index.json
- /your/path/wavs/[id].wav
```ts
// Shape of index.json: an array of samples (the type name is illustrative)
type IndexJson = {
  id: string // unique identifier for each sample; should match the file name ./wavs/[id].wav
  content: string // the transcript
  speaker_id?: string // optional; for multi-speaker datasets, unique per voice/speaker
}[]
```
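Before importing, it can help to check a parsed `index.json` against the structure above. The sketch below is hypothetical (`validateIndex` is not part of dswav): it checks types, rejects duplicate ids and ids containing path separators (since an id doubles as a file name), and collects errors instead of stopping at the first one. It does not check that `wavs/[id].wav` exists on disk.

```ts
// Hypothetical sketch: check parsed index.json content against the schema above.
// It does not check that wavs/[id].wav exists on disk.
interface Sample {
  id: string;
  content: string;
  speaker_id?: string;
}

function validateIndex(raw: unknown): { samples: Sample[]; errors: string[] } {
  const samples: Sample[] = [];
  const errors: string[] = [];
  if (!Array.isArray(raw)) {
    return { samples, errors: ["index.json must contain a top-level array"] };
  }
  const seen = new Set<string>();
  raw.forEach((entry: unknown, i: number) => {
    if (typeof entry !== "object" || entry === null) {
      errors.push(`entry ${i}: not an object`);
      return;
    }
    const { id, content, speaker_id } = entry as Record<string, unknown>;
    // id doubles as a file name, so reject empty ids and path separators
    if (typeof id !== "string" || id === "" || /[\\/]/.test(id)) {
      errors.push(`entry ${i}: id must be a non-empty string without path separators`);
      return;
    }
    if (seen.has(id)) {
      errors.push(`entry ${i}: duplicate id "${id}"`);
      return;
    }
    if (typeof content !== "string") {
      errors.push(`entry ${i}: content must be a string`);
      return;
    }
    let speaker: string | undefined;
    if (speaker_id !== undefined) {
      if (typeof speaker_id !== "string") {
        errors.push(`entry ${i}: speaker_id must be a string when present`);
        return;
      }
      speaker = speaker_id;
    }
    seen.add(id);
    samples.push(speaker === undefined ? { id, content } : { id, content, speaker_id: speaker });
  });
  return { samples, errors };
}
```

Typical use would be `validateIndex(JSON.parse(text))`, importing only when `errors` is empty.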
Development
- Requires ffmpeg, espeak, and whisper
```bash
git clone https://github.com/devidw/dswav
cd dswav
poetry install
make dev
```
Notes
- Currently splits on sentence boundaries rather than on silence, which sometimes leaves artifacts at the end of a segment; detecting silence instead would give cleaner examples.
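Silence-based splitting could look roughly like the sketch below: measure each short frame's RMS level in dBFS, find runs of frames below a threshold, and cut in the middle of interior runs that are long enough. It assumes mono float samples in [-1, 1], and `findSilenceCuts` and its default thresholds are hypothetical, illustrative values. In the Python codebase itself, pydub (already a dependency) provides `pydub.silence.detect_silence` and `split_on_silence`, which would likely be the more natural fit.

```ts
// Hypothetical sketch: find cut points in the middle of silent stretches, using
// per-frame RMS level in dBFS. Assumes mono samples as floats in [-1, 1]; the
// default thresholds are illustrative, not tuned values from dswav.
function findSilenceCuts(
  samples: Float32Array,
  sampleRate: number,
  frameMs = 10,
  thresholdDb = -40,
  minSilenceMs = 300,
): number[] {
  const frameLen = Math.max(1, Math.round((sampleRate * frameMs) / 1000));
  const minFrames = Math.ceil(minSilenceMs / frameMs);
  const frameCount = Math.ceil(samples.length / frameLen);
  const cuts: number[] = [];
  let runStart = -1; // first frame of the current silent run, or -1 if none

  for (let f = 0; f < frameCount; f++) {
    const start = f * frameLen;
    const end = Math.min(start + frameLen, samples.length);
    let sumSq = 0;
    for (let i = start; i < end; i++) sumSq += samples[i] * samples[i];
    const rms = Math.sqrt(sumSq / (end - start));
    // Level in dBFS; the floor avoids log10(0) on digital silence
    const silent = 20 * Math.log10(Math.max(rms, 1e-10)) < thresholdDb;

    if (silent && runStart < 0) {
      runStart = f;
    } else if (!silent && runStart >= 0) {
      // Only interior runs produce cuts: leading silence (runStart === 0) is skipped,
      // and a run still open at the end of the audio never reaches this branch
      if (runStart > 0 && f - runStart >= minFrames) {
        cuts.push(Math.floor(((runStart + f) / 2) * frameLen));
      }
      runStart = -1;
    }
  }
  return cuts; // sample offsets at which to split the audio
}
```

Cutting in the middle of a pause, rather than at a sentence boundary from the transcript, leaves a little silence on both sides of each segment instead of trailing speech artifacts.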
Owner
- Name: David Wolf
- Login: devidw
- Kind: user
- Location: The Zone
- Website: https://david.wolf.gdn
- Repositories: 159
- Profile: https://github.com/devidw
GitHub Events
Total
- Watch event: 1
Last Year
- Watch event: 1
Issues and Pull Requests
Last synced: about 1 year ago
All Time
- Total issues: 5
- Total pull requests: 3
- Average time to close issues: 3 days
- Average time to close pull requests: N/A
- Total issue authors: 4
- Total pull request authors: 1
- Average comments per issue: 1.6
- Average comments per pull request: 0.0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 3
Past Year
- Issues: 0
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 0
- Pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- fakerybakery (2)
- MariasStory (1)
- ArrowM (1)
Pull Request Authors
- dependabot[bot] (4)
Dependencies
- aiofiles 23.2.1
- altair 5.1.2
- annotated-types 0.6.0
- anyio 3.7.1
- attrs 23.1.0
- babel 2.13.1
- certifi 2023.11.17
- charset-normalizer 3.3.2
- click 8.1.7
- clldutils 3.20.0
- colorama 0.4.6
- colorlog 6.7.0
- contourpy 1.2.0
- csvw 3.2.1
- cycler 0.12.1
- dlinfo 1.2.1
- fastapi 0.104.1
- ffmpy 0.3.1
- filelock 3.13.1
- fonttools 4.44.3
- fsspec 2023.10.0
- gradio 4.4.1
- gradio-client 0.7.0
- h11 0.14.0
- httpcore 1.0.2
- httpx 0.25.1
- huggingface-hub 0.19.4
- idna 3.4
- importlib-resources 6.1.1
- isodate 0.6.1
- jinja2 3.1.2
- joblib 1.3.2
- jsonschema 4.20.0
- jsonschema-specifications 2023.11.1
- kiwisolver 1.4.5
- language-tags 1.2.0
- lxml 4.9.3
- markdown 3.5.1
- markdown-it-py 3.0.0
- markupsafe 2.1.3
- matplotlib 3.8.2
- mdurl 0.1.2
- numpy 1.26.2
- orjson 3.9.10
- packaging 23.2
- pandas 2.1.3
- phonemizer 3.2.1
- pillow 10.1.0
- pydantic 2.5.1
- pydantic-core 2.14.3
- pydub 0.25.1
- pygments 2.17.0
- pylatexenc 2.10
- pyparsing 3.1.1
- python-dateutil 2.8.2
- python-multipart 0.0.6
- pytz 2023.3.post1
- pyyaml 6.0.1
- rdflib 7.0.0
- referencing 0.31.0
- regex 2023.10.3
- requests 2.31.0
- rfc3986 1.5.0
- rich 13.7.0
- rpds-py 0.13.0
- segments 2.2.1
- semantic-version 2.10.0
- setuptools 68.2.2
- shellingham 1.5.4
- six 1.16.0
- sniffio 1.3.0
- starlette 0.27.0
- tabulate 0.9.0
- tomlkit 0.12.0
- toolz 0.12.0
- tqdm 4.66.1
- typer 0.9.0
- typing-extensions 4.8.0
- tzdata 2023.3
- uritemplate 4.1.1
- urllib3 2.1.0
- uvicorn 0.24.0.post1
- websockets 11.0.3
- gradio ^4.4.1
- phonemizer ^3.2.1
- python 3.11.6
- setuptools ^68.2.2