create_vtl_corpus

Python scripts to create and synthesize a speech corpus with VocalTractLab.

https://github.com/quantling/create_vtl_corpus

Science Score: 54.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: zenodo.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (10.9%) to scientific vocabulary

Repository

Basic Info
Statistics
  • Stars: 3
  • Watchers: 3
  • Forks: 1
  • Open Issues: 0
  • Releases: 5
Created over 7 years ago · Last pushed about 1 year ago

README.rst

===============
CreateVTLCorpus
===============

.. image:: https://zenodo.org/badge/167427297.svg
   :target: https://zenodo.org/badge/latestdoi/167427297

This package supplies the functions needed to synthesize speech
from a phonemic transcription. Furthermore, it defines helpers to improve the
result if more information, such as the pitch contour, is available. It is
especially useful when working with the `PAULE `__ framework.

.. image:: https://raw.githubusercontent.com/quantling/paule/main/docs/figure/vtl_3d_vtl_midsagittal_cps_audio.png
  :width: 800
  :alt: A 3d vocal tract shape, a midsagittal slice, control parameter trajectories and a wave form.

Currently the package supports the following languages:
   - German
   - English



Minimal Example
===============
Running the following command aligns the audio files and then creates a pandas DataFrame with the synthesized audio and other information useful for the PAULE model,
restricted to the first 100 words that occur at least 4 times. Because multiprocessing is enabled (``--use_mp``), no mel spectrograms are generated:

.. code:: bash

    python -m create_vtl_corpus.create_corpus --corpus CORPUS --language de --needs_aligner --use_mp --min_word_count 4 --word_amount 100 --save_df_name SAVE_DF_NAME

This works if there is a German corpus at the path CORPUS with the following structure, which is what the `Mozilla Common Voice project `__ provides:

.. code:: bash

    CORPUS/
    ├── validated.tsv         # a file where the transcripts are stored
    ├── clips/
    │   └── *.mp3             # audio files (mp3)
    └── files_not_relevant_to_this_project

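The layout above can be checked before starting a long run. The following is a minimal sketch and not part of the package: the helper names ``find_layout_problems`` and ``build_command`` are hypothetical, and the only command-line flags used are the ones shown in the command above.

```python
import sys
from pathlib import Path


def find_layout_problems(corpus):
    """Return a list of human-readable problems with the corpus layout."""
    corpus = Path(corpus)
    problems = []
    # The transcripts are expected in validated.tsv at the corpus root.
    if not (corpus / "validated.tsv").is_file():
        problems.append("validated.tsv is missing")
    # The audio files are expected as mp3 files inside clips/.
    clips = corpus / "clips"
    if not clips.is_dir():
        problems.append("clips/ directory is missing")
    elif not any(clips.glob("*.mp3")):
        problems.append("clips/ contains no mp3 files")
    return problems


def build_command(corpus, save_df_name, language="de",
                  min_word_count=4, word_amount=100):
    """Assemble the command from the minimal example as an argument list."""
    return [
        sys.executable, "-m", "create_vtl_corpus.create_corpus",
        "--corpus", str(corpus),
        "--language", language,
        "--needs_aligner",
        "--use_mp",
        "--min_word_count", str(min_word_count),
        "--word_amount", str(word_amount),
        "--save_df_name", save_df_name,
    ]
```

If ``find_layout_problems("CORPUS")`` returns an empty list, the argument list from ``build_command`` can be passed to ``subprocess.run``.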

The end product should look something like this:

.. code:: bash

   CORPUS/
   ├── validated.tsv          # a file where the transcripts are stored
   ├── clips/
   │   ├── *.mp3              # mp3 files
   │   └── *.lab              # lab files
   ├── clips_validated/
   │   ├── *.mp3              # validated mp3 files
   │   └── *.lab              # validated lab files
   ├── clips_aligned/
   │   └── *.TextGrid         # aligned TextGrid files
   ├── corpus_as_df.pkl       # a pandas DataFrame with the information
   └── files_not_relevant_to_this_project

The DataFrame contains the following columns:

=======================  ==================================================================
label                    description
=======================  ==================================================================
'file_name'              name of the clip
'mfa_word'               the spoken word as it appears in the aligned TextGrid
'lexical_word'           the word as it appears in the sentence
'word_position'          the position of the word in the sentence
'sentence'               the sentence the word is part of
'wav_recording'          spliced out audio as mono audio signal
'sr_recording'           sampling rate of the recording
'sr_synthesized'         sampling rate of the synthesized audio
'sampa_phones'           the SAMPA(-like) phonemes of the word
'mfa_phones'             the phonemes as output by the aligner
'phone_durations_lists'  the duration of each phone in the word as a list
'cp_norm'                normalized control parameter (cp) trajectories
'vector'                 embedding vector of the lexical word, based on fastText embeddings
'client_id'              id of the client
=======================  ==================================================================

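After a run, it can be useful to confirm that the saved DataFrame has the columns listed above. The following is a minimal sketch and not part of the package: ``missing_columns`` and ``load_corpus`` are hypothetical helper names, and the sketch assumes the DataFrame was saved as a pickle file (as the ``.pkl`` file in the directory tree suggests).

```python
EXPECTED_COLUMNS = [
    "file_name", "mfa_word", "lexical_word", "word_position", "sentence",
    "wav_recording", "sr_recording", "sr_synthesized", "sampa_phones",
    "mfa_phones", "phone_durations_lists", "cp_norm", "vector", "client_id",
]


def missing_columns(columns):
    """Return the expected column names that are absent from ``columns``."""
    present = set(columns)
    return [name for name in EXPECTED_COLUMNS if name not in present]


def load_corpus(path):
    """Load the pickled DataFrame; only unpickle files from trusted sources."""
    import pandas as pd  # imported lazily so the column check works without pandas

    return pd.read_pickle(path)
```

For example, ``missing_columns(load_corpus("CORPUS/corpus_as_df.pkl").columns)`` returns an empty list when every column from the table is present; the file name here is taken from the directory tree above and depends on the value passed to ``--save_df_name``.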

Copyright
=========
Because the VocalTractLabAPI.so and the JD2.speaker are under GPL v3, the rest of
the code here is under GPL as well. If the code no longer depends on VTL, you can
use it under the MIT license.


Citing 
=======
If you use this code for your research, please cite the following thesis:

Konstantin Sering. Predictive articulatory speech synthesis utilizing lexical embeddings (PAULE). PhD thesis, Universität Tübingen, 2023.

.. code:: bibtex
   
      @phdthesis{sering2023paule,
         title={Predictive articulatory speech synthesis utilizing lexical embeddings (PAULE)},
         author={Sering, Konstantin},
         year={2023},
         school={Universität Tübingen}
      }

Older Versions
==============


Version 2.0.0 and later
-----------------------
From version 2.0.0 onward, we rely on the new segment-to-gesture API introduced
in VTL 2.3 and use the JD3.speaker instead of the JD2.speaker.

Old version 1.1.0
-----------------
The original version of this tool is based on the work and the Matlab code
of Yingming Gao. It can be viewed by checking out the tag ``1.1.0``.

The overall logic is in ``create_corpus.py`` which executes the appropriate
functions from top to bottom. The functions are supplied by the other files.

.. note::

   Since VTL version 2.3, which can be downloaded as free software from
   https://www.vocaltractlab.de/index.php?page=vocaltractlab-download, most of
   the functionality implemented here is available directly from the VTL API.
   Please use the VTL API directly.



   

Acknowledgments
===============
This research was supported by an ERC Advanced Grant (no. 742545), by the
University of Tübingen, and by the TU Dresden.

Owner

  • Name: quantling
  • Login: quantling
  • Kind: organization

Citation (CITATION.cff)

cff-version: 1.2.0
title: create_vtl_corpus
message: Synthesizing a speech corpus with VocalTractLab
type: software
authors:
  - given-names: Konstantin
    family-names: Sering
    affiliation: University of Tübingen
    orcid: 'https://orcid.org/0000-0002-1178-7932'
  - given-names: Valentin
    family-names: Schmidt
    affiliation: University of Tübingen
    orcid: 'https://orcid.org/0009-0006-4691-706X'
  - given-names: Niels
    family-names: Stehwien
  - given-names: Yingming
    family-names: Gao
identifiers:
  - type: doi
    value: 10.5281/zenodo.2548894
repository-code: 'https://github.com/quantling/create_vtl_corpus'
keywords:
  - speech corpus
  - speech synthesis
  - vocal tract lab
  - linguistics
  - python
  - cognitive science
  - machine learning
license: GPL-3.0+

GitHub Events

Total
  • Create event: 7
  • Commit comment event: 2
  • Release event: 5
  • Watch event: 1
  • Delete event: 1
  • Issue comment event: 4
  • Push event: 81
  • Pull request review comment event: 2
  • Pull request review event: 5
  • Pull request event: 5
Last Year
  • Create event: 7
  • Commit comment event: 2
  • Release event: 5
  • Watch event: 1
  • Delete event: 1
  • Issue comment event: 4
  • Push event: 81
  • Pull request review comment event: 2
  • Pull request review event: 5
  • Pull request event: 5

Dependencies

create_vtl_corpus/environment.yml pypi
  • alabaster ==0.7.16
  • audioread ==3.0.1
  • black ==24.4.2
  • click ==8.1.7
  • docutils ==0.21.2
  • fasttext ==0.9.2
  • fsspec ==2024.6.0
  • imagesize ==1.4.1
  • joblib ==1.4.2
  • lazy-loader ==0.4
  • librosa ==0.10.2.post1
  • llvmlite ==0.42.0
  • msgpack ==1.0.8
  • mypy-extensions ==1.0.0
  • numba ==0.59.1
  • pathspec ==0.12.1
  • paule ==0.4.5
  • pooch ==1.8.1
  • praatio ==6.2.0
  • pybind11 ==2.12.0
  • scikit-learn ==1.5.0
  • scipy ==1.13.1
  • snowballstemmer ==2.2.0
  • soundfile ==0.12.1
  • soxr ==0.3.7
  • sphinx ==7.3.7
  • sphinxcontrib-applehelp ==1.0.8
  • sphinxcontrib-devhelp ==1.0.6
  • sphinxcontrib-htmlhelp ==2.0.5
  • sphinxcontrib-jsmath ==1.0.1
  • sphinxcontrib-qthelp ==1.0.7
  • sphinxcontrib-serializinghtml ==1.1.10
  • threadpoolctl ==3.5.0
  • toml ==0.10.2
docs/requirements.txt pypi
  • easydev >=0.9.35
  • nbsphinx >=0.8.8
  • notebook >=6.4.10
  • numpydoc >=1.2
  • seaborn >=0.11.2
  • sphinx >=1.4
  • sphinx_rtd_theme >=1.0.0