pvlex

Create Lexicon with Pronunciation Variants for Austrian German Conversational Speech

https://github.com/spsc-tugraz/pvlex

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (12.0%) to scientific vocabulary

Repository

Basic Info
  • Host: GitHub
  • Owner: SPSC-TUGraz
  • License: gpl-3.0
  • Language: Python
  • Default Branch: public
  • Size: 422 KB
Statistics
  • Stars: 0
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created over 1 year ago · Last pushed over 1 year ago
Metadata Files
Readme License Citation

Readme.md

Tool for creating a pronunciation lexicon for Austrian German conversational speech.

Installation

Clone the repository, change into its directory with cd /path/to/pvlex/, and run pip install . (the trailing dot tells pip to install from the current directory).

Usage

Create a wordlist containing the words you want to process (one word per line). Adjust the paths in the configuration file config.json and run pvlex.py. This creates pronunciation lexicons with multiple pronunciations per word. The pronunciations are generated from phonetic and phonological rules. Note that a rule-based approach will always produce some unlikely variants as well. Running forced alignment on your own data shows which variants are actually selected, so you can discard the rest and trim your lexicon.
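The trimming step mentioned above is not part of the documented pvlex interface, so the following is only a minimal sketch of one way it could be done. It assumes the lexicon is already loaded as a dict from word to a list of pronunciation strings, and that the forced aligner's output has been reduced to (word, pronunciation) pairs. The function name prune_lexicon, the min_count threshold, and the toy data are hypothetical.

```python
from collections import Counter


def prune_lexicon(lexicon, aligned_pairs, min_count=1):
    """Keep only the pronunciation variants that the aligner actually chose.

    lexicon:       dict mapping word -> list of pronunciation strings (assumed layout)
    aligned_pairs: iterable of (word, pronunciation) tuples, one per aligned token
    min_count:     how often a variant must be chosen to stay in the lexicon
    """
    counts = Counter(aligned_pairs)
    pruned = {}
    for word, variants in lexicon.items():
        kept = [v for v in variants if counts[(word, v)] >= min_count]
        # If no variant reaches the threshold (e.g. the word never occurs in
        # the aligned data), keep all variants so the word is not lost.
        pruned[word] = kept if kept else list(variants)
    return pruned


# Toy data invented for illustration; these are not real pvlex outputs.
lexicon = {
    "haben": ["h a: b @ n", "h a: b m", "h a m"],
    "nicht": ["n I C t", "n I C"],
}
aligned = [
    ("haben", "h a m"),
    ("haben", "h a m"),
    ("haben", "h a: b m"),
    ("nicht", "n I C"),
]
pruned = prune_lexicon(lexicon, aligned, min_count=2)
```

With min_count=2, only the variant of "haben" that was chosen twice survives, while "nicht" keeps both variants because none of them reaches the threshold. Whether to fall back to the full list or to the canonical form alone in that case is a design choice that depends on your data.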

Note: If you want to use your own canonical pronunciation lexicon, set the "RawGermanLexicon" entry in the configuration file. Otherwise, you need an internet connection, because the canonical pronunciations are then created with the integrated online service (in this case, please also cite the BAS Web Services grapheme-to-phoneme conversion tool).

If you want to create pronunciation variants with the typical reductions for standard German (as spoken in Germany), set the configuration ConversationalAustrianGerman to false in the section GeneralSettings>ruleSets.
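The setting can be changed by hand in config.json. For scripted runs, a small helper along the following lines might be convenient. This is a sketch, not part of pvlex: it assumes that "GeneralSettings>ruleSets" corresponds to nested JSON objects with those key names, and the helper names with_rule_set and update_config_file are hypothetical.

```python
import copy
import json
from pathlib import Path


def with_rule_set(config, name, enabled):
    """Return a copy of the config dict with one rule set switched on or off.

    Assumes the layout config["GeneralSettings"]["ruleSets"][name]; this nesting
    is inferred from the README wording and should be checked against the file.
    """
    updated = copy.deepcopy(config)
    rule_sets = updated.setdefault("GeneralSettings", {}).setdefault("ruleSets", {})
    rule_sets[name] = enabled
    return updated


def update_config_file(path, name, enabled):
    """Read a JSON config file, toggle one rule set, and write it back."""
    path = Path(path)
    config = json.loads(path.read_text(encoding="utf-8"))
    updated = with_rule_set(config, name, enabled)
    path.write_text(
        json.dumps(updated, indent=2, ensure_ascii=False) + "\n", encoding="utf-8"
    )
    return updated


# In-memory demonstration with a made-up, minimal config.
demo_before = {"GeneralSettings": {"ruleSets": {"ConversationalAustrianGerman": True}}}
demo_after = with_rule_set(demo_before, "ConversationalAustrianGerman", False)

# For a real file (path is an example):
# update_config_file("config.json", "ConversationalAustrianGerman", False)
```

The helper leaves all other keys untouched and does not modify the input dict, so the original settings remain available for comparison.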

Example wordlists can be found in data/wordlists/, an output lexicon in data/lexiconLatest_lower.txt.
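If you do not have a wordlist yet, one in the one-word-per-line format described under Usage could be derived from running text roughly as follows. This is a sketch under assumptions: lowercasing, deduplication, and sorting are choices made here for illustration rather than requirements stated by pvlex, and the function names and the output filename are hypothetical.

```python
import re


def build_wordlist(text):
    """Extract the unique words of a text, lowercased and sorted.

    The pattern matches runs of Unicode letters only, so umlauts and ß are
    kept, while digits, underscores, and punctuation act as separators.
    Hyphenated compounds are therefore split into their parts.
    """
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    return sorted(set(tokens))


def write_wordlist(words, path):
    """Write one word per line, UTF-8 encoded."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("\n".join(words) + "\n")


words = build_wordlist("Das haben wir nicht gewusst, das weiß ich.")

# To save the list (filename is an example):
# write_wordlist(words, "my_wordlist.txt")
```

Compare the result with the files in data/wordlists/ before running pvlex.py, since the expected casing and character set may differ from what this sketch produces.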

How to cite

If you use our code or data in your research, please cite this repository. Use the "Cite this repository" option or:

@misc{wepner2025pvlex,
  author = {Wepner, Saskia},
  title = {pvlex -- Lexicon with Pronunciation Variants for (Austrian) German Conversational Speech},
  year = 2025,
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/SPSC-TUGraz/pvlex}}
}

Please also cite the article that describes some of the general rules used in this code:

@InProceedings{schuppler2014pronunciation,
  author = {Schuppler, Barbara and Adda-Decker, Martine and Morales-Cordovilla, Juan A},
  booktitle = {Fifteenth Annual Conference of the International Speech Communication Association},
  title = {Pronunciation variation in read and conversational {Austrian German}},
  year = {2014},
}

Owner

  • Name: Signal Processing and Speech Communication Laboratory at Graz University of Technology
  • Login: SPSC-TUGraz
  • Kind: organization
  • Location: Austria

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use our code or data in your research, please cite this repository:"
authors:
  - family-names: Wepner
    given-names: Saskia
    orcid: https://orcid.org/0000-0001-8232-1913
title: "pvlex – Lexicon with Pronunciation Variants for (Austrian) German Conversational Speech"
version: 1.0
date-released: 2025-01-11
url: https://github.com/SPSC-TUGraz/pvlex

GitHub Events

Total
  • Release event: 1
  • Push event: 4
  • Create event: 1
Last Year
  • Release event: 1
  • Push event: 4
  • Create event: 1