pvlex
Create Lexicon with Pronunciation Variants for Austrian German Conversational Speech
Science Score: 44.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: found
- ✓ codemeta.json file: found
- ✓ .zenodo.json file: found
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (12.0%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: SPSC-TUGraz
- License: gpl-3.0
- Language: Python
- Default Branch: public
- Size: 422 KB
Statistics
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
Readme.md
Tool for creating a pronunciation lexicon for Austrian German conversational speech.
Installation
Clone the repository, change into its directory (cd /path/to/pvlex/), then run pip install . (the trailing dot refers to the current directory).
Usage
Create a wordlist containing the words you want to process (one word per line). Adjust the paths in the configuration file config.json, then run pvlex.py. This creates pronunciation lexicons with multiple pronunciations per word. The pronunciation variants are generated from phonetic and phonological rules. Note that a rule-based approach will always produce some unlikely variants as well. Running forced alignment on your own data shows which variants actually occur, so the unlikely ones can be sorted out and the lexicon trimmed.
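The wordlist step can be sketched as follows. This is a minimal illustration, not part of pvlex: the helper names build_wordlist and write_wordlist are hypothetical, and lowercasing the words is an assumption (the README does not state which casing pvlex.py expects).

```python
import re
from pathlib import Path


def build_wordlist(text: str, lowercase: bool = True) -> list[str]:
    """Return the unique words found in `text`, sorted by code point.

    Lowercasing is an assumption; pass lowercase=False to keep the
    original casing.
    """
    if lowercase:
        text = text.lower()
    # [^\W\d_]+ matches runs of Unicode letters only, so umlauts and
    # "ß" are kept while digits, underscores and punctuation are dropped.
    tokens = re.findall(r"[^\W\d_]+", text)
    return sorted(set(tokens))


def write_wordlist(words: list[str], path: str) -> None:
    """Write one word per line, the layout described in the Usage section."""
    Path(path).write_text("\n".join(words) + "\n", encoding="utf-8")


if __name__ == "__main__":
    words = build_wordlist("Grüß Gott, heute ist es schön. Heute!")
    # "my_wordlist.txt" is a placeholder file name, not a path from the repository.
    write_wordlist(words, "my_wordlist.txt")
```

The resulting file can then be referenced from config.json in place of one of the example wordlists.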
Note: If you want to use your own canonical pronunciation lexicon, set "RawGermanLexicon" in the configuration file. Otherwise, you need an internet connection, because the canonical pronunciations are then created with the integrated online service (in this case, please also cite the BAS Web Services grapheme-to-phoneme conversion tool).
If you want to create pronunciation variants with the typical reductions for standard German (as spoken in Germany), set the option ConversationalAustrianGerman to false in the section GeneralSettings>ruleSets of the configuration file.
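Switching this option can also be scripted. The sketch below assumes that GeneralSettings>ruleSets corresponds to nested JSON objects in config.json; the real file layout may differ, and the helper names set_rule_set and update_config_file are hypothetical.

```python
import json
from pathlib import Path


def set_rule_set(config: dict, name: str, enabled: bool) -> dict:
    """Return a copy of `config` with GeneralSettings > ruleSets > name set.

    Assumes the two levels are nested JSON objects; missing levels are
    created rather than raising an error.
    """
    # Round-tripping through JSON gives a deep copy of JSON-compatible data.
    updated = json.loads(json.dumps(config))
    rule_sets = updated.setdefault("GeneralSettings", {}).setdefault("ruleSets", {})
    rule_sets[name] = enabled
    return updated


def update_config_file(path: str, name: str, enabled: bool) -> None:
    """Read a JSON config file, change one rule set flag, and write it back."""
    config_path = Path(path)
    config = json.loads(config_path.read_text(encoding="utf-8"))
    updated = set_rule_set(config, name, enabled)
    config_path.write_text(
        json.dumps(updated, indent=4, ensure_ascii=False) + "\n",
        encoding="utf-8",
    )
```

For example, update_config_file("config.json", "ConversationalAustrianGerman", False) would select the reductions for standard German. Note that the file is rewritten with 4-space indentation, so any custom formatting in the original file is lost.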
Example wordlists can be found in data/wordlists/, an output lexicon in data/lexiconLatest_lower.txt.
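Trimming a lexicon after forced alignment, as mentioned above, could look roughly like the following sketch. It is not taken from pvlex: it assumes a lexicon with one tab-separated "word, pronunciation" pair per line (the actual format of the output lexicon may differ), the function names are hypothetical, and the entries in the example are invented toy data.

```python
from collections import defaultdict


def parse_lexicon(lines):
    """Group pronunciations by word.

    Assumed line format: "<word>\t<pronunciation>". Blank lines and
    lines without a pronunciation are skipped; duplicates are dropped.
    """
    lexicon = defaultdict(list)
    for line in lines:
        word, _, pron = line.strip().partition("\t")
        pron = pron.strip()
        if not word or not pron:
            continue
        if pron not in lexicon[word]:
            lexicon[word].append(pron)
    return dict(lexicon)


def prune_lexicon(lexicon, observed):
    """Keep only the variants that a forced alignment actually selected.

    `observed` is a set of (word, pronunciation) pairs. Words with no
    observed variant keep all their variants, so no word disappears
    from the lexicon (a design choice of this sketch).
    """
    pruned = {}
    for word, prons in lexicon.items():
        kept = [p for p in prons if (word, p) in observed]
        pruned[word] = kept or list(prons)
    return pruned


if __name__ == "__main__":
    # Invented toy entries, not copied from the repository's lexicon.
    toy_lines = ["haben\th a: b @ n", "haben\th a: m", "und\tU n t", "und\tU n"]
    toy_lexicon = parse_lexicon(toy_lines)
    print(prune_lexicon(toy_lexicon, {("haben", "h a: m")}))
```

Keeping all variants for unobserved words is the conservative choice; with enough aligned data, one might instead keep only the canonical form for such words.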
How to cite
If you use our code or data in your research, please cite this repository. Use the "Cite this repository" option or:
@misc{wepner2025pvlex,
author = {Wepner, Saskia},
title = {pvlex -- Lexicon with Pronunciation Variants for (Austrian) German Conversational Speech},
year = 2025,
publisher = {GitHub},
journal = {GitHub repository},
howpublished = {\url{https://github.com/SPSC-TUGraz/pvlex}}
}
and the article that contains details about some of the general rules used in this code:
@InProceedings{schuppler2014pronunciation,
author = {Schuppler, Barbara and Adda-Decker, Martine and Morales-Cordovilla, Juan A},
booktitle = {Fifteenth Annual Conference of the International Speech Communication Association},
title = {Pronunciation variation in read and conversational {Austrian German}},
year = {2014},
}
Owner
- Name: Signal Processing and Speech Communication Laboratory at Graz University of Technology
- Login: SPSC-TUGraz
- Kind: organization
- Location: Austria
- Website: https://www.spsc.tugraz.at
- Repositories: 1
- Profile: https://github.com/SPSC-TUGraz
Citation (CITATION.cff)
cff-version: 1.2.0
message: "If you use our code or data in your research, please cite this repository:"
authors:
  - family-names: Wepner
    given-names: Saskia
    orcid: https://orcid.org/0000-0001-8232-1913
title: "pvlex – Lexicon with Pronunciation Variants for (Austrian) German Conversational Speech"
version: 1.0
date-released: 2025-01-11
url: https://github.com/SPSC-TUGraz/pvlex
GitHub Events
Total
- Release event: 1
- Push event: 4
- Create event: 1
Last Year
- Release event: 1
- Push event: 4
- Create event: 1