pronunciation-dictionary
Library and CLI to load/save/modify pronunciation dictionaries.
Science Score: 67.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ✓ DOI references: Found 5 DOI reference(s) in README
- ✓ Academic publication links: Links to zenodo.org
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (9.0%) to scientific vocabulary
Statistics
- Stars: 3
- Watchers: 2
- Forks: 0
- Open Issues: 0
- Releases: 5
Metadata Files
README.md
pronunciation-dictionary
Library to save and load pronunciation dictionaries (language-independent).
Features
- Load dictionary from file or URL
- Parsing of
  - line comments
  - pronunciation comments
  - numbers indicating alternative pronunciations for words
  - weights
- Multiprocessing for faster deserialization
- Save dictionary to file
  - including numbers for alternative pronunciations
  - including weights
  - setting the word/weight/pronunciation separator
- Select pronunciation via
  - first/last
  - longest/shortest
  - highest/lowest weight
  - random
  - weight
- Get phoneme set
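The selection strategies in the feature list can be illustrated with plain Python. This is a minimal sketch, not the library's API: it assumes a dictionary maps each word to an insertion-ordered mapping from pronunciation tuples to weights, which matches what dictionary.get("dismantle").items() yields in the usage example. The names select_pronunciation and phoneme_set and the mode strings are hypothetical, and reading the final "weight" item as weight-proportional random choice is an assumption.

```python
import random
from typing import Dict, Optional, Set, Tuple

# Hypothetical stand-in for the library's dictionary type:
# word -> {pronunciation (tuple of symbols): weight}, in insertion order.
Pronunciation = Tuple[str, ...]
Dictionary = Dict[str, Dict[Pronunciation, float]]


def select_pronunciation(
    pronunciations: Dict[Pronunciation, float],
    mode: str,
    rng: Optional[random.Random] = None,
) -> Pronunciation:
    """Pick one pronunciation of a word; the mode names are illustrative."""
    items = list(pronunciations.items())
    if mode == "first":
        return items[0][0]
    if mode == "last":
        return items[-1][0]
    # On ties, max()/min() keep the earliest candidate.
    if mode == "longest":
        return max(items, key=lambda item: len(item[0]))[0]
    if mode == "shortest":
        return min(items, key=lambda item: len(item[0]))[0]
    if mode == "highest-weight":
        return max(items, key=lambda item: item[1])[0]
    if mode == "lowest-weight":
        return min(items, key=lambda item: item[1])[0]
    rng = rng or random.Random()
    if mode == "random":
        # Uniform choice, ignoring the weights.
        return rng.choice(items)[0]
    if mode == "weighted-random":
        # Probability proportional to each pronunciation's weight.
        candidates, weights = zip(*items)
        return rng.choices(candidates, weights=weights, k=1)[0]
    raise ValueError(f"unknown mode: {mode}")


def phoneme_set(dictionary: Dictionary) -> Set[str]:
    """Collect every symbol used in any pronunciation."""
    return {
        symbol
        for pronunciations in dictionary.values()
        for pronunciation in pronunciations
        for symbol in pronunciation
    }


# Pronunciations of "dismantle" as printed in the usage example.
dismantle = {
    ("D", "IH0", "S", "M", "AE1", "N", "T", "AH0", "L"): 1.0,
    ("D", "IH0", "S", "M", "AE1", "N", "AH0", "L"): 1.0,
}
print(select_pronunciation(dismantle, "shortest"))
# ('D', 'IH0', 'S', 'M', 'AE1', 'N', 'AH0', 'L')
```

Keeping the pronunciations in an ordered mapping is what makes "first" and "last" meaningful; a plain set of pronunciations would lose that information.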
Example dictionaries and deserialization arguments
- Montreal Forced Aligner dictionaries
  - encoding: "UTF-8"
- CMU
  - encoding: "ISO-8859-1"
  - consider_numbers: True
  - consider_pronunciation_comments: True
- LibriSpeech
  - encoding: "UTF-8"
- Prosodylab
- Old: CMU 0.7b
  - encoding: "ISO-8859-1"
  - consider_comments: True
  - consider_numbers: True
Excerpt from CMU (as an example):
```dict
a.d. EY2 D IY1
a.m. EY2 EH1 M
a.s EY1 Z
aaa T R IH2 P AH0 L EY1
aaberg AA1 B ER0 G
aachen AA1 K AH0 N
aachener AA1 K AH0 N ER0
aaker AA1 K ER0
aalborg AO1 L B AO0 R G # place, danish
aalborg(2) AA1 L B AO0 R G
```
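To make the deserialization arguments concrete, the following minimal sketch splits one line of the CMU format shown above. It is not the library's parser: the function parse_cmu_line is hypothetical, its keyword names are borrowed from the argument names listed above, and treating lines that start with ";;;" as line comments (as in CMU 0.7b) is an assumption.

```python
import re
from typing import Optional, Tuple

# Assumed conventions: "word(2)" marks an alternative pronunciation, text
# after "#" is a pronunciation comment, and lines starting with ";;;" are
# line comments.
ALTERNATIVE_NUMBER = re.compile(r"\(\d+\)$")


def parse_cmu_line(
    line: str,
    consider_comments: bool = True,
    consider_numbers: bool = True,
    consider_pronunciation_comments: bool = True,
) -> Optional[Tuple[str, Tuple[str, ...]]]:
    """Return (word, phonemes), or None for blank and comment-only lines."""
    line = line.strip()
    if consider_comments and line.startswith(";;;"):
        return None
    if consider_pronunciation_comments:
        # Drop everything from the first "#" onwards.
        line = line.split("#", 1)[0].rstrip()
    if not line:
        return None
    word, *phonemes = line.split()
    if consider_numbers:
        # "aalborg(2)" -> "aalborg"
        word = ALTERNATIVE_NUMBER.sub("", word)
    return word, tuple(phonemes)


for raw in ("aalborg AO1 L B AO0 R G # place, danish", "aalborg(2) AA1 L B AO0 R G"):
    print(parse_cmu_line(raw))
# ('aalborg', ('AO1', 'L', 'B', 'AO0', 'R', 'G'))
# ('aalborg', ('AA1', 'L', 'B', 'AO0', 'R', 'G'))
```

Both lines resolve to the same word "aalborg", so collecting the results per word yields one entry with two alternative pronunciations.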
Installation
```sh
pip install pronunciation-dictionary --user
```
Usage
```py
from pronunciation_dictionary import load_dict, save_dict, MultiprocessingOptions, DeserializationOptions, SerializationOptions
```
Example
```py
from pathlib import Path
from pronunciation_dictionary import (DeserializationOptions, MultiprocessingOptions, SerializationOptions,
                                      get_phoneme_set, load_dict_from_url, save_dict)

dictionary = load_dict_from_url(
    "https://raw.githubusercontent.com/cmusphinx/cmudict/master/cmudict.dict", "ISO-8859-1",
    DeserializationOptions(False, True, True, False), MultiprocessingOptions(4, None, 10000))

phoneme_set = get_phoneme_set(dictionary)
print(phoneme_set)
# {'Z', 'EY1', 'AH0', 'F', 'AE0', 'UW0', 'CH', 'G', 'V', 'AY1', 'AO2', 'ZH', 'AA1', 'IY1', 'AW0', 'T', 'TH', 'AY2', 'DH', 'S', 'W', 'ER1', 'AA2', 'AE2', 'AE1', 'AW1', 'UW1', 'AH1', 'Y', 'EY2', 'AO0', 'OW2', 'OY2', 'IY2', 'JH', 'N', 'NG', 'P', 'IH2', 'M', 'OW0', 'L', 'UH1', 'IY0', 'EY0', 'HH', 'IH0', 'SH', 'AH2', 'AW2', 'EH2', 'OW1', 'D', 'R', 'IH1', 'AO1', 'B', 'UH2', 'UH0', 'ER0', 'UW2', 'ER2', 'EH0', 'AY0', 'AA0', 'EH1', 'OY1', 'OY0', 'K'}

pronunciations_dismantle = dictionary.get("dismantle")
for pronunciation, weight in pronunciations_dismantle.items():
    print(pronunciation, weight)
# ('D', 'IH0', 'S', 'M', 'AE1', 'N', 'T', 'AH0', 'L') 1.0
# ('D', 'IH0', 'S', 'M', 'AE1', 'N', 'AH0', 'L') 1.0

save_dict(dictionary, Path("/tmp/cmu.dict"), "UTF-8", SerializationOptions("DOUBLE-SPACE", False, False))
```
```sh
head /tmp/cmu.dict
'bout B AW1 T
'cause K AH0 Z
'course K AO1 R S
'cuse K Y UW1 Z
'em AH0 M
'frisco F R IH1 S K OW0
'gain G EH1 N
'kay K EY1
'm AH0 M
'n AH0 N
```
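The serialization options passed to save_dict can be illustrated with a minimal sketch that writes one line per pronunciation. It assumes that DOUBLE-SPACE puts two spaces between the word and its pronunciation while the symbols stay separated by single spaces, and that optional numbering follows the CMU "word(2)" style from the excerpt above. The function to_lines and the TAB/SPACE entries of the separator table are hypothetical; weights are left out because their column layout is not documented here.

```python
from typing import Dict, List, Tuple

Pronunciation = Tuple[str, ...]

# Hypothetical separator table; only "DOUBLE-SPACE" appears in this README.
SEPARATORS = {"TAB": "\t", "SPACE": " ", "DOUBLE-SPACE": "  "}


def to_lines(
    dictionary: Dict[str, Dict[Pronunciation, float]],
    parts_sep: str = "DOUBLE-SPACE",
    include_numbers: bool = False,
) -> List[str]:
    """Render one line per (word, pronunciation) pair."""
    sep = SEPARATORS[parts_sep]
    lines = []
    for word, pronunciations in dictionary.items():
        for index, pronunciation in enumerate(pronunciations, start=1):
            # CMU style: only the 2nd, 3rd, ... variants get a number suffix.
            label = f"{word}({index})" if include_numbers and index > 1 else word
            lines.append(label + sep + " ".join(pronunciation))
    return lines


example = {
    "'bout": {("B", "AW1", "T"): 1.0},
    "aalborg": {
        ("AO1", "L", "B", "AO0", "R", "G"): 1.0,
        ("AA1", "L", "B", "AO0", "R", "G"): 1.0,
    },
}
print("\n".join(to_lines(example, include_numbers=True)))
# 'bout  B AW1 T
# aalborg  AO1 L B AO0 R G
# aalborg(2)  AA1 L B AO0 R G
```

Using a separator of two spaces keeps the word column visually distinct from the single-space-separated symbols, which is useful when words themselves contain punctuation such as "a.d.".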
Roadmap
- replace SerializationOptions, DeserializationOptions and MultiprocessingOptions with parameters
- add default parameter values
- add more tests
Development setup
```sh
# update
sudo apt update
# install Python 3.8-3.12 so that the tests can be run
sudo apt install python3-pip \
  python3.8 python3.8-dev python3.8-distutils python3.8-venv \
  python3.9 python3.9-dev python3.9-distutils python3.9-venv \
  python3.10 python3.10-dev python3.10-distutils python3.10-venv \
  python3.11 python3.11-dev python3.11-distutils python3.11-venv \
  python3.12 python3.12-dev python3.12-distutils python3.12-venv
# install pipenv for creation of virtual environments
python3.8 -m pip install pipenv --user

# check out repo
git clone https://github.com/stefantaubert/pronunciation-dictionary.git
cd pronunciation-dictionary
# create virtual environment
python3.8 -m pipenv install --dev
```
Running the tests
```sh
# first install the tool as described in "Development setup"
# then navigate into the directory of the repo (if not already done)
cd pronunciation-dictionary
# activate environment
python3.8 -m pipenv shell
# run tests
tox
```
Final lines of test result output:
```log
py38: commands succeeded
py39: commands succeeded
py310: commands succeeded
py311: commands succeeded
py312: commands succeeded
congratulations :)
```
License
MIT License
Acknowledgments
Funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) – Project-ID 416228727 – CRC 1410
Citation
If you want to cite this repository, you can use the following entry generated by GitHub (see About => Cite this repository, where a BibTeX entry is also available).
```txt
Taubert, S. (2024). pronunciation-dictionary (Version 0.0.6) [Computer software]. https://doi.org/10.5281/zenodo.10552058
```
Owner
- Name: Stefan Taubert
- Login: stefantaubert
- Kind: user
- Location: Chemnitz, Germany
- Company: Chemnitz University of Technology
- Website: https://stefantaubert.com
- Twitter: Stefan_Taubert
- Repositories: 75
- Profile: https://github.com/stefantaubert
I am currently working on my PhD on speech synthesis at Chemnitz University of Technology.
Citation (CITATION.cff)
cff-version: 1.2.0
title: pronunciation-dictionary
abstract: Library to save and load pronunciation dictionaries (language-independent).
message: >-
  If you use this software, please cite it using the
  metadata from this file.
type: software
authors:
  - email: github@stefantaubert.com
    given-names: Stefan
    family-names: Taubert
    affiliation: Chemnitz University of Technology
    orcid: 'https://orcid.org/0000-0002-4932-2874'
    website: 'https://stefantaubert.com/'
version: 0.0.6
date-released: 2024-01-22
license: MIT
url: https://github.com/stefantaubert/pronunciation-dictionary
doi: 10.5281/zenodo.10552058
GitHub Events
Total
- Watch event: 1
Last Year
- Watch event: 1
Issues and Pull Requests
Last synced: 6 months ago
All Time
- Total issues: 0
- Total pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Total issue authors: 0
- Total pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 0
- Pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Packages
- Total packages: 1
Total downloads:
- pypi 135 last-month
- Total dependent packages: 8
- Total dependent repositories: 6
- Total versions: 5
- Total maintainers: 1
pypi.org: pronunciation-dictionary
Library to save and load pronunciation dictionaries (language-independent).
- Homepage: https://github.com/stefantaubert/pronunciation-dictionary
- Documentation: https://pronunciation-dictionary.readthedocs.io/
- License: MIT
Latest release: 0.0.6
published about 2 years ago
Dependencies
- autoflake * develop
- autopep8 * develop
- cx-freeze * develop
- isort * develop
- pronunciation-dictionary * develop
- pycodestyle * develop
- pylint * develop
- pytest * develop
- rope * develop
- twine * develop
- astroid ==2.11.2 develop
- attrs ==21.4.0 develop
- autoflake ==1.4 develop
- autopep8 ==1.6.0 develop
- bleach ==5.0.0 develop
- certifi ==2021.10.8 develop
- cffi ==1.15.0 develop
- charset-normalizer ==2.0.12 develop
- commonmark ==0.9.1 develop
- cryptography ==36.0.2 develop
- cx-freeze ==6.10 develop
- dill ==0.3.4 develop
- docutils ==0.18.1 develop
- idna ==3.3 develop
- importlib-metadata ==4.11.3 develop
- iniconfig ==1.1.1 develop
- isort ==5.10.1 develop
- jeepney ==0.8.0 develop
- keyring ==23.5.0 develop
- lazy-object-proxy ==1.7.1 develop
- mccabe ==0.7.0 develop
- ordered-set ==4.1.0 develop
- packaging ==21.3 develop
- patchelf ==0.14.5.0 develop
- pkginfo ==1.8.2 develop
- platformdirs ==2.5.1 develop
- pluggy ==1.0.0 develop
- pronunciation-dictionary * develop
- py ==1.11.0 develop
- pycodestyle ==2.8.0 develop
- pycparser ==2.21 develop
- pyflakes ==2.4.0 develop
- pygments ==2.11.2 develop
- pylint ==2.13.5 develop
- pyparsing ==3.0.8 develop
- pytest ==7.1.1 develop
- readme-renderer ==34.0 develop
- requests ==2.27.1 develop
- requests-toolbelt ==0.9.1 develop
- rfc3986 ==2.0.0 develop
- rich ==12.2.0 develop
- rope ==1.0.0 develop
- secretstorage ==3.3.1 develop
- setuptools ==62.1.0 develop
- six ==1.16.0 develop
- toml ==0.10.2 develop
- tomli ==2.0.1 develop
- tqdm ==4.64.0 develop
- twine ==4.0.0 develop
- typing-extensions ==4.1.1 develop
- urllib3 ==1.26.9 develop
- webencodings ==0.5.1 develop
- wheel ==0.37.1 develop
- wrapt ==1.14.0 develop
- zipp ==3.8.0 develop
- ordered-set ==4.1.0
- tqdm ==4.64.0