pronunciation-dictionary

Library and CLI to load/save/modify pronunciation dictionaries.

https://github.com/stefantaubert/pronunciation-dictionary

Science Score: 67.0%

This score indicates how likely this project is to be science-related based on various indicators:

- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ✓ DOI references: Found 5 DOI reference(s) in README
- ✓ Academic publication links: Links to zenodo.org
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (9.0%) to scientific vocabulary

Keywords

linguistics pronunciation-dictionary python speech-synthesis speech-to-text

Last synced: 11 months ago

Repository

Library and CLI to load/save/modify pronunciation dictionaries.

Basic Info

Host: GitHub
Owner: stefantaubert
License: mit
Language: Python
Default Branch: master
Homepage:
Size: 166 KB

Statistics

Stars: 3
Watchers: 2
Forks: 0
Open Issues: 0
Releases: 5

Topics

linguistics pronunciation-dictionary python speech-synthesis speech-to-text

Created over 4 years ago · Last pushed over 2 years ago

Metadata Files

Readme Changelog Contributing License Code of conduct Citation

pronunciation-dictionary

Library to save and load pronunciation dictionaries (language-independent).

Features

- Load dictionary from file or URL
  - Parsing of
    - line comments
    - pronunciation comments
    - numbers indicating alternative pronunciations for words
    - weights
  - Multiprocessing for faster deserialization
- Save dictionary to file
  - including numbers for alternative pronunciations
  - include weights
  - set word/weight/pronunciation separator
- Select pronunciation via
  - first/last
  - longest/shortest
  - highest/lowest weight
  - random
  - weight
- Get phoneme set

Example dictionaries and deserialization arguments

- Montreal Forced Aligner dictionaries
  - encoding: "UTF-8"
- CMU
  - encoding: "ISO-8859-1"
  - consider_numbers: True
  - consider_pronunciation_comments: True
- LibriSpeech
  - encoding: "UTF-8"
- Prosodylab
- Old: CMU 0.7b
  - encoding: "ISO-8859-1"
  - consider_comments: True
  - consider_numbers: True

Excerpt from CMU (as example)

a.d. EY2 D IY1
a.m. EY2 EH1 M
a.s EY1 Z
aaa T R IH2 P AH0 L EY1
aaberg AA1 B ER0 G
aachen AA1 K AH0 N
aachener AA1 K AH0 N ER0
aaker AA1 K ER0
aalborg AO1 L B AO0 R G # place, danish
aalborg(2) AA1 L B AO0 R G
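
To make the role of `consider_numbers` and `consider_pronunciation_comments` concrete, here is a minimal standalone sketch of how lines like the ones in the excerpt could be parsed. It is not the library's implementation: `parse_line`, the regular expression and the rule of cutting a trailing comment at the first `" #"` are assumptions for illustration, and line comments and weights are not handled.

```python
import re

# Matches a trailing "(N)" alternative marker on a word, e.g. "aalborg(2)".
_NUMBER_SUFFIX = re.compile(r"\(\d+\)$")


def parse_line(line, consider_numbers=True, consider_pronunciation_comments=True):
    """Split one dictionary line into (word, pronunciation tuple).

    Returns None for empty lines.
    """
    line = line.strip()
    if not line:
        return None
    if consider_pronunciation_comments:
        # Assumed rule: everything from the first " #" onwards is a comment.
        line = line.split(" #", 1)[0].rstrip()
    word, *phonemes = line.split()
    if consider_numbers:
        # "aalborg(2)" is an alternative pronunciation of "aalborg".
        word = _NUMBER_SUFFIX.sub("", word)
    return word, tuple(phonemes)


excerpt = """\
a.d. EY2 D IY1
aalborg AO1 L B AO0 R G # place, danish
aalborg(2) AA1 L B AO0 R G
"""

# Group alternative pronunciations under the same word.
entries = {}
for raw_line in excerpt.splitlines():
    parsed = parse_line(raw_line)
    if parsed is None:
        continue
    word, pronunciation = parsed
    entries.setdefault(word, []).append(pronunciation)

print(entries["aalborg"])
```

With `consider_numbers=False` the word is kept as `aalborg(2)`, so the two pronunciations would end up under separate keys.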

Installation

pip install pronunciation-dictionary --user

Usage

from pronunciation_dictionary import load_dict, save_dict, MultiprocessingOptions, DeserializationOptions, SerializationOptions

Example

```py
from pathlib import Path

from pronunciation_dictionary import (DeserializationOptions, MultiprocessingOptions,
                                      SerializationOptions, get_phoneme_set,
                                      load_dict_from_url, save_dict)

# Download and parse the CMU dictionary (ISO-8859-1 encoded)
dictionary = load_dict_from_url(
    "https://raw.githubusercontent.com/cmusphinx/cmudict/master/cmudict.dict",
    "ISO-8859-1",
    DeserializationOptions(False, True, True, False),
    MultiprocessingOptions(4, None, 10000)
)

# Collect all phonemes that occur in the dictionary
phoneme_set = get_phoneme_set(dictionary)

print(phoneme_set)
# {'Z', 'EY1', 'AH0', 'F', 'AE0', 'UW0', 'CH', 'G', 'V', 'AY1', 'AO2', 'ZH', 'AA1', 'IY1', 'AW0', 'T', 'TH', 'AY2', 'DH', 'S', 'W', 'ER1', 'AA2', 'AE2', 'AE1', 'AW1', 'UW1', 'AH1', 'Y', 'EY2', 'AO0', 'OW2', 'OY2', 'IY2', 'JH', 'N', 'NG', 'P', 'IH2', 'M', 'OW0', 'L', 'UH1', 'IY0', 'EY0', 'HH', 'IH0', 'SH', 'AH2', 'AW2', 'EH2', 'OW1', 'D', 'R', 'IH1', 'AO1', 'B', 'UH2', 'UH0', 'ER0', 'UW2', 'ER2', 'EH0', 'AY0', 'AA0', 'EH1', 'OY1', 'OY0', 'K'}

# Look up all pronunciations of a word together with their weights
pronunciations_dismantle = dictionary.get("dismantle")

for pronunciation, weight in pronunciations_dismantle.items():
    print(pronunciation, weight)
# ('D', 'IH0', 'S', 'M', 'AE1', 'N', 'T', 'AH0', 'L') 1.0
# ('D', 'IH0', 'S', 'M', 'AE1', 'N', 'AH0', 'L') 1.0

# Write the dictionary back to disk as UTF-8
save_dict(dictionary, Path("/tmp/cmu.dict"), "UTF-8", SerializationOptions("DOUBLE-SPACE", False, False))
```

```sh
head /tmp/cmu.dict
# 'bout B AW1 T
# 'cause K AH0 Z
# 'course K AO1 R S
# 'cuse K Y UW1 Z
# 'em AH0 M
# 'frisco F R IH1 S K OW0
# 'gain G EH1 N
# 'kay K EY1
# 'm AH0 M
# 'n AH0 N
```
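
The save options named under Features (separator, numbers for alternative pronunciations, weights) can be sketched without the library as follows. Everything here is illustrative: `write_dict` and its parameters are hypothetical, only the separator name `DOUBLE-SPACE` comes from the example above (`SPACE` and `TAB` are assumed), and the numbering follows the CMU convention seen in the excerpt, where the first pronunciation is unnumbered and later ones get `(2)`, `(3)` and so on.

```python
import io

# Only "DOUBLE-SPACE" appears in the example above; the other names are assumptions.
SEPARATORS = {"SPACE": " ", "DOUBLE-SPACE": "  ", "TAB": "\t"}


def write_dict(entries, stream, separator="DOUBLE-SPACE",
               include_numbers=False, include_weights=False):
    """Write word -> {pronunciation: weight} entries, one pronunciation per line."""
    sep = SEPARATORS[separator]
    for word, pronunciations in entries.items():
        for index, (pronunciation, weight) in enumerate(pronunciations.items(), start=1):
            # CMU-style numbering: "word", "word(2)", "word(3)", ...
            label = f"{word}({index})" if include_numbers and index > 1 else word
            fields = [label]
            if include_weights:
                fields.append(str(weight))
            # Phonemes inside a pronunciation are always joined by a single space.
            fields.append(" ".join(pronunciation))
            stream.write(sep.join(fields) + "\n")


entries = {
    "'bout": {("B", "AW1", "T"): 1.0},
    "dismantle": {
        ("D", "IH0", "S", "M", "AE1", "N", "T", "AH0", "L"): 1.0,
        ("D", "IH0", "S", "M", "AE1", "N", "AH0", "L"): 1.0,
    },
}

buffer = io.StringIO()
write_dict(entries, buffer, include_numbers=True)
print(buffer.getvalue(), end="")
```

Writing to a stream rather than a path keeps the sketch easy to test; a real file can be passed in with `open(path, "w", encoding="UTF-8")`.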

Roadmap

- replace SerializationOptions, DeserializationOptions and MultiprocessingOptions with parameters
- add default parameter values
- add more tests

Development setup

```sh
# update
sudo apt update
# install Python 3.8-3.12 for ensuring that tests can be run
sudo apt install python3-pip \
  python3.8 python3.8-dev python3.8-distutils python3.8-venv \
  python3.9 python3.9-dev python3.9-distutils python3.9-venv \
  python3.10 python3.10-dev python3.10-distutils python3.10-venv \
  python3.11 python3.11-dev python3.11-distutils python3.11-venv \
  python3.12 python3.12-dev python3.12-distutils python3.12-venv
# install pipenv for creation of virtual environments
python3.8 -m pip install pipenv --user

# check out repo
git clone https://github.com/stefantaubert/pronunciation-dictionary.git
cd pronunciation-dictionary
# create virtual environment
python3.8 -m pipenv install --dev
```

Running the tests

```sh
# first install the tool like in "Development setup"
# then, navigate into the directory of the repo (if not already done)
cd pronunciation-dictionary
# activate environment
python3.8 -m pipenv shell
# run tests
tox
```

Final lines of test result output:

py38: commands succeeded
py39: commands succeeded
py310: commands succeeded
py311: commands succeeded
py312: commands succeeded
congratulations :)

License

MIT License

Acknowledgments

Funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) – Project-ID 416228727 – CRC 1410

Citation

If you want to cite this repo, you can use this BibTeX-entry generated by GitHub (see About => Cite this repository).

Taubert, S. (2024). pronunciation-dictionary (Version 0.0.6) [Computer software]. https://doi.org/10.5281/zenodo.10552058

Owner

Name: Stefan Taubert
Login: stefantaubert
Kind: user
Location: Chemnitz, Germany
Company: Chemnitz University of Technology

Website: https://stefantaubert.com
Twitter: Stefan_Taubert
Repositories: 75
Profile: https://github.com/stefantaubert

Currently I am working on my PhD about the topic of speech synthesis at Chemnitz University of Technology.

Citation (CITATION.cff)

cff-version: 1.2.0
title: pronunciation-dictionary
abstract: Library to save and load pronunciation dictionaries (language-independent).
message: >-
  If you use this software, please cite it using the
  metadata from this file.
type: software
authors:
  - email: github@stefantaubert.com
    given-names: Stefan
    family-names: Taubert
    affiliation: Chemnitz University of Technology
    orcid: 'https://orcid.org/0000-0002-4932-2874'
    website: 'https://stefantaubert.com/'
version: 0.0.6
date-released: 2024-01-22
license: MIT
url: https://github.com/stefantaubert/pronunciation-dictionary
doi: 10.5281/zenodo.10552058

GitHub Events

Total

Watch event: 1

Last Year

Watch event: 1

Issues and Pull Requests

Last synced: 11 months ago

All Time

Total issues: 0
Total pull requests: 0
Average time to close issues: N/A
Average time to close pull requests: N/A
Total issue authors: 0
Total pull request authors: 0
Average comments per issue: 0
Average comments per pull request: 0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0

Past Year

Issues: 0
Pull requests: 0
Average time to close issues: N/A
Average time to close pull requests: N/A
Issue authors: 0
Pull request authors: 0
Average comments per issue: 0
Average comments per pull request: 0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0

Packages

Total packages: 1
Total downloads:
- pypi 135 last-month

Total dependent packages: 8
Total dependent repositories: 6
Total versions: 5
Total maintainers: 1

pypi.org: pronunciation-dictionary

Library to save and load pronunciation dictionaries (language-independent).

Homepage: https://github.com/stefantaubert/pronunciation-dictionary
Documentation: https://pronunciation-dictionary.readthedocs.io/
License: MIT
Latest release: 0.0.6
published over 2 years ago

Versions: 5
Dependent Packages: 8
Dependent Repositories: 6
Downloads: 135 Last month

Rankings

Dependent packages count: 1.6%

Dependent repos count: 6.1%

Average: 20.8%

Forks count: 29.9%

Stargazers count: 32.0%

Downloads: 34.7%

Maintainers (1)

stefantaubert

Last synced: 11 months ago

Dependencies

Pipfile pypi

autoflake * develop
autopep8 * develop
cx-freeze * develop
isort * develop
pronunciation-dictionary * develop
pycodestyle * develop
pylint * develop
pytest * develop
rope * develop
twine * develop

Pipfile.lock pypi

astroid ==2.11.2 develop
attrs ==21.4.0 develop
autoflake ==1.4 develop
autopep8 ==1.6.0 develop
bleach ==5.0.0 develop
certifi ==2021.10.8 develop
cffi ==1.15.0 develop
charset-normalizer ==2.0.12 develop
commonmark ==0.9.1 develop
cryptography ==36.0.2 develop
cx-freeze ==6.10 develop
dill ==0.3.4 develop
docutils ==0.18.1 develop
idna ==3.3 develop
importlib-metadata ==4.11.3 develop
iniconfig ==1.1.1 develop
isort ==5.10.1 develop
jeepney ==0.8.0 develop
keyring ==23.5.0 develop
lazy-object-proxy ==1.7.1 develop
mccabe ==0.7.0 develop
ordered-set ==4.1.0 develop
packaging ==21.3 develop
patchelf ==0.14.5.0 develop
pkginfo ==1.8.2 develop
platformdirs ==2.5.1 develop
pluggy ==1.0.0 develop
pronunciation-dictionary * develop
py ==1.11.0 develop
pycodestyle ==2.8.0 develop
pycparser ==2.21 develop
pyflakes ==2.4.0 develop
pygments ==2.11.2 develop
pylint ==2.13.5 develop
pyparsing ==3.0.8 develop
pytest ==7.1.1 develop
readme-renderer ==34.0 develop
requests ==2.27.1 develop
requests-toolbelt ==0.9.1 develop
rfc3986 ==2.0.0 develop
rich ==12.2.0 develop
rope ==1.0.0 develop
secretstorage ==3.3.1 develop
setuptools ==62.1.0 develop
six ==1.16.0 develop
toml ==0.10.2 develop
tomli ==2.0.1 develop
tqdm ==4.64.0 develop
twine ==4.0.0 develop
typing-extensions ==4.1.1 develop
urllib3 ==1.26.9 develop
webencodings ==0.5.1 develop
wheel ==0.37.1 develop
wrapt ==1.14.0 develop
zipp ==3.8.0 develop
ordered-set ==4.1.0
tqdm ==4.64.0

pyproject.toml pypi
