pronunciation-dictionary

Library and CLI to load/save/modify pronunciation dictionaries.

https://github.com/stefantaubert/pronunciation-dictionary

Science Score: 67.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 5 DOI reference(s) in README
  • Academic publication links
    Links to: zenodo.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (9.0%) to scientific vocabulary

Keywords

linguistics pronunciation-dictionary python speech-synthesis speech-to-text
Last synced: 6 months ago

Repository

Library and CLI to load/save/modify pronunciation dictionaries.

Basic Info
  • Host: GitHub
  • Owner: stefantaubert
  • License: mit
  • Language: Python
  • Default Branch: master
  • Homepage:
  • Size: 166 KB
Statistics
  • Stars: 3
  • Watchers: 2
  • Forks: 0
  • Open Issues: 0
  • Releases: 5
Topics
linguistics pronunciation-dictionary python speech-synthesis speech-to-text
Created almost 4 years ago · Last pushed about 2 years ago
Metadata Files
  • Readme
  • Changelog
  • Contributing
  • License
  • Code of conduct
  • Citation

README.md

pronunciation-dictionary


Library to save and load pronunciation dictionaries (language-independent).

Features

  • Load dictionary from file or URL
    • Parsing of
      • line comments
      • pronunciation comments
      • numbers indicating alternative pronunciations for words
      • weights
    • Multiprocessing for faster deserialization
  • Save dictionary to file
    • including numbers for alternative pronunciations
    • including weights
    • configurable word/weight/pronunciation separator
  • Select pronunciation via
    • first/last
    • longest/shortest
    • highest/lowest weight
    • random
    • weight
  • Get phoneme set
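The selection strategies listed under "Select pronunciation via" can be sketched in plain Python. This is not the library's implementation. `select_pronunciation` and its mode strings are hypothetical. The data shape, where each word maps to an ordered mapping from a phoneme tuple to a weight, is taken from the `dismantle` output in the example. Reading the last entry, "weight", as weighted random sampling is also an assumption.

```python
import random
from typing import Dict, Optional, Tuple

# Assumed data shape: a pronunciation is a tuple of phonemes mapped to its weight,
# as in the `dictionary.get("dismantle")` output of the example.
Pronunciation = Tuple[str, ...]
Pronunciations = Dict[Pronunciation, float]


def select_pronunciation(prons: Pronunciations, mode: str,
                         rng: Optional[random.Random] = None) -> Pronunciation:
    """Pick one pronunciation of a word (hypothetical helper, not the library API).

    Ties are resolved in favour of the pronunciation that comes first.
    """
    if not prons:
        raise ValueError("word has no pronunciations")
    items = list(prons.items())  # dicts keep insertion order, i.e. file order
    if mode == "first":
        return items[0][0]
    if mode == "last":
        return items[-1][0]
    if mode == "longest":  # most phonemes
        return max(items, key=lambda item: len(item[0]))[0]
    if mode == "shortest":  # fewest phonemes
        return min(items, key=lambda item: len(item[0]))[0]
    if mode == "highest-weight":
        return max(items, key=lambda item: item[1])[0]
    if mode == "lowest-weight":
        return min(items, key=lambda item: item[1])[0]
    rng = rng or random.Random()
    if mode == "random":  # uniform choice
        return rng.choice(items)[0]
    if mode == "weighted":  # assumption: sample proportionally to the weights
        candidates = [pron for pron, _ in items]
        weights = [weight for _, weight in items]
        return rng.choices(candidates, weights=weights, k=1)[0]
    raise ValueError(f"unknown mode: {mode!r}")


if __name__ == "__main__":
    dismantle = {
        ("D", "IH0", "S", "M", "AE1", "N", "T", "AH0", "L"): 1.0,
        ("D", "IH0", "S", "M", "AE1", "N", "AH0", "L"): 1.0,
    }
    print(select_pronunciation(dismantle, "shortest"))
    print(select_pronunciation(dismantle, "random", random.Random(0)))
```

Ties go to the earlier pronunciation because `max` and `min` return the first extreme element they encounter. Passing a seeded `random.Random` makes the random modes reproducible.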

Example dictionaries and deserialization arguments

Excerpt from CMU (as an example)

```dict
a.d. EY2 D IY1
a.m. EY2 EH1 M
a.s EY1 Z
aaa T R IH2 P AH0 L EY1
aaberg AA1 B ER0 G
aachen AA1 K AH0 N
aachener AA1 K AH0 N ER0
aaker AA1 K ER0
aalborg AO1 L B AO0 R G # place, danish
aalborg(2) AA1 L B AO0 R G
```
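To make the listed parsing features concrete, here is a minimal sketch of how one line of the excerpt could be split into its parts. `parse_line`, `build_dict` and `Entry` are hypothetical names and are not part of the library's API. The sketch makes three assumptions: `#` starts a pronunciation comment, a trailing `(n)` on the word marks an alternative pronunciation, and every pronunciation gets the weight 1.0 when the file contains no weights.

```python
import re
from typing import Dict, Iterable, NamedTuple, Optional, Tuple

# Assumption: a trailing "(n)" on the word marks its n-th alternative pronunciation.
ALTERNATIVE = re.compile(r"^(?P<word>.+?)\((?P<number>\d+)\)$")


class Entry(NamedTuple):
    word: str
    number: int  # 1 for "word", n for "word(n)"
    phonemes: Tuple[str, ...]
    comment: Optional[str]  # text after "#", if any


def parse_line(line: str) -> Optional[Entry]:
    """Split one CMU-style line (hypothetical helper); None for blank/comment-only lines."""
    body, hash_sign, comment = line.partition("#")  # assumption: "#" starts a comment
    parts = body.split()
    if not parts:
        return None
    match = ALTERNATIVE.match(parts[0])
    if match:
        word, number = match.group("word"), int(match.group("number"))
    else:
        word, number = parts[0], 1
    return Entry(word, number, tuple(parts[1:]), comment.strip() if hash_sign else None)


def build_dict(lines: Iterable[str]) -> Dict[str, Dict[Tuple[str, ...], float]]:
    """Group pronunciations by word; every pronunciation gets the assumed weight 1.0."""
    dictionary: Dict[str, Dict[Tuple[str, ...], float]] = {}
    for line in lines:
        entry = parse_line(line)
        if entry is not None:
            dictionary.setdefault(entry.word, {})[entry.phonemes] = 1.0
    return dictionary


if __name__ == "__main__":
    excerpt = [
        "aalborg AO1 L B AO0 R G # place, danish",
        "aalborg(2) AA1 L B AO0 R G",
    ]
    print(build_dict(excerpt))
```

In this sketch, the two `aalborg` lines end up as one `aalborg` entry with two pronunciations. The `(2)` marker is only used while parsing.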

Installation

```sh
pip install pronunciation-dictionary --user
```

Usage

```py
from pronunciation_dictionary import load_dict, save_dict, MultiprocessingOptions, DeserializationOptions, SerializationOptions
```

Example

```py
from pathlib import Path

from pronunciation_dictionary import (DeserializationOptions, MultiprocessingOptions,
                                      SerializationOptions, get_phoneme_set,
                                      load_dict_from_url, save_dict)

# load the CMU dictionary from GitHub using ISO-8859-1 encoding
dictionary = load_dict_from_url(
    "https://raw.githubusercontent.com/cmusphinx/cmudict/master/cmudict.dict",
    "ISO-8859-1",
    DeserializationOptions(False, True, True, False),
    MultiprocessingOptions(4, None, 10000)
)

phoneme_set = get_phoneme_set(dictionary)

print(phoneme_set)
# {'Z', 'EY1', 'AH0', 'F', 'AE0', 'UW0', 'CH', 'G', 'V', 'AY1', 'AO2', 'ZH', 'AA1', 'IY1', 'AW0', 'T', 'TH', 'AY2', 'DH', 'S', 'W', 'ER1', 'AA2', 'AE2', 'AE1', 'AW1', 'UW1', 'AH1', 'Y', 'EY2', 'AO0', 'OW2', 'OY2', 'IY2', 'JH', 'N', 'NG', 'P', 'IH2', 'M', 'OW0', 'L', 'UH1', 'IY0', 'EY0', 'HH', 'IH0', 'SH', 'AH2', 'AW2', 'EH2', 'OW1', 'D', 'R', 'IH1', 'AO1', 'B', 'UH2', 'UH0', 'ER0', 'UW2', 'ER2', 'EH0', 'AY0', 'AA0', 'EH1', 'OY1', 'OY0', 'K'}

pronunciations_dismantle = dictionary.get("dismantle")

for pronunciation, weight in pronunciations_dismantle.items():
    print(pronunciation, weight)
# ('D', 'IH0', 'S', 'M', 'AE1', 'N', 'T', 'AH0', 'L') 1.0
# ('D', 'IH0', 'S', 'M', 'AE1', 'N', 'AH0', 'L') 1.0

save_dict(dictionary, Path("/tmp/cmu.dict"), "UTF-8", SerializationOptions("DOUBLE-SPACE", False, False))
```

```sh
head /tmp/cmu.dict
# 'bout B AW1 T
# 'cause K AH0 Z
# 'course K AO1 R S
# 'cuse K Y UW1 Z
# 'em AH0 M
# 'frisco F R IH1 S K OW0
# 'gain G EH1 N
# 'kay K EY1
# 'm AH0 M
# 'n AH0 N
```
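The two library calls in the example, `get_phoneme_set` and `save_dict`, can be approximated in plain Python to show roughly what they compute. `phoneme_set` and `serialize` below are hypothetical stand-ins and are not the library's actual implementations. The following details are all assumptions:

- the two-space default separator is meant to correspond to `"DOUBLE-SPACE"`;
- alternatives are numbered CMU-style as `word(2)`;
- the weight column is placed before the phonemes.

```python
from typing import Dict, List, Set, Tuple

# Assumed data shape, as in the example: word -> {pronunciation tuple: weight}.
Dictionary = Dict[str, Dict[Tuple[str, ...], float]]


def phoneme_set(dictionary: Dictionary) -> Set[str]:
    """Collect every phoneme that occurs in any pronunciation (hypothetical stand-in)."""
    return {phoneme
            for pronunciations in dictionary.values()
            for pronunciation in pronunciations
            for phoneme in pronunciation}


def serialize(dictionary: Dictionary, separator: str = "  ",
              include_numbers: bool = False, include_weights: bool = False) -> List[str]:
    """Render one line per pronunciation (hypothetical stand-in for save_dict's output)."""
    lines = []
    for word, pronunciations in dictionary.items():
        for number, (pronunciation, weight) in enumerate(pronunciations.items(), start=1):
            # assumption: alternatives are written CMU-style as "word(2)", "word(3)", ...
            head = f"{word}({number})" if include_numbers and number > 1 else word
            parts = [head]
            if include_weights:
                parts.append(str(weight))  # assumption: weight column precedes the phonemes
            parts.append(" ".join(pronunciation))  # phonemes separated by single spaces
            lines.append(separator.join(parts))
    return lines


if __name__ == "__main__":
    dictionary = {
        "dismantle": {
            ("D", "IH0", "S", "M", "AE1", "N", "T", "AH0", "L"): 1.0,
            ("D", "IH0", "S", "M", "AE1", "N", "AH0", "L"): 1.0,
        }
    }
    print(sorted(phoneme_set(dictionary)))
    print("\n".join(serialize(dictionary, include_numbers=True)))
```

In this sketch, `include_numbers=False` writes the two `dismantle` pronunciations as two lines that start with the same word.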

Roadmap

  • replace SerializationOptions, DeserializationOptions and MultiprocessingOptions with parameters
  • add default parameter values
  • add more tests

Development setup

```sh
# update the package index
sudo apt update
# install Python 3.8-3.12 to ensure that the tests can be run
sudo apt install python3-pip \
  python3.8 python3.8-dev python3.8-distutils python3.8-venv \
  python3.9 python3.9-dev python3.9-distutils python3.9-venv \
  python3.10 python3.10-dev python3.10-distutils python3.10-venv \
  python3.11 python3.11-dev python3.11-distutils python3.11-venv \
  python3.12 python3.12-dev python3.12-distutils python3.12-venv
# install pipenv for creating virtual environments
python3.8 -m pip install pipenv --user

# check out the repo
git clone https://github.com/stefantaubert/pronunciation-dictionary.git
cd pronunciation-dictionary
# create the virtual environment
python3.8 -m pipenv install --dev
```

Running the tests

```sh
# first, install the tool as described in "Development setup"
# then, navigate into the directory of the repo (if not already done)
cd pronunciation-dictionary
# activate the environment
python3.8 -m pipenv shell
# run the tests
tox
```

Final lines of test result output:

```log
py38: commands succeeded
py39: commands succeeded
py310: commands succeeded
py311: commands succeeded
py312: commands succeeded
congratulations :)
```

License

MIT License

Acknowledgments

Funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) – Project-ID 416228727 – CRC 1410

Citation

If you want to cite this repo, you can use the BibTeX entry generated by GitHub (see About => Cite this repository) or the following reference:

```txt
Taubert, S. (2024). pronunciation-dictionary (Version 0.0.6) [Computer software]. https://doi.org/10.5281/zenodo.10552058
```

Owner

  • Name: Stefan Taubert
  • Login: stefantaubert
  • Kind: user
  • Location: Chemnitz, Germany
  • Company: Chemnitz University of Technology

I am currently working on my PhD on speech synthesis at Chemnitz University of Technology.

Citation (CITATION.cff)

cff-version: 1.2.0
title: pronunciation-dictionary
abstract: Library to save and load pronunciation dictionaries (language-independent).
message: >-
  If you use this software, please cite it using the
  metadata from this file.
type: software
authors:
  - email: github@stefantaubert.com
    given-names: Stefan
    family-names: Taubert
    affiliation: Chemnitz University of Technology
    orcid: 'https://orcid.org/0000-0002-4932-2874'
    website: 'https://stefantaubert.com/'
version: 0.0.6
date-released: 2024-01-22
license: MIT
url: https://github.com/stefantaubert/pronunciation-dictionary
doi: 10.5281/zenodo.10552058

GitHub Events

Total
  • Watch event: 1
Last Year
  • Watch event: 1

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 0
  • Total pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Total issue authors: 0
  • Total pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0

Packages

  • Total packages: 1
  • Total downloads:
    • pypi 135 last-month
  • Total dependent packages: 8
  • Total dependent repositories: 6
  • Total versions: 5
  • Total maintainers: 1
pypi.org: pronunciation-dictionary

Library to save and load pronunciation dictionaries (language-independent).

  • Versions: 5
  • Dependent Packages: 8
  • Dependent Repositories: 6
  • Downloads: 135 Last month
Rankings
  • Dependent packages count: 1.6%
  • Dependent repos count: 6.1%
  • Average: 20.8%
  • Forks count: 29.9%
  • Stargazers count: 32.0%
  • Downloads: 34.7%
Last synced: 6 months ago

Dependencies

Pipfile pypi
  • autoflake * develop
  • autopep8 * develop
  • cx-freeze * develop
  • isort * develop
  • pronunciation-dictionary * develop
  • pycodestyle * develop
  • pylint * develop
  • pytest * develop
  • rope * develop
  • twine * develop
Pipfile.lock pypi
  • astroid ==2.11.2 develop
  • attrs ==21.4.0 develop
  • autoflake ==1.4 develop
  • autopep8 ==1.6.0 develop
  • bleach ==5.0.0 develop
  • certifi ==2021.10.8 develop
  • cffi ==1.15.0 develop
  • charset-normalizer ==2.0.12 develop
  • commonmark ==0.9.1 develop
  • cryptography ==36.0.2 develop
  • cx-freeze ==6.10 develop
  • dill ==0.3.4 develop
  • docutils ==0.18.1 develop
  • idna ==3.3 develop
  • importlib-metadata ==4.11.3 develop
  • iniconfig ==1.1.1 develop
  • isort ==5.10.1 develop
  • jeepney ==0.8.0 develop
  • keyring ==23.5.0 develop
  • lazy-object-proxy ==1.7.1 develop
  • mccabe ==0.7.0 develop
  • ordered-set ==4.1.0 develop
  • packaging ==21.3 develop
  • patchelf ==0.14.5.0 develop
  • pkginfo ==1.8.2 develop
  • platformdirs ==2.5.1 develop
  • pluggy ==1.0.0 develop
  • pronunciation-dictionary * develop
  • py ==1.11.0 develop
  • pycodestyle ==2.8.0 develop
  • pycparser ==2.21 develop
  • pyflakes ==2.4.0 develop
  • pygments ==2.11.2 develop
  • pylint ==2.13.5 develop
  • pyparsing ==3.0.8 develop
  • pytest ==7.1.1 develop
  • readme-renderer ==34.0 develop
  • requests ==2.27.1 develop
  • requests-toolbelt ==0.9.1 develop
  • rfc3986 ==2.0.0 develop
  • rich ==12.2.0 develop
  • rope ==1.0.0 develop
  • secretstorage ==3.3.1 develop
  • setuptools ==62.1.0 develop
  • six ==1.16.0 develop
  • toml ==0.10.2 develop
  • tomli ==2.0.1 develop
  • tqdm ==4.64.0 develop
  • twine ==4.0.0 develop
  • typing-extensions ==4.1.1 develop
  • urllib3 ==1.26.9 develop
  • webencodings ==0.5.1 develop
  • wheel ==0.37.1 develop
  • wrapt ==1.14.0 develop
  • zipp ==3.8.0 develop
  • ordered-set ==4.1.0
  • tqdm ==4.64.0
pyproject.toml pypi