multi_seq_align_nl

This repository contains scripts to perform phoneme-grapheme, grapheme-grapheme, phoneme-phoneme and multiple sequence alignment for the Dutch language.

https://github.com/wiekeharmsen/multi_seq_align_nl

Repository

Basic Info
  • Host: GitHub
  • Owner: WiekeHarmsen
  • License: gpl-3.0
  • Language: Python
  • Default Branch: main
  • Size: 288 KB
Statistics
  • Stars: 3
  • Watchers: 1
  • Forks: 1
  • Open Issues: 0
  • Releases: 0
Created over 2 years ago · Last pushed over 1 year ago

README.md

Multiple sequence alignment on Dutch phoneme and grapheme strings

This repository contains scripts to perform phoneme-grapheme, grapheme-grapheme, phoneme-phoneme and multiple sequence alignment for the Dutch language.

ADAPT: Algorithm for Dynamic Alignment of Phoneme Transcriptions

ADAPT aligns two phoneme strings written in the ADAPT Computer Phonetic Alphabet (CPA). In the example below, the phonemes within each string are separated by spaces. For a description of how this CPA maps to IPA, see the attached pdf. For a description of the implementation of the algorithm, see this paper by Elffers et al. (2013). The phonetic feature definitions are taken from Cucchiarini (1993) and Cucchiarini (1996).

python3 run.py --type 'adapt' --target_phonemes 'k EI k @' --realised_phonemes 'k i s'
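
For readers unfamiliar with dynamic alignment, the idea can be sketched with a generic edit-distance alignment over phoneme tokens. This is a minimal illustration, not the ADAPT implementation: it uses uniform substitution and gap costs instead of the phonetic feature definitions mentioned above, and the names `align_tokens` and `GAP` are hypothetical.

```python
from typing import List, Tuple

GAP = "-"  # hypothetical gap symbol, chosen for this sketch only


def align_tokens(target: List[str], realised: List[str],
                 sub_cost: int = 1, gap_cost: int = 1) -> List[Tuple[str, str]]:
    """Align two token lists with a plain dynamic-programming edit distance.

    Assumption: uniform costs. ADAPT itself is not reproduced here.
    """
    n, m = len(target), len(realised)
    # dist[i][j] = minimal cost of aligning target[:i] with realised[:j]
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dist[i][0] = i * gap_cost
    for j in range(1, m + 1):
        dist[0][j] = j * gap_cost
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = 0 if target[i - 1] == realised[j - 1] else sub_cost
            dist[i][j] = min(dist[i - 1][j - 1] + step,   # match or substitution
                             dist[i - 1][j] + gap_cost,   # deletion
                             dist[i][j - 1] + gap_cost)   # insertion

    # Trace back from the bottom-right cell to recover one optimal alignment.
    pairs: List[Tuple[str, str]] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            step = 0 if target[i - 1] == realised[j - 1] else sub_cost
            if dist[i][j] == dist[i - 1][j - 1] + step:
                pairs.append((target[i - 1], realised[j - 1]))
                i, j = i - 1, j - 1
                continue
        if i > 0 and dist[i][j] == dist[i - 1][j] + gap_cost:
            pairs.append((target[i - 1], GAP))
            i -= 1
        else:
            pairs.append((GAP, realised[j - 1]))
            j -= 1
    pairs.reverse()
    return pairs


if __name__ == "__main__":
    # Phoneme strings from the command above, split on spaces.
    for column in align_tokens("k EI k @".split(), "k i s".split()):
        print(column)
```

With uniform costs, several alignments can be equally cheap, so the sketch returns just one of them. A feature-based cost function is what would let an aligner prefer phonetically plausible pairings.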

ADAGT: Algorithm for Dynamic Alignment of Grapheme Transcriptions

ADAGT is an adaptation of the ADAPT algorithm that performs grapheme-grapheme alignment. The adaptation was made by Wieke Harmsen; see this paper by Harmsen et al. (2021), section 2.2.1 'Clean and align original and target texts' (p. 288).

python3 run.py --type 'adagt' --target_graphemes 'kijken' --realised_graphemes 'keiken'

APGA: Algorithm for Phoneme-Grapheme alignment

This algorithm was developed by Wieke Harmsen; see this paper by Harmsen et al. (2021), section 2.3.2 'Phoneme-grapheme alignment' (p. 289).

python3 run.py --type 'gpa' --target_graphemes 'kijken' --target_phonemes 'k EI k @'

Multiple sequence alignment: APGA & ADAGT

This combination is used for spelling error detection; see this paper by Harmsen et al. (2021), section 2.3.3 'Deduce PCU segmentation' (p. 289).

python3 run.py --type 'multi_graph' --target_graphemes 'kijken' --target_phonemes 'k EI k @' --realised_graphemes 'keiken' 
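
One common way to obtain a three-way alignment is to merge two pairwise alignments that share a sequence, here the target graphemes. The sketch below illustrates that idea only; it is not taken from this repository. The function name `merge_on_shared`, the gap symbol, and the hand-written input columns (including the treatment of 'ij' as one unit) are all assumptions made for the example.

```python
from typing import List, Tuple

GAP = "-"  # hypothetical gap symbol for this sketch

Pair = Tuple[str, str]
Triple = Tuple[str, str, str]


def merge_on_shared(ab: List[Pair], ac: List[Pair]) -> List[Triple]:
    """Merge two pairwise alignments whose first element is the shared sequence.

    ab holds (shared, b) columns, ac holds (shared, c) columns.
    A GAP on the shared side marks an insertion in b or in c.
    """
    merged: List[Triple] = []
    i = j = 0
    while i < len(ab) or j < len(ac):
        a1 = ab[i][0] if i < len(ab) else None
        a2 = ac[j][0] if j < len(ac) else None
        if a1 == GAP:
            # Insertion that exists only in b: pad c with a gap.
            merged.append((GAP, ab[i][1], GAP))
            i += 1
        elif a2 == GAP:
            # Insertion that exists only in c: pad b with a gap.
            merged.append((GAP, GAP, ac[j][1]))
            j += 1
        else:
            if a1 != a2:
                raise ValueError("alignments disagree on the shared sequence")
            merged.append((a1, ab[i][1], ac[j][1]))
            i += 1
            j += 1
    return merged


if __name__ == "__main__":
    # Hand-written illustrative columns, NOT output of run.py.
    graph_phon = [("k", "k"), ("ij", "EI"), ("k", "k"), ("e", "@"), ("n", GAP)]
    graph_graph = [("k", "k"), ("ij", "ei"), ("k", "k"), ("e", "e"), ("n", "n")]
    for column in merge_on_shared(graph_phon, graph_graph):
        print(column)
```

In the merged result, each column lines up a target grapheme unit with its target phoneme and the realised grapheme, which is the kind of view that makes a spelling deviation such as 'ij' written as 'ei' easy to spot.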

Multiple sequence alignment: APGA & ADAPT

This combination is used for pronunciation error detection.

python3 run.py --type 'multi_phon' --target_graphemes 'kijken' --target_phonemes 'k EI k @' --realised_phonemes 'k i s'
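
When calling the script from other Python code, it can help to build the command line programmatically. The sketch below only assembles the argument list from the `--type` values and flags shown in this README; the helper names `build_command` and `run_alignment` are hypothetical, and the format of the script's output is not documented here, so it is returned as raw text.

```python
import subprocess
from typing import Dict, List

# Flags used per --type value, copied from the example commands in this README.
REQUIRED_FLAGS: Dict[str, List[str]] = {
    "adapt": ["target_phonemes", "realised_phonemes"],
    "adagt": ["target_graphemes", "realised_graphemes"],
    "gpa": ["target_graphemes", "target_phonemes"],
    "multi_graph": ["target_graphemes", "target_phonemes", "realised_graphemes"],
    "multi_phon": ["target_graphemes", "target_phonemes", "realised_phonemes"],
}


def build_command(align_type: str, **inputs: str) -> List[str]:
    """Return the argv list for one run.py call (hypothetical helper)."""
    if align_type not in REQUIRED_FLAGS:
        raise ValueError(f"unknown alignment type: {align_type}")
    missing = [name for name in REQUIRED_FLAGS[align_type] if name not in inputs]
    if missing:
        raise ValueError(f"missing inputs for {align_type}: {missing}")
    cmd = ["python3", "run.py", "--type", align_type]
    for name in REQUIRED_FLAGS[align_type]:
        cmd += [f"--{name}", inputs[name]]
    return cmd


def run_alignment(align_type: str, **inputs: str) -> str:
    """Run the script from the repository root and return its stdout as text."""
    result = subprocess.run(build_command(align_type, **inputs),
                            capture_output=True, text=True, check=True)
    return result.stdout


if __name__ == "__main__":
    # Prints the argument list only; nothing is executed here.
    print(build_command("multi_phon",
                        target_graphemes="kijken",
                        target_phonemes="k EI k @",
                        realised_phonemes="k i s"))
```

Passing the arguments as a list avoids shell quoting issues with the space-separated phoneme strings.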

@LanguageResource{ADAPT,
  author = "{Elffers et al.}",
  title = {{ADAPT: Algorithm for Dynamic Alignment of Phonetic Transcriptions}},
  publisher = {Radboud University, \url{https://lands.let.ru.nl/literature/elffers.2005.1.pdf}},
  year = {2013},
}

@book{Cucchiarini1993,
  author = {Cucchiarini, C.},
  title = {{Phonetic transcription: a methodological and empirical study}},
  institution = {Radboud University},
  year = 1993,
  address = {Nijmegen, The Netherlands},
  isbn = {9090066993},
  url = {https://lib.ugent.be/catalog/rug01:000310899}
}

@article{Cucchiarini1996,
  title = {{Assessing Transcription Agreement: {M}ethodological Aspects}},
  year = {1996},
  journal = {Clinical Linguistics and Phonetics},
  author = {Cucchiarini, C.},
  pages = {131-155},
  volume = {10},
  doi = {https://doi.org/10.3109/02699209608985167},
}

@article{Harmsen2021,
  title = {Automatic Detection and Annotation of Spelling Errors and Orthographic Properties in the Dutch BasiScript Corpus},
  volume = {11},
  url = {https://www.clinjournal.org/clinj/article/view/140},
  journal = {Computational Linguistics in the Netherlands Journal},
  author = {Harmsen, Wieke Noa and Cucchiarini, Catia and Strik, Helmer},
  year = {2021},
  month = {Dec.},
  pages = {281–306}
}

Owner

  • Login: WiekeHarmsen
  • Kind: user

Citation (citation.cff)

cff-version: 1.0.0
message: If you use this software, please cite it using these metadata.
title: Algorithm for Dutch Multiple Sequence Alignment of Phoneme and Grapheme Strings
authors:
  - family-names: Harmsen
    given-names: Wieke
    orcid: "https://orcid.org/0000-0002-3329-2201"
  - family-names: Cucchiarini
    given-names: Catia
  - family-names: Strik
    given-names: Helmer
version: 1.0
date-released: "2023-01-22"
repository-code: "https://github.com/WiekeHarmsen/multi_seq_align_nl"

GitHub Events

Total
  • Push event: 1
  • Fork event: 1
Last Year
  • Push event: 1
  • Fork event: 1