multi_seq_align_nl

This repository contains scripts to perform phoneme-grapheme, grapheme-grapheme, phoneme-phoneme and multiple sequence alignment for the Dutch language.

https://github.com/wiekeharmsen/multi_seq_align_nl

Repository

Basic Info

Host: GitHub
Owner: WiekeHarmsen
License: gpl-3.0
Language: Python
Default Branch: main
Size: 288 KB

Statistics

Stars: 3
Watchers: 1
Forks: 1
Open Issues: 0
Releases: 0

Created over 2 years ago · Last pushed over 1 year ago

Multiple sequence alignment on Dutch phoneme and grapheme strings

ADAPT: Algorithm for Dynamic Alignment of Phoneme Transcriptions

ADAPT aligns two phoneme strings that are written in the ADAPT Computer Phonetic Alphabet (CPA). For the mapping between this CPA and IPA, see the attached PDF. For a description of how the algorithm is implemented, see the paper by Elffers et al. (2013). The phonetic feature definitions are taken from Cucchiarini (1993) and Cucchiarini (1996).

python3 run.py --type 'adapt' --target_phonemes 'k EI k @' --realised_phonemes 'k i s'
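To illustrate the general idea of dynamic alignment, here is a minimal sketch of a global alignment between two phoneme sequences. It is not the repository's implementation: it assumes unit costs for substitutions, insertions and deletions, whereas ADAPT derives its costs from phonetic features. The function name `align` and the gap symbol `-` are hypothetical.

```python
def align(target, realised, gap="-"):
    """Globally align two token lists with unit edit costs.

    Assumption: every substitution, insertion and deletion costs 1.
    ADAPT itself uses feature-based costs, which are not modelled here.
    """
    n, m = len(target), len(realised)
    # cost[i][j] = minimal cost of aligning target[:i] with realised[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            substitute = cost[i - 1][j - 1] + (target[i - 1] != realised[j - 1])
            delete = cost[i - 1][j] + 1
            insert = cost[i][j - 1] + 1
            cost[i][j] = min(substitute, delete, insert)

    # Trace back from the bottom-right cell to recover one optimal alignment.
    pairs = []
    i, j = n, m
    while i > 0 or j > 0:
        if (i > 0 and j > 0
                and cost[i][j] == cost[i - 1][j - 1] + (target[i - 1] != realised[j - 1])):
            pairs.append((target[i - 1], realised[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            pairs.append((target[i - 1], gap))
            i -= 1
        else:
            pairs.append((gap, realised[j - 1]))
            j -= 1
    pairs.reverse()
    return pairs


if __name__ == "__main__":
    for t, r in align("k EI k @".split(), "k i s".split()):
        print(f"{t}\t{r}")
```

With unit costs several alignments can be equally cheap, so the traceback returns just one of them; feature-based costs are one way to make the choice phonetically motivated.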

ADAGT: Algorithm for Dynamic Alignment of Grapheme Transcriptions

ADAGT adapts the ADAPT algorithm so that it can perform grapheme-grapheme alignment. The adaptation was made by Wieke Harmsen; see the paper by Harmsen et al. (2021), section 2.2.1 'Clean and align original and target texts' (p. 288).

python3 run.py --type 'adagt' --target_graphemes 'kijken' --realised_graphemes 'keiken'
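For comparison, a generic character-level diff from Python's standard library can show where a target and a realised spelling differ. This is a swapped-in technique (`difflib.SequenceMatcher`), not ADAGT, and it knows nothing about Dutch graphemes; the helper name `diff_spans` is hypothetical.

```python
from difflib import SequenceMatcher


def diff_spans(target, realised):
    """Return (tag, target_span, realised_span) triples covering both strings.

    The tags are difflib's own: 'equal', 'replace', 'delete' and 'insert'.
    """
    matcher = SequenceMatcher(a=target, b=realised, autojunk=False)
    return [
        (tag, target[i1:i2], realised[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
    ]


if __name__ == "__main__":
    for tag, target_span, realised_span in diff_spans("kijken", "keiken"):
        print(f"{tag:8} {target_span!r:8} {realised_span!r}")
```

Because such a diff works on single characters, it has no notion that a letter pair may form one unit, which is the kind of knowledge a dedicated grapheme aligner can build in.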

APGA: Algorithm for Phoneme-Grapheme alignment

This algorithm aligns a grapheme string with a phoneme string. It was developed by Wieke Harmsen; see the paper by Harmsen et al. (2021), section 2.3.2 'Phoneme-grapheme alignment' (p. 289).

python3 run.py --type 'gpa' --target_graphemes 'kijken' --target_phonemes 'k EI k @'
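One way to picture phoneme-grapheme alignment is as splitting the grapheme string into one chunk per phoneme. The sketch below does this with dynamic programming over a tiny, hypothetical lookup table (`PLAUSIBLE`) that covers only the example word; the table, the cost scheme and the name `segment_graphemes` are assumptions for illustration and do not reflect how the repository's algorithm works.

```python
# Hypothetical toy table: grapheme chunks considered plausible for each phoneme.
# It only covers the example below and is NOT the repository's data.
PLAUSIBLE = {
    "k": {"k", "c"},
    "EI": {"ij", "ei"},
    "@": {"e", "en"},
}


def segment_graphemes(graphemes, phonemes, max_chunk=3):
    """Split `graphemes` into one chunk per phoneme, minimising a toy cost.

    A chunk costs 0 if it is listed in PLAUSIBLE for its phoneme, else 1.
    """
    n, m = len(graphemes), len(phonemes)
    inf = float("inf")
    # best[i][j] = (cost, start of last chunk) for graphemes[:i] vs phonemes[:j]
    best = [[(inf, None)] * (m + 1) for _ in range(n + 1)]
    best[0][0] = (0, None)
    for j in range(1, m + 1):
        for i in range(1, n + 1):
            for k in range(1, min(max_chunk, i) + 1):
                prev_cost = best[i - k][j - 1][0]
                if prev_cost == inf:
                    continue
                chunk = graphemes[i - k:i]
                step = 0 if chunk in PLAUSIBLE.get(phonemes[j - 1], ()) else 1
                if prev_cost + step < best[i][j][0]:
                    best[i][j] = (prev_cost + step, i - k)
    if best[n][m][0] == inf:
        raise ValueError("no segmentation possible with this max_chunk")

    # Walk back through the stored chunk starts to recover the segmentation.
    chunks = []
    i = n
    for j in range(m, 0, -1):
        start = best[i][j][1]
        chunks.append(graphemes[start:i])
        i = start
    chunks.reverse()
    return list(zip(chunks, phonemes))


if __name__ == "__main__":
    print(segment_graphemes("kijken", "k EI k @".split()))
    # [('k', 'k'), ('ij', 'EI'), ('k', 'k'), ('en', '@')]
```

The sketch forces every phoneme to receive at least one grapheme, so it cannot represent silent letters or phonemes without a written counterpart.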

Multiple sequence alignment: APGA & ADAGT

This combination is used for spelling error detection; see the paper by Harmsen et al. (2021), section 2.3.3 'Deduce PCU segmentation' (p. 289).

python3 run.py --type 'multi_graph' --target_graphemes 'kijken' --target_phonemes 'k EI k @' --realised_graphemes 'keiken'

Multiple sequence alignment: APGA & ADAPT

This combination is used for pronunciation error detection.

python3 run.py --type 'multi_phon' --target_graphemes 'kijken' --target_phonemes 'k EI k @' --realised_phonemes 'k i s'
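The five example commands share one pattern: a `--type` value plus the string options that type needs. The sketch below builds the argument list for each type from Python. The option names and type values are copied from the examples; treating them as the complete set of required options is an assumption, and `build_command` and `REQUIRED` are hypothetical names that are not part of the repository.

```python
# Options used by each alignment type, copied from the example commands.
# Assumption: these are exactly the options each type requires.
REQUIRED = {
    "adapt": ("target_phonemes", "realised_phonemes"),
    "adagt": ("target_graphemes", "realised_graphemes"),
    "gpa": ("target_graphemes", "target_phonemes"),
    "multi_graph": ("target_graphemes", "target_phonemes", "realised_graphemes"),
    "multi_phon": ("target_graphemes", "target_phonemes", "realised_phonemes"),
}


def build_command(align_type, **options):
    """Return the argv list for one run.py invocation."""
    if align_type not in REQUIRED:
        raise ValueError(f"unknown alignment type: {align_type!r}")
    missing = [name for name in REQUIRED[align_type] if name not in options]
    if missing:
        raise ValueError(f"missing options for {align_type!r}: {missing}")
    argv = ["python3", "run.py", "--type", align_type]
    for name in REQUIRED[align_type]:
        argv += [f"--{name}", options[name]]
    return argv


if __name__ == "__main__":
    print(build_command(
        "multi_phon",
        target_graphemes="kijken",
        target_phonemes="k EI k @",
        realised_phonemes="k i s",
    ))
```

The resulting list can be passed to `subprocess.run(argv, check=True)` from the repository's root directory; passing a list rather than a shell string avoids quoting problems with the space-separated phoneme strings.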

@LanguageResource{ADAPT,
  author = "{Elffers et al.}",
  title = {{ADAPT: Algorithm for Dynamic Alignment of Phonetic Transcriptions}},
  publisher = {Radboud University, \url{https://lands.let.ru.nl/literature/elffers.2005.1.pdf}},
  year = {2013},
}

@book{Cucchiarini1993,
  author = {Cucchiarini, C.},
  title = {{Phonetic transcription: a methodological and empirical study}},
  institution = {Radboud University},
  year = 1993,
  address = {Nijmegen, The Netherlands},
  isbn = {9090066993},
  url = {https://lib.ugent.be/catalog/rug01:000310899}
}

@article{Cucchiarini1996,
  title = {{Assessing Transcription Agreement: {M}ethodological Aspects}},
  year = {1996},
  journal = {Clinical Linguistics and Phonetics},
  author = {Cucchiarini, C.},
  pages = {131-155},
  volume = {10},
  doi = {https://doi.org/10.3109/02699209608985167},
}

@article{Harmsen2021,
  title = {Automatic Detection and Annotation of Spelling Errors and Orthographic Properties in the Dutch BasiScript Corpus},
  volume = {11},
  url = {https://www.clinjournal.org/clinj/article/view/140},
  journal = {Computational Linguistics in the Netherlands Journal},
  author = {Harmsen, Wieke Noa and Cucchiarini, Catia and Strik, Helmer},
  year = {2021},
  month = {Dec.},
  pages = {281–306}
}

Owner

Login: WiekeHarmsen
Kind: user

Repositories: 1
Profile: https://github.com/WiekeHarmsen

Citation (citation.cff)

cff-version: 1.0.0
message: If you use this software, please cite it using these metadata.
title: Algorithm for Dutch Multiple Sequence Alignment of Phoneme and Grapheme Strings
authors:
  - family-names: Harmsen
    given-names: Wieke
    orcid: "https://orcid.org/0000-0002-3329-2201"
  - family-names: Cucchiarini
    given-names: Catia
  - family-names: Strik
    given-names: Helmer
version: 1.0
date-released: "2023-01-22"
repository-code: "https://github.com/WiekeHarmsen/multi_seq_align_nl"
