multi_seq_align_nl
This repository contains scripts to perform phoneme-grapheme, grapheme-grapheme, phoneme-phoneme and multiple sequence alignment for the Dutch language.
Science Score: 57.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ✓ DOI references: Found 4 DOI reference(s) in README
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (5.8%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: WiekeHarmsen
- License: gpl-3.0
- Language: Python
- Default Branch: main
- Size: 288 KB
Statistics
- Stars: 3
- Watchers: 1
- Forks: 1
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
Multiple sequence alignment on Dutch phoneme and grapheme strings
This repository contains scripts to perform phoneme-grapheme, grapheme-grapheme, phoneme-phoneme and multiple sequence alignment for the Dutch language.
ADAPT: Algorithm for Dynamic Alignment of Phoneme Transcriptions
ADAPT aligns two phoneme strings written in the ADAPT Computer Phonetic Alphabet (CPA). For the mapping between this CPA and IPA, see the attached PDF. For a description of the implementation of the algorithm, see the paper by Elffers et al. (2013). The phonetic feature definitions are taken from Cucchiarini (1993) and Cucchiarini (1996).
python3 run.py --type 'adapt' --target_phonemes 'k EI k @' --realised_phonemes 'k i s'
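To illustrate the general idea behind this kind of alignment, the sketch below aligns two space-separated phoneme strings with a standard dynamic-programming (weighted edit distance) recurrence. It is not the ADAPT implementation: `VOWELS`, `GAP_COST` and `substitution_cost` are hypothetical stand-ins for the feature-based distances that ADAPT derives from Cucchiarini (1993, 1996).

```python
from typing import List, Optional, Tuple

# Hypothetical vowel inventory for the toy cost function below. The real
# ADAPT feature tables are not reproduced here.
VOWELS = {"a", "e", "i", "o", "u", "y", "@", "EI", "A", "E", "I", "O"}

GAP_COST = 1  # assumed cost of an insertion or a deletion

Pair = Tuple[Optional[str], Optional[str]]


def substitution_cost(p: str, q: str) -> int:
    """Toy cost: 0 if identical, 1 within a class, 2 across vowel/consonant."""
    if p == q:
        return 0
    same_class = (p in VOWELS) == (q in VOWELS)
    return 1 if same_class else 2


def align(target: List[str], realised: List[str]) -> List[Pair]:
    """Return one minimal-cost alignment as (target, realised) pairs."""
    n, m = len(target), len(realised)
    # dist[i][j] = minimal cost of aligning target[:i] with realised[:j]
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dist[i][0] = i * GAP_COST
    for j in range(1, m + 1):
        dist[0][j] = j * GAP_COST
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dist[i][j] = min(
                dist[i - 1][j - 1] + substitution_cost(target[i - 1], realised[j - 1]),
                dist[i - 1][j] + GAP_COST,  # target phoneme deleted
                dist[i][j - 1] + GAP_COST,  # realised phoneme inserted
            )

    # Trace back from the bottom-right cell, preferring substitutions.
    pairs: List[Pair] = []
    i, j = n, m
    while i > 0 or j > 0:
        if (i > 0 and j > 0 and dist[i][j]
                == dist[i - 1][j - 1] + substitution_cost(target[i - 1], realised[j - 1])):
            pairs.append((target[i - 1], realised[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and dist[i][j] == dist[i - 1][j] + GAP_COST:
            pairs.append((target[i - 1], None))
            i -= 1
        else:
            pairs.append((None, realised[j - 1]))
            j -= 1
    pairs.reverse()
    return pairs


if __name__ == "__main__":
    for pair in align("k EI k @".split(), "k i s".split()):
        print(pair)
```

With these toy costs, `align("k EI k @".split(), "k i s".split())` returns `[('k', 'k'), ('EI', 'i'), ('k', 's'), ('@', None)]`, where `None` marks a target phoneme with no realised counterpart. The output of `run.py` may differ, because its costs are based on phonetic features rather than on this two-class distinction.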
ADAGT: Algorithm for Dynamic Alignment of Grapheme Transcriptions
An adaptation of the ADAPT algorithm that performs grapheme-grapheme alignment. The adaptation was made by Wieke Harmsen; see the paper by Harmsen et al. (2021), section 2.2.1 'Clean and align original and target texts' (p. 288).
python3 run.py --type 'adagt' --target_graphemes 'kijken' --realised_graphemes 'keiken'
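For comparison, a generic character-level diff can be computed with Python's standard `difflib` module. This is a different technique from ADAGT and is shown only as a baseline; the helper name `grapheme_diff` is hypothetical.

```python
from difflib import SequenceMatcher
from typing import List, Tuple


def grapheme_diff(target: str, realised: str) -> List[Tuple[str, str, str]]:
    """Return (operation, target_span, realised_span) triples from difflib."""
    # autojunk=False disables difflib's popularity heuristic, which is
    # irrelevant for short words.
    matcher = SequenceMatcher(a=target, b=realised, autojunk=False)
    return [
        (tag, target[i1:i2], realised[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
    ]


if __name__ == "__main__":
    for op in grapheme_diff("kijken", "keiken"):
        print(op)
```

For 'kijken' and 'keiken' this baseline reports an inserted 'e' and a deleted 'j' around a matching 'i', rather than treating 'ij' and 'ei' as one substituted unit. A generic diff has no notion of graphemes that span several characters.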
APGA: Algorithm for Phoneme-Grapheme Alignment
This algorithm was developed by Wieke Harmsen; see the paper by Harmsen et al. (2021), section 2.3.2 'Phoneme-grapheme alignment' (p. 289).
python3 run.py --type 'gpa' --target_graphemes 'kijken' --target_phonemes 'k EI k @'
Multiple sequence alignment: APGA & ADAGT
Used for spelling error detection; see the paper by Harmsen et al. (2021), section 2.3.3 'Deduce PCU segmentation' (p. 289).
python3 run.py --type 'multi_graph' --target_graphemes 'kijken' --target_phonemes 'k EI k @' --realised_graphemes 'keiken'
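One common way to obtain a three-way alignment is to merge two pairwise alignments that share a reference sequence, here the target graphemes. The sketch below shows that merging step only. It assumes both pairwise alignments segment the shared sequence identically, and it is not necessarily how the repository combines its alignments. The function name `merge_on_shared` and the example inputs are hypothetical and hand-written, not output of `run.py`.

```python
from typing import List, Optional, Tuple

Pair = Tuple[Optional[str], Optional[str]]
Row = Tuple[Optional[str], Optional[str], Optional[str]]


def merge_on_shared(ab: List[Pair], ac: List[Pair]) -> List[Row]:
    """Merge alignments (shared, b) and (shared, c) into (shared, b, c) rows.

    None marks a gap. A gap in the shared column of one alignment becomes a
    row of its own, with a gap for the other sequence.
    """
    rows: List[Row] = []
    i = j = 0
    while i < len(ab) or j < len(ac):
        if i < len(ab) and ab[i][0] is None:
            # Unit present only in sequence b.
            rows.append((None, ab[i][1], None))
            i += 1
        elif j < len(ac) and ac[j][0] is None:
            # Unit present only in sequence c.
            rows.append((None, None, ac[j][1]))
            j += 1
        else:
            if i >= len(ab) or j >= len(ac) or ab[i][0] != ac[j][0]:
                raise ValueError("alignments do not share the same reference sequence")
            rows.append((ab[i][0], ab[i][1], ac[j][1]))
            i += 1
            j += 1
    return rows


if __name__ == "__main__":
    # Hand-written illustrative alignments for the target word 'kijken'.
    graph_phon = [("k", "k"), ("ij", "EI"), ("k", "k"), ("en", "@")]
    graph_graph = [("k", "k"), ("ij", "ei"), ("k", "k"), ("en", "en")]
    for row in merge_on_shared(graph_phon, graph_graph):
        print(row)
```

Each output row holds one target grapheme unit, its target phoneme and its realised grapheme, so a mismatch between the first and third columns (such as 'ij' versus 'ei') can be read off per unit.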
Multiple sequence alignment: APGA & ADAPT
Used for pronunciation error detection.
python3 run.py --type 'multi_phon' --target_graphemes 'kijken' --target_phonemes 'k EI k @' --realised_phonemes 'k i s'
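The five example invocations in this README differ only in `--type` and in the transcription flags that accompany it. A small helper such as the following can assemble the argument list from Python. The helper `build_command` and the table `FLAGS_BY_TYPE` are hypothetical and not part of the repository. The flag sets are copied from the example commands, and whether `run.py` accepts other combinations is an open assumption.

```python
import shlex
from typing import Dict, List, Tuple

# Flags used with each --type value in the README's example commands.
FLAGS_BY_TYPE: Dict[str, Tuple[str, ...]] = {
    "adapt": ("target_phonemes", "realised_phonemes"),
    "adagt": ("target_graphemes", "realised_graphemes"),
    "gpa": ("target_graphemes", "target_phonemes"),
    "multi_graph": ("target_graphemes", "target_phonemes", "realised_graphemes"),
    "multi_phon": ("target_graphemes", "target_phonemes", "realised_phonemes"),
}


def build_command(align_type: str, **values: str) -> List[str]:
    """Build the argv list for run.py, checking that the expected flags are given."""
    if align_type not in FLAGS_BY_TYPE:
        raise ValueError(f"unknown alignment type: {align_type!r}")
    expected = FLAGS_BY_TYPE[align_type]
    missing = [name for name in expected if name not in values]
    if missing:
        raise ValueError(f"missing values for: {', '.join(missing)}")
    command = ["python3", "run.py", "--type", align_type]
    for name in expected:
        command += [f"--{name}", values[name]]
    return command


if __name__ == "__main__":
    argv = build_command(
        "multi_phon",
        target_graphemes="kijken",
        target_phonemes="k EI k @",
        realised_phonemes="k i s",
    )
    # shlex.join quotes arguments that contain spaces for display in a shell.
    print(shlex.join(argv))
```

The returned list can be passed to `subprocess.run` from the repository root. Passing a list rather than a single string avoids shell quoting issues with the space-separated phoneme strings.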
@LanguageResource{ADAPT,
  author = "{Elffers et al.}",
  title = {{ADAPT: Algorithm for Dynamic Alignment of Phonetic Transcriptions}},
  publisher = {Radboud University, \url{https://lands.let.ru.nl/literature/elffers.2005.1.pdf}},
  year = {2013},
}
@book{Cucchiarini1993,
  author = {Cucchiarini, C.},
  title = {{Phonetic transcription: a methodological and empirical study}},
  institution = {Radboud University},
  year = 1993,
  address = {Nijmegen, The Netherlands},
  isbn = {9090066993},
  url = {https://lib.ugent.be/catalog/rug01:000310899}
}
@article{Cucchiarini1996,
  title = {{Assessing Transcription Agreement: {M}ethodological Aspects}},
  year = {1996},
  journal = {Clinical Linguistics and Phonetics},
  author = {Cucchiarini, C.},
  pages = {131-155},
  volume = {10},
  doi = {https://doi.org/10.3109/02699209608985167},
}
@article{Harmsen2021,
  title={Automatic Detection and Annotation of Spelling Errors and Orthographic Properties in the Dutch BasiScript Corpus},
  volume={11},
  url={https://www.clinjournal.org/clinj/article/view/140},
  journal={Computational Linguistics in the Netherlands Journal},
  author={Harmsen, Wieke Noa and Cucchiarini, Catia and Strik, Helmer},
  year={2021},
  month={Dec.},
  pages={281–306}
}
Owner
- Login: WiekeHarmsen
- Kind: user
- Repositories: 1
- Profile: https://github.com/WiekeHarmsen
Citation (citation.cff)
cff-version: 1.0.0
message: If you use this software, please cite it using these metadata.
title: Algorithm for Dutch Multiple Sequence Alignment of Phoneme and Grapheme Strings
authors:
  - family-names: Harmsen
    given-names: Wieke
    orcid: "https://orcid.org/0000-0002-3329-2201"
  - family-names: Cucchiarini
    given-names: Catia
  - family-names: Strik
    given-names: Helmer
version: 1.0
date-released: "2023-01-22"
repository-code: "https://github.com/WiekeHarmsen/multi_seq_align_nl"
GitHub Events
Total
- Push event: 1
- Fork event: 1
Last Year
- Push event: 1
- Fork event: 1