yoruba-text
Yorùbá language training text for NLP, ASR and TTS tasks
Science Score: 26.0%
This score indicates how likely this project is to be science-related, based on the following indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file (found)
- ✓ .zenodo.json file (found)
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (8.0%) to scientific vocabulary
Repository
Statistics
- Stars: 76
- Watchers: 7
- Forks: 26
- Open Issues: 5
- Releases: 0
Metadata Files
README.md
Yorùbá text
This repository contains fully diacritized Yorùbá text, converted to Unicode Normalization Form C (NFC), in which each base letter and its diacritics are composed into a single code point wherever a precomposed form exists. The conversion uses the following code:
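    import unicodedata

    def convert_to_NFC(filename, outfilename):
        # Read as UTF-8, compose diacritics into NFC form, and write back as UTF-8
        with open(filename, encoding='utf-8') as f:
            text = unicodedata.normalize('NFC', f.read())
        with open(outfilename, 'w', encoding='utf-8') as f:
            f.write(text)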
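As a rough illustration of what NFC composition does to Yorùbá characters, the sketch below normalizes two decomposed sequences and compares their lengths before and after. The helper name `is_nfc` and the sample strings are assumptions made for this example and are not part of the repository.

```python
import unicodedata

def is_nfc(text):
    """Return True if text is unchanged by NFC normalization."""
    return unicodedata.normalize("NFC", text) == text

# 'u' followed by COMBINING GRAVE ACCENT: two code points, renders as ù
decomposed_u = "u\u0300"
composed_u = unicodedata.normalize("NFC", decomposed_u)

# 'o' + COMBINING DOT BELOW + COMBINING ACUTE ACCENT: renders as ọ́
decomposed_o = "o\u0323\u0301"
composed_o = unicodedata.normalize("NFC", decomposed_o)

print(len(decomposed_u), len(composed_u))        # 2 1
print(len(decomposed_o), len(composed_o))        # 3 2
print(is_nfc(decomposed_u), is_nfc(composed_u))  # False True
```

Note that Unicode has no precomposed code point for ọ with an acute accent, so NFC text can still contain combining marks. Code that counts or indexes characters in this corpus should not assume one code point per visible letter.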
Sources:
- Lagos-NWU conversational corpus
- Bíbélì Mímọ́ ní Èdè Yorùbá Òde-Òní
- The Yorùbá blog
- Asubiaro, T., Adegbola, T. et al. (2018). A Word-Level Language Identification Strategy for Resource-Scarce Languages
- Òwe Yorùbá
- Ìwé Ti Mọ́mọ́nì
- Kùránì (Qur'an) Mímọ́
Sources yet to be scraped and cleaned:
- BBC Yorùbá
- Yorùbá for Academic Purpose
- Yob m odu
- wa Elr Jhf
- Or Kn
- Iw ti Nic
- Alákọ̀wé
- Èdè Yorùbá Rẹwà
- m_r
- ryoruba
- Wikipedia
- Poetry of Ọláńrewájú Adépọ̀jù
Social Media sources:
- https://twitter.com/yobamoodua
- https://twitter.com/yoruba_proverbs
- https://www.facebook.com/oweyoruba
Text has been gathered with permission from online sources and lightly preprocessed for use in NLP, TTS and ASR applications. Note that some of the sentences may contain errors; please submit a pull request if you have corrections!
Resources
- https://clas.uiowa.edu/dwllc/allnet/yoruba-language-and-culture-resources
- https://glosbe.com/yo/en
BibTeX
If you want to cite this repo in your work, please use:
@misc{Orife_yoruba-text_2018,
author = {Orife, Iroro and Fasubaa, Timilehin and Wahab, Olamilekan},
month = {1},
title = {{yoruba-text}},
url = {https://github.com/Niger-Volta-LTI/yoruba-text},
year = {2018}
}
Owner
- Name: Niger-Volta Language Technologies Institute
- Login: Niger-Volta-LTI
- Kind: organization
- Location: Naija, Germany, Yankee → International
- Repositories: 21
- Profile: https://github.com/Niger-Volta-LTI
Speech Recognition, Language Identification, Machine Translation & Natural Language Processing for West African Languages
GitHub Events
Total
- Watch event: 8
- Fork event: 4
Last Year
- Watch event: 8
- Fork event: 4