yoruba-text

Yorùbá language training text for NLP, ASR and TTS tasks

https://github.com/niger-volta-lti/yoruba-text

Science Score: 26.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (8.0%) to scientific vocabulary

Keywords

african-languages asr diacritization machine-translation natural-language-processing nlp nlp-datasets training-dataset tts yoruba
Last synced: 9 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: Niger-Volta-LTI
  • License: gpl-3.0
  • Language: Python
  • Default Branch: master
  • Homepage:
  • Size: 76.2 MB
Statistics
  • Stars: 76
  • Watchers: 7
  • Forks: 26
  • Open Issues: 5
  • Releases: 0
Created over 8 years ago · Last pushed about 3 years ago
Metadata Files
Readme License Citation

README.md

Yorùbá text

This repository contains fully diacritized Yorùbá text, converted to Unicode Normalization Form C (NFC, canonical composition), in which a base letter and its combining diacritics are composed into a single precomposed character where one exists. The conversion uses the following code:

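    import unicodedata

    def convert_to_NFC(filename, outfilename):
        # Read the whole file as UTF-8 and compose base letters with their diacritics
        with open(filename, encoding='utf-8') as f:
            text = unicodedata.normalize('NFC', f.read())
        # Write the normalized text back out as UTF-8
        with open(outfilename, 'w', encoding='utf-8') as f:
            f.write(text)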
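To illustrate what NFC normalization does to diacritized text, here is a minimal sketch that works on strings rather than files. The helper names `to_nfc` and `is_nfc` are hypothetical and not part of this repository; only the standard-library `unicodedata.normalize` call is relied on.

```python
import unicodedata


def to_nfc(text: str) -> str:
    """Return text in Unicode Normalization Form C (canonical composition)."""
    return unicodedata.normalize("NFC", text)


def is_nfc(text: str) -> bool:
    """True if normalizing to NFC would leave the text unchanged."""
    return to_nfc(text) == text


# "Yorùbá" typed with combining marks: u + U+0300 (grave), a + U+0301 (acute).
decomposed = "Yoru\u0300ba\u0301"
composed = to_nfc(decomposed)

print(len(decomposed), len(composed))  # 8 code points before, 6 after
print(composed == "Yor\u00f9b\u00e1")  # True: ù and á are now single code points

# Worth checking on your own data: a vowel carrying both a dot below and a
# tone mark (e.g. ọ́) has no single precomposed code point, so NFC keeps the
# base letter U+1ECD followed by the combining acute U+0301.
print([hex(ord(c)) for c in to_nfc("o\u0301\u0323")])  # ['0x1ecd', '0x301']
```

A check like `is_nfc` can be run over each line of a contribution before submitting it, so that visually identical strings also compare equal byte for byte.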

Sources:

Sources yet to be scraped and cleaned:

  • BBC Yorùbá
  • Yorùbá for Academic Purpose
  • Yob m odu
  • wa Elr Jhf
  • Or Kn
  • Iw ti Nic
  • Alkw
  • d Yorùbá Rw
  • m_r
  • ryoruba
  • Wikipedia
  • Poetry of lrewj Adpj

Social Media sources:

  • https://twitter.com/yobamoodua
  • https://twitter.com/yoruba_proverbs
  • https://www.facebook.com/oweyoruba

Text has been gathered with permission from online sources and lightly preprocessed for use in NLP, TTS and ASR applications. Note that some of the sentences may contain errors. Please submit a pull request if you have corrections!

Resources

  • https://clas.uiowa.edu/dwllc/allnet/yoruba-language-and-culture-resources
  • https://glosbe.com/yo/en

Bibtex

If you want to cite this repo in your work, please use:

    @misc{Orife_yoruba-text_2018,
      author = {Orife, Iroro and Fasubaa, Timilehin and Wahab, Olamilekan},
      month = {1},
      title = {{yoruba-text}},
      url = {https://github.com/Niger-Volta-LTI/yoruba-text},
      year = {2018}
    }

Owner

  • Name: Niger-Volta Language Technologies Institute
  • Login: Niger-Volta-LTI
  • Kind: organization
  • Location: Naija, Germany, Yankee → International

Speech Recognition, Language Identification, Machine Translation & Natural Language Processing for West African Languages

GitHub Events

Total
  • Watch event: 8
  • Fork event: 4
Last Year
  • Watch event: 8
  • Fork event: 4