speech_data_ghana_ug

The dataset comprises a 5,000-hour speech corpus in Akan, Ewe, Dagbani, Dagaare, and Ikposo. Each language includes 1,000 hours of audio speech from indigenous speakers of the language and 100 hours of transcription.

https://github.com/isaacwiafe/speech_data_ghana_ug

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (6.3%) to scientific vocabulary

Keywords

data data-science ghana legon llm tts ug ugspeechdata
Last synced: 9 months ago

Repository

Basic Info
Statistics
  • Stars: 1
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Topics
data data-science ghana legon llm tts ug ugspeechdata
Created over 3 years ago · Last pushed over 1 year ago
Metadata Files
  • Readme
  • License
  • Citation

README.md

UGSpeechData - Audio speech dataset of 5 Ghanaian languages - Akan, Ewe, Dagbani, Dagaare, and Ikposo

The dataset comprises a 5,000-hour speech corpus in Akan, Ewe, Dagbani, Dagaare, and Ikposo. Each language includes 1,000 hours of audio speech from indigenous speakers of the language and 100 hours of transcription.

Link(s) to Data Assets

AUDIO_ID.csv Description

| Column | Description |
| --- | --- |
| IMAGE_URL | Relative path to the image in the folder |
| IMAGE_SRC_URL | Source URL of the original image online |
| AUDIO_URL | Relative path to the local-language audio file in the Local Audio folder |
| ORG_NAME | Institution coordinating the audio collection |
| PROJECT_NAME | Name of the project |
| SPEAKER_ID | ID number of the individual describing the image |
| LOCALE | IETF BCP 47 language tag of the local language in the audio file |
| GENDER | Gender of the individual providing the audio description |
| AGE | Age of the individual providing the audio description |
| DEVICE | Device on which the audio was recorded |
| ENVIRONMENT | Space in which the audio was recorded |
| YEAR | Year in which the audio was recorded |
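
To illustrate how the column description above might be used, here is a minimal Python sketch that reads a metadata file, checks that every documented column is present, and counts recordings per `LOCALE`. It assumes the file is comma-delimited with a header row using exactly the column names listed above, which the README does not confirm. The helper names `read_metadata` and `count_by_locale` and all sample rows are hypothetical and invented for illustration.

```python
import csv
import io
from collections import Counter

# Column names as listed in the AUDIO_ID.csv description.
EXPECTED_COLUMNS = [
    "IMAGE_URL", "IMAGE_SRC_URL", "AUDIO_URL", "ORG_NAME", "PROJECT_NAME",
    "SPEAKER_ID", "LOCALE", "GENDER", "AGE", "DEVICE", "ENVIRONMENT", "YEAR",
]


def read_metadata(handle):
    """Parse a metadata CSV and return its rows as a list of dicts.

    Raises ValueError if any documented column is missing from the header.
    Assumption: comma-delimited file with a header row.
    """
    reader = csv.DictReader(handle)
    header = reader.fieldnames or []
    missing = [name for name in EXPECTED_COLUMNS if name not in header]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    return list(reader)


def count_by_locale(rows):
    """Count recordings per LOCALE value."""
    return Counter(row["LOCALE"] for row in rows)


# Hypothetical sample data; every value below is invented for illustration.
SAMPLE_CSV = """IMAGE_URL,IMAGE_SRC_URL,AUDIO_URL,ORG_NAME,PROJECT_NAME,SPEAKER_ID,LOCALE,GENDER,AGE,DEVICE,ENVIRONMENT,YEAR
images/img_0001.jpg,https://example.org/img_0001.jpg,audio/ak_0001.wav,Example Org,Example Project,spk001,ak_gh,female,25,phone,indoor,2023
images/img_0002.jpg,https://example.org/img_0002.jpg,audio/ee_0001.wav,Example Org,Example Project,spk002,ee_gh,male,31,phone,outdoor,2023
images/img_0003.jpg,https://example.org/img_0003.jpg,audio/ak_0002.wav,Example Org,Example Project,spk003,ak_gh,male,40,laptop,indoor,2023
"""

rows = read_metadata(io.StringIO(SAMPLE_CSV))
print(count_by_locale(rows))
```

For a real file, the `io.StringIO(SAMPLE_CSV)` argument would be replaced by a handle opened with `open(path, newline="", encoding="utf-8")`. The delimiter and encoding should be checked against the actual data first.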

Note: Locale IDs

| Locale ID | Name |
| --- | --- |
| ak_gh | Akan |
| dga_gh | Dagbani |
| dag_gh | Dagaare |
| ee_gh | Ewe |
| kpo_gh | Ikposo |
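
The locale IDs above can be turned into a small lookup table when labelling rows by language. The Python sketch below copies the mapping exactly as the README lists it. The function name `language_name` is hypothetical, and trimming whitespace and lowercasing the input is an assumption made for convenience, not documented behaviour of the dataset.

```python
# Locale ID to language name, copied from the README's locale table.
LOCALE_NAMES = {
    "ak_gh": "Akan",
    "dga_gh": "Dagbani",
    "dag_gh": "Dagaare",
    "ee_gh": "Ewe",
    "kpo_gh": "Ikposo",
}


def language_name(locale_id):
    """Return the language name for a locale ID such as "ak_gh".

    Assumption: IDs are matched case-insensitively after trimming whitespace.
    Raises ValueError for IDs that are not in the README's table.
    """
    key = locale_id.strip().lower()
    try:
        return LOCALE_NAMES[key]
    except KeyError:
        raise ValueError(f"unknown locale ID: {locale_id!r}") from None


print(language_name("ee_gh"))
```

Raising on unknown IDs, rather than returning a default, makes mislabelled rows visible early when filtering the metadata.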

CITATION

Wiafe, I., Abdulai, J., Ekpezu, A. O., Dodzie, R., Atsakpo, E. D., Nutrokpor, C., Winful, F. B. P., & Solaga, K. K. (2023). UGSPEECHDATA (Version 1.0.0) [Data set]. https://github.com/isaacwiafe/speech_data_ug

Owner

  • Login: isaacwiafe
  • Kind: user

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this data, please cite it as below."
authors:
  - family-names: Wiafe
    given-names: Isaac
    orcid: "https://orcid.org/0000-0003-1149-3309"
  - family-names: Abdulai
    given-names: Jamal-Deen
    orcid: "https://orcid.org/0000-0002-4871-9458"
  - family-names: Ekpezu
    given-names: Akon Obu
    orcid: "https://orcid.org/0000-0002-9502-1052"
  - family-names: Dodzie
    given-names: Raynard
    orcid: "https://orcid.org/0009-0003-0771-6933"
  - family-names: Atsakpo
    given-names: Elikem Doe
    orcid: "https://orcid.org/0000-0002-2283-9275"
  - family-names: Nutrokpor
    given-names: Charles
    orcid: "https://orcid.org/0000-0002-9155-4556"
  - family-names: Winful
    given-names: Fiifi Baffoe Payin
    orcid: "https://orcid.org/0009-0004-0594-3662"
  - family-names: Solaga
    given-names: Kafui Kwashie
    orcid: "https://orcid.org/0009-0003-4810-9355"
title: "UGSPEECHDATA"
version: 1.0.0
identifiers:
  - type: doi
    value: ""
date-released: 2023-03-01
repository-code: "https://github.com/isaacwiafe/speech_data_ug"
keywords: 
  - Speech Data
  - Dataset
  - University of Ghana
  - Ghana
  - West Africa
  - Computer Science 
abstract: "The UG SPEECHDATA dataset was collected and curated at the University of Ghana, Legon, Accra, Ghana, to support local language speech research in Ghanaian languages."
type: dataset

GitHub Events

Total
  • Push event: 15
Last Year
  • Push event: 15