speech_data_ghana_ug

The dataset comprises a 5000-hour speech corpus in Akan, Ewe, Dagbani, Dagaare, and Ikposo. Each language includes 1000 hours of audio speech from indigenous speakers of the language, of which 100 hours are transcribed.

https://github.com/hci-lab-ugspeechdata/speech_data_ghana_ug

Science Score: 57.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 4 DOI reference(s) in README
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (6.3%) to scientific vocabulary

Keywords

asr data data-science ghana legon llm ml tts ug ugspeechdata
Last synced: 6 months ago

Repository

Basic Info
Statistics
  • Stars: 7
  • Watchers: 3
  • Forks: 2
  • Open Issues: 0
  • Releases: 0
Created about 1 year ago · Last pushed 10 months ago
Metadata Files
Readme License Citation

README.md

UGSpeechData - Audio speech dataset of 5 Ghanaian languages - Akan, Ewe, Dagbani, Dagaare, and Ikposo

The dataset comprises a 5000-hour speech corpus in Akan, Ewe, Dagbani, Dagaare, and Ikposo. Each language includes 100 hours of transcribed audio speech from indigenous speakers of the language.

Link(s) to Data Assets

AUDIO_ID.csv Description

| Column | Description |
| --- | --- |
| IMAGE_URL | Relative path to the image in the folder |
| IMAGE_SRC_URL | Source path to the original image online |
| AUDIO_URL | Relative path to the local-language audio file in the Local Audio folder |
| ORG_NAME | Institution coordinating the audio collection |
| PROJECT_NAME | Name of the project |
| SPEAKER_ID | ID number of the individual describing the image |
| LOCALE | IETF BCP 47 language tag of the local language in the audio file |
| GENDER | Gender of the individual providing the audio description |
| AGE | Age of the individual providing the audio description |
| DEVICE | Device on which the audio was recorded |
| ENVIRONMENT | Space in which the audio was recorded |
| YEAR | Year in which the audio was recorded |
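
The column layout above can be checked when loading a metadata file. The following is a minimal sketch, assuming the file is a comma-separated CSV whose header row uses exactly the column names listed above. The helper names (`read_metadata`, `count_by_locale`) and every value in the sample rows are invented for illustration and do not come from the dataset.

```python
import csv
import io
from collections import Counter

# Column names as listed in the AUDIO_ID.csv description.
EXPECTED_COLUMNS = [
    "IMAGE_URL", "IMAGE_SRC_URL", "AUDIO_URL", "ORG_NAME", "PROJECT_NAME",
    "SPEAKER_ID", "LOCALE", "GENDER", "AGE", "DEVICE", "ENVIRONMENT", "YEAR",
]


def read_metadata(handle):
    """Read all rows from an open CSV handle and verify the header."""
    reader = csv.DictReader(handle)
    header = reader.fieldnames or []
    missing = [name for name in EXPECTED_COLUMNS if name not in header]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    return list(reader)


def count_by_locale(rows):
    """Count recordings per LOCALE value."""
    return Counter(row["LOCALE"] for row in rows)


# Hypothetical sample rows; paths, names, and values are made up.
SAMPLE_LINES = [
    ",".join(EXPECTED_COLUMNS),
    "images/a.jpg,https://example.org/a.jpg,audio/a.wav,Example Org,"
    "Example Project,SPK001,ak_gh,female,34,phone,indoor,2023",
    "images/b.jpg,https://example.org/b.jpg,audio/b.wav,Example Org,"
    "Example Project,SPK002,ak_gh,male,41,phone,outdoor,2023",
    "images/c.jpg,https://example.org/c.jpg,audio/c.wav,Example Org,"
    "Example Project,SPK003,ee_gh,female,27,tablet,indoor,2023",
]

rows = read_metadata(io.StringIO("\n".join(SAMPLE_LINES)))
locale_counts = count_by_locale(rows)
```

For a real file, the same function could be called with `open(path, newline="", encoding="utf-8")` in place of the in-memory sample; the encoding is an assumption, not something the README states.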

Note: Locale IDs

| Locale ID | Name |
| --- | --- |
| ak_gh | Akan |
| dga_gh | Dagbani |
| dag_gh | Dagaare |
| ee_gh | Ewe |
| kpo_gh | Ikposo |
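
When processing the LOCALE column, the IDs above can be resolved to language names with a small lookup. This is a sketch only: the mapping is copied verbatim from the table, while the function name `language_name` and the case and whitespace normalization are assumptions added for illustration.

```python
# Locale IDs and names exactly as given in the table above.
LOCALE_NAMES = {
    "ak_gh": "Akan",
    "dga_gh": "Dagbani",
    "dag_gh": "Dagaare",
    "ee_gh": "Ewe",
    "kpo_gh": "Ikposo",
}


def language_name(locale_id):
    """Return the language name for a locale ID, or raise ValueError."""
    key = locale_id.strip().lower()  # normalization is an assumption
    if key not in LOCALE_NAMES:
        raise ValueError(f"unknown locale ID: {locale_id!r}")
    return LOCALE_NAMES[key]
```

Raising on an unknown ID, rather than returning a default, makes mislabeled rows visible early.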

License

This project is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0) license.

CITATION

Wiafe, I., Abdulai, J.-D., Ekpezu, A. O., Helegah, R. D., Atsakpo, E. D., Nutrokpor, C., Winful, F. B. P., & Solaga, K. K. (2023). UGSpeechData. Science Data Bank. https://doi.org/10.57760/sciencedb.22298

Owner

  • Name: HCI LAB (UG Computer Science) Ghana
  • Login: HCI-LAB-UGSPEECHDATA
  • Kind: organization
  • Location: Ghana

A Computer Science lab at the University of Ghana with an interest in researching LLM, ML, AR, and VR applications.

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this data, please cite it as below."
authors:
  - family-names: Wiafe
    given-names: Isaac
    orcid: "https://orcid.org/0000-0003-1149-3309"
  - family-names: Abdulai
    given-names: Jamal-Deen
    orcid: "https://orcid.org/0000-0002-4871-9458"
  - family-names: Ekpezu
    given-names: Akon Obu
    orcid: "https://orcid.org/0000-0002-9502-1052"
  - family-names: Dodzi
    given-names: Raynard
    orcid: "https://orcid.org/0009-0003-0771-6933"
  - family-names: Atsakpo
    given-names: Elikem Doe
    orcid: "https://orcid.org/0000-0002-2283-9275"
  - family-names: Nutrokpor
    given-names: Charles
    orcid: "https://orcid.org/0000-0002-9155-4556"
  - family-names: Winful
    given-names: Fiifi Baffoe Payin
    orcid: "https://orcid.org/0009-0004-0594-3662"
  - family-names: Solaga
    given-names: Kafui Kwashie
    orcid: "https://orcid.org/0009-0003-4810-9355"
title: "UGSPEECHDATA"
version: 1.0.0
identifiers:
  - type: doi
    value: "10.57760/sciencedb.22298"
date-released: 2023-03-01
repository-code: "https://github.com/isaacwiafe/speech_data_ug"
keywords: 
  - Speech Data
  - Dataset
  - University of Ghana
  - Ghana
  - West Africa
  - Computer Science 
abstract: "The UG SPEECHDATA dataset was collected and curated at the University of Ghana, Legon, Accra, Ghana, to support local language speech research in Ghanaian languages."
type: dataset

GitHub Events

Total
  • Watch event: 9
  • Member event: 3
  • Push event: 18
  • Pull request event: 5
  • Fork event: 3
  • Create event: 2
Last Year
  • Watch event: 9
  • Member event: 3
  • Push event: 18
  • Pull request event: 5
  • Fork event: 3
  • Create event: 2

Issues and Pull Requests

All Time
  • Total issues: 0
  • Total pull requests: 4
  • Average time to close issues: N/A
  • Average time to close pull requests: about 7 hours
  • Total issue authors: 0
  • Total pull request authors: 2
  • Average comments per issue: 0
  • Average comments per pull request: 0.0
  • Merged pull requests: 3
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 4
  • Average time to close issues: N/A
  • Average time to close pull requests: about 7 hours
  • Issue authors: 0
  • Pull request authors: 2
  • Average comments per issue: 0
  • Average comments per pull request: 0.0
  • Merged pull requests: 3
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
Pull Request Authors
  • dodziraynard (2)
  • Prom12 (2)