speech_data_ghana_ug
The dataset comprises a 5,000-hour speech corpus in Akan, Ewe, Dagbani, Dagaare, and Ikposo. Each language includes 1,000 hours of audio speech from indigenous speakers of the language, of which 100 hours are transcribed.
https://github.com/hci-lab-ugspeechdata/speech_data_ghana_ug
Science Score: 57.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ✓ DOI references: Found 4 DOI reference(s) in README
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (6.3%) to scientific vocabulary
Repository
The dataset comprises a 5,000-hour speech corpus in Akan, Ewe, Dagbani, Dagaare, and Ikposo. Each language includes 1,000 hours of audio speech from indigenous speakers of the language, of which 100 hours are transcribed.
Basic Info
- Host: GitHub
- Owner: HCI-LAB-UGSPEECHDATA
- License: other
- Language: HTML
- Default Branch: main
- Homepage: https://hci-lab-ugspeechdata.github.io/speech_data_ghana_ug/
- Size: 106 KB
Statistics
- Stars: 7
- Watchers: 3
- Forks: 2
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
UGSpeechData - Audio speech dataset of 5 Ghanaian languages - Akan, Ewe, Dagbani, Dagaare, and Ikposo
The dataset comprises a 5,000-hour speech corpus in Akan, Ewe, Dagbani, Dagaare, and Ikposo. Each language includes 100 hours of transcribed audio speech from indigenous speakers of the language.
Link(s) to Data Assets
<!--- Transcriptions --->
AUDIO_ID.csv Description
| Column | Description |
|---|---|
| IMAGE_URL | Provides the relative path to the image in the images folder |
| IMAGE_SRC_URL | Provides the source path to the actual image online |
| AUDIO_URL | Provides the relative path to the local-language audio file in the Local Audio folder |
| ORG_NAME | Identifies the institution coordinating the audio collection |
| PROJECT_NAME | Provides the name of the project |
| SPEAKER_ID | Provides the ID number of the individual describing the image |
| LOCALE | Provides the IETF BCP 47 language tag of the local language in the audio file |
| GENDER | Provides the gender of the individual giving the audio description |
| AGE | Provides the age of the individual giving the audio description |
| DEVICE | Identifies the device on which the audio was recorded |
| ENVIRONMENT | Identifies the space in which the audio was recorded |
| YEAR | Provides the year in which the audio was recorded |
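As a rough illustration of how the columns described above might be used, the following Python sketch reads metadata rows and selects the recordings for one locale. It assumes the file is comma-delimited with a header row matching the column names; the sample rows, file paths, and value formats (for example `female`, `phone`, `indoor`) are made up for the example and are not taken from the dataset.

```python
import csv
import io
from collections import Counter

# Hypothetical sample rows: every value below is invented for illustration.
# Only the column names come from the AUDIO_ID.csv description.
SAMPLE_CSV = """IMAGE_URL,IMAGE_SRC_URL,AUDIO_URL,ORG_NAME,PROJECT_NAME,SPEAKER_ID,LOCALE,GENDER,AGE,DEVICE,ENVIRONMENT,YEAR
images/img_001.jpg,https://example.org/img_001.jpg,audio/ak_001.wav,Example Org,Example Project,spk01,ak_gh,female,27,phone,indoor,2023
images/img_002.jpg,https://example.org/img_002.jpg,audio/ee_001.wav,Example Org,Example Project,spk02,ee_gh,male,35,phone,outdoor,2023
images/img_001.jpg,https://example.org/img_001.jpg,audio/ak_002.wav,Example Org,Example Project,spk03,ak_gh,male,41,tablet,indoor,2023
"""


def load_rows(handle):
    """Read metadata rows into a list of dicts keyed by column name."""
    return list(csv.DictReader(handle))


def filter_by_locale(rows, locale):
    """Keep only the rows whose LOCALE column equals the given locale ID."""
    return [row for row in rows if row["LOCALE"] == locale]


rows = load_rows(io.StringIO(SAMPLE_CSV))
akan_rows = filter_by_locale(rows, "ak_gh")

# Count how many recordings each locale contributes.
recordings_per_locale = Counter(row["LOCALE"] for row in rows)

print([row["AUDIO_URL"] for row in akan_rows])
print(dict(recordings_per_locale))
```

To run this against a real metadata file, replace `io.StringIO(SAMPLE_CSV)` with an opened file handle, for example `open(path, newline="", encoding="utf-8")`, where `path` points to the CSV file on disk.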
Note: Locale IDs
| Locale ID | Name |
|---|---|
| ak_gh | Akan |
| dga_gh | Dagbani |
| dag_gh | Dagaare |
| ee_gh | Ewe |
| kpo_gh | Ikposo |
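The locale IDs listed above can be kept in a small lookup table when processing the `LOCALE` column. The sketch below copies the mapping exactly as given in the table; the helper name `language_name` and the case and whitespace normalization are assumptions added for convenience, not part of the dataset.

```python
# Locale IDs and language names exactly as listed in the locale table.
LOCALE_NAMES = {
    "ak_gh": "Akan",
    "dga_gh": "Dagbani",
    "dag_gh": "Dagaare",
    "ee_gh": "Ewe",
    "kpo_gh": "Ikposo",
}


def language_name(locale_id):
    """Return the language name for a locale ID.

    Surrounding whitespace and letter case are ignored (an assumption made
    here so that values such as " AK_GH " still resolve).
    """
    key = locale_id.strip().lower()
    if key not in LOCALE_NAMES:
        raise KeyError(f"Unknown locale ID: {locale_id!r}")
    return LOCALE_NAMES[key]


print(language_name("kpo_gh"))
```

Raising `KeyError` for unknown IDs, rather than returning a default, makes unexpected values in the metadata visible early.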
License
This project is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0) license.
CITATION
Wiafe, I., Abdulai, J.-D., Ekpezu, A. O., Helegah, R. D., Atsakpo, E. D., Nutrokpor, C., Winful, F. B. P., & Solaga, K. K. (2023). UGSpeechData. Science Data Bank. https://doi.org/10.57760/sciencedb.22298
Owner
- Name: HCI LAB (UG Computer Science) Ghana
- Login: HCI-LAB-UGSPEECHDATA
- Kind: organization
- Location: Ghana
- Repositories: 1
- Profile: https://github.com/HCI-LAB-UGSPEECHDATA
A computer science lab at the University of Ghana with research interests in LLM, ML, AR, and VR applications.
Citation (CITATION.cff)
cff-version: 1.2.0
message: "If you use this data, please cite it as below."
authors:
  - family-names: Wiafe
    given-names: Isaac
    orcid: "https://orcid.org/0000-0003-1149-3309"
  - family-names: Abdulai
    given-names: Jamal-Deen
    orcid: "https://orcid.org/0000-0002-4871-9458"
  - family-names: Ekpezu
    given-names: Akon Obu
    orcid: "https://orcid.org/0000-0002-9502-1052"
  - family-names: Dodzi
    given-names: Raynard
    orcid: "https://orcid.org/0009-0003-0771-6933"
  - family-names: Atsakpo
    given-names: Elikem Doe
    orcid: "https://orcid.org/0000-0002-2283-9275"
  - family-names: Nutrokpor
    given-names: Charles
    orcid: "https://orcid.org/0000-0002-9155-4556"
  - family-names: Winful
    given-names: Fiifi Baffoe Payin
    orcid: "https://orcid.org/0009-0004-0594-3662"
  - family-names: Solaga
    given-names: Kafui Kwashie
    orcid: "https://orcid.org/0009-0003-4810-9355"
title: "UGSPEECHDATA"
version: 1.0.0
identifiers:
  - type: doi
    value: "https://doi.org/10.57760/sciencedb.22298"
date-released: 2023-03-01
repository-code: "https://github.com/isaacwiafe/speech_data_ug"
keywords:
  - Speech Data
  - Dataset
  - University of Ghana
  - Ghana
  - West Africa
  - Computer Science
abstract: "The UG SPEECHDATA dataset was collected and curated at the University of Ghana, Legon, Accra, Ghana, to support local language speech research in Ghanaian languages."
type: dataset
GitHub Events
Total
- Watch event: 9
- Member event: 3
- Push event: 18
- Pull request event: 5
- Fork event: 3
- Create event: 2
Last Year
- Watch event: 9
- Member event: 3
- Push event: 18
- Pull request event: 5
- Fork event: 3
- Create event: 2
Issues and Pull Requests
Last synced: 6 months ago
All Time
- Total issues: 0
- Total pull requests: 4
- Average time to close issues: N/A
- Average time to close pull requests: about 7 hours
- Total issue authors: 0
- Total pull request authors: 2
- Average comments per issue: 0
- Average comments per pull request: 0.0
- Merged pull requests: 3
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 4
- Average time to close issues: N/A
- Average time to close pull requests: about 7 hours
- Issue authors: 0
- Pull request authors: 2
- Average comments per issue: 0
- Average comments per pull request: 0.0
- Merged pull requests: 3
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
Pull Request Authors
- dodziraynard (2)
- Prom12 (2)