speech_data_ghana_ug
The dataset comprises a 5000-hour speech corpus in Akan, Ewe, Dagbani, Dagaare, and Ikposo. Each language includes 1000 hours of audio speech from indigenous speakers of the language and 100 hours of transcription.
Science Score: 44.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: found
- ✓ codemeta.json file: found
- ✓ .zenodo.json file: found
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (6.3%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: isaacwiafe
- License: other
- Language: HTML
- Default Branch: main
- Homepage: https://isaacwiafe.github.io/speech_data_ghana_ug/
- Size: 48.8 KB
Statistics
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
UGSpeechData - Audio speech dataset of 5 Ghanaian languages - Akan, Ewe, Dagbani, Dagaare, and Ikposo
The dataset comprises a 5000-hour speech corpus in Akan, Ewe, Dagbani, Dagaare, and Ikposo. Each language includes 1000 hours of audio speech from indigenous speakers of the language and 100 hours of transcription.
Link(s) to Data Assets
AUDIO_ID.csv Description
| Column | Description |
|---|---|
| IMAGE_URL | Relative path to the image in the folder |
| IMAGE_SRC_URL | Source path to the original image online |
| AUDIO_URL | Relative path to the local-language audio file in the Local Audio folder |
| ORG_NAME | Institution coordinating the audio collection |
| PROJECT_NAME | Name of the project |
| SPEAKER_ID | ID number of the individual describing the image |
| LOCALE | IETF BCP 47 language tag for the local language of the audio file |
| GENDER | Gender of the individual providing the audio description |
| AGE | Age of the individual providing the audio description |
| DEVICE | Device on which the audio was recorded |
| ENVIRONMENT | Space in which the audio was recorded |
| YEAR | Year in which the audio was recorded |
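The column layout described here can be checked programmatically before any audio is processed. The following is a minimal sketch, assuming the metadata is a plain comma-separated file with a header row. The helper names `load_metadata` and `count_by_locale` and every sample row are hypothetical and not taken from the dataset.

```python
import csv
import io
from collections import Counter

# Column names taken from the AUDIO_ID.csv description.
EXPECTED_COLUMNS = [
    "IMAGE_URL", "IMAGE_SRC_URL", "AUDIO_URL", "ORG_NAME", "PROJECT_NAME",
    "SPEAKER_ID", "LOCALE", "GENDER", "AGE", "DEVICE", "ENVIRONMENT", "YEAR",
]


def load_metadata(fileobj):
    """Read all rows and fail early if a documented column is absent."""
    reader = csv.DictReader(fileobj)
    header = reader.fieldnames or []
    missing = [name for name in EXPECTED_COLUMNS if name not in header]
    if missing:
        raise ValueError(f"metadata file is missing columns: {missing}")
    return list(reader)


def count_by_locale(rows):
    """Count how many recordings each locale ID has."""
    return Counter(row["LOCALE"] for row in rows)


# Hypothetical rows for illustration only; not real dataset entries.
SAMPLE_CSV = (
    "IMAGE_URL,IMAGE_SRC_URL,AUDIO_URL,ORG_NAME,PROJECT_NAME,SPEAKER_ID,"
    "LOCALE,GENDER,AGE,DEVICE,ENVIRONMENT,YEAR\n"
    "images/a.jpg,https://example.org/a.jpg,audio/a.wav,ExampleOrg,"
    "ExampleProject,spk001,ak_gh,female,30,phone,indoor,2023\n"
    "images/b.jpg,https://example.org/b.jpg,audio/b.wav,ExampleOrg,"
    "ExampleProject,spk002,ak_gh,male,41,phone,outdoor,2023\n"
    "images/c.jpg,https://example.org/c.jpg,audio/c.wav,ExampleOrg,"
    "ExampleProject,spk003,ee_gh,female,25,laptop,indoor,2023\n"
)

rows = load_metadata(io.StringIO(SAMPLE_CSV))
print(count_by_locale(rows))
```

To use it on a real file, replace the `io.StringIO(SAMPLE_CSV)` argument with a file opened via `open(path, newline="", encoding="utf-8")`. The encoding is also an assumption.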
Note: Locale IDs
| Locale ID | Name |
|---|---|
| ak_gh | Akan |
| dga_gh | Dagbani |
| dag_gh | Dagaare |
| ee_gh | Ewe |
| kpo_gh | Ikposo |
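When working with the `LOCALE` column, a small lookup table makes it easy to turn locale IDs into language names. The sketch below copies the mapping exactly as this README lists it. The function name `locale_name` and the whitespace and case normalization are assumptions added for illustration.

```python
# Locale IDs and names exactly as listed in the README.
LOCALE_NAMES = {
    "ak_gh": "Akan",
    "dga_gh": "Dagbani",
    "dag_gh": "Dagaare",
    "ee_gh": "Ewe",
    "kpo_gh": "Ikposo",
}


def locale_name(locale_id):
    """Return the language name for a locale ID, or raise ValueError."""
    # Assumption: IDs may arrive with stray whitespace or mixed case.
    key = locale_id.strip().lower()
    if key not in LOCALE_NAMES:
        raise ValueError(f"unknown locale ID: {locale_id!r}")
    return LOCALE_NAMES[key]


print(locale_name("ee_gh"))  # Ewe
```

Raising on unknown IDs, rather than returning a default, surfaces typos in the metadata early.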
CITATION
Wiafe, I., Abdulai, J., Ekpezu, A. O., Dodzie, R., Atsakpo, E. D., Nutrokpor, C., Winful, F. B. P., & Solaga, K. K. (2023). UGSPEECHDATA (Version 1.0.0) [Data set]. https://github.com/isaacwiafe/speechdataug
Owner
- Login: isaacwiafe
- Kind: user
- Repositories: 1
- Profile: https://github.com/isaacwiafe
Citation (CITATION.cff)
cff-version: 1.2.0
message: "If you use this data, please cite it as below."
authors:
  - family-names: Wiafe
    given-names: Isaac
    orcid: "https://orcid.org/0000-0003-1149-3309"
  - family-names: Abdulai
    given-names: Jamal-Deen
    orcid: "https://orcid.org/0000-0002-4871-9458"
  - family-names: Ekpezu
    given-names: Akon Obu
    orcid: "https://orcid.org/0000-0002-9502-1052"
  - family-names: Dodzie
    given-names: Raynard
    orcid: "https://orcid.org/0009-0003-0771-6933"
  - family-names: Atsakpo
    given-names: Elikem Doe
    orcid: "https://orcid.org/0000-0002-2283-9275"
  - family-names: Nutrokpor
    given-names: Charles
    orcid: "https://orcid.org/0000-0002-9155-4556"
  - family-names: Winful
    given-names: Fiifi Baffoe Payin
    orcid: "https://orcid.org/0009-0004-0594-3662"
  - family-names: Solaga
    given-names: Kafui Kwashie
    orcid: "https://orcid.org/0009-0003-4810-9355"
title: "UGSPEECHDATA"
version: 1.0.0
identifiers:
  - type: doi
    value: ""
date-released: 2023-03-01
repository-code: "https://github.com/isaacwiafe/speech_data_ug"
keywords:
  - Speech Data
  - Dataset
  - University of Ghana
  - Ghana
  - West Africa
  - Computer Science
abstract: "The UG SPEECHDATA dataset was collected and curated at the University of Ghana, Legon, Accra, Ghana, to support local language speech research in Ghanaian languages."
type: dataset
GitHub Events
Total
- Push event: 15
Last Year
- Push event: 15