https://github.com/brucewlee/lama-music-genre-dataset
.wav files, training dataset (MFCC), and graph plots (FFTs, MFCCs, Waveforms) from Latin America, Asia, MiddleEast, and Africa
Science Score: 26.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: found codemeta.json file
- ○ .zenodo.json file
- ✓ DOI references: found 4 DOI reference(s) in README
- ○ Academic publication links
- ○ Committers with academic emails
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (11.1%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: brucewlee
- License: other
- Language: Python
- Default Branch: master
- Homepage: https://doi.org/10.7910/DVN/13BPFB
- Size: 23.8 MB
Statistics
- Stars: 3
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
readme.md
LAMA : A World Music Genre Dataset
LAMA - LatinAmerica, Asia, MiddleEastern, Africa Genre Dataset
This Dataset consists of the .wav files, training dataset (MFCC), and graph plots (FFTs, MFCCs, STFTs, Waveforms) of YouTube videos classified into four categories: LatinAmerica, Asia, MiddleEastern, and Africa.
I hope this work helps Deep Learning and Machine Learning projects in Music Genre Classification.
Anyone is free to use, change, or contribute to this Dataset. Please cite it if you use it in your research or projects.
Getting Started
Some data couldn't be uploaded to GitHub because the file size was too large. Instead, I attached a Harvard Dataverse link below to retrieve the data.
The data contained in LAMA can be classified into three categories:
1. .wav files (Data_Original, Data_Cropped) -> Uploaded in Harvard Dataverse. Link below.
2. Waveform, FFT, STFT, MFCC graph plots (Graphs, Sample_Graphs) -> Uploaded in Harvard Dataverse. Link below.
3. MFCC training data (training_data.json)
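The layout of training_data.json is not described here. As a minimal sketch, assuming it is a JSON object with parallel `mfcc` and `labels` lists plus a `mapping` list of genre names (a hypothetical schema, so inspect the real file before relying on it), the data could be loaded and summarized with the standard library:

```python
import json
from collections import Counter


def load_training_data(path):
    """Load the JSON file and return (mfcc, labels, mapping).

    The keys "mfcc", "labels", and "mapping" are assumptions about the
    file layout; adjust them if the real file uses different names.
    """
    with open(path, "r", encoding="utf-8") as fp:
        data = json.load(fp)
    mfcc = data["mfcc"]
    labels = data["labels"]
    mapping = data["mapping"]
    if len(mfcc) != len(labels):
        raise ValueError("mfcc and labels must have the same length")
    return mfcc, labels, mapping


def samples_per_genre(labels, mapping):
    """Count samples per genre name, with integer labels indexing into mapping."""
    counts = Counter(labels)
    return {mapping[i]: counts.get(i, 0) for i in range(len(mapping))}
```

Calling `samples_per_genre(labels, mapping)` on the loaded values gives a quick check that all four categories are present before training.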
Statistics
- Data_Original: 113 (Africa), 83 (Asia), 90 (LatinAmerica), 80 (MiddleEast)
- Data_Cropped (1 min clips of original): 108 (Africa), 77 (Asia), 87 (LatinAmerica), 77 (MiddleEast)
- Graphs -> FFT: 108 (Africa), 77 (Asia), 87 (LatinAmerica), 77 (MiddleEast)
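The counts above are not evenly spread across the four categories, which can matter when training a classifier. The following sketch only does arithmetic on the published numbers: it totals each split, computes each category's share, and shows how many files per category did not make it from Data_Original into Data_Cropped. The function names are illustrative and not part of the repository.

```python
# Clip counts per category, copied from the statistics above.
DATA_ORIGINAL = {"Africa": 113, "Asia": 83, "LatinAmerica": 90, "MiddleEast": 80}
DATA_CROPPED = {"Africa": 108, "Asia": 77, "LatinAmerica": 87, "MiddleEast": 77}


def class_shares(counts):
    """Return each category's fraction of the total number of clips."""
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


def dropped_per_class(original, cropped):
    """Return how many original files have no cropped counterpart, per category."""
    return {name: original[name] - cropped[name] for name in original}


if __name__ == "__main__":
    print("original total:", sum(DATA_ORIGINAL.values()))
    print("cropped total:", sum(DATA_CROPPED.values()))
    print("dropped:", dropped_per_class(DATA_ORIGINAL, DATA_CROPPED))
    for name, share in class_shares(DATA_CROPPED).items():
        print(f"{name}: {share:.1%}")
```

The shares can be turned into class weights or used to pick a stratified train/test split, depending on the model.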
Graph plot examples (figures): one from Graphs, one from Sample_Graphs.

Acknowledgements
The classification of genre in this Dataset is mostly from the "AudioSet" project by the Sound and Video Understanding teams at Google Research. I chose the best examples from their website and preprocessed them to create this Dataset.
In addition, I obtained much of the knowledge needed to create this Dataset from the YouTube channel "Valerio Velardo - The Sound of AI". Mr. Velardo makes amazing videos, and I believe that he deserves more recognition.
Citing this Dataset
@data{DVN/13BPFB_2020,
author = {Lee, Bruce W},
publisher = {Harvard Dataverse},
title = {{LAMA World Music Genre Dataset}},
year = {2020},
version = {DRAFT VERSION},
doi = {10.7910/DVN/13BPFB},
url = {https://doi.org/10.7910/DVN/13BPFB}
}
Owner
- Name: Bruce W. Lee (이웅성)
- Login: brucewlee
- Kind: user
- Location: Philadelphia, PA
- Company: University of Pennsylvania
- Website: brucewlee.github.io
- Repositories: 3
- Profile: https://github.com/brucewlee
Research Scientist - NLP
GitHub Events
Total
- Watch event: 1
Last Year
- Watch event: 1
Committers
Last synced: 8 months ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| Bruce W. Lee | w****e@g****m | 26 |
Issues and Pull Requests
Last synced: 8 months ago
All Time
- Total issues: 0
- Total pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Total issue authors: 0
- Total pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 0
- Pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Dependencies
- Pillow ==8.0.1
- SoundFile ==0.10.3.post1
- appdirs ==1.4.4
- audioread ==2.1.9
- certifi ==2020.11.8
- cffi ==1.14.4
- chardet ==3.0.4
- cycler ==0.10.0
- decorator ==4.4.2
- idna ==2.10
- joblib ==0.17.0
- kiwisolver ==1.3.1
- librosa ==0.8.0
- llvmlite ==0.34.0
- matplotlib ==3.3.3
- numba ==0.51.2
- numpy ==1.19.4
- packaging ==20.7
- pandas ==1.1.4
- pooch ==1.3.0
- pycparser ==2.20
- pydub ==0.24.1
- pyparsing ==2.4.7
- python-dateutil ==2.8.1
- pytz ==2020.4
- requests ==2.25.0
- resampy ==0.2.2
- scikit-learn ==0.23.2
- scipy ==1.5.4
- six ==1.15.0
- threadpoolctl ==2.1.0
- urllib3 ==1.26.2
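These pins date from late 2020, so a current environment will often carry newer versions. As a hedged sketch (not a script from the repository), the standard library's `importlib.metadata` can report which installed packages differ from the pinned versions. Only three pins are copied here for brevity, and `find_mismatches` is a hypothetical helper name.

```python
from importlib import metadata

# A few of the pins listed above; extend the dict as needed.
PINS = {"librosa": "0.8.0", "numpy": "1.19.4", "pydub": "0.24.1"}


def installed_version(name):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(pins, lookup=installed_version):
    """Return {package: (pinned, found)} for every package whose version differs."""
    mismatches = {}
    for name, pinned in pins.items():
        found = lookup(name)
        if found != pinned:
            mismatches[name] = (pinned, found)
    return mismatches


if __name__ == "__main__":
    for name, (pinned, found) in find_mismatches(PINS).items():
        print(f"{name}: pinned {pinned}, found {found or 'not installed'}")
```

The `lookup` parameter exists so the comparison can be exercised without touching the real environment, for example by passing a dict's `get` method.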