https://github.com/brucewlee/lama-music-genre-dataset

.wav files, training dataset (MFCC), and graph plots (FFTs, MFCCs, Waveforms) from Latin America, Asia, MiddleEast, and Africa

Science Score: 26.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
  • DOI references
    Found 4 DOI reference(s) in README
  • Academic publication links
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (11.1%) to scientific vocabulary

Keywords

africa asia audio-processing classification dataset genre genre-classification genre-suggestion genres-classification harvard-dataverse lama mfcc music music-library signal-processing sound
Last synced: 5 months ago

Repository

Basic Info
Statistics
  • Stars: 3
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Topics
africa asia audio-processing classification dataset genre genre-classification genre-suggestion genres-classification harvard-dataverse lama mfcc music music-library signal-processing sound
Created about 5 years ago · Last pushed about 5 years ago
Metadata Files
Readme License

readme.md

LAMA : A World Music Genre Dataset

LAMA - LatinAmerica, Asia, MiddleEastern, Africa Genre Dataset

This Dataset consists of .wav files, MFCC training data, and graph plots (FFTs, MFCCs, STFTs, waveforms) from YouTube videos, classified into four categories: LatinAmerica, Asia, MiddleEastern, and Africa.

I hope this work helps Deep Learning and Machine Learning projects in Music Genre Classification.

Anyone is free to use, change, or contribute to this Dataset. Please cite it if you use it in your research or projects.

Getting Started

Some data could not be uploaded to GitHub because the files were too large. Those files are hosted on Harvard Dataverse instead; the link is below.

The data contained in LAMA can be classified into three categories:

  1. .wav files (Data_Original, Data_Cropped): uploaded to Harvard Dataverse, link below.
  2. Waveform, FFT, STFT, and MFCC graph plots (Graphs, Sample_Graphs): uploaded to Harvard Dataverse, link below.
  3. MFCC training data (training_data.json).
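Data_Cropped holds 1-minute clips of the recordings in Data_Original. The README does not say which minute of each recording was kept or which tool produced the clips, so the following is only a minimal sketch of the idea, assuming the first 60 seconds are taken. The function name `crop_wav` and its parameters are hypothetical. It uses the standard-library `wave` module so that it runs without third-party packages, even though `pydub` and `librosa` appear in the repository's requirements.

```python
import wave


def crop_wav(src_path, dst_path, seconds=60.0, offset=0.0):
    """Copy up to `seconds` of audio, starting `offset` seconds in.

    Returns the number of frames written, which is smaller than
    requested when the source recording is shorter.
    """
    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        rate = src.getframerate()
        # Clamp the start position so setpos() never runs past the end.
        start = min(int(offset * rate), src.getnframes())
        src.setpos(start)
        frames = src.readframes(int(seconds * rate))

    with wave.open(dst_path, "wb") as dst:
        # Reuse channel count, sample width, and sample rate; the frame
        # count in the header is corrected when the file is closed.
        dst.setparams(params)
        dst.writeframes(frames)

    return len(frames) // (params.sampwidth * params.nchannels)
```

A call such as `crop_wav("original.wav", "cropped.wav")` (both file names are placeholders) would write the first minute of the input. Changing `offset` is one way to skip an intro before cropping.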

Statistics

  • Data_Original : 113 (Africa), 83 (Asia), 90 (LatinAmerica), 80 (MiddleEast)
  • Data_Cropped (1 min clips of original) : 108 (Africa), 77 (Asia), 87 (LatinAmerica), 77 (MiddleEast)
  • Graphs -> FFT : 108 (Africa), 77 (Asia), 87 (LatinAmerica), 77 (MiddleEast)
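
The counts above are not evenly split across regions, and not every original recording has a cropped clip. The short sketch below does the arithmetic: totals, recordings without a clip, and per-region shares. The inverse-frequency class weights at the end are an illustrative addition, not something the README describes; they are one common way to compensate for class imbalance when training a classifier.

```python
# Clip counts per region, copied from the Statistics list above.
data_original = {"Africa": 113, "Asia": 83, "LatinAmerica": 90, "MiddleEast": 80}
data_cropped = {"Africa": 108, "Asia": 77, "LatinAmerica": 87, "MiddleEast": 77}

total_original = sum(data_original.values())
total_cropped = sum(data_cropped.values())

# Original recordings that have no 1-minute clip, per region.
dropped = {k: data_original[k] - data_cropped[k] for k in data_original}

# Share of each region among the cropped clips.
share = {k: n / total_cropped for k, n in data_cropped.items()}

# Inverse-frequency class weights (an assumption, not the author's method),
# scaled so a perfectly balanced dataset would give every class 1.0.
n_classes = len(data_cropped)
class_weight = {k: total_cropped / (n_classes * n) for k, n in data_cropped.items()}

print(total_original, total_cropped, dropped)
for region in data_cropped:
    print(f"{region}: share={share[region]:.3f} weight={class_weight[region]:.3f}")
```

The totals come to 366 original recordings and 349 cropped clips, so 17 originals have no clip. Africa is the largest class and therefore receives the smallest weight.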

Link to the full audio data

Graph plot examples:

  • From Graphs: image of STFT
  • From Sample_Graphs: image of MFCC
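
For readers unfamiliar with the frequency-domain plots, the sketch below illustrates what an FFT plot displays: the magnitude of each frequency component in a signal. It is not the code used to produce the Dataset's graphs, which the README does not show. It uses a naive discrete Fourier transform from the standard library on a synthetic tone; the sample rate, tone frequency, and function name are all assumptions chosen for the example. For real audio, a fast implementation such as `numpy.fft.rfft` would be the practical choice.

```python
import cmath
import math


def magnitude_spectrum(samples):
    """Naive O(n^2) DFT; returns magnitudes for bins 0 through n // 2."""
    n = len(samples)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(samples)
        )
        spectrum.append(abs(acc))
    return spectrum


sample_rate = 800  # Hz; kept small so the naive DFT stays fast
n = 400            # number of samples, giving 2 Hz per frequency bin
tone_hz = 100      # frequency of the synthetic test tone
signal = [math.sin(2 * math.pi * tone_hz * t / sample_rate) for t in range(n)]

mags = magnitude_spectrum(signal)
peak_bin = max(range(len(mags)), key=mags.__getitem__)
peak_hz = peak_bin * sample_rate / n  # convert bin index to Hz
print(peak_hz)
```

The strongest bin lands on the tone's frequency. An STFT repeats this computation over short, overlapping windows to show how the spectrum changes over time.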

Acknowledgements

The genre classification in this Dataset mostly follows the "AudioSet" project by the Sound and Video Understanding teams at Google Research. I chose the best examples from their website and preprocessed them to create this Dataset.

In addition, I learned much of what I needed to create this Dataset from the YouTube channel "Valerio Velardo - The Sound of AI". Mr. Velardo makes amazing videos, and I believe he deserves more recognition.

Citing this Dataset

@data{DVN/13BPFB_2020,
  author = {Lee, Bruce W},
  publisher = {Harvard Dataverse},
  title = {{LAMA World Music Genre Dataset}},
  year = {2020},
  version = {DRAFT VERSION},
  doi = {10.7910/DVN/13BPFB},
  url = {https://doi.org/10.7910/DVN/13BPFB}
}

Owner

  • Name: Bruce W. Lee (이웅성)
  • Login: brucewlee
  • Kind: user
  • Location: Philadelphia, PA
  • Company: University of Pennsylvania

Research Scientist - NLP

GitHub Events

Total
  • Watch event: 1
Last Year
  • Watch event: 1

Committers

Last synced: 8 months ago

All Time
  • Total Commits: 26
  • Total Committers: 1
  • Avg Commits per committer: 26.0
  • Development Distribution Score (DDS): 0.0
Past Year
  • Commits: 0
  • Committers: 0
  • Avg Commits per committer: 0.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
Bruce W. Lee w****e@g****m 26

Issues and Pull Requests

Last synced: 8 months ago

All Time
  • Total issues: 0
  • Total pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Total issue authors: 0
  • Total pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0

Dependencies

requirements.txt pypi
  • Pillow ==8.0.1
  • SoundFile ==0.10.3.post1
  • appdirs ==1.4.4
  • audioread ==2.1.9
  • certifi ==2020.11.8
  • cffi ==1.14.4
  • chardet ==3.0.4
  • cycler ==0.10.0
  • decorator ==4.4.2
  • idna ==2.10
  • joblib ==0.17.0
  • kiwisolver ==1.3.1
  • librosa ==0.8.0
  • llvmlite ==0.34.0
  • matplotlib ==3.3.3
  • numba ==0.51.2
  • numpy ==1.19.4
  • packaging ==20.7
  • pandas ==1.1.4
  • pooch ==1.3.0
  • pycparser ==2.20
  • pydub ==0.24.1
  • pyparsing ==2.4.7
  • python-dateutil ==2.8.1
  • pytz ==2020.4
  • requests ==2.25.0
  • resampy ==0.2.2
  • scikit-learn ==0.23.2
  • scipy ==1.5.4
  • six ==1.15.0
  • threadpoolctl ==2.1.0
  • urllib3 ==1.26.2