Soundata
Soundata: Reproducible use of audio datasets - Published in JOSS (2024)
Science Score: 100.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ✓ DOI references: Found 12 DOI reference(s) in README and JOSS metadata
- ✓ Academic publication links: Links to joss.theoj.org, zenodo.org
- ✓ Committers with academic emails: 8 of 30 committers (26.7%) from academic institutions
- ○ Institutional organization owner
- ✓ JOSS paper metadata: Published in Journal of Open Source Software
Repository
Python library for downloading, loading & working with sound datasets
Basic Info
- Host: GitHub
- Owner: soundata
- License: bsd-3-clause
- Language: Python
- Default Branch: main
- Homepage: https://soundata.readthedocs.io/en/stable
- Size: 123 MB
Statistics
- Stars: 344
- Watchers: 9
- Forks: 27
- Open Issues: 25
- Releases: 19
Metadata Files
README.md
soundata

Python library for downloading, loading & working with sound datasets. Check the API documentation and the contributing instructions.
For Music Information Retrieval (MIR) datasets please check mirdata.
This library provides tools for working with common sound datasets, including tools for:

* Downloading datasets to a common location and format
* Validating that the files for a dataset are all present
* Loading annotation files to a common format
* Parsing clip-level metadata for detailed evaluations
Here's soundata's list of currently supported datasets.
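The validation step listed above can be pictured as comparing the files on disk against an index of expected paths and checksums. The following is a minimal sketch of that idea using only the Python standard library. The index layout, the `validate_files` and `checksum_of` helpers, the choice of SHA-256, and the file names are all assumptions made for illustration; this is not soundata's actual implementation.

```python
import hashlib
import os
import tempfile


def checksum_of(path, chunk_size=65536):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fhandle:
        for chunk in iter(lambda: fhandle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def validate_files(data_home, index):
    """Compare files under data_home against a {relative_path: checksum} index.

    Returns (missing, invalid): paths that are absent on disk, and paths
    whose checksum differs from the expected one. Hypothetical helper.
    """
    missing, invalid = [], []
    for rel_path, expected in index.items():
        full_path = os.path.join(data_home, rel_path)
        if not os.path.exists(full_path):
            missing.append(rel_path)
        elif checksum_of(full_path) != expected:
            invalid.append(rel_path)
    return missing, invalid


# Toy demonstration in a temporary directory with made-up file names.
with tempfile.TemporaryDirectory() as data_home:
    with open(os.path.join(data_home, "clip_a.wav"), "wb") as fhandle:
        fhandle.write(b"fake audio bytes")
    with open(os.path.join(data_home, "clip_b.wav"), "wb") as fhandle:
        fhandle.write(b"corrupted bytes")

    index = {
        "clip_a.wav": hashlib.sha256(b"fake audio bytes").hexdigest(),
        "clip_b.wav": hashlib.sha256(b"original bytes").hexdigest(),
        "clip_c.wav": hashlib.sha256(b"never downloaded").hexdigest(),
    }
    missing, invalid = validate_files(data_home, index)

print("missing:", missing)  # clip_c.wav was never written
print("invalid:", invalid)  # clip_b.wav does not match its expected checksum
```

Reporting missing and corrupted files separately lets a user decide whether to re-download a whole dataset or only the affected files.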
Installation
To install, simply run:
```
pip install soundata
```
Quick example
```python
import soundata

dataset = soundata.initialize('urbansound8k')
dataset.download()  # download the dataset
dataset.validate()  # validate that all the expected files are there

example_clip = dataset.choice_clip()  # choose a random example clip
print(example_clip)  # see the available data
```

See the documentation for more examples and the API reference.
Contributing a new dataset loader
We welcome and encourage contributions to this library, especially new dataset loaders. Please see contributing for guidelines. Feel free to open an issue if you have any doubts or run into problems when working on the library.
Releases
The Soundata Zenodo repository is the preferred source for downloading the software releases.
Citing
If you use Soundata in your pipeline, please cite the version used with the corresponding DOI of the version release in Zenodo. For Soundata v1.0.1.:
If you refer to soundata's design principles, motivation etc., please cite the JOSS article:
```bibtex
@article{Fuentes2024,
  title = {{Soundata: Reproducible use of audio datasets}},
  author = {Fuentes, Magdalena and Plaja-Roglans, Genís and Cortès-Sebastià, Guillem and Khandelwal, Tanmay and Miron, Marius and Serra, Xavier and Bello, Juan Pablo and Salamon, Justin},
  year = 2024,
  month = jun,
  journal = {Journal of Open Source Software},
  volume = 9,
  number = 98,
  pages = 6634,
  doi = {10.21105/joss.06634},
  url = {https://joss.theoj.org/papers/10.21105/joss.06634}
}
```
When working with datasets, please include the reference of the dataset, which can be found in the respective dataset loader using cite().
Owner
- Name: soundata
- Login: soundata
- Kind: organization
- Repositories: 1
- Profile: https://github.com/soundata
JOSS Publication
Soundata: Reproducible use of audio datasets
Authors
Tags
audio dataset urban-sound environmental-sound bioacoustics
Citation (CITATION.cff)
cff-version: "1.2.0"
authors:
  - family-names: Fuentes
    given-names: Magdalena
    orcid: "https://orcid.org/0000-0003-4506-6639"
  - family-names: Plaja-Roglans
    given-names: Genís
    orcid: "https://orcid.org/0000-0003-3450-3194"
  - family-names: Cortès-Sebastià
    given-names: Guillem
    orcid: "https://orcid.org/0000-0003-2827-8955"
  - family-names: Khandelwal
    given-names: Tanmay
    orcid: "https://orcid.org/0009-0004-3770-8317"
  - family-names: Miron
    given-names: Marius
    orcid: "https://orcid.org/0000-0002-2563-075X"
  - family-names: Serra
    given-names: Xavier
    orcid: "https://orcid.org/0000-0003-1395-2345"
  - family-names: Bello
    given-names: Juan Pablo
    orcid: "https://orcid.org/0000-0001-8561-5204"
  - family-names: Salamon
    given-names: Justin
    orcid: "https://orcid.org/0000-0001-6345-4593"
contact:
  - family-names: Fuentes
    given-names: Magdalena
    orcid: "https://orcid.org/0000-0003-4506-6639"
doi: 10.5281/zenodo.11580085
message: If you use this software, please cite our article in the
  Journal of Open Source Software.
preferred-citation:
  authors:
    - family-names: Fuentes
      given-names: Magdalena
      orcid: "https://orcid.org/0000-0003-4506-6639"
    - family-names: Plaja-Roglans
      given-names: Genís
      orcid: "https://orcid.org/0000-0003-3450-3194"
    - family-names: Cortès-Sebastià
      given-names: Guillem
      orcid: "https://orcid.org/0000-0003-2827-8955"
    - family-names: Khandelwal
      given-names: Tanmay
      orcid: "https://orcid.org/0009-0004-3770-8317"
    - family-names: Miron
      given-names: Marius
      orcid: "https://orcid.org/0000-0002-2563-075X"
    - family-names: Serra
      given-names: Xavier
      orcid: "https://orcid.org/0000-0003-1395-2345"
    - family-names: Bello
      given-names: Juan Pablo
      orcid: "https://orcid.org/0000-0001-8561-5204"
    - family-names: Salamon
      given-names: Justin
      orcid: "https://orcid.org/0000-0001-6345-4593"
  date-published: 2024-06-18
  doi: 10.21105/joss.06634
  issn: 2475-9066
  issue: 98
  journal: Journal of Open Source Software
  publisher:
    name: Open Journals
  start: 6634
  title: "Soundata: Reproducible use of audio datasets"
  type: article
  url: "https://joss.theoj.org/papers/10.21105/joss.06634"
  volume: 9
title: "Soundata: Reproducible use of audio datasets"
GitHub Events
Total
- Issues event: 17
- Watch event: 22
- Member event: 1
- Issue comment event: 21
- Push event: 8
- Pull request review comment event: 4
- Pull request review event: 8
- Pull request event: 33
- Fork event: 9
Last Year
- Issues event: 17
- Watch event: 22
- Member event: 1
- Issue comment event: 21
- Push event: 8
- Pull request review comment event: 4
- Pull request review event: 8
- Pull request event: 33
- Fork event: 9
Committers
Last synced: 5 months ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| Tanmay Khandelwal | 3****4 | 191 |
| Magdalena Fuentes | m****o@g****m | 143 |
| Justin Salamon | j****n@g****m | 92 |
| Rachel Bittner | r****6@n****u | 65 |
| guillemcortes | c****a@g****m | 62 |
| genisplaja | g****a@u****u | 56 |
| Rachel Bittner | r****r@s****m | 36 |
| Iran R. Roman | i****n@c****u | 33 |
| David Rubinstein | d****n | 22 |
| Name | m****s@g****m | 20 |
| Vincent Lostanlen | v****n@g****m | 18 |
| Pablo Zinemanas | p****s@f****y | 18 |
| Pedro | p****o@h****m | 15 |
| Tanmay Khandelwal | 9****4 | 9 |
| Andreas Jansson | a****n@g****m | 7 |
| Harsh Palan | h****4@g****m | 7 |
| Thor | t****r@t****a | 6 |
| Stefano Scola | 6****6 | 5 |
| Karn Watcharasupat | k****1@e****g | 4 |
| Keunwoo Choi | g****b@g****m | 4 |
| Yujin | y****1@n****u | 3 |
| Qingyang (Tom) Xi | t****i@n****u | 3 |
| Vincent Lostanlen | v****n@n****u | 2 |
| Michael Scibor | m****r@g****m | 2 |
| Emmanuel Ferdman | e****n@g****m | 1 |
| Janne | j****t@g****m | 1 |
| Michael Scibor | m****r@s****m | 1 |
| Guillem Cortès | g****s@d****m | 1 |
| Kyungyun Lee | k****3@g****m | 1 |
| ooyamatakehisa | 4****a | 1 |
Issues and Pull Requests
Last synced: 4 months ago
All Time
- Total issues: 77
- Total pull requests: 169
- Average time to close issues: 12 months
- Average time to close pull requests: about 2 months
- Total issue authors: 25
- Total pull request authors: 19
- Average comments per issue: 1.84
- Average comments per pull request: 1.76
- Merged pull requests: 118
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 6
- Pull requests: 38
- Average time to close issues: 7 months
- Average time to close pull requests: 27 days
- Issue authors: 5
- Pull request authors: 6
- Average comments per issue: 0.0
- Average comments per pull request: 0.84
- Merged pull requests: 13
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- magdalenafuentes (15)
- hagenw (9)
- auroracramer (8)
- iranroman (7)
- pzinemanas (5)
- justinsalamon (5)
- hadware (3)
- guillemcortes (3)
- mrocamora (2)
- sripathisridhar (2)
- lostanlen (2)
- nkundiushuti (2)
- karnwatcharasupat (2)
- M1stergame (1)
- tanmayy24 (1)
Pull Request Authors
- tanmayy24 (32)
- magdalenafuentes (22)
- justinsalamon (18)
- guillemcortes (18)
- Masetto96 (16)
- yujin-kimmm (15)
- iranroman (9)
- harshpalan (7)
- genisplaja (7)
- pzinemanas (7)
- danielskatz (4)
- karnwatcharasupat (4)
- sergigf03 (2)
- emmanuel-ferdman (2)
- faroit (2)
Packages
- Total packages: 1
- Total downloads:
  - pypi: 2,279 last month
- Total dependent packages: 0
- Total dependent repositories: 2
- Total versions: 18
- Total maintainers: 2
pypi.org: soundata
Python library for loading and working with sound datasets.
- Homepage: https://github.com/soundata/soundata
- Documentation: https://soundata.readthedocs.io/en/latest/
- License: Copyright (c) 2016 All rights reserved. Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: * Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. * Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution. * Neither the name of soundata nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission. THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
- Latest release: 1.0.1 (published over 1 year ago)
Rankings
Maintainers (2)
Dependencies
- jams *
- librosa >=0.7.0
- pandas *
- sphinx ==4.0.2
- sphinx-togglebutton *
- tqdm *
- jams *
- librosa *
- numpy >=1.16,
- pandas *
- requests *
- tqdm *
- actions/cache v3 composite
- actions/checkout v3 composite
- codecov/codecov-action v3 composite
- conda-incubator/setup-miniconda v2 composite
- actions/checkout v3 composite
- psf/black stable composite
- actions/cache v3 composite
- actions/checkout v3 composite
- conda-incubator/setup-miniconda v2 composite
- actions/checkout v2 composite
- actions/setup-python v1 composite
