Soundata

Soundata: Reproducible use of audio datasets - Published in JOSS (2024)

https://github.com/soundata/soundata

Science Score: 100.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 12 DOI reference(s) in README and JOSS metadata
  • Academic publication links
    Links to: joss.theoj.org, zenodo.org
  • Committers with academic emails
    8 of 30 committers (26.7%) from academic institutions
  • Institutional organization owner
  • JOSS paper metadata
    Published in Journal of Open Source Software

Keywords

audio bioacoustics dataset environmental-sound python urban-sound
Last synced: 4 months ago

Repository

Python library for downloading, loading & working with sound datasets

Statistics
  • Stars: 344
  • Watchers: 9
  • Forks: 27
  • Open Issues: 25
  • Releases: 19
Topics
audio bioacoustics dataset environmental-sound python urban-sound
Created almost 5 years ago · Last pushed 5 months ago
Metadata Files
Readme License Citation

README.md

soundata

Python library for downloading, loading & working with sound datasets. Check the API documentation and the contributing instructions.
For Music Information Retrieval (MIR) datasets, please check mirdata.

This library provides tools for working with common sound datasets, including tools for:

* Downloading datasets to a common location and format
* Validating that the files for a dataset are all present
* Loading annotation files to a common format
* Parsing clip-level metadata for detailed evaluations
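The validation step listed above can be pictured as comparing the files on disk against an index of expected files and checksums. The following is a minimal, standard-library sketch of that idea and not soundata's actual implementation: the index layout (`clip_id` mapped to a relative path and an MD5 checksum), the function names `md5_of` and `validate_index`, and the file names are all assumptions made for illustration.

```python
import hashlib
import tempfile
from pathlib import Path


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def validate_index(data_home, index):
    """Check files under data_home against {clip_id: (relative_path, md5)}.

    Returns (missing, invalid): clip IDs whose file is absent, and clip IDs
    whose file exists but does not match the expected checksum.
    """
    missing, invalid = [], []
    for clip_id, (relative_path, expected_md5) in index.items():
        path = Path(data_home) / relative_path
        if not path.is_file():
            missing.append(clip_id)
        elif md5_of(path) != expected_md5:
            invalid.append(clip_id)
    return missing, invalid


# Demonstration on a throwaway directory with hypothetical file names.
with tempfile.TemporaryDirectory() as data_home:
    (Path(data_home) / "a.wav").write_bytes(b"fake audio")
    (Path(data_home) / "b.wav").write_bytes(b"corrupted")
    index = {
        "clip_a": ("a.wav", hashlib.md5(b"fake audio").hexdigest()),
        "clip_b": ("b.wav", hashlib.md5(b"original audio").hexdigest()),
        "clip_c": ("c.wav", hashlib.md5(b"never downloaded").hexdigest()),
    }
    missing, invalid = validate_index(data_home, index)

print(missing, invalid)  # ['clip_c'] ['clip_b']
```

In the library itself, the equivalent check is the `dataset.validate()` call shown in the quick example.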

Here's soundata's list of currently supported datasets.

Installation

To install, simply run:

    pip install soundata

Quick example

```python
import soundata

dataset = soundata.initialize('urbansound8k')
dataset.download()  # download the dataset
dataset.validate()  # validate that all the expected files are there

example_clip = dataset.choice_clip()  # choose a random example clip
print(example_clip)  # see the available data
```

See the documentation for more examples and the API reference.

Contributing a new dataset loader

We welcome and encourage contributions to this library, especially new dataset loaders. Please see contributing for guidelines. Feel free to open an issue if you have any doubts or run into problems when working on the library.

Releases

The Soundata Zenodo repository is the preferred source for downloading the software releases.

DOI

Citing

If you use Soundata in your pipeline, please cite the version used with the corresponding DOI of the version release in Zenodo. For Soundata v1.0.1:

DOI

If you refer to soundata's design principles, motivation etc., please cite the JOSS article:

DOI

    @article{Fuentes2024,
      title = {{Soundata: Reproducible use of audio datasets}},
      author = {Fuentes, Magdalena and Plaja-Roglans, Genís and Cortès-Sebastià, Guillem and Khandelwal, Tanmay and Miron, Marius and Serra, Xavier and Bello, Juan Pablo and Salamon, Justin},
      year = 2024,
      month = jun,
      journal = {Journal of Open Source Software},
      volume = 9,
      number = 98,
      pages = 6634,
      doi = {10.21105/joss.06634},
      url = {https://joss.theoj.org/papers/10.21105/joss.06634}
    }

When working with datasets, please include the reference of the dataset, which can be found in the respective dataset loader using cite().
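As a small sketch of the step described above, the helper below initializes a loader and calls its `cite()` method. The helper name `print_dataset_reference` is hypothetical. `soundata.initialize` and `cite()` are the calls named in this README, and the `data_home` keyword is assumed to be accepted by `initialize` as the location of the dataset on disk.

```python
def print_dataset_reference(dataset_name, data_home=None):
    """Initialize a dataset loader and print the reference for its dataset."""
    # Imported inside the function so that this snippet can be loaded
    # even where soundata is not installed.
    import soundata

    dataset = soundata.initialize(dataset_name, data_home=data_home)
    dataset.cite()  # the loader's cite() method, as named in the text above
    return dataset
```

Calling `print_dataset_reference('urbansound8k')` would then show the reference to include alongside the Soundata citation.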

Owner

  • Name: soundata
  • Login: soundata
  • Kind: organization

JOSS Publication

Soundata: Reproducible use of audio datasets
Published
June 18, 2024
Volume 9, Issue 98, Page 6634
Authors
Magdalena Fuentes ORCID
New York University, New York, United States
Genís Plaja-Roglans ORCID
Universitat Pompeu Fabra, Barcelona, Spain
Guillem Cortès-Sebastià ORCID
Universitat Pompeu Fabra, Barcelona, Spain
Tanmay Khandelwal ORCID
New York University, New York, United States
Marius Miron ORCID
Earth Species Project, Barcelona, Spain
Xavier Serra ORCID
Universitat Pompeu Fabra, Barcelona, Spain
Juan Pablo Bello ORCID
New York University, New York, United States
Justin Salamon ORCID
Adobe Research, San Francisco, United States
Editor
Fabian-Robert Stöter ORCID
Tags
audio dataset urban-sound environmental-sound bioacoustics

Citation (CITATION.cff)

cff-version: "1.2.0"
authors:
- family-names: Fuentes
  given-names: Magdalena
  orcid: "https://orcid.org/0000-0003-4506-6639"
- family-names: Plaja-Roglans
  given-names: Genís
  orcid: "https://orcid.org/0000-0003-3450-3194"
- family-names: Cortès-Sebastià
  given-names: Guillem
  orcid: "https://orcid.org/0000-0003-2827-8955"
- family-names: Khandelwal
  given-names: Tanmay
  orcid: "https://orcid.org/0009-0004-3770-8317"
- family-names: Miron
  given-names: Marius
  orcid: "https://orcid.org/0000-0002-2563-075X"
- family-names: Serra
  given-names: Xavier
  orcid: "https://orcid.org/0000-0003-1395-2345"
- family-names: Bello
  given-names: Juan Pablo
  orcid: "https://orcid.org/0000-0001-8561-5204"
- family-names: Salamon
  given-names: Justin
  orcid: "https://orcid.org/0000-0001-6345-4593"
contact:
- family-names: Fuentes
  given-names: Magdalena
  orcid: "https://orcid.org/0000-0003-4506-6639"
doi: 10.5281/zenodo.11580085
message: If you use this software, please cite our article in the
  Journal of Open Source Software.
preferred-citation:
  authors:
  - family-names: Fuentes
    given-names: Magdalena
    orcid: "https://orcid.org/0000-0003-4506-6639"
  - family-names: Plaja-Roglans
    given-names: Genís
    orcid: "https://orcid.org/0000-0003-3450-3194"
  - family-names: Cortès-Sebastià
    given-names: Guillem
    orcid: "https://orcid.org/0000-0003-2827-8955"
  - family-names: Khandelwal
    given-names: Tanmay
    orcid: "https://orcid.org/0009-0004-3770-8317"
  - family-names: Miron
    given-names: Marius
    orcid: "https://orcid.org/0000-0002-2563-075X"
  - family-names: Serra
    given-names: Xavier
    orcid: "https://orcid.org/0000-0003-1395-2345"
  - family-names: Bello
    given-names: Juan Pablo
    orcid: "https://orcid.org/0000-0001-8561-5204"
  - family-names: Salamon
    given-names: Justin
    orcid: "https://orcid.org/0000-0001-6345-4593"
  date-published: 2024-06-18
  doi: 10.21105/joss.06634
  issn: 2475-9066
  issue: 98
  journal: Journal of Open Source Software
  publisher:
    name: Open Journals
  start: 6634
  title: "Soundata: Reproducible use of audio datasets"
  type: article
  url: "https://joss.theoj.org/papers/10.21105/joss.06634"
  volume: 9
title: "Soundata: Reproducible use of audio datasets"

GitHub Events

Total
  • Issues event: 17
  • Watch event: 22
  • Member event: 1
  • Issue comment event: 21
  • Push event: 8
  • Pull request review comment event: 4
  • Pull request review event: 8
  • Pull request event: 33
  • Fork event: 9
Last Year
  • Issues event: 17
  • Watch event: 22
  • Member event: 1
  • Issue comment event: 21
  • Push event: 8
  • Pull request review comment event: 4
  • Pull request review event: 8
  • Pull request event: 33
  • Fork event: 9

Committers

Last synced: 5 months ago

All Time
  • Total Commits: 829
  • Total Committers: 30
  • Avg Commits per committer: 27.633
  • Development Distribution Score (DDS): 0.77
Past Year
  • Commits: 10
  • Committers: 4
  • Avg Commits per committer: 2.5
  • Development Distribution Score (DDS): 0.5
Top Committers
Name Email Commits
Tanmay Khandelwal 3****4 191
Magdalena Fuentes m****o@g****m 143
Justin Salamon j****n@g****m 92
Rachel Bittner r****6@n****u 65
guillemcortes c****a@g****m 62
genisplaja g****a@u****u 56
Rachel Bittner r****r@s****m 36
Iran R. Roman i****n@c****u 33
David Rubinstein d****n 22
Name m****s@g****m 20
Vincent Lostanlen v****n@g****m 18
Pablo Zinemanas p****s@f****y 18
Pedro p****o@h****m 15
Tanmay Khandelwal 9****4 9
Andreas Jansson a****n@g****m 7
Harsh Palan h****4@g****m 7
Thor t****r@t****a 6
Stefano Scola 6****6 5
Karn Watcharasupat k****1@e****g 4
Keunwoo Choi g****b@g****m 4
Yujin y****1@n****u 3
Qingyang (Tom) Xi t****i@n****u 3
Vincent Lostanlen v****n@n****u 2
Michael Scibor m****r@g****m 2
Emmanuel Ferdman e****n@g****m 1
Janne j****t@g****m 1
Michael Scibor m****r@s****m 1
Guillem Cortès g****s@d****m 1
Kyungyun Lee k****3@g****m 1
ooyamatakehisa 4****a 1
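The summary figures for committers follow from the totals and the commit counts listed above. The sketch below reproduces them under one assumption: that the Development Distribution Score is one minus the top committer's share of all commits. This page does not state the formula, so treat that definition as an assumption that happens to match the listed all-time value.

```python
def average_commits(total_commits, total_committers):
    """Mean number of commits per committer."""
    return total_commits / total_committers


def development_distribution_score(top_committer_commits, total_commits):
    """Assumed definition: 1 minus the top committer's share of commits."""
    return 1 - top_committer_commits / total_commits


# All-time figures: 829 commits, 30 committers, top committer with 191 commits.
all_time_avg = average_commits(829, 30)
all_time_dds = development_distribution_score(191, 829)

# Past-year figures: 10 commits from 4 committers.
past_year_avg = average_commits(10, 4)

print(f"{all_time_avg:.3f}")   # 27.633
print(f"{all_time_dds:.2f}")   # 0.77
print(f"{past_year_avg:.1f}")  # 2.5
```

A score near 0 would mean one committer wrote almost everything; higher values indicate that work is spread across more people.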

Issues and Pull Requests

Last synced: 4 months ago

All Time
  • Total issues: 77
  • Total pull requests: 169
  • Average time to close issues: 12 months
  • Average time to close pull requests: about 2 months
  • Total issue authors: 25
  • Total pull request authors: 19
  • Average comments per issue: 1.84
  • Average comments per pull request: 1.76
  • Merged pull requests: 118
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 6
  • Pull requests: 38
  • Average time to close issues: 7 months
  • Average time to close pull requests: 27 days
  • Issue authors: 5
  • Pull request authors: 6
  • Average comments per issue: 0.0
  • Average comments per pull request: 0.84
  • Merged pull requests: 13
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • magdalenafuentes (15)
  • hagenw (9)
  • auroracramer (8)
  • iranroman (7)
  • pzinemanas (5)
  • justinsalamon (5)
  • hadware (3)
  • guillemcortes (3)
  • mrocamora (2)
  • sripathisridhar (2)
  • lostanlen (2)
  • nkundiushuti (2)
  • karnwatcharasupat (2)
  • M1stergame (1)
  • tanmayy24 (1)
Pull Request Authors
  • tanmayy24 (32)
  • magdalenafuentes (22)
  • justinsalamon (18)
  • guillemcortes (18)
  • Masetto96 (16)
  • yujin-kimmm (15)
  • iranroman (9)
  • harshpalan (7)
  • genisplaja (7)
  • pzinemanas (7)
  • danielskatz (4)
  • karnwatcharasupat (4)
  • sergigf03 (2)
  • emmanuel-ferdman (2)
  • faroit (2)
Top Labels
Issue Labels
new loader (13) enhancement (13) documentation (8) priority (4) bug (3) question (3) good first issue (1)
Pull Request Labels
documentation (2) priority (2)

Packages

  • Total packages: 1
  • Total downloads:
    • pypi 2,279 last-month
  • Total dependent packages: 0
  • Total dependent repositories: 2
  • Total versions: 18
  • Total maintainers: 2
pypi.org: soundata

Python library for loading and working with sound datasets.

  • Homepage: https://github.com/soundata/soundata
  • Documentation: https://soundata.readthedocs.io/en/latest/
  • License: Copyright (c) 2016 All rights reserved.
    Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
      * Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
      * Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
      * Neither the name of soundata nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.
    THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
  • Latest release: 1.0.1
    published over 1 year ago
  • Versions: 18
  • Dependent Packages: 0
  • Dependent Repositories: 2
  • Downloads: 2,279 Last month
Rankings
Stargazers count: 4.2%
Dependent packages count: 7.4%
Forks count: 8.4%
Average: 9.1%
Dependent repos count: 11.9%
Downloads: 13.8%
Last synced: 4 months ago

Dependencies

docs/requirements.txt pypi
  • jams *
  • librosa >=0.7.0
  • pandas *
  • sphinx ==4.0.2
  • sphinx-togglebutton *
  • tqdm *
setup.py pypi
  • jams *
  • librosa *
  • numpy >=1.16,
  • pandas *
  • requests *
  • tqdm *
.github/workflows/ci.yml actions
  • actions/cache v3 composite
  • actions/checkout v3 composite
  • codecov/codecov-action v3 composite
  • conda-incubator/setup-miniconda v2 composite
.github/workflows/formatting.yml actions
  • actions/checkout v3 composite
  • psf/black stable composite
.github/workflows/lint-python.yml actions
  • actions/cache v3 composite
  • actions/checkout v3 composite
  • conda-incubator/setup-miniconda v2 composite
.github/workflows/pythonpublish.yml actions
  • actions/checkout v2 composite
  • actions/setup-python v1 composite