https://github.com/fgnt/upb_audio_tagging_2019

UPB system for the Kaggle competition "Freesound Audio Tagging 2019"

https://github.com/fgnt/upb_audio_tagging_2019

Science Score: 8.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
  • .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
    Organization fgnt has institutional domain (nt.uni-paderborn.de)
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (11.5%) to scientific vocabulary
Last synced: 6 months ago · JSON representation

Repository

UPB system for the Kaggle competition "Freesound Audio Tagging 2019"

Basic Info
  • Host: GitHub
  • Owner: fgnt
  • License: mit
  • Language: Python
  • Default Branch: master
  • Size: 2.48 MB
Statistics
  • Stars: 2
  • Watchers: 2
  • Forks: 2
  • Open Issues: 0
  • Releases: 0
Created almost 7 years ago · Last pushed about 6 years ago

https://github.com/fgnt/upb_audio_tagging_2019/blob/master/

# upb_audio_tagging_2019: Convolutional Recurrent Neural Network and Data Augmentation for Audio Tagging with Noisy Labels and Minimal Supervision [\[pdf\]](http://dcase.community/documents/workshop2019/proceedings/DCASE2019Workshop_Ebbers_54.pdf)

This repository provides the source code for the 5-th place solution presented by Paderborn University for the Freesound Audio Tagging 2019 Kaggle Competition.
Our best submitted system achieved 75.5 % label-weighted label ranking average precision (lwlrap).
Later improvements due to sophisticated ensembling including Multi-Task-Learning led to 76.5 % lwlrap outperforming the winner of the competition.

Competition website: https://www.kaggle.com/c/freesound-audio-tagging-2019

If you are using this code please cite the following paper:

```
@inproceedings{Ebbers2019,
    author = "Ebbers, Janek and Hb-Umbach, Reinhold",
    title = "Convolutional Recurrent Neural Network and Data Augmentation for Audio Tagging with Noisy Labels and Minimal Supervision",
    booktitle = "Proceedings of the Detection and Classification of Acoustic Scenes and Events 2019 Workshop (DCASE2019)",
    address = "New York University, NY, USA",
    month = "October",
    year = "2019",
    pages = "64--68"
}

```

## Installation
Install requirements
```bash
$ pip install --user git+https://github.com/fgnt/lazy_dataset.git@ec06c1e8ff4ccb09420d2d641db8f6d9b1099a4f
$ pip install --user git+https://github.com/fgnt/paderbox.git@7b3b4e9d00e07664596108f987292b8c78d846b1
$ pip install --user git+https://github.com/fgnt/padertorch.git@88233a0c33ddcc33a6842a5f8dc6c24df84d9f09
```

Clone the repo
```bash
$ git clone https://github.com/fgnt/upb_audio_tagging_2019.git
$ cd upb_audio_tagging_2019
```

Install this package
```bash
$ pip install --user -e .
```

Create database description jsons
```bash
$ export FSDKaggle2019DIR=/path/to/fsd_kaggle_2019
$ python -m upb_audio_tagging_2019.create_jsons
```

## Training
We use sacred (https://sacred.readthedocs.io/en/latest/quickstart.html) for
configuration of a training. To train a model (without Multi-Task-Learning) run
```bash
$ python -m upb_audio_tagging_2019.train with model.fcn_noisy=None
```

It is assumed that the folder `exp` in this git is the simulation folder.
If you want to change the simulation dir, add a symlink to the folder where you
want to store the simulation results: `ln -s /path/to/sim/dir exp`.
For each training a new timestamped subdirectory is created.
Monitoring of the training is done using tensorboardX.
To view the training progress call `tensorboard --logdir exp/subdirname`.
After training finished there will be a checkpoint `ckpt_final.pth` in the
subdirectory.


## Reproduce paper results
### Training
To train the ensembles described in our paper you need to run the following
configurations:
#### with provided labels:
```bash
$ python -m upb_audio_tagging_2019.train with model.fcn_noisy=None split=0 curated_reps=7
$ python -m upb_audio_tagging_2019.train with model.fcn_noisy=None split=1 curated_reps=7
$ python -m upb_audio_tagging_2019.train with model.fcn_noisy=None split=2 curated_reps=7
$ python -m upb_audio_tagging_2019.train with model.fcn_noisy=None split=0 curated_reps=5
$ python -m upb_audio_tagging_2019.train with model.fcn_noisy=None split=1 curated_reps=5
$ python -m upb_audio_tagging_2019.train with model.fcn_noisy=None split=2 curated_reps=5
$ python -m upb_audio_tagging_2019.train with model.fcn_noisy=None split=0 curated_reps=3
$ python -m upb_audio_tagging_2019.train with model.fcn_noisy=None split=1 curated_reps=3
$ python -m upb_audio_tagging_2019.train with model.fcn_noisy=None split=2 curated_reps=3
```
#### with Multi-Task-Learning:
```bash
$ python -m upb_audio_tagging_2019.train with split=0 curated_reps=7
$ python -m upb_audio_tagging_2019.train with split=1 curated_reps=7
$ python -m upb_audio_tagging_2019.train with split=2 curated_reps=7
$ python -m upb_audio_tagging_2019.train with split=0 curated_reps=5
$ python -m upb_audio_tagging_2019.train with split=1 curated_reps=5
$ python -m upb_audio_tagging_2019.train with split=2 curated_reps=5
$ python -m upb_audio_tagging_2019.train with split=0 curated_reps=3
$ python -m upb_audio_tagging_2019.train with split=1 curated_reps=3
$ python -m upb_audio_tagging_2019.train with split=2 curated_reps=3
```

#### with Relabeling:
```bash
$ python -m upb_audio_tagging_2019.train with model.fcn_noisy=None split=0 relabeled=True curated_reps=6
$ python -m upb_audio_tagging_2019.train with model.fcn_noisy=None split=1 relabeled=True curated_reps=6
$ python -m upb_audio_tagging_2019.train with model.fcn_noisy=None split=2 relabeled=True curated_reps=6
$ python -m upb_audio_tagging_2019.train with model.fcn_noisy=None split=0 relabeled=True curated_reps=4
$ python -m upb_audio_tagging_2019.train with model.fcn_noisy=None split=1 relabeled=True curated_reps=4
$ python -m upb_audio_tagging_2019.train with model.fcn_noisy=None split=2 relabeled=True curated_reps=4
$ python -m upb_audio_tagging_2019.train with model.fcn_noisy=None split=0 relabeled=True curated_reps=2
$ python -m upb_audio_tagging_2019.train with model.fcn_noisy=None split=1 relabeled=True curated_reps=2
$ python -m upb_audio_tagging_2019.train with model.fcn_noisy=None split=2 relabeled=True curated_reps=2
```
##### Reproduce relabeling:
The relabeling procedure requires to train 15 models for relabeling
```bash
$ python -m upb_audio_tagging_2019.train with split=0 fold=0 curated_reps=9
$ python -m upb_audio_tagging_2019.train with split=0 fold=1 curated_reps=9
$ python -m upb_audio_tagging_2019.train with split=0 fold=2 curated_reps=9
$ python -m upb_audio_tagging_2019.train with split=0 fold=3 curated_reps=9
$ python -m upb_audio_tagging_2019.train with split=0 fold=4 curated_reps=9
$ python -m upb_audio_tagging_2019.train with split=1 fold=0 curated_reps=9
$ python -m upb_audio_tagging_2019.train with split=1 fold=1 curated_reps=9
$ python -m upb_audio_tagging_2019.train with split=1 fold=2 curated_reps=9
$ python -m upb_audio_tagging_2019.train with split=1 fold=3 curated_reps=9
$ python -m upb_audio_tagging_2019.train with split=1 fold=4 curated_reps=9
$ python -m upb_audio_tagging_2019.train with split=2 fold=0 curated_reps=9
$ python -m upb_audio_tagging_2019.train with split=2 fold=1 curated_reps=9
$ python -m upb_audio_tagging_2019.train with split=2 fold=2 curated_reps=9
$ python -m upb_audio_tagging_2019.train with split=2 fold=3 curated_reps=9
$ python -m upb_audio_tagging_2019.train with split=2 fold=4 curated_reps=9
```
Perform relabeling: Code not yet available


### Inference:
Kaggle kernel not yet public

Owner

  • Name: Department of Communications Engineering University of Paderborn
  • Login: fgnt
  • Kind: organization
  • Location: Paderborn, Germany

GitHub Events

Total
Last Year

Dependencies

setup.py pypi
  • cached_property *
  • sacred *
  • scikit-image *
  • tensorboardX *
  • torch *
  • torchcontrib *
  • torchvision *