https://github.com/fgnt/upb_audio_tagging_2019
UPB system for the Kaggle competition "Freesound Audio Tagging 2019"
Repository
Basic Info
- Host: GitHub
- Owner: fgnt
- License: mit
- Language: Python
- Default Branch: master
- Size: 2.48 MB
# upb_audio_tagging_2019: Convolutional Recurrent Neural Network and Data Augmentation for Audio Tagging with Noisy Labels and Minimal Supervision [\[pdf\]](http://dcase.community/documents/workshop2019/proceedings/DCASE2019Workshop_Ebbers_54.pdf)
This repository provides the source code for the 5th-place solution presented by Paderborn University in the Freesound Audio Tagging 2019 Kaggle competition.
Our best submitted system achieved a label-weighted label-ranking average precision (lwlrap) of 75.5 %.
Later improvements through more sophisticated ensembling, including multi-task learning, raised this to 76.5 % lwlrap, outperforming the winner of the competition.
Competition website: https://www.kaggle.com/c/freesound-audio-tagging-2019
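For readers unfamiliar with the metric, lwlrap can be sketched in a few lines of plain Python. This is an illustrative implementation under stated assumptions, not the competition's official scoring code: the function name `lwlrap` is hypothetical, and ties between scores are broken by class index, which may differ from the official evaluation.

```python
def lwlrap(truth, scores):
    """Label-weighted label-ranking average precision (illustrative sketch).

    truth:  list of per-clip lists with 0/1 entries, one entry per class
    scores: list of per-clip lists with predicted scores, same shape
    """
    precision_sum = 0.0
    n_positive = 0
    for labels, clip_scores in zip(truth, scores):
        # Class indices from highest to lowest score.
        # sorted() is stable, so tied scores keep class-index order (assumption).
        order = sorted(range(len(clip_scores)), key=lambda c: -clip_scores[c])
        hits = 0
        for rank, cls in enumerate(order, start=1):
            if labels[cls]:
                hits += 1
                # Precision at the rank of this positive label.
                precision_sum += hits / rank
                n_positive += 1
    # Averaging over all positive (clip, label) pairs weights each label equally.
    return precision_sum / n_positive if n_positive else 0.0


truth = [[1, 0, 0], [0, 1, 1]]
scores = [[0.9, 0.1, 0.2], [0.8, 0.7, 0.1]]
print(lwlrap(truth, scores))  # (1 + 1/2 + 2/3) / 3 = 13/18
```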
If you use this code, please cite the following paper:
```
@inproceedings{Ebbers2019,
author = "Ebbers, Janek and Haeb-Umbach, Reinhold",
title = "Convolutional Recurrent Neural Network and Data Augmentation for Audio Tagging with Noisy Labels and Minimal Supervision",
booktitle = "Proceedings of the Detection and Classification of Acoustic Scenes and Events 2019 Workshop (DCASE2019)",
address = "New York University, NY, USA",
month = "October",
year = "2019",
pages = "64--68"
}
```
## Installation
Install requirements
```bash
$ pip install --user git+https://github.com/fgnt/lazy_dataset.git@ec06c1e8ff4ccb09420d2d641db8f6d9b1099a4f
$ pip install --user git+https://github.com/fgnt/paderbox.git@7b3b4e9d00e07664596108f987292b8c78d846b1
$ pip install --user git+https://github.com/fgnt/padertorch.git@88233a0c33ddcc33a6842a5f8dc6c24df84d9f09
```
Clone the repo
```bash
$ git clone https://github.com/fgnt/upb_audio_tagging_2019.git
$ cd upb_audio_tagging_2019
```
Install this package
```bash
$ pip install --user -e .
```
Create database description jsons
```bash
$ export FSDKaggle2019DIR=/path/to/fsd_kaggle_2019
$ python -m upb_audio_tagging_2019.create_jsons
```
## Training
We use sacred (https://sacred.readthedocs.io/en/latest/quickstart.html) to
configure training runs. To train a model without multi-task learning, run
```bash
$ python -m upb_audio_tagging_2019.train with model.fcn_noisy=None
```
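The `with key=value` arguments override entries of the experiment configuration, with dots addressing nested entries. The sketch below illustrates that idea with the standard library only; it is not sacred's implementation, and the function `apply_overrides` and the default values in `defaults` are hypothetical (only the keys `model.fcn_noisy`, `split` and `curated_reps` appear in this README).

```python
import ast
import copy


def apply_overrides(config, overrides):
    """Return a copy of the nested dict `config` with `a.b=value` overrides applied."""
    result = copy.deepcopy(config)
    for item in overrides:
        key, _, raw = item.partition("=")
        try:
            # Interpret the value as a Python literal where possible (None, 1, True, ...).
            value = ast.literal_eval(raw)
        except (ValueError, SyntaxError):
            # Otherwise keep it as a plain string.
            value = raw
        *parents, leaf = key.split(".")
        node = result
        for name in parents:
            node = node.setdefault(name, {})
        node[leaf] = value
    return result


# Hypothetical defaults, for illustration only.
defaults = {"model": {"fcn_noisy": {"placeholder": True}}, "split": 0, "curated_reps": 7}
cfg = apply_overrides(defaults, ["model.fcn_noisy=None", "split=1"])
print(cfg)
```

In this sketch, `model.fcn_noisy=None` replaces the nested entry with `None`, which matches how the command above is used to switch off the multi-task branch.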
The folder `exp` in this repository is assumed to be the simulation folder.
To store simulation results elsewhere, replace `exp` with a symlink to the
target directory: `ln -s /path/to/sim/dir exp`.
Each training run creates a new timestamped subdirectory.
Training is monitored with tensorboardX.
To view the training progress, run `tensorboard --logdir exp/subdirname`.
Once training has finished, the subdirectory contains a checkpoint
`ckpt_final.pth`.
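Because every run gets its own subdirectory, it can be handy to locate the most recent final checkpoint programmatically. The following is a minimal sketch, assuming the checkpoint sits directly inside the run's subdirectory of `exp`; the helper `latest_checkpoint` is hypothetical, and it compares file modification times rather than parsing directory names, since the timestamp format is not specified here.

```python
from pathlib import Path


def latest_checkpoint(exp_dir="exp", name="ckpt_final.pth"):
    """Return the most recently modified `<exp_dir>/<run>/<name>`, or None if absent."""
    candidates = [p for p in Path(exp_dir).glob(f"*/{name}") if p.is_file()]
    if not candidates:
        return None
    # Newest file by modification time (assumption: newest file == latest run).
    return max(candidates, key=lambda p: p.stat().st_mtime)


if __name__ == "__main__":
    print(latest_checkpoint())
```

The returned path could then be passed to `torch.load`; how the checkpoint's contents are organized is not documented in this README.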
## Reproduce paper results
### Training
To train the ensembles described in our paper you need to run the following
configurations:
#### with provided labels:
```bash
$ python -m upb_audio_tagging_2019.train with model.fcn_noisy=None split=0 curated_reps=7
$ python -m upb_audio_tagging_2019.train with model.fcn_noisy=None split=1 curated_reps=7
$ python -m upb_audio_tagging_2019.train with model.fcn_noisy=None split=2 curated_reps=7
$ python -m upb_audio_tagging_2019.train with model.fcn_noisy=None split=0 curated_reps=5
$ python -m upb_audio_tagging_2019.train with model.fcn_noisy=None split=1 curated_reps=5
$ python -m upb_audio_tagging_2019.train with model.fcn_noisy=None split=2 curated_reps=5
$ python -m upb_audio_tagging_2019.train with model.fcn_noisy=None split=0 curated_reps=3
$ python -m upb_audio_tagging_2019.train with model.fcn_noisy=None split=1 curated_reps=3
$ python -m upb_audio_tagging_2019.train with model.fcn_noisy=None split=2 curated_reps=3
```
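The nine runs above form a grid over `split` and `curated_reps`. As a convenience, the command lines can be generated rather than typed out; the sketch below only builds the strings and does not start any training. The helper `training_commands` is hypothetical and not part of this package.

```python
from itertools import product


def training_commands(curated_reps, splits=(0, 1, 2), extra=()):
    """Build one training command per (curated_reps, split) combination."""
    base = "python -m upb_audio_tagging_2019.train with"
    commands = []
    # curated_reps varies slowest, split fastest, matching the listing above.
    for reps, split in product(curated_reps, splits):
        parts = [base, *extra, f"split={split}", f"curated_reps={reps}"]
        commands.append(" ".join(parts))
    return commands


provided = training_commands((7, 5, 3), extra=("model.fcn_noisy=None",))
for command in provided:
    print(command)
```

Each printed line can be run in a shell or handed to a job scheduler.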
#### with Multi-Task-Learning:
```bash
$ python -m upb_audio_tagging_2019.train with split=0 curated_reps=7
$ python -m upb_audio_tagging_2019.train with split=1 curated_reps=7
$ python -m upb_audio_tagging_2019.train with split=2 curated_reps=7
$ python -m upb_audio_tagging_2019.train with split=0 curated_reps=5
$ python -m upb_audio_tagging_2019.train with split=1 curated_reps=5
$ python -m upb_audio_tagging_2019.train with split=2 curated_reps=5
$ python -m upb_audio_tagging_2019.train with split=0 curated_reps=3
$ python -m upb_audio_tagging_2019.train with split=1 curated_reps=3
$ python -m upb_audio_tagging_2019.train with split=2 curated_reps=3
```
#### with Relabeling:
```bash
$ python -m upb_audio_tagging_2019.train with model.fcn_noisy=None split=0 relabeled=True curated_reps=6
$ python -m upb_audio_tagging_2019.train with model.fcn_noisy=None split=1 relabeled=True curated_reps=6
$ python -m upb_audio_tagging_2019.train with model.fcn_noisy=None split=2 relabeled=True curated_reps=6
$ python -m upb_audio_tagging_2019.train with model.fcn_noisy=None split=0 relabeled=True curated_reps=4
$ python -m upb_audio_tagging_2019.train with model.fcn_noisy=None split=1 relabeled=True curated_reps=4
$ python -m upb_audio_tagging_2019.train with model.fcn_noisy=None split=2 relabeled=True curated_reps=4
$ python -m upb_audio_tagging_2019.train with model.fcn_noisy=None split=0 relabeled=True curated_reps=2
$ python -m upb_audio_tagging_2019.train with model.fcn_noisy=None split=1 relabeled=True curated_reps=2
$ python -m upb_audio_tagging_2019.train with model.fcn_noisy=None split=2 relabeled=True curated_reps=2
```
##### Reproduce relabeling:
The relabeling procedure requires training 15 models (3 splits × 5 folds):
```bash
$ python -m upb_audio_tagging_2019.train with split=0 fold=0 curated_reps=9
$ python -m upb_audio_tagging_2019.train with split=0 fold=1 curated_reps=9
$ python -m upb_audio_tagging_2019.train with split=0 fold=2 curated_reps=9
$ python -m upb_audio_tagging_2019.train with split=0 fold=3 curated_reps=9
$ python -m upb_audio_tagging_2019.train with split=0 fold=4 curated_reps=9
$ python -m upb_audio_tagging_2019.train with split=1 fold=0 curated_reps=9
$ python -m upb_audio_tagging_2019.train with split=1 fold=1 curated_reps=9
$ python -m upb_audio_tagging_2019.train with split=1 fold=2 curated_reps=9
$ python -m upb_audio_tagging_2019.train with split=1 fold=3 curated_reps=9
$ python -m upb_audio_tagging_2019.train with split=1 fold=4 curated_reps=9
$ python -m upb_audio_tagging_2019.train with split=2 fold=0 curated_reps=9
$ python -m upb_audio_tagging_2019.train with split=2 fold=1 curated_reps=9
$ python -m upb_audio_tagging_2019.train with split=2 fold=2 curated_reps=9
$ python -m upb_audio_tagging_2019.train with split=2 fold=3 curated_reps=9
$ python -m upb_audio_tagging_2019.train with split=2 fold=4 curated_reps=9
```
Perform relabeling: Code not yet available
### Inference:
Kaggle kernel not yet public
Owner
- Name: Department of Communications Engineering, University of Paderborn
- Login: fgnt
- Kind: organization
- Location: Paderborn, Germany
- Website: http://nt.uni-paderborn.de
- Repositories: 37
- Profile: https://github.com/fgnt
Dependencies
setup.py
pypi
- cached_property *
- sacred *
- scikit-image *
- tensorboardX *
- torch *
- torchcontrib *
- torchvision *