https://github.com/csteinmetz1/mixcnn

Convolutional Neural Network for multitrack mix leveling

Science Score: 13.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (6.7%) to scientific vocabulary
Last synced: 9 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: csteinmetz1
  • Language: Python
  • Default Branch: master
  • Homepage:
  • Size: 2.1 MB
Statistics
  • Stars: 18
  • Watchers: 2
  • Forks: 3
  • Open Issues: 1
  • Releases: 0
Created over 8 years ago · Last pushed almost 8 years ago
Metadata Files
Readme

README.md

MixCNN

Multitrack mix leveling with convolutional neural nets.

Setup

Install dependencies.

$ pip install --upgrade -r requirements.txt

Install the Python ITU-R BS.1770-4 loudness package (pyloudnorm).

$ git clone https://github.com/csteinmetz1/pyloudnorm.git
$ cd pyloudnorm
$ python setup.py install

Dataset

Download and extract the DSD100 dataset: http://liutkus.net/DSD100.zip (12 GB)

Ensure that the extracted DSD100 directory is placed at the top level of the repository's directory structure.

Pre-process

To generate the input and output data, run the pre_process.py script.

$ python pre_process.py

This first measures the true mix loudness levels and calculates loudness ratios with respect to the bass, which are saved to a .csv file. All of the stems are then normalized to -24 LUFS. Next, melspectrograms with a frame size of 1024 and a hop length of the same size are generated from the normalized stems and stored in a pickle file.
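The loudness arithmetic behind these steps can be sketched in a few lines. This is a minimal illustration, not the code in pre_process.py: the function names and the example loudness values are hypothetical, and because the README does not say whether the "loudness ratios" are linear ratios or level differences, the sketch assumes they are differences in LU relative to the bass stem.

```python
TARGET_LUFS = -24.0  # normalization target stated in the README


def gain_to_target(measured_lufs, target_lufs=TARGET_LUFS):
    """Return the linear amplitude gain that moves a stem to target_lufs."""
    gain_db = target_lufs - measured_lufs
    return 10.0 ** (gain_db / 20.0)


def relative_to_bass(stem_lufs):
    """Express each stem's loudness as an offset in LU from the bass stem.

    Assumption: the 'ratio' is a level difference, not a linear ratio.
    """
    bass = stem_lufs["bass"]
    return {name: level - bass for name, level in stem_lufs.items()}


# Hypothetical measurements for one song (not taken from DSD100).
measured = {"bass": -20.0, "drums": -18.0, "other": -23.0, "vocals": -17.0}

offsets = relative_to_bass(measured)
gains = {name: gain_to_target(level) for name, level in measured.items()}
```

Under this reading, the offsets would play the role of training targets, while the gains bring every stem to the same loudness before the melspectrograms are computed.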

During training, the melspectrograms of each subgroup are framed with a frame size of 128 (about 3 seconds of audio) and then stacked depth-wise to produce inputs of size 128x128x4. A single stack of TF-patches of length 128 is shown below for a single song in the dataset.

tf_patches
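The framing and depth-wise stacking described above can be sketched with NumPy. This is an illustration under stated assumptions rather than the code in train_cnn.py: it assumes 128 mel bands (to match the 128x128x4 input), non-overlapping patches, and a 44.1 kHz sample rate; the function name make_patches and the random input are hypothetical.

```python
import numpy as np

PATCH_FRAMES = 128   # frames per patch, from the README
HOP = 1024           # hop length in samples, from the README
SAMPLE_RATE = 44100  # assumed sample rate of the audio


def make_patches(stem_specs, patch_frames=PATCH_FRAMES):
    """Cut per-stem spectrograms into patches and stack the stems depth-wise.

    stem_specs: list of arrays, each shaped (n_mels, n_frames).
    Returns an array shaped (n_patches, n_mels, patch_frames, n_stems).
    Patches are non-overlapping (an assumption); leftover frames are dropped.
    """
    n_frames = min(spec.shape[1] for spec in stem_specs)
    n_patches = n_frames // patch_frames
    usable = n_patches * patch_frames
    # Stack stems along a new last axis: (n_mels, usable, n_stems).
    stacked = np.stack([spec[:, :usable] for spec in stem_specs], axis=-1)
    n_mels, _, n_stems = stacked.shape
    # Split the time axis into (n_patches, patch_frames), then move patches first.
    patches = stacked.reshape(n_mels, n_patches, patch_frames, n_stems)
    return patches.transpose(1, 0, 2, 3)


# Stand-in spectrograms for four stems, 300 frames each (random, not real data).
rng = np.random.default_rng(0)
specs = [rng.random((128, 300)) for _ in range(4)]
patches = make_patches(specs)

# Duration covered by one patch under the assumed sample rate.
patch_seconds = PATCH_FRAMES * HOP / SAMPLE_RATE
```

With a hop of 1024 samples, 128 frames span 131072 samples, which at the assumed 44.1 kHz is just under 3 seconds, consistent with the figure quoted in the README.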

Train

To train the CNN model, run the train_cnn.py script.

$ python train_cnn.py

Owner

  • Name: Christian J. Steinmetz
  • Login: csteinmetz1
  • Kind: user
  • Location: London, UK
  • Company: @aim-qmul

Machine learning for Hi-Fi audio. PhD Researcher at C4DM.

Issues and Pull Requests

Last synced: 10 months ago

All Time
  • Total issues: 1
  • Total pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Total issue authors: 1
  • Total pull request authors: 0
  • Average comments per issue: 1.0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 1
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 1
  • Pull request authors: 0
  • Average comments per issue: 1.0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • forart (1)

Dependencies

requirements.txt pypi
  • librosa =0.5.1
  • matplotlib =2.1.1
  • numpy =1.14.2
  • pandas =0.21.0
  • scipy =1.0.0