complex-cnn-deeplab-v3-with-stft-for-audio-denoising

Paper: Complex Convolution Neural Network model (Complex DeepLab v3) on STFT time-varying frequency components for audio denoising.

This repository builds a Complex DeepLab v3 model for audio denoising using an STFT complex mask.

Dataset: https://datashare.is.ed.ac.uk/handle/10283/2791

https://github.com/athanatos96/complex-cnn-deeplab-v3-with-stft-for-audio-denoising

Science Score: 54.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: arxiv.org, researchgate.net
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (7.5%) to scientific vocabulary

Keywords

audio-denoising audio-processing convolutional-neural-networks deep-learning deeplabv3 machine-learning pytorch stft

Repository


Basic Info
Statistics
  • Stars: 9
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created about 3 years ago · Last pushed about 3 years ago
Metadata Files
Readme Citation

README.md

Complex Deep-Lab V3

PyTorch implementation of the Complex Convolution Neural Network model (Complex DeepLab v3) on STFT time-varying frequency components for audio denoising (A. C. Parra, 2022).
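To make the idea of an STFT complex mask concrete, here is a minimal sketch in plain Python. It assumes the usual masking formulation, in which each noisy time-frequency bin is multiplied by a complex mask value; the function name `apply_complex_mask` and the sample values are hypothetical and are not taken from this repository, which operates on PyTorch tensors.

```python
# Illustrative only: Python's built-in complex numbers stand in for STFT bins.

def apply_complex_mask(noisy_bins, mask):
    """Multiply each noisy STFT bin by the matching complex mask value."""
    if len(noisy_bins) != len(mask):
        raise ValueError("mask and spectrogram must have the same number of bins")
    return [m * x for m, x in zip(mask, noisy_bins)]


# Hypothetical values for three time-frequency bins.
noisy = [1 + 1j, 0.5 - 2j, -3 + 0j]
# A complex mask can scale a bin, rotate its phase, or suppress it entirely.
mask = [0.5 + 0j, 0 + 1j, 0 + 0j]

denoised = apply_complex_mask(noisy, mask)
```

Unlike a real-valued magnitude mask, a complex mask can change the phase of a bin as well as its magnitude, which is the motivation for predicting it with a complex-valued network.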

Original Code

The original code is from https://github.com/sweetcocoa/DeepComplexUNetPyTorch/

Deep Lab V3

The code was adapted to work with DeepLab v3, as described in Rethinking Atrous Convolution for Semantic Image Segmentation (L.-C. Chen et al., 2017).

Reimplementation of DeepLabV3 to work with complex numbers

DeepLabv3 base code: https://github.com/pytorch/vision/blob/0dceac025615a1c2df6ec1675d8f9d7757432a49/torchvision/models/segmentation/deeplabv3.py

FCN head base code: https://github.com/pytorch/vision/blob/0dceac025615a1c2df6ec1675d8f9d7757432a49/torchvision/models/segmentation/fcn.py#L36

Resnet base code: https://github.com/pytorch/vision/blob/0dceac025615a1c2df6ec1675d8f9d7757432a49/torchvision/models/resnet.py#L166

Complex Layers

New functions were adapted from https://github.com/wavefrontshaping/complexPyTorch/blob/70a511c1bedc4c7eeba0d571638b35ff0d8347a2/complexPyTorch/complexFunctions.py

The originals were built to run with PyTorch's complex types. I changed them to work with float tensors that carry one extra dimension of size 2 (real, imaginary).

New functions and classes:
  • ComplexAdaptiveAvgPool2d
  • ComplexMaxPool2d
  • ComplexReLU
  • ComplexDropout
  • complex_interpolate
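The split (real, imaginary) representation can be sketched without PyTorch. The example below uses plain Python lists in place of tensors and assumes the layers follow the complexPyTorch convention of applying the real-valued operation to each part independently. The helper names `complex_relu` and `complex_mul` are illustrative and are not the repository's own API.

```python
# Each complex value is stored as a [real, imag] pair of floats,
# mirroring a float tensor whose last dimension has size 2.

def complex_relu(pairs):
    """Apply ReLU to the real and imaginary parts independently."""
    return [[max(re, 0.0), max(im, 0.0)] for re, im in pairs]


def complex_mul(a, b):
    """Multiply two [real, imag] pairs: (a_re + i*a_im) * (b_re + i*b_im)."""
    a_re, a_im = a
    b_re, b_im = b
    return [a_re * b_re - a_im * b_im, a_re * b_im + a_im * b_re]


activations = complex_relu([[1.0, -2.0], [-0.5, 3.0]])
product = complex_mul([1.0, 2.0], [3.0, -1.0])
```

Storing the two parts in an extra dimension lets every operation run on ordinary float tensors, so it does not depend on complex dtype support in the installed PyTorch version.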

Requirements

See file requirements.txt
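The pins listed under Dependencies use the `name=version=build` form, which looks like a conda export rather than a pip requirements file, and several entries (for example `vc` and the `m2w64-*` packages) appear to be Windows-specific builds. As a sketch for readers on other platforms, the hypothetical helper below strips the build strings and produces pip-style `name==version` pins. It is not part of the repository, and it assumes one pin per line.

```python
def parse_conda_pin(line):
    """Split a 'name=version=build' pin (spaces tolerated) into (name, version)."""
    parts = [part.strip() for part in line.split("=")]
    if len(parts) < 2 or not parts[0] or not parts[1]:
        raise ValueError(f"not a conda-style pin: {line!r}")
    return parts[0], parts[1]


def to_pip_pins(lines):
    """Convert conda-style pins to 'name==version', skipping blanks and comments."""
    pins = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, version = parse_conda_pin(line)
        pins.append(f"{name}=={version}")
    return pins


example = to_pip_pins(["# exported environment", "", "tqdm =4.28.1=pypi_0"])
```

The output still needs manual review: conda-only packages such as `blas` or `cudatoolkit` have no direct PyPI equivalent, and the conda package `pytorch` is published on PyPI as `torch`.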

Train

Download the dataset:
  • https://datashare.is.ed.ac.uk/handle/10283/2791

Train:

    python ComplexDeepLabV3/train_dcunet.py \
        --batch_size 2 \
        --train_signal Data/DS_10283_2791/Train/clean_trainset_28spk_wav \
        --train_noise Data/DS_10283_2791/Train/noisy_trainset_28spk_wav \
        --test_signal Data/DS_10283_2791/Test/clean_testset_wav \
        --test_noise Data/DS_10283_2791/Test/noisy_testset_wav \
        --ckpt checkpoints/checkpoint.pth \
        --num_step 300 \
        --validation_interval 150 \
        --complex
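The training command above expects four dataset directories under `Data/DS_10283_2791`. A small pre-flight check can catch a misplaced download before a run starts. The sketch below is a hypothetical helper, not part of the repository; the directory names are copied from the command, and everything else is an assumption.

```python
from pathlib import Path

# Sub-directories referenced by the training command, relative to the dataset root.
EXPECTED_DIRS = [
    "Train/clean_trainset_28spk_wav",
    "Train/noisy_trainset_28spk_wav",
    "Test/clean_testset_wav",
    "Test/noisy_testset_wav",
]


def missing_dataset_dirs(root):
    """Return the expected sub-directories that do not exist under root."""
    root = Path(root)
    return [name for name in EXPECTED_DIRS if not (root / name).is_dir()]


if __name__ == "__main__":
    missing = missing_dataset_dirs("Data/DS_10283_2791")
    if missing:
        print("Missing dataset directories:", ", ".join(missing))
    else:
        print("Dataset layout looks complete.")
```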

Owner

  • Name: Alejandro C Parra
  • Login: athanatos96
  • Kind: user
  • Location: New York

Master of Science in Artificial Intelligence | Machine Learning Engineer | Business Administration and Management

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
- family-names: "Parra Garcia"
  given-names: "Alejandro C."
  orcid: "https://orcid.org/0000-0002-9503-1357"
title: "Complex-CNN-DeepLab-v3-with-STFT-for-audio-denoising"
version: 1.0.0
date-released: 2022-12-22
url: "https://github.com/athanatos96/Complex-CNN-DeepLab-v3-with-STFT-for-audio-denoising"

Dependencies

requirements.txt pypi
  • appdirs =1.4.4=pyh9f0ad1d_0
  • audioread =2.1.9=py36ha15d459_0
  • backtrace =0.2.1=pypi_0
  • blas =1.0=mkl
  • brotlipy =0.7.0=py36h68aa20f_1001
  • ca-certificates =2022.12.7=h5b45459_0
  • certifi =2021.5.30=py36ha15d459_0
  • cffi =1.14.6=py36h2bbff1b_0
  • charset-normalizer =2.1.1=pyhd8ed1ab_0
  • colorama =0.3.7=pypi_0
  • console_shortcut =0.1.1=4
  • cryptography =35.0.0=py36hd0de82c_0
  • cudatoolkit =10.0.130=0
  • cycler =0.11.0=pyhd8ed1ab_0
  • decorator =5.1.1=pyhd8ed1ab_0
  • easydict =1.9=pypi_0
  • freetype =2.12.1=ha860e81_0
  • hdf5 =1.8.20=hac2f561_1
  • icc_rt =2022.1.0=h6049295_2
  • idna =3.4=pyhd8ed1ab_0
  • intel-openmp =2022.1.0=h59b6b97_3788
  • joblib =1.2.0=pyhd8ed1ab_0
  • jpeg =9e=h2bbff1b_0
  • kiwisolver =1.3.1=py36he95197e_1
  • lerc =3.0=hd77b12b_0
  • libblas =3.8.0=20_mkl
  • libcblas =3.8.0=20_mkl
  • libdeflate =1.8=h2bbff1b_5
  • libflac =1.3.4=h0e60522_0
  • liblapack =3.8.0=20_mkl
  • libogg =1.3.4=h8ffe710_1
  • libopencv =3.4.2=h20b85fd_0
  • libopus =1.3.1=h8ffe710_1
  • libpng =1.6.37=h2a8f88b_0
  • librosa =0.9.2=pyhd8ed1ab_0
  • libsndfile =1.0.31=h0e60522_1
  • libtiff =4.4.0=h8a3f274_2
  • libvorbis =1.3.7=h0e60522_0
  • llvmlite =0.36.0=py36haecd60e_0
  • lz4-c =1.9.3=h2bbff1b_1
  • m2w64-gcc-libgfortran =5.3.0=6
  • m2w64-gcc-libs =5.3.0=7
  • m2w64-gcc-libs-core =5.3.0=7
  • m2w64-gmp =6.1.0=2
  • m2w64-libwinpthread-git =5.0.0.4634.697f757=2
  • matplotlib-base =3.3.4=py36h1abdf75_0
  • mkl =2020.2=256
  • mkl-service =2.3.0=py36h196d8e1_0
  • mkl_fft =1.3.0=py36h46781fe_0
  • mkl_random =1.1.1=py36h47e9c7a_0
  • msys2-conda-epoch =20160418=1
  • ninja =1.10.2=haa95532_5
  • ninja-base =1.10.2=h6d14046_5
  • numba =0.53.1=py36hd0dfabe_1
  • numpy =1.19.2=py36hadc3359_0
  • numpy-base =1.19.2=py36ha3acd2a_0
  • olefile =0.46=py36_0
  • opencv =3.4.2=py36h40b0b35_0
  • openssl =1.1.1q=h8ffe710_0
  • packaging =21.3=pyhd8ed1ab_0
  • pandas =0.25.3=py36he350917_0
  • pesq =0.0.4=pypi_0
  • pillow =8.3.1=py36h4fa10fc_0
  • pinkblack =0.0.9=pypi_0
  • pip =20.0.2=py36_1
  • pooch =1.6.0=pyhd8ed1ab_0
  • protobuf =3.19.6=pypi_0
  • py-opencv =3.4.2=py36hc319ecb_0
  • pycparser =2.21=pyhd3eb1b0_0
  • pyopenssl =22.0.0=pyhd8ed1ab_1
  • pyparsing =3.0.9=pyhd8ed1ab_0
  • pypesq =1.2.4=pypi_0
  • pysocks =1.7.1=py36ha15d459_3
  • pysoundfile =0.11.0=pyhd8ed1ab_0
  • python =3.6.13=h3758d61_0
  • python-dateutil =2.8.2=pyhd3eb1b0_0
  • python_abi =3.6=2_cp36m
  • pytorch =1.1.0=py3.6_cuda100_cudnn7_1
  • pytz =2021.3=pyhd3eb1b0_0
  • requests =2.28.1=pyhd8ed1ab_0
  • resampy =0.4.2=pyhd8ed1ab_0
  • scikit-learn =0.24.2=py36h5a2dbc3_1
  • scipy =1.5.3=py36h27d303f_1
  • setuptools =49.6.0=py36ha15d459_3
  • six =1.16.0=pyhd3eb1b0_1
  • sqlite =3.40.0=h2bbff1b_0
  • tensorboardx =2.5.1=pypi_0
  • threadpoolctl =3.1.0=pyh8a188c0_0
  • tk =8.6.12=h2bbff1b_0
  • torchaudio-contrib =0.1=pypi_0
  • torchcontrib =0.0.2=pypi_0
  • torchvision =0.3.0=py36_cu100_1
  • tornado =6.1=py36h68aa20f_1
  • tqdm =4.28.1=pypi_0
  • urllib3 =1.26.13=pyhd8ed1ab_0
  • vc =14.2=h21ff451_1
  • vs2015_runtime =14.27.29016=h5e58377_2
  • wheel =0.37.1=pyhd3eb1b0_0
  • win_inet_pton =1.1.0=pyhd8ed1ab_6
  • wincertstore =0.2=py36h7fe50ca_0
  • xz =5.2.8=h8cc25b3_0
  • zlib =1.2.13=h8cc25b3_0
  • zstd =1.5.2=h19a0ad4_0