complex-cnn-deeplab-v3-with-stft-for-audio-denoising

Paper Name: Complex Convolution Neural Network model (Complex DeepLab v3) on STFT time-varying frequency components for audio denoising

Creating a Complex DeepLab v3 model for audio denoising using an STFT complex mask.

Dataset from: https://datashare.is.ed.ac.uk/handle/10283/2791

https://github.com/athanatos96/complex-cnn-deeplab-v3-with-stft-for-audio-denoising

Science Score: 54.0%

This score indicates how likely this project is to be science-related based on various indicators:

✓ CITATION.cff file: Found CITATION.cff file
✓ codemeta.json file: Found codemeta.json file
✓ .zenodo.json file: Found .zenodo.json file
○ DOI references
✓ Academic publication links: Links to arxiv.org, researchgate.net
○ Academic email domains
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: Low similarity (7.5%) to scientific vocabulary

Keywords

audio-denoising audio-processing convolutional-neural-networks deep-learning deeplabv3 machine-learning pytorch stft

Repository

Basic Info

Host: GitHub
Owner: athanatos96
Language: Jupyter Notebook
Default Branch: main
Homepage: https://www.researchgate.net/publication/366517727_Complex_Convolution_Neural_Network_model_Complex_DeepLab_v3_on_STFT_time-varying_frequency_components_for_audio_denoising
Size: 227 KB

Statistics

Stars: 9
Watchers: 1
Forks: 0
Open Issues: 0
Releases: 0

Created about 3 years ago · Last pushed about 3 years ago

Metadata Files

README.md

Complex Deep-Lab V3

PyTorch implementation of "Complex Convolution Neural Network model (Complex DeepLab v3) on STFT time-varying frequency components for audio denoising" (A. C. Parra, 2022).
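To make the "STFT complex mask" idea concrete, here is a minimal sketch of how a predicted complex mask could be applied to a noisy spectrogram. It uses NumPy instead of PyTorch to stay dependency-light, and it assumes the real and imaginary parts sit in a trailing axis of size 2. The name `apply_complex_mask` and the toy shapes are illustrative and are not taken from the repository.

```python
import numpy as np


def apply_complex_mask(noisy_ri: np.ndarray, mask_ri: np.ndarray) -> np.ndarray:
    """Multiply two complex arrays stored as floats with a trailing (real, imag) axis.

    Hypothetical helper: (a + bi) * (c + di) = (ac - bd) + (ad + bc)i.
    """
    nr, ni = noisy_ri[..., 0], noisy_ri[..., 1]
    mr, mi = mask_ri[..., 0], mask_ri[..., 1]
    real = nr * mr - ni * mi
    imag = nr * mi + ni * mr
    return np.stack([real, imag], axis=-1)


# Toy "spectrogram": 3 frequency bins x 4 time frames (assumed shape).
rng = np.random.default_rng(0)
noisy = rng.standard_normal((3, 4, 2))
mask = rng.standard_normal((3, 4, 2))
est = apply_complex_mask(noisy, mask)

# Cross-check against NumPy's native complex multiplication.
expected = (noisy[..., 0] + 1j * noisy[..., 1]) * (mask[..., 0] + 1j * mask[..., 1])
assert np.allclose(est[..., 0], expected.real)
assert np.allclose(est[..., 1], expected.imag)
```

Because the mask is complex, it can rotate the phase of each time-frequency bin as well as scale its magnitude, which a real-valued mask cannot do.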

Original Code

Original Code from https://github.com/sweetcocoa/DeepComplexUNetPyTorch/

Deep Lab V3

The code was adapted to the DeepLab v3 architecture from "Rethinking Atrous Convolution for Semantic Image Segmentation" (L.-C. Chen et al., 2017).

Reimplementation of DeepLabV3 to work with complex numbers

DeepLabv3 base code: https://github.com/pytorch/vision/blob/0dceac025615a1c2df6ec1675d8f9d7757432a49/torchvision/models/segmentation/deeplabv3.py

FCN head base code: https://github.com/pytorch/vision/blob/0dceac025615a1c2df6ec1675d8f9d7757432a49/torchvision/models/segmentation/fcn.py#L36

Resnet base code: https://github.com/pytorch/vision/blob/0dceac025615a1c2df6ec1675d8f9d7757432a49/torchvision/models/resnet.py#L166

Complex Layers

New functions adapted from https://github.com/wavefrontshaping/complexPyTorch/blob/70a511c1bedc4c7eeba0d571638b35ff0d8347a2/complexPyTorch/complexFunctions.py

The original functions were built for PyTorch's complex dtypes. I changed them to work with float tensors that carry one extra dimension of size 2 (real, imaginary).

New functions and classes:
- ComplexAdaptiveAvgPool2d
- ComplexMaxPool2d
- ComplexReLU
- ComplexDropout
- complex_interpolate
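As an illustration of what operating on "floats with one extra dimension of size 2" can look like, the sketch below shows two such operations in NumPy. It assumes the (real, imaginary) axis is the last one, that the ReLU is applied to the real and imaginary parts independently (the convention used in complexPyTorch), and that dropout keeps or drops both parts of a value together. The functions `complex_relu` and `complex_dropout` are hypothetical stand-ins and may differ from the repository's `ComplexReLU` and `ComplexDropout`.

```python
import numpy as np


def complex_relu(x_ri: np.ndarray) -> np.ndarray:
    """ReLU applied separately to the real and imaginary parts (assumed convention)."""
    return np.maximum(x_ri, 0.0)


def complex_dropout(x_ri: np.ndarray, p: float, rng: np.random.Generator) -> np.ndarray:
    """Drop whole complex values: one keep/drop decision shared by both parts."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    # The mask has no trailing axis, so real and imaginary parts share it.
    keep = rng.random(x_ri.shape[:-1]) >= p
    scale = keep / (1.0 - p)  # rescale survivors, as in standard inverted dropout
    return x_ri * scale[..., None]


# Two complex values: -1 + 2i and 3 - 4i, stored as (real, imag) pairs.
x = np.array([[-1.0, 2.0], [3.0, -4.0]])
activated = complex_relu(x)
dropped = complex_dropout(x, p=0.5, rng=np.random.default_rng(0))
```

Sharing the dropout mask across the last axis matters: dropping only the real or only the imaginary part would change the phase of the surviving value instead of removing it.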

Requirements

See file requirements.txt

Train

Download datasets:
- https://datashare.is.ed.ac.uk/handle/10283/2791

Train:

    python ComplexDeepLabV3/train_dcunet.py \
        --batch_size 2 \
        --train_signal Data/DS_10283_2791/Train/clean_trainset_28spk_wav \
        --train_noise Data/DS_10283_2791/Train/noisy_trainset_28spk_wav \
        --test_signal Data/DS_10283_2791/Test/clean_testset_wav \
        --test_noise Data/DS_10283_2791/Test/noisy_testset_wav \
        --ckpt checkpoints/checkpoint.pth \
        --num_step 300 \
        --validation_interval 150 \
        --complex
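Judging by the directory names in the command, `--train_signal` points at clean recordings and `--train_noise` at the noisy versions of the same utterances. Before training, it can help to check that the two folders line up. The sketch below pairs files by name using only the standard library. It assumes that a clean file and its noisy counterpart share the same filename, and `pair_clean_noisy` is a hypothetical helper, not part of the training script.

```python
from pathlib import Path


def pair_clean_noisy(clean_dir, noisy_dir):
    """Return (clean_path, noisy_path) tuples for .wav files present in both folders.

    Assumption: matching utterances have identical filenames in both directories.
    """
    clean = {p.name: p for p in Path(clean_dir).glob("*.wav")}
    noisy = {p.name: p for p in Path(noisy_dir).glob("*.wav")}
    common = sorted(clean.keys() & noisy.keys())
    return [(clean[name], noisy[name]) for name in common]


# Example call with the paths from the training command (not executed here):
# pairs = pair_clean_noisy(
#     "Data/DS_10283_2791/Train/clean_trainset_28spk_wav",
#     "Data/DS_10283_2791/Train/noisy_trainset_28spk_wav",
# )
# print(len(pairs), "matched utterances")
```

If the number of pairs is smaller than the number of files in either folder, some recordings are missing a counterpart and are worth inspecting before a long training run.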

Owner

Name: Alejandro C Parra
Login: athanatos96
Kind: user
Location: New York

Website: https://www.linkedin.com/in/alejandro-parra-garcia/
Repositories: 3
Profile: https://github.com/athanatos96

Master of Science in Artificial Intelligence | Machine Learning Engineer | Business Administration and Management

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
- family-names: "Parra Garcia"
  given-names: "Alejandro C."
  orcid: "https://orcid.org/0000-0002-9503-1357"
title: "Complex-CNN-DeepLab-v3-with-STFT-for-audio-denoising"
version: 1.0.0
date-released: 2022-12-22
url: "https://github.com/athanatos96/Complex-CNN-DeepLab-v3-with-STFT-for-audio-denoising"

Dependencies

requirements.txt pypi

appdirs =1.4.4=pyh9f0ad1d_0
audioread =2.1.9=py36ha15d459_0
backtrace =0.2.1=pypi_0
blas =1.0=mkl
brotlipy =0.7.0=py36h68aa20f_1001
ca-certificates =2022.12.7=h5b45459_0
certifi =2021.5.30=py36ha15d459_0
cffi =1.14.6=py36h2bbff1b_0
charset-normalizer =2.1.1=pyhd8ed1ab_0
colorama =0.3.7=pypi_0
console_shortcut =0.1.1=4
cryptography =35.0.0=py36hd0de82c_0
cudatoolkit =10.0.130=0
cycler =0.11.0=pyhd8ed1ab_0
decorator =5.1.1=pyhd8ed1ab_0
easydict =1.9=pypi_0
freetype =2.12.1=ha860e81_0
hdf5 =1.8.20=hac2f561_1
icc_rt =2022.1.0=h6049295_2
idna =3.4=pyhd8ed1ab_0
intel-openmp =2022.1.0=h59b6b97_3788
joblib =1.2.0=pyhd8ed1ab_0
jpeg =9e=h2bbff1b_0
kiwisolver =1.3.1=py36he95197e_1
lerc =3.0=hd77b12b_0
libblas =3.8.0=20_mkl
libcblas =3.8.0=20_mkl
libdeflate =1.8=h2bbff1b_5
libflac =1.3.4=h0e60522_0
liblapack =3.8.0=20_mkl
libogg =1.3.4=h8ffe710_1
libopencv =3.4.2=h20b85fd_0
libopus =1.3.1=h8ffe710_1
libpng =1.6.37=h2a8f88b_0
librosa =0.9.2=pyhd8ed1ab_0
libsndfile =1.0.31=h0e60522_1
libtiff =4.4.0=h8a3f274_2
libvorbis =1.3.7=h0e60522_0
llvmlite =0.36.0=py36haecd60e_0
lz4-c =1.9.3=h2bbff1b_1
m2w64-gcc-libgfortran =5.3.0=6
m2w64-gcc-libs =5.3.0=7
m2w64-gcc-libs-core =5.3.0=7
m2w64-gmp =6.1.0=2
m2w64-libwinpthread-git =5.0.0.4634.697f757=2
matplotlib-base =3.3.4=py36h1abdf75_0
mkl =2020.2=256
mkl-service =2.3.0=py36h196d8e1_0
mkl_fft =1.3.0=py36h46781fe_0
mkl_random =1.1.1=py36h47e9c7a_0
msys2-conda-epoch =20160418=1
ninja =1.10.2=haa95532_5
ninja-base =1.10.2=h6d14046_5
numba =0.53.1=py36hd0dfabe_1
numpy =1.19.2=py36hadc3359_0
numpy-base =1.19.2=py36ha3acd2a_0
olefile =0.46=py36_0
opencv =3.4.2=py36h40b0b35_0
openssl =1.1.1q=h8ffe710_0
packaging =21.3=pyhd8ed1ab_0
pandas =0.25.3=py36he350917_0
pesq =0.0.4=pypi_0
pillow =8.3.1=py36h4fa10fc_0
pinkblack =0.0.9=pypi_0
pip =20.0.2=py36_1
pooch =1.6.0=pyhd8ed1ab_0
protobuf =3.19.6=pypi_0
py-opencv =3.4.2=py36hc319ecb_0
pycparser =2.21=pyhd3eb1b0_0
pyopenssl =22.0.0=pyhd8ed1ab_1
pyparsing =3.0.9=pyhd8ed1ab_0
pypesq =1.2.4=pypi_0
pysocks =1.7.1=py36ha15d459_3
pysoundfile =0.11.0=pyhd8ed1ab_0
python =3.6.13=h3758d61_0
python-dateutil =2.8.2=pyhd3eb1b0_0
python_abi =3.6=2_cp36m
pytorch =1.1.0=py3.6_cuda100_cudnn7_1
pytz =2021.3=pyhd3eb1b0_0
requests =2.28.1=pyhd8ed1ab_0
resampy =0.4.2=pyhd8ed1ab_0
scikit-learn =0.24.2=py36h5a2dbc3_1
scipy =1.5.3=py36h27d303f_1
setuptools =49.6.0=py36ha15d459_3
six =1.16.0=pyhd3eb1b0_1
sqlite =3.40.0=h2bbff1b_0
tensorboardx =2.5.1=pypi_0
threadpoolctl =3.1.0=pyh8a188c0_0
tk =8.6.12=h2bbff1b_0
torchaudio-contrib =0.1=pypi_0
torchcontrib =0.0.2=pypi_0
torchvision =0.3.0=py36_cu100_1
tornado =6.1=py36h68aa20f_1
tqdm =4.28.1=pypi_0
urllib3 =1.26.13=pyhd8ed1ab_0
vc =14.2=h21ff451_1
vs2015_runtime =14.27.29016=h5e58377_2
wheel =0.37.1=pyhd3eb1b0_0
win_inet_pton =1.1.0=pyhd8ed1ab_6
wincertstore =0.2=py36h7fe50ca_0
xz =5.2.8=h8cc25b3_0
zlib =1.2.13=h8cc25b3_0
zstd =1.5.2=h19a0ad4_0
