complex-cnn-deeplab-v3-with-stft-for-audio-denoising
Paper Name: Complex Convolution Neural Network model (Complex DeepLab v3) on STFT time-varying frequency components for audio denoising
A Complex DeepLab v3 model for audio denoising using an STFT complex mask.
Dataset from: https://datashare.is.ed.ac.uk/handle/10283/2791
https://github.com/athanatos96/complex-cnn-deeplab-v3-with-stft-for-audio-denoising
Science Score: 54.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ○ DOI references
- ✓ Academic publication links: Links to arxiv.org, researchgate.net
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (7.5%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: athanatos96
- Language: Jupyter Notebook
- Default Branch: main
- Homepage: https://www.researchgate.net/publication/366517727_Complex_Convolution_Neural_Network_model_Complex_DeepLab_v3_on_STFT_time-varying_frequency_components_for_audio_denoising
- Size: 227 KB
Statistics
- Stars: 9
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
Complex Deep-Lab V3
PyTorch implementation of "Complex Convolution Neural Network model (Complex DeepLab v3) on STFT time-varying frequency components for audio denoising" (A. C. Parra, 2022).
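The repository description mentions an STFT complex mask. As a rough illustration of the arithmetic involved, the sketch below multiplies a complex mask with a noisy STFT, with every value stored as a (real, imaginary) pair. It assumes the mask is applied by plain element-wise complex multiplication; the repository may bound or transform the mask differently. The helper names `complex_mul` and `apply_complex_mask` and the sample values are hypothetical.

```python
def complex_mul(a, b):
    """Multiply two complex numbers given as (real, imag) pairs."""
    a_re, a_im = a
    b_re, b_im = b
    # (a_re + i a_im)(b_re + i b_im), expanded with i*i = -1.
    return (a_re * b_re - a_im * b_im, a_re * b_im + a_im * b_re)


def apply_complex_mask(mask, noisy_stft):
    """Element-wise mask * STFT over a [frequency][time] grid of pairs."""
    return [
        [complex_mul(m, x) for m, x in zip(mask_row, stft_row)]
        for mask_row, stft_row in zip(mask, noisy_stft)
    ]


# Hypothetical 1x2 grid: one frequency bin, two time frames.
noisy = [[(3.0, 4.0), (0.0, 2.0)]]
mask = [[(1.0, 2.0), (0.5, 0.0)]]
estimate = apply_complex_mask(mask, noisy)
```

Because the mask is complex, it can change both the magnitude and the phase of each time-frequency bin, which a real-valued mask cannot do.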
Original Code
Original Code from https://github.com/sweetcocoa/DeepComplexUNetPyTorch/
Deep Lab V3
The code was adapted to work with DeepLab v3: Rethinking Atrous Convolution for Semantic Image Segmentation (L-C. Chen et al., 2017).
DeepLab v3 was reimplemented to work with complex numbers, starting from the following torchvision code:
DeepLabv3 base code: https://github.com/pytorch/vision/blob/0dceac025615a1c2df6ec1675d8f9d7757432a49/torchvision/models/segmentation/deeplabv3.py
FCN head base code: https://github.com/pytorch/vision/blob/0dceac025615a1c2df6ec1675d8f9d7757432a49/torchvision/models/segmentation/fcn.py#L36
Resnet base code: https://github.com/pytorch/vision/blob/0dceac025615a1c2df6ec1675d8f9d7757432a49/torchvision/models/resnet.py#L166
Complex Layers
The new functions were adapted from https://github.com/wavefrontshaping/complexPyTorch/blob/70a511c1bedc4c7eeba0d571638b35ff0d8347a2/complexPyTorch/complexFunctions.py
The originals were built to run on PyTorch complex types. I changed them to work with float tensors that carry one extra dimension of size 2 (real, imaginary).
New functions and classes:
- ComplexAdaptiveAvgPool2d
- ComplexMaxPool2d
- ComplexReLU
- ComplexDropout
- complex_interpolate
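To make the "extra dimension of size 2" layout concrete, here is a minimal sketch of how a real-valued PyTorch op can be applied to the real and imaginary parts of such a tensor. It is not the repository's code: it assumes the (real, imaginary) pair sits on the last axis, and the helpers `apply_per_part`, `complex_relu`, and `complex_adaptive_avg_pool2d` are hypothetical stand-ins for the classes listed above.

```python
import torch
import torch.nn.functional as F


def apply_per_part(fn, x):
    """Apply a real-valued op to the real and imaginary parts separately.

    x is a float tensor whose last axis has size 2: (..., 2) = (real, imag).
    """
    real, imag = x[..., 0], x[..., 1]
    return torch.stack((fn(real), fn(imag)), dim=-1)


def complex_relu(x):
    # ReLU on each part independently, as complexPyTorch's complex_relu does.
    return apply_per_part(torch.relu, x)


def complex_adaptive_avg_pool2d(x, output_size):
    # Averaging is linear, so pooling each part separately equals
    # averaging the complex values themselves.
    return apply_per_part(lambda t: F.adaptive_avg_pool2d(t, output_size), x)


# Hypothetical input: batch 1, 1 channel, 2x2 map, last axis = (real, imag).
real = torch.tensor([[1.0, -2.0], [3.0, -4.0]])
imag = torch.tensor([[-1.0, 2.0], [-3.0, 4.0]])
x = torch.stack((real, imag), dim=-1)[None, None]

activated = complex_relu(x)                      # shape (1, 1, 2, 2, 2)
pooled = complex_adaptive_avg_pool2d(x, (1, 1))  # shape (1, 1, 1, 1, 2)
```

Splitting per part is exact for linear operations such as average pooling. For non-linear ones such as max pooling, a per-part version is only one possible definition, so the repository's ComplexMaxPool2d may behave differently.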
Requirements
See file requirements.txt
Train
Download the dataset: https://datashare.is.ed.ac.uk/handle/10283/2791
Run training:
bash
python ComplexDeepLabV3/train_dcunet.py \
--batch_size 2 \
--train_signal Data/DS_10283_2791/Train/clean_trainset_28spk_wav \
--train_noise Data/DS_10283_2791/Train/noisy_trainset_28spk_wav \
--test_signal Data/DS_10283_2791/Test/clean_testset_wav \
--test_noise Data/DS_10283_2791/Test/noisy_testset_wav \
--ckpt checkpoints/checkpoint.pth \
--num_step 300 \
--validation_interval 150 \
--complex
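Before launching a run, it can help to confirm that the clean and noisy directories passed to the command line up. The sketch below pairs files by name; it assumes each clean recording and its noisy counterpart share the same file name across the two directories, which should be checked against the downloaded data. `pair_clean_noisy` is a hypothetical helper, not part of the repository.

```python
from pathlib import Path


def pair_clean_noisy(clean_dir, noisy_dir):
    """Return (clean_path, noisy_path) pairs for .wav files present in both dirs.

    Assumption: matching recordings use identical file names in both folders.
    """
    clean = {p.name: p for p in Path(clean_dir).glob("*.wav")}
    noisy = {p.name: p for p in Path(noisy_dir).glob("*.wav")}
    shared = sorted(clean.keys() & noisy.keys())
    return [(clean[name], noisy[name]) for name in shared]
```

For example, calling `pair_clean_noisy("Data/DS_10283_2791/Train/clean_trainset_28spk_wav", "Data/DS_10283_2791/Train/noisy_trainset_28spk_wav")` and checking that the result is non-empty catches a wrong path before training starts.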
Owner
- Name: Alejandro C Parra
- Login: athanatos96
- Kind: user
- Location: New York
- Website: https://www.linkedin.com/in/alejandro-parra-garcia/
- Repositories: 3
- Profile: https://github.com/athanatos96
Master of Science in Artificial Intelligence | Machine Learning Engineer | Business Administration and Management
Citation (CITATION.cff)
cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
  - family-names: "Parra Garcia"
    given-names: "Alejandro C."
    orcid: "https://orcid.org/0000-0002-9503-1357"
title: "Complex-CNN-DeepLab-v3-with-STFT-for-audio-denoising"
version: 1.0.0
date-released: 2022-12-22
url: "https://github.com/athanatos96/Complex-CNN-DeepLab-v3-with-STFT-for-audio-denoising"
Dependencies
- appdirs =1.4.4=pyh9f0ad1d_0
- audioread =2.1.9=py36ha15d459_0
- backtrace =0.2.1=pypi_0
- blas =1.0=mkl
- brotlipy =0.7.0=py36h68aa20f_1001
- ca-certificates =2022.12.7=h5b45459_0
- certifi =2021.5.30=py36ha15d459_0
- cffi =1.14.6=py36h2bbff1b_0
- charset-normalizer =2.1.1=pyhd8ed1ab_0
- colorama =0.3.7=pypi_0
- console_shortcut =0.1.1=4
- cryptography =35.0.0=py36hd0de82c_0
- cudatoolkit =10.0.130=0
- cycler =0.11.0=pyhd8ed1ab_0
- decorator =5.1.1=pyhd8ed1ab_0
- easydict =1.9=pypi_0
- freetype =2.12.1=ha860e81_0
- hdf5 =1.8.20=hac2f561_1
- icc_rt =2022.1.0=h6049295_2
- idna =3.4=pyhd8ed1ab_0
- intel-openmp =2022.1.0=h59b6b97_3788
- joblib =1.2.0=pyhd8ed1ab_0
- jpeg =9e=h2bbff1b_0
- kiwisolver =1.3.1=py36he95197e_1
- lerc =3.0=hd77b12b_0
- libblas =3.8.0=20_mkl
- libcblas =3.8.0=20_mkl
- libdeflate =1.8=h2bbff1b_5
- libflac =1.3.4=h0e60522_0
- liblapack =3.8.0=20_mkl
- libogg =1.3.4=h8ffe710_1
- libopencv =3.4.2=h20b85fd_0
- libopus =1.3.1=h8ffe710_1
- libpng =1.6.37=h2a8f88b_0
- librosa =0.9.2=pyhd8ed1ab_0
- libsndfile =1.0.31=h0e60522_1
- libtiff =4.4.0=h8a3f274_2
- libvorbis =1.3.7=h0e60522_0
- llvmlite =0.36.0=py36haecd60e_0
- lz4-c =1.9.3=h2bbff1b_1
- m2w64-gcc-libgfortran =5.3.0=6
- m2w64-gcc-libs =5.3.0=7
- m2w64-gcc-libs-core =5.3.0=7
- m2w64-gmp =6.1.0=2
- m2w64-libwinpthread-git =5.0.0.4634.697f757=2
- matplotlib-base =3.3.4=py36h1abdf75_0
- mkl =2020.2=256
- mkl-service =2.3.0=py36h196d8e1_0
- mkl_fft =1.3.0=py36h46781fe_0
- mkl_random =1.1.1=py36h47e9c7a_0
- msys2-conda-epoch =20160418=1
- ninja =1.10.2=haa95532_5
- ninja-base =1.10.2=h6d14046_5
- numba =0.53.1=py36hd0dfabe_1
- numpy =1.19.2=py36hadc3359_0
- numpy-base =1.19.2=py36ha3acd2a_0
- olefile =0.46=py36_0
- opencv =3.4.2=py36h40b0b35_0
- openssl =1.1.1q=h8ffe710_0
- packaging =21.3=pyhd8ed1ab_0
- pandas =0.25.3=py36he350917_0
- pesq =0.0.4=pypi_0
- pillow =8.3.1=py36h4fa10fc_0
- pinkblack =0.0.9=pypi_0
- pip =20.0.2=py36_1
- pooch =1.6.0=pyhd8ed1ab_0
- protobuf =3.19.6=pypi_0
- py-opencv =3.4.2=py36hc319ecb_0
- pycparser =2.21=pyhd3eb1b0_0
- pyopenssl =22.0.0=pyhd8ed1ab_1
- pyparsing =3.0.9=pyhd8ed1ab_0
- pypesq =1.2.4=pypi_0
- pysocks =1.7.1=py36ha15d459_3
- pysoundfile =0.11.0=pyhd8ed1ab_0
- python =3.6.13=h3758d61_0
- python-dateutil =2.8.2=pyhd3eb1b0_0
- python_abi =3.6=2_cp36m
- pytorch =1.1.0=py3.6_cuda100_cudnn7_1
- pytz =2021.3=pyhd3eb1b0_0
- requests =2.28.1=pyhd8ed1ab_0
- resampy =0.4.2=pyhd8ed1ab_0
- scikit-learn =0.24.2=py36h5a2dbc3_1
- scipy =1.5.3=py36h27d303f_1
- setuptools =49.6.0=py36ha15d459_3
- six =1.16.0=pyhd3eb1b0_1
- sqlite =3.40.0=h2bbff1b_0
- tensorboardx =2.5.1=pypi_0
- threadpoolctl =3.1.0=pyh8a188c0_0
- tk =8.6.12=h2bbff1b_0
- torchaudio-contrib =0.1=pypi_0
- torchcontrib =0.0.2=pypi_0
- torchvision =0.3.0=py36_cu100_1
- tornado =6.1=py36h68aa20f_1
- tqdm =4.28.1=pypi_0
- urllib3 =1.26.13=pyhd8ed1ab_0
- vc =14.2=h21ff451_1
- vs2015_runtime =14.27.29016=h5e58377_2
- wheel =0.37.1=pyhd3eb1b0_0
- win_inet_pton =1.1.0=pyhd8ed1ab_6
- wincertstore =0.2=py36h7fe50ca_0
- xz =5.2.8=h8cc25b3_0
- zlib =1.2.13=h8cc25b3_0
- zstd =1.5.2=h19a0ad4_0