Recent Releases of Software Design and User Interface of ESPnet-SE++

Software Design and User Interface of ESPnet-SE++ - ESPnet version 202506

New Features

  • [New Features][ESPnet2][ESPnet3][CI][size:XXL][lgtm] [espnet3-3] Add trainer and model #6172 by @Masao-Someki
  • [New Features][ESPnet3][CI][size:XXL][lgtm] [espnet3-1] Add Data Organizer #6167 by @Masao-Someki
  • [New Features][ESPnet2][size:XL] LID-1: Training and task setup #6155 by @Qingzheng-Wang
  • [New Features][ESPnet2][SID][size:XL] Update SPK recipe for CN-celeb #6154 by @holvan
  • [New Features][ESPnet2][SLU] Add code for training turn taking prediction model #5948 by @siddhu001

Recipe

  • [Recipe][ESPnet2][size:XXL] S2T Recipe for IPAPack++: Data Preparation #6169 by @chinjouli
  • [Recipe][ESPnet2][size:XL] S2T Recipe for IPAPack++: main recipe #6168 by @chinjouli
  • [Recipe][ESPnet2][Codec] add: complete codec1 recipe for AudioSet and musdb18 #6068 by @whr-a
  • [Recipe][ESPnet2][ASR] Additional results for the discrete ASR challenge #6067 by @juice500ml
  • [Recipe][ESPnet2][Installation][SE] Add implementations of USES2 speech enhancement models #5761 by @Emrys365

Bugfix

  • [Bugfix][ESPnet2][size:XS] Fix FutureWarning torch.cuda.amp.autocast(args...) is deprecated #6190 by @KanTakahiro
  • [Bugfix][ESPnet2][ESPnet1] Resolve logger warnings #6117 by @emmanuel-ferdman
  • [Bugfix][ESPnet2] Fix for issue #6112 Legacy torch tensor constructor causes issue when… #6114 by @advaitvd

Documentation

  • [Documentation][ESPnet1][size:S] docs: clarify CBHG encoder vs post‑net roles in Tacotron 1 #6188 by @ZhuoyanTao
  • [Documentation][ESPnet3][Docker][CI][size:L] Add devcontainer change from Espnet3 #6145 by @sw005320
  • [Documentation][CI][size:M] Update PULL_REQUEST_TEMPLATE.md #6144 by @sw005320
  • [Documentation][CI][size:M] Update document to add tutorials + more easy connection to installation #6143 by @juice500ml
  • [Documentation][ESPnet3][Docker][size:L][lgtm] Espnet3/devcontainer #6141 by @Masao-Someki
  • [Documentation][Installation] Update Makefile #6124 by @sw005320

Refactoring

  • [Refactoring][ESPnet2][size:L] Refactor ACESinger's audio segmentation #6151 by @Arllan-lanliu
  • [Refactoring][ESPnet2][ESPnet1][CI][size:L][lgtm] Flake8 CI Fixes #6140 by @Fhrozen

Others

  • [Others][CI][size:S][lgtm] Workaround for shellcheck v0.11.0 #6197 by @Masao-Someki
  • [Others][Installation][size:XS] Update transformers installation #6191 by @Fhrozen
  • [Others][ESPnet3][CI][size:L] [espnet3-2] Add Config Loading script #6171 by @Masao-Someki
  • [Others][ESPnet2][ESPnet1][ESPnetEZ][Installation][size:L] [espnet3] Format files #6164 by @Masao-Someki
  • [Others][ESPnet2][SE] Update BSRNN implementations to support more flexible band-split schemes #6123 by @Emrys365
  • [Others][ESPnet2][Music] [SVS1] SingingGenerate and VISinger Inference Fix #6113 by @HANJionghao
  • [Others][CI] FIX CI test_import #6111 by @Fhrozen
  • [Others][ESPnet2] [Recipe] Create inference recipe for non-native English ASR benchmark (ALLSSTAR) #6110 by @chenehk
  • [Others][Docker][Installation][CI] Torch Version Update #6095 by @Fhrozen
  • [Others][ESPnet2][ASR] Add explicit typecheck for warning msg #6082 by @ftshijt
  • [Others][ESPnet2][ESPnet1][SSL][size:XL] SSL Fine-tuning PR #6069 by @wanchichen

New Contributors

  • @Arllan-lanliu made their first contribution in https://github.com/espnet/espnet/pull/6090
  • @chinjouli made their first contribution in https://github.com/espnet/espnet/pull/6109
  • @chenehk made their first contribution in https://github.com/espnet/espnet/pull/6110
  • @advaitvd made their first contribution in https://github.com/espnet/espnet/pull/6114
  • @whr-a made their first contribution in https://github.com/espnet/espnet/pull/6068
  • @holvan made their first contribution in https://github.com/espnet/espnet/pull/6126
  • @Qingzheng-Wang made their first contribution in https://github.com/espnet/espnet/pull/6155
  • @ZhuoyanTao made their first contribution in https://github.com/espnet/espnet/pull/6188
  • @KanTakahiro made their first contribution in https://github.com/espnet/espnet/pull/6190

Acknowledgements

Special thanks to @Arllan-lanliu, @Emrys365, @Fhrozen, @HANJionghao, @KanTakahiro, @Masao-Someki, @Qingzheng-Wang, @ZhuoyanTao, @advaitvd, @chenehk, @chinjouli, @emmanuel-ferdman, @ftshijt, @holvan, @juice500ml, @siddhu001, @sw005320, @wanchichen, @whr-a.

Full Changelog: https://github.com/espnet/espnet/compare/v.202503...v.202506

Scientific Software - Peer-reviewed - Python
Published by Fhrozen 5 months ago

Software Design and User Interface of ESPnet-SE++ - ESPnet version 202503

New Features

  • [New Features][ESPnet2] Add Hugging Face Front End #5913 by @taiqihe

Enhancement

  • [Enhancement][ESPnet2][ESPnet1][OWSM] Improving efficiency of large-scale training #6024 by @pyf98
  • [Enhancement][ESPnet2][Codec] Update scoring config to support WER/CER information with VERSA #6001 by @ftshijt
  • [Enhancement][ESPnet1] Add Scaled Dot Product Attention (SDPA) from PyTorch #5994 by @pyf98
  • [Enhancement][ESPnet2][ESPnet1][Installation] Support PyTorch Lightning Trainer in ESPnet2 #5954 by @pyf98

Recipe

  • [Recipe][ESPnet2][ASR] cmu_kids #6017 by @wangpuup
  • [Recipe][ESPnet2][ASR] EDACC dataset automatic speech recognition #5996 by @uwanny
  • [Recipe][ESPnet2][ASR] ml-superb 2024 recipe #5989 by @wanchichen
  • [Recipe][ESPnet2] Clotho_v2 Audio Captioning (DCASE 2023 implementation) #5967 by @Shikhar-S

Bugfix

  • [Bugfix][Installation] Downgrade Transformers version #6071 by @Fhrozen
  • [Bugfix][ESPnet2] Docs Fix #6065 by @Fhrozen
  • [Bugfix][ESPnet2][ST] A quick fix for type error when dealing with multi-decoder (ST) #6064 by @ftshijt
  • [Bugfix][ESPnet2][SID] fixed few typos on egs2/spk template #6060 by @yigitcatak
  • [Bugfix][ESPnet2] Bugfix #6057 #6058 by @Masao-Someki
  • [Bugfix][ESPnet2][SID] fix some minor errors in SID recipe #6045 by @shimhz
  • [Bugfix][ESPnet2] Fix the deprecated amp interface #6036 by @ftshijt
  • [Bugfix][ESPnet2] Add explicit weights_only=False for checkpoint loading #6035 by @ftshijt
  • [Bugfix][Installation] Fix boost URL #6034 by @sw005320
  • [Bugfix][Installation] Fix minor bug in Makefile #6031 by @juice500ml
  • [Bugfix][ESPnet2] Logging bugfix, skip import #6023 by @Shikhar-S
  • [Bugfix][ESPnet2][OWSM] Fix minor bug in OWSM-CTC preprocessor #6005 by @pyf98
  • [Bugfix][ESPnet2][ASR] Minor formatting fixes in mlsuperb 2 recipe #6003 by @wanchichen

Documentation

  • [Documentation][ESPnet2][CI] [Doc] Update parser on lightning_train #6020 by @Fhrozen

Others

  • [Others][Installation] Transformers version check #6076 by @Fhrozen
  • [Others][ESPnet2][ESPnet1] New SSL Recipe #6053 by @wanchichen
  • [Others][Installation] Update tools/README.md #6030 by @popcornell
  • [Others][ESPnet2][OWSM] doc: update OWSM data preparation instructions #6026 by @kalvinchang
  • [Others][ESPnet2][OWSM] fix: OWSM v3.1 - remove flash attention args #6025 by @kalvinchang
  • [Others][ESPnet2][SED] BEATs Tokenizer Inference #6008 by @Shikhar-S
  • [Others][ESPnet2][ESPnet1] Implement unified batch decode interface for OWSM-CTC #6007 by @pyf98
  • [Others][ESPnet2][TTS] [feature]finish versa eval in TTS recipe #6002 by @Whale-Dolphin
  • [Others][ESPnet2][ESPnet1][Installation][CI][SED] Classification Task and AudioSet-20K #5998 by @Shikhar-S
  • [Others][ESPnet2][ESPnet1][Installation][CI] remove gtn in setup.py #5982 by @sw005320
  • [Others][ESPnet2][ESPnet1][SED] ESC-50 classification with BEATs #5977 by @Shikhar-S
  • [Others][ESPnet2][TTS][ASR][SLU] Spoken dialogue systems demo recipe #5975 by @siddhu001
  • [Others][ESPnet2][SE] fix: gradient truncation bug in pit_solver.py #5974 by @YuzhuWang-code

Acknowledgements

Special thanks to @Fhrozen, @Masao-Someki, @Shikhar-S, @Whale-Dolphin, @YuzhuWang-code, @ftshijt, @juice500ml, @kalvinchang, @popcornell, @pyf98, @shimhz, @siddhu001, @sw005320, @taiqihe, @uwanny, @wanchichen, @wangpuup, @yigitcatak.

Published by Fhrozen 9 months ago

Software Design and User Interface of ESPnet-SE++ - ESPnet version 202412

New Features

  • [New Features][ESPnet2][Codec] Add HiFiCodec model #5898 by @RayYuki

Enhancement

  • [Enhancement][ESPnetEZ] Add missing functionalities for espnetez #5890 by @Masao-Someki

Recipe

  • [Recipe][ESPnet2][ASR] My Science Tutor (MyST) Children's Conversational Speech Corpus #5964 by @eric102004
  • [Recipe][ESPnet2] Feature/improve is24 asr2 #5938 by @juice500ml
  • [Recipe][ESPnet2][ASR] Add asr1 recipe for libriheavy_small #5932 by @Miamoto
  • [Recipe][ESPnet2][SID] Add RATS dataset for SV task #5840 by @shimhz

Bugfix

  • [Bugfix][ESPnet2][Diarization] [Bugfix] fix keyword argument error in stage 7 of diar.sh #5969 by @eric102004
  • [Bugfix][ESPnetEZ] Bug fixed for #5949 #5950 by @Masao-Someki
  • [Bugfix][ESPnet2][ASR] removed "continue" statement from the for loop in run_mono.sh #5946 by @Trikaldarshi
  • [Bugfix][ESPnet2] Add SWBD text processing fix #5941 by @siddhu001
  • [Bugfix][ESPnet2][ESPnet1] Training code patches #5931 by @wanchichen

Documentation

  • [Documentation] Fix bug in document that overflows the page #5940 by @juice500ml
  • [Documentation] Update CI reference #5939 by @emmanuel-ferdman
  • [Documentation] fix: collcate_fn -> collate_fn #5925 by @kalvinchang
  • [Documentation][Docker][Installation][CI] Migration from Anaconda to conda-forge #5924 by @yoshipon

Others

  • [Others][ESPnet2][Codec] Fix versa interface #5951 by @ftshijt
  • [Others][ESPnet2][ESPnet1] Add OWSM-CTC #5933 by @pyf98
  • [Others][ESPnet2] Recipe/ogi kids speech #5916 by @anyuyay

Acknowledgements

Special thanks to @Masao-Someki, @Miamoto, @RayYuki, @Trikaldarshi, @anyuyay, @emmanuel-ferdman, @eric102004, @ftshijt, @juice500ml, @kalvinchang, @pyf98, @shimhz, @siddhu001, @wanchichen, @yoshipon.

Published by Fhrozen about 1 year ago

Software Design and User Interface of ESPnet-SE++ - ESPnet version 202409

New Features

  • [New Features][ESPnet2][TTS][Codec] Support Codec feature for TTS2 task #5857 by @wyh2000
  • [New Features][ESPnet2][Codec] Codec downstream task support: TTS #5763 by @jctian98
  • [New Features][ESPnet2][Codec] Add Encodec features for Codec toolkit #5758 by @jctian98
  • [New Features][ESPnet2][Installation][TTS] Add evaluation scripts with DiscreteSpeechMetrics. #5661 by @Takaaki-Saeki
  • [New Features][ESPnet2][ASR] Integrate adapter for s3prl frontend #5609 by @Stanwang1210
  • [New Features][ESPnet2][CI][OWSM] Support external dataset library for ESPnetEasy #5584 by @Masao-Someki
  • [New Features][ESPnet2][CI][LM] Pr voxtlm #5472 by @soumimaiti

Enhancement

  • [Enhancement][ESPnet2][SLM] MT Task in SpeechLM #5899 by @ftshijt
  • [Enhancement][ESPnet2][Codec] Categorical Balanced Chunk iterator #5894 by @ftshijt
  • [Enhancement][ESPnet2][ESPnet1] TransformerDecoder forward_one_step with memory_mask #5679 by @albertz
  • [Enhancement][ESPnet2] Update espnet_model.py #5646 by @shen9712

Recipe

  • [Recipe][ESPnet2][Music] Fixed KiSing Data Preparation #5895 by @HANJionghao
  • [Recipe][ESPnet2][ASR] CORAAL asr1 recipe #5882 by @kalvinchang
  • [Recipe][ESPnet2][ASR] ml_superb asr2 recipe #5866 by @Stanwang1210
  • [Recipe][ESPnet2] Add more download links for ML-SUPERB #5863 by @ftshijt
  • [Recipe][ESPnet2][ASR] Fix bug in asr2.sh #5859 by @juice500ml
  • [Recipe][ESPnet2][Music] fix bugs in SVS1 #5851 by @South-Twilight
  • [Recipe][ESPnet2][TTS] New Recipe of tts2+aishell3 #5849 by @Tsukasane
  • [Recipe][ESPnet2][ASR] Espnet Multi-convformer implementation #5832 by @Darshan7575
  • [Recipe][ESPnet2][SE] Update of SE functions #5825 by @Emrys365
  • [Recipe][ESPnet2] SPRING-INX Recipe (Speech Lab, IIT, Madras) #5811 by @arjun-gangwar
  • [Recipe][ESPnet2][TTS] Adding Hifitts recipe for espnet #5784 by @coding-phoenix-12
  • [Recipe][ESPnet2][ASR] Updated results for CHiME-8 DASR baseline with new notsofar1 dev set #5771 by @popcornell
  • [Recipe][ESPnet2][SE] Final model scores for TF-GridNetV2 on the Kinect-WSJ dataset #5754 by @atharva253
  • [Recipe][ESPnet2] Apply normalization on validation set for CHiME-8 recipe #5749 by @popcornell
  • [Recipe][ESPnet2][Need review][Codec] ESPnet-Codec decoding and Scoring #5747 by @ftshijt
  • [Recipe][ESPnet2][CI][ST] Add recipe for IWSLT 2024 shared task Indic track #5744 by @cromz22
  • [Recipe][ESPnet2][Music] [SVS] VISinger Plus #5741 by @jerryuhoo
  • [Recipe][ESPnet2][Need review][Codec] ESPnet-codec Training and Setup #5732 by @ftshijt
  • [Recipe][ESPnet2][ASR] ESPnet Recipe for ASR on the Makerere Radio Speech Corpus #5730 by @satvik-dixit
  • [Recipe][ESPnet2][SE] ESPnet recipe for the Kinect-WSJ dataset #5711 by @atharva253
  • [Recipe][ESPnet2][TTS][ASR][Music] Update bitrate calculation scripts for the IS24 discrete speech challenge #5677 by @ftshijt
  • [Recipe][ESPnet2][ASR] Add some documents for JTubeSpeech #5663 by @sw005320
  • [Recipe][ESPnet2][SID] ESPnet-SPK: add SdSV 2021 recipe #5659 by @Alexgichamba
  • [Recipe][ESPnet2][ASR] Add E-Branchformer model for FLEURS #5657 by @wanchichen
  • [Recipe][ESPnet2][Installation][CI][ASR] CHiME-8 DASR recipe based on CHiME-7 DASR baseline #5641 by @popcornell
  • [Recipe][ESPnet2][ASR] add interspeech2024dsuchallenge/asr2 #5627 by @simpleoier
  • [Recipe][ESPnet2][Installation][TTS] Discrete token-based TTS implementation #5626 by @ftshijt

Bugfix

  • [Bugfix] fix: replace ellipses (...) in ESPnet-EZ Trainer documentation #5911 by @kalvinchang
  • [Bugfix] Bugfix/homepage #5885 by @Masao-Someki
  • [Bugfix][ESPnet2] Fix absolute paths in aishell3_tts2 #5884 by @Tsukasane
  • [Bugfix] Bug fix for source link #5883 by @Masao-Someki
  • [Bugfix][Installation] [CI] Add required file for g2p_en #5869 by @Fhrozen
  • [Bugfix][ESPnet2] A fix to newer torch version (compatible to old version with typecheck) #5830 by @ftshijt
  • [Bugfix][ESPnet2] Revert change to abs_task to keep the consistency behavior #5789 by @ftshijt
  • [Bugfix][ESPnet2] Fix Whisper frontend #5760 by @siddhu001
  • [Bugfix][ESPnet2][SE] Update TSE recipe egs2/librimix/tse1 #5731 by @Emrys365
  • [Bugfix][ESPnet2] Fix LoRA issues when saving all parameters. #5722 by @simpleoier
  • [Bugfix][ESPnet2] Fix tts packing with new spk embedding #5715 by @ftshijt
  • [Bugfix][ESPnet2][TTS] Fix stage references in generated run.sh in TTS recipes #5714 by @G-Thor
  • [Bugfix][ESPnet2][OWSM] fix a small issue in OWSM decode_long #5703 by @jctian98
  • [Bugfix][ESPnet2][Installation] Upgrade typeguard #5702 by @sw005320
  • [Bugfix][ESPnet2] Quick fix to calculation of bitrate #5692 by @ftshijt
  • [Bugfix][ESPnet2][SSUM] Fix typo in summarization scoring #5688 by @YoshikiMas
  • [Bugfix][ESPnet2] Update egs2/TEMPLATE/asr2/asr2.sh #5682 by @simpleoier
  • [Bugfix][ESPnet2][ASR] Fix over-lengthy audio in ml_superb data prep #5678 by @ftshijt
  • [Bugfix][ESPnet2] fix typo #5673 by @hiranoyu0830
  • [Bugfix][Installation][ST] Fix CI Multilingual ST test #5672 by @Fhrozen
  • [Bugfix][ESPnet2][SLU] Fix speed perturbation when not using transcript in slu.sh #5671 by @siddhu001
  • [Bugfix][ESPnet2][SLU] Fix loading pre-trained model from transformers #5668 by @siddhu001
  • [Bugfix][ESPnet2] Correct the argument errors in the whisper tokenizer language. #5666 by @pengchengguo

Documentation

  • [Documentation][ESPnet2][Music] Fixed SingingGenerate docstring examples #5889 by @HANJionghao
  • [Documentation][ESPnet2][CI] Separate packing and uploading stages #5752 by @cromz22
  • [Documentation] Add script to make release note from milestone #5653 by @kan-bayashi

Refactoring

  • [Refactoring] Modified easy to ez #5719 by @Masao-Someki

Others

  • [Others][CI] Bugfix for the paper publish workflow #5909 by @juice500ml
  • [Others][ESPnet2] Revision on Speechlm vocabulary extension script #5906 by @jctian98
  • [Others][ESPnet2][TTS] Fix tts.sh path in aishell3 tts2 #5879 by @sw005320
  • [Others][ESPnet2][Installation] Add DeepSpeed trainer for large-scale training #5856 by @jctian98
  • [Others] Update README info #5852 by @ftshijt
  • [Others][ESPnet2][ESPnet1][Installation] Add flash-attn #5839 by @wanchichen
  • [Others][ESPnet2][Music] [SVS] fix VISinger2 typecheck error #5838 by @jerryuhoo
  • [Others][ESPnet2] Fixed kising/acesinger google drive download #5834 by @HANJionghao
  • [Others][ESPnet2][SID] update MFA-Conformer performance after fixing the bug in #5797 #5826 by @Jungjee
  • [Others][ESPnet2][CI][SE] SE function updates: new models and support for handling various sampling frequencies #5800 by @Emrys365
  • [Others][ESPnet2][SID] fix spk mfa-conformer forwarding #5797 by @series2
  • [Others][ESPnet2][CI][Music] [SVS] Add CI tests for VISinger Plus #5786 by @jerryuhoo
  • [Others][ESPnet2][LM] Bug fix for VoxtLM v1 recipe #5782 by @cromz22
  • [Others][ESPnet2][ESPnet1] Added partially auto-regressive decoding #5769 by @Masao-Someki
  • [Others][Installation][CI] Fix minor issue in anaconda downloading #5753 by @ftshijt
  • [Others] [pre-commit.ci] pre-commit autoupdate #5738 by @pre-commit-ci[bot]
  • [Others][ESPnet2][Installation][CI] Upgrade typeguard [Subst.] #5724 by @Fhrozen
  • [Others][ESPnet2][SE] TF-GridNet training recipe for DNS Interspeech 2020 dataset #5710 by @nateanl
  • [Others][ESPnet2][LM] Adding transformer_opt #5709 by @soumimaiti
  • [Others][ESPnet2] Add Readme for Voxtlm #5693 by @wyh2000
  • [Others][ESPnet2][SID] ESPnet-SPK: add ASVspoof19 SASV recipe #5687 by @Alexgichamba

Acknowledgements

Special thanks to @Alexgichamba, @Darshan7575, @Emrys365, @Fhrozen, @G-Thor, @HANJionghao, @Jungjee, @Masao-Someki, @South-Twilight, @Stanwang1210, @Takaaki-Saeki, @Tsukasane, @YoshikiMas, @albertz, @arjun-gangwar, @atharva253, @coding-phoenix-12, @cromz22, @ftshijt, @hiranoyu0830, @jctian98, @jerryuhoo, @juice500ml, @kalvinchang, @kan-bayashi, @nateanl, @pengchengguo, @popcornell, @pre-commit-ci[bot], @satvik-dixit, @series2, @shen9712, @siddhu001, @simpleoier, @soumimaiti, @sw005320, @wanchichen, @wyh2000.

Published by Fhrozen about 1 year ago

Software Design and User Interface of ESPnet-SE++ - ESPnet version 202402

News

We're thrilled to announce that our latest update brings two groundbreaking features to our project: espnetez and ESPnet-SPK!

New Features

  • [New Features][ESPnet2][ESPnet1][Installation][SE] Add diffusion-based SE model to ESPnet-SE #5572 by @LiChenda
  • [New Features][ESPnet2][ESPnet1][CI][ASR] Add Bayes Risk CTC (reworked) #5519 by @jctian98
  • [New Features][ESPnet2][TTS] TTS evaluation script and monitoring functionality using MOS prediction model #5485 by @Takaaki-Saeki
  • [New Features][ESPnet2][SE] Add USES model for speech enhancement in diverse conditions #5482 by @Emrys365
  • [New Features][ESPnet2][CI][SID] ESPnet-SPK: major update #5408 by @Jungjee
  • [New Features][ESPnet2][TTS][ASR] Add espnetez #5372 by @Masao-Someki

Enhancement

  • [Enhancement][ESPnet2][OWSM] Improving OWSM inference interface #5618 by @pyf98
  • [Enhancement][ESPnet2][OWSM] Add OWSM v3.1 #5611 by @pyf98
  • [Enhancement][ESPnet2][CI] ESPnet-SPK: Additional models, supplement readme #5559 by @Jungjee
  • [Enhancement][ESPnet2][CI][SE] Add PyTorch & GPU support for DNSMOS calculation #5548 by @Emrys365
  • [Enhancement][ESPnet2][TTS][SID] Speaker embedding extractor (with ESPnet pre-trained speaker model) #5579 by @ftshijt

Recipe

  • [Recipe][ESPnet2][Music] Fix relative setting of train-dev-test #5623 by @ftshijt
  • [Recipe][ESPnet2][SID] ESPnet-SPK: add Voxblink recipe #5583 by @Jungjee
  • [Recipe][ESPnet2][SID] ESPnet-SPK: Model upload and result generation #5558 by @Jungjee
  • [Recipe][ESPnet2][Music] ACE singer recipe fixing #5551 by @ftshijt
  • [Recipe][ESPnet2][TTS] TTS2 Template #5541 by @ftshijt
  • [Recipe][ESPnet2][ASR] fix kaldi dependency in asr2 #5540 by @ftshijt
  • [Recipe][ESPnet2][CI][S2ST] CI test for s2st #5526 by @ftshijt
  • [Recipe][ESPnet2][ASR] Added data.sh to SPRING-INX IITM Recipe #5522 by @arjun-gangwar
  • [Recipe][ESPnet2][ASR] Add Libriheavy small and medium ASR2 recipes #5512 by @akreal
  • [Recipe][ESPnet2][ASR] SPRING-INX IITM RECIPE #5505 by @arjun-gangwar
  • [Recipe][ESPnet2][ASR][RNNT] Add transducer conformer configuration to commonvoice recipe #5503 by @zuazo
  • [Recipe][ESPnet2][ESPnet1] add centralized data preparation for OWSM #5478 by @jctian98
  • [Recipe][ESPnet1] Added clean speech results #5649 by @linan2
  • [Recipe][ESPnet2][Installation][AV] AVSR recipe for Easycom Dataset #5630 by @ms-dot-k
  • [Recipe][ESPnet2] Update CHiME-7 ASR1 recipe #5555 by @popcornell
  • [Recipe][ESPnet2] Add E-Branchformer model checkpoint in OWSM v2 #5517 by @pyf98
  • [Recipe][ESPnet2][SLU] Slue PR configs #5087 by @siddhu001

Bugfix

  • [Bugfix][ESPnet2] Fix path dependency in ESPnet tutorial #5645 by @siddhu001
  • [Bugfix][ESPnet2] Fix ESPnet tutorial #5644 by @siddhu001
  • [Bugfix] Fix CI #5642 by @siddhu001
  • [Bugfix][ESPnet2] Fixed bug by copying missing Kaldi scripts #5636 by @VicentCano
  • [Bugfix][ESPnet1][ASR] CTC prefix score, fix if blank == eos #5620 by @albertz
  • [Bugfix][ESPnet2] Fix minor OWSM data prep bug #5607 by @juice500ml
  • [Bugfix][ESPnet2][ESPnet1][CI] E721 #5589 by @sw005320
  • [Bugfix][ESPnet2][ESPnet1] Make minlenratio effective #5581 by @jctian98
  • [Bugfix][ESPnet2] Fix except #5567 by @takenori-y
  • [Bugfix][ESPnet1][Installation][CI] Improve error robustness of unit tests #5535 by @Emrys365
  • [Bugfix][ESPnet2][AV] Fix bug in lrs3 data preprocessing #5520 by @ms-dot-k
  • [Bugfix][ESPnet1] replace old mustc links with new instructions #5516 by @brianyan918
  • [Bugfix][ESPnet2][ST] Fix s2st HF model uploading #5504 by @tjysdsg
  • [Bugfix][ESPnet2][ESPnet1] bug fixes for must_c v2 recipe #5640 by @jasonmusespresso

Documentation

  • [Documentation][ESPnet2] Add instructions for finetuning owsm #5539 by @pyf98
  • [Documentation] Updated the reference of the accepted JOSS paper #5515 by @neillu23

Others

  • [Others] Update Discord Invitation Link #5578 by @Fhrozen
  • [Others][ESPnet2][CI] Improve error robustness of unit tests #5523 by @Emrys365

Acknowledgements

Special thanks to @Emrys365, @Fhrozen, @Jungjee, @LiChenda, @Masao-Someki, @Takaaki-Saeki, @VicentCano, @akreal, @albertz, @arjun-gangwar, @brianyan918, @ftshijt, @jasonmusespresso, @jctian98, @juice500ml, @linan2, @ms-dot-k, @neillu23, @popcornell, @pyf98, @siddhu001, @sw005320, @takenori-y, @tjysdsg, @zuazo.

Published by kan-bayashi almost 2 years ago

Software Design and User Interface of ESPnet-SE++ - ESPnet version 202310

What's Changed

  • Support arbitrary language finetune for Whisper models. by @pengchengguo in https://github.com/espnet/espnet/pull/5344
  • Update Dipco Data URL by @Fhrozen in https://github.com/espnet/espnet/pull/5391
  • Update readme in TEMPLATE/svs1 by @linyueqian in https://github.com/espnet/espnet/pull/5394
  • add gramvaani asr recipe by @bloodraven66 in https://github.com/espnet/espnet/pull/5366
  • ESPnet-SPK: sampler by @Jungjee in https://github.com/espnet/espnet/pull/5365
  • Adding general data augmentation methods for speech preprocessing by @Emrys365 in https://github.com/espnet/espnet/pull/5370
  • Update of several SE recipes and some minor fixes by @Emrys365 in https://github.com/espnet/espnet/pull/5401
  • Reproducing MIMOIRIS by @YoshikiMas in https://github.com/espnet/espnet/pull/5409
  • Kathbath asr by @bloodraven66 in https://github.com/espnet/espnet/pull/5369
  • Add pytorch2.0.1 to CI by @kamo-naoyuki in https://github.com/espnet/espnet/pull/5413
  • [skip ci] Update README.md by @kamo-naoyuki in https://github.com/espnet/espnet/pull/5417
  • In spec_augment.py, check whether an array is writeable before modifying it inplace by @mdecerbo in https://github.com/espnet/espnet/pull/5416
  • Docker updates for local builds by @Fhrozen in https://github.com/espnet/espnet/pull/5406
  • fix typo in TEMPLATE/svs1/README.md by @linyueqian in https://github.com/espnet/espnet/pull/5426
  • Update install_mwerSegmenter.sh by @sw005320 in https://github.com/espnet/espnet/pull/5437
  • Support Whisper-style training as a new task S2T by @pyf98 in https://github.com/espnet/espnet/pull/5120
  • fix twice numpy installation issue by @kan-bayashi in https://github.com/espnet/espnet/pull/5447
  • Add Whisper SOT recipe for Librimix by @LiChenda in https://github.com/espnet/espnet/pull/5371
  • Update for the JOSS paper editor review by @neillu23 in https://github.com/espnet/espnet/pull/5418
  • Add the VOiCES recipe for ASR by @Emrys365 in https://github.com/espnet/espnet/pull/5448
  • Improve diacritic compatibility in data_prep.pl preprocessing scripts by @zuazo in https://github.com/espnet/espnet/pull/5445
  • [WIP] create recipe for acesinger by @linyueqian in https://github.com/espnet/espnet/pull/5431
  • Add BibleTTS recipe by @wyh2000 in https://github.com/espnet/espnet/pull/5436
  • ASR2 CHiME4 & Gigaspeech Recipes by @yichen14 in https://github.com/espnet/espnet/pull/5434
  • [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci in https://github.com/espnet/espnet/pull/5427
  • Simple fix to reduce test_slu_inference time by @siddhu001 in https://github.com/espnet/espnet/pull/5460
  • Do not use root logger in Beamsearch by @vsd-vector in https://github.com/espnet/espnet/pull/5454
  • Fix whisper test by @siddhu001 in https://github.com/espnet/espnet/pull/5464
  • Add doc for OWSM by @pyf98 in https://github.com/espnet/espnet/pull/5463
  • Speech-to-speech translation Task by @ftshijt in https://github.com/espnet/espnet/pull/4859
  • AVSR recipes on LRS3 using pre-trained AV-HuBERT model by @ms-dot-k in https://github.com/espnet/espnet/pull/5456
  • Support LoRA based large model finetuning. by @pengchengguo in https://github.com/espnet/espnet/pull/5400
  • Multilingual Librispeech (MLS) refactor ASR1 recipe by @juice500ml in https://github.com/espnet/espnet/pull/5323
  • Add phonemized LibriTTS ASR recipe by @akreal in https://github.com/espnet/espnet/pull/5466
  • Update the Enh framework to support training with variable numbers of speakers by @Emrys365 in https://github.com/espnet/espnet/pull/5414
  • speed up TFGridNet code by @zqwang7 in https://github.com/espnet/espnet/pull/5395
  • [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci in https://github.com/espnet/espnet/pull/5468
  • ASR2 recipe on Tedlium3 dataset by @kohei0209 in https://github.com/espnet/espnet/pull/5331
  • Create README.md in OWSM v1 by @pyf98 in https://github.com/espnet/espnet/pull/5489
  • Update setup.py by @sw005320 in https://github.com/espnet/espnet/pull/5490
  • Fix default value in ML-SUPERB by @ftshijt in https://github.com/espnet/espnet/pull/5492
  • Fix bugs of Whisper SOT. by @pengchengguo in https://github.com/espnet/espnet/pull/5494
  • Multilingual Librispeech ASR2 + ASR1 baselines by @juice500ml in https://github.com/espnet/espnet/pull/5441
  • Add a new SE recipe combining five public corpora by @Emrys365 in https://github.com/espnet/espnet/pull/5484
  • Update .mergify.yml by @kamo-naoyuki in https://github.com/espnet/espnet/pull/5502
  • update version to 202310 by @kan-bayashi in https://github.com/espnet/espnet/pull/5501

New Contributors

  • @linyueqian made their first contribution in https://github.com/espnet/espnet/pull/5394
  • @mdecerbo made their first contribution in https://github.com/espnet/espnet/pull/5416
  • @zuazo made their first contribution in https://github.com/espnet/espnet/pull/5445
  • @wyh2000 made their first contribution in https://github.com/espnet/espnet/pull/5436
  • @yichen14 made their first contribution in https://github.com/espnet/espnet/pull/5434
  • @vsd-vector made their first contribution in https://github.com/espnet/espnet/pull/5454
  • @ms-dot-k made their first contribution in https://github.com/espnet/espnet/pull/5456
  • @juice500ml made their first contribution in https://github.com/espnet/espnet/pull/5323
  • @kohei0209 made their first contribution in https://github.com/espnet/espnet/pull/5331

Full Changelog: https://github.com/espnet/espnet/compare/v.202308...v.202310

Published by kan-bayashi about 2 years ago

Software Design and User Interface of ESPnet-SE++ - ESPnet version 202308

What's Changed

  • Update tutorial by @ftshijt in https://github.com/espnet/espnet/pull/4648
  • Update tutorials by @ftshijt in https://github.com/espnet/espnet/pull/4898
  • add e-branchformer result for tedlium3 and add checker for text output length by @Some-random in https://github.com/espnet/espnet/pull/5130
  • Limit the Numpy version (<1.24) to fix CI error temporarily. by @simpleoier in https://github.com/espnet/espnet/pull/5162
  • [SVS] Add new recipes by @A-Quarter-Mile in https://github.com/espnet/espnet/pull/5158
  • Update README.md of CHiME-7 DASR: fixing typos by @popcornell in https://github.com/espnet/espnet/pull/5166
  • Fix typo in CONTRIBUTING.md by @eltociear in https://github.com/espnet/espnet/pull/5167
  • CHiME-7 DASR: Update install_dependencies.sh, fix lhotse version by @popcornell in https://github.com/espnet/espnet/pull/5168
  • Update TD-SpeakerBeam by @Emrys365 in https://github.com/espnet/espnet/pull/5155
  • Add pre-trained causal speech separation model and streaming demo by @LiChenda in https://github.com/espnet/espnet/pull/5172
  • KSC recipe by @khassanoff in https://github.com/espnet/espnet/pull/5171
  • [SVS] Add new recipe by @A-Quarter-Mile in https://github.com/espnet/espnet/pull/5173
  • Update AphasiaBank Recipe by @tjysdsg in https://github.com/espnet/espnet/pull/5104
  • fix the gradient backward issue when joint training with s3prl frontend by @simpleoier in https://github.com/espnet/espnet/pull/5159
  • Add installer for ParallelWaveGAN by @ftshijt in https://github.com/espnet/espnet/pull/4052
  • [GAN SVS] Add VISinger2, UHifiGAN, Avocodo by @jerryuhoo in https://github.com/espnet/espnet/pull/5123
  • [SVS] Update docs README.md by @South-Twilight in https://github.com/espnet/espnet/pull/5178
  • Update SVS README.md by @jerryuhoo in https://github.com/espnet/espnet/pull/5180
  • Adding eendss models by @soumimaiti in https://github.com/espnet/espnet/pull/5157
  • 2022fall new task tutorial by @ftshijt in https://github.com/espnet/espnet/pull/5186
  • [SVS] Updates for recipes by @A-Quarter-Mile in https://github.com/espnet/espnet/pull/5187
  • [GAN SVS] fix phoneme predictor by @jerryuhoo in https://github.com/espnet/espnet/pull/5188
  • Update generatelibrimixsd.sh by @leepeiying in https://github.com/espnet/espnet/pull/5182
  • Bug fix for #5195 by @YosukeHiguchi in https://github.com/espnet/espnet/pull/5196
  • [SVS] Update on recipes by @A-Quarter-Mile in https://github.com/espnet/espnet/pull/5197
  • Update preprocessor.py by @sw005320 in https://github.com/espnet/espnet/pull/5200
  • Minor fixes for ML-SUPERB by @ftshijt in https://github.com/espnet/espnet/pull/5202
  • Quick fix for whisper specaug by @siddhu001 in https://github.com/espnet/espnet/pull/5206
  • espnet-spk data preparation part by @Jungjee in https://github.com/espnet/espnet/pull/5184
  • Fix M4singer multi-spk recipe by @ftshijt in https://github.com/espnet/espnet/pull/5201
  • Update Dataset link for mlsuperb by @ftshijt in https://github.com/espnet/espnet/pull/5216
  • Fix bug when scoretype is set to normal in mlsuperb by @ftshijt in https://github.com/espnet/espnet/pull/5217
  • Add new functions and fix some bugs in SE by @Emrys365 in https://github.com/espnet/espnet/pull/5193
  • Update import order by @ftshijt in https://github.com/espnet/espnet/pull/5229
  • Closed CHiME-7 DASR adding evaluation inference + adding support to use diarization baseline "pre-computed" JSONs (new PR) by @popcornell in https://github.com/espnet/espnet/pull/5228
  • Standalone Transducer v1.1 by @b-flo in https://github.com/espnet/espnet/pull/5140
  • Small fixes for Transducer by @b-flo in https://github.com/espnet/espnet/pull/5247
  • add asr2 task and librispeech recipe as an example. by @simpleoier in https://github.com/espnet/espnet/pull/5181
  • fix norm compatibility in scale discriminator by @kan-bayashi in https://github.com/espnet/espnet/pull/5240
  • CFSD, SECS metrics for TTS by @imdanboy in https://github.com/espnet/espnet/pull/5235
  • Add new SE recipes: chime1/enh1, chime2/enh1, reverb/enh1, and wsj0_2mix/tse1 by @Emrys365 in https://github.com/espnet/espnet/pull/5246
  • Fix bugs in mfa_format.py by @G-Thor in https://github.com/espnet/espnet/pull/5223
  • New features for SVS by @ftshijt in https://github.com/espnet/espnet/pull/5245
  • re-fix norm compatibility in scale discriminator by @kan-bayashi in https://github.com/espnet/espnet/pull/5249
  • add conv1d subsampling 3 and egs2/librispeech/asr2 wavlmlarge21 kmeans (1000/2000) results by @simpleoier in https://github.com/espnet/espnet/pull/5252
  • Revise the ESPnet-SE++ Joss paper to incorporate the feedback from the reviewer. by @neillu23 in https://github.com/espnet/espnet/pull/5212
  • Fix a bug in score script for ML-SUPERB by @ftshijt in https://github.com/espnet/espnet/pull/5254
  • Refactor prep_segments in SVS by @jerryuhoo in https://github.com/espnet/espnet/pull/5210
  • A minor fix for num_splits_ssl for training by @ftshijt in https://github.com/espnet/espnet/pull/5262
  • [SVS] add singing tacotron by @A-Quarter-Mile in https://github.com/espnet/espnet/pull/5233
  • Add script to use speaker averaged xvectors in TTS training by @G-Thor in https://github.com/espnet/espnet/pull/5244
  • Fix filling of waveform_buffer with samples for streaming inference by @espnetUser in https://github.com/espnet/espnet/pull/5267
  • Some name update for ml-superb by @ftshijt in https://github.com/espnet/espnet/pull/5276
  • Add support for K2 pruned transducer loss by @b-flo in https://github.com/espnet/espnet/pull/5268
  • Fix Transducer doc by @b-flo in https://github.com/espnet/espnet/pull/5306
  • Update installation.md by @kamo-naoyuki in https://github.com/espnet/espnet/pull/5291
  • Update install_nkf.sh by @sw005320 in https://github.com/espnet/espnet/pull/5300
  • Fix Cython version to pass the installation of libraries with Cython by @kan-bayashi in https://github.com/espnet/espnet/pull/5310
  • Update README.md by @sw005320 in https://github.com/espnet/espnet/pull/5315
  • Update setup.py by @sw005320 in https://github.com/espnet/espnet/pull/5316
  • Migrate recipe for nit_song070 from Muskit by @wwwbxy123 in https://github.com/espnet/espnet/pull/5251
  • [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci in https://github.com/espnet/espnet/pull/5294
  • A few updates for asr2 and hubert by @simpleoier in https://github.com/espnet/espnet/pull/5285
  • Add decode_options and hyp_cleaner in evaluate_whisper_inference by @pyf98 in https://github.com/espnet/espnet/pull/5272
  • update pyworld version by @kan-bayashi in https://github.com/espnet/espnet/pull/5319
  • fix a data preparation issue for librimix recipe. by @LiChenda in https://github.com/espnet/espnet/pull/5322
  • Update README.md in egs2/librimix/tse1 and egs2/wsj0_2mix/tse1 by @Emrys365 in https://github.com/espnet/espnet/pull/5289
  • fix the s3prl frontend gradient backprop bug, ensuring feature_grad_mult=1.0 by @simpleoier in https://github.com/espnet/espnet/pull/5297
  • ESPNet-SPK part 2 - training by @Jungjee in https://github.com/espnet/espnet/pull/5258
  • remove some tests in espnet1 integration test by @sw005320 in https://github.com/espnet/espnet/pull/5328
  • Fix random segments by @iamanigeeit in https://github.com/espnet/espnet/pull/5274
  • Skip CI for draft PR by @ftshijt in https://github.com/espnet/espnet/pull/5333
  • Update cancel.yml by @kan-bayashi in https://github.com/espnet/espnet/pull/5334
  • Update several SE recipes and bash scripts by @Emrys365 in https://github.com/espnet/espnet/pull/5327
  • Add PULL_REQUEST_TEMPLATE.md by @kan-bayashi in https://github.com/espnet/espnet/pull/5340
  • ESPnet-Spk part 3 - inference every epoch using EER by @Jungjee in https://github.com/espnet/espnet/pull/5314
  • Minimize espnet2 integration test by @kan-bayashi in https://github.com/espnet/espnet/pull/5324
  • PR Labels for CI control by @Fhrozen in https://github.com/espnet/espnet/pull/5320
  • Split ci into several jobs by @kan-bayashi in https://github.com/espnet/espnet/pull/5343
  • Update CONTRIBUTING.md by @sw005320 in https://github.com/espnet/espnet/pull/5335
  • Update Scoring for Speech Summarization from NLG-Eval to Huggingface Evaluate by @roshansh-cmu in https://github.com/espnet/espnet/pull/5341
  • Fix documentation skip CI by @Fhrozen in https://github.com/espnet/espnet/pull/5351
  • Update the usage by @sw005320 in https://github.com/espnet/espnet/pull/5349
  • Docker Update by @Fhrozen in https://github.com/espnet/espnet/pull/5321
  • Update installation.md by @sw005320 in https://github.com/espnet/espnet/pull/5348
  • Fix doc condition by @kan-bayashi in https://github.com/espnet/espnet/pull/5355
  • Update issue templates by @sw005320 in https://github.com/espnet/espnet/pull/5357
  • Update Contribution.md by @Fhrozen in https://github.com/espnet/espnet/pull/5352
  • Fix .mergify condition by @kan-bayashi in https://github.com/espnet/espnet/pull/5354
  • Reduce ffmpeg installation time in ci by @kan-bayashi in https://github.com/espnet/espnet/pull/5356
  • Update CI table by @kan-bayashi in https://github.com/espnet/espnet/pull/5359
  • Clean workflow files by @kan-bayashi in https://github.com/espnet/espnet/pull/5360
  • Couple of tweaks for asr2.sh for the HF hub upload by @akreal in https://github.com/espnet/espnet/pull/5362
  • Update TEMPLATE_HF_Readme.md (fix bash typo) by @akreal in https://github.com/espnet/espnet/pull/5361
  • Add discrete-token ASR for LibriSpeech 100h by @akreal in https://github.com/espnet/espnet/pull/5350
  • Whisper fine-tuning recipes for CHiME-4 and WSJ by @YoshikiMas in https://github.com/espnet/espnet/pull/5342
  • Fix bug in ngram training in slu.sh by @siddhu001 in https://github.com/espnet/espnet/pull/5364
  • Add musdb18 recipe for music source separation by @Emrys365 in https://github.com/espnet/espnet/pull/5338
  • Bugfix: JETS CTCLoss by @imdanboy in https://github.com/espnet/espnet/pull/5288
  • Check the value of n_shift == upsample_factor in GAN_TTS by @imdanboy in https://github.com/espnet/espnet/pull/5299
  • MFA format fix by @iamanigeeit in https://github.com/espnet/espnet/pull/5275
  • add --num-workers 0 option to enable coverage to track data loader by @kan-bayashi in https://github.com/espnet/espnet/pull/5368
  • ESPnet-SPK: fix data augment by @Jungjee in https://github.com/espnet/espnet/pull/5347
  • A few minor fixes for SSL by @ftshijt in https://github.com/espnet/espnet/pull/5265
  • remove unused file + small typo/style by @b-flo in https://github.com/espnet/espnet/pull/5346
  • ESPnet-SPK: EER validation efficiency improvement by @Jungjee in https://github.com/espnet/espnet/pull/5358
  • New Architectures for ST by @brianyan918 in https://github.com/espnet/espnet/pull/4815
  • [SVS] Add CI test by @A-Quarter-Mile in https://github.com/espnet/espnet/pull/5269
  • Add causal LM to Hugging Face Transformers Decoder by @akreal in https://github.com/espnet/espnet/pull/5313
  • Make make_pad_mask onnx convertible by @Masao-Someki in https://github.com/espnet/espnet/pull/5326
  • fix numerical error of parallel wavegan compatibility test in CI by @kan-bayashi in https://github.com/espnet/espnet/pull/5380
  • Add LibriTTS-R recipe by @ShigekiKarita in https://github.com/espnet/espnet/pull/5379
  • minor fix: correct wrong comments by @imdanboy in https://github.com/espnet/espnet/pull/5378
  • Add quotation marks to install_datasets.sh by @qmeeus in https://github.com/espnet/espnet/pull/5387

New Contributors

  • @khassanoff made their first contribution in https://github.com/espnet/espnet/pull/5171
  • @leepeiying made their first contribution in https://github.com/espnet/espnet/pull/5182
  • @Jungjee made their first contribution in https://github.com/espnet/espnet/pull/5184
  • @wwwbxy123 made their first contribution in https://github.com/espnet/espnet/pull/5251

Full Changelog: https://github.com/espnet/espnet/compare/v.202304...v.202308

Scientific Software - Peer-reviewed - Python
Published by kan-bayashi over 2 years ago

Software Design and User Interface of ESPnet-SE++ - ESPnet version 202304

What's Changed

  • Update collect stats stage so that less memory cost in Utt_mvn by @simpleoier in https://github.com/espnet/espnet/pull/4888
  • Apply the latest black by @kamo-naoyuki in https://github.com/espnet/espnet/pull/4907
  • Add pytorch=1.13.1 to CI configuration by @kamo-naoyuki in https://github.com/espnet/espnet/pull/4906
  • How2 fix README, incorrect url by @roshansh-cmu in https://github.com/espnet/espnet/pull/4902
  • standardized inference and number of iterations for mSuperb single lang track by @DanBerrebbi in https://github.com/espnet/espnet/pull/4905
  • Fix typo in lrs/README.md by @eltociear in https://github.com/espnet/espnet/pull/4911
  • MSUPERB setting update by @ftshijt in https://github.com/espnet/espnet/pull/4913
  • Update test_import.yaml to install numba by @kamo-naoyuki in https://github.com/espnet/espnet/pull/4918
  • update pyopenjtalk version to 0.3.0 by @kan-bayashi in https://github.com/espnet/espnet/pull/4912
  • CHiME-7 Task1 recipe by @popcornell in https://github.com/espnet/espnet/pull/4894
  • Update CHiME-7 Task 1 README.md by @popcornell in https://github.com/espnet/espnet/pull/4920
  • Use native CPU version of STFT on newer pytorch versions, fix librosa window size < fft by @bmilde in https://github.com/espnet/espnet/pull/4922
  • Add few shot subset for mSuperb multilingual setting by @guapaQAQ in https://github.com/espnet/espnet/pull/4923
  • Fix existing bugs in the TSE task by @Emrys365 in https://github.com/espnet/espnet/pull/4915
  • IAM OCR recipe updates by @kenzheng99 in https://github.com/espnet/espnet/pull/4927
  • Fixing some issues with chime7-task1 baseline by @popcornell in https://github.com/espnet/espnet/pull/4925
  • set default none decoder for ASR by @ftshijt in https://github.com/espnet/espnet/pull/4917
  • Update inference and training setting for mSuperb multilingual model by @guapaQAQ in https://github.com/espnet/espnet/pull/4932
  • Add E-Branchformer Transducer results by @pyf98 in https://github.com/espnet/espnet/pull/4933
  • add tf-gridnet by @zqwang7 in https://github.com/espnet/espnet/pull/4864
  • Fixes + Channel Selection for CHiME-7 Task by @popcornell in https://github.com/espnet/espnet/pull/4934
  • fix extracted feature dummy generation by @roshansh-cmu in https://github.com/espnet/espnet/pull/4926
  • Fix device mismatch error in GPU decoding with PyTorch 1.13 by @pyf98 in https://github.com/espnet/espnet/pull/4941
  • CHiME-7 DASR MD5 checksum fix for mixer6/train_call by @popcornell in https://github.com/espnet/espnet/pull/4942
  • Update show_asr_result.sh by @kamo-naoyuki in https://github.com/espnet/espnet/pull/4943
  • CHiME-7 DASR correct development results by @popcornell in https://github.com/espnet/espnet/pull/4946
  • Fix 'floordiv is deprecated' warnings by @fujimotos in https://github.com/espnet/espnet/pull/4945
  • Added WSLII installation instruction by @sw005320 in https://github.com/espnet/espnet/pull/4949
  • Update Muskits by @A-Quarter-Mile in https://github.com/espnet/espnet/pull/4931
  • Set a longer time execution threshold for related failed time-outs CI by @ftshijt in https://github.com/espnet/espnet/pull/4962
  • Modify data prep for mSUPERB multilingual by @guapaQAQ in https://github.com/espnet/espnet/pull/4965
  • Add E-Branchformer results in some recipes by @pyf98 in https://github.com/espnet/espnet/pull/4958
  • Add 'six' as a required Python module by @fujimotos in https://github.com/espnet/espnet/pull/4964
  • add msuperb linguistic analysis by @hhhaaahhhaa in https://github.com/espnet/espnet/pull/4938
  • Fix a 'ref_channel'-related issue in espnet2/bin/enh_inference.py by @Emrys365 in https://github.com/espnet/espnet/pull/4972
  • Add E-Branchformer results in slurp_entity by @pyf98 in https://github.com/espnet/espnet/pull/4971
  • Add Conformer and E-Branchformer results in fisher_spanish_callhome ASR by @pyf98 in https://github.com/espnet/espnet/pull/4976
  • [SVS] Add Joint-training by @A-Quarter-Mile in https://github.com/espnet/espnet/pull/4977
  • Update the chunk iterator for the TSE task by @Emrys365 in https://github.com/espnet/espnet/pull/4929
  • update msuperb LID scoring script by @hhhaaahhhaa in https://github.com/espnet/espnet/pull/4979
  • add multilingual+lid lid score generation by @hhhaaahhhaa in https://github.com/espnet/espnet/pull/4982
  • Add python=3.10 to CI by @kamo-naoyuki in https://github.com/espnet/espnet/pull/4627
  • LID score v2 by @hhhaaahhhaa in https://github.com/espnet/espnet/pull/4983
  • Fix ci by @kamo-naoyuki in https://github.com/espnet/espnet/pull/4985
  • Change to use Ubuntu-latest instead of Ubuntu-18.04 in CI by @kamo-naoyuki in https://github.com/espnet/espnet/pull/4986
  • Remove six by @kamo-naoyuki in https://github.com/espnet/espnet/pull/4988
  • Modify format_wav_scp.py to support PCM of uint8, int32, float32, float64, etc. by @kamo-naoyuki in https://github.com/espnet/espnet/pull/4997
  • Fix Whisper tokenizer CI error by @slSeanWU in https://github.com/espnet/espnet/pull/5004
  • fix s3prl upstream attribute bug by @jwrh in https://github.com/espnet/espnet/pull/5003
  • [Recipe] Add iwslt22 low resource speech translation task for egs2 by @freddy5566 in https://github.com/espnet/espnet/pull/4994
  • Fix typeguard version by @silvanocerza in https://github.com/espnet/espnet/pull/5009
  • Add .pre-commit-config.yaml by @kamo-naoyuki in https://github.com/espnet/espnet/pull/5011
  • Copy Kaldi utils/steps/sid and add a new github action to check the consistency by @kamo-naoyuki in https://github.com/espnet/espnet/pull/4998
  • Modify .pre-commit-config.yaml by @kamo-naoyuki in https://github.com/espnet/espnet/pull/5012
  • Modify .pre-commit-config.yaml by @kamo-naoyuki in https://github.com/espnet/espnet/pull/5014
  • Modify .pre-commit-config.yaml by @kamo-naoyuki in https://github.com/espnet/espnet/pull/5015
  • [Tuning] iwslt22 low-resource ST decode configuration tuning by @freddy5566 in https://github.com/espnet/espnet/pull/5019
  • Modify asr.sh by @kamo-naoyuki in https://github.com/espnet/espnet/pull/5020
  • [SVS] Improve visinger by @jerryuhoo in https://github.com/espnet/espnet/pull/5022
  • Use scripts/utils/print_args.sh instead of pyscripts/utils/print_args.py by @kamo-naoyuki in https://github.com/espnet/espnet/pull/5025
  • Add docstring in extra_path.sh by @kamo-naoyuki in https://github.com/espnet/espnet/pull/5028
  • Update installation.md by @kamo-naoyuki in https://github.com/espnet/espnet/pull/5029
  • Update README.md by @kamo-naoyuki in https://github.com/espnet/espnet/pull/5030
  • Update README.md by @kamo-naoyuki in https://github.com/espnet/espnet/pull/5031
  • Change bc to python by @kamo-naoyuki in https://github.com/espnet/espnet/pull/5032
  • Update tools/Makefile and path.sh by @kamo-naoyuki in https://github.com/espnet/espnet/pull/5027
  • Fix for format_wav_scp.py by @kamo-naoyuki in https://github.com/espnet/espnet/pull/5038
  • Add execute permission to install_ice_g2p.sh by @kamo-naoyuki in https://github.com/espnet/espnet/pull/5040
  • Bug fix of #5025 by @kamo-naoyuki in https://github.com/espnet/espnet/pull/5039
  • [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci in https://github.com/espnet/espnet/pull/5041
  • Update README.md by @kamo-naoyuki in https://github.com/espnet/espnet/pull/5042
  • Update README.md by @kamo-naoyuki in https://github.com/espnet/espnet/pull/5043
  • Update README.md by @kamo-naoyuki in https://github.com/espnet/espnet/pull/5045
  • Fix in gen_task1_data.sh from CHiME7 by @boeddeker in https://github.com/espnet/espnet/pull/4953
  • Update README.md by @eml914 in https://github.com/espnet/espnet/pull/5044
  • Add installers/install_ffmpeg.sh by @kamo-naoyuki in https://github.com/espnet/espnet/pull/5046
  • Fix broken links reported by #5048 by @ShigekiKarita in https://github.com/espnet/espnet/pull/5050
  • fix: resolve upgrade issues with praatio 6.0; lock praatio version by @timmahrt in https://github.com/espnet/espnet/pull/4978
  • Add miniconda in gitignore by @pyf98 in https://github.com/espnet/espnet/pull/5052
  • CHiME-7 DASR fixes from participants feedback by @popcornell in https://github.com/espnet/espnet/pull/4999
  • Fix the condition for maxlen warning in beam search by @pyf98 in https://github.com/espnet/espnet/pull/5055
  • Fixed SQLalchemy version for MFA by @Fhrozen in https://github.com/espnet/espnet/pull/5059
  • Support Multi-Blank Transducer in Espnet2 by @jctian98 in https://github.com/espnet/espnet/pull/4876
  • Fix chime7 DASR task1 run.sh by @kamo-naoyuki in https://github.com/espnet/espnet/pull/5060
  • CHiME-7 DASR recipe, fix display bug for scenario-wide DER and JER by @popcornell in https://github.com/espnet/espnet/pull/5061
  • Add test_format_wav_scp_sh.bats by @kamo-naoyuki in https://github.com/espnet/espnet/pull/5062
  • Update documentation by @kamo-naoyuki in https://github.com/espnet/espnet/pull/5063
  • Support SOT training on LibriMix data. by @pengchengguo in https://github.com/espnet/espnet/pull/4861
  • Update check_install.py by @kamo-naoyuki in https://github.com/espnet/espnet/pull/5066
  • Tedlium3 recipe by @Some-random in https://github.com/espnet/espnet/pull/5068
  • Bug Fix: pretrained s3prl-frontend based models loaded with parameters key mismatch error by @simpleoier in https://github.com/espnet/espnet/pull/5074
  • Mechanism for multi channels input using multi columns wav.scp by @kamo-naoyuki in https://github.com/espnet/espnet/pull/5075
  • Clean ML-SUPERB by @ftshijt in https://github.com/espnet/espnet/pull/5067
  • CHiME-7 DASR: first diarization system based on Pyannote. by @popcornell in https://github.com/espnet/espnet/pull/5054
  • Chime7-task1 diarization (updated results) by @popcornell in https://github.com/espnet/espnet/pull/5088
  • Add InterCTC to E-Branchformer encoder, and the ability to save InterCTC inference output to files by @tjysdsg in https://github.com/espnet/espnet/pull/5084
  • [SVS] Bug fix: sample rate by @A-Quarter-Mile in https://github.com/espnet/espnet/pull/5094
  • [SVS] Extend SingingGenerate by @A-Quarter-Mile in https://github.com/espnet/espnet/pull/5100
  • [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci in https://github.com/espnet/espnet/pull/5080
  • Add kaldi steps/libs by @kamo-naoyuki in https://github.com/espnet/espnet/pull/5106
  • Fix sentencepiece version to v0.1.97 by @kamo-naoyuki in https://github.com/espnet/espnet/pull/5107
  • Drop PyTorch<=1.9 by @kamo-naoyuki in https://github.com/espnet/espnet/pull/5111
  • Update installers/install_kenlm.sh by @kamo-naoyuki in https://github.com/espnet/espnet/pull/5110
  • Merge */{scripts,pyscripts} into asr1/{scripts,pyscripts} by @kamo-naoyuki in https://github.com/espnet/espnet/pull/5109
  • Update ReazonSpeech training recipe for v1.1.0 by @fujimotos in https://github.com/espnet/espnet/pull/5114
  • Fix typo in espnet2_format_wav_scp.md by @boeddeker in https://github.com/espnet/espnet/pull/5116
  • Dtype for Speechbrain by @Fhrozen in https://github.com/espnet/espnet/pull/5112
  • Add test of soundfile for Makefile by @kamo-naoyuki in https://github.com/espnet/espnet/pull/5119
  • Add lm_inference for conditional text generation by @pyf98 in https://github.com/espnet/espnet/pull/5122
  • CHiME-7 diarization (updated README.md) by @popcornell in https://github.com/espnet/espnet/pull/5102
  • [WIP] Update Docker by @Fhrozen in https://github.com/espnet/espnet/pull/5128
  • Fix several bugs and improve function design in SE by @Emrys365 in https://github.com/espnet/espnet/pull/5103
  • [SVS] Update XiaoiceSing by @A-Quarter-Mile in https://github.com/espnet/espnet/pull/5124
  • Add missing filter_scps scripts and note about kaldi for diarization example of mini_librispeech by @toto6038 in https://github.com/espnet/espnet/pull/5139
  • Bump up the debian version to 11 by @kamo-naoyuki in https://github.com/espnet/espnet/pull/5144
  • Bug fixing and improvement in SE functions by @Emrys365 in https://github.com/espnet/espnet/pull/5143
  • Add data augmentation to ReazonSpeech recipe by @fujimotos in https://github.com/espnet/espnet/pull/5127
  • Update error calculator for transducer by @aky15 in https://github.com/espnet/espnet/pull/5097
  • Add streaming speech enhancement inference. by @LiChenda in https://github.com/espnet/espnet/pull/5049
  • Update README.md about debian by @sw005320 in https://github.com/espnet/espnet/pull/5146
  • Fix issues in split scps by @pyf98 in https://github.com/espnet/espnet/pull/5138
  • fix 5148 by @kamo-naoyuki in https://github.com/espnet/espnet/pull/5149
  • fix format_wav_scp.py by @kamo-naoyuki in https://github.com/espnet/espnet/pull/5150
  • Add more stats to the training log by @Emrys365 in https://github.com/espnet/espnet/pull/5147
  • update version to 202304 by @kan-bayashi in https://github.com/espnet/espnet/pull/5151

New Contributors

  • @bmilde made their first contribution in https://github.com/espnet/espnet/pull/4922
  • @guapaQAQ made their first contribution in https://github.com/espnet/espnet/pull/4923
  • @zqwang7 made their first contribution in https://github.com/espnet/espnet/pull/4864
  • @hhhaaahhhaa made their first contribution in https://github.com/espnet/espnet/pull/4938
  • @jwrh made their first contribution in https://github.com/espnet/espnet/pull/5003
  • @freddy5566 made their first contribution in https://github.com/espnet/espnet/pull/4994
  • @silvanocerza made their first contribution in https://github.com/espnet/espnet/pull/5009
  • @pre-commit-ci made their first contribution in https://github.com/espnet/espnet/pull/5041
  • @boeddeker made their first contribution in https://github.com/espnet/espnet/pull/4953
  • @timmahrt made their first contribution in https://github.com/espnet/espnet/pull/4978
  • @Some-random made their first contribution in https://github.com/espnet/espnet/pull/5068
  • @toto6038 made their first contribution in https://github.com/espnet/espnet/pull/5139

Full Changelog: https://github.com/espnet/espnet/compare/v.202301...v.202304

Published by kan-bayashi over 2 years ago

Software Design and User Interface of ESPnet-SE++ - ESPnet version 202301

What's Changed

  • Initialize VISinger branch by @ftshijt in https://github.com/espnet/espnet/pull/4683
  • Update VISInger branch by @ftshijt in https://github.com/espnet/espnet/pull/4705
  • Update UASR branch with latest ESPnet functions by @ftshijt in https://github.com/espnet/espnet/pull/4752
  • Update uasr by @ftshijt in https://github.com/espnet/espnet/pull/4770
  • Shell scripts for UASR processing by @ftshijt in https://github.com/espnet/espnet/pull/4769
  • Uasr python scripts by @DongjiGao in https://github.com/espnet/espnet/pull/4791
  • Update visinger by @ftshijt in https://github.com/espnet/espnet/pull/4818
  • Update test_custom_transducer.py by @sw005320 in https://github.com/espnet/espnet/pull/4826
  • Update asr.sh by @sw005320 in https://github.com/espnet/espnet/pull/4827
  • Fixed pad mode for librosa.stft by @Masao-Someki in https://github.com/espnet/espnet/pull/4832
  • Add E-Branchformer models in some recipes by @pyf98 in https://github.com/espnet/espnet/pull/4833
  • Fix data prep in GigaSpeech by @pyf98 in https://github.com/espnet/espnet/pull/4836
  • time sync decoding for asr by @brianyan918 in https://github.com/espnet/espnet/pull/4792
  • Remove duplicated VOXFORGE in db.sh (line81 and line157) by @pyf98 in https://github.com/espnet/espnet/pull/4840
  • Fix argument parsing for non_linguistic_symbols in asr.sh by @pyf98 in https://github.com/espnet/espnet/pull/4841
  • Add a warning statement when the hypo length equals to the max out length. by @pengchengguo in https://github.com/espnet/espnet/pull/4843
  • Add target speaker extraction (TSE) functions by @Emrys365 in https://github.com/espnet/espnet/pull/4823
  • Multilingual superb by @ftshijt in https://github.com/espnet/espnet/pull/4824
  • VISinger by @jerryuhoo in https://github.com/espnet/espnet/pull/4689
  • Update VISInger to latest by @ftshijt in https://github.com/espnet/espnet/pull/4849
  • VISinger for singing voice synthesis by @ftshijt in https://github.com/espnet/espnet/pull/4848
  • Reduce word counts for ESPnet-SE++ Joss paper by @neillu23 in https://github.com/espnet/espnet/pull/4844
  • Add E-Branchformer configs and models in ASR recipes by @pyf98 in https://github.com/espnet/espnet/pull/4837
  • Address Muskits updates on README by @ftshijt in https://github.com/espnet/espnet/pull/4850
  • Minor fix for MSUPERB recipe by @ftshijt in https://github.com/espnet/espnet/pull/4851
  • Update for the latest changes in the draft (minor changes) by @neillu23 in https://github.com/espnet/espnet/pull/4852
  • Add E-Branchformer results on Librispeech by @kkim-asapp in https://github.com/espnet/espnet/pull/4856
  • Update hubert implementation. by @simpleoier in https://github.com/espnet/espnet/pull/4747
  • VISinger unit test by @jerryuhoo in https://github.com/espnet/espnet/pull/4855
  • Minor fix to commonvoice espnet1 by @ftshijt in https://github.com/espnet/espnet/pull/4862
  • [WIP] Add S4 decoder in ESPnet2 by @m-koichi in https://github.com/espnet/espnet/pull/4845
  • Update hubert feature and acknowledge information in related Readmes. by @simpleoier in https://github.com/espnet/espnet/pull/4863
  • Generating MFA alignments by @Fhrozen in https://github.com/espnet/espnet/pull/4803
  • [WIP] EURO uasr scripts by @DongjiGao in https://github.com/espnet/espnet/pull/4846
  • Update README.md related to ASR architecture by @m-koichi in https://github.com/espnet/espnet/pull/4865
  • Minor fix to librimix diar recipe by @ftshijt in https://github.com/espnet/espnet/pull/4867
  • Add Full Whisper Model for Finetuning by @slSeanWU in https://github.com/espnet/espnet/pull/4793
  • Add torchaudio version check for HuBERT pretraining by @simpleoier in https://github.com/espnet/espnet/pull/4872
  • add k2 decoder related scripts for EURO by @DongjiGao in https://github.com/espnet/espnet/pull/4868
  • EURO: small fix (temporarily remove support for nbest_rescoring) by @DongjiGao in https://github.com/espnet/espnet/pull/4875
  • Add description for Whisper ASR in homepage readme by @slSeanWU in https://github.com/espnet/espnet/pull/4877
  • Update README.md by @eltociear in https://github.com/espnet/espnet/pull/4879
  • add explanations to text tokenizing related scripts and remove unused script by @DongjiGao in https://github.com/espnet/espnet/pull/4880
  • update information about source and our modification for k2 related scripts by @DongjiGao in https://github.com/espnet/espnet/pull/4881
  • AphasiaBank ASR recipe by @tjysdsg in https://github.com/espnet/espnet/pull/4860
  • Multilingual SUPERB update by @ftshijt in https://github.com/espnet/espnet/pull/4878
  • ESPnet Unsupervised ASR (EURO project) by @ftshijt in https://github.com/espnet/espnet/pull/4774
  • Support ProDiff in TTS by @Fhrozen in https://github.com/espnet/espnet/pull/4808
  • Add E-Branchformer for GigaSpeech by @pyf98 in https://github.com/espnet/espnet/pull/4882
  • FLEURS - Auxiliary CTC conditioning tasks by @wanchichen in https://github.com/espnet/espnet/pull/4756
  • Add python 3.8 requirement for Whisper & update tests by @slSeanWU in https://github.com/espnet/espnet/pull/4891
  • Update some ASR results in the main readme file by @pyf98 in https://github.com/espnet/espnet/pull/4883
  • Add Conv2dSubsampling1 module and test it in AphasiaBank ASR recipe by @tjysdsg in https://github.com/espnet/espnet/pull/4892
  • Support x-vector extractor based on RawNet by @Takaaki-Saeki in https://github.com/espnet/espnet/pull/4884
  • single language track setups by @DanBerrebbi in https://github.com/espnet/espnet/pull/4895
  • fixing bug deu1 by @DanBerrebbi in https://github.com/espnet/espnet/pull/4900
  • Fix dataprep issues based on updated data release via Google form by @roshansh-cmu in https://github.com/espnet/espnet/pull/4899
  • Add a new EGS2 recipe 'reazonspeech' by @fujimotos in https://github.com/espnet/espnet/pull/4885
  • Update version to 202301 by @kan-bayashi in https://github.com/espnet/espnet/pull/4901

New Contributors

  • @DongjiGao made their first contribution in https://github.com/espnet/espnet/pull/4791
  • @jerryuhoo made their first contribution in https://github.com/espnet/espnet/pull/4689
  • @m-koichi made their first contribution in https://github.com/espnet/espnet/pull/4845
  • @fujimotos made their first contribution in https://github.com/espnet/espnet/pull/4885

Full Changelog: https://github.com/espnet/espnet/compare/v.202211...v.202301

Published by kan-bayashi almost 3 years ago

Software Design and User Interface of ESPnet-SE++ - ESPnet version 202211

What's Changed

  • Update muskits update by @ftshijt in https://github.com/espnet/espnet/pull/4616
  • Muskit installation by @A-Quarter-Mile in https://github.com/espnet/espnet/pull/4617
  • Sync Muskits branch with Master by @ftshijt in https://github.com/espnet/espnet/pull/4640
  • Updates on Muskit Migration by @A-Quarter-Mile in https://github.com/espnet/espnet/pull/4631
  • Update Muskits branch by @ftshijt in https://github.com/espnet/espnet/pull/4662
  • Add stage 5 & stage 6 by @A-Quarter-Mile in https://github.com/espnet/espnet/pull/4649
  • Muskit: rename & reorganize features by @A-Quarter-Mile in https://github.com/espnet/espnet/pull/4668
  • Update Muskits branch by @ftshijt in https://github.com/espnet/espnet/pull/4671
  • Muskits CI fixing by @ftshijt in https://github.com/espnet/espnet/pull/4672
  • Muskits CI fix by @ftshijt in https://github.com/espnet/espnet/pull/4673
  • Muskits - apply isort by @ftshijt in https://github.com/espnet/espnet/pull/4677
  • Muskits CI fix by @ftshijt in https://github.com/espnet/espnet/pull/4678
  • Muskit: Add tokenizer by @A-Quarter-Mile in https://github.com/espnet/espnet/pull/4676
  • Muskits - various fix for CI test by @ftshijt in https://github.com/espnet/espnet/pull/4679
  • Muskit: add recipe ofuton by @A-Quarter-Mile in https://github.com/espnet/espnet/pull/4681
  • Muskits (CI fix) by @ftshijt in https://github.com/espnet/espnet/pull/4682
  • Fix CI issue in muskits by @ftshijt in https://github.com/espnet/espnet/pull/4687
  • Add dns_icassp22 Speech Enhancement Recipe by @slSeanWU in https://github.com/espnet/espnet/pull/4657
  • Singing Voice Synthesis Task for ESPnet by @ftshijt in https://github.com/espnet/espnet/pull/4670
  • Documentation of Tutorial and Muskits by @ftshijt in https://github.com/espnet/espnet/pull/4692
  • Add tests on MacOS and Windows (only installation) by @kamo-naoyuki in https://github.com/espnet/espnet/pull/4669
  • Add missing entries in readme by @ftshijt in https://github.com/espnet/espnet/pull/4699
  • Support ST without texts in source language by @sophia1488 in https://github.com/espnet/espnet/pull/4688
  • Update ConvInput for Transducer by @b-flo in https://github.com/espnet/espnet/pull/4720
  • Small changes for standalone Transducer by @b-flo in https://github.com/espnet/espnet/pull/4722
  • Fix input block tutorial documentation for Transducer by @b-flo in https://github.com/espnet/espnet/pull/4724
  • Fix HF Pytest Errors by @siddhu001 in https://github.com/espnet/espnet/pull/4737
  • Update to puebla-nahuatl recipe (some minor fixes) by @ftshijt in https://github.com/espnet/espnet/pull/4713
  • Add espnet2 TTS recipe on M-AILABS by @Takaaki-Saeki in https://github.com/espnet/espnet/pull/4701
  • Update outdated enh config files by @Emrys365 in https://github.com/espnet/espnet/pull/4719
  • add src_sos & src_eos for mt task to address the index out of range w… by @simpleoier in https://github.com/espnet/espnet/pull/4736
  • Add g2pk_explicit_space tokenizer by @jonghwanhyeon in https://github.com/espnet/espnet/pull/4718
  • Fix JETS inference with GST (#4743) by @kan-bayashi in https://github.com/espnet/espnet/pull/4744
  • Update on Muskit by @A-Quarter-Mile in https://github.com/espnet/espnet/pull/4700
  • add fleurs conformer+sc-ctc results by @wanchichen in https://github.com/espnet/espnet/pull/4746
  • Add recipe for OCR task on IAM handwriting dataset by @kenzheng99 in https://github.com/espnet/espnet/pull/4707
  • Add Talromur2 recipe by @G-Thor in https://github.com/espnet/espnet/pull/4680
  • Add multi-channel enh_asr for CHiME-4 by @YoshikiMas in https://github.com/espnet/espnet/pull/4706
  • chunk_mask error by @aky15 in https://github.com/espnet/espnet/pull/4751
  • fix wav2vec2 encoder mask bug by @simpleoier in https://github.com/espnet/espnet/pull/4772
  • Add Hugging Face Transformers Decoder, Tokenizer and their example on SLURP by @akreal in https://github.com/espnet/espnet/pull/4099
  • [Recipe PR] MELD: Multimodal EmotionLines Dataset by @realzza in https://github.com/espnet/espnet/pull/4771
  • MultiIRIS follow up by @YoshikiMas in https://github.com/espnet/espnet/pull/4765
  • Add CATSLU results for XLS-R with mBART-50 by @akreal in https://github.com/espnet/espnet/pull/4782
  • Add MEDIA and PortMEDIA results for XLS-R with mBART-50 by @akreal in https://github.com/espnet/espnet/pull/4794
  • Add SLUE-VoxPopuli results for WavLM with mBART-50 by @akreal in https://github.com/espnet/espnet/pull/4777
  • Follow up for SLURP and CATSLU by @akreal in https://github.com/espnet/espnet/pull/4796
  • Update README in chime4/enh_asr1 by @YoshikiMas in https://github.com/espnet/espnet/pull/4795
  • fix parsing token_list by @imdanboy in https://github.com/espnet/espnet/pull/4778
  • Use torchaudio functions for beamforming related operations in torch 1.12.1+ by @Emrys365 in https://github.com/espnet/espnet/pull/4638
  • PIT E2E multi-speaker ASR and librimix recipe by @simpleoier in https://github.com/espnet/espnet/pull/4753
  • Fix an audio format issue in some enh recipes by @YoshikiMas in https://github.com/espnet/espnet/pull/4799
  • Fixing How2-2000h Data preparation and Seq Length Assert for Longformer Encoder by @roshansh-cmu in https://github.com/espnet/espnet/pull/4805
  • Adding MFA scripts for LJSpeech by @iamanigeeit in https://github.com/espnet/espnet/pull/4801
  • fix typo in espnet2_tutorial.md by @eltociear in https://github.com/espnet/espnet/pull/4811
  • [WIP] E-Branchformer Encoder in ESPnet2 by @kkim-asapp in https://github.com/espnet/espnet/pull/4812
  • Muskit update by @A-Quarter-Mile in https://github.com/espnet/espnet/pull/4783

New Contributors

  • @A-Quarter-Mile made their first contribution in https://github.com/espnet/espnet/pull/4617
  • @sophia1488 made their first contribution in https://github.com/espnet/espnet/pull/4688
  • @kenzheng99 made their first contribution in https://github.com/espnet/espnet/pull/4707
  • @realzza made their first contribution in https://github.com/espnet/espnet/pull/4771
  • @iamanigeeit made their first contribution in https://github.com/espnet/espnet/pull/4801
  • @eltociear made their first contribution in https://github.com/espnet/espnet/pull/4811
  • @kkim-asapp made their first contribution in https://github.com/espnet/espnet/pull/4812

Full Changelog: https://github.com/espnet/espnet/compare/v.202209...v.202211

Scientific Software - Peer-reviewed - Python
Published by kan-bayashi about 3 years ago

Software Design and User Interface of ESPnet-SE++ - ESPnet version 202209

What's Changed

  • Add dynamic mixing in the speech separation task. by @LiChenda in https://github.com/espnet/espnet/pull/4387
  • Added test script and usage for calculate_rtf.py script to ESPnet2 tutorial page by @espnetUser in https://github.com/espnet/espnet/pull/4560
  • Offline/Online (standalone) ESPnet2 Transducer by @b-flo in https://github.com/espnet/espnet/pull/4479
  • Unfix matplotlib version by @kamo-naoyuki in https://github.com/espnet/espnet/pull/4576
  • use torch.finfo for dtype other than float by @wenzhe-nrv in https://github.com/espnet/espnet/pull/4584
  • Update recipe for slurp-entity by @ftshijt in https://github.com/espnet/espnet/pull/4585
  • Egs2 aesrc by @brianyan918 in https://github.com/espnet/espnet/pull/4592
  • update checks for bias in initialization by @LiChenda in https://github.com/espnet/espnet/pull/4574
  • [WIP] Update to fit the recent update in s3prl. by @simpleoier in https://github.com/espnet/espnet/pull/4593
  • Unfix numpy version by @kamo-naoyuki in https://github.com/espnet/espnet/pull/4598
  • Update to fit the recent update in s3prl. by @simpleoier in https://github.com/espnet/espnet/pull/4600
  • Add improved results on FLEURS dataset by @wanchichen in https://github.com/espnet/espnet/pull/4596
  • Update mp4towav.sh by @jaehyun-ko in https://github.com/espnet/espnet/pull/4605
  • Pass output_dir as str to wandb.init() by @jonghwanhyeon in https://github.com/espnet/espnet/pull/4607
  • Support enh_s2t joint training on multi-speaker data by @Emrys365 in https://github.com/espnet/espnet/pull/4566
  • Add ASR results for commonvoice zh_TW by @slSeanWU in https://github.com/espnet/espnet/pull/4612
  • Fix both utt2sid and utt2lid when removing long/short data by @jonghwanhyeon in https://github.com/espnet/espnet/pull/4609
  • recipe config update by @ftshijt in https://github.com/espnet/espnet/pull/4621
  • Add pytorch=1.12.1 to CI configurations by @kamo-naoyuki in https://github.com/espnet/espnet/pull/4604
  • New SLU task by @siddhu001 in https://github.com/espnet/espnet/pull/4569
  • Joss paper: Software Design and User Interface of ESPnet-SE++: Speech Enhancement for Robust Speech Processing by @neillu23 in https://github.com/espnet/espnet/pull/4620
  • Update conformer result of AMI corpus by @teinhonglo in https://github.com/espnet/espnet/pull/4629
  • Offline/Online Branchformer Transducer by @b-flo in https://github.com/espnet/espnet/pull/4582
  • Change to install numba using pip instead of conda by @kamo-naoyuki in https://github.com/espnet/espnet/pull/4637
  • Add MixIT support. It is unsupervised only. Semi-supervised config is not available for now. by @simpleoier in https://github.com/espnet/espnet/pull/4619
  • Add 2-pass SLU code for FSC Challenge by @siddhu001 in https://github.com/espnet/espnet/pull/4636
  • CI fix and some other minor recipe fixes by @ftshijt in https://github.com/espnet/espnet/pull/4656
  • Update the title of plots to be y-label vs x-label by @pyf98 in https://github.com/espnet/espnet/pull/4647
  • Update VIVOS download link by @hieuthi in https://github.com/espnet/espnet/pull/4644
  • Add ASR recipe of MAGICDATA mandarin read speech by @tjysdsg in https://github.com/espnet/espnet/pull/4635
  • Amend to CI fix by @ftshijt in https://github.com/espnet/espnet/pull/4663
  • qasr update by @massabaali7 in https://github.com/espnet/espnet/pull/4642
  • Open_li110 for large-scale multilingual speech by @ftshijt in https://github.com/espnet/espnet/pull/4408
  • Fix the path of calculate_rtf.py by @sw005320 in https://github.com/espnet/espnet/pull/4660
  • Fix importlib-metadata version by @kan-bayashi in https://github.com/espnet/espnet/pull/4686
  • Cmu arctic tts pretrain finetune by @soumimaiti in https://github.com/espnet/espnet/pull/4456
  • updated version to 202209 by @kan-bayashi in https://github.com/espnet/espnet/pull/4685

New Contributors

  • @wenzhe-nrv made their first contribution in https://github.com/espnet/espnet/pull/4584
  • @jaehyun-ko made their first contribution in https://github.com/espnet/espnet/pull/4605
  • @jonghwanhyeon made their first contribution in https://github.com/espnet/espnet/pull/4607
  • @slSeanWU made their first contribution in https://github.com/espnet/espnet/pull/4612
  • @massabaali7 made their first contribution in https://github.com/espnet/espnet/pull/4642
  • @soumimaiti made their first contribution in https://github.com/espnet/espnet/pull/4456

Full Changelog: https://github.com/espnet/espnet/compare/v.202207...v.202209

Published by kan-bayashi about 3 years ago

Software Design and User Interface of ESPnet-SE++ - ESPnet version 202207

New Features

  • [New Features][ESPnet1][ASR] Add DDP support for v1 ASR training. #4430 by @lazykyama
  • [New Features][ESPnet2] Support tensorboard graph #4418 by @kamo-naoyuki
  • [New Features][ESPnet2][ASR] Branchformer Encoder in ESPnet2 #4400 by @pyf98
  • [New Features][ESPnet2][Diarization][SE] enh_diar joint model #4339 by @YushiUeda
  • [New Features][ESPnet2][ESPnet1] Calculate RTF and latency in espnet2 #4382 by @espnetUser
  • [New Features][ESPnet2][ESPnet1][SE] Add EnhPreprocessor for Speech Enhancement #4321 by @Emrys365
  • [New Features][ESPnet2][SE] Add DPTNet and WarmupStepLR scheduler #4449 by @Emrys365
  • [New Features][ESPnet2][SE] Add support for calculating losses on noise and dereverberated signals #4476 by @Emrys365

Recipe

  • [Recipe][ESPnet2] Aishell-2 GPU info #4501 by @jctian98
  • [Recipe][ESPnet2] Fix librispeech default path to signify auto download #4517 by @karthik19967829
  • [Recipe][ESPnet2] Recipe fix for PueblaNahuatl Recipe #4522 by @ftshijt
  • [Recipe][ESPnet2][ASR][README] Add Aishell-2 ASR Recipe for Espnet2 #4451 by @jctian98
  • [Recipe][ESPnet2][ASR][README] Add AmericasNLP 2022 baselines #4428 by @akreal
  • [Recipe][ESPnet2][ESPnet1][ASR][Installation] FLEURS ASR Recipe for ESPnet2 #4455 by @wanchichen
  • [Recipe][ESPnet2][ESPnet1][ASR][README] tedxspanishcorpus egs2 recipe #4523 by @jessicah25
  • [Recipe][ESPnet2][ESPnet1][ASR][SE] Adding L3DAS22 Task1 model to ESPNet-SE #3994 by @popcornell
  • [Recipe][ESPnet2][ESPnet1][ST] Must_C v1 and v2 in egs2 #4306 by @brianyan918
  • [Recipe][ESPnet2][README] Dcase task1 Baseline #4317 by @siddhu001
  • [Recipe][ESPnet2][README] Report Aishell-2 Transducer results #4489 by @jctian98
  • [Recipe][ESPnet2][README] Update language codes in AmericasNLP 2022 baseline #4441 by @akreal
  • [Recipe][ESPnet2][README] Vox populi baseline #4478 by @siddhu001
  • [Recipe][ESPnet2][SE] L3DAS22 enhancement recipe #4269 by @neillu23
  • [Recipe][ESPnet2][SE] Update notes in the recipes for DNS challenges #4433 by @YoshikiMas
  • [Recipe][ESPnet2][SE][SLU][ST] LT-Spatialized and SLURP-Spatialized combined enhancement recipe #4268 by @neillu23
  • [Recipe][ESPnet2][ST] Add moses check for ST recipes #4417 by @ftshijt
  • [Recipe][ESPnet2][TTS] Add talromur recipe #4379 by @G-Thor
  • [Recipe][ESPnet2][TTS] Fix for issue #4401 #4402 by @G-Thor
  • [Recipe][ESPnet2][TTS] add pre-trained model jets in the recipe of ljspeech, kss #4406 by @imdanboy

Bugfix

  • [Bugfix][ESPnet1] fix the corrupted pretrained model #4490 by @wentaoxandry
  • [Bugfix][ESPnet1][ESPnet2] Fix an4 URL #4427 by @pyf98
  • [Bugfix][ESPnet1][ESPnet2][RNNT] Fix mAES with big vocab size #4312 by @b-flo
  • [Bugfix][ESPnet2] Adding __init__.py to espnet2/diar/layers and espnet2/diar/separator #4470 by @cycentum
  • [Bugfix][ESPnet2] Fix tensorboard-graph creation for multi gpu mode #4431 by @kamo-naoyuki
  • [Bugfix][ESPnet2] Update char_tokenizer.py #4499 by @xiabingquan
  • [Bugfix][ESPnet2][ESPnet1][ASR][LM][MT][TTS] Fix Transducer LM fusion and add Logging for Transducer inference #4327 by @chintu619
  • [Bugfix][ESPnet2][SE] Fix a bug in enh unit test #4435 by @Emrys365

Enhancement

  • [Enhancement][ESPnet2] Optionize graph creation #4551 by @kan-bayashi
  • [Enhancement][ESPnet2][Installation][TTS] Add icelandic g2p #4384 by @G-Thor
  • [Enhancement][ESPnet2][SE] Add support of test-only criterions after each epoch #4381 by @Emrys365
  • [Enhancement][ESPnet2][SSL] raise more useful error in espnet2/asr/frontend/s3prl.py if s3prl is not installed #4480 by @popcornell
  • [Enhancement][ESPnet2][TTS] Add JETS AlignmentModule in calculate_all_attentions.py #4446 by @seastar105

Refactoring

  • [Refactoring][ESPnet1] Refactoring 'is_prefix' function #4530 by @jhlee9010
  • [Refactoring][ESPnet2][ASR] Zero_infinity option for ctc loss #4415 by @kamo-naoyuki

Others

  • [CI][ESPnet1][ESPnet2][Installation] Remove the version restriction for numpy #4419 by @kamo-naoyuki
  • [CI][ESPnet2] Changed to install espnet from wheel in the test_import CI test #4471 by @kamo-naoyuki
  • [CI][Installation] Temporarily fixed numpy version #4464 by @kamo-naoyuki
  • [Documentation] Add notes on batch size and num of GPUs in ESPnet2 documentation #4436 by @pyf98
  • [Documentation][ESPnet1] Update decoder.py #4322 by @sw005320
  • [Documentation][ESPnet2] Add a note to follow the installation instructions #4477 by @akreal

Acknowledgements

Special thanks to @Emrys365, @G-Thor, @YoshikiMas, @YushiUeda, @akreal, @b-flo, @brianyan918, @chintu619, @cycentum, @espnetUser, @ftshijt, @imdanboy, @jctian98, @jessicah25, @jhlee9010, @kamo-naoyuki, @kan-bayashi, @karthik19967829, @lazykyama, @neillu23, @popcornell, @pyf98, @seastar105, @siddhu001, @sw005320, @wanchichen, @wentaoxandry, @xiabingquan.

Published by kan-bayashi over 3 years ago

Software Design and User Interface of ESPnet-SE++ - ESPnet version 202205

New Features

  • [New Features][ESPnet1][ESPnet2][ASR] Add quantization in ESPnet2 for asr inference #4349 by @pyf98
  • [New Features][ESPnet2][SE] Add svoice recipe for wsj0-2mix speech separation #4257 by @nateanl
  • [New Features][ESPnet2][SE] Merge Deep Clustering and Deep Attractor Network to enh separator #4110 by @earthmanylf
  • [New Features][ESPnet2][SE] Some improvements to current enh functions #4251 by @Emrys365
  • [New Features][ESPnet2][SE][Installation] Import fast_bss_eval and update some time-domain losses for enh task #4256 by @LiChenda
  • [New Features][ESPnet2][TTS] add e2e tts model: JETS #4364 by @imdanboy

Bugfix

  • [Bugfix][ESPnet1] Fix minimum input length for Conv2dSubsampling2 in check_short_utt #4378 by @akreal
  • [Bugfix][ESPnet1][ESPnet2] Minor fixes for the intermediate loss usage and Mask-CTC decoding #4374 by @YosukeHiguchi
  • [Bugfix][ESPnet2] Fix #4396 #4398 by @kamo-naoyuki
  • [Bugfix][ESPnet2] Fix a bug in utterance_mvn #4304 by @Emrys365
  • [Bugfix][ESPnet2] Minor fix for Mask-CTC forward function #4347 by @YosukeHiguchi
  • [Bugfix][ESPnet2] Wandb Minor Fix for Model Resume #4329 by @roshansh-cmu
  • [Bugfix][ESPnet2] fix the enh_s2t_task argument in espnet2/bin/st_inference.py #4323 by @simpleoier
  • [Bugfix][ESPnet2][MT][ST] fix bug in mt/st templates for having separate token lists #4149 by @brianyan918
  • [Bugfix][ESPnet2][Recipe] Fix aishell3 data preparation script #4277 by @LanceaKing
  • [Bugfix][ESPnet2][SE] Fix a bug in stats aggregation when PITSolver is used #4343 by @Emrys365
  • [Bugfix][ESPnet2][SE] fix for enhancement model loading compatibility #4259 by @LiChenda
  • [Bugfix][ESPnet2][ST] bug fixes in ST recipes #4341 by @chintu619
  • [Bugfix][ESPnet2][TTS] Fix optional data names for TTS #4355 by @kan-bayashi
  • [Bugfix][ESPnet2][TTS] fix a bug in Mandarin pypinyin_g2p_phone #4206 by @WeiGodHorse
  • [Bugfix][ESPnet2][TTS] fix loss = NaN in VITS with mixed precision #4356 by @kan-bayashi
  • [Bugfix][ESPnet2][streaming] Add unit test to streaming ASR inference #4352 by @espnetUser
  • [Bugfix][Installation] fix s3prl install by using legacy version. Temporary solution. #4399 by @simpleoier
  • [Bugfix][README] Fix typo #4338 by @ftshijt

Enhancement

  • [Enhancement][ESPnet1][ESPnet2][ASR][SE][SLU][ST] enh_s2t joint model #4226 by @simpleoier
  • [Enhancement][ESPnet2] Add progress bar to phonemization #4320 by @G-Thor
  • [Enhancement][ESPnet2][MT] Update show_translation_result.sh to show all decoding results under the given exp directory #4330 by @pyf98

Recipe

  • [Recipe][ESPnet1][ASR] Accented English Speech Recognition Challenge 2020 recipe (AESRC2020) #3898 by @brianyan918
  • [Recipe][ESPnet1][ESPnet2][ASR][README][Recipe] Add MediaSpeech ASR recipe #4183 by @AshibaWu
  • [Recipe][ESPnet2][ASR][README] recipe for Microsoft speech corpus for Indian Languages #4191 by @navya-yarrabelly
  • [Recipe][ESPnet2][ASR][README] Accented French Openslr57 ASR recipe (ESPnet2) (part of Homework3 MNLP) #4280 by @DanBerrebbi
  • [Recipe][ESPnet2][ASR][README] Add Mask-CTC results #4180 by @YosukeHiguchi
  • [Recipe][ESPnet2][ASR][README] Add ml_openslr63 ASR recipe #4173 by @bharaniuk
  • [Recipe][ESPnet2][ASR][README] Adding new recipe for Burmese (OpenSLR80) #4182 by @JainSameer06
  • [Recipe][ESPnet2][ASR][README] add chime6 recipe #4332 by @simpleoier
  • [Recipe][ESPnet2][ASR][SE][README] add egs2/chime4/enh_asr1 recipe and results #4316 by @simpleoier
  • [Recipe][ESPnet2][README][RNNT] updated librispeech-asr with rnn-t results #4281 by @chintu619
  • [Recipe][ESPnet2][README][SE] 2021 Clarity Challenge recipe #4210 by @popcornell
  • [Recipe][ESPnet2][README][SE] Add AISHELL-4 ENH recipe #4249 by @Emrys365
  • [Recipe][ESPnet2][README][SE] Add ConferencingSpeech 2021 recipe to egs2 #4192 by @Emrys365
  • [Recipe][ESPnet2][README][SE] Add ICASSP2021 DNS Challenge 2 recipe #4253 by @YoshikiMas
  • [Recipe][ESPnet2][README][SE] Add INTERSPEECH 2021 DNS Challenge 3 recipe #4238 by @YoshikiMas
  • [Recipe][ESPnet2][README][SE] Add results of ICASSP2021 DNS Challenge 2 recipe #4309 by @YoshikiMas
  • [Recipe][ESPnet2][README][SE] Rename egs2/clarity21/enh_2021 to egs2/clarity21/enh1 #4328 by @Emrys365
  • [Recipe][ESPnet2][README][SE] add convtasnet recipe for dns_ins20 #4314 by @muqiaoy
  • [Recipe][ESPnet2][README][SLU] Harpervalley recipe #4315 by @YushiUeda
  • [Recipe][ESPnet2][README][SLU] SLUE Voxpopuli base recipe #4262 by @siddhu001
  • [Recipe][ESPnet2][README][ST] CoVOST2 recipes #4300 by @ftshijt
  • [Recipe][ESPnet2][SLU][README] Update SLU results for ICASSP #4283 by @siddhu001

Others

  • [CI][Docker] Github Action Trigger Docker Build #4295 by @Fhrozen
  • [CI][Docker] Github Action for Docker build #4219 by @Fhrozen
  • [CI][ESPnet1][ESPnet2][Installation][README] Add isort checking to the CI tests #4372 by @kamo-naoyuki
  • [CI][ESPnet1][ESPnet2][Installation][README][mergify] Add pytorch=1.10.2 and 1.11.0 to ci configurations #4348 by @kamo-naoyuki
  • [CI][ESPnet2][ASR][SE] add integration test and fix the decoding in enh_asr and enh_st #4310 by @simpleoier
  • [CI][ESPnet2][New Features][SLU][ST][streaming] Add streaming ST/SLU #4243 by @D-Keqi
  • [CI][ESPnet2][ST] Add Test Functions for ST Train and Inference #4324 by @ftshijt
  • [CI][Installation] update install_pesq.sh #4265 by @LiChenda
  • [Documentation][ESPnet2][README][TTS] Minor update for JETS #4369 by @kan-bayashi
  • [Documentation][README] Change the order of README #4289 by @ftshijt
  • [Documentation][README] Update README.md #4284 by @sw005320

Acknowledgements

Special thanks to @AshibaWu, @D-Keqi, @DanBerrebbi, @Emrys365, @Fhrozen, @G-Thor, @JainSameer06, @LanceaKing, @LiChenda, @WeiGodHorse, @YoshikiMas, @YosukeHiguchi, @YushiUeda, @akreal, @bharaniuk, @brianyan918, @chintu619, @earthmanylf, @espnetUser, @ftshijt, @imdanboy, @kamo-naoyuki, @kan-bayashi, @muqiaoy, @nateanl, @navya-yarrabelly, @popcornell, @pyf98, @roshansh-cmu, @siddhu001, @simpleoier, @sw005320.

Published by kan-bayashi over 3 years ago

Software Design and User Interface of ESPnet-SE++ - ESPnet Version 202204

News

Starting with this version, ESPnet uses date-based versioning, e.g., v.202204.
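
Because older tags such as v.0.10.6 and date-based tags such as v.202204 now coexist, tooling that orders releases needs to handle both forms. The following is a minimal sketch, not part of ESPnet: the helper name `version_key` and the assumption that date-based tags are always `v.YYYYMM` are illustrative only.

```python
import re

# Hypothetical helper (not an ESPnet API): assumes date-based tags look like
# "v.YYYYMM" and older tags look like "v.MAJOR.MINOR.PATCH".
_DATE_TAG = re.compile(r"^(\d{4})(\d{2})$")


def version_key(tag):
    """Return a tuple that sorts old-style tags before date-based tags."""
    body = tag[2:] if tag.startswith("v.") else tag
    match = _DATE_TAG.match(body)
    if match:
        # Date-based scheme: leading 1 places it after every old-style tag.
        return (1, int(match.group(1)), int(match.group(2)))
    # Old-style scheme: leading 0, then the numeric components.
    return (0,) + tuple(int(part) for part in body.split("."))


tags = ["v.202207", "v.0.10.6", "v.202204", "v.0.10.5", "v.202211"]
ordered = sorted(tags, key=version_key)
print(ordered)
```

The leading scheme marker in the tuple keeps the ordering explicit rather than relying on the date number happening to be larger than any old major version.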

New Features

  • [New Features][ESPnet1] added learnable fourier features #4029 by @popcornell
  • [New Features][ESPnet1][ESPnet2][ASR] Restricted Self Attention for E2E Speech Summarization #4071 by @roshansh-cmu
  • [New Features][ESPnet1][Installation][README] add lrs avsr recipe #4104 by @wentaoxandry
  • [New Features][ESPnet1][README] add lip reading sentences dataset code #4074 by @wentaoxandry
  • [New Features][ESPnet2][ASR] [ESPnet2] Intermediate/Self-conditioned CTC #4084 by @YosukeHiguchi
  • [New Features][ESPnet2][ASR] [WIP] [ESPnet2] Mask-CTC #4158 by @YosukeHiguchi
  • [New Features][ESPnet2][ASR][README] Add stochastic depth to conformer and share results on LibriSpeech 960h #4142 by @pyf98
  • [New Features][ESPnet2][MT] MT task for espnet2 with IWSLT14 recipe #4111 by @siddalmia
  • [New Features][ESPnet2][README][SE] Add DC-CRN complex masking and spectral mapping approach for speech enhancement #4127 by @Emrys365
  • [New Features][ESPnet2][README][SE] Add DCCRN separator #4097 by @Johnson-Lsx
  • [New Features][ESPnet2][README][SE] Add a new separator for speech enhancement/separation tasks #4062 by @LiChenda
  • [New Features][ESPnet2][README][SE] Add iFaSNet for enhancement/separation tasks. #4130 by @LiChenda
  • [New Features][ESPnet2][SE] Refactor DNN_Beamformer in espnet2 and add new beamformers #4082 by @Emrys365

Enhancement

  • [Enhancement][ESPnet2] Add an optional suffix to the averaged model file name #4067 by @pyf98
  • [Enhancement][ESPnet2] Update perturb_data_dir_speed.sh #4091 by @AmirHussein96
  • [Enhancement][ESPnet2][ASR] Add tests for Intermediate/Self-conditioned CTC #4117 by @YosukeHiguchi
  • [Enhancement][ESPnet2][TTS] Add option to use norm. feats over denorm. #4250 by @G-Thor

Recipe

  • [Recipe][ESPnet1][RNNT] [ESPNET1] Add the results of conformer-transducer for Librispeech #4080 by @eesungkim
  • [Recipe][ESPnet2][ASR] Add ASR recipe for VCTK dataset based on TTS's dataprep. #4088 by @kashikashi
  • [Recipe][ESPnet2][ASR] Add new conformer config with hop length 160 for LibriSpeech 960h #4162 by @pyf98
  • [Recipe][ESPnet2][ASR] Add new zh_openslr38 ASR recipe #4181 by @cuichenx
  • [Recipe][ESPnet2][ASR] Add transformer results for LibriSpeech 100h #4089 by @pyf98
  • [Recipe][ESPnet2][ASR] Added Marathi OpenSLR 64 recipe #4179 by @SujaySKumar
  • [Recipe][ESPnet2][ASR] Added recipe for Microsoft Speech Corpus (Indian languages) #4194 by @chintu619
  • [Recipe][ESPnet2][ASR] Automatic lyric recognition Recipe #4129 by @ftshijt
  • [Recipe][ESPnet2][ASR] ESPNET - LRS3 Recipe #4101 by @gdebayan
  • [Recipe][ESPnet2][ASR] bengali asr model with no finetuning #4047 by @dzeinali
  • [Recipe][ESPnet2][MT] IWSLT'14 Results using ESPnet2-MT #4132 by @pyf98
  • [Recipe][ESPnet2][README] Mandarin ISO id should be CMN instead of ZHO #4125 by @xinjli
  • [Recipe][ESPnet2][README] Update README.md #4037 by @dzeinali
  • [Recipe][ESPnet2][README] Update README.md #4121 by @dzeinali
  • [Recipe][ESPnet2][README] Update README.md for How2 2000h ASR,SUM #4155 by @roshansh-cmu
  • [Recipe][ESPnet2][RNNT] Create decode_rnnt_conformer.yaml #4058 by @sw005320
  • [Recipe][ESPnet2][RNNT] Create train_rnnt_conformer.yaml #4057 by @sw005320
  • [Recipe][ESPnet2][SLU] Add IEMOCAP results and configs #4100 by @YushiUeda
  • [Recipe][ESPnet2][SLU] Add new config and support for computing WER in SLUE-VoxCeleb #4152 by @siddhu001
  • [Recipe][ESPnet2][SLU] Add sentiment data preparation for IEMOCAP #4065 by @YushiUeda
  • [Recipe][ESPnet2][SLU] ESPnet2 swbd_sentiment recipe #4134 by @YushiUeda
  • [Recipe][ESPnet2][ST] egs2/iwslt22_dialect #4013 by @brianyan918

Bugfix

  • [Bugfix][CI][ESPnet2] Fix CI test failures related to torch_complex 0.4.0 #4112 by @Emrys365
  • [Bugfix][CI][Installation] fix doc ci by pinning jinja version #4239 by @xinjli
  • [Bugfix][ESPnet2] Fix n-gram decoding #4168 by @sw005320
  • [Bugfix][ESPnet2] bug fixes and efficient train/dev split in data prep of Microsoft Indian Languages recipe #4196 by @chintu619
  • [Bugfix][ESPnet2] fix errors in configs of librispeech ssl frontends #4098 by @simpleoier
  • [Bugfix][ESPnet2][ASR][ST] [bug patch] egs2/iwslt22_dialect #4049 by @brianyan918
  • [Bugfix][ESPnet2][MT][ST] Fix joint tokenization in st.sh #4143 by @pyf98
  • [Bugfix][ESPnet2][MT][ST] scoring fixes MT and ST #4146 by @siddalmia
  • [Bugfix][ESPnet2][TTS] Fix speaker normalization #4229 by @LanceaKing
  • [Bugfix][Installation] set gtn version #4122 by @brianyan918
  • [Bugfix][ESPnet1][ESPnet2] minor fixes in ST in espnet2 #4056 by @siddalmia

Others

  • [CI] Simplify vocoder compatibility test #4061 by @kan-bayashi
  • [CI][Documentation] Fix notebook in the official doc. #4171 by @ShigekiKarita
  • [Docker] Docker Updates #4064 by @Fhrozen
  • [Documentation] Add a checklist for PRs on recipe #4053 by @ftshijt
  • [Documentation] README Update for E2E Speech Summarization #4071 #4150 by @roshansh-cmu
  • [Documentation] Update the example PyTorch version in Installation doc #4116 by @pyf98
  • [Documentation] [documentation] fix minor typo in installation.md #4164 by @JDongian
  • [Documentation][ESPnet1] fix typo #4044 by @ooyamatakehisa
  • [Documentation][ESPnet1][ESPnet2][ASR] Add Huggingface-cli usage #4027 by @karthik19967829

Acknowledgements

Special thanks to @AmirHussein96, @Emrys365, @Fhrozen, @G-Thor, @JDongian, @Johnson-Lsx, @LanceaKing, @LiChenda, @ShigekiKarita, @SujaySKumar, @YosukeHiguchi, @YushiUeda, @brianyan918, @chintu619, @cuichenx, @dzeinali, @eesungkim, @ftshijt, @gdebayan, @kan-bayashi, @karthik19967829, @kashikashi, @ooyamatakehisa, @popcornell, @pyf98, @roshansh-cmu, @siddalmia, @siddhu001, @simpleoier, @sw005320, @wentaoxandry, @xinjli.

Published by kan-bayashi over 3 years ago

Software Design and User Interface of ESPnet-SE++ - ESPnet Version 0.10.6

New Features

  • [New Features][ESPnet2][TTS][Installation][README] [TTS] Support python-based toolkit for xvector extractors #4016 by @Fhrozen
  • [New Features][ESPnet2] Add SpecAug2 which supports variable maximum width in time masking #3902 by @pyf98

Recipe

  • [Recipe][ESPnet1][ASR] Add librispeech-100h recipe #3997 by @YosukeHiguchi
  • [Recipe][ESPnet1][ASR] Update egs/librispeech_100 #4036 by @YosukeHiguchi
  • [Recipe][ESPnet2][ASR][README] Scoring Mandarin / English separately for the SEAME corpus #3976 by @vectominist
  • [Recipe][ESPnet2][ASR][README] update LibriSpeech Pretrained models with SSLRs: results and huggingf… #3979 by @simpleoier
  • [Recipe][ESPnet2][ASR][README][ST] Speech translation framework (merging into master) #3987 by @ftshijt
  • [Recipe][ESPnet2][ASR][TTS] Update two recipes (googlei18n and hub4_spanish) #3895 by @ftshijt
  • [Recipe][ESPnet2][SLU][README] updated the results of Slue voxceleb #3929 by @siddhu001
  • [Recipe][ESPnet2][ST] Update the default setting for st #3993 by @ftshijt

Bugfix

  • [Bugfix][ESPnet1][RNNT] Fix bug for Conformer-T #4020 by @YosukeHiguchi
  • [Bugfix][ESPnet2][Diarization] Diarization: fix for convolutional input layer in the encoder #3957 by @alumae
  • [Bugfix][ESPnet2][Diarization] Two fixes to diarization evaluation scripts #3938 by @alumae
  • [Bugfix][ESPnet2][Diarization][Recipe] Fix issues in EEND-EDA & add Librimix_diar recipe #3900 by @YushiUeda
  • [Bugfix][ESPnet2][ESPnet1][ASR][streaming] streaming conformer bugfix #4025 by @jeon30c
  • [Bugfix][ESPnet2][LM] Bugfix for espnet2 ngram #4002 by @yaochie
  • [Bugfix][ESPnet2][RNNT] espnet2 asr inference bugfix for transducer #3943 by @jeon30c
  • [Bugfix][ESPnet2][ST] Bugfix for ST scoring #3972 by @ftshijt

Enhancement

  • [Enhancement][ESPnet2] cleaned tensorboard and stats logging for espnet2 #3910 by @siddalmia
  • [Enhancement][ESPnet2][Diarization] Add test codes for diarization #3953 by @YushiUeda
  • [Enhancement][ESPnet2][streaming] Add reference for streaming ASR #4014 by @D-Keqi

Others

  • [CI] remove the support of pytorch 1.3.1 #4038 by @sw005320
  • [CI][ESPnet1][ESPnet2] fix ci for librosa update #4043 by @ftshijt
  • [CI][Installation] Fix numpy version #3965 by @kan-bayashi
  • [CI][Installation] temporarily fixed pypinyin version #3995 by @kan-bayashi
  • [Documentation][ESPnet1][ESPnet2][README][SLU] Add Sinhala E2E SLU Recipe #3890 by @karthik19967829
  • [Documentation][README] Update README.md #4039 by @sw005320
  • [ESPnet2][README] Update README.md #3931 by @sw005320
  • [ESPnet2][README][TTS][Typo] Fix typo in README.md #4024 by @kan-bayashi

Acknowledgements

Special thanks to @D-Keqi, @Fhrozen, @YosukeHiguchi, @YushiUeda, @alumae, @ftshijt, @jeon30c, @kan-bayashi, @karthik19967829, @pyf98, @siddalmia, @siddhu001, @simpleoier, @sw005320, @vectominist, @yaochie.

Full Changelog

https://github.com/espnet/espnet/compare/v.0.10.5...v.0.10.6

Published by kan-bayashi almost 4 years ago

Software Design and User Interface of ESPnet-SE++ - ESPnet Version 0.10.5

New Features

  • [New Features][ESPnet1][ASR] Implement self-conditioned CTC #3856 by @komatta-san
  • [New Features][ESPnet2][ASR][CI][Installation] GTN CTC for ESPnet2 #3778 by @brianyan918
  • [New Features][ESPnet2][ASR][Refactoring] [ESPnet2] Transducer #2533 by @b-flo
  • [New Features][ESPnet2][README][Recipe] Frontends fusion (any type, any number, linear fusion only for now) for ASR in espnet2 #3824 by @DanBerrebbi
  • [New Features][ESPnet2][SE] Refactor loss computation in enhancement tasks. #3838 by @LiChenda

Recipe

  • [Recipe][ESPnet1][ESPnet2][ASR][README] updated the results of aidatatang_200zh #3925 by @sw005320
  • [Recipe][ESPnet1][VC] Various fixes of voice conversion recipes #3800 by @unilight
  • [Recipe][ESPnet2][ASR][README] Expanding egs2 of Tedlium2 #3795 by @D-Keqi
  • [Recipe][ESPnet2][ASR][README] Update an4 config #3913 by @pyf98
  • [Recipe][ESPnet2][ASR][README] aidatatang_200zh recipe #3892 by @sw005320
  • [Recipe][ESPnet2][README] Update README.md #3881 by @daisylab
  • [Recipe][ESPnet2][README] Update egs2/TEMPLATE/README.md #3793 by @kamo-naoyuki
  • [Recipe][ESPnet2][README] fix readme #3827 by @seastar105
  • [Recipe][ESPnet2][README][Recipe] Add ASR Recipe: Primewords_Chinese #3903 by @pyf98
  • [Recipe][ESPnet2][README][Recipe] Update MISP challenge ASR baseline and add AVSR baseline #3819 by @neillu23
  • [Recipe][ESPnet2][README][SLU] Fsc Maseeval scripts #3769 by @siddhu001
  • [Recipe][ESPnet2][README][SLU] Update Google Speechcommands (SLU recipe) #3915 by @pyf98
  • [Recipe][ESPnet2][README][TTS] ESPnet2 ARCTIC TTS #3791 by @peter-yh-wu
  • [Recipe][ESPnet2][README][TTS] Update README and add missing config #3917 by @kan-bayashi
  • [Recipe][ESPnet2][Recipe][SLU] Slue voxceleb Sentiment Analysis #3894 by @siddhu001
  • [Recipe][ESPnet2][SE] modified data type in enh.sh #3768 by @simpleoier

Bugfix

  • [Bugfix][ESPnet1][README][RNNT] Fix cache for Transducer search strategies + doc #3869 by @b-flo
  • [Bugfix][ESPnet1][RNNT] Fix recombine_hyps #3908 by @b-flo
  • [Bugfix][ESPnet1][RNNT] fix rnn-t ALSD beam search index bug #3794 by @maxwellzh
  • [Bugfix][ESPnet1][RNNT] fix the sort order in select_k_expansions() #3864 by @freewym
  • [Bugfix][ESPnet2] Bug fix for .gitignore and db fill up for CMU cluster #3891 by @siddalmia
  • [Bugfix][ESPnet2] Fix #3716 #3849 by @kan-bayashi
  • [Bugfix][ESPnet2] Merging asr_streaming.sh into asr.sh for laborotv egs2 #3868 by @D-Keqi
  • [Bugfix][ESPnet2] add __init__.py #3928 by @sw005320
  • [Bugfix][ESPnet2] fix small problem that used before defined in step 12 #3871 by @simpleoier
  • [Bugfix][ESPnet2] fix stft olens when winlengths is not equal to nfft #3812 by @IceCreamWW
  • [Bugfix][ESPnet2] update s3prl frontend w.r.t. recent modification in s3prl interface #3839 by @simpleoier
  • [Bugfix][ESPnet2][TTS] bugfix lang2lid in tts.sh #3906 by @imdanboy
  • [Bugfix][Installation] Fix #3783 #3786 by @kamo-naoyuki

Others

  • [CI] Fix G2P test failure in CI due to the dict update #3848 by @kan-bayashi
  • [CI][Documentation][ESPnet1][ESPnet2] Fixing issues about streaming Transformer/Conformer training #3880 by @D-Keqi
  • [CI][ESPnet1][ESPnet2][Installation][New Features][README] nbest rescoring with k2 #3567 by @glynpu
  • [Documentation][README] Update README.md #3893 by @sw005320
  • [Documentation][README][SSL] Add more docs about s3prl frontend #3796 by @simpleoier
  • [Documentation][README][streaming] Updating main README.md about streaming transformer #3855 by @D-Keqi
  • [ESPnet1][RNNT] Add exception for conformer decoder #3801 by @b-flo
  • [ESPnet2][README][Typo] Fix typo in README.md #3852 by @kan-bayashi
  • [ESPnet2][SE] add eps in beam-forming reference channel selection #3904 by @LiChenda
  • [ESPnet2][SLU] Add unit test for score_intent.py #3759 by @siddhu001
  • [ESPnet2][ST] Speech Translation Update #3860 by @ftshijt
  • [ESPnet2][TTS][Installation][Refactoring] Refactor Phonemizer-based G2P #3916 by @kan-bayashi

Acknowledgements

Special thanks to @D-Keqi, @DanBerrebbi, @IceCreamWW, @LiChenda, @b-flo, @brianyan918, @daisylab, @freewym, @ftshijt, @glynpu, @imdanboy, @kamo-naoyuki, @kan-bayashi, @komatta-san, @maxwellzh, @neillu23, @peter-yh-wu, @pyf98, @seastar105, @siddalmia, @siddhu001, @simpleoier, @sw005320, @unilight.

Scientific Software - Peer-reviewed - Python
Published by kan-bayashi almost 4 years ago

Software Design and User Interface of ESPnet-SE++ - ESPnet Version 0.10.4

New Features

  • [New Features][ESPnet1][ESPnet2][ASR][README] The code for Emiru's real streaming Transformer #3614 by @D-Keqi
  • [New Features][ESPnet1][MT][ST][Installation] Support sacreBLEU #3698 by @hirofumi0810
  • [New Features][ESPnet2][ST] ESPNet2 speech translation #3587 by @ftshijt

Enhancement

  • [Enhancement][ESPnet1][ASR] Fix e2e_asr_maskctc.py to make RTF computable #3634 by @eddiewng
  • [Enhancement][ESPnet2][Installation][README] HuggingFace Upload support for ESPnet2 tasks [cont.] #3677 by @Fhrozen
  • [Enhancement][ESPnet2][TTS][Installation] Add korean_jaso tokenizer and korean_cleaner #3588 by @windtoker

Bugfix

  • [Bugfix][ESPnet1][ASR][RNNT] Fix quantization for Transducer #3616 by @b-flo
  • [Bugfix][ESPnet2][ASR][Recipe] added download test set, small modifications for path of aishell #3663 by @teinhonglo
  • [Bugfix][ESPnet2] Do stft with librosa when neither MKL nor CUDA is available. #3668 by @CTinRay
  • [Bugfix][ESPnet2] [bug fixed] allow adding noise independently of rir, bug fixed in #3692 by @ranchlai
  • [Bugfix][ESPnet2][Recipe] Create Symlinks for 1-channel/2-channel tracks in chime4 #3699 by @neillu23
  • [Bugfix][ESPnet2][Recipe] Fix SWBD Data Prep Bug #3742 by @brianyan918

Recipe

  • [Recipe][ESPnet1][ASR][MT][ST] Add CoVoST2 recipe #3720 by @hirofumi0810
  • [Recipe][ESPnet2][ASR][README] MISP2021 E2E ASR Baseline #3738 by @neillu23
  • [Recipe][ESPnet2][ASR][README] Wenetspeech #3686 by @pengchengguo
  • [Recipe][ESPnet2][SLU] Add snips hubert feature training #3619 by @yuekaizhang
  • [Recipe][ESPnet2][SLU] Make scoring part more general #3715 by @siddhu001
  • [Recipe][ESPnet2][SLU][README] Add ESPnet-SLU Recipe: Google Speech Commands #3693 by @pyf98
  • [Recipe][ESPnet2][SLU][README] Add an ESPnet2 recipe for the Grabo SLU dataset #3669 by @pyf98
  • [Recipe][ESPnet2][SLU][README] CATSLU-MAPS: Added recipe #3685 by @SujaySKumar
  • [Recipe][ESPnet2][SLU][README] ESPnet2 Japanese dialogue act classification recipe #3667 by @YushiUeda
  • [Recipe][ESPnet2][SLU][README] Slurp SLU with bpe encoded transcripts #3674 by @siddhu001
  • [Recipe][ESPnet2][SLU][README] Slurp entity classification #3739 by @siddhu001
  • [Recipe][ESPnet2][SSL] Add eps in acc computation of HuBERT model #3713 by @simpleoier
  • [Recipe][ESPnet2][TTS] Change the timing of srctexts creation #3734 by @kan-bayashi
  • [Recipe][ESPnet2][TTS][README] update kss recipe with VITS configuration #3660 by @windtoker

Others

  • [CI][ESPnet2][Installation] Fix tests in CI #3700 by @kan-bayashi
  • [CI][ESPnet2][SLU][README] Add Hubert pretrained ASR in FSC SLU #3653 by @siddhu001
  • [CI][Installation] Minor update for CI #3656 by @kan-bayashi
  • [Documentation][ESPnet1][README][RNNT][Refactoring] Refactor custom Transducer build #3697 by @b-flo
  • [Documentation][ESPnet2][README] Hugging Face support - Doc [cont.] #3709 by @Fhrozen
  • [Installation] Update pyopenjtalk version #3733 by @kan-bayashi
  • [README] Huggingface spaces ESPnet2-TTS web demo #3673 by @AK391
  • [README][ESPnet2] Add Huggingface model documentation #3714 by @siddhu001
  • [README][ESPnet2] Fix readme #3750 by @takenori-y

Acknowledgements

Special thanks to @AK391, @CTinRay, @D-Keqi, @Fhrozen, @SujaySKumar, @YushiUeda, @b-flo, @brianyan918, @eddiewng, @ftshijt, @hirofumi0810, @kan-bayashi, @neillu23, @pengchengguo, @pyf98, @ranchlai, @siddhu001, @simpleoier, @takenori-y, @teinhonglo, @windtoker, @yuekaizhang.

Published by kan-bayashi about 4 years ago

Software Design and User Interface of ESPnet-SE++ - ESPnet Version 0.10.3

New Features

  • [New Features][ESPnet1][RNNT][Installation][README] FastEmit support #3591 by @b-flo
  • [New Features][ESPnet2][ASR] Add ASR portable evaluation script #3569 by @kan-bayashi
  • [New Features][ESPnet2][README] EEND-EDA model for diarization task #3621 by @YushiUeda

Bugfix

  • [Bugfix][ESPnet1] Fix /usr/bin/env bash -e #3651 by @kamo-naoyuki
  • [Bugfix][ESPnet1] ctc loss using dropout layer since .eval() will not work for F.dropout #3539 by @zh794390558
  • [Bugfix][ESPnet2] Minor fix of evaluate_asr.sh #3596 by @kan-bayashi
  • [Bugfix][ESPnet2][ASR] wav2vec2_encoder bug fix #3545 by @simpleoier
  • [Bugfix][ESPnet2][README][SSL] Fix some issues of #3512 and add README.md to librispeech/ssl1 recipe. #3572 by @Jzmo
  • [Bugfix][ESPnet2][TTS] Bug fix the attribute registration in VITS generator #3573 by @kan-bayashi
  • [Bugfix][ESPnet2][TTS] Fix pyopenjtalk_g2p_accent(_with_pause) #3555 by @zzxiang

Recipe

  • [Recipe][ESPnet1][ASR][RNNT] Update Transducer recipes #3465 by @b-flo
  • [Recipe][ESPnet1][ST] Clean libri-trans #3540 by @hirofumi0810
  • [Recipe][ESPnet2][ASR][README] Dan aishell4 branch #3585 by @DanBerrebbi
  • [Recipe][ESPnet2][ASR][README] update pretrained models of librispeech using hubert/wav2vec2 #3568 by @simpleoier
  • [Recipe][ESPnet2][SLU][README] Add slu snips data receipe #3407 by @yuekaizhang
  • [Recipe][ESPnet2][TTS] Update GAN-TTS based configurations #3570 by @kan-bayashi
  • [Recipe][ESPnet2][TTS][README] Add initial VITS results for JSUT #3550 by @kan-bayashi
  • [Recipe][ESPnet2][TTS][README] Add つくよみちゃんコーパス recipe #3552 by @kan-bayashi
  • [Recipe][ESPnet2][TTS][README] IndicSpeech TTS Scripts #3435 by @peter-yh-wu
  • [Recipe][ESPnet2][TTS][README] Update ESPnet2-TTS results #3578 by @kan-bayashi
  • [Recipe][ESPnet2][TTS][README] Update JSUT and JVS results #3553 by @kan-bayashi
  • [Recipe][ESPnet2][TTS][README] Update LJSpeech and CSMSC results #3560 by @kan-bayashi
  • [Recipe][ESPnet2][TTS][README] Update TTS results #3615 by @kan-bayashi
  • [Recipe][ESPnet2][TTS][README] Update TTS results #3648 by @kan-bayashi
  • [Recipe][ESPnet2][TTS][README] Update VCTK results #3581 by @kan-bayashi
  • [Recipe][ESPnet2][TTS][README] Update pret-trained model for TTS recipes #3590 by @ftshijt
  • [Recipe][ESPnet2][TTS][README] update kss recipe with new result. #3589 by @windtoker
  • [Recipe][ESPnet2][TTS][Typo] Fix typo egs2/jtubespeech/tts1 #3564 by @kan-bayashi
  • [Recipe][ESPnet2][TTS][Typo] Update JVS README #3554 by @kan-bayashi

Enhancement

  • [Enhancement][ESPnet2][SE][Refactoring] Add PyTorch Builtin Complex Support in the Speech Enhancement Task #3355 by @Emrys365
  • [Enhancement][ESPnet2][TTS] Hindi g2p #3579 by @peter-yh-wu
  • [Enhancement][ESPnet2][TTS] Unify spks / lids / spk_embed_dim type #3551 by @kan-bayashi
  • [Enhancement][ESPnet2][TTS] Update evaluate_mcd.py script #3566 by @kan-bayashi
  • [Enhancement][ESPnet2][TTS][Installation] Add the installer of tdmelodic pyopenjtalk #3561 by @kan-bayashi
  • [Enhancement][ESPnet2][TTS][Installation][README] Update TTS objective eval scripts #3650 by @kan-bayashi
  • [Enhancement][ESPnet2][TTS][README] Add a new Japanese G2P for TTS #3558 by @kan-bayashi
  • [Enhancement][ESPnet2][TTS][README] Add a new english G2P #3597 by @kan-bayashi

Others

  • [CI] Add codecov config and flags. #3603 by @ShigekiKarita
  • [CI] Omit tools/ from code coverage. #3600 by @ShigekiKarita
  • [CI] Split test_integration.sh #3599 by @ShigekiKarita
  • [CI][ESPnet2][Installation][Refactoring] Make the installation of transformers optional #3622 by @kan-bayashi
  • [CI][Installation] Add no-check-certificate option in PESQ installation #3649 by @kan-bayashi
  • [CI][Installation][README][mergify] Change setup.py for pytorch1.9.1 #3636 by @kamo-naoyuki
  • [Documentation][ESPnet1][RNNT] Fix/improve doc(string)s related to Transducer model #3623 by @b-flo
  • [Documentation][ESPnet2][TTS][README] Update README of ESPnet2-TTS #3546 by @kan-bayashi
  • [Documentation][ESPnet2][TTS][README] Update TTS README #3565 by @kan-bayashi
  • [Documentation][ESPnet2][TTS][README] Update TTS fine-tuning README #3549 by @kan-bayashi
  • [Typo][ESPnet2] Minor bug in format_wav_scp.py #3575 by @ftshijt
  • [Typo][ESPnet2][TTS] update mismatch help info for tts #3602 by @ftshijt

Acknowledgements

Special thanks to @DanBerrebbi, @Emrys365, @Jzmo, @ShigekiKarita, @YushiUeda, @b-flo, @ftshijt, @hirofumi0810, @kamo-naoyuki, @kan-bayashi, @peter-yh-wu, @simpleoier, @windtoker, @yuekaizhang, @zh794390558, @zzxiang.

Published by kan-bayashi about 4 years ago

Software Design and User Interface of ESPnet-SE++ - ESPnet Version 0.10.2

News

  • Hubert training is now available!
    • Try with egs2/librispeech/ssl1
  • GAN-based TTS model is now available!
    • Joint text2mel and vocoder training
    • End-to-end text-to-wave model (VITS) training
    • Try with egs2/ljspeech/tts1
  • Support from_pretrained function!

    ```python
    # e.g.
    from espnet2.bin.asr_inference import Speech2Text
    asr = Speech2Text.from_pretrained("model_tag")

    from espnet2.bin.tts_inference import Text2Speech
    tts = Text2Speech.from_pretrained("model_tag")

    from espnet2.bin.enh_inference import SeparateSpeech
    enh = SeparateSpeech.from_pretrained("model_tag")

    from espnet2.bin.diar_inference import DiarizeSpeech
    diar = DiarizeSpeech.from_pretrained("model_tag")
    ```

    Please check the available pretrained models in espnet_model_zoo!

New Features

  • [New Features][ESPnet1] Intermediate CTC + Stochastic depth #3274 by @jaesong
  • [New Features][ESPnet2] Add new trainer for GAN-based training #3436 by @kan-bayashi
  • [New Features][ESPnet2][ASR] Add Hubert model in Espnet2/Refactor from #3458 #3512 by @Jzmo
  • [New Features][ESPnet2][ASR] batch decode with k2 ctc #3433 by @glynpu
  • [New Features][ESPnet2][ASR][SE] Support from_pretrained for ASR and ENH #3535 by @kan-bayashi
  • [New Features][ESPnet2][DIAR] Support from_pretrained for DIAR #3537 by @YushiUeda
  • [New Features][ESPnet2][SE] Adding portable speech enhancement scripts for other tasks #3487 by @Emrys365
  • [New Features][ESPnet2][TTS] Add GAN-TTS task with VITS #3449 by @kan-bayashi
  • [New Features][ESPnet2][TTS] Support SID and LID inputs for TTS models #3490 by @kan-bayashi
  • [New Features][ESPnet2][TTS] Support from_pretrained function in Text2Speech #3532 by @kan-bayashi
  • [New Features][ESPnet2][TTS] Support parallel_wavegan vocoders in tts_inference.py #3513 by @kan-bayashi
  • [New Features][ESPnet2][TTS] Support joint training of text2mel and vocoder #3501 by @kan-bayashi
  • [New Features][ESPnet2][TTS] Support language ID input for espnet2 TTS #3489 by @kan-bayashi
  • [New Features][ESPnet2][TTS] Support speaker id input for TTS models #3452 by @kan-bayashi

Enhancement

  • [Enhancement][ESPnet2][CTC segmentation][README] Fix CTC Segmentation #3500 by @shirayu
  • [Enhancement][ESPnet2][TTS] Add VITS-related modules #3448 by @kan-bayashi
  • [Enhancement][ESPnet2][TTS] Add cython code for VITS #3483 by @kan-bayashi
  • [Enhancement][ESPnet2][TTS] Add joint training config example #3508 by @kan-bayashi
  • [Enhancement][ESPnet2][TTS] Add melgan module for joint training #3516 by @kan-bayashi
  • [Enhancement][ESPnet2][TTS] Add parallel wavegan module for joint training #3515 by @kan-bayashi
  • [Enhancement][ESPnet2][TTS] Add style melgan module for joint training #3517 by @kan-bayashi
  • [Enhancement][ESPnet2][TTS] Add vocoder modules related to VITS #3439 by @kan-bayashi
  • [Enhancement][ESPnet2][TTS] Change Text2Speech class output format #3437 by @kan-bayashi
  • [Enhancement][ESPnet2][TTS] Follow up of the support speaker id input #3453 by @kan-bayashi
  • [Enhancement][ESPnet2][TTS] Support cleaner option in phn converter util #3450 by @kan-bayashi
  • [Enhancement][ESPnet2][TTS] Support language id in VITS #3499 by @kan-bayashi
  • [Enhancement][ESPnet2][TTS] Support linear spectrogram #3438 by @kan-bayashi
  • [Enhancement][ESPnet2][TTS] Support new g2p functions for various languages #3463 by @kan-bayashi
  • [Enhancement][ESPnet2][TTS] Update the TTS inference #3498 by @kan-bayashi
  • [Enhancement][ESPnet2][SLU][README] Add support for intent classification on SLURP dataset #3482 by @siddhu001
  • [Enhancement][ESPnet2][SLU][README] Add NLU post-encoder using Hugging Face Transformers #3410 by @akreal

Recipe

  • [Recipe][ESPnet1][ASR] Mucs21 subtask1 #3376 by @sanket0211
  • [Recipe][ESPnet2][ASR][README] Add Swahili ASR recipe #3485 by @akreal
  • [Recipe][ESPnet2][ASR][README] Rename swahili recipe to iwslt21_low_resource #3522 by @akreal
  • [Recipe][ESPnet2][DIAR][README] Modify ESPnet2 diarization recipe #3524 by @YushiUeda
  • [Recipe][ESPnet2][ESPnet1][ASR] Espnet2 mucs_subtask2 #3415 by @bloodraven66
  • [Recipe][ESPnet2][ESPnet1][ASR] mucs subtask1 #3417 by @bloodraven66
  • [Recipe][ESPnet2][SE] Add Voicebank (vctk_noisy) script #3486 by @neillu23
  • [Recipe][ESPnet2][TTS] Add missing configs for LibriTTS recipe #3455 by @kan-bayashi
  • [Recipe][ESPnet2][TTS] Update VITS config comments and settings #3528 by @kan-bayashi
  • [Recipe][ESPnet2][TTS] aishell3 dataset preparation #3505 by @actboy
  • [Recipe][ESPnet2][TTS][README] Add CSS10 recipe for ESPnet2-TTS #3464 by @kan-bayashi
  • [Recipe][ESPnet2][TTS][README] Add JtubeSpeech Recipe #3459 by @Takaaki-Saeki
  • [Recipe][ESPnet2][TTS][README] Add SIWIS recipe #3460 by @takenori-y
  • [Recipe][ESPnet2][TTS][README] TTS recipe for J-KAC corpus #3468 by @TanUkkii007
  • [Recipe][ESPnet2][TTS][README] TTS recipes for thchs30 and aishell3 #3470 by @ftshijt
  • [Recipe][ESPnet2][TTS][README] Update JMD README #3531 by @takenori-y
  • [Recipe][ESPnet2][TTS][README] Update SIWIS README #3509 by @takenori-y
  • [Recipe][ESPnet2][SLU][README] Predict ASR transcript along with Intent for SLU #3480 by @siddhu001
  • [Recipe][ESPnet2][SLU][README] Update SWBD DA configuration #3425 by @akreal

Bugfix

  • [Bugfix][ESPnet2] Add return_complex=False for stft #3476 by @D-X-Y
  • [Bugfix][ESPnet2] Dynamic import for the ngram function #3420 by @ftshijt
  • [Bugfix][ESPnet2][README][Recipe] Add the GigaSpeech normalization and fix the WER #3519 by @chaisz19
  • [Bugfix][ESPnet2][TTS] Add duration and focus_rate in output dict #3469 by @kan-bayashi
  • [Bugfix][ESPnet2][TTS] Add missing symlink to trim_silence.py for ESPnet2 #3467 by @kan-bayashi
  • [Bugfix][ESPnet2][TTS] Fix wrong arguments in pretrained vococder wrapper #3525 by @kan-bayashi
  • [Bugfix][ESPnet2][TTS] Revert wrongly removed lines in tts.sh #3503 by @kan-bayashi
  • [Bugfix][ESPnet2][TTS][Typo] Fix typo in hifigan #3504 by @kan-bayashi

Refactoring

  • [Refactoring][ESPnet1][ASR][RNNT][README] Transducer v5 #3217 by @b-flo
  • [Refactoring][ESPnet2][SE][DIAR] Remove prefix enh_ and diar_ #3538 by @kan-bayashi
  • [Refactoring][ESPnet2][TTS] Refactor TTS modules in ESPnet2 #3497 by @kan-bayashi
  • [Refactoring][ESPnet2][TTS] Remove the support of feats_type=fbank/stft in ESPnet2-TTS #3514 by @kan-bayashi

Others

  • [CI] Fix k2 version in CI using conda #3493 by @kan-bayashi
  • [CI] Fix test condition #3527 by @kan-bayashi
  • [CI][Installation] Update Sentencepiece and add python 3.9 to CI #3422 by @shirayu
  • [Docker] Docker Updates #3393 by @Fhrozen
  • [Documentation] Update the tutorial about maxlenratio usage #3523 by @akreal
  • [Documentation][ESPnet2][TTS] Update README.md #3502 by @kan-bayashi
  • [Installation][README] Added a link and a classifier for Python 3.9 #3440 by @shirayu
  • [Typo] Fix typos in "egs" #3447 by @shirayu
  • [Typo][Documentation] Fix typos in "doc" #3441 by @shirayu
  • [Typo][Documentation] Fix typos in "utils" #3442 by @shirayu
  • [Typo][ESPnet1][MT] Fix typos in "espnet" #3444 by @shirayu
  • [Typo][ESPnet2] Fix typos in "espnet2" #3443 by @shirayu
  • [Typo][ESPnet2][README] Fix typos in "egs2" #3445 by @shirayu

Acknowledgements

Special thanks to @D-X-Y, @Emrys365, @Fhrozen, @Jzmo, @Takaaki-Saeki, @TanUkkii007, @YushiUeda, @actboy, @akreal, @b-flo, @bloodraven66, @chaisz19, @ftshijt, @glynpu, @jaesong, @kan-bayashi, @neillu23, @sanket0211, @shirayu, @siddhu001, @takenori-y.

Published by kan-bayashi over 4 years ago

Software Design and User Interface of ESPnet-SE++ - ESPnet Version 0.10.1

New Features

  • [New Features][ESPnet2] Porting existing pre-trained models to hugging face #3321 by @siddhu001
  • [New Features][ESPnet2][ASR][CI][Installation] k2_and_espnet2 #3358 by @glynpu
  • [New Features][ESPnet2][ASR][LM][CI] espnet2 ngram #3345 by @qmpzzpmq
  • [New Features][ESPnet2][Installation] add s3prl frontend #3187 by @simpleoier

Recipe

  • [Recipe][ESPnet1][ASR] Fix the iconv error in hkust data prep #3397 by @sw005320
  • [Recipe][ESPnet1][ASR] mucs subtask2 baseline recipes (e2e and kaldi) #3362 by @bloodraven66
  • [Recipe][ESPnet1][ESPnet2][ASR] JTubeSpeech recipe and hkust espnet1 #3406 by @sw005320
  • [Recipe][ESPnet1][TTS] CMU INDIC TTS #3347 by @peter-yh-wu
  • [Recipe][ESPnet2][ASR] ESPnet2 Recipe for Ksponspeech #3387 by @YushiUeda
  • [Recipe][ESPnet2][ASR] Fix gigaspeech pre-trained model link #3317 by @sw005320
  • [Recipe][ESPnet2][ASR] LRS2 lipreading recipe #3346 by @LiChenda
  • [Recipe][ESPnet2][ASR] OpenSLR Sundanese ASR #3344 by @peter-yh-wu
  • [Recipe][ESPnet2][ASR] Recipe of JTubeSpeech #3311 by @sw005320
  • [Recipe][ESPnet2][ASR] fix path error in local/score.sh in swbd #3349 by @wonkyuml
  • [Recipe][ESPnet2][ASR] updated javanese and sundanese readmes #3369 by @peter-yh-wu
  • [Recipe][ESPnet2][ASR][Installation] OpenSLR Javanese ASR #2960 by @peter-yh-wu
  • [Recipe][ESPnet2][SLU] Add initial Switchboard Dialogue Act classification recipe #3395 by @akreal
  • [Recipe][ESPnet2][SLU] FSC Espnet2 data preparation #3352 by @siddhu001
  • [Recipe][ESPnet2][TTS] Add HUI-audio-corpus-german recipe for ESPnet2-TTS #3375 by @kan-bayashi
  • [Recipe][ESPnet2][TTS] Add JMD recipe #3394 by @takenori-y
  • [Recipe][ESPnet2][TTS] Add RUSLAN recipe for ESPnet2-TTS #3378 by @kan-bayashi
  • [Recipe][ESPnet2][TTS] Support KSS dataset recipe for ESPnet2-TTS #3383 by @kan-bayashi
  • [Recipe][ESPnet2][TTS] Update HUI audio corpus german recipe #3381 by @kan-bayashi
  • [Recipe][ESPnet2][TTS] Update HUI-audio-corpus-german recipe results of ESPnet2-TTS #3391 by @kan-bayashi
  • [Recipe][ESPnet2][TTS] Update KSS dataset recipe results of ESPnet2-TTS #3400 by @kan-bayashi
  • [Recipe][ESPnet2][TTS] Update RUSLAN recipe results of ESPnet2-TTS #3390 by @kan-bayashi
  • [Recipe][ESPnet2][TTS] indic tts without pretrained model #3401 by @peter-yh-wu

Enhancement

  • [Enhancement][ESPnet2] Update wav2vec2_encoder.py #3312 by @brotheroak
  • [Enhancement][ESPnet2][TTS] Add trim_silence for ESPnet2-TTS #3380 by @kan-bayashi
  • [Enhancement][ESPnet2][TTS] Allow override default 'speed_control_alpha' parameter #3316 by @airenas
  • [Enhancement][ESPnet2][TTS] Support French G2P #3372 by @kan-bayashi
  • [Enhancement][ESPnet2][TTS] Support German G2P #3371 by @kan-bayashi
  • [Enhancement][ESPnet2][TTS] Support Korean G2P #3382 by @kan-bayashi
  • [Enhancement][ESPnet2][TTS] Support Russian G2P #3377 by @kan-bayashi
  • [Enhancement][ESPnet2][TTS] Support Spanish G2P #3373 by @kan-bayashi
  • [Enhancement][ESPnet2][TTS] Update README about G2P #3374 by @kan-bayashi

Bugfix

  • [Bugfix][ESPnet1][ESPnet2] Fix a type error of swbd data preparation. #3324 by @pengchengguo
  • [Bugfix][ESPnet1][ESPnet2][TTS] Fixed label modification in Taco2 or Transformer-TTS with R > 1 #3392 by @kan-bayashi
  • [Bugfix][ESPnet2] fix a bug in OneCycleLR and CyclicLR #3319 by @sw005320

Others

  • [Typo][ESPnet1] Update batch_beam_search_online_sim.py #3367 by @aky15
  • [Typo][ESPnet2] Fixed typo in model name #3364 by @kan-bayashi
  • [Typo][ESPnet2] Update contextual_block_transformer_encoder.py #3354 by @aky15

Acknowledgements

Special thanks to @LiChenda, @YushiUeda, @airenas, @akreal, @aky15, @bloodraven66, @brotheroak, @glynpu, @kan-bayashi, @pengchengguo, @peter-yh-wu, @qmpzzpmq, @siddhu001, @simpleoier, @sw005320, @takenori-y, @wonkyuml.

Published by kan-bayashi over 4 years ago

Software Design and User Interface of ESPnet-SE++ - ESPnet Version 0.10.0

From v0.10.x, we drop support for PyTorch < 1.3.
See https://github.com/espnet/espnet/issues/3300 for more information.
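
To check whether an existing environment meets this minimum before upgrading, a small guard along the following lines can help. This is only an illustrative sketch, not ESPnet's own check: the helper names `parse_version` and `is_supported` and the constant `MIN_TORCH` are hypothetical, and the comparison looks only at the major and minor version numbers.

```python
import re

# Assumption taken from the note above: PyTorch older than 1.3 is unsupported.
MIN_TORCH = (1, 3)


def parse_version(version: str) -> tuple:
    """Return the leading (major, minor) of a version string such as '1.9.0+cu111'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def is_supported(version: str) -> bool:
    """True if the given PyTorch version string meets the 1.3 minimum."""
    return parse_version(version) >= MIN_TORCH


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed")
    else:
        print(torch.__version__, "supported:", is_supported(torch.__version__))
```

Comparing integer tuples rather than raw strings avoids ordering mistakes such as treating "1.10" as older than "1.3".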

New Features and Enhancement

  • [New Features][ESPnet1][ASR][CI] Dynamic quantization for decoding #3210 by @xu-gaopeng
  • [New Features][ESPnet1] Add quantize args #3280 by @xu-gaopeng
  • [Enhancement][ESPnet2][README] Update W&B integration #3278 by @AyushExel
  • [Enhancement][ESPnet2][README] Change the default value of use_wandb to False #3287 by @kamo-naoyuki

Bugfix

  • [Bugfix][ESPnet1] Fix some bugs in xml2stm.py #3252 by @AshrafMahdhi
  • [Bugfix][ESPnet1][Recipe] fix the required number of arguments #3249 by @AshrafMahdhi
  • [Bugfix][ESPnet2] Bug fix of accum_grad when grad-nan #3283 by @kamo-naoyuki
  • [Bugfix][ESPnet2] Fix #3255 #3257 by @tjysdsg
  • [Bugfix][ESPnet2] Fix bug when "--field -5" is passed to espnet2.bin.tokenize_text #3262 by @tjysdsg
  • [Bugfix][ESPnet2] Fix typo in asr.sh (espnet2) that might cause bug #3264 by @tjysdsg
  • [Bugfix][ESPnet2] Warn ignore_nan_grad with warpctc instead of error. #3298 by @ShigekiKarita
  • [Bugfix][ESPnet2][TTS] Fix a bug in the TTS transformer initialization #3251 by @sw005320

Recipe

  • [Recipe][ESPnet1][ST] Minor fix of Fisher-Callhome recipe #3305 by @hirofumi0810
  • [Recipe][ESPnet2][ASR] ESPnet2 Receipe for swbd #3269 by @yuekaizhang
  • [Recipe][ESPnet2][ASR][README] SWBD Result Update #3308 by @roshansh-cmu
  • [Recipe][ESPnet2][SE] Add scripts for DNS Interspeech 2020 in ESPNet-se #3259 by @neillu23
  • [Recipe][ESPnet2][SE][README] Pretrained model for vctk noisy reverberant recipe #3273 by @LiChenda
  • [Recipe][ESPnet2][SE][README] dns_ins20: Add README.md and real_recording testing data. #3281 by @neillu23

Refactoring

  • [Refactoring][ESPnet2][ASR] Update ctc.py #3292 by @200987299
  • [Refactoring][ESPnet1][ASR][MT][CI][README] Delete old pytorch dispatch in espnet1 #3301 by @ShigekiKarita
  • [Refactoring][CI][Documentation][Installation][README] Remove travis and add .github/workflows/doc.yml to deploy doc #3294 by @ShigekiKarita
  • [Refactoring][CI][Installation][README] Add pytorch 1.9.0 support and remove 1.0.1, 1.1.0, and 1.2.0 #3299 by @ShigekiKarita

Others

  • [Documentation][ESPnet2] Add a comment for disabling the attention plot #3258 by @sw005320
  • [ESPnet2][Installation][mergify] Follow up for #3299, about pytorch1.9.0 in ci #3310 by @kamo-naoyuki

Acknowledgements

Special thanks to @200987299, @AshrafMahdhi, @AyushExel, @LiChenda, @ShigekiKarita, @hirofumi0810, @kamo-naoyuki, @neillu23, @roshansh-cmu, @sw005320, @tjysdsg, @xu-gaopeng, @yuekaizhang.

Published by kan-bayashi over 4 years ago

Software Design and User Interface of ESPnet-SE++ - ESPnet Version 0.9.10

New Features

  • [New Features][ESPnet1][ESPnet2][Installation][README] CTC Segmentation for ESPnet 2 #3087 by @lumaku

Bugfix

  • [Bugfix][ESPnet1] Fix merge_short_segments.py #3171 by @hirofumi0810
  • [Bugfix][ESPnet1] update layer norm to reflect the dimension variable #3193 by @sw005320
  • [Bugfix][ESPnet1][ASR] Fix a bug about variable spelling errors #3208 by @lzm0706
  • [Bugfix][ESPnet1][ST] Fix ST-TED data preparation #3167 by @hirofumi0810
  • [Bugfix][ESPnet2] Fix a bug of adding noise to the training data. #3220 by @pengchengguo
  • [Bugfix][ESPnet2] fix a bug in the CTC mode #3190 by @sw005320
  • [Bugfix][ESPnet2] fix typo for AdapterForSoundScpReader #3096 by @deciding
  • [Bugfix][ESPnet2] remove find_unused_parameters from DataParallel #3149 by @kamo-naoyuki
  • [Bugfix][ESPnet2][ASR] Changed to include nlsyms.txt in the pretrained model #3236 by @kamo-naoyuki
  • [Bugfix][ESPnet2][ASR] Fix missing nlsyms.txt for pretrained models #3234 by @lumaku
  • [Bugfix][ESPnet2][ASR] Workaround for missing nlsyms.txt #3235 by @kamo-naoyuki
  • [Bugfix][ESPnet1][ASR][Installation] GTN CTC bug fix, unit test, and installer #3199 by @brianyan918
  • [Bugfix][ESPnet2][README] Update README.md, edit wrong file link. #3164 by @xxjjvxb

Enhancement

  • [Enhancement] Added "trans_type" to utils/remove_longshortdata.sh and utils/update_json.sh #3148 by @teinhonglo
  • [Enhancement][ESPnet2][SE][README] Update the readme file for the SE demo page. #3225 by @LiChenda
  • [Enhancement][ESPnet2][ASR][README] update asr demo #3192 by @ftshijt

Recipe

  • [Recipe][ESPnet1][ASR] Fix segmentation in IWSLT21 ASR #3169 by @hirofumi0810
  • [Recipe][ESPnet1][ASR] Fix tokenization on TEDLIUM2 in IWSLT21 ASR recipe #3142 by @hirofumi0810
  • [Recipe][ESPnet1][ASR] fix add_to_datadir.py in mgb2 recipe #3238 by @AshrafMahdhi
  • [Recipe][ESPnet1][ASR] fix receipe bug for swbd #3174 by @yuekaizhang
  • [Recipe][ESPnet1][ASR][RNNT] Transducer configs & results for AISHELL-1 #3240 by @yusshino
  • [Recipe][ESPnet1][ASR][ST] Fix IWSLT21 recipe for test set evaluation #3155 by @hirofumi0810
  • [Recipe][ESPnet1][ESPnet2][README] endangered language recognition espnet2 recipe #3214 by @ftshijt
  • [Recipe][ESPnet1][MT] Add IWSLT21 MT recipe #3140 by @hirofumi0810
  • [Recipe][ESPnet1][ST] Add IWSLT21 ST recipe #3150 by @hirofumi0810
  • [Recipe][ESPnet1][ST] Fix IWSLT evaluation data preparation #3168 by @hirofumi0810
  • [Recipe][ESPnet1][ST] IWSLT21 punctuation restoration recipe #3145 by @hirofumi0810
  • [Recipe][ESPnet1][ST] Merge short segments in IWSLT test sets #3162 by @hirofumi0810
  • [Recipe][ESPnet1][TTS] Fix misspelling in ./egs/jsut/tts1/local/download.sh #3227 by @muramasa2
  • [Recipe][ESPnet2][ASR] Normalization for Open_li52 #3215 by @ftshijt
  • [Recipe][ESPnet2][SE] ESPnet-SE Recipe for noisy reverberant dataset #3243 by @LiChenda
  • [Recipe][ESPnet2][SE][README] Update recipes for speech enhancement task #3153 by @LiChenda

Acknowledgements

Special thanks to @AshrafMahdhi, @LiChenda, @brianyan918, @deciding, @ftshijt, @hirofumi0810, @kamo-naoyuki, @lumaku, @lzm0706, @muramasa2, @pengchengguo, @sw005320, @teinhonglo, @xxjjvxb, @yuekaizhang, @yusshino.

Published by kan-bayashi over 4 years ago

Software Design and User Interface of ESPnet-SE++ - ESPnet Version 0.9.9

New Features

  • [New Features][ESPnet2] Speaker diarization implementation in ESPnet #2939 by @ftshijt
  • [New Features][ESPnet2] Adding gpu_max_cached_mem_GB in reporter's stats #3057 by @kamo-naoyuki
  • [New Features][ESPnet2] add --detect_anomaly option #3035 by @kamo-naoyuki
  • [New Features][ESPnet2][SE] Further update to speech enhancement task #2929 by @shincling

Bugfix

  • [Bugfix][ESPnet1] Fix a typo in the aishell config #3089 by @sw005320
  • [Bugfix][ESPnet1] Fix utils/speed_perturb.sh #3062 by @hirofumi0810
  • [Bugfix][ESPnet1] fix #3017 #3022 by @kamo-naoyuki
  • [Bugfix][ESPnet1][RNNT] Fix+update RNN encoder #3048 by @b-flo
  • [Bugfix][ESPnet1][RNNT] Minor fix for NSC #3030 by @b-flo
  • [Bugfix][ESPnet2] Fix #3072 #3073 by @kamo-naoyuki
  • [Bugfix][ESPnet2] Fix ESPnet2-TTS conformer backward compatibility #3108 by @kan-bayashi
  • [Bugfix][ESPnet2] Fix a bug when use_amp=True without fairscale #3029 by @kamo-naoyuki
  • [Bugfix][ESPnet2] Fix logging for pytorch>=1.8 #3056 by @kamo-naoyuki
  • [Bugfix][ESPnet2] Fixed backward compatibility issue of new conformer definition #3068 by @hfujihara
  • [Bugfix][Installation] Fix a bug of uninstalling typing #3058 by @kamo-naoyuki
  • [Bugfix][Installation] Fix setup.py to install filelock #3074 by @kamo-naoyuki
  • [Bugfix][Installation] fix the condition to install fairscale #3050 by @kamo-naoyuki
  • [Bugfix][Recipe][ESPnet1] Typo fixed for nahuatl recipe #3044 by @ftshijt
  • [Bugfix][Recipe][ESPnet1][ASR] Bugfix for download_and_untar for nahuatl #3049 by @ftshijt
  • [Bugfix][Recipe][ESPnet1][ESPnet2][TTS] Fix CSMSC download script #3109 by @kan-bayashi
  • [Bugfix][Recipe][ESPnet2][TTS][README] fixed typo #3121 #3123 by @kan-bayashi

Enhancement

  • [Enhancement][ASR][ESPnet1][RNNT] Update loss report #3110 by @b-flo
  • [Enhancement][ESPnet1][RNNT] Fix related to custom encoder and aux task #3045 by @b-flo
  • [Enhancement][ESPnet2][Documentation][Installation][README] modification of freezing option for Wav2Vec encoder, add documents #3036 by @simpleoier

Recipe

  • [Recipe][ESPnet1][ASR] added results and uploaded models #3063 by @sw005320
  • [Recipe][ESPnet1][ASR][ST] fix download for puebla-nahuatl #3039 by @ftshijt
  • [Recipe][ESPnet1][MT] Update IWSLT18 MT recipe #3071 by @hirofumi0810
  • [Recipe][ESPnet1][ST] IWSLT21-low-resource recipe #3023 by @ftshijt
  • [Recipe][ESPnet1][ST] Nahuatl Speech Translation #3034 by @ftshijt
  • [Recipe][ESPnet2][ASR][README] Added spgispeech recipe in espnet2 #2986 by @sw005320
  • [Recipe][ESPnet2][ASR][README] Update librispeech result #3082 by @kamo-naoyuki
  • [Recipe][ESPnet2][ASR][README] Updated ami ihm result #3091 by @kamo-naoyuki
  • [Recipe][ESPnet2][ASR][README] added a bpe10000 model and result #3060 by @sw005320
  • [Recipe][ESPnet2][ASR][README] gigaspeech #3077 by @sw005320

Refactoring

  • [Refactoring][ESPnet1] Refactor layer selection in Transformer #3024 by @hirofumi0810
  • [Refactoring][ESPnet1][MT][ST] Unify divide_lang.sh #3066 by @hirofumi0810
  • [Refactoring][ESPnet2] Make batch bins sampler faster #3106 by @kamo-naoyuki
  • [Refactoring][Installation] Use new pyopenjtalk version #3107 by @kan-bayashi
  • [Refactoring][ESPnet1][ESPnet2][Installation][Docker][Documentation] Change '#!/bin/bash' to '#!/usr/bin/env bash' #3059 by @kamo-naoyuki

Other

  • [CI][Installation][README][mergify] Using torch=1.8.1 in ci tests #3122 by @kamo-naoyuki
  • [CI][Installation][README][mergify] Adding pytorch=1.8.0 to the ci #3046 by @kamo-naoyuki

Acknowledgements

Special thanks to @b-flo, @ftshijt, @hfujihara, @hirofumi0810, @kamo-naoyuki, @kan-bayashi, @shincling, @simpleoier, @sw005320.

Published by kan-bayashi over 4 years ago

Software Design and User Interface of ESPnet-SE++ - ESPnet Version 0.9.8

New Features

  • [New Features][ESPnet1][ASR][RNNT] Auxiliary task #2951 by @b-flo
  • [New Features][ESPnet1][Recipe] RTF calculation #2942 by @hirofumi0810
  • [New Features][ESPnet2] Supporting multiple optimizers in the default trainer #3014 by @kamo-naoyuki
  • [New Features][ESPnet2][ASR] Streaming Transformer ASR #2907 by @eml914
  • [New Features][ESPnet2][ASR][Installation] add wav2vec_encoder #2889 by @simpleoier
  • [New Features][ESPnet2][Documentation][Installation][README] Support sharded training of fairscale #2980 by @kamo-naoyuki
  • [New Features][ESPnet2][SE] Add SeparateSpeech API in espnet2/bin/enh_inference.py #2878 by @Emrys365
  • [New Features][ESPnet2][TTS][Installation][README] Support phonemizer for various language G2P #2959 by @kan-bayashi

Bugfix

  • [Bugfix][CI][Installation] Install warp-ctc using pip>=21.0 #2999 by @ysk24ok
  • [Bugfix][ESPnet1] Integration testing for asr_mix was using the wrong config. #3006 by @siddalmia
  • [Bugfix][ESPnet1][ASR] Fix model averaging #2910 by @b-flo
  • [Bugfix][ESPnet1][ASR] bug fixed for streaming transformer ASR #2981 by @eml914
  • [Bugfix][ESPnet1][ASR] builtin ctc modification #3001 by @siddalmia
  • [Bugfix][ESPnet1][ASR][CI] Fix transfer learning w/ pre-trained LM + finetuning tutorial #2967 by @b-flo
  • [Bugfix][ESPnet1][ASR][RNNT] Fix a condition in TSD #2965 by @b-flo
  • [Bugfix][ESPnet1][ASR][Recipe] fix egs/ljspeech/asr1 #2865 #2884 by @kan-bayashi
  • [Bugfix][ESPnet1][ASR][Recipe][ST] Fix bug in How2 recipe #2933 by @hirofumi0810
  • [Bugfix][ESPnet1][ASR][Refactoring] Fix data sorting in attention/CTC visualization #2883 by @hirofumi0810
  • [Bugfix][ESPnet1][Docker] Fix docker error caused by BeamSearchTransducer #2973 by @b-flo
  • [Bugfix][ESPnet1][ESPnet2] Fix bugs of our Conformer implementation. #2816 by @pengchengguo
  • [Bugfix][ESPnet1][ESPnet2][Refactoring] Fix arguments in dynamic and lightweight conv #3004 by @hirofumi0810
  • [Bugfix][ESPnet1][RNNT] fix out_dim definition #2915 by @b-flo
  • [Bugfix][ESPnet1][TTS] Fix attention plot bug #2984 #2985 by @kan-bayashi
  • [Bugfix][ESPnet1][mergify] swbd run.sh is including dev data in the training set #2977 by @brianyan918
  • [Bugfix][ESPnet2] Fix sharded_ddp mode #3015 by @kamo-naoyuki
  • [Bugfix][ESPnet2] bug fix for Wav2Vec encoder #2997 by @simpleoier
  • [Bugfix][ESPnet2][Documentation] Fix for sharded training with amp #2993 by @kamo-naoyuki
  • [Bugfix][ESPnet2][Documentation] Fix sharded training for multiple nodes #2994 by @kamo-naoyuki
  • [Bugfix][ESPnet2][SE] quick fix for librimix (SE) data preparation #2982 by @LiChenda

Recipe

  • [Recipe][ESPnet1][ASR] Fix dev set in IWSLT21 ASR recipe #3000 by @hirofumi0810
  • [Recipe][ESPnet1][ASR] IWSLT'21 ASR recipe #2934 by @hirofumi0810
  • [Recipe][ESPnet1][ASR] Update IWSLT21 ASR recipe #2987 by @hirofumi0810
  • [Recipe][ESPnet1][ASR] Update the pre-trained Conformer model link of Aishell-1 corpus. #2924 by @pengchengguo
  • [Recipe][ESPnet1][ASR] Update transformer training results on common voice dataset #2927 by @wenjie-p
  • [Recipe][ESPnet1][ASR][CI][Installation][Refactoring] Update IWSLT18 (ST-TED) ASR recipe #2916 by @hirofumi0810
  • [Recipe][ESPnet1][ASR][MT][ST][README] Must-C v2 recipe #2963 by @hirofumi0810
  • [Recipe][ESPnet1][ASR][MT][ST][Refactoring] Refactor Fisher-CallHome recipe #2904 by @hirofumi0810
  • [Recipe][ESPnet1][ASR][MT][ST][Refactoring] Refactor How2 recipe #2906 by @hirofumi0810
  • [Recipe][ESPnet1][ASR][MT][ST][Refactoring] Refactor Must-C recipe #2901 by @hirofumi0810
  • [Recipe][ESPnet1][ASR][MT][ST][Refactoring] Refactor libri-trans recipe #2903 by @hirofumi0810
  • [Recipe][ESPnet1][ASR][ST][Refactoring] Update IWSLT'19 recipe #2940 by @hirofumi0810
  • [Recipe][ESPnet1][ST][CI][Refactoring] Refactor ST recipes #2975 by @hirofumi0810
  • [Recipe][ESPnet1][ST][Refactoring] Refactor Mboshi-French corpus #2911 by @hirofumi0810
  • [Recipe][ESPnet2][ASR] Open-li52(add language id scoring & text case align for test set) #2938 by @ftshijt
  • [Recipe][ESPnet2][ASR][README] Add Russian open STT recipe for ESPnet2 #2972 by @akreal
  • [Recipe][ESPnet2][ASR][README] MLS (multi-lingual librispeech) recipe #2869 by @ftshijt
  • [Recipe][ESPnet2][ASR][README] Update espnet2 librispeech result #2966 by @kamo-naoyuki
  • [Recipe][ESPnet2][ASR][README] added nsc results #2937 by @sw005320
  • [Recipe][ESPnet2][ASR][README] fix librispeech model url #2976 by @kamo-naoyuki
  • [Recipe][ESPnet2][ASR][README] minor fix of li52 and nsc recipes #2936 by @sw005320
  • [Recipe][ESPnet2][ASR][README] update the results of open li52 recipe #2974 by @sw005320
  • [Recipe][ESPnet2][SE] Librimix separation results for Conv-Tasnet, 8k, min #2928 by @anogkongda
  • [Recipe][ESPnet2][SE][README] Espnet-SE, Speech enhancement recipes #2888 by @LiChenda

Enhancement

  • [Enhancement][ESPnet1][ASR] Auto Resampling to 16khz for pretrained models #2969 by @siddalmia
  • [Enhancement][ESPnet1][ASR][RNNT] Minor refactoring #2932 by @b-flo
  • [Enhancement][ESPnet1][ASR][RNNT][README][CI][Documentation] Refactoring RNNT #2887 by @b-flo
  • [Enhancement][ESPnet1][ESPnet2][ASR][LM][MT][TTS] Print total params and trainable params. #2996 by @siddalmia
  • [Enhancement][ESPnet1][LM] Add LM options like embedding dropout and tie weights #3010 by @siddalmia
  • [Enhancement][ESPnet1][ST][Refactoring] Add the latest RPE implementation to the ST task. #3005 by @pengchengguo

Other

  • [CI][README][mergify] Stop circle ci #2978 by @kamo-naoyuki
  • [Documentation] Update docs for ESPnet contributing (especially for recipes part) #2905 by @ftshijt
  • [Documentation] fix a typo #3016 by @Huang17
  • [Installation] Uninstall typing #2979 by @kamo-naoyuki

Acknowledgements

Special thanks to @Emrys365, @Huang17, @LiChenda, @akreal, @anogkongda, @b-flo, @brianyan918, @eml914, @ftshijt, @hirofumi0810, @kamo-naoyuki, @kan-bayashi, @pengchengguo, @siddalmia, @simpleoier, @sw005320, @wenjie-p, @ysk24ok.

Published by kan-bayashi almost 5 years ago

Software Design and User Interface of ESPnet-SE++ - ESPnet Version 0.9.7

New Features

  • [New Features][ESPnet1][ASR] Option for GTN CTC mode #2866 by @brianyan918
  • [New Features][ESPnet2][SE][README] Update to speech enhancement task #2649 by @LiChenda
  • [New Features][ESPnet2][ASR][README] Lightweight Sinc Convolutions for Espnet2 #2768 by @lumaku
  • [New Features][ESPnet2][Documentation] --freeze_param option #2787 by @kamo-naoyuki
  • [New Features][ESPnet2][TTS][README] Add a new G2P pyopenjtalk_accent_with_pause #2843 by @kan-bayashi
  • [New Features][ESPnet2][TTS][README] Add pyopenjtalk_accent g2p for ESPnet2 TTS #2781 by @ota
  • [New Features][ESPnet2][TTS][README] Support X-vector based multi-speaker TTS model in ESPnet2 #2800 by @kan-bayashi

Enhancement

  • [Enhancement][ESPnet1][ESPnet2] Add version info in args #2841 by @kan-bayashi
  • [Enhancement][ESPnet1][ESPnet2][ASR] AMI Recipe (Short UTT checker) #2802 by @ftshijt
  • [Enhancement][Installation] add default activate_python.sh #2788 by @kamo-naoyuki
  • [Enhancement][Installation] modified: check_install.py #2834 by @kamo-naoyuki
  • [Enhancement][Installation][Documentation][ESPnet1][ESPnet2] Change version info location #2840 by @kan-bayashi

Bugfix

  • [Bugfix][ESPnet1][ASR] fix greedy decoding #2812 by @b-flo
  • [Bugfix][ESPnet2][ASR] Fix the compatibility of the pretrained ASR model #2794 by @kan-bayashi
  • [Bugfix][Installation] Fix #2799 #2830 by @kamo-naoyuki
  • [Bugfix][Installation] Fix HTS engine installation #2825 by @kan-bayashi
  • [Bugfix][Installation] fix the incorrect $PATH setting in tools/extra_path.sh #2833 by @jumon
  • [Bugfix][Recipe][ESPnet1][ASR] Minor fixes in CSJ #2837 by @YosukeHiguchi
  • [Bugfix][Recipe][ESPnet1][ASR] fix recipe bug for librispeech #2735 by @yuekaizhang
  • [Bugfix][Recipe][ESPnet2][ASR] fix a config name #2729 by @sw005320
  • [Bugfix][Recipe][ESPnet2][ASR][README] Fix dirha_wsj recipe #2747 by @kamo-naoyuki
  • [Bugfix][Recipe][ESPnet2][TTS] Add missing decoding configs in LibriTTS recipe #2827 by @kan-bayashi

Recipe

  • [Recipe][ESPnet1][ASR] Add LibriSpeech Conformer results for LibriCSS #2861 by @akreal
  • [Recipe][ESPnet1][ASR] Update Commonvoice Recipe with Conformer Settings #2739 by @ftshijt
  • [Recipe][ESPnet1][ASR] Update Russian open STT recipe for v1.01 of the dataset #2776 by @akreal
  • [Recipe][ESPnet1][ASR] Update models and results of Conformer. #2765 by @pengchengguo
  • [Recipe][ESPnet1][ESPnet2][ASR][README] ESPnet2 recipe for commonvoice #2793 by @hchung12
  • [Recipe][ESPnet1][VC][README] VCC2020 database #2754 by @unilight
  • [Recipe][ESPnet2][ASR][README] Update Dirha WSJ result #2756 by @kamo-naoyuki
  • [Recipe][ESPnet2][ASR][README] espnet2 hkust recipe #2863 by @kamo-naoyuki
  • [Recipe][ESPnet2][ASR][README] update the AMI result in espnet2 #2817 by @sw005320
  • [Recipe][ESPnet2][ASR][README] updated the laborotv result #2750 by @sw005320
  • [Recipe][ESPnet2][ASR][README] Update reverb result #2876 by @kamo-naoyuki
  • [Recipe][ESPnet2][ASR] Minor fix of laborotv recipe #2877 by @hfujihara
  • [Recipe][ESPnet2][TTS] Fix total number of iterations #2813 by @kan-bayashi
  • [Recipe][ESPnet2][TTS][README] Add libritts recipe for ESPnet2 #2807 by @kan-bayashi
  • [Recipe][ESPnet2][TTS][README] Add x-vector based configs for VCTK #2808 by @kan-bayashi
  • [Recipe][ESPnet2][TTS][README] Minor update TTS README #2818 by @kan-bayashi
  • [Recipe][ESPnet2][TTS][README] Update JSUT TTS results #2792 by @kan-bayashi
  • [Recipe][ESPnet2][TTS][README] Update JSUT results #2809 by @kan-bayashi
  • [Recipe][ESPnet2][TTS][README] Update JSUT results #2871 by @kan-bayashi
  • [Recipe][ESPnet2][TTS][README] Update LibriTTS results #2842 by @kan-bayashi
  • [Recipe][ESPnet2][TTS][README] Update VCTK results #2814 by @kan-bayashi
  • [Recipe][ESPnet2][TTS][README] Update libritts results #2828 by @kan-bayashi
  • [Recipe][ESPnet2][TTS][README] update latest CSMSC link address #2777 by @meowtech

Other

  • [CI][Documentation][Installation] Change warp-ctc and warp-transducer to extra #2748 by @kamo-naoyuki
  • [CI][README] Update ci setting #2848 by @kan-bayashi
  • [ASR][Documentation][ESPnet2] Sinc Convolutions - add documentation for plot_sinc_filters.py #2782 by @lumaku
  • [Documentation][ESPnet1] fixed some typos #2855 by @jumon
  • [Documentation][Installation] Update documentation #2757 by @kamo-naoyuki
  • [Installation][Refactoring] Move the dependencies coming from recipes #2740 by @kamo-naoyuki

Acknowledgements

Special thanks to @AdolfVonKleist, @LiChenda, @YosukeHiguchi, @akreal, @b-flo, @brianyan918, @ftshijt, @hchung12, @hfujihara, @jumon, @kamo-naoyuki, @kan-bayashi, @lumaku, @meowtech, @ota, @pengchengguo, @sw005320, @unilight, @yuekaizhang.

Published by kan-bayashi almost 5 years ago

Software Design and User Interface of ESPnet-SE++ - ESPnet Version 0.9.6

New Features

  • [New Features][ESPnet2] Wandb integration #2707 by @kamo-naoyuki
  • [New Features][ESPnet2][ASR] Add ignore_nan_grad option for CTC #2699 by @kamo-naoyuki
  • [New Features][ESPnet2][SE] Touching common modules before the main Enh PR #2705 by @LiChenda

Bugfix

  • [Bugfix][ESPnet1] bug fix for pytorch1.7 #2656 by @kamo-naoyuki
  • [Bugfix][ESPnet1][ESPnet2][TTS] Use nkf in CSMSC data prep #2726 by @kan-bayashi
  • [Bugfix][ESPnet2] Fix flooring for global_mvn.py #2623 by @kamo-naoyuki
  • [Bugfix][ESPnet2] Fix small bug of tensorboard part #2702 by @kamo-naoyuki
  • [Bugfix][ESPnet2] Fix wandb mode with multi gpus #2709 by @kamo-naoyuki
  • [Bugfix][ESPnet2][TTS] Fix token averaged feature the case when r > 1 #2704 by @kan-bayashi

Recipe

  • [Recipe][ESPnet1] Extend model averaging condition in run scripts #2613 by @b-flo
  • [Recipe][ESPnet1][ASR] Enable multi-thread processing of json files. #2681 by @Peidong-Wang
  • [Recipe][ESPnet1][ASR] Update KsponSpeech conformer results #2624 by @jubang0219
  • [Recipe][ESPnet1][ASR] Update Voxforge with Conformer results #2642 by @YosukeHiguchi
  • [Recipe][ESPnet1][ASR] lang was being used before being parsed for user input #2654 by @siddalmia
  • [Recipe][ESPnet1][ASR][ESPnet2][Installation][README] espnet2 reverb recipe #2691 by @kamo-naoyuki
  • [Recipe][ESPnet1][ASR][README] Update Switchboard with conformer results #2697 by @Emrys365
  • [Recipe][ESPnet1][ASR][README] add librispeech conformer w/ speed perturbation + specaug #2617 by @yuekaizhang
  • [Recipe][ESPnet2][ASR] ASR template recipe: --srctexts -> --lm_train_text, --bpe_train_text #2660 by @kamo-naoyuki
  • [Recipe][ESPnet2][ASR] Add $token_type to asr_tag and lm_tag #2625 by @kamo-naoyuki
  • [Recipe][ESPnet2][ASR][Installation][README][Recipe] Laborotv recipe #2703 by @sw005320
  • [Recipe][ESPnet2][ASR][README] Add AISHELL w/o LM result #2718 by @kamo-naoyuki
  • [Recipe][ESPnet2][ASR][README] ESPnet2 recipe for TIMIT #2568 by @sknadig
  • [Recipe][ESPnet2][ASR][README] JSUT conformer recipe achieving 12.0/13.9 CER(%) for dev/eval1 #2720 by @hchung12
  • [Recipe][ESPnet2][ASR][README] Update README.md #2659 by @sw005320
  • [Recipe][ESPnet2][ASR][README] Update WSJ result #2628 by @kamo-naoyuki
  • [Recipe][ESPnet2][ASR][README] espnet2 librispeech with conformer #2687 by @sw005320
  • [Recipe][ESPnet2][README] Corpus README in egs2 #2713 by @sw005320
  • [Recipe][ESPnet2][README] update egs2/README.md #2719 by @Emrys365

Enhancement

  • [Enhancement][Documentation][ESPnet2] Add --init_param option #2680 by @kamo-naoyuki
  • [Enhancement][ESPnet1][ASR] Save model snapshot at every epoch even if save_interval_iters > 0 - for model averaging #2637 by @sknadig
  • [Enhancement][ESPnet2] Update wandb part #2708 by @kamo-naoyuki
  • [Enhancement][ESPnet2][ASR] Add *_stats_dir options in asr.sh #2724 by @kan-bayashi

Documentation

  • [Documentation][ESPnet2][README] Update egs2 README #2723 by @kan-bayashi
  • [Documentation][ESPnet2][README][TTS] Update README about fine-tuning #2685 by @kan-bayashi
  • [Documentation][ESPnet2][README][TTS] Update TTS README.md #2650 by @kan-bayashi

Refactoring

  • [Refactoring][ESPnet1][ASR][README] Refactor Mask CTC non-autoregressive ASR #2223 by @YosukeHiguchi
  • [Refactoring][ESPnet2] Added unicode support for generated configs #2672 by @Piteryo

Others

  • [Installation] python setup.py install -> pip install -e #2619 by @kamo-naoyuki
  • [Installation][Refactoring] modify for zsh: tools/extra_path.sh #2696 by @kamo-naoyuki
  • [Docker] Docker flags for extra libraries (VC) #2622 by @Fhrozen

Acknowledgements

Special thanks to @Emrys365, @Fhrozen, @LiChenda, @Peidong-Wang, @Piteryo, @YosukeHiguchi, @b-flo, @hchung12, @jubang0219, @kamo-naoyuki, @kan-bayashi, @siddalmia, @sknadig, @sw005320, @yuekaizhang.

Published by kan-bayashi about 5 years ago

Software Design and User Interface of ESPnet-SE++ - ESPnet Version 0.9.5

New Features

  • [New Features][ESPnet2][TTS] Support g2p=none for text with phonemes #2551 by @kan-bayashi
  • [New Features][ESPnet2][TTS] Add MCD evaluation script for ESPnet2-TTS #2554 by @kan-bayashi
  • [New Features][ESPnet1][ST] Conformer End-to-End Speech Translation #2523 by @hirofumi0810

Bugfix

  • [Bugfix][ESPnet1] CTC segmentation - package update #2566 by @lumaku
  • [Bugfix][ASR][ESPnet1] fix bug about att_ws in multi-enc case #2549 by @lzm0706
  • [Bugfix][ESPnet1] Conformer averaging model support for pytorch 1.6 #2604 by @siddalmia
  • [Bugfix][ESPnet1][ASR] Set built-in CTC for asr_recog #2588 by @lumaku
  • [Bugfix][ESPnet1][ASR][Installation] Transducer float16 loss bug fix #2496 by @GNroy

Refactoring

  • [Refactoring][ESPnet1][ASR] Refactor BeamSearchTransducer and ErrorCalculatorTrans #2538 by @b-flo

Recipe

  • [Recipe][ESPnet1][ASR] Alignment recipe for CSJ. #2531 by @jnishi
  • [Recipe][ESPnet1][ASR] New Recipe for KsponSpeech (Korean spontaneous speech; 969 hours) #2555 by @jubang0219
  • [Recipe][ESPnet1][ASR] Update TedLium3 conformer results #2600 by @LiChenda
  • [Recipe][ESPnet1][ASR] Update VIVOS models #2574 by @b-flo
  • [Recipe][ESPnet1][ASR] Update model link in Puebla-Nahuatl #2607 by @ftshijt
  • [Recipe][ESPnet1][ASR] Update tedlium2 with conformer results #2599 by @Emrys365
  • [Recipe][ESPnet1][ASR] update the JSUT recipe with conformer #2546 by @sw005320
  • [Recipe][ESPnet2][ASR] Add CSJ conformer config #2560 by @kan-bayashi
  • [Recipe][ESPnet2][ASR] Add CSJ conformer results #2552 by @kan-bayashi
  • [Recipe][ESPnet2][ASR] Small changes for aishell config #2586 by @kamo-naoyuki
  • [Recipe][ESPnet2][ASR] Update espnet2 AISHELL results #2580 by @kamo-naoyuki
  • [Recipe][ESPnet2][ASR] update JSUT espnet2 with pre-trained models #2563 by @sw005320
  • [Recipe][ESPnet2][TTS] Add JSSS recipe for ESPnet2-TTS #2558 by @kan-bayashi
  • [Recipe][ESPnet2][TTS] Update ESPnet2 TTS result #2542 by @kan-bayashi

CI

  • [CI][Documentation] Support espnet2/bin in sphinx doc. #2544 by @ShigekiKarita
  • [CI][Installation][README] Add pytorch1.7.0 ci test #2605 by @kamo-naoyuki

Other

  • [Installation] Install warpctc-pytorch wheel when torch version is 1.1 - 1.6 #2547 by @ysk24ok
  • [Installation] Modified requirements: "dataclasses; python_version < '3.7'", #2541 by @kamo-naoyuki
  • [Installation] Remove pip3 check in setup_python.sh #2567 by @ShigekiKarita

Acknowledgements

Special thanks to @Emrys365, @GNroy, @LiChenda, @ShigekiKarita, @b-flo, @ftshijt, @hirofumi0810, @jnishi, @jubang0219, @kamo-naoyuki, @kan-bayashi, @lumaku, @lzm0706, @siddalmia, @sw005320, @ysk24ok.

Published by kan-bayashi about 5 years ago

Software Design and User Interface of ESPnet-SE++ - ESPnet Version 0.9.4

New Features

  • [New Features][ESPnet1][ASR] Transducer v4 #2444 by @b-flo
  • [New Features][ESPnet2] Support audio_format=flac.ark, wav.ark #2451 by @kamo-naoyuki
  • [New Features][ESPnet2][ASR] Support conformer encoder in ESPnet2 ASR #2515 by @kan-bayashi

Bugfix

  • [Bugfix][ESPnet1] Fixed IndexError in BatchBeamSearch.post_process() (#2483) #2484 by @kan-bayashi
  • [Bugfix][ESPnet1][LM] fix multigpu bug if pytorch>=1.5 #2492 by @kamo-naoyuki
  • [Bugfix][ESPnet2] remove cleaner #2529 by @kamo-naoyuki
  • [Bugfix][ESPnet2][TTS] Fix TTS inference bug for GST + Fastspeech2 #2498 by @kan-bayashi

Documentation

  • [Documentation] Update espnet2_tutorial.md #2528 by @kamo-naoyuki
  • [Documentation] Update espnet2_tutorial.md #2532 by @kamo-naoyuki
  • [Documentation] Update espnet2_tutorial.md #2534 by @kamo-naoyuki
  • [Documentation] Update notebook submodule #2499 by @kan-bayashi
  • [Documentation][ESPnet1] Small fixes for transducer #2514 by @b-flo
  • [Documentation][ESPnet2][README][TTS] Update ESPnet2 TTS README #2516 by @kan-bayashi
  • [Documentation][README] Update README #2504 by @kan-bayashi
  • [Documentation][README][ESPnet1] CTC segmentation - checks for blank chars and RNN models #2535 by @lumaku

Recipe

  • [Recipe][ESPnet1][ASR] add conformer results for librispeech #2510 by @yuekaizhang
  • [Recipe][ESPnet2][ASR] Update ESPnet2 CSJ Transformer results #2497 by @kan-bayashi
  • [Recipe][ESPnet2][TTS] Add results for ESPnet2 TTS #2503 by @kan-bayashi
  • [Recipe][ESPnet2][TTS] Update Transformer-TTS config #2494 by @kan-bayashi
  • [Recipe][ESPnet2][TTS] Update Transformer-TTS configs #2502 by @kan-bayashi

Refactoring

  • [Refactoring] Modify uttid to "${spkid}-${uttid}" for trn files #2527 by @kamo-naoyuki
  • [Refactoring][ESPnet1][ASR][LM] Remove all __future__ lines #2481 by @ShigekiKarita
  • [Refactoring][ESPnet1][ASR][MT][ST] Unify arguments #2506 by @hirofumi0810
  • [Refactoring][ESPnet1][ESPnet2][TTS] Refactor length regulator to improve the speed #2482 by @kan-bayashi
  • [Refactoring][ESPnet1][MT][ST] Refactor decoding for translation tasks #2501 by @hirofumi0810
  • [Refactoring][ESPnet2] Change add_scalars to add_scalar for tensorboard SummaryWriter #2525 by @kamo-naoyuki

CI

  • [CI][ASR] Make test_e2e_asr.py faster #2488 by @ShigekiKarita
  • [CI][ASR] Make test_e2e_asr_maskctc.py faster. #2493 by @ShigekiKarita
  • [CI][ASR] Make test_recog.py faster #2486 by @ShigekiKarita
  • [CI][ESPnet1][ASR] make test_e2e_asr_mulenc.py faster #2480 by @ruizhilijhu
  • [CI][ESPnet1][Installation] Update shellcheck url. #2500 by @ShigekiKarita
  • [CI][ESPnet2][Installation] Limit test execution time to 2.0 sec #2520 by @ShigekiKarita
  • [CI][SE] Make test_beamformer_net.py faster #2489 by @ShigekiKarita
  • [CI][SE] shorten test time for tasnet #2491 by @LiChenda

Other

  • [Installation] Update h5py version to avoid errors in Python3.8 #2519 by @shigabeev
  • [Docker] Docker Updates #2509 by @Fhrozen

Acknowledgements

Special thanks to @Fhrozen, @LiChenda, @ShigekiKarita, @b-flo, @hirofumi0810, @kamo-naoyuki, @kan-bayashi, @lumaku, @ruizhilijhu, @shigabeev, @yuekaizhang.

Published by kan-bayashi about 5 years ago

Software Design and User Interface of ESPnet-SE++ - ESPnet Version 0.9.3

New Features

  • [New Features][ESPnet2] Implement --grad_clip_type #2399 by @kamo-naoyuki
  • [New Features][ESPnet2][ASR] Implement batch_score() method for ASR decoder and LM #2377 by @kamo-naoyuki
  • [New Features][ESPnet2][README][TTS] Support Conformer-based FastSpeech / FastSpeech2 #2413 by @kan-bayashi

Bugfix

  • [Bugfix][CI][ESPnet1][ESPnet2] make sure chainer independent #2411 by @kamo-naoyuki
  • [Bugfix][CI][ESPnet1][Installation] Revert ctc seg installation #2392 by @kan-bayashi
  • [Bugfix][CI][Installation] Fix the installation error in CI #2476 by @kan-bayashi
  • [Bugfix][ESPnet1][ASR] Lazy import chainer in asr_utils.py #2407 by @kamo-naoyuki
  • [Bugfix][ESPnet1][ASR] asr: Fix recog issue on Transformer CTC model #2394 by @jaesong
  • [Bugfix][ESPnet1][MT][ST] Fix score_bleu.sh #2400 by @hirofumi0810
  • [Bugfix][ESPnet1][README][Typo] fixed typo in egs/README.md #2473 by @mrazizi
  • [Bugfix][ESPnet1][TTS] lazy import chainer: espnet/nets/tts_interface.py #2409 by @kamo-naoyuki
  • [Bugfix][ESPnet2] Add missing database in db.sh #2427 by @kan-bayashi
  • [Bugfix][ESPnet2] Fix the CommonPreprocessor_multi missing issue #2460 by @LiChenda
  • [Bugfix][ESPnet2] Minor fix of egs2/commonvoice/asr1/local/data.sh #2438 by @kamo-naoyuki
  • [Bugfix][ESPnet2] fix the directory for init_file_prefix #2412 by @kamo-naoyuki
  • [Bugfix][ESPnet2] fix typo of log_level choices #2472 by @glynpu
  • [Bugfix][ESPnet2][ASR] Add grep -H option #2388 by @kamo-naoyuki
  • [Bugfix][ESPnet2][TTS] Fix wrong sum axis in energy extraction #2469 by @kan-bayashi
  • [Bugfix][ESPnet2][Typo] Fix typo in help comment and docstrings #2470 by @kan-bayashi
  • [Bugfix][Installation] add warpctc_pytorch version==0.1.2 #2403 by @kamo-naoyuki

Documentation

  • [Documentation] Add bug report template #2396 by @sw005320
  • [Documentation] Add installation issue template #2397 by @sw005320
  • [Documentation] Update espnet2_distributed.md #2418 by @kamo-naoyuki
  • [Documentation] Update espnet2_distributed.md #2419 by @kamo-naoyuki
  • [Documentation] Update espnet2_training_option.md #2421 by @kamo-naoyuki
  • [Documentation] Update faq.md #2431 by @kamo-naoyuki
  • [Documentation] Update parallelization.md #2428 by @kamo-naoyuki
  • [Documentation][ESPnet2][README] Update README.md #2430 by @kamo-naoyuki

Enhancement

  • [Enhancement][ESPnet1][ESPnet2] Add -c option for multi GPUs mode for slurm.conf #2406 by @kamo-naoyuki
  • [Enhancement][ESPnet1][Installation] Install warpctc-pytorch wheel when torch version is 1.1, 1.2 or 1.3 #2453 by @ysk24ok
  • [Enhancement][ESPnet1][README] ADD CSJ RNN pretrained model #2452 by @jnishi
  • [Enhancement][ESPnet2] Update db.sh #2426 by @kamo-naoyuki
  • [Enhancement][ESPnet2][TTS] Update ESPnet2 TTS config #2468 by @kan-bayashi
  • [Enhancement][ESPnet2][TTS] Update and add fastspeech2 configs #2429 by @kan-bayashi
  • [Enhancement][Installation] Add sanity check for setup_cuda_env.sh #2389 by @kamo-naoyuki
  • [Enhancement][Installation] Change cudatoolkit to cuda if cuda_version=8.0 #2405 by @kamo-naoyuki
  • [Enhancement][Installation] Change to refer https://anaconda.org/pytorch/pytorch/files #2404 by @kamo-naoyuki
  • [Enhancement][Installation] Workaround for soundfile issue #2437 by @kamo-naoyuki

Recipe

  • [Recipe][ESPnet1][ASR] Add LibriCSS recipe #2246 by @akreal
  • [Recipe][ESPnet1][ASR] Update for the Official Split of YM Recipe #2435 by @ftshijt
  • [Recipe][ESPnet1][ESPnet2][ASR] Update CommonVoice for Latest Version #2455 by @ftshijt
  • [Recipe][ESPnet2][ASR] [zeroth korean] Not to use pipe format if feats_type=raw #2402 by @kamo-naoyuki
  • [Recipe][ESPnet2][ASR][README] espnet2 zeroth_korean recipe changing feats_type from fbank_pitch to raw. #2393 by @hchung12
  • [Recipe][ESPnet2][README][TTS] Add ESPnet2 TTS finetuning example recipe (JVS) #2465 by @kan-bayashi

CI

  • [CI] Add codecov actions. #2467 by @ShigekiKarita
  • [CI] Fix hangup of unittests #2424 by @kamo-naoyuki
  • [CI] Make espnet2 tts test faster #2461 by @kan-bayashi
  • [CI] Make test_e2e_{asr,st,mt}_{transformer,conformer}.py faster. #2464 by @ShigekiKarita
  • [CI] Update .gitignore #2434 by @kan-bayashi
  • [CI][ESPnet1] Make test_(batch_)beam_search.py faster. #2462 by @ShigekiKarita
  • [CI][ESPnet1] Support Debian9 and CentOS7 in Github Actions #2457 by @ShigekiKarita
  • [CI][ESPnet1][Installation] Fix HKUST recipe #2440 by @kamo-naoyuki

Acknowledgements

Special thanks to @LiChenda, @ShigekiKarita, @akreal, @ftshijt, @glynpu, @hchung12, @hirofumi0810, @jaesong, @jnishi, @kamo-naoyuki, @kan-bayashi, @mrazizi, @sw005320, @ysk24ok.

Published by kan-bayashi over 5 years ago

Software Design and User Interface of ESPnet-SE++ - ESPnet Version 0.9.2

New Features

  • [New Features][ESPnet1] CTC segmentation #2301 by @lumaku
  • [New Features][ESPnet2] Support multiple averaged nbest models #2353 by @kamo-naoyuki
  • [New Features][ESPnet2] Support recursive add in pack_funcs and add images to packed model #2367 by @kamo-naoyuki

Bugfix

  • [Bugfix][ASR][ESPnet1] remove ff_scale from conformer constructor arguments #2356 by @koji-okabe-hub
  • [Bugfix][ASR][ESPnet2] use lm_exp instead of lm_tag for inference_tag #2352 by @kamo-naoyuki
  • [Bugfix][CI][ESPnet1][Installation] Remove ctc_segmentation temporary #2385 by @kan-bayashi
  • [Bugfix][ESPnet1] Fix import error of conformer module #2384 by @kan-bayashi
  • [Bugfix][ESPnet1] Fix issue https://github.com/espnet/espnet/issues/2211 #2219 by @Emrys365
  • [Bugfix][ESPnet2] Add missing __init__.py #2326 by @kan-bayashi
  • [Bugfix][ESPnet2] Fix --out_filename option: format_wav_scp.sh #2348 by @kamo-naoyuki
  • [Bugfix][ESPnet2] Fix amp #2362 by @kamo-naoyuki
  • [Bugfix][ESPnet2] add egs2/an4/asr1/local/path.sh #2343 by @kamo-naoyuki
  • [Bugfix][ESPnet2] fix recursive add: espnet2/main_funcs/pack_funcs.py #2369 by @kamo-naoyuki
  • [Bugfix][ESPnet2] remove unused import #2331 by @kamo-naoyuki
  • [Bugfix][ESPnet2][Installation][Typo] fix typo #2344 by @kamo-naoyuki
  • [Bugfix][ESPnet2][README] Fix typo #2372 by @Piteryo
  • [Bugfix][ESPnet2][TTS] make vietnamese_cleaner to option #2341 by @kamo-naoyuki
  • [Bugfix][Installation] Fix python version check for chainer #2342 by @kamo-naoyuki
  • [Bugfix][Installation] add undefined variable: check_pytorch_cuda_compatibility.py #2361 by @kamo-naoyuki
  • [Bugfix][TTS] Fix device allocation error in guided attention loss #2282 #2317 by @kan-bayashi

Documentation

  • [Documentation] updated comment on the documentation #2351 by @GauravPandey892
  • [Documentation][ESPnet2] Update TTS README #2316 by @kan-bayashi
  • [Documentation][ESPnet2][README] Update ESPnet2 TTS README #2376 by @kan-bayashi
  • [Documentation][ESPnet2][README][TTS] Update README #2330 by @kan-bayashi
  • [Documentation][Installation] Divide setup_python.sh into setup_venv.sh and setup_python.sh #2382 by @kamo-naoyuki
  • [Documentation][Installation] add a description about check install. #2360 by @sw005320
  • [Documentation][README] CTC segmentation - Demo #2347 by @lumaku
  • [Documentation][README] Update README.md #2379 by @kamo-naoyuki

Enhancement

  • [Enhancement][ESPnet2] Change the default inference model to averaged model instead of the best #2346 by @kamo-naoyuki
  • [Enhancement][ESPnet2][TTS] Add pitch and energy stats in packing #2350 by @kan-bayashi
  • [Enhancement][Installation] Add checking for pytorch-cuda compatibility in Makefile #2334 by @kamo-naoyuki
  • [Enhancement][Installation] Show raw error message when failed to import packages #2374 by @kamo-naoyuki

Refactoring

  • [Refactoring] Apply new version black #2366 by @kamo-naoyuki
  • [Refactoring][ASR][ESPnet2] Not to add _sp to $asr_exp if --asr_exp option is specified #2368 by @kamo-naoyuki
  • [Refactoring][CI][ESPnet1][ESPnet2][Installation] Add installers for sctk and sph2pipe and create tools/extra_path.sh #2332 by @kamo-naoyuki
  • [Refactoring][ESPnet1][Recipe] Disable preparation for lm in wsj recipe #2373 by @kamo-naoyuki
  • [Refactoring][ESPnet2] Update Task design #2345 by @kamo-naoyuki
  • [Refactoring][ESPnet2][SE] Remove unused option from enh.sh:--feats_normalize #2325 by @kamo-naoyuki

Recipe

  • [Recipe][ASR][ESPnet1] MGB-2 #2289 by @AmirHussein96
  • [Recipe][ASR][ESPnet1] Remove duplicated class definition of Conformer and update some new results of Aishell1 and Switchboard. #2364 by @pengchengguo
  • [Recipe][ASR][ESPnet2][README] ASR WSJ RESULT update: Tuning LM #2355 by @kamo-naoyuki
  • [Recipe][ASR][ESPnet2][README] add pretrained model link #2378 by @kamo-naoyuki

CI

  • [CI][README] Update ubuntu images in circle ci #2349 by @ShigekiKarita
  • [CI][mergify] Update .mergify.yml #2333 by @kamo-naoyuki
  • [CI][mergify] Update .mergify.yml #2354 by @kamo-naoyuki

Acknowledgements

Special thanks to @AmirHussein96, @Emrys365, @GauravPandey892, @Piteryo, @ShigekiKarita, @kamo-naoyuki, @kan-bayashi, @koji-okabe-hub, @lumaku, @pengchengguo, @sw005320.

Published by kan-bayashi over 5 years ago

Software Design and User Interface of ESPnet-SE++ - ESPnet Version 0.9.1

New Features

  • [New Features] Add metric option to checkpoint averaging for Transformer #2259 by @hirofumi0810
  • [New Features][ESPnet2] Generate run.sh in the experiment dir for resuming #2284 by @kamo-naoyuki
  • [New Features][ESPnet2] Support larger num_iters_per_epoch than the number of batches in small corpus #2255 by @kamo-naoyuki
  • [New Features][ESPnet2] Support torch native automatic mixed precision for espnet2 #2257 by @kamo-naoyuki

Documentation

  • [Documentation] Update comments in MultiHeadAttention #2266 by @placebokkk
  • [Documentation][ESPnet2] append comment in reporter.py #2267 by @kamo-naoyuki
  • [Documentation][ESPnet2][README][TTS] Add ESPnet2 TTS recipe document #2312 by @kan-bayashi

Enhancement

  • [Enhancement][ESPnet2] Tensorboard stats between iterations #2252 by @kamo-naoyuki

Refactoring

  • [Refactoring][ESPnet2] Add some new features and a new recipe for the enhancement task #2238 by @Emrys365
  • [Refactoring][Documentation] Remove installation part of Python from Makefile #2245 by @kamo-naoyuki

Recipe

  • [Recipe][ASR] aidatatang conformer ESPnet1 recipe #2269 by @nzhoward
  • [Recipe][ESPnet2] espnet2 zeroth_korean recipe #2279 by @hchung12

Bugfix

  • [Bugfix] Fix #2295 #2311 by @kan-bayashi
  • [Bugfix] Minor fix for Makefile #2268 by @kamo-naoyuki
  • [Bugfix] Not to install cupy-cuda* for python>=3.8 #2277 by @kamo-naoyuki
  • [Bugfix] Remove channel: setup_anaconda.sh #2303 by @kamo-naoyuki
  • [Bugfix][ASR] ngram single decoding bug fix #2299 by @qmpzzpmq
  • [Bugfix][ASR][ESPnet2] Add missing __init__.py #2292 by @kamo-naoyuki
  • [Bugfix][ASR][ESPnet2] decode -> inference #2276 by @kamo-naoyuki
  • [Bugfix][ASR][ESPnet2] remove chainer dependency from show_asr_result.sh #2281 by @kamo-naoyuki
  • [Bugfix][ESPnet2] Avoid illegal summary name for tensorboard #2294 by @kamo-naoyuki
  • [Bugfix][ESPnet2] Fix average_nbest_models for pytorch=1.6 #2283 by @kamo-naoyuki
  • [Bugfix][ESPnet2] Fix decode config extension in ESPnet2 CSJ recipe #2258 by @kan-bayashi
  • [Bugfix][ESPnet2] Fix for queue-freegpu.pl #2274 by @kamo-naoyuki
  • [Bugfix][ESPnet2] Fix samplers about min_batch_size #2305 by @kamo-naoyuki
  • [Bugfix][ESPnet2] Workaround for SGE jobname issue #2253 by @kamo-naoyuki
  • [Bugfix][ESPnet2] add missing shebang #2306 by @kamo-naoyuki
  • [Bugfix][ESPnet2] fix bug of reporter #2263 by @kamo-naoyuki
  • [Bugfix][ESPnet2][Recipe] Update zeroth_korean #2308 by @kamo-naoyuki
  • [Bugfix][ESPnet2][SE] add --spk-num 1 #2285 by @kamo-naoyuki
  • [Bugfix][ESPnet2][distributed] Not to save config.yaml if rank!=0 #2287 by @kamo-naoyuki

Others

  • [CI] Remove unnecessary installation when CI #2307 by @kamo-naoyuki
  • [CI] Take integration tests into coverage #2254 by @ShigekiKarita
  • [CI][ESPnet2] Add coverage measure for espnet2 integration test #2256 by @kamo-naoyuki
  • [CI][Installation] Install wheel #2304 by @kamo-naoyuki

Acknowledgements

Special thanks to @Emrys365, @ShigekiKarita, @hchung12, @hirofumi0810, @kamo-naoyuki, @kan-bayashi, @nzhoward, @placebokkk, @qmpzzpmq.

Scientific Software - Peer-reviewed - Python
Published by kan-bayashi over 5 years ago

Software Design and User Interface of ESPnet-SE++ - ESPnet Version 0.9.0

New Features

  • [New Features][ASR] Non-autoregressive ASR with Mask CTC #2070 by @YosukeHiguchi
  • [New Features][ASR] Support Conformer model. #2144 by @pengchengguo
  • [New Features][ASR][ST] CTC posterior visualization during training #2221 by @hirofumi0810
  • [New Features][ESPnet2] Implement espnet2.bin.zenodo_upload #2168 by @kamo-naoyuki
  • [New Features][ESPnet2] Python API for inference #2092 by @kamo-naoyuki
  • [New Features][ESPnet2] Support TTS-Transformer in ESPnet2 #2134 by @kan-bayashi
  • [New Features][ESPnet2][ASR] Enable batch joint decoding with CTC in recog API v2 #2197 by @takaaki-hori
  • [New Features][ESPnet2][SE] Speech Enhancement Frontend for ESPNet2 Phase 1 #2124 by @LiChenda
  • [New Features][ESPnet2][TTS] Support FastSpeech for ESPnet2 TTS #2149 by @kan-bayashi
  • [New Features][ESPnet2][TTS] Support FastSpeech2 (+FastPitch) #2218 by @kan-bayashi
  • [New Features][ESPnet2][TTS] Support GST in ESPnet2 TTS #2139 by @kan-bayashi
  • [New Features][README][ASR] CTC forced alignment in E2E ASR Transformer model #2095 by @simpleoier
  • [New Features][VC] Voice Transformer Network #2064 by @unilight

Enhancement

  • [Enhancement] Fix error when downloading large files using download_from_google_drive.sh #2074 by @unilight
  • [Enhancement][ASR] added more beam search info #2130 by @sw005320
  • [Enhancement][ESPnet2] Change packed file of espnet2 to zip format #2161 by @kamo-naoyuki
  • [Enhancement][ESPnet2] Make read_text faster #2114 by @kamo-naoyuki
  • [Enhancement][ESPnet2] RESULTS.md -> README.md #2077 by @kamo-naoyuki
  • [Enhancement][ESPnet2] Remove long wave in template recipe #2075 by @kamo-naoyuki
  • [Enhancement][ESPnet2] Update ESPnet2 JSUT TTS recipe and TTS template #2110 by @kan-bayashi
  • [Enhancement][MT][ST] Fix ST/MT models for compatibility with ASR #2179 by @hirofumi0810
  • [Enhancement][ST] Add source case information to json files in ST task #2208 by @hirofumi0810
  • [Enhancement][ST] Refactor multi-task learning in ST #2202 by @hirofumi0810

Recipe

  • [Recipe][ASR] Add aidatatang_200zh recipe #2122 by @nzhoward
  • [Recipe][ASR] Add chime6 info #2250 by @sw005320
  • [Recipe][ASR] CHiME-6 recipe #2171 by @GNroy
  • [Recipe][ASR] Fix a bug in espnet wsj recipe. #2145 by @houwenxin
  • [Recipe][ASR] New Recipe for Yoloxóchitl-Mixtec (SLR89) #2085 by @ftshijt
  • [Recipe][ASR] Support averaging model for Conformer. #2244 by @pengchengguo
  • [Recipe][ASR] Updated model after tuning aidatatang_200zh recipe #2204 by @nzhoward
  • [Recipe][ASR] created a recipe to run asr on ljspeech #1996 by @ibkuroyagi
  • [Recipe][ASR] update model link (add pre-trained bpe model and lm model) #2101 by @ftshijt
  • [Recipe][ESPnet2][ASR] espnet2 librispeech recipe #2109 by @sw005320
  • [Recipe][ESPnet2][ASR] espnet2 librispeech v2 #2189 by @sw005320
  • [Recipe][ESPnet2][ASR] update espnet2 aishell results #2150 by @Cescfangs
  • [Recipe][ESPnet2][ASR][TTS] fix devset/evalsets issues #2142 by @sw005320
  • [Recipe][ESPnet2][TTS] Add ESPnet2 CSMSC TTS recipe #2129 by @kan-bayashi
  • [Recipe][ESPnet2][TTS] Add ESPnet2 LJSpeech recipe #2117 by @kan-bayashi
  • [Recipe][ESPnet2][TTS] Add VCTK recipe for ESPnet2 TTS #2165 by @kan-bayashi
  • [Recipe][ESPnet2][TTS] Create espnet2 jsut/tts recipe #2047 by @kamo-naoyuki

Refactoring

  • [Refactoring][ESPnet2] Change stats_dir naming not to overwrite #2111 by @kan-bayashi
  • [Refactoring][ESPnet2] Move modules #2086 by @kamo-naoyuki
  • [Refactoring][ESPnet2] Remove $KALDI_ROOT/tools/env.sh from path.sh #2242 by @kamo-naoyuki
  • [Refactoring][ESPnet2] Several update for pretrain model #2212 by @kamo-naoyuki
  • [Refactoring][ESPnet2] Update Makefile #2225 by @kamo-naoyuki

Documentation

  • [README] Fix URL in README #2090 by @kan-bayashi
  • [README] Update README about TTS #2079 by @kan-bayashi
  • [README] Update README.md #2046 by @kamo-naoyuki
  • [README] Update README.md #2067 by @kamo-naoyuki
  • [README] Update README.md #2243 by @kamo-naoyuki
  • [README] Update citation #2206 by @hirofumi0810
  • [README] Update installation.md #2233 by @kamo-naoyuki
  • [README][ESPnet2] Update egs2/TEMPLATE/README.md #2098 by @kamo-naoyuki

Bugfix

  • [Bugfix] Add cupy.done in make python #2091 by @kan-bayashi
  • [Bugfix] Append a missing space in cmd-line args in utils/dump_pcm.sh #2209 by @yistLin
  • [Bugfix] Fix Makefile #2097 by @kamo-naoyuki
  • [Bugfix] Fix minor bug of Makefile #2055 by @kamo-naoyuki
  • [Bugfix] Fix old model compatibility #2048 #2060 #2063 by @kan-bayashi
  • [Bugfix] Fix pretrained model #2053 #2069 by @kan-bayashi
  • [Bugfix] Fix pyopenjtalk installation #2108 by @kan-bayashi
  • [Bugfix] Fix typo in run.sh of TTS recipes #2216 by @hirofumi0810
  • [Bugfix] Update Makefile to disable cupy for cuda=10.2 or later #2230 by @kamo-naoyuki
  • [Bugfix] fix path of PESQ #2058 by @kamo-naoyuki
  • [Bugfix] scorer_interface warning English correction #2076 by @qmpzzpmq
  • [Bugfix][CI] Fix bug in attention plotting #2185 by @hirofumi0810
  • [Bugfix][CI] Freeze the matplotlib version with 3.1.0 #2181 by @sw005320
  • [Bugfix][CI] fix integration_test_ctc_align_wav.bats with a small model #2170 by @simpleoier
  • [Bugfix][CI] temporarily disable subsample 6 and 8 tests #2205 by @sw005320
  • [Bugfix][CI][MT][ST] Add integration test for ST/MT tasks #2210 by @hirofumi0810
  • [Bugfix][ESPnet2] Add missing path.sh in egs2/vctk/tts1 #2167 by @kan-bayashi
  • [Bugfix][ESPnet2] Fix TTS inference #2222 by @kan-bayashi
  • [Bugfix][ESPnet2] Fix tts_inference when feats_extract is None #2176 by @kan-bayashi
  • [Bugfix][ESPnet2] Fix bug for feats_type=extracted #2087 by @kamo-naoyuki
  • [Bugfix][ESPnet2] Fix bug of iterable dataset when num_workers>=1 #2081 by @kamo-naoyuki
  • [Bugfix][ESPnet2] Fix bug when espnet2/bin/tokenize_text.py --cutoff or --vocabulary_size is used #2158 by @kamo-naoyuki
  • [Bugfix][ESPnet2] Fix log: benchmark -> deterministic #2080 by @kamo-naoyuki
  • [Bugfix][ESPnet2] Implement configargparse in espnet2 #2157 by @kamo-naoyuki
  • [Bugfix][ESPnet2] Select torchaudio version according to torch version #2214 by @kamo-naoyuki
  • [Bugfix][ESPnet2] avoid UnboundLocalError when lm is not loaded #2227 by @kamo-naoyuki
  • [Bugfix][ESPnet2] fix #2050 #2051 by @kamo-naoyuki
  • [Bugfix][ESPnet2] fix #2198: PhonemeTokenizer can't perform with multiprocessing #2201 by @kamo-naoyuki
  • [Bugfix][ESPnet2] fix best_model_criterion: wsj/asr1/conf/tuning/train_lm.yaml #2153 by @kamo-naoyuki
  • [Bugfix][ESPnet2] fix bug of lm.py #2056 by @kamo-naoyuki
  • [Bugfix][ESPnet2] fix the stage number: enh.sh #2220 by @kamo-naoyuki
  • [Bugfix][ESPnet2] fix: decode_config -> inference_config #2239 by @kamo-naoyuki
  • [Bugfix][ESPnet2][Recipe] Not removing short/long utterances for eval_sets #2112 by @kamo-naoyuki
  • [Bugfix][ESPnet2][SE] Fix bugs in espnet2/enh and format related directory structures #2215 by @Emrys365
  • [Bugfix][ESPnet2][TTS] Fix feature extractor of TTS for compatibility #2102 by @kamo-naoyuki

Acknowledgements

Special thanks to @Cescfangs, @Emrys365, @GNroy, @LiChenda, @YosukeHiguchi, @ftshijt, @hirofumi0810, @houwenxin, @ibkuroyagi, @kamo-naoyuki, @kan-bayashi, @nzhoward, @pengchengguo, @qmpzzpmq, @simpleoier, @sw005320, @takaaki-hori, @unilight, @yistLin.

Scientific Software - Peer-reviewed - Python
Published by kan-bayashi over 5 years ago

Software Design and User Interface of ESPnet-SE++ - ESPnet Version 0.8.0

ESPnet2

  • [ESPnet2] Solve memory issue with super large corpus training #1972 by @kamo-naoyuki
  • [ESPnet2] Added model parameter count to trainer #1867 by @SeanNaren
  • [ESPnet2] Refactoring espnet2/utils/fileio.py -> espnet2/fileio #1807 by @kamo-naoyuki

New Features

  • [New Features] Lightweight and Dynamic Convolutions. #1599 by @yuyfujit
  • [New Features] Implement Ngram scorer #1946 by @qmpzzpmq
  • [New Features] resampling in utils/compute-fbank-feats.py and utils/compute-stft-feats.py #2035 by @kamo-naoyuki

Enhancement

  • [Enhancement] Ngram scorer update #1992 by @qmpzzpmq

Documentation

  • [Documentation] fix a typo for the decoder add_argument_group #2030 by @sw005320
  • [Documentation] Update multiple GPU descriptions. #2016 by @sw005320
  • [Documentation] Finetuning doc + freezing parameters option #1897 by @b-flo

Bugfix

  • [Bugfix] Fix memory issue when resuming #2040 by @kamo-naoyuki
  • [Bugfix] fixed typo in cmvn.py #1988 by @gullyboy007
  • [Bugfix] update notebook #1986 by @ShigekiKarita
  • [Bugfix] Fix freezing modules (when using multi-gpu) #1983 by @atozto9
  • [Bugfix] Fix BLEU/PPL calculation during training #2009 by @hirofumi0810
  • [Bugfix] Fix download file extension #2042 by @takenori-y
  • [Bugfix] fix tedlium2/3 model link #2032 by @sw005320
  • [Bugfix] Fix bug for pure Transformer-CTC #2023 by @hirofumi0810
  • [Bugfix] li42 recipe: add li42 results; fix bug in adding language id "zh_TW" #1950 by @houwenxin

CI

  • [CI] Add espnet2 in ci/doc.sh #1976 by @ShigekiKarita
  • [CI] Add test for pytorch1.5 #1881 by @kamo-naoyuki

Acknowledgements

Special thanks to @SeanNaren, @ShigekiKarita, @atozto9, @b-flo, @gullyboy007, @hirofumi0810, @houwenxin, @kamo-naoyuki, @qmpzzpmq, @sw005320, @takenori-y, @yuyfujit.

Scientific Software - Peer-reviewed - Python
Published by kan-bayashi over 5 years ago

Software Design and User Interface of ESPnet-SE++ - ESPnet Version 0.7.0

The ESPnet project is moving on to a new endeavor! We have launched espnet2, which aims to improve modularity (Chainer-free and Kaldi-free), provide a more customizable trainer, support distributed training, and improve scalability. The effort is led mainly by @kamo-naoyuki, with great effort and leadership. espnet2 is one of the outcomes of our ESPnet hackathon in Tokyo in 2019, which involved many discussions about design, new features, and community contributions. It currently supports the main ASR recipes (with a well-designed recipe template) and a limited set of TTS recipes. We maintain both espnet1 and espnet2, but development will gradually move to espnet2. The ESPnet project is accelerating further!

ESPnet2

  • [ESPnet2] keep the latest model #1769 by @kamo-naoyuki
  • [ESPnet2] Remove "E2E" from all comments #1766 by @kamo-naoyuki
  • [ESPnet2] Refactoring for ESPnetDataset #1758 by @kamo-naoyuki
  • [ESPnet2] Implement SpecAug for ESPnet2 #1746 by @kamo-naoyuki
  • [ESPnet2] Implement BatchBinSampler #1742 by @kamo-naoyuki
  • [ESPnet2] Support torch_optimizer #1739 by @kamo-naoyuki
  • [ESPnet2] Log rotation for launch.py #1737 by @kamo-naoyuki
  • [ESPnet2] Change the type of --chunk_length to str_or_int #1733 by @kamo-naoyuki
  • [ESPnet2] Change cudnn deterministic mode to default #1732 by @kamo-naoyuki
  • [ESPnet2] Add wsj results for espnet2 #1724 by @kamo-naoyuki
  • [ESPnet2] Show estimated time to finish #1717 by @kamo-naoyuki
  • [ESPnet2] Add --name option for training job #1714 by @kamo-naoyuki
  • [ESPnet2] Show the log file when training process is failed: espnet2.bin.launch.py #1713 by @kamo-naoyuki
  • [ESPnet2] --max_length -> --fold_length #1712 by @kamo-naoyuki
  • [ESPnet2] Double quotes for NCCL_SOCKET_IFNAME #1706 by @kamo-naoyuki
  • [ESPnet2] Save apex state in checkpoint and support apex optimizer #1705 by @kamo-naoyuki
  • [ESPnet2] Update asr.sh #1694 by @zh794390558
  • [ESPnet2] Update ctc.py #1688 by @zh794390558
  • [ESPnet2] Update launch.py #1681 by @zh794390558
  • [ESPnet2] Update asr.sh #1678 by @zh794390558
  • [ESPnet2] --keep_n_best_checkpoints -> --keep_nbest_models #1647 by @kamo-naoyuki
  • [ESPnet2] Avoid deprecated warning: reduction="none" #1510 by @kamo-naoyuki
  • [ESPnet2] Minor change for speed perturbation #1627 by @kamo-naoyuki
  • [ESPnet2] Fix how2 recipe #1620 by @kamo-naoyuki
  • [ESPnet2] Fix recipes #1617 by @kamo-naoyuki
  • [ESPnet2] Renaming #1610 by @kamo-naoyuki
  • [ESPnet2] Implement chunk iterator #1608 by @kamo-naoyuki
  • [ESPnet2] Update voxforge RESULTS #1601 by @kamo-naoyuki
  • [ESPnet2] vivos recipe: --audio_format wav #1592 by @kamo-naoyuki
  • [ESPnet2] Lower python requirements to 3.6 #1565 by @kamo-naoyuki
  • [ESPnet2] dirha_wsj recipe for espnet2 #1556 by @yuekaizhang
  • [ESPnet2] Update AISHELL ASR Recipe #1549 by @Emrys365
  • [ESPnet2] Remove short data #1531 by @kamo-naoyuki
  • [ESPnet2] [WIP] Update JSUT ASR Recipe #1529 by @YosukeHiguchi
  • [ESPnet2] Update HOW2 recipe #1522 by @b-flo
  • [ESPnet2] [WIP] Update CSJ ASR Recipe #1520 by @YosukeHiguchi
  • [ESPnet2] Change NoamLR to deprecated and implement WarmupLR #1519 by @kamo-naoyuki
  • [ESPnet2] Implement --max_cache_size option #1509 by @kamo-naoyuki
  • [ESPnet2] distributed training #1506 by @kamo-naoyuki
  • [ESPnet2] ESPNet2 Recipe Update -- commonvoice, babel, ami #1504 by @ftshijt
  • [ESPnet2] Refactoring #1494 by @kamo-naoyuki
  • [ESPnet2] Fix ci of flake8 part #1491 by @kamo-naoyuki
  • [ESPnet2] Tensorboard, --num_iters_per_epoch, etc. #1487 by @kamo-naoyuki
  • [ESPnet2] Fix espnet2.bin.pack #1486 by @kamo-naoyuki
  • [ESPnet2] show_result.sh #1478 by @kamo-naoyuki
  • [ESPnet2] Pack and Unpack model #1477 by @kamo-naoyuki
  • [ESPnet2] collect-stats mode, trainer class, etc. #1462 by @kamo-naoyuki
  • [ESPnet2] add test codes for asr decoders #1445 by @kamo-naoyuki
  • [ESPnet2] Integrate Griffin-Lim with tts_decode() #1442 by @kan-bayashi
  • [ESPnet2] Update ASR recipe #1439 by @kan-bayashi
  • [ESPnet2] Update TTS recipes #1430 by @kan-bayashi
  • [ESPnet2] Disable wer/cer calculation when training #1547 by @kamo-naoyuki
  • [ESPnet2] Change CTC default to builtin #1546 by @kamo-naoyuki
  • [ESPnet2] Update chime4 asr1 Recipe #1570 by @yuekaizhang
  • [ESPnet2] Create documentation for espnet2 #1710 by @kamo-naoyuki
  • [ESPnet2] shellcheck for local/data.sh #1524 by @kamo-naoyuki
  • [ESPnet2] commonvoice: RESULTS.md -> README.md #1797 by @kamo-naoyuki
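
The NoamLR-to-WarmupLR change listed above (#1519) can be pictured with two small functions. This is a sketch under the assumption that both schedules follow their commonly documented forms: Noam scaling by `model_size ** -0.5`, and the warmup variant scaling by `warmup_steps ** 0.5` so that the peak equals the configured learning rate. The function names and parameter values are hypothetical and are not taken from the ESPnet2 source.

```python
def noam_lr(step, base_lr, model_size, warmup_steps):
    """Noam-style schedule: the peak value depends on model_size (assumed form)."""
    return base_lr * model_size ** -0.5 * min(step ** -0.5, step * warmup_steps ** -1.5)


def warmup_lr(step, base_lr, warmup_steps):
    """Warmup schedule whose peak equals base_lr at step == warmup_steps (assumed form)."""
    return base_lr * warmup_steps ** 0.5 * min(step ** -0.5, step * warmup_steps ** -1.5)


# Illustrative values only.
base_lr, warmup_steps = 0.001, 25000
for step in (1000, 25000, 100000):
    print(step, warmup_lr(step, base_lr, warmup_steps))
```

In this form the learning rate rises linearly until `warmup_steps`, reaches `base_lr` there, and then decays with the inverse square root of the step, which makes the configured value directly interpretable as the peak.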

Bugfix

  • [Bugfix] % -> percent: espnet2/tasks/abs_task.py #1767 by @kamo-naoyuki
  • [Bugfix] Fix gpu mode for tts_inference.py #1755 by @kamo-naoyuki
  • [Bugfix] Fix SubReporter #1748 by @kamo-naoyuki
  • [Bugfix] Fix calculate_all_attentions for espnet2 #1747 by @kamo-naoyuki
  • [Bugfix] Not to create the averaged model if --keep_nbest_models=1 #1744 by @kamo-naoyuki
  • [Bugfix] Fix --best_model_criterion #1743 by @kamo-naoyuki
  • [Bugfix] Fix the gpu device when resuming #1731 by @kamo-naoyuki
  • [Bugfix] Fix error log for espnet2/bin/launch.py #1730 by @kamo-naoyuki
  • [Bugfix] Disable CUDNN deterministic for CTC: espnet2/asr/ctc.py #1720 by @kamo-naoyuki
  • [Bugfix] Update default.py #1698 by @zh794390558
  • [Bugfix] Fix chunk iterator and refactoring for distributed training #1685 by @kamo-naoyuki
  • [Bugfix] Update vgg_rnn_encoder.py #1676 by @zh794390558
  • [Bugfix] [ESPnet2] chmod +x: run.sh for JSUT #1628 by @kamo-naoyuki
  • [Bugfix] [ESPnet2] Remove nlsyms when word scoring #1614 by @kamo-naoyuki
  • [Bugfix] [ESPnet2] Fix setup.sh #1596 by @kamo-naoyuki
  • [Bugfix] [ESPnet2] Fix launch.py for slurm #1588 by @kamo-naoyuki
  • [Bugfix] [ESPnet2] Fix ci for local/data.sh #1572 by @kamo-naoyuki
  • [Bugfix] [ESPnet2] Fix nj of scripts/audio/format_wav_scp.sh #1550 by @kamo-naoyuki
  • [Bugfix] [ESPnet2] Use load_scp_sequential in format_wav_scp.py #1541 by @kamo-naoyuki
  • [Bugfix] [ESPnet2] Minor fix for CSJ recipe #1540 by @kamo-naoyuki
  • [Bugfix] [ESPnet2] Fix transformer #1539 by @kamo-naoyuki
  • [Bugfix] [ESPnet2] fix rnn_type when bidirectional is used #1533 by @kamo-naoyuki
  • [Bugfix] [ESPnet2] Fix format_wav_scp.py #1532 by @kamo-naoyuki
  • [Bugfix] [ESPnet2] Fix bug of using GPU even if CPU mode #1526 by @kamo-naoyuki
  • [Bugfix] [ESPnet2] Fix --accum_grad #1525 by @kamo-naoyuki
  • [Bugfix] [ESPnet2] Fix voxforge config #1511 by @kamo-naoyuki
  • [Bugfix] [ESPnet2] Bug fix of splitting files for collect_stats mode #1505 by @kamo-naoyuki
  • [Bugfix] fix to use queue.conf #1431 by @sw005320
  • [Bugfix] [ESPnet2] Fix a bug in TTS #1428 by @kan-bayashi
  • [Bugfix] [ESPnet2] Refactor Encoder and Decoder and bug fix #1427 by @kamo-naoyuki
  • [Bugfix] [ESPnet2] Fix bug of text-chars converter #1426 by @kamo-naoyuki
  • [Bugfix] Optionize trans_type in egs/ljspeech/tts2 #1789 by @kan-bayashi
  • [Bugfix] bugfix in ljspeech/tts2 #1783 by @beckgom
  • [Bugfix] missing argument for local/data_prep.sh added #1782 by @beckgom
  • [Bugfix] avoid sentencepiece==0.1.90 #1923 by @kamo-naoyuki
  • [Bugfix] FIX E523,E541,E741 #1918 by @kamo-naoyuki
  • [Bugfix] fix reverse option for cmvn #1906 by @magictron
  • [Bugfix] Error handling for Transformer with CTC-based VAD #1875 by @takenori-y
  • [Bugfix] Revert deletion of init files #1842 by @Fhrozen
  • [Bugfix] fix the missing link of tedlium3 #1841 by @sw005320
  • [Bugfix] Add test for torch>1.1 #1840 by @kamo-naoyuki
  • [Bugfix] Fix #1808: change the argument order of --batch_type for collect stat… #1810 by @kamo-naoyuki
  • [Bugfix] Change to configargparse>=1.2.1 #1803 by @kamo-naoyuki
  • [Bugfix] typo fixed for attention type #1793 by @beckgom
  • [Bugfix] fix https://github.com/espnet/espnet/issues/1780 #1784 by @qmeeus
  • [Bugfix] Fix bug of espnet2 asr_inference.py #1952 by @kamo-naoyuki
  • [Bugfix] Minor fix of import place and comments #1959 by @kan-bayashi

New Features

  • [New Features] Add utils/translate_wav.sh #1530 by @ShigekiKarita
  • [New Features] Batch beam search V2 for Transformer (no CTC) #1402 by @ShigekiKarita

Enhancement

  • [Enhancement] Support multiple sentences in synth_wav.sh #1788 by @kan-bayashi
  • [Enhancement] fix+update transducer #1760 by @b-flo

Documentation

  • [Documentation] Update notebook #1963 by @kan-bayashi
  • [Documentation] Update installation manual #1960 by @kan-bayashi
  • [Documentation] Update installation.md #1957 by @kamo-naoyuki
  • [Documentation] Add note in synth_wav.sh #1785 by @kan-bayashi
  • [Documentation] Update docs #1954 #1955 by @kamo-naoyuki
  • [Documentation] Update docs #1938 by @kamo-naoyuki
  • [Documentation] docs: added fbank link to the experiment readme #1910 by @kdubovikov

Recipe

  • [Recipe] Added some TIMIT results #1819 by @sknadig
  • [Recipe] add recipe for French Polyphone: ELRA-S0030_02 #1711 by @AdolfVonKleist
  • [Recipe] Use espnet_tts_frontend #1794 by @kamo-naoyuki

CI

  • [CI] Use cache in actions #1917 by @ShigekiKarita
  • [CI] Apply black #1850 by @kamo-naoyuki
  • [CI] Create .mergify.yml #1813 by @kamo-naoyuki

Acknowledgements

Special thanks to @AdolfVonKleist, @Emrys365, @Fhrozen, @ShigekiKarita, @YosukeHiguchi, @beckgom, @b-flo, @ftshijt, @kamo-naoyuki, @kan-bayashi, @kdubovikov, @magictron, @qmeeus, @sknadig, @sw005320, @takenori-y, @yuekaizhang, @zh794390558.

Scientific Software - Peer-reviewed - Python
Published by kan-bayashi over 5 years ago

Software Design and User Interface of ESPnet-SE++ - ESPnet Version 0.6.3

New Features

  • [New Features] VCC2020 baseline recipe #1641 by @unilight
  • [New Features] Embed defaultlm #1623 by @qmpzzpmq

Enhancement

  • [Enhancement] add test -d $(KALDI): tools/Makefile #1718 by @kamo-naoyuki
  • [Enhancement] Add option to load pretrained model in TTS #1639 by @kan-bayashi
  • [Enhancement] Add reverse_direction option to MT #1658 by @hirofumi0810

Recipe

  • [Recipe] Remove unnecessary lines on Fisher-CallHome Spanish #1650 by @hirofumi0810
  • [Recipe] Add the Aishell2 recipe for the master branch. #1615 by @pengchengguo
  • [Recipe] Reformat the RESULTS.md in vivos #1689 by @sw005320

Documentation

  • [Documentation] Added multiple GPU TIPS #1734 by @sw005320
  • [Documentation] added pure attention decoding TIPS #1725 by @sw005320

Docker

  • [Docker] Docker local updates #1677 by @Fhrozen
  • [Docker] Docker updates #1624 by @Fhrozen

Bugfix

  • [Bugfix] fix #1751 #1779 by @qmpzzpmq
  • [Bugfix] Fix v.0.3.0 pretrained Transformer model compatibility #1778 by @ShigekiKarita
  • [Bugfix] Fix torch.ctc not implemented in float16 by casting float32 #1777 by @ShigekiKarita
  • [Bugfix] Workaround for bug of configargparse==1.2 #1764 by @kamo-naoyuki
  • [Bugfix] change train_iter to be the dataloader object #1741 by @bobchennan
  • [Bugfix] fix #1634 #1719 by @kamo-naoyuki
  • [Bugfix] [VCC2020 baseline] Extra reference set #1684 by @unilight
  • [Bugfix] missing torch version in check_install.py #1675 by @beckgom
  • [Bugfix] Fix model link in the tedlium2 recipe #1662 by @sw005320
  • [Bugfix] Update Install for Pytorch version #1659 by @Fhrozen
  • [Bugfix] Fix lm compatibility for v2 #1653 by @kan-bayashi
  • [Bugfix] correct results with builtin CTC and PyTorch 1.3 in WSJ recipe #1652 by @Emrys365
  • [Bugfix] Fix lm backward compatibility #1649 by @kan-bayashi
  • [Bugfix] fix #1604 #1626 by @TitouanT
  • [Bugfix] Fix a bug in csmsc recipe #1618 by @kan-bayashi
  • [Bugfix] Update e2e_asr_common.py #1735 by @zh794390558
  • [Bugfix] remove non-available options #1738 by @sw005320

Acknowledgements

Special thanks to @Emrys365, @Fhrozen, @ShigekiKarita, @TitouanT, @beckgom, @bobchennan, @hirofumi0810, @kamo-naoyuki, @kan-bayashi, @pengchengguo, @qmpzzpmq, @sw005320, @unilight, @zh794390558.

Scientific Software - Peer-reviewed - Python
Published by kan-bayashi over 5 years ago

Software Design and User Interface of ESPnet-SE++ - ESPnet Version 0.6.2

New Features

  • [New Features] Transducer v3 (w/ transformer support for encoder/decoder) #1422 by @b-flo
  • [New Features] Improving LM training (custom optimizer, custom scheduler, Transformer LM, etc) #1246 by @ShigekiKarita

Enhancement

  • [Enhancement] Add MelGAN pretrained model and support in demo notebook #1581 by @kan-bayashi

Recipe

  • [Recipe] Update fisher-callhome results #1606 by @hirofumi0810
  • [Recipe] Update run_rnnt.sh #1602 by @qmpzzpmq
  • [Recipe] Upload Must-C models #1594 by @hirofumi0810
  • [Recipe] Upload Libri trans models #1569 by @hirofumi0810
  • [Recipe] Upload How2 models #1568 by @hirofumi0810
  • [Recipe] Add Mboshi-French corpus #1545 by @hirofumi0810
  • [Recipe] Update WSJ results using PyTorch 1.3.1 and builtin CTC #1527 by @Emrys365
  • [Recipe] [WIP] IWSLT2016 Recipe #1492 by @butsugiri
  • [Recipe] Update for Common Voice recipe & Multilingual training recipe #1485 by @ftshijt
  • [Recipe] [WIP] DiPCo Recipe #1472 by @Fhrozen

Documentation

  • [Documentation] Support markdown-table for sphinx #1611 by @kamo-naoyuki
  • [Documentation] update docs & README.md #1605 by @kamo-naoyuki
  • [Documentation] fix a link within README.md #1584 by @sw005320
  • [Documentation] Add MT result #1576 by @butsugiri
  • [Documentation] update readme to include Linux installation guides from CI #1567 by @sw005320
  • [Documentation] Update WSJ results in the main README.md #1537 by @Emrys365

Bugfix

  • [Bugfix] Fix a typo in AMI script? #1595 by @HuangZiliAndy
  • [Bugfix] ruopenstt recipe bug fix #1589 by @qmpzzpmq
  • [Bugfix] Fix pure CTC decoding #1580 by @takaaki-hori
  • [Bugfix] fix snapshot/model test condition #1577 by @IceCreamWW
  • [Bugfix] Fix IWSLT16 Script Permission #1543 by @butsugiri
  • [Bugfix] Fix bug in MT training script #1515 by @hirofumi0810
  • [Bugfix] Use Markdown table instead for WER results #1514 by @lijunzh
  • [Bugfix] Fix a compatibility problem with PyTorch 1.3.0 in ESPnet (v0.6.0) #1421 by @Emrys365

Acknowledgements

Special thanks to @Emrys365, @Fhrozen, @HuangZiliAndy, @IceCreamWW, @ShigekiKarita, @b-flo, @butsugiri, @ftshijt, @hirofumi0810, @kamo-naoyuki, @kan-bayashi, @lijunzh, @qmpzzpmq, @sw005320, @takaaki-hori.

Scientific Software - Peer-reviewed - Python
Published by kan-bayashi almost 6 years ago

Software Design and User Interface of ESPnet-SE++ - ESPnet Version 0.6.1

Happy new year!

New Features

  • [New Features] Transformer NMT #1479 by @hirofumi0810
  • [New Features] Support knowledge distillation in FastSpeech training #1415 by @kan-bayashi
  • [New Features] Support attention constraint for Tacotron 2 #1407 by @kan-bayashi

Enhancement

  • [Enhancement] Add focus rate logging in decoding #1412 by @kan-bayashi
  • [Enhancement] Support Tacotron 2 as a teacher of FastSpeech #1406 by @kan-bayashi
  • [Enhancement] Support length-weighted normalization in loss calculation #1397 by @kan-bayashi
  • [Enhancement] Transformer End-to-End Speech Translation #1348 by @hirofumi0810

Recipe

  • [Recipe] Add LM training/decoding in swbd recipe #1463 by @YosukeHiguchi
  • [Recipe] Add Fisher-CallHome asr1b recipe #1390 by @hirofumi0810
  • [Recipe] RECIPE JESC for MT #1346 by @Fhrozen

Documentation

  • [Documentation] added interspeech 2019 tutorial link and performed spell check #1476 by @sw005320
  • [Documentation] Updated README in ljspeech about FastSpeech training #1468 by @kan-bayashi
  • [Documentation] Add knowledge dist based FastSpeech link in README #1465 by @kan-bayashi

Refactoring

  • [Refactoring] Unify TTS Transformer mask with ASR Transformer #1470 by @kan-bayashi

Bugfix

  • [Bugfix] fixed a small problem in run.sh #1466 by @Peidong-Wang
  • [Bugfix] Fix wrong SC2026 fixing #1458 by @kan-bayashi
  • [Bugfix] Fix multi-encoder ASR integration test #1432 by @ShigekiKarita
  • [Bugfix] Fix wrong type float -> int #1413 by @kan-bayashi
  • [Bugfix] Fix missing key error in Tacotron2 #1408 by @kan-bayashi
  • [Bugfix] TransformerST on Fisher-Callhome #1398 by @hirofumi0810
  • [Bugfix] fix rnnlm load bug #1391 by @Cescfangs
  • [Bugfix] Fix gradient accumulation #1388 by @hirofumi0810

Acknowledgements

Special thanks to @Cescfangs, @Fhrozen, @Peidong-Wang, @ShigekiKarita, @YosukeHiguchi, @hirofumi0810, @kan-bayashi, @sw005320.

Scientific Software - Peer-reviewed - Python
Published by kan-bayashi almost 6 years ago

Software Design and User Interface of ESPnet-SE++ - ESPnet Version 0.6.0

New Features

  • [New Features] Support Parallel WaveGAN #1333 by @kan-bayashi
  • [New Features] Support save snapshot by iteration #1204 by @fanlu
  • [New Features] Multi-encoder architecture with hierarchical attention and per-encoder CTC #1193 by @ruizhilijhu
  • [New Features] Support multiple inputs #1180 by @ruizhilijhu
  • [New Features] Add E2E-ST specific modules #1139 by @hirofumi0810

Enhancement

  • [Enhancement] Fixing compatibility problems with PyTorch 1.3.0 in ESPnet (v0.5.3) #1343 by @Emrys365
  • [Enhancement] Change log level info -> warning about batchsize #1336 by @kan-bayashi
  • [Enhancement] Support batch decoding for streaming E2E #1270 by @takenori-y
  • [Enhancement] Implement attention cache in Transformer for faster decoding #1240 by @ShigekiKarita

Bugfix

  • [Bugfix] Fix pretrained model URL for master #1351 by @kan-bayashi
  • [Bugfix] Return parser in add_arguments method for transducer #1337 by @b-flo
  • [Bugfix] Disabling nonlinear activation of the last encoder layer #1323 by @simpleoier
  • [Bugfix] Fixed error: "Expected object of device type cuda but got device type cpu" in decoder of transducer #1315 by @rai4
  • [Bugfix] Fix ASR eval for TTS in the case of trans_type=phn #1368 by @kan-bayashi
  • [Bugfix] Make --preprocess_conf optional in pack_model.sh #1365 by @kan-bayashi
  • [Bugfix] Remove set start method to fix #1290 #1363 by @kan-bayashi
  • [Bugfix] Fix pretrained model URL #1354 by @kan-bayashi
  • [Bugfix] Fix pretrained model URL #1350 by @kan-bayashi
  • [Bugfix] Fix TTS transformer attention weight calculation in inference #1331 by @kan-bayashi
  • [Bugfix] Fix decoding for chainer transformer #1101 by @Fhrozen

Recipe

  • [Recipe] Update libri_trans asr recipe #1344 by @hirofumi0810
  • [Recipe] Update LJSpeech to limit frequency range #1330 by @kan-bayashi
  • [Recipe] IWSLT19 Speech Translation recipe #1169 by @hirofumi0810
  • [Recipe] Must-C NMT recipe #1168 by @hirofumi0810
  • [Recipe] How2 NMT recipe #1165 by @hirofumi0810
  • [Recipe] Update how2 recipe #1148 by @hirofumi0810
  • [Recipe] Pre-trained CSJ model #1341 by @takenori-y
  • [Recipe] TTS: add FastSpeech config and result for jsut #1321 by @r9y9
  • [Recipe] Asr commonvoice recipe update #1241 by @ftshijt

Documentation

  • [Documentation] Update notebook submodule #1367 by @kan-bayashi
  • [Documentation] Fix sphinx warning of TTS modules #1366 by @kan-bayashi
  • [Documentation] Update notebook and add to Sphinx document #1364 by @kan-bayashi
  • [Documentation] Update notebook #1352 by @kan-bayashi
  • [Documentation] Doc for Chainer transformer #1017 by @Fhrozen
  • [Documentation] Update README #1342 by @takenori-y

Refactoring

  • [Refactoring] Indirect call for training method [chainer] #1256 by @Fhrozen
  • [Refactoring] Refactor transformer for transformer LM #1223 by @Fhrozen
  • [Refactoring] Refine NMT #1152 by @hirofumi0810
  • [Refactoring] Small changes in chainer backend #1110 by @Fhrozen
  • [Refactoring] Format Chainer E2E transformer forward (fixed) #1034 by @Fhrozen

Acknowledgements

Special thanks to @Emrys365, @Fhrozen, @ShigekiKarita, @b-flo, @fanlu, @ftshijt, @hirofumi0810, @kan-bayashi, @r9y9, @rai4, @ruizhilijhu, @simpleoier, @takenori-y.

Scientific Software - Peer-reviewed - Python
Published by kan-bayashi about 6 years ago

Software Design and User Interface of ESPnet-SE++ - ESPnet Version 0.5.4

Bugfix

  • [Bugfix] Fixed pretrained model URL in CSMSC recipe #1314 by @kan-bayashi
  • [Bugfix] Fix CSMSC wavenet link #1298 by @kan-bayashi
  • [Bugfix] Minor fix of FastSpeech #1295 by @kan-bayashi
  • [Bugfix] [bug fixing] Using inplace masked_fill_() #1273 by @Emrys365
  • [Bugfix] Fix RuntimeError in setting spawn multiple times #1267 by @kan-bayashi
  • [Bugfix] Use spawn in multiprocessing to fix #404 #1251 by @kan-bayashi

Documentation

  • [Documentation] Update README.md #1309 by @kan-bayashi
  • [Documentation] Fix docstrings #1288 by @kan-bayashi
  • [Documentation] Fixed a typo in swbd asr1 #1220 by @Shujian2015
  • [Documentation] update notebook #1219 by @ShigekiKarita

Recipe

  • [Recipe] Update VAIS1000 recipe RESULTS.md #1308 by @kan-bayashi
  • [Recipe] Fix VAIS1000 recipe #1305 by @kan-bayashi
  • [Recipe] Update CSMSC results #1299 by @kan-bayashi
  • [Recipe] Add vais1000 recipe - Vietnamese TTS #1283 by @enamoria
  • [Recipe] Add VIVOS recipe - Vietnamese ASR #1271 by @hieuthi
  • [Recipe] Add JNAS tts1 recipe #1269 by @kan-bayashi
  • [Recipe] Support Polish speakers in M-AILABS #1265 by @kan-bayashi
  • [Recipe] Add TWEB recipe #1263 by @kan-bayashi
  • [Recipe] Update M-AILABS results #1262 by @kan-bayashi
  • [Recipe] Add CSMSC recipe #1259 by @kan-bayashi
  • [Recipe] Add JVS recipe #1258 by @kan-bayashi
  • [Recipe] Add CMU Arctic recipes #1257 by @kan-bayashi
  • [Recipe] Add M-AILABS pretrained models #1229 by @kan-bayashi

New Features

  • [New Features] Add eval-interval-epochs for the tiny dataset #1306 by @kan-bayashi
  • [New Features] ASR-based CER/WER eval for TTS #1190 by @potato-inoue

Enhancement

  • [Enhancement] Add Mandarin Pretrained Wavenet #1292 by @kan-bayashi
  • [Enhancement] Add pretrained models: JSUT and LibriTTS #1260 by @r9y9
  • [Enhancement] Improved JSUT TTS recipe #1216 by @r9y9

Acknowledgements

Special thanks to @Emrys365, @ShigekiKarita, @Shujian2015, @enamoria, @hieuthi, @kan-bayashi, @potato-inoue, @r9y9.

Scientific Software - Peer-reviewed - Python
Published by kan-bayashi about 6 years ago

Software Design and User Interface of ESPnet-SE++ - ESPnet Version 0.5.3

Bugfix

  • [Bugfix] Fix a bug in building docker container #1197 by @protoget
  • [Bugfix] fixed h5py version as 2.9.0 #1183 by @ruizhilijhu
  • [Bugfix] Fix error on waveform generation by WaveNet #1170 by @r9y9
  • [Bugfix] Sort nbest_hyps without limiting them to beam size #1157 by @elgeish
  • [Bugfix] fix recursive make #1153 by @b-flo
  • [Bugfix] missing file in iwslt19 #1147 by @sw005320
  • [Bugfix] Wsj mix #1145 by @simpleoier

Enhancement

  • [Enhancement] Install warp-ctc from PyPI #1196 by @ysk24ok
  • [Enhancement] TTS: MoL WaveNet minor update #1195 by @r9y9
  • [Enhancement] Transducer v1.2 #1173 by @b-flo

New Features

  • [New Features] Add support for MoL WaveNet to synth_wav.sh #1186 by @r9y9
  • [New Features] Using pytorch dataloader for pytorch backend #1138 by @bobchennan

Recipe

  • [Recipe] dirha_wsj recipe #1179 by @ruizhilijhu
  • [Recipe] Update Russian open STT recipe for v0.5 of the dataset #1160 by @akreal
  • [Recipe] Blizzard recipe #1056 by @potato-inoue

Refactoring

  • [Refactoring] Install warpctc-pytorch from pytorch-0.4 branch when PyTorch version is 0.4.X #1162 by @ysk24ok
  • [Refactoring] using python3 as default #1159 by @zh794390558
  • [Refactoring] Fix download_from gdrive.sh on osx #1158 by @r9y9

Documentation

  • [Documentation] Fix doc/module2rst.py to use glob and remove --nowarn from travis-sphinx #1155 by @ShigekiKarita

Acknowledgements

Special thanks to @ShigekiKarita, @akreal, @b-flo, @bobchennan, @elgeish, @potato-inoue, @protoget, @r9y9, @ruizhilijhu, @simpleoier, @sw005320, @ysk24ok, @zh794390558.

Scientific Software - Peer-reviewed - Python
Published by kan-bayashi over 6 years ago

Software Design and User Interface of ESPnet-SE++ - ESPnet Version 0.5.2

Documentation

  • [Documentation] Clean up TTS module docstrings #1143 by @kan-bayashi
  • [Documentation] update readme for warp-transducer #1125 by @sw005320
  • [Documentation] Fix flake8 blacklist #1107 by @ShigekiKarita

Bugfix

  • [Bugfix] Minor fix #1142 by @kan-bayashi
  • [Bugfix] Fix apex error when opt == "noam" #1134 by @kan-bayashi
  • [Bugfix] Fix model compatibility #1133 by @kan-bayashi
  • [Bugfix] Fix backward compatibility problem in PositionalEncoding by adding pre-hook to ignore self.pe #1127 by @ShigekiKarita
  • [Bugfix] fix iwslt19 recipe #1124 by @sw005320
  • [Bugfix] Fix best validation perplexity LM averaging #1122 by @akreal
  • [Bugfix] Fix bug in how2 asr1 #1117 by @hirofumi0810
  • [Bugfix] Fix: wrong variable in greedy decode #1113 by @b-flo
  • [Bugfix] Chainer fix mixed input #1096 by @Fhrozen
  • [Bugfix] Fix deleted argument atype #1095 by @Fhrozen
  • [Bugfix] Fix guided attention loss in Tacotron2 when reduction factor > 1 #1087 by @kan-bayashi
  • [Bugfix] Fix multi gpu LM issues and add hdf5 LM dataset dump #1083 by @ShigekiKarita

Enhancement

  • [Enhancement] Add stdout.pl for debugging version run.pl #1141 by @ShigekiKarita
  • [Enhancement] Update recog_wav.sh #1140 by @kan-bayashi
  • [Enhancement] Update spm_train and test it #1135 by @ShigekiKarita
  • [Enhancement] Transducer v1.1 #1129 by @b-flo
  • [Enhancement] Allow to extend the length of positional encoding at training and inference #1105 by @ShigekiKarita
  • [Enhancement] Update batchfy.py #1104 by @zh794390558
  • [Enhancement] Add PYTHONIOENCODING=UTF-8 in path.sh #1099 by @kan-bayashi
  • [Enhancement] Improve batch decoding #980 by @takaaki-hori
  • [Enhancement] Implement add_arguments method of E2E for rnn. #941 by @kamo-naoyuki

Recipe

  • [Recipe] Update swbd #1137 by @sw005320
  • [Recipe] Updated symlink in Librispeech #1130 by @kan-bayashi
  • [Recipe] Add missing lines to iwslt19 LM training data #1126 by @hirofumi0810
  • [Recipe] Add iwslt19 ASR recipe #1120 by @hirofumi0810
  • [Recipe] How2 speech translation recipe #1102 by @hirofumi0810
  • [Recipe] Must-C ASR recipe #1098 by @hirofumi0810
  • [Recipe] Must-C speech translation corpus #1085 by @hirofumi0810
  • [Recipe] Replace character-level recipe with the BPE one in iwslt18 #1079 by @hirofumi0810
  • [Recipe] Fix swbd recipe v2 #1072 by @sw005320
  • [Recipe] Updated REVERB multi-channel E2E recipe #1057 by @Xiaofei-Wang

New Features

  • [New Features] Add --train-dtype option for float16/float32/float64 precision training in pytorch ASR and LM #1119 by @ShigekiKarita
  • [New Features] transfer learning #1103 by @b-flo
  • [New Features] New beam-search framework: ScorerInterface, CPU/GPU float16/32/64 decoding, and new language models (SeqRNNLM and TransformerLM) #1092 by @ShigekiKarita
  • [New Features] Support pretrained WaveNet vocoder #1081 by @kan-bayashi
  • [New Features] RNN-Transducer #1065 by @b-flo

Acknowledgements

Special thanks to @Fhrozen, @ShigekiKarita, @Xiaofei-Wang, @akreal, @b-flo, @hirofumi0810, @kamo-naoyuki, @kan-bayashi, @sw005320, @takaaki-hori, @zh794390558.

Scientific Software - Peer-reviewed - Python
Published by kan-bayashi over 6 years ago

Software Design and User Interface of ESPnet-SE++ - ESPnet Version 0.5.1

Bugfix

  • [Bugfix] Fix conda installation error #1076 by @kan-bayashi
  • [Bugfix] Minor fix batchsize log when batchsize = 0 #1068 by @kan-bayashi
  • [Bugfix] Fix spm decode #1062 by @ShigekiKarita
  • [Bugfix] Minor fix to use fastspeech in synth_wav.sh #1061 by @kan-bayashi
  • [Bugfix] Fix help message to enable line break #1059 by @kan-bayashi
  • [Bugfix] Fix tensorboard interval in validation #1054 by @ShigekiKarita
  • [Bugfix] Update E2E-ASR test #1041 by @kan-bayashi
  • [Bugfix] Fix Loss Calculation #1039 by @Fhrozen

Refactoring

  • [Refactoring] Remove unused conf #1070 by @kan-bayashi
  • [Refactoring] [Reopen] Support default arguments #1067 by @kan-bayashi
  • [Refactoring] Refactor E2E-TTS test #1042 by @kan-bayashi

CI

  • [CI] Add TTS integration test #1069 by @kan-bayashi
  • [CI] Make test smaller to speed up #1044 by @kan-bayashi
  • [CI] Separate tasks in each job of circleci #1043 by @kan-bayashi

Recipe

  • [Recipe] Add data augmentation to ami recipe #1066 by @Jzmo
  • [Recipe] Update accum_grad for a single gpu in CSJ #1050 by @kan-bayashi
  • [Recipe] add commonvoice recipe #1000 by @YosukeHiguchi
  • [Recipe] REVERB multi-channel E2E recipe #985 by @Xiaofei-Wang

New Features

  • [New Features] Support multi gpu in pytorch lm #1063 by @ShigekiKarita

Enhancement

  • [Enhancement] Use librosa's fast Griffin-Lim #1058 by @kan-bayashi
  • [Enhancement] Add option to select the integration type of speaker embedding #1047 by @kan-bayashi
  • [Enhancement] update tedlium3 recipe with transformer #1037 by @ShigekiKarita
  • [Enhancement] update tedlium2 config #1036 by @ShigekiKarita
  • [Enhancement] Support of other recipe in recog_wav.sh #1026 by @hiratake55

Acknowledgements

Special thanks to @Fhrozen, @Jzmo, @ShigekiKarita, @Xiaofei-Wang, @YosukeHiguchi, @hiratake55, @kan-bayashi.

Scientific Software - Peer-reviewed - Python
Published by kan-bayashi over 6 years ago

Software Design and User Interface of ESPnet-SE++ - ESPnet Version 0.5.0

CI

  • [CI] Integration test with mini AN4 #1035 by @ShigekiKarita
  • [CI] codecov support #850 by @ShigekiKarita

Bugfix

  • [Bugfix] [Bug] Fix error calculator for report false #1032 by @Fhrozen
  • [Bugfix] fix unk scoring #1002 by @sw005320
  • [Bugfix] make tensorboard logging done every 100 iters #996 by @sw005320

Refactoring

  • [Refactoring] TTS: avoid using asr module in TTS #1031 by @r9y9
  • [Refactoring] Exit 1 when source command return 1 #1030 by @kan-bayashi
  • [Refactoring] Refactor FileReaderWrapper and FileWriterWrapper #947 by @kamo-naoyuki

Enhancement

  • [Enhancement] Use pypi sentencepiece #1029 by @ShigekiKarita
  • [Enhancement] Add log of the inference speed of TTS models #1027 by @kan-bayashi
  • [Enhancement] Add GPU decodable test for TTS modules #1025 by @kan-bayashi
  • [Enhancement] Support multi-speaker FastSpeech #1006 by @kan-bayashi
  • [Enhancement] Custom Training extensions for ASR chainer #1004 by @Fhrozen
  • [Enhancement] Support multi-speaker Transformer #1001 by @kan-bayashi
  • [Enhancement] RFC: Add keep_all_data_on_mem option #999 by @r9y9
  • [Enhancement] Support saving of attention weights and probability in decoding #995 by @kan-bayashi
  • [Enhancement] Implement Fast Speech #848 by @kan-bayashi
  • [Enhancement] Transformer Chainer #774 by @Fhrozen
  • [Enhancement] Neural Machine Translation #563 by @hirofumi0810

Recipe

  • [Recipe] fix bugs to make a swbd recipe run #1024 by @sw005320
  • [Recipe] Add multi-speaker Transformer config in LibriTTS #1022 by @kan-bayashi
  • [Recipe] Rename RESULTS to RESULTS.md #1021 by @kan-bayashi
  • [Recipe] Clean LibriTTS RESULTS.md #1020 by @kan-bayashi
  • [Recipe] Clean LJSpeech RESULTS.md #1019 by @kan-bayashi
  • [Recipe] Update JSUT TTS RESULTS.md #1018 by @kan-bayashi
  • [Recipe] Add Transformer config in JSUT #1009 by @kan-bayashi
  • [Recipe] Update libri trans #949 by @hirofumi0810
  • [Recipe] iwslt18 NMT recipe #937 by @hirofumi0810
  • [Recipe] libri_trans NMT recipe #931 by @hirofumi0810
  • [Recipe] Add fastspeech.v2 result #925 by @kan-bayashi

Documentation

  • [Documentation] [Docstrings] Removing empty init files to avoid docs #1016 by @Fhrozen
  • [Documentation] add egs info #1015 by @sw005320
  • [Documentation] Update docstrings in espnet.nets.chainer_backend #974 by @Masao-Someki
  • [Documentation] Reformat docstrings in espnet/asr #914 by @Masao-Someki
  • [Documentation] Update TTS module’s docstrings and refactor some modules #898 by @kan-bayashi

Acknowledgements

Special thanks to @Fhrozen, @Masao-Someki, @ShigekiKarita, @hirofumi0810, @kamo-naoyuki, @kan-bayashi, @r9y9, @sw005320.

Scientific Software - Peer-reviewed - Python
Published by kan-bayashi over 6 years ago

Software Design and User Interface of ESPnet-SE++ - ESPnet Version 0.4.3

Enhancement

  • [Enhancement] Use queue-freegpu.pl in all cmd.sh #1013

Documentation

  • [Documentation] nbsphinx support #1003
  • [Documentation] Update docstrings #994

Recipe

  • [Recipe] CSJ asr1: prettify RESULTS.md #1008
  • [Recipe] WSJ asr1: prettify RESULTS.md #1007

Bugfix

  • [Bugfix] fix Cupy Import Error #969 #1010
  • [Bugfix] Fix a bug in synthesis_wav.sh #989
  • [Bugfix] Fix lm_n_average in lang_model #988

Refactoring

  • [Refactoring] Remove "free-gpu" from *_train and create queue-freegpu.pl #938

CI

  • [CI] reduce travis jobs #1011

Acknowledgements

Special thanks to @Fhrozen, @kamo-naoyuki, @Magic-Bubble, @ShigekiKarita, @takenori-y, @Xiaofei-Wang.

Scientific Software - Peer-reviewed - Python
Published by kan-bayashi over 6 years ago

Software Design and User Interface of ESPnet-SE++ - ESPnet Version 0.4.2

Bugfix

  • [Bugfix] Fix pytorch LM GPU training without cupy #981
  • [Bugfix] make tensorboard logging done every 100 iters #966
  • [Bugfix] Fix ER calculator #955
  • [Bugfix] Fix a typo bug in computing guided attention loss #956
  • [Bugfix] run.sh should exit if sourcing path.sh return error #954

Recipe

  • [Recipe] Update Librispeech recipe #970
  • [Recipe] New RNN and Transformer result of AMI recipe(ihm) #978
  • [Recipe] BPE support for SwitchBoard & Transformer config #909
  • [Recipe] Update li10 #965
  • [Recipe] Update libri trans #949

Enhancement

  • [Enhancement] transform: expose pad_mode for logmelspectrogram #957

Acknowledgements

Special thanks to @Fhrozen, @geekboood, @hirofumi0810, @Jzmo, @naxingyu, @r9y9, @ShigekiKarita.

Scientific Software - Peer-reviewed - Python
Published by kan-bayashi over 6 years ago

Software Design and User Interface of ESPnet-SE++ - ESPnet Version 0.4.1

Bugfix

  • [Bugfix] Fix a bug in calculate_all_attentions #862
  • [Bugfix] Fix bugs in frontend #875
  • [Bugfix] Fix grad noise v2 #912
  • [Bugfix] Fix plot fail #913
  • [Bugfix] Fix tgz typo #892
  • [Bugfix] Fix: Output dimension of Conv2dSubsampling #822 #921
  • [Bugfix] Fix: espnet/transform/transformation.py #866
  • [Bugfix] Fixed certain typos #893
  • [Bugfix] Modified if conditions #908
  • [Bugfix] fix bugs in grad noise #886
  • [Bugfix] CER/WER & CER_CTC in Transformer pytorch #936
  • [Bugfix] Update iwslt18 recipe #808

Documentation

  • [Documentation] Add model link #899
  • [Documentation] Document espnet tools and modules #884
  • [Documentation] Fix typo #930
  • [Documentation] Reformat docstrings in espnet/asr #914
  • [Documentation] Update CONTRIBUTING.md #880
  • [Documentation] add recipe related documentations to CONTRIBUTING.md #872
  • [Documentation] skip ci when gh-pages is deployed #901
  • [Documentation] use only conda to build doc #895

Enhancement

  • [Enhancement] Script for docker builds from the local repo #877
  • [Enhancement] Demo script for TTS #871
  • [Enhancement] Fix plot attention for chainer transformer #940
  • [Enhancement] Implement Fast Speech #848
  • [Enhancement] Move the dependency links to github from Makefile to setup.py #858
  • [Enhancement] Support new version in Docker containers #836
  • [Enhancement] gradient noise injection from std normal dis #881
  • [Enhancement] [Discussion] Create show_result.sh #874

Recipe

  • [Recipe] Add Jsut asr recipe #793
  • [Recipe] AURORA4 RESULTS.md file #835
  • [Recipe] Add Librispeech French corpus #882
  • [Recipe] Add transformer config in m_ailabs/tts1 recipe #924
  • [Recipe] Change librispeech_french to libri_trans #903
  • [Recipe] Fix: utils/show_result.sh #915
  • [Recipe] Minor update for speech translation recipe #907
  • [Recipe] Transformer for CHiME4 Single Channel #837
  • [Recipe] Update LJSpeech RESULTS.md #861
  • [Recipe] Update LJSpeech RESULTS.md #887
  • [Recipe] Update Librispeech recipe #885
  • [Recipe] Update fisher callhome spanish for speech translation #868
  • [Recipe] libri_trans NMT recipe #931

Refactoring

  • [Refactoring] Refactor TTS Transformer #865
  • [Refactoring] test: avoid using grep and sed in subprocess and use python stdlib instead #854
  • [Refactoring] Update TTS module’s docstrings and refactor some modules #898

Acknowledgements

Special thanks to @27jiangziyan, @Fhrozen, @Masao-Someki, @ShigekiKarita, @SuperGops7, @creatorscan, @hirofumi0810, @kamo-naoyuki, @lumaku, @naxingyu, @r9y9, @simpleoier, @takenori-y.

Scientific Software - Peer-reviewed - Python
Published by kan-bayashi over 6 years ago

Software Design and User Interface of ESPnet-SE++ - ESPnet Version 0.4.0

New features and improvements

  • E2E multi-channel system #596
    • Changed to use pip-install for pytorch_wpe #843
  • Transformer
    • ASR chainer #655
    • ASR pytorch #690
    • TTS pytorch #752
  • Specaugment #734 #745 #754
  • Streaming attention encoder-decoder E2E-ASR #757
    • Offline recognition demo #809
  • New batch making strategies #759
  • Guided Attention Loss #816
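
For readers unfamiliar with the guided attention loss listed above, the following is a minimal sketch of the commonly described formulation: attention mass far from the diagonal is penalized by a weight `1 - exp(-(n/N - t/T)^2 / (2 * sigma^2))`. It is not ESPnet's implementation; the function names, the pure-Python list representation, and the default `sigma=0.4` are assumptions chosen for illustration.

```python
import math


def guided_attention_weights(n_out, n_in, sigma=0.4):
    """Return an n_out x n_in penalty matrix that is 0 on the diagonal.

    Hypothetical helper for illustration; sigma controls how quickly the
    penalty grows away from the diagonal alignment.
    """
    weights = []
    for t in range(n_out):
        row = []
        for n in range(n_in):
            # Normalised distance between input and output positions.
            d = n / n_in - t / n_out
            row.append(1.0 - math.exp(-(d * d) / (2.0 * sigma * sigma)))
        weights.append(row)
    return weights


def guided_attention_loss(attention, sigma=0.4):
    """Mean of attention * penalty over all (output, input) positions."""
    n_out, n_in = len(attention), len(attention[0])
    w = guided_attention_weights(n_out, n_in, sigma)
    total = sum(
        attention[t][n] * w[t][n] for t in range(n_out) for n in range(n_in)
    )
    return total / (n_out * n_in)


diagonal = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
reversed_alignment = [[0.0, 0.0, 1.0], [0.0, 1.0, 0.0], [1.0, 0.0, 0.0]]
print(guided_attention_loss(diagonal))
print(guided_attention_loss(reversed_alignment))
```

A perfectly diagonal alignment incurs no penalty, while an alignment that runs against the diagonal is penalized, which is why this term encourages monotonic attention in TTS training.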

Important changes

  • drop python2 support
  • use utils/fix_data_dir.sh as default #660
  • CPU-only installation #677 #687 #704
  • fix to use python2 as default in travis #685
  • add CUDA_VERSION in Makefile #687
  • use Pytorch 1.0.1 as default #721
  • use yaml format configuration file #722
  • modularize TTS components #746 #815
  • use Chainer/Cupy 6.0.0 as default #753
  • reinforce CI #763
  • Google drive downloader #798
  • New scripts to pack model and get system info #790 #802
  • change the scoring in multi-speaker case from shell to python #805
  • update patience in TTS recipes #817
  • n_average option in TTS #823
    • update TTS recipes to use config files #780
  • make ngpu=1 as default for all of the recipes #800
  • deprecate egs/librispeech/tts1 recipe #806
  • maintain the pytorch warp-ctc under espnet #838

New recipes

  • AURORA4 #722 #770 #824
  • JNAS #725
  • LibriTTS #795
  • Tedlium release3 #739
    • added the model link and missing files #831
  • TIMIT #698
  • Russian Open STT #768

Recipe updates

  • Aishell
    • support Transformer #827
    • fix the indent of RESULTS.md in the aishell recipe #828
  • CSJ
    • support Transformer #737 #742 #782
  • HKUST
    • support Transformer #840
  • IWSLT18
    • add missing files for iwslt18 recipe #767
  • Librispeech
    • support Transformer #781
  • LJSpeech
    • added more samples #825 #842
    • support Transformer #752
  • Tedlium release2
    • support word LM in TEDLIUM recipe #683
    • fix duplicated line in tedlium recipe #714
    • fix a bug in the TEDLIUM recipe #771
    • support Transformer #803
  • Voxforge
    • bugfix in voxforge #684
    • unify rnn and transformer recipes for the voxforge task #769
    • support Transformer #758
    • update config files in the voxforge recipe #783
  • WSJ
    • support Specaugment #745
    • support Transformer #655 #690

Documentation

  • add citation bibtex entry for ESPnet #676
  • add NAACL paper replication link for CMU Wilderness Multilingual Speech Dataset #717 #731
  • update library information #789
  • Add table of contents #812
  • add GPU decoding document #813
  • minibatch explanation #821

Bugfix

  • fix recognize_batch for 2d, location_recurrent, multi-head attentions for #665 and add test #681
  • fix CER/WER calculation during training #678
  • add version check for matplotlib installation #679
  • make sure hlens is tensor in recognize_batch #680
  • fix choice between pytorch and pytorch-cpu #702
  • fix merge_json behavior (#699) when no labels for #708
  • fix check_install.py #728
  • use ensure_ascii=False to make json human-readable #730
  • Fix argument name for SummaryWriter #747
  • use scikit-learn 0.20 #749
  • fix pytorch for chainer v6.0.0 #772
  • fix model compatibility #799
  • fix minor typos in the recipes #801
  • bug fix: egs/chime4/asr1_multich/conf/train.yaml #826
  • bug fix: espnet/utils/training/batchfy.py #833
  • fix to use sentencepiece v.0.1.82 #839
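
The `ensure_ascii=False` change listed above (#730) can be illustrated with Python's standard `json` module. This is a small sketch, not code from the pull request; the record contents are made up for the example.

```python
import json

# Hypothetical transcript record with non-ASCII text.
record = {"utt1": {"text": "こんにちは"}}

# Default behaviour: non-ASCII characters are written as \uXXXX escapes.
escaped = json.dumps(record)

# With ensure_ascii=False the original characters are kept as-is,
# which makes the JSON file readable in an editor.
readable = json.dumps(record, ensure_ascii=False)

print(escaped)
print(readable)
```

Both strings decode to the same object; only the on-disk representation differs. When writing such output to a file, open it with an explicit `encoding="utf-8"` so the result does not depend on the platform default.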

Acknowledgements

Special thanks to @27jiangziyan, @akreal, @bobchennan, @creatorscan, @danoneata, @Fhrozen, @gtache, @hirofumi0810, @jan-schuchardt, @jnishi, @kamo-naoyuki, @Masao-Someki, @oadams, @simpleoier, @sknadig, @ShigekiKarita, @takenori-y

Scientific Software - Peer-reviewed - Python
Published by kan-bayashi over 6 years ago

Software Design and User Interface of ESPnet-SE++ - ESPnet Version 0.3.1 (stable)

New improvements

  • Add instant speech recognition #581
  • Add CTC greedy decoding CER monitor #587
  • Add Streaming encoder #638
  • Add Uni-directional encoder #624 #629
  • Add model compatibility test #615 #649
  • Update fisher_callhome_spanish recipe #625
  • Improve swbd scoring #614 #620
  • Improve memory usage in json merge script #579
  • Improve background job failure check in decoding state #627 #643 #648
  • Separate installation of basic tools and extra tools #628

Bugfix

  • Fix CTC type selection #617 #618
  • Fix MultiProcessIterator #613
  • Fix chainer sortgrad bug
  • Fix installer #594 #595 #604 #609 #622
  • Fix WSJ-mix recipe #610 #630 #641
  • Fix remove_longshortdata.sh #646

Thank you for the many contributions, @kamo-naoyuki, @gtache, @simpleoier, @takenori-y, @Fhrozen, @JaejinCho, @pzelasko, @zh794390558, @kan-bayashi, @sw005320.

Scientific Software - Peer-reviewed - Python
Published by kan-bayashi almost 7 years ago

Software Design and User Interface of ESPnet-SE++ - ESPnet v.0.3.0 beta

New features and improvements

  • Support Pytorch 1.0 #553
  • Support the use of Tensorboard #506
  • Support early stopping #508
  • Support stop_stage option #539
  • Support sortgrad #550
  • Add GRU architecture #496
  • Add GPU batch decoding #318
  • Support HDF5 format instead of kaldi ark #412 #493
  • Add speech separation recipe #531
  • Add TTS recipes (German, Spanish, Italian, Japanese...) #562 #569 #519
  • Add ASR recipes #574 #519
  • Improve ASR recipes #491 #521 #546 #435 #467 #469
  • Improve speech translation recipes #468
  • Improve Python2/3 compatibility #567
  • Improve cmd.sh usage #538 #547
  • Add test scripts for shell scripts #484 #498
  • Change to use conda with Python3.7 as default #567
  • Python code modularization #440 #484

We really appreciate the many contributions from @gtache, @kamo-naoyuki, @hirofumi0810, @ShigekiKarita, @takenori-y, @simpleoier, @Fhrozen, @sas91, @mn5k, @JaejinCho, @Xiaofei-Wang, @jnishi, @Magic-Bubble.

Scientific Software - Peer-reviewed - Python
Published by kan-bayashi almost 7 years ago

Software Design and User Interface of ESPnet-SE++ - ESPnet v.0.2.0 (Major update)

New features and improvements

  • add data prefetch #340
  • add new recipes
    • IWSLT speech translation recipe #325
    • REVERB challenge recipe #359
  • add test codes
    • for checking warp ctc behaviors in the multitask mode #369
    • for a multiple GPU #362
    • for a single GPU #376
    • for read/write models #362 #376
  • add check script for python library installation #373 #389
  • improve some ASR baseline recipes by using a shallow and wide BLSTM encoder and subwords
    • librispeech #354 #386
    • CSJ #326
    • HKUST #366

Important changes

  • fix to use PyTorch 0.4.1 (stop supporting PyTorch 0.3.x) #332
  • rename some functions
    • e2e_asr_attctc.py -> e2e_asr.py
    • e2e_asr_attctc_th.py -> e2e_asr_th.py
  • change the format of model.conf from pickle to JSON #342
  • remove deprecated options #336
  • unify the data converter with TTS one #343
  • unify model variable arguments between TTS and ASR #337
  • fix pytorch backend snapshot functions including the save of optimizers #362
  • avoid using feat-to-len: set write_utt2num_frames=true and read utt2num instead of executing feat-to-len #339
  • refactor asr_pytorch.py and asr_chainer.py.
    • refactor the recog part in asr_chainer.py and asr_pytorch.py, especially after it gets nbest. #370
    • make nets/e2e_common.py, and move some common functions there
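
As a rough illustration of the feat-to-len change above, utterance lengths can be read from a file written at feature-extraction time rather than recomputed. The sketch below assumes the Kaldi-style two-column layout (`<utt_id> <num_frames>` per line); the function name and the example entries are hypothetical and not taken from ESPnet.

```python
def read_utt2num_frames(lines):
    """Parse '<utt_id> <num_frames>' lines into a dict of int lengths.

    Assumes the two-column, whitespace-separated Kaldi-style format;
    blank lines are skipped.
    """
    lengths = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        utt_id, num_frames = line.split()
        lengths[utt_id] = int(num_frames)
    return lengths


# Hypothetical file contents; in practice, pass an open file object.
example = ["utt_a 523", "utt_b 1187", ""]
lengths = read_utt2num_frames(example)
print(lengths)
```

Because an open text file iterates line by line, the same function works with `open(path, encoding="utf-8")` in place of the list.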

Bug fix

  • warpctc gradient scaling (Thanks @jnishi)
  • warpctc multi-gpu bug (Thanks @jnishi)
  • undefined gpuid bug in cpu RNN training #379
  • no hypothesis bug #378
  • Python3 compatibility #375 #341 (Thanks @akreal)

Scientific Software - Peer-reviewed - Python
Published by kan-bayashi over 7 years ago

Software Design and User Interface of ESPnet-SE++ - ESPnet v.0.1.5 (minor update)

  • update the Librispeech ASR recipe and use subword modeling as default.
  • attached Librispeech ASR model (librispeech_asr1.tgz):
    • RNNLM: exp/train_rnnlm_2layer_bs256_unigram2000/rnnlm.model.best
    • ASR models: exp/train_960_vggblstm_e4_subsample1_2_2_1_1_unit1024_proj1024_d1_unit1024_location1024_aconvc10_aconvf100_mtlalpha0.5_adadelta_bs30_mli800_mlo150_unigram2000/results/{model.acc.best,model.conf}
    • performance:

| Data set | WER (%) |
|-----------|:----:|
| Librispeech dev_clean | 5.0 |
| Librispeech test_clean | 5.0 |

    • when using the above models, set the ASR model directory (expdir) and the RNNLM model directory (lmexpdir) in run.sh as follows:

```
expdir=exp/train_960_vggblstm_e4_subsample1_2_2_1_1_unit1024_proj1024_d1_unit1024_location1024_aconvc10_aconvf100_mtlalpha0.5_adadelta_bs30_mli800_mlo150_unigram2000
lmexpdir=exp/train_rnnlm_2layer_bs256_unigram2000

    ${decode_cmd} JOB=1:${nj} ${expdir}/${decode_dir}/log/decode.JOB.log \
        asr_recog.py \
        --ngpu ${ngpu} \
        --backend ${backend} \
        --recog-json ${feat_recog_dir}/split${nj}utt/data_${bpemode}${nbpe}.JOB.json \
        --result-label ${expdir}/${decode_dir}/data.JOB.json \
        --model ${expdir}/results/model.${recog_model}  \
        --model-conf ${expdir}/results/model.conf  \
        --beam-size ${beam_size} \
        --penalty ${penalty} \
        --maxlenratio ${maxlenratio} \
        --minlenratio ${minlenratio} \
        --ctc-weight ${ctc_weight} \
        --rnnlm ${lmexpdir}/rnnlm.model.best \
        --lm-weight ${lm_weight} \

```
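
Before launching the decoding command above, it can help to confirm that the downloaded model files are where run.sh expects them. The following is a hedged sketch, not part of ESPnet: the function name is hypothetical, and the file names are the ones listed in this release note (`model.acc.best`, `model.conf`, `rnnlm.model.best`).

```python
from pathlib import Path


def missing_model_files(expdir, lmexpdir):
    """Return the expected model files that do not exist yet.

    Hypothetical helper; the layout follows the paths named in the
    release note, i.e. <expdir>/results/... and <lmexpdir>/rnnlm.model.best.
    """
    expected = [
        Path(expdir) / "results" / "model.acc.best",
        Path(expdir) / "results" / "model.conf",
        Path(lmexpdir) / "rnnlm.model.best",
    ]
    return [str(p) for p in expected if not p.is_file()]


# Placeholder directories for illustration; substitute the real
# expdir and lmexpdir values set in run.sh.
for path in missing_model_files("exp/asr_model_dir", "exp/rnnlm_dir"):
    print("missing:", path)
```

An empty result means all three files were found.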

Scientific Software - Peer-reviewed - Python
Published by sw005320 over 7 years ago

Software Design and User Interface of ESPnet-SE++ - ESPnet v.0.1.4

  • Added TTS recipe based on Tacotron2 egs/ljspeech/tts1
  • Extended the above TTS recipe to multispeaker TTS egs/librispeech/tts1/
  • Supported PyTorch 0.4.0
  • Added word level decoding
  • (Finally) fixed CNN (VGG) layer issues in PyTorch
  • Fixed warp CTC scaling issues in PyTorch
  • Added subword modeling based on sentence piece toolkit
  • Many bug fixes
  • Updated CSJ performance

Scientific Software - Peer-reviewed - Python
Published by sw005320 over 7 years ago

Software Design and User Interface of ESPnet-SE++ - stable version for jsalt18 summer school

  • bug fix
  • improve the jsalt18e2e recipe
  • improve the JSON format
  • simplify Makefile

Scientific Software - Peer-reviewed - Python
Published by sw005320 over 7 years ago

Software Design and User Interface of ESPnet-SE++ - Change JSON format and use feature compression

  • change the JSON format to deal with multiple inputs and outputs
  • use feature compression to reduce the data I/O

Scientific Software - Peer-reviewed - Python
Published by sw005320 over 7 years ago

Software Design and User Interface of ESPnet-SE++ - Added attention visualization and jsalt18e2e recipe, and refined Librispeech recipe

Support attention visualization.

  • Added PlotAttentionReport, which saves the attention weights as a figure for each epoch.
  • Refactored test script test_e2e_model to check various attention functions

Added JSALT18 end-to-end ASR recipe

Refined the Librispeech recipe

  • Removed long utterances during training
  • Added RNNLM

Scientific Software - Peer-reviewed - Python
Published by kan-bayashi over 7 years ago

Software Design and User Interface of ESPnet-SE++ - First (test) release

First release.

  • CTC, attention-based encoder-decoder, and hybrid CTC/attention based end-to-end ASR
  • Fast/accurate training with CTC/attention multitask training
  • CTC/attention joint decoding to boost monotonic alignment decoding
  • Encoder: VGG-like CNN + BLSTM or pyramid BLSTM
  • Attention: Dot product, location-aware attention, variants of multihead (pytorch only)
  • Incorporate RNNLM/LSTMLM trained only with text data
  • Flexible network architecture thanks to chainer and pytorch
  • Kaldi style complete recipe
  • Supports a number of ASR benchmarks (WSJ, Switchboard, CHiME-4, Librispeech, TED, CSJ, AMI, HKUST, Voxforge, etc.)
  • State-of-the-art performance in Japanese/Chinese benchmarks (comparable/superior to hybrid DNN/HMM and CTC+FST)
  • Moderate performance in standard English benchmarks
  • Support multiple GPU training
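
The CTC/attention multitask training and joint decoding mentioned in this release are usually described as weighted interpolations. The sketch below shows that interpolation only; it is an illustration under stated assumptions, not ESPnet code. The function names are hypothetical, and the parameters are named after the `mtlalpha`, `--ctc-weight`, and `--lm-weight` settings that appear elsewhere in these release notes.

```python
def multitask_loss(loss_ctc, loss_att, mtl_alpha=0.5):
    """Training objective: interpolate the CTC and attention losses.

    mtl_alpha=1.0 is pure CTC, mtl_alpha=0.0 is pure attention.
    """
    return mtl_alpha * loss_ctc + (1.0 - mtl_alpha) * loss_att


def joint_score(logp_ctc, logp_att, ctc_weight=0.3, logp_lm=0.0, lm_weight=0.0):
    """Decoding score for one hypothesis: interpolate the CTC and attention
    log-probabilities, optionally adding a weighted language-model term.

    The default weights are illustrative assumptions.
    """
    return (
        ctc_weight * logp_ctc
        + (1.0 - ctc_weight) * logp_att
        + lm_weight * logp_lm
    )


print(multitask_loss(2.0, 4.0, mtl_alpha=0.5))
print(joint_score(-1.0, -3.0, ctc_weight=0.5))
```

During beam search, a score of this form is computed for each partial hypothesis, so the CTC term can down-weight hypotheses whose attention alignment is not monotonic.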

Scientific Software - Peer-reviewed - Python
Published by sw005320 over 7 years ago