Recent Releases of Software Design and User Interface of ESPnet-SE++
Software Design and User Interface of ESPnet-SE++ - ESPnet version 202506
New Features
- [New Features][ESPnet2][ESPnet3][CI][size:XXL][lgtm] [espnet3-3] Add trainer and model #6172 by @Masao-Someki
- [New Features][ESPnet3][CI][size:XXL][lgtm] [espnet3-1] Add Data Organizer #6167 by @Masao-Someki
- [New Features][ESPnet2][size:XL] LID-1: Training and task setup #6155 by @Qingzheng-Wang
- [New Features][ESPnet2][SID][size:XL] Update SPK recipe for CN-celeb #6154 by @holvan
- [New Features][ESPnet2][SLU] Add code for training turn taking prediction model #5948 by @siddhu001
Recipe
- [Recipe][ESPnet2][size:XXL] S2T Recipe for IPAPack++: Data Preparation #6169 by @chinjouli
- [Recipe][ESPnet2][size:XL] S2T Recipe for IPAPack++: main recipe #6168 by @chinjouli
- [Recipe][ESPnet2][Codec] add: complete codec1 recipe for AudioSet and musdb18 #6068 by @whr-a
- [Recipe][ESPnet2][ASR] Additional results for the discrete ASR challenge #6067 by @juice500ml
- [Recipe][ESPnet2][Installation][SE] Add implementations of USES2 speech enhancement models #5761 by @Emrys365
Bugfix
- [Bugfix][ESPnet2][size:XS] Fix FutureWarning: torch.cuda.amp.autocast(args...) is deprecated #6190 by @KanTakahiro
- [Bugfix][ESPnet2][ESPnet1] Resolve logger warnings #6117 by @emmanuel-ferdman
- [Bugfix][ESPnet2] Fix for issue #6112 Legacy torch tensor constructor causes issue when… #6114 by @advaitvd
Documentation
- [Documentation][ESPnet1][size:S] docs: clarify CBHG encoder vs post‑net roles in Tacotron 1 #6188 by @ZhuoyanTao
- [Documentation][ESPnet3][Docker][CI][size:L] Add devcontainer change from Espnet3 #6145 by @sw005320
- [Documentation][CI][size:M] Update PULL_REQUEST_TEMPLATE.md #6144 by @sw005320
- [Documentation][CI][size:M] Update document to add tutorials + more easy connection to installation #6143 by @juice500ml
- [Documentation][ESPnet3][Docker][size:L][lgtm] Espnet3/devcontainer #6141 by @Masao-Someki
- [Documentation][Installation] Update Makefile #6124 by @sw005320
Refactoring
- [Refactoring][ESPnet2][size:L] Refactor ACESinger's audio segmentation #6151 by @Arllan-lanliu
- [Refactoring][ESPnet2][ESPnet1][CI][size:L][lgtm] Flake8 CI Fixes #6140 by @Fhrozen
Others
- [Others][CI][size:S][lgtm] Workaround for shellcheck v0.11.0 #6197 by @Masao-Someki
- [Others][Installation][size:XS] Update transformers installation #6191 by @Fhrozen
- [Others][ESPnet3][CI][size:L] [espnet3-2] Add Config Loading script #6171 by @Masao-Someki
- [Others][ESPnet2][ESPnet1][ESPnetEZ][Installation][size:L] [espnet3] Format files #6164 by @Masao-Someki
- [Others][ESPnet2][SE] Update BSRNN implementations to support more flexible band-split schemes #6123 by @Emrys365
- [Others][ESPnet2][Music] [SVS1] SingingGenerate and VISinger Inference Fix #6113 by @HANJionghao
- [Others][CI] FIX CI test_import #6111 by @Fhrozen
- [Others][ESPnet2] [Recipe] Create inference recipe for non-native English ASR benchmark (ALLSSTAR) #6110 by @chenehk
- [Others][Docker][Installation][CI] Torch Version Update #6095 by @Fhrozen
- [Others][ESPnet2][ASR] Add explicit typecheck for warning msg #6082 by @ftshijt
- [Others][ESPnet2][ESPnet1][SSL][size:XL] SSL Fine-tuning PR #6069 by @wanchichen
New Contributors
- @Arllan-lanliu made their first contribution in https://github.com/espnet/espnet/pull/6090
- @chinjouli made their first contribution in https://github.com/espnet/espnet/pull/6109
- @chenehk made their first contribution in https://github.com/espnet/espnet/pull/6110
- @advaitvd made their first contribution in https://github.com/espnet/espnet/pull/6114
- @whr-a made their first contribution in https://github.com/espnet/espnet/pull/6068
- @holvan made their first contribution in https://github.com/espnet/espnet/pull/6126
- @Qingzheng-Wang made their first contribution in https://github.com/espnet/espnet/pull/6155
- @ZhuoyanTao made their first contribution in https://github.com/espnet/espnet/pull/6188
- @KanTakahiro made their first contribution in https://github.com/espnet/espnet/pull/6190
Acknowledgements
Special thanks to @Arllan-lanliu, @Emrys365, @Fhrozen, @HANJionghao, @KanTakahiro, @Masao-Someki, @Qingzheng-Wang, @ZhuoyanTao, @advaitvd, @chenehk, @chinjouli, @emmanuel-ferdman, @ftshijt, @holvan, @juice500ml, @siddhu001, @sw005320, @wanchichen, @whr-a.
Full Changelog: https://github.com/espnet/espnet/compare/v.202503...v.202506
Scientific Software - Peer-reviewed
- Python
Published by Fhrozen 5 months ago
Software Design and User Interface of ESPnet-SE++ - ESPnet version 202503
New Features
- [New Features][ESPnet2] Add Hugging Face Front End #5913 by @taiqihe
Enhancement
- [Enhancement][ESPnet2][ESPnet1][OWSM] Improving efficiency of large-scale training #6024 by @pyf98
- [Enhancement][ESPnet2][Codec] Update scoring config to support WER/CER information with VERSA #6001 by @ftshijt
- [Enhancement][ESPnet1] Add Scaled Dot Product Attention (SDPA) from PyTorch #5994 by @pyf98
- [Enhancement][ESPnet2][ESPnet1][Installation] Support PyTorch Lightning Trainer in ESPnet2 #5954 by @pyf98
Recipe
- [Recipe][ESPnet2][ASR] cmu_kids #6017 by @wangpuup
- [Recipe][ESPnet2][ASR] EDACC dataset automatic speech recognition #5996 by @uwanny
- [Recipe][ESPnet2][ASR] ml-superb 2024 recipe #5989 by @wanchichen
- [Recipe][ESPnet2] Clotho_v2 Audio Captioning (DCASE 2023 implementation) #5967 by @Shikhar-S
Bugfix
- [Bugfix][Installation] Downgrade Transformers version #6071 by @Fhrozen
- [Bugfix][ESPnet2] Docs Fix #6065 by @Fhrozen
- [Bugfix][ESPnet2][ST] A quick fix for type error when dealing with multi-decoder (ST) #6064 by @ftshijt
- [Bugfix][ESPnet2][SID] fixed few typos on egs2/spk template #6060 by @yigitcatak
- [Bugfix][ESPnet2] Bugfix #6057 #6058 by @Masao-Someki
- [Bugfix][ESPnet2][SID] fix some minor errors in SID recipe #6045 by @shimhz
- [Bugfix][ESPnet2] Fix the deprecated amp interface #6036 by @ftshijt
- [Bugfix][ESPnet2] Add explicit weights_only=False for checkpoint loading #6035 by @ftshijt
- [Bugfix][Installation] Fix boost URL #6034 by @sw005320
- [Bugfix][Installation] Fix minor bug in Makefile #6031 by @juice500ml
- [Bugfix][ESPnet2] Logging bugfix, skip import #6023 by @Shikhar-S
- [Bugfix][ESPnet2][OWSM] Fix minor bug in OWSM-CTC preprocessor #6005 by @pyf98
- [Bugfix][ESPnet2][ASR] Minor formatting fixes in mlsuperb 2 recipe #6003 by @wanchichen
Documentation
- [Documentation][ESPnet2][CI] [Doc] Update parser on lightning_train #6020 by @Fhrozen
Others
- [Others][Installation] Transformers version check #6076 by @Fhrozen
- [Others][ESPnet2][ESPnet1] New SSL Recipe #6053 by @wanchichen
- [Others][Installation] Update tools/README.md #6030 by @popcornell
- [Others][ESPnet2][OWSM] doc: update OWSM data preparation instructions #6026 by @kalvinchang
- [Others][ESPnet2][OWSM] fix: OWSM v3.1 - remove flash attention args #6025 by @kalvinchang
- [Others][ESPnet2][SED] BEATs Tokenizer Inference #6008 by @Shikhar-S
- [Others][ESPnet2][ESPnet1] Implement unified batch decode interface for OWSM-CTC #6007 by @pyf98
- [Others][ESPnet2][TTS] [feature]finish versa eval in TTS recipe #6002 by @Whale-Dolphin
- [Others][ESPnet2][ESPnet1][Installation][CI][SED] Classification Task and AudioSet-20K #5998 by @Shikhar-S
- [Others][ESPnet2][ESPnet1][Installation][CI] remove gtn in setup.py #5982 by @sw005320
- [Others][ESPnet2][ESPnet1][SED] ESC-50 classification with BEATs #5977 by @Shikhar-S
- [Others][ESPnet2][TTS][ASR][SLU] Spoken dialogue systems demo recipe #5975 by @siddhu001
- [Others][ESPnet2][SE] fix: gradient truncation bug in pit_solver.py #5974 by @YuzhuWang-code
Acknowledgements
Special thanks to @Fhrozen, @Masao-Someki, @Shikhar-S, @Whale-Dolphin, @YuzhuWang-code, @ftshijt, @juice500ml, @kalvinchang, @popcornell, @pyf98, @shimhz, @siddhu001, @sw005320, @taiqihe, @uwanny, @wanchichen, @wangpuup, @yigitcatak.
Published by Fhrozen 9 months ago
Software Design and User Interface of ESPnet-SE++ - ESPnet version 202412
New Features
- [New Features][ESPnet2][Codec] Add HiFiCodec model #5898 by @RayYuki
Enhancement
- [Enhancement][ESPnetEZ] Add missing functionalities for espnetez #5890 by @Masao-Someki
Recipe
- [Recipe][ESPnet2][ASR] My Science Tutor (MyST) Children's Conversational Speech Corpus #5964 by @eric102004
- [Recipe][ESPnet2] Feature/improve is24 asr2 #5938 by @juice500ml
- [Recipe][ESPnet2][ASR] Add asr1 recipe for libriheavy_small #5932 by @Miamoto
- [Recipe][ESPnet2][SID] Add RATS dataset for SV task #5840 by @shimhz
Bugfix
- [Bugfix][ESPnet2][Diarization] [Bugfix] fix keyword argument error in stage 7 of diar.sh #5969 by @eric102004
- [Bugfix][ESPnetEZ] Bug fixed for #5949 #5950 by @Masao-Someki
- [Bugfix][ESPnet2][ASR] removed "continue" statement from the for loop in run_mono.sh #5946 by @Trikaldarshi
- [Bugfix][ESPnet2] Add SWBD text processing fix #5941 by @siddhu001
- [Bugfix][ESPnet2][ESPnet1] Training code patches #5931 by @wanchichen
Documentation
- [Documentation] Fix bug in document that overflows the page #5940 by @juice500ml
- [Documentation] Update CI reference #5939 by @emmanuel-ferdman
- [Documentation] fix: collcate_fn -> collate_fn #5925 by @kalvinchang
- [Documentation][Docker][Installation][CI] Migration from Anaconda to conda-forge #5924 by @yoshipon
Others
- [Others][ESPnet2][Codec] Fix versa interface #5951 by @ftshijt
- [Others][ESPnet2][ESPnet1] Add OWSM-CTC #5933 by @pyf98
- [Others][ESPnet2] Recipe/ogi kids speech #5916 by @anyuyay
Acknowledgements
Special thanks to @Masao-Someki, @Miamoto, @RayYuki, @Trikaldarshi, @anyuyay, @emmanuel-ferdman, @eric102004, @ftshijt, @juice500ml, @kalvinchang, @pyf98, @shimhz, @siddhu001, @wanchichen, @yoshipon.
Published by Fhrozen about 1 year ago
Software Design and User Interface of ESPnet-SE++ - ESPnet version 202409
New Features
- [New Features][ESPnet2][TTS][Codec] Support Codec feature for TTS2 task #5857 by @wyh2000
- [New Features][ESPnet2][Codec] Codec downstream task support: TTS #5763 by @jctian98
- [New Features][ESPnet2][Codec] Add Encodec features for Codec toolkit #5758 by @jctian98
- [New Features][ESPnet2][Installation][TTS] Add evaluation scripts with DiscreteSpeechMetrics. #5661 by @Takaaki-Saeki
- [New Features][ESPnet2][ASR] Integrate adapter for s3prl frontend #5609 by @Stanwang1210
- [New Features][ESPnet2][CI][OWSM] Support external dataset library for ESPnetEasy #5584 by @Masao-Someki
- [New Features][ESPnet2][CI][LM] Pr voxtlm #5472 by @soumimaiti
Enhancement
- [Enhancement][ESPnet2][SLM] MT Task in SpeechLM #5899 by @ftshijt
- [Enhancement][ESPnet2][Codec] Categorical Balanced Chunk iterator #5894 by @ftshijt
- [Enhancement][ESPnet2][ESPnet1] TransformerDecoder forward_one_step with memory_mask #5679 by @albertz
- [Enhancement][ESPnet2] Update espnet_model.py #5646 by @shen9712
Recipe
- [Recipe][ESPnet2][Music] Fixed KiSing Data Preparation #5895 by @HANJionghao
- [Recipe][ESPnet2][ASR] CORAAL asr1 recipe #5882 by @kalvinchang
- [Recipe][ESPnet2][ASR] ml_superb asr2 recipe #5866 by @Stanwang1210
- [Recipe][ESPnet2] Add more download links for ML-SUPERB #5863 by @ftshijt
- [Recipe][ESPnet2][ASR] Fix bug in asr2.sh #5859 by @juice500ml
- [Recipe][ESPnet2][Music] fix bugs in SVS1 #5851 by @South-Twilight
- [Recipe][ESPnet2][TTS] New Recipe of tts2+aishell3 #5849 by @Tsukasane
- [Recipe][ESPnet2][ASR] Espnet Multi-convformer implementation #5832 by @Darshan7575
- [Recipe][ESPnet2][SE] Update of SE functions #5825 by @Emrys365
- [Recipe][ESPnet2] SPRING-INX Recipe (Speech Lab, IIT, Madras) #5811 by @arjun-gangwar
- [Recipe][ESPnet2][TTS] Adding Hifitts recipe for espnet #5784 by @coding-phoenix-12
- [Recipe][ESPnet2][ASR] Updated results for CHiME-8 DASR baseline with new notsofar1 dev set #5771 by @popcornell
- [Recipe][ESPnet2][SE] Final model scores for TF-GridNetV2 on the Kinect-WSJ dataset #5754 by @atharva253
- [Recipe][ESPnet2] Apply normalization on validation set for CHiME-8 recipe #5749 by @popcornell
- [Recipe][ESPnet2][Need review][Codec] ESPnet-Codec decoding and Scoring #5747 by @ftshijt
- [Recipe][ESPnet2][CI][ST] Add recipe for IWSLT 2024 shared task Indic track #5744 by @cromz22
- [Recipe][ESPnet2][Music] [SVS] VISinger Plus #5741 by @jerryuhoo
- [Recipe][ESPnet2][Need review][Codec] ESPnet-codec Training and Setup #5732 by @ftshijt
- [Recipe][ESPnet2][ASR] ESPnet Recipe for ASR on the Makerere Radio Speech Corpus #5730 by @satvik-dixit
- [Recipe][ESPnet2][SE] ESPnet recipe for the Kinect-WSJ dataset #5711 by @atharva253
- [Recipe][ESPnet2][TTS][ASR][Music] Update bitrate calculation scripts for the IS24 discrete speech challenge #5677 by @ftshijt
- [Recipe][ESPnet2][ASR] Add some documents for JTubeSpeech #5663 by @sw005320
- [Recipe][ESPnet2][SID] ESPnet-SPK: add SdSV 2021 recipe #5659 by @Alexgichamba
- [Recipe][ESPnet2][ASR] Add E-Branchformer model for FLEURS #5657 by @wanchichen
- [Recipe][ESPnet2][Installation][CI][ASR] CHiME-8 DASR recipe based on CHiME-7 DASR baseline #5641 by @popcornell
- [Recipe][ESPnet2][ASR] add interspeech2024dsuchallenge/asr2 #5627 by @simpleoier
- [Recipe][ESPnet2][Installation][TTS] Discrete token-based TTS implementation #5626 by @ftshijt
Bugfix
- [Bugfix] fix: replace ellipses (...) in ESPnet-EZ Trainer documentation #5911 by @kalvinchang
- [Bugfix] Bugfix/homepage #5885 by @Masao-Someki
- [Bugfix][ESPnet2] Fix absolute paths in aishell3_tts2 #5884 by @Tsukasane
- [Bugfix] Bug fix for source link #5883 by @Masao-Someki
- [Bugfix][Installation] [CI] Add required file for g2p_en #5869 by @Fhrozen
- [Bugfix][ESPnet2] A fix to newer torch version (compatible to old version with typecheck) #5830 by @ftshijt
- [Bugfix][ESPnet2] Revert change to abs_task to keep the consistency behavior #5789 by @ftshijt
- [Bugfix][ESPnet2] Fix Whisper frontend #5760 by @siddhu001
- [Bugfix][ESPnet2][SE] Update TSE recipe egs2/librimix/tse1 #5731 by @Emrys365
- [Bugfix][ESPnet2] Fix LoRA issues when saving all parameters. #5722 by @simpleoier
- [Bugfix][ESPnet2] Fix tts packing with new spk embedding #5715 by @ftshijt
- [Bugfix][ESPnet2][TTS] Fix stage references in generated run.sh in TTS recipes #5714 by @G-Thor
- [Bugfix][ESPnet2][OWSM] fix a small issue in OWSM decode_long #5703 by @jctian98
- [Bugfix][ESPnet2][Installation] Upgrade typeguard #5702 by @sw005320
- [Bugfix][ESPnet2] Quick fix to calculation of bitrate #5692 by @ftshijt
- [Bugfix][ESPnet2][SSUM] Fix typo in summarization scoring #5688 by @YoshikiMas
- [Bugfix][ESPnet2] Update egs2/TEMPLATE/asr2/asr2.sh #5682 by @simpleoier
- [Bugfix][ESPnet2][ASR] Fix over-lengthy audio in ml_superb data prep #5678 by @ftshijt
- [Bugfix][ESPnet2] fix typo #5673 by @hiranoyu0830
- [Bugfix][Installation][ST] Fix CI Multilingual ST test #5672 by @Fhrozen
- [Bugfix][ESPnet2][SLU] Fix speed perturbation when not using transcript in slu.sh #5671 by @siddhu001
- [Bugfix][ESPnet2][SLU] Fix loading pre-trained model from transformers #5668 by @siddhu001
- [Bugfix][ESPnet2] Correct the argument errors in the whisper tokenizer language. #5666 by @pengchengguo
Documentation
- [Documentation][ESPnet2][Music] Fixed SingingGenerate docstring examples #5889 by @HANJionghao
- [Documentation][ESPnet2][CI] Separate packing and uploading stages #5752 by @cromz22
- [Documentation] Add script to make release note from milestone #5653 by @kan-bayashi
Refactoring
- [Refactoring] Modified easy to ez #5719 by @Masao-Someki
Others
- [Others][CI] Bugfix for the paper publish workflow #5909 by @juice500ml
- [Others][ESPnet2] Revision on Speechlm vocabulary extension script #5906 by @jctian98
- [Others][ESPnet2][TTS] Fix tts.sh path in aishell3 tts2 #5879 by @sw005320
- [Others][ESPnet2][Installation] Add DeepSpeed trainer for large-scale training #5856 by @jctian98
- [Others] Update README info #5852 by @ftshijt
- [Others][ESPnet2][ESPnet1][Installation] Add flash-attn #5839 by @wanchichen
- [Others][ESPnet2][Music] [SVS] fix VISinger2 typecheck error #5838 by @jerryuhoo
- [Others][ESPnet2] Fixed kising/acesinger google drive download #5834 by @HANJionghao
- [Others][ESPnet2][SID] update MFA-Conformer performance after fixing the bug in #5797 #5826 by @Jungjee
- [Others][ESPnet2][CI][SE] SE function updates: new models and support for handling various sampling frequencies #5800 by @Emrys365
- [Others][ESPnet2][SID] fix spk mfa-conformer forwarding #5797 by @series2
- [Others][ESPnet2][CI][Music] [SVS] Add CI tests for VISinger Plus #5786 by @jerryuhoo
- [Others][ESPnet2][LM] Bug fix for VoxtLM v1 recipe #5782 by @cromz22
- [Others][ESPnet2][ESPnet1] Added partially auto-regressive decoding #5769 by @Masao-Someki
- [Others][Installation][CI] Fix minor issue in anaconda downloading #5753 by @ftshijt
- [Others] [pre-commit.ci] pre-commit autoupdate #5738 by @pre-commit-ci[bot]
- [Others][ESPnet2][Installation][CI] Upgrade typeguard [Subst.] #5724 by @Fhrozen
- [Others][ESPnet2][SE] TF-GridNet training recipe for DNS Interspeech 2020 dataset #5710 by @nateanl
- [Others][ESPnet2][LM] Adding transformer_opt #5709 by @soumimaiti
- [Others][ESPnet2] Add Readme for Voxtlm #5693 by @wyh2000
- [Others][ESPnet2][SID] ESPnet-SPK: add ASVspoof19 SASV recipe #5687 by @Alexgichamba
Acknowledgements
Special thanks to @Alexgichamba, @Darshan7575, @Emrys365, @Fhrozen, @G-Thor, @HANJionghao, @Jungjee, @Masao-Someki, @South-Twilight, @Stanwang1210, @Takaaki-Saeki, @Tsukasane, @YoshikiMas, @albertz, @arjun-gangwar, @atharva253, @coding-phoenix-12, @cromz22, @ftshijt, @hiranoyu0830, @jctian98, @jerryuhoo, @juice500ml, @kalvinchang, @kan-bayashi, @nateanl, @pengchengguo, @popcornell, @pre-commit-ci[bot], @satvik-dixit, @series2, @shen9712, @siddhu001, @simpleoier, @soumimaiti, @sw005320, @wanchichen, @wyh2000.
Published by Fhrozen about 1 year ago
Software Design and User Interface of ESPnet-SE++ - ESPnet version 202402
News
We're thrilled to announce that our latest update brings two groundbreaking features to our project: espnetez and ESPnet-SPK!
New Features
- [New Features][ESPnet2][ESPnet1][Installation][SE] Add diffusion-based SE model to ESPnet-SE #5572 by @LiChenda
- [New Features][ESPnet2][ESPnet1][CI][ASR] Add Bayes Risk CTC (reworked) #5519 by @jctian98
- [New Features][ESPnet2][TTS] TTS evaluation script and monitoring functionality using MOS prediction model #5485 by @Takaaki-Saeki
- [New Features][ESPnet2][SE] Add USES model for speech enhancement in diverse conditions #5482 by @Emrys365
- [New Features][ESPnet2][CI][SID] ESPnet-SPK: major update #5408 by @Jungjee
- [New Features][ESPnet2][TTS][ASR] Add espnetez #5372 by @Masao-Someki
Enhancement
- [Enhancement][ESPnet2][OWSM] Improving OWSM inference interface #5618 by @pyf98
- [Enhancement][ESPnet2][OWSM] Add OWSM v3.1 #5611 by @pyf98
- [Enhancement][ESPnet2][CI] ESPnet-SPK: Additional models, supplement readme #5559 by @Jungjee
- [Enhancement][ESPnet2][CI][SE] Add PyTorch & GPU support for DNSMOS calculation #5548 by @Emrys365
- [Enhancement][ESPnet2][TTS][SID] Speaker embedding extractor (with ESPnet pre-trained speaker model) #5579 by @ftshijt
Recipe
- [Recipe][ESPnet2][Music] Fix relative setting of train-dev-test #5623 by @ftshijt
- [Recipe][ESPnet2][SID] ESPnet-SPK: add Voxblink recipe #5583 by @Jungjee
- [Recipe][ESPnet2][SID] ESPnet-SPK: Model upload and result generation #5558 by @Jungjee
- [Recipe][ESPnet2][Music] ACE singer recipe fixing #5551 by @ftshijt
- [Recipe][ESPnet2][TTS] TTS2 Template #5541 by @ftshijt
- [Recipe][ESPnet2][ASR] fix kaldi dependency in asr2 #5540 by @ftshijt
- [Recipe][ESPnet2][CI][S2ST] CI test for s2st #5526 by @ftshijt
- [Recipe][ESPnet2][ASR] Added data.sh to SPRING-INX IITM Recipe #5522 by @arjun-gangwar
- [Recipe][ESPnet2][ASR] Add Libriheavy small and medium ASR2 recipes #5512 by @akreal
- [Recipe][ESPnet2][ASR] SPRING-INX IITM RECIPE #5505 by @arjun-gangwar
- [Recipe][ESPnet2][ASR][RNNT] Add transducer conformer configuration to commonvoice recipe #5503 by @zuazo
- [Recipe][ESPnet2][ESPnet1] add centralized data preparation for OWSM #5478 by @jctian98
- [Recipe][ESPnet1] Added clean speech results #5649 by @linan2
- [Recipe][ESPnet2][Installation][AV] AVSR recipe for Easycom Dataset #5630 by @ms-dot-k
- [Recipe][ESPnet2] Update CHiME-7 ASR1 recipe #5555 by @popcornell
- [Recipe][ESPnet2] Add E-Branchformer model checkpoint in OWSM v2 #5517 by @pyf98
- [Recipe][ESPnet2][SLU] Slue PR configs #5087 by @siddhu001
Bugfix
- [Bugfix][ESPnet2] Fix path dependency in ESPnet tutorial #5645 by @siddhu001
- [Bugfix][ESPnet2] Fix ESPnet tutorial #5644 by @siddhu001
- [Bugfix] Fix CI #5642 by @siddhu001
- [Bugfix][ESPnet2] Fixed bug by copying missing Kaldi scripts #5636 by @VicentCano
- [Bugfix][ESPnet1][ASR] CTC prefix score, fix if blank == eos #5620 by @albertz
- [Bugfix][ESPnet2] Fix minor OWSM data prep bug #5607 by @juice500ml
- [Bugfix][ESPnet2][ESPnet1][CI] E721 #5589 by @sw005320
- [Bugfix][ESPnet2][ESPnet1] Make minlenratio effective #5581 by @jctian98
- [Bugfix][ESPnet2] Fix except #5567 by @takenori-y
- [Bugfix][ESPnet1][Installation][CI] Improve error robustness of unit tests #5535 by @Emrys365
- [Bugfix][ESPnet2][AV] Fix bug in lrs3 data preprocessing #5520 by @ms-dot-k
- [Bugfix][ESPnet1] replace old mustc links with new instructions #5516 by @brianyan918
- [Bugfix][ESPnet2][ST] Fix s2st HF model uploading #5504 by @tjysdsg
- [Bugfix][ESPnet2][ESPnet1] bug fixes for must_c v2 recipe #5640 by @jasonmusespresso
Documentation
- [Documentation][ESPnet2] Add instructions for finetuning owsm #5539 by @pyf98
- [Documentation] Updated the reference of the accepted JOSS paper #5515 by @neillu23
Others
- [Others] Update Discord Invitation Link #5578 by @Fhrozen
- [Others][ESPnet2][CI] Improve error robustness of unit tests #5523 by @Emrys365
Acknowledgements
Special thanks to @Emrys365, @Fhrozen, @Jungjee, @LiChenda, @Masao-Someki, @Takaaki-Saeki, @VicentCano, @akreal, @albertz, @arjun-gangwar, @brianyan918, @ftshijt, @jasonmusespresso, @jctian98, @juice500ml, @linan2, @ms-dot-k, @neillu23, @popcornell, @pyf98, @siddhu001, @sw005320, @takenori-y, @tjysdsg, @zuazo.
Published by kan-bayashi almost 2 years ago
Software Design and User Interface of ESPnet-SE++ - ESPnet version 202310
What's Changed
- Support arbitrary language finetune for Whisper models. by @pengchengguo in https://github.com/espnet/espnet/pull/5344
- Update Dipco Data URL by @Fhrozen in https://github.com/espnet/espnet/pull/5391
- Update readme in TEMPLATE/svs1 by @linyueqian in https://github.com/espnet/espnet/pull/5394
- add gramvaani asr recipe by @bloodraven66 in https://github.com/espnet/espnet/pull/5366
- ESPnet-SPK: sampler by @Jungjee in https://github.com/espnet/espnet/pull/5365
- Adding general data augmentation methods for speech preprocessing by @Emrys365 in https://github.com/espnet/espnet/pull/5370
- Update of several SE recipes and some minor fixes by @Emrys365 in https://github.com/espnet/espnet/pull/5401
- Reproducing MIMOIRIS by @YoshikiMas in https://github.com/espnet/espnet/pull/5409
- Kathbath asr by @bloodraven66 in https://github.com/espnet/espnet/pull/5369
- Add pytorch2.0.1 to CI by @kamo-naoyuki in https://github.com/espnet/espnet/pull/5413
- [skip ci] Update README.md by @kamo-naoyuki in https://github.com/espnet/espnet/pull/5417
- In spec_augment.py, check whether an array is writeable before modifying it inplace by @mdecerbo in https://github.com/espnet/espnet/pull/5416
- Docker updates for local builds by @Fhrozen in https://github.com/espnet/espnet/pull/5406
- fix typo in TEMPLATE/svs1/README.md by @linyueqian in https://github.com/espnet/espnet/pull/5426
- Update install_mwerSegmenter.sh by @sw005320 in https://github.com/espnet/espnet/pull/5437
- Support Whisper-style training as a new task S2T by @pyf98 in https://github.com/espnet/espnet/pull/5120
- fix twice numpy installation issue by @kan-bayashi in https://github.com/espnet/espnet/pull/5447
- Add Whisper SOT recipe for Librimix by @LiChenda in https://github.com/espnet/espnet/pull/5371
- Update for the JOSS paper editor review by @neillu23 in https://github.com/espnet/espnet/pull/5418
- Add the VOiCES recipe for ASR by @Emrys365 in https://github.com/espnet/espnet/pull/5448
- Improve diacritic compatibility in data_prep.pl preprocessing scripts by @zuazo in https://github.com/espnet/espnet/pull/5445
- [WIP] create recipe for acesinger by @linyueqian in https://github.com/espnet/espnet/pull/5431
- Add BibleTTS recipe by @wyh2000 in https://github.com/espnet/espnet/pull/5436
- ASR2 CHiME4 & Gigaspeech Recipes by @yichen14 in https://github.com/espnet/espnet/pull/5434
- [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci in https://github.com/espnet/espnet/pull/5427
- Simple fix to reduce test_slu_inference time by @siddhu001 in https://github.com/espnet/espnet/pull/5460
- Do not use root logger in Beamsearch by @vsd-vector in https://github.com/espnet/espnet/pull/5454
- Fix whisper test by @siddhu001 in https://github.com/espnet/espnet/pull/5464
- Add doc for OWSM by @pyf98 in https://github.com/espnet/espnet/pull/5463
- Speech-to-speech translation Task by @ftshijt in https://github.com/espnet/espnet/pull/4859
- AVSR recipes on LRS3 using pre-trained AV-HuBERT model by @ms-dot-k in https://github.com/espnet/espnet/pull/5456
- Support LoRA based large model finetuning. by @pengchengguo in https://github.com/espnet/espnet/pull/5400
- Multilingual Librispeech (MLS) refactor ASR1 recipe by @juice500ml in https://github.com/espnet/espnet/pull/5323
- Add phonemized LibriTTS ASR recipe by @akreal in https://github.com/espnet/espnet/pull/5466
- Update the Enh framework to support training with variable numbers of speakers by @Emrys365 in https://github.com/espnet/espnet/pull/5414
- speed up TFGridNet code by @zqwang7 in https://github.com/espnet/espnet/pull/5395
- [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci in https://github.com/espnet/espnet/pull/5468
- ASR2 recipe on Tedlium3 dataset by @kohei0209 in https://github.com/espnet/espnet/pull/5331
- Create README.md in OWSM v1 by @pyf98 in https://github.com/espnet/espnet/pull/5489
- Update setup.py by @sw005320 in https://github.com/espnet/espnet/pull/5490
- Fix default value in ML-SUPERB by @ftshijt in https://github.com/espnet/espnet/pull/5492
- Fix bugs of Whisper SOT. by @pengchengguo in https://github.com/espnet/espnet/pull/5494
- Multilingual Librispeech ASR2 + ASR1 baselines by @juice500ml in https://github.com/espnet/espnet/pull/5441
- Add a new SE recipe combining five public corpora by @Emrys365 in https://github.com/espnet/espnet/pull/5484
- Update .mergify.yml by @kamo-naoyuki in https://github.com/espnet/espnet/pull/5502
- update version to 202310 by @kan-bayashi in https://github.com/espnet/espnet/pull/5501
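One entry in the list above, "Do not use root logger in Beamsearch" (pull/5454), touches on a general Python logging practice: library code should log through a named logger rather than the root logger, so that importing the library does not change the application's logging setup. The sketch below illustrates only that general pattern with the standard library. It is not the code from the pull request, and the logger name `example.beam_search` and the functions `search_step` and `capture_logs` are hypothetical names chosen for illustration.

```python
import io
import logging

# Hypothetical logger name for illustration; not taken from ESPnet.
logger = logging.getLogger("example.beam_search")


def search_step(step: int) -> None:
    """Emit a message through the named logger instead of the root logger."""
    # Calling logging.info(...) here would use the root logger and, if the root
    # logger has no handlers yet, implicitly call logging.basicConfig(), which
    # changes the logging configuration of the whole application.
    logger.info("decoding step %d", step)


def capture_logs() -> str:
    """Attach a temporary handler to the named logger and return what it wrote."""
    stream = io.StringIO()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(name)s:%(levelname)s:%(message)s"))
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    try:
        search_step(3)
    finally:
        # Always detach the temporary handler so repeated calls do not duplicate output.
        logger.removeHandler(handler)
    return stream.getvalue()


if __name__ == "__main__":
    print(capture_logs(), end="")
```

With a named logger, an application can raise or lower the verbosity of this one component (for example via `logging.getLogger("example.beam_search").setLevel(...)`) without affecting other loggers.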
New Contributors
- @linyueqian made their first contribution in https://github.com/espnet/espnet/pull/5394
- @mdecerbo made their first contribution in https://github.com/espnet/espnet/pull/5416
- @zuazo made their first contribution in https://github.com/espnet/espnet/pull/5445
- @wyh2000 made their first contribution in https://github.com/espnet/espnet/pull/5436
- @yichen14 made their first contribution in https://github.com/espnet/espnet/pull/5434
- @vsd-vector made their first contribution in https://github.com/espnet/espnet/pull/5454
- @ms-dot-k made their first contribution in https://github.com/espnet/espnet/pull/5456
- @juice500ml made their first contribution in https://github.com/espnet/espnet/pull/5323
- @kohei0209 made their first contribution in https://github.com/espnet/espnet/pull/5331
Full Changelog: https://github.com/espnet/espnet/compare/v.202308...v.202310
Published by kan-bayashi about 2 years ago
Software Design and User Interface of ESPnet-SE++ - ESPnet version 202308
What's Changed
- Update tutorial by @ftshijt in https://github.com/espnet/espnet/pull/4648
- Update tutorials by @ftshijt in https://github.com/espnet/espnet/pull/4898
- add e-branchformer result for tedlium3 and add checker for text output length by @Some-random in https://github.com/espnet/espnet/pull/5130
- Limit the Numpy version (<1.24) to fix CI error temporarily. by @simpleoier in https://github.com/espnet/espnet/pull/5162
- [SVS] Add new recipes by @A-Quarter-Mile in https://github.com/espnet/espnet/pull/5158
- Update README.md of CHiME-7 DASR: fixing typos by @popcornell in https://github.com/espnet/espnet/pull/5166
- Fix typo in CONTRIBUTING.md by @eltociear in https://github.com/espnet/espnet/pull/5167
- CHiME-7 DASR: Update install_dependencies.sh, fix lhotse version by @popcornell in https://github.com/espnet/espnet/pull/5168
- Update TD-SpeakerBeam by @Emrys365 in https://github.com/espnet/espnet/pull/5155
- Add pre-trained causal speech separation model and streaming demo by @LiChenda in https://github.com/espnet/espnet/pull/5172
- KSC recipe by @khassanoff in https://github.com/espnet/espnet/pull/5171
- [SVS] Add new recipe by @A-Quarter-Mile in https://github.com/espnet/espnet/pull/5173
- Update AphasiaBank Recipe by @tjysdsg in https://github.com/espnet/espnet/pull/5104
- fix the gradient backward issue when joint training with s3prl frontend by @simpleoier in https://github.com/espnet/espnet/pull/5159
- Add installer for ParallelWaveGAN by @ftshijt in https://github.com/espnet/espnet/pull/4052
- [GAN SVS] Add VISinger2, UHifiGAN, Avocodo by @jerryuhoo in https://github.com/espnet/espnet/pull/5123
- [SVS] Update docs README.md by @South-Twilight in https://github.com/espnet/espnet/pull/5178
- Update SVS README.md by @jerryuhoo in https://github.com/espnet/espnet/pull/5180
- Adding eendss models by @soumimaiti in https://github.com/espnet/espnet/pull/5157
- 2022fall new task tutorial by @ftshijt in https://github.com/espnet/espnet/pull/5186
- [SVS] Updates for recipes by @A-Quarter-Mile in https://github.com/espnet/espnet/pull/5187
- [GAN SVS] fix phoneme predictor by @jerryuhoo in https://github.com/espnet/espnet/pull/5188
- Update generatelibrimixsd.sh by @leepeiying in https://github.com/espnet/espnet/pull/5182
- Bug fix for #5195 by @YosukeHiguchi in https://github.com/espnet/espnet/pull/5196
- [SVS] Update on recipes by @A-Quarter-Mile in https://github.com/espnet/espnet/pull/5197
- Update preprocessor.py by @sw005320 in https://github.com/espnet/espnet/pull/5200
- Minor fixes for ML-SUPERB by @ftshijt in https://github.com/espnet/espnet/pull/5202
- Quick fix for whisper specaug by @siddhu001 in https://github.com/espnet/espnet/pull/5206
- espnet-spk data preparation part by @Jungjee in https://github.com/espnet/espnet/pull/5184
- Fix M4singer multi-spk recipe by @ftshijt in https://github.com/espnet/espnet/pull/5201
- Update Dataset link for mlsuperb by @ftshijt in https://github.com/espnet/espnet/pull/5216
- Fix bug when scoretype is set to normal in mlsuperb by @ftshijt in https://github.com/espnet/espnet/pull/5217
- Add new functions and fix some bugs in SE by @Emrys365 in https://github.com/espnet/espnet/pull/5193
- Update import order by @ftshijt in https://github.com/espnet/espnet/pull/5229
- Closed CHiME-7 DASR adding evaluation inference + adding support to use diarization baseline "pre-computed" JSONs (new PR) by @popcornell in https://github.com/espnet/espnet/pull/5228
- Standalone Transducer v1.1 by @b-flo in https://github.com/espnet/espnet/pull/5140
- Small fixes for Transducer by @b-flo in https://github.com/espnet/espnet/pull/5247
- add asr2 task and librispeech recipe as an example. by @simpleoier in https://github.com/espnet/espnet/pull/5181
- fix norm compatibility in scale discriminator by @kan-bayashi in https://github.com/espnet/espnet/pull/5240
- CFSD, SECS metrics for TTS by @imdanboy in https://github.com/espnet/espnet/pull/5235
- Add new SE recipes: chime1/enh1, chime2/enh1, reverb/enh1, and wsj0_2mix/tse1 by @Emrys365 in https://github.com/espnet/espnet/pull/5246
- Fix bugs in mfa_format.py by @G-Thor in https://github.com/espnet/espnet/pull/5223
- New features for SVS by @ftshijt in https://github.com/espnet/espnet/pull/5245
- re-fix norm compatibility in scale discriminator by @kan-bayashi in https://github.com/espnet/espnet/pull/5249
- add conv1d subsampling 3 and egs2/librispeech/asr2 wavlmlarge21 kmeans (1000/2000) results by @simpleoier in https://github.com/espnet/espnet/pull/5252
- Revise the ESPnet-SE++ Joss paper to incorporate the feedback from the reviewer. by @neillu23 in https://github.com/espnet/espnet/pull/5212
- Fix a bug in score script for ML-SUPERB by @ftshijt in https://github.com/espnet/espnet/pull/5254
- Refactor prep_segments in SVS by @jerryuhoo in https://github.com/espnet/espnet/pull/5210
- A minor fix for numsplitsssl for training by @ftshijt in https://github.com/espnet/espnet/pull/5262
- [SVS] add singing tacotron by @A-Quarter-Mile in https://github.com/espnet/espnet/pull/5233
- Add script to use speaker averaged xvectors in TTS training by @G-Thor in https://github.com/espnet/espnet/pull/5244
- Fix filling of waveform_buffer with samples for streaming inference by @espnetUser in https://github.com/espnet/espnet/pull/5267
- Some name update for ml-superb by @ftshijt in https://github.com/espnet/espnet/pull/5276
- Add support for K2 pruned transducer loss by @b-flo in https://github.com/espnet/espnet/pull/5268
- Fix Transducer doc by @b-flo in https://github.com/espnet/espnet/pull/5306
- Update installation.md by @kamo-naoyuki in https://github.com/espnet/espnet/pull/5291
- Update install_nkf.sh by @sw005320 in https://github.com/espnet/espnet/pull/5300
- Fix Cython version to pass the installation of libraries with Cython by @kan-bayashi in https://github.com/espnet/espnet/pull/5310
- Update README.md by @sw005320 in https://github.com/espnet/espnet/pull/5315
- Update setup.py by @sw005320 in https://github.com/espnet/espnet/pull/5316
- Migrate recipe for nit_song070 from Muskit by @wwwbxy123 in https://github.com/espnet/espnet/pull/5251
- [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci in https://github.com/espnet/espnet/pull/5294
- A few updates for asr2 and hubert by @simpleoier in https://github.com/espnet/espnet/pull/5285
- Add decodeoptions and hypcleaner in evaluatewhisperinference by @pyf98 in https://github.com/espnet/espnet/pull/5272
- update pyworld version by @kan-bayashi in https://github.com/espnet/espnet/pull/5319
- fix a data preparation issue for librimix recipe. by @LiChenda in https://github.com/espnet/espnet/pull/5322
- Update README.md in egs2/librimix/tse1 and egs2/wsj0_2mix/tse1 by @Emrys365 in https://github.com/espnet/espnet/pull/5289
- fix the s3prl frontend gradient backprop bug, ensuring featuregradmult=1.0 by @simpleoier in https://github.com/espnet/espnet/pull/5297
- ESPNet-SPK part 2 - training by @Jungjee in https://github.com/espnet/espnet/pull/5258
- remove some tests in espnet1 integration test by @sw005320 in https://github.com/espnet/espnet/pull/5328
- Fix random segments by @iamanigeeit in https://github.com/espnet/espnet/pull/5274
- Skip CI for draft PR by @ftshijt in https://github.com/espnet/espnet/pull/5333
- Update cancel.yml by @kan-bayashi in https://github.com/espnet/espnet/pull/5334
- Update several SE recipes and bash scripts by @Emrys365 in https://github.com/espnet/espnet/pull/5327
- Add PULLREQUESTTEMPLATE.md by @kan-bayashi in https://github.com/espnet/espnet/pull/5340
- ESPnet-Spk part 3 - inference every epoch using EER by @Jungjee in https://github.com/espnet/espnet/pull/5314
- Minimize espnet2 integration test by @kan-bayashi in https://github.com/espnet/espnet/pull/5324
- PR Labels for CI control by @Fhrozen in https://github.com/espnet/espnet/pull/5320
- Split ci into several jobs by @kan-bayashi in https://github.com/espnet/espnet/pull/5343
- Update CONTRIBUTING.md by @sw005320 in https://github.com/espnet/espnet/pull/5335
- Update Scoring for Speech Summarization from NLG-Eval to Huggingface Evaluate by @roshansh-cmu in https://github.com/espnet/espnet/pull/5341
- Fix documentation skip CI by @Fhrozen in https://github.com/espnet/espnet/pull/5351
- Update the usage by @sw005320 in https://github.com/espnet/espnet/pull/5349
- Docker Update by @Fhrozen in https://github.com/espnet/espnet/pull/5321
- Update installation.md by @sw005320 in https://github.com/espnet/espnet/pull/5348
- Fix doc condition by @kan-bayashi in https://github.com/espnet/espnet/pull/5355
- Update issue templates by @sw005320 in https://github.com/espnet/espnet/pull/5357
- Update Contribution.md by @Fhrozen in https://github.com/espnet/espnet/pull/5352
- Fix .mergify condition by @kan-bayashi in https://github.com/espnet/espnet/pull/5354
- Reduce ffmpeg installation time in ci by @kan-bayashi in https://github.com/espnet/espnet/pull/5356
- Update CI table by @kan-bayashi in https://github.com/espnet/espnet/pull/5359
- Clean workflow files by @kan-bayashi in https://github.com/espnet/espnet/pull/5360
- Couple of tweaks for asr2.sh for the HF hub upload by @akreal in https://github.com/espnet/espnet/pull/5362
- Update TEMPLATEHFReadme.md (fix bash typo) by @akreal in https://github.com/espnet/espnet/pull/5361
- Add discrete-token ASR for LibriSpeech 100h by @akreal in https://github.com/espnet/espnet/pull/5350
- Whisper fine-tuning recipes for CHiME-4 and WSJ by @YoshikiMas in https://github.com/espnet/espnet/pull/5342
- Fix bug in ngram training in slu.sh by @siddhu001 in https://github.com/espnet/espnet/pull/5364
- Add musdb18 recipe for music source separation by @Emrys365 in https://github.com/espnet/espnet/pull/5338
- Bugfix: JETS CTCLoss by @imdanboy in https://github.com/espnet/espnet/pull/5288
- Check the value of n_shift==upsample_factor in GAN_TTS by @imdanboy in https://github.com/espnet/espnet/pull/5299
- MFA format fix by @iamanigeeit in https://github.com/espnet/espnet/pull/5275
- add --num-workers 0 option to enable coverage to track data loader by @kan-bayashi in https://github.com/espnet/espnet/pull/5368
- ESPnet-SPK: fix data augment by @Jungjee in https://github.com/espnet/espnet/pull/5347
- A few minor fixes for SSL by @ftshijt in https://github.com/espnet/espnet/pull/5265
- remove unused file + small typo/style by @b-flo in https://github.com/espnet/espnet/pull/5346
- ESPnet-SPK: EER validation efficiency improvement by @Jungjee in https://github.com/espnet/espnet/pull/5358
- New Architectures for ST by @brianyan918 in https://github.com/espnet/espnet/pull/4815
- [SVS] Add CI test by @A-Quarter-Mile in https://github.com/espnet/espnet/pull/5269
- Add causal LM to Hugging Face Transformers Decoder by @akreal in https://github.com/espnet/espnet/pull/5313
- Make make_pad_mask onnx convertible by @Masao-Someki in https://github.com/espnet/espnet/pull/5326
- fix numerical error of parallel wavegan compatibility test in CI by @kan-bayashi in https://github.com/espnet/espnet/pull/5380
- Add LibriTTS-R recipe by @ShigekiKarita in https://github.com/espnet/espnet/pull/5379
- minor fix: correct wrong comments by @imdanboy in https://github.com/espnet/espnet/pull/5378
- Add quotation marks to install_datasets.sh by @qmeeus in https://github.com/espnet/espnet/pull/5387
New Contributors
- @khassanoff made their first contribution in https://github.com/espnet/espnet/pull/5171
- @leepeiying made their first contribution in https://github.com/espnet/espnet/pull/5182
- @Jungjee made their first contribution in https://github.com/espnet/espnet/pull/5184
- @wwwbxy123 made their first contribution in https://github.com/espnet/espnet/pull/5251
Full Changelog: https://github.com/espnet/espnet/compare/v.202304...v.202308
Scientific Software - Peer-reviewed
- Python
Published by kan-bayashi over 2 years ago
Software Design and User Interface of ESPnet-SE++ - ESPnet version 202304
What's Changed
- Update collect stats stage so that less memory cost in Utt_mvn by @simpleoier in https://github.com/espnet/espnet/pull/4888
- Apply the latest black by @kamo-naoyuki in https://github.com/espnet/espnet/pull/4907
- Add pytorch=1.13.1 to CI configuration by @kamo-naoyuki in https://github.com/espnet/espnet/pull/4906
- How2 fix README, incorrect url by @roshansh-cmu in https://github.com/espnet/espnet/pull/4902
- standardized inference and number of iterations for mSuperb single lang track by @DanBerrebbi in https://github.com/espnet/espnet/pull/4905
- Fix typo in lrs/README.md by @eltociear in https://github.com/espnet/espnet/pull/4911
- MSUPERB setting update by @ftshijt in https://github.com/espnet/espnet/pull/4913
- Update test_import.yaml to install numba by @kamo-naoyuki in https://github.com/espnet/espnet/pull/4918
- update pyopenjtalk version to 0.3.0 by @kan-bayashi in https://github.com/espnet/espnet/pull/4912
- CHiME-7 Task1 recipe by @popcornell in https://github.com/espnet/espnet/pull/4894
- Update CHiME-7 Task 1 README.md by @popcornell in https://github.com/espnet/espnet/pull/4920
- Use native CPU version of STFT on newer pytorch versions, fix librosa window size < fft by @bmilde in https://github.com/espnet/espnet/pull/4922
- Add few shot subset for mSuperb multilingual setting by @guapaQAQ in https://github.com/espnet/espnet/pull/4923
- Fix existing bugs in the TSE task by @Emrys365 in https://github.com/espnet/espnet/pull/4915
- IAM OCR recipe updates by @kenzheng99 in https://github.com/espnet/espnet/pull/4927
- Fixing some issues with chime7-task1 baseline by @popcornell in https://github.com/espnet/espnet/pull/4925
- set default none decoder for ASR by @ftshijt in https://github.com/espnet/espnet/pull/4917
- Update inference and training setting for mSuperb multilingual model by @guapaQAQ in https://github.com/espnet/espnet/pull/4932
- Add E-Branchformer Transducer results by @pyf98 in https://github.com/espnet/espnet/pull/4933
- add tf-gridnet by @zqwang7 in https://github.com/espnet/espnet/pull/4864
- Fixes + Channel Selection for CHiME-7 Task by @popcornell in https://github.com/espnet/espnet/pull/4934
- fix extracted feature dummy generation by @roshansh-cmu in https://github.com/espnet/espnet/pull/4926
- Fix device mismatch error in GPU decoding with PyTorch 1.13 by @pyf98 in https://github.com/espnet/espnet/pull/4941
- CHiME-7 DASR MD5 checksum fix for mixer6/train_call by @popcornell in https://github.com/espnet/espnet/pull/4942
- Update showasrresult.sh by @kamo-naoyuki in https://github.com/espnet/espnet/pull/4943
- CHiME-7 DASR correct development results by @popcornell in https://github.com/espnet/espnet/pull/4946
- Fix 'floordiv is deprecated' warnings by @fujimotos in https://github.com/espnet/espnet/pull/4945
- Added WSLII installation instruction by @sw005320 in https://github.com/espnet/espnet/pull/4949
- Update Muskits by @A-Quarter-Mile in https://github.com/espnet/espnet/pull/4931
- Set a longer time execution threshold for related failed time-outs CI by @ftshijt in https://github.com/espnet/espnet/pull/4962
- Modify data prep for mSUPERB multilingual by @guapaQAQ in https://github.com/espnet/espnet/pull/4965
- Add E-Branchformer results in some recipes by @pyf98 in https://github.com/espnet/espnet/pull/4958
- Add 'six' as a required Python module by @fujimotos in https://github.com/espnet/espnet/pull/4964
- add msuperb linguistic analysis by @hhhaaahhhaa in https://github.com/espnet/espnet/pull/4938
- Fix a 'refchannel'-related issue in espnet2/bin/enhinference.py by @Emrys365 in https://github.com/espnet/espnet/pull/4972
- Add E-Branchformer results in slurp_entity by @pyf98 in https://github.com/espnet/espnet/pull/4971
- Add Conformer and E-Branchformer results in fisherspanishcallhome ASR by @pyf98 in https://github.com/espnet/espnet/pull/4976
- [SVS] Add Joint-training by @A-Quarter-Mile in https://github.com/espnet/espnet/pull/4977
- Update the chunk iterator for the TSE task by @Emrys365 in https://github.com/espnet/espnet/pull/4929
- update msuperb LID scoring script by @hhhaaahhhaa in https://github.com/espnet/espnet/pull/4979
- add multilingual+lid lid score generation by @hhhaaahhhaa in https://github.com/espnet/espnet/pull/4982
- Add python=3.10 to CI by @kamo-naoyuki in https://github.com/espnet/espnet/pull/4627
- LID score v2 by @hhhaaahhhaa in https://github.com/espnet/espnet/pull/4983
- Fix ci by @kamo-naoyuki in https://github.com/espnet/espnet/pull/4985
- Change to use Ubuntu-latest instead of Ubuntu-18.04 in CI by @kamo-naoyuki in https://github.com/espnet/espnet/pull/4986
- Remove six by @kamo-naoyuki in https://github.com/espnet/espnet/pull/4988
- Modify formatwavscp.py to support PCM of uint8, int32, float32, float64, etc. by @kamo-naoyuki in https://github.com/espnet/espnet/pull/4997
- Fix Whisper tokenizer CI error by @slSeanWU in https://github.com/espnet/espnet/pull/5004
- fix s3prl upstream attribute bug by @jwrh in https://github.com/espnet/espnet/pull/5003
- [Recipe] Add iwslt22 low resource speech translation task for egs2 by @freddy5566 in https://github.com/espnet/espnet/pull/4994
- Fix typeguard version by @silvanocerza in https://github.com/espnet/espnet/pull/5009
- Add .pre-commit-config.yaml by @kamo-naoyuki in https://github.com/espnet/espnet/pull/5011
- Copy Kaldi utils/steps/sid and add a new github action to check the consistency by @kamo-naoyuki in https://github.com/espnet/espnet/pull/4998
- Modify .pre-commit-config.yaml by @kamo-naoyuki in https://github.com/espnet/espnet/pull/5012
- Modify .pre-commit-config.yaml by @kamo-naoyuki in https://github.com/espnet/espnet/pull/5014
- Modify .pre-commit-config.yaml by @kamo-naoyuki in https://github.com/espnet/espnet/pull/5015
- [Tuning] iwslt22 low-resource ST decode configuration tuning by @freddy5566 in https://github.com/espnet/espnet/pull/5019
- Modify asr.sh by @kamo-naoyuki in https://github.com/espnet/espnet/pull/5020
- [SVS] Improve visinger by @jerryuhoo in https://github.com/espnet/espnet/pull/5022
- Use scripts/utils/printargs.sh instead of pyscripts/utils/printargs.py by @kamo-naoyuki in https://github.com/espnet/espnet/pull/5025
- Add docstring in extra_path.sh by @kamo-naoyuki in https://github.com/espnet/espnet/pull/5028
- Update installation.md by @kamo-naoyuki in https://github.com/espnet/espnet/pull/5029
- Update README.md by @kamo-naoyuki in https://github.com/espnet/espnet/pull/5030
- Update README.md by @kamo-naoyuki in https://github.com/espnet/espnet/pull/5031
- Change bc to python by @kamo-naoyuki in https://github.com/espnet/espnet/pull/5032
- Update tools/Makefile and path.sh by @kamo-naoyuki in https://github.com/espnet/espnet/pull/5027
- Fix for formatwavscp.py by @kamo-naoyuki in https://github.com/espnet/espnet/pull/5038
- Add execute permission to installiceg2p.sh by @kamo-naoyuki in https://github.com/espnet/espnet/pull/5040
- Bug fix of #5025 by @kamo-naoyuki in https://github.com/espnet/espnet/pull/5039
- [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci in https://github.com/espnet/espnet/pull/5041
- Update README.md by @kamo-naoyuki in https://github.com/espnet/espnet/pull/5042
- Update README.md by @kamo-naoyuki in https://github.com/espnet/espnet/pull/5043
- Update README.md by @kamo-naoyuki in https://github.com/espnet/espnet/pull/5045
- Fix in gentask1data.sh from CHiME7 by @boeddeker in https://github.com/espnet/espnet/pull/4953
- Update README.md by @eml914 in https://github.com/espnet/espnet/pull/5044
- Add installers/install_ffmpeg.sh by @kamo-naoyuki in https://github.com/espnet/espnet/pull/5046
- Fix broken links reported by #5048 by @ShigekiKarita in https://github.com/espnet/espnet/pull/5050
- fix: resolve upgrade issues with praatio 6.0; lock praatio version by @timmahrt in https://github.com/espnet/espnet/pull/4978
- Add miniconda in gitignore by @pyf98 in https://github.com/espnet/espnet/pull/5052
- CHiME-7 DASR fixes from participants feedback by @popcornell in https://github.com/espnet/espnet/pull/4999
- Fix the condition for maxlen warning in beam search by @pyf98 in https://github.com/espnet/espnet/pull/5055
- Fixed SQLalchemy version for MFA by @Fhrozen in https://github.com/espnet/espnet/pull/5059
- Support Multi-Blank Transducer in Espnet2 by @jctian98 in https://github.com/espnet/espnet/pull/4876
- Fix chime7 DASR task1 run.sh by @kamo-naoyuki in https://github.com/espnet/espnet/pull/5060
- CHiME-7 DASR recipe, fix display bug for scenario-wide DER and JER by @popcornell in https://github.com/espnet/espnet/pull/5061
- Add testformatwavscpsh.bats by @kamo-naoyuki in https://github.com/espnet/espnet/pull/5062
- Update documentation by @kamo-naoyuki in https://github.com/espnet/espnet/pull/5063
- Support SOT training on LibriMix data. by @pengchengguo in https://github.com/espnet/espnet/pull/4861
- Update check_install.py by @kamo-naoyuki in https://github.com/espnet/espnet/pull/5066
- Tedlium3 recipe by @Some-random in https://github.com/espnet/espnet/pull/5068
- Bug Fix: pretrained s3prl-frontend based models loaded with parameters key mismatch error by @simpleoier in https://github.com/espnet/espnet/pull/5074
- Mechanism for multi channels input using multi columns wav.scp by @kamo-naoyuki in https://github.com/espnet/espnet/pull/5075
- Clean ML-SUPERB by @ftshijt in https://github.com/espnet/espnet/pull/5067
- CHiME-7 DASR: first diarization system based on Pyannote. by @popcornell in https://github.com/espnet/espnet/pull/5054
- Chime7-task1 diarization (updated results) by @popcornell in https://github.com/espnet/espnet/pull/5088
- Add InterCTC to E-Branchformer encoder, and the ability to save InterCTC inference output to files by @tjysdsg in https://github.com/espnet/espnet/pull/5084
- [SVS] Bug fix: sample rate by @A-Quarter-Mile in https://github.com/espnet/espnet/pull/5094
- [SVS] Extend SingingGenerate by @A-Quarter-Mile in https://github.com/espnet/espnet/pull/5100
- [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci in https://github.com/espnet/espnet/pull/5080
- Add kaldi steps/libs by @kamo-naoyuki in https://github.com/espnet/espnet/pull/5106
- Fix sentencepiece version to v0.1.97 by @kamo-naoyuki in https://github.com/espnet/espnet/pull/5107
- Drop PyTorch<=1.9 by @kamo-naoyuki in https://github.com/espnet/espnet/pull/5111
- Update installers/install_kenlm.sh by @kamo-naoyuki in https://github.com/espnet/espnet/pull/5110
- Merge */{scripts,pyscripts} into asr1/{scripts,pyscripts} by @kamo-naoyuki in https://github.com/espnet/espnet/pull/5109
- Update ReazonSpeech training recipe for v1.1.0 by @fujimotos in https://github.com/espnet/espnet/pull/5114
- Fix typo in espnet2formatwav_scp.md by @boeddeker in https://github.com/espnet/espnet/pull/5116
- Dtype for Speechbrain by @Fhrozen in https://github.com/espnet/espnet/pull/5112
- Add test of soundfile for Makefile by @kamo-naoyuki in https://github.com/espnet/espnet/pull/5119
- Add lm_inference for conditional text generation by @pyf98 in https://github.com/espnet/espnet/pull/5122
- CHiME-7 diarization (updated README.md) by @popcornell in https://github.com/espnet/espnet/pull/5102
- [WIP] Update Docker by @Fhrozen in https://github.com/espnet/espnet/pull/5128
- Fix several bugs and improve function design in SE by @Emrys365 in https://github.com/espnet/espnet/pull/5103
- [SVS] Update XiaoiceSing by @A-Quarter-Mile in https://github.com/espnet/espnet/pull/5124
- Add missing filterscps scripts and note about kaldi for diarization example of minilibrispeech by @toto6038 in https://github.com/espnet/espnet/pull/5139
- Bump up the debian version to 11 by @kamo-naoyuki in https://github.com/espnet/espnet/pull/5144
- Bug fixing and improvement in SE functions by @Emrys365 in https://github.com/espnet/espnet/pull/5143
- Add data augmentation to ReazonSpeech recipe by @fujimotos in https://github.com/espnet/espnet/pull/5127
- Update error calculator for transducer by @aky15 in https://github.com/espnet/espnet/pull/5097
- Add streaming speech enhancement inference. by @LiChenda in https://github.com/espnet/espnet/pull/5049
- Update README.md about debian by @sw005320 in https://github.com/espnet/espnet/pull/5146
- Fix issues in split scps by @pyf98 in https://github.com/espnet/espnet/pull/5138
- fix 5148 by @kamo-naoyuki in https://github.com/espnet/espnet/pull/5149
- fix formatwavscp.py by @kamo-naoyuki in https://github.com/espnet/espnet/pull/5150
- Add more stats to the training log by @Emrys365 in https://github.com/espnet/espnet/pull/5147
- update version to 202304 by @kan-bayashi in https://github.com/espnet/espnet/pull/5151
New Contributors
- @bmilde made their first contribution in https://github.com/espnet/espnet/pull/4922
- @guapaQAQ made their first contribution in https://github.com/espnet/espnet/pull/4923
- @zqwang7 made their first contribution in https://github.com/espnet/espnet/pull/4864
- @hhhaaahhhaa made their first contribution in https://github.com/espnet/espnet/pull/4938
- @jwrh made their first contribution in https://github.com/espnet/espnet/pull/5003
- @freddy5566 made their first contribution in https://github.com/espnet/espnet/pull/4994
- @silvanocerza made their first contribution in https://github.com/espnet/espnet/pull/5009
- @pre-commit-ci made their first contribution in https://github.com/espnet/espnet/pull/5041
- @boeddeker made their first contribution in https://github.com/espnet/espnet/pull/4953
- @timmahrt made their first contribution in https://github.com/espnet/espnet/pull/4978
- @Some-random made their first contribution in https://github.com/espnet/espnet/pull/5068
- @toto6038 made their first contribution in https://github.com/espnet/espnet/pull/5139
Full Changelog: https://github.com/espnet/espnet/compare/v.202301...v.202304
Published by kan-bayashi over 2 years ago
Software Design and User Interface of ESPnet-SE++ - ESPnet version 202301
What's Changed
- Initialize VISinger branch by @ftshijt in https://github.com/espnet/espnet/pull/4683
- Update VISInger branch by @ftshijt in https://github.com/espnet/espnet/pull/4705
- Update UASR branch with latest ESPnet functions by @ftshijt in https://github.com/espnet/espnet/pull/4752
- Update uasr by @ftshijt in https://github.com/espnet/espnet/pull/4770
- Shell scripts for UASR processing by @ftshijt in https://github.com/espnet/espnet/pull/4769
- Uasr python scripts by @DongjiGao in https://github.com/espnet/espnet/pull/4791
- Update visinger by @ftshijt in https://github.com/espnet/espnet/pull/4818
- Update testcustomtransducer.py by @sw005320 in https://github.com/espnet/espnet/pull/4826
- Update asr.sh by @sw005320 in https://github.com/espnet/espnet/pull/4827
- Fixed pad mode for librosa.stft by @Masao-Someki in https://github.com/espnet/espnet/pull/4832
- Add E-Branchformer models in some recipes by @pyf98 in https://github.com/espnet/espnet/pull/4833
- Fix data prep in GigaSpeech by @pyf98 in https://github.com/espnet/espnet/pull/4836
- time sync decoding for asr by @brianyan918 in https://github.com/espnet/espnet/pull/4792
- Remove duplicated VOXFORGE in db.sh (line81 and line157) by @pyf98 in https://github.com/espnet/espnet/pull/4840
- Fix argument parsing for nonlinguisticsymbols in asr.sh by @pyf98 in https://github.com/espnet/espnet/pull/4841
- Add a warning statement when the hypo length equals to the max out length. by @pengchengguo in https://github.com/espnet/espnet/pull/4843
- Add target speaker extraction (TSE) functions by @Emrys365 in https://github.com/espnet/espnet/pull/4823
- Multilingual superb by @ftshijt in https://github.com/espnet/espnet/pull/4824
- VISinger by @jerryuhoo in https://github.com/espnet/espnet/pull/4689
- Update VISInger to latest by @ftshijt in https://github.com/espnet/espnet/pull/4849
- VISinger for singing voice synthesis by @ftshijt in https://github.com/espnet/espnet/pull/4848
- Reduce word counts for ESPnet-SE++ Joss paper by @neillu23 in https://github.com/espnet/espnet/pull/4844
- Add E-Branchformer configs and models in ASR recipes by @pyf98 in https://github.com/espnet/espnet/pull/4837
- Address Muskits updates on README by @ftshijt in https://github.com/espnet/espnet/pull/4850
- Minor fix for MSUPERB recipe by @ftshijt in https://github.com/espnet/espnet/pull/4851
- Update for the latest changes in the draft (minor changes) by @neillu23 in https://github.com/espnet/espnet/pull/4852
- Add E-Branchformer results on Librispeech by @kkim-asapp in https://github.com/espnet/espnet/pull/4856
- Update hubert implementation. by @simpleoier in https://github.com/espnet/espnet/pull/4747
- VISinger unit test by @jerryuhoo in https://github.com/espnet/espnet/pull/4855
- Minor fix to commonvoice espnet1 by @ftshijt in https://github.com/espnet/espnet/pull/4862
- [WIP] Add S4 decoder in ESPnet2 by @m-koichi in https://github.com/espnet/espnet/pull/4845
- Update hubert feature and acknowledge information in related Readmes. by @simpleoier in https://github.com/espnet/espnet/pull/4863
- Generating MFA alignments by @Fhrozen in https://github.com/espnet/espnet/pull/4803
- [WIP] EURO uasr scripts by @DongjiGao in https://github.com/espnet/espnet/pull/4846
- Update README.md related to ASR architecture by @m-koichi in https://github.com/espnet/espnet/pull/4865
- Minor fix to librimix diar recipe by @ftshijt in https://github.com/espnet/espnet/pull/4867
- Add Full Whisper Model for Finetuning by @slSeanWU in https://github.com/espnet/espnet/pull/4793
- Add torchaudio version check for HuBERT pretraining by @simpleoier in https://github.com/espnet/espnet/pull/4872
- add k2 decoder related scripts for EURO by @DongjiGao in https://github.com/espnet/espnet/pull/4868
- EURO: small fix (temporarily remove support for nbest_rescoring) by @DongjiGao in https://github.com/espnet/espnet/pull/4875
- Add description for Whisper ASR in homepage readme by @slSeanWU in https://github.com/espnet/espnet/pull/4877
- Update README.md by @eltociear in https://github.com/espnet/espnet/pull/4879
- add explanations to text tokenizing related scripts and remove unused script by @DongjiGao in https://github.com/espnet/espnet/pull/4880
- update information about source and our modification for k2 related scripts by @DongjiGao in https://github.com/espnet/espnet/pull/4881
- AphasiaBank ASR recipe by @tjysdsg in https://github.com/espnet/espnet/pull/4860
- Multilingual SUPERB update by @ftshijt in https://github.com/espnet/espnet/pull/4878
- ESPnet Unsupervised ASR (EURO project) by @ftshijt in https://github.com/espnet/espnet/pull/4774
- Support ProDiff in TTS by @Fhrozen in https://github.com/espnet/espnet/pull/4808
- Add E-Branchformer for GigaSpeech by @pyf98 in https://github.com/espnet/espnet/pull/4882
- FLEURS - Auxiliary CTC conditioning tasks by @wanchichen in https://github.com/espnet/espnet/pull/4756
- Add python 3.8 requirement for Whisper & update tests by @slSeanWU in https://github.com/espnet/espnet/pull/4891
- Update some ASR results in the main readme file by @pyf98 in https://github.com/espnet/espnet/pull/4883
- Add Conv2dSubsampling1 module and test it in AphasiaBank ASR recipe by @tjysdsg in https://github.com/espnet/espnet/pull/4892
- Support x-vector extractor based on RawNet by @Takaaki-Saeki in https://github.com/espnet/espnet/pull/4884
- single language track setups by @DanBerrebbi in https://github.com/espnet/espnet/pull/4895
- fixing bug deu1 by @DanBerrebbi in https://github.com/espnet/espnet/pull/4900
- Fix dataprep issues based on updated data release via Google form by @roshansh-cmu in https://github.com/espnet/espnet/pull/4899
- Add a new EGS2 recipe 'reazonspeech' by @fujimotos in https://github.com/espnet/espnet/pull/4885
- Update version to 202301 by @kan-bayashi in https://github.com/espnet/espnet/pull/4901
New Contributors
- @DongjiGao made their first contribution in https://github.com/espnet/espnet/pull/4791
- @jerryuhoo made their first contribution in https://github.com/espnet/espnet/pull/4689
- @m-koichi made their first contribution in https://github.com/espnet/espnet/pull/4845
- @fujimotos made their first contribution in https://github.com/espnet/espnet/pull/4885
Full Changelog: https://github.com/espnet/espnet/compare/v.202211...v.202301
Published by kan-bayashi almost 3 years ago
Software Design and User Interface of ESPnet-SE++ - ESPnet version 202211
What's Changed
- Update muskits update by @ftshijt in https://github.com/espnet/espnet/pull/4616
- Muskit installation by @A-Quarter-Mile in https://github.com/espnet/espnet/pull/4617
- Sync Muskits branch with Master by @ftshijt in https://github.com/espnet/espnet/pull/4640
- Updates on Muskit Migration by @A-Quarter-Mile in https://github.com/espnet/espnet/pull/4631
- Update Muskits branch by @ftshijt in https://github.com/espnet/espnet/pull/4662
- Add stage 5 & stage 6 by @A-Quarter-Mile in https://github.com/espnet/espnet/pull/4649
- Muskit: rename & reorganize features by @A-Quarter-Mile in https://github.com/espnet/espnet/pull/4668
- Update Muskits branch by @ftshijt in https://github.com/espnet/espnet/pull/4671
- Muskits CI fixing by @ftshijt in https://github.com/espnet/espnet/pull/4672
- Muskits CI fix by @ftshijt in https://github.com/espnet/espnet/pull/4673
- Muskits - apply isort by @ftshijt in https://github.com/espnet/espnet/pull/4677
- Muskits CI fix by @ftshijt in https://github.com/espnet/espnet/pull/4678
- Muskit: Add tokenizer by @A-Quarter-Mile in https://github.com/espnet/espnet/pull/4676
- Muskits - various fix for CI test by @ftshijt in https://github.com/espnet/espnet/pull/4679
- Muskit: add recipe ofuton by @A-Quarter-Mile in https://github.com/espnet/espnet/pull/4681
- Muskits (CI fix) by @ftshijt in https://github.com/espnet/espnet/pull/4682
- Fix CI issue in muskits by @ftshijt in https://github.com/espnet/espnet/pull/4687
- Add dns_icassp22 Speech Enhancement Recipe by @slSeanWU in https://github.com/espnet/espnet/pull/4657
- Singing Voice Synthesis Task for ESPnet by @ftshijt in https://github.com/espnet/espnet/pull/4670
- Documentation of Tutorial and Muskits by @ftshijt in https://github.com/espnet/espnet/pull/4692
- Add tests on MacOS and Windows (only installation) by @kamo-naoyuki in https://github.com/espnet/espnet/pull/4669
- Add missing entries in readme by @ftshijt in https://github.com/espnet/espnet/pull/4699
- Support ST without texts in source language by @sophia1488 in https://github.com/espnet/espnet/pull/4688
- Update ConvInput for Transducer by @b-flo in https://github.com/espnet/espnet/pull/4720
- Small changes for standalone Transducer by @b-flo in https://github.com/espnet/espnet/pull/4722
- Fix input block tutorial documentation for Transducer by @b-flo in https://github.com/espnet/espnet/pull/4724
- Fix HF Pytest Errors by @siddhu001 in https://github.com/espnet/espnet/pull/4737
- Update to puebla-nahuatl recipe (some minor fixes) by @ftshijt in https://github.com/espnet/espnet/pull/4713
- Add espnet2 TTS recipe on M-AILABS by @Takaaki-Saeki in https://github.com/espnet/espnet/pull/4701
- Update outdated enh config files by @Emrys365 in https://github.com/espnet/espnet/pull/4719
- add srcsos & srceos for mt task to address the index out of range w… by @simpleoier in https://github.com/espnet/espnet/pull/4736
- Add g2pkexplicitspace tokenizer by @jonghwanhyeon in https://github.com/espnet/espnet/pull/4718
- Fix JETS inference with GST (#4743) by @kan-bayashi in https://github.com/espnet/espnet/pull/4744
- Update on Muskit by @A-Quarter-Mile in https://github.com/espnet/espnet/pull/4700
- add fleurs conformer+sc-ctc results by @wanchichen in https://github.com/espnet/espnet/pull/4746
- Add recipe for OCR task on IAM handwriting dataset by @kenzheng99 in https://github.com/espnet/espnet/pull/4707
- Add Talromur2 recipe by @G-Thor in https://github.com/espnet/espnet/pull/4680
- Add multi-channel enh_asr for CHiME-4 by @YoshikiMas in https://github.com/espnet/espnet/pull/4706
- chunk_mask error by @aky15 in https://github.com/espnet/espnet/pull/4751
- fix wav2vec2 encoder mask bug by @simpleoier in https://github.com/espnet/espnet/pull/4772
- Add Hugging Face Transformers Decoder, Tokenizer and their example on SLURP by @akreal in https://github.com/espnet/espnet/pull/4099
- [Recipe PR] MELD: Multimodal EmotionLines Dataset by @realzza in https://github.com/espnet/espnet/pull/4771
- MultiIRIS follow up by @YoshikiMas in https://github.com/espnet/espnet/pull/4765
- Add CATSLU results for XLS-R with mBART-50 by @akreal in https://github.com/espnet/espnet/pull/4782
- Add MEDIA and PortMEDIA results for XLS-R with mBART-50 by @akreal in https://github.com/espnet/espnet/pull/4794
- Add SLUE-VoxPopuli results for WavLM with mBART-50 by @akreal in https://github.com/espnet/espnet/pull/4777
- Follow up for SLURP and CATSLU by @akreal in https://github.com/espnet/espnet/pull/4796
- Update README in chime4/enh_asr1 by @YoshikiMas in https://github.com/espnet/espnet/pull/4795
- fix parsing token_list by @imdanboy in https://github.com/espnet/espnet/pull/4778
- Use torchaudio functions for beamforming related operations in torch 1.12.1+ by @Emrys365 in https://github.com/espnet/espnet/pull/4638
- PIT E2E multi-speaker ASR and librimix recipe by @simpleoier in https://github.com/espnet/espnet/pull/4753
- Fix an audio format issue in some enh recipes by @YoshikiMas in https://github.com/espnet/espnet/pull/4799
- Fixing How2-2000h Data preparation and Seq Length Assert for Longformer Encoder by @roshansh-cmu in https://github.com/espnet/espnet/pull/4805
- Adding MFA scripts for LJSpeech by @iamanigeeit in https://github.com/espnet/espnet/pull/4801
- fix typo in espnet2_tutorial.md by @eltociear in https://github.com/espnet/espnet/pull/4811
- [WIP] E-Branchformer Encoder in ESPnet2 by @kkim-asapp in https://github.com/espnet/espnet/pull/4812
- Muskit update by @A-Quarter-Mile in https://github.com/espnet/espnet/pull/4783
New Contributors
- @A-Quarter-Mile made their first contribution in https://github.com/espnet/espnet/pull/4617
- @sophia1488 made their first contribution in https://github.com/espnet/espnet/pull/4688
- @kenzheng99 made their first contribution in https://github.com/espnet/espnet/pull/4707
- @realzza made their first contribution in https://github.com/espnet/espnet/pull/4771
- @iamanigeeit made their first contribution in https://github.com/espnet/espnet/pull/4801
- @eltociear made their first contribution in https://github.com/espnet/espnet/pull/4811
- @kkim-asapp made their first contribution in https://github.com/espnet/espnet/pull/4812
Full Changelog: https://github.com/espnet/espnet/compare/v.202209...v.202211
Scientific Software - Peer-reviewed
- Python
Published by kan-bayashi about 3 years ago
Software Design and User Interface of ESPnet-SE++ - ESPnet version 202209
What's Changed
- Add dynamic mixing in the speech separation task. by @LiChenda in https://github.com/espnet/espnet/pull/4387
- Added test script and usage for calculate_rtf.py script to ESPnet2 tutorial page by @espnetUser in https://github.com/espnet/espnet/pull/4560
- Offline/Online (standalone) ESPnet2 Transducer by @b-flo in https://github.com/espnet/espnet/pull/4479
- Unfix matplotlib version by @kamo-naoyuki in https://github.com/espnet/espnet/pull/4576
- use torch.finfo for dtype other than float by @wenzhe-nrv in https://github.com/espnet/espnet/pull/4584
- Update recipe for slurp-entity by @ftshijt in https://github.com/espnet/espnet/pull/4585
- Egs2 aesrc by @brianyan918 in https://github.com/espnet/espnet/pull/4592
- update checks for bias in initialization by @LiChenda in https://github.com/espnet/espnet/pull/4574
- [WIP] Update to fit the recent update in s3prl. by @simpleoier in https://github.com/espnet/espnet/pull/4593
- Unfix numpy version by @kamo-naoyuki in https://github.com/espnet/espnet/pull/4598
- Update to fit the recent update in s3prl. by @simpleoier in https://github.com/espnet/espnet/pull/4600
- Add improved results on FLEURS dataset by @wanchichen in https://github.com/espnet/espnet/pull/4596
- Update mp4towav.sh by @jaehyun-ko in https://github.com/espnet/espnet/pull/4605
- Pass output_dir as str to wandb.init() by @jonghwanhyeon in https://github.com/espnet/espnet/pull/4607
- Support enh_s2t joint training on multi-speaker data by @Emrys365 in https://github.com/espnet/espnet/pull/4566
- Add ASR results for commonvoice zh_TW by @slSeanWU in https://github.com/espnet/espnet/pull/4612
- Fix both utt2sid and utt2lid when removing long/short data by @jonghwanhyeon in https://github.com/espnet/espnet/pull/4609
- recipe config update by @ftshijt in https://github.com/espnet/espnet/pull/4621
- Add pytorch=1.12.1 to CI configurations by @kamo-naoyuki in https://github.com/espnet/espnet/pull/4604
- New SLU task by @siddhu001 in https://github.com/espnet/espnet/pull/4569
- Joss paper: Software Design and User Interface of ESPnet-SE++: Speech Enhancement for Robust Speech Processing by @neillu23 in https://github.com/espnet/espnet/pull/4620
- Update conformer result of AMI corpus by @teinhonglo in https://github.com/espnet/espnet/pull/4629
- Offline/Online Branchformer Transducer by @b-flo in https://github.com/espnet/espnet/pull/4582
- Change to install numba using pip instead of conda by @kamo-naoyuki in https://github.com/espnet/espnet/pull/4637
- Add MixIT support. It is unsupervised only. Semi-supervised config is not available for now. by @simpleoier in https://github.com/espnet/espnet/pull/4619
- Add 2-pass SLU code for FSC Challenge by @siddhu001 in https://github.com/espnet/espnet/pull/4636
- CI fix and some other minor recipe fixes by @ftshijt in https://github.com/espnet/espnet/pull/4656
- Update the title of plots to be y-label vs x-label by @pyf98 in https://github.com/espnet/espnet/pull/4647
- Update VIVOS download link by @hieuthi in https://github.com/espnet/espnet/pull/4644
- Add ASR recipe of MAGICDATA mandarin read speech by @tjysdsg in https://github.com/espnet/espnet/pull/4635
- Amend to CI fix by @ftshijt in https://github.com/espnet/espnet/pull/4663
- qasr update by @massabaali7 in https://github.com/espnet/espnet/pull/4642
- Open_li110 for large-scale multilingual speech by @ftshijt in https://github.com/espnet/espnet/pull/4408
- Fix the path of calculate_rft.py by @sw005320 in https://github.com/espnet/espnet/pull/4660
- Fix importlib-metadata version by @kan-bayashi in https://github.com/espnet/espnet/pull/4686
- Cmu arctic tts pretrain finetune by @soumimaiti in https://github.com/espnet/espnet/pull/4456
- updated version to 202209 by @kan-bayashi in https://github.com/espnet/espnet/pull/4685
New Contributors
- @wenzhe-nrv made their first contribution in https://github.com/espnet/espnet/pull/4584
- @jaehyun-ko made their first contribution in https://github.com/espnet/espnet/pull/4605
- @jonghwanhyeon made their first contribution in https://github.com/espnet/espnet/pull/4607
- @slSeanWU made their first contribution in https://github.com/espnet/espnet/pull/4612
- @massabaali7 made their first contribution in https://github.com/espnet/espnet/pull/4642
- @soumimaiti made their first contribution in https://github.com/espnet/espnet/pull/4456
Full Changelog: https://github.com/espnet/espnet/compare/v.202207...v.202209
Published by kan-bayashi about 3 years ago
Software Design and User Interface of ESPnet-SE++ - ESPnet version 202207
New Features
- [New Features][ESPnet1][ASR] Add DDP support for v1 ASR training. #4430 by @lazykyama
- [New Features][ESPnet2] Support tensorboard graph #4418 by @kamo-naoyuki
- [New Features][ESPnet2][ASR] Branchformer Encoder in ESPnet2 #4400 by @pyf98
- [New Features][ESPnet2][Diarization][SE] enh_diar joint model #4339 by @YushiUeda
- [New Features][ESPnet2][ESPnet1] Calculate RTF and latency in espnet2 #4382 by @espnetUser
- [New Features][ESPnet2][ESPnet1][SE] Add EnhPreprocessor for Speech Enhancement #4321 by @Emrys365
- [New Features][ESPnet2][SE] Add DPTNet and WarmupStepLR scheduler #4449 by @Emrys365
- [New Features][ESPnet2][SE] Add support for calculating losses on noise and dereverberated signals #4476 by @Emrys365
Recipe
- [Recipe][ESPnet2] Aishell-2 GPU info #4501 by @jctian98
- [Recipe][ESPnet2] Fix librispeech default path to signify auto download #4517 by @karthik19967829
- [Recipe][ESPnet2] Recipe fix for PueblaNahuatl Recipe #4522 by @ftshijt
- [Recipe][ESPnet2][ASR][README] Add Aishell-2 ASR Recipe for Espnet2 #4451 by @jctian98
- [Recipe][ESPnet2][ASR][README] Add AmericasNLP 2022 baselines #4428 by @akreal
- [Recipe][ESPnet2][ESPnet1][ASR][Installation] FLEURS ASR Recipe for ESPnet2 #4455 by @wanchichen
- [Recipe][ESPnet2][ESPnet1][ASR][README] tedxspanishcorpus egs2 recipe #4523 by @jessicah25
- [Recipe][ESPnet2][ESPnet1][ASR][SE] Adding L3DAS22 Task1 model to ESPNet-SE #3994 by @popcornell
- [Recipe][ESPnet2][ESPnet1][ST] Must_C v1 and v2 in egs2 #4306 by @brianyan918
- [Recipe][ESPnet2][README] Dcase task1 Baseline #4317 by @siddhu001
- [Recipe][ESPnet2][README] Report Aishell-2 Transducer results #4489 by @jctian98
- [Recipe][ESPnet2][README] Update language codes in AmericasNLP 2022 baseline #4441 by @akreal
- [Recipe][ESPnet2][README] Vox populi baseline #4478 by @siddhu001
- [Recipe][ESPnet2][SE] L3DAS22 enhancement recipe #4269 by @neillu23
- [Recipe][ESPnet2][SE] Update notes in the recipes for DNS challenges #4433 by @YoshikiMas
- [Recipe][ESPnet2][SE][SLU][ST] LT-Spatialized and SLURP-Spatialized combined enhancement recipe #4268 by @neillu23
- [Recipe][ESPnet2][ST] Add moses check for ST recipes #4417 by @ftshijt
- [Recipe][ESPnet2][TTS] Add talromur recipe #4379 by @G-Thor
- [Recipe][ESPnet2][TTS] Fix for issue #4401 #4402 by @G-Thor
- [Recipe][ESPnet2][TTS] add pre-trained model jets in the recipe of ljspeech, kss #4406 by @imdanboy
Bugfix
- [Bugfix][ESPnet1] fix the corrupted pretrained model #4490 by @wentaoxandry
- [Bugfix][ESPnet1][ESPnet2] Fix an4 URL #4427 by @pyf98
- [Bugfix][ESPnet1][ESPnet2][RNNT] Fix mAES with big vocab size #4312 by @b-flo
- [Bugfix][ESPnet2] Adding __init__.py to espnet2/diar/layers and espnet2/diar/separator #4470 by @cycentum
- [Bugfix][ESPnet2] Fix tensorboard-graph creation for multi gpu mode #4431 by @kamo-naoyuki
- [Bugfix][ESPnet2] Update char_tokenizer.py #4499 by @xiabingquan
- [Bugfix][ESPnet2][ESPnet1][ASR][LM][MT][TTS] Fix Transducer LM fusion and add Logging for Transducer inference #4327 by @chintu619
- [Bugfix][ESPnet2][SE] Fix a bug in enh unit test #4435 by @Emrys365
Enhancement
- [Enhancement][ESPnet2] Optionize graph creation #4551 by @kan-bayashi
- [Enhancement][ESPnet2][Installation][TTS] Add icelandic g2p #4384 by @G-Thor
- [Enhancement][ESPnet2][SE] Add support of test-only criterions after each epoch #4381 by @Emrys365
- [Enhancement][ESPnet2][SSL] raise more useful error in espnet2/asr/frontend/s3prl.py if s3prl is not installed #4480 by @popcornell
- [Enhancement][ESPnet2][TTS] Add JETS AlignmentModule in calculate_all_attentions.py #4446 by @seastar105
Refactoring
- [Refactoring][ESPnet1] Refactoring 'is_prefix' function #4530 by @jhlee9010
- [Refactoring][ESPnet2][ASR] Zero_infinity option for ctc loss #4415 by @kamo-naoyuki
Others
- [CI][ESPnet1][ESPnet2][Installation] Remove the version restriction for numpy #4419 by @kamo-naoyuki
- [CI][ESPnet2] Changed to install espnet from wheel in the test_import CI test #4471 by @kamo-naoyuki
- [CI][Installation] Temporary fixed numpy version #4464 by @kamo-naoyuki
- [Documentation] Add notes on batch size and num of GPUs in ESPnet2 documentation #4436 by @pyf98
- [Documentation][ESPnet1] Update decoder.py #4322 by @sw005320
- [Documentation][ESPnet2] Add a note to follow the installation instructions #4477 by @akreal
Acknowledgements
Special thanks to @Emrys365, @G-Thor, @YoshikiMas, @YushiUeda, @akreal, @b-flo, @brianyan918, @chintu619, @cycentum, @espnetUser, @ftshijt, @imdanboy, @jctian98, @jessicah25, @jhlee9010, @kamo-naoyuki, @kan-bayashi, @karthik19967829, @lazykyama, @neillu23, @popcornell, @pyf98, @seastar105, @siddhu001, @sw005320, @wanchichen, @wentaoxandry, @xiabingquan.
Published by kan-bayashi over 3 years ago
Software Design and User Interface of ESPnet-SE++ - ESPnet version 202205
New Features
- [New Features][ESPnet1][ESPnet2][ASR] Add quantization in ESPnet2 for asr inference #4349 by @pyf98
- [New Features][ESPnet2][SE] Add svoice recipe for wsj0-2mix speech separation #4257 by @nateanl
- [New Features][ESPnet2][SE] Merge Deep Clustering and Deep Attractor Network to enh separator #4110 by @earthmanylf
- [New Features][ESPnet2][SE] Some improvements to current enh functions #4251 by @Emrys365
- [New Features][ESPnet2][SE][Installation] Import fast_bss_eval and update some time-domain losses for enh task #4256 by @LiChenda
- [New Features][ESPnet2][TTS] add e2e tts model: JETS #4364 by @imdanboy
Bugfix
- [Bugfix][ESPnet1] Fix minimum input length for Conv2dSubsampling2 in check_short_utt #4378 by @akreal
- [Bugfix][ESPnet1][ESPnet2] Minor fixes for the intermediate loss usage and Mask-CTC decoding #4374 by @YosukeHiguchi
- [Bugfix][ESPnet2] Fix #4396 #4398 by @kamo-naoyuki
- [Bugfix][ESPnet2] Fix a bug in utterance_mvn #4304 by @Emrys365
- [Bugfix][ESPnet2] Minor fix for Mask-CTC forward function #4347 by @YosukeHiguchi
- [Bugfix][ESPnet2] Wandb Minor Fix for Model Resume #4329 by @roshansh-cmu
- [Bugfix][ESPnet2] fix the enh_s2t_task argument in espnet2/bin/st_inference.py #4323 by @simpleoier
- [Bugfix][ESPnet2][MT][ST] fix bug in mt/st templates for having separate token lists #4149 by @brianyan918
- [Bugfix][ESPnet2][Recipe] Fix aishell3 data preparation script #4277 by @LanceaKing
- [Bugfix][ESPnet2][SE] Fix a bug in stats aggregation when PITSolver is used #4343 by @Emrys365
- [Bugfix][ESPnet2][SE] fix for enhancement model loading compatibility #4259 by @LiChenda
- [Bugfix][ESPnet2][ST] bug fixes in ST recipes #4341 by @chintu619
- [Bugfix][ESPnet2][TTS] Fix optional data names for TTS #4355 by @kan-bayashi
- [Bugfix][ESPnet2][TTS] fix a bug in Mandarin pypinyin_g2p_phone #4206 by @WeiGodHorse
- [Bugfix][ESPnet2][TTS] fix loss = NaN in VITS with mixed precision #4356 by @kan-bayashi
- [Bugfix][ESPnet2][streaming] Add unit test to streaming ASR inference #4352 by @espnetUser
- [Bugfix][Installation] fix s3prl install by using legacy version. Temporal solution. #4399 by @simpleoier
- [Bugfix][README] Fix typo #4338 by @ftshijt
Enhancement
- [Enhancement][ESPnet1][ESPnet2][ASR][SE][SLU][ST] enh_s2t joint model #4226 by @simpleoier
- [Enhancement][ESPnet2] Add progress bar to phonemization #4320 by @G-Thor
- [Enhancement][ESPnet2][MT] Update show_translation_result.sh to show all decoding results under the given exp directory #4330 by @pyf98
Recipe
- [Recipe][ESPnet1][ASR] Accented English Speech Recognition Challenge 2020 recipe (AESRC2020) #3898 by @brianyan918
- [Recipe][ESPnet1][ESPnet2][ASR][README][Recipe] Add MediaSpeech ASR recipe #4183 by @AshibaWu
- [Recipe][ESPnet2][ASR][README] recipe for Microsoft speech corpus for Indian Languages #4191 by @navya-yarrabelly
- [Recipe][ESPnet2][ASR][README] Accented French Openslr57 ASR recipe (ESPnet2) (part of Homework3 MNLP) #4280 by @DanBerrebbi
- [Recipe][ESPnet2][ASR][README] Add Mask-CTC results #4180 by @YosukeHiguchi
- [Recipe][ESPnet2][ASR][README] Add ml_openslr63 ASR recipe #4173 by @bharaniuk
- [Recipe][ESPnet2][ASR][README] Adding new recipe for Burmese (OpenSLR80) #4182 by @JainSameer06
- [Recipe][ESPnet2][ASR][README] add chime6 recipe #4332 by @simpleoier
- [Recipe][ESPnet2][ASR][SE][README] add egs2/chime4/enh_asr1 recipe and results #4316 by @simpleoier
- [Recipe][ESPnet2][README][RNNT] updated librispeech-asr with rnn-t results #4281 by @chintu619
- [Recipe][ESPnet2][README][SE] 2021 Clarity Challenge recipe #4210 by @popcornell
- [Recipe][ESPnet2][README][SE] Add AISHELL-4 ENH recipe #4249 by @Emrys365
- [Recipe][ESPnet2][README][SE] Add ConferencingSpeech 2021 recipe to egs2 #4192 by @Emrys365
- [Recipe][ESPnet2][README][SE] Add ICASSP2021 DNS Challenge 2 recipe #4253 by @YoshikiMas
- [Recipe][ESPnet2][README][SE] Add INTERSPEECH 2021 DNS Challenge 3 recipe #4238 by @YoshikiMas
- [Recipe][ESPnet2][README][SE] Add results of ICASSP2021 DNS Challenge 2 recipe #4309 by @YoshikiMas
- [Recipe][ESPnet2][README][SE] Rename egs2/clarity21/enh_2021 to egs2/clarity21/enh1 #4328 by @Emrys365
- [Recipe][ESPnet2][README][SE] add convtasnet recipe for dns_ins20 #4314 by @muqiaoy
- [Recipe][ESPnet2][README][SLU] Harpervalley recipe #4315 by @YushiUeda
- [Recipe][ESPnet2][README][SLU] SLUE Voxpopuli base recipe #4262 by @siddhu001
- [Recipe][ESPnet2][README][ST] CoVOST2 recipes #4300 by @ftshijt
- [Recipe][ESPnet2][SLU][README] Update SLU results for ICASSP #4283 by @siddhu001
Others
- [CI][Docker] Github Action Trigger Docker Build #4295 by @Fhrozen
- [CI][Docker] Github Action for Docker build #4219 by @Fhrozen
- [CI][ESPnet1][ESPnet2][Installation][README] Add isort checking to the CI tests #4372 by @kamo-naoyuki
- [CI][ESPnet1][ESPnet2][Installation][README][mergify] Add pytorch=1.10.2 and 1.11.0 to ci configurations #4348 by @kamo-naoyuki
- [CI][ESPnet2][ASR][SE] add integration test and fix the decoding in enh_asr and enh_st #4310 by @simpleoier
- [CI][ESPnet2][New Features][SLU][ST][streaming] Add streaming ST/SLU #4243 by @D-Keqi
- [CI][ESPnet2][ST] Add Test Functions for ST Train and Inference #4324 by @ftshijt
- [CI][Installation] update install_pesq.sh #4265 by @LiChenda
- [Documentation][ESPnet2][README][TTS] Minor update for JETS #4369 by @kan-bayashi
- [Documentation][README] Change the order of README #4289 by @ftshijt
- [Documentation][README] Update README.md #4284 by @sw005320
Acknowledgements
Special thanks to @AshibaWu, @D-Keqi, @DanBerrebbi, @Emrys365, @Fhrozen, @G-Thor, @JainSameer06, @LanceaKing, @LiChenda, @WeiGodHorse, @YoshikiMas, @YosukeHiguchi, @YushiUeda, @akreal, @bharaniuk, @brianyan918, @chintu619, @earthmanylf, @espnetUser, @ftshijt, @imdanboy, @kamo-naoyuki, @kan-bayashi, @muqiaoy, @nateanl, @navya-yarrabelly, @popcornell, @pyf98, @roshansh-cmu, @siddhu001, @simpleoier, @sw005320.
Published by kan-bayashi over 3 years ago
Software Design and User Interface of ESPnet-SE++ - ESPnet Version 202204
News
From this version, we decided to use date-based versioning, e.g., v.202204.
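As a rough illustration of what this change means for tooling that sorts release tags, the sketch below orders the older semantic-style tags (e.g., v.0.10.6) before the date-based ones (e.g., v.202204). The helper name `release_key` and the assumption that date-based tags are always six digits in YYYYMM form are ours for illustration and are not part of ESPnet.

```python
import re


def release_key(tag: str) -> tuple:
    """Return a sortable key for a release tag.

    Hypothetical helper: assumes tags look like 'v.0.10.6' (semantic)
    or 'v.202204' (date-based, YYYYMM).
    """
    body = tag[2:] if tag.startswith("v.") else tag
    if re.fullmatch(r"\d{6}", body):
        # Date-based tag: the leading 1 sorts it after every semantic tag.
        return (1, int(body[:4]), int(body[4:]), 0)
    major, minor, patch = (int(part) for part in body.split("."))
    return (0, major, minor, patch)


tags = ["v.202207", "v.0.10.6", "v.202204", "v.0.10.5", "v.202205"]
print(sorted(tags, key=release_key))
# ['v.0.10.5', 'v.0.10.6', 'v.202204', 'v.202205', 'v.202207']
```

Plain string sorting happens to give the same order for these five tags, but an explicit key keeps the ordering correct for cases such as v.0.9.0 versus v.0.10.0.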
New Features
- [New Features][ESPnet1] added learnable fourier features #4029 by @popcornell
- [New Features][ESPnet1][ESPnet2][ASR] Restricted Self Attention for E2E Speech Summarization #4071 by @roshansh-cmu
- [New Features][ESPnet1][Installation][README] add lrs avsr recipe #4104 by @wentaoxandry
- [New Features][ESPnet1][README] add lip reading sentences dataset code #4074 by @wentaoxandry
- [New Features][ESPnet2][ASR] [ESPnet2] Intermediate/Self-conditioned CTC #4084 by @YosukeHiguchi
- [New Features][ESPnet2][ASR] [WIP] [ESPnet2] Mask-CTC #4158 by @YosukeHiguchi
- [New Features][ESPnet2][ASR][README] Add stochastic depth to conformer and share results on LibriSpeech 960h #4142 by @pyf98
- [New Features][ESPnet2][MT] MT task for espnet2 with IWSLT14 recipe #4111 by @siddalmia
- [New Features][ESPnet2][README][SE] Add DC-CRN complex masking and spectral mapping approach for speech enhancement #4127 by @Emrys365
- [New Features][ESPnet2][README][SE] Add DCCRN separator #4097 by @Johnson-Lsx
- [New Features][ESPnet2][README][SE] Add a new separator for speech enhancement/separation tasks #4062 by @LiChenda
- [New Features][ESPnet2][README][SE] Add iFaSNet for enhancement/separation tasks. #4130 by @LiChenda
- [New Features][ESPnet2][SE] Refactor DNN_Beamformer in espnet2 and add new beamformers #4082 by @Emrys365
Enhancement
- [Enhancement][ESPnet2] Add an optional suffix to the averaged model file name #4067 by @pyf98
- [Enhancement][ESPnet2] Update perturb_data_dir_speed.sh #4091 by @AmirHussein96
- [Enhancement][ESPnet2][ASR] Add tests for Intermediate/Self-conditioned CTC #4117 by @YosukeHiguchi
- [Enhancement][ESPnet2][TTS] Add option to use norm. feats over denorm. #4250 by @G-Thor
Recipe
- [Recipe][ESPnet1][RNNT] [ESPNET1] Add the results of conformer-transducer for Librispeech #4080 by @eesungkim
- [Recipe][ESPnet2][ASR] Add ASR recipe for VCTK dataset based on TTS's dataprep. #4088 by @kashikashi
- [Recipe][ESPnet2][ASR] Add new conformer config with hop length 160 for LibriSpeech 960h #4162 by @pyf98
- [Recipe][ESPnet2][ASR] Add new zh_openslr38 ASR recipe #4181 by @cuichenx
- [Recipe][ESPnet2][ASR] Add transformer results for LibriSpeech 100h #4089 by @pyf98
- [Recipe][ESPnet2][ASR] Added Marathi OpenSLR 64 recipe #4179 by @SujaySKumar
- [Recipe][ESPnet2][ASR] Added recipe for Microsoft Speech Corpus (Indian languages) #4194 by @chintu619
- [Recipe][ESPnet2][ASR] Automatic lyric recognition Recipe #4129 by @ftshijt
- [Recipe][ESPnet2][ASR] ESPNET - LRS3 Recipe #4101 by @gdebayan
- [Recipe][ESPnet2][ASR] bengali asr model with no finetuning #4047 by @dzeinali
- [Recipe][ESPnet2][MT] IWSLT'14 Results using ESPnet2-MT #4132 by @pyf98
- [Recipe][ESPnet2][README] Mandarin ISO id should be CMN instead of ZHO #4125 by @xinjli
- [Recipe][ESPnet2][README] Update README.md #4037 by @dzeinali
- [Recipe][ESPnet2][README] Update README.md #4121 by @dzeinali
- [Recipe][ESPnet2][README] Update README.md for How2 2000h ASR,SUM #4155 by @roshansh-cmu
- [Recipe][ESPnet2][RNNT] Create decode_rnnt_conformer.yaml #4058 by @sw005320
- [Recipe][ESPnet2][RNNT] Create train_rnnt_conformer.yaml #4057 by @sw005320
- [Recipe][ESPnet2][SLU] Add IEMOCAP results and configs #4100 by @YushiUeda
- [Recipe][ESPnet2][SLU] Add new config and support for computing WER in SLUE-VoxCeleb #4152 by @siddhu001
- [Recipe][ESPnet2][SLU] Add sentiment data preparation for IEMOCAP #4065 by @YushiUeda
- [Recipe][ESPnet2][SLU] ESPnet2 swbd_sentiment recipe #4134 by @YushiUeda
- [Recipe][ESPnet2][ST] egs2/iwslt22_dialect #4013 by @brianyan918
Bugfix
- [Bugfix][CI][ESPnet2] Fix CI test failures related to torch_complex 0.4.0 #4112 by @Emrys365
- [Bugfix][CI][Installation] fix doc ci by pinning jinja version #4239 by @xinjli
- [Bugfix][ESPnet2] Fix n-gram decoding #4168 by @sw005320
- [Bugfix][ESPnet2] bug fixes and efficient train/dev split in data prep of Microsoft Indian Languages recipe #4196 by @chintu619
- [Bugfix][ESPnet2] fix errors in configs of librispeech ssl frontends #4098 by @simpleoier
- [Bugfix][ESPnet2][ASR][ST] [bug patch] egs2/iwslt22_dialect #4049 by @brianyan918
- [Bugfix][ESPnet2][MT][ST] Fix joint tokenization in st.sh #4143 by @pyf98
- [Bugfix][ESPnet2][MT][ST] scoring fixes MT and ST #4146 by @siddalmia
- [Bugfix][ESPnet2][TTS] Fix speaker normalization #4229 by @LanceaKing
- [Bugfix][Installation] set gtn version #4122 by @brianyan918
- [Bugfix][ESPnet1][ESPnet2] minor fixes in ST in espnet2 #4056 by @siddalmia
Others
- [CI] Simplify vocoder compatibility test #4061 by @kan-bayashi
- [CI][Documentation] Fix notebook in the official doc. #4171 by @ShigekiKarita
- [Docker] Docker Updates #4064 by @Fhrozen
- [Documentation] Add a checklist for PRs on recipe #4053 by @ftshijt
- [Documentation] README Update for E2E Speech Summarization #4071 #4150 by @roshansh-cmu
- [Documentation] Update the example PyTorch version in Installation doc #4116 by @pyf98
- [Documentation] [documentation] fix minor typo in installation.md #4164 by @JDongian
- [Documentation][ESPnet1] fix typo #4044 by @ooyamatakehisa
- [Documentation][ESPnet1][ESPnet2][ASR] Add Huggingface-cli usage #4027 by @karthik19967829
Acknowledgements
Special thanks to @AmirHussein96, @Emrys365, @Fhrozen, @G-Thor, @JDongian, @Johnson-Lsx, @LanceaKing, @LiChenda, @ShigekiKarita, @SujaySKumar, @YosukeHiguchi, @YushiUeda, @brianyan918, @chintu619, @cuichenx, @dzeinali, @eesungkim, @ftshijt, @gdebayan, @kan-bayashi, @karthik19967829, @kashikashi, @ooyamatakehisa, @popcornell, @pyf98, @roshansh-cmu, @siddalmia, @siddhu001, @simpleoier, @sw005320, @wentaoxandry, @xinjli.
Published by kan-bayashi over 3 years ago
Software Design and User Interface of ESPnet-SE++ - ESPnet Version 0.10.6
New Features
- [New Features][ESPnet2][TTS][Installation][README] [TTS] Support python-based toolkit for xvector extractors #4016 by @Fhrozen
- [New Features][ESPnet2] Add SpecAug2 which supports variable maximum width in time masking #3902 by @pyf98
Recipe
- [Recipe][ESPnet1][ASR] Add librispeech-100h recipe #3997 by @YosukeHiguchi
- [Recipe][ESPnet1][ASR] Update egs/librispeech_100 #4036 by @YosukeHiguchi
- [Recipe][ESPnet2][ASR][README] Scoring Mandarin / English separately for the SEAME corpus #3976 by @vectominist
- [Recipe][ESPnet2][ASR][README] update LibriSpeech Pretrained models with SSLRs: results and huggingf… #3979 by @simpleoier
- [Recipe][ESPnet2][ASR][README][ST] Speech translation framework (merging into master) #3987 by @ftshijt
- [Recipe][ESPnet2][ASR][TTS] Update two recipes (googlei18n and hub4_spanish) #3895 by @ftshijt
- [Recipe][ESPnet2][SLU][README] updated the results of Slue voxceleb #3929 by @siddhu001
- [Recipe][ESPnet2][ST] Update the default setting for st #3993 by @ftshijt
Bugfix
- [Bugfix][ESPnet1][RNNT] Fix bug for Conformer-T #4020 by @YosukeHiguchi
- [Bugfix][ESPnet2][Diarization] Diarization: fix for convolutional input layer in the encoder #3957 by @alumae
- [Bugfix][ESPnet2][Diarization] Two fixes to diarization evaluation scripts #3938 by @alumae
- [Bugfix][ESPnet2][Diarization][Recipe] Fix issues in EEND-EDA & add Librimix_diar recipe #3900 by @YushiUeda
- [Bugfix][ESPnet2][ESPnet1][ASR][streaming] streaming conformer bugfix #4025 by @jeon30c
- [Bugfix][ESPnet2][LM] Bugfix for espnet2 ngram #4002 by @yaochie
- [Bugfix][ESPnet2][RNNT] espnet2 asr inference bugfix for transducer #3943 by @jeon30c
- [Bugfix][ESPnet2][ST] Bugfix for ST scoring #3972 by @ftshijt
Enhancement
- [Enhancement][ESPnet2] cleaned tensorboard and stats logging for espnet2 #3910 by @siddalmia
- [Enhancement][ESPnet2][Diarization] Add test codes for diarization #3953 by @YushiUeda
- [Enhancement][ESPnet2][streaming] Add reference for streaming ASR #4014 by @D-Keqi
Others
- [CI] remove the support of pytorch 1.3.1 #4038 by @sw005320
- [CI][ESPnet1][ESPnet2] fix ci for librosa update #4043 by @ftshijt
- [CI][Installation] Fix numpy version #3965 by @kan-bayashi
- [CI][Installation] temporary fixed pypinyin version #3995 by @kan-bayashi
- [Documentation][ESPnet1][ESPnet2][README][SLU] Add Sinhala E2E SLU Recipe #3890 by @karthik19967829
- [Documentation][README] Update README.md #4039 by @sw005320
- [ESPnet2][README] Update README.md #3931 by @sw005320
- [ESPnet2][README][TTS][Typo] Fix typo in README.md #4024 by @kan-bayashi
Acknowledgements
Special thanks to @D-Keqi, @Fhrozen, @YosukeHiguchi, @YushiUeda, @alumae, @ftshijt, @jeon30c, @kan-bayashi, @karthik19967829, @pyf98, @siddalmia, @siddhu001, @simpleoier, @sw005320, @vectominist, @yaochie.
Full Changelog: https://github.com/espnet/espnet/compare/v.0.10.5...v.0.10.6
Published by kan-bayashi almost 4 years ago
Software Design and User Interface of ESPnet-SE++ - ESPnet Version 0.10.5
New Features
- [New Features][ESPnet1][ASR] Implement self-conditioned CTC #3856 by @komatta-san
- [New Features][ESPnet2][ASR][CI][Installation] GTN CTC for ESPnet2 #3778 by @brianyan918
- [New Features][ESPnet2][ASR][Refactoring] [ESPnet2] Transducer #2533 by @b-flo
- [New Features][ESPnet2][README][Recipe] Frontends fusion (any type, any number, linear fusion only for now) for ASR in espnet2 #3824 by @DanBerrebbi
- [New Features][ESPnet2][SE] Refactor loss computation in enhancement tasks. #3838 by @LiChenda
Recipe
- [Recipe][ESPnet1][ESPnet2][ASR][README] updated the results of aidatatang_200zh #3925 by @sw005320
- [Recipe][ESPnet1][VC] Various fixes of voice conversion recipes #3800 by @unilight
- [Recipe][ESPnet2][ASR][README] Expanding egs2 of Tedlium2 #3795 by @D-Keqi
- [Recipe][ESPnet2][ASR][README] Update an4 config #3913 by @pyf98
- [Recipe][ESPnet2][ASR][README] aidatatang_200zh recipe #3892 by @sw005320
- [Recipe][ESPnet2][README] Update README.md #3881 by @daisylab
- [Recipe][ESPnet2][README] Update egs2/TEMPLATE/README.md #3793 by @kamo-naoyuki
- [Recipe][ESPnet2][README] fix readme #3827 by @seastar105
- [Recipe][ESPnet2][README][Recipe] Add ASR Recipe: Primewords_Chinese #3903 by @pyf98
- [Recipe][ESPnet2][README][Recipe] Update MISP challenge ASR baseline and add AVSR baseline #3819 by @neillu23
- [Recipe][ESPnet2][README][SLU] Fsc Maseeval scripts #3769 by @siddhu001
- [Recipe][ESPnet2][README][SLU] Update Google Speechcommands (SLU recipe) #3915 by @pyf98
- [Recipe][ESPnet2][README][TTS] ESPnet2 ARCTIC TTS #3791 by @peter-yh-wu
- [Recipe][ESPnet2][README][TTS] Update README and add missing config #3917 by @kan-bayashi
- [Recipe][ESPnet2][Recipe][SLU] Slue voxceleb Sentiment Analysis #3894 by @siddhu001
- [Recipe][ESPnet2][SE] modified data type in enh.sh #3768 by @simpleoier
Bugfix
- [Bugfix][ESPnet1][README][RNNT] Fix cache for Transducer search strategies + doc #3869 by @b-flo
- [Bugfix][ESPnet1][RNNT] Fix recombine_hyps #3908 by @b-flo
- [Bugfix][ESPnet1][RNNT] fix rnn-t ALSD beam search index bug #3794 by @maxwellzh
- [Bugfix][ESPnet1][RNNT] fix the sort order in select_k_expansions() #3864 by @freewym
- [Bugfix][ESPnet2] Bug fix for .gitignore and db fill up for CMU cluster #3891 by @siddalmia
- [Bugfix][ESPnet2] Fix #3716 #3849 by @kan-bayashi
- [Bugfix][ESPnet2] Merging asr_streaming.sh into asr.sh for laborotv egs2 #3868 by @D-Keqi
- [Bugfix][ESPnet2] add __init__.py #3928 by @sw005320
- [Bugfix][ESPnet2] fix small problem that used before defined in step 12 #3871 by @simpleoier
- [Bugfix][ESPnet2] fix stft olens when winlengths is not equal to nfft #3812 by @IceCreamWW
- [Bugfix][ESPnet2] update s3prl frontend w.r.t. recent modification in s3prl interface #3839 by @simpleoier
- [Bugfix][ESPnet2][TTS] bugfix lang2lid in tts.sh #3906 by @imdanboy
- [Bugfix][Installation] Fix #3783 #3786 by @kamo-naoyuki
Others
- [CI] Fix G2P test failure in CI due to the dict update #3848 by @kan-bayashi
- [CI][Documentation][ESPnet1][ESPnet2] Fixing issues about streaming Transformer/Conformer training #3880 by @D-Keqi
- [CI][ESPnet1][ESPnet2][Installation][New Features][README] nbest rescoring with k2 #3567 by @glynpu
- [Documentation][README] Update README.md #3893 by @sw005320
- [Documentation][README][SSL] Add more docs about s3prl frontend #3796 by @simpleoier
- [Documentation][README][streaming] Updating main README.md about streaming transformer #3855 by @D-Keqi
- [ESPnet1][RNNT] Add exception for conformer decoder #3801 by @b-flo
- [ESPnet2][README][Typo] Fix typo in README.md #3852 by @kan-bayashi
- [ESPnet2][SE] add eps in beam-forming reference channel selection #3904 by @LiChenda
- [ESPnet2][SLU] Add unit test for score_intent.py #3759 by @siddhu001
- [ESPnet2][ST] Speech Translation Update #3860 by @ftshijt
- [ESPnet2][TTS][Installation][Refactoring] Refactor Phonemizer-based G2P #3916 by @kan-bayashi
Acknowledgements
Special thanks to @D-Keqi, @DanBerrebbi, @IceCreamWW, @LiChenda, @b-flo, @brianyan918, @daisylab, @freewym, @ftshijt, @glynpu, @imdanboy, @kamo-naoyuki, @kan-bayashi, @komatta-san, @maxwellzh, @neillu23, @peter-yh-wu, @pyf98, @seastar105, @siddalmia, @siddhu001, @simpleoier, @sw005320, @unilight.
Published by kan-bayashi almost 4 years ago
Software Design and User Interface of ESPnet-SE++ - ESPnet Version 0.10.4
New Features
- [New Features][ESPnet1][ESPnet2][ASR][README] The code for Emiru's real streaming Transformer #3614 by @D-Keqi
- [New Features][ESPnet1][MT][ST][Installation] Support sacreBLEU #3698 by @hirofumi0810
- [New Features][ESPnet2][ST] ESPNet2 speech translation #3587 by @ftshijt
Enhancement
- [Enhancement][ESPnet1][ASR] Fix e2e_asr_maskctc.py to make RTF computable #3634 by @eddiewng
- [Enhancement][ESPnet2][Installation][README] HuggingFace Upload support for ESPnet2 tasks [cont.] #3677 by @Fhrozen
- [Enhancement][ESPnet2][TTS][Installation] Add korean_jaso tokenizer and korean_cleaner #3588 by @windtoker
Bugfix
- [Bugfix][ESPnet1][ASR][RNNT] Fix quantization for Transducer #3616 by @b-flo
- [Bugfix][ESPnet2][ASR][Recipe] added download test set, small modifications for path of aishell #3663 by @teinhonglo
- [Bugfix][ESPnet2] Do stft with librosa when neither MKL nor CUDA is available. #3668 by @CTinRay
- [Bugfix][ESPnet2] [bug fixed] allow adding noise independently of rir, bug fixed in #3692 by @ranchlai
- [Bugfix][ESPnet2][Recipe] Create Symlinks for 1-channel/2-channel tracks in chime4 #3699 by @neillu23
- [Bugfix][ESPnet2][Recipe] Fix SWBD Data Prep Bug #3742 by @brianyan918
Recipe
- [Recipe][ESPnet1][ASR][MT][ST] Add CoVoST2 recipe #3720 by @hirofumi0810
- [Recipe][ESPnet2][ASR][README] MISP2021 E2E ASR Baseline #3738 by @neillu23
- [Recipe][ESPnet2][ASR][README] Wenetspeech #3686 by @pengchengguo
- [Recipe][ESPnet2][SLU] Add snips hubert feature training #3619 by @yuekaizhang
- [Recipe][ESPnet2][SLU] Make scoring part more general #3715 by @siddhu001
- [Recipe][ESPnet2][SLU][README] Add ESPnet-SLU Recipe: Google Speech Commands #3693 by @pyf98
- [Recipe][ESPnet2][SLU][README] Add an ESPnet2 recipe for the Grabo SLU dataset #3669 by @pyf98
- [Recipe][ESPnet2][SLU][README] CATSLU-MAPS: Added recipe #3685 by @SujaySKumar
- [Recipe][ESPnet2][SLU][README] ESPnet2 Japanese dialogue act classification recipe #3667 by @YushiUeda
- [Recipe][ESPnet2][SLU][README] Slurp SLU with bpe encoded transcripts #3674 by @siddhu001
- [Recipe][ESPnet2][SLU][README] Slurp entity classification #3739 by @siddhu001
- [Recipe][ESPnet2][SSL] Add eps in acc computation of HuBERT model #3713 by @simpleoier
- [Recipe][ESPnet2][TTS] Change the timing of srctexts creation #3734 by @kan-bayashi
- [Recipe][ESPnet2][TTS][README] update kss recipe with VITS configuration #3660 by @windtoker
Others
- [CI][ESPnet2][Installation] Fix tests in CI #3700 by @kan-bayashi
- [CI][ESPnet2][SLU][README] Add Hubert pretrained ASR in FSC SLU #3653 by @siddhu001
- [CI][Installation] Minor update for CI #3656 by @kan-bayashi
- [Documentation][ESPnet1][README][RNNT][Refactoring] Refactor custom Transducer build #3697 by @b-flo
- [Documentation][ESPnet2][README] Hugging Face support - Doc [cont.] #3709 by @Fhrozen
- [Installation] Update pyopenjtalk version #3733 by @kan-bayashi
- [README] Huggingface spaces ESPnet2-TTS web demo #3673 by @AK391
- [README][ESPnet2] Add Huggingface model documentation #3714 by @siddhu001
- [README][ESPnet2] Fix readme #3750 by @takenori-y
Acknowledgements
Special thanks to @AK391, @CTinRay, @D-Keqi, @Fhrozen, @SujaySKumar, @YushiUeda, @b-flo, @brianyan918, @eddiewng, @ftshijt, @hirofumi0810, @kan-bayashi, @neillu23, @pengchengguo, @pyf98, @ranchlai, @siddhu001, @simpleoier, @takenori-y, @teinhonglo, @windtoker, @yuekaizhang.
Published by kan-bayashi about 4 years ago
Software Design and User Interface of ESPnet-SE++ - ESPnet Version 0.10.3
New Features
- [New Features][ESPnet1][RNNT][Installation][README] FastEmit support #3591 by @b-flo
- [New Features][ESPnet2][ASR] Add ASR portable evaluation script #3569 by @kan-bayashi
- [New Features][ESPnet2][README] EEND-EDA model for diarization task #3621 by @YushiUeda
Bugfix
- [Bugfix][ESPnet1] Fix /usr/bin/env bash -e #3651 by @kamo-naoyuki
- [Bugfix][ESPnet1] ctc loss using dropout layer since .eval() will not work for F.dropout #3539 by @zh794390558
- [Bugfix][ESPnet2] Minor fix of `evaluate_asr.sh` #3596 by @kan-bayashi
- [Bugfix][ESPnet2][ASR] wav2vec2_encoder bug fix #3545 by @simpleoier
- [Bugfix][ESPnet2][README][SSL] Fix some issues of #3512 and add README.md to librispeech/ssl1 recipe. #3572 by @Jzmo
- [Bugfix][ESPnet2][TTS] Bug fix the attribute registration in VITS generator #3573 by @kan-bayashi
- [Bugfix][ESPnet2][TTS] Fix pyopenjtalk_g2p_accent(_with_pause) #3555 by @zzxiang
Recipe
- [Recipe][ESPnet1][ASR][RNNT] Update Transducer recipes #3465 by @b-flo
- [Recipe][ESPnet1][ST] Clean libri-trans #3540 by @hirofumi0810
- [Recipe][ESPnet2][ASR][README] Dan aishell4 branch #3585 by @DanBerrebbi
- [Recipe][ESPnet2][ASR][README] update pretrained models of librispeech using hubert/wav2vec2 #3568 by @simpleoier
- [Recipe][ESPnet2][SLU][README] Add slu snips data recipe #3407 by @yuekaizhang
- [Recipe][ESPnet2][TTS] Update GAN-TTS based configurations #3570 by @kan-bayashi
- [Recipe][ESPnet2][TTS][README] Add initial VITS results for JSUT #3550 by @kan-bayashi
- [Recipe][ESPnet2][TTS][README] Add つくよみちゃんコーパス recipe #3552 by @kan-bayashi
- [Recipe][ESPnet2][TTS][README] IndicSpeech TTS Scripts #3435 by @peter-yh-wu
- [Recipe][ESPnet2][TTS][README] Update ESPnet2-TTS results #3578 by @kan-bayashi
- [Recipe][ESPnet2][TTS][README] Update JSUT and JVS results #3553 by @kan-bayashi
- [Recipe][ESPnet2][TTS][README] Update LJSpeech and CSMSC results #3560 by @kan-bayashi
- [Recipe][ESPnet2][TTS][README] Update TTS results #3615 by @kan-bayashi
- [Recipe][ESPnet2][TTS][README] Update TTS results #3648 by @kan-bayashi
- [Recipe][ESPnet2][TTS][README] Update VCTK results #3581 by @kan-bayashi
- [Recipe][ESPnet2][TTS][README] Update pre-trained model for TTS recipes #3590 by @ftshijt
- [Recipe][ESPnet2][TTS][README] update kss recipe with new result. #3589 by @windtoker
- [Recipe][ESPnet2][TTS][Typo] Fix typo `egs2/jtubespeech/tts1` #3564 by @kan-bayashi
- [Recipe][ESPnet2][TTS][Typo] Update JVS README #3554 by @kan-bayashi
Enhancement
- [Enhancement][ESPnet2][SE][Refactoring] Add PyTorch Builtin Complex Support in the Speech Enhancement Task #3355 by @Emrys365
- [Enhancement][ESPnet2][TTS] Hindi g2p #3579 by @peter-yh-wu
- [Enhancement][ESPnet2][TTS] Unify spks / lids / spk_embed_dim type #3551 by @kan-bayashi
- [Enhancement][ESPnet2][TTS] Update `evaluate_mcd.py` script #3566 by @kan-bayashi
- [Enhancement][ESPnet2][TTS][Installation] Add the installer of tdmelodic pyopenjtalk #3561 by @kan-bayashi
- [Enhancement][ESPnet2][TTS][Installation][README] Update TTS objective eval scripts #3650 by @kan-bayashi
- [Enhancement][ESPnet2][TTS][README] Add a new Japanese G2P for TTS #3558 by @kan-bayashi
- [Enhancement][ESPnet2][TTS][README] Add a new english G2P #3597 by @kan-bayashi
Others
- [CI] Add codecov config and flags. #3603 by @ShigekiKarita
- [CI] Omit tools/ from code coverage. #3600 by @ShigekiKarita
- [CI] Split test_integration.sh #3599 by @ShigekiKarita
- [CI][ESPnet2][Installation][Refactoring] Make the installation of transformers optional #3622 by @kan-bayashi
- [CI][Installation] Add no-check-certificate option in PESQ installation #3649 by @kan-bayashi
- [CI][Installation][README][mergify] Change setup.py for pytorch1.9.1 #3636 by @kamo-naoyuki
- [Documentation][ESPnet1][RNNT] Fix/improve doc(string)s related to Transducer model #3623 by @b-flo
- [Documentation][ESPnet2][TTS][README] Update README of ESPnet2-TTS #3546 by @kan-bayashi
- [Documentation][ESPnet2][TTS][README] Update TTS README #3565 by @kan-bayashi
- [Documentation][ESPnet2][TTS][README] Update TTS fine-tuning README #3549 by @kan-bayashi
- [Typo][ESPnet2] Minor bug in format_wav_scp.py #3575 by @ftshijt
- [Typo][ESPnet2][TTS] update mismatch help info for tts #3602 by @ftshijt
Acknowledgements
Special thanks to @DanBerrebbi, @Emrys365, @Jzmo, @ShigekiKarita, @YushiUeda, @b-flo, @ftshijt, @hirofumi0810, @kamo-naoyuki, @kan-bayashi, @peter-yh-wu, @simpleoier, @windtoker, @yuekaizhang, @zh794390558, @zzxiang.
Published by kan-bayashi about 4 years ago
Software Design and User Interface of ESPnet-SE++ - ESPnet Version 0.10.2
News
- Hubert training is now available!
  - Try with `egs2/librispeech/ssl1`
- GAN-based TTS model is now available!
  - Joint text2mel and vocoder training
  - End-to-end text-to-wave model (VITS) training
  - Try with `egs2/ljspeech/tts1`
- Support `from_pretrained` function!

```python
# e.g.
from espnet2.bin.asr_inference import Speech2Text
asr = Speech2Text.from_pretrained("model_tag")

from espnet2.bin.tts_inference import Text2Speech
tts = Text2Speech.from_pretrained("model_tag")

from espnet2.bin.enh_inference import SeparateSpeech
enh = SeparateSpeech.from_pretrained("model_tag")

from espnet2.bin.diar_inference import DiarizeSpeech
diar = DiarizeSpeech.from_pretrained("model_tag")
```

Please check the available pretrained models in espnet_model_zoo!
New Features
- [New Features][ESPnet1] Intermediate CTC + Stochastic depth #3274 by @jaesong
- [New Features][ESPnet2] Add new trainer for GAN-based training #3436 by @kan-bayashi
- [New Features][ESPnet2][ASR] Add Hubert model in Espnet2/Refactor from #3458 #3512 by @Jzmo
- [New Features][ESPnet2][ASR] batch decode with k2 ctc #3433 by @glynpu
- [New Features][ESPnet2][ASR][SE] Support `from_pretrained` for ASR and ENH #3535 by @kan-bayashi
- [New Features][ESPnet2][DIAR] Support `from_pretrained` for DIAR #3537 by @YushiUeda
- [New Features][ESPnet2][SE] Adding portable speech enhancement scripts for other tasks #3487 by @Emrys365
- [New Features][ESPnet2][TTS] Add GAN-TTS task with VITS #3449 by @kan-bayashi
- [New Features][ESPnet2][TTS] Support SID and LID inputs for TTS models #3490 by @kan-bayashi
- [New Features][ESPnet2][TTS] Support `from_pretrained` function in `Text2Speech` #3532 by @kan-bayashi
- [New Features][ESPnet2][TTS] Support `parallel_wavegan` vocoders in `tts_inference.py` #3513 by @kan-bayashi
- [New Features][ESPnet2][TTS] Support joint training of text2mel and vocoder #3501 by @kan-bayashi
- [New Features][ESPnet2][TTS] Support language ID input for espnet2 TTS #3489 by @kan-bayashi
- [New Features][ESPnet2][TTS] Support speaker id input for TTS models #3452 by @kan-bayashi
Enhancement
- [Enhancement][ESPnet2][CTC segmentation][README] Fix CTC Segmentation #3500 by @shirayu
- [Enhancement][ESPnet2][TTS] Add VITS-related modules #3448 by @kan-bayashi
- [Enhancement][ESPnet2][TTS] Add cython code for VITS #3483 by @kan-bayashi
- [Enhancement][ESPnet2][TTS] Add joint training config example #3508 by @kan-bayashi
- [Enhancement][ESPnet2][TTS] Add melgan module for joint training #3516 by @kan-bayashi
- [Enhancement][ESPnet2][TTS] Add parallel wavegan module for joint training #3515 by @kan-bayashi
- [Enhancement][ESPnet2][TTS] Add style melgan module for joint training #3517 by @kan-bayashi
- [Enhancement][ESPnet2][TTS] Add vocoder modules related to VITS #3439 by @kan-bayashi
- [Enhancement][ESPnet2][TTS] Change Text2Speech class output format #3437 by @kan-bayashi
- [Enhancement][ESPnet2][TTS] Follow up of the support speaker id input #3453 by @kan-bayashi
- [Enhancement][ESPnet2][TTS] Support cleaner option in phn converter util #3450 by @kan-bayashi
- [Enhancement][ESPnet2][TTS] Support language id in VITS #3499 by @kan-bayashi
- [Enhancement][ESPnet2][TTS] Support linear spectrogram #3438 by @kan-bayashi
- [Enhancement][ESPnet2][TTS] Support new g2p functions for various languages #3463 by @kan-bayashi
- [Enhancement][ESPnet2][TTS] Update the TTS inference #3498 by @kan-bayashi
- [Enhancement][ESPnet2][SLU][README] Add support for intent classification on SLURP dataset #3482 by @siddhu001
- [Enhancement][ESPnet2][SLU][README] Add NLU post-encoder using Hugging Face Transformers #3410 by @akreal
Recipe
- [Recipe][ESPnet1][ASR] Mucs21 subtask1 #3376 by @sanket0211
- [Recipe][ESPnet2][ASR][README] Add Swahili ASR recipe #3485 by @akreal
- [Recipe][ESPnet2][ASR][README] Rename `swahili` recipe to `iwslt21_low_resource` #3522 by @akreal
- [Recipe][ESPnet2][DIAR][README] Modify ESPnet2 diarization recipe #3524 by @YushiUeda
- [Recipe][ESPnet2][ESPnet1][ASR] Espnet2 mucs_subtask2 #3415 by @bloodraven66
- [Recipe][ESPnet2][ESPnet1][ASR] mucs subtask1 #3417 by @bloodraven66
- [Recipe][ESPnet2][SE] Add Voicebank (vctk_noisy) script #3486 by @neillu23
- [Recipe][ESPnet2][TTS] Add missing configs for LibriTTS recipe #3455 by @kan-bayashi
- [Recipe][ESPnet2][TTS] Update VITS config comments and settings #3528 by @kan-bayashi
- [Recipe][ESPnet2][TTS] aishell3 dataset preparation #3505 by @actboy
- [Recipe][ESPnet2][TTS][README] Add CSS10 recipe for ESPnet2-TTS #3464 by @kan-bayashi
- [Recipe][ESPnet2][TTS][README] Add JtubeSpeech Recipe #3459 by @Takaaki-Saeki
- [Recipe][ESPnet2][TTS][README] Add SIWIS recipe #3460 by @takenori-y
- [Recipe][ESPnet2][TTS][README] TTS recipe for J-KAC corpus #3468 by @TanUkkii007
- [Recipe][ESPnet2][TTS][README] TTS recipes for thchs30 and aishell3 #3470 by @ftshijt
- [Recipe][ESPnet2][TTS][README] Update JMD README #3531 by @takenori-y
- [Recipe][ESPnet2][TTS][README] Update SIWIS README #3509 by @takenori-y
- [Recipe][ESPnet2][SLU][README] Predict ASR transcript along with Intent for SLU #3480 by @siddhu001
- [Recipe][ESPnet2][SLU][README] Update SWBD DA configuration #3425 by @akreal
Bugfix
- [Bugfix][ESPnet2] Add return_complex=False for stft #3476 by @D-X-Y
- [Bugfix][ESPnet2] Dynamic import for the ngram function #3420 by @ftshijt
- [Bugfix][ESPnet2][README][Recipe] Add the GigaSpeech normalization and fix the WER #3519 by @chaisz19
- [Bugfix][ESPnet2][TTS] Add duration and focus_rate in output dict #3469 by @kan-bayashi
- [Bugfix][ESPnet2][TTS] Add missing symlink to trim_silence.py for ESPnet2 #3467 by @kan-bayashi
- [Bugfix][ESPnet2][TTS] Fix wrong arguments in pretrained vocoder wrapper #3525 by @kan-bayashi
- [Bugfix][ESPnet2][TTS] Revert wrongly removed lines in `tts.sh` #3503 by @kan-bayashi
- [Bugfix][ESPnet2][TTS][Typo] Fix typo in hifigan #3504 by @kan-bayashi
Refactoring
- [Refactoring][ESPnet1][ASR][RNNT][README] Transducer v5 #3217 by @b-flo
- [Refactoring][ESPnet2][SE][DIAR] Remove prefix `enh_` and `diar_` #3538 by @kan-bayashi
- [Refactoring][ESPnet2][TTS] Refactor TTS modules in ESPnet2 #3497 by @kan-bayashi
- [Refactoring][ESPnet2][TTS] Remove the support of feats_type=fbank/stft in ESPnet2-TTS #3514 by @kan-bayashi
Others
- [CI] Fix k2 version in CI using conda #3493 by @kan-bayashi
- [CI] Fix test condition #3527 by @kan-bayashi
- [CI][Installation] Update Sentencepiece and add python 3.9 to CI #3422 by @shirayu
- [Docker] Docker Updates #3393 by @Fhrozen
- [Documentation] Update the tutorial about maxlenratio usage #3523 by @akreal
- [Documentation][ESPnet2][TTS] Update README.md #3502 by @kan-bayashi
- [Installation][README] Added a link and a classifier for Python 3.9 #3440 by @shirayu
- [Typo] Fix typos in "egs" #3447 by @shirayu
- [Typo][Documentation] Fix typos in "doc" #3441 by @shirayu
- [Typo][Documentation] Fix typos in "utils" #3442 by @shirayu
- [Typo][ESPnet1][MT] Fix typos in "espnet" #3444 by @shirayu
- [Typo][ESPnet2] Fix typos in "espnet2" #3443 by @shirayu
- [Typo][ESPnet2][README] Fix typos in "egs2" #3445 by @shirayu
Acknowledgements
Special thanks to @D-X-Y, @Emrys365, @Fhrozen, @Jzmo, @Takaaki-Saeki, @TanUkkii007, @YushiUeda, @actboy, @akreal, @b-flo, @bloodraven66, @chaisz19, @ftshijt, @glynpu, @jaesong, @kan-bayashi, @neillu23, @sanket0211, @shirayu, @siddhu001, @takenori-y.
Published by kan-bayashi over 4 years ago
Software Design and User Interface of ESPnet-SE++ - ESPnet Version 0.10.1
New Features
- [New Features][ESPnet2] Porting existing pre-trained models to hugging face #3321 by @siddhu001
- [New Features][ESPnet2][ASR][CI][Installation] k2andespnet2 #3358 by @glynpu
- [New Features][ESPnet2][ASR][LM][CI] espnet2 ngram #3345 by @qmpzzpmq
- [New Features][ESPnet2][Installation] add s3prl frontend #3187 by @simpleoier
Recipe
- [Recipe][ESPnet1][ASR] Fix the iconv error in hkust data prep #3397 by @sw005320
- [Recipe][ESPnet1][ASR] mucs subtask2 baseline recipes (e2e and kaldi) #3362 by @bloodraven66
- [Recipe][ESPnet1][ESPnet2][ASR] JTubeSpeech recipe and hkust espnet1 #3406 by @sw005320
- [Recipe][ESPnet1][TTS] CMU INDIC TTS #3347 by @peter-yh-wu
- [Recipe][ESPnet2][ASR] ESPnet2 Recipe for Ksponspeech #3387 by @YushiUeda
- [Recipe][ESPnet2][ASR] Fix gigaspeech pre-trained model link #3317 by @sw005320
- [Recipe][ESPnet2][ASR] LRS2 lipreading recipe #3346 by @LiChenda
- [Recipe][ESPnet2][ASR] OpenSLR Sundanese ASR #3344 by @peter-yh-wu
- [Recipe][ESPnet2][ASR] Recipe of JTubeSpeech #3311 by @sw005320
- [Recipe][ESPnet2][ASR] fix path error in local/score.sh in swbd #3349 by @wonkyuml
- [Recipe][ESPnet2][ASR] updated javanese and sundanese readmes #3369 by @peter-yh-wu
- [Recipe][ESPnet2][ASR][Installation] OpenSLR Javanese ASR #2960 by @peter-yh-wu
- [Recipe][ESPnet2][SLU] Add initial Switchboard Dialogue Act classification recipe #3395 by @akreal
- [Recipe][ESPnet2][SLU] FSC Espnet2 data preparation #3352 by @siddhu001
- [Recipe][ESPnet2][TTS] Add HUI-audio-corpus-german recipe for ESPnet2-TTS #3375 by @kan-bayashi
- [Recipe][ESPnet2][TTS] Add JMD recipe #3394 by @takenori-y
- [Recipe][ESPnet2][TTS] Add RUSLAN recipe for ESPnet2-TTS #3378 by @kan-bayashi
- [Recipe][ESPnet2][TTS] Support KSS dataset recipe for ESPnet2-TTS #3383 by @kan-bayashi
- [Recipe][ESPnet2][TTS] Update HUI audio corpus german recipe #3381 by @kan-bayashi
- [Recipe][ESPnet2][TTS] Update HUI-audio-corpus-german recipe results of ESPnet2-TTS #3391 by @kan-bayashi
- [Recipe][ESPnet2][TTS] Update KSS dataset recipe results of ESPnet2-TTS #3400 by @kan-bayashi
- [Recipe][ESPnet2][TTS] Update RUSLAN recipe results of ESPnet2-TTS #3390 by @kan-bayashi
- [Recipe][ESPnet2][TTS] indic tts without pretrained model #3401 by @peter-yh-wu
Enhancement
- [Enhancement][ESPnet2] Update wav2vec2_encoder.py #3312 by @brotheroak
- [Enhancement][ESPnet2][TTS] Add trim_silence for ESPnet2-TTS #3380 by @kan-bayashi
- [Enhancement][ESPnet2][TTS] Allow override default 'speed_control_alpha' parameter #3316 by @airenas
- [Enhancement][ESPnet2][TTS] Support French G2P #3372 by @kan-bayashi
- [Enhancement][ESPnet2][TTS] Support German G2P #3371 by @kan-bayashi
- [Enhancement][ESPnet2][TTS] Support Korean G2P #3382 by @kan-bayashi
- [Enhancement][ESPnet2][TTS] Support Russian G2P #3377 by @kan-bayashi
- [Enhancement][ESPnet2][TTS] Support Spanish G2P #3373 by @kan-bayashi
- [Enhancement][ESPnet2][TTS] Update README about G2P #3374 by @kan-bayashi
Bugfix
- [Bugfix][ESPnet1][ESPnet2] Fix a type error of swbd data preparation. #3324 by @pengchengguo
- [Bugfix][ESPnet1][ESPnet2][TTS] Fixed label modification in Taco2 or Transformer-TTS with R > 1 #3392 by @kan-bayashi
- [Bugfix][ESPnet2] fix a bug in OneCycleLR and CyclicLR #3319 by @sw005320
Others
- [Typo][ESPnet1] Update batch_beam_search_online_sim.py #3367 by @aky15
- [Typo][ESPnet2] Fixed typo in model name #3364 by @kan-bayashi
- [Typo][ESPnet2] Update contextual_block_transformer_encoder.py #3354 by @aky15
Acknowledgements
Special thanks to @LiChenda, @YushiUeda, @airenas, @akreal, @aky15, @bloodraven66, @brotheroak, @glynpu, @kan-bayashi, @pengchengguo, @peter-yh-wu, @qmpzzpmq, @siddhu001, @simpleoier, @sw005320, @takenori-y, @wonkyuml.
Published by kan-bayashi over 4 years ago
Software Design and User Interface of ESPnet-SE++ - ESPnet Version 0.10.0
From v.0.10.x, we drop support for PyTorch < 1.3.
See more info in https://github.com/espnet/espnet/issues/3300
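As an illustration of what the version cutoff above means in practice, the following is a minimal sketch of a version guard. It is not taken from ESPnet; the helper names `parse_major_minor` and `is_supported` are hypothetical, and the `(1, 3)` minimum simply restates the note above. Comparing integer tuples rather than raw strings avoids treating "1.10" as older than "1.3".

```python
import re

# Assumption: minimum supported PyTorch version, restating the release note.
MIN_TORCH = (1, 3)


def parse_major_minor(version):
    """Extract (major, minor) as integers from a string such as '1.9.0+cu111'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"Unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def is_supported(version, minimum=MIN_TORCH):
    """Return True if the given version is at least the minimum (major, minor)."""
    return parse_major_minor(version) >= minimum
```

With PyTorch installed, a check along the lines of `is_supported(torch.__version__)` could be run before training to catch an unsupported environment early.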
New Features and Enhancement
- [New Features][ESPnet1][ASR][CI] Dynamic quantization for decoding #3210 by @xu-gaopeng
- [New Features][ESPnet1] Add quantize args #3280 by @xu-gaopeng
- [Enhancement][ESPnet2][README] Update W&B integration #3278 by @AyushExel
- [Enhancement][ESPnet2][README] Change the default value of use_wandb to False #3287 by @kamo-naoyuki
Bugfix
- [Bugfix][ESPnet1] Fix some bugs in xml2stm.py #3252 by @AshrafMahdhi
- [Bugfix][ESPnet1][Recipe] fix the required number of arguments #3249 by @AshrafMahdhi
- [Bugfix][ESPnet2] Bug fix of accum_grad when grad-nan #3283 by @kamo-naoyuki
- [Bugfix][ESPnet2] Fix #3255 #3257 by @tjysdsg
- [Bugfix][ESPnet2] Fix bug when "--field -5" is passed to espnet2.bin.tokenize_text #3262 by @tjysdsg
- [Bugfix][ESPnet2] Fix typo in asr.sh (espnet2) that might cause bug #3264 by @tjysdsg
- [Bugfix][ESPnet2] Warn ignore_nan_grad with warpctc instead of error. #3298 by @ShigekiKarita
- [Bugfix][ESPnet2][TTS] Fix a bug in the TTS transformer initialization #3251 by @sw005320
Recipe
- [Recipe][ESPnet1][ST] Minor fix of Fisher-Callhome recipe #3305 by @hirofumi0810
- [Recipe][ESPnet2][ASR] ESPnet2 Recipe for swbd #3269 by @yuekaizhang
- [Recipe][ESPnet2][ASR][README] SWBD Result Update #3308 by @roshansh-cmu
- [Recipe][ESPnet2][SE] Add scripts for DNS Interspeech 2020 in ESPNet-se #3259 by @neillu23
- [Recipe][ESPnet2][SE][README] Pretrained model for vctk noisy reverberant recipe #3273 by @LiChenda
- [Recipe][ESPnet2][SE][README] dnsins20: Add README.md and realrecording testing data. #3281 by @neillu23
Refactoring
- [Refactoring][ESPnet2][ASR] Update ctc.py #3292 by @200987299
- [Refactoring][ESPnet1][ASR][MT][CI][README] Delete old pytorch dispatch in espnet1 #3301 by @ShigekiKarita
- [Refactoring][CI][Documentation][Installation][README] Remove travis and add .github/workflows/doc.yml to deploy doc #3294 by @ShigekiKarita
- [Refactoring][CI][Installation][README] Add pytorch 1.9.0 support and remove 1.0.1, 1.1.0, and 1.2.0 #3299 by @ShigekiKarita
Others
- [Documentation][ESPnet2] Add a comment for disabling the attention plot #3258 by @sw005320
- [ESPnet2][Installation][mergify] Follow up for #3299, about pytorch1.9.0 in ci #3310 by @kamo-naoyuki
Acknowledgements
Special thanks to @200987299, @AshrafMahdhi, @AyushExel, @LiChenda, @ShigekiKarita, @hirofumi0810, @kamo-naoyuki, @neillu23, @roshansh-cmu, @sw005320, @tjysdsg, @xu-gaopeng, @yuekaizhang.
Published by kan-bayashi over 4 years ago
Software Design and User Interface of ESPnet-SE++ - ESPnet Version 0.9.10
New Features
- [New Features][ESPnet1][ESPnet2][Installation][README] CTC Segmentation for ESPnet 2 #3087 by @lumaku
Bugfix
- [Bugfix][ESPnet1] Fix merge_short_segments.py #3171 by @hirofumi0810
- [Bugfix][ESPnet1] update layer norm to reflect the dimension variable #3193 by @sw005320
- [Bugfix][ESPnet1][ASR] Fix a bug about variable spelling errors #3208 by @lzm0706
- [Bugfix][ESPnet1][ST] Fix ST-TED data preparation #3167 by @hirofumi0810
- [Bugfix][ESPnet2] Fix a bug of adding noise to the training data. #3220 by @pengchengguo
- [Bugfix][ESPnet2] fix a bug in the CTC mode #3190 by @sw005320
- [Bugfix][ESPnet2] fix typo for AdapterForSoundScpReader #3096 by @deciding
- [Bugfix][ESPnet2] remove find_unused_parameters from DataParallel #3149 by @kamo-naoyuki
- [Bugfix][ESPnet2][ASR] Changed to include nlsyms.txt in the pretrained model #3236 by @kamo-naoyuki
- [Bugfix][ESPnet2][ASR] Fix missing nlsyms.txt for pretrained models #3234 by @lumaku
- [Bugfix][ESPnet2][ASR] Workaround for missing nlsyms.txt #3235 by @kamo-naoyuki
- [Bugfix][ESPnet1][ASR][Installation] GTN CTC bug fix, unit test, and installer #3199 by @brianyan918
- [Bugfix][ESPnet2][README] Update README.md, edit wrong file link. #3164 by @xxjjvxb
Enhancement
- [Enhancement] Added "trans_type" to utils/remove_longshortdata.sh and utils/update_json.sh #3148 by @teinhonglo
- [Enhancement][ESPnet2][SE][README] Update the readme file for the SE demo page. #3225 by @LiChenda
- [Enhancement][ESPnet2][ASR][README] update asr demo #3192 by @ftshijt
Recipe
- [Recipe][ESPnet1][ASR] Fix segmentation in IWSLT21 ASR #3169 by @hirofumi0810
- [Recipe][ESPnet1][ASR] Fix tokenization on TEDLIUM2 in IWSLT21 ASR recipe #3142 by @hirofumi0810
- [Recipe][ESPnet1][ASR] fix add_to_datadir.py in mgb2 recipe #3238 by @AshrafMahdhi
- [Recipe][ESPnet1][ASR] fix recipe bug for swbd #3174 by @yuekaizhang
- [Recipe][ESPnet1][ASR][RNNT] Transducer configs & results for AISHELL-1 #3240 by @yusshino
- [Recipe][ESPnet1][ASR][ST] Fix IWSLT21 recipe for test set evaluation #3155 by @hirofumi0810
- [Recipe][ESPnet1][ESPnet2][README] endangered language recognition espnet2 recipe #3214 by @ftshijt
- [Recipe][ESPnet1][MT] Add IWSLT21 MT recipe #3140 by @hirofumi0810
- [Recipe][ESPnet1][ST] Add IWSLT21 ST recipe #3150 by @hirofumi0810
- [Recipe][ESPnet1][ST] Fix IWSLT evaluation data preparation #3168 by @hirofumi0810
- [Recipe][ESPnet1][ST] IWSLT21 punctuation restoration recipe #3145 by @hirofumi0810
- [Recipe][ESPnet1][ST] Merge short segments in IWSLT test sets #3162 by @hirofumi0810
- [Recipe][ESPnet1][TTS] Fix misspelling in ./egs/jsut/tts1/local/download.sh #3227 by @muramasa2
- [Recipe][ESPnet2][ASR] Normalization for Open_li52 #3215 by @ftshijt
- [Recipe][ESPnet2][SE] ESPnet-SE Recipe for noisy reverberant dataset #3243 by @LiChenda
- [Recipe][ESPnet2][SE][README] Update recipes for speech enhancement task #3153 by @LiChenda
Acknowledgements
Special thanks to @AshrafMahdhi, @LiChenda, @brianyan918, @deciding, @ftshijt, @hirofumi0810, @kamo-naoyuki, @lumaku, @lzm0706, @muramasa2, @pengchengguo, @sw005320, @teinhonglo, @xxjjvxb, @yuekaizhang, @yusshino.
Published by kan-bayashi over 4 years ago
Software Design and User Interface of ESPnet-SE++ - ESPnet Version 0.9.9
New Features
- [New Features][ESPnet2] Speaker diarization implementation in ESPnet #2939 by @ftshijt
- [New Features][ESPnet2] Adding gpu_max_cached_mem_GB in reporter's stats #3057 by @kamo-naoyuki
- [New Features][ESPnet2] add --detect_anomaly option #3035 by @kamo-naoyuki
- [New Features][ESPnet2][SE] Further update to speech enhancement task #2929 by @shincling
Bugfix
- [Bugfix][ESPnet1] Fix a typo in the aishell config #3089 by @sw005320
- [Bugfix][ESPnet1] Fix utils/speed_perturb.sh #3062 by @hirofumi0810
- [Bugfix][ESPnet1] fix #3017 #3022 by @kamo-naoyuki
- [Bugfix][ESPnet1][RNNT] Fix+update RNN encoder #3048 by @b-flo
- [Bugfix][ESPnet1][RNNT] Minor fix for NSC #3030 by @b-flo
- [Bugfix][ESPnet2] Fix #3072 #3073 by @kamo-naoyuki
- [Bugfix][ESPnet2] Fix ESPnet2-TTS conformer backward compatibility #3108 by @kan-bayashi
- [Bugfix][ESPnet2] Fix a bug when use_amp=True without fairscale #3029 by @kamo-naoyuki
- [Bugfix][ESPnet2] Fix logging for pytorch>=1.8 #3056 by @kamo-naoyuki
- [Bugfix][ESPnet2] Fixed backward compatibility issue of new conformer definition #3068 by @hfujihara
- [Bugfix][Installation] Fix a bug of uninstalling typing #3058 by @kamo-naoyuki
- [Bugfix][Installation] Fix setup.py to install filelock #3074 by @kamo-naoyuki
- [Bugfix][Installation] fix the condition to install fairscale #3050 by @kamo-naoyuki
- [Bugfix][Recipe][ESPnet1] Typo fixed for nahuatl recipe #3044 by @ftshijt
- [Bugfix][Recipe][ESPnet1][ASR] Bugfix for download_and_untar for nahuatl #3049 by @ftshijt
- [Bugfix][Recipe][ESPnet1][ESPnet2][TTS] Fix CSMSC download script #3109 by @kan-bayashi
- [Bugfix][Recipe][ESPnet2][TTS][README] fixed typo #3121 #3123 by @kan-bayashi
Enhancement
- [Enhancement][ASR][ESPnet1][RNNT] Update loss report #3110 by @b-flo
- [Enhancement][ESPnet1][RNNT] Fix related to custom encoder and aux task #3045 by @b-flo
- [Enhancement][ESPnet2][Documentation][Installation][README] modification of freezing option for Wav2Vec encoder, add documents #3036 by @simpleoier
Recipe
- [Recipe][ESPnet1][ASR] added results and uploaded models #3063 by @sw005320
- [Recipe][ESPnet1][ASR][ST] fix download for puebla-nahuatl #3039 by @ftshijt
- [Recipe][ESPnet1][MT] Update IWSLT18 MT recipe #3071 by @hirofumi0810
- [Recipe][ESPnet1][ST] IWSLT21-low-resource recipe #3023 by @ftshijt
- [Recipe][ESPnet1][ST] Nahuatl Speech Translation #3034 by @ftshijt
- [Recipe][ESPnet2][ASR][README] Added spgispeech recipe in espnet2 #2986 by @sw005320
- [Recipe][ESPnet2][ASR][README] Update librispeech result #3082 by @kamo-naoyuki
- [Recipe][ESPnet2][ASR][README] Updated ami ihm result #3091 by @kamo-naoyuki
- [Recipe][ESPnet2][ASR][README] added a bpe10000 model and result #3060 by @sw005320
- [Recipe][ESPnet2][ASR][README] gigaspeech #3077 by @sw005320
Refactoring
- [Refactoring][ESPnet1] Refactor layer selection in Transformer #3024 by @hirofumi0810
- [Refactoring][ESPnet1][MT][ST] Unify divide_lang.sh #3066 by @hirofumi0810
- [Refactoring][ESPnet2] Make batch bins sampler faster #3106 by @kamo-naoyuki
- [Refactoring][Installation] Use new pyopenjtalk version #3107 by @kan-bayashi
- [Refactoring][ESPnet1][ESPnet2][Installation][Docker][Documentation] Change '#!/bin/bash' to '#!/usr/bin/env bash' #3059 by @kamo-naoyuki
Other
- [CI][Installation][README][mergify] Using torch=1.8.1 in ci tests #3122 by @kamo-naoyuki
- [CI][Installation][README][mergify] Adding pytorch=1.8.0 to the ci #3046 by @kamo-naoyuki
Acknowledgements
Special thanks to @b-flo, @ftshijt, @hfujihara, @hirofumi0810, @kamo-naoyuki, @kan-bayashi, @shincling, @simpleoier, @sw005320.
Published by kan-bayashi over 4 years ago
Software Design and User Interface of ESPnet-SE++ - ESPnet Version 0.9.8
New Features
- [New Features][ESPnet1][ASR][RNNT] Auxiliary task #2951 by @b-flo
- [New Features][ESPnet1][Recipe] RTF calculation #2942 by @hirofumi0810
- [New Features][ESPnet2] Supporting multiple optimizers in the default trainer #3014 by @kamo-naoyuki
- [New Features][ESPnet2][ASR] Streaming Transformer ASR #2907 by @eml914
- [New Features][ESPnet2][ASR][Installation] add wav2vec_encoder #2889 by @simpleoier
- [New Features][ESPnet2][Documentation][Installation][README] Support sharded training of fairscale #2980 by @kamo-naoyuki
- [New Features][ESPnet2][SE] Add SeparateSpeech API in espnet2/bin/enh_inference.py #2878 by @Emrys365
- [New Features][ESPnet2][TTS][Installation][README] Support phonemizer for various language G2P #2959 by @kan-bayashi
Bugfix
- [Bugfix][CI][Installation] Install warp-ctc using pip>=21.0 #2999 by @ysk24ok
- [Bugfix][ESPnet1] Integration testing for asr_mix was using the wrong config. #3006 by @siddalmia
- [Bugfix][ESPnet1][ASR] Fix model averaging #2910 by @b-flo
- [Bugfix][ESPnet1][ASR] bug fixed for streaming transformer ASR #2981 by @eml914
- [Bugfix][ESPnet1][ASR] builtin ctc modification #3001 by @siddalmia
- [Bugfix][ESPnet1][ASR][CI] Fix transfer learning w/ pre-trained LM + finetuning tutorial #2967 by @b-flo
- [Bugfix][ESPnet1][ASR][RNNT] Fix a condition in TSD #2965 by @b-flo
- [Bugfix][ESPnet1][ASR][Recipe] fix egs/ljspeech/asr1 #2865 #2884 by @kan-bayashi
- [Bugfix][ESPnet1][ASR][Recipe][ST] Fix bug in How2 recipe #2933 by @hirofumi0810
- [Bugfix][ESPnet1][ASR][Refactoring] Fix data sorting in attention/CTC visualization #2883 by @hirofumi0810
- [Bugfix][ESPnet1][Docker] Fix docker error caused by BeamSearchTransducer #2973 by @b-flo
- [Bugfix][ESPnet1][ESPnet2] Fix bugs of our Conformer implementation. #2816 by @pengchengguo
- [Bugfix][ESPnet1][ESPnet2][Refactoring] Fix arguments in dynamic and lightweight conv #3004 by @hirofumi0810
- [Bugfix][ESPnet1][RNNT] fix out_dim definition #2915 by @b-flo
- [Bugfix][ESPnet1][TTS] Fix attention plot bug #2984 #2985 by @kan-bayashi
- [Bugfix][ESPnet1][mergify] swbd run.sh is including dev data in the training set #2977 by @brianyan918
- [Bugfix][ESPnet2] Fix sharded_ddp mode #3015 by @kamo-naoyuki
- [Bugfix][ESPnet2] bug fix for Wav2Vec encoder #2997 by @simpleoier
- [Bugfix][ESPnet2][Documentation] Fix for sharded training with amp #2993 by @kamo-naoyuki
- [Bugfix][ESPnet2][Documentation] Fix sharded training for multiple nodes #2994 by @kamo-naoyuki
- [Bugfix][ESPnet2][SE] quick fix for librimix (SE) data preparation #2982 by @LiChenda
Recipe
- [Recipe][ESPnet1][ASR] Fix dev set in IWSLT21 ASR recipe #3000 by @hirofumi0810
- [Recipe][ESPnet1][ASR] IWSLT'21 ASR recipe #2934 by @hirofumi0810
- [Recipe][ESPnet1][ASR] Update IWSLT21 ASR recipe #2987 by @hirofumi0810
- [Recipe][ESPnet1][ASR] Update the pre-trained Conformer model link of Aishell-1 corpus. #2924 by @pengchengguo
- [Recipe][ESPnet1][ASR] Update transformer training results on common voice dataset #2927 by @wenjie-p
- [Recipe][ESPnet1][ASR][CI][Installation][Refactoring] Update IWSLT18 (ST-TED) ASR recipe #2916 by @hirofumi0810
- [Recipe][ESPnet1][ASR][MT][ST][README] Must-C v2 recipe #2963 by @hirofumi0810
- [Recipe][ESPnet1][ASR][MT][ST][Refactoring] Refactor Fisher-CallHome recipe #2904 by @hirofumi0810
- [Recipe][ESPnet1][ASR][MT][ST][Refactoring] Refactor How2 recipe #2906 by @hirofumi0810
- [Recipe][ESPnet1][ASR][MT][ST][Refactoring] Refactor Must-C recipe #2901 by @hirofumi0810
- [Recipe][ESPnet1][ASR][MT][ST][Refactoring] Refactor libri-trans recipe #2903 by @hirofumi0810
- [Recipe][ESPnet1][ASR][ST][Refactoring] Update IWSLT'19 recipe #2940 by @hirofumi0810
- [Recipe][ESPnet1][ST][CI][Refactoring] Refactor ST recipes #2975 by @hirofumi0810
- [Recipe][ESPnet1][ST][Refactoring] Refactor Mboshi-French corpus #2911 by @hirofumi0810
- [Recipe][ESPnet2][ASR] Open-li52(add language id scoring & text case align for test set) #2938 by @ftshijt
- [Recipe][ESPnet2][ASR][README] Add Russian open STT recipe for ESPnet2 #2972 by @akreal
- [Recipe][ESPnet2][ASR][README] MLS (multi-lingual librispeech) recipe #2869 by @ftshijt
- [Recipe][ESPnet2][ASR][README] Update espnet2 librispeech result #2966 by @kamo-naoyuki
- [Recipe][ESPnet2][ASR][README] added nsc results #2937 by @sw005320
- [Recipe][ESPnet2][ASR][README] fix librispeech model url #2976 by @kamo-naoyuki
- [Recipe][ESPnet2][ASR][README] minor fix of li52 and nsc recipes #2936 by @sw005320
- [Recipe][ESPnet2][ASR][README] update the results of open li52 recipe #2974 by @sw005320
- [Recipe][ESPnet2][SE] Librimix separation results for Conv-Tasnet, 8k, min #2928 by @anogkongda
- [Recipe][ESPnet2][SE][README] Espnet-SE, Speech enhancement recipes #2888 by @LiChenda
Enhancement
- [Enhancement][ESPnet1][ASR] Auto Resampling to 16khz for pretrained models #2969 by @siddalmia
- [Enhancement][ESPnet1][ASR][RNNT] Minor refactoring #2932 by @b-flo
- [Enhancement][ESPnet1][ASR][RNNT][README][CI][Documentation] Refactoring RNNT #2887 by @b-flo
- [Enhancement][ESPnet1][ESPnet2][ASR][LM][MT][TTS] Print total params and trainable params. #2996 by @siddalmia
- [Enhancement][ESPnet1][LM] Add LM options like embedding dropout and tie weights #3010 by @siddalmia
- [Enhancement][ESPnet1][ST][Refactoring] Add the latest RPE implementation to the ST task. #3005 by @pengchengguo
Other
- [CI][README][mergify] Stop circle ci #2978 by @kamo-naoyuki
- [Documentation] Update docs for ESPnet contributing (especially for recipes part) #2905 by @ftshijt
- [Documentation] fix a typo #3016 by @Huang17
- [Installation] Uninstall typing #2979 by @kamo-naoyuki
Acknowledgements
Special thanks to @Emrys365, @Huang17, @LiChenda, @akreal, @anogkongda, @b-flo, @brianyan918, @eml914, @ftshijt, @hirofumi0810, @kamo-naoyuki, @kan-bayashi, @pengchengguo, @siddalmia, @simpleoier, @sw005320, @wenjie-p, @ysk24ok.
Published by kan-bayashi almost 5 years ago
Software Design and User Interface of ESPnet-SE++ - ESPnet Version 0.9.7
New Features
- [New Features][ESPnet1][ASR] Option for GTN CTC mode #2866 by @brianyan918
- [New Features][ESPnet2][SE][README] Update to speech enhancement task #2649 by @LiChenda
- [New Features][ESPnet2][ASR][README] Lightweight Sinc Convolutions for Espnet2 #2768 by @lumaku
- [New Features][ESPnet2][Documentation] --freeze_param option #2787 by @kamo-naoyuki
- [New Features][ESPnet2][TTS][README] Add a new G2P pyopenjtalk_accent_with_pause #2843 by @kan-bayashi
- [New Features][ESPnet2][TTS][README] Add pyopenjtalk_accent g2p for ESPnet2 TTS #2781 by @ota
- [New Features][ESPnet2][TTS][README] Support X-vector based multi-speaker TTS model in ESPnet2 #2800 by @kan-bayashi
Enhancement
- [Enhancement][ESPnet1][ESPnet2] Add version info in args #2841 by @kan-bayashi
- [Enhancement][ESPnet1][ESPnet2][ASR] AMI Recipe (Short UTT checker) #2802 by @ftshijt
- [Enhancement][Installation] add default activate_python.sh #2788 by @kamo-naoyuki
- [Enhancement][Installation] modified: check_install.py #2834 by @kamo-naoyuki
- [Enhancement][Installation][Documentation][ESPnet1][ESPnet2] Change version info location #2840 by @kan-bayashi
Bugfix
- [Bugfix][ESPnet1][ASR] fix greedy decoding #2812 by @b-flo
- [Bugfix][ESPnet2][ASR] Fix the compatibility of the pretrained ASR model #2794 by @kan-bayashi
- [Bugfix][Installation] Fix #2799 #2830 by @kamo-naoyuki
- [Bugfix][Installation] Fix HTS engine installation #2825 by @kan-bayashi
- [Bugfix][Installation] fix the incorrect $PATH setting in tools/extra_path.sh #2833 by @jumon
- [Bugfix][Recipe][ESPnet1][ASR] Minor fixes in CSJ #2837 by @YosukeHiguchi
- [Bugfix][Recipe][ESPnet1][ASR] fix recipe bug for librispeech #2735 by @yuekaizhang
- [Bugfix][Recipe][ESPnet2][ASR] fix a config name #2729 by @sw005320
- [Bugfix][Recipe][ESPnet2][ASR][README] Fix dirha_wsj recipe #2747 by @kamo-naoyuki
- [Bugfix][Recipe][ESPnet2][TTS] Add missing decoding configs in LibriTTS recipe #2827 by @kan-bayashi
Recipe
- [Recipe][ESPnet1][ASR] Add LibriSpeech Conformer results for LibriCSS #2861 by @akreal
- [Recipe][ESPnet1][ASR] Update Commonvoice Recipe with Conformer Settings #2739 by @ftshijt
- [Recipe][ESPnet1][ASR] Update Russian open STT recipe for v1.01 of the dataset #2776 by @akreal
- [Recipe][ESPnet1][ASR] Update models and results of Conformer. #2765 by @pengchengguo
- [Recipe][ESPnet1][ESPnet2][ASR][README] ESPnet2 recipe for commonvoice #2793 by @hchung12
- [Recipe][ESPnet1][VC][README] VCC2020 database #2754 by @unilight
- [Recipe][ESPnet2][ASR][README] Update Dirha WSJ result #2756 by @kamo-naoyuki
- [Recipe][ESPnet2][ASR][README] espnet2 hkust recipe #2863 by @kamo-naoyuki
- [Recipe][ESPnet2][ASR][README] update the AMI result in espnet2 #2817 by @sw005320
- [Recipe][ESPnet2][ASR][README] updated the laborotv result #2750 by @sw005320
- [Recipe][ESPnet2][ASR][README] Update reverb result #2876 by @kamo-naoyuki
- [Recipe][ESPnet2][ASR] Minor fix of laborotv recipe #2877 by @hfujihara
- [Recipe][ESPnet2][TTS] Fix total number of iterations #2813 by @kan-bayashi
- [Recipe][ESPnet2][TTS][README] Add libritts recipe for ESPnet2 #2807 by @kan-bayashi
- [Recipe][ESPnet2][TTS][README] Add x-vector based configs for VCTK #2808 by @kan-bayashi
- [Recipe][ESPnet2][TTS][README] Minor update TTS README #2818 by @kan-bayashi
- [Recipe][ESPnet2][TTS][README] Update JSUT TTS results #2792 by @kan-bayashi
- [Recipe][ESPnet2][TTS][README] Update JSUT results #2809 by @kan-bayashi
- [Recipe][ESPnet2][TTS][README] Update JSUT results #2871 by @kan-bayashi
- [Recipe][ESPnet2][TTS][README] Update LibriTTS results #2842 by @kan-bayashi
- [Recipe][ESPnet2][TTS][README] Update VCTK results #2814 by @kan-bayashi
- [Recipe][ESPnet2][TTS][README] Update libritts results #2828 by @kan-bayashi
- [Recipe][ESPnet2][TTS][README] update latest CSMSC link address #2777 by @meowtech
Other
- [CI][Documentation][Installation] Change warp-ctc and warp-transducer to extra #2748 by @kamo-naoyuki
- [CI][README] Update ci setting #2848 by @kan-bayashi
- [ASR][Documentation][ESPnet2] Sinc Convolutions - add documentation for plot_sinc_filters.py #2782 by @lumaku
- [Documentation][ESPnet1] fixed some typos #2855 by @jumon
- [Documentation][Installation] Update documentation #2757 by @kamo-naoyuki
- [Installation][Refactoring] Move the dependencies coming from recipes #2740 by @kamo-naoyuki
Acknowledgements
Special thanks to @AdolfVonKleist, @LiChenda, @YosukeHiguchi, @akreal, @b-flo, @brianyan918, @ftshijt, @hchung12, @hfujihara, @jumon, @kamo-naoyuki, @kan-bayashi, @lumaku, @meowtech, @ota, @pengchengguo, @sw005320, @unilight, @yuekaizhang.
Published by kan-bayashi almost 5 years ago
Software Design and User Interface of ESPnet-SE++ - ESPnet Version 0.9.6
New Features
- [New Features][ESPnet2] Wandb integration #2707 by @kamo-naoyuki
- [New Features][ESPnet2][ASR] Add ignore_nan_grad option for CTC #2699 by @kamo-naoyuki
- [New Features][ESPnet2][SE] Touching common modules before the main Enh PR #2705 by @LiChenda
Bugfix
- [Bugfix][ESPnet1] bug fix for pytorch1.7 #2656 by @kamo-naoyuki
- [Bugfix][ESPnet1][ESPnet2][TTS] Use nkf in CSMSC data prep #2726 by @kan-bayashi
- [Bugfix][ESPnet2] Fix flooring for global_mvn.py #2623 by @kamo-naoyuki
- [Bugfix][ESPnet2] Fix small bug of tensorboard part #2702 by @kamo-naoyuki
- [Bugfix][ESPnet2] Fix wandb mode with multi gpus #2709 by @kamo-naoyuki
- [Bugfix][ESPnet2][TTS] Fix token averaged feature the case when r > 1 #2704 by @kan-bayashi
Recipe
- [Recipe][ESPnet1] Extend model averaging condition in run scripts #2613 by @b-flo
- [Recipe][ESPnet1][ASR] Enable multi-thread processing of json files. #2681 by @Peidong-Wang
- [Recipe][ESPnet1][ASR] Update KsponSpeech conformer results #2624 by @jubang0219
- [Recipe][ESPnet1][ASR] Update Voxforge with Conformer results #2642 by @YosukeHiguchi
- [Recipe][ESPnet1][ASR] lang was being used before being parsed for user input #2654 by @siddalmia
- [Recipe][ESPnet1][ASR][ESPnet2][Installation][README] espnet2 reverb recipe #2691 by @kamo-naoyuki
- [Recipe][ESPnet1][ASR][README] Update Switchboard with conformer results #2697 by @Emrys365
- [Recipe][ESPnet1][ASR][README] add librispeech conformer w/ speed perturbation + specaug #2617 by @yuekaizhang
- [Recipe][ESPnet2][ASR] ASR template recipe: --srctexts -> --lm_train_text, --bpe_train_text #2660 by @kamo-naoyuki
- [Recipe][ESPnet2][ASR] Add $token_type to asr_tag and lm_tag #2625 by @kamo-naoyuki
- [Recipe][ESPnet2][ASR][Installation][README][Recipe] Laborotv recipe #2703 by @sw005320
- [Recipe][ESPnet2][ASR][README] Add AISHELL w/o LM result #2718 by @kamo-naoyuki
- [Recipe][ESPnet2][ASR][README] ESPnet2 recipe for TIMIT #2568 by @sknadig
- [Recipe][ESPnet2][ASR][README] JSUT conformer recipe achieving 12.0/13.9 CER(%) for dev/eval1 #2720 by @hchung12
- [Recipe][ESPnet2][ASR][README] Update README.md #2659 by @sw005320
- [Recipe][ESPnet2][ASR][README] Update WSJ result #2628 by @kamo-naoyuki
- [Recipe][ESPnet2][ASR][README] espnet2 librispeech with conformer #2687 by @sw005320
- [Recipe][ESPnet2][README] Corpus README in egs2 #2713 by @sw005320
- [Recipe][ESPnet2][README] update egs2/README.md #2719 by @Emrys365
Enhancement
- [Enhancement][Documentation][ESPnet2] Add --init_param option #2680 by @kamo-naoyuki
- [Enhancement][ESPnet1][ASR] Save model snapshot at every epoch even if save_interval_iters > 0 - for model averaging #2637 by @sknadig
- [Enhancement][ESPnet2] Update wandb part #2708 by @kamo-naoyuki
- [Enhancement][ESPnet2][ASR] Add *_stats_dir options in asr.sh #2724 by @kan-bayashi
Documentation
- [Documentation][ESPnet2][README] Update egs2 README #2723 by @kan-bayashi
- [Documentation][ESPnet2][README][TTS] Update README about fine-tuning #2685 by @kan-bayashi
- [Documentation][ESPnet2][README][TTS] Update TTS README.md #2650 by @kan-bayashi
Refactoring
- [Refactoring][ESPnet1][ASR][README] Refactor Mask CTC non-autoregressive ASR #2223 by @YosukeHiguchi
- [Refactoring][ESPnet2] Added unicode support for generated configs #2672 by @Piteryo
Other
- [Installation] python setup.py install -> pip install -e #2619 by @kamo-naoyuki
- [Installation][Refactoring] modify for zsh: tools/extra_path.sh #2696 by @kamo-naoyuki
- [Docker] Docker flags for extra libraries (VC) #2622 by @Fhrozen
Acknowledgements
Special thanks to @Emrys365, @Fhrozen, @LiChenda, @Peidong-Wang, @Piteryo, @YosukeHiguchi, @b-flo, @hchung12, @jubang0219, @kamo-naoyuki, @kan-bayashi, @siddalmia, @sknadig, @sw005320, @yuekaizhang.
Published by kan-bayashi about 5 years ago
Software Design and User Interface of ESPnet-SE++ - ESPnet Version 0.9.5
New Features
- [New Features][ESPnet2][TTS] Support g2p=none for text with phonemes #2551 by @kan-bayashi
- [New Features][ESPnet2][TTS] Add MCD evaluation script for ESPnet2-TTS #2554 by @kan-bayashi
- [New Features][ESPnet1][ST] Conformer End-to-End Speech Translation #2523 by @hirofumi0810
Bugfix
- [Bugfix][ESPnet1] CTC segmentation - package update #2566 by @lumaku
- [Bugfix][ASR][ESPnet1] fix bug about att_ws in multi-enc case #2549 by @lzm0706
- [Bugfix][ESPnet1] Conformer averaging model support for pytorch 1.6 #2604 by @siddalmia
- [Bugfix][ESPnet1][ASR] Set built-in CTC for asr_recog #2588 by @lumaku
- [Bugfix][ESPnet1][ASR][Installation] Transducer float16 loss bug fix #2496 by @GNroy
Refactoring
- [Refactoring][ESPnet1][ASR] Refactor BeamSearchTransducer and ErrorCalculatorTrans #2538 by @b-flo
Recipe
- [Recipe][ESPnet1][ASR] Alignment recipe for CSJ. #2531 by @jnishi
- [Recipe][ESPnet1][ASR] New Recipe for KsponSpeech (Korean spontaneous speech; 969 hours) #2555 by @jubang0219
- [Recipe][ESPnet1][ASR] Update TedLium3 conformer results #2600 by @LiChenda
- [Recipe][ESPnet1][ASR] Update VIVOS models #2574 by @b-flo
- [Recipe][ESPnet1][ASR] Update model link in Puebla-Nahuatl #2607 by @ftshijt
- [Recipe][ESPnet1][ASR] Update tedlium2 with conformer results #2599 by @Emrys365
- [Recipe][ESPnet1][ASR] update the JSUT recipe with conformer #2546 by @sw005320
- [Recipe][ESPnet2][ASR] Add CSJ conformer config #2560 by @kan-bayashi
- [Recipe][ESPnet2][ASR] Add CSJ conformer results #2552 by @kan-bayashi
- [Recipe][ESPnet2][ASR] Small changes for aishell config #2586 by @kamo-naoyuki
- [Recipe][ESPnet2][ASR] Update espnet2 AISHELL results #2580 by @kamo-naoyuki
- [Recipe][ESPnet2][ASR] update JSUT espnet2 with pre-trained models #2563 by @sw005320
- [Recipe][ESPnet2][TTS] Add JSSS recipe for ESPnet2-TTS #2558 by @kan-bayashi
- [Recipe][ESPnet2][TTS] Update ESPnet2 TTS result #2542 by @kan-bayashi
CI
- [CI][Documentation] Support espnet2/bin in sphinx doc. #2544 by @ShigekiKarita
- [CI][Installation][README] Add pytorch1.7.0 ci test #2605 by @kamo-naoyuki
Other
- [Installation] Install warpctc-pytorch wheel when torch version is 1.1 - 1.6 #2547 by @ysk24ok
- [Installation] Modified requirements: "dataclasses; python_version < '3.7'", #2541 by @kamo-naoyuki
- [Installation] Remove pip3 check in setup_python.sh #2567 by @ShigekiKarita
Acknowledgements
Special thanks to @Emrys365, @GNroy, @LiChenda, @ShigekiKarita, @b-flo, @ftshijt, @hirofumi0810, @jnishi, @jubang0219, @kamo-naoyuki, @kan-bayashi, @lumaku, @lzm0706, @siddalmia, @sw005320, @ysk24ok.
Published by kan-bayashi about 5 years ago
Software Design and User Interface of ESPnet-SE++ - ESPnet Version 0.9.4
New Features
- [New Features][ESPnet1][ASR] Transducer v4 #2444 by @b-flo
- [New Features][ESPnet2] Support audio_format=flac.ark, wav.ark #2451 by @kamo-naoyuki
- [New Features][ESPnet2][ASR] Support conformer encoder in ESPnet2 ASR #2515 by @kan-bayashi
Bugfix
- [Bugfix][ESPnet1] Fixed IndexError in BatchBeamSearch.post_process() (#2483) #2484 by @kan-bayashi
- [Bugfix][ESPnet1][LM] fix multigpu bug if pytorch>=1.5 #2492 by @kamo-naoyuki
- [Bugfix][ESPnet2] remove cleaner #2529 by @kamo-naoyuki
- [Bugfix][ESPnet2][TTS] Fix TTS inference bug for GST + Fastspeech2 #2498 by @kan-bayashi
Documentation
- [Documentation] Update espnet2_tutorial.md #2528 by @kamo-naoyuki
- [Documentation] Update espnet2_tutorial.md #2532 by @kamo-naoyuki
- [Documentation] Update espnet2_tutorial.md #2534 by @kamo-naoyuki
- [Documentation] Update notebook submodule #2499 by @kan-bayashi
- [Documentation][ESPnet1] Small fixes for transducer #2514 by @b-flo
- [Documentation][ESPnet2][README][TTS] Update ESPnet2 TTS README #2516 by @kan-bayashi
- [Documentation][README] Update README #2504 by @kan-bayashi
- [Documentation][README][ESPnet1] CTC segmentation - checks for blank chars and RNN models #2535 by @lumaku
Recipe
- [Recipe][ESPnet1][ASR] add conformer results for librispeech #2510 by @yuekaizhang
- [Recipe][ESPnet2][ASR] Update ESPnet2 CSJ Transformer results #2497 by @kan-bayashi
- [Recipe][ESPnet2][TTS] Add results for ESPnet2 TTS #2503 by @kan-bayashi
- [Recipe][ESPnet2][TTS] Update Transformer-TTS config #2494 by @kan-bayashi
- [Recipe][ESPnet2][TTS] Update Transformer-TTS configs #2502 by @kan-bayashi
Refactoring
- [Refactoring] Modify uttid to "${spkid}-${uttid}" for trn files #2527 by @kamo-naoyuki
- [Refactoring][ESPnet1][ASR][LM] Remove all future lines #2481 by @ShigekiKarita
- [Refactoring][ESPnet1][ASR][MT][ST] Unify arguments #2506 by @hirofumi0810
- [Refactoring][ESPnet1][ESPnet2][TTS] Refactor length regulator to improve the speed #2482 by @kan-bayashi
- [Refactoring][ESPnet1][MT][ST] Refactor decoding for translation tasks #2501 by @hirofumi0810
- [Refactoring][ESPnet2] Change add_scalars to add_scalar for tensorboard SummaryWriter #2525 by @kamo-naoyuki
CI
- [CI][ASR] Make test_e2e_asr.py faster #2488 by @ShigekiKarita
- [CI][ASR] Make test_e2e_asr_maskctc.py faster. #2493 by @ShigekiKarita
- [CI][ASR] Make test_recog.py faster #2486 by @ShigekiKarita
- [CI][ESPnet1][ASR] make test_e2e_asr_mulenc.py faster #2480 by @ruizhilijhu
- [CI][ESPnet1][Installation] Update shellcheck url. #2500 by @ShigekiKarita
- [CI][ESPnet2][Installation] Limit test execution time to 2.0 sec #2520 by @ShigekiKarita
- [CI][SE] Make test_beamformer_net.py faster #2489 by @ShigekiKarita
- [CI][SE] shorten test time for tasnet #2491 by @LiChenda
Other
- [Installation] Update h5py version to avoid errors in Python3.8 #2519 by @shigabeev
- [Docker] Docker Updates #2509 by @Fhrozen
Acknowledgements
Special thanks to @Fhrozen, @LiChenda, @ShigekiKarita, @b-flo, @hirofumi0810, @kamo-naoyuki, @kan-bayashi, @lumaku, @ruizhilijhu, @shigabeev, @yuekaizhang.
Published by kan-bayashi about 5 years ago
Software Design and User Interface of ESPnet-SE++ - ESPnet Version 0.9.3
New Features
- [New Features][ESPnet2] Implement --grad_clip_type #2399 by @kamo-naoyuki
- [New Features][ESPnet2][ASR] Implement batch_score() method for ASR decoder and LM #2377 by @kamo-naoyuki
- [New Features][ESPnet2][README][TTS] Support Conformer-based FastSpeech / FastSpeech2 #2413 by @kan-bayashi
Bugfix
- [Bugfix][CI][ESPnet1][ESPnet2] make sure chainer independent #2411 by @kamo-naoyuki
- [Bugfix][CI][ESPnet1][Installation] Revert ctc seg installation #2392 by @kan-bayashi
- [Bugfix][CI][Installation] Fix the installation error in CI #2476 by @kan-bayashi
- [Bugfix][ESPnet1][ASR] Lazy import chainer in asr_utils.py #2407 by @kamo-naoyuki
- [Bugfix][ESPnet1][ASR] asr: Fix recog issue on Transformer CTC model #2394 by @jaesong
- [Bugfix][ESPnet1][MT][ST] Fix score_bleu.sh #2400 by @hirofumi0810
- [Bugfix][ESPnet1][README][Typo] fixed typo in egs/README.md #2473 by @mrazizi
- [Bugfix][ESPnet1][TTS] lazy import chainer: espnet/nets/tts_interface.py #2409 by @kamo-naoyuki
- [Bugfix][ESPnet2] Add missing database in db.sh #2427 by @kan-bayashi
- [Bugfix][ESPnet2] Fix the CommonPreprocessor_multi missing issue #2460 by @LiChenda
- [Bugfix][ESPnet2] Minor fix of egs2/commonvoice/asr1/local/data.sh #2438 by @kamo-naoyuki
- [Bugfix][ESPnet2] fix the directory for init_file_prefix #2412 by @kamo-naoyuki
- [Bugfix][ESPnet2] fix typo of log_level choices #2472 by @glynpu
- [Bugfix][ESPnet2][ASR] Add grep -H option #2388 by @kamo-naoyuki
- [Bugfix][ESPnet2][TTS] Fix wrong sum axis in energy extraction #2469 by @kan-bayashi
- [Bugfix][ESPnet2][Typo] Fix typo in help comment and docstrings #2470 by @kan-bayashi
- [Bugfix][Installation] add warpctc_pytorch version==0.1.2 #2403 by @kamo-naoyuki
Documentation
- [Documentation] Add bug report template #2396 by @sw005320
- [Documentation] Add installation issue template #2397 by @sw005320
- [Documentation] Update espnet2_distributed.md #2418 by @kamo-naoyuki
- [Documentation] Update espnet2_distributed.md #2419 by @kamo-naoyuki
- [Documentation] Update espnet2_training_option.md #2421 by @kamo-naoyuki
- [Documentation] Update faq.md #2431 by @kamo-naoyuki
- [Documentation] Update parallelization.md #2428 by @kamo-naoyuki
- [Documentation][ESPnet2][README] Update README.md #2430 by @kamo-naoyuki
Enhancement
- [Enhancement][ESPnet1][ESPnet2] Add -c option for multi GPUs mode for slurm.conf #2406 by @kamo-naoyuki
- [Enhancement][ESPnet1][Installation] Install warpctc-pytorch wheel when torch version is 1.1, 1.2 or 1.3 #2453 by @ysk24ok
- [Enhancement][ESPnet1][README] ADD CSJ RNN pretrained model #2452 by @jnishi
- [Enhancement][ESPnet2] Update db.sh #2426 by @kamo-naoyuki
- [Enhancement][ESPnet2][TTS] Update ESPnet2 TTS config #2468 by @kan-bayashi
- [Enhancement][ESPnet2][TTS] Update and add fastspeech2 configs #2429 by @kan-bayashi
- [Enhancement][Installation] Add sanity check for setup_cuda_env.sh #2389 by @kamo-naoyuki
- [Enhancement][Installation] Change cudatoolkit to cuda if cuda_version=8.0 #2405 by @kamo-naoyuki
- [Enhancement][Installation] Change to refer https://anaconda.org/pytorch/pytorch/files #2404 by @kamo-naoyuki
- [Enhancement][Installation] Workaround for soundfile issue #2437 by @kamo-naoyuki
Recipe
- [Recipe][ESPnet1][ASR] Add LibriCSS recipe #2246 by @akreal
- [Recipe][ESPnet1][ASR] Update for the Official Split of YM Recipe #2435 by @ftshijt
- [Recipe][ESPnet1][ESPnet2][ASR] Update CommonVoice for Latest Version #2455 by @ftshijt
- [Recipe][ESPnet2][ASR] [zeroth korean] Not to use pipe format if feats_type=raw #2402 by @kamo-naoyuki
- [Recipe][ESPnet2][ASR][README] espnet2 zeroth_korean recipe changing feats_type from fbank_pitch to raw. #2393 by @hchung12
- [Recipe][ESPnet2][README][TTS] Add ESPnet2 TTS finetuning example recipe (JVS) #2465 by @kan-bayashi
CI
- [CI] Add codecov actions. #2467 by @ShigekiKarita
- [CI] Fix hangup of unittests #2424 by @kamo-naoyuki
- [CI] Make espnet2 tts test faster #2461 by @kan-bayashi
- [CI] Make test_e2e_{asr,st,mt}_{transformer,conformer}.py faster. #2464 by @ShigekiKarita
- [CI] Update .gitignore #2434 by @kan-bayashi
- [CI][ESPnet1] Make test_(batch_)beam_search.py faster. #2462 by @ShigekiKarita
- [CI][ESPnet1] Support Debian9 and CentOS7 in Github Actions #2457 by @ShigekiKarita
- [CI][ESPnet1][Installation] Fix HKUST recipe #2440 by @kamo-naoyuki
Acknowledgements
Special thanks to @LiChenda, @ShigekiKarita, @akreal, @ftshijt, @glynpu, @hchung12, @hirofumi0810, @jaesong, @jnishi, @kamo-naoyuki, @kan-bayashi, @mrazizi, @sw005320, @ysk24ok.
Published by kan-bayashi over 5 years ago
Software Design and User Interface of ESPnet-SE++ - ESPnet Version 0.9.2
New Features
- [New Features][ESPnet1] CTC segmentation #2301 by @lumaku
- [New Features][ESPnet2] Support multiple averaged nbest models #2353 by @kamo-naoyuki
- [New Features][ESPnet2] Support recursive add in pack_funcs and add images to packed model #2367 by @kamo-naoyuki
Bugfix
- [Bugfix][ASR][ESPnet1] remove ff_scale from conformer constructor arguments #2356 by @koji-okabe-hub
- [Bugfix][ASR][ESPnet2] use lm_exp instead of lm_tag for inference_tag #2352 by @kamo-naoyuki
- [Bugfix][CI][ESPnet1][Installation] Remove ctc_segmentation temporary #2385 by @kan-bayashi
- [Bugfix][ESPnet1] Fix import error of conformer module #2384 by @kan-bayashi
- [Bugfix][ESPnet1] Fix issue https://github.com/espnet/espnet/issues/2211 #2219 by @Emrys365
- [Bugfix][ESPnet2] Add missing __init__.py #2326 by @kan-bayashi
- [Bugfix][ESPnet2] Fix --out_filename option: format_wav_scp.sh #2348 by @kamo-naoyuki
- [Bugfix][ESPnet2] Fix amp #2362 by @kamo-naoyuki
- [Bugfix][ESPnet2] add egs2/an4/asr1/local/path.sh #2343 by @kamo-naoyuki
- [Bugfix][ESPnet2] fix recursive add: espnet2/main_funcs/pack_funcs.py #2369 by @kamo-naoyuki
- [Bugfix][ESPnet2] remove unused import #2331 by @kamo-naoyuki
- [Bugfix][ESPnet2][Installation][Typo] fix typo #2344 by @kamo-naoyuki
- [Bugfix][ESPnet2][README] Fix typo #2372 by @Piteryo
- [Bugfix][ESPnet2][TTS] make vietnamese_cleaner an option #2341 by @kamo-naoyuki
- [Bugfix][Installation] Fix python version check for chainer #2342 by @kamo-naoyuki
- [Bugfix][Installation] add undefined variable: check_pytorch_cuda_compatibility.py #2361 by @kamo-naoyuki
- [Bugfix][TTS] Fix device allocation error in guided attention loss #2282 #2317 by @kan-bayashi
Documentation
- [Documentation] updated comment on the documentation #2351 by @GauravPandey892
- [Documentation][ESPnet2] Update TTS README #2316 by @kan-bayashi
- [Documentation][ESPnet2][README] Update ESPnet2 TTS README #2376 by @kan-bayashi
- [Documentation][ESPnet2][README][TTS] Update README #2330 by @kan-bayashi
- [Documentation][Installation] Divide setup_python.sh into setup_venv.sh and setup_python.sh #2382 by @kamo-naoyuki
- [Documentation][Installation] add a description about check install. #2360 by @sw005320
- [Documentation][README] CTC segmentation - Demo #2347 by @lumaku
- [Documentation][README] Update README.md #2379 by @kamo-naoyuki
Enhancement
- [Enhancement][ESPnet2] Change the default inference model to averaged model instead of the best #2346 by @kamo-naoyuki
- [Enhancement][ESPnet2][TTS] Add pitch and energy stats in packing #2350 by @kan-bayashi
- [Enhancement][Installation] Add checking for pytorch-cuda compatibility in Makefile #2334 by @kamo-naoyuki
- [Enhancement][Installation] Show raw error message when failed to import packages #2374 by @kamo-naoyuki
Refactoring
- [Refactoring] Apply new version black #2366 by @kamo-naoyuki
- [Refactoring][ASR][ESPnet2] Not to add sp to $asr_exp if --asr_exp option is specified #2368 by @kamo-naoyuki
- [Refactoring][CI][ESPnet1][ESPnet2][Installation] Add installers for sctk and sph2pipe and create tools/extra_path.sh #2332 by @kamo-naoyuki
- [Refactoring][ESPnet1][Recipe] Disable preparation for lm in wsj recipe #2373 by @kamo-naoyuki
- [Refactoring][ESPnet2] Update Task design #2345 by @kamo-naoyuki
- [Refactoring][ESPnet2][SE] Remove unused option from enh.sh:--feats_normalize #2325 by @kamo-naoyuki
Recipe
- [Recipe][ASR][ESPnet1] MGB-2 #2289 by @AmirHussein96
- [Recipe][ASR][ESPnet1] Remove duplicated class definition of Conformer and update some new results of Aishell1 and Switchboard. #2364 by @pengchengguo
- [Recipe][ASR][ESPnet2][README] ASR WSJ RESULT update: Tuning LM #2355 by @kamo-naoyuki
- [Recipe][ASR][ESPnet2][README] add pretrained model link #2378 by @kamo-naoyuki
CI
- [CI][README] Update ubuntu images in circle ci #2349 by @ShigekiKarita
- [CI][mergify] Update .mergify.yml #2333 by @kamo-naoyuki
- [CI][mergify] Update .mergify.yml #2354 by @kamo-naoyuki
Acknowledgements
Special thanks to @AmirHussein96, @Emrys365, @GauravPandey892, @Piteryo, @ShigekiKarita, @kamo-naoyuki, @kan-bayashi, @koji-okabe-hub, @lumaku, @pengchengguo, @sw005320.
Published by kan-bayashi over 5 years ago
Software Design and User Interface of ESPnet-SE++ - ESPnet Version 0.9.1
New Features
- [New Features] Add metric option to checkpoint averaging for Transformer #2259 by @hirofumi0810
- [New Features][ESPnet2] Generate run.sh in the experiment dir for resuming #2284 by @kamo-naoyuki
- [New Features][ESPnet2] Support larger num_iters_per_epoch than the number of batches in small corpus #2255 by @kamo-naoyuki
- [New Features][ESPnet2] Support torch native automatic mixed precision for espnet2 #2257 by @kamo-naoyuki
Documentation
- [Documentation] Update comments in MultiHeadAttention #2266 by @placebokkk
- [Documentation][ESPnet2] append comment in reporter.py #2267 by @kamo-naoyuki
- [Documentation][ESPnet2][README][TTS] Add ESPnet2 TTS recipe document #2312 by @kan-bayashi
Enhancement
- [Enhancement][ESPnet2] Tensorboard stats between iterations #2252 by @kamo-naoyuki
Refactoring
- [Refactoring][ESPnet2] Add some new features and a new recipe for the enhancement task #2238 by @Emrys365
- [Refactoring][Documentation] Remove installation part of Python from Makefile #2245 by @kamo-naoyuki
Recipe
- [Recipe][ASR] aidatatang conformer ESPnet1 recipe #2269 by @nzhoward
- [Recipe][ESPnet2] espnet2 zeroth_korean recipe #2279 by @hchung12
Bugfix
- [Bugfix] Fix #2295 #2311 by @kan-bayashi
- [Bugfix] Minor fix for Makefile #2268 by @kamo-naoyuki
- [Bugfix] Not to install cupy-cuda* for python>=3.8 #2277 by @kamo-naoyuki
- [Bugfix] Remove channel: setup_anaconda.sh #2303 by @kamo-naoyuki
- [Bugfix][ASR] ngram single decoding bug fix #2299 by @qmpzzpmq
- [Bugfix][ASR][ESPnet2] Add missing __init__.py #2292 by @kamo-naoyuki
- [Bugfix][ASR][ESPnet2] decode -> inference #2276 by @kamo-naoyuki
- [Bugfix][ASR][ESPnet2] remove chainer dependency from show_asr_result.sh #2281 by @kamo-naoyuki
- [Bugfix][ESPnet2] Avoid illegal summary name for tensorboard #2294 by @kamo-naoyuki
- [Bugfix][ESPnet2] Fix average_nbest_models for pytorch=1.6 #2283 by @kamo-naoyuki
- [Bugfix][ESPnet2] Fix decode config extension in ESPnet2 CSJ recipe #2258 by @kan-bayashi
- [Bugfix][ESPnet2] Fix for queue-freegpu.pl #2274 by @kamo-naoyuki
- [Bugfix][ESPnet2] Fix samplers about min_batch_size #2305 by @kamo-naoyuki
- [Bugfix][ESPnet2] Workaround for SGE jobname issue #2253 by @kamo-naoyuki
- [Bugfix][ESPnet2] add missing shebang #2306 by @kamo-naoyuki
- [Bugfix][ESPnet2] fix bug of reporter #2263 by @kamo-naoyuki
- [Bugfix][ESPnet2][Recipe] Update zeroth_korean #2308 by @kamo-naoyuki
- [Bugfix][ESPnet2][SE] add --spk-num 1 #2285 by @kamo-naoyuki
- [Bugfix][ESPnet2][distributed] Not to save config.yaml if rank!=0 #2287 by @kamo-naoyuki
Other
- [CI] Remove unnecessary installation when CI #2307 by @kamo-naoyuki
- [CI] Take integration tests into coverage #2254 by @ShigekiKarita
- [CI][ESPnet2] Add coverage measure for espnet2 integration test #2256 by @kamo-naoyuki
- [CI][Installation] Install wheel #2304 by @kamo-naoyuki
Acknowledgements
Special thanks to @Emrys365, @ShigekiKarita, @hchung12, @hirofumi0810, @kamo-naoyuki, @kan-bayashi, @nzhoward, @placebokkk, @qmpzzpmq.
Published by kan-bayashi over 5 years ago
Software Design and User Interface of ESPnet-SE++ - ESPnet Version 0.9.0
New Features
- [New Features][ASR] Non-autoregressive ASR with Mask CTC #2070 by @YosukeHiguchi
- [New Features][ASR] Support Conformer model. #2144 by @pengchengguo
- [New Features][ASR][ST] CTC posterior visualization during training #2221 by @hirofumi0810
- [New Features][ESPnet2] Implement espnet2.bin.zenodo_upload #2168 by @kamo-naoyuki
- [New Features][ESPnet2] Python API for inference #2092 by @kamo-naoyuki
- [New Features][ESPnet2] Support TTS-Transformer in ESPnet2 #2134 by @kan-bayashi
- [New Features][ESPnet2][ASR] Enable batch joint decoding with CTC in recog API v2 #2197 by @takaaki-hori
- [New Features][ESPnet2][SE] Speech Enhancement Frontend for ESPNet2 Phase 1 #2124 by @LiChenda
- [New Features][ESPnet2][TTS] Support FastSpeech for ESPnet2 TTS #2149 by @kan-bayashi
- [New Features][ESPnet2][TTS] Support FastSpeech2 (+FastPitch) #2218 by @kan-bayashi
- [New Features][ESPnet2][TTS] Support GST in ESPnet2 TTS #2139 by @kan-bayashi
- [New Features][README][ASR] CTC forced alignment in E2E ASR Transformer model #2095 by @simpleoier
- [New Features][VC] Voice Transformer Network #2064 by @unilight
Enhancement
- [Enhancement] Fix error when downloading large files using download_from_google_drive.sh #2074 by @unilight
- [Enhancement][ASR] added more beam search info #2130 by @sw005320
- [Enhancement][ESPnet2] Change packed file of espnet2 to zip format #2161 by @kamo-naoyuki
- [Enhancement][ESPnet2] Make read_text faster #2114 by @kamo-naoyuki
- [Enhancement][ESPnet2] RESULTS.md -> README.md #2077 by @kamo-naoyuki
- [Enhancement][ESPnet2] Remove long wave in template recipe #2075 by @kamo-naoyuki
- [Enhancement][ESPnet2] Update ESPnet2 JSUT TTS recipe and TTS template #2110 by @kan-bayashi
- [Enhancement][MT][ST] Fix ST/MT models for compatibility with ASR #2179 by @hirofumi0810
- [Enhancement][ST] Add source case information to json files in ST task #2208 by @hirofumi0810
- [Enhancement][ST] Refactor multi-task learning in ST #2202 by @hirofumi0810
Recipe
- [Recipe][ASR] Add aidatatang_200zh recipe #2122 by @nzhoward
- [Recipe][ASR] Add chime6 info #2250 by @sw005320
- [Recipe][ASR] CHiME-6 recipe #2171 by @GNroy
- [Recipe][ASR] Fix a bug in espnet wsj recipe. #2145 by @houwenxin
- [Recipe][ASR] New Recipe for Yoloxóchitl-Mixtec (SLR89) #2085 by @ftshijt
- [Recipe][ASR] Support averaging model for Conformer. #2244 by @pengchengguo
- [Recipe][ASR] Updated model after tuning aidatatang_200zh recipe #2204 by @nzhoward
- [Recipe][ASR] created a recipe to run asr on ljspeech #1996 by @ibkuroyagi
- [Recipe][ASR] update model link (add pre-trained bpe model and lm model) #2101 by @ftshijt
- [Recipe][ESPnet2][ASR] espnet2 librispeech recipe #2109 by @sw005320
- [Recipe][ESPnet2][ASR] espnet2 librispeech v2 #2189 by @sw005320
- [Recipe][ESPnet2][ASR] update espnet2 aishell results #2150 by @Cescfangs
- [Recipe][ESPnet2][ASR][TTS] fix devset/evalsets issues #2142 by @sw005320
- [Recipe][ESPnet2][TTS] Add ESPnet2 CSMSC TTS recipe #2129 by @kan-bayashi
- [Recipe][ESPnet2][TTS] Add ESPnet2 LJSpeech recipe #2117 by @kan-bayashi
- [Recipe][ESPnet2][TTS] Add VCTK recipe for ESPnet2 TTS #2165 by @kan-bayashi
- [Recipe][ESPnet2][TTS] Create espnet2 jsut/tts recipe #2047 by @kamo-naoyuki
Refactoring
- [Refactoring][ESPnet2] Change stats_dir naming not to overwrite #2111 by @kan-bayashi
- [Refactoring][ESPnet2] Move modules #2086 by @kamo-naoyuki
- [Refactoring][ESPnet2] Remove $KALDI_ROOT/tools/env.sh from path.sh #2242 by @kamo-naoyuki
- [Refactoring][ESPnet2] Several updates for pretrain model #2212 by @kamo-naoyuki
- [Refactoring][ESPnet2] Update Makefile #2225 by @kamo-naoyuki
Documentation
- [README] Fix URL in README #2090 by @kan-bayashi
- [README] Update README about TTS #2079 by @kan-bayashi
- [README] Update README.md #2046 by @kamo-naoyuki
- [README] Update README.md #2067 by @kamo-naoyuki
- [README] Update README.md #2243 by @kamo-naoyuki
- [README] Update citation #2206 by @hirofumi0810
- [README] Update installation.md #2233 by @kamo-naoyuki
- [README][ESPnet2] Update egs2/TEMPLATE/README.md #2098 by @kamo-naoyuki
Bugfix
- [Bugfix] Add cupy.done in make python #2091 by @kan-bayashi
- [Bugfix] Append a missing space in cmd-line args in utils/dump_pcm.sh #2209 by @yistLin
- [Bugfix] Fix Makefile #2097 by @kamo-naoyuki
- [Bugfix] Fix minor bug of Makefile #2055 by @kamo-naoyuki
- [Bugfix] Fix old model compatibility #2048 #2060 #2063 by @kan-bayashi
- [Bugfix] Fix pretrained model #2053 #2069 by @kan-bayashi
- [Bugfix] Fix pyopenjtalk installation #2108 by @kan-bayashi
- [Bugfix] Fix typo in run.sh of TTS recipes #2216 by @hirofumi0810
- [Bugfix] Update Makefile to disable cupy for cuda=10.2 or later #2230 by @kamo-naoyuki
- [Bugfix] fix path of PESQ #2058 by @kamo-naoyuki
- [Bugfix] scorerinterface warning English correction #2076 by @qmpzzpmq
- [Bugfix][CI] Fix bug in attention plotting #2185 by @hirofumi0810
- [Bugfix][CI] Freeze the matplotlib version with 3.1.0 #2181 by @sw005320
- [Bugfix][CI] fix integration_test_ctc_align_wav.bats with a small model #2170 by @simpleoier
- [Bugfix][CI] temporarily disable subsample 6 and 8 tests #2205 by @sw005320
- [Bugfix][CI][MT][ST] Add integration test for ST/MT tasks #2210 by @hirofumi0810
- [Bugfix][ESPnet2] Add missing path.sh in egs2/vctk/tts1 #2167 by @kan-bayashi
- [Bugfix][ESPnet2] Fix TTS inference #2222 by @kan-bayashi
- [Bugfix][ESPnet2] Fix tts_inference when feats_extract is None #2176 by @kan-bayashi
- [Bugfix][ESPnet2] Fix bug for feats_type=extracted #2087 by @kamo-naoyuki
- [Bugfix][ESPnet2] Fix bug of iterable dataset when num_workers>=1 #2081 by @kamo-naoyuki
- [Bugfix][ESPnet2] Fix bug when espnet2/bin/tokenize_text.py --cutoff or --vocabulary_size is used #2158 by @kamo-naoyuki
- [Bugfix][ESPnet2] Fix log: benchmark -> deterministic #2080 by @kamo-naoyuki
- [Bugfix][ESPnet2] Implement configargparse in espnet2 #2157 by @kamo-naoyuki
- [Bugfix][ESPnet2] Select torchaudio version according to torch version #2214 by @kamo-naoyuki
- [Bugfix][ESPnet2] avoid UnboundLocalError when lm is not loaded #2227 by @kamo-naoyuki
- [Bugfix][ESPnet2] fix #2050 #2051 by @kamo-naoyuki
- [Bugfix][ESPnet2] fix #2198: PhonemeTokenizer can't perform with multiprocessing #2201 by @kamo-naoyuki
- [Bugfix][ESPnet2] fix best_model_criterion: wsj/asr1/conf/tuning/train_lm.yaml #2153 by @kamo-naoyuki
- [Bugfix][ESPnet2] fix bug of lm.py #2056 by @kamo-naoyuki
- [Bugfix][ESPnet2] fix the stage number: enh.sh #2220 by @kamo-naoyuki
- [Bugfix][ESPnet2] fix: decode_config -> inference_config #2239 by @kamo-naoyuki
- [Bugfix][ESPnet2][Recipe] Not removing short/long utterances for eval_sets #2112 by @kamo-naoyuki
- [Bugfix][ESPnet2][SE] Fix bugs in espnet2/enh and format related directory structures #2215 by @Emrys365
- [Bugfix][ESPnet2][TTS] Fix feature extractor of TTS for compatibility #2102 by @kamo-naoyuki
Acknowledgements
Special thanks to @Cescfangs, @Emrys365, @GNroy, @LiChenda, @YosukeHiguchi, @ftshijt, @hirofumi0810, @houwenxin, @ibkuroyagi, @kamo-naoyuki, @kan-bayashi, @nzhoward, @pengchengguo, @qmpzzpmq, @simpleoier, @sw005320, @takaaki-hori, @unilight, @yistLin.
Scientific Software - Peer-reviewed
- Python
Published by kan-bayashi over 5 years ago
Software Design and User Interface of ESPnet-SE++ - ESPnet Version 0.8.0
ESPnet2
- [ESPnet2] Solve memory issue with super large corpus training #1972 by @kamo-naoyuki
- [ESPnet2] Added model parameter count to trainer #1867 by @SeanNaren
- [ESPnet2] Refactoring espnet2/utils/fileio.py -> espnet2/fileio #1807 by @kamo-naoyuki
New Features
- [New Features] Lightweight and Dynamic Convolutions. #1599 by @yuyfujit
- [New Features] Implement Ngram scorer #1946 by @qmpzzpmq
- [New Features] resampling in utils/compute-fbank-feats.py and utils/compute-stft-feats.py #2035 by @kamo-naoyuki
Enhancement
- [Enhancement] Ngram scorer update #1992 by @qmpzzpmq
Documentation
- [Documentation] fix a typo for the decoder add_argument_group #2030 by @sw005320
- [Documentation] Update multiple GPU descriptions. #2016 by @sw005320
- [Documentation] Finetuning doc + freezing parameters option #1897 by @b-flo
Bugfix
- [Bugfix] Fix memory issue when resuming #2040 by @kamo-naoyuki
- [Bugfix] fixed typo in cmvn.py #1988 by @gullyboy007
- [Bugfix] update notebook #1986 by @ShigekiKarita
- [Bugfix] Fix freezing modules (when using multi-gpu) #1983 by @atozto9
- [Bugfix] Fix BLEU/PPL calculation during training #2009 by @hirofumi0810
- [Bugfix] Fix download file extension #2042 by @takenori-y
- [Bugfix] fix tedlium2/3 model link #2032 by @sw005320
- [Bugfix] Fix bug for pure Transformer-CTC #2023 by @hirofumi0810
- [Bugfix] li42 recipe: add li42 results; fix bug in adding language id "zh_TW" #1950 by @houwenxin
CI
- [CI] Add espnet2 in ci/doc.sh #1976 by @ShigekiKarita
- [CI] Add test for pytorch1.5 #1881 by @kamo-naoyuki
Acknowledgements
Special thanks to @SeanNaren, @ShigekiKarita, @atozto9, @b-flo, @gullyboy007, @hirofumi0810, @houwenxin, @kamo-naoyuki, @qmpzzpmq, @sw005320, @takenori-y, @yuyfujit.
Published by kan-bayashi over 5 years ago
Software Design and User Interface of ESPnet-SE++ - ESPnet Version 0.7.0
The ESPnet project is moving on to a new endeavor! We have launched espnet2, which aims to improve modularity (Chainer-free and Kaldi-free), provide a more customizable trainer, support distributed training, and improve scalability. The effort is led mainly by @kamo-naoyuki, whose work and leadership made it possible. espnet2 is one of the outcomes of our ESPnet hackathon in Tokyo in 2019, which involved extensive discussion of the design, new features, and community contributions. It currently supports the main ASR recipes (with a well-designed recipe template) and a limited set of TTS recipes. We maintain both espnet1 and espnet2, but development will gradually shift to espnet2. The ESPnet project is accelerating further!
ESPnet2
- [ESPnet2] keep the latest model #1769 by @kamo-naoyuki
- [ESPnet2] Remove "E2E" from all comments #1766 by @kamo-naoyuki
- [ESPnet2] Refactoring for ESPnetDataset #1758 by @kamo-naoyuki
- [ESPnet2] Implement SpecAug for ESPnet2 #1746 by @kamo-naoyuki
- [ESPnet2] Implement BatchBinSampler #1742 by @kamo-naoyuki
- [ESPnet2] Support torch_optimizer #1739 by @kamo-naoyuki
- [ESPnet2] Log rotation for launch.py #1737 by @kamo-naoyuki
- [ESPnet2] Change the type of --chunk_length to str_or_int #1733 by @kamo-naoyuki
- [ESPnet2] Change cudnn deterministic mode to default #1732 by @kamo-naoyuki
- [ESPnet2] Add wsj results for espnet2 #1724 by @kamo-naoyuki
- [ESPnet2] Show estimated time to finish #1717 by @kamo-naoyuki
- [ESPnet2] Add --name option for training job #1714 by @kamo-naoyuki
- [ESPnet2] Show the log file when training process is failed: espnet2.bin.launch.py #1713 by @kamo-naoyuki
- [ESPnet2] --max_length -> --fold_length #1712 by @kamo-naoyuki
- [ESPnet2] Double quoter for NCCL_SOCKET_IFNAME #1706 by @kamo-naoyuki
- [ESPnet2] Save apex state in checkpoint and support apex optimizer #1705 by @kamo-naoyuki
- [ESPnet2] Update asr.sh #1694 by @zh794390558
- [ESPnet2] Update ctc.py #1688 by @zh794390558
- [ESPnet2] Update launch.py #1681 by @zh794390558
- [ESPnet2] Update asr.sh #1678 by @zh794390558
- [ESPnet2] --keep_n_best_checkpoints -> --keep_nbest_models #1647 by @kamo-naoyuki
- [ESPnet2] Avoid deprecated warning: reduction="none" #1510 by @kamo-naoyuki
- [ESPnet2] Minor change for speed perturbation #1627 by @kamo-naoyuki
- [ESPnet2] Fix how2 recipe #1620 by @kamo-naoyuki
- [ESPnet2] Fix recipes #1617 by @kamo-naoyuki
- [ESPnet2] Renaming #1610 by @kamo-naoyuki
- [ESPnet2] Implement chunk iterator #1608 by @kamo-naoyuki
- [ESPnet2] Update voxforge RESULTS #1601 by @kamo-naoyuki
- [ESPnet2] vivos recipe: --audio_format wav #1592 by @kamo-naoyuki
- [ESPnet2] Lower python requirements to 3.6 #1565 by @kamo-naoyuki
- [ESPnet2] dirha_wsj recipe for espnet2 #1556 by @yuekaizhang
- [ESPnet2] Update AISHELL ASR Recipe #1549 by @Emrys365
- [ESPnet2] Remove short data #1531 by @kamo-naoyuki
- [ESPnet2] [WIP] Update JSUT ASR Recipe #1529 by @YosukeHiguchi
- [ESPnet2] Update HOW2 recipe #1522 by @b-flo
- [ESPnet2] [WIP] Update CSJ ASR Recipe #1520 by @YosukeHiguchi
- [ESPnet2] Change NoamLR to deprecated and implement WarmupLR #1519 by @kamo-naoyuki
- [ESPnet2] Implement --max_cache_size option #1509 by @kamo-naoyuki
- [ESPnet2] distributed training #1506 by @kamo-naoyuki
- [ESPnet2] ESPNet2 Recipe Update -- commonvoice, babel, ami #1504 by @ftshijt
- [ESPnet2] Refactoring #1494 by @kamo-naoyuki
- [ESPnet2] Fix ci of flake8 part #1491 by @kamo-naoyuki
- [ESPnet2] Tensorboard, --num_iters_per_epoch, etc. #1487 by @kamo-naoyuki
- [ESPnet2] Fix espnet2.bin.pack #1486 by @kamo-naoyuki
- [ESPnet2] show_result.sh #1478 by @kamo-naoyuki
- [ESPnet2] Pack and Unpack model #1477 by @kamo-naoyuki
- [ESPnet2] collect-stats mode, trainer class, etc. #1462 by @kamo-naoyuki
- [ESPnet2] add test codes for asr decoders #1445 by @kamo-naoyuki
- [ESPnet2] Integrate Griffin-Lim with tts_decode() #1442 by @kan-bayashi
- [ESPnet2] Update ASR recipe #1439 by @kan-bayashi
- [ESPnet2] Update TTS recipes #1430 by @kan-bayashi
- [ESPnet2] Disable wer/cer calculation when training #1547 by @kamo-naoyuki
- [ESPnet2] Change CTC default to builtin #1546 by @kamo-naoyuki
- [ESPnet2] Update chime4 asr1 Recipe #1570 by @yuekaizhang
- [ESPnet2] Create documentation for espnet2 #1710 by @kamo-naoyuki
- [ESPnet2] shellcheck for local/data.sh #1524 by @kamo-naoyuki
- [ESPnet2] commonvoice: RESULTS.md -> README.md #1797 by @kamo-naoyuki
Bugfix
- [Bugfix] % -> percent: espnet2/tasks/abs_task.py #1767 by @kamo-naoyuki
- [Bugfix] Fix gpu mode for tts_inference.py #1755 by @kamo-naoyuki
- [Bugfix] Fix SubReporter #1748 by @kamo-naoyuki
- [Bugfix] Fix calculate_all_attentions for espnet2 #1747 by @kamo-naoyuki
- [Bugfix] Not to create the averaged model if --keep_nbest_models=1 #1744 by @kamo-naoyuki
- [Bugfix] Fix --best_model_criterions #1743 by @kamo-naoyuki
- [Bugfix] Fix the gpu device when resuming #1731 by @kamo-naoyuki
- [Bugfix] Fix error log for espnet2/bin/launch.py #1730 by @kamo-naoyuki
- [Bugfix] Disable CUDNN deterministic for CTC: espnet2/asr/ctc.py #1720 by @kamo-naoyuki
- [Bugfix] Update default.py #1698 by @zh794390558
- [Bugfix] Fix chunk iterator and refactoring for distributed training #1685 by @kamo-naoyuki
- [Bugfix] Update vgg_rnn_encoder.py #1676 by @zh794390558
- [Bugfix] [ESPnet2] chmod +x: run.sh for JSUT #1628 by @kamo-naoyuki
- [Bugfix] [ESPnet2] Remove nlsyms when word scoring #1614 by @kamo-naoyuki
- [Bugfix] [ESPnet2] Fix setup.sh #1596 by @kamo-naoyuki
- [Bugfix] [ESPnet2] Fix launch.py for slurm #1588 by @kamo-naoyuki
- [Bugfix] [ESPnet2] Fix ci for local/data.sh #1572 by @kamo-naoyuki
- [Bugfix] [ESPnet2] Fix nj of scripts/audio/format_wav_scp.sh #1550 by @kamo-naoyuki
- [Bugfix] [ESPnet2] Use load_scp_sequential in format_wav_scp.py #1541 by @kamo-naoyuki
- [Bugfix] [ESPNet2] Minor fix for CSJ recipe #1540 by @kamo-naoyuki
- [Bugfix] [ESPnet2] Fix transformer #1539 by @kamo-naoyuki
- [Bugfix] [ESPnet2] fix rnn_type when bidirectional is used #1533 by @kamo-naoyuki
- [Bugfix] [ESPnet2] Fix format_wav_scp.py #1532 by @kamo-naoyuki
- [Bugfix] [ESPnet2] Fix bug of using GPU even if CPU mode #1526 by @kamo-naoyuki
- [Bugfix] [ESPnet2] Fix --accum_grad #1525 by @kamo-naoyuki
- [Bugfix] [ESPnet2] Fix voxforge config #1511 by @kamo-naoyuki
- [Bugfix] [ESPnet2] Bug fix of splitting files for collect_stats mode #1505 by @kamo-naoyuki
- [Bugfix] fix to use queue.conf #1431 by @sw005320
- [Bugfix] [ESPnet2] Fix a bug in TTS #1428 by @kan-bayashi
- [Bugfix] [ESPnet2] Refactor Encoder and Decoder and bug fix #1427 by @kamo-naoyuki
- [Bugfix] [ESPnet2] Fix bug of text-chars converter #1426 by @kamo-naoyuki
- [Bugfix] Optionize trans_type in egs/ljspeech/tts2 #1789 by @kan-bayashi
- [Bugfix] bugfix in ljspeech/tts2 #1783 by @beckgom
- [Bugfix] missing argument for local/data_prep.sh added #1782 by @beckgom
- [Bugfix] avoid sentencepiece==0.1.90 #1923 by @kamo-naoyuki
- [Bugfix] FIX E523,E541,E741 #1918 by @kamo-naoyuki
- [Bugfix] fix reverse option for cmvn #1906 by @magictron
- [Bugfix] Error handling for Transformer with CTC-based VAD #1875 by @takenori-y
- [Bugfix] Revert deletion of init files #1842 by @Fhrozen
- [Bugfix] fix the missing link of tedlium3 #1841 by @sw005320
- [Bugfix] Add test for torch>1.1 #1840 by @kamo-naoyuki
- [Bugfix] Fix #1808: change the argument order of --batch_type for collect stat… #1810 by @kamo-naoyuki
- [Bugfix] Change to configargparse>=1.2.1 #1803 by @kamo-naoyuki
- [Bugfix] typo fixed for attention type #1793 by @beckgom
- [Bugfix] fix https://github.com/espnet/espnet/issues/1780 #1784 by @qmeeus
- [Bugfix] Fix bug of espnet2 asr_inference.py #1952 by @kamo-naoyuki
- [Bugfix] Minor fix of import place and comments #1959 by @kan-bayashi
New Features
- [New Features] Add utils/translate_wav.sh #1530 by @ShigekiKarita
- [New Features] Batch beam search V2 for Transformer (no CTC) #1402 by @ShigekiKarita
Enhancement
- [Enhancement] Support multiple sentences in synth_wav.sh #1788 by @kan-bayashi
- [Enhancement] fix+update transducer #1760 by @b-flo
Documentation
- [Documentation] Update notebook #1963 by @kan-bayashi
- [Documentation] Update installation manual #1960 by @kan-bayashi
- [Documentation] Update installation.md #1957 by @kamo-naoyuki
- [Documentation] Add note in synth_wav.sh #1785 by @kan-bayashi
- [Documentation] Update docs #1954 #1955 by @kamo-naoyuki
- [Documentation] Update docs #1938 by @kamo-naoyuki
- [Documentation] docs: added fbank link to the experiment readme #1910 by @kdubovikov
Recipe
- [Recipe] Added some TIMIT results #1819 by @sknadig
- [Recipe] add recipe for French Polyphone: ELRA-S0030_02 #1711 by @AdolfVonKleist
- [Recipe] Use espnet_tts_frontend #1794 by @kamo-naoyuki
CI
- [CI] Use cache in actions #1917 by @ShigekiKarita
- [CI] Apply black #1850 by @kamo-naoyuki
- [CI] Create .mergify.yml #1813 by @kamo-naoyuki
Acknowledgements
Special thanks to @AdolfVonKleist, @Emrys365, @Fhrozen, @ShigekiKarita, @YosukeHiguchi, @beckgom, @b-flo, @ftshijt, @kamo-naoyuki, @kan-bayashi, @kdubovikov, @magictron, @qmeeus, @sknadig, @sw005320, @takenori-y, @yuekaizhang, @zh794390558.
Published by kan-bayashi over 5 years ago
Software Design and User Interface of ESPnet-SE++ - ESPnet Version 0.6.3
New Features
- [New Features] VCC2020 baseline recipe #1641 by @unilight
- [New Features] Embed defaultlm #1623 by @qmpzzpmq
Enhancement
- [Enhancement] add test -d $(KALDI): tools/Makefile #1718 by @kamo-naoyuki
- [Enhancement] Add option to load pretrained model in TTS #1639 by @kan-bayashi
- [Enhancement] Add reverse_direction option to MT #1658 by @hirofumi0810
Recipe
- [Recipe] Remove unnecessary lines on Fisher-CallHome Spanish #1650 by @hirofumi0810
- [Recipe] Add the Aishell2 recipe for the master branch. #1615 by @pengchengguo
- [Recipe] Reformat the RESULTS.md in vivos #1689 by @sw005320
Documentation
- [Documentation] Added multiple GPU TIPS #1734 by @sw005320
- [Documentation] added pure attention decoding TIPS #1725 by @sw005320
Docker
- [Docker] Docker local updates #1677 by @Fhrozen
- [Docker] Docker updates #1624 by @Fhrozen
Bugfix
- [Bugfix] fix #1751 #1779 by @qmpzzpmq
- [Bugfix] Fix v.0.3.0 pretrained Transformer model compatibility #1778 by @ShigekiKarita
- [Bugfix] Fix torch.ctc not implemented in float16 by casting float32 #1777 by @ShigekiKarita
- [Bugfix] Workaround for bug of configargparse==1.2 #1764 by @kamo-naoyuki
- [Bugfix] change train_iter to be the dataloader object #1741 by @bobchennan
- [Bugfix] fix #1634 #1719 by @kamo-naoyuki
- [Bugfix] [VCC2020 baseline] Extra reference set #1684 by @unilight
- [Bugfix] missing torch version in check_install.py #1675 by @beckgom
- [Bugfix] Fix model link in the tedlium2 recipe #1662 by @sw005320
- [Bugfix] Update Install for Pytorch version #1659 by @Fhrozen
- [Bugfix] Fix lm compatibility for v2 #1653 by @kan-bayashi
- [Bugfix] correct results with builtin CTC and PyTorch 1.3 in WSJ recipe #1652 by @Emrys365
- [Bugfix] Fix lm backward compatibility #1649 by @kan-bayashi
- [Bugfix] fix #1604 #1626 by @TitouanT
- [Bugfix] Fix a bug in csmsc recipe #1618 by @kan-bayashi
- [Bugfix] Update e2e_asr_common.py #1735 by @zh794390558
- [Bugfix] remove non-available options #1738 by @sw005320
Acknowledgements
Special thanks to @Emrys365, @Fhrozen, @ShigekiKarita, @TitouanT, @beckgom, @bobchennan, @hirofumi0810, @kamo-naoyuki, @kan-bayashi, @pengchengguo, @qmpzzpmq, @sw005320, @unilight, @zh794390558.
Published by kan-bayashi over 5 years ago
Software Design and User Interface of ESPnet-SE++ - ESPnet Version 0.6.2
New Features
- [New Features] Transducer v3 (w/ transformer support for encoder/decoder) #1422 by @b-flo
- [New Features] Improving LM training (custom optimizer, custom scheduler, Transformer LM, etc) #1246 by @ShigekiKarita
Enhancement
- [Enhancement] Add MelGAN pretrained model and support in demo notebook #1581 by @kan-bayashi
Recipe
- [Recipe] Update fisher-callhome results #1606 by @hirofumi0810
- [Recipe] Update run_rnnt.sh #1602 by @qmpzzpmq
- [Recipe] Upload Must-C models #1594 by @hirofumi0810
- [Recipe] Upload Libri trans models #1569 by @hirofumi0810
- [Recipe] Upload How2 models #1568 by @hirofumi0810
- [Recipe] Add Mboshi-French corpus #1545 by @hirofumi0810
- [Recipe] Update WSJ results using PyTorch 1.3.1 and builtin CTC #1527 by @Emrys365
- [Recipe] [WIP] IWSLT2016 Recipe #1492 by @butsugiri
- [Recipe] Update for Common Voice recipe & Multilingual training recipe #1485 by @ftshijt
- [Recipe] [WIP] DiPCo Recipe #1472 by @Fhrozen
Documentation
- [Documentation] Support markdown-table for sphinx #1611 by @kamo-naoyuki
- [Documentation] update docs & README.md #1605 by @kamo-naoyuki
- [Documentation] fix a link within README.md #1584 by @sw005320
- [Documentation] Add MT result #1576 by @butsugiri
- [Documentation] update readme to include Linux installation guides from CI #1567 by @sw005320
- [Documentation] Update WSJ results in the main README.md #1537 by @Emrys365
Bugfix
- [Bugfix] Fix a typo in AMI script? #1595 by @HuangZiliAndy
- [Bugfix] ruopenstt recipe bug fix #1589 by @qmpzzpmq
- [Bugfix] Fix pure CTC decoding #1580 by @takaaki-hori
- [Bugfix] fix snapshot/model test condition #1577 by @IceCreamWW
- [Bugfix] Fix IWSLT16 Script Permission #1543 by @butsugiri
- [Bugfix] Fix bug in MT training script #1515 by @hirofumi0810
- [Bugfix] Use Markdown table instead for WER results #1514 by @lijunzh
- [Bugfix] Fix a compatibility problem with PyTorch 1.3.0 in ESPnet (v0.6.0) #1421 by @Emrys365
Acknowledgements
Special thanks to @Emrys365, @Fhrozen, @HuangZiliAndy, @IceCreamWW, @ShigekiKarita, @b-flo, @butsugiri, @ftshijt, @hirofumi0810, @kamo-naoyuki, @kan-bayashi, @lijunzh, @qmpzzpmq, @sw005320, @takaaki-hori.
Published by kan-bayashi almost 6 years ago
Software Design and User Interface of ESPnet-SE++ - ESPnet Version 0.6.1
Happy new year!
New Features
- [New Features] Transformer NMT #1479 by @hirofumi0810
- [New Features] Support knowledge distillation in FastSpeech training #1415 by @kan-bayashi
- [New Features] Support attention constraint for Tacotron 2 #1407 by @kan-bayashi
Enhancement
- [Enhancement] Add focus rate logging in decoding #1412 by @kan-bayashi
- [Enhancement] Support Tacotron 2 as a teacher of FastSpeech #1406 by @kan-bayashi
- [Enhancement] Support length-weighted normalization in loss calculation #1397 by @kan-bayashi
- [Enhancement] Transformer End-to-End Speech Translation #1348 by @hirofumi0810
Recipe
- [Recipe] Add LM training/decoding in swbd recipe #1463 by @YosukeHiguchi
- [Recipe] Add Fisher-CallHome asr1b recipe #1390 by @hirofumi0810
- [Recipe] RECIPE JESC for MT #1346 by @Fhrozen
Documentation
- [Documentation] added interspeech 2019 tutorial link and performed spell check #1476 by @sw005320
- [Documentation] Updated README in ljspeech about FastSpeech training #1468 by @kan-bayashi
- [Documentation] Add knowledge dist based FastSpeech link in README #1465 by @kan-bayashi
Refactoring
- [Refactoring] Unify TTS Transformer mask with ASR Transformer #1470 by @kan-bayashi
Bugfix
- [Bugfix] fixed a small problem in run.sh #1466 by @Peidong-Wang
- [Bugfix] Fix wrong SC2026 fixing #1458 by @kan-bayashi
- [Bugfix] Fix multi-encoder ASR integration test #1432 by @ShigekiKarita
- [Bugfix] Fix wrong type float -> int #1413 by @kan-bayashi
- [Bugfix] Fix missing key error in Tacotron2 #1408 by @kan-bayashi
- [Bugfix] TransformerST on Fisher-Callhome #1398 by @hirofumi0810
- [Bugfix] fix rnnlm load bug #1391 by @Cescfangs
- [Bugfix] Fix gradient accumulation #1388 by @hirofumi0810
Acknowledgements
Special thanks to @Cescfangs, @Fhrozen, @Peidong-Wang, @ShigekiKarita, @YosukeHiguchi, @hirofumi0810, @kan-bayashi, @sw005320.
Published by kan-bayashi almost 6 years ago
Software Design and User Interface of ESPnet-SE++ - ESPnet Version 0.6.0
New Features
- [New Features] Support Parallel WaveGAN #1333 by @kan-bayashi
- [New Features] Support save snapshot by iteration #1204 by @fanlu
- [New Features] Multi-encoder architecture with hierarchical attention and per-encoder CTC #1193 by @ruizhilijhu
- [New Features] Support multiple inputs #1180 by @ruizhilijhu
- [New Features] Add E2E-ST specific modules #1139 by @hirofumi0810
Enhancement
- [Enhancement] Fixing compatibility problems with PyTorch 1.3.0 in ESPnet (v0.5.3) #1343 by @Emrys365
- [Enhancement] Change log level info -> warning about batchsize #1336 by @kan-bayashi
- [Enhancement] Support batch decoding for streaming E2E #1270 by @takenori-y
- [Enhancement] Implement attention cache in Transformer for faster decoding #1240 by @ShigekiKarita
Bugfix
- [Bugfix] Fix pretrained model URL for master #1351 by @kan-bayashi
- [Bugfix] Return parser in add_arguments method for transducer #1337 by @b-flo
- [Bugfix] Disabling nonlinear activation of the last encoder layer #1323 by @simpleoier
- [Bugfix] Fixed error: "Expected object of device type cuda but got device type cpu" in decoder of transducer #1315 by @rai4
- [Bugfix] Fix ASR eval for TTS in the case of trans_type=phn #1368 by @kan-bayashi
- [Bugfix] Make --preprocess_conf optional in pack_model.sh #1365 by @kan-bayashi
- [Bugfix] Remove set start method to fix #1290 #1363 by @kan-bayashi
- [Bugfix] Fix pretrained model URL #1354 by @kan-bayashi
- [Bugfix] Fix pretrained model URL #1350 by @kan-bayashi
- [Bugfix] Fix TTS transformer attention weight calculation in inference #1331 by @kan-bayashi
- [Bugfix] Fix decoding for chainer transformer #1101 by @Fhrozen
Recipe
- [Recipe] Update libri_trans asr recipe #1344 by @hirofumi0810
- [Recipe] Update LJSpeech to limit frequency range #1330 by @kan-bayashi
- [Recipe] IWSLT19 Speech Translation recipe #1169 by @hirofumi0810
- [Recipe] Must-C NMT recipe #1168 by @hirofumi0810
- [Recipe] How2 NMT recipe #1165 by @hirofumi0810
- [Recipe] Update how2 recipe #1148 by @hirofumi0810
- [Recipe] Pre-trained CSJ model #1341 by @takenori-y
- [Recipe] TTS: add FastSpeech config and result for jsut #1321 by @r9y9
- [Recipe] Asr commonvoice recipe update #1241 by @ftshijt
Documentation
- [Documentation] Update notebook submodule #1367 by @kan-bayashi
- [Documentation] Fix sphinx warning of TTS modules #1366 by @kan-bayashi
- [Documentation] Update notebook and add to Sphinx document #1364 by @kan-bayashi
- [Documentation] Update notebook #1352 by @kan-bayashi
- [Documentation] Doc for Chainer transformer #1017 by @Fhrozen
- [Documentation] Update README #1342 by @takenori-y
Refactoring
- [Refactoring] Indirect call for training method [chainer] #1256 by @Fhrozen
- [Refactoring] Refact transformer for transformer LM #1223 by @Fhrozen
- [Refactoring] Refine NMT #1152 by @hirofumi0810
- [Refactoring] Small changes in chainer backend #1110 by @Fhrozen
- [Refactoring] Format Chainer E2E transformer forward (fixed) #1034 by @Fhrozen
Acknowledgements
Special thanks to @Emrys365, @Fhrozen, @ShigekiKarita, @b-flo, @fanlu, @ftshijt, @hirofumi0810, @kan-bayashi, @r9y9, @rai4, @ruizhilijhu, @simpleoier, @takenori-y.
Published by kan-bayashi about 6 years ago
Software Design and User Interface of ESPnet-SE++ - ESPnet Version 0.5.4
Bugfix
- [Bugfix] Fixed pretrained model URL in CSMSC recipe #1314 by @kan-bayashi
- [Bugfix] Fix CSMSC wavenet link #1298 by @kan-bayashi
- [Bugfix] Minor fix of FastSpeech #1295 by @kan-bayashi
- [Bugfix] [bug fixing] Using inplace masked_fill_() #1273 by @Emrys365
- [Bugfix] Fix RuntimeError in setting spawn multiple times #1267 by @kan-bayashi
- [Bugfix] Use spawn in multiprocessing to fix #404 #1251 by @kan-bayashi
Documentation
- [Documentation] Update README.md #1309 by @kan-bayashi
- [Documentation] Fix docstrings #1288 by @kan-bayashi
- [Documentation] Fixed a typo in swbd asr1 #1220 by @Shujian2015
- [Documentation] update notebook #1219 by @ShigekiKarita
Recipe
- [Recipe] Update VAIS1000 recipe RESULTS.md #1308 by @kan-bayashi
- [Recipe] Fix VAIS1000 recipe #1305 by @kan-bayashi
- [Recipe] Update CSMSC results #1299 by @kan-bayashi
- [Recipe] Add vais1000 recipe - Vietnamese TTS #1283 by @enamoria
- [Recipe] Add VIVOS recipe - Vietnamese ASR #1271 by @hieuthi
- [Recipe] Add JNAS tts1 recipe #1269 by @kan-bayashi
- [Recipe] Support Polish speakers in M-AILABS #1265 by @kan-bayashi
- [Recipe] Add TWEB recipe #1263 by @kan-bayashi
- [Recipe] Update M-AILABS results #1262 by @kan-bayashi
- [Recipe] Add CSMSC recipe #1259 by @kan-bayashi
- [Recipe] Add JVS recipe #1258 by @kan-bayashi
- [Recipe] Add CMU Arctic recipes #1257 by @kan-bayashi
- [Recipe] Add M-AILABS pretrained models #1229 by @kan-bayashi
New Features
- [New Features] Add eval-interval-epochs for the tiny dataset #1306 by @kan-bayashi
- [New Features] ASR-based CER/WER eval for TTS #1190 by @potato-inoue
Enhancement
- [Enhancement] Add Mandarin Pretrained Wavenet #1292 by @kan-bayashi
- [Enhancement] Add pretrained models: JSUT and LibriTTS #1260 by @r9y9
- [Enhancement] Improved JSUT TTS recipe #1216 by @r9y9
Acknowledgements
Special thanks to @Emrys365, @ShigekiKarita, @Shujian2015, @enamoria, @hieuthi, @kan-bayashi, @potato-inoue, @r9y9.
Published by kan-bayashi about 6 years ago
Software Design and User Interface of ESPnet-SE++ - ESPnet Version 0.5.3
Bugfix
- [Bugfix] Fix a bug in building docker container #1197 by @protoget
- [Bugfix] fixed h5py version as 2.9.0 #1183 by @ruizhilijhu
- [Bugfix] Fix error on waveform generation by WaveNet #1170 by @r9y9
- [Bugfix] Sort nbest_hyps without limiting them to beam size #1157 by @elgeish
- [Bugfix] fix recursive make #1153 by @b-flo
- [Bugfix] missing file in iwslt19 #1147 by @sw005320
- [Bugfix] Wsj mix #1145 by @simpleoier
Enhancement
- [Enhancement] Install warp-ctc from PyPI #1196 by @ysk24ok
- [Enhancement] TTS: MoL WaveNet minor update #1195 by @r9y9
- [Enhancement] Transducer v1.2 #1173 by @b-flo
New Features
- [New Features] Add support for MoL WaveNet to synth_wav.sh #1186 by @r9y9
- [New Features] Using pytorch dataloader for pytorch backend #1138 by @bobchennan
Recipe
- [Recipe] dirha_wsj recipe #1179 by @ruizhilijhu
- [Recipe] Update Russian open STT recipe for v0.5 of the dataset #1160 by @akreal
- [Recipe] Blizzard recipe #1056 by @potato-inoue
Refactoring
- [Refactoring] Install warpctc-pytorch from pytorch-0.4 branch when PyTorch version is 0.4.X #1162 by @ysk24ok
- [Refactoring] using python3 as default #1159 by @zh794390558
- [Refactoring] Fix download_from gdrive.sh on osx #1158 by @r9y9
Documentation
- [Documentation] Fix doc/module2rst.py to use glob and remove --nowarn from travis-sphinx #1155 by @ShigekiKarita
Acknowledgements
Special thanks to @ShigekiKarita, @akreal, @b-flo, @bobchennan, @elgeish, @potato-inoue, @protoget, @r9y9, @ruizhilijhu, @simpleoier, @sw005320, @ysk24ok, @zh794390558.
Published by kan-bayashi over 6 years ago
Software Design and User Interface of ESPnet-SE++ - ESPnet Version 0.5.2
Documentation
- [Documentation] Clean up TTS module docstrings #1143 by @kan-bayashi
- [Documentation] update readme for warp-transducer #1125 by @sw005320
- [Documentation] Fix flake8 blacklist #1107 by @ShigekiKarita
Bugfix
- [Bugfix] Minor fix #1142 by @kan-bayashi
- [Bugfix] Fix apex error when opt == "noam" #1134 by @kan-bayashi
- [Bugfix] Fix model compatibility #1133 by @kan-bayashi
- [Bugfix] Fix backward compatibility problem in PositionalEncoding by adding pre-hook to ignore self.pe #1127 by @ShigekiKarita
- [Bugfix] fix iwslt19 recipe #1124 by @sw005320
- [Bugfix] Fix best validation perplexity LM averaging #1122 by @akreal
- [Bugfix] Fix bug in how2 asr1 #1117 by @hirofumi0810
- [Bugfix] Fix: wrong variable in greedy decode #1113 by @b-flo
- [Bugfix] Chainer fix mixed input #1096 by @Fhrozen
- [Bugfix] Fix deleted argument atype #1095 by @Fhrozen
- [Bugfix] Fix guided attention loss in Tacotron2 when reduction factor > 1 #1087 by @kan-bayashi
- [Bugfix] Fix multi gpu LM issues and add hdf5 LM dataset dump #1083 by @ShigekiKarita
Enhancement
- [Enhancement] Add stdout.pl for debugging version run.pl #1141 by @ShigekiKarita
- [Enhancement] Update recog_wav.sh #1140 by @kan-bayashi
- [Enhancement] Update spm_train and test it #1135 by @ShigekiKarita
- [Enhancement] Transducer v1.1 #1129 by @b-flo
- [Enhancement] Allow to extend the length of positional encoding at training and inference #1105 by @ShigekiKarita
- [Enhancement] Update batchfy.py #1104 by @zh794390558
- [Enhancement] Add PYTHONIOENCODING=UTF-8 in path.sh #1099 by @kan-bayashi
- [Enhancement] Improve batch decoding #980 by @takaaki-hori
- [Enhancement] Implement add_arguments method of E2E for rnn. #941 by @kamo-naoyuki
Recipe
- [Recipe] Update swbd #1137 by @sw005320
- [Recipe] Updated symlink in Librispeech #1130 by @kan-bayashi
- [Recipe] Add missing lines to iwslt19 LM training data #1126 by @hirofumi0810
- [Recipe] Add iwslt19 ASR recipe #1120 by @hirofumi0810
- [Recipe] How2 speech translation recipe #1102 by @hirofumi0810
- [Recipe] Must-C ASR recipe #1098 by @hirofumi0810
- [Recipe] Must-C speech translation corpus #1085 by @hirofumi0810
- [Recipe] Replace character-level recipe with the BPE one in iwslt18 #1079 by @hirofumi0810
- [Recipe] Fix swbd recipe v2 #1072 by @sw005320
- [Recipe] Updated REVERB multi-channel E2E recipe #1057 by @Xiaofei-Wang
New Features
- [New Features] Add --train-dtype option for float16/float32/float64 precision training in pytorch ASR and LM #1119 by @ShigekiKarita
- [New Features] transfer learning #1103 by @b-flo
- [New Features] New beam-search framework: ScorerInterface, CPU/GPU float16/32/64 decoding, and new language models (SeqRNNLM and TransformerLM) #1092 by @ShigekiKarita
- [New Features] Support pretrained WaveNet vocoder #1081 by @kan-bayashi
- [New Features] RNN-Transducer #1065 by @b-flo
Acknowledgements
Special thanks to @Fhrozen, @ShigekiKarita, @Xiaofei-Wang, @akreal, @b-flo, @hirofumi0810, @kamo-naoyuki, @kan-bayashi, @sw005320, @takaaki-hori, @zh794390558.
Scientific Software - Peer-reviewed
- Python
Published by kan-bayashi over 6 years ago
Software Design and User Interface of ESPnet-SE++ - ESPnet Version 0.5.1
Bugfix
- [Bugfix] Fix conda installation error #1076 by @kan-bayashi
- [Bugfix] Minor fix batchsize log when batchsize = 0 #1068 by @kan-bayashi
- [Bugfix] Fix spm decode #1062 by @ShigekiKarita
- [Bugfix] Minor fix to use fastspeech in synth_wav.sh #1061 by @kan-bayashi
- [Bugfix] Fix help message to enable line break #1059 by @kan-bayashi
- [Bugfix] Fix tensorboard interval in validation #1054 by @ShigekiKarita
- [Bugfix] Update E2E-ASR test #1041 by @kan-bayashi
- [Bugfix] Fix Loss Calculation #1039 by @Fhrozen
Refactoring
- [Refactoring] Remove unused conf #1070 by @kan-bayashi
- [Refactoring] [Reopen] Support default arguments #1067 by @kan-bayashi
- [Refactoring] Refactor E2E-TTS test #1042 by @kan-bayashi
CI
- [CI] Add TTS integration test #1069 by @kan-bayashi
- [CI] Make test smaller to speed up #1044 by @kan-bayashi
- [CI] Separate tasks in each job of circleci #1043 by @kan-bayashi
Recipe
- [Recipe] Add data augmentation to ami recipe #1066 by @Jzmo
- [Recipe] Update accum_grad for a single gpu in CSJ #1050 by @kan-bayashi
- [Recipe] add commonvoice recipe #1000 by @YosukeHiguchi
- [Recipe] REVERB multi-channel E2E recipe #985 by @Xiaofei-Wang
New Features
- [New Features] Support multi gpu in pytorch lm #1063 by @ShigekiKarita
Enhancement
- [Enhancement] Use librosa's fast Griffin-Lim #1058 by @kan-bayashi
- [Enhancement] Add option to select the integration type of speaker embedding #1047 by @kan-bayashi
- [Enhancement] update tedlium3 recipe with transformer #1037 by @ShigekiKarita
- [Enhancement] update tedlium2 config #1036 by @ShigekiKarita
- [Enhancement] Support of other recipe in recog_wav.sh #1026 by @hiratake55
Acknowledgements
Special thanks to @Fhrozen, @Jzmo, @ShigekiKarita, @Xiaofei-Wang, @YosukeHiguchi, @hiratake55, @kan-bayashi.
Published by kan-bayashi over 6 years ago
Software Design and User Interface of ESPnet-SE++ - ESPnet Version 0.5.0
CI
- [CI] Integration test with mini AN4 #1035 by @ShigekiKarita
- [CI] codecov support #850 by @ShigekiKarita
Bugfix
- [Bugfix] [Bug] Fix error calculator for report false #1032 by @Fhrozen
- [Bugfix] fix unk scoring #1002 by @sw005320
- [Bugfix] make tensorboard logging done every 100 iters #996 by @sw005320
Refactoring
- [Refactoring] TTS: avoid using asr module in TTS #1031 by @r9y9
- [Refactoring] Exit 1 when source command return 1 #1030 by @kan-bayashi
- [Refactoring] Refactor FileReaderWrapper and FileWriterWrapper #947 by @kamo-naoyuki
Enhancement
- [Enhancement] Use pypi sentencepiece #1029 by @ShigekiKarita
- [Enhancement] Add log of the inference speed of TTS models #1027 by @kan-bayashi
- [Enhancement] Add GPU decodable test for TTS modules #1025 by @kan-bayashi
- [Enhancement] Support multi-speaker FastSpeech #1006 by @kan-bayashi
- [Enhancement] Custom Training extensions for ASR chainer #1004 by @Fhrozen
- [Enhancement] Support multi-speaker Transformer #1001 by @kan-bayashi
- [Enhancement] RFC: Add keep_all_data_on_mem option #999 by @r9y9
- [Enhancement] Support saving of attention weights and probability in decoding #995 by @kan-bayashi
- [Enhancement] Implement Fast Speech #848 by @kan-bayashi
- [Enhancement] Transformer Chainer #774 by @Fhrozen
- [Enhancement] Neural Machine Translation #563 by @hirofumi0810
Recipe
- [Recipe] fix bugs to make a swbd recipe run #1024 by @sw005320
- [Recipe] Add multi-speaker Transformer config in LibriTTS #1022 by @kan-bayashi
- [Recipe] Rename RESULTS to RESULTS.md #1021 by @kan-bayashi
- [Recipe] Clean LibriTTS RESULTS.md #1020 by @kan-bayashi
- [Recipe] Clean LJSpeech RESULTS.md #1019 by @kan-bayashi
- [Recipe] Update JSUT TTS RESULTS.md #1018 by @kan-bayashi
- [Recipe] Add Transformer config in JSUT #1009 by @kan-bayashi
- [Recipe] Update libri trans #949 by @hirofumi0810
- [Recipe] iwslt18 NMT recipe #937 by @hirofumi0810
- [Recipe] libri_trans NMT recipe #931 by @hirofumi0810
- [Recipe] Add fastspeech.v2 result #925 by @kan-bayashi
Documentation
- [Documentation] [Docstrings] Removing empty init files to avoid docs #1016 by @Fhrozen
- [Documentation] add egs info #1015 by @sw005320
- [Documentation] Update docstrings in espnet.nets.chainer_backend #974 by @Masao-Someki
- [Documentation] Reformat docstrings in espnet/asr #914 by @Masao-Someki
- [Documentation] Update TTS module’s docstrings and refactor some modules #898 by @kan-bayashi
Acknowledgements
Special thanks to @Fhrozen, @Masao-Someki, @ShigekiKarita, @hirofumi0810, @kamo-naoyuki, @kan-bayashi, @r9y9, @sw005320.
Published by kan-bayashi over 6 years ago
Software Design and User Interface of ESPnet-SE++ - ESPnet Version 0.4.3
Enhancement
- [Enhancement] Use queue-freegpu.pl in all cmd.sh #1013
Documentation
Recipe
Bugfix
- [Bugfix] fix Cupy Import Error #969 #1010
- [Bugfix] Fix a bug in synthesis_wav.sh #989
- [Bugfix] Fix lmnaverage in lang_model #988
Refactoring
- [Refactoring] Remove "free-gpu" from *_train and create queue-freegpu.pl #938
CI
- [ci] reduce travis jobs #1011
Acknowledgements
Special thanks to @Fhrozen, @kamo-naoyuki, @Magic-Bubble, @ShigekiKarita, @takenori-y, @Xiaofei-Wang.
Published by kan-bayashi over 6 years ago
Software Design and User Interface of ESPnet-SE++ - ESPnet Version 0.4.2
Bugfix
- [Bugfix] Fix pytorch LM GPU training without cupy #981
- [Bugfix] make tensorboard logging done every 100 iters #966
- [Bugfix] Fix ER calculator #955
- [Bugfix] Fix a typo bug in computing guided attention loss #956
- [Bugfix] run.sh should exit if sourcing path.sh return error #954
Recipe
- [Recipe] Update Librispeech recipe #970
- [Recipe] New RNN and Transformer result of AMI recipe(ihm) #978
- [Recipe] BPE support for SwitchBoard & Transformer config #909
- [Recipe] Update li10 #965
- [Recipe] Update libri trans #949
Enhancement
- [Enhancement] transform: expose pad_mode for logmelspectrogram #957
Acknowledgements
Special thanks to @Fhrozen, @geekboood, @hirofumi0810, @Jzmo, @naxingyu, @r9y9, @ShigekiKarita.
Published by kan-bayashi over 6 years ago
Software Design and User Interface of ESPnet-SE++ - ESPnet Version 0.4.1
Bugfix
- [Bugfix] Fix a bug in calculate_all_attentions #862
- [Bugfix] Fix bugs in frontend #875
- [Bugfix] Fix grad noise v2 #912
- [Bugfix] Fix plot fail #913
- [Bugfix] Fix tgz typo #892
- [Bugfix] Fix: Output dimension of Conv2dSubsampling #822 #921
- [Bugfix] Fix: espnet/transform/transformation.py #866
- [Bugfix] Fixed certain typos #893
- [Bugfix] Modified if conditions #908
- [Bugfix] fix bugs in grad noise #886
- [Bugfix] CER/WER & CER_CTC in Transformer pytorch #936
- [Bugfix] Update iwslt18 recipe #808
Documentation
- [Documentation] Add model link #899
- [Documentation] Document espnet tools and modules #884
- [Documentation] Fix typo #930
- [Documentation] Reformat docstrings in espnet/asr #914
- [Documentation] Update CONTRIBUTING.md #880
- [Documentation] add recipe related documentations to CONTRIBUTING.md #872
- [Documentation] skip ci when gh-pages is deployed #901
- [Documentation] use only conda to build doc #895
Enhancement
- [Enhancement] Script for docker builds from the local repo #877
- [Enhancement] Demo script for TTS #871
- [Enhancement] Fix plot attention for chainer transformer #940
- [Enhancement] Implement Fast Speech #848
- [Enhancement] Move the dependency links to github from Makefile to setup.py #858
- [Enhancement] Support new version in Docker containers #836
- [Enhancement] gradient noise injection from std normal dis #881
- [Enhancement] [Discussion] Create show_result.sh #874
Recipe
- [Recipe] Add Jsut asr recipe #793
- [Recipe] AURORA4 RESULTS.md file #835
- [Recipe] Add Librispeech French corpus #882
- [Recipe] Add transformer config in m_ailabs/tts1 recipe #924
- [Recipe] Change librispeech_french to libri_trans #903
- [Recipe] Fix: utils/show_result.sh #915
- [Recipe] Minor update for speech translation recipe #907
- [Recipe] Transformer for CHiME4 Single Channel #837
- [Recipe] Update LJSpeech RESULTS.md #861
- [Recipe] Update LJSpeech RESULTS.md #887
- [Recipe] Update Librispeech recipe #885
- [Recipe] Update fisher callhome spanish for speech translation #868
- [Recipe] libri_trans NMT recipe #931
Refactoring
- [Refactoring] Refactor TTS Transformer #865
- [Refactoring] test: avoid using grep and sed in subprocess and use python stdlib instead #854
- [Refactoring] Update TTS module’s docstrings and refactor some modules #898
Acknowledgements
Special thanks to @27jiangziyan, @Fhrozen, @Masao-Someki, @ShigekiKarita, @SuperGops7, @creatorscan, @hirofumi0810, @kamo-naoyuki, @lumaku, @naxingyu, @r9y9, @simpleoier, @takenori-y.
Published by kan-bayashi over 6 years ago
Software Design and User Interface of ESPnet-SE++ - ESPnet Version 0.4.0
New features and improvements
- E2E multi-channel system #596
- Changed to use pip-install for pytorch_wpe #843
- Transformer
  - ASR chainer #655
  - ASR pytorch #690
  - TTS pytorch #752
- Specaugment #734 #745 #754
- Streaming attention encoder-decoder E2E-ASR #757
- Offline recognition demo #809
- New batch making strategies #759
- Guided Attention Loss #816
Important changes
- drop python2 support
- use `utils/fix_data_dir.sh` as default #660
- CPU-only installation #677 #687 #704
- fix to use python2 as default in travis #685
- add CUDA_VERSION in Makefile #687
- use Pytorch 1.0.1 as default #721
- use `yaml` format configuration file #722
- modularize TTS components #746 #815
- use Chainer/Cupy 6.0.0 as default #753
- reinforce CI #763
- Google drive downloader #798
- New scripts to pack model and get system info #790 #802
- change the scoring in multi-speaker case from shell to python #805
- update patience in TTS recipes #817
- `n_average` option in TTS #823
- update TTS recipes to use config files #780
- make `ngpu=1` as default for all of the recipes #800
- deprecate `egs/librispeech/tts1` recipe #806
- maintain the pytorch warp-ctc under espnet #838
New recipes
- AURORA4 #722 #770 #824
- JNAS #725
- LibriTTS #795
- Tedlium release3 #739
- added the model link and missing files #831
- TIMIT #698
- Russian Open STT #768
Recipe updates
- Aishell
  - support Transformer #827
  - fix the indent of RESULTS.md in the aishell recipe #828
- CSJ
  - support Transformer #737 #742 #782
- HKUST
  - support Transformer #840
- IWSLT18
  - add missing files for iwslt18 recipe #767
- Librispeech
  - support Transformer #781
- LJSpeech
  - added more samples #825 #842
  - support Transformer #752
- Tedlium release2
  - support word LM in TEDLIUM recipe #683
  - fix duplicated line in tedlium recipe #714
  - fix a bug in the TEDLIUM recipe #771
  - support Transformer #803
- Voxforge
  - bugfix in voxforge #684
  - unify rnn and transformer recipes for the voxforge task #769
  - support Transformer #758
  - update config files in the voxforge recipe #783
- WSJ
  - support Specaugment #745
  - support Transformer #655 #690
Documentation
- add citation bibtex entry for ESPnet #676
- add NAACL paper replication link for CMU Wilderness Multilingual Speech Dataset #717 #731
- update library information #789
- Add table of contents #812
- add GPU decoding document #813
- minibatch explanation #821
Bugfix
- fix recognize_batch for 2d, location_recurrent, multi-head attentions for #665 and add test #681
- fix CER/WER calculation during training #678
- add version check for matplotlib installation #679
- make sure `hlens` is tensor in recognize_batch #680
- fix choice between pytorch and pytorch-cpu #702
- fix `merge_json` behavior (#699) when no labels for #708
- fix `check_install.py` #728
- use `ensure_ascii=False` to make json human-readable #730
- Fix argument name for SummaryWriter #747
- use scikit-learn 0.20 #749
- fix pytorch for chainer v6.0.0 #772
- fix model compatibility #799
- fix minor typos in the recipes #801
- bug fix: `egs/chime4/asr1_multich/conf/train.yaml` #826
- bug fix: `espnet/utils/training/batchfy.py` #833
- fix to use sentencepiece v.0.1.82 #839
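The `ensure_ascii=False` change (#730) concerns how Python's standard `json` module serializes non-ASCII text. The sketch below only illustrates that standard-library behavior; the dictionary layout and key names are hypothetical and are not taken from the ESPnet data format.

```python
import json

# Hypothetical manifest entry; the key names are illustrative only.
entry = {"utt1": {"text": "こんにちは"}}

# Default behavior (ensure_ascii=True): non-ASCII characters become \uXXXX escapes.
escaped = json.dumps(entry)

# With ensure_ascii=False the characters are written as-is, so the file stays readable.
readable = json.dumps(entry, ensure_ascii=False)

print(escaped)
print(readable)
```

Both strings decode to the same object with `json.loads`; only the on-disk representation differs. When writing the readable form to a file, open the file with an explicit UTF-8 encoding.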
Acknowledgements
Special thanks to @27jiangziyan, @akreal, @bobchennan, @creatorscan, @danoneata, @Fhrozen, @gtache, @hirofumi0810, @jan-schuchardt, @jnishi, @kamo-naoyuki, @Masao-Someki, @oadams, @simpleoier, @sknadig, @ShigekiKarita, @takenori-y
Published by kan-bayashi over 6 years ago
Software Design and User Interface of ESPnet-SE++ - ESPnet Version 0.3.1 (stable)
New improvements
- Add instant speech recognition #581
- Add CTC greedy decoding CER monitor #587
- Add Streaming encoder #638
- Add Uni-directional encoder #624 #629
- Add model compatibility test #615 #649
- Update fisher_callhome_spanish recipe #625
- Improve swbd scoring #614 #620
- Improve memory usage in json merge script #579
- Improve background job failure check in decoding state #627 #643 #648
- Separate installation of basic tools and extra tools #628
Bugfix
- Fix CTC type selection #617 #618
- Fix MultiProcessIterator #613
- Fix chainer sortgrad bug
- Fix installer #594 #595 #604 #609 #622
- Fix WSJ-mix recipe #610 #630 #641
- Fix remove_longshortdata.sh #646
Thank you for a lot of contributions @kamo-naoyuki, @gtache, @simpleoier, @takenori-y, @Fhrozen, @JaejinCho, @pzelasko, @zh794390558, @kan-bayashi, @sw005320.
Published by kan-bayashi almost 7 years ago
Software Design and User Interface of ESPnet-SE++ - ESPnet v.0.3.0 beta
New features and improvements
- Support Pytorch 1.0 #553
- Support the use of Tensorboard #506
- Support early stopping #508
- Support `stop_stage` option #539
- Support sortgrad #550
- Add GRU architecture #496
- Add GPU batch decoding #318
- Support HDF5 format instead of kaldi ark #412 #493
- Add speech separation recipe #531
- Add TTS recipes (German, Spanish, Italian, Japanese...) #562 #569 #519
- Add ASR recipes #574 #519
- Improve ASR recipes #491 #521 #546 #435 #467 #469
- Improve speech translation recipes #468
- Improve Python2/3 compatibility #567
- Improve cmd.sh usage #538 #547
- Add test scripts for shell scripts #484 #498
- Change to use conda with Python3.7 as default #567
- Python code modularization #440 #484
We really appreciate the many contributions from @gtache, @kamo-naoyuki, @hirofumi0810, @ShigekiKarita, @takenori-y, @simpleoier, @Fhrozen, @sas91, @mn5k, @JaejinCho, @Xiaofei-Wang, @jnishi, @Magic-Bubble.
Published by kan-bayashi almost 7 years ago
Software Design and User Interface of ESPnet-SE++ - ESPnet v.0.2.0 (Major update)
New feature and improvement
- add data prefetch #340
- add new recipes
  - IWSLT speech translation recipe #325
  - REVERB challenge recipe #359
- add test codes
  - for checking warp ctc behaviors in the multitask mode #369
  - for a multiple GPU #362
  - for a single GPU #376
  - for read/write models #362 #376
- add check script for python library installation #373 #389
- improve some ASR baseline recipes by using a shallow and wide BLSTM encoder and subwords
  - librispeech #354 #386
  - CSJ #326
  - HKUST #366
Important changes
- fix to use PyTorch 0.4.1 (stop supporting PyTorch 0.3.x) #332
- rename some functions
  - `e2e_asr_attctc.py` -> `e2e_asr.py`
  - `e2e_asr_attctc_th.py` -> `e2e_asr_th.py`
- change the format of model.conf from pickle to JSON #342
- remove deprecated options #336
- unify the data converter with TTS one #343
- unify model variable arguments between TTS and ASR #337
- fix pytorch backend snapshot functions including the save of optimizers #362
- avoid using `feat-to-len`. Use `write_utt2num_frames=true`, and read utt2num instead of executing `feat-to-len` #339
- refactor `asr_pytorch.py` and `asr_chainer.py`.
  - refactor the recog part in asr_chainer.py and asr_pytorch especially after it gets nbest. #370
- make `nets/e2e_common.py`, and move some common functions there
Bug fix
- warpctc gradient scaling (Thanks @jnishi)
- warpctc multi-gpu bug (Thanks @jnishi)
- undefined gpuid bug in cpu RNN training #379
- no hypothesis bug #378
- Python3 compatibility #375 #341 (Thanks @akreal)
Published by kan-bayashi over 7 years ago
Software Design and User Interface of ESPnet-SE++ - ESPnet v.0.1.5 (minor update)
- update the Librispeech ASR recipe and use subword modeling as default.
- attached Librispeech ASR model (librispeech_asr1.tgz):
  - RNNLM: `exp/train_rnnlm_2layer_bs256_unigram2000/rnnlm.model.best`
  - ASR models: `exp/train_960_vggblstm_e4_subsample1_2_2_1_1_unit1024_proj1024_d1_unit1024_location1024_aconvc10_aconvf100_mtlalpha0.5_adadelta_bs30_mli800_mlo150_unigram2000/results/{model.acc.best,model.conf}`
  - performance:

    | Test set | WER (%) |
    |-----------|:----:|
    | Librispeech dev_clean | 5.0 |
    | Librispeech test_clean | 5.0 |
- to use the above models, set the ASR model directory (expdir) and RNNLM model directory (lmexpdir) in run.sh as follows:
```
expdir=exp/train_960_vggblstm_e4_subsample1_2_2_1_1_unit1024_proj1024_d1_unit1024_location1024_aconvc10_aconvf100_mtlalpha0.5_adadelta_bs30_mli800_mlo150_unigram2000
lmexpdir=exp/train_rnnlm_2layer_bs256_unigram2000
${decode_cmd} JOB=1:${nj} ${expdir}/${decode_dir}/log/decode.JOB.log \
asr_recog.py \
--ngpu ${ngpu} \
--backend ${backend} \
--recog-json ${feat_recog_dir}/split${nj}utt/data_${bpemode}${nbpe}.JOB.json \
--result-label ${expdir}/${decode_dir}/data.JOB.json \
--model ${expdir}/results/model.${recog_model} \
--model-conf ${expdir}/results/model.conf \
--beam-size ${beam_size} \
--penalty ${penalty} \
--maxlenratio ${maxlenratio} \
--minlenratio ${minlenratio} \
--ctc-weight ${ctc_weight} \
--rnnlm ${lmexpdir}/rnnlm.model.best \
--lm-weight ${lm_weight}
```
Published by sw005320 over 7 years ago
Software Design and User Interface of ESPnet-SE++ - ESPnet v.0.1.4
- Added TTS recipe based on Tacotron2 `egs/ljspeech/tts1`
- Extended the above TTS recipe to multispeaker TTS `egs/librispeech/tts1/`
- Supported PyTorch 0.4.0
- Added word level decoding
- (Finally) fixed CNN (VGG) layer issues in PyTorch
- Fixed warp CTC scaling issues in PyTorch
- Added subword modeling based on sentence piece toolkit
- Many bug fixes
- Updated CSJ performance
Published by sw005320 over 7 years ago
Software Design and User Interface of ESPnet-SE++ - stable version for jsalt18 summer school
- bug fix
- improve the jsalt18e2e recipe
- improve the JSON format
- simplify Makefile
Published by sw005320 over 7 years ago
Software Design and User Interface of ESPnet-SE++ - Change JSON format and use feature compression
- change the JSON format to deal with multiple inputs and outputs
- use feature compression to reduce the data I/O
Published by sw005320 over 7 years ago
Software Design and User Interface of ESPnet-SE++ - Added attention visualization and jsalt18e2e recipe, and refined Librispeech recipe
Support attention visualization.
- Added `PlotAttentionReport`, which saves attention weights as a figure for each epoch.
- Refactored test script `test_e2e_model` to check various attention functions
Added JSALT18 end-to-end ASR recipe
Refined the Librispeech recipe
- Removed long utterances during training
- Added RNNLM
Published by kan-bayashi over 7 years ago
Software Design and User Interface of ESPnet-SE++ - First (test) release
First release.
- CTC, attention-based encoder-decoder, and hybrid CTC/attention based end-to-end ASR
- Fast/accurate training with CTC/attention multitask training
- CTC/attention joint decoding to boost monotonic alignment decoding
- Encoder: VGG-like CNN + BLSTM or pyramid BLSTM
- Attention: Dot product, location-aware attention, variants of multihead (pytorch only)
- Incorporate RNNLM/LSTMLM trained only with text data
- Flexible network architecture thanks to chainer and pytorch
- Kaldi style complete recipe
- Support numbers of ASR benchmarks (WSJ, Switchboard, CHiME-4, Librispeech, TED, CSJ, AMI, HKUST, Voxforge, etc.)
- State-of-the-art performance in Japanese/Chinese benchmarks (comparable/superior to hybrid DNN/HMM and CTC+FST)
- Moderate performance in standard English benchmarks
- Support multiple GPU training
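The CTC/attention multitask training mentioned above is commonly formulated as a weighted sum of the two losses. The sketch below assumes that standard formulation, with the weight applied to the CTC term; the name `mtlalpha` is borrowed from the `mtlalpha0.5` tag in the experiment directory names elsewhere in these notes, and the function itself is a hypothetical illustration rather than ESPnet code.

```python
def multitask_loss(loss_ctc: float, loss_att: float, mtlalpha: float) -> float:
    """Interpolate CTC and attention losses (hypothetical helper, not ESPnet's API).

    mtlalpha = 1.0 gives pure CTC training, mtlalpha = 0.0 gives pure attention training.
    """
    if not 0.0 <= mtlalpha <= 1.0:
        raise ValueError("mtlalpha must be in [0, 1]")
    return mtlalpha * loss_ctc + (1.0 - mtlalpha) * loss_att


# Example with made-up loss values and an equal weighting of both objectives.
loss = multitask_loss(loss_ctc=2.0, loss_att=1.0, mtlalpha=0.5)
print(loss)
```

In practice the two inputs would be tensors from the CTC and attention branches of the same encoder, and the interpolation weight is a tunable hyperparameter.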
Published by sw005320 over 7 years ago