https://github.com/artificialzeng/asrframe
An Automatic Speech Recognition Frame: a complete framework for Chinese speech recognition that provides several models
Science Score: 10.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ○ codemeta.json file
- ○ .zenodo.json file
- ○ DOI references
- ✓ Academic publication links (links to: arxiv.org)
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity (low similarity, 4.4%, to scientific vocabulary)
Last synced: 10 months ago
Repository
Basic Info
Statistics
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 0
Fork of sailist/ASRFrame
Created about 6 years ago
Last pushed over 6 years ago
https://github.com/ArtificialZeng/ASRFrame/blob/master/
# ASRFrame
- 10
- ...
# Changelog
2019.8.7
2019.8.7: `./util/dicts/errdict.json`
2019.8.21
2020123
#
https://github.com/sailist/ASRFrame
UI
#
-
- 10
-
-
#
- 80%100%100%
-
#
Pythonrealease
## Requirements
Python packages:
- Distance (>=0.1.3)
- jieba (>=0.39)
- Keras (>=2.2.4)
- librosa (>=0.6.3)
- numpy (>=1.16.2)
- pypinyin (>=0.35.3)
- python-speech-features (>=0.6)
- scipy (>=1.2.1)
- tensorflow (>=1.13.1)
- thulac (>=0.2.0)
- pydub (>=0.23.1)
## Installation
```bash
pip install -r requirement.txt
```
## Acoustic model
### Datasets
#### THCHS30
40CSLT
[data_thchs30.tgz](https://openslr.org/18/)
#### Free ST Chinese Mandarin Corpus
10100855
[ST-CMDS-20170001_1-OS.tar.gz](https://openslr.org/38/)
#### AISHELL
17840095
[data_aishell.tgz](https://openslr.org/33/)
#### Primewords Chinese Corpus Set 1
1002969895JSON
[primewords_md_2018_set1.tar.gz](https://openslr.org/47/)
#### Aidatatang_200zh
200()Android16kHz16iOS16kHz1698
[aidatatang_200zh.tgz](https://openslr.org/62/)
### Configuration
Dataset paths are set in `config.py`.
### Data cleaning
- wavwavaishell
-
-
50
```bash
python run_clean.py
```
`run_create_dict.py`
> PS1: pypinyin
> PS2:
### Dataset summary
```bash
python run_summary.py
```
```text
start to summary the Thchs30 dataset
checked 13375 wav files:/data/voicerec/dataset/dataset/thchs30-openslr/data_thchs30/data/D6_938.wavv
max audio len = 261000, max timestamp = (281, 603) ,min audio len = 71424, sample = 16000
checked 13375 label files:/data/voicerec/dataset/dataset/thchs30-openslr/data_thchs30/data/D6_938.wav.trnn
max label len = 48, min label len = 19, pinpin coverage:1208
result from 13376 sample, used 3.7486759999999997 sec
Load pinyin dict. Max index = 1436.
start to summary the AiShell dataset
checked 141599 wav files:/data/voicerec/ALShell-1/data_aishell/wav/train/S0003/BAC009S0003W0427.wav
max audio len = 235199, max timestamp = (281, 544) ,min audio len = 19680, sample = 16000
checked 141599 label files:/data/voicerec/ALShell-1/data_aishell/wav/train/S0003/BAC009S0003W0427.txt
max label len = 44, min label len = 1, pinpin coverage:1196
result from 141600 sample, used 98.877352 sec
Load pinyin dict. Max index = 1436.
start to summary the Primewords dataset
checked 50369 wav files:/data/voicerec/Primewords Chinese Corpus Set 1/primewords_md_2018_set1/audio_files/5/57/5732d955-b4f4-41a4-b60f-32b42da573af.wav
max audio len = 320640, max timestamp = (281, 741) ,min audio len = 21120, sample = 16000
checked 50369 label files:/data/voicerec/Primewords Chinese Corpus Set 1/primewords_md_2018_set1/audio_files/5/57/5732d955-b4f4-41a4-b60f-32b42da573af.txt
max label len = 35, min label len = 1, pinpin coverage:1231
result from 50370 sample, used 43.464597 sec
Load pinyin dict. Max index = 1436.
start to summary the ST_CMDS dataset
checked 102572 wav files:/data/voicerec/Free ST Chinese Mandarin Corpus/ST-CMDS-20170001_1-OS/20170001P00085A0053.wav
max audio len = 160416, max timestamp = (281, 371) ,min audio len = 19200, sample = 16000
checked 102572 label files:/data/voicerec/Free ST Chinese Mandarin Corpus/ST-CMDS-20170001_1-OS/20170001P00085A0053.txt
max label len = 22, min label len = 1, pinpin coverage:1194
result from 102573 sample, used 73.52233999999999 sec
Load pinyin dict. Max index = 1436.
start to summary the Z200 dataset
checked 231663 wav files:/data/voicerec/z200/G1428/session01/T0055G1428S0034.wav
max audio len = 348935, max timestamp = (281, 807) ,min audio len = 13811, sample = 16000
checked 231663 label files:/data/voicerec/z200/G1428/session01/T0055G1428S0034.txt
max label len = 43, min label len = 1, pinpin coverage:1182
result from 231664 sample, used 164.35475000000002 sec
```
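The summary reports clip lengths in samples alongside the sampling rate. Reading `audio len` as a sample count and `sample` as the rate in Hz (an interpretation of the log output above, not documented behavior), the longest and shortest clips per dataset convert to seconds like this:

```python
# Assumption: "audio len" in the summary log is a sample count and
# "sample" is the sampling rate in Hz.
SAMPLE_RATE = 16000

# (max audio len, min audio len) copied from the summary output above
summary = {
    "Thchs30": (261000, 71424),
    "AiShell": (235199, 19680),
    "Primewords": (320640, 21120),
    "ST_CMDS": (160416, 19200),
    "Z200": (348935, 13811),
}


def samples_to_seconds(n_samples: int, sample_rate: int = SAMPLE_RATE) -> float:
    """Convert a sample count to a duration in seconds."""
    return n_samples / sample_rate


for name, (max_len, min_len) in summary.items():
    print(f"{name}: max {samples_to_seconds(max_len):.2f} s, "
          f"min {samples_to_seconds(min_len):.2f} s")
```

Under that reading, the longest THCHS30 clip is about 16.3 seconds, which bounds how much padding a batch of raw audio can need.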
### Training
Start training with `run_train.py`:
```bash
python run_train.py
```
[acoustic/README.md](acoustic/README.md)
Example: training DCBNN1D on THCHS30:
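```python
import config
from acoustic.ABCDNN import DCBNN1D
from util.reader import Thchs30

# Load the THCHS30 dataset from the path configured in config.py
thchs = Thchs30(config.thu_datapath)

# Train from scratch
DCBNN1D.train([thchs])

# Alternatively, resume training from a checkpoint saved under config.model_dir
DCBNN1D.train([thchs], config.join_model_path("./DCBNN1D_step_326000.h5"))
```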
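Checkpoints in this project follow a `<Model>_step_<N>.h5` naming pattern (for example `DCBNN1D_step_326000.h5` above). To resume from the most recent one without hard-coding the step, a small helper could pick the file with the largest step. `parse_step` and `latest_checkpoint` are hypothetical names, not part of ASRFrame, and the naming pattern is inferred from the filenames shown here:

```python
import re
from pathlib import Path
from typing import Optional

# Hypothetical helpers (not part of ASRFrame). They assume checkpoints are
# named "<Model>_step_<N>.h5", like "DCBNN1D_step_326000.h5".
STEP_PATTERN = re.compile(r"_step_(\d+)\.h5$")


def parse_step(filename: str) -> Optional[int]:
    """Return the training step encoded in a checkpoint filename, or None."""
    match = STEP_PATTERN.search(filename)
    return int(match.group(1)) if match else None


def latest_checkpoint(model_dir: str, model_name: str) -> Optional[Path]:
    """Return the '<model_name>_step_<N>.h5' file in model_dir with the largest N."""
    candidates = []
    for path in Path(model_dir).glob(f"{model_name}_step_*.h5"):
        step = parse_step(path.name)
        if step is not None:  # skip files like "<model>_step_final.h5"
            candidates.append((step, path))
    if not candidates:
        return None
    return max(candidates, key=lambda c: c[0])[1]
```

Usage might look like `latest_checkpoint(config.model_dir, "DCBNN1D")`, passing `str(...)` of the result where the example above passes `config.join_model_path(...)`, assuming `train` accepts a full path string.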
### Prediction
`real_predict()``run_real_predict.py`
```bash
python run_real_predict.py
```
## Language model
### Datasets
#### wiki
1.04 million entries (1,043,224 entries; 1.6 GB raw, 519 MB compressed; data updated 2019.2.7)
[1. Wikipedia JSON version (wiki2019zh)](https://github.com/brightmart/nlp_chinese_corpus)
### Corpus construction
2019.7.16
wiki:
```bash
python run_build_corpus.py
```
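The internals of `run_build_corpus.py` are not shown here. As a sketch of what turning wiki2019zh into a sentence corpus can involve, assuming each line of the dump is a JSON object with a `text` field (the layout described in brightmart/nlp_chinese_corpus), the following yields one sentence per item by splitting on Chinese sentence-final punctuation. The function name and the minimum-length filter are hypothetical choices:

```python
import json
import re
from typing import Iterable, Iterator

# Hypothetical sketch, not ASRFrame's run_build_corpus.py.
# Zero-width split right after Chinese sentence-final punctuation.
SENTENCE_END = re.compile(r"(?<=[。！？；])")


def sentences_from_jsonl(lines: Iterable[str], min_len: int = 2) -> Iterator[str]:
    """Yield sentences from JSON-lines records that carry a "text" field."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines in the dump
        text = json.loads(line).get("text", "")
        for paragraph in text.splitlines():
            for sentence in SENTENCE_END.split(paragraph):
                sentence = sentence.strip()
                if len(sentence) >= min_len:
                    yield sentence
```

Writing the generator's output to a text file, one sentence per line, gives plain `.txt` files that line-based tools such as `split` can process.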
```bash
cd path/to/wiki_corpus/
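mkdir -p splits
# Split each .txt file into 100,000-line chunks; the relative path is flattened into the prefix
find . -name '*.txt' | while read -r f; do
    echo "$f"
    split -l 100000 "$f" "./splits/$(echo "${f#./}" | tr '/' '_')."
done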
```
`path/to/wiki_corpus/splits`
3000w
### Training
`run_train.py`
###
##
`./jointly/``real_predict()`
## UI
UI`DCSOM`
```bash
python run_ui.py
```
> UI
## Pretrained models
The following models are available in the release:
- DCBNN1D: `DCBNN1D_cur_best.h5`
- SOMMalpha: `SOMMalpha_step_18000.h5`
##
acousticlanguagecompiletrain
- Voiceloader(xs,ys,feature_len,label_len),placeholder
> placeholderlossctcloss
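The loader yields `feature_len` and `label_len`, presumably because the CTC loss needs both lengths. Independent of ASRFrame's code, CTC can only align a label sequence to T output frames if T is at least the number of labels plus one blank for every pair of identical adjacent labels. A small check along those lines (function names are hypothetical); `feature_len` here should be the number of frames the acoustic model actually outputs, after any downsampling:

```python
from typing import Sequence


def min_ctc_frames(labels: Sequence[int]) -> int:
    """Minimum number of output frames CTC needs to emit `labels`.

    Each label takes one frame, and every pair of identical adjacent labels
    needs an extra blank frame between them so they are not merged.
    """
    repeats = sum(1 for prev, cur in zip(labels, labels[1:]) if prev == cur)
    return len(labels) + repeats


def ctc_feasible(feature_len: int, labels: Sequence[int]) -> bool:
    """True if `feature_len` output frames can align to `labels` under CTC."""
    return feature_len >= min_ctc_frames(labels)
```

Filtering out samples that fail this check before training avoids infinite CTC losses from alignments that cannot exist.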
# Project structure
## acoustic
See `README.md` in this directory.
## core
- attention
- base_model
- ctc_function: CTC loss and decoding, wrapped as Keras layers via `Lambda`
- glu
- layer norm
- muti_gpu: multi-GPU support
- positional embedding (for the Transformer)
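On the decoding side of `ctc_function`, the simplest strategy is best-path (greedy) CTC decoding: take the most likely symbol per frame, merge consecutive duplicates, then drop blanks. The sketch below is a stdlib illustration of that technique, not ASRFrame's implementation; the blank index is a parameter because libraries differ on where the blank sits:

```python
from itertools import groupby
from typing import List, Sequence


def ctc_greedy_decode(frame_probs: Sequence[Sequence[float]], blank: int) -> List[int]:
    """Best-path CTC decoding over per-frame class probabilities."""
    # 1. Most likely class index in each frame
    best_path = [max(range(len(p)), key=p.__getitem__) for p in frame_probs]
    # 2. Merge runs of the same index ("1 1 0 1" -> "1 0 1")
    collapsed = [symbol for symbol, _ in groupby(best_path)]
    # 3. Remove blanks; a blank between two equal symbols keeps them distinct
    return [symbol for symbol in collapsed if symbol != blank]
```

Greedy decoding ignores any language model; combining acoustic scores with a language model requires a search such as beam search instead.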
## feature (batch feature extraction)
- MelFeature5: based on [ASRT](https://github.com/nl8590687/ASRT_SpeechRecognition)
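The exact parameters of MelFeature5 are not described here. For orientation, mel features place filters evenly on the mel scale; one common (HTK-style) conversion is mel = 2595 · log10(1 + f / 700). A stdlib sketch that computes filter edge frequencies for 16 kHz audio; the filter count of 26 in the usage note is an arbitrary example, not ASRFrame's setting:

```python
import math
from typing import List


def hz_to_mel(f: float) -> float:
    """HTK-style mel scale: 2595 * log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m: float) -> float:
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_band_edges(n_filters: int, sample_rate: int, f_min: float = 0.0) -> List[float]:
    """Return n_filters + 2 frequencies in Hz, evenly spaced on the mel scale
    from f_min to the Nyquist frequency; triangular filter i spans
    edges[i]..edges[i + 2] and peaks at edges[i + 1]."""
    lo, hi = hz_to_mel(f_min), hz_to_mel(sample_rate / 2)
    step = (hi - lo) / (n_filters + 1)
    return [mel_to_hz(lo + i * step) for i in range(n_filters + 2)]
```

`mel_band_edges(26, 16000)` returns 28 edges from 0 Hz to 8000 Hz, packed densely at low frequencies and spread out at high ones.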
## language
See `README.md` in this directory.
## jointly
See `README.md` in this directory.
## util
- audiotool
- callbacks: Keras callbacks
- dataset
- evaluate
- map: symbol-to-index mappings and batching of index lists
- number_convert...
- reader: data readers based on Keras `Sequence`
- ...
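The `map` utility converts between symbols and indices and batches index lists. A minimal sketch of that idea, with hypothetical names (`Vocab`, `pad_batch`) and the assumption that index 0 is reserved for padding:

```python
from typing import Dict, Iterable, List, Sequence

# Hypothetical sketch of a symbol <-> index mapping; not ASRFrame's util/map.
PAD = "<pad>"


class Vocab:
    """Bidirectional symbol <-> index mapping (e.g. for pinyin syllables)."""

    def __init__(self, symbols: Iterable[str]):
        self.idx2sym: List[str] = [PAD]  # index 0 reserved for padding
        self.sym2idx: Dict[str, int] = {PAD: 0}
        for sym in symbols:
            if sym not in self.sym2idx:  # ignore duplicates
                self.sym2idx[sym] = len(self.idx2sym)
                self.idx2sym.append(sym)

    def encode(self, symbols: Sequence[str]) -> List[int]:
        return [self.sym2idx[s] for s in symbols]

    def decode(self, indices: Sequence[int]) -> List[str]:
        return [self.idx2sym[i] for i in indices if i != 0]  # drop padding


def pad_batch(index_lists: Sequence[Sequence[int]], pad_value: int = 0) -> List[List[int]]:
    """Right-pad index lists to the length of the longest one."""
    max_len = max((len(x) for x in index_lists), default=0)
    return [list(x) + [pad_value] * (max_len - len(x)) for x in index_lists]
```

Reserving index 0 for padding keeps padded positions distinguishable from real symbols; the true lengths should still be tracked separately (as `label_len` is) so the loss ignores the padding.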
## visualization (UI)
-
#

## [acoustic/README.md](acoustic/README.md)
## [language/README.md](language/README.md)
# References
## GitHub projects
- https://github.com/pwxcoo/chinese-xinhua
- https://github.com/mozillazg/pinyin-data
- https://github.com/mozillazg/phrase-pinyin-data
- https://github.com/SophonPlus/ChineseNlpCorpus
- https://github.com/crownpku/Awesome-Chinese-NLP
- https://github.com/brightmart/nlp_chinese_corpus
- https://github.com/mozillazg/python-pinyin
- https://github.com/shibing624/pycorrector
- pyime: https://github.com/fxsjy/pyime
- https://github.com/crownpku/Somiao-Pinyin
- https://github.com/letiantian/Pinyin2Hanzi
- https://github.com/libai3/masr
- https://github.com/xxbb1234021/speech_recognition
- https://github.com/nl8590687/ASRT_SpeechRecognition
- https://github.com/Deeperjia/tensorflow-wavenet
## Papers
- Language Modeling with Gated Convolutional Networks: https://arxiv.org/abs/1612.08083
- Attention Is All You Need: https://arxiv.org/abs/1706.03762
- Highway Networks: https://arxiv.org/abs/1505.00387
- Fast and Accurate Entity Recognition with Iterated Dilated Convolutions: https://arxiv.org/abs/1702.02098
- Connectionist Temporal Classification: Labelling Unsegmented Sequence Data with Recurrent Neural Networks: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.75.6306&rep=rep1&type=pdf
- Listen, Attend and Spell: https://arxiv.org/abs/1508.01211
- WaveNet: A Generative Model for Raw Audio: https://arxiv.org/abs/1609.03499
#
##
5thchs30555'de''de5'
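This note appears to concern the neutral tone, which THCHS30 transcripts write with tone number 5 (for example `de5`), whereas pinyin from other sources may leave the syllable bare (`de`). Assuming labels use tone-number pinyin, a hypothetical normalizer that makes the two consistent:

```python
import re
from typing import List, Sequence

# Hypothetical helper: THCHS30-style labels mark the neutral tone as 5,
# so a bare syllable such as "de" is rewritten to "de5".
HAS_TONE = re.compile(r"[1-5]$")


def add_neutral_tone(syllables: Sequence[str]) -> List[str]:
    """Append '5' to every pinyin syllable that lacks a trailing tone digit."""
    return [s if HAS_TONE.search(s) else s + "5" for s in syllables]
```

Applying one such normalization to every dataset before building the dictionary keeps `de` and `de5` from becoming two separate entries.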
##
201971750
## SOMM
Training SOMM crashed with a core dump at around 50000 batches.
The corpus was therefore split with the Linux `split` command.
## CTC loss
...
# TODO list
-
-
-
- TextLoader
- UI
-
- loss
#
2019.5.22, 2019.6.19, 2019.7.13, TODO list
Python
Owner
- Name: Dr. Artificial曾小健
- Login: ArtificialZeng
- Kind: user
- Location: Beijing
- Website: https://blog.csdn.net/sinat_37574187?type=blog
- Repositories: 171
- Profile: https://github.com/ArtificialZeng
LLM practitioner/engineer, AI/ML/DL Quant