https://github.com/artificialzeng/asrframe

An Automatic Speech Recognition Frame: a complete framework for Chinese speech recognition that provides multiple models.


Science Score: 10.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
  • .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: arxiv.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (4.4%) to scientific vocabulary
Last synced: 10 months ago

Repository


Basic Info
  • Host: GitHub
  • Owner: ArtificialZeng
  • License: apache-2.0
  • Default Branch: master
  • Homepage:
  • Size: 8.96 MB
Statistics
  • Stars: 0
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Fork of sailist/ASRFrame
Created about 6 years ago · Last pushed over 6 years ago

https://github.com/ArtificialZeng/ASRFrame/blob/master/

# ASRFrame
- 10
- ...

# 

201987

201987`./util/dicts/errdict.json`

2019821

2020123

# 
https://github.com/sailist/ASRFrame

UI



# 
- 
- 10
- 
- 

# 
- 80%100%100%
- 

# 
Python release

## 
Python
- Distance (>=0.1.3)
- jieba (>=0.39)
- Keras (>=2.2.4)
- librosa (>=0.6.3)
- numpy (>=1.16.2)
- pypinyin (>=0.35.3)
- python-speech-features (>=0.6)
- scipy (>=1.2.1)
- tensorflow (>=1.13.1)
- thulac (>=0.2.0)
- pydub (>=0.23.1)

## 
```bash
pip install -r requirement.txt
```
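After installing, the minimum versions listed above can be checked programmatically. This is a minimal sketch, not part of the repository: `check_requirements`, `parse_version` and `at_least` are hypothetical helpers, the code relies on `importlib.metadata` (Python 3.8+), and it assumes the installed distribution names match the names in the list.

```python
import re
from importlib.metadata import PackageNotFoundError, version

# Minimum versions copied from the dependency list above
REQUIREMENTS = {
    "Distance": "0.1.3",
    "jieba": "0.39",
    "Keras": "2.2.4",
    "librosa": "0.6.3",
    "numpy": "1.16.2",
    "pypinyin": "0.35.3",
    "python-speech-features": "0.6",
    "scipy": "1.2.1",
    "tensorflow": "1.13.1",
    "thulac": "0.2.0",
    "pydub": "0.23.1",
}


def parse_version(text):
    """Turn '1.16.2' or '2.2.4rc1' into a tuple of ints, keeping only leading digits of each part."""
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def at_least(installed, minimum):
    """Compare two version strings after zero-padding them to the same length."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_requirements(requirements, get_version=version):
    """Return human-readable problems; an empty list means every minimum is met."""
    problems = []
    for name, minimum in requirements.items():
        try:
            installed = get_version(name)
        except PackageNotFoundError:
            problems.append(f"{name}: not installed (need >= {minimum})")
            continue
        if not at_least(installed, minimum):
            problems.append(f"{name}: {installed} installed, need >= {minimum}")
    return problems


if __name__ == "__main__":
    for problem in check_requirements(REQUIREMENTS):
        print(problem)
```

Zero-padding matters because Python compares `(0, 6) < (0, 6, 0)` as true, which would otherwise reject `0.6` against a `0.6.0` minimum.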

## 


### 


#### THCHS30
40CSLT

[data_thchs30.tgz](https://openslr.org/18/)

#### Free ST Chinese Mandarin Corpus
10100855

[ST-CMDS-20170001_1-OS.tar.gz](https://openslr.org/38/)
#### AISHELL
17840095

[data_aishell.tgz](https://openslr.org/33/)

#### Primewords Chinese Corpus Set 1
1002969895JSON

[primewords_md_2018_set1.tar.gz](https://openslr.org/47/)

#### Aidatatang_200zh
200()Android16kHz16iOS16kHz1698

[aidatatang_200zh.tgz](https://openslr.org/62/)
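
The downloaded archives have to be unpacked before their paths can be set in `config.py`. A minimal sketch using the standard-library `tarfile` module, assuming all five archives sit in one download directory; `extract_archives` and that directory layout are hypothetical, not taken from this repository. Only extract archives from sources you trust.

```python
import tarfile
from pathlib import Path

# Archive file names as linked above
ARCHIVES = [
    "data_thchs30.tgz",
    "ST-CMDS-20170001_1-OS.tar.gz",
    "data_aishell.tgz",
    "primewords_md_2018_set1.tar.gz",
    "aidatatang_200zh.tgz",
]


def extract_archives(download_dir, target_dir, names=ARCHIVES):
    """Extract every archive found in download_dir into target_dir; return the names not found."""
    download_dir, target_dir = Path(download_dir), Path(target_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    missing = []
    for name in names:
        archive = download_dir / name
        if not archive.is_file():
            missing.append(name)
            continue
        # "r:*" lets tarfile detect the gzip compression itself
        with tarfile.open(str(archive), "r:*") as tar:
            if hasattr(tarfile, "data_filter"):
                # Newer Pythons can reject unsafe members (absolute paths, links escaping target_dir)
                tar.extractall(target_dir, filter="data")
            else:
                tar.extractall(target_dir)
    return missing
```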


### 
`config.py`
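
Only three names from `config.py` appear in this README: `config.thu_datapath`, `config.model_dir` and `config.join_model_path(...)`, all used in the training example. A hypothetical sketch of what those entries might look like, assuming `join_model_path` resolves a checkpoint file name relative to `model_dir`; the values are placeholders, and the real file likely defines further dataset paths whose names are not shown here.

```python
import os

# Hypothetical placeholder: point this at the extracted THCHS30 directory
thu_datapath = "path/to/data_thchs30"

# Hypothetical placeholder: directory where .h5 checkpoints are stored
model_dir = "./model"


def join_model_path(name):
    """Resolve a checkpoint file name relative to model_dir (assumed behavior)."""
    return os.path.normpath(os.path.join(model_dir, name))
```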

### 

- wavwavaishell
- 
- 



50



```bash
python run_clean.py
```

`run_create_dict.py`

> PS1: pypinyin

> PS2:
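
The summary log reports a pinyin dictionary (`Load pinyin dict. Max index = 1436.`). A minimal sketch of how such a syllable-to-index dictionary could be built from space-separated, tone-numbered pinyin labels; `build_pinyin_dict`, `encode`, and the choice to leave index 0 free for padding are assumptions, not the behavior of `run_create_dict.py`.

```python
from collections import Counter


def build_pinyin_dict(label_lines, min_count=1):
    """Map each tone-numbered pinyin syllable (e.g. 'hao3') to an integer index.

    Syllables are sorted so the mapping is reproducible; index 0 is left free
    (a hypothetical choice) for padding.
    """
    counts = Counter(syl for line in label_lines for syl in line.split())
    vocab = sorted(syl for syl, n in counts.items() if n >= min_count)
    return {syl: i for i, syl in enumerate(vocab, start=1)}


def encode(line, pinyin2index):
    """Convert a pinyin label line to indices, skipping syllables missing from the dictionary."""
    return [pinyin2index[syl] for syl in line.split() if syl in pinyin2index]
```

For example, `build_pinyin_dict(["ni3 hao3", "hao3 de5"])` yields `{"de5": 1, "hao3": 2, "ni3": 3}`.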


### 
```bash
python run_summary.py
```

```text
start to summary the Thchs30 dataset
checked 13375 wav files:/data/voicerec/dataset/dataset/thchs30-openslr/data_thchs30/data/D6_938.wavv
max audio len = 261000, max timestamp = (281, 603) ,min audio len = 71424, sample = 16000
checked 13375 label files:/data/voicerec/dataset/dataset/thchs30-openslr/data_thchs30/data/D6_938.wav.trnn
max label len = 48, min label len = 19, pinpin coverage:1208
result from 13376 sample, used 3.7486759999999997 sec
Load pinyin dict. Max index = 1436.

start to summary the AiShell dataset
checked 141599 wav files:/data/voicerec/ALShell-1/data_aishell/wav/train/S0003/BAC009S0003W0427.wav
max audio len = 235199, max timestamp = (281, 544) ,min audio len = 19680, sample = 16000
checked 141599 label files:/data/voicerec/ALShell-1/data_aishell/wav/train/S0003/BAC009S0003W0427.txt
max label len = 44, min label len = 1, pinpin coverage:1196
result from 141600 sample, used 98.877352 sec
Load pinyin dict. Max index = 1436.

start to summary the Primewords dataset
checked 50369 wav files:/data/voicerec/Primewords Chinese Corpus Set 1/primewords_md_2018_set1/audio_files/5/57/5732d955-b4f4-41a4-b60f-32b42da573af.wav
max audio len = 320640, max timestamp = (281, 741) ,min audio len = 21120, sample = 16000
checked 50369 label files:/data/voicerec/Primewords Chinese Corpus Set 1/primewords_md_2018_set1/audio_files/5/57/5732d955-b4f4-41a4-b60f-32b42da573af.txt
max label len = 35, min label len = 1, pinpin coverage:1231
result from 50370 sample, used 43.464597 sec
Load pinyin dict. Max index = 1436.

start to summary the ST_CMDS dataset
checked 102572 wav files:/data/voicerec/Free ST Chinese Mandarin Corpus/ST-CMDS-20170001_1-OS/20170001P00085A0053.wav
max audio len = 160416, max timestamp = (281, 371) ,min audio len = 19200, sample = 16000
checked 102572 label files:/data/voicerec/Free ST Chinese Mandarin Corpus/ST-CMDS-20170001_1-OS/20170001P00085A0053.txt
max label len = 22, min label len = 1, pinpin coverage:1194
result from 102573 sample, used 73.52233999999999 sec
Load pinyin dict. Max index = 1436.

start to summary the Z200 dataset
checked 231663 wav files:/data/voicerec/z200/G1428/session01/T0055G1428S0034.wav
max audio len = 348935, max timestamp = (281, 807) ,min audio len = 13811, sample = 16000
checked 231663 label files:/data/voicerec/z200/G1428/session01/T0055G1428S0034.txt
max label len = 43, min label len = 1, pinpin coverage:1182
result from 231664 sample, used 164.35475000000002 sec
```
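In this output, `max audio len` appears to be measured in samples: at `sample = 16000`, 261000 samples is about 16.3 seconds. A minimal sketch of a similar per-dataset summary using the standard-library `wave` module, which reads only uncompressed PCM WAV files; `summarize_wavs` is hypothetical and does not reproduce the `max timestamp` or label statistics of `run_summary.py`.

```python
import wave
from pathlib import Path


def summarize_wavs(root):
    """Report count, min/max length in samples, and sample rates of all .wav files under root."""
    lengths, rates = [], set()
    for path in sorted(Path(root).rglob("*.wav")):
        with wave.open(str(path), "rb") as w:
            lengths.append(w.getnframes())  # frames = samples per channel
            rates.add(w.getframerate())
    if not lengths:
        return {"count": 0}
    return {
        "count": len(lengths),
        "max_len": max(lengths),
        "min_len": min(lengths),
        "sample_rates": sorted(rates),
    }
```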

### 
`run_train.py`
```bash
python run_train.py
```
[acoustic/README.md](acoustic/README.md)

DCBNN1D
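```python
import config
from acoustic.ABCDNN import DCBNN1D
from util.reader import Thchs30

# Reader for the THCHS30 dataset, located via the path set in config.py
thchs = Thchs30(config.thu_datapath)

# Train DCBNN1D from scratch on the listed datasets
DCBNN1D.train([thchs])

# Alternatively, resume from a checkpoint saved under config.model_dir
DCBNN1D.train([thchs], config.join_model_path("./DCBNN1D_step_326000.h5"))
```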

### 
`real_predict()`, `run_real_predict.py`
```bash
python run_real_predict.py
```


## 
### 
#### wiki
About 1.04 million entries (1,043,224; raw size 1.6G, compressed 519M; data updated 2019.2.7)

[1. Wikipedia JSON edition (wiki2019zh)](https://github.com/brightmart/nlp_chinese_corpus)


### 
2019716

wiki:
>
```bash
python run_build_corpus.py
```
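wiki2019zh stores one JSON object per line, with the article body in a `text` field. A minimal sketch of turning that into a sentence corpus, assuming sentences end with `。`, `！` or `？` and that only sentences made purely of CJK ideographs are kept; `iter_sentences` and these filtering rules are illustrative assumptions, not what `run_build_corpus.py` does.

```python
import json
import re

# Split point after each Chinese sentence-ending mark (an assumption about sentence boundaries)
SENTENCE_END = re.compile(r"(?<=[。！？])")
# A kept sentence must consist only of CJK Unified Ideographs (U+4E00 to U+9FFF)
HANZI_ONLY = re.compile(r"[\u4e00-\u9fff]+")


def iter_sentences(lines, min_chars=5):
    """Yield punctuation-free sentences from wiki2019zh-style JSON lines."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        text = json.loads(line).get("text", "")
        for paragraph in text.split("\n"):
            for sentence in SENTENCE_END.split(paragraph):
                # Drop surrounding whitespace and the trailing end mark
                sentence = sentence.strip().rstrip("。！？")
                if len(sentence) >= min_chars and HANZI_ONLY.fullmatch(sentence):
                    yield sentence
```

Splitting on a zero-width lookbehind requires Python 3.7 or later.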


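```bash
cd path/to/wiki_corpus/
mkdir -p splits
# Split each .txt file into 100000-line chunks; "/" in nested paths becomes "_" so output stays inside ./splits
for i in $(find . -name '*.txt' -not -path './splits/*'); do echo "$i"; split -l 100000 "$i" "./splits/$(echo "${i#./}" | tr '/' '_')"; done
```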
`path/to/wiki_corpus/splits`

3000w

### 
`run_train.py`

### 


## 
`./jointly/`, `real_predict()`

## UI
UI`DCSOM`
```bash
python run_ui.py
```
> UI


## 
release

- DCBNN1D,`DCBNN1D_cur_best.h5`
- SOMMalpha,`SOMMalpha_step_18000.h5`


## 
acousticlanguagecompiletrain


- Voiceloader(xs,ys,feature_len,label_len),placeholder
> placeholderlossctcloss
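
The note above describes a loader that yields `(xs, ys, feature_len, label_len)` together with a placeholder target, because the CTC loss is computed inside the model. A minimal NumPy sketch of packing one such batch; `make_ctc_batch`, the zero padding, and the `(batch, 1)` length arrays (the shape `keras.backend.ctc_batch_cost` expects for `input_length` and `label_length`) are assumptions rather than the repository's `VoiceLoader`.

```python
import numpy as np


def make_ctc_batch(features, labels, pad_label=0):
    """Pack variable-length samples into ([xs, ys, feature_len, label_len], placeholder).

    features: list of (time, feature_dim) arrays; labels: list of index lists.
    """
    batch = len(features)
    max_t = max(f.shape[0] for f in features)
    dim = features[0].shape[1]
    max_l = max(len(l) for l in labels)
    xs = np.zeros((batch, max_t, dim), dtype=np.float32)        # zero-padded features
    ys = np.full((batch, max_l), pad_label, dtype=np.int32)     # padded label indices
    feature_len = np.zeros((batch, 1), dtype=np.int32)
    label_len = np.zeros((batch, 1), dtype=np.int32)
    for i, (f, l) in enumerate(zip(features, labels)):
        xs[i, : f.shape[0]] = f
        ys[i, : len(l)] = l
        feature_len[i, 0] = f.shape[0]
        label_len[i, 0] = len(l)
    # Dummy target: the real CTC loss is produced inside the model graph
    placeholder = np.zeros((batch, 1), dtype=np.float32)
    return [xs, ys, feature_len, label_len], placeholder
```

With this layout the model's output already is the loss, so a common Keras pattern is to compile with a loss function that simply returns `y_pred`. If the acoustic model downsamples in time (for example with pooling), `feature_len` should hold the number of output frames rather than input frames.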



# 

## acoustic
`README.md`



## core
- attention
- base_model
- ctc_function: loss, decode, Keras layer, `Lambda`
- glu
- layer norm
- muti_gpu: gpu
- positional embedding: Transformer
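
`ctc_function` appears to cover CTC loss and decoding. To illustrate the decoding side, here is best-path (greedy) CTC decoding in plain Python: take the most likely class per frame, merge consecutive repeats, then drop blanks. `ctc_greedy_decode` is a hypothetical helper, not the repository's function; Keras and TensorFlow CTC utilities treat the last class index as the blank, so the sketch takes `blank` as a parameter.

```python
def ctc_greedy_decode(probs, blank):
    """Best-path CTC decoding.

    probs: per-frame class scores, shape (time, classes), as nested sequences.
    Returns the decoded label indices.
    """
    # Most likely class for every frame
    best = [max(range(len(frame)), key=frame.__getitem__) for frame in probs]
    decoded, prev = [], None
    for idx in best:
        # Keep a label only when it differs from the previous frame and is not blank;
        # a blank between two identical labels therefore keeps both
        if idx != prev and idx != blank:
            decoded.append(idx)
        prev = idx
    return decoded
```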

## featurebatch
- MelFeature5[ASRT](https://github.com/nl8590687/ASRT_SpeechRecognition)
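
The parameters of `MelFeature5` are not given here. As a generic illustration of frame-level feature extraction, a minimal NumPy sketch of a log-magnitude spectrogram with a 25 ms Hamming window and a 10 ms hop; `log_spectrogram` and all of its defaults are assumptions, and it applies no mel filterbank, so it is not `MelFeature5`.

```python
import numpy as np


def log_spectrogram(signal, sample_rate=16000, win_ms=25, hop_ms=10):
    """Frame the signal, apply a Hamming window, and return log(1 + |rFFT|) per frame."""
    win = int(sample_rate * win_ms / 1000)  # 400 samples at 16 kHz
    hop = int(sample_rate * hop_ms / 1000)  # 160 samples at 16 kHz
    signal = np.asarray(signal, dtype=np.float64)
    if len(signal) < win:
        signal = np.pad(signal, (0, win - len(signal)))
    n_frames = 1 + (len(signal) - win) // hop
    # Index matrix of shape (n_frames, win): row k holds the sample indices of frame k
    idx = np.arange(win)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(win)
    # Shape (n_frames, win // 2 + 1)
    return np.log1p(np.abs(np.fft.rfft(frames, axis=1)))
```

One second of 16 kHz audio gives 98 frames of 201 frequency bins with these defaults.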

## language
`README.md`

## jointly
`README.md`

## util
- audiotool
- callbackskeras
- dataset
- evaluate
- mapmap-index-index-indexlistbatch
- number_convert...
- readerkerasSequence
- ...

## visualizationUI
- 



# 


![image/ui.png](image/ui.png)

## [acoustic/README.md](acoustic/README.md)
## [language/README.md](language/README.md)


# 

## github
- https://github.com/pwxcoo/chinese-xinhua
- https://github.com/mozillazg/pinyin-data
- https://github.com/mozillazg/phrase-pinyin-data
- https://github.com/SophonPlus/ChineseNlpCorpus
- https://github.com/crownpku/Awesome-Chinese-NLP
- https://github.com/brightmart/nlp_chinese_corpus

- https://github.com/mozillazg/python-pinyin

- https://github.com/shibing624/pycorrector
- pyime: https://github.com/fxsjy/pyime
- https://github.com/crownpku/Somiao-Pinyin
- https://github.com/letiantian/Pinyin2Hanzi

- https://github.com/libai3/masr
- https://github.com/xxbb1234021/speech_recognition
- https://github.com/nl8590687/ASRT_SpeechRecognition
- https://github.com/Deeperjia/tensorflow-wavenet

## 
- Language Modeling with Gated Convolutional Networks: https://arxiv.org/abs/1612.08083
- Attention Is All You Need: https://arxiv.org/abs/1706.03762
- Highway Networks: https://arxiv.org/abs/1505.00387
- Fast and Accurate Entity Recognition with Iterated Dilated Convolutions: https://arxiv.org/abs/1702.02098
- Connectionist Temporal Classification: Labelling Unsegmented Sequence Data with Recurrent Neural Networks: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.75.6306&rep=rep1&type=pdf
- Listen, Attend and Spell: https://arxiv.org/abs/1508.01211
- WaveNet: A Generative Model for Raw Audio: https://arxiv.org/abs/1609.03499

# 
## 
5thchs30555'de''de5'
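
The note above contrasts `'de'` with `'de5'`, which suggests that neutral-tone syllables in THCHS30 labels are not always written with the tone digit 5. Assuming that reading, a minimal sketch that normalizes such syllables so both spellings map to one label; `normalize_neutral_tone` is hypothetical.

```python
import re

# A syllable counts as toned if it ends in a tone digit 1-5
TONE_DIGIT = re.compile(r"[1-5]$")


def normalize_neutral_tone(syllables):
    """Append '5' to syllables without a tone digit, so 'de' and 'de5' become the same label."""
    return [s if TONE_DIGIT.search(s) else s + "5" for s in syllables]
```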

## 
201971750


## SOMM
SOMM50000batchcore dump



linuxsplit
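
One way to keep memory bounded when processing a large text corpus, in the spirit of the Linux `split` workaround mentioned here, is to read it lazily in fixed-size chunks. A minimal sketch; `iter_line_chunks` is hypothetical, and its default of 100000 lines mirrors the `split` command used for the wiki corpus.

```python
from itertools import islice


def iter_line_chunks(path, lines_per_chunk=100000, encoding="utf-8"):
    """Yield lists of at most lines_per_chunk lines (newline stripped), reading the file lazily."""
    with open(path, encoding=encoding) as f:
        while True:
            # islice pulls only the next lines_per_chunk lines from the open file
            chunk = [line.rstrip("\n") for line in islice(f, lines_per_chunk)]
            if not chunk:
                return
            yield chunk
```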


## CTCloss
...

# TODO list
- 
- 
- 
- TextLoader
- UI
- 
- loss


# 
201952220196192019713TODO list



Python

Owner

  • Name: Dr. Artificial曾小健
  • Login: ArtificialZeng
  • Kind: user
  • Location: Beijing

LLM practitioner/engineer, AI/ML/DL Quant
