https://github.com/bestsongc/micronet
micronet, a model compression and deployment library. Compression: 1) quantization: quantization-aware training (QAT), High-Bit (>2b) (DoReFa / Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference), Low-Bit (≤2b) / Ternary and Binary (TWN/BNN/XNOR-Net); post-training quantization (PTQ), 8-bit (TensorRT); 2) pruning: normal, regula
Basic Info
Statistics
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
- Releases: 0
https://github.com/Bestsongc/micronet/blob/master/
# micronet
## Introduction
[PyPI](https://pypi.org/project/micronet)
*micronet, a model compression and deploy lib.*
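### Compression
- Quantization: High-Bit (>2b): QAT, PTQ, QAFT; Low-Bit (≤2b) / Ternary and Binary: QAT
- Pruning: normal, regular, and grouped-convolution pruning
- BN fusion for binary activation (A) quantization (BN parameters > conv bias b)
- BN fusion for High-Bit quantization (BN parameters > conv weight w and bias b)
### Deployment
- TensorRT (fp32/fp16/int8 (PTQ calibration), op-adapt (upsample), dynamic_shape)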
## Code structure

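```
micronet
├── __init__.py
├── base_module
│   ├── __init__.py
│   └── op.py
├── compression
│   ├── README.md
│   ├── __init__.py
│   ├── pruning
│   │   ├── README.md
│   │   ├── __init__.py
│   │   ├── gc_prune.py
│   │   ├── main.py
│   │   ├── models_save
│   │   │   └── models_save.txt
│   │   └── normal_regular_prune.py
│   └── quantization
│       ├── README.md
│       ├── __init__.py
│       ├── wbwtab
│       │   ├── __init__.py
│       │   ├── bn_fuse
│       │   │   ├── bn_fuse.py
│       │   │   ├── bn_fused_model_test.py
│       │   │   └── models_save
│       │   │       └── models_save.txt
│       │   ├── main.py
│       │   ├── models_save
│       │   │   └── models_save.txt
│       │   └── quantize.py
│       └── wqaq
│           ├── __init__.py
│           ├── dorefa
│           │   ├── __init__.py
│           │   ├── main.py
│           │   ├── models_save
│           │   │   └── models_save.txt
│           │   ├── quant_model_test
│           │   │   ├── models_save
│           │   │   │   └── models_save.txt
│           │   │   ├── quant_model_para.py
│           │   │   └── quant_model_test.py
│           │   └── quantize.py
│           └── iao
│               ├── __init__.py
│               ├── bn_fuse
│               │   ├── bn_fuse.py
│               │   ├── bn_fused_model_test.py
│               │   └── models_save
│               │       └── models_save.txt
│               ├── main.py
│               ├── models_save
│               │   └── models_save.txt
│               └── quantize.py
├── data
│   └── data.txt
├── deploy
│   ├── README.md
│   ├── __init__.py
│   └── tensorrt
│       ├── README.md
│       ├── __init__.py
│       ├── calibrator.py
│       ├── eval_trt.py
│       ├── models
│       │   ├── __init__.py
│       │   └── models_trt.py
│       ├── models_save
│       │   └── calibration_seg.cache
│       ├── test_trt.py
│       └── util_trt.py
├── models
│   ├── __init__.py
│   ├── nin.py
│   ├── nin_gc.py
│   └── resnet.py
└── readme_imgs
    ├── code_structure.jpg
    └── micronet.xmind
```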
## Changelog
- **2019.12.4**,
- **12.8**, DoReFa(A)(* 0.1)
- **12.11**,
- 12.12,
- 12.14, :1BN(W/)W/; 2BN(conv)(bias)
- **12.17**, ()
- 12.20, (cpugpu())
- **12.27**,
- 12.29, High-Bit8-bit10-bit16-bit
- **2020.2.17**, 1W/; 2W
- **2.18**, (A)BN:BNgammaBN
- **2.24**, /Convcompression/quantization/wbwtab/models/util_wbwtab.pyQuantConv2dnin_gc.py
- **3.1**, 1googleHigh-Bit; 2High-BitBN
- **3.23.3**, Conv(FCdorefa)models/util_wxax.pyQuantConv2d(QuantLinear)nin_gc.py
- **3.4**, wbwtab/bn_fuse(A)BNBN(//())
- 3.11, compression/wqaq/iaoBNmomentum(0.1 > 0.01),batch,,,acc1%
- **3.13**,
- 4.6, W_clip()(models/util_wxax.py)
- **12.14**, 1) improve code structure; 2) add deploy-tensorrt (main module, but not running yet)
- 12.18, 1) improve code structure/module reference/module_name; 2) add transfer-use demo
- **12.21**, improve pruning-quantization pipeline and code
- **2021.1.4**, add other quant_op
- 1.5, add quant_weight's per-channel and per-layer selection
- **1.7**, fix iao's loss-nan bug. The bug is due to per-channel min/max error
- 1.8, 1) improve quant_para save (now only scale and zero_point are saved); 2) add optional weight_observer (MinMaxObserver or MovingAverageMinMaxObserver)
- **1.11**, fix bug in binary_a(1/0) and binary_w preprocessing
- **1.12**, add "pip install"
- **1.22**, add auto_insert_quant_op(this still needs to be improved)
- **1.27**, improve auto_insert_quant_op(now you can easily use quantization, as [quant_test_auto](#quant_test_auto.py))
- 1.28, 1) fix prune-quantization pipeline and code; 2) improve code structure
- **2.1**, improve wbwtab_bn_fuse
- **2.4**, 1) add wqaq_bn_fuse; 2) add quant_model_inference_simulation; 3) improve code format
- 4.30, 1) update code_structure img; 2) fix iao's quant_weight_range, quant_contrans and quant_bn_fuse_conv pretrained_model bn_para load bug
- **5.4**, add **qaft**, it's beneficial to improve the quantization accuracy
- **5.6**, add **ptq**, its quantization accuracy is also good
- 5.11, add bn_fuse_calib flag
- **5.14**, 1) change **ste** to **clip_ste**, which helps quantized training; 2) remove quant_relu and add quant_leaky_relu
- 5.15, fix bug in quant_model_para post-processing
- **6.7**, add quant_add(need use base_module's op) and quant_resnet demo
- **6.9**, iao_quant supports multi gpus
- 6.16, fix quant_round() and quant_binary()
- 10.6, format
## Requirements
- python >= 3.5
- torch >= 1.1.0
- torchvision >= 0.3.0
- numpy
- onnx == 1.6.0
- tensorrt == 7.0.0.11
## Installation
[PyPI](https://pypi.org/project/micronet/)
```bash
pip install micronet -i https://pypi.org/simple
```
[GitHub](https://github.com/666DZY666/micronet)
```bash
git clone https://github.com/666DZY666/micronet.git
cd micronet
python setup.py install
```
*Verify*
```bash
python -c "import micronet; print(micronet.__version__)"
```
## Usage
*Install from github*
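### Compression
#### Quantization
*--refine loads pretrained floating-point model parameters, and quantization starts from them*
##### wbwtab
--W and --A set the quantization values for weights (W) and activations (A)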
```bash
cd micronet/compression/quantization/wbwtab
```
- WbAb
```bash
python main.py --W 2 --A 2
```
- WbA32
```bash
python main.py --W 2 --A 32
```
- WtAb
```bash
python main.py --W 3 --A 2
```
- WtA32
```bash
python main.py --W 3 --A 32
```
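As a rough illustration of what the binary (`--W 2`) and ternary (`--W 3`) weight settings above stand for, the sketch below binarizes weights with a mean-absolute-value scale (in the spirit of XNOR-Net) and ternarizes them with a 0.7 × mean(|w|) threshold (in the spirit of TWN). The function names and the plain-list representation are assumptions made for illustration; this is not the code in `quantize.py`.

```python
def binarize(weights):
    """Map every weight to +alpha or -alpha, with alpha = mean(|w|)."""
    alpha = sum(abs(w) for w in weights) / len(weights)
    return [alpha if w >= 0 else -alpha for w in weights]


def ternarize(weights, delta_factor=0.7):
    """Map weights to {-alpha, 0, +alpha} using a threshold delta."""
    # delta = 0.7 * mean(|w|) is the approximation suggested in the TWN paper.
    delta = delta_factor * sum(abs(w) for w in weights) / len(weights)
    kept = [abs(w) for w in weights if abs(w) > delta]
    # alpha is the mean magnitude of the weights that survive the threshold.
    alpha = sum(kept) / len(kept) if kept else 0.0
    return [alpha if w > delta else -alpha if w < -delta else 0.0 for w in weights]


w = [0.9, -0.1, 0.4, -0.6]
print(binarize(w))   # two distinct values
print(ternarize(w))  # three distinct values, small weights become 0
```

Binary weights keep only the sign, so one bit per weight is enough; ternary weights add an explicit zero, which is why the results table lists them as 1.5 bits.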
##### wqaq
--w_bits and --a_bits set the quantization bits for weights (W) and activations (A)
###### dorefa
```bash
cd micronet/compression/quantization/wqaq/dorefa
```
- W16A16
```bash
python main.py --w_bits 16 --a_bits 16
```
- W8A8
```bash
python main.py --w_bits 8 --a_bits 8
```
- W4A4
```bash
python main.py --w_bits 4 --a_bits 4
```
- Other bit widths are analogous
###### iao
```bash
cd micronet/compression/quantization/wqaq/iao
```
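*Bit-width selection is the same as for dorefa*
**QAT/PTQ > QAFT**
**! Note: run QAFT only after QAT/PTQ !**
--q_type, quantization type (0-symmetric, 1-asymmetric)
--q_level, weight quantization level (0-per-channel, 1-per-layer)
--weight_observer, weight_observer selection (0-MinMaxObserver, 1-MovingAverageMinMaxObserver)
--bn_fuse, flag for BN fusion during quantization
--bn_fuse_calib, flag for BN-fusion calibration during quantization
--pretrained_model, flag for a pretrained floating-point model
--qaft, qaft flag
--ptq, ptq_observer flag
--ptq_control, ptq_control flag
--ptq_batch, number of batches used for ptq
--percentile, percentile used for ptq calibration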
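To make the `--q_type` and `--q_level` choices more concrete, here is a minimal sketch of how scale and zero_point are commonly derived for symmetric versus asymmetric quantization, and what per-channel versus per-layer means for a weight tensor. It assumes a signed range for the symmetric case and an unsigned range for the asymmetric case; the function names are hypothetical and this is not the repository's implementation.

```python
def qparams(values, bits=8, symmetric=True):
    """Return (scale, zero_point, qmin, qmax) for a list of floats."""
    lo, hi = min(values), max(values)
    if symmetric:
        qmax = 2 ** (bits - 1) - 1
        qmin = -qmax
        scale = (max(abs(lo), abs(hi)) / qmax) or 1.0
        zero_point = 0  # symmetric ranges are centred on zero
    else:
        qmin, qmax = 0, 2 ** bits - 1
        lo, hi = min(lo, 0.0), max(hi, 0.0)  # keep 0.0 exactly representable
        scale = ((hi - lo) / (qmax - qmin)) or 1.0
        zero_point = round(-lo / scale)
    return scale, zero_point, qmin, qmax


def fake_quant(x, scale, zero_point, qmin, qmax):
    """Quantize x to an integer level, then map it back to a float."""
    q = min(max(round(x / scale) + zero_point, qmin), qmax)
    return (q - zero_point) * scale


weight = [[0.5, -1.0], [0.02, 0.01]]  # two output channels
per_layer = qparams([w for row in weight for w in row])  # one scale for all
per_channel = [qparams(row) for row in weight]           # one scale per channel
print(per_layer[0], [p[0] for p in per_channel])
print(fake_quant(0.01, *per_layer), fake_quant(0.01, *per_channel[1]))
```

The second channel has much smaller weights than the first, so a single per-layer scale represents it coarsely, while a per-channel scale preserves it better.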
**QAT**
- Default: symmetric, per-channel (weight) quantization, BN not fused, weight_observer-MinMaxObserver, no pretrained floating-point model, qat
```bash
python main.py --q_type 0 --q_level 0 --weight_observer 0
```
- Symmetric, per-channel (weight) quantization, BN not fused, weight_observer-MovingAverageMinMaxObserver
```bash
python main.py --q_type 0 --q_level 0 --weight_observer 1
```
- Symmetric, per-layer (weight) quantization, BN not fused
```bash
python main.py --q_type 0 --q_level 1
```
- Asymmetric, per-channel (weight) quantization, BN not fused
```bash
python main.py --q_type 1 --q_level 0
```
- Asymmetric, per-layer (weight) quantization, BN not fused
```bash
python main.py --q_type 1 --q_level 1
```
- Symmetric, per-channel (weight) quantization, BN fused
```bash
python main.py --q_type 0 --q_level 0 --bn_fuse
```
- Symmetric, per-layer (weight) quantization, BN fused
```bash
python main.py --q_type 0 --q_level 1 --bn_fuse
```
- Asymmetric, per-channel (weight) quantization, BN fused
```bash
python main.py --q_type 1 --q_level 0 --bn_fuse
```
- Asymmetric, per-layer (weight) quantization, BN fused
```bash
python main.py --q_type 1 --q_level 1 --bn_fuse
```
- Symmetric, per-channel (weight) quantization, BN fused with calibration
```bash
python main.py --q_type 0 --q_level 0 --bn_fuse --bn_fuse_calib
```
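The two `--weight_observer` choices used in the QAT commands above differ in how they track the value range across batches. The sketch below shows the general idea with two hypothetical stand-in classes (`RunningMinMax` and `MovingAverageMinMax`); the momentum value and the class names are assumptions, not the repository's observers.

```python
class RunningMinMax:
    """Keeps the smallest min and largest max seen over all batches."""

    def __init__(self):
        self.min = None
        self.max = None

    def update(self, values):
        lo, hi = min(values), max(values)
        self.min = lo if self.min is None else min(self.min, lo)
        self.max = hi if self.max is None else max(self.max, hi)


class MovingAverageMinMax:
    """Tracks an exponential moving average of per-batch min and max."""

    def __init__(self, momentum=0.1):
        self.momentum = momentum
        self.min = None
        self.max = None

    def update(self, values):
        lo, hi = min(values), max(values)
        if self.min is None:
            self.min, self.max = lo, hi
        else:
            m = self.momentum
            self.min = (1 - m) * self.min + m * lo
            self.max = (1 - m) * self.max + m * hi


batches = [[-1.0, 1.0], [-4.0, 2.0], [-1.0, 1.0]]
running, averaged = RunningMinMax(), MovingAverageMinMax()
for batch in batches:
    running.update(batch)
    averaged.update(batch)
print(running.min, running.max)
print(averaged.min, averaged.max)
```

A single unusual batch permanently widens the running range, while the moving average only shifts slightly and then recovers.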
**PTQ**
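*A pretrained floating-point model must be loaded (here via --refine)*
- Symmetric, per-channel (weight) quantization, BN fused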
```bash
python main.py --refine ../../../pruning/models_save/nin_gc.pth --q_level 0 --bn_fuse --pretrained_model --ptq_control --ptq --batch_size 32 --ptq_batch 200 --percentile 0.999999
```
- Other configurations are analogous
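The PTQ command above passes `--ptq_batch` and `--percentile`. One common way such options are used is to observe a fixed number of calibration batches and clip the range at a percentile of the observed magnitudes, so that rare outliers do not stretch the quantization range. The sketch below illustrates that idea only; the function names are hypothetical and the repository's calibration logic may differ.

```python
import math


def percentile_threshold(samples, percentile):
    """Return the magnitude below which `percentile` of the samples fall."""
    ordered = sorted(abs(v) for v in samples)
    index = math.ceil(percentile * len(ordered)) - 1
    index = max(0, min(len(ordered) - 1, index))
    return ordered[index]


def calibrate(batches, percentile, max_batches):
    """Collect values from at most `max_batches` batches, then pick a threshold."""
    seen = []
    for i, batch in enumerate(batches):
        if i >= max_batches:
            break
        seen.extend(batch)
    return percentile_threshold(seen, percentile)


batches = [
    [float(v) for v in range(1, 51)],
    [float(v) for v in range(51, 100)] + [1000.0],  # one large outlier
]
print(calibrate(batches, percentile=1.0, max_batches=2))    # range set by the outlier
print(calibrate(batches, percentile=0.945, max_batches=2))  # outlier is clipped away
```

A percentile very close to 1, such as the 0.999999 in the command, clips only the most extreme values.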
**QAFT**
**! Note: run QAFT only after QAT/PTQ !**
**QAT > QAFT**
- Symmetric, per-channel (weight) quantization, BN fused
```bash
python main.py --resume models_save/nin_gc_bn_fused.pth --q_type 0 --q_level 0 --bn_fuse --qaft --lr 0.00001
```
- Other configurations are analogous
**PTQ > QAFT**
- Symmetric, per-channel (weight) quantization, BN fused
```bash
python main.py --resume models_save/nin_gc_bn_fused.pth --q_level 0 --bn_fuse --qaft --lr 0.00001 --ptq
```
- Other configurations are analogous
#### Pruning
*sparse training > pruning > fine-tuning*
```bash
cd micronet/compression/pruning
```
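##### Sparse training
-sr, sparse training flag
--s, sparsity rate (adjust to the dataset and model)
--model_type, model type (0-nin, 1-nin_gc)
- nin (normal convolution structure)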
```bash
python main.py -sr --s 0.0001 --model_type 0
```
- nin_gc (with grouped convolution structure)
```bash
python main.py -sr --s 0.001 --model_type 1
```
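The sparse-training step with `-sr --s` is in line with Network Slimming (listed in the references), where an L1 penalty on the BN scale factors pushes unimportant channels towards zero. Assuming that is the mechanism, the extra gradient term can be sketched as follows; the function name is hypothetical and the values are made up for illustration.

```python
def add_l1_subgradient(gammas, grads, s):
    """Add the subgradient of s * |gamma| to each BN scale-factor gradient."""
    def sign(g):
        return (g > 0) - (g < 0)

    return [grad + s * sign(gamma) for gamma, grad in zip(gammas, grads)]


gammas = [0.5, -0.2, 0.0]  # BN scale factors of three channels
grads = [0.1, 0.1, 0.1]    # gradients from the task loss
print(add_l1_subgradient(gammas, grads, s=0.001))
```

A larger `--s` pulls the scale factors towards zero more strongly, which yields more prunable channels but can cost accuracy, so the value needs tuning per dataset and model.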
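##### Pruning
--percent, pruning rate
--normal_regular, normal/regular pruning flag and regular pruning base (if set to N, each layer of the pruned model keeps a multiple of N filters)
--model, path of the model after sparse training
--save, path to save the pruned model
- Normal pruning (nin)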
```bash
python normal_regular_prune.py --percent 0.5 --model models_save/nin_sparse.pth --save models_save/nin_prune.pth
```
- Regular pruning (nin)
```bash
python normal_regular_prune.py --percent 0.5 --normal_regular 8 --model models_save/nin_sparse.pth --save models_save/nin_prune.pth
```
```bash
python normal_regular_prune.py --percent 0.5 --normal_regular 16 --model models_save/nin_sparse.pth --save models_save/nin_prune.pth
```
- Grouped-convolution pruning (nin_gc)
```bash
python gc_prune.py --percent 0.4 --model models_save/nin_gc_sparse.pth
```
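The effect of `--percent` and `--normal_regular` can be sketched as a global threshold on the BN scale factors followed by rounding each layer's kept-filter count to a multiple of N. This is a simplified sketch under the assumption that pruning follows the Network Slimming recipe and that regular pruning rounds up; `keep_counts` is a hypothetical helper, not a function from `normal_regular_prune.py`.

```python
def keep_counts(layer_gammas, percent, multiple=1):
    """Return how many filters each layer keeps after pruning."""
    # One global threshold over the scale factors of all layers.
    flat = sorted(abs(g) for layer in layer_gammas for g in layer)
    threshold = flat[min(len(flat) - 1, int(len(flat) * percent))]
    counts = []
    for layer in layer_gammas:
        kept = max(sum(abs(g) > threshold for g in layer), 1)  # never drop a layer
        if multiple > 1:
            # Round up to the next multiple, capped at the layer width.
            kept = min(len(layer), -(-kept // multiple) * multiple)
        counts.append(kept)
    return counts


layer = [i / 16 for i in range(1, 17)]  # 16 filters with increasing importance
print(keep_counts([layer], percent=0.5))              # normal pruning
print(keep_counts([layer], percent=0.5, multiple=8))  # regular pruning, N = 8
```

Keeping filter counts at multiples of 8 or 16 tends to map better onto hardware and inference libraries, at the price of pruning slightly less than requested.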
##### Fine-tuning
--prune_refine, path of the pruned model (fine-tuning starts from it)
- nin
```bash
python main.py --model_type 0 --prune_refine models_save/nin_prune.pth
```
- nin_gc
***The cfg of the new model obtained after pruning must be passed in***
*For example:*
```bash
python main.py --model_type 1 --gc_prune_refine 154 162 144 304 320 320 608 584
```
#### Pruning > Quantization
##### Pruning > Quantization (High-Bit)
###### w8a8(dorefa)
```bash
cd micronet/compression/quantization/wqaq/dorefa
```
- nin (normal convolution structure)
```bash
python main.py --w_bits 8 --a_bits 8 --model_type 0 --prune_quant ../../../pruning/models_save/nin_finetune.pth
```
- nin_gc (with grouped convolution structure)
```bash
python main.py --w_bits 8 --a_bits 8 --model_type 1 --prune_quant ../../../pruning/models_save/nin_gc_retrain.pth
```
###### w8a8(iao)
```bash
cd micronet/compression/quantization/wqaq/iao
```
**QAT/PTQ > QAFT**
**! Note: run QAFT only after QAT/PTQ !**
**QAT**
*BN not fused*
- nin (normal convolution structure)
```bash
python main.py --w_bits 8 --a_bits 8 --model_type 0 --prune_quant ../../../pruning/models_save/nin_finetune.pth --lr 0.001
```
- nin_gc (with grouped convolution structure)
```bash
python main.py --w_bits 8 --a_bits 8 --model_type 1 --prune_quant ../../../pruning/models_save/nin_gc_retrain.pth --lr 0.001
```
*BN fused*
- nin (normal convolution structure)
```bash
python main.py --w_bits 8 --a_bits 8 --model_type 0 --prune_quant ../../../pruning/models_save/nin_finetune.pth --bn_fuse --pretrained_model --lr 0.001
```
- nin_gc (with grouped convolution structure)
```bash
python main.py --w_bits 8 --a_bits 8 --model_type 1 --prune_quant ../../../pruning/models_save/nin_gc_retrain.pth --bn_fuse --pretrained_model --lr 0.001
```
**PTQ**
- nin (normal convolution structure)
```bash
python main.py --w_bits 8 --a_bits 8 --model_type 0 --prune_quant ../../../pruning/models_save/nin_finetune.pth --bn_fuse --pretrained_model --ptq_control --ptq --batch_size 32 --ptq_batch 200 --percentile 0.999999
```
- Other configurations are analogous
**QAFT**
**! Note: run QAFT only after QAT/PTQ !**
**QAT > QAFT**
*BN not fused*
- nin (normal convolution structure)
```bash
python main.py --w_bits 8 --a_bits 8 --model_type 0 --prune_qaft models_save/nin.pth --qaft --lr 0.00001
```
- nin_gc (with grouped convolution structure)
```bash
python main.py --w_bits 8 --a_bits 8 --model_type 1 --prune_qaft models_save/nin_gc.pth --qaft --lr 0.00001
```
*BN fused*
- nin (normal convolution structure)
```bash
python main.py --w_bits 8 --a_bits 8 --model_type 0 --prune_qaft models_save/nin_bn_fused.pth --bn_fuse --qaft --lr 0.00001
```
- nin_gc (with grouped convolution structure)
```bash
python main.py --w_bits 8 --a_bits 8 --model_type 1 --prune_qaft models_save/nin_gc_bn_fused.pth --bn_fuse --qaft --lr 0.00001
```
**PTQ > QAFT**
*BN not fused*
- nin (normal convolution structure)
```bash
python main.py --w_bits 8 --a_bits 8 --model_type 0 --prune_qaft models_save/nin.pth --qaft --lr 0.00001 --ptq
```
- nin_gc (with grouped convolution structure)
```bash
python main.py --w_bits 8 --a_bits 8 --model_type 1 --prune_qaft models_save/nin_gc.pth --qaft --lr 0.00001 --ptq
```
*BN fused*
- nin (normal convolution structure)
```bash
python main.py --w_bits 8 --a_bits 8 --model_type 0 --prune_qaft models_save/nin_bn_fused.pth --bn_fuse --qaft --lr 0.00001 --ptq
```
- nin_gc (with grouped convolution structure)
```bash
python main.py --w_bits 8 --a_bits 8 --model_type 1 --prune_qaft models_save/nin_gc_bn_fused.pth --bn_fuse --qaft --lr 0.00001 --ptq
```
##### Pruning > Quantization (Low-Bit)
```bash
cd micronet/compression/quantization/wbwtab
```
###### wbab
- nin (normal convolution structure)
```bash
python main.py --W 2 --A 2 --model_type 0 --prune_quant ../../pruning/models_save/nin_finetune.pth
```
- nin_gc (with grouped convolution structure)
```bash
python main.py --W 2 --A 2 --model_type 1 --prune_quant ../../pruning/models_save/nin_gc_retrain.pth
```
#### BN
##### wbwtab
```bash
cd micronet/compression/quantization/wbwtab/bn_fuse
```
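###### bn_fuse (quant_model_train and quant_bn_fused_model_inference)
*--model_type, 1 - nin_gc (with grouped convolution structure); 0 - nin (normal convolution structure)*
*--prune_quant, flag for a pruned and quantized model*
*--W, weight quantization value*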
- nin_gc, quant_model, wb
```bash
python bn_fuse.py --model_type 1 --W 2
```
- nin_gc, prune_quant_model, wb
```bash
python bn_fuse.py --model_type 1 --prune_quant --W 2
```
- nin_gc, quant_model, wt
```bash
python bn_fuse.py --model_type 1 --W 3
```
- nin, quant_model, wb
```bash
python bn_fuse.py --model_type 0 --W 2
```
###### bn_fused_model_test (tests quant_model_train and quant_bn_fused_model_inference)
```bash
python bn_fused_model_test.py
```
##### dorefa
```bash
cd micronet/compression/quantization/wqaq/dorefa/quant_model_test
```
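###### quant_model_para (quant_model_train and quant_model_inference)
*--model_type, 1 - nin_gc (with grouped convolution structure); 0 - nin (normal convolution structure)*
*--prune_quant, flag for a pruned and quantized model*
*--w_bits, weight quantization bits; --a_bits, activation quantization bits*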
- nin_gc, quant_model, w8a8
```bash
python quant_model_para.py --model_type 1 --w_bits 8 --a_bits 8
```
- nin_gc, prune_quant_model, w8a8
```bash
python quant_model_para.py --model_type 1 --prune_quant --w_bits 8 --a_bits 8
```
- nin, quant_model, w8a8
```bash
python quant_model_para.py --model_type 0 --w_bits 8 --a_bits 8
```
###### quant_model_test (tests quant_model_train and quant_model_inference)
```bash
python quant_model_test.py
```
##### iao
***Note: set --bn_fuse to True during quantized training***
```bash
cd micronet/compression/quantization/wqaq/iao/bn_fuse
```
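###### bn_fuse (quant_bn_fused_model_train and quant_bn_fused_model_inference)
*--model_type, 1 - nin_gc (with grouped convolution structure); 0 - nin (normal convolution structure)*
*--prune_quant, flag for a pruned and quantized model*
*--w_bits, weight quantization bits; --a_bits, activation quantization bits*
*--q_type, 0 - symmetric; 1 - asymmetric*
*--q_level, 0 - per-channel; 1 - per-layer*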
- nin_gc, quant_model, w8a8
```bash
python bn_fuse.py --model_type 1 --w_bits 8 --a_bits 8
```
- nin_gc, prune_quant_model, w8a8
```bash
python bn_fuse.py --model_type 1 --prune_quant --w_bits 8 --a_bits 8
```
- nin, quant_model, w8a8
```bash
python bn_fuse.py --model_type 0 --w_bits 8 --a_bits 8
```
- nin_gc, quant_model, w8a8, asymmetric, per-layer
```bash
python bn_fuse.py --model_type 1 --w_bits 8 --a_bits 8 --q_type 1 --q_level 1
```
###### bn_fused_model_test (tests quant_bn_fused_model_train and quant_bn_fused_model_inference)
```bash
python bn_fused_model_test.py
```
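The BN-fusion scripts above rest on the fact that, at inference time, a batch-norm layer following a convolution is an affine transform that can be folded into the convolution's weight and bias. The sketch below checks that identity on a tiny fully connected layer standing in for a 1x1 convolution. It is a generic illustration of BN folding with hypothetical helper names, not the code in `bn_fuse.py`.

```python
import math


def fuse_bn(weights, bias, gamma, beta, mean, var, eps=1e-5):
    """Fold per-channel BN statistics into the layer's weight and bias."""
    fused_w, fused_b = [], []
    for w_row, b, g, bt, m, v in zip(weights, bias, gamma, beta, mean, var):
        factor = g / math.sqrt(v + eps)
        fused_w.append([w * factor for w in w_row])  # w' = w * gamma / sqrt(var + eps)
        fused_b.append((b - m) * factor + bt)        # b' = (b - mean) * factor + beta
    return fused_w, fused_b


def linear(x, weights, bias):
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def batch_norm(y, gamma, beta, mean, var, eps=1e-5):
    return [
        (yi - m) / math.sqrt(v + eps) * g + bt
        for yi, g, bt, m, v in zip(y, gamma, beta, mean, var)
    ]


weights, bias = [[1.0, -2.0], [0.5, 0.25]], [0.1, -0.3]
gamma, beta, mean, var = [1.5, 0.8], [0.2, -0.1], [0.4, -0.2], [2.0, 0.5]
x = [3.0, -1.0]

reference = batch_norm(linear(x, weights, bias), gamma, beta, mean, var)
fused = linear(x, *fuse_bn(weights, bias, gamma, beta, mean, var))
print(reference)
print(fused)
```

Both printed vectors agree up to floating-point error, which is why the fused model can drop the BN layers entirely at inference time.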
#### Device selection
*cpu and gpu (single or multiple cards) are supported*
--cpu uses the cpu; --gpu_id selects the gpu(s) to use
- cpu
```bash
python main.py --cpu
```
- single gpu
```bash
python main.py --gpu_id 0
```
```bash
python main.py --gpu_id 1
```
- multiple gpus
```bash
python main.py --gpu_id 0,1
```
```bash
python main.py --gpu_id 0,1,2
```
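### Deployment
#### TensorRT
*Only the core module code is provided for now; a complete runnable demo will be added later*
##### Related articles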
- [tensorrt-](https://zhuanlan.zhihu.com/p/336256668)
- [tensorrt-op/dynamic_shape](https://zhuanlan.zhihu.com/p/335829625)
## Transfer
### Quantization training
#### LeNet example
##### quant_test_manual.py
*A model can be quantized (High-Bit (>2b), Low-Bit (≤2b) / Ternary and Binary) by simply replacing ***op*** with ***quant_op***.*
```python
import torch.nn as nn
import torch.nn.functional as F

# some base_op, such as ``Add``, ``Concat``
from micronet.base_module.op import *

# ``quantize`` is quant_module, ``QuantConv2d``, ``QuantLinear``, ``QuantMaxPool2d``, ``QuantReLU`` are quant_op
from micronet.compression.quantization.wbwtab.quantize import (
    QuantConv2d as quant_conv_wbwtab,
)
from micronet.compression.quantization.wbwtab.quantize import (
    ActivationQuantizer as quant_relu_wbwtab,
)
from micronet.compression.quantization.wqaq.dorefa.quantize import (
    QuantConv2d as quant_conv_dorefa,
)
from micronet.compression.quantization.wqaq.dorefa.quantize import (
    QuantLinear as quant_linear_dorefa,
)
from micronet.compression.quantization.wqaq.iao.quantize import (
    QuantConv2d as quant_conv_iao,
)
from micronet.compression.quantization.wqaq.iao.quantize import (
    QuantLinear as quant_linear_iao,
)
from micronet.compression.quantization.wqaq.iao.quantize import (
    QuantMaxPool2d as quant_max_pool_iao,
)
from micronet.compression.quantization.wqaq.iao.quantize import (
    QuantReLU as quant_relu_iao,
)


class LeNet(nn.Module):
    def __init__(self):
        super(LeNet, self).__init__()
        self.conv1 = nn.Conv2d(1, 10, kernel_size=5)
        self.conv2 = nn.Conv2d(10, 20, kernel_size=5)
        self.fc1 = nn.Linear(320, 50)
        self.fc2 = nn.Linear(50, 10)
        self.max_pool = nn.MaxPool2d(kernel_size=2)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        x = self.relu(self.max_pool(self.conv1(x)))
        x = self.relu(self.max_pool(self.conv2(x)))
        x = x.view(-1, 320)
        x = self.relu(self.fc1(x))
        x = F.dropout(x, training=self.training)
        x = self.fc2(x)
        return F.log_softmax(x, dim=1)


class QuantLeNetWbWtAb(nn.Module):
    def __init__(self):
        super(QuantLeNetWbWtAb, self).__init__()
        self.conv1 = quant_conv_wbwtab(1, 10, kernel_size=5)
        self.conv2 = quant_conv_wbwtab(10, 20, kernel_size=5)
        self.fc1 = nn.Linear(320, 50)
        self.fc2 = nn.Linear(50, 10)
        self.max_pool = nn.MaxPool2d(kernel_size=2)
        self.relu = quant_relu_wbwtab()

    def forward(self, x):
        x = self.relu(self.max_pool(self.conv1(x)))
        x = self.relu(self.max_pool(self.conv2(x)))
        x = x.view(-1, 320)
        x = self.relu(self.fc1(x))
        x = F.dropout(x, training=self.training)
        x = self.fc2(x)
        return F.log_softmax(x, dim=1)


class QuantLeNetDoReFa(nn.Module):
    def __init__(self):
        super(QuantLeNetDoReFa, self).__init__()
        self.conv1 = quant_conv_dorefa(1, 10, kernel_size=5)
        self.conv2 = quant_conv_dorefa(10, 20, kernel_size=5)
        self.fc1 = quant_linear_dorefa(320, 50)
        self.fc2 = quant_linear_dorefa(50, 10)
        self.max_pool = nn.MaxPool2d(kernel_size=2)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        x = self.relu(self.max_pool(self.conv1(x)))
        x = self.relu(self.max_pool(self.conv2(x)))
        x = x.view(-1, 320)
        x = self.relu(self.fc1(x))
        x = F.dropout(x, training=self.training)
        x = self.fc2(x)
        return F.log_softmax(x, dim=1)


class QuantLeNetIAO(nn.Module):
    def __init__(self):
        super(QuantLeNetIAO, self).__init__()
        self.conv1 = quant_conv_iao(1, 10, kernel_size=5)
        self.conv2 = quant_conv_iao(10, 20, kernel_size=5)
        self.fc1 = quant_linear_iao(320, 50)
        self.fc2 = quant_linear_iao(50, 10)
        self.max_pool = quant_max_pool_iao(kernel_size=2)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        x = self.relu(self.max_pool(self.conv1(x)))
        x = self.relu(self.max_pool(self.conv2(x)))
        x = x.view(-1, 320)
        x = self.relu(self.fc1(x))
        x = F.dropout(x, training=self.training)
        x = self.fc2(x)
        return F.log_softmax(x, dim=1)


lenet = LeNet()
quant_lenet_wbwtab = QuantLeNetWbWtAb()
quant_lenet_dorefa = QuantLeNetDoReFa()
quant_lenet_iao = QuantLeNetIAO()

print("***ori_model***\n", lenet)
print("\n***quant_model_wbwtab***\n", quant_lenet_wbwtab)
print("\n***quant_model_dorefa***\n", quant_lenet_dorefa)
print("\n***quant_model_iao***\n", quant_lenet_iao)

print("\nquant_model is ready")
print("micronet is ready")
```
##### quant_test_auto.py
*A model can be quantized (High-Bit (>2b), Low-Bit (≤2b) / Ternary and Binary) by simply using ***micronet.compression.quantization.quantize.prepare(model)***.*
```python
import torch.nn as nn
import torch.nn.functional as F

# some base_op, such as ``Add``, ``Concat``
from micronet.base_module.op import *
import micronet.compression.quantization.wqaq.dorefa.quantize as quant_dorefa
import micronet.compression.quantization.wqaq.iao.quantize as quant_iao


class LeNet(nn.Module):
    def __init__(self):
        super(LeNet, self).__init__()
        self.conv1 = nn.Conv2d(1, 10, kernel_size=5)
        self.conv2 = nn.Conv2d(10, 20, kernel_size=5)
        self.fc1 = nn.Linear(320, 50)
        self.fc2 = nn.Linear(50, 10)
        self.max_pool = nn.MaxPool2d(kernel_size=2)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        x = self.relu(self.max_pool(self.conv1(x)))
        x = self.relu(self.max_pool(self.conv2(x)))
        x = x.view(-1, 320)
        x = self.relu(self.fc1(x))
        x = F.dropout(x, training=self.training)
        x = self.fc2(x)
        return F.log_softmax(x, dim=1)


"""
--w_bits --a_bits, quantization bits for weights (W) and activations (A)
--q_type, quantization type (0-symmetric, 1-asymmetric)
--q_level, weight quantization level (0-per-channel, 1-per-layer)
--weight_observer, weight_observer selection (0-MinMaxObserver, 1-MovingAverageMinMaxObserver)
--bn_fuse, flag for BN fusion during quantization
--bn_fuse_calib, flag for BN-fusion calibration during quantization
--pretrained_model, flag for a pretrained floating-point model
--qaft, qaft flag
--ptq, ptq flag
--percentile, percentile used for ptq calibration
"""

lenet = LeNet()
quant_lenet_dorefa = quant_dorefa.prepare(lenet, inplace=False, a_bits=8, w_bits=8)
quant_lenet_iao = quant_iao.prepare(
    lenet,
    inplace=False,
    a_bits=8,
    w_bits=8,
    q_type=0,
    q_level=0,
    weight_observer=0,
    bn_fuse=False,
    bn_fuse_calib=False,
    pretrained_model=False,
    qaft=False,
    ptq=False,
    percentile=0.9999,
)

# if ptq == False, do qat/qaft, need train
# if ptq == True, do ptq, don't need train
# you can refer to micronet/compression/quantization/wqaq/iao/main.py

print("***ori_model***\n", lenet)
print("\n***quant_model_dorefa***\n", quant_lenet_dorefa)
print("\n***quant_model_iao***\n", quant_lenet_iao)

print("\nquant_model is ready")
print("micronet is ready")
```
#### test
##### quant_test_manual
```bash
python -c "import micronet; micronet.quant_test_manual()"
```
##### quant_test_auto
```bash
python -c "import micronet; micronet.quant_test_auto()"
```
*If "quant_model is ready" is printed, micronet is ready.*
###
***[BN](#bn)***
## Model compression results
*cifar10*
|Type|W(Bits)|A(Bits)|Acc|GFLOPs|Para(M)|Size(MB)|Compression rate|Acc loss|
|:-:|:-----:|:-----:|:--:|:---:|:-----:|:------:|:---:|:-:|
|Original model (nin)|FP32|FP32|91.01%|0.15|0.67|2.68|***|***|
|Grouped convolution structure (nin_gc)|FP32|FP32|91.04%|0.15|0.58|2.32|13.43%|-0.03%|
|Pruning|FP32|FP32|90.26%|0.09|0.32|1.28|52.24%|0.75%|
|Quantization|1|FP32|90.93%|***|0.58|0.204|92.39%|0.08%|
|Quantization|1.5|FP32|91%|***|0.58|0.272|89.85%|0.01%|
|Quantization|1|1|86.23%|***|0.58|0.204|92.39%|4.78%|
|Quantization|1.5|1|86.48%|***|0.58|0.272|89.85%|4.53%|
|Quantization (DoReFa)|8|8|91.03%|***|0.58|0.596|77.76%|-0.02%|
|Quantization (IAO, symmetric/per-channel/bn_fuse)|8|8|90.99%|***|0.58|0.596|77.76%|0.02%|
|Grouped convolution + pruning + quantization|1.5|1|86.13%|***|0.32|0.19|92.91%|4.88%|
*Note: --train_batch_size 256*
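The last two columns of the table are consistent with a compression rate measured against the size of the original nin model (2.68 MB) and an accuracy loss measured against its accuracy (91.01%). The short check below recomputes a few rows from the Size and Acc columns; the function names are hypothetical and the baseline choice is inferred from the numbers rather than stated in the table.

```python
BASELINE_SIZE_MB = 2.68  # Size(MB) of the original nin model in the table
BASELINE_ACC = 91.01     # Acc of the original nin model in the table


def compression_rate(size_mb):
    """Percentage of model size saved relative to the baseline."""
    return (1 - size_mb / BASELINE_SIZE_MB) * 100


def acc_loss(acc):
    """Accuracy drop in percentage points relative to the baseline."""
    return BASELINE_ACC - acc


# (Size(MB), Acc) pairs copied from the table rows.
rows = {
    "nin_gc": (2.32, 91.04),
    "pruned": (1.28, 90.26),
    "W1/A-FP32": (0.204, 90.93),
    "W8A8 (DoReFa)": (0.596, 91.03),
}
for name, (size_mb, acc) in rows.items():
    print(name, round(compression_rate(size_mb), 2), round(acc_loss(acc), 2))
```

A negative accuracy loss, as for nin_gc and DoReFa W8A8, means the compressed model scored slightly higher than the baseline.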
## References
### Compression
#### Quantization
##### QAT
###### Binary
- [Binarized Neural Networks: Training Neural Networks with Weights and Activations Constrained to +1 or -1](https://arxiv.org/abs/1602.02830)
- [XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks](https://arxiv.org/abs/1603.05279)
- [AN EMPIRICAL STUDY OF BINARY NEURAL NETWORKS OPTIMISATION](https://openreview.net/forum?id=rJfUCoR5KX)
- [A Review of Binarized Neural Networks](https://www.semanticscholar.org/paper/A-Review-of-Binarized-Neural-Networks-Simons-Lee/0332fdf00d7ff988c5b66c47afd49431eafa6cd1)
###### Ternary
- [Ternary weight networks](https://arxiv.org/abs/1605.04711)
###### High-Bit
- [DoReFa-Net: Training Low Bitwidth Convolutional Neural Networks with Low Bitwidth Gradients](https://arxiv.org/abs/1606.06160)
- [Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference](https://arxiv.org/abs/1712.05877)
- [Quantizing deep convolutional networks for efficient inference: A whitepaper](https://arxiv.org/abs/1806.08342)
##### PTQ
###### High-Bit
- [tensorrt-ptq-8-bit](https://on-demand.gputechconf.com/gtc/2017/presentation/s7310-8-bit-inference-with-tensorrt.pdf)
#### Pruning
- [Learning Efficient Convolutional Networks through Network Slimming](https://arxiv.org/abs/1708.06519)
- [RETHINKING THE VALUE OF NETWORK PRUNING](https://arxiv.org/abs/1810.05270)
####
- [Convolutional Networks for Fast, Energy-Efficient Neuromorphic Computing](https://arxiv.org/abs/1603.08270)
### Deployment
#### TensorRT
- [github](https://github.com/NVIDIA/TensorRT)
- [ptq](https://on-demand.gputechconf.com/gtc/2017/presentation/s7310-8-bit-inference-with-tensorrt.pdf)
- [tensorrt-](https://zhuanlan.zhihu.com/p/336256668)
- [tensorrt-op/dynamic_shape](https://zhuanlan.zhihu.com/p/335829625)
- [summary](https://github.com/mileistone/study_resources/blob/master/engineering/tensorrt/tensorrt.md)
## TODO
- Complete TensorRT demo
- (///NAS)
- Other deployment frameworks (mnn/tnn/tengine)
- >
Owner
- Name: Bestsongc
- Login: Bestsongc
- Kind: user
- Repositories: 1
- Profile: https://github.com/Bestsongc