https://github.com/bestsongc/micronet

micronet, a model compression and deploy lib. compression: 1. quantization: quantization-aware-training (QAT), High-Bit (>2b) (DoReFa/Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference), Low-Bit (≤2b)/Ternary and Binary (TWN/BNN/XNOR-Net); post-training-quantization (PTQ), 8-bit (tensorrt); 2. pruning: normal, regula…

Basic Info
  • Host: GitHub
  • Owner: Bestsongc
  • License: mit
  • Default Branch: master
  • Size: 6.84 MB
Statistics
  • Stars: 0
  • Watchers: 0
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Fork of 666DZY666/micronet
Created over 2 years ago · Last pushed over 4 years ago


# micronet


## Introduction

[![PyPI](https://img.shields.io/pypi/v/micronet)](https://pypi.org/project/micronet) ![PyPI - Python Version](https://img.shields.io/pypi/pyversions/micronet)

*micronet, a model compression and deploy lib.*

### Compression

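- Quantization: High-Bit(>2b): QAT, PTQ, QAFT; Low-Bit(≤2b)/Ternary and Binary: QAT
- Pruning: normal, regular, and group-convolution structure pruning
- BN fusion for binary activation (A) quantization (BN parameters → conv bias b)
- BN fusion for High-Bit quantization (BN parameters → conv weight w and bias b)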
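Folding BN parameters into a conv's weight w and bias b is standard arithmetic: with σ = sqrt(var + eps), the fused weights are w·γ/σ and the fused bias is β + (b - μ)·γ/σ. The plain-Python sketch below is an illustration, not micronet's bn_fuse code. It treats the layer as one weight row per output channel, which matches a 1×1 conv or a linear layer, and it covers only the float folding. The quantization micronet applies around that step is not shown.

```python
# Illustrative sketch of BN folding (float only); not micronet's bn_fuse implementation.
import math
from typing import List, Tuple


def fuse_bn(w: List[List[float]], b: List[float], gamma: List[float], beta: List[float],
            mean: List[float], var: List[float], eps: float = 1e-5) -> Tuple[List[List[float]], List[float]]:
    """Fold BatchNorm into the preceding layer; w holds one row of weights per output channel."""
    fused_w, fused_b = [], []
    for c, row in enumerate(w):
        k = gamma[c] / math.sqrt(var[c] + eps)  # per-channel factor gamma / sigma
        fused_w.append([x * k for x in row])
        fused_b.append(beta[c] + (b[c] - mean[c]) * k)
    return fused_w, fused_b


def linear(w: List[List[float]], b: List[float], x: List[float]) -> List[float]:
    """y_c = sum_i w[c][i] * x[i] + b[c] (a 1x1 conv on one pixel behaves the same way)."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + bc for row, bc in zip(w, b)]


def batch_norm(y: List[float], gamma: List[float], beta: List[float],
               mean: List[float], var: List[float], eps: float = 1e-5) -> List[float]:
    """Inference-mode BatchNorm using running statistics."""
    return [g * (v - m) / math.sqrt(s + eps) + be for v, g, be, m, s in zip(y, gamma, beta, mean, var)]


# Usage: one fused linear layer reproduces linear -> BN.
w, b = [[0.5, -1.0], [2.0, 0.25]], [0.1, -0.2]
gamma, beta, mean, var = [1.5, 0.8], [0.0, 0.3], [0.2, -0.1], [0.5, 2.0]
fw, fb = fuse_bn(w, b, gamma, beta, mean, var)
x = [1.0, 3.0]
print(batch_norm(linear(w, b, x), gamma, beta, mean, var))
print(linear(fw, fb, x))
```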

### Deployment

- TensorRT (fp32/fp16/int8 (ptq-calibration), op-adapt (upsample), dynamic_shape)


## Code structure

![code_structure](https://github.com/666DZY666/micronet/blob/master/micronet/readme_imgs/code_structure.jpg)

```
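micronet
├── __init__.py
├── base_module
│   ├── __init__.py
│   └── op.py
├── compression
│   ├── README.md
│   ├── __init__.py
│   ├── pruning
│   │   ├── README.md
│   │   ├── __init__.py
│   │   ├── gc_prune.py
│   │   ├── main.py
│   │   ├── models_save
│   │   │   └── models_save.txt
│   │   └── normal_regular_prune.py
│   └── quantization
│       ├── README.md
│       ├── __init__.py
│       ├── wbwtab
│       │   ├── __init__.py
│       │   ├── bn_fuse
│       │   │   ├── bn_fuse.py
│       │   │   ├── bn_fused_model_test.py
│       │   │   └── models_save
│       │   │       └── models_save.txt
│       │   ├── main.py
│       │   ├── models_save
│       │   │   └── models_save.txt
│       │   └── quantize.py
│       └── wqaq
│           ├── __init__.py
│           ├── dorefa
│           │   ├── __init__.py
│           │   ├── main.py
│           │   ├── models_save
│           │   │   └── models_save.txt
│           │   ├── quant_model_test
│           │   │   ├── models_save
│           │   │   │   └── models_save.txt
│           │   │   ├── quant_model_para.py
│           │   │   └── quant_model_test.py
│           │   └── quantize.py
│           └── iao
│               ├── __init__.py
│               ├── bn_fuse
│               │   ├── bn_fuse.py
│               │   ├── bn_fused_model_test.py
│               │   └── models_save
│               │       └── models_save.txt
│               ├── main.py
│               ├── models_save
│               │   └── models_save.txt
│               └── quantize.py
├── data
│   └── data.txt
├── deploy
│   ├── README.md
│   ├── __init__.py
│   └── tensorrt
│       ├── README.md
│       ├── __init__.py
│       ├── calibrator.py
│       ├── eval_trt.py
│       ├── models
│       │   ├── __init__.py
│       │   └── models_trt.py
│       ├── models_save
│       │   └── calibration_seg.cache
│       ├── test_trt.py
│       └── util_trt.py
├── models
│   ├── __init__.py
│   ├── nin.py
│   ├── nin_gc.py
│   └── resnet.py
└── readme_imgs
    ├── code_structure.jpg
    └── micronet.xmind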
```


## Changelog
- **2019.12.4**, 
- **12.8**, DoReFa(A)(* 0.1)
- **12.11**, 
- 12.12, 
- 12.14, :1BN(W/)W/; 2BN(conv)(bias)
- **12.17**, ()
- 12.20, add device selection (cpu, gpu (single or multiple GPUs))
- **12.27**, 
- 12.29, remove the 8-bit limit of High-Bit quantization (e.g. 10-bit and 16-bit now work)
- **2020.2.17**, 1W/; 2W
- **2.18**, (A)BN:BNgammaBN
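- **2.24**, restructure the ternary/binary quantization code for portability. To port: replace each Conv to be quantized with the QuantConv2d in compression/quantization/wbwtab/models/util_wbwtab.py; see nin_gc.py for usage
- **3.1**, 1. add Google's High-Bit quantization method; 2. add BN fusion for High-Bit quantization during training
- **3.2, 3.3**, unify the quantization code structure. To port: replace each Conv (or FC, currently supported by dorefa) with the QuantConv2d (or QuantLinear) in models/util_wxax.py; see nin_gc.py for usage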
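- **3.4**, refine the implementation of BN fusion for binary activations (A) in wbwtab/bn_fuse: BN fusion plus comparison of the model before and after fusion (accuracy/speed/(size))
- 3.11, change the BN momentum in compression/wqaq/iao (0.1 → 0.01) to reduce the weight of batch statistics; quantization training became more stable and acc improved by about 1%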
- **3.13**, 
- 4.6, W_clip()(models/util_wxax.py)
- **12.14**, 1. improve code structure; 2. add deploy-tensorrt (main module, not runnable yet)
- 12.18, 1. improve code structure/module reference/module_name; 2. add transfer-use demo
- **12.21**, improve pruning-quantization pipeline and code
- **2021.1.4**, add other quant_op
- 1.5, add quant_weight's per-channel and per-layer selection
- **1.7**, fix iao's loss-nan bug. The bug is due to per-channel min/max error
- 1.8, 1. improve quant_para saving: only scale and zero_point are saved now; 2. add optional weight_observer (MinMaxObserver or MovingAverageMinMaxObserver)
- **1.11**, fix bug in binary_a(1/0) and binary_w preprocessing
- **1.12**, add "pip install"
- **1.22**, add auto_insert_quant_op(this still needs to be improved)
- **1.27**, improve auto_insert_quant_op (quantization is now easy to use, see [quant_test_auto](#quant_test_autopy))
- 1.28, 1. fix prune-quantization pipeline and code; 2. improve code structure
- **2.1**, improve wbwtab_bn_fuse
- **2.4**, 1. add wqaq_bn_fuse; 2. add quant_model_inference_simulation; 3. improve code format
- 4.30, 1. update code_structure img; 2. fix iao's quant_weight_range, and the pretrained_model bn_para loading bug in quant_contrans and quant_bn_fuse_conv
- **5.4**, add **qaft**, which helps improve quantization accuracy
- **5.6**, add **ptq**, which also gives good quantization accuracy
- 5.11, add bn_fuse_calib flag
- **5.14**, 1. change **ste** to **clip_ste**, which helps quant_train; 2. remove quant_relu and add quant_leaky_relu
- 5.15, fix bug in quant_model_para post-processing
- **6.7**, add quant_add (requires base_module's op) and a quant_resnet demo
- **6.9**, iao_quant supports multiple GPUs
- 6.16, fix quant_round() and quant_binary()
- 10.6, code formatting


## Requirements

- python >= 3.5
- torch >= 1.1.0
- torchvision >= 0.3.0
- numpy
- onnx == 1.6.0
- tensorrt == 7.0.0.11


## Installation

[PyPI](https://pypi.org/project/micronet/)

```bash
pip install micronet -i https://pypi.org/simple
```

[GitHub](https://github.com/666DZY666/micronet)

```bash
git clone https://github.com/666DZY666/micronet.git
cd micronet
python setup.py install
```

**Verify the installation**

```bash
python -c "import micronet; print(micronet.__version__)"
```

## Usage

*The following examples require installing from GitHub (source)*

### Compression

#### Quantization

*--refine: path of a pretrained model to load and quantize from*

##### wbwtab

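--W, --A: weight (W) and activation (A) settings; --W 2 = binary, --W 3 = ternary, --A 2 = binary, 32 = full precision (FP32)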

```bash
cd micronet/compression/quantization/wbwtab
```

- WbAb

```bash
python main.py --W 2 --A 2
```

- WbA32

```bash
python main.py --W 2 --A 32
```

- WtAb

```bash
python main.py --W 3 --A 2
```

- WtA32

```bash
python main.py --W 3 --A 32
```
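For intuition about what --W 2 (binary) and --W 3 (ternary) do to the weights, the rules from the BNN/XNOR-Net and TWN papers in the references can be sketched in plain Python. This illustrates those papers' formulas and is not micronet's wbwtab/quantize.py. That module works on tensors and may scale per channel or handle the straight-through gradient differently.

```python
# Illustrative sketch of the XNOR-Net / TWN weight rules; not micronet's wbwtab code.
from typing import List, Tuple


def binarize(w: List[float]) -> Tuple[List[float], float]:
    """XNOR-Net style: sign(w) scaled by alpha = mean(|w|); BNN uses sign(w) alone."""
    alpha = sum(abs(x) for x in w) / len(w)
    signs = [1.0 if x >= 0 else -1.0 for x in w]  # sign(0) taken as +1
    return signs, alpha


def ternarize(w: List[float]) -> Tuple[List[float], float]:
    """TWN style: delta = 0.7 * mean(|w|); alpha = mean(|w|) over weights with |w| > delta."""
    delta = 0.7 * sum(abs(x) for x in w) / len(w)
    codes = [1.0 if x > delta else (-1.0 if x < -delta else 0.0) for x in w]
    kept = [abs(x) for x in w if abs(x) > delta]
    alpha = sum(kept) / len(kept) if kept else 0.0
    return codes, alpha


w = [0.5, -0.3, 0.05, -0.02, 0.8]
signs, a_b = binarize(w)    # binary weights are a_b * signs
codes, a_t = ternarize(w)   # ternary weights are a_t * codes
print(signs, a_b)
print(codes, a_t)
```

Small weights are zeroed by the ternary rule but forced to ±1 by the binary rule, which is why ternary weights keep more accuracy at a slightly larger size.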

##### wqaq

--w_bits, --a_bits: bit widths of weights (W) and activations (A)

###### dorefa

```bash
cd micronet/compression/quantization/wqaq/dorefa
```

- W16A16

```bash
python main.py --w_bits 16 --a_bits 16
```

- W8A8

```bash
python main.py --w_bits 8 --a_bits 8
```

- W4A4

```bash
python main.py --w_bits 4 --a_bits 4
```

- Other bit widths work analogously
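The W16A16/W8A8/W4A4 runs use DoReFa-Net's k-bit quantizer (paper in the references). Its core rules can be sketched in plain Python. micronet's dorefa/quantize.py operates on tensors and adds its own details, such as the activation scaling (* 0.1) and clip_ste mentioned in the changelog. This sketch leaves those details out.

```python
# Illustrative sketch of the DoReFa-Net formulas; not micronet's dorefa/quantize.py.
import math
from typing import List


def quantize_k(x: float, k: int) -> float:
    """Uniform k-bit quantizer on [0, 1]: round((2^k - 1) * x) / (2^k - 1)."""
    n = 2 ** k - 1
    return round(n * x) / n  # note: Python's round() breaks .5 ties to even


def dorefa_weights(w: List[float], k: int) -> List[float]:
    """k-bit weights: squash with tanh, map to [0, 1], quantize, map back to [-1, 1]."""
    t = [math.tanh(x) for x in w]
    max_abs = max(abs(x) for x in t) or 1.0  # guard against all-zero weights
    return [2 * quantize_k(x / (2 * max_abs) + 0.5, k) - 1 for x in t]


def dorefa_activations(a: List[float], k: int) -> List[float]:
    """k-bit activations: clip to [0, 1], then quantize."""
    return [quantize_k(min(max(x, 0.0), 1.0), k) for x in a]


print(dorefa_weights([1.0, -1.0, 0.0], k=2))
print(dorefa_activations([-0.5, 0.2, 0.6, 1.5], k=2))
```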

###### iao

```bash
cd micronet/compression/quantization/wqaq/iao
```

*--w_bits and --a_bits are set the same way as for dorefa*


**QAT/PTQ → QAFT**

**! Note: QAFT must be run after QAT/PTQ !**

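--q_type: quantization type (0 - symmetric, 1 - asymmetric)

--q_level: weight quantization level (0 - per-channel, 1 - per-layer)

--weight_observer: weight observer (0 - MinMaxObserver, 1 - MovingAverageMinMaxObserver)

--bn_fuse: fuse BN during quantization

--bn_fuse_calib: BN-fusion calibration flag

--pretrained_model: load a pretrained model

--qaft: run QAFT (quantization-aware fine-tuning)

--ptq: run PTQ (uses the ptq_observer)

--ptq_control: PTQ control flag

--ptq_batch: number of batches used for PTQ calibration

--percentile: percentile used for PTQ calibration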
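Under the usual affine-quantization conventions (see the IAO paper and the quantization whitepaper in the references), symmetric vs asymmetric (--q_type) and per-channel vs per-layer (--q_level) differ only in how a (scale, zero_point) pair is derived from observed min/max values. The plain-Python sketch below assumes a signed range [-(2^(b-1) - 1), 2^(b-1) - 1] for symmetric and an unsigned range [0, 2^b - 1] for asymmetric quantization. micronet's iao/quantize.py may pick ranges and rounding differently.

```python
# Illustrative sketch of affine (scale, zero_point) choices; ranges are assumptions.
from typing import List, Tuple


def qparams(x_min: float, x_max: float, bits: int, symmetric: bool) -> Tuple[float, int]:
    """(scale, zero_point) for one tensor or one channel from its observed min/max."""
    x_min, x_max = min(x_min, 0.0), max(x_max, 0.0)  # the range must contain 0
    if symmetric:  # assumed signed range, zero_point fixed at 0
        scale = max(-x_min, x_max) / (2 ** (bits - 1) - 1)
        return (scale or 1.0), 0  # `or 1.0` guards an all-zero input
    qmax = 2 ** bits - 1  # assumed unsigned range [0, 2^b - 1]
    scale = (x_max - x_min) / qmax or 1.0
    zero_point = min(max(round(-x_min / scale), 0), qmax)
    return scale, zero_point


def weight_qparams(w: List[List[float]], bits: int, symmetric: bool,
                   per_channel: bool) -> List[Tuple[float, int]]:
    """One (scale, zero_point) per output channel, or one shared pair for the whole layer."""
    if per_channel:
        return [qparams(min(row), max(row), bits, symmetric) for row in w]
    flat = [v for row in w for v in row]
    return [qparams(min(flat), max(flat), bits, symmetric)] * len(w)


def fake_quant(x: List[float], scale: float, zero_point: int, qmin: int, qmax: int) -> List[float]:
    """Quantize then dequantize: the float values a QAT forward pass actually sees."""
    return [(min(max(round(v / scale) + zero_point, qmin), qmax) - zero_point) * scale for v in x]


w = [[-1.0, 0.5], [0.1, 0.2]]  # two output channels with very different ranges
print(weight_qparams(w, 8, symmetric=True, per_channel=True))
print(weight_qparams(w, 8, symmetric=True, per_channel=False))
print(weight_qparams(w, 8, symmetric=False, per_channel=False))
```

Per-channel quantization gives the second, small-range channel a much finer scale than the shared per-layer scale.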

**QAT**

- Default: symmetric, (weight) per-channel quantization, no BN fusion, weight_observer = MinMaxObserver, no pretrained model, QAT

```bash
python main.py --q_type 0 --q_level 0 --weight_observer 0
```

- Symmetric, (weight) per-channel quantization, no BN fusion, weight_observer = MovingAverageMinMaxObserver

```bash
python main.py --q_type 0 --q_level 0 --weight_observer 1
```

- Symmetric, (weight) per-layer quantization, no BN fusion

```bash
python main.py --q_type 0 --q_level 1
```

- Asymmetric, (weight) per-channel quantization, no BN fusion

```bash
python main.py --q_type 1 --q_level 0
```

- Asymmetric, (weight) per-layer quantization, no BN fusion

```bash
python main.py --q_type 1 --q_level 1
```

- Symmetric, (weight) per-channel quantization, with BN fusion

```bash
python main.py --q_type 0 --q_level 0 --bn_fuse
```

- Symmetric, (weight) per-layer quantization, with BN fusion

```bash
python main.py --q_type 0 --q_level 1 --bn_fuse
```

- Asymmetric, (weight) per-channel quantization, with BN fusion

```bash
python main.py --q_type 1 --q_level 0 --bn_fuse
```

- Asymmetric, (weight) per-layer quantization, with BN fusion

```bash
python main.py --q_type 1 --q_level 1 --bn_fuse
```

- Symmetric, (weight) per-channel quantization, with BN fusion and BN-fusion calibration

```bash
python main.py --q_type 0 --q_level 0 --bn_fuse --bn_fuse_calib
```
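The two weight observers selectable with --weight_observer differ in how they track a range across batches. Below is a minimal sketch of the two behaviours, modelled on the PyTorch observers with the same names. The momentum value is illustrative only, and micronet's own observer classes may differ in detail.

```python
# Illustrative sketch of min/max tracking; momentum is an assumed example value.
from typing import List, Optional, Tuple


class MinMaxObserver:
    """Tracks the running min and max over every batch seen."""

    def __init__(self) -> None:
        self.min_val: Optional[float] = None
        self.max_val: Optional[float] = None

    def update(self, batch: List[float]) -> Tuple[float, float]:
        lo, hi = min(batch), max(batch)
        self.min_val = lo if self.min_val is None else min(self.min_val, lo)
        self.max_val = hi if self.max_val is None else max(self.max_val, hi)
        return self.min_val, self.max_val


class MovingAverageMinMaxObserver(MinMaxObserver):
    """Exponential moving average of the per-batch min and max."""

    def __init__(self, momentum: float = 0.01) -> None:
        super().__init__()
        self.momentum = momentum

    def update(self, batch: List[float]) -> Tuple[float, float]:
        lo, hi = min(batch), max(batch)
        if self.min_val is None or self.max_val is None:
            self.min_val, self.max_val = lo, hi
        else:
            self.min_val += self.momentum * (lo - self.min_val)
            self.max_val += self.momentum * (hi - self.max_val)
        return self.min_val, self.max_val


mm, ma = MinMaxObserver(), MovingAverageMinMaxObserver(momentum=0.1)
for batch in ([0.0, 1.0], [-5.0, 0.5], [0.0, 1.2]):  # the second batch contains an outlier
    print(mm.update(batch), ma.update(batch))
```

With one outlier batch, MinMaxObserver keeps the widened range for the rest of training. The moving average moves only part of the way toward the outlier.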

**PTQ**

*PTQ loads a pretrained model and does not need training*

- Symmetric, (weight) per-channel quantization, with BN fusion

```bash
python main.py --refine ../../../pruning/models_save/nin_gc.pth --q_level 0 --bn_fuse --pretrained_model --ptq_control --ptq --batch_size 32 --ptq_batch 200 --percentile 0.999999
```

- Other configurations work analogously
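--percentile accepts a little clipping error on rare outliers in exchange for a finer quantization step on all other values. One simple way to apply it during the --ptq_batch calibration batches is a nearest-rank percentile over the collected absolute values. The sketch below assumes that approach. micronet's PTQ observer may instead compute the percentile per batch, per channel, or from histograms.

```python
# Illustrative sketch: nearest-rank percentile clipping (assumed method, not micronet's code).
import math
from typing import List


def percentile_clip(samples: List[float], percentile: float) -> float:
    """Nearest-rank percentile of |x|: the clipping bound used instead of max(|x|)."""
    mags = sorted(abs(v) for v in samples)
    rank = max(1, math.ceil(percentile * len(mags)))  # 1-based rank
    return mags[rank - 1]


# 99 well-behaved activations plus one outlier collected during calibration
samples = [0.1 * i for i in range(1, 100)] + [50.0]
for p in (1.0, 0.985):
    bound = percentile_clip(samples, p)
    print(p, bound, "int8 scale:", bound / 127)
```

Here a single outlier would otherwise stretch the int8 scale roughly fivefold.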

**QAFT**

**! Note: QAFT must be run after QAT/PTQ !**

**QAT → QAFT**

- Symmetric, (weight) per-channel quantization, with BN fusion

```bash
python main.py --resume models_save/nin_gc_bn_fused.pth --q_type 0 --q_level 0 --bn_fuse --qaft --lr 0.00001
```

- Other configurations work analogously

**PTQ → QAFT**

- Symmetric, (weight) per-channel quantization, with BN fusion

```bash
python main.py --resume models_save/nin_gc_bn_fused.pth --q_level 0 --bn_fuse --qaft --lr 0.00001 --ptq
```

- Other configurations work analogously

#### Pruning

*Sparse training → pruning → fine-tuning*

```bash
cd micronet/compression/pruning
```

##### Sparse training

-sr: enable sparse training

--s: sparsity factor (tune for the dataset and model)

--model_type: 0 - nin, 1 - nin_gc

- nin (standard convolution)

```bash
python main.py -sr --s 0.0001 --model_type 0
```

- nin_gc (with group convolution)

```bash
python main.py -sr --s 0.001 --model_type 1
```
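-sr with --s appears to follow the network-slimming recipe from "Learning Efficient Convolutional Networks through Network Slimming" (in the references). That recipe adds an L1 penalty s·Σ|γ| on the BN scale factors γ, applied as the subgradient s·sign(γ) on top of the task-loss gradient. Below is a plain-Python sketch of one such update, assuming this is how -sr is applied. In PyTorch the same addition would act on each BatchNorm layer's weight.grad before optimizer.step().

```python
# Illustrative sketch of the network-slimming L1 update (assumed to match -sr / --s).
from typing import List


def sign(x: float) -> int:
    return (x > 0) - (x < 0)


def sparsity_grad(gamma: List[float], grad: List[float], s: float) -> List[float]:
    """Add the subgradient of s * sum(|gamma|) to the BN scale-factor gradients."""
    return [g + s * sign(x) for x, g in zip(gamma, grad)]


def sgd_step(params: List[float], grad: List[float], lr: float) -> List[float]:
    return [p - lr * g for p, g in zip(params, grad)]


gamma = [0.5, -0.2, 0.0]  # BN weights of one layer
grad = [0.0, 0.0, 0.0]    # pretend the task loss gives no gradient this step
grad = sparsity_grad(gamma, grad, s=0.001)
gamma = sgd_step(gamma, grad, lr=0.1)
print(grad, gamma)  # nonzero gammas move toward 0
```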

##### Pruning

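--percent: pruning ratio

--normal_regular: normal/regular pruning flag and regular-pruning base (with N, every layer keeps a multiple of N filters after pruning)

--model: path of the sparse-trained model to prune

--save: path for saving the pruned model

- Normal pruning (nin)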

```bash
python normal_regular_prune.py --percent 0.5 --model models_save/nin_sparse.pth --save models_save/nin_prune.pth
```

- Regular pruning (nin)

```bash
python normal_regular_prune.py --percent 0.5 --normal_regular 8 --model models_save/nin_sparse.pth --save models_save/nin_prune.pth
```

or

```bash
python normal_regular_prune.py --percent 0.5 --normal_regular 16 --model models_save/nin_sparse.pth --save models_save/nin_prune.pth
```

- Group-convolution structure pruning (nin_gc)

```bash
python gc_prune.py --percent 0.4 --model models_save/nin_gc_sparse.pth
```
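--percent sets one global pruning ratio over all BN scale factors, and --normal_regular N requests per-layer filter counts that are multiples of N. Below is a minimal sketch of that bookkeeping. The threshold rule, rounding up rather than down, and the keep-at-least-one-channel guard are assumptions for illustration and are not necessarily what normal_regular_prune.py does.

```python
# Illustrative sketch of global-threshold and regular pruning; the rules are assumptions.
import math
from typing import List


def global_threshold(gammas: List[List[float]], percent: float) -> float:
    """|gamma| value at the `percent` position of all BN scale factors, sorted ascending."""
    mags = sorted(abs(g) for layer in gammas for g in layer)
    return mags[min(int(len(mags) * percent), len(mags) - 1)]


def kept_channels(layer: List[float], thre: float, regular: int = 1) -> int:
    """Channels with |gamma| >= thre survive; round up to a multiple of `regular`."""
    keep = max(sum(1 for g in layer if abs(g) >= thre), 1)  # assumed: keep at least one
    return min(math.ceil(keep / regular) * regular, len(layer))


gammas = [[0.9, 0.01, 0.5, 0.02], [0.03, 0.8, 0.04, 0.7]]
thre = global_threshold(gammas, percent=0.5)
print(thre)
print([kept_channels(layer, thre) for layer in gammas])
print([kept_channels(layer, thre, regular=4) for layer in gammas])
```

Capping at the layer width means that a layer whose width is not itself a multiple of N can end up with a filter count that is not a multiple of N. A real implementation has to decide how to handle that case.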

##### Fine-tuning

--prune_refine: path of the pruned model to fine-tune

- nin

```bash
python main.py --model_type 0 --prune_refine models_save/nin_prune.pth
```

- nin_gc

***Pass in the cfg (per-layer channel counts) of the pruned model***

*For example:*

```bash
python main.py --model_type 1 --gc_prune_refine 154 162 144 304 320 320 608 584
```

#### Pruning → quantization

##### Pruning → High-Bit quantization

###### w8a8(dorefa)

```bash
cd micronet/compression/quantization/wqaq/dorefa
```

- nin (standard convolution)

```bash
python main.py --w_bits 8 --a_bits 8 --model_type 0 --prune_quant ../../../pruning/models_save/nin_finetune.pth
```

- nin_gc (with group convolution)

```bash
python main.py --w_bits 8 --a_bits 8 --model_type 1 --prune_quant ../../../pruning/models_save/nin_gc_retrain.pth
```

###### w8a8(iao)

```bash
cd micronet/compression/quantization/wqaq/iao
```

**QAT/PTQ → QAFT**

**! Note: QAFT must be run after QAT/PTQ !**

**QAT**

*Without BN fusion*

- nin (standard convolution)

```bash
python main.py --w_bits 8 --a_bits 8 --model_type 0 --prune_quant ../../../pruning/models_save/nin_finetune.pth --lr 0.001
```

- nin_gc (with group convolution)

```bash
python main.py --w_bits 8 --a_bits 8 --model_type 1 --prune_quant ../../../pruning/models_save/nin_gc_retrain.pth --lr 0.001
```

*With BN fusion*

- nin (standard convolution)

```bash
python main.py --w_bits 8 --a_bits 8 --model_type 0 --prune_quant ../../../pruning/models_save/nin_finetune.pth --bn_fuse --pretrained_model --lr 0.001
```

- nin_gc (with group convolution)

```bash
python main.py --w_bits 8 --a_bits 8 --model_type 1 --prune_quant ../../../pruning/models_save/nin_gc_retrain.pth --bn_fuse --pretrained_model --lr 0.001
```

**PTQ**

- nin (standard convolution)

```bash
python main.py --w_bits 8 --a_bits 8 --model_type 0 --prune_quant ../../../pruning/models_save/nin_finetune.pth --bn_fuse --pretrained_model --ptq_control --ptq --batch_size 32 --ptq_batch 200 --percentile 0.999999
```

- Other configurations work analogously

**QAFT**

**! Note: QAFT must be run after QAT/PTQ !**

**QAT → QAFT**

*Without BN fusion*

- nin (standard convolution)

```bash
python main.py --w_bits 8 --a_bits 8 --model_type 0 --prune_qaft models_save/nin.pth --qaft --lr 0.00001
```

- nin_gc (with group convolution)

```bash
python main.py --w_bits 8 --a_bits 8 --model_type 1 --prune_qaft models_save/nin_gc.pth --qaft --lr 0.00001
```

*With BN fusion*

- nin (standard convolution)

```bash
python main.py --w_bits 8 --a_bits 8 --model_type 0 --prune_qaft models_save/nin_bn_fused.pth --bn_fuse --qaft --lr 0.00001
```

- nin_gc (with group convolution)

```bash
python main.py --w_bits 8 --a_bits 8 --model_type 1 --prune_qaft models_save/nin_gc_bn_fused.pth --bn_fuse --qaft --lr 0.00001
```

**PTQ → QAFT**

*Without BN fusion*

- nin (standard convolution)

```bash
python main.py --w_bits 8 --a_bits 8 --model_type 0 --prune_qaft models_save/nin.pth --qaft --lr 0.00001 --ptq
```

- nin_gc (with group convolution)

```bash
python main.py --w_bits 8 --a_bits 8 --model_type 1 --prune_qaft models_save/nin_gc.pth --qaft --lr 0.00001 --ptq
```

*With BN fusion*

- nin (standard convolution)

```bash
python main.py --w_bits 8 --a_bits 8 --model_type 0 --prune_qaft models_save/nin_bn_fused.pth --bn_fuse --qaft --lr 0.00001 --ptq
```

- nin_gc (with group convolution)

```bash
python main.py --w_bits 8 --a_bits 8 --model_type 1 --prune_qaft models_save/nin_gc_bn_fused.pth --bn_fuse --qaft --lr 0.00001 --ptq
```

###### Other bit widths work analogously

##### Pruning → Low-Bit (ternary/binary) quantization

```bash
cd micronet/compression/quantization/wbwtab
```

###### wbab

- nin (standard convolution)

```bash
python main.py --W 2 --A 2 --model_type 0 --prune_quant ../../pruning/models_save/nin_finetune.pth
```

- nin_gc (with group convolution)

```bash
python main.py --W 2 --A 2 --model_type 1 --prune_quant ../../pruning/models_save/nin_gc_retrain.pth
```

###### Other W/A combinations work analogously


#### BN

##### wbwtab

```bash
cd micronet/compression/quantization/wbwtab/bn_fuse
```

###### bn_fuse (get the structure and parameters of quant_model_train and quant_bn_fused_model_inference)

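*--model_type: 1 - nin_gc (with group convolution); 0 - nin (standard convolution)*

*--prune_quant: flag for a model that was pruned and then quantized*

*--W: weight quantization (2 - binary, 3 - ternary)*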

- nin_gc, quant_model, wb

```bash
python bn_fuse.py --model_type 1 --W 2
```

- nin_gc, prune_quant_model, wb

```bash
python bn_fuse.py --model_type 1 --prune_quant --W 2
```

- nin_gc, quant_model, wt

```bash
python bn_fuse.py --model_type 1 --W 3
```

- nin, quant_model, wb

```bash
python bn_fuse.py --model_type 0 --W 2
```

###### bn_fused_model_test (test quant_model_train against quant_bn_fused_model_inference)

```bash
python bn_fused_model_test.py
```
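For binary activations, BN fusion can be reduced to a bias change (BN parameters → conv bias b). Since BN(y) = (γ/σ)·(y - μ + βσ/γ), the sign of the BN output equals the sign of the conv output shifted by b - μ + βσ/γ as long as γ > 0. Below is a plain-Python sketch of that identity for one output channel. The γ < 0 branch, which flips the weight signs, is an assumption about one way to handle negative γ, and micronet's bn_fuse.py may treat that case differently.

```python
# Illustrative sketch of folding BN into the bias before a sign() activation; not micronet's code.
import math
from typing import List, Tuple


def sign(x: float) -> float:
    return 1.0 if x >= 0 else -1.0  # binary activation; 0 maps to +1 here


def fold_bn_for_sign(w: List[float], b: float, gamma: float, beta: float,
                     mean: float, var: float, eps: float = 1e-5) -> Tuple[List[float], float]:
    """Return (w', b') with sign(w'.x + b') == sign(BN(w.x + b)) for one channel; gamma != 0."""
    sigma = math.sqrt(var + eps)
    shift = b - mean + beta * sigma / gamma
    if gamma > 0:
        return list(w), shift          # only the bias changes
    return [-x for x in w], -shift     # assumed handling: gamma < 0 flips the sign


def dot(w: List[float], x: List[float]) -> float:
    return sum(wi * xi for wi, xi in zip(w, x))


w, b, beta, mean, var = [0.5, -1.0], 0.2, 0.3, 0.1, 0.25
for gamma in (0.8, -0.8):
    fw, fb = fold_bn_for_sign(w, b, gamma, beta, mean, var)
    for x in ([-2.0, 1.0], [0.3, 0.4], [1.0, 1.0], [2.0, -1.0], [0.0, 0.0]):
        bn = gamma * (dot(w, x) + b - mean) / math.sqrt(var + 1e-5) + beta
        assert sign(bn) == sign(dot(fw, x) + fb)
print("sign(BN(conv(x))) matches the folded layer")
```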

##### dorefa

```bash
cd micronet/compression/quantization/wqaq/dorefa/quant_model_test
```

###### quant_model_para (get the structure and parameters of quant_model_train and quant_model_inference)

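*--model_type: 1 - nin_gc (with group convolution); 0 - nin (standard convolution)*

*--prune_quant: flag for a model that was pruned and then quantized*

*--w_bits: weight bit width; --a_bits: activation bit width*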

- nin_gc, quant_model, w8a8

```bash
python quant_model_para.py --model_type 1 --w_bits 8 --a_bits 8
```

- nin_gc, prune_quant_model, w8a8

```bash
python quant_model_para.py --model_type 1 --prune_quant --w_bits 8 --a_bits 8
```

- nin, quant_model, w8a8

```bash
python quant_model_para.py --model_type 0 --w_bits 8 --a_bits 8
```

###### quant_model_test (test quant_model_train against quant_model_inference)

```bash
python quant_model_test.py
```

##### iao

***Note: quantization training must be run with --bn_fuse set to True***
```bash
cd micronet/compression/quantization/wqaq/iao/bn_fuse
```

###### bn_fuse (get the structure and parameters of quant_bn_fused_model_train and quant_bn_fused_model_inference)

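*--model_type: 1 - nin_gc (with group convolution); 0 - nin (standard convolution)*

*--prune_quant: flag for a model that was pruned and then quantized*

*--w_bits: weight bit width; --a_bits: activation bit width*

*--q_type: 0 - symmetric; 1 - asymmetric*

*--q_level: 0 - per-channel; 1 - per-layer*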

- nin_gc, quant_model, w8a8

```bash
python bn_fuse.py --model_type 1 --w_bits 8 --a_bits 8
```

- nin_gc, prune_quant_model, w8a8

```bash
python bn_fuse.py --model_type 1 --prune_quant --w_bits 8 --a_bits 8
```

- nin, quant_model, w8a8

```bash
python bn_fuse.py --model_type 0 --w_bits 8 --a_bits 8
```

- nin_gc, quant_model, w8a8, asymmetric, per-layer

```bash
python bn_fuse.py --model_type 1 --w_bits 8 --a_bits 8 --q_type 1 --q_level 1
```

###### bn_fused_model_test (test quant_bn_fused_model_train against quant_bn_fused_model_inference)

```bash
python bn_fused_model_test.py
```
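The inference-simulation scripts compare a model trained with fake quantization against an integer version. The IAO paper (Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference) computes one output value as sketched below. The sketch assumes uint8 inputs and weights with zero points, a bias pre-quantized to an integer with scale sx·sw, and a floating-point multiplier M = sx·sw/sy where the paper uses a fixed-point one. It is not a reproduction of bn_fused_model_test.py.

```python
# Illustrative sketch of IAO-style integer-only arithmetic; scales and zero points are assumed.
from typing import List


def quantize(x: List[float], scale: float, zero_point: int) -> List[int]:
    """Float -> uint8 with an affine (scale, zero_point) mapping."""
    return [min(max(round(v / scale) + zero_point, 0), 255) for v in x]


def int_dot(xq: List[int], zx: int, sx: float, wq: List[int], zw: int, sw: float,
            bias_q: int, sy: float, zy: int) -> int:
    """One output of an integer-only dot product, requantized to uint8."""
    acc = sum((a - zx) * (b - zw) for a, b in zip(xq, wq)) + bias_q  # integer accumulator
    m = sx * sw / sy  # the paper replaces this float with a fixed-point multiplier and shift
    return min(max(round(m * acc) + zy, 0), 255)


x, w, bias = [0.5, 1.0, -0.25], [0.2, -0.4, 0.6], 0.1
sx = sw = sy = 0.01
zx = zw = zy = 128
xq, wq = quantize(x, sx, zx), quantize(w, sw, zw)
bias_q = round(bias / (sx * sw))  # bias uses scale sx * sw and zero point 0
yq = int_dot(xq, zx, sx, wq, zw, sw, bias_q, sy, zy)
print("float:", sum(a * b for a, b in zip(x, w)) + bias, "int-only:", (yq - zy) * sy)
```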

#### Device selection

*CPU or GPU (single or multiple GPUs)*

--cpu: run on the CPU; --gpu_id: ID(s) of the GPU(s) to use

- CPU

```bash
python main.py --cpu
```

- Single GPU

```bash
python main.py --gpu_id 0
```

or

```bash
python main.py --gpu_id 1
```

- Multiple GPUs

```bash
python main.py --gpu_id 0,1
```

or

```bash
python main.py --gpu_id 0,1,2
```
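Below is an argparse sketch showing how a comma-separated --gpu_id value could map to devices. It is not micronet's own parsing. The default of GPU 0 and the "cuda:N" device strings are assumptions. In PyTorch, a multi-GPU list would typically be passed to nn.DataParallel(model, device_ids=[...]).

```python
# Illustrative sketch of --cpu / --gpu_id handling; not micronet's main.py, defaults are assumed.
import argparse
from typing import List, Optional


def parse_devices(argv: Optional[List[str]] = None) -> List[str]:
    parser = argparse.ArgumentParser()
    parser.add_argument("--cpu", action="store_true", help="run on the CPU")
    parser.add_argument("--gpu_id", default="0", help="comma-separated GPU ids, e.g. 0,1")
    args = parser.parse_args(argv)
    if args.cpu:
        return ["cpu"]
    return ["cuda:" + i.strip() for i in args.gpu_id.split(",") if i.strip()]


print(parse_devices(["--gpu_id", "0,1,2"]))
```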


### Deployment

#### TensorRT

*Only the **core module** code is provided for now; a complete runnable demo is planned*

##### Related articles
- [tensorrt-](https://zhuanlan.zhihu.com/p/336256668)
- [tensorrt-op/dynamic_shape](https://zhuanlan.zhihu.com/p/335829625)


## Porting to your own model

### Quantization

#### LeNet example

##### quant_test_manual.py

*A model can be quantized (High-Bit (>2b), Low-Bit (≤2b)/Ternary and Binary) by simply replacing ***op*** with ***quant_op***.*

```python
import torch.nn as nn
import torch.nn.functional as F

# some base_op, such as ``Add``, ``Concat``
from micronet.base_module.op import *

# ``quantize`` is quant_module, ``QuantConv2d``, ``QuantLinear``, ``QuantMaxPool2d``, ``QuantReLU`` are quant_op
from micronet.compression.quantization.wbwtab.quantize import (
    QuantConv2d as quant_conv_wbwtab,
)
from micronet.compression.quantization.wbwtab.quantize import (
    ActivationQuantizer as quant_relu_wbwtab,
)
from micronet.compression.quantization.wqaq.dorefa.quantize import (
    QuantConv2d as quant_conv_dorefa,
)
from micronet.compression.quantization.wqaq.dorefa.quantize import (
    QuantLinear as quant_linear_dorefa,
)
from micronet.compression.quantization.wqaq.iao.quantize import (
    QuantConv2d as quant_conv_iao,
)
from micronet.compression.quantization.wqaq.iao.quantize import (
    QuantLinear as quant_linear_iao,
)
from micronet.compression.quantization.wqaq.iao.quantize import (
    QuantMaxPool2d as quant_max_pool_iao,
)
from micronet.compression.quantization.wqaq.iao.quantize import (
    QuantReLU as quant_relu_iao,
)


class LeNet(nn.Module):
    def __init__(self):
        super(LeNet, self).__init__()
        self.conv1 = nn.Conv2d(1, 10, kernel_size=5)
        self.conv2 = nn.Conv2d(10, 20, kernel_size=5)
        self.fc1 = nn.Linear(320, 50)
        self.fc2 = nn.Linear(50, 10)
        self.max_pool = nn.MaxPool2d(kernel_size=2)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        x = self.relu(self.max_pool(self.conv1(x)))
        x = self.relu(self.max_pool(self.conv2(x)))
        x = x.view(-1, 320)
        x = self.relu(self.fc1(x))
        x = F.dropout(x, training=self.training)
        x = self.fc2(x)
        return F.log_softmax(x, dim=1)


class QuantLeNetWbWtAb(nn.Module):
    def __init__(self):
        super(QuantLeNetWbWtAb, self).__init__()
        self.conv1 = quant_conv_wbwtab(1, 10, kernel_size=5)
        self.conv2 = quant_conv_wbwtab(10, 20, kernel_size=5)
        self.fc1 = nn.Linear(320, 50)
        self.fc2 = nn.Linear(50, 10)
        self.max_pool = nn.MaxPool2d(kernel_size=2)
        self.relu = quant_relu_wbwtab()

    def forward(self, x):
        x = self.relu(self.max_pool(self.conv1(x)))
        x = self.relu(self.max_pool(self.conv2(x)))
        x = x.view(-1, 320)
        x = self.relu(self.fc1(x))
        x = F.dropout(x, training=self.training)
        x = self.fc2(x)
        return F.log_softmax(x, dim=1)


class QuantLeNetDoReFa(nn.Module):
    def __init__(self):
        super(QuantLeNetDoReFa, self).__init__()
        self.conv1 = quant_conv_dorefa(1, 10, kernel_size=5)
        self.conv2 = quant_conv_dorefa(10, 20, kernel_size=5)
        self.fc1 = quant_linear_dorefa(320, 50)
        self.fc2 = quant_linear_dorefa(50, 10)
        self.max_pool = nn.MaxPool2d(kernel_size=2)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        x = self.relu(self.max_pool(self.conv1(x)))
        x = self.relu(self.max_pool(self.conv2(x)))
        x = x.view(-1, 320)
        x = self.relu(self.fc1(x))
        x = F.dropout(x, training=self.training)
        x = self.fc2(x)
        return F.log_softmax(x, dim=1)


class QuantLeNetIAO(nn.Module):
    def __init__(self):
        super(QuantLeNetIAO, self).__init__()
        self.conv1 = quant_conv_iao(1, 10, kernel_size=5)
        self.conv2 = quant_conv_iao(10, 20, kernel_size=5)
        self.fc1 = quant_linear_iao(320, 50)
        self.fc2 = quant_linear_iao(50, 10)
        self.max_pool = quant_max_pool_iao(kernel_size=2)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        x = self.relu(self.max_pool(self.conv1(x)))
        x = self.relu(self.max_pool(self.conv2(x)))
        x = x.view(-1, 320)
        x = self.relu(self.fc1(x))
        x = F.dropout(x, training=self.training)
        x = self.fc2(x)
        return F.log_softmax(x, dim=1)


lenet = LeNet()
quant_lenet_wbwtab = QuantLeNetWbWtAb()
quant_lenet_dorefa = QuantLeNetDoReFa()
quant_lenet_iao = QuantLeNetIAO()

print("***ori_model***\n", lenet)
print("\n***quant_model_wbwtab***\n", quant_lenet_wbwtab)
print("\n***quant_model_dorefa***\n", quant_lenet_dorefa)
print("\n***quant_model_iao***\n", quant_lenet_iao)

print("\nquant_model is ready")
print("micronet is ready")
```

##### quant_test_auto.py

*A model can be quantized (High-Bit (>2b), Low-Bit (≤2b)/Ternary and Binary) by simply calling the ***prepare(model)*** function of a quantize module, e.g. ***micronet.compression.quantization.wqaq.dorefa.quantize.prepare(model)***.*

```python
import torch.nn as nn
import torch.nn.functional as F

# some base_op, such as ``Add``, ``Concat``
from micronet.base_module.op import *

import micronet.compression.quantization.wqaq.dorefa.quantize as quant_dorefa
import micronet.compression.quantization.wqaq.iao.quantize as quant_iao


class LeNet(nn.Module):
    def __init__(self):
        super(LeNet, self).__init__()
        self.conv1 = nn.Conv2d(1, 10, kernel_size=5)
        self.conv2 = nn.Conv2d(10, 20, kernel_size=5)
        self.fc1 = nn.Linear(320, 50)
        self.fc2 = nn.Linear(50, 10)
        self.max_pool = nn.MaxPool2d(kernel_size=2)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        x = self.relu(self.max_pool(self.conv1(x)))
        x = self.relu(self.max_pool(self.conv2(x)))
        x = x.view(-1, 320)
        x = self.relu(self.fc1(x))
        x = F.dropout(x, training=self.training)
        x = self.fc2(x)
        return F.log_softmax(x, dim=1)


"""
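--w_bits --a_bits: bit widths of weights (W) and activations (A)
--q_type: quantization type (0 - symmetric, 1 - asymmetric)
--q_level: weight quantization level (0 - per-channel, 1 - per-layer)
--weight_observer: weight observer (0 - MinMaxObserver, 1 - MovingAverageMinMaxObserver)
--bn_fuse: fuse BN during quantization
--bn_fuse_calib: BN-fusion calibration flag
--pretrained_model: load a pretrained model
--qaft: run QAFT (quantization-aware fine-tuning)
--ptq: run PTQ
--percentile: percentile used for PTQ calibration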
"""
lenet = LeNet()
quant_lenet_dorefa = quant_dorefa.prepare(lenet, inplace=False, a_bits=8, w_bits=8)
quant_lenet_iao = quant_iao.prepare(
    lenet,
    inplace=False,
    a_bits=8,
    w_bits=8,
    q_type=0,
    q_level=0,
    weight_observer=0,
    bn_fuse=False,
    bn_fuse_calib=False,
    pretrained_model=False,
    qaft=False,
    ptq=False,
    percentile=0.9999,
)

# if ptq == False, do qat/qaft, need train
# if ptq == True, do ptq, don't need train
# you can refer to micronet/compression/quantization/wqaq/iao/main.py

print("***ori_model***\n", lenet)
print("\n***quant_model_dorefa***\n", quant_lenet_dorefa)
print("\n***quant_model_iao***\n", quant_lenet_iao)

print("\nquant_model is ready")
print("micronet is ready")
```

#### test

##### quant_test_manual

```bash
python -c "import micronet; micronet.quant_test_manual()"
```

##### quant_test_auto

```bash
python -c "import micronet; micronet.quant_test_auto()"
```

*If "quant_model is ready" is printed, micronet is ready.*

### BN fusion

***See [BN](#bn)***

## Compression results

*Example results on CIFAR-10*

|Method|W (bits)|A (bits)|Acc|GFLOPs|Para (M)|Size (MB)|Compression rate|Acc loss|
|:-:|:-----:|:-----:|:--:|:---:|:-----:|:------:|:---:|:-:|
|Original (nin)|FP32|FP32|91.01%|0.15|0.67|2.68|N/A|N/A|
|Group convolution (nin_gc)|FP32|FP32|91.04%|0.15|0.58|2.32|13.43%|-0.03%|
|Pruning|FP32|FP32|90.26%|0.09|0.32|1.28|52.24%|0.75%|
|Quantization|1|FP32|90.93%|N/A|0.58|0.204|92.39%|0.08%|
|Quantization|1.5|FP32|91%|N/A|0.58|0.272|89.85%|0.01%|
|Quantization|1|1|86.23%|N/A|0.58|0.204|92.39%|4.78%|
|Quantization|1.5|1|86.48%|N/A|0.58|0.272|89.85%|4.53%|
|Quantization (DoReFa)|8|8|91.03%|N/A|0.58|0.596|77.76%|-0.02%|
|Quantization (IAO, symmetric/per-channel/bn_fuse)|8|8|90.99%|N/A|0.58|0.596|77.76%|0.02%|
|Group convolution + pruning + quantization|1.5|1|86.13%|N/A|0.32|0.19|92.91%|4.88%|

*Trained with --train_batch_size 256*
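The last two columns are measured against the first row: compression rate = (1 - Size / 2.68 MB) × 100% and accuracy loss = 91.01% - Acc. A quick arithmetic check reproduces the listed values, for example:

```python
# Arithmetic check of the table's derived columns, using the nin baseline from the first row.
BASE_SIZE_MB, BASE_ACC = 2.68, 91.01


def compression_rate(size_mb: float) -> float:
    """Model-size reduction relative to the original nin model, in percent."""
    return (1 - size_mb / BASE_SIZE_MB) * 100


def acc_loss(acc: float) -> float:
    """Accuracy drop relative to the original nin model, in percentage points."""
    return BASE_ACC - acc


rows = {"nin_gc": (2.32, 91.04), "pruning": (1.28, 90.26), "DoReFa W8A8": (0.596, 91.03)}
for name, (size, acc) in rows.items():
    print(f"{name}: {compression_rate(size):.2f}% smaller, {acc_loss(acc):.2f} points accuracy loss")
```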

## References

### Compression

#### Quantization

##### QAT

###### Binary

- [Binarized Neural Networks: Training Neural Networks with Weights and Activations Constrained to +1 or -1](https://arxiv.org/abs/1602.02830)

- [XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks](https://arxiv.org/abs/1603.05279)

- [AN EMPIRICAL STUDY OF BINARY NEURAL NETWORKS OPTIMISATION](https://openreview.net/forum?id=rJfUCoR5KX)

- [A Review of Binarized Neural Networks](https://www.semanticscholar.org/paper/A-Review-of-Binarized-Neural-Networks-Simons-Lee/0332fdf00d7ff988c5b66c47afd49431eafa6cd1)

###### Ternary

- [Ternary weight networks](https://arxiv.org/abs/1605.04711)

###### High-Bit

- [DoReFa-Net: Training Low Bitwidth Convolutional Neural Networks with Low Bitwidth Gradients](https://arxiv.org/abs/1606.06160)
- [Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference](https://arxiv.org/abs/1712.05877)
- [Quantizing deep convolutional networks for efficient inference: A whitepaper](https://arxiv.org/abs/1806.08342)

##### PTQ

###### High-Bit

- [tensorrt-ptq-8-bit](https://on-demand.gputechconf.com/gtc/2017/presentation/s7310-8-bit-inference-with-tensorrt.pdf)

#### Pruning

- [Learning Efficient Convolutional Networks through Network Slimming](https://arxiv.org/abs/1708.06519)
- [RETHINKING THE VALUE OF NETWORK PRUNING](https://arxiv.org/abs/1810.05270)

#### Compression for dedicated hardware

- [Convolutional Networks for Fast, Energy-Efficient Neuromorphic Computing](https://arxiv.org/abs/1603.08270)

### Deployment

#### TensorRT

- [GitHub](https://github.com/NVIDIA/TensorRT)
- [ptq](https://on-demand.gputechconf.com/gtc/2017/presentation/s7310-8-bit-inference-with-tensorrt.pdf)
- [tensorrt-](https://zhuanlan.zhihu.com/p/336256668)
- [tensorrt-op/dynamic_shape](https://zhuanlan.zhihu.com/p/335829625)
- [summary](https://github.com/mileistone/study_resources/blob/master/engineering/tensorrt/tensorrt.md)


## To do

- Complete TensorRT demo
- More compression algorithms (e.g. NAS)
- Deployment with other frameworks (mnn/tnn/tengine)
-  > 
