https://github.com/aimilefth/fxpytorch

Science Score: 26.0%

This score indicates how likely this project is to be science-related based on various indicators:

○ CITATION.cff file
✓ codemeta.json file: Found codemeta.json file
✓ .zenodo.json file: Found .zenodo.json file
○ DOI references
○ Academic publication links
○ Academic email domains
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: Low similarity (10.2%) to scientific vocabulary

Last synced: 10 months ago

Repository

Basic Info

Host: GitHub
Owner: aimilefth
License: mit
Language: Python
Default Branch: main
Size: 159 KB

Statistics

Stars: 1
Watchers: 1
Forks: 1
Open Issues: 0
Releases: 0

Created about 1 year ago · Last pushed about 1 year ago

Metadata Files

Readme License

FxPyTorch: Fixed-Point Quantization for PyTorch Layers

FxPyTorch is a Python library that extends PyTorch's nn.Module system to support symmetric, linear fixed-point quantization. It provides tools for simulating fixed-point arithmetic where the scaling factor is constrained to be a power of 2, which is often efficient for hardware implementations. The library helps analyze quantization effects and prepare models for deployment on hardware with fixed-point capabilities.
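To illustrate why a power-of-2 scaling factor maps well to hardware, here is a minimal sketch in plain Python integers. It is not taken from FxPyTorch: the helper names `to_fixed` and `fixed_mul` are hypothetical, and both operands are assumed to share the same number of fractional bits. The point is only that rescaling a product becomes a right shift rather than a division or a second multiplication.

```python
def to_fixed(x, fractional_bits):
    """Encode a real number as an integer code with scale 2**fractional_bits."""
    return round(x * (1 << fractional_bits))


def fixed_mul(a_code, b_code, fractional_bits):
    """Multiply two codes that share one format and rescale with a shift.

    The raw product carries 2 * fractional_bits fractional bits, so an
    arithmetic right shift by fractional_bits restores the original format.
    Python's >> on ints floors toward negative infinity.
    """
    product = a_code * b_code
    return product >> fractional_bits


FRAC = 8  # hypothetical format: 8 fractional bits, scale = 2**8 = 256

a = to_fixed(1.5, FRAC)      # 1.5 * 256
b = to_fixed(-2.25, FRAC)    # -2.25 * 256
c = fixed_mul(a, b, FRAC)    # product rescaled back to 8 fractional bits

print(a, b, c)               # integer codes
print(c / (1 << FRAC))       # decoded real value of the product
```

With these inputs the product is exactly representable, so decoding `c` gives -3.375, the same as 1.5 * -2.25 in floating point.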

Features

Fixed-Point Layer Implementations:
- FxPLinear: Fixed-point Linear layer.
- FxPLayerNorm: Fixed-point Layer Normalization.
- FxPMultiheadAttention: Fixed-point Multi-Head Attention.
- FxPTransformerEncoderLayer: Fixed-point Transformer Encoder Layer.
- FxPSoftmax: Fixed-point Softmax.
- FxPDropout: Fixed-point Dropout (quantizes input/output; the dropout operation itself is standard).
Flexible Quantization Configuration (Symmetric, Power-of-2 Scaling):
- Implements symmetric linear quantization around zero.
- Uses power-of-2 scaling factors (determined by fractional_bits) for efficient hardware mapping (e.g., bit shifts instead of multiplications).
- Define total_bits and fractional_bits for weights, biases, and activations.
- Choose rounding methods (e.g., ROUND_SATURATE, TRUNC_SATURATE).
- Pydantic-based configuration models (QType, LinearQConfig, etc.) for validation and clarity.
Helper Utilities:
- set_high_precision_quant(): Configure layers for maximum precision (e.g., 24 fractional bits) given their dynamic range, adhering to the symmetric, power-of-2 scheme.
- set_no_overflow_quant(): Configure layers to use a specified total number of bits for parameters, automatically calculating fractional bits to prevent overflow based on weight/bias dynamic range, adhering to the symmetric, power-of-2 scheme.
Transparent Base Layers:
- Includes "transparent" versions of standard PyTorch layers (LinearTransparent, LayerNormTransparent, etc.) that act as drop-in replacements for nn.Module equivalents but include hooks for activation logging. These serve as the base for the FxP layers.
Activation Logging:
- ActivationLogger utility to inspect intermediate tensor values and their quantized counterparts throughout the model.
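The quantization scheme described in the list above can be sketched for a single scalar as follows. This is an illustration under stated assumptions, not the library's implementation: the function name `quantize_fixed_point` is hypothetical, the integer range is assumed to be the two's-complement range of `total_bits`, ROUND_SATURATE is assumed to mean round-half-up followed by clamping, and TRUNC_SATURATE is assumed to mean dropping the low bits (floor) followed by clamping. FxPyTorch may differ in any of these details.

```python
import math


def quantize_fixed_point(x, total_bits, fractional_bits, method="ROUND_SATURATE"):
    """Simulate fixed-point quantization of one float (assumed conventions)."""
    scale = 2.0 ** fractional_bits           # power-of-2 scaling factor
    q_min = -(2 ** (total_bits - 1))         # assumed two's-complement range
    q_max = 2 ** (total_bits - 1) - 1

    scaled = x * scale
    if method == "ROUND_SATURATE":
        code = math.floor(scaled + 0.5)      # round half up (assumption)
    elif method == "TRUNC_SATURATE":
        code = math.floor(scaled)            # drop low bits (assumption)
    else:
        raise ValueError(f"unknown method: {method}")

    code = max(q_min, min(q_max, code))      # saturate instead of wrapping
    return code / scale                      # back to a "fake-quantized" float


# 8 total bits, 4 fractional bits: step size 1/16, range [-8.0, 7.9375]
print(quantize_fixed_point(0.3, 8, 4))                     # rounded
print(quantize_fixed_point(0.3, 8, 4, "TRUNC_SATURATE"))   # truncated
print(quantize_fixed_point(100.0, 8, 4))                   # saturates high
print(quantize_fixed_point(-100.0, 8, 4))                  # saturates low
```

Returning a float rather than the integer code mirrors the simulation approach described above: the value stays in floating point but is restricted to the grid that the fixed-point format can represent.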

Installation

Prerequisites

Python (>=3.8 recommended)
PyTorch (>=2.2.0 recommended, see pyproject.toml for specific version)
Pydantic (>=2.0, see pyproject.toml)

From Git (Recommended for development or as a submodule)

You can include FxPyTorch in your project as a Git submodule:

```bash
git submodule add https://github.com/aimilefth/FxPyTorch.git
```

Quick Start

```python
import torch
from FxPyTorch.fxp.fxp_linear import FxPLinear, LinearQConfig
from FxPyTorch.fxp.symmetrics_quant import QType, QMethod

# Define a quantization configuration for a linear layer.
# Example: 8-bit weights, 8-bit bias, 16-bit input/activation with 8 fractional bits
linear_q_config = LinearQConfig(
    input=QType(total_bits=16, fractional_bits=8, q_method=QMethod.ROUND_SATURATE),
    weight=QType(total_bits=8, q_method=QMethod.ROUND_SATURATE),  # Fractional bits determined by set_no_overflow_quant
    bias=QType(total_bits=8, q_method=QMethod.ROUND_SATURATE),  # Fractional bits determined by set_no_overflow_quant
    activation=QType(total_bits=16, fractional_bits=8, q_method=QMethod.ROUND_SATURATE),
)

# Create a fixed-point linear layer
fxp_linear_layer = FxPLinear(in_features=10, out_features=5, bias=True, q_config=linear_q_config)

# Initialize weights (e.g., load from a pre-trained floating-point model)
# fxp_linear_layer.load_state_dict(...)

# If total_bits for weights/bias are set but fractional_bits are not,
# you can automatically determine fractional_bits to avoid overflow:
fxp_linear_layer.set_no_overflow_quant()

print("Quantization Config after set_no_overflow_quant:")
print(fxp_linear_layer.q_config.model_dump_json(indent=2))

# Create dummy input
dummy_input = torch.randn(1, 10)

# Forward pass (simulates fixed-point arithmetic)
# apply_ste=True uses Straight-Through Estimator for gradients during training
output = fxp_linear_layer(dummy_input, apply_ste=True)
print("\nOutput:", output)

# To get truly quantized weights (e.g., for export):
fxp_linear_layer.quantize_weights_bias()
print("\nQuantized Weight:", fxp_linear_layer.weight.data)
```

See the tests/ directory for more detailed usage examples of different layers and quantization scenarios.

Core Concepts

QType: Defines the bit-width (total_bits, fractional_bits) and QMethod for a specific tensor (input, weight, bias, activation).
*QConfig (e.g., LinearQConfig): A Pydantic model that groups QType configurations for all relevant tensors within a specific layer type.
FxP* layers: PyTorch modules that implement fixed-point behavior. They typically inherit from a corresponding *Transparent layer.
- If q_config is None, they behave like standard floating-point layers.
- If q_config is provided, they simulate quantization during the forward pass.
set_no_overflow_quant(): A method on FxP* layers. If total_bits is specified in the QType for weights/biases, this method calculates the optimal fractional_bits to maximize precision while ensuring the current weight/bias values do not overflow.
set_high_precision_quant(): A method that configures weights/biases to use a high number of fractional bits (e.g., 24) and calculates the total_bits needed to represent their current dynamic range.
quantize_weights_bias(): A method to permanently alter the layer's weight and bias tensors to their quantized values. Useful before exporting weights.
ActivationLogger: A utility to log intermediate tensor values during the forward pass for debugging and analysis.
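The bit-width calculations behind set_no_overflow_quant() and set_high_precision_quant() can be sketched for a single dynamic-range value as follows. This is a hedged illustration, not the library's code: the function names are hypothetical, `max_abs` stands for the largest absolute weight or bias value (assumed finite and non-negative), one bit is assumed to be reserved for the sign, and values are assumed to be rounded half up before the overflow check.

```python
import math


def _fits(max_abs, total_bits, fractional_bits):
    """True if max_abs, scaled and rounded, stays within the signed range."""
    q_max = 2 ** (total_bits - 1) - 1
    return math.floor(max_abs * 2.0 ** fractional_bits + 0.5) <= q_max


def no_overflow_fractional_bits(max_abs, total_bits):
    """Largest fractional_bits for which max_abs does not overflow."""
    if max_abs == 0:
        return total_bits - 1              # nothing to protect; use all bits
    f = 0
    while _fits(max_abs, total_bits, f + 1):
        f += 1                             # gain precision while it still fits
    while not _fits(max_abs, total_bits, f):
        f -= 1                             # large values need negative f
    return f


def high_precision_total_bits(max_abs, fractional_bits=24):
    """Smallest total_bits that holds max_abs at the given fractional_bits."""
    code = math.floor(max_abs * 2.0 ** fractional_bits + 0.5)
    return code.bit_length() + 1           # +1 for the sign bit


print(no_overflow_fractional_bits(0.73, 8))   # small weights: many fractional bits
print(no_overflow_fractional_bits(3.2, 8))    # larger range: fewer fractional bits
print(high_precision_total_bits(0.73))        # total bits at 24 fractional bits
```

The two helpers are duals of each other: the first fixes the storage budget and maximizes precision, the second fixes the precision and reports the storage it costs.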

Modules

fxp/: Contains the fixed-point layer implementations and core quantization logic.
- symmetrics_quant.py: Core symmetric quantization functions and QType/QConfig base.
- utils.py: Helper utilities like ValueRange.
- fxp_*.py: Specific fixed-point layer implementations.
transparent/: Contains "transparent" base layers that mirror standard PyTorch layers but include hooks for activation logging.
- activation_logger.py: The ActivationLogger class.
- trans_*.py: Specific transparent layer implementations.
tests/: Unit tests and usage examples.

TODO / Future Work

[ ] Explore quantization schemes with non-power-of-2 scaling factors.
[ ] Add support for asymmetric quantization.
[ ] More comprehensive testing scenarios.
[ ] Detailed documentation for each module and function.
[ ] Performance benchmarking.
[ ] Examples of exporting quantized weights for specific hardware targets.

License

This project is licensed under the MIT License.

Owner

Login: aimilefth
Kind: user

Repositories: 1
Profile: https://github.com/aimilefth

GitHub Events

Total

Watch event: 1
Push event: 5
Public event: 1
Fork event: 1
Create event: 2

Last Year

Watch event: 1
Push event: 5
Public event: 1
Fork event: 1
Create event: 2

Issues and Pull Requests

Last synced: 10 months ago

All Time

Total issues: 0
Total pull requests: 1
Average time to close issues: N/A
Average time to close pull requests: 6 minutes
Total issue authors: 0
Total pull request authors: 1
Average comments per issue: 0
Average comments per pull request: 0.0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0

Past Year

Issues: 0
Pull requests: 1
Average time to close issues: N/A
Average time to close pull requests: 6 minutes
Issue authors: 0
Pull request authors: 1
Average comments per issue: 0
Average comments per pull request: 0.0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0