innervator
Innervator: Hardware Acceleration for Neural Networks
Science Score: 67.0%
This score indicates how likely this project is to be science-related based on various indicators:
✓ CITATION.cff file: Found CITATION.cff file
✓ codemeta.json file: Found codemeta.json file
✓ .zenodo.json file: Found .zenodo.json file
✓ DOI references: Found 9 DOI reference(s) in README
✓ Academic publication links: Links to zenodo.org
○ Academic email domains
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: Low similarity (12.2%) to scientific vocabulary
Repository
Innervator: Hardware Acceleration for Neural Networks
Basic Info
- Host: GitHub
- Owner: Thraetaona
- License: gpl-3.0
- Language: VHDL
- Default Branch: main
- Homepage: https://doi.org/10.36227/techrxiv.172263165.56660174/v1
- Size: 2.86 MB
Statistics
- Stars: 14
- Watchers: 1
- Forks: 1
- Open Issues: 0
- Releases: 1
Metadata Files
README.md
Innervator
Hardware Acceleration for Neural Networks
Technical Paper (IEEE TechArxiv) | Presentation Slides | Repository DOI
Abstract
Artificial intelligence ("AI") is deployed in various applications, from noise cancellation to image recognition, but AI-based products often come with high hardware and electricity costs, which makes them inaccessible for consumer devices and small-scale edge electronics. Inspired by biological brains, deep neural networks ("DNNs") are modeled using mathematical formulae, yet general-purpose processors treat otherwise-parallelizable AI algorithms as step-by-step sequential logic. In contrast, programmable logic devices ("PLDs") can be customized to the specific parameters of a trained DNN, thereby ensuring data-tailored computation and algorithmic parallelism at the register-transfer level. Furthermore, field-programmable gate arrays ("FPGAs"), a subgroup of PLDs, are dynamically reconfigurable. So, to improve AI runtime performance, I designed and open-sourced my hardware compiler: Innervator. Written entirely in VHDL-2008, Innervator takes any DNN's metadata and parameters (e.g., number of layers, neurons per layer, and their weights/biases) and generates its synthesizable FPGA hardware description with the appropriate pipelining and batch processing. Innervator is entirely portable and vendor-independent. As a proof of concept, I used Innervator to implement a sample 8x8-pixel handwritten digit-recognizing neural network in a low-cost AMD Xilinx Artix-7(TM) FPGA @ 100 MHz. With 3 pipeline stages and 2 batches at about 67% LUT utilization, the network achieved ~7.12 GOP/s, predicting the output in 630 ns and under 0.25 W of power. In comparison, an Intel(R) Core(TM) i7-12700H CPU @ 4.70 GHz would take 40,000-60,000 ns at 45 to 115 W. Ultimately, Innervator's hardware-accelerated approach bridges the inherent mismatch between current AI algorithms and the general-purpose digital hardware they run on.
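The abstract's latency and power figures can be turned into a rough speedup and energy-per-inference comparison. The Python sketch below only multiplies and divides the quoted numbers. It assumes each device draws its quoted power for the entire inference, using the FPGA's 0.25 W upper bound and the CPU's 45-115 W range. The results are therefore order-of-magnitude estimates, not measurements.

```python
# Rough comparison built only from the figures quoted in the abstract.
FPGA_LATENCY_NS = 630
FPGA_POWER_W = 0.25               # upper bound ("under 0.25 W")
CPU_LATENCY_NS = (40_000, 60_000)
CPU_POWER_W = (45, 115)           # assumed to be drawn for the whole inference


def energy_nj(power_w: float, latency_ns: float) -> float:
    """Energy per inference in nanojoules (watts * nanoseconds = nanojoules)."""
    return power_w * latency_ns


def summarize() -> dict:
    speedup = tuple(t / FPGA_LATENCY_NS for t in CPU_LATENCY_NS)
    fpga_e = energy_nj(FPGA_POWER_W, FPGA_LATENCY_NS)
    # Best case: lowest power with the fastest run; worst case: highest power, slowest run.
    cpu_e = (energy_nj(CPU_POWER_W[0], CPU_LATENCY_NS[0]),
             energy_nj(CPU_POWER_W[1], CPU_LATENCY_NS[1]))
    return {"speedup": speedup, "fpga_nj": fpga_e, "cpu_nj": cpu_e}


if __name__ == "__main__":
    s = summarize()
    print(f"Speedup: {s['speedup'][0]:.1f}x to {s['speedup'][1]:.1f}x")
    print(f"FPGA: <= {s['fpga_nj']:.1f} nJ per inference")
    print(f"CPU: {s['cpu_nj'][0]:,.0f} to {s['cpu_nj'][1]:,.0f} nJ per inference")
```

Run as a script, it reports a speedup of roughly 63.5x to 95.2x. It also reports at most about 157.5 nJ per inference on the FPGA, versus 1.8 to 6.9 mJ on the CPU.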
Technical Paper
Notice
Although the Abstract specifically discusses an image-recognizing neural network, I endeavoured to generalize Innervator: in practice, it can implement any number of neurons and layers, for any application (e.g., speech recognition), not just imagery. The ./data folder contains the weight and bias parameters that are used during Innervator's synthesis. Because VHDL's std.textio library is broken in most synthesis tools, I was limited to reading only std_logic_vectors from files; as a result, the weights and biases had to be pre-formatted in a fixed-point representation. (More information is available in file_parser.vhd.)
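Because the weights and biases must already be in fixed-point form, they have to be converted offline before synthesis. The Python sketch below shows one way to do that for the 4 integral + 4 fractional bit widths listed in the Artix-7 statistics. Several details are assumptions, not taken from Innervator:

- signed two's complement, with the sign bit counted in the integral width;
- round-to-nearest;
- saturation of out-of-range values;
- one bit string per line.

The file layout Innervator actually expects is defined in file_parser.vhd and may differ.

```python
def to_fixed_bits(value: float, int_bits: int = 4, frac_bits: int = 4) -> str:
    """Encode value as a signed two's-complement fixed-point bit string.

    Assumptions: the sign bit is part of int_bits, rounding is to nearest
    (ties to even, as in Python's round), and out-of-range values saturate.
    """
    width = int_bits + frac_bits
    scaled = round(value * (1 << frac_bits))          # scale by 2**frac_bits
    lo, hi = -(1 << (width - 1)), (1 << (width - 1)) - 1
    scaled = max(lo, min(hi, scaled))                 # saturate instead of wrapping
    return format(scaled & ((1 << width) - 1), f"0{width}b")


def to_fixed_lines(values, int_bits: int = 4, frac_bits: int = 4) -> str:
    """One bit string per line (hypothetical layout; check file_parser.vhd)."""
    return "\n".join(to_fixed_bits(v, int_bits, frac_bits) for v in values)


if __name__ == "__main__":
    print(to_fixed_lines([0.75, -1.5, 3.1, 100.0]))
```

For example, 0.75 encodes as 00001100 and -1.5 as 11101000 (two's complement), while 100.0 saturates to 01111111. Saturating rather than wrapping avoids silently flipping the sign of an out-of-range weight.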
The VHDL code itself has been very thoroughly documented; because I was a novice to VHDL, AI, and FPGA design myself, I documented each step as if it were a beginner's tutorial. You may also find these overview slides of the Project useful.
Even though I was completely new to the world of hardware design, I found the toolchain (and even VHDL itself) to be in a very unstable and buggy state; throughout this project, I found and documented dozens of different bugs, some of which were new and were reported to IEEE and Xilinx:
* VHDL Language Inconsistency in Ports
* VHDL Language Enhancement
* Bug in Vivado's file_open()
* Bug in Vivado's read()
* GitHub VHDL Syntax Highlighter
* Synopsys Synplify p2019's Parser Breaks on VHDL-2019 syntax
Nomenclature
To innervate means "to supply something with nerves."
Innervator is, aptly, an implementer of artificial neural networks within Programmable Logic Devices.
Furthermore, these hardware-based neural networks could be called "Innervated Neural Networks," whose acronym, INN, also appears in INNervator.
Foreword
- Prior to starting this project, I had no experience or training with artificial intelligence ("AI"), electrical engineering, or hardware design;
- Hardware design is a complex field that requires "unlearning" much of computer science; and
- Combining the two ideas, AI & hardware, transformed this project into a unique proof of concept.
Synopsis
- Inspired by biological brains, AI neural networks are modeled in mathematical formulae that are inherently concurrent;
- AI applications are widespread but suffer from general-purpose computer processors that execute algorithms in step-by-step sequences; and
- Programmable Logic Devices ("PLDs") allow digital circuitry to be predesigned for data-tailored and massively parallelized operations.
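To make the "inherently concurrent" point concrete, here is a plain floating-point forward pass of a fully connected network. This is the kind of step-by-step loop a CPU runs. It is a sketch, not Innervator's code: the sigmoid activation and the random placeholder parameters are assumptions. The 64-20-10 layer sizes come from the sample network in the statistics section.

```python
import math
import random


def dense_layer(inputs, weights, biases):
    """One fully connected layer with an assumed sigmoid activation.

    Each neuron reads only `inputs`, never another neuron's output in the same
    layer, so the loop iterations are independent: hardware can evaluate them
    concurrently, whereas a CPU core walks through them in sequence.
    """
    outputs = []
    for w_row, b in zip(weights, biases):
        acc = b + sum(x * w for x, w in zip(inputs, w_row))  # multiply-accumulate
        outputs.append(1.0 / (1.0 + math.exp(-acc)))
    return outputs


def mac_count(layer_sizes):
    """Multiply-accumulates per inference for consecutive dense layers."""
    return sum(a * b for a, b in zip(layer_sizes, layer_sizes[1:]))


if __name__ == "__main__":
    sizes = [64, 20, 10]        # sample network from the statistics section
    rng = random.Random(0)      # random placeholder parameters, not trained ones
    x = [rng.random() for _ in range(sizes[0])]
    for n_in, n_out in zip(sizes, sizes[1:]):
        W = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]
        b = [rng.uniform(-1, 1) for _ in range(n_out)]
        x = dense_layer(x, W, b)
    print(len(x), "outputs;", mac_count(sizes), "multiply-accumulates per inference")
```

For the 64-20-10 network this amounts to 1,480 multiply-accumulates per inference. Each neuron's share is independent of the other neurons in the same layer, which is the parallelism a PLD can exploit by giving each neuron its own DSP.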
Build Instructions
[TODO: Create a TCL script and makefile to automate this.]
To ensure maximal compatibility, I tested Innervator across both Xilinx Vivado 2024's synthesizer (not simulator) and Mentor Graphics ModelSim 2016's simulator; the code itself was written using a subset of VHDL-2008, without any other language involved. Additionally, absolutely no vendor-specific libraries were used in Innervator's design; only the official std and IEEE VHDL packages were utilized.
Because I developed Innervator on a small, entry-level FPGA board (i.e., Digilent Arty A7-35T), I faced many challenges in regard to logic resource usage and timing failures; however, this also ensured that Innervator would become very portable and resource-efficient.
In the ./src/config.vhd file, you can fine-tune Innervator to your liking; almost everything is customizable and generic, down to the reset's polarity and synchronization, the fixed-point types' widths, and the neurons' batch processing size or pipeline stages.
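Choosing the fixed-point widths trades range and precision against logic usage. The helper below computes the representable range and step size for a given integral/fractional split. It assumes a signed two's-complement format with the sign bit counted in the integral width, as in VHDL-2008's sfixed. Whether Innervator's types follow exactly this convention should be checked in config.vhd.

```python
from fractions import Fraction


def fixed_point_range(int_bits: int, frac_bits: int):
    """Return (min, max, step) of a signed two's-complement fixed-point format.

    Assumes the sign bit is counted in int_bits (as in VHDL-2008 sfixed).
    Fractions keep the values exact.
    """
    step = Fraction(1, 2 ** frac_bits)
    lo = -Fraction(2 ** (int_bits - 1))
    hi = Fraction(2 ** (int_bits - 1)) - step
    return lo, hi, step


if __name__ == "__main__":
    for i, f in [(4, 4), (4, 8), (8, 8)]:
        lo, hi, step = fixed_point_range(i, f)
        print(f"{i}.{f}: [{float(lo)}, {float(hi)}] in steps of {float(step)}")
```

For the 4.4 format this gives a range of [-8.0, 7.9375] in steps of 0.0625. Widening the fractional part improves precision, and widening the integral part extends the range; both cost more logic.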
Hardware Demo (Arty A7-35T)
I used the board's four LEDs to "transmit" the network's prediction (i.e., the resulting digit in this case), but the same UART interface that delivers the input could later be used to transmit the prediction back to the computer as well.
https://github.com/Thraetaona/Innervator/assets/42461518/52132598-aac0-4532-85c9-bce6e69aa214
(Note: The "delay" you see between the command prompt and FPGA is primarily due to the UART speed; the actual neural network itself takes ~1000 nanoseconds to process its input.)
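A quick calculation shows why the UART link, not the network, dominates the visible delay. The baud rate, 8N1 framing, and one byte per input below are all hypothetical values chosen for illustration, not settings taken from the project.

```python
def uart_seconds(num_bytes: int, baud: int, bits_per_frame: int = 10) -> float:
    """Time to send num_bytes over UART; 8N1 framing = 1 start + 8 data + 1 stop bit."""
    return num_bytes * bits_per_frame / baud


if __name__ == "__main__":
    INPUTS = 64            # 8x8-pixel image, assumed to be sent as one byte per input
    BAUD = 115_200         # hypothetical baud rate, not taken from the project
    NETWORK_NS = 1_000     # ~1000 ns network latency quoted in the note
    t = uart_seconds(INPUTS, BAUD)
    print(f"UART transfer: {t * 1e3:.2f} ms")
    print(f"Transfer is ~{t / (NETWORK_NS * 1e-9):,.0f}x the network latency")
```

Under these assumptions, sending the 64 inputs takes about 5.56 ms, several thousand times the ~1000 ns the network itself needs.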
Simulation
(Note: This was an old simulation run; in the current version, the same digit is predicted with 70%+ accuracy.)
The sample network that was used in said simulation:
Statistics (Artix-7 35T FPGA)
Excluding the peripherals (e.g., UART, button debouncer), and given a network with an input layer and two neural layers (64 inputs, 20 hidden neurons, and 10 output neurons), fixed-point numerals with 4 integral and 4 fractional bits, batch processing sizes of 1 and 2 (i.e., one or two DSPs per neuron), and 3 pipeline stages, Innervator consumed the following resources:
|Resource|Utilization (batch = 1)|Utilization (batch = 2)|Total Available|
|:-|:-|:-|:-|
| Logic LUT | 10,233 | 13,949 | 20,800 |
| Slice Registers | 13,954 | 22,145 | 41,600 |
| F7 Mux | 620 | 1,440 | 16,300 |
| Slice | 3,775 | 6,115 | 8,150 |
| DSP | 30 | 60 | 90 |
| Latency (ns) | 1,030 | 639 | N/A |
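The table's raw counts can be turned into utilization percentages and cross-checked against the rest of the README. In the sketch below, the DSP check encodes the stated rule of one DSP per neuron per batch slot for the 20 hidden + 10 output neurons. The batch-2 LUT figure comes out at about 67%, matching the abstract. The dictionary keys are just labels chosen for this sketch.

```python
TOTAL = {"Logic LUT": 20_800, "Slice Reg.": 41_600, "F7 Mux": 16_300,
         "Slice": 8_150, "DSP": 90}
USED = {
    1: {"Logic LUT": 10_233, "Slice Reg.": 13_954, "F7 Mux": 620,
        "Slice": 3_775, "DSP": 30},
    2: {"Logic LUT": 13_949, "Slice Reg.": 22_145, "F7 Mux": 1_440,
        "Slice": 6_115, "DSP": 60},
}
NEURONS = 20 + 10  # hidden + output neurons; the input layer needs no DSPs


def utilization(batch: int) -> dict:
    """Percentage of each resource used for the given batch size."""
    return {k: 100 * v / TOTAL[k] for k, v in USED[batch].items()}


def expected_dsps(batch: int) -> int:
    """One DSP per neuron per batch slot, as stated for this configuration."""
    return NEURONS * batch


if __name__ == "__main__":
    for batch in (1, 2):
        assert USED[batch]["DSP"] == expected_dsps(batch)
        pct = utilization(batch)
        print(f"batch={batch}: " + ", ".join(f"{k} {p:.1f}%" for k, p in pct.items()))
```

Going from a batch size of 1 to 2 roughly halves latency (1,030 ns to 639 ns). The cost is that LUT usage rises from about 49% to 67% and DSP usage from 33% to 67%.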
Timing results were also good: the Worst Negative Slack (WNS) was 1.252 ns (i.e., positive, so all paths met timing) without aggressive synthesis optimizations, given a 100 MHz clock. Lastly, on the same FPGA, the batch-size-2 configuration with three pipeline stages achieved 7.12 giga-operations per second (GOP/s; calculations in the technical paper), and the total on-chip power draw was 0.189 W.
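A positive WNS means every path met the 10 ns clock period with 1.252 ns to spare. A common back-of-the-envelope estimate of the achievable clock is therefore 1 / (T - WNS). This is only a rough estimate: re-running implementation at a tighter constraint can change placement and routing, so the real maximum frequency may differ.

```python
CLOCK_MHZ = 100.0
WNS_NS = 1.252


def estimated_fmax_mhz(clock_mhz: float, wns_ns: float) -> float:
    """Rough clock-frequency estimate from slack: 1 / (period - WNS)."""
    period_ns = 1_000.0 / clock_mhz
    return 1_000.0 / (period_ns - wns_ns)


if __name__ == "__main__":
    print(f"Estimated fmax: ~{estimated_fmax_mhz(CLOCK_MHZ, WNS_NS):.1f} MHz")
```

With the reported numbers this gives roughly 114.3 MHz, i.e., a critical path of about 8.75 ns.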
Prediction Accuracy Falloff (vs. CPU/floating-point)
| Digit | FPGA (fixed-point) | CPU (floating-point) |
|:-|:-|:-|
| 0 | .30468800 | .10168505 |
| 1 | .57812500 | .15610851 |
| 2 | .50781300 | .14220775 |
| 3 | .21875000 | .19579356 |
| 4 | .00390625 | .00119471 |
| 5 | .20703100 | .01840737 |
| 6 | .21484400 | .00273704 |
| 7 | .13281300 | .09511474 |
| 8 | .24218800 | .15363488 |
| 9 | .69921900 | .71728650 |
| Latency (ns) | 630 | 40,000-60,000 |
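Each column holds the ten output-neuron values, and the predicted digit is the index of the largest one. The sketch below checks that the FPGA and the CPU both pick digit 9 for this input. It also checks that the FPGA values are, to within the table's rounding, multiples of 1/256. That would be consistent with an output carrying 8 fractional bits and would explain part of the gap from the floating-point reference. The 1/256 step is an inference from the numbers, not something the README states.

```python
FPGA = [.30468800, .57812500, .50781300, .21875000, .00390625,
        .20703100, .21484400, .13281300, .24218800, .69921900]
CPU = [.10168505, .15610851, .14220775, .19579356, .00119471,
       .01840737, .00273704, .09511474, .15363488, .71728650]


def predicted_digit(scores) -> int:
    """Index of the highest output neuron, i.e., the predicted digit."""
    return max(range(len(scores)), key=scores.__getitem__)


def looks_quantized(scores, frac_bits: int = 8, tol: float = 1e-6) -> bool:
    """True if every score lies within tol of a multiple of 2**-frac_bits."""
    step = 2 ** frac_bits
    return all(abs(s * step - round(s * step)) < tol * step for s in scores)


if __name__ == "__main__":
    print("FPGA predicts", predicted_digit(FPGA), "| CPU predicts", predicted_digit(CPU))
    print("FPGA outputs on a 1/256 grid:", looks_quantized(FPGA))
```

Both devices agree on the final digit even though the individual scores differ noticeably. For digit 9 specifically, the FPGA's score (.699) is close to the CPU's (.717).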
Owner
- Name: Fereydoun Memarzanjany
- Login: Thraetaona
- Kind: user
- Website: https://Thraetaona.github.io/
- Repositories: 4
- Profile: https://github.com/Thraetaona
Freelancer Programmer interested in low-level Languages, Embedded Systems, Machine Learning & AI.
Citation (CITATION.cff)
cff-version: 1.2.0
message: "If you wish to cite this hardware design, please do so as noted below:"
authors:
  - family-names: "Memarzanjany"
    given-names: "Fereydoun"
    orcid: "https://orcid.org/0000-0003-1393-6804"
title: "Innervator"
version: 1.0.0
doi: 10.5281/zenodo.12712831
date-released: 2024-06-05
url: "https://github.com/Thraetaona/Innervator"
preferred-citation:
  type: report
  authors:
    - family-names: "Memarzanjany"
      given-names: "Fereydoun"
      orcid: "https://orcid.org/0000-0003-1393-6804"
  doi: "10.36227/techrxiv.172263165.56660174/v1"
  journal: "IEEE TechArxiv"
  month: 8
  start: 1 # First page number
  end: 6 # Last page number
  title: "Innervator: Hardware Acceleration for Neural Networks"
  issue: 1
  volume: 1
  year: 2024
GitHub Events
Total
- Watch event: 3
- Fork event: 2
Last Year
- Watch event: 3
- Fork event: 2