innervator

Innervator: Hardware Acceleration for Neural Networks

https://github.com/thraetaona/innervator

Keywords

artifical-neural-network artificial-intelligence arty-a7 dnn fpga freestanding hardware-acceleration hardware-compiler ieee-1076 machine-learning neural-networks neuron open-source paper standalone vhdl vhdl-2008 vlsi xilinx-fpga


Repository

Innervator: Hardware Acceleration for Neural Networks

Basic Info

Host: GitHub
Owner: Thraetaona
License: gpl-3.0
Language: VHDL
Default Branch: main
Homepage: https://doi.org/10.36227/techrxiv.172263165.56660174/v1
Size: 2.86 MB

Statistics

Stars: 14
Watchers: 1
Forks: 1
Open Issues: 0
Releases: 1

Created over 2 years ago · Last pushed almost 2 years ago

Metadata Files

Readme License Citation

README.md

`Innervator`

Hardware Acceleration for Neural Networks

Technical Paper (IEEE TechRxiv) | Presentation Slides | Repository DOI

Abstract

Artificial intelligence ("AI") is deployed in various applications, from noise cancellation to image recognition, but AI-based products often come with high hardware and electricity costs; this makes them inaccessible for consumer devices and small-scale edge electronics. Inspired by biological brains, deep neural networks ("DNNs") are modeled using mathematical formulae, yet general-purpose processors treat otherwise-parallelizable AI algorithms as step-by-step sequential logic. In contrast, programmable logic devices ("PLDs") can be customized to the specific parameters of a trained DNN, thereby ensuring data-tailored computation and algorithmic parallelism at the register-transfer level. Furthermore, a subgroup of PLDs, field-programmable gate arrays ("FPGAs"), are dynamically reconfigurable. So, to improve AI runtime performance, I designed and open-sourced my hardware compiler: Innervator. Written entirely in VHDL-2008, Innervator takes any DNN's metadata and parameters (e.g., number of layers, neurons per layer, and their weights/biases), generating its synthesizable FPGA hardware description with the appropriate pipelining and batch processing. Innervator is entirely portable and vendor-independent. As a proof of concept, I used Innervator to implement a sample 8x8-pixel handwritten digit-recognizing neural network in a low-cost AMD Xilinx Artix-7(TM) FPGA @ 100 MHz. With 3 pipeline stages and 2 batches at about 67% LUT utilization, the Network achieved ~7.12 GOP/s, predicting the output in 630 ns and under 0.25 W of power. In comparison, an Intel(R) Core(TM) i7-12700H CPU @ 4.70 GHz would take 40,000-60,000 ns at 45 to 115 W. Ultimately, Innervator's hardware-accelerated approach bridges the inherent mismatch between current AI algorithms and the general-purpose digital hardware they run on.

Technical Paper

For academic researchers, I also wrote a citable technical paper that describes Innervator. (IEEE TechRxiv)

Notice

Although the Abstract specifically talks about an image-recognizing neural network, I endeavoured to generalize Innervator: in practice, it can implement any number of neurons and layers, for any application (e.g., speech recognition), not just imagery.

In the ./data folder, you will find the weight and bias parameters that are used during Innervator's synthesis. Because VHDL's std.textio library is poorly implemented across most synthesis tools, I was limited to reading only std_logic_vectors from files; as a result, weights and biases had to be pre-formatted in a fixed-point representation. (More information is available in file_parser.vhd.)
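To make the pre-formatting step concrete, here is a minimal Python sketch of one way to turn floating-point parameters into fixed-point bit strings. It is an illustration under stated assumptions, not the format that file_parser.vhd actually expects: the function names are hypothetical, and the two's-complement layout (sign bit counted among the integer bits), round-to-nearest, and saturation behaviour are all assumed. The default widths of 4 integral and 4 fractional bits are taken from the Statistics section.

```python
def to_fixed_point_bits(value: float, int_bits: int = 4, frac_bits: int = 4) -> str:
    """Encode a real number as a two's-complement fixed-point bit string.

    Assumption: the sign bit is counted as one of the `int_bits`.
    """
    total_bits = int_bits + frac_bits
    # Scale by 2**frac_bits and round to the nearest representable step.
    scaled = round(value * (1 << frac_bits))
    # Saturate instead of wrapping when the value is out of range (assumed policy).
    lowest = -(1 << (total_bits - 1))
    highest = (1 << (total_bits - 1)) - 1
    scaled = max(lowest, min(highest, scaled))
    # Masking maps negative integers onto their two's-complement bit pattern.
    return format(scaled & ((1 << total_bits) - 1), f"0{total_bits}b")


def from_fixed_point_bits(bits: str, frac_bits: int = 4) -> float:
    """Decode a two's-complement fixed-point bit string back to a float."""
    raw = int(bits, 2)
    if bits[0] == "1":  # A leading 1 marks a negative value.
        raw -= 1 << len(bits)
    return raw / (1 << frac_bits)


if __name__ == "__main__":
    for weight in (1.5, -0.25, 0.3):
        bits = to_fixed_point_bits(weight)
        print(weight, bits, from_fixed_point_bits(bits))
```

With only 4 fractional bits, the smallest step is 1/16, so a value such as 0.3 is stored as 0.3125; this quantization is one source of the FPGA-versus-CPU differences reported later.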

The VHDL code itself has been very thoroughly documented; because I was a novice to VHDL, AI, and FPGA design myself, I documented each step as if it were a beginner's tutorial. Also, you may find these overview slides of the Project useful.

Interestingly, even though I was completely new to the world of hardware design, I still found the toolchain (and even VHDL itself) to be in a very unstable and buggy state; in fact, throughout this project, I found and documented dozens of different bugs, some of which were new and reported to IEEE and Xilinx:

* VHDL Language Inconsistency in Ports
* VHDL Language Enhancement
* Bug in Vivado's file_open()
* Bug in Vivado's read()
* GitHub VHDL Syntax Highlighter
* Synopsys Synplify p2019's Parser Breaks on VHDL-2019 syntax

Nomenclature

To innervate means "to supply something with nerves."

Innervator is, aptly, an implementer of artificial neural networks within Programmable Logic Devices.

Furthermore, these hardware-based neural networks could be named "Innervated Neural Networks," which also appears as INN in INNervator.

Foreword

* Prior to starting this project, I had no experience or training with artificial intelligence ("AI"), electrical engineering, or hardware design;
* Hardware design is a complex field that requires "unlearning" much of computer science; and
* Combining the two ideas, AI and hardware, transformed this project into a unique proof of concept.

Synopsis

* Inspired by biological brains, AI neural networks are modeled in mathematical formulae that are inherently concurrent;
* AI applications are widespread but suffer from general-purpose computer processors that execute algorithms in step-by-step sequences; and
* Programmable Logic Devices ("PLDs") allow for digital circuitry to be predesigned for data-tailored and massively parallelized operations.

Build Instructions

[TODO: Create a TCL script and makefile to automate this.]

To ensure maximal compatibility, I tested Innervator across both Xilinx Vivado 2024's synthesizer (not simulator) and Mentor Graphics ModelSim 2016's simulator; the code itself was written using a subset of VHDL-2008, without any other language involved. Additionally, absolutely no vendor-specific libraries were used in Innervator's design; only the official std and IEEE VHDL packages were utilized.

Because I developed Innervator on a small, entry-level FPGA board (i.e., Digilent Arty A7-35T), I faced many challenges in regard to logic resource usage and timing failures; however, this also ensured that Innervator would become very portable and resource-efficient.

In the ./src/config.vhd file, you can fine-tune Innervator to your liking; almost everything is customizable and generic, down to the polarity and synchronicity of the reset, the widths of the fixed-point types, and the neurons' batch-processing size and number of pipeline stages.

Hardware Demo (Arty A7-35T)

I used the four LEDs to "transmit" the network's prediction (i.e., the resulting digit in this case), but the UART interface could later be used to send the prediction back to the computer as well.

https://github.com/Thraetaona/Innervator/assets/42461518/52132598-aac0-4532-85c9-bce6e69aa214

(Note: The "delay" you see between the command prompt and FPGA is primarily due to the UART speed; the actual neural network itself takes ~1000 nanoseconds to process its input.)

Simulation

(Note: This was an old simulation run; in the current version, the same digit was predicted with 70%+ accuracy.)

The sample network that was used in said simulation:

Statistics (Artix-7 35T FPGA)

Excluding the peripherals (e.g., UART and button debouncer), the figures below are for a network with an input layer and 2 neural layers (64 inputs, 20 hidden neurons, and 10 output neurons), fixed-point numerals with 4 integral and 4 fractional bits, batch-processing sizes of 1 and 2 (i.e., one or two DSPs per neuron), and 3 pipeline stages. Innervator consumed the following resources:

| Resource | Utilization (batch size 1) | Utilization (batch size 2) | Total Availability |
|:-|:-|:-|:-|
| Logic LUT | 10,233 | 13,949 | 20,800 |
| Sliced Reg. | 13,954 | 22,145 | 41,600 |
| F7 Mux. | 620 | 1,440 | 16,300 |
| Slice | 3,775 | 6,115 | 8,150 |
| DSP | 30 | 60 | 90 |
| Speed (ns) | 1,030 | 639 | N/A |
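The raw counts are easier to compare as percentages of the device's capacity. The following Python sketch recomputes them from the table and checks the DSP row against the "one/two DSP for each neuron" description. The dictionary and function names are hypothetical, and the DSP formula (neurons times batch size) is an assumption inferred from that description rather than something taken from Innervator's source.

```python
# Figures copied from the table: (batch size 1, batch size 2, total available).
RESOURCES = {
    "Logic LUT": (10_233, 13_949, 20_800),
    "Sliced Reg.": (13_954, 22_145, 41_600),
    "F7 Mux.": (620, 1_440, 16_300),
    "Slice": (3_775, 6_115, 8_150),
    "DSP": (30, 60, 90),
}


def utilization_percent(used: int, available: int) -> float:
    """Share of a resource that is consumed, as a percentage."""
    return 100.0 * used / available


def expected_dsps(neurons_per_layer: list[int], batch_size: int) -> int:
    """Assumed model: every neuron owns `batch_size` DSP slices."""
    return sum(neurons_per_layer) * batch_size


for name, (batch1, batch2, total) in RESOURCES.items():
    print(
        f"{name:<12} "
        f"{utilization_percent(batch1, total):5.1f}% "
        f"{utilization_percent(batch2, total):5.1f}%"
    )

# 20 hidden + 10 output neurons, as in the sample network.
print(expected_dsps([20, 10], 1), expected_dsps([20, 10], 2))
```

Under these assumptions, the batch-size-2 LUT figure works out to roughly 67%, which agrees with the utilization quoted in the Abstract, and the predicted DSP counts match the table's 30 and 60.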

Timing results were also favorable: given a 100 MHz clock and without aggressive synthesis optimizations, the Worst Negative Slack (WNS) was 1.252 ns (a positive value, meaning all timing constraints were met). Lastly, on the same FPGA and with two pipeline stages, the throughput was 7.12 GOP/s (giga-operations per second; calculations are in the technical paper), and the total on-chip power draw was 0.189 W.

Prediction Accuracy Falloff (vs. CPU/floating-point)

| Digit | FPGA | CPU |
|:-|:-|:-|
| 0 | .30468800 | .10168505 |
| 1 | .57812500 | .15610851 |
| 2 | .50781300 | .14220775 |
| 3 | .21875000 | .19579356 |
| 4 | .00390625 | .00119471 |
| 5 | .20703100 | .01840737 |
| 6 | .21484400 | .00273704 |
| 7 | .13281300 | .09511474 |
| 8 | .24218800 | .15363488 |
| 9 | .69921900 | .71728650 |
| Speed (ns) | 630 | 40,000-60,000 |
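One way to read this table is to check whether both implementations pick the same digit and how far the individual outputs drift apart. The short Python sketch below does that using only the values listed above. The variable names are hypothetical, and the speed-up range is simple arithmetic on the "Speed (ns)" row, not a new measurement.

```python
# Per-digit outputs copied from the table (index = digit).
FPGA = [0.30468800, 0.57812500, 0.50781300, 0.21875000, 0.00390625,
        0.20703100, 0.21484400, 0.13281300, 0.24218800, 0.69921900]
CPU = [0.10168505, 0.15610851, 0.14220775, 0.19579356, 0.00119471,
       0.01840737, 0.00273704, 0.09511474, 0.15363488, 0.71728650]


def argmax(values: list[float]) -> int:
    """Index of the largest element, i.e. the predicted digit."""
    return max(range(len(values)), key=values.__getitem__)


# Average absolute gap between the fixed-point and floating-point outputs.
mean_abs_diff = sum(abs(f - c) for f, c in zip(FPGA, CPU)) / len(FPGA)

# Latencies in nanoseconds, from the "Speed (ns)" row.
FPGA_NS = 630
CPU_NS_LOW, CPU_NS_HIGH = 40_000, 60_000
speedup_low = CPU_NS_LOW / FPGA_NS
speedup_high = CPU_NS_HIGH / FPGA_NS

print("FPGA predicts:", argmax(FPGA), "| CPU predicts:", argmax(CPU))
print(f"mean absolute difference: {mean_abs_diff:.4f}")
print(f"speed-up: {speedup_low:.1f}x to {speedup_high:.1f}x")
```

Both columns peak at digit 9, so the final prediction agrees even though the fixed-point outputs for the other digits are noticeably higher than their floating-point counterparts.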

Owner

Name: Fereydoun Memarzanjany
Login: Thraetaona
Kind: user

Website: https://Thraetaona.github.io/
Repositories: 4
Profile: https://github.com/Thraetaona

Freelancer Programmer interested in low-level Languages, Embedded Systems, Machine Learning & AI.

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you wish to cite this hardware design, please do so as noted below:"
authors:
- family-names: "Memarzanjany"
  given-names: "Fereydoun"
  orcid: "https://orcid.org/0000-0003-1393-6804"
title: "Innervator"
version: 1.0.0
doi: 10.5281/zenodo.12712831
date-released: 2024-06-05
url: "https://github.com/Thraetaona/Innervator"
preferred-citation:
  type: report
  authors:
  - family-names: "Memarzanjany"
    given-names: "Fereydoun"
    orcid: "https://orcid.org/0000-0003-1393-6804"
  doi: "10.36227/techrxiv.172263165.56660174/v1"
  journal: "IEEE TechArxiv"
  month: 8
  start: 1 # First page number
  end: 6 # Last page number
  title: "Innervator: Hardware Acceleration for Neural Networks"
  issue: 1
  volume: 1
  year: 2024

Science Score: 67.0%