MindMap Gallery: FAQ-CNN, an embedded FPGA acceleration framework for quantized convolutional neural networks
FAQ-CNN is an embedding framework for quantized convolutional neural networks. Experimental results show that FAQ-CNN enables researchers to quickly build quantized CNN accelerators, and it has guiding significance and research value in fields such as deep learning and heterogeneous computing.
Edited at 2023-04-26 21:37:01
FAQ-CNN: Embedded FPGA acceleration framework for quantized convolutional neural networks (Computer Research and Development)
Terminology
FPGA (field-programmable gate array)
FAQ-CNN framework design
1) FAQ-CNN architecture
① Quantization component
Module op
Performs the multiply-accumulate (MAC) operations involved in the quantization algorithm
Module quantization
Performs the numerical mapping of the quantization algorithm
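The two module roles above can be sketched in C. This is a minimal illustration, not the paper's implementation: the int8 MAC with a wide accumulator and the affine scale/zero-point mapping are common quantization conventions and are assumptions here.

```c
#include <stdint.h>

/* MAC (module op, illustrative): accumulate the product of a quantized
   activation and weight into a wide 32-bit accumulator so low-bit
   products do not overflow. */
static inline int32_t mac_int8(int32_t acc, int8_t a, int8_t w) {
    return acc + (int32_t)a * (int32_t)w;
}

/* Numerical mapping (module quantization, illustrative): map a quantized
   integer back to a real value with an assumed scale/zero-point scheme. */
static inline float dequantize(int32_t q, float scale, int32_t zero_point) {
    return scale * (float)(q - zero_point);
}
```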
② Data engine
To address the complexity of organizing low-bit-width data into high-bit-width words, a data engine supporting parallel reads and writes is designed. It reads and writes multiple data items in a single clock cycle, easing the rate mismatch between data transfer and data computation.
Contains two modules: encoder and decoder
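The encoder/decoder pair can be illustrated with a simple packing scheme. This is a hedged sketch, not the paper's actual encoding: it assumes 4-bit values packed eight at a time into a 32-bit word, so several low-bit values move through a wide port together.

```c
#include <stdint.h>

/* Encoder (illustrative): pack eight 4-bit values (each in 0..15)
   into one 32-bit word, value i occupying bits [4i+3 : 4i]. */
uint32_t pack8x4(const uint8_t vals[8]) {
    uint32_t word = 0;
    for (int i = 0; i < 8; i++)
        word |= (uint32_t)(vals[i] & 0xF) << (4 * i);
    return word;
}

/* Decoder (illustrative): unpack the word back into eight 4-bit values. */
void unpack8x4(uint32_t word, uint8_t vals[8]) {
    for (int i = 0; i < 8; i++)
        vals[i] = (word >> (4 * i)) & 0xF;
}
```

In hardware the loop would be fully unrolled so the whole word is encoded or decoded in one cycle.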
③ On-chip cache
Store input feature maps, output feature maps and model weights
④ Command unit
Responsible for parsing model configuration parameters according to predefined instruction rules
Instruction fields: input channels, output channels, input feature map height, input feature map width, convolution kernel size, convolution stride, convolution padding, and calculation type.
① Convolution layer ② Pooling layer ③ Activation layer ④ Fully connected layer
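A C view of the instruction fields listed above could look as follows. The struct layout, field widths, and names are assumptions for illustration, not the paper's actual instruction encoding; the output-size helper shows how the fields determine a convolution's geometry.

```c
#include <stdint.h>

/* Calculation types listed above (illustrative enum). */
typedef enum { OP_CONV, OP_POOL, OP_ACT, OP_FC } op_type_t;

/* Hypothetical instruction layout; field widths are assumptions. */
typedef struct {
    uint16_t  in_channels;  /* input channel count        */
    uint16_t  out_channels; /* output channel count       */
    uint16_t  in_height;    /* input feature map height   */
    uint16_t  in_width;     /* input feature map width    */
    uint8_t   kernel_size;  /* convolution kernel size    */
    uint8_t   stride;       /* convolution stride         */
    uint8_t   padding;      /* convolution padding        */
    op_type_t op;           /* calculation type           */
} layer_instr_t;

/* Output size along one spatial dimension implied by the fields. */
static inline int conv_out_dim(int in, int k, int s, int p) {
    return (in + 2 * p - k) / s + 1;
}
```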
⑤ Calculation engine
Two computing kernels handle, respectively, the computation-intensive convolutional layers and the communication-intensive fully connected layers
2) Data parallel computing: after tensor data is loaded into on-chip memory
loop unrolling
The loop structure used by FAQ-CNN
convolution layer
4 outer loops process the convolution kernel and feature map of a single channel
tm and tn represent the tiling factors of the output and input feature maps in the channel dimension
Fully connected layer
Input and output are both 1D tensors
tm and tn represent the tiling factors of the output and input tensor block lengths, respectively.
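The tiled convolution loop nest described above can be sketched as plain C. This is a schematic under assumptions (tile factors TM/TN, layouts, loop order are illustrative, not the paper's exact schedule): the outer loops walk channel tiles and the feature map, and the two innermost channel loops are the ones an HLS tool would unroll into parallel MAC units.

```c
#define TM 2   /* output-channel tile factor (assumed) */
#define TN 2   /* input-channel tile factor (assumed)  */

/* Tiled convolution sketch; out must be zero-initialized by the caller.
   in:  [N][H+K-1][W+K-1], wt: [M][N][K][K], out: [M][H][W]. */
void conv_tile(int M, int N, int H, int W, int K,
               const float *in, const float *wt, float *out)
{
    int IW = W + K - 1;                          /* padded input width   */
    for (int mo = 0; mo < M; mo += TM)           /* output-channel tiles */
      for (int no = 0; no < N; no += TN)         /* input-channel tiles  */
        for (int h = 0; h < H; h++)              /* feature map rows     */
          for (int w = 0; w < W; w++)            /* feature map columns  */
            for (int kh = 0; kh < K; kh++)       /* kernel window        */
              for (int kw = 0; kw < K; kw++)
                /* the two loops below are unrolled in hardware */
                for (int m = mo; m < mo + TM && m < M; m++)
                  for (int n = no; n < no + TN && n < N; n++)
                    out[(m * H + h) * W + w] +=
                        wt[((m * N + n) * K + kh) * K + kw] *
                        in[(n * (H + K - 1) + h + kh) * IW + (w + kw)];
}
```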
Arithmetic rules
The op module described above defines the operation rules and provides op operations at two different granularities.
Operator fusion
FAQ-CNN fuses activation and pooling operations directly into the post-processing stage of the convolutional layer or fully connected layer
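The fusion idea can be sketched as follows. This is an illustrative example, not the paper's code: a ReLU and a 2x2 max-pool are applied to four convolution outputs as they leave the compute engine, so no intermediate feature map is written to and re-read from memory.

```c
/* ReLU activation (illustrative). */
static inline float relu(float x) { return x > 0.0f ? x : 0.0f; }

/* Fused post-processing sketch: ReLU each of four conv outputs in a
   2x2 pooling window, then max-pool, producing one value without an
   intermediate buffer between the layers. */
float fused_relu_maxpool2x2(const float v[4]) {
    float best = relu(v[0]);
    for (int i = 1; i < 4; i++) {
        float r = relu(v[i]);
        if (r > best) best = r;
    }
    return best;
}
```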
3) Communication bandwidth optimization: Make full use of data transmission bandwidth resources
Hierarchical encoding & bit width independent encoding
① Two encoding methods:
a) FAQ-CNN uses bit-width-independent coding for weight data; b) it uses hierarchical coding for input and output feature map data
② Parallel decoding: greatly improves data exchange performance
③ Burst transfer: the figure shows that peak bandwidth increases with bit width and burst length. FAQ-CNN exploits this by increasing the burst transfer length to further improve the efficiency of wide-word transfers.
④ Transfer frequency: the off-chip memory operates at 2 times (or more) the frequency of the FPGA on-chip memory I/O ports to support fast reads and writes
FAQ-CNN framework implementation
Quantization method adaptation
On-chip resource model construction
Design space exploration
Experimental evaluation
Experimental setup
Software Environment
Vitis 2020
C
Hardware environment
Xilinx ZCU102 SoC FPGA
Development board
The operating frequency of the FPGA is set to 200 MHz
Evaluation metrics
1) Data transmission efficiency
2) On-chip resource utilization
3) Number of operations per second
① Encoding and decoding efficiency gain
② MAC operation resource consumption
③ Comparative analysis of the overall overhead and performance of the convolution layer
④ Resource allocation optimization and performance comparison
FAQ-CNN and related quantization accelerator performance comparison
Conditions: the Caffeine and AccELB accelerators both run at a 200 MHz clock frequency
Conclusion: with the low-bit-width 8b data configuration, FAQ-CNN makes full use of DSP and LUT logic resources to reach a computing performance of 1229 GOPS; its peak performance is 3.6 times that of Caffeine using 16b data.
Performance comparison between FAQ-CNN and Caffeine in processing convolutional layers
Experimental results show that FAQ-CNN enables relevant researchers to quickly build quantized CNN accelerators, and it has guiding significance and research value in fields such as deep learning and heterogeneous computing.