MindMap Gallery: FAQ-CNN, an embedded FPGA acceleration framework for quantized convolutional neural networks
FAQ-CNN is an embedding framework for quantized convolutional neural networks. Experimental results show that FAQ-CNN enables researchers to quickly build quantized CNN accelerators, and it has guiding significance and research value in fields such as deep learning and heterogeneous computing.
Edited at 2023-04-26 21:37:01
FAQ-CNN: Embedded FPGA acceleration framework for quantized convolutional neural networks (Computer Research and Development)
Terminology
FPGA (field-programmable gate array)
FAQ-CNN framework design
1) FAQ-CNN architecture
① Quantization component
Module op
Performs the multiply-accumulate (MAC) operations involved in the quantization algorithm
Module quantization
Performs the numerical mapping of the quantization algorithm
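The two module roles above can be sketched in C. This is a minimal illustration, not the paper's implementation: the int8 MAC with a wide accumulator and the affine scale/zero-point mapping are common quantization conventions and are assumptions here.

```c
#include <stdint.h>

/* MAC (module op, illustrative): accumulate the product of a quantized
   activation and weight into a wide 32-bit accumulator so low-bit
   products do not overflow. */
static inline int32_t mac_int8(int32_t acc, int8_t a, int8_t w) {
    return acc + (int32_t)a * (int32_t)w;
}

/* Numerical mapping (module quantization, illustrative): map a quantized
   integer back to a real value with an assumed scale/zero-point scheme. */
static inline float dequantize(int32_t q, float scale, int32_t zero_point) {
    return scale * (float)(q - zero_point);
}
```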
② Data engine
To address the complexity of organizing low-bit-width data into high-bit-width words, a data engine supporting parallel reads and writes is designed. It reads and writes multiple data items in a single clock cycle, easing the rate mismatch between data transfer and data computation.
Contains two modules: encoder and decoder
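The encoder/decoder pair can be illustrated with a simple packing scheme. This is a hedged sketch, not the paper's actual encoding: it assumes 4-bit values packed eight at a time into a 32-bit word, so several low-bit values move through a wide port together.

```c
#include <stdint.h>

/* Encoder (illustrative): pack eight 4-bit values (each in 0..15)
   into one 32-bit word, value i occupying bits [4i+3 : 4i]. */
uint32_t pack8x4(const uint8_t vals[8]) {
    uint32_t word = 0;
    for (int i = 0; i < 8; i++)
        word |= (uint32_t)(vals[i] & 0xF) << (4 * i);
    return word;
}

/* Decoder (illustrative): unpack the word back into eight 4-bit values. */
void unpack8x4(uint32_t word, uint8_t vals[8]) {
    for (int i = 0; i < 8; i++)
        vals[i] = (word >> (4 * i)) & 0xF;
}
```

In hardware the loop would be fully unrolled so the whole word is encoded or decoded in one cycle.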
③ On-chip cache
Store input feature maps, output feature maps and model weights
④ Command unit
Responsible for parsing model configuration parameters according to predefined instruction rules
Instruction fields: input channels, output channels, input feature map height, input feature map width, convolution kernel size, convolution stride, convolution padding, and calculation type.
① Convolution layer ② Pooling layer ③ Activation layer ④ Fully connected layer
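A C view of the instruction fields listed above could look as follows. The struct layout, field widths, and names are assumptions for illustration, not the paper's actual instruction encoding; the output-size helper shows how the fields determine a convolution's geometry.

```c
#include <stdint.h>

/* Calculation types listed above (illustrative enum). */
typedef enum { OP_CONV, OP_POOL, OP_ACT, OP_FC } op_type_t;

/* Hypothetical instruction layout; field widths are assumptions. */
typedef struct {
    uint16_t  in_channels;  /* input channel count        */
    uint16_t  out_channels; /* output channel count       */
    uint16_t  in_height;    /* input feature map height   */
    uint16_t  in_width;     /* input feature map width    */
    uint8_t   kernel_size;  /* convolution kernel size    */
    uint8_t   stride;       /* convolution stride         */
    uint8_t   padding;      /* convolution padding        */
    op_type_t op;           /* calculation type           */
} layer_instr_t;

/* Output size along one spatial dimension implied by the fields. */
static inline int conv_out_dim(int in, int k, int s, int p) {
    return (in + 2 * p - k) / s + 1;
}
```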
⑤ Calculation engine
Two computing kernels handle, respectively, the computation-intensive convolutional layers and the communication-intensive fully connected layers
2) Data parallel computing: after tensor data is loaded into on-chip memory
loop unrolling
The loop structure used by FAQ-CNN
convolution layer
4 outer loops process the convolution kernel and feature map of a single channel
tm and tn represent the tiling factors of the output and input feature maps in the channel dimension
Fully connected layer
Input and output are both 1D tensors
tm and tn represent the tiling factors of the output and input tensor block lengths, respectively.
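The tiled convolution loop nest described above can be sketched as plain C. This is a schematic under assumptions (tile factors TM/TN, layouts, loop order are illustrative, not the paper's exact schedule): the outer loops walk channel tiles and the feature map, and the two innermost channel loops are the ones an HLS tool would unroll into parallel MAC units.

```c
#define TM 2   /* output-channel tile factor (assumed) */
#define TN 2   /* input-channel tile factor (assumed)  */

/* Tiled convolution sketch; out must be zero-initialized by the caller.
   in:  [N][H+K-1][W+K-1], wt: [M][N][K][K], out: [M][H][W]. */
void conv_tile(int M, int N, int H, int W, int K,
               const float *in, const float *wt, float *out)
{
    int IW = W + K - 1;                          /* padded input width   */
    for (int mo = 0; mo < M; mo += TM)           /* output-channel tiles */
      for (int no = 0; no < N; no += TN)         /* input-channel tiles  */
        for (int h = 0; h < H; h++)              /* feature map rows     */
          for (int w = 0; w < W; w++)            /* feature map columns  */
            for (int kh = 0; kh < K; kh++)       /* kernel window        */
              for (int kw = 0; kw < K; kw++)
                /* the two loops below are unrolled in hardware */
                for (int m = mo; m < mo + TM && m < M; m++)
                  for (int n = no; n < no + TN && n < N; n++)
                    out[(m * H + h) * W + w] +=
                        wt[((m * N + n) * K + kh) * K + kw] *
                        in[(n * (H + K - 1) + h + kh) * IW + (w + kw)];
}
```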
Arithmetic rules
The op module described above defines the operation rules and provides op operations at two different granularities.
Operator fusion
FAQ-CNN fuses activation and pooling operations directly into the post-processing stage of the convolutional layer or fully connected layer
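The fusion idea can be sketched as follows. This is an illustrative example, not the paper's code: a ReLU and a 2x2 max-pool are applied to four convolution outputs as they leave the compute engine, so no intermediate feature map is written to and re-read from memory.

```c
/* ReLU activation (illustrative). */
static inline float relu(float x) { return x > 0.0f ? x : 0.0f; }

/* Fused post-processing sketch: ReLU each of four conv outputs in a
   2x2 pooling window, then max-pool, producing one value without an
   intermediate buffer between the layers. */
float fused_relu_maxpool2x2(const float v[4]) {
    float best = relu(v[0]);
    for (int i = 1; i < 4; i++) {
        float r = relu(v[i]);
        if (r > best) best = r;
    }
    return best;
}
```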
3) Communication bandwidth optimization: Make full use of data transmission bandwidth resources
Hierarchical encoding & bit width independent encoding
① Two encoding methods:
a) FAQ-CNN uses bit-width-independent coding for weight data; b) it uses hierarchical coding for input and output feature map data
② Parallel decoding: greatly improves data exchange performance
③ Burst transfer: the figure shows that peak bandwidth increases with bit width and burst length. FAQ-CNN exploits this by increasing the burst transfer length to further improve the efficiency of wide-word transfers.
④ Transfer frequency: the off-chip memory operates at 2 times (or more) the frequency of the FPGA on-chip memory I/O ports to support fast reads and writes
FAQ-CNN framework implementation
Quantization method adaptation
On-chip resource model construction
Design space exploration
Experimental evaluation
Experimental setup
Software Environment
Vitis 2020
C
Hardware environment
Xilinx ZCU102 SoC FPGA
Development board
The operating frequency of the FPGA is set to 200 MHz
Evaluation metrics
1) Data transmission efficiency
2) On-chip resource utilization
3) Number of operations per second
① Encoding and decoding efficiency gain
② MAC operation resource consumption
③ Comparative analysis of the overall overhead and performance of the convolution layer
④ Resource allocation optimization and performance comparison
FAQ-CNN and related quantization accelerator performance comparison
Conditions: the Caffeine and AccELB accelerators both run at a 200 MHz clock frequency
Conclusion: with the low-bit-width 8b data configuration, FAQ-CNN makes full use of DSP and LUT logic resources to reach a computing performance of 1229 GOPS; its peak performance is 3.6 times that of Caffeine using 16b data.
Performance comparison between FAQ-CNN and Caffeine in processing convolutional layers
Experimental results show that FAQ-CNN enables relevant researchers to quickly build quantized CNN accelerators, and it has guiding significance and research value in fields such as deep learning and heterogeneous computing.