NCU Institutional Repository: Item 987654321/88372


    Please use this permanent URL to cite or link to this item: http://ir.lib.ncu.edu.tw/handle/987654321/88372


    Title: 基於FPGA的可重構神經網路加速器應用於Tiny YOLO-V3; An FPGA-Based Reconfigurable Convolutional Neural Network Accelerator for Tiny YOLO-V3
    Author: Tung, Nai-Chieh (童迺婕)
    Contributor: Department of Electrical Engineering
    Keywords: Object Detection; Reconfigurable; Hardware Architecture; FPGA
    Date: 2022-04-19
    Upload time: 2022-07-14 00:48:53 (UTC+8)
    Publisher: National Central University
    Abstract: In recent years, deep learning has developed rapidly. As hardware has advanced, neural networks have taken on more and more tasks and have permeated everyday life, from phone unlocking to smart customer service and chatbots, gradually replacing traditional algorithmic methods. However, neural networks involve enormous numbers of parameters and computations, and typically must run on a GPU or on an embedded development board with CUDA acceleration. Hardware architectures and methods for accelerating neural networks have therefore become an active research topic in recent years.
    This thesis proposes a reconfigurable hardware architecture that can compute over different input image sizes, convolution kernel sizes, and strides. The design implements depthwise convolution, pointwise convolution, standard convolution, batch normalization, activation functions, and max pooling in order to accelerate the network. The system adopts an SoC design in which the Programmable Logic (PL) communicates with the Processing System (PS) over the AXI bus protocol: the PS handles data transfer and sorting, while the PL performs all computation. The architecture selects the appropriate computation mode according to the input instruction, and any other neural network model whose layers fall within the operations this architecture supports can be accelerated by the same reconfigurable fabric. Because on-chip memory resources are limited, zero-padding is performed directly in hardware to reduce the number of data transfers, allowing the on-chip memory to store more input data. Finally, Tiny YOLO-V3 is implemented on a Xilinx ZCU104 development board: a CMOS camera feeds images into the FPGA, and the object-detection results (input image, bounding boxes, class labels, and probabilities) are displayed on a monitor via HDMI. On this board the design achieves 25.6 GOPs at a power consumption of only 4.959 W, for an efficiency of 5.16 GOPs/W.
    Appears in Collections: [Graduate Institute of Electrical Engineering] Master's and Doctoral Theses

    Files in This Item:

    File        Description    Size    Format    Views
    index.html                 0Kb     HTML      56        View/Open


    All items in NCUIR are protected by copyright, with all rights reserved.

