Master's/Doctoral Thesis 107521050 — Complete Metadata Record

DC Field | Value | Language
dc.contributor 電機工程學系 zh_TW
dc.creator 童迺婕 zh_TW
dc.creator Nai-Chieh Tung en_US
dc.date.accessioned 2022-04-19T07:39:07Z
dc.date.available 2022-04-19T07:39:07Z
dc.date.issued 2022
dc.identifier.uri http://ir.lib.ncu.edu.tw:88/thesis/view_etd.asp?URN=107521050
dc.contributor.department 電機工程學系 zh_TW
dc.description 國立中央大學 zh_TW
dc.description National Central University en_US
dc.description.abstract 近年來,深度學習的發展日漸蓬勃,隨著硬體設備的更新與進步,神經網路能夠處理的事項也越來越多,更滲透進我們的生活中,小至平時手機的解鎖功能,大至現在的智能客服、聊天機器人,這些都逐漸將傳統演算法的方法替代掉。然而,神經網路有著參數量與計算量龐大的問題,且需要在GPU或是有CUDA加速的嵌入式開發板上執行,所以近幾年能夠加速神經網路的硬體架構與方法越來越多。本論文提出可重構的硬體架構設計,可以針對不同輸入圖像大小、不同卷積核大小以及不同的步長來對應進行計算,分別設計深度卷積(Depth-wise Convolution)、逐點卷積(Point-wise Convolution)、一般卷積、正規化(Batch-Normalization)、激活函數(Activation Function)以及最大池化層(Max-pooling),進而加速我們的神經網路。我們選擇透過SoC的方式,PL(Programmable Logic)端使用AXI總線協議來與PS(Processing System)端溝通,由PS端處理資料傳輸跟排序,PL端處理所有運算。此架構可以根據輸入的指令來選擇目前對應的情況進行計算,且只要其他神經網路模型使用的函數有包含此架構可支援的計算,都可以利用這個可重構架構進行加速。因為內部記憶體的資源量較少,為了減少資料傳輸及溝通的次數,我們選擇在硬體部分直接完成zero-padding,讓內部記憶體可以儲存較多的輸入資料。最後我們在Xilinx ZCU104開發板上實現Tiny YOLO-V3,由CMOS鏡頭將影像輸入至FPGA,物件偵測的結果包含輸入影像、偵測框、分類結果以及機率,經由HDMI顯示於螢幕上。在此開發板達到25.6GOPs且只需耗能4.959W,達到5.16GOPs/W的效能。 zh_TW
dc.description.abstract In recent years, deep learning has developed rapidly. As hardware has been updated and advanced, neural networks can handle more and more tasks and have permeated our daily lives, from the unlocking function of cellphones to today's smart customer service and chatbots, gradually replacing traditional algorithmic methods. However, neural networks involve huge numbers of parameters and computations, and they need to run on a GPU or on an embedded development board with CUDA acceleration. Therefore, hardware architectures and methods that can accelerate neural networks have become a major research topic. This thesis proposes a reconfigurable hardware architecture that can perform computation for different input image sizes, different kernel sizes, and different stride sizes. The architecture implements depth-wise convolution, point-wise convolution, standard convolution, batch normalization, activation functions, and max-pooling to accelerate the neural network. The proposed system uses an SoC design in which the Programmable Logic (PL) communicates with the Processing System (PS) over the AXI bus protocol: the PS side handles data transfer and data sorting, while the PL side handles all the computation. The reconfigurable architecture selects the corresponding computation mode according to the input instruction, and any other neural network model whose functions are supported by this architecture can also be accelerated with it. Because on-chip memory is limited, we perform zero-padding directly in hardware to reduce the number of data transfers, so that the on-chip memory can store more input data. Finally, we implement Tiny YOLO-V3 on the Xilinx ZCU104 development board. Images are captured by a CMOS camera and fed into the FPGA, and the object-detection results, including the input image, bounding boxes, classification results, and probabilities, are displayed on a monitor via HDMI. On this board, our design achieves 25.6 GOPs with a power consumption of only 4.959 W, for an efficiency of 5.16 GOPs/W. en_US
dc.subject 物件偵測 zh_TW
dc.subject 可重構 zh_TW
dc.subject 硬體架構 zh_TW
dc.subject FPGA zh_TW
dc.title 基於FPGA的可重構神經網路加速器應用於Tiny YOLO-V3 zh_TW
dc.language.iso zh-TW zh-TW
dc.title An FPGA-Based Reconfigurable Convolutional Neural Network Accelerator for Tiny YOLO-V3 en_US
dc.type 博碩士論文 zh_TW
dc.type thesis en_US
dc.publisher National Central University en_US
