Master's/Doctoral Thesis 107521050 — Complete Metadata Record

DC Field | Value | Language
dc.contributor 電機工程學系 zh_TW
dc.creator 童迺婕 zh_TW
dc.creator Nai-Chieh Tung en_US
dc.date.accessioned 2022-04-19T07:39:07Z
dc.date.available 2022-04-19T07:39:07Z
dc.date.issued 2022
dc.identifier.uri http://ir.lib.ncu.edu.tw:88/thesis/view_etd.asp?URN=107521050
dc.contributor.department 電機工程學系 zh_TW
dc.description 國立中央大學 zh_TW
dc.description National Central University en_US
dc.description.abstract 近年來,深度學習的發展日漸蓬勃,隨著硬體設備的更新與進步,神經網路能夠處理的事項也越來越多,更滲透進我們的生活中,小至平時手機的解鎖功能,大至現在的智能客服、聊天機器人,這些都逐漸將傳統演算法的方法替代掉。然而,神經網路有著參數量與計算量龐大的問題,且需要在GPU或是有CUDA加速的嵌入式開發板上執行,所以近幾年能夠加速神經網路的硬體架構與方法越來越多。本論文提出可重構的硬體架構設計,可以針對不同輸入圖像大小、不同卷積核大小以及不同的步長來對應進行計算,分別設計深度卷積(Depth-wise Convolution)、逐點卷積(Point-wise Convolution)、一般卷積、正規化(Batch-Normalization)、激活函數(Activation Function)以及最大池化層(Max-pooling),進而加速我們的神經網路。我們選擇透過SoC的方式,PL(Programmable Logic)端使用AXI總線協議來與PS(Processing System)端溝通,由PS端處理資料傳輸跟排序,PL端處理所有運算。此架構可以根據輸入的指令來選擇目前對應的情況進行計算,且只要其他神經網路模型使用的函數有包含此架構可支援的計算,都可以利用這個可重構架構進行加速。因為內部記憶體的資源量較少,為了減少資料傳輸及溝通的次數,我們選擇在硬體部分直接完成zero-padding,讓內部記憶體可以儲存較多的輸入資料。最後我們在Xilinx ZCU104開發板上實現Tiny YOLO-V3,由CMOS鏡頭將影像輸入至FPGA,物件偵測的結果包含輸入影像、偵測框、分類結果以及機率,經由HDMI顯示於螢幕上。在此開發板達到25.6GOPs且只需耗能4.959W,達到5.16GOPs/W的效能。 zh_TW
dc.description.abstract In recent years, deep learning has developed rapidly. As hardware has been updated and advanced, neural networks can handle more and more tasks and have permeated our daily lives, from the unlocking function of cellphones to today's smart customer service and chatbots, gradually replacing traditional algorithmic methods. However, neural networks involve huge numbers of parameters and computations, and they need to run on a GPU or on an embedded development board with CUDA acceleration. Therefore, hardware architectures and methods that can accelerate neural networks have become a major research topic. This thesis proposes a reconfigurable hardware architecture that can perform computation for different input image sizes, different kernel sizes, and different stride sizes. The architecture implements depth-wise convolution, point-wise convolution, standard convolution, batch normalization, activation functions, and max-pooling to accelerate the neural network. The proposed system uses an SoC design in which the Programmable Logic (PL) communicates with the Processing System (PS) over the AXI bus protocol: the PS side handles data transfer and data sorting, while the PL side handles all the computation. The reconfigurable architecture selects the corresponding computation mode according to the input instruction, and any other neural network model whose functions are supported by this architecture can also be accelerated with it. Because on-chip memory is limited, we perform zero-padding directly in hardware to reduce the number of data transfers, so that the on-chip memory can store more input data. Finally, we implement Tiny YOLO-V3 on the Xilinx ZCU104 development board. Images are captured by a CMOS camera and fed into the FPGA, and the object-detection results, including the input image, bounding boxes, classification results, and probabilities, are displayed on a monitor via HDMI. On this board, our design achieves 25.6 GOPs with a power consumption of only 4.959 W, for an efficiency of 5.16 GOPs/W. en_US
dc.subject 物件偵測 zh_TW
dc.subject 可重構 zh_TW
dc.subject 硬體架構 zh_TW
dc.subject FPGA zh_TW
dc.title 基於FPGA的可重構神經網路加速器應用於Tiny YOLO-V3 zh_TW
dc.language.iso zh-TW zh-TW
dc.title An FPGA-Based Reconfigurable Convolutional Neural Network Accelerator for Tiny YOLO-V3 en_US
dc.type 博碩士論文 zh_TW
dc.type thesis en_US
dc.publisher National Central University en_US
