Design and Implementation of a Separable Convolution Accelerator Based on HWCK Data Scheduling

DC field	Value	Language
DC.contributor	電機工程學系	zh_TW
DC.creator	許晉瑋	zh_TW
DC.creator	Chin-Wei Hsu	en_US
dc.date.accessioned	2020-07-20T07:39:07Z
dc.date.available	2020-07-20T07:39:07Z
dc.date.issued	2020
dc.identifier.uri	http://ir.lib.ncu.edu.tw:88/thesis/view_etd.asp?URN=106521038
dc.contributor.department	電機工程學系	zh_TW
DC.description	國立中央大學	zh_TW
DC.description	National Central University	en_US
dc.description.abstract	近年來隨著GPU進步與大數據時代的來臨，深度學習給各領域帶來革命性的進展，從基本的影像前處理、影像切割技術、人臉辨識、語音辨識等，逐漸的取代了以往的傳統演算法，這說明了神經網路的興起已經帶動人工智慧的各項改革。但受限於GPU的功耗以及成本，其產品都極其昂貴，也因神經網路演算法龐大的計算量，須配合加速的硬體來進行實時運算，這促使近幾年來有不少研究是針對卷積網路的加速數位電路硬體設計。本論文提出基於HWCK資料排程之分離式卷積硬體架構設計，設計深度卷積(Depthwise convolution)、逐點卷積(Pointwise convolution)、正規化硬體架構(Batch Normalization)，來加速深度可分離卷積模型，可透過SoC設計，利用AXI4總線協議讓PS端(Processing System)與PL端(Programmable Logic)相互溝通，可以使CPU利用我們所開發之神經網路模組在FPGA上對神經網路進行加速。此HWCK資料排程方法，可根據所分配的記憶體頻寬資源以及內存資源進行重新配置，當頻寬與內存均足夠時，可以非常輕易的將此設計進行擴展。為了減少神經網路的權重參數，資料皆以定點數16-bit來進行運算與儲存，並以乒乓記憶體的架構來進行內存存取，且透過AXI4總線協議與CPU進行資料傳輸。整個硬體架構可在Xilinx ZCU106開發板上實現，藉由SoC設計，使用已預先編譯的驅動程式溝通作業系統與外部的資源，並同時控制所設計的神經網路加速模組，利用高階的程式語言來快速的重新配置神經網路加速的排程，提高硬體的重新配置能力，能在多種不同的嵌入式平台上實現此硬體架構設計，將此硬體架構運行FaceNet可以達到222FPS以及60.8GOPS，在Xilinx ZCU106開發板上只需要耗能8.82W，能達到6.89GOP/s/W的效能。	zh_TW
dc.description.abstract	In recent years, advances in GPUs and the arrival of the big-data era have made deep learning increasingly popular, and it has brought revolutionary progress to many fields. In tasks such as basic image preprocessing, image segmentation, face recognition, and speech recognition, deep learning has gradually replaced traditional algorithms, which shows that the rise of neural networks is driving reform across artificial intelligence. However, GPUs are limited by their power consumption and cost, so GPU-based products are extremely expensive, and because neural networks require a very large amount of computation, they must be paired with hardware accelerators for real-time operation. This has motivated much recent research on digital hardware accelerators for convolutional networks. This thesis proposes the design and implementation of a separable convolution accelerator based on HWCK data scheduling. It comprises hardware for depthwise convolution, pointwise convolution, and batch normalization, which together accelerate depthwise separable convolution models. Through an SoC design, the PS (Processing System) and the PL (Programmable Logic) communicate over the AXI4 bus protocol, so the CPU can use the proposed neural network module to accelerate neural networks on the FPGA. The HWCK data scheduling method can be reconfigured according to the allocated memory bandwidth and memory resources on the DDR4, and the design can be extended easily when both bandwidth and memory are sufficient. To reduce the size of the network's weight parameters, data are computed and stored as 16-bit fixed-point numbers. Memory is accessed through a ping-pong memory architecture, and data are transferred to and from the CPU over the AXI4 bus protocol. The whole hardware architecture is implemented on the Xilinx ZCU106 development board. In the SoC design, a precompiled driver connects the operating system with external resources and controls the neural network acceleration module on the FPGA. The acceleration schedule can be reconfigured quickly from a high-level programming language, which improves the reconfigurability of the hardware and allows the architecture to be implemented on a variety of embedded platforms. Running FaceNet, the architecture reaches 222 FPS and 60.8 GOPS; on the Xilinx ZCU106 board it consumes 8.82 W, giving an efficiency of 6.89 GOPS/W.	en_US
DC.subject	硬體加速器	zh_TW
DC.subject	深度學習	zh_TW
DC.subject	現場可程式化邏輯閘陣列	zh_TW
DC.subject	系統單晶片	zh_TW
DC.subject	Hardware accelerator	en_US
DC.subject	Deep learning	en_US
DC.subject	Field Programmable Gate Array	en_US
DC.subject	System on a Chip	en_US
DC.title	基於HWCK資料排程之分離式卷積加速器設計與實現	zh_TW
dc.language.iso	zh-TW	zh-TW
DC.title	Design and Implementation of a Separable Convolution Accelerator Based on HWCK Data Scheduling	en_US
DC.type	博碩士論文	zh_TW
DC.type	thesis	en_US
DC.publisher	National Central University	en_US

Thesis 106521038: full metadata record
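
Note on the abstract: the accelerator targets depthwise separable convolution, which splits a standard convolution into a depthwise step and a 1x1 pointwise step. The following is a minimal sketch of the well-known multiply-accumulate (MAC) counts for the two forms, not the thesis's own code. It assumes that H, W, C, and K stand for output height, output width, input channels, and output channels (the record does not define the HWCK acronym), and the layer shape, the 3x3 kernel, stride 1, and same-size padding are hypothetical values chosen for illustration.

```python
def standard_conv_macs(h, w, c, k, kernel=3):
    """MACs for a standard kernel x kernel convolution with an h x w x k output."""
    return h * w * c * k * kernel * kernel


def separable_conv_macs(h, w, c, k, kernel=3):
    """MACs for a depthwise convolution followed by a 1x1 pointwise convolution."""
    depthwise = h * w * c * kernel * kernel  # one kernel x kernel filter per input channel
    pointwise = h * w * c * k                # 1x1 convolution mixing c channels into k
    return depthwise + pointwise


# Hypothetical layer shape, not taken from the thesis.
h, w, c, k = 56, 56, 64, 128
std = standard_conv_macs(h, w, c, k)
sep = separable_conv_macs(h, w, c, k)
ratio = sep / std  # algebraically equal to 1/k + 1/kernel**2

print(f"standard:  {std} MACs")
print(f"separable: {sep} MACs")
print(f"ratio:     {ratio:.4f}")
```

For this assumed shape the separable form needs roughly 12% of the MACs of the standard form, which is why dedicated depthwise and pointwise units can be an attractive target for hardware acceleration.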