基於HWCK資料排程之分離式卷積加速器設計與實現;Design and Implementation of a Separable Convolution Accelerator Based on HWCK Data Scheduling

Please use this permanent URL to cite or link to this item: http://ir.lib.ncu.edu.tw/handle/987654321/84142

Title:	基於HWCK資料排程之分離式卷積加速器設計與實現;Design and Implementation of a Separable Convolution Accelerator Based on HWCK Data Scheduling
Author:	許晉瑋;Hsu, Chin-Wei
Contributor:	Department of Electrical Engineering
Keywords:	Hardware accelerator;Deep learning;Field Programmable Gate Array;System on a Chip
Date:	2020-07-20
Upload time:	2020-09-02 18:23:05 (UTC+8)
Publisher:	National Central University
Abstract:	In recent years, advances in GPUs and the arrival of the big-data era have made deep learning increasingly popular, and it has brought revolutionary progress to many fields. In tasks such as image preprocessing, image segmentation, face recognition, and speech recognition, deep learning has gradually replaced traditional algorithms, which shows that the rise of neural networks is driving change across artificial intelligence. However, GPUs are limited by their power consumption and cost, so GPU-based products are extremely expensive. Because neural network algorithms also require a very large amount of computation, they must be paired with acceleration hardware for real-time operation, and this has prompted considerable research in recent years on digital hardware accelerators for convolutional networks. This thesis proposes a separable convolution hardware architecture based on HWCK data scheduling. It comprises depthwise convolution, pointwise convolution, and batch normalization hardware, which together accelerate depthwise separable convolution models.
In the proposed SoC design, the PS (Processing System) and PL (Programmable Logic) communicate through the AXI4 bus protocol, so the CPU can use the developed neural network modules to accelerate neural networks on the FPGA. The HWCK data scheduling method can be reconfigured according to the allocated memory and DDR4 bandwidth resources, and the design can be extended easily when both bandwidth and memory are sufficient. To reduce the size of the network's weight parameters, all data are computed and stored in 16-bit fixed-point format. Memory is accessed through a ping-pong memory architecture, and data are exchanged with the CPU over the AXI4 bus protocol. The whole hardware architecture is implemented on the Xilinx ZCU106 development board. Through the SoC design, a precompiled driver connects the operating system with external resources and controls the neural network acceleration module on the FPGA. A high-level programming language is used to quickly reconfigure the acceleration schedule, which improves the reconfigurability of the hardware and allows the architecture to be implemented on a variety of embedded platforms. Running FaceNet, the architecture reaches 222 FPS and 60.8 GOPS. On the Xilinx ZCU106 board it consumes only 8.82 W, for an efficiency of 6.89 GOP/s/W.
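The two convolution stages named in the abstract can be illustrated with a minimal software sketch. It is not the thesis's hardware design or its HWCK scheduler: the Q8.8 split of the 16 bits, stride 1, no padding, truncating shifts, saturation, and all function names are assumptions made for illustration. The indexing (feature map as [h][w][c], pointwise weights as [c][k]) is only meant to echo the H, W, C, K letters.

```python
FRAC_BITS = 8  # assumed Q8.8 format; the abstract only says "16-bit fixed-point"
INT16_MIN, INT16_MAX = -32768, 32767


def saturate16(v):
    """Clamp an integer to the signed 16-bit range."""
    return max(INT16_MIN, min(INT16_MAX, v))


def to_fixed(x):
    """Quantize a float to a saturated 16-bit fixed-point integer."""
    return saturate16(int(round(x * (1 << FRAC_BITS))))


def to_float(q):
    """Convert a fixed-point integer back to a float."""
    return q / (1 << FRAC_BITS)


def depthwise_conv(fmap, kernels):
    """Depthwise stage: one KHxKW filter per channel, stride 1, no padding.

    fmap is indexed [h][w][c]; kernels is indexed [kh][kw][c].
    """
    h, w, c = len(fmap), len(fmap[0]), len(fmap[0][0])
    kh, kw = len(kernels), len(kernels[0])
    out = []
    for y in range(h - kh + 1):
        row = []
        for x in range(w - kw + 1):
            pixel = []
            for ch in range(c):
                acc = 0  # wide accumulator, rescaled once after the sum
                for i in range(kh):
                    for j in range(kw):
                        acc += fmap[y + i][x + j][ch] * kernels[i][j][ch]
                # Arithmetic right shift (floors) back to Q8.8, then saturate.
                pixel.append(saturate16(acc >> FRAC_BITS))
            row.append(pixel)
        out.append(row)
    return out


def pointwise_conv(fmap, weights):
    """Pointwise stage: 1x1 convolution mixing C channels into K outputs.

    weights is indexed [c][k].
    """
    c, k = len(weights), len(weights[0])
    out = []
    for row in fmap:
        out_row = []
        for pixel in row:
            out_pixel = []
            for ko in range(k):
                acc = sum(pixel[ch] * weights[ch][ko] for ch in range(c))
                out_pixel.append(saturate16(acc >> FRAC_BITS))
            out_row.append(out_pixel)
        out.append(out_row)
    return out


def mac_counts(h_out, w_out, c, k, ksize):
    """Multiply-accumulate counts: (standard convolution, separable convolution)."""
    standard = h_out * w_out * c * k * ksize * ksize
    separable = h_out * w_out * c * ksize * ksize + h_out * w_out * c * k
    return standard, separable


if __name__ == "__main__":
    # 4x4 feature map with 2 channels, 3x3 depthwise kernels, 2 output channels.
    fmap = [[[to_fixed(1.0)] * 2 for _ in range(4)] for _ in range(4)]
    kern = [[[to_fixed(1.0)] * 2 for _ in range(3)] for _ in range(3)]
    pw_weights = [[to_fixed(0.5), to_fixed(1.0)],
                  [to_fixed(0.5), to_fixed(-1.0)]]
    result = pointwise_conv(depthwise_conv(fmap, kern), pw_weights)
    print([[[to_float(v) for v in px] for px in row] for row in result])
    print(mac_counts(2, 2, 2, 2, 3))
    # Efficiency figure from the abstract: GOPS divided by watts.
    print(round(60.8 / 8.82, 2))
```

For this toy case the separable form needs 88 multiply-accumulates against 144 for a standard 3x3 convolution with the same shapes, which is the saving that motivates dedicated depthwise and pointwise hardware. The last line reproduces the reported efficiency from the reported throughput and power: 60.8 GOPS / 8.82 W rounds to 6.89 GOP/s/W.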
Appears in Collections:	[Graduate Institute of Electrical Engineering] Theses and Dissertations

Files in This Item:

File	Description	Size	Format	Views
index.html		0Kb	HTML	140

All items in NCUIR are protected by original copyright.
