Master's/Doctoral Thesis 106521034 — Complete Metadata Record

DC Field | Value | Language
dc.contributor | Department of Electrical Engineering | zh_TW
dc.creator | 戴勝澤 | zh_TW
dc.creator | Sheng-Tse Tai | en_US
dc.date.accessioned | 2021-01-28T07:39:07Z
dc.date.available | 2021-01-28T07:39:07Z
dc.date.issued | 2021
dc.identifier.uri | http://ir.lib.ncu.edu.tw:88/thesis/view_etd.asp?URN=106521034
dc.contributor.department | Department of Electrical Engineering | zh_TW
dc.description | National Central University | zh_TW
dc.description | National Central University | en_US
dc.description.abstract | In recent years, deep neural networks (DNNs) have been widely used in artificial intelligence applications. A DNN acceleration system typically uses a dynamic random access memory (DRAM) to store data, while the computation is handled by an accelerator. However, the energy consumed by DRAM accesses usually accounts for a large portion of the energy of the DNN acceleration system. In this thesis, we propose an adaptive layer-fusing approach (ALFA) that reduces the energy consumption of the whole acceleration system by minimizing the number of DRAM accesses. For every layer in a given group of fused layers, ALFA adaptively maximizes the reuse of the input feature map, the weights, or the output feature map to find the combination with the smallest number of DRAM accesses. Analysis results show that when a 128 KB on-chip buffer is used to fuse layers 1 to 4 of AlexNet, ALFA reduces the number of DRAM accesses by 27% compared with the approach reported in [1]. In addition, we propose a systematic method to determine how many layers of a DNN model should be fused with ALFA. Analysis results show that with a 128 KB on-chip buffer and the VGG16 model, the proposed method reduces the number of DRAM accesses by 34% compared with the approach reported in [2]. We also designed an accelerator that supports ALFA; it was synthesized with the TSMC 40 nm CMOS standard cell library. With 256 multipliers and 256 adders, the accelerator achieves a peak performance of 102.4 GOPS at 200 MHz. In addition, synthesis results show that the power consumption and area cost of the accelerator at 200 MHz are 195 mW and 5.214 mm², respectively. | zh_TW
dc.description.abstract | Deep neural networks (DNNs) have been widely used for artificial intelligence applications. A DNN acceleration system typically consists of a dynamic random access memory (DRAM) for data buffering and an accelerator for the computation. However, DRAM accesses typically consume a significant portion of the energy of the DNN acceleration system. In this thesis, we propose an adaptive layer-fusing approach (ALFA) that reduces the DRAM energy consumption by minimizing the number of DRAM accesses. ALFA adaptively maximizes the reuse of the input feature map, the weights, or the output feature map in every layer of the given fused layers. Analysis results show that ALFA achieves a 27% reduction in DRAM accesses compared with the approach reported in [1] when a 128 KB on-chip buffer is used to fuse convolution layers 1 to 4 of AlexNet. We also propose a systematic method to determine the number of layers fused by ALFA for a DNN model. Analysis results show that the proposed method with ALFA achieves a 34% reduction in DRAM accesses compared with the approach reported in [2] when a 128 KB on-chip buffer is used for VGG16. An accelerator with ALFA is designed and synthesized using the TSMC 40 nm CMOS standard cell library. The accelerator achieves a peak performance of 102.4 GOPS with 256 multipliers and 256 adders at 200 MHz. Synthesis results also show that the power consumption and area cost of the accelerator at 200 MHz are 195 mW and 5.214 mm², respectively. | en_US
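The abstract describes ALFA only at a high level. As an illustration only, and not the algorithm given in the thesis, the following Python sketch shows the general idea of choosing, for each layer in a fused group, which tensor (input feature map, weights, or output feature map) to keep resident on-chip so that a simple DRAM-traffic model is minimized under a 128 KB buffer constraint; the layer sizes, the cost model, and all function names are invented assumptions.

from itertools import product

# Hypothetical per-layer tensor sizes in bytes; these values are invented for
# illustration and do not come from the thesis.
LAYERS = [
    {"ifmap": 60_000, "weight": 20_000, "ofmap": 45_000},
    {"ifmap": 45_000, "weight": 80_000, "ofmap": 30_000},
]
BUFFER_BYTES = 128 * 1024  # 128 KB on-chip buffer, matching the analyses above

def dram_traffic(layer, reused):
    """Toy cost model: the reused tensor moves between DRAM and the chip once,
    while the other two tensors are assumed to be streamed twice (a stand-in
    for the extra re-fetches caused by tiling)."""
    return sum(size if name == reused else 2 * size for name, size in layer.items())

def best_reuse_plan(layers, buffer_bytes):
    """Try every per-layer reuse choice and keep the cheapest plan whose
    resident (reused) tensors fit in the on-chip buffer together."""
    best_plan, best_cost = None, float("inf")
    for plan in product(("ifmap", "weight", "ofmap"), repeat=len(layers)):
        resident = sum(layer[choice] for layer, choice in zip(layers, plan))
        if resident > buffer_bytes:
            continue  # the fused group's reused tensors must fit on-chip
        cost = sum(dram_traffic(layer, choice) for layer, choice in zip(layers, plan))
        if cost < best_cost:
            best_plan, best_cost = plan, cost
    return best_plan, best_cost

if __name__ == "__main__":
    plan, cost = best_reuse_plan(LAYERS, BUFFER_BYTES)
    print("per-layer reuse choice:", plan)
    print("modeled DRAM traffic (bytes):", cost)

Under these made-up numbers the exhaustive search keeps a different tensor resident in each layer, which is the kind of per-layer adaptivity the abstract attributes to ALFA; the thesis's actual access model and search procedure are more detailed.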
dc.subject | neural network acceleration system | zh_TW
dc.subject | minimizing DRAM access | zh_TW
dc.subject | acceleration system | en_US
dc.subject | Minimize DRAM Access | en_US
dc.title | Layer-Fusing Energy Reduction Techniques for Deep Neural Network Acceleration Systems by Minimizing DRAM Access | zh_TW
dc.language.iso | zh-TW | zh-TW
dc.title | Layer-Fusing Energy Reduction Techniques for Deep Neural Network Acceleration Systems by Minimizing DRAM Access | en_US
dc.type | Master's/doctoral thesis | zh_TW
dc.type | thesis | en_US
dc.publisher | National Central University | en_US
