Layer-Fusing Energy Reduction Techniques for Deep Neural Network Acceleration Systems by Minimizing DRAM Access

Please use this permanent URL to cite or link to this item: http://ir.lib.ncu.edu.tw/handle/987654321/85119

Title:	Layer-Fusing Energy Reduction Techniques for Deep Neural Network Acceleration Systems by Minimizing DRAM Access
Author:	Tai, Sheng-Tse (戴勝澤)
Contributor:	Department of Electrical Engineering
Keywords:	neural network acceleration system; minimizing DRAM access
Date:	2021-01-28
Uploaded:	2021-03-18 17:42:46 (UTC+8)
Publisher:	National Central University
Abstract:	Deep neural networks (DNNs) have been widely used in artificial intelligence applications. A DNN acceleration system typically consists of a dynamic random access memory (DRAM) for data buffering and an accelerator for the computation. However, DRAM access typically accounts for a significant portion of the energy consumed by a DNN acceleration system. In this thesis, we propose an adaptive layer-fusing approach (ALFA) that reduces the energy consumption of the acceleration system by minimizing the number of DRAM accesses. For every layer in a given group of fused layers, the ALFA adaptively maximizes the reuse of input feature maps, weights, or output feature maps to find the combination with the fewest DRAM accesses. Analysis results show that the ALFA requires 27% fewer DRAM accesses than the approach reported in [1] when a 128 KB on-chip buffer is used to fuse convolution layers 1 to 4 of AlexNet. We also propose a systematic method for determining how many layers of a DNN model should be fused by the ALFA. Analysis results show that the proposed method with the ALFA requires 34% fewer DRAM accesses than the approach reported in [2] when a 128 KB on-chip buffer is used for VGG16. An accelerator supporting the ALFA is designed and synthesized using the TSMC 40 nm CMOS standard cell library. The accelerator achieves a peak performance of 102.4 GOPS with 256 multipliers and 256 adders at 200 MHz. Synthesis results show that the power consumption and area cost of the accelerator at 200 MHz are 195 mW and 5.214 mm², respectively.
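The peak-performance figure in the abstract can be checked with simple arithmetic. The sketch below is only an illustration and is not taken from the thesis: the function name `peak_gops` is hypothetical, and it assumes that every multiplier and every adder completes one operation per clock cycle.

```python
def peak_gops(num_multipliers: int, num_adders: int, clock_hz: float) -> float:
    """Return peak throughput in GOPS (10^9 operations per second).

    Assumption (not stated in the abstract): each multiplier and each
    adder completes exactly one operation per clock cycle.
    """
    ops_per_cycle = num_multipliers + num_adders
    return ops_per_cycle * clock_hz / 1e9


# Values from the abstract: 256 multipliers, 256 adders, 200 MHz clock.
print(peak_gops(256, 256, 200e6))
```

Under that assumption, 512 operations per cycle at 200 MHz gives the 102.4 GOPS reported in the abstract.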
Appears in Collections:	[Graduate Institute of Electrical Engineering] Theses and Dissertations

Files in this item:

File	Description	Size	Format	Views
index.html		0 KB	HTML	103

All items in NCUIR are protected by the original copyright.
