Chiplet-based Deep Neural Network Accelerator Architecture and Exploration

DC field	Value	Language
DC.contributor	Department of Electrical Engineering	zh_TW
DC.creator	邱皓珩	zh_TW
DC.creator	Hao-Heng Qiu	en_US
dc.date.accessioned	2022-09-26T07:39:07Z
dc.date.available	2022-09-26T07:39:07Z
dc.date.issued	2022
dc.identifier.uri	http://ir.lib.ncu.edu.tw:88/thesis/view_etd.asp?URN=107521129
dc.contributor.department	Department of Electrical Engineering	zh_TW
DC.description	National Central University	zh_TW
DC.description	National Central University	en_US
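dc.description.abstract	Deep neural networks (DNNs) are widely used in artificial intelligence (AI), for example in object recognition and image classification. Current DNN models typically require a large amount of computation. To meet the performance requirements of different applications, accelerators are commonly used to perform DNN inference. In this thesis, we propose a DNN accelerator architecture based on the chiplet design methodology. The architecture consists of one base die and multiple scalable compute dies. The base die consists of static random-access memory (SRAM) and control units, and it handles data transfers between the external dynamic random-access memory (DRAM) and the compute dies. Each compute die consists of SRAM and processing elements (PEs). Based on this architecture, we also propose a design space exploration (DSE) method that explores possible design choices for the base die and the compute dies under data-bandwidth and end-to-end latency constraints. The exploration results show that the defect density (1/mm²) and the bonding yield are the main factors that determine the granularity of the compute dies. For ResNet-50 inference with a DRAM bandwidth of 25.6 GB/s, 4096 base-die I/O pins, and an end-to-end latency of 12 ms, a system composed of one base die and two compute dies achieves the lowest fabrication cost. As the defect density increases, splitting the compute logic into more compute dies pays off in lower cost; as the bonding yield decreases, splitting it into fewer compute dies effectively reduces cost. To verify the proposed chiplet-based DNN accelerator architecture, we implemented an accelerator for MobileNet inference on a Xilinx ZCU-102 development board. This accelerator consists of one base die and one compute die. Experimental results show that it achieves an end-to-end latency of 25 ms at an operating frequency of 100 MHz.	zh_TW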
dc.description.abstract	Deep neural networks (DNNs) are widely used in artificial intelligence (AI) applications, e.g., object detection and image classification. A modern DNN model usually requires a large amount of computation, so an accelerator is usually designed for DNN inference to meet application performance requirements. In this thesis, we consider a DNN accelerator realized with the chiplet-based method. The proposed chiplet-based DNN accelerator architecture consists of a base die and multiple compute dies for scalability. The base die is composed of SRAM buffers and controllers that handle data transfers between the external dynamic random-access memory (DRAM) and the compute dies. Each compute die consists of memory units (SRAM) and compute units (processing elements, PEs). A design space exploration (DSE) method is proposed to explore possible design choices for the base die and the compute dies under data-bandwidth and end-to-end latency constraints. The exploration results show that the defect density and the bonding yield are the dominant factors determining the granularity of the compute dies. For realizing the ResNet-50 model under the constraints of 25.6 GB/s DRAM bandwidth, 4096 base-die I/Os, and 12 ms latency, a configuration with two compute dies yields the minimum fabrication cost. Partitioning into more compute dies pays off as the defect density increases; as the bonding yield decreases, partitioning into fewer compute dies lowers the cost. To verify the proposed architecture, we implemented a chiplet-based DNN accelerator for MobileNet inference on a Xilinx ZCU-102 evaluation board, architected with one base die and one compute die. The implementation achieves an end-to-end latency of 25 ms at an operating frequency of 100 MHz.	en_US
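The abstract describes the exploration only at a high level. The following is a minimal, hypothetical sketch of that kind of brute-force DSE, not the thesis's actual model: it enumerates the number of compute dies k, discards configurations that violate an I/O budget or the latency limit, and returns the cheapest remaining one. It assumes a Poisson die-yield model Y = exp(-A * D0), known-good dies tested before bonding, and that a single failed bond scraps the whole assembly. Only the 25.6 GB/s DRAM bandwidth, the 4096 base-die I/Os, and the 12 ms latency limit come from the abstract; every other value and every name (`Params`, `system_cost`, `explore`, `min_ios_per_die`, and so on) is a placeholder chosen for illustration. Because the total PE area is held fixed, latency in this simplified model does not depend on k; k changes only the I/O split and the cost.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


# Hypothetical parameters. Only dram_bw, base_die_ios and latency_limit_s
# come from the abstract; all other values are illustrative placeholders.
@dataclass
class Params:
    pe_area_total_mm2: float = 200.0  # total PE + SRAM area across all compute dies
    d2d_overhead_mm2: float = 5.0     # per-compute-die interface area overhead
    base_die_area_mm2: float = 50.0
    defect_density: float = 0.002     # D0, defects per mm^2
    bonding_yield: float = 0.99       # success probability of one compute-die bond
    cost_per_mm2: float = 1.0         # silicon cost per mm^2 (arbitrary units)
    base_die_ios: int = 4096
    min_ios_per_die: int = 1024       # hypothetical I/Os each compute die needs
    macs: float = 3.9e9               # approx. multiply-accumulates per ResNet-50 inference
    macs_per_s: float = 4.0e11        # hypothetical aggregate PE throughput
    dram_bytes: float = 1.0e8         # hypothetical DRAM traffic per inference
    dram_bw: float = 25.6e9           # bytes/s
    latency_limit_s: float = 12e-3


def poisson_yield(area_mm2: float, d0: float) -> float:
    """Poisson die-yield model: Y = exp(-A * D0)."""
    return math.exp(-area_mm2 * d0)


def die_cost(area_mm2: float, p: Params) -> float:
    """Cost of one known-good die: raw silicon cost divided by die yield."""
    return area_mm2 * p.cost_per_mm2 / poisson_yield(area_mm2, p.defect_density)


def system_cost(k: int, p: Params) -> float:
    """Cost of one working system with one base die and k compute dies.

    A failed bond is assumed to scrap the whole assembly, so the
    assembly cost is divided by bonding_yield ** k.
    """
    compute_area = p.pe_area_total_mm2 / k + p.d2d_overhead_mm2
    assembly = die_cost(p.base_die_area_mm2, p) + k * die_cost(compute_area, p)
    return assembly / (p.bonding_yield ** k)


def latency_s(p: Params) -> float:
    """Roofline-style estimate: the slower of compute time and DRAM transfer time."""
    return max(p.macs / p.macs_per_s, p.dram_bytes / p.dram_bw)


def explore(p: Params, max_dies: int = 8) -> tuple[int, float]:
    """Enumerate compute-die counts, keep feasible ones, return (k, cost) of the cheapest."""
    feasible = []
    for k in range(1, max_dies + 1):
        if p.base_die_ios // k < p.min_ios_per_die:
            continue  # not enough base-die I/Os to feed k compute dies
        if latency_s(p) > p.latency_limit_s:
            continue  # violates the end-to-end latency constraint
        feasible.append((k, system_cost(k, p)))
    if not feasible:
        raise ValueError("no configuration satisfies the constraints")
    return min(feasible, key=lambda kc: kc[1])


if __name__ == "__main__":
    for label, params in [
        ("default", Params()),
        ("higher defect density", Params(defect_density=0.005)),
        ("lower bonding yield", Params(bonding_yield=0.90)),
    ]:
        k, cost = explore(params)
        print(f"{label}: {k} compute dies, cost {cost:.1f}")
```

With these placeholder values, raising `defect_density` to 0.005 moves the cheapest configuration to four compute dies, while lowering `bonding_yield` to 0.90 moves it to two, which matches the qualitative trend reported in the abstract. The absolute costs are in arbitrary units and say nothing about the thesis's own results.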
DC.subject	Deep neural network	zh_TW
DC.subject	Accelerator	zh_TW
DC.subject	Chiplet	zh_TW
DC.subject	Design space exploration	zh_TW
DC.subject	Deep Neural Network	en_US
DC.subject	accelerator	en_US
DC.subject	chiplet	en_US
DC.subject	design space exploration	en_US
DC.title	Exploration of Deep Neural Network Accelerator Architectures for Chiplets	zh_TW
dc.language.iso	zh-TW	zh-TW
DC.title	Chiplet-based Deep Neural Network Accelerator Architecture and Exploration	en_US
DC.type	Thesis/dissertation	zh_TW
DC.type	thesis	en_US
DC.publisher	National Central University	en_US

Thesis/dissertation 107521129: full metadata record