NCU Institutional Repository: Item 987654321/90142


    Please use this identifier to cite or link to this item: http://ir.lib.ncu.edu.tw/handle/987654321/90142


    Title: Chiplet-based Deep Neural Network Accelerator Architecture and Exploration
    Authors: 邱皓珩;Qiu, Hao-Heng
    Contributors: Department of Electrical Engineering
    Keywords: Deep Neural Network;accelerator;chiplet;design space exploration
    Date: 2022-09-26
    Issue Date: 2022-10-04 12:12:12 (UTC+8)
    Publisher: National Central University
    Abstract: Deep neural networks (DNNs) are widely used in artificial intelligence (AI) applications such as object detection and image classification. Modern DNN models usually require a large amount of computation, so accelerators are commonly designed to meet the performance requirements of DNN inference. In this thesis, we propose a chiplet-based DNN accelerator architecture consisting of a base die and multiple compute dies for scalability. The base die is composed of SRAM buffers and controllers that handle data transfers between external dynamic random access memory (DRAM) and the compute dies; each compute die consists of SRAM memory units and processing elements (PEs). Based on this architecture, we also propose a design space exploration (DSE) method that explores possible design choices for the base die and compute dies under data-bandwidth and end-to-end-latency constraints. The exploration results show that defect density (1/mm²) and bonding yield are the dominant factors determining the granularity of the compute dies. For ResNet-50 inference under 25.6 GB/s DRAM bandwidth, 4096 base-die I/Os, and a 12 ms end-to-end latency, a system with one base die and two compute dies achieves the lowest fabrication cost. Partitioning into more compute dies pays off as the defect density increases, whereas partitioning into fewer compute dies effectively lowers the cost as the bonding yield decreases. To verify the proposed architecture, we implemented a chiplet-based accelerator for MobileNet inference, composed of one base die and one compute die, on a Xilinx ZCU-102 evaluation board. The implementation results show that a 25 ms end-to-end latency is achieved at a 100 MHz operating frequency.
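
    The cost trend reported in the abstract (splitting into more compute dies pays off as defect density rises, while a lower bonding yield favors fewer dies) can be reproduced with a standard chiplet cost model. The sketch below is not the thesis code: it assumes a negative-binomial die-yield model, a per-bond assembly yield, and purely illustrative wafer, area, and cost numbers, and it omits the DRAM-bandwidth and end-to-end-latency feasibility checks of the actual design space exploration.

    def die_yield(area_mm2, defect_density_per_mm2, alpha=3.0):
        # Negative-binomial yield model: Y = (1 + A * D0 / alpha) ** (-alpha)
        return (1.0 + area_mm2 * defect_density_per_mm2 / alpha) ** (-alpha)

    def known_good_die_cost(area_mm2, defect_density,
                            wafer_cost=5000.0, wafer_area_mm2=70685.0):
        # Silicon cost per *good* die; edge loss and test cost are ignored.
        dies_per_wafer = wafer_area_mm2 / area_mm2
        return wafer_cost / (dies_per_wafer * die_yield(area_mm2, defect_density))

    def system_cost(n_compute, total_compute_area_mm2, base_area_mm2,
                    defect_density, bonding_yield):
        # One base die plus n_compute equal-area compute dies bonded onto it;
        # assembly yield is bonding_yield per bond (one bond per compute die).
        compute_area = total_compute_area_mm2 / n_compute
        silicon = (known_good_die_cost(base_area_mm2, defect_density)
                   + n_compute * known_good_die_cost(compute_area, defect_density))
        assembly_yield = bonding_yield ** n_compute
        return silicon / assembly_yield

    if __name__ == "__main__":
        # Sweep the compute-die count under a few defect densities / bond yields.
        for d0 in (0.001, 0.005, 0.02):        # defects per mm^2 (illustrative)
            for by in (0.99, 0.95):            # per-bond yield (illustrative)
                costs = {n: system_cost(n, 400.0, 100.0, d0, by) for n in (1, 2, 4, 8)}
                best = min(costs, key=costs.get)
                pretty = {n: round(c, 2) for n, c in costs.items()}
                print(f"D0={d0}/mm^2  bond_yield={by}  best N={best}  cost={pretty}")

    Sweeping the number of compute dies under these assumed parameters shows the same qualitative trade-off as the exploration results: the minimum-cost die count shifts upward as the defect density grows and downward as the per-bond yield drops.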
    Appears in Collections: [Graduate Institute of Electrical Engineering] Electronic Thesis & Dissertation

    Files in This Item:

    File: index.html (HTML, 0 KB, 172 views)


    All items in NCUIR are protected by copyright, with all rights reserved.
