中大機構典藏-NCU Institutional Repository-提供博碩士論文、考古題、期刊論文、研究計畫等下載:Item 987654321/84223
English  |  正體中文  |  简体中文  |  Items with full text/Total items : 78852/78852 (100%)
Visitors : 38478904      Online Users : 236
RC Version 7.0 © Powered By DSPACE, MIT. Enhanced by NTU Library IR team.
Scope Tips:
  • please add "double quotation mark" for query phrases to get precise results
  • please goto advance search for comprehansive author search
  • Adv. Search
    HomeLoginUploadHelpAboutAdminister Goto mobile version


    Please use this identifier to cite or link to this item: http://ir.lib.ncu.edu.tw/handle/987654321/84223


    Title: 可重構深度神經網路加速器設計;Design of a Reconfigurable Deep Neural Network Accelerator
    Authors: 田繹;Tien, Yi
    Contributors: 電機工程學系
    Keywords: 硬體加速器;深度神經網路;可重構;Hardware Accelerator;Deep Neural Network;Reconfigurable;FPGA
    Date: 2020-08-20
    Issue Date: 2020-09-02 18:31:04 (UTC+8)
    Publisher: 國立中央大學
    Abstract: 深度卷積神經網路(DCNNs)被廣泛地用於人工智慧的應用,例如:物件辨識及影像分類等等。現今的深度卷積神經網路具有大量計算與大量數據的特性,為了在不同應用中符合對性能的要求,加速器被用來執行深度卷積神經網路的運算。在本論文中,我們根據以動態隨機存取記憶體(DRAM)儲存資料及使用加速器來執行計算的深度卷積神經網路推論系統,提出架構探索的方法。此方法以減少資料傳輸時間與計算時間的差異而定義出之加速器架構。加速器包含了數叢(clusters)之處理單元(PEs)、一個可重構之記憶體單元及一個控制器。交換器(switch)用以連接一叢處理單元陣列與可重構記憶體單元。可重構記憶體是由三個靜態隨機存取記憶體組合而成,每一靜態隨機存取記憶體可以調整其大小,以符合不同卷積層的記憶體需求。處理單元陣列與可重構記憶體之組態是由基於子層之參數選定流程(sublayer-based parameters decision flow)所決定。與現存之研究相比,本論文提出之加速器在卷積層及深度卷積神經網路各提升4.2%及17.4%的硬體利用率。我們根據提出之可重構加速器架構,在Xilinx ZCU-102開發板上實現了一個推論MobileNet V1的可重構加速器,此一加速器包含了1092KB的靜態隨機存取記憶體與四叢處理單元陣列,每一叢處理單元陣列包含了8個處理單元。實驗結果達到在150MHz的操作頻率下,此一加速器達到每秒1440億次計算及每秒推論40.1張圖片的效能。;Deep convolutional neural networks (DCNNs) are widely used for the artificial intelligence applications, e.g., object recognition and image classification. A modern DCNN model usually needs a huge amount of computations and data. To meet the performance requirement of applications, an accelerator is usually designed to execute the computation of DCNN.
    In this thesis, we consider a DCNN inference system using a DRAM to store data and an accelerator to execute the computation. An architecture exploration method based on the minimization of difference between DRAM data access time and computation time is proposed to define the architecture of accelerator. The accelerator consists of multiple clusters of processing elements (PEs), a reconfigurable memory unit, and a controller. A cluster of PEs is connected to the reconfigurable memory unit through a switch box. The reconfigurable memory unit consists of three static random access memories which sizes can be dynamically changed to fit the requirement of different convolutional layers. The configurations of PE array and reconfigurable memory are determined by sublayer-based parameters decision flow which can gain 4.2% and 17.4% increment of hardware resource utilization for convolutional layers and DCNN model in comparison with existing works. We implement the MobileNet V1 model in Xilinx ZCU-102 evaluation board using the proposed reconfigurable accelerator architecture with 1092KB SRAM and four PE clusters in which each cluster has 8 PEs. Ex-
    perimental results show that 144 GOPS and 40.1 FPS can be achieved under 100MHz clock rate.
    Appears in Collections:[電機工程研究所] 博碩士論文

    Files in This Item:

    File Description SizeFormat
    index.html0KbHTML161View/Open


    All items in NCUIR are protected by copyright, with all rights reserved.

    社群 sharing

    ::: Copyright National Central University. | 國立中央大學圖書館版權所有 | 收藏本站 | 設為首頁 | 最佳瀏覽畫面: 1024*768 | 建站日期:8-24-2009 :::
    DSpace Software Copyright © 2002-2004  MIT &  Hewlett-Packard  /   Enhanced by   NTU Library IR team Copyright ©   - 隱私權政策聲明