

    Please use this permanent URL to cite or link to this document: http://ir.lib.ncu.edu.tw/handle/987654321/84223


    Title: Design of a Reconfigurable Deep Neural Network Accelerator (可重構深度神經網路加速器設計)
    Author: Tien, Yi (田繹)
    Contributor: Department of Electrical Engineering
    Keywords: Hardware Accelerator; Deep Neural Network; Reconfigurable; FPGA
    Date: 2020-08-20
    Uploaded: 2020-09-02 18:31:04 (UTC+8)
    Publisher: National Central University
    Abstract: Deep convolutional neural networks (DCNNs) are widely used in artificial intelligence applications such as object recognition and image classification. A modern DCNN model usually requires a huge amount of computation and data. To meet the performance requirements of these applications, an accelerator is usually designed to execute the DCNN computation.
    In this thesis, we consider a DCNN inference system that uses a DRAM to store data and an accelerator to execute the computation. We propose an architecture exploration method that defines the accelerator architecture by minimizing the difference between DRAM data access time and computation time. The accelerator consists of multiple clusters of processing elements (PEs), a reconfigurable memory unit, and a controller. Each PE cluster is connected to the reconfigurable memory unit through a switch box. The reconfigurable memory unit consists of three static random access memories (SRAMs) whose sizes can be changed dynamically to fit the requirements of different convolutional layers. The configurations of the PE array and the reconfigurable memory are determined by a sublayer-based parameter decision flow, which improves hardware resource utilization by 4.2% for convolutional layers and by 17.4% for the whole DCNN model compared with existing works. Using the proposed reconfigurable accelerator architecture, we implement the MobileNet V1 model on a Xilinx ZCU-102 evaluation board with 1092KB of SRAM and four PE clusters of 8 PEs each. Experimental results show a throughput of 144 GOPS and 40.1 FPS at a 100MHz clock rate (the Chinese-language abstract of this record states 150MHz).
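    The exploration criterion described in the abstract (balancing DRAM access time against computation time under a fixed on-chip SRAM budget split among three buffers) can be illustrated with a small cost model. The sketch below is hypothetical and is not the thesis's actual decision flow. It assumes 8-bit data, one MAC per PE per cycle, a DRAM bandwidth of 8 bytes per cycle, standard convolutions with stride 1 and "same" padding, and sublayers formed by splitting a layer along its output channels, with the input feature map reloaded for each sublayer. `ConvLayer`, `Accel`, `sublayer_buffers`, `sublayer_cycles`, and `choose_sublayer` are illustrative names. Only the 32-PE count and the 1092KB SRAM budget come from the abstract.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConvLayer:
    """Standard convolution; stride 1 and 'same' padding are assumed."""
    h: int      # output height (equal to input height under the assumptions)
    w: int      # output width
    c_in: int   # input channels
    c_out: int  # output channels
    k: int      # kernel size (k x k)


@dataclass(frozen=True)
class Accel:
    num_pes: int = 32              # 4 clusters x 8 PEs, as in the abstract
    macs_per_pe: int = 1           # hypothetical: one MAC per PE per cycle
    dram_bytes_per_cycle: int = 8  # hypothetical DRAM bandwidth
    sram_bytes: int = 1092 * 1024  # total on-chip SRAM from the abstract


def sublayer_buffers(layer, m):
    """Bytes needed by the (input, weight, output) SRAMs when one sublayer
    computes m output channels, assuming 8-bit data."""
    inp = layer.h * layer.w * layer.c_in
    wgt = m * layer.c_in * layer.k * layer.k
    out = layer.h * layer.w * m
    return inp, wgt, out


def sublayer_cycles(layer, m, acc):
    """Return (DRAM cycles, compute cycles) for one sublayer."""
    inp, wgt, out = sublayer_buffers(layer, m)
    # DRAM traffic: load input and weights, store output.
    dram = (inp + wgt + out) / acc.dram_bytes_per_cycle
    macs = layer.h * layer.w * m * layer.c_in * layer.k * layer.k
    compute = macs / (acc.num_pes * acc.macs_per_pe)
    return dram, compute


def choose_sublayer(layer, acc):
    """Pick m (a divisor of c_out) whose three buffers fit the SRAM budget and
    whose DRAM time is closest to its compute time; ties keep the smaller m."""
    best = None
    for m in range(1, layer.c_out + 1):
        if layer.c_out % m:
            continue  # only even splits of the output channels
        bufs = sublayer_buffers(layer, m)
        if sum(bufs) > acc.sram_bytes:
            continue  # the three SRAM partitions would not fit on chip
        dram, compute = sublayer_cycles(layer, m, acc)
        gap = abs(dram - compute)
        if best is None or gap < best[0]:
            best = (gap, m, bufs)
    if best is None:
        raise ValueError("no sublayer size fits the SRAM budget")
    _, m, (inp, wgt, out) = best
    return m, {"input": inp, "weight": wgt, "output": out}
```

For example, a 14×14×512 to 512 pointwise layer (`ConvLayer(14, 14, 512, 512, 1)`) selects sublayers of 4 output channels under these assumptions, where the estimated DRAM time (12,898 cycles) is closest to the compute time (12,544 cycles). The returned byte counts are one way the three SRAM sizes could be reconfigured per layer; depthwise layers, which make up much of MobileNet V1, would need their own cost terms.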
    Appears in Collections: [Graduate Institute of Electrical Engineering] Theses and Dissertations



    All items in NCUIR are protected by their original copyright.
