

    Please use this identifier to cite or link to this item: http://ir.lib.ncu.edu.tw/handle/987654321/90857


    Title: Edge-optimized Incremental Learning for Deep Neural Networks
    Authors: Hussain, Muhammad Awais
    Contributors: Department of Electrical Engineering
    Keywords: Digital Hardware Design;DNN;Incremental Learning;ASIC Implementation
    Date: 2023-01-16
    Issue Date: 2023-05-09 18:10:49 (UTC+8)
    Publisher: National Central University
    Abstract: Incremental learning techniques aim to increase the capability of a Deep Neural Network (DNN) model to add new classes to a pre-trained model. However, DNNs suffer from catastrophic forgetting during incremental learning, and existing incremental learning techniques require either samples of previous training data or complex model architectures to reduce it. This leads to high design complexity and memory requirements, which make incremental learning algorithms infeasible to implement on edge devices with limited memory and computation resources. In this thesis, an On-Chip Incremental Learning (OCIL) accelerator is presented: a software/hardware co-design for energy-efficient, high-speed incremental learning in DNNs at the edge.

    OCIL features Learning with Sharing (LwS), a novel and simple incremental learning algorithm that continuously learns new classes in a DNN model with minimal catastrophic forgetting. LwS preserves the knowledge of existing data classes and adds new classes without storing training samples from previous classes, and it outperforms state-of-the-art techniques in accuracy on the CIFAR-100, Caltech-101, and UCSD Birds datasets.

    LwS requires a large number of data movements to train the Fully Connected (FC) layers during incremental learning, so for an energy-efficient design, OCIL minimizes this data movement with a novel optimized memory access method for FC-layer training. The method exploits data reuse during the backpropagation of FC layers and reduces memory accesses by 1.45x-15.5x for different FC layers of multiple DNN models. As a result, OCIL processes FC layers in the backpropagation stage at a throughput similar to that of forward propagation. Moreover, the optimized memory access method for error/delta calculation unifies the dataflow of the forward and backward passes, eliminating the need for separate Processing Elements (PEs) and a complex data controller for the backward pass.

    OCIL has been implemented in a 40-nm technology process and runs at a 225 MHz clock with 168.8 mW of power consumption at 0.9 V. For 32-bit fixed-point numbers, the accelerator achieves an area efficiency of 14.9 GOPs/mm² and an energy efficiency of 682.1 GOPs/W.
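    The unified-dataflow claim rests on a standard property of FC layers: the error propagated to a layer's input, delta_in = W^T * delta_out, has the same matrix-vector structure as the forward pass y = W * x + b, so the same compute pattern (and the same processing elements) can serve both directions. The NumPy sketch below is only a minimal illustration of that shared structure under generic assumptions; the function names (fc_forward, fc_backward) are hypothetical, and it does not reproduce the OCIL accelerator's actual memory access scheme.

    import numpy as np

    # Minimal FC-layer sketch (hypothetical, not the thesis's design): shows
    # that the forward pass and the error/delta calculation of the backward
    # pass share the same matrix-vector compute pattern.

    def fc_forward(W, b, x):
        """Forward pass: y = W @ x + b."""
        return W @ x + b

    def fc_backward(W, x, delta_out):
        """Backward pass for the same layer.

        delta_in = W.T @ delta_out has the same structure as the forward
        product, so one set of processing elements can compute both; only
        the order in which W is streamed from memory changes. Reusing x and
        delta_out across all rows/columns of W is the kind of data reuse an
        optimized memory access method can exploit to cut data movement.
        """
        grad_W = np.outer(delta_out, x)   # weight gradient: one outer product
        grad_b = delta_out                # bias gradient
        delta_in = W.T @ delta_out        # error passed to the previous layer
        return grad_W, grad_b, delta_in

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        W = rng.standard_normal((4, 8))       # 4 outputs, 8 inputs
        b = rng.standard_normal(4)
        x = rng.standard_normal(8)
        y = fc_forward(W, b, x)
        delta_out = rng.standard_normal(4)    # stand-in for upstream error
        grad_W, grad_b, delta_in = fc_backward(W, x, delta_out)
        print(y.shape, grad_W.shape, delta_in.shape)  # (4,) (4, 8) (8,)

    As a sanity check on the reported figures, 682.1 GOPs/W at 168.8 mW implies a sustained throughput of roughly 682.1 x 0.1688 ≈ 115 GOPs.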
    Appears in Collections: [Graduate Institute of Electrical Engineering] Electronic Thesis & Dissertation



    All items in NCUIR are protected by copyright, with all rights reserved.
