

    Please use this identifier to cite or link to this item: http://ir.lib.ncu.edu.tw/handle/987654321/96319


    Title: Bit-Level Sparsity-Aware Compression and Accelerator Architecture Design for Multi-Stride Convolutional Neural Networks
    Authors: Huang, Yun-Yin (黃筠茵)
    Contributors: Department of Electrical Engineering
    Keywords: multi-stride; bit-level; sparsity; accelerator
    Date: 2024-11-11
    Date Uploaded: 2025-04-09 17:48:15 (UTC+8)
    Publisher: National Central University
    Abstract: Deep neural networks (DNNs) are crucial for applications such as autonomous driving, computer vision, and natural language processing. Convolutional neural networks (CNNs) are particularly well suited to computer vision, but their inference requires millions of operations over large numbers of input feature maps, placing stringent demands on speed and energy efficiency. To address these challenges, prior research has explored various techniques for optimizing DNN and CNN performance, including model compression, quantization, and hardware accelerators. These techniques not only reduce computational complexity and power consumption but also improve real-time performance, allowing DNNs and CNNs to run efficiently even in resource-constrained environments.
    Previous studies have shown that sparsity in CNN feature maps and weights gives rise to many ineffectual multiplications, so skipping zero-valued operations improves computational efficiency without degrading inference accuracy. However, compressing sparse data destroys the regularity of the data layout, turning feature-map and weight matching into a difficult problem. The inner-product dataflow is the classic way to perform convolution, but once sparsity is exploited, the circuitry required to fetch irregularly arranged feature maps and weights incurs excessive area cost. The Cartesian-product dataflow eliminates zero-valued operations efficiently, but when multiply-and-accumulate (MAC) operations run under different convolution strides, additional feature-map and weight matching issues arise, as the sketch below illustrates.
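    To make the stride-matching problem concrete, the following Python sketch implements a Cartesian-product dataflow over compressed sparse operands for a single channel. It is an illustrative reconstruction from this abstract, not the accelerator's actual circuit; the function name cartesian_sparse_conv2d and the single-channel, square-kernel setting are assumptions. Every nonzero activation is paired with every nonzero weight; under stride 1 each product lands on a valid output position, while under stride > 1 many pairs fail the divisibility test and are wasted.

        import numpy as np

        def cartesian_sparse_conv2d(fmap, weight, stride=1):
            """Cartesian-product dataflow for one input/output channel pair.

            Both operands are compressed to their nonzero entries; every
            nonzero activation is paired with every nonzero weight, and each
            product is scattered to the output position it contributes to.
            """
            H, W = fmap.shape
            K = weight.shape[0]                     # assumes a square K x K kernel
            Ho = (H - K) // stride + 1
            Wo = (W - K) // stride + 1
            out = np.zeros((Ho, Wo), dtype=fmap.dtype)

            acts = [(y, x, fmap[y, x]) for y, x in zip(*np.nonzero(fmap))]
            wts = [(r, c, weight[r, c]) for r, c in zip(*np.nonzero(weight))]

            for y, x, a in acts:                    # Cartesian product of the
                for r, c, w in wts:                 # two compressed streams
                    oy, ox = y - r, x - c
                    if oy % stride or ox % stride:  # falls between outputs:
                        continue                    # a wasted pair (stride > 1)
                    oy //= stride
                    ox //= stride
                    if 0 <= oy < Ho and 0 <= ox < Wo:
                        out[oy, ox] += a * w        # scatter-accumulate
            return out

    With stride = 1 the divisibility test never fails, which is why unit-stride dataflows exploit sparsity so cleanly; the decomposition described next restores this property for larger strides.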
    To leverage sparsity for computational efficiency while minimizing the dataflow's negative side effects, this work proposes a Two-Nibble Sparsity-Aware Stride-Decomposition scheme. The approach effectively eliminates zero-value computations in non-unit-stride scenarios while also exploiting bit-level sparsity to further improve inference efficiency. Tensors are first decomposed into multiple unit-stride tensors, and the feature maps are then compressed using a two-nibble representation to reduce data volume and computational load (see the sketches below). Experimental results show that, compared with conventional architectures and state-of-the-art accelerators, the proposed method achieves average speedups of about 8 times on VGG16 and about 1.4 times on MobileNetV1.
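    The stride-decomposition step can be illustrated directly. The sketch below is an assumed reconstruction from this abstract, not the thesis's implementation; conv2d_unit and strided_conv_by_decomposition are hypothetical names. A stride-s convolution is rewritten as s*s unit-stride sub-convolutions over phase-subsampled feature maps and kernels, whose partial outputs sum to the strided result.

        import numpy as np

        def conv2d_unit(f, w):
            """Dense stride-1 valid cross-correlation; w may be rectangular."""
            Kh, Kw = w.shape
            Ho, Wo = f.shape[0] - Kh + 1, f.shape[1] - Kw + 1
            out = np.zeros((Ho, Wo), dtype=f.dtype)
            for oy in range(Ho):
                for ox in range(Wo):
                    out[oy, ox] = (f[oy:oy + Kh, ox:ox + Kw] * w).sum()
            return out

        def strided_conv_by_decomposition(f, w, s):
            """Rewrite a stride-s convolution as s*s unit-stride ones.

            Phase (dy, dx) pairs the subsampled feature map f[dy::s, dx::s]
            with the subsampled kernel w[dy::s, dx::s]; the partial outputs
            are summed. Every sub-convolution now has stride 1, so a
            unit-stride sparsity-aware dataflow can be reused unchanged.
            """
            K = w.shape[0]
            Ho = (f.shape[0] - K) // s + 1
            Wo = (f.shape[1] - K) // s + 1
            out = np.zeros((Ho, Wo), dtype=f.dtype)
            for dy in range(s):
                for dx in range(s):
                    wp = w[dy::s, dx::s]
                    if wp.size == 0:                # phase has no kernel taps
                        continue
                    out += conv2d_unit(f[dy::s, dx::s], wp)[:Ho, :Wo]
            return out

    For random f and w, strided_conv_by_decomposition(f, w, 2) matches a direct stride-2 convolution such as the Cartesian-product sketch above. The abstract does not spell out the two-nibble encoding itself; one plausible reading, shown below purely as an assumption, is that each 8-bit activation is split into a high and a low 4-bit nibble so that a zero nibble can be skipped by a shift-and-add multiplier, which is where bit-level sparsity yields extra efficiency.

        def two_nibble_mac(a, w, acc=0):
            """Hypothetical two-nibble MAC: multiply an unsigned 8-bit
            activation by w nibble-wise, skipping zero nibbles."""
            hi, lo = (a >> 4) & 0xF, a & 0xF
            if hi:
                acc += (hi * w) << 4                # high nibble, shifted back
            if lo:
                acc += lo * w                       # low nibble
            return acc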
    Appears in Collections: [Graduate Institute of Electrical Engineering] Master's and Doctoral Theses

    Files in This Item:

    File         Description    Size    Format    Views
    index.html                  0Kb     HTML      21       View/Open


    All items in NCUIR are protected by copyright, with all rights reserved.

