Name: Jung-Fang Ke (柯蓉鈁)
Department: Electrical Engineering
Thesis title: A Novel Sparsity Aware Data Mapping Strategy for Mitigating Aging Effects in Systolic-array-based AI Accelerator (一個全新考量稀疏性的資料映射策略以減緩 AI 加速器中的老化效應)
Full text: viewable in the system after 2029-7-12

Abstract (Chinese)
In the field of Artificial Intelligence (AI), Convolutional Neural Networks (CNNs) are favored for their powerful feature extraction capabilities, demonstrating high performance in tasks such as image recognition, object detection, and image segmentation. To fully exploit CNNs, AI accelerators have been developed to speed up CNN computation, and one of the most widely used architectures is the systolic array. A systolic array comprises multiple Processing Elements (PEs) arranged in an array-like structure to perform Multiply-Accumulate (MAC) operations; owing to its regular and highly parallel computation, the systolic array effectively improves the overall performance of the accelerator.
However, the reliability of the PEs inside the systolic array is affected by aging effects such as Negative Bias Temperature Instability (NBTI), Positive Bias Temperature Instability (PBTI), and Hot Carrier Injection (HCI). These aging effects can cause PEs to compute incorrectly, reducing the accelerator's accuracy. In addition, owing to the sparsity of CNN models, some PEs remain idle while the systolic array is operating, which leads to uneven aging among PEs: some PEs are affected by aging more severely than others and lose their computing function prematurely, ultimately causing computation errors and accuracy degradation in the accelerator as a whole.
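These transistor aging mechanisms are commonly approximated by a power law of cumulative stress time. The following sketch is purely illustrative and not taken from the thesis; the constants `a` and `n` are hypothetical placeholders for technology-dependent fitting parameters:

```python
# Illustrative power-law aging model (not the thesis's model):
# the threshold-voltage shift of a stressed transistor is often
# approximated as dVth = a * t^n, with n around 1/6 for NBTI.

def delta_vth(stress_time_s: float, a: float = 3.0e-3, n: float = 1 / 6) -> float:
    """Approximate threshold-voltage shift (V) after stress_time_s seconds.

    a and n are hypothetical fitting constants; real values depend on
    technology node, temperature, and supply voltage.
    """
    return a * stress_time_s ** n

one_year = 365 * 24 * 3600
# A PE stressed twice as long does NOT degrade twice as much:
# the power law makes the shift grow sub-linearly with stress time.
print(delta_vth(one_year))       # fully utilized PE
print(delta_vth(one_year / 2))   # half-utilized PE
```

Because the shift is sub-linear in stress time, spreading a fixed total stress evenly over all PEs yields a smaller worst-case shift than concentrating it on a few heavily used PEs, which is the intuition behind aging balancing.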
To overcome these challenges, we propose a data mapping strategy that takes model sparsity into account, aiming to relieve aging-induced stress in the systolic array. By exploiting the sparsity of weights and inputs, our method distributes aging effects evenly across the PEs, thereby extending the overall lifespan of the accelerator. Experimental results show that the proposed data mapping strategy extends the lifespan of the AI accelerator by 1.5 to 2 times with negligible area overhead.

Abstract (English)
In the field of Artificial Intelligence (AI), Convolutional Neural Networks (CNNs) are
highly favored for their powerful feature extraction capabilities, particularly in tasks such as
image recognition, object detection, and image segmentation. To fully leverage CNNs, AI
accelerators have been developed to enhance CNN computation efficiency, with the systolic
array being one of the most widely applied architectures. The systolic array architecture
comprises multiple Processing Elements (PEs) arranged in an array-like structure to perform
Multiply-Accumulate (MAC) operations. Due to its regular and highly parallel computational
capabilities, the systolic array effectively boosts the overall performance of accelerators.
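The MAC operation of such a PE grid can be sketched as a plain functional model; this is an output-stationary illustration written for this summary, not the dataflow or design evaluated in the thesis:

```python
# Minimal sketch (an assumption for illustration): an n x m grid of PEs,
# each performing one multiply-accumulate per cycle, computes C = A @ B
# in an output-stationary fashion, with PE (i, j) accumulating C[i][j].

def systolic_matmul(a, b):
    n, k, m = len(a), len(a[0]), len(b[0])
    c = [[0] * m for _ in range(n)]
    # Each "cycle" t streams one pair of operands into every PE.
    for t in range(k):
        for i in range(n):          # PE row
            for j in range(m):      # PE column
                c[i][j] += a[i][t] * b[t][j]   # one MAC per PE per cycle
    return c

print(systolic_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # → [[19, 22], [43, 50]]
```

PE (i, j) holds the partial sum of C[i][j] and performs exactly one MAC per streamed operand pair, which is what makes the array's activity so regular and parallel.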
However, the reliability of the PEs within a systolic array is affected by aging effects, such
as Negative Bias Temperature Instability (NBTI), Positive Bias Temperature Instability (PBTI),
and Hot Carrier Injection (HCI). These aging effects can cause PE computation errors, thereby
reducing the accuracy of the accelerator. Additionally, the sparsity of CNN models results in
some PEs being underutilized during computation, leading to uneven aging among PEs. This
imbalance causes certain PEs to degrade more rapidly than others, which can lead to premature
failure of some PEs, resulting in overall computation errors and reduced accuracy.
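The imbalance can be made concrete with a small sketch (a hypothetical zero-skipping model written for this summary, not the thesis's experiment): a PE performs a MAC only when both operands are nonzero, so PEs mapped to dense rows accumulate many more active cycles than PEs mapped to sparse ones:

```python
# Illustrative sketch: count how many MACs each PE actually performs
# when zero operands are skipped. With sparse weights, some PEs work
# far more than others, so they accumulate stress and age faster.

def pe_active_macs(a, b):
    n, k, m = len(a), len(a[0]), len(b[0])
    active = [[0] * m for _ in range(n)]
    for t in range(k):
        for i in range(n):
            for j in range(m):
                if a[i][t] != 0 and b[t][j] != 0:  # nonzero MAC only
                    active[i][j] += 1
    return active

# A sparse operand matrix: row 0 is dense, row 1 is mostly zero.
a = [[1, 2, 3], [0, 0, 4]]
b = [[1, 1], [1, 0], [1, 1]]
print(pe_active_macs(a, b))  # → [[3, 2], [1, 1]]
```

The first row of PEs performs two to three times as many MACs as the second, and under the power-law aging behavior those PEs would degrade correspondingly faster.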
To address these challenges, we propose a sparsity-aware data mapping strategy to
mitigate aging-induced stress in systolic arrays. By leveraging the sparsity of weights and inputs,
our method ensures that aging effects are evenly distributed across PEs, thereby extending the
overall lifespan of the accelerator. Experimental results demonstrate that our proposed data
mapping strategy can extend the lifespan of AI accelerators by 1.5 to 2 times, with negligible
area overhead.

Keywords (Chinese): ★ AI Accelerator ★ Aging Effects
Keywords (English): ★ AI Accelerator ★ Aging Effects

Table of Contents
摘要 (Chinese Abstract) ................................................................................ii
Abstract..........................................................................................................iii
致謝 (Acknowledgements) ............................................................................iv
Table of Contents............................................................................................ v
Table of Figures............................................................................................vii
Table of Tables.............................................................................................viii
Chapter 1 Introduction................................................................................ 1
1.1 Systolic-array-based CNN Accelerator............................................ 2
1.2 Reliability Issues of CNN Accelerator............................................. 3
1.3 Contributions.................................................................................... 5
Chapter 2 Preliminaries.............................................................................. 7
2.1 Systolic-array-based CNN Accelerator............................................ 7
2.1.1 Systolic Array........................................................................ 7
2.1.2 Dataflow ................................................................................ 8
2.2 Sparsity in CNN ............................................................................. 11
2.3 Aging Effects.................................................................................. 12
2.4 Previous Works............................................................................... 13
Chapter 3 Sparsity Aware Data Mapping Strategy................................... 15
3.1 Strategy Overview.......................................................................... 15
3.2 Problem Formulation...................................................................... 18
3.3 Example of Strategy ....................................................................... 19
3.4 Hardware Design............................................................................ 21
3.4.1 Full-Hardware ..................................................................... 21
3.4.2 Heterogeneous Computing.................................................. 23
Chapter 4 Experimental Results............................................................... 25
4.1 Aging Effects on PE ....................................................................... 25
4.2 Strategy Simulation ........................................................................ 26
4.3 Accuracy Results............................................................................ 28
4.4 Hardware Overhead........................................................................ 32
4.4.1 Area and Power Overhead................................................... 32
4.4.2 Timing Overhead................................................................. 33
Chapter 5 Conclusions.............................................................................. 35
Reference...................................................................................................... 36

Advisor: Yu-Guang Chen (陳聿廣)
Review date: 2024-7-12