Name: Jung-Fang Ke (柯蓉鈁)
Department: Electrical Engineering
Thesis title: A Novel Sparsity Aware Data Mapping Strategy for Mitigating Aging Effects in Systolic-array-based AI Accelerator (一個全新考量稀疏性的資料映射策略以減緩 AI 加速器中的老化效應)
Full text: viewable in the system after 2029-7-12

Abstract (Chinese)
In the field of Artificial Intelligence (AI), Convolutional Neural Networks (CNNs) are favored for their powerful feature extraction capabilities, demonstrating high performance in tasks such as image recognition, object detection, and image segmentation. To fully exploit CNNs, AI accelerators have been developed to speed up CNN computation, and one of the most widely used architectures is the systolic array. A systolic array comprises multiple Processing Elements (PEs) arranged in an array-like structure to perform Multiply-Accumulate (MAC) operations; owing to its regular and highly parallel computation, the systolic array effectively improves the overall performance of the accelerator.
However, the reliability of the PEs inside the systolic array is affected by aging effects such as Negative Bias Temperature Instability (NBTI), Positive Bias Temperature Instability (PBTI), and Hot Carrier Injection (HCI). These aging effects can cause PEs to compute incorrectly, reducing the accelerator's accuracy. In addition, owing to the sparsity of CNN models, some PEs remain idle while the systolic array is operating, which leads to uneven aging among PEs: some PEs are affected by aging more severely than others and lose their computing function prematurely, ultimately causing computation errors and accuracy degradation in the accelerator as a whole.
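These transistor aging mechanisms are commonly approximated by a power law of cumulative stress time. The following sketch is purely illustrative and not taken from the thesis; the constants `a` and `n` are hypothetical placeholders for technology-dependent fitting parameters:

```python
# Illustrative power-law aging model (not the thesis's model):
# the threshold-voltage shift of a stressed transistor is often
# approximated as dVth = a * t^n, with n around 1/6 for NBTI.

def delta_vth(stress_time_s: float, a: float = 3.0e-3, n: float = 1 / 6) -> float:
    """Approximate threshold-voltage shift (V) after stress_time_s seconds.

    a and n are hypothetical fitting constants; real values depend on
    technology node, temperature, and supply voltage.
    """
    return a * stress_time_s ** n

one_year = 365 * 24 * 3600
# A PE stressed twice as long does NOT degrade twice as much:
# the power law makes the shift grow sub-linearly with stress time.
print(delta_vth(one_year))       # fully utilized PE
print(delta_vth(one_year / 2))   # half-utilized PE
```

Because the shift is sub-linear in stress time, spreading a fixed total stress evenly over all PEs yields a smaller worst-case shift than concentrating it on a few heavily used PEs, which is the intuition behind aging balancing.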
To overcome these challenges, we propose a data mapping strategy that takes model sparsity into account, aiming to relieve aging-induced stress in the systolic array. By exploiting the sparsity of weights and inputs, our method distributes aging effects evenly across the PEs, thereby extending the overall lifespan of the accelerator. Experimental results show that the proposed data mapping strategy extends the lifespan of the AI accelerator by 1.5 to 2 times with negligible area overhead.

Abstract (English)
In the field of Artificial Intelligence (AI), Convolutional Neural Networks (CNNs) are
highly favored for their powerful feature extraction capabilities, particularly in tasks such as
image recognition, object detection, and image segmentation. To fully leverage CNNs, AI
accelerators have been developed to enhance CNN computation efficiency, with the systolic
array being one of the most widely applied architectures. The systolic array architecture
comprises multiple Processing Elements (PEs) arranged in an array-like structure to perform
Multiply-Accumulate (MAC) operations. Due to its regular and highly parallel computational
capabilities, the systolic array effectively boosts the overall performance of accelerators.
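The MAC operation of such a PE grid can be sketched as a plain functional model; this is an output-stationary illustration written for this summary, not the dataflow or design evaluated in the thesis:

```python
# Minimal sketch (an assumption for illustration): an n x m grid of PEs,
# each performing one multiply-accumulate per cycle, computes C = A @ B
# in an output-stationary fashion, with PE (i, j) accumulating C[i][j].

def systolic_matmul(a, b):
    n, k, m = len(a), len(a[0]), len(b[0])
    c = [[0] * m for _ in range(n)]
    # Each "cycle" t streams one pair of operands into every PE.
    for t in range(k):
        for i in range(n):          # PE row
            for j in range(m):      # PE column
                c[i][j] += a[i][t] * b[t][j]   # one MAC per PE per cycle
    return c

print(systolic_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # → [[19, 22], [43, 50]]
```

PE (i, j) holds the partial sum of C[i][j] and performs exactly one MAC per streamed operand pair, which is what makes the array's activity so regular and parallel.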
However, the reliability of the PEs within a systolic array is affected by aging effects, such
as Negative Bias Temperature Instability (NBTI), Positive Bias Temperature Instability (PBTI),
and Hot Carrier Injection (HCI). These aging effects can cause PE computation errors, thereby
reducing the accuracy of the accelerator. Additionally, the sparsity of CNN models results in
some PEs being underutilized during computation, leading to uneven aging among PEs. This
imbalance causes certain PEs to degrade more rapidly than others, which can lead to premature
failure of some PEs, resulting in overall computation errors and reduced accuracy.
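The imbalance can be made concrete with a small sketch (a hypothetical zero-skipping model written for this summary, not the thesis's experiment): a PE performs a MAC only when both operands are nonzero, so PEs mapped to dense rows accumulate many more active cycles than PEs mapped to sparse ones:

```python
# Illustrative sketch: count how many MACs each PE actually performs
# when zero operands are skipped. With sparse weights, some PEs work
# far more than others, so they accumulate stress and age faster.

def pe_active_macs(a, b):
    n, k, m = len(a), len(a[0]), len(b[0])
    active = [[0] * m for _ in range(n)]
    for t in range(k):
        for i in range(n):
            for j in range(m):
                if a[i][t] != 0 and b[t][j] != 0:  # nonzero MAC only
                    active[i][j] += 1
    return active

# A sparse operand matrix: row 0 is dense, row 1 is mostly zero.
a = [[1, 2, 3], [0, 0, 4]]
b = [[1, 1], [1, 0], [1, 1]]
print(pe_active_macs(a, b))  # → [[3, 2], [1, 1]]
```

The first row of PEs performs two to three times as many MACs as the second, and under the power-law aging behavior those PEs would degrade correspondingly faster.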
To address these challenges, we propose a sparsity-aware data mapping strategy to
mitigate aging-induced stress in systolic arrays. By leveraging the sparsity of weights and inputs,
our method ensures that aging effects are evenly distributed across PEs, thereby extending the
overall lifespan of the accelerator. Experimental results demonstrate that our proposed data
mapping strategy can extend the lifespan of AI accelerators by 1.5 to 2 times, with negligible
area overhead.

Keywords (Chinese): ★ AI Accelerator ★ Aging Effects
Keywords (English): ★ AI Accelerator ★ Aging Effects

Table of Contents
摘要 (Chinese Abstract) ................................................................................ii
Abstract..........................................................................................................iii
致謝 (Acknowledgements) ............................................................................iv
Table of Contents............................................................................................ v
Table of Figures............................................................................................vii
Table of Tables.............................................................................................viii
Chapter 1 Introduction................................................................................ 1
1.1 Systolic-array-based CNN Accelerator............................................ 2
1.2 Reliability Issues of CNN Accelerator............................................. 3
1.3 Contributions.................................................................................... 5
Chapter 2 Preliminaries.............................................................................. 7
2.1 Systolic-array-based CNN Accelerator............................................ 7
2.1.1 Systolic Array........................................................................ 7
2.1.2 Dataflow ................................................................................ 8
2.2 Sparsity in CNN ............................................................................. 11
2.3 Aging Effects.................................................................................. 12
2.4 Previous Works............................................................................... 13
Chapter 3 Sparsity Aware Data Mapping Strategy................................... 15
3.1 Strategy Overview.......................................................................... 15
3.2 Problem Formulation...................................................................... 18
3.3 Example of Strategy ....................................................................... 19
3.4 Hardware Design............................................................................ 21
3.4.1 Full-Hardware ..................................................................... 21
3.4.2 Heterogeneous Computing.................................................. 23
Chapter 4 Experimental Results............................................................... 25
4.1 Aging Effects on PE ....................................................................... 25
4.2 Strategy Simulation ........................................................................ 26
4.3 Accuracy Results............................................................................ 28
4.4 Hardware Overhead........................................................................ 32
4.4.1 Area and Power Overhead................................................... 32
4.4.2 Timing Overhead................................................................. 33
Chapter 5 Conclusions.............................................................................. 35
Reference...................................................................................................... 36

Advisor: Yu-Guang Chen (陳聿廣)
Review date: 2024-7-12