Built-In Self-Test and Repair Techniques for Systolic Array-Based AI Accelerators

Author

周士淳 (Shih-Chun Chou)

Department

Department of Electrical Engineering

Thesis title

Built-In Self-Test and Repair Techniques for Systolic Array-Based AI Accelerators (應用於脈動陣列深度神經網路加速器之自我測試與修復技術)

Related theses

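★ Low-Power Design and Test Techniques for Ternary Content-Addressable Memories
★ Interconnect Verification Algorithms for Random-Access Memories
★ Infrastructure IP for Reducing Test Data in Systems-on-Chip
★ Neighborhood Pattern-Sensitive Fault Test Algorithms for Content-Addressable Memories
★ Crosstalk Fault Testing of Address and Data Buses in Embedded Memories
★ Self-Repair Techniques for Single-Port and Multi-Port Memories in Systems-on-Chip
★ Infrastructure IP for Repairing Embedded Memories
★ Redundancy Analysis Evaluation and Verification Platform for Self-Repairable Memories
★ Low-Power Ternary Content-Addressable Memory Design Using Double-Stacked Multiply-Accumulate Match Lines
★ Design of a Self-Testable and Cost-Effective Memory-Based Fast Fourier Transform Processor
★ Design of Low-Power and Self-Repairable Ternary Content-Addressable Memories
★ Diagnosis Methods for Multi-Core Systems-on-Chip
★ Infrastructure IP for Testing and Repairing Random-Access Memories in Networks-on-Chip
★ Efficient Diagnosis and Repair Techniques for Drowsy SRAMs
★ Efficient Diagnostic Data Compression and Testability Schemes for Embedded Memories
★ Effective Yield and Reliability Enhancement Techniques for Random-Access Memories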

Full text: available in the online system from 2029-8-21

Abstract

Deep neural networks (DNNs) are widely used in artificial intelligence applications, and systolic array-based accelerators are commonly used to speed up DNN computation. Such an accelerator contains many identical processing elements (PEs), so testing it with conventional scan and automatic test pattern generation (ATPG) techniques is time-consuming and not cost-effective. In this thesis, we propose a diagonally pipelined scan (DiPS) test scheme for systolic array-based accelerators. The DiPS scheme generates test patterns for a single PE and applies them to all PEs of the array in a diagonally pipelined manner, which gives it low test complexity and good scalability. DiPS achieves 100% delay fault coverage for the multiply-and-accumulate (MAC) circuit. For an n × m PE array, its test application time complexity is (P × (S + 1) + S + m + n − 1) × 2, where P is the number of required test patterns and S is the scan chain length of a single PE. Compared with existing work, the part of the test application time complexity that depends on the array size is reduced from (m + n) × P to (m + n − 1) × 2.

Furthermore, we propose a fault location method based on DiPS. A faulty PE is located by comparing the scan outputs of two adjacent PEs on the same diagonal, which also reduces the number of scan outputs of the PE array from n × m/2 to (m + n − 2)/2.

Finally, we design a built-in self-test and built-in self-repair (BISR) circuit for a systolic PE array with one spare column. Assuming a PE error rate between 0.1% and 1% for a 32 × 32 array, the BISR scheme improves yield by 8.79% to 62.09%.
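The test-time and scan-output expressions above can be evaluated directly. The sketch below does so in Python and also illustrates one plausible reading of the m + n − 1 term: if a pattern enters at PE(0, 0) and advances one PE per cycle to the right and downward (an assumption, not stated in the abstract), PE(i, j) lags PE(0, 0) by i + j cycles, and an n × m array has m + n − 1 such diagonals. The function names and the P and S values in the usage lines are hypothetical.

```python
def dips_test_time(P: int, S: int, n: int, m: int) -> int:
    """DiPS test application time for an n x m PE array, as quoted in the abstract:
    (P * (S + 1) + S + m + n - 1) * 2, with P = number of test patterns and
    S = scan chain length of one PE. The unit (clock cycles) is an assumption."""
    return (P * (S + 1) + S + m + n - 1) * 2


def scan_outputs(n: int, m: int) -> tuple:
    """Scan-output counts quoted in the abstract: n*m/2 before, (m+n-2)/2 after."""
    return (n * m / 2, (m + n - 2) / 2)


def num_diagonals(n: int, m: int) -> int:
    """Count distinct wavefronts i + j, assuming (hypothetically) that a pattern
    advances one PE per cycle rightward and downward from PE(0, 0)."""
    return len({i + j for i in range(n) for j in range(m)})


if __name__ == "__main__":
    n = m = 32
    print(num_diagonals(n, m))                    # 63, i.e. m + n - 1
    print(scan_outputs(n, m))                     # (512.0, 31.0)
    print(dips_test_time(P=10, S=20, n=n, m=m))   # 586 for these hypothetical P and S
```

Note that P multiplies only the single-PE term P × (S + 1), so adding rows or columns adds a fixed 2 × (m + n − 1) rather than scaling with the pattern count.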
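The abstract locates a faulty PE by comparing the scan outputs of adjacent PEs on the same diagonal. The following is a minimal decoding sketch, not the thesis's circuit: it assumes every PE on a diagonal receives the same patterns (so fault-free PEs produce identical signatures) and that at most one PE on the diagonal is faulty. Under those assumptions, a faulty PE k makes comparisons k − 1 and k mismatch, which pins it down once the diagonal has at least three PEs. The function name and signature values are hypothetical.

```python
from typing import Optional, Sequence


def locate_faulty_pe(outputs: Sequence[int]) -> Optional[int]:
    """Locate a faulty PE on one diagonal from adjacent-pair comparisons.

    outputs[k] is a (hypothetical) scan-out signature of the k-th PE on the diagonal.
    Assumes fault-free PEs produce identical signatures and at most one PE is faulty.
    Returns None if no comparison mismatches.
    """
    # One comparison per adjacent pair: True means the two signatures differ.
    mismatch = [outputs[k] != outputs[k + 1] for k in range(len(outputs) - 1)]
    bad = [k for k, differs in enumerate(mismatch) if differs]
    if not bad:
        return None
    last = len(outputs) - 1
    if last < 2:
        # With only two PEs, one mismatch cannot tell which PE is faulty.
        raise ValueError("ambiguous: need at least three PEs on the diagonal")
    # A faulty PE k flips comparisons k-1 and k (whichever of them exist).
    if bad == [0]:
        return 0
    if bad == [last - 1]:
        return last
    if len(bad) == 2 and bad[1] == bad[0] + 1:
        return bad[1]
    raise ValueError("mismatch pattern is inconsistent with a single faulty PE")


if __name__ == "__main__":
    print(locate_faulty_pe([0xA5, 0x3C, 0xA5, 0xA5]))  # 1
```

Comparing neighbours instead of checking each PE against a stored golden response is what lets the outputs be shared between pairs of PEs, which is consistent with the reduction in scan outputs described above.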
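To show how a spare column raises yield, here is a textbook-style estimate under assumptions that are ours, not the thesis's: PE faults are independent with probability p, the single spare column can replace any one column that contains faulty PEs, and the spare itself must be fault-free when used. The function names are hypothetical, and the printed numbers are illustrative only; they should not be compared directly with the 8.79% to 62.09% figures reported above, which come from the thesis's own fault model and repair flow.

```python
def yield_without_repair(p: float, n: int, m: int) -> float:
    """Probability that all n*m PEs are fault-free, assuming independent PE faults."""
    return (1.0 - p) ** (n * m)


def yield_with_spare_column(p: float, n: int, m: int) -> float:
    """Yield with one spare column of n PEs under a simple hypothetical model:
    the spare replaces any single column containing faulty PEs, and the spare
    itself must be fault-free when it is used."""
    q = (1.0 - p) ** n                       # a column of n PEs is fault-free
    all_good = q ** m                        # no repair needed
    # Exactly one faulty column (m choices); the final factor q is the spare being good.
    one_column_repaired = m * (1.0 - q) * q ** (m - 1) * q
    return all_good + one_column_repaired


if __name__ == "__main__":
    for p in (0.001, 0.005, 0.01):
        base = yield_without_repair(p, 32, 32)
        repaired = yield_with_spare_column(p, 32, 32)
        print(f"p={p:.3f}  no repair={base:.4f}  spare column={repaired:.4f}")
```

This model ignores the area of the spare column and any faults in the BISR logic itself; both would lower the effective gain.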

Keywords

★ Systolic array
★ Testing
★ BISR
★ Self-test
★ Self-repair

Table of Contents

1 Introduction
1.1 Deep Neural Network
1.2 Systolic Array-Based Accelerator Architecture
1.3 Impact of Fault on Systolic Array-Based Accelerator
1.4 Testing of Systolic Array-Based Accelerators
1.5 Motivation
1.6 Contribution
1.7 Thesis Organization
2 Proposed Diagonally Pipelined Scan Test Scheme
2.1 PE Level Test Procedure
2.2 Test Architecture
2.3 Array Level Test Procedure
2.4 Experiment Result
2.4.1 Fault Coverage
2.4.2 DfT Circuit Area Overhead for Different Array Sizes
2.4.3 Test Time
2.4.4 PE Partition Analysis
2.4.5 Comparison
3 Proposed Reducing Scan Outputs Method and Built-in Self-repair Scheme
3.1 Reducing Scan Outputs Method
3.1.1 Fault Location Concept
3.1.2 Fault Location Method
3.1.3 Reducing Scan Outputs Architecture
3.1.4 Aliasing Analysis
3.1.5 Scan Chain Fault Location Method
3.2 Built-in Self-repair Design
3.2.1 Built-in Self-repair Flow
3.2.2 Built-in Self-repair Architecture
3.2.3 Built-in Self-repair Design with PE Partition
3.3 Experiment Result
3.3.1 Built-in Self-repair Hardware Simulation
3.3.2 Area Overhead
3.3.3 Yield Analysis
4 Conclusion and Future Work

Advisor

李進福(Jin-Fu Li)

Approval date

2024-8-21

推文