Thesis 107521030: Detailed Record




Author: Yung-Yu Tsai (蔡永聿)    Department: Department of Electrical Engineering
Thesis Title: 用於評估深度神經網路與加速器之錯誤容忍度模擬器
(A Simulator for Evaluating the Fault-Tolerance Capability of Deep Neural Networks and Accelerators)
Related Theses
★ 應用於三元內容定址記憶體之低功率設計與測試技術
★ 用於隨機存取記憶體的接線驗證演算法
★ 用於降低系統晶片內測試資料之基礎矽智產
★ 內容定址記憶體之鄰近區域樣型敏感瑕疵測試演算法
★ 內嵌式記憶體中位址及資料匯流排之串音瑕疵測試
★ 用於系統晶片中單埠與多埠記憶體之自我修復技術
★ 用於修復嵌入式記憶體之基礎矽智產
★ 自我修復記憶體之備份分析評估與驗證平台
★ 使用雙倍疊乘累加命中線之低功率三元內容定址記憶體設計
★ 可自我測試且具成本效益之記憶體式快速傅利葉轉換處理器設計
★ 低功率與可自我修復之三元內容定址記憶體設計
★ 多核心系統晶片之診斷方法
★ 應用於網路晶片上隨機存取記憶體測試及修復之基礎矽智產
★ 應用於貪睡靜態記憶體之有效診斷與修復技術
★ 應用於內嵌式記憶體之高效率診斷性資料壓縮與可測性方案
★ 應用於隨機存取記憶體之有效良率及可靠度提升技術
Full Text: viewable in the thesis system after 2026-8-23.
Abstract (Chinese): Deep neural networks (DNNs) have been widely used in artificial-intelligence applications, some of which are safety-sensitive; reliability is a key metric when designing electronics for such applications. Although DNNs are considered inherently fault tolerant, DNN accelerators that adopt computation-reduction techniques, together with hardware faults, may greatly reduce this fault tolerance. In this thesis we propose a simulator for evaluating the fault-tolerance capability of deep neural networks and accelerators, which examines DNN fault tolerance from both the software and the hardware perspective. The simulator is built on TensorFlow and Keras and integrates quantization and fault-injection libraries implemented with tensor operations, so it applies to all kinds of DNN layers. The simulator can therefore help users analyze fault tolerance during the accelerator design-planning stage and optimize DNN models and accelerators. We evaluate the fault tolerance of various DNN models with 4 to 50 layers, quantized to 8-bit or 16-bit fixed point while keeping the accuracy loss below 1%. In the accelerator evaluation, the tested buffer-memory sizes range from 19.2KB to 904KB and the processing-element array sizes from 8×8 to 32×32. The analysis shows that the fault tolerance of different accelerator components differs greatly: robust components can withstand several orders of magnitude more faults, whereas vulnerable components suffer large accuracy degradation from only a few critical faults. From the simulation results we identify several causes of critical faults, so that fault-mitigation mechanisms can be designed for these weak points. We implement an accelerator with an 8×8 systolic array on a Xilinx ZCU-102 FPGA to run inference of the LeNet-5 model; the average error between the simulator and the FPGA is below 6.3%.
Abstract (English): Deep neural networks (DNNs) have been widely used for artificial intelligence applications, some of which are safety-critical. Reliability is a key metric for designing electronic systems for safety-critical applications. Although DNNs have inherent fault-tolerance capability, the computation-reduction techniques used in DNN accelerators, combined with hardware faults, can drastically reduce this capability. In this thesis, we propose a simulator for evaluating the fault-tolerance capability of DNN models and accelerators; it evaluates the fault tolerance of DNNs at both the software and hardware levels. The proposed simulator is developed on the TensorFlow and Keras frameworks. We implement tensor-operation-based libraries for quantization and fault injection that are scalable to different types of DNN layers. Designers can use the simulator to analyze fault-tolerance capability in the design phase so that the reliability of DNN models and accelerators can be optimized. We analyze the fault-tolerance capability of a wide range of DNNs with 4 to 50 layers. The data are quantized to 8-bit or 16-bit fixed point with an accuracy drop under 1%. Accelerators with on-chip memory from 19.3KB to 904KB and PE array sizes from 8×8 to 32×32 are simulated. The analysis results show that the fault-tolerance capability of the different parts of a DNN accelerator varies widely: stronger parts can tolerate several orders of magnitude more faults, while a few critical faults in weaker parts can drastically degrade the inference accuracy. We identify several causes of critical faults on which fault-mitigation resources can focus. We also implement an accelerator with an 8×8 systolic array on a Xilinx ZCU-102 FPGA running the LeNet-5 model; the average error between the FPGA and simulator results is within 6.3%.
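The following sketch illustrates the tensor-operation approach described in the abstract: quantization and bit-flip fault injection written purely with TensorFlow/Keras tensor operations and applied to the weights of a small convolution layer. It is an illustrative example, not the thesis simulator; the function names (quantize_fixed_point, inject_bit_flips), the 8-bit word with 4 fractional bits, the toy model, and the fault rate are assumptions made for this example.

    # Illustrative sketch only, not the thesis code. It mimics the two
    # tensor-operation primitives described in the abstract: fixed-point
    # quantization and random bit-flip fault injection on a Keras layer's
    # weights. Names and the fixed-point format are assumptions.
    import tensorflow as tf

    WORD_LENGTH = 8   # assumed 8-bit fixed-point word
    FRAC_BITS = 4     # assumed number of fractional bits

    def quantize_fixed_point(x, word_length=WORD_LENGTH, frac_bits=FRAC_BITS):
        """Round x onto a signed fixed-point grid of the given word length."""
        scale = 2.0 ** frac_bits
        qmax = 2.0 ** (word_length - 1) - 1
        q = tf.clip_by_value(tf.round(x * scale), -qmax - 1, qmax)
        return q / scale

    def inject_bit_flips(x, fault_rate, word_length=WORD_LENGTH, frac_bits=FRAC_BITS):
        """Flip each stored bit of the fixed-point words independently with probability fault_rate."""
        scale = 2.0 ** frac_bits
        ints = tf.cast(tf.round(x * scale), tf.int32)
        # Keep only the low word_length bits (two's-complement view of the word).
        words = tf.bitwise.bitwise_and(ints, 2 ** word_length - 1)
        for bit in range(word_length):
            flips = tf.cast(tf.random.uniform(tf.shape(words)) < fault_rate, tf.int32)
            words = tf.bitwise.bitwise_xor(words, flips * (1 << bit))
        # Sign-extend the word back to a signed integer, then rescale.
        signed = tf.where(words >= 2 ** (word_length - 1), words - 2 ** word_length, words)
        return tf.cast(signed, x.dtype) / scale

    # Usage: quantize and corrupt the weights of one convolution layer in a toy model.
    model = tf.keras.Sequential([
        tf.keras.layers.Conv2D(6, 5, activation="relu", input_shape=(28, 28, 1)),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(10, activation="softmax"),
    ])
    conv = model.layers[0]
    kernel, bias = conv.get_weights()
    faulty_kernel = inject_bit_flips(quantize_fixed_point(kernel), fault_rate=1e-4)
    conv.set_weights([faulty_kernel.numpy(), bias])

Re-evaluating the model before and after such an injection yields the kind of accuracy-drop comparison reported in the abstract.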
Keywords (Chinese): ★ 深度神經網路
★ 錯誤容忍度
★ 硬體加速器
Keywords (English): ★ Deep Neural Network
★ Fault-Tolerance Capability
★ Hardware Accelerator
Table of Contents
1 Introduction
  1.1 Deep Neural Network
  1.2 Fault-Tolerance Capability of Neural Networks
  1.3 Previous Work
  1.4 Motivation
  1.5 Contribution
  1.6 Thesis Organization
2 Simulator for DNN Fault-Tolerance Capability Evaluation
  2.1 Inherent Fault-Tolerance Capability of Deep Learning
  2.2 Simulator Role in Design Flow
  2.3 Simulator Structure and Simulation Flow
  2.4 Quantization Techniques
    2.4.1 Quantization Operation
    2.4.2 Quantization in Layer
    2.4.3 Fuse Batch Normalization
    2.4.4 Analysis Result
  2.5 Model-Level Fault Injection Mechanism
  2.6 Defined Metrics of Fault-Tolerance Capability
3 Simulating Approaches for DNN Accelerators with Faults
  3.1 Hardware Fault
  3.2 DNN Inference Systems
  3.3 DNN Computation Flow in Hardware
    3.3.1 Data Reuse
    3.3.2 Partial Sum Index
  3.4 Tile-based Design
    3.4.1 Loop Tiling
    3.4.2 Fault Duplication
  3.5 Data Mapping
    3.5.1 Spatial and Temporal Description of Dataflow
    3.5.2 Transformation Between On-Chip Memory and Tile
    3.5.3 Transformation Between PE Array and Tile
    3.5.4 Data Contamination
  3.6 Fault Behavior
    3.6.1 Computation Unit Description
    3.6.2 Memory Fault Injection
    3.6.3 PE Array Fault Injection
    3.6.4 Run Simulation
4 Simulation Result and Analysis
  4.1 Fault-Tolerance Capability Analysis of Models
    4.1.1 Data Types
    4.1.2 Model Depth
    4.1.3 Fault Distribution
  4.2 Fault-Tolerance Capability Analysis of Accelerators
    4.2.1 On-Chip Memory
    4.2.2 PE Array
  4.3 Observation
  4.4 Fault Mitigation Strategy
5 FPGA Platform Validation
  5.1 Validation of DNN Model and Accelerator
  5.2 Xilinx ZCU-102 FPGA Implementation
  5.3 Comparison Result
6 Conclusion and Future Work
  6.1 Conclusion
  6.2 Future Work
Advisor: Jin-Fu Li (李進福)    Date of Approval: 2021-8-27