Master's/Doctoral Thesis 106521040 — Detailed Record




Author: Kai-Yun Deng (鄧凱云)    Department: Electrical Engineering
Title: Built-In Self-Repair Scheme for SRAMs in Deep Neural Network Accelerators
(應用於深度神經網絡加速器中靜態隨機存取記憶體之內建自我修復技術)
Related theses
★ Low-Power Design and Test Techniques for Ternary Content Addressable Memories
★ Interconnection Verification Algorithms for Random Access Memories
★ Infrastructure IP for Test-Data Reduction in SOCs
★ Test Algorithms for Neighborhood Pattern-Sensitive Faults in Content Addressable Memories
★ Crosstalk-Fault Testing of Address and Data Buses in Embedded Memories
★ Self-Repair Techniques for Single-Port and Multi-Port Memories in SOCs
★ Infrastructure IP for Repairing Embedded Memories
★ A Redundancy-Analysis Evaluation and Verification Platform for Self-Repairable Memories
★ Low-Power Ternary Content Addressable Memory Design Using Double Multiply-Accumulate Match Lines
★ Design of a Self-Testable and Cost-Effective Memory-Based FFT Processor
★ Low-Power and Self-Repairable Ternary Content Addressable Memory Design
★ Diagnosis Methods for Multi-Core SOCs
★ Infrastructure IP for Testing and Repairing RAMs in Network-on-Chips
★ Efficient Diagnosis and Repair Techniques for Drowsy SRAMs
★ Efficient Diagnostic Data Compression and Testability Schemes for Embedded Memories
★ Efficient Yield- and Reliability-Enhancement Techniques for RAMs
Full text: available in the repository after 2025-08-20.
Abstract (Chinese) Deep neural networks (DNNs) have been widely used in artificial-intelligence applications. A typical accelerator in a DNN system contains static random access memories (SRAMs) for temporary data buffering. In this thesis, we propose an efficient built-in self-repair (BISR) scheme to enhance the yield of the SRAMs in DNN accelerators. In the first part of the thesis, we propose a swapping mechanism that raises memory yield under a bounded loss of inference accuracy; the mechanism can be combined with existing built-in redundancy analysis (BIRA) algorithms. We implement two BIRA schemes incorporating the swapping mechanism: a local-repair-most (LRM) algorithm and an exhaustive BIRA algorithm. Simulation results show that, for a 256-Kbyte memory with faults injected according to a Poisson distribution with mean 0.2~1.0 (1.0~3.0), the modified LRM and exhaustive BIRA algorithms improve the repair rate by about 3.4% (30.7%) and 3.5% (27.3%), respectively, while sacrificing at most 0.10% (0.73%) and 0.12% (0.95%) of the inference accuracy of the MobileNet and ResNet-50 models. In the second part of the thesis, we provide an automated evaluation and verification platform for the proposed BIRA schemes. In the platform, a BIRA compiler generates the register-transfer-level (RTL) designs of the proposed BIRAs; an evaluation tool predicts the repair rate and inference accuracy for the SRAMs of a given accelerator running a specified DNN model; and the verification part generates Verilog testbenches to verify the BIRA RTL designs.
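The swap-based repair idea summarized in the abstract can be sketched as a toy simulation: spares repair what they can, and a small budget of leftover faulty words is "swapped" (remapped to words whose corruption the DNN tolerates) instead of failing the chip. This is only an illustrative sketch under assumed simplifications — the greedy LRM approximation, the function names, and the `max_swaps` budget are all assumptions, not the thesis's actual algorithm.

```python
def local_repair_most(fault_map, spare_rows, spare_cols):
    """Greedy LRM-style spare allocation: repeatedly repair the row or
    column that currently holds the most faults (illustration only)."""
    faults = set(fault_map)
    while faults and (spare_rows or spare_cols):
        rows, cols = {}, {}
        for r, c in faults:
            rows[r] = rows.get(r, 0) + 1
            cols[c] = cols.get(c, 0) + 1
        best_row = max(rows, key=rows.get)
        best_col = max(cols, key=cols.get)
        if spare_rows and (not spare_cols or rows[best_row] >= cols[best_col]):
            faults = {f for f in faults if f[0] != best_row}  # spend a spare row
            spare_rows -= 1
        else:
            faults = {f for f in faults if f[1] != best_col}  # spend a spare column
            spare_cols -= 1
    return faults  # faults the spares could not cover

def repairable_with_swap(fault_map, spare_rows, spare_cols, max_swaps):
    """Declare the memory repairable if the faulty words left over after
    spare allocation fit within the swap budget."""
    leftover = local_repair_most(fault_map, spare_rows, spare_cols)
    leftover_words = {r for r, _ in leftover}  # one swap per faulty word (row)
    return len(leftover_words) <= max_swaps
```

With one spare row and one spare column, three faults on a diagonal are unrepairable by spares alone, but a swap budget of one word recovers the chip — the mechanism by which the thesis trades a bounded accuracy loss for yield.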
Abstract (English) Deep neural networks (DNNs) have been widely used for artificial intelligence applications. An accelerator in a DNN system typically has static random access memories (SRAMs) for data buffering. In this thesis, we propose an efficient built-in self-repair scheme for enhancing the yield of SRAMs in the accelerator of a DNN system. In the first part of this thesis, a swapping mechanism is proposed to increase the yield under a constraint on inference-accuracy reduction. The swapping mechanism can be integrated into existing built-in redundancy analysis (BIRA) algorithms. A local-repair-most (LRM) algorithm and an exhaustive BIRA algorithm are modified to include the swapping mechanism. Simulation results show that the modified LRM and exhaustive BIRA schemes gain about 3.4% (30.7%) and 3.5% (27.3%) in repair rate while sacrificing at most 0.10% (0.73%) and 0.12% (0.95%) of inference accuracy for MobileNet and ResNet-50, respectively, for a 256-Kbyte memory with a 2D redundancy configuration and faults injected with a Poisson-distribution mean of 0.2~1.0 (1.0~3.0). In the second part of this thesis, we present an automation, evaluation, and verification platform for the proposed BIRA schemes. In the platform, a BIRA compiler generates the RTL of the proposed BIRA schemes. An evaluation tool estimates the repair rate and inference accuracy for the SRAMs in a given accelerator executing a given DNN model. Finally, the platform can generate a Verilog testbench for the verification of the RTL designs of the BIRAs.
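The evaluation flow described above — inject a Poisson-distributed number of faults into a memory, run redundancy analysis, and report the fraction of repairable chips — might be approximated by a Monte-Carlo sketch like the following. The naive row-first cover check and every name here are illustrative assumptions, not the platform's actual tool.

```python
import math
import random

def poisson_sample(mean):
    """Sample a fault count from a Poisson distribution (Knuth's method)."""
    limit = math.exp(-mean)
    k, p = 0, 1.0
    while True:
        p *= random.random()
        if p <= limit:
            return k
        k += 1

def naive_repairable(faults, spare_rows, spare_cols):
    """Toy repairability check: greedily spend spare rows on the worst
    rows, then see whether spare columns cover what remains."""
    remaining = set(faults)
    for _ in range(spare_rows):
        if not remaining:
            break
        counts = {}
        for r, _ in remaining:
            counts[r] = counts.get(r, 0) + 1
        worst = max(counts, key=counts.get)
        remaining = {f for f in remaining if f[0] != worst}
    return len({c for _, c in remaining}) <= spare_cols

def estimate_repair_rate(n_chips, mean, rows, cols, spare_rows, spare_cols):
    """Fraction of simulated memories whose injected faults are repairable."""
    repaired = 0
    for _ in range(n_chips):
        n = poisson_sample(mean)
        faults = {(random.randrange(rows), random.randrange(cols))
                  for _ in range(n)}
        repaired += naive_repairable(faults, spare_rows, spare_cols)
    return repaired / n_chips
```

Sweeping `mean` over the abstract's 0.2~1.0 and 1.0~3.0 ranges in such a harness would reproduce the shape of a repair-rate-versus-fault-density study, though the absolute numbers depend entirely on the redundancy-analysis algorithm plugged in.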
Keywords (Chinese) ★ built-in self-repair (內建自我修復技術)
★ repair rate (修復率)
★ built-in redundancy analysis (內建備份元件分析技術)
Keywords (English) ★ built-in self-repair
★ repair rate
★ built-in redundancy analysis
Table of Contents
1 Introduction
1.1 Deep Neural Network System
1.1.1 Deep Neural Network
1.1.2 Neural Network Acceleration System
1.2 Memory Built-In Self-Repair Techniques
1.2.1 BISR Architecture
1.2.2 Built-In Redundancy Analysis
1.3 Error Tolerance
1.4 Thesis Motivation and Contribution
1.5 Thesis Organization
2 Proposed BISR Scheme for Memories in DNN Systems
2.1 Concept of Swap Mechanism
2.1.1 Swap Mechanism Variability
2.1.2 RA Schemes with Swap Mechanism
2.2 Proposed Heuristic BIRA Algorithm with Swap Mechanism
2.2.1 Local Bitmap
2.2.2 Built-In Redundancy Analysis Algorithm
2.2.3 Design of the BIRA Circuit
2.3 Proposed Exhaustive BIRA Algorithm with Swap Mechanism
2.3.1 CRESTA
2.3.2 Built-In Redundancy Analysis Algorithm
2.3.3 Design of the BIRA Circuit
2.4 Summary
3 Evaluation and Verification Platform
3.1 Overall Flow
3.2 Evaluation Platform
3.2.1 Fault Map Generator
3.2.2 Redundancy Analysis Categories in DNN Accelerator
3.2.3 Evaluation Outcome
3.3 Verification Platform
3.3.1 Verifying Simulation and Design Results
3.3.2 Automatic RTL Generation of Proposed BIRA
4 Experimental Results and Analysis
4.1 Repair Rate Analysis
4.1.1 Repair Rate of the HRA-SW Algorithm
4.1.2 Repair Rate of the ERA-SW Algorithm
4.1.3 Repair Rate Comparison Results
4.2 Inference Accuracy Simulation
4.2.1 Number of Swap Words and Inference Accuracy
4.2.2 Different Sizes of Memory and Inference Accuracy
4.2.3 Two RA Schemes and Inference Accuracy
4.3 Area Overhead
4.3.1 Area Overhead of the HRA-SW Scheme
4.3.2 Area Overhead of the ERA-SW Scheme
5 Conclusion and Future Work
Advisor: Jin-Fu Li (李進福)    Approval date: 2020-08-24
