Co-Development of Software and Hardware for Machine Learning System-on-a-Chip Applied to Edge Devices

Name

Guei-Cheng Hu (胡桂誠)

Department

In-Service Master Program, Department of Computer Science and Information Engineering

Thesis Title

Co-Development of Software and Hardware for Machine Learning System-on-a-Chip Applied to Edge Devices
(應用於邊緣裝置的機器學習系統晶片軟硬體共同開發)

File

Full text viewable in the system (open after 2029-6-5)

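Abstract (translated from Chinese)

This study develops a machine learning system-on-a-chip (MLSoC) that combines a Probabilistic Neural Network (PNN) with RISC-V, pairing the benefits of hardware acceleration with the general-purpose flexibility of a microprocessor to support high-performance, highly customizable machine learning applications. RISC-V custom instructions and interrupt timing design optimize data transfer and processing flow between software and hardware, improving overall operating efficiency. The study adopts the MIAT system design methodology to achieve a highly modular design and a more flexible system architecture. In addition, to optimize the use of memory and computing resources in embedded systems, this study proposes a variable-precision neural network development framework that lets developers adjust precision as needed.
Experimental results show that the MLSoC segments a 64x48 image in 66 milliseconds, about 21 microseconds per pixel, while consuming 0.00504 mWh of energy, indicating that the system delivers efficient computation at low power. The system also shows good flexibility and accuracy across different precision settings.
This study presents a high-performance, low-power, easily extensible software-hardware solution for machine learning. The MLSoC design shows particular promise for industrial applications and suits scenarios that require real-time image processing and object recognition. The results also provide a practical reference model that may help future work implement more efficient machine learning solutions on FPGAs and promote broader medical and industrial applications.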
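The timing and energy figures quoted in the abstract can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only: it assumes the 0.00504 mWh figure is the energy for one 64x48 image and that it covers exactly the 66 ms processing window, which the abstract does not state explicitly. All variable names are chosen here for illustration.

```python
# Figures quoted in the abstract.
WIDTH, HEIGHT = 64, 48      # image size in pixels
FRAME_TIME_S = 66e-3        # 66 ms to segment one image
ENERGY_MWH = 0.00504        # reported energy, in milliwatt-hours

pixels = WIDTH * HEIGHT                            # 3072 pixels per image
time_per_pixel_us = FRAME_TIME_S / pixels * 1e6    # about 21.5 microseconds

# Unit conversion: mWh -> Wh -> joules (1 Wh = 3600 J).
energy_j = ENERGY_MWH * 1e-3 * 3600                # about 0.018 J

# Assumption: the energy figure spans exactly the 66 ms window.
avg_power_w = energy_j / FRAME_TIME_S              # about 0.27 W
frames_per_second = 1.0 / FRAME_TIME_S             # about 15 images per second

print(f"{pixels} pixels, {time_per_pixel_us:.2f} us/pixel")
print(f"{energy_j * 1e3:.3f} mJ per image, {avg_power_w:.3f} W average")
print(f"{frames_per_second:.1f} images per second")
```

Under that assumption, the per-pixel time agrees with the roughly 21 microseconds stated in the abstract, and the implied average power is a few hundred milliwatts.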

Abstract (English)

This study aims to develop a machine learning system-on-a-chip (MLSoC) that integrates a Probabilistic Neural Network (PNN) with RISC-V, leveraging the advantages of hardware acceleration while maintaining the versatility of a microprocessor to achieve high performance and highly customizable machine learning applications. The system optimizes data transfer and processing workflows between software and hardware through custom instructions and interrupt handling, enhancing overall system efficiency. The study employs the MIAT system design methodology to achieve a highly modular design, improving the flexibility of the system architecture. Additionally, to address the challenges of memory and computational resource limitations in embedded systems, this study proposes a variable precision neural network development framework, allowing developers to adjust precision according to their needs.
Experimental results show that the developed MLSoC can complete the segmentation of a 64x48 image in 66 milliseconds, with each pixel processed in approximately 21 microseconds, demonstrating that the system can provide efficient computational performance while maintaining low power consumption. Furthermore, the system exhibits good flexibility and accuracy under different precision settings.
This research provides an efficient, low-power, and scalable hardware solution for machine learning. The MLSoC design has significant potential in industrial applications and is especially suited to scenarios requiring real-time image processing and object recognition. The outcomes of this research also offer a practical reference model for other researchers, facilitating the development of more efficient machine learning solutions on FPGAs and thereby advancing broader application development.
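For readers unfamiliar with the PNN and the idea of adjustable precision mentioned in the abstract, the following is a minimal software sketch, not the thesis's hardware design. It assumes the classic Parzen-window formulation of a PNN with a single shared smoothing parameter `sigma`, and it models "variable precision" simply as rounding features to a fixed-point grid with `frac_bits` fractional bits. The function names, the sample data, and the quantization scheme are all hypothetical.

```python
import math

def quantize(value, frac_bits):
    """Round value to a fixed-point grid with frac_bits fractional bits."""
    scale = 1 << frac_bits
    return round(value * scale) / scale

def pnn_classify(x, training_set, sigma, frac_bits=None):
    """Return the label whose Gaussian kernel density estimate at x is largest.

    training_set maps each class label to a list of feature vectors.
    frac_bits=None keeps float precision; an integer quantizes every
    feature (input and stored patterns) before distances are computed.
    """
    def q(vec):
        if frac_bits is None:
            return list(vec)
        return [quantize(c, frac_bits) for c in vec]

    xq = q(x)
    best_label, best_score = None, -1.0
    for label, patterns in training_set.items():
        total = 0.0
        for pattern in patterns:
            pq = q(pattern)
            dist2 = sum((a - b) ** 2 for a, b in zip(xq, pq))
            total += math.exp(-dist2 / (2.0 * sigma ** 2))
        score = total / len(patterns)   # average kernel response per class
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical two-class pixel data (normalized color features).
training = {
    "background": [[0.10, 0.10, 0.10], [0.20, 0.15, 0.10]],
    "object": [[0.80, 0.70, 0.90], [0.90, 0.80, 0.85]],
}
pixel = [0.85, 0.75, 0.80]
for bits in (None, 4, 2):
    print(bits, pnn_classify(pixel, training, sigma=0.2, frac_bits=bits))
```

Lowering `frac_bits` coarsens the feature grid, which is one simple way to explore the trade-off between storage cost and classification accuracy; how the thesis actually implements its precision settings is not described in this record.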

Keywords (translated from Chinese)

★ Hardware accelerator
★ System-on-a-chip
★ Probabilistic neural network
★ Image segmentation

Keywords (English)

★ RISC-V
★ PNN
★ SOC

Table of Contents

Abstract (Chinese)
Abstract (English)
Table of Contents
List of Figures
List of Tables
Chapter 1. Introduction
1.1 Research Background
1.2 Research Objectives
1.3 Thesis Organization
Chapter 2. Technical Review
2.1 RISC-V Neural Network Hardware Accelerators
2.1.1 Origin and Development of RISC-V
2.1.2 Basic Characteristics of RISC-V
2.1.3 RISC-V Instruction Set Architecture
2.1.4 Integrated Software-Hardware Architecture Design for RISC-V-Based Neural Network Hardware Accelerators
2.2 Neural Network Hardware Acceleration
2.2.1 Integer Quantization
2.2.2 Multi-Precision Neural Networks
2.3 Probabilistic Neural Networks
2.3.1 Probabilistic Neural Networks
2.3.2 PNN Hardware Accelerators
2.4 MIAT System Design Methodology
2.4.1 IDEF0 Hierarchical Modular Design
2.4.2 GRAFCET Discrete-Event Modeling
2.4.3 High-Level Hardware Synthesis
Chapter 3. PNN Hardware Accelerator Design
3.1 PNN Hardware Design
3.1.1 IDEF0
3.1.2 Probability Density Function Computation Module (A1)
3.1.3 Decision Module (A2)
3.1.4 Verilog Hardware Design
3.2 Fixed-Point Quantization
3.3 Variable-Precision Design
3.3.1 Variable-Precision Software Design
3.3.2 Multi-Precision Hardware Design
3.4 Pipelined Design
3.4.1 Pipelining Principles
3.4.2 PNN Pipeline Design
3.4.3 Verilog Design of the Pipelined PNN
Chapter 4. RISC-V Machine Learning SoC Design
4.1 RISC-V Processor and Development Platform Hardware Design
4.1.1 System Design
4.1.2 System Interrupt Configuration
4.2 Software Design for the RISC-V PNN Hardware Accelerator
4.2.1 RISC-V Software IDEF0
4.2.2 System State Machine (A1)
4.2.3 Data Set State Machine (A11)
4.2.4 Test Feature State Machine (A12)
4.2.5 Interrupt State Machine (A14)
4.3 RISC-V Extension Instruction Design
Chapter 5. RISC-V Hardware Accelerator Experiments
5.1 Experimental Environment
5.1.1 Experimental Platform
5.1.2 Test Dataset
5.1.3 Training Features
5.2 PNN Hardware Accelerator Experiments
5.2.1 Digital Circuit Synthesis
5.2.2 Timing Verification
5.2.3 Comparison of Test Results for Different Sigma Values
5.2.4 Comparison of Test Results for Different Bit Precisions
5.3 RISC-V SoC Experiments
5.3.1 System State Machine Test
5.3.2 Interrupt Trigger Test
5.3.3 Parallel Write Test
5.3.4 Execution Time Test
5.3.5 Implementation Results
5.3.6 Overall Evaluation of Hardware Accelerators
Chapter 6. Conclusions and Future Work
6.1 Conclusions
6.2 Future Work
Chapter 7. References

Advisor

Ching-Han Chen (陳慶瀚)

Approval Date

2024-6-6
