Master's/Doctoral Thesis 110521027: Detailed Record




Author: 陳俊友 (Chun-Yu Chen)    Department: Department of Electrical Engineering
Title: 基於SoC的Tiny-YOLOv2 CNN加速器應用於盲人輔助系統
(An SoC-based CNN Accelerator of Tiny-YOLOv2 for Blind Assistive System)
相關論文
★ 即時的SIFT特徵點擷取之低記憶體硬體設計★ 即時的人臉偵測與人臉辨識之門禁系統
★ 具即時自動跟隨功能之自走車★ 應用於多導程心電訊號之無損壓縮演算法與實現
★ 離線自定義語音語者喚醒詞系統與嵌入式開發實現★ 晶圓圖缺陷分類與嵌入式系統實現
★ 語音密集連接卷積網路應用於小尺寸關鍵詞偵測★ G2LGAN: 對不平衡資料集進行資料擴增應用於晶圓圖缺陷分類
★ 補償無乘法數位濾波器有限精準度之演算法設計技巧★ 可規劃式維特比解碼器之設計與實現
★ 以擴展基本角度CORDIC為基礎之低成本向量旋轉器矽智產設計★ JPEG2000靜態影像編碼系統之分析與架構設計
★ 適用於通訊系統之低功率渦輪碼解碼器★ 應用於多媒體通訊之平台式設計
★ 適用MPEG 編碼器之數位浮水印系統設計與實現★ 適用於視訊錯誤隱藏之演算法開發及其資料重複使用考量
Full text: available to browse in the system after 2027-08-31.
Abstract
With the rapid growth of deep learning, Convolutional Neural Networks (CNNs) have made particularly significant advances, leading to major progress in object detection networks within computer vision. These models can accurately detect the location and category of objects in complex scenes and are already applied in people's daily lives. During the inference stage of a CNN, the convolutional layers account for a large share of the computation: input feature maps undergo multiply-accumulate (MAC) operations with many weight kernels, and as the network deepens, the parameter count and computational load keep growing. Consequently, numerous studies have proposed different computation methods and hardware architectures to process data efficiently and accelerate neural network inference.
In this paper, we propose an assistive system for the blind built on a System-on-Chip (SoC) architecture consisting of an ARM CPU and a neural network accelerator module for object detection. It combines Tiny You-Only-Look-Once version 2 (Tiny YOLOv2) with text-to-speech to help blind users walk independently in unfamiliar environments.
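To make the detect-then-announce flow concrete, here is a minimal, entirely hypothetical sketch of one iteration of such an assistive loop. The names `detector`, `speak`, and the detection tuple format are illustrative assumptions, not interfaces from the thesis; the real detector runs on the CNN accelerator and the announcement goes through a text-to-speech engine.

```python
def detections_to_sentence(detections):
    """Turn (label, position) pairs into a spoken warning string.

    `detections` is a hypothetical detector output format: a list of
    (object label, relative position) tuples.
    """
    if not detections:
        return "Path clear."
    parts = [f"{label} on the {position}" for label, position in detections]
    return "Caution: " + ", ".join(parts) + "."


def assist_loop_once(frame, detector, speak):
    """One iteration: detect objects in a frame, then announce them."""
    dets = detector(frame)                 # offloaded to the CNN accelerator
    speak(detections_to_sentence(dets))    # text-to-speech on the ARM CPU


# Stub usage: a fake detector and a list standing in for the TTS engine.
def fake_detector(frame):
    return [("car", "right")]

spoken = []
assist_loop_once(None, fake_detector, spoken.append)
# spoken is now ["Caution: car on the right."]
```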
We analyze the model with post-training quantization, greatly reducing the number of parameters without significant loss of accuracy. Architecturally, to make effective use of the limited on-chip memory, all zero-padding is performed in hardware. For convolution, we introduce a Convolution Unit that maximizes weight reuse in each PE and supports both 3x3 and 1x1 convolutions, using a Row Stationary (RS) dataflow and decomposing each 3x3 convolution into multiple 1x1 convolutions. The system is implemented on a Zynq UltraScale+ MPSoC EGO-ZU19EG FPGA. Experimental results show that at 166 MHz the system achieves 169.98 GOPS with a power consumption of 6.599 W, for an energy efficiency of 25.76 GOPS/W.
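The post-training quantization mentioned above can be sketched generically. The snippet below assumes symmetric per-tensor int8 quantization (an illustrative choice on my part; the thesis does not specify this exact scheme here): the scale maps the largest weight magnitude to 127, and the worst-case reconstruction error of in-range values is half a quantization step.

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor post-training quantization to int8 (a generic sketch)."""
    scale = max(float(np.max(np.abs(w))), 1e-12) / 127.0  # guard against all-zero tensors
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Map int8 codes back to float32 for error measurement."""
    return q.astype(np.float32) * scale

# Quantize a random weight tensor and measure the reconstruction error.
rng = np.random.default_rng(0)
w = rng.standard_normal((16, 3, 3)).astype(np.float32)
q, s = quantize_int8(w)
err = float(np.max(np.abs(dequantize(q, s) - w)))  # bounded by scale / 2
```

Storing `q` (int8) instead of `w` (float32) cuts weight storage by 4x, which is the kind of parameter-memory reduction the abstract refers to.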
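The decomposition of a 3x3 convolution into 1x1 convolutions described above can be checked numerically: each of the nine kernel taps is a 1x1 (scalar) convolution applied to a shifted view of the input, and summing the nine partial results reproduces the 3x3 result. The sketch below (NumPy, single channel, stride 1, valid padding) is only a functional illustration of the idea, not the hardware dataflow of the Convolution Unit.

```python
import numpy as np

def conv3x3(img, k):
    """Direct 3x3 convolution (CNN-style cross-correlation, valid, stride 1)."""
    H, W = img.shape
    out = np.zeros((H - 2, W - 2), dtype=img.dtype)
    for y in range(H - 2):
        for x in range(W - 2):
            out[y, x] = np.sum(img[y:y + 3, x:x + 3] * k)
    return out

def conv3x3_as_nine_1x1(img, k):
    """Same result built from nine 1x1 convolutions on shifted input views."""
    H, W = img.shape
    out = np.zeros((H - 2, W - 2), dtype=img.dtype)
    for i in range(3):
        for j in range(3):
            # k[i, j] times a shifted view is exactly the 1x1 convolution
            # contributed by kernel tap (i, j); the nine partial sums accumulate.
            out += k[i, j] * img[i:i + H - 2, j:j + W - 2]
    return out
```

Because a 1x1 convolution is just a scalar multiply-accumulate per pixel, a PE array that supports 1x1 convolutions can reuse the same datapath for 3x3 kernels by accumulating nine such passes, which is the reuse property the abstract attributes to the Convolution Unit.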
Keywords ★ convolutional neural network
★ blind assistive system
★ Tiny YOLOv2
★ SoC design
Table of Contents
Abstract (Chinese)
Abstract (English)
1. Introduction
1.1. Research Background and Motivation
1.2. Thesis Organization
2. Literature Review
2.1. Object Detection Networks
2.2. Object Detection Hardware Accelerators
3. System Architecture Design
3.1. Model Quantization
3.2. Overall Hardware Architecture
3.3. Convolution Unit Architecture (CONV Unit)
3.4. Quantization Unit Architecture
3.5. Max Pooling Unit Architecture
3.6. Text-to-Speech
4. Hardware Implementation Results
4.1. Model Quantization Results
4.2. Hardware Synthesis Results
4.3. Comparison with Related Accelerators
4.4. FPGA Board Execution Results
5. Conclusion
References
Advisor: 蔡宗漢 (Tsung-Han Tsai)    Review date: 2024-08-13
