Master's/Doctoral Thesis 107521050 — Detailed Record




Author: Nai-Chieh Tung (童迺婕)    Department: Department of Electrical Engineering
Thesis Title: An FPGA-Based Reconfigurable Convolutional Neural Network Accelerator for Tiny YOLO-V3
(基於FPGA的可重構神經網路加速器應用於Tiny YOLO-V3)
Related Theses
★ Low-memory hardware design for real-time SIFT feature point extraction
★ Real-time face detection and face recognition access control system
★ Self-propelled vehicle with real-time automatic following
★ Lossless compression algorithm and implementation for multi-lead ECG signals
★ Offline custom voice and speaker wake-word system with embedded implementation
★ Wafer map defect classification and embedded system implementation
★ Speech densely connected convolutional network for small-footprint keyword spotting
★ G2LGAN: Data augmentation on imbalanced datasets for wafer map defect classification
★ Algorithm design techniques for compensating finite precision in multiplierless digital filters
★ Design and implementation of a programmable Viterbi decoder
★ Low-cost vector rotator IP design based on extended elementary-angle CORDIC
★ Analysis and architecture design of a JPEG2000 still-image coding system
★ Low-power turbo code decoder for communication systems
★ Platform-based design for multimedia communication
★ Design and implementation of a digital watermarking system for MPEG encoders
★ Algorithm development for video error concealment with data-reuse considerations
  1. The author has agreed to make this electronic thesis available for immediate open access.
  2. The open-access full text is licensed to users for academic research only, i.e., personal, non-profit retrieval, reading, and printing.
  3. Please comply with the Copyright Act of the Republic of China (Taiwan); do not reproduce, distribute, adapt, repost, or broadcast this work without authorization.

Abstract (Chinese) In recent years, deep learning has developed rapidly. With the updating and advancement of hardware, neural networks can handle an ever-growing range of tasks and have permeated our daily lives, from the unlock function on phones to today's intelligent customer service and chatbots, gradually replacing traditional algorithmic methods. However, neural networks involve huge numbers of parameters and computations, and typically must run on a GPU or an embedded development board with CUDA acceleration. Hardware architectures and methods for accelerating neural networks have therefore proliferated in recent years.
This thesis proposes a reconfigurable hardware architecture that can compute over different input image sizes, kernel sizes, and strides. It implements depth-wise convolution, point-wise convolution, standard convolution, batch normalization, activation functions, and max pooling to accelerate our neural network. We adopt an SoC design in which the PL (Programmable Logic) communicates with the PS (Processing System) over the AXI bus protocol: the PS handles data transfer and ordering, while the PL performs all computation. The architecture selects the appropriate computation mode according to input instructions, and any other neural network model whose layers fall within the operations this architecture supports can also be accelerated on it. Because on-chip memory resources are limited, we perform zero-padding directly in hardware to reduce the number of data transfers and communications, so that on-chip memory can store more input data. Finally, we implement Tiny YOLO-V3 on the Xilinx ZCU104 development board: a CMOS camera feeds images to the FPGA, and the object detection results, including the input image, bounding boxes, classes, and probabilities, are displayed on a monitor via HDMI. On this board the design achieves 25.6 GOPs while consuming only 4.959 W, for an efficiency of 5.16 GOPs/W.
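As an illustrative reference model only (not the thesis's RTL), the depth-wise/point-wise decomposition and the up-front zero-padding described above can be sketched in Python; all function and variable names here are hypothetical:

```python
import numpy as np

def depthwise_separable_conv(x, dw_k, pw_k, stride=1, pad=1):
    """Reference model of depth-wise + point-wise convolution.
    x:    input feature map, shape (C, H, W)
    dw_k: depth-wise kernels, shape (C, K, K) -- one KxK filter per channel
    pw_k: point-wise kernels, shape (C_out, C) -- 1x1 filters mixing channels
    Zero-padding is applied up front, mirroring the idea of padding in
    hardware before data reaches on-chip memory.
    """
    C, H, W = x.shape
    K = dw_k.shape[1]
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))  # zero-padding
    Ho = (H + 2 * pad - K) // stride + 1
    Wo = (W + 2 * pad - K) // stride + 1
    dw = np.zeros((C, Ho, Wo))
    for c in range(C):                  # depth-wise: per-channel KxK conv
        for i in range(Ho):
            for j in range(Wo):
                win = xp[c, i*stride:i*stride+K, j*stride:j*stride+K]
                dw[c, i, j] = np.sum(win * dw_k[c])
    # point-wise: 1x1 convolution across channels -> (C_out, Ho, Wo)
    return np.tensordot(pw_k, dw, axes=([1], [0]))

x = np.random.rand(8, 16, 16)
y = depthwise_separable_conv(x, np.random.rand(8, 3, 3), np.random.rand(16, 8))
print(y.shape)  # (16, 16, 16)
```

The point of the decomposition is that a KxK depth-wise pass plus a 1x1 point-wise pass needs far fewer multiplications than one standard KxK convolution over all channel pairs, which is why the accelerator dedicates separate modes to each.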
Abstract (English) In recent years, deep learning has developed rapidly. As hardware has been updated and improved, neural networks can handle more and more tasks and have even permeated our lives: from the unlock function on cellphones to smart customer service and chatbots, they have replaced traditional algorithmic methods. However, neural networks involve huge numbers of parameters and calculations and need to execute on a GPU or an embedded development board with CUDA acceleration. Methods and hardware architectures that can accelerate neural networks have therefore become a major research topic.
This thesis proposes a reconfigurable hardware architecture that supports different input image sizes, kernel sizes, and stride sizes. The architecture implements depth-wise convolution, point-wise convolution, standard convolution, batch normalization, activation functions, and max pooling to accelerate the neural network. The proposed system uses an SoC design in which the Programmable Logic (PL) and the Processing System (PS) communicate over the AXI bus protocol: the PS handles data transfer and sorting, while the PL performs all calculations. The reconfigurable architecture selects the appropriate computation mode according to the instruction, and other neural networks can also be accelerated as long as their layers are among the operations this architecture supports. Because on-chip memory is limited, zero-padding is completed directly in hardware to reduce the number of data transmissions and communications, so the on-chip memory can store more input data. Finally, we implement Tiny YOLO-V3 on the Xilinx ZCU104 development board. Input images are captured by a CMOS camera and transferred to the FPGA, and the object detection results, including the input image, bounding boxes, classes, and probabilities, are shown on a monitor through HDMI. Our design achieves 25.6 GOPs with a power consumption of only 4.959 W, for a performance of 5.16 GOPs/W on the ZCU104.
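The reported efficiency figure follows directly from dividing the measured throughput by the measured power; a quick arithmetic check:

```python
# Figures reported in the abstract for the ZCU104 implementation
throughput_gops = 25.6   # throughput in GOPs
power_w = 4.959          # power consumption in watts

efficiency = throughput_gops / power_w   # GOPs per watt
print(round(efficiency, 2))  # 5.16
```

This confirms the 5.16 GOPs/W figure is consistent with the stated throughput and power numbers.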
Keywords (Chinese) ★ Object detection (物件偵測)
★ Reconfigurable (可重構)
★ Hardware architecture (硬體架構)
★ FPGA
Keywords (English)
Table of Contents
Abstract (Chinese)
Abstract (English)
1. Introduction
1.1. Research background and motivation
1.2. Thesis organization
2. Literature review
2.1. Hardware accelerators
2.2. Object detection networks
3. Hardware architecture design
3.1. Overall system hardware architecture
3.2. General-purpose hardware acceleration control module
3.3. AXI transfer control signals
3.4. Data pre-processing
3.5. Processing element module
3.6. Data fetch module
3.7. Layer fusion
4. Hardware implementation results
4.1. Synthesis results of the general-purpose hardware architecture
5. Conclusion
References
References [1] A. Gulati et al., "Conformer: Convolution-augmented transformer for speech recognition," arXiv preprint arXiv:2005.08100, 2020.
[2] J. Deng et al., "RetinaFace: Single-stage dense face localisation in the wild," arXiv preprint arXiv:1905.00641, 2019.
[3] Y. Sun, D. Liang, X. Wang, and X. Tang, "DeepID3: Face recognition with very deep neural networks," arXiv preprint arXiv:1502.00873, 2015.
[4] G. Li et al., "Hand gesture recognition based on convolution neural network," Cluster Computing, vol. 22, no. 2, pp. 2719-2729, 2019.
[5] A. Dadashzadeh, A. T. Targhi, M. Tahmasbi, and M. Mirmehdi, "HGR-Net: A fusion network for hand gesture segmentation and recognition," arXiv preprint arXiv:1806.05653, 2018.
[6] G. M. Basavaraj and A. Kusagur, "Vision based surveillance system for detection of human fall," in Proc. 2nd IEEE Int. Conf. Recent Trends in Electronics, Information & Communication Technology (RTEICT), Bangalore, 2017, pp. 1516-1520.
[7] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet classification with deep convolutional neural networks," in Advances in Neural Information Processing Systems, vol. 25, 2012, pp. 1097-1105.
[8] J. Redmon et al., "You only look once: Unified, real-time object detection," in Proc. IEEE Conf. Computer Vision and Pattern Recognition (CVPR), 2016, pp. 779-788.
[9] J. Long, E. Shelhamer, and T. Darrell, "Fully convolutional networks for semantic segmentation," in Proc. IEEE Conf. Computer Vision and Pattern Recognition (CVPR), 2015, pp. 3431-3440.
[10] J. Redmon and A. Farhadi, "YOLOv3: An incremental improvement," arXiv preprint arXiv:1804.02767, 2018.
[11] "Why is so much memory needed for deep neural networks?" [Online]. Available: https://www.graphcore.ai/posts/why-is-so-much-memory-needed-for-deep-neural-networks. [Accessed: 13-Jan-2020].
[12] "TensorFlow." [Online]. Available: https://www.tensorflow.org/. [Accessed: 13-Jan-2020].
[13] "PyTorch." [Online]. Available: https://pytorch.org/. [Accessed: 13-Jan-2020].
[14] A. G. Howard et al., "MobileNets: Efficient convolutional neural networks for mobile vision applications," arXiv preprint arXiv:1704.04861, 2017.
[15] L. Bai, Y. Zhao, and X. Huang, "A CNN accelerator on FPGA using depthwise separable convolution," IEEE Trans. Circuits Syst. II: Express Briefs, vol. 65, no. 10, pp. 1415-1419, 2018.
[16] D. T. Nguyen et al., "A high-throughput and power-efficient FPGA implementation of YOLO CNN for object detection," IEEE Trans. Very Large Scale Integration (VLSI) Syst., vol. 27, no. 8, pp. 1861-1873, 2019.
[17] K. Guo et al., "Angel-Eye: A complete design flow for mapping CNN onto embedded FPGA," IEEE Trans. Computer-Aided Design of Integrated Circuits and Systems, vol. 37, no. 1, pp. 35-47, 2017.
[18] J. Zhang et al., "A low-latency FPGA implementation for real-time object detection," in Proc. IEEE Int. Symp. Circuits and Systems (ISCAS), 2021, pp. 1-5.
[19] M. Zhang et al., "Optimized compression for implementing convolutional neural networks on FPGA," Electronics, vol. 8, no. 3, p. 295, 2019.
[20] B. Liu et al., "An FPGA-based CNN accelerator integrating depthwise separable convolution," Electronics, vol. 8, no. 3, p. 281, 2019.
[21] S. Moini et al., "A resource-limited hardware accelerator for convolutional neural networks in embedded vision applications," IEEE Trans. Circuits Syst. II: Express Briefs, vol. 64, no. 10, pp. 1217-1221, 2017.
[22] R. Girshick et al., "Rich feature hierarchies for accurate object detection and semantic segmentation," in Proc. IEEE Conf. Computer Vision and Pattern Recognition (CVPR), 2014.
[23] R. Girshick, "Fast R-CNN," in Proc. IEEE Int. Conf. Computer Vision (ICCV), 2015.
[24] S. Ren et al., "Faster R-CNN: Towards real-time object detection with region proposal networks," in Advances in Neural Information Processing Systems, vol. 28, 2015, pp. 91-99.
[25] J. Redmon et al., "You only look once: Unified, real-time object detection," in Proc. IEEE Conf. Computer Vision and Pattern Recognition (CVPR), 2016, pp. 779-788.
[26] W. Liu et al., "SSD: Single shot multibox detector," in Proc. European Conf. Computer Vision (ECCV), Springer, 2016, pp. 21-37.
[27] K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," arXiv preprint arXiv:1409.1556, 2014.
[28] T.-Y. Lin et al., "Feature pyramid networks for object detection," in Proc. IEEE Conf. Computer Vision and Pattern Recognition (CVPR), 2017, pp. 2117-2125.
[29] J. Redmon and A. Farhadi, "YOLOv3: An incremental improvement," arXiv preprint arXiv:1804.02767, 2018.
[30] K. He et al., "Deep residual learning for image recognition," in Proc. IEEE Conf. Computer Vision and Pattern Recognition (CVPR), 2016, pp. 770-778.
[31] Y. Yu et al., "OPU: An FPGA-based overlay processor for convolutional neural networks," IEEE Trans. Very Large Scale Integration (VLSI) Syst., vol. 28, no. 1, pp. 35-47, 2019.
Advisor: Tsung-Han Tsai (蔡宗漢)    Date of approval: 2022-04-19

For questions about this thesis, please contact the Promotion Services Division, National Central University Library, TEL: (03)422-7151 ext. 57407, or by e-mail. - Privacy Policy