Master's/Doctoral Thesis 108522081: Detailed Record




Name: Jheng-Yan Wang (王政晏)   Department: Computer Science and Information Engineering
Thesis Title: Design and Implementation of Hardware Accelerator for Binarized Convolutional Neural Network
(二值卷積神經網路硬體加速器設計與實作)
Related Theses
★ An intelligent controller development platform integrating the GRAFCET virtual machine
★ Design and implementation of a distributed industrial electronic kanban network system
★ Design and implementation of a two-point touch screen based on a dual-camera vision system
★ An embedded computing platform for intelligent robots
★ An embedded system for real-time moving object detection and tracking
★ A multiprocessor architecture and distributed control algorithms for solid-state drives
★ A human-machine interaction system based on stereo-vision gesture recognition
★ Robot system-on-chip design integrating biomimetic intelligent behavior control
★ Design and implementation of an embedded wireless image sensor network
★ A license plate recognition system based on a dual-core processor
★ Continuous 3D gesture recognition based on stereo vision
★ Design and hardware implementation of a miniature, ultra-low-power wireless sensor network controller
★ Real-time face detection, tracking, and recognition for streaming video: an embedded system design
★ Embedded hardware design of a fast stereo vision system
★ Design and implementation of a real-time continuous image stitching system
★ An embedded gait recognition system based on a dual-core platform
Files: Browse the thesis in the system (available after 2026-8-3)
Abstract (Chinese): Deep learning has achieved excellent results on image recognition problems, with the convolutional neural network as its most representative model, and it has become the mainstream approach to image recognition today. However, the enormous computational load and memory footprint of these networks make them difficult to deploy on edge devices with limited hardware resources. To address this, many new models have been proposed, eventually including binarized convolutional neural networks, which greatly reduce the hardware resource requirements of deep neural networks. With the recent rise of the Artificial Intelligence of Things, and in order to give edge devices a real-time and effective way to solve image recognition problems, this thesis takes the ReActNet binarized convolutional neural network as a reference and, through a hierarchical modular design methodology, designs a binarized convolutional neural network hardware accelerator with a flexible architecture. The hardware design uses pipelining to raise inference speed and replaces floating-point computation with 8-bit fixed-point computation to reduce hardware resource usage. According to the experimental results, the accelerator recognizes a single image 27 times faster than a server with a graphics processor; although its recognition rate is 2.71% lower, its overall power consumption is only 0.06 times that of the server. The proposed binarized convolutional neural network hardware accelerator thus combines a flexible architecture with real-time speed, giving hardware-constrained devices a practical capability for solving image recognition problems.
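The 8-bit fixed-point substitution described above trades a small amount of precision for much cheaper arithmetic hardware. A minimal Python sketch of the idea (the Q4.4 format and the function names here are illustrative assumptions, not the scaling the thesis actually uses):

```python
FRAC_BITS = 4  # assumed Q4.4 format: 4 integer bits, 4 fractional bits

def to_fixed(x: float) -> int:
    """Quantize a float to a signed 8-bit fixed-point value."""
    q = round(x * (1 << FRAC_BITS))
    return max(-128, min(127, q))   # saturate to the int8 range

def fixed_mul(a: int, b: int) -> int:
    """Multiply two fixed-point values and rescale back to Q4.4."""
    return (a * b) >> FRAC_BITS

x, w = to_fixed(1.5), to_fixed(-2.25)
print(fixed_mul(x, w) / (1 << FRAC_BITS))  # -> -3.375
```

In hardware, each such multiply is a small integer multiplier and a shift, far cheaper than a floating-point unit.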
Abstract (English): In solving the image recognition problem, deep learning has achieved good results, among which the convolutional neural network is the most representative model and has become the mainstream approach to image recognition today. However, the huge computation and memory consumption make it difficult to use neural network models on edge devices with limited hardware resources. To solve these problems, many new models have been proposed, and later even binary convolutional neural networks were developed, significantly reducing the hardware resource requirements of deep neural networks. With the emergence of the Artificial Intelligence of Things in recent years, to provide a real-time and effective solution to the image recognition problem and enhance the capability of edge devices, this thesis uses ReActNet as the reference binary convolutional neural network model and designs a binary convolutional neural network hardware accelerator with a flexible architecture through a hierarchical modular design approach. In addition, pipelining is used in the hardware design to improve the inference speed of the neural network, and 8-bit fixed-point computation is used instead of floating-point computation to reduce hardware resource usage. According to the experimental results, the single-image recognition speed of the hardware accelerator is 27 times faster than that of a server using a graphics processor. Although the recognition rate is 2.71% lower than that of the server, the overall power consumption is only 0.06 times that of the server. The proposed binary convolutional neural network hardware accelerator not only has a flexible architecture but also operates in real time, giving devices with limited hardware resources a good ability to solve image recognition problems.
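The resource savings of binarization come from replacing multiply-accumulate with bitwise logic: when weights and activations are constrained to ±1 and bit-packed, a dot product reduces to an XNOR followed by a popcount, as in XNOR-Net [12]. A minimal Python sketch of that identity (names and bit packing are illustrative; the thesis realizes this in pipelined hardware, not software):

```python
def binary_dot(a_bits: int, b_bits: int, n: int) -> int:
    """Dot product of two ±1 vectors packed as n-bit integers.

    Bit value 1 encodes +1 and bit value 0 encodes -1, so for
    ±1 vectors: a . b = 2 * popcount(XNOR(a, b)) - n.
    """
    mask = (1 << n) - 1               # keep only the n packed lanes
    xnor = ~(a_bits ^ b_bits) & mask  # bit is 1 where the signs agree
    return 2 * bin(xnor).count("1") - n

# Example: a = [+1, -1, +1, +1], b = [+1, +1, -1, +1], packed LSB-first
a, b = 0b1101, 0b1011
print(binary_dot(a, b, 4))  # -> 0 (two agreements, two disagreements)
```

One XNOR gate and a popcount tree thus replace n multipliers and an adder tree, which is what makes a binarized convolution layer so cheap on an FPGA.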
Keywords (Chinese) ★ 神經網路硬體加速器 (neural network hardware accelerator)
★ 二值卷積神經網路 (binarized convolutional neural network)
Keywords (English) ★ Hardware Accelerator for Neural Network
★ Binarized Convolutional Neural Network
Table of Contents
Abstract (Chinese) I
Abstract (English) II
Acknowledgments III
Table of Contents V
List of Figures VII
List of Tables IX
Chapter 1. Introduction 1
1.1 Research Background 1
1.2 Research Objectives 3
1.3 Thesis Organization 3
Chapter 2. Literature Review 4
2.1 Binarized Convolutional Neural Networks 4
2.1.1 Binarized Neural Network 4
2.1.2 XNOR-Net 6
2.2 The ReActNet Binarized Convolutional Neural Network 8
Chapter 3. Binarized Convolutional Neural Network Hardware Accelerator Design 12
3.1 System Design Methodology 12
3.1.1 IDEF0 Hierarchical Modular Design 13
3.1.2 GRAFCET Discrete-Event Modeling 15
3.2 Hardware Accelerator Architecture 17
3.3 Hardware Accelerator GRAFCET 19
3.3.1 Standard Convolution Layer GRAFCET 20
3.3.2 Binary Convolution Block Layer GRAFCET 21
3.4 Pipelined Hardware Accelerator Design 26
Chapter 4. Experimental Results 32
4.1 Software and Hardware Development Environment 32
4.2 Image Recognition Experiments 33
4.2.1 Image Recognition Datasets 33
4.2.2 Binarized Convolutional Neural Network Architecture 35
4.2.3 Image Recognition Results 36
4.3 Hardware Synthesis and Verification 37
4.3.1 Pipeline Controller Module 38
4.3.2 G1 Module 40
4.3.3 G2 Module 42
4.3.4 Hardware Synthesis Resources 43
4.4 Analysis of Experimental Results 45
Chapter 5. Conclusions and Future Work 47
5.1 Conclusions 47
5.2 Future Work 48
References 49
References
[1] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet classification with deep convolutional neural networks,” Commun. ACM, vol. 60, no. 6, pp. 84–90, May 2017.
[2] G. Hinton et al., “Deep Neural Networks for Acoustic Modeling in Speech Recognition: The Shared Views of Four Research Groups,” IEEE Signal Process. Mag., vol. 29, no. 6, pp. 82–97, Nov. 2012.
[3] I. Sutskever, O. Vinyals, and Q. V. Le, “Sequence to Sequence Learning with Neural Networks,” Adv. Neural Inf. Process. Syst., vol. 4, no. January, pp. 3104–3112, Sep. 2014.
[4] R. Raina, A. Madhavan, and A. Y. Ng, “Large-scale deep unsupervised learning using graphics processors,” in Proceedings of the 26th Annual International Conference on Machine Learning - ICML ’09, 2009, pp. 1–8.
[5] O. Russakovsky et al., “ImageNet Large Scale Visual Recognition Challenge,” Int. J. Comput. Vis., vol. 115, no. 3, pp. 211–252, Sep. 2014.
[6] K. Simonyan and A. Zisserman, “Very Deep Convolutional Networks for Large-Scale Image Recognition,” 3rd Int. Conf. Learn. Represent. ICLR 2015 - Conf. Track Proc., pp. 1–14, Sep. 2014.
[7] C. Szegedy et al., “Going Deeper with Convolutions,” Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit., pp. 1–9, Sep. 2014.
[8] K. He, X. Zhang, S. Ren, and J. Sun, “Deep Residual Learning for Image Recognition,” Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit., pp. 770–778, Dec. 2015.
[9] F. N. Iandola, S. Han, M. W. Moskewicz, K. Ashraf, W. J. Dally, and K. Keutzer, “SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size,” arXiv, pp. 1–13, Feb. 2016.
[10] A. G. Howard et al., “MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications,” arXiv, Apr. 2017.
[11] M. Tan and Q. V. Le, “EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks,” 36th Int. Conf. Mach. Learn. (ICML 2019), pp. 10691–10700, May 2019.
[12] M. Rastegari, V. Ordonez, J. Redmon, and A. Farhadi, “XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks,” Lect. Notes Comput. Sci., vol. 9908, pp. 525–542, Mar. 2016.
[13] Z. Liu, B. Wu, W. Luo, X. Yang, W. Liu, and K.-T. Cheng, “Bi-Real Net: Enhancing the Performance of 1-bit CNNs With Improved Representational Capability and Advanced Training Algorithm,” Lect. Notes Comput. Sci., vol. 11219, pp. 747–763, Aug. 2018.
[14] J. Bethge, C. Bartz, H. Yang, Y. Chen, and C. Meinel, “MeliusNet: Can Binary Neural Networks Achieve MobileNet-level Accuracy?,” arXiv, Jan. 2020.
[15] S. Mittal, “A survey of FPGA-based accelerators for convolutional neural networks,” Neural Comput. Appl., vol. 32, no. 4, pp. 1109–1139, Feb. 2020.
[16] S. Han et al., “ESE: Efficient Speech Recognition Engine with Sparse LSTM on FPGA,” Proc. 2017 ACM/SIGDA Int. Symp. Field-Programmable Gate Arrays, pp. 75–84, Dec. 2016.
[17] A. Page, A. Jafari, C. Shea, and T. Mohsenin, “SPARCNet: A hardware accelerator for efficient deployment of sparse convolutional networks,” ACM J. Emerg. Technol. Comput. Syst., vol. 13, no. 3, pp. 1–32, May 2017.
[18] J. Qiu et al., “Going Deeper with Embedded FPGA Platform for Convolutional Neural Network,” in Proceedings of the 2016 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, Feb. 2016, pp. 26–35.
[19] J. Park and W. Sung, “FPGA Based Implementation of Deep Neural Networks Using On-chip Memory Only,” ICASSP, IEEE Int. Conf. Acoust. Speech Signal Process. - Proc., pp. 1011–1015, Feb. 2016.
[20] S. A. Mirsalari, N. Nazari, S. A. Ansarmohammadi, M. E. Salehi, and S. Ghiasi, “E2BNet: MAC-free yet accurate 2-level binarized neural network accelerator for embedded systems,” J. Real-Time Image Process., Jul. 2021.
[21] R. Zhao et al., “Accelerating Binarized Convolutional Neural Networks with Software-Programmable FPGAs,” in Proceedings of the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, Feb. 2017, pp. 15–24.
[22] S. Liang, S. Yin, L. Liu, W. Luk, and S. Wei, “FP-BNN: Binarized neural network on FPGA,” Neurocomputing, vol. 275, pp. 1072–1086, Jan. 2018.
[23] Y. Umuroglu et al., “FINN: A Framework for Fast, Scalable Binarized Neural Network Inference,” FPGA 2017 - Proc. 2017 ACM/SIGDA Int. Symp. Field-Programmable Gate Arrays, pp. 65–74, Dec. 2016.
[24] Z. Liu, Z. Shen, M. Savvides, and K.-T. Cheng, “ReActNet: Towards Precise Binary Neural Network with Generalized Activation Functions,” Lect. Notes Comput. Sci., vol. 12359, pp. 143–159, Mar. 2020.
[25] M. Courbariaux, Y. Bengio, and J.-P. David, “BinaryConnect: Training Deep Neural Networks with binary weights during propagations,” Adv. Neural Inf. Process. Syst., pp. 3123–3131, Nov. 2015.
[26] M. Courbariaux, I. Hubara, D. Soudry, R. El-Yaniv, and Y. Bengio, “Binarized Neural Networks: Training Deep Neural Networks with Weights and Activations Constrained to +1 or -1,” arXiv, Feb. 2016.
[27] C.-H. Chen, M.-Y. Lin, and X.-C. Guo, “High-level modeling and synthesis of smart sensor networks for Industrial Internet of Things,” Comput. Electr. Eng., vol. 61, pp. 48–66, Jul. 2017.
[28] H. Xiao, K. Rasul, and R. Vollgraf, “Fashion-MNIST: a Novel Image Dataset for Benchmarking Machine Learning Algorithms,” arXiv, pp. 1–6, Aug. 2017.
[29] Y. Lecun, L. Bottou, Y. Bengio, and P. Haffner, “Gradient-based learning applied to document recognition,” Proc. IEEE, vol. 86, no. 11, pp. 2278–2324, 1998.
[30] P. Guo, H. Ma, R. Chen, P. Li, S. Xie, and D. Wang, “FBNA: A Fully Binarized Neural Network Accelerator,” in 2018 28th International Conference on Field Programmable Logic and Applications (FPL), Aug. 2018, pp. 51–513.
[31] W. Mao et al., “Energy-Efficient Machine Learning Accelerator for Binary Neural Networks,” in Proceedings of the 2020 on Great Lakes Symposium on VLSI, Sep. 2020, pp. 77–82.
Advisor: Ching-Han Chen (陳慶瀚)   Date of Approval: 2021-8-3
