ERCNet: Enhancing ReActNet with a Compact ECA Branch

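Name

Yen-Ting Chen (陳彥廷)

Department

Department of Computer Science and Information Engineering, In-service Master Program

Thesis Title

ERCNet: Enhancing ReActNet with a Compact ECA Branch
(Original title: ERCNet:以精簡的ECA分支增強ReActNet)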

Full text available for browsing in the system after 2029-6-12

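Abstract (translated from Chinese)

Since Courbariaux pioneered binary neural networks in 2016, the parameter count and computational cost of convolutional neural networks have been greatly reduced. Subsequent research has steadily narrowed the capability gap with floating-point networks, and ReActNet stands out among the many binary models.
This thesis redesigns the basic block of ReActNet. First, we remove all 1x1 binary convolution layers from the basic block to reduce weight size and computation. In the down-sampling block, one of the 1x1 binary convolution branches is replaced with Efficient Channel Attention (ECA) to enrich representational capacity. In addition, a BatchNorm layer is added after the branches are merged to improve the data distribution. Finally, the residual shortcut connection is moved to after the RPReLU so that the shortcut information is preserved intact. We call the resulting network ERCNet.
Experiments show that ERCNet achieves a Top-1 accuracy on CIFAR100 that is 2.39% higher than the original ReActNet, while memory footprint and computation are reduced by about 10% and 8%, respectively. In the object detection experiments, ERCNet is used as the YOLOv8 backbone. On the KITTI dataset, ERCNet outperforms the floating-point YOLOv8, reaching 94.8% mAP50 and surpassing YOLOv8-L and YOLOv8-N by 1.9% and 11.2%, respectively.
Finally, based on the experimental results, we show that on certain datasets a binarized neural network can outperform floating-point networks while keeping lower memory and computation costs. It is therefore well suited to future dataset-specific applications on lightweight devices.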
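The ECA branch mentioned in the abstract above can be illustrated with a small sketch. This is a plain-Python rendering of Efficient Channel Attention as described in the ECA-Net paper (global average pooling, a 1D convolution across channels, a sigmoid gate, then channel-wise rescaling). It is not the thesis implementation: how ECA is wired into the ERCNet down-sampling block is not specified in this record, and the function names `eca_kernel_size` and `eca` are hypothetical.

```python
import math


def eca_kernel_size(channels, gamma=2, b=1):
    """Adaptive kernel size from ECA-Net: nearest odd value of (log2(C) + b) / gamma."""
    t = int(abs((math.log2(channels) + b) / gamma))
    return t if t % 2 == 1 else t + 1


def eca(feature_map, conv_weights):
    """Apply channel attention to a feature map.

    feature_map: list of C channels, each an H x W nested list of floats.
    conv_weights: odd-length list of 1D convolution weights shared by all channels
                  (learned in a real network; passed in here for illustration).
    """
    k = len(conv_weights)
    if k % 2 == 0:
        raise ValueError("kernel size must be odd")
    pad = k // 2

    # 1. Global average pooling: one scalar descriptor per channel.
    pooled = [
        sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
        for ch in feature_map
    ]

    # 2. 1D convolution across neighbouring channels (zero padded), then sigmoid.
    c = len(pooled)
    gates = []
    for i in range(c):
        acc = 0.0
        for j, w in enumerate(conv_weights):
            idx = i + j - pad
            if 0 <= idx < c:
                acc += w * pooled[idx]
        gates.append(1.0 / (1.0 + math.exp(-acc)))

    # 3. Rescale every value of each channel by that channel's gate.
    return [
        [[v * g for v in row] for row in ch]
        for ch, g in zip(feature_map, gates)
    ]
```

The only learned parameters in this module are the `k` shared convolution weights, which is why ECA is a cheap substitute for a full 1x1 convolution branch.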

Abstract (English)

Since Courbariaux pioneered the Binary Neural Network (BNN) in 2016 to dramatically decrease the storage and computation cost of CNNs for lightweight applications, researchers have made continued efforts to drive down the cost while minimizing the loss of representation capacity and the accuracy gap to real-valued counterparts. Among them, ReActNet, achieving 62.16% Top-1 accuracy on CIFAR100, sets a new benchmark in this competitive landscape. In this thesis, we strive to further improve its performance at an even lower overall cost.
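To make the storage and computation savings concrete, here is a minimal sketch of the standard XNOR-popcount trick used by binary networks in general. It assumes weights and activations are binarized to +1/-1 with a sign function and packed one value per bit; the helper names `binarize`, `pack_bits`, and `binary_dot` are hypothetical and do not come from the thesis.

```python
def binarize(values):
    """Map real values to +1 / -1 with a sign function (0 maps to +1 here)."""
    return [1 if v >= 0 else -1 for v in values]


def pack_bits(signs):
    """Pack a list of +1 / -1 values into one integer; bit i is set when signs[i] == +1."""
    word = 0
    for i, s in enumerate(signs):
        if s not in (1, -1):
            raise ValueError("expected only +1 or -1")
        if s == 1:
            word |= 1 << i
    return word


def binary_dot(a_bits, w_bits, n):
    """Dot product of two packed +1 / -1 vectors of length n.

    XNOR marks the positions where both signs agree; each agreement
    contributes +1 and each disagreement -1, so dot = 2 * matches - n.
    """
    mask = (1 << n) - 1
    agree = ~(a_bits ^ w_bits) & mask
    matches = bin(agree).count("1")
    return 2 * matches - n


# Usage: compare against the ordinary floating-point style dot product.
a = binarize([0.7, -1.2, 0.1, 2.0])   # [1, -1, 1, 1]
w = binarize([0.3, 0.9, -0.4, 1.5])   # [1, 1, -1, 1]
reference = sum(x * y for x, y in zip(a, w))
fast = binary_dot(pack_bits(a), pack_bits(w), len(a))
```

Each packed value occupies one bit rather than a 32-bit float, and the multiply-accumulate loop collapses into one XNOR and one popcount per machine word.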
We redesign the General Building block of the ReActNet (GBR) to raise accuracy on the CIFAR100 image classification dataset, the PASCAL VOC 07+12 object detection dataset, and the KITTI vision benchmark suite, at a lower memory footprint and lower computation cost. The GBR comprises a single Down-sampling Block (DB) and a plurality of Common Blocks (CB). First, we eliminate all the 1x1 Binary Convolutional (BConv) layers of the CBs to reduce the weight parameters as well as the network size. Second, the duplicate 1x1 BConv of the DB is replaced by Efficient Channel Attention (ECA) to enrich the representation capacity. Third, a Batch Normalization (BN) unit is added right after the concatenator of the DB to make the data distribution more suitable for optimization. Finally, the shortcut connection is moved to after the RPReLU activation unit so as to balance information preservation from the shortcut path and information transformation from the residual path. Our experiments show that the enhanced network (ERCNet) delivers 2.39% higher Top-1 accuracy on CIFAR100 than the original ReActNet at around 10% lower memory and 8% fewer FLOPs. It reaches 81.8% mAP50 under the YOLOv8 framework on the PASCAL VOC 07+12 dataset, surpassing ReActNet by 0.8%. Furthermore, on the KITTI dataset, ERCNet clearly outperforms all models of the official YOLOv8 backbone, reaching 94.8% mAP50, which exceeds YOLOv8-L and YOLOv8-N by 1.9% and 11.2%, respectively. On the other hand, we also find that ERCNet performs slightly worse than the default YOLOv8 backbone when both are compared on PASCAL VOC 07+12.
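The effect of relocating the shortcut can be seen in a scalar sketch. It uses the RPReLU definition from the ReActNet paper (shift by `gamma`, PReLU with slope `beta`, shift by `zeta`) and assumes the original ReActNet block adds the shortcut before RPReLU. The `residual_fn` argument is a hypothetical stand-in for the binary convolution and BatchNorm path, and the scalar parameters replace the per-channel learnable ones used in a real network.

```python
def rprelu(x, gamma=0.0, zeta=0.0, beta=0.25):
    """RPReLU: f(x) = (x - gamma) + zeta if x > gamma, else beta * (x - gamma) + zeta."""
    shifted = x - gamma
    return (shifted if shifted > 0 else beta * shifted) + zeta


def shortcut_before_activation(x, residual_fn, **act):
    """Assumed original ordering: add the shortcut first, then apply RPReLU."""
    return rprelu(residual_fn(x) + x, **act)


def shortcut_after_activation(x, residual_fn, **act):
    """Ordering described in the abstract: RPReLU on the residual path, then add the shortcut."""
    return rprelu(residual_fn(x), **act) + x


# Usage with a constant stand-in for the BConv + BN path.
residual = lambda v: 1.0
before = shortcut_before_activation(-2.0, residual)  # shortcut value is reshaped by RPReLU
after = shortcut_after_activation(-2.0, residual)    # shortcut value passes through unchanged
```

In the second ordering the input `x` reaches the block output without being scaled by the negative slope, which matches the stated goal of preserving the shortcut information.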
Our experiments indicate that ERCNet outperforms real-valued CNNs on some particular datasets such as KITTI, at a lower memory and computation cost. As such, ERCNet makes BNNs more suitable for dataset-specific applications on lightweight devices.

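Keywords (translated from Chinese)

★ Binarized convolutional neural network
★ Efficient channel attention mechanism
★ Image recognition
★ Object detection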

Keywords (English)

★ Binary neural network
★ Efficient channel attention
★ Classification
★ Object detection

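Table of Contents

Abstract (Chinese)
Abstract
Acknowledgments
Table of Contents
List of Figures
List of Tables
Chapter 1. Introduction
1.1 Research Background
1.2 Research Objectives
1.3 Thesis Organization
Chapter 2. Literature Review
2.1 Binarized Convolutional Neural Networks
2.1.1 Naive Binary Neural Network
2.1.2 The XNOR-Net Binarized Convolutional Neural Network Model
2.1.3 The Bi-RealNet Binarized Convolutional Neural Network Model
2.1.4 The ReActNet Binarized Convolutional Neural Network Model
2.1.5 The BNext Binarized Convolutional Neural Network Model
2.1.6 SE Channel Attention
2.1.7 Efficient Channel Attention for Deep Convolutional Neural Networks
2.1.8 YOLOv8
Chapter 3. The ERCNet Neural Network
3.1 Overview of the ERCNet Architecture
3.2 Network Simplification and Parameter Optimization
3.3 Benefits of ECA and Performance Improvement
3.4 Adding Batch Normalization at Key Positions
3.5 Moving the Residual Shortcut Connection to After the Activation Function
3.6 Performance Analysis of Binarized Neural Networks and the ERCNet Architecture
Chapter 4. Image Task Experiments with the ERCNet Binary Neural Network
4.1 Image Classification Experiments
4.1.1 Introduction to the Image Recognition Dataset
4.1.2 Image Recognition Results
4.2 Object Detection Experiments
4.2.1 Introduction to the Object Detection Datasets
4.2.2 KITTI Object Detection Results
4.2.3 PASCAL VOC Object Detection Results
Chapter 5. Conclusions and Future Work
5.1 Conclusions
5.2 Future Work
References
Appendix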

Advisor

Ching-Han Chen (陳慶瀚)

Approval Date

2024-6-12
