Thesis 111527005: Detailed Record




Name: Jr-Yu Chiou (邱之宇)   Graduate Program: International Master Program in Artificial Intelligence
Thesis Title: TinyissimoYOLOv5-P4-DA: A Depth Pruning, Auxiliary Network, and Quantization-Based Object Detection Model
Related Theses
★ An Intelligent Controller Development Platform Integrating a GRAFCET Virtual Machine
★ Design and Implementation of a Distributed Industrial Electronic Kanban Network System
★ Design and Implementation of a Dual-Touch Screen Based on a Dual-Camera Vision System
★ An Embedded Computing Platform for Intelligent Robots
★ An Embedded System for Real-Time Moving Object Detection and Tracking
★ A Multiprocessor Architecture and Distributed Control Algorithm for Solid-State Drives
★ A Human-Machine Interaction System Based on Stereo-Vision Gesture Recognition
★ Robot System-on-Chip Design Integrating Bio-Inspired Intelligent Behavior Control
★ Design and Implementation of an Embedded Wireless Image Sensor Network
★ A License Plate Recognition System Based on a Dual-Core Processor
★ Continuous 3D Gesture Recognition Based on Stereo Vision
★ Design and Hardware Implementation of a Miniature, Ultra-Low-Power Wireless Sensor Network Controller
★ Real-Time Face Detection, Tracking, and Recognition on Streaming Video: An Embedded System Design
★ Embedded Hardware Design for a Fast Stereo Vision System
★ Design and Implementation of a Real-Time Image Stitching System
★ An Embedded Gait Recognition System Based on a Dual-Core Platform
Files: Full text available in the system after 2029-7-22.
Abstract (Chinese): Object detection is widely used in computer vision, but its heavy computational demands usually rely on powerful hardware, posing a major challenge for resource-constrained microcontrollers. Building on YOLOv5 and TinyissimoYOLO, this study proposes an improved TYv5-P4 model and, through depth pruning, an auxiliary network, and quantization, reduces the model size to 334 KiB; the resulting model is named TYv5-P4-DA. Even with low-resolution inputs, the model maintains relatively high accuracy.
The innovation of TYv5-P4-DA lies in its backbone, which retains only three C3 layers and uses a single output. This design not only improves accuracy at lower input resolutions but also effectively reduces model size. In addition, whereas the mAP of TinyissimoYOLO drops as input size increases, the mAP of TYv5-P4-DA rises with input size. The model is trained on high-resolution images and performs inference on low-resolution images, effectively improving object detection accuracy.
This result opens new possibilities for low-power, low-cost TinyML applications and has broad practical value. Future work will focus on further optimizing the model, improving accuracy and inference speed to meet the needs of more real-world application scenarios.
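To make the depth-pruning idea concrete, below is a minimal, hypothetical PyTorch sketch of depth pruning with an auxiliary network: the backbone is truncated after a few stages, and a small auxiliary head is attached at the cut point to produce the detection output directly. The stage sizes, channel counts, and names here are illustrative stand-ins, not the actual TYv5-P4-DA architecture described in the thesis.

```python
import torch
import torch.nn as nn

# Hypothetical backbone: five downsampling stages standing in for the
# YOLOv5 backbone (the real model is built from CBS and C3 blocks).
def stage(c_in, c_out):
    return nn.Sequential(nn.Conv2d(c_in, c_out, 3, 2, 1),
                         nn.BatchNorm2d(c_out), nn.SiLU())

full_backbone = nn.Sequential(
    stage(3, 16), stage(16, 32), stage(32, 64),
    stage(64, 128), stage(128, 256),
)

# Depth pruning: keep only the first three stages of the backbone.
pruned_backbone = nn.Sequential(*list(full_backbone.children())[:3])

# Auxiliary network: a small head attached at the truncation point that
# learns to produce the detection output the removed layers once fed.
num_outputs = 5 + 20  # box (4) + objectness (1) + 20 classes (e.g. VOC)
aux_head = nn.Sequential(
    nn.Conv2d(64, 64, 3, 1, 1), nn.BatchNorm2d(64), nn.SiLU(),
    nn.Conv2d(64, num_outputs, 1),
)

model = nn.Sequential(pruned_backbone, aux_head)
x = torch.randn(1, 3, 96, 96)
print(model(x).shape)  # torch.Size([1, 25, 12, 12]): one prediction grid
```

In depth pruning with auxiliary networks as generally practiced, the truncated network inherits pretrained weights and the auxiliary head is trained to recover the lost accuracy; the training schedule actually used here is detailed in Section 3.4 of the thesis.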
Abstract (English): Object detection technology is extensively applied in the field of computer vision, yet its high computational requirements typically depend on robust hardware support, posing a significant challenge for resource-constrained microcontrollers. This research introduces an improved TYv5-P4 model based on YOLOv5 and TinyissimoYOLO. Through depth pruning, an auxiliary network, and quantization, the model size is successfully reduced to 334 KiB, and the result is named TYv5-P4-DA. This model maintains relatively high accuracy even with low-resolution inputs.
The innovation of TYv5-P4-DA lies in its backbone, which retains only three C3 layers and uses only P4 as the output. This method not only enhances accuracy with lower-resolution inputs but also effectively reduces the model size. Furthermore, unlike TinyissimoYOLO, whose mAP declines as input size increases, TYv5-P4-DA's mAP improves with larger input sizes. The model is trained with high-resolution images and performs inference with low-resolution images, significantly enhancing object detection accuracy.
This accomplishment provides new possibilities for low-power, low-cost TinyML applications and possesses broad practical value. Future work will focus on further optimizing model performance, enhancing accuracy, and speeding up inference to meet the demands of more practical application scenarios.
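As a companion sketch, the following shows how a trained model might be converted to a fully integer-quantized TensorFlow Lite flatbuffer, the step that yields a KiB-scale model deployable on microcontrollers. The saved-model path, input resolution, and calibration data are placeholders; the conversion settings actually used are described in Section 3.5 of the thesis.

```python
import numpy as np
import tensorflow as tf

# Calibration data for full-integer quantization: in practice this should
# iterate over a few hundred real, preprocessed training images.
def representative_dataset():
    for _ in range(100):
        # Placeholder input resolution; the real value depends on the
        # training configuration.
        yield [np.random.rand(1, 96, 96, 3).astype(np.float32)]

# "saved_model_dir" is a placeholder for the exported model directory.
converter = tf.lite.TFLiteConverter.from_saved_model("saved_model_dir")
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
# Restrict to int8 ops so the model can run on MCUs without an FPU.
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8

tflite_model = converter.convert()
with open("model_int8.tflite", "wb") as f:
    f.write(tflite_model)
print(f"Quantized model size: {len(tflite_model) / 1024:.1f} KiB")
```

With 8-bit weights and activations, the flatbuffer is roughly a quarter the size of the float32 model, consistent with the sub-MiB footprint reported in the abstract.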
Keywords (Chinese) ★ Depth Pruning (深度剪枝)
★ Auxiliary Network (輔助網路)
★ Quantization (量化)
★ Object Detection (物件偵測)
Keywords (English) ★ Depth Pruning
★ Auxiliary Network
★ Quantization
★ Object Detection
★ TinyML
★ TFLITE
Table of Contents
Chinese Abstract (摘要) i
Abstract ii
Acknowledgments iii
Table of Contents iv
List of Figures vi
List of Tables viii
Chapter 1 Introduction 1
1.1 Research Background 1
1.2 Thesis Structure 3
Chapter 2 Related Work 4
2.1 TinyML 4
2.1.1 TensorFlow Lite (TFLITE) 5
2.1.2 Quantization 6
2.2 TinyissimoYOLO 9
2.2.1 Benchmark 9
2.2.2 Advantages and Disadvantages of TinyissimoYOLO 9
2.3 Depth Pruning with Auxiliary Networks 11
2.3.1 Unstructured Pruning and Structured Pruning 12
2.3.2 Depth Pruning with Auxiliary Networks 13
2.4 YOLOv5 14
2.4.1 Convolution Module - ConvBNSiLU - CBS 19
2.4.2 CSPNet 19
2.4.3 SPPF (Spatial Pyramid Pooling Fast) 20
2.4.4 Mosaic Augmentation 22
2.4.5 CIoU Loss 22
2.4.6 Focal Loss 28
2.4.7 NMS 30
2.4.8 mAP 32
Chapter 3 TinyissimoYOLOv5-P4-DA 39
3.1 TinyissimoYOLOv5-P4-DA (TYv5-P4-DA) 39
3.2 Pretrain TYv5-P4 41
3.3 Transfer Learning 50
3.4 Depth Pruning and Auxiliary Network Learning 51
3.5 Quantization by Converting to TF Lite 55
Chapter 4 Experiments 56
4.1 Experimental Environment 56
4.2 Benchmark 56
4.3 Depth Pruning and Auxiliary Networks 57
4.4 Training with Large Images, Inferring with Small Images 59
4.5 Creating Pretrained Weights Using COCO 60
4.6 Comparison of Inference Results with Different Image Sizes 64
Chapter 5 Conclusion 66
Chapter 6 References 67
Advisor: Ching-Han Chen (陳慶瀚)   Review Date: 2024-7-23
