Master's/Doctoral Thesis 109522111: Detailed Record




Name: Chung-Hsuan Yang (楊仲軒)    Department: Computer Science and Information Engineering
Thesis Title: Vehicle Violation Parking Recognition Based on Object Detection (基於物件偵測之車輛違停辨識)
Files: full text available in the thesis system after 2027-06-30.
Abstract (Chinese): Improving traffic disorder and creating a safe driving environment has long been a goal pursued by every country. In the past, without the help of any technological methods, traffic violations were reported either by police officers on the spot or by citizens who submitted video recordings of the incident to the police. Whether enforcement happened on site or after the fact, a large amount of manpower was needed for review. With the rapid development of artificial intelligence in recent years, technology-based law enforcement has emerged. It aims to use AI to report violating vehicles directly, which greatly reduces labour costs and enables 24-hour monitoring, so that drivers cannot count on escaping notice, thereby improving traffic conditions.
The most straightforward approach in current technology-based enforcement is to first detect vehicles with an object detection model and then write a violation algorithm for each scene and each type of violation. This is the fastest way to put technology-based enforcement into practice, but its drawback is that the violations to be monitored must be decided separately for each scene, and a different algorithm must be written for each violation type, which greatly reduces the efficiency of deploying such enforcement equipment. To handle different scenes and different violation types with a single recognition method, we want a deep learning model to learn the various traffic violations directly from images of those violations.
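To make the conventional approach concrete, the sketch below shows a minimal rule-based check of the kind described above: an object detector supplies vehicle boxes, and a hand-written rule decides whether a box constitutes an illegal-parking violation. The names (Detection, inside_zone, find_violations) and the 180-second threshold are hypothetical illustrations, not code from the thesis.

```python
# Minimal sketch of the conventional rule-based pipeline: detect vehicles,
# then apply a hand-written per-scene rule. All names and thresholds are
# hypothetical placeholders.
from dataclasses import dataclass

@dataclass
class Detection:
    x1: float               # bounding box corners from the object detector
    y1: float
    x2: float
    y2: float
    track_id: int           # identity assigned by a tracker across frames
    stopped_seconds: float  # how long this vehicle has been stationary

Zone = tuple[float, float, float, float]  # hand-drawn no-parking region (x1, y1, x2, y2)

def inside_zone(det: Detection, zone: Zone) -> bool:
    """True if the detection's box centre lies inside the no-parking zone."""
    cx, cy = (det.x1 + det.x2) / 2, (det.y1 + det.y2) / 2
    zx1, zy1, zx2, zy2 = zone
    return zx1 <= cx <= zx2 and zy1 <= cy <= zy2

def find_violations(detections: list[Detection], zone: Zone,
                    min_stop_seconds: float = 180.0) -> list[Detection]:
    """Hand-written rule: a vehicle parked inside the zone long enough is flagged."""
    return [d for d in detections
            if inside_zone(d, zone) and d.stopped_seconds >= min_stop_seconds]
```

Every new scene or new violation type requires another rule of this kind, which is exactly the deployment cost that the learning-based approach described above is meant to avoid.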
Current state-of-the-art (SOTA) object detection models already perform very well on general object detection; one-stage detectors such as YOLOv4 combine good accuracy with high speed. This thesis therefore adopts a YOLO-like architecture to recognise parking violations directly. We collected and annotated our own vehicle traffic violation dataset, so that the model itself can recognise illegal parking and the need to write a separate violation algorithm for each violation type is removed. As in current mainstream designs, the network takes feature maps at three scales from the backbone and passes them through the RFSM, STEM, and FFAM modules, which strengthen the model's feature extraction and feature fusion abilities and thereby improve its recognition performance.
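As a purely structural illustration of the paragraph above, the PyTorch sketch below shows where modules of this kind could sit in a YOLO-like head that receives three feature-map scales from the backbone. The RFSM, STEM, and FFAM blocks are represented here by identity placeholders, since their actual designs are described in Chapter 3 of the thesis; everything in this sketch is an assumption, not the author's implementation.

```python
# Hedged structural sketch (PyTorch) of a YOLO-like head taking three
# feature-map scales from the backbone. RFSM, STEM and FFAM are replaced
# by identity placeholders; the real module designs are in Chapter 3.
import torch
import torch.nn as nn

class PlaceholderModule(nn.Module):
    """Stand-in for RFSM / STEM / FFAM; passes features through unchanged."""
    def forward(self, x):
        return x

class MultiScaleHead(nn.Module):
    def __init__(self, channels=(256, 512, 1024), outputs_per_scale=255):
        super().__init__()
        self.refine = nn.ModuleList(PlaceholderModule() for _ in channels)  # e.g. RFSM, STEM
        self.fuse = PlaceholderModule()                                     # e.g. FFAM
        self.heads = nn.ModuleList(nn.Conv2d(c, outputs_per_scale, 1) for c in channels)

    def forward(self, feats):
        # feats: three feature maps (e.g. strides 8, 16, 32) from the backbone
        feats = [m(f) for m, f in zip(self.refine, feats)]   # per-scale refinement
        feats = [self.fuse(f) for f in feats]                # cross-scale fusion would go here
        return [head(f) for head, f in zip(self.heads, feats)]

# usage with dummy backbone outputs
p3, p4, p5 = (torch.randn(1, c, s, s) for c, s in ((256, 52), (512, 26), (1024, 13)))
predictions = MultiScaleHead()([p3, p4, p5])
```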
We also use Grad-CAM to visualise the model's attention. Our hope is that the model learns the traffic rules defined by humans, that is, learns how a human judges whether a violation has occurred, rather than merely maximising its score through superficial commonalities in the images. If this can be achieved, adding more violation types in the future becomes far more feasible.
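As an illustration of the Grad-CAM procedure referred to above, the following PyTorch sketch computes an attention heat map for a generic pretrained ResNet-18 classifier rather than the thesis's own detector; the model, layer choice, and random input are illustrative assumptions only.

```python
# Minimal Grad-CAM sketch (PyTorch): weight the last convolutional feature
# maps by the globally averaged gradients of the class score, then ReLU,
# upsample, and normalise to obtain a heat map.
import torch
import torch.nn.functional as F
from torchvision import models

model = models.resnet18(weights="IMAGENET1K_V1").eval()
target_layer = model.layer4          # last convolutional stage
acts, grads = {}, {}

target_layer.register_forward_hook(lambda m, i, o: acts.update(v=o))
target_layer.register_full_backward_hook(lambda m, gi, go: grads.update(v=go[0]))

x = torch.randn(1, 3, 224, 224)      # stand-in for a preprocessed input image
score = model(x)[0].max()            # score of the top predicted class
model.zero_grad()
score.backward()

weights = grads["v"].mean(dim=(2, 3), keepdim=True)           # pooled gradients per channel
cam = F.relu((weights * acts["v"]).sum(dim=1, keepdim=True))   # weighted sum of activations
cam = F.interpolate(cam, size=x.shape[2:], mode="bilinear")    # upsample to input resolution
cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)       # normalise to [0, 1] heat map
```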
Abstract (English): Improving traffic chaos and creating a safe driving environment has always been the goal of all countries. In the past, without the help of any technological methods, traffic violations were reported by police officers on the spot or by providing video to the police for prosecution; whether on the spot or afterwards, a lot of human resources were needed. Because of the rapid development of artificial intelligence in recent years, technology-based law enforcement has been born. It is hoped that, with the help of AI to report traffic violations directly, manpower expenses can be significantly reduced and 24-hour monitoring can be achieved.
The most common technology enforcement approach is to use an object detection model to detect the vehicle first, and then write a violation algorithm based on different fields and different violations. This is the fastest way to implement technology enforcement, but the disadvantage is that we need to decide the violation rules to be monitored according to different scenes and write a violation algorithm for each rule. To solve the problem of identifying different violations in different scenes, we hope to use deep learning models to learn different traffic violations directly from images of those violations.
The one-stage object detection model YOLOv4 already achieves both good detection quality and high speed. Therefore, this thesis also adopts a YOLO-like architecture to identify vehicle parking violations directly. We enhance the feature extraction and feature fusion capabilities of the model through the RFSM, STEM, and FFAM modules, so as to improve the model's recognition capability.
We also use Grad-CAM to visualize the attention of the model. What we hope is that the model can learn the traffic rules made by humans, such as learning how humans judge whether a violation is valid or not.
Keywords ★ Object Detection
★ Technology Enforcement
★ YOLO
★ Grad-CAM
Table of Contents
Abstract I
ABSTRACT II
Table of Contents III
List of Figures V
List of Tables VI
Chapter 1 Introduction 1
1.1 Research Background and Motivation 1
1.2 Research Objectives 3
1.3 Research Methods 4
Chapter 2 Literature Review 5
2.1 Anomaly Detection 5
2.2 Object Detection 6
2.2.1 Object Detection 6
2.2.2 YOLOv4 8
2.3 Selective Kernel Network 9
2.4 SmallBigNet 10
2.5 AFF (Attentional Feature Fusion) 11
2.6 Class Activation Map 12
Chapter 3 Research Methods 14
3.1 Model Architecture 14
3.1.1 Overall Architecture 14
3.1.2 RFSM (Receptive-Field Selective Module) 16
3.1.3 STEM (Spatial-Temporal Enhance Module) 17
3.1.4 FFAM (Feature Fusion Attention Module) 18
3.2 Experimental Procedure 20
3.2.1 Multi-Label Training 20
3.2.2 Input 20
3.2.3 Multi-Frame Feature Fusion 21
3.2.4 Data Augmentation 22
3.3 Dataset 23
3.3.1 Data Collection and Annotation 23
3.3.2 Dataset Overview 24
Chapter 4 Experimental Results 25
4.1 Evaluation Metrics 25
4.2 Ablation Study 27
4.3 Model Attention Visualization 28
4.4 Module Attention Visualization 31
4.4.1 RFSM 31
4.4.2 STEM 31
4.4.3 FFAM 32
Chapter 5 Conclusion and Future Work 33
5.1 Conclusion 33
5.2 Future Work 34
References 35
Appendix 37
References
[1] A. Aboah, "A vision-based system for traffic anomaly detection using deep learning and decision trees," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), 2021, pp. 4207–4212.
[2] Y. Dai, F. Gieseke, S. Oehmcke, Y. Wu, and K. Barnard, "Attentional feature fusion," in Proc. IEEE Winter Conf. Appl. Comput. Vis. (WACV), Jan. 2021, pp. 3560–3569.
[3] X. Li, Y. Wang, Z. Zhou, and Y. Qiao, "SmallBigNet: Integrating core and contextual views for video classification," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), 2020, pp. 1092–1101.
[4] X. Li, W. Wang, X. Hu, and J. Yang, "Selective kernel networks," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), 2019, pp. 510–519.
[5] R. R. Selvaraju, M. Cogswell, A. Das, R. Vedantam, D. Parikh, and D. Batra, "Grad-CAM: Visual explanations from deep networks via gradient-based localization," 2016, arXiv:1610.02391. [Online]. Available: https://arxiv.org/abs/1610.02391
[6] J. Redmon and A. Farhadi, "YOLOv3: An incremental improvement," 2018, arXiv:1804.02767. [Online]. Available: https://arxiv.org/abs/1804.02767
[7] A. Bochkovskiy, C.-Y. Wang, and H.-Y. Mark Liao, "YOLOv4: Optimal speed and accuracy of object detection," 2020, arXiv:2004.10934. [Online]. Available: http://arxiv.org/abs/2004.10934
[8] W. Liu et al., "SSD: Single shot multibox detector," in Proc. ECCV, 2016, pp. 21–37.
[9] R. Girshick, J. Donahue, T. Darrell, and J. Malik, "Rich feature hierarchies for accurate object detection and semantic segmentation," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2014, pp. 580–587.
[10] R. Girshick, "Fast R-CNN," in Proc. ICCV, 2015, pp. 1440–1448.
[11] S. Ren et al., "Faster R-CNN: Towards real-time object detection with region proposal networks," in Proc. NIPS, 2015, pp. 91–99.
[12] T.-Y. Lin et al., "Feature pyramid networks for object detection," in Proc. CVPR, 2017, pp. 936–944.
[13] J. Hu, L. Shen, and G. Sun, "Squeeze-and-excitation networks," 2017, arXiv:1709.01507.
[14] S. Liu, L. Qi, H. Qin, J. Shi, and J. Jia, "Path aggregation network for instance segmentation," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2018, pp. 8759–8768.
[15] B. Zhou, A. Khosla, A. Lapedriza, A. Oliva, and A. Torralba, "Learning deep features for discriminative localization," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), 2016, pp. 2921–2929.
[16] S. Woo, J. Park, J.-Y. Lee, and I. S. Kweon, "CBAM: Convolutional block attention module," in Proc. Eur. Conf. Comput. Vis. (ECCV), Munich, Germany, Sep. 2018.
[17] H. Zhang et al., "ResNeSt: Split-attention networks," 2020, arXiv:2004.08955.
[18] M. Sandler, A. Howard, M. Zhu, A. Zhmoginov, and L.-C. Chen, "MobileNetV2: Inverted residuals and linear bottlenecks," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2018, pp. 4510–4520.
[19] M. Tan and Q. Le, "EfficientNet: Rethinking model scaling for convolutional neural networks," in Proc. 36th Int. Conf. Mach. Learn., 2019, pp. 6105–6114.
[20] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," 2015, arXiv:1512.03385.
Advisors: Kuo-Chin Fan, Chi-Hung Chuan (范國清, 莊啟宏)    Approval Date: 2022-07-18