

    Please use this permanent URL to cite or link to this item: https://ir.lib.ncu.edu.tw/handle/987654321/99438


    Title: Dual-Stream IR-RGB Feature Fusion Object Detection Network for Pedestrian and Vehicle Detection (應用於行人以及車輛檢測之雙流 IR-RGB 特徵融合物件偵測網路)
    Author: Wei, Bo-En (魏伯恩)
    Contributors: Department of Electrical Engineering
    Keywords: object detection; feature fusion; two-stream network
    Date: 2026-01-14
    Upload time: 2026-03-06 18:59:50 (UTC+8)
    Publisher: National Central University
    Abstract: Recent advances in object detection and image processing, particularly the rapid progress of the YOLO family, have driven significant breakthroughs in computer vision. These models can accurately localize and classify objects across diverse and complex scenes and have been widely deployed in everyday applications. However, detection in low-light environments remains challenging. A common approach to improving accuracy is to scale up model size, but this increases parameter count and computation, making deployment on edge devices such as intersection cameras more difficult. Consequently, many studies have shifted toward feature-fusion-centric methods to boost detection performance under low-light conditions.

    This thesis proposes YOLO-E2FS, an end-to-end, efficient pedestrian and vehicle detection model designed to fuse dual-stream infrared (IR) and visible-light (RGB) image information to enhance detection performance. YOLO-E2FS adopts a dual-stream architecture composed of two structurally identical, lightweight YOLOv10n backbones that independently extract RGB and IR features. These features are combined across modalities through an improved feature fusion strategy and a dual-head detection mechanism.

    Benefiting from the end-to-end design of YOLOv10 and its NMS-free inference pipeline, YOLO-E2FS maintains stable and accurate detection even in challenging low-light scenarios. Moreover, thanks to its lightweight design, YOLO-E2FS substantially reduces the number of parameters compared with existing dual-stream models. Experimental results show that YOLO-E2FS achieves higher accuracy and faster speed while remaining lightweight. In terms of inference speed, YOLO-E2FS processes a single image in 4.6 ms on an NVIDIA GeForce RTX 4080 SUPER GPU and 37.8 ms on an AMD Ryzen 9 9900X CPU, meeting real-time application requirements.
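    The cross-modal fusion step described above can be illustrated with a minimal sketch: each backbone produces a per-modality feature map, the two maps are concatenated along the channel axis, and a learned 1x1 projection mixes them into a fused representation. This is a toy NumPy illustration of the general channel-concatenation-plus-projection pattern, not the thesis's actual fusion module; the function name `fuse_features` and all shapes are assumptions for the example.

    ```python
    import numpy as np

    def fuse_features(rgb_feat: np.ndarray, ir_feat: np.ndarray,
                      w: np.ndarray) -> np.ndarray:
        """Fuse two per-modality feature maps with a 1x1 projection.

        rgb_feat, ir_feat: (C, H, W) feature maps from the two backbones.
        w: (C_out, 2C) weights standing in for a learned 1x1 convolution.
        """
        # Channel-wise concatenation of the two modality streams: (2C, H, W).
        stacked = np.concatenate([rgb_feat, ir_feat], axis=0)
        c2, h, wd = stacked.shape
        # A 1x1 convolution is a matrix multiply over the channel axis.
        fused = w @ stacked.reshape(c2, h * wd)
        return fused.reshape(-1, h, wd)

    # Toy example: C=4 channels per stream on an 8x8 spatial grid.
    rng = np.random.default_rng(0)
    rgb = rng.standard_normal((4, 8, 8))
    ir = rng.standard_normal((4, 8, 8))
    weights = rng.standard_normal((4, 8))  # project 2C=8 -> 4 channels
    out = fuse_features(rgb, ir, weights)
    print(out.shape)  # (4, 8, 8)
    ```

    In a real network the projection weights would be trained jointly with both backbones, and fusion would typically occur at several pyramid levels rather than once.
    
    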
    Appears in Collections: [Graduate Institute of Electrical Engineering] Electronic Theses and Dissertations

    Files in This Item: index.html (HTML, 0 KB, 17 views)

    All items in NCUIR are protected by copyright, with all rights reserved.
