Multi-Scale Grid-based Semantic Surface Point Generation for 3D Object Detection

Please use this permanent URL to cite or link to this document: https://ir.lib.ncu.edu.tw/handle/987654321/98389

Title:	Multi-Scale Grid-based Semantic Surface Point Generation for 3D Object Detection
Author:	Chen, Xin-Fu (陳信富)
Contributor:	Department of Computer Science and Information Engineering
Keywords:	Object Detection; Multi-scale Grid Attention; Point Generation
Date:	2025-07-28
Uploaded:	2025-10-17 12:43:20 (UTC+8)
Publisher:	National Central University
Abstract:	3D object detection is a key technology in fields such as autonomous driving and robotics. Point clouds are the most direct representation of 3D space, so how their features are extracted and represented is critical. In real-world applications, however, occlusion often leaves an object's point cloud incomplete, which degrades detection performance. Existing methods such as PG-RCNN generate semantic surface points within each Region of Interest (RoI) using a single grid size, but a single grid scale cannot adequately capture an object's features. A grid that is too small may miss fine structures, which is especially problematic for small or sparsely sampled objects, while a grid that is too large mixes in excessive background noise and makes the feature representation less precise. To address this, we design an improved PG-RCNN architecture whose core contribution is a Multi-scale Grid Attention Module. The module strengthens feature representation during point generation and dynamically weights and integrates features from different grid scales: attention weights built with a simple linear transformation guide the model to focus on regions that contribute more to object recognition while filtering out redundant noise.
We evaluate our method on the KITTI 3D object detection validation set. Compared with the original PG-RCNN, our approach improves performance on the Cyclist category by 2.66% and 2.54% in the Moderate and Hard settings, respectively. Our approach is also more stable on small-object detection, with an average improvement of 2.57%. These results validate the positive impact of the Multi-scale Grid Attention Module on fine-grained geometric modeling and highlight the efficiency and generalizability of our model.
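The abstract states only that features from several grid scales are weighted by attention computed with a simple linear transformation. The snippet below is a minimal, dependency-free sketch of that idea under stated assumptions; it is not the thesis's implementation. It assumes axis-aligned RoI boxes (heading ignored), one shared linear scoring vector per scale followed by a softmax across scales, and hypothetical grid sizes of 3, 6 and 9. The names `roi_grid_points` and `fuse_multi_scale`, and all feature values, are illustrative.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def roi_grid_points(center: Point, size: Point, grid_size: int) -> List[Point]:
    """Hypothetical helper: centers of a grid_size^3 lattice inside an
    axis-aligned RoI box (assumption: box heading is ignored for brevity)."""
    pts = []
    for i in range(grid_size):
        for j in range(grid_size):
            for k in range(grid_size):
                # Cell-center offset as a fraction of the box extent, in (-0.5, 0.5).
                frac = [(idx + 0.5) / grid_size - 0.5 for idx in (i, j, k)]
                pts.append(tuple(c + f * s for c, f, s in zip(center, frac, size)))
    return pts


def softmax(scores: Sequence[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_multi_scale(scale_feats: Sequence[Sequence[float]],
                     weight: Sequence[float],
                     bias: float) -> Tuple[List[float], List[float]]:
    """Assumed attention scheme: score each scale's pooled feature with one
    shared linear map, softmax the scores across scales, and return the
    attention-weighted sum together with the per-scale weights."""
    scores = [sum(w * f for w, f in zip(weight, feat)) + bias for feat in scale_feats]
    alphas = softmax(scores)
    dim = len(scale_feats[0])
    fused = [sum(a * feat[d] for a, feat in zip(alphas, scale_feats)) for d in range(dim)]
    return fused, alphas


if __name__ == "__main__":
    grid_sizes = (3, 6, 9)  # hypothetical scales, not taken from the thesis
    box_center, box_size = (0.0, 0.0, 0.0), (1.8, 0.6, 1.7)  # illustrative box
    for g in grid_sizes:
        # Each scale yields g**3 grid points: 27, 216, 729.
        print(g, len(roi_grid_points(box_center, box_size, g)))
    # Toy pooled features, one 4-d vector per grid scale (made-up numbers).
    feats = [[0.2, 0.1, 0.0, 0.5], [0.9, 0.4, 0.3, 0.1], [0.1, 0.8, 0.2, 0.0]]
    fused, alphas = fuse_multi_scale(feats, weight=[1.0, 0.5, -0.2, 0.3], bias=0.0)
    print([round(a, 3) for a in alphas], [round(x, 3) for x in fused])
```

Because the softmax makes the weights sum to 1, the fused vector is a convex combination of the per-scale features, so the scale with the highest linear score dominates. A per-channel weighting (a vector of scores per scale) would be an equally plausible reading of "simple linear transformation".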
Appears in Collections:	[Graduate Institute of Computer Science and Information Engineering] Theses and Dissertations

All items in NCUIR are protected by their original copyright.