Fed-HMF: A Federated Learning Framework with Hierarchical Multi-scale Fusion for Multi-modal Vehicular Recognition

Please cite or link to this document using its permanent URL: https://ir.lib.ncu.edu.tw/handle/987654321/98174

Title:	Fed-HMF: A Federated Learning Framework with Hierarchical Multi-scale Fusion for Multi-modal Vehicular Recognition
Author:	Wu, Xuan-Yu (吳軒宇)
Contributor:	Department of Communication Engineering
Keywords:	Multi-modal Fusion; Camera Image; LiDAR Point Cloud; Non-IID Data; Federated Learning; Object Recognition
Date:	2025-09-22
Upload time:	2025-10-17 12:27:17 (UTC+8)
Publisher:	National Central University
Abstract:	In autonomous driving, vehicles must accurately perceive multiple targets in real time, but centralizing the vast amount of sensor data collected by many cars raises privacy and security concerns. This work proposes a privacy-preserving multi-modal 3D object classification framework that fuses camera images and LiDAR point clouds and is trained via Federated Learning (FL). The image branch uses Multiscale Vision Transformers (MViTv2) as its backbone, enhanced with a token selection mechanism for efficiency and a Bi-directional Feature Pyramid Network (BiFPN) for multi-scale feature fusion. The LiDAR branch, built on PointNet, extracts 3D geometric features from point clouds. The fused features are fed to a classifier head that outputs object category predictions. The model is trained with the Federated Averaging (FedAvg) algorithm across multiple vehicles, so raw sensor data never leaves the vehicle, which addresses data privacy. Extensive experiments show that combining modalities yields higher accuracy than single-modality models, and that our architectural innovations improve performance while reducing computational complexity. We analyze the trade-offs of federated versus centralized training and how different data distributions (IID vs. non-IID) affect convergence.
Notably, we observe an unexpected benefit under moderately skewed non-IID data: the global model outperforms the IID case due to a regularization-like "specialization" effect. We also highlight the importance of evaluating with multiple metrics (accuracy and macro F1) in imbalanced scenarios, as accuracy alone can be misleading. This work provides a comprehensive solution for multi-target perception that is both effective and privacy-preserving, laying the groundwork for future research on fair and robust federated autonomous driving models.
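The abstract states only that the LiDAR branch is built on PointNet. The central PointNet idea is a per-point network whose weights are shared across all points, followed by a symmetric max-pooling step, so the resulting global feature does not depend on the order in which points are listed. The stripped-down sketch below illustrates only that property, assuming a single shared linear layer with ReLU and toy random weights; it omits the alignment networks, deeper MLPs, and normalization of the full architecture and is not the thesis's implementation.

```python
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def shared_mlp(point: Vector, weights: Matrix, bias: Vector) -> Vector:
    """One linear layer + ReLU, applied with the same weights to every point."""
    return [max(0.0, sum(w * x for w, x in zip(row, point)) + b)
            for row, b in zip(weights, bias)]


def pointnet_global_feature(points: List[Vector], weights: Matrix, bias: Vector) -> Vector:
    """Max-pool each feature channel over all points -> order-independent global vector."""
    per_point = [shared_mlp(p, weights, bias) for p in points]
    return [max(channel) for channel in zip(*per_point)]


rng = random.Random(0)
# Hypothetical toy sizes: 3-D (x, y, z) input, 4-D feature, 16 points.
weights = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(4)]
bias = [rng.uniform(-0.1, 0.1) for _ in range(4)]
cloud = [[rng.uniform(-5, 5) for _ in range(3)] for _ in range(16)]

shuffled = cloud[:]
rng.shuffle(shuffled)
# Reordering the points leaves the pooled feature unchanged.
print(pointnet_global_feature(cloud, weights, bias)
      == pointnet_global_feature(shuffled, weights, bias))  # True
```

Max pooling is used here because it is symmetric in its inputs; any other symmetric aggregator (sum, mean) would also give order invariance, with different behavior on outlier points.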
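In Federated Averaging, each vehicle trains locally and uploads only model parameters; the server then forms the new global model as a sample-weighted average, w = Σ_k (n_k / n) · w_k, where n_k is client k's local sample count and n is the total. The minimal sketch below shows only this server-side aggregation step, assuming (hypothetically) that parameters are plain Python lists keyed by name; local training, client selection, and communication are left out, and the parameter names are illustrative.

```python
from typing import Dict, List

# Hypothetical parameter layout: parameter name -> flat list of floats.
Params = Dict[str, List[float]]


def fedavg(client_params: List[Params], client_sizes: List[int]) -> Params:
    """Server-side FedAvg step: w = sum_k (n_k / n) * w_k."""
    if not client_params or len(client_params) != len(client_sizes):
        raise ValueError("need exactly one sample count per client model")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total sample count must be positive")

    global_params: Params = {}
    for name, first in client_params[0].items():
        averaged = [0.0] * len(first)
        for params, n_k in zip(client_params, client_sizes):
            weight = n_k / total  # clients with more local data count more
            for i, value in enumerate(params[name]):
                averaged[i] += weight * value
        global_params[name] = averaged
    return global_params


# Two hypothetical vehicles with different amounts of local data.
vehicle_a = {"classifier.weight": [1.0, 2.0], "classifier.bias": [0.0]}
vehicle_b = {"classifier.weight": [3.0, 4.0], "classifier.bias": [1.0]}
print(fedavg([vehicle_a, vehicle_b], [100, 300]))
# {'classifier.weight': [2.5, 3.5], 'classifier.bias': [0.75]}
```

Only the parameter dictionaries cross the network in this scheme; the raw camera and LiDAR samples that produced them stay on each vehicle.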
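The abstract does not say how the IID and non-IID client splits were constructed. A common way to simulate label-skewed non-IID data in FL experiments is to split each class across clients according to a Dirichlet(α) draw: a small α concentrates a class on a few clients, while a large α approaches an IID split. The sketch below implements that generic technique assuming integer class labels; `dirichlet_partition` is a hypothetical helper, and this is not necessarily the partitioning scheme used in the thesis.

```python
import random
from typing import Dict, List


def dirichlet_partition(labels: List[int], num_clients: int, alpha: float,
                        seed: int = 0) -> List[List[int]]:
    """Assign sample indices to clients using per-class Dirichlet(alpha) shares.

    Hypothetical helper: smaller alpha gives more skewed (non-IID) per-client
    label mixes; a very large alpha approaches an IID split.
    """
    rng = random.Random(seed)
    by_class: Dict[int, List[int]] = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)

    clients: List[List[int]] = [[] for _ in range(num_clients)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # A Dirichlet(alpha, ..., alpha) sample is a vector of Gamma(alpha, 1)
        # draws normalized to sum to 1.
        draws = [rng.gammavariate(alpha, 1.0) for _ in range(num_clients)]
        total = sum(draws)
        if total == 0.0:  # every draw underflowed (tiny alpha): fall back to equal shares
            draws, total = [1.0] * num_clients, float(num_clients)
        # Turn cumulative shares into cut points; the last client takes the remainder.
        start, cumulative = 0, 0.0
        for k, d in enumerate(draws):
            cumulative += d / total
            end = len(indices) if k == num_clients - 1 else round(cumulative * len(indices))
            clients[k].extend(indices[start:end])
            start = end
    return clients


labels = [0] * 60 + [1] * 30 + [2] * 10  # toy labels, not the thesis's dataset
parts = dirichlet_partition(labels, num_clients=3, alpha=0.5)
for k, idx in enumerate(parts):
    counts = {c: sum(1 for i in idx if labels[i] == c) for c in (0, 1, 2)}
    print(f"client {k}: {counts}")
```

Sweeping α (for example from well below 1 up to large values) is one way to produce the "moderately skewed" versus near-IID settings the abstract compares, under the assumption that skew is defined over labels.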
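Why accuracy alone can mislead on imbalanced data is easy to see with a toy example: a classifier that always predicts the majority class still scores high accuracy, while macro F1, which averages per-class F1 scores with equal weight, exposes that the rare classes are never recognized. The labels below are made up for illustration and are not results from the thesis; this sketch computes macro F1 over every class that appears in either the true or predicted labels.

```python
from typing import List


def accuracy(y_true: List[int], y_pred: List[int]) -> float:
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true: List[int], y_pred: List[int]) -> float:
    """Unweighted mean of per-class F1 over every class seen in y_true or y_pred."""
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn  # F1 = 2TP / (2TP + FP + FN)
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy imbalanced labels: class 0 dominates, classes 1 and 2 are rare.
y_true = [0] * 90 + [1] * 5 + [2] * 5
y_pred = [0] * 100  # degenerate model that always predicts the majority class

print(accuracy(y_true, y_pred))            # 0.9
print(round(macro_f1(y_true, y_pred), 3))  # 0.316
```

Here class 0 gets F1 = 18/19, while classes 1 and 2 get 0, so macro F1 is 6/19 even though accuracy is 90%.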
Appears in Collections:	[Graduate Institute of Communication Engineering] Theses and Dissertations

All items in NCUIR are protected by their original copyright.
