In autonomous driving, vehicles must perceive multiple targets accurately and in real time, but centralizing the large volume of sensor data from many vehicles on a single server for model training raises serious privacy and security concerns. This work proposes a privacy-preserving multi-modal 3D object classification framework that fuses camera images with LiDAR point clouds and is trained in a distributed manner via Federated Learning (FL). The image branch uses Multiscale Vision Transformers (MViTv2) as its backbone, adds a token selection mechanism for computational efficiency, and incorporates a Bi-directional Feature Pyramid Network (BiFPN) to strengthen multi-scale feature fusion. The LiDAR branch is built on PointNet and extracts 3D geometric features from point clouds. The fused features are fed to a classifier head that predicts the object category. Training uses the Federated Averaging algorithm: each vehicle trains locally, so raw sensor data never leaves the vehicle, which protects data privacy. Extensive experiments demonstrate that multi-modal fusion yields higher accuracy than single-modality models, and that the proposed architectural changes improve performance while reducing computational complexity. We analyze the performance trade-offs between federated and centralized training, and how different data distributions (IID vs. non-IID) affect model convergence.
Notably, we observe an unexpected benefit under moderately skewed non-IID data, where the global model outperforms the IID case due to a regularizing specialization effect. We also highlight the importance of evaluating with multiple metrics (accuracy and macro F1) in imbalanced scenarios, as accuracy alone can be misleading. This work provides a comprehensive solution for multi-target perception that is both effective and privacy-preserving, laying the groundwork for future research in fair and robust federated autonomous driving models.
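To make the aggregation step concrete, the following is a minimal sketch of how a server might combine locally trained parameters under federated averaging. It assumes the commonly used variant in which each vehicle's parameters are weighted by its local sample count; the abstract does not state the weighting, and the function name, parameter names, and numbers are hypothetical illustrations rather than the implementation used in this work.

```python
from typing import Dict, List, Sequence

# A model is represented here as a flat mapping from parameter name to values.
Weights = Dict[str, List[float]]


def federated_average(client_weights: Sequence[Weights],
                      client_sizes: Sequence[int]) -> Weights:
    """Average client parameters, weighting each client by its sample count.

    Assumption: sample-count weighting, as in the usual FedAvg formulation.
    """
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need one sample count per client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total sample count must be positive")

    averaged: Weights = {}
    for name in client_weights[0]:
        length = len(client_weights[0][name])
        averaged[name] = [
            sum(w[name][i] * n for w, n in zip(client_weights, client_sizes)) / total
            for i in range(length)
        ]
    return averaged


# Hypothetical round: three vehicles upload parameters, never raw sensor data.
vehicle_updates = [
    {"classifier.weight": [1.0, 2.0], "classifier.bias": [0.0]},
    {"classifier.weight": [3.0, 4.0], "classifier.bias": [1.0]},
    {"classifier.weight": [5.0, 6.0], "classifier.bias": [2.0]},
]
vehicle_sizes = [100, 100, 200]  # local training samples per vehicle (made up)

global_weights = federated_average(vehicle_updates, vehicle_sizes)
print(global_weights)
```

Only the parameter dictionaries cross the network in this sketch. The vehicle with 200 samples pulls the global parameters toward its own values, which is one reason skewed local distributions can change convergence behavior.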
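The point about accuracy being misleading under class imbalance can be illustrated with a small worked example. The class names and counts below are hypothetical and are not taken from the experiments; the sketch only shows how a classifier that ignores a minority class can still score high accuracy while its macro F1 collapses.

```python
from typing import Hashable, Sequence


def accuracy(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Unweighted mean of per-class F1, so every class counts equally."""
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # A class that is never predicted correctly contributes an F1 of 0.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical imbalanced set: 90 cars, 10 pedestrians.
y_true = ["car"] * 90 + ["pedestrian"] * 10
# A degenerate classifier that always predicts the majority class.
y_pred = ["car"] * 100

acc = accuracy(y_true, y_pred)
f1 = macro_f1(y_true, y_pred)
print(f"accuracy={acc:.3f}, macro F1={f1:.3f}")
```

Here accuracy is 0.9 even though no pedestrian is ever detected, while macro F1 is about 0.47 because the pedestrian class contributes an F1 of zero. Reporting both metrics exposes this failure mode.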