A Study of FairMOT-Based Multi-Object Tracking in Machine Vision Feature Coding (應用FairMOT多目標追蹤於機器視覺編碼之研究)

Please use this permanent URL to cite or link to this item: https://ir.lib.ncu.edu.tw/handle/987654321/99365

Title:	A Study of FairMOT-Based Multi-Object Tracking in Machine Vision Feature Coding (應用FairMOT多目標追蹤於機器視覺編碼之研究)
Author:	Hsu, Mou-Chen (徐曼禎)
Contributor:	Department of Communication Engineering
Keywords:	Video Coding for Machines (VCM);Feature Coding for Machines (FCM);Multi-Object Tracking (MOT);FairMOT;Joint Detection and Embedding (JDE);Deep Learning;Multi-Channel Post-Processing Compensation System
Date:	2026-01-22
Upload time:	2026-03-06 18:49:06 (UTC+8)
Publisher:	National Central University
Abstract:	With the rapid development of intelligent surveillance, intelligent transportation, and autonomous driving, multi-object tracking (MOT) has become a key technology in computer vision. As deep learning models place growing demands on real-time transmission and bandwidth, maintaining stable and reliable tracking performance under limited transmission resources has become an important practical problem. In these scenarios the consumer of visual data is a machine learning model rather than a human observer, yet existing video coding standards such as HEVC and VVC are optimized for human visual perception and may not preserve the features that machine vision tasks need. To address this, MPEG proposed Video Coding for Machines (VCM), within which Feature Coding for Machines (FCM) improves transmission efficiency by transmitting compressed feature maps. Feature compression inevitably introduces distortion, however, which degrades tracking performance.
This work adopts FairMOT as the tracking model under the FCM framework, examines its suitability and performance across the feature compression and reconstruction pipeline, and uses the JDE backbone of the original Feature Compression Test Model (FCTM) as the baseline. In the overall pipeline, the tracking model extracts multi-layer features, which are compressed, decoded, and fed back into the tracking network to complete the tracking task. To reduce the degradation caused by feature compression, a multi-channel CNN post-processing compensation module is applied to the decoded features. On the HiEve dataset of crowded multi-person scenes, FairMOT achieves an average MOTA of 50.44% without feature compression and 45.27% after FCM compression, while JDE drops from 45.07% to 37.06% under the same FCM setting. Under compression, FairMOT therefore still exceeds JDE by about 8.21 percentage points of MOTA. With the three-channel compensation module, the average MOTA of FairMOT rises to 47.80%, an additional gain of about 2.53 percentage points over the uncompensated compressed result. These results indicate that FairMOT maintains a lower miss rate and clearly better overall tracking performance than JDE under feature compression, and that combining FairMOT with FCM and multi-channel post-processing compensation preserves practically useful MOT performance while substantially reducing the feature data size.
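The MOTA comparisons quoted in the abstract are plain differences in percentage points. As a minimal sketch for checking that arithmetic, the snippet below uses only the five average MOTA values reported above; the dictionary layout, key names, and helper function are illustrative assumptions and do not come from the thesis.

```python
# Average MOTA values (percent) exactly as reported in the abstract.
# The structure and key names are illustrative, not taken from the thesis.
reported_mota = {
    "FairMOT": {"uncompressed": 50.44, "fcm": 45.27, "fcm_compensated": 47.80},
    "JDE": {"uncompressed": 45.07, "fcm": 37.06},
}


def point_diff(a: float, b: float) -> float:
    """Difference a - b in percentage points, rounded to two decimals."""
    return round(a - b, 2)


fairmot = reported_mota["FairMOT"]
jde = reported_mota["JDE"]

# Loss caused by FCM feature compression for each tracker.
fairmot_drop = point_diff(fairmot["uncompressed"], fairmot["fcm"])
jde_drop = point_diff(jde["uncompressed"], jde["fcm"])

# Advantage of FairMOT over JDE when both run on compressed features.
gap_under_fcm = point_diff(fairmot["fcm"], jde["fcm"])

# Gain from the three-channel compensation module, and what is still lost
# relative to uncompressed FairMOT after compensation.
compensation_gain = point_diff(fairmot["fcm_compensated"], fairmot["fcm"])
remaining_loss = point_diff(fairmot["uncompressed"], fairmot["fcm_compensated"])

print(f"FairMOT drop under FCM: {fairmot_drop} points")
print(f"JDE drop under FCM: {jde_drop} points")
print(f"FairMOT advantage under FCM: {gap_under_fcm} points")
print(f"Compensation gain: {compensation_gain} points")
print(f"Remaining loss after compensation: {remaining_loss} points")
```

Read this way, the 8.21 and 2.53 figures in the abstract are absolute MOTA differences rather than relative improvements.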
Appears in Collections:	[Graduate Institute of Communication Engineering] Theses & Dissertations

Files in This Item:

File	Description	Size	Format	Views
index.html		0Kb	HTML	250	View/Open

All items in NCUIR are protected by original copyright.
