Abstract: Multi-Target Multi-Camera Tracking (MTMCT) aims to continuously track multiple moving targets across multiple cameras. MTMCT systems typically combine visual and spatial information, with Person Re-Identification (Re-ID), the task of identifying and matching images of the same pedestrian across different cameras or over time, serving as a critical component. Conventional Re-ID methods rely primarily on visual embeddings extracted from pedestrian images by convolutional neural network (CNN) models. However, such low-level visual features are easily disrupted by occlusion, viewpoint changes, and illumination variation, degrading recognition performance.

To improve recognition stability and interpretability under occlusion and visual variation, this study adopts Pedestrian Attribute Recognition (PAR) as a complementary source of information. PAR predicts interpretable semantic attributes of a pedestrian image (e.g., gender, clothing color, accessories); these high-level semantic features are more stable and more interpretable than conventional appearance features. We train the PAR model C2T-Net on the UPAR2024 dataset and evaluate it on both the synthetic 2024 AI City Challenge Track 1 dataset and real-world surveillance footage from Taoyuan Metro A18 station.

This study investigates how feature representations at different levels affect the stability of identity matching in Re-ID under cross-view and visually degraded conditions. We evaluate three strategies for integrating high-level semantic features with a conventional Re-ID model: (1) raw attribute probabilities, (2) attribute probabilities filtered by optimal thresholds, and (3) semantic embeddings extracted from an intermediate layer of the PAR model, and compare them against conventional Re-ID appearance features. To assess the stability and discriminability of each feature before and after interference, we analyze the cosine-similarity distributions of same-identity and different-identity image pairs under two perturbations: occlusion and illumination change. Experimental results show that semantic features maintain higher recognition consistency and interpretability under visual variation, demonstrating their potential value for cross-camera re-identification.
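As a rough illustration of the evaluation protocol described above, the sketch below binarizes attribute probabilities at per-attribute thresholds (strategy 2) and then collects cosine similarities for same-identity versus different-identity pairs. It is a minimal sketch on toy data: the array shapes, the 0.5 thresholds, and the function names are hypothetical placeholders, not the thesis's actual pipeline or tuned values.

```python
import numpy as np

def binarize_attributes(probs: np.ndarray, thresholds: np.ndarray) -> np.ndarray:
    """Strategy (2): keep an attribute only if its predicted probability
    clears a per-attribute threshold (optimal thresholds assumed given)."""
    return (probs >= thresholds).astype(np.float32)

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two feature vectors."""
    denom = np.linalg.norm(a) * np.linalg.norm(b) + 1e-12
    return float(np.dot(a, b) / denom)

def similarity_distributions(feats: np.ndarray, ids: np.ndarray):
    """Split all pairwise cosine similarities into same-identity and
    different-identity distributions, e.g., to compare clean crops
    against occluded or relit crops of the same pedestrians."""
    same, diff = [], []
    n = len(feats)
    for i in range(n):
        for j in range(i + 1, n):
            s = cosine_similarity(feats[i], feats[j])
            (same if ids[i] == ids[j] else diff).append(s)
    return np.array(same), np.array(diff)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    probs = rng.uniform(size=(6, 40))       # 6 crops x 40 attributes (toy data)
    thresholds = np.full(40, 0.5)           # placeholder "optimal" thresholds
    feats = binarize_attributes(probs, thresholds)
    ids = np.array([0, 0, 1, 1, 2, 2])      # two crops per identity
    same, diff = similarity_distributions(feats, ids)
    print(f"same-ID mean sim: {same.mean():.3f}, diff-ID mean sim: {diff.mean():.3f}")
```

Under this protocol, a feature is considered more robust when the same-identity and different-identity similarity distributions stay well separated after occlusion or illumination change is applied.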