Multi-Target Multi-Camera Tracking and Reidentification with Artificial Neural Networks and Spatial-Temporal Information


Please use this permanent URL to cite or link to this document: http://ir.lib.ncu.edu.tw/handle/987654321/88052

Title:	Multi-Target Multi-Camera Tracking and Reidentification with Artificial Neural Networks and Spatial-Temporal Information
Author:	許又升;Hsu, Yu-Sheng
Contributor:	Department of Civil Engineering
Keywords:	Object Tracking;Reidentification;Artificial Neural Network;CCTV
Date:	2022-01-25
Uploaded:	2022-07-13 16:26:45 (UTC+8)
Publisher:	National Central University
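Abstract:	Surveillance cameras play an important role in traffic monitoring, commercial and home security, and criminal investigation. After the cameras continuously capture image data, however, the information in it (such as object identification, scene semantics, or object location tracking) must still be interpreted by people, a process that is inefficient and costly. This study aims to design an automated, low-cost method for tracking objects across cameras in surveillance video. The workflow is divided into three parts: detection, tracking, and identification. Detection finds foreground objects in the surveillance images. This study uses the Mixture of Gaussians method to build a background model for each surveillance video and obtain the foreground, then applies morphological operations to remove noise and separate the foreground objects in a single image. Tracking labels the same object across consecutive frames. To avoid the effects of frame jitter, noise, and occlusion, this study uses the RE3 neural network tracker, whose long short-term memory (LSTM) model makes the tracker more stable and yields the bounding box of the same object within a single surveillance video. Identification determines whether objects in different images are the same object. This study uses a convolutional neural network to extract object features and uses the objects' spatial-temporal information as soft biometrics. The high-dimensional image data is reduced to a one-dimensional feature vector, and the similarity between two object images is then compared to achieve identification. For the spatial-temporal features, control points are manually selected in each camera, and the image coordinates of the different cameras are projected into a unified coordinate system. Objects that appear at the same time are regarded as the same class if their distance in the unified coordinate system stays consistently small. For objects that appear at different times, this study designs a spatial-temporal rationality function that considers the time difference, the distance, and the object's moving speed to compute how plausible it is that the two objects belong to the same class, which serves as a candidate condition for appearance matching. In addition, by comparing the camera direction with the object's moving direction, the aspect of the object captured by the camera can be obtained, which serves as a further candidate condition for appearance feature matching. To validate the proposed method, this study uses the Unity game engine to build a virtual office scene as the dataset, containing 7 fixed cameras and two sets of 6 objects each that move along fixed routes at fixed speeds. Path error is computed for the tracking results, and class consistency is computed for the reidentification results. The single-camera object tracking error is about 1 m, and the error averaged with weights based on each object's frame count is 0.8 m, while the multi-camera tracking error derived from the classification results is about 2-3 m. The class consistency of reidentification reaches 80%, meaning that 80% of the members of a class are the same object. The proposed method can link different surveillance cameras for cross-camera object tracking and is expected to be applicable to security, disaster prevention, unmanned stores, smart cities, and other fields. ;Closed-circuit television (CCTV) is widely used in applications such as security control, traffic monitoring, searching for missing people, and unmanned stores. CCTV systems provide real-time video feeds that usually require human interpretation to extract information, which is expensive and inefficient. This research aims to design a framework that automatically extracts the locations of moving targets from CCTV systems. The framework includes three main steps: Detection, Tracking, and Reidentification. For Detection, we use the mixture of Gaussians (MOG) method and morphological enhancement to separate the foreground from the background. Afterward, we initialize an RE3 (Real-Time Recurrent Regression) tracker to track each stable object detected in the MOG foreground. The tracker continuously outputs bounding boxes of an object, which provide two main pieces of information: object image crops and object foot locations.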
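As an illustration of how a bounding box can yield a foot location in a unified coordinate system, the following is a minimal Python sketch, not the thesis implementation. It assumes that boxes are given as `(x, y, width, height)` in pixels, that the foot is the bottom-center of the box, and that a 3x3 homography `H` has already been estimated from the manually selected control points. The function names and the example matrix are hypothetical.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]
Matrix3 = Sequence[Sequence[float]]


def foot_location(box: Tuple[float, float, float, float]) -> Point:
    """Return the bottom-center pixel of an (x, y, width, height) box.

    Assumption: the object's feet touch the ground at the bottom-center
    of its bounding box.
    """
    x, y, w, h = box
    return (x + w / 2.0, y + h)


def project_to_ground(H: Matrix3, point: Point) -> Point:
    """Map an image point to the unified coordinate system with a homography.

    H is assumed to be a 3x3 matrix estimated beforehand from control points.
    """
    u, v = point
    x = H[0][0] * u + H[0][1] * v + H[0][2]
    y = H[1][0] * u + H[1][1] * v + H[1][2]
    s = H[2][0] * u + H[2][1] * v + H[2][2]
    if s == 0:
        raise ValueError("point maps to infinity under this homography")
    # Divide by the homogeneous scale to get planar coordinates.
    return (x / s, y / s)


# Hypothetical camera whose image plane maps to the floor at 0.01 m per pixel.
H_example = [
    [0.01, 0.0, 0.0],
    [0.0, 0.01, 0.0],
    [0.0, 0.0, 1.0],
]
ground_xy = project_to_ground(H_example, foot_location((100, 50, 40, 150)))
```

Once the foot locations from every camera are expressed in the same coordinates, the distance between two detections can be compared directly.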
To classify the identity of objects (i.e., Reidentification), we first apply Geo-Matching, which compares the object foot locations detected by different cameras to link objects across these cameras. Meanwhile, we use VGG16 to extract a feature embedding from the object image crops, which is matched against known classes via cosine similarity. In addition, to improve feature matching performance and avoid wrong matches, we use the object’s foot locations, moving velocity, and the last locations of known classes to estimate the spatial-temporal rationality of a correct match for each class. Furthermore, the moving direction of an object helps estimate the aspect of the object captured in the image crops, which serves as a constraint for selecting candidate class images with similar aspects to improve feature matching accuracy. For the testing dataset, we simulate a relatively ideal environment in Unity: an office with 2 sets of 6 moving objects and 7 cameras, where high-definition videos were obtained without noise. As a result, the proposed solution reaches about 1 m of single-camera object tracking error, 2-3 m of multi-camera multi-target object tracking error, and over 80% classification consistency. This research can support further development of applications in public surveillance, disaster prevention, unmanned stores, and smart cities.
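The combination of appearance matching and spatial-temporal rationality described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the function designed in the thesis: the rationality check here is a hypothetical binary gate (could the object have covered the distance in the elapsed time at no more than `max_speed`?), whereas the thesis computes a degree of plausibility. The embeddings stand in for VGG16 features, and all names, thresholds, and data are made up for the example.

```python
import math
from typing import Dict, Optional, Sequence, Tuple

Point = Tuple[float, float]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two feature vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def is_plausible(last_pos: Point, last_time: float,
                 pos: Point, time: float, max_speed: float) -> bool:
    """Hypothetical gate: is the move feasible within the elapsed time?"""
    dt = time - last_time
    if dt <= 0:
        return False
    return math.dist(last_pos, pos) <= max_speed * dt


def match_class(embedding: Sequence[float], pos: Point, time: float,
                known: Dict[str, dict], max_speed: float,
                min_similarity: float) -> Optional[str]:
    """Return the most similar known class that passes the gate, else None."""
    best_id, best_sim = None, min_similarity
    for class_id, info in known.items():
        # Skip classes whose last sighting makes this match implausible.
        if not is_plausible(info["pos"], info["time"], pos, time, max_speed):
            continue
        sim = cosine_similarity(embedding, info["embedding"])
        if sim >= best_sim:
            best_id, best_sim = class_id, sim
    return best_id


# Made-up known classes: last position (m), last time (s), stored embedding.
known_classes = {
    "A": {"pos": (0.0, 0.0), "time": 0.0, "embedding": [1.0, 0.0]},
    "B": {"pos": (50.0, 0.0), "time": 0.0, "embedding": [0.9, 0.1]},
}
result = match_class([1.0, 0.0], (1.0, 0.0), 1.0, known_classes,
                     max_speed=2.0, min_similarity=0.5)
```

Gating before comparing appearance means a visually similar class that is too far away to be reached in time is never considered, which is one way to avoid the wrong matches the abstract mentions.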
Appears in collections:	[Graduate Institute of Civil Engineering] Master's and Doctoral Theses

Files in this item:

File	Description	Size	Format	Views
index.html		0Kb	HTML	54	View/Open

All items in NCUIR are protected by their original copyright.
