dc.description.abstract | Closed-circuit television (CCTV) is widely used in applications such as
security control, traffic monitoring, finding missing people, and unmanned stores. CCTV systems
provide real-time video feeds that usually require human interpretation to extract information,
which is expensive and inefficient. This research aims to design a framework to
automatically extract locations of moving targets from CCTV systems. This framework
includes three main steps: Detection, Tracking and Reidentification. For Detection, we use
the mixture of Gaussians (MOG) method and morphological enhancement to separate the
foreground from the background. Afterward, we initialize a RE3 (Real-Time Recurrent Regression)
tracker to track each stable object detected from the MOG foreground. The tracker
continuously outputs bounding boxes of an object, which provide two main pieces of information: object
image crops and object foot locations. To classify the identity of objects (i.e., Reidentification),
we first apply Geo-Matching, which compares the object foot locations detected by different
cameras to link objects across these cameras. In the meantime, we use VGG16 to extract
feature embeddings from the object image crops, which are matched against known
classes via cosine similarity. In addition, to improve feature matching performance and
avoid wrong matches, we use the object’s foot locations, moving velocity, and the last locations of
known classes to estimate the spatio-temporal plausibility of a match with each class.
Furthermore, the moving direction of an object helps estimate the aspect from which the object is
captured in the image crops. This serves as a constraint for selecting candidate class images
with a similar aspect, which improves feature matching accuracy. For the test dataset,
we simulate a relatively ideal environment in Unity: an office with 2 sets of 6 moving objects and
7 cameras, from which noise-free high-definition videos were obtained. As a result, the
proposed solution achieves a single-camera object tracking error of 1 m, a multi-camera
multi-target object tracking error of 2-3 m, and over 80% classification consistency. This research
supports the further development of applications in public surveillance, disaster prevention, unmanned stores,
and smart cities. | en_US |