As technology advances, robots will play an important role in home care, surveillance, unmanned stores, and many other fields. Vision-based person following is a key enabling technology: it allows a robot to follow a person around without being confined to a fixed area. The person following method proposed in this work runs reliably and in real time on a Raspberry Pi 3 equipped with only a 1.2 GHz ARM CPU and 1 GB of RAM. The required hardware is inexpensive (about US$35), which opens the door to a wide range of IoT applications, such as smart suitcases and automatic shopping carts, that may change our daily lives.

The accuracy and speed of the person detector determine the performance of a reliable person following system. However, state-of-the-art deep-learning object detectors such as Faster R-CNN and YOLO demand large amounts of memory and computation; they reach real-time operation (30 fps) only on high-end CPUs or GPUs and remain difficult to deploy on embedded platforms with only low-end CPUs or FPGAs. In this work we therefore optimize the model architecture and apply additional training techniques to obtain Brisk-YOLO, a lightweight yet reliable person detector. Compared with other object detection methods on the public INRIA and PASCAL VOC datasets, Brisk-YOLO preserves person-detection accuracy while running 55 times faster than Tiny-YOLO, reaching 22 fps on the Raspberry Pi 3.

To further reduce the computational cost, the detector is not applied to every frame. It is run only at the beginning, to initialize the location of the human target, and again whenever the accumulated tracking error grows too large or the target is lost or occluded. We pair it with fast object tracking and person re-identification methods so that the system operates stably; the system can re-identify the target after periods of occlusion and distinguish the target from other people, even when they look similar.

Experiments on the BoBoT dataset show that, in realistic person-following scenarios, the proposed system outperforms other real-time long-term tracking algorithms in both speed and accuracy, achieving an average IoU of 73.39%.
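To make the detect-only-when-needed strategy above concrete, the following Python-style sketch illustrates one way such a loop can be organized. It is a minimal sketch under assumed interfaces: detect, reid_match, and make_tracker stand in for the Brisk-YOLO detector, the re-identification step, and the object tracker, and the confidence threshold is an illustrative placeholder rather than the system's actual implementation.

def follow_person(frames, detect, reid_match, make_tracker, conf_threshold=0.5):
    # frames: iterable of images; detect, reid_match, and make_tracker are
    # injected callables (assumed interfaces, not the thesis implementation).
    box, tracker = None, None
    for frame in frames:
        if box is None:
            # Target unknown: run the (expensive) detector to (re)acquire it,
            # then pick the enrolled person among the detections via re-ID.
            box = reid_match(frame, detect(frame))
            if box is not None:
                tracker = make_tracker(frame, box)
        else:
            # Normal case: update only the fast tracker and skip the detector.
            box, confidence = tracker.update(frame)
            if box is None or confidence < conf_threshold:
                # Tracking drifted or the target is occluded/lost: fall back
                # to detection plus re-identification on the next frame.
                box, tracker = None, None
        yield frame, box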
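For reference, the BoBoT score reported above is an average intersection-over-union (IoU) between predicted and ground-truth boxes. A minimal sketch of the standard per-frame IoU computation is shown below; the (x1, y1, x2, y2) corner box format is an assumption made for illustration.

def iou(box_a, box_b):
    # Intersection-over-Union of two axis-aligned boxes given as (x1, y1, x2, y2).
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# The benchmark figure is the mean of iou(prediction, ground_truth) over all frames.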