Abstract: | Prior research shows that distraction among both drivers and pedestrians seriously affects road safety, so reducing distracted behavior should lower the probability of traffic accidents. Most previous work, however, has focused on detecting distracted drivers; few studies address distracted pedestrians. The goal of this research is therefore to improve driver and pedestrian safety by automatically detecting distracted pedestrians. Once distracted pedestrians can be detected accurately, interventions such as sending warning messages or signals to drivers and pedestrians can be applied to improve traffic safety. The technique can also be used in self-driving cars or advanced driver assistance systems (ADAS), which can decelerate or take other preventive measures when a distracted pedestrian is detected. In this paper, we propose a distracted pedestrian detection method based on OpenPose features. We propose a new CNN architecture that takes OpenPose's intermediate-layer feature maps as input (OpenPose-based CNN) instead of images (Image-based CNN), which improves the equal error rate (EER) by a relative 33.33% (EER = 8%).
In addition, our experiments show that the Skeleton-based SVM and the OpenPose-based CNN handle different types of data, so combining the two models in an ensemble improves EER by 20% compared with using a single model (EER = 6.4%). Finally, we perform recognition on multiple consecutive images, which improves EER by 42.19% (EER = 3.7%) compared with recognition on a single image. |
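The improvement percentages quoted above read as relative reductions in EER. A minimal sketch of that arithmetic follows, assuming each step is measured against the previous configuration. The 12% baseline for the Image-based CNN is not stated in the abstract; it is inferred here from the quoted 33.33% reduction to 8%. The function name `relative_improvement` is purely illustrative.

```python
def relative_improvement(baseline_eer: float, new_eer: float) -> float:
    """Return the relative EER reduction in percent."""
    return (baseline_eer - new_eer) / baseline_eer * 100.0


# EER values in percent. 8.0, 6.4 and 3.7 are quoted in the abstract.
# 12.0 is an inferred value (assumption), not a figure from the source.
IMAGE_CNN_EER = 12.0
OPENPOSE_CNN_EER = 8.0
ENSEMBLE_EER = 6.4
MULTI_FRAME_EER = 3.7

# Assumed comparison chain: each configuration versus the one before it.
steps = [
    ("Image-based CNN -> OpenPose-based CNN", IMAGE_CNN_EER, OPENPOSE_CNN_EER),
    ("OpenPose-based CNN -> SVM + CNN ensemble", OPENPOSE_CNN_EER, ENSEMBLE_EER),
    ("Single image -> multiple consecutive images", ENSEMBLE_EER, MULTI_FRAME_EER),
]

for label, before, after in steps:
    gain = relative_improvement(before, after)
    print(f"{label}: {before}% -> {after}% ({gain:.2f}% relative reduction)")
```

Under these assumptions the three steps reproduce the quoted 33.33%, 20% and 42.19% figures, which suggests the ensemble is compared against the OpenPose-based CNN and the multi-image result against the single-image ensemble.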