 World Health Organization, Global status report on road safety 2018. World Health Organization, 2018.
 F. Guo et al., "The effects of age on crash risk associated with driver distraction," International Journal of Epidemiology, vol. 46, no. 1, pp. 258-265, 2017.
 Z. Zhou, S. Liu, W. Xu, Z. Pu, S. Zhang, and Y. Zhou, "Impacts of mobile phone distractions on pedestrian crossing behavior at signalized intersections: An observational study in China," Advances in Mechanical Engineering, vol. 11, no. 4, p. 1687814019841838, 2019.
 H. Zhang, C. Zhang, F. Chen, and Y. Wei, "Effects of mobile phone use on pedestrian crossing behavior and safety at unsignalized intersections," Canadian Journal of Civil Engineering, vol. 46, no. 5, pp. 381-388, 2019.
 W. Yuanyuan, Z. Cunbao, Z. Bin, C. Feng, and Z. Hualong, "The mobile phone use behavior and its effect on pedestrian safety at signalized intersections in China," in 2017 4th International Conference on Transportation Information and Safety (ICTIS), 2017: IEEE, pp. 225-231.
 B. Le, C. Figueroa, C. Anderson, S. Lotfipour, and C. Barrios, "Determining the incidence of distraction among trauma patients in all modes of transportation," Journal of Trauma and Acute Care Surgery, vol. 87, no. 1, pp. 87-91, 2019.
 J. L. Nasar and D. Troyer, "Pedestrian injuries due to mobile phone use in public places," Accident Analysis & Prevention, vol. 57, pp. 91-95, 2013.
 D. L. Strayer and F. A. Drew, "Profiles in driver distraction: Effects of cell phone conversations on younger and older drivers," Human Factors, vol. 46, no. 4, pp. 640-649, 2004.
 G. Selamaj, "Impacts of Mobile Phone Distractions on Walking Performance," Indonesian Journal of Computing, Engineering and Design (IJoCED), vol. 2, no. 1, pp. 32-37, 2020.
 T. Hoang Ngan Le, Y. Zheng, C. Zhu, K. Luu, and M. Savvides, "Multiple scale Faster-RCNN approach to driver's cell-phone usage and hands on steering wheel detection," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2016, pp. 46-53.
 W. Kim, H.-K. Choi, B.-T. Jang, and J. Lim, "Driver distraction detection using single convolutional neural network," in 2017 International Conference on Information and Communication Technology Convergence (ICTC), 2017: IEEE, pp. 1203-1205.
 H. M. Eraqi, Y. Abouelnaga, M. H. Saad, and M. N. Moustafa, "Driver distraction identification with an ensemble of convolutional neural networks," Journal of Advanced Transportation, vol. 2019, 2019.
 A. Rangesh, E. Ohn-Bar, K. Yuen, and M. M. Trivedi, "Pedestrians and their phones - detecting phone-based activities of pedestrians for autonomous vehicles," in 2016 IEEE 19th International Conference on Intelligent Transportation Systems (ITSC), 2016: IEEE, pp. 1882-1887.
 A. Rangesh and M. M. Trivedi, "When Vehicles See Pedestrians With Phones: A Multicue Framework for Recognizing Phone-Based Activities of Pedestrians," IEEE Transactions on Intelligent Vehicles, vol. 3, no. 2, pp. 218-227, 2018.
 K. Kumamoto and K. Yamada, "Detecting Interaction of Pedestrians with Their Smartphones Based on Body Keypoints," in 2018 IEEE International Conference on Systems, Man, and Cybernetics (SMC), 2018: IEEE, pp. 3261-3266.
 J. Redmon and A. Farhadi, "YOLOv3: An incremental improvement," arXiv preprint arXiv:1804.02767, 2018.
 G.-S. Xia et al., "DOTA: A large-scale dataset for object detection in aerial images," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 3974-3983.
 T.-Y. Lin et al., "Microsoft COCO: Common objects in context," in European Conference on Computer Vision, 2014: Springer, pp. 740-755.
 A. Bewley, Z. Ge, L. Ott, F. Ramos, and B. Upcroft, "Simple online and realtime tracking," in 2016 IEEE International Conference on Image Processing (ICIP), 2016: IEEE, pp. 3464-3468.
 Z. Cao, T. Simon, S.-E. Wei, and Y. Sheikh, "Realtime multi-person 2D pose estimation using part affinity fields," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 7291-7299.
 Z. Fang, D. Vázquez, and A. M. López, "On-board detection of pedestrian intentions," Sensors, vol. 17, no. 10, p. 2193, 2017.
 K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," arXiv preprint arXiv:1409.1556, 2014.
 S. Ji, W. Xu, M. Yang, and K. Yu, "3D convolutional neural networks for human action recognition," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 35, no. 1, pp. 221-231, 2012.
 R. R. Selvaraju, M. Cogswell, A. Das, R. Vedantam, D. Parikh, and D. Batra, "Grad-CAM: Visual explanations from deep networks via gradient-based localization," in Proceedings of the IEEE International Conference on Computer Vision, 2017, pp. 618-626.