Master's/Doctoral Thesis 107522056 — Detailed Record




Author: Yan-Zhi Liu (劉彥志)   Department: Computer Science and Information Engineering
Title: Driver Abnormal Behavior Detection Based on Combined 2D and 3D Convolutional Neural Networks
(基於結合2D與3D卷積神經網路之駕駛人異常行為偵測)
Related theses
★ Single and Multi-Label Environmental Sound Recognition with Gaussian Process
★ Embedded System Implementation of Beamforming and Audio Preprocessing
★ Applications and Design of Speech Synthesis and Voice Conversion
★ Semantics-Based Public Opinion Analysis System
★ Design and Application of a High-Quality Dictation System
★ Calcaneal Fracture Recognition and Detection in CT Images Using Deep Learning and Accelerated Robust Features
★ Personalized Collaborative-Filtering Clothing Recommendation System Based on a Style Vector Space
★ RetinaNet Applied to Face Detection
★ Financial Product Trend Prediction
★ Integrating Deep Learning Methods to Predict Age and Aging-Related Genes
★ End-to-End Mandarin Speech Synthesis
★ Application and Improvement of ORB-SLAM2 on the ARM Architecture
★ ETF Trend Prediction Based on Deep Learning
★ Exploring the Correlation Between Financial News and Financial Trends
★ Emotional Speech Analysis Based on Convolutional Neural Networks
★ Using Deep Learning to Predict Alzheimer's Disease Progression and Post-Stroke Surgical Survival
Full text: available only through the library system (marked "never open" to the public)
Abstract (Chinese) Driver abnormal behavior detection has drawn considerable attention in recent years. The system monitors the driver's facial pose and other body movements to judge the driver's current state; if abnormal behaviors such as distraction or drowsy driving appear, the system sounds a warning to alert the driver, reducing the potential risk of traffic-accident casualties. Because road conditions in real scenes vary widely, the detection system must offer both high detection accuracy and real-time detection capability.
In recent years, deep learning has achieved considerable success in computer vision, and many studies have applied computer vision techniques to driver abnormal behavior detection; however, existing methods cannot achieve good accuracy and speed at the same time. This thesis builds on the action recognition work [15] that combines 2D and 3D convolutional neural networks, an architecture with strong static and dynamic feature extraction. We use it as a shared network to learn the drowsiness, nodding, yawning, and smoking classes simultaneously, together with regularization techniques such as batch normalization, pre-training, and data augmentation to further improve performance. We also design an online video action prediction algorithm based on [15]; it handles online video prediction efficiently and extracts longer-range temporal structure to improve action detection accuracy.
In the experiments, we examine the effects of pre-training, data augmentation, network complexity, the number of sampled frames, and different network architectures on model performance. We confirm that pre-training and data augmentation effectively improve accuracy, and that combining more types of augmentation improves it further. The experiments on sampled frame counts and network architectures also confirm that our architecture reaches real-time prediction speed and outperforms both 2D CNN and 3D CNN methods in accuracy and model size.
Abstract (English) Driver abnormal behavior detection is a hot issue that has received much attention in recent years. The system monitors the driver's facial pose and other body parts to determine the driver's current state. If abnormal behaviors such as distraction or drowsy driving are detected, the system sounds a warning to alert the driver, reducing the potential risk of traffic-accident casualties. Since vehicle conditions in real scenes vary widely, the detection system must provide both high detection accuracy and real-time detection capability.
In recent years, deep learning has achieved considerable success in the computer vision field, and many papers have applied computer vision techniques to driver abnormal behavior detection. However, existing methods cannot achieve good accuracy and speed at the same time. In this paper, we build on the method from the action recognition literature [15] that combines 2D and 3D convolutional neural networks; this network extracts both static and dynamic features well. We use it as a shared network and learn the drowsiness, nodding, yawning, and smoking classes simultaneously, together with regularization techniques such as batch normalization, pre-training, and data augmentation, to further improve performance. In addition, we design an online video action prediction algorithm based on [15]; it not only handles online prediction efficiently but also extracts longer-range temporal structure, further improving action detection accuracy.
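As a rough illustration of the online prediction idea described above, the following Python sketch keeps a sliding working memory of incoming frames, samples N frames spread evenly over that memory, and smooths the new class scores with the previous ones. This is a simplified reading of the ECO-style online scheme [15]; the class, function, and parameter names (`OnlineActionPredictor`, `sample_uniform`, `memory_size`, `alpha`) are illustrative assumptions, not the thesis's actual implementation.

```python
from collections import deque

def sample_uniform(num_stored, n):
    """Pick n indices spread evenly over the stored frames (segment-style sampling)."""
    if num_stored <= n:
        return list(range(num_stored))
    step = num_stored / n
    return [int(i * step) for i in range(n)]

class OnlineActionPredictor:
    """Sliding working memory over the incoming stream; each new frame triggers
    a prediction on n frames sampled across the memory, smoothed with the
    previous scores (simplified ECO-style online scheme)."""

    def __init__(self, model, n_frames=16, memory_size=64, alpha=0.5):
        self.model = model                      # callable: list of frames -> class scores
        self.n = n_frames
        self.memory = deque(maxlen=memory_size) # oldest frames are evicted automatically
        self.alpha = alpha                      # weight given to the newest prediction
        self.scores = None

    def push(self, frame):
        self.memory.append(frame)
        clip = [self.memory[i] for i in sample_uniform(len(self.memory), self.n)]
        new_scores = self.model(clip)
        if self.scores is None:
            self.scores = list(new_scores)
        else:
            self.scores = [self.alpha * s_new + (1 - self.alpha) * s_old
                           for s_new, s_old in zip(new_scores, self.scores)]
        return self.scores
```

Feeding the predictor one frame at a time bounds the per-frame cost to a single forward pass, while the sampled clip still spans the whole working memory, which is one way the longer-range temporal structure mentioned above can be captured.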
In the experiments, we explore the effects of pre-training, data augmentation, network complexity, the number of sampled frames, and different network architectures on model performance. We confirm that pre-training and data augmentation yield significant accuracy gains, and that using more types of data augmentation improves accuracy further. Furthermore, according to the experiments on sampled frame counts and network architectures, our model not only achieves real-time inference speed but also outperforms 2D CNN and 3D CNN methods in both accuracy and model size.
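The specific augmentation combinations compared in the experiments are not listed in this record. As a minimal sketch of how such combinations can be chained (the thesis cites the Albumentations library [39] for this purpose), the NumPy-only example below composes a horizontal flip, a brightness jitter, and a random crop; all function names and parameter values here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_horizontal_flip(img, p=0.5):
    """Mirror the frame left-right with probability p."""
    return img[:, ::-1] if rng.random() < p else img

def random_brightness(img, max_delta=0.2):
    """Shift pixel intensities (in [0, 1]) by a random offset, then clip."""
    delta = rng.uniform(-max_delta, max_delta)
    return np.clip(img + delta, 0.0, 1.0)

def random_crop(img, size):
    """Cut a size x size patch at a random location."""
    h, w = img.shape[:2]
    top = rng.integers(0, h - size + 1)
    left = rng.integers(0, w - size + 1)
    return img[top:top + size, left:left + size]

def augment(img, crop_size=112):
    """One possible augmentation combination: flip -> brightness -> crop."""
    return random_crop(random_brightness(random_horizontal_flip(img)), crop_size)
```

For video input, the random parameters would normally be drawn once per clip and applied identically to every frame, so the augmented clip stays temporally consistent.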
Keywords
★ driver behavior recognition
★ convolutional neural network
★ online video detection
Table of Contents
Chinese Abstract
Abstract
List of Figures
List of Tables
Table of Contents
Chapter 1: Introduction
1.1 Research Background
1.2 Motivation and Objectives
1.3 Methodology and Chapter Overview
Chapter 2: Related Work
2.1 Data Augmentation
2.2 Convolutional Neural Networks
2.2.1 Convolutional Layer
2.2.2 Pooling Layer
2.2.3 Fully Connected Layer
2.3 3D Convolutional Neural Networks
2.4 Classic Convolutional Neural Network Architectures
2.4.1 Residual Network
2.4.2 Inception Network (GoogLeNet)
Chapter 3: Prior Work on Driver Abnormal Behavior Detection
3.1 2D CNN-Based Method [5]
3.2 3D CNN-Based Method [8]
3.3 Method Combining MTCNN and Optical Flow [11]
3.3.1 MTCNN
3.3.2 Optical Flow
3.3.3 Network Architecture and Pipeline
Chapter 4: Driver Abnormal Behavior Detection Model
4.1 Network Architecture
4.1.1 Architecture Details
4.2 Network Training
4.2.1 Loss Function
4.2.2 Regularization Techniques
4.3 Online Video Prediction Algorithm
Chapter 5: Experimental Design and Results
5.1 Experimental Environment
5.2 Datasets
5.2.1 NTHU-DDD Dataset [36]
5.2.2 YawDD Dataset [37]
5.2.3 HMDB-51 Dataset [38]
5.2.4 Self-Collected Smoking Dataset
5.3 Experimental Setup and Implementation Details
5.4 Experimental Results
5.4.1 Experiment 1: Pre-training vs. Training from Scratch
5.4.2 Experiment 2: Comparison of Data Augmentation Combinations
5.4.3 Experiment 3: Effect of Increased Network Complexity
5.4.4 Experiment 4: Comparison of Different Sampled Frame Counts N
5.4.5 Experiment 5: Comparison of Different Network Architectures
Chapter 6: Conclusions and Future Directions
Chapter 7: References
References
[1] Drowsy Driving, NHTSA reports, Jun. 02, 2017. Retrieved from https://www.nhtsa.gov/risky-driving/drowsy-driving.
[2] Distracted Driving NHTSA reports, Sep. 08, 2016. Retrieved from https://www.nhtsa.gov/risky-driving/distracted-driving.
[3] H. Malik, F. Naeem, Z. Zuberi, and R. ul Haq, “Vision based driving simulation,” in 2004 International Conference on Cyberworlds, pp. 255–259, Nov. 2004.
[4] Z. Mardi, S. N. Ashtiani, and M. Mikaili, “EEG-based Drowsiness Detection for Safe Driving Using Chaotic Features and Statistical Tests,” J. Med. Signals Sens., vol. 1, no. 2, pp. 130–137, 2011.
[5] Y. Abouelnaga, H. M. Eraqi, and M. N. Moustafa, “Real-time Distracted Driver Posture Classification,” arXiv:1706.09498, Nov. 2018.
[6] S. Park, F. Pan, S. Kang and C. D. Yoo, “Driver Drowsiness Detection System Based on Feature Representation Learning Using Various Deep Networks,” in Computer Vision – ACCV 2016 Workshops, pp. 154–164, 2017.
[7] Y. Lecun, L. Bottou, Y. Bengio, and P. Haffner, “Gradient-based learning applied to document recognition,” Proceedings of the IEEE, vol. 86, no. 11, pp. 2278–2324, Nov. 1998.
[8] J. Yu, S. Park, S. Lee, and M. Jeon, “Representation Learning, Scene Understanding, and Feature Fusion for Drowsiness Detection,” in Computer Vision – ACCV 2016 Workshops, Cham, pp. 165–177, 2017.
[9] J. Lyu, Z. Yuan, and D. Chen, “Long-term Multi-granularity Deep Framework for Driver Drowsiness Detection,” arXiv:1801.02325, Jan. 2018.
[10] S. Hochreiter and J. Schmidhuber, “Long Short-Term Memory,” Neural Computation, vol. 9, no. 8, pp. 1735–1780, Nov. 1997.
[11] W. Liu, J. Qian, Z. Yao, X. Jiao, and J. Pan, “Convolutional Two-Stream Network Using Multi-Facial Feature Fusion for Driver Fatigue Detection,” Future Internet, vol. 11, no. 5, Art. no. 5, May 2019.
[12] K. Zhang, Z. Zhang, Z. Li, and Y. Qiao, “Joint Face Detection and Alignment Using Multitask Cascaded Convolutional Networks,” IEEE Signal Process. Lett., vol. 23, no. 10, pp. 1499–1503, Oct. 2016.
[13] S. J. Pan and Q. Yang, “A Survey on Transfer Learning,” IEEE Transactions on Knowledge and Data Engineering, vol. 22, no. 10, pp. 1345–1359, Oct. 2010.
[14] L. Wang et al., “Temporal Segment Networks: Towards Good Practices for Deep Action Recognition,” arXiv:1608.00859, Aug. 2016.
[15] M. Zolfaghari, K. Singh, and T. Brox, “ECO: Efficient Convolutional Network for Online Video Understanding,” arXiv:1804.09066, May 2018.
[16] A. G. Howard, “Some Improvements on Deep Convolutional Neural Network Based Image Classification,” arXiv:1312.5402, Dec. 2013.
[17] C. Shorten and T. M. Khoshgoftaar, “A survey on Image Data Augmentation for Deep Learning,” J Big Data, vol. 6, no. 1, p. 60, Jul. 2019.
[18] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet Classification with Deep Convolutional Neural Networks,” in Advances in Neural Information Processing Systems 25, F. Pereira, C. J. C. Burges, L. Bottou, and K. Q. Weinberger, Eds. Curran Associates, Inc., pp. 1097–1105, 2012.
[19] M. D. Zeiler and R. Fergus, “Visualizing and Understanding Convolutional Networks,” arXiv:1311.2901, Nov. 2013.
[20] K. Simonyan and A. Zisserman, “Very Deep Convolutional Networks for Large-Scale Image Recognition,” arXiv:1409.1556, Apr. 2015.
[21] K. He, X. Zhang, S. Ren, and J. Sun, “Deep Residual Learning for Image Recognition,” arXiv:1512.03385, Dec. 2015.
[22] C. Szegedy et al., “Going deeper with convolutions,” in 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, pp. 1–9, Jun. 2015.
[23] S. Ji, W. Xu, M. Yang, and K. Yu, “3D Convolutional Neural Networks for Human Action Recognition,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 35, no. 1, pp. 221–231, Jan. 2013.
[24] D. Tran, J. Ray, Z. Shou, S.-F. Chang, and M. Paluri, “ConvNet Architecture Search for Spatiotemporal Feature Learning,” arXiv:1708.05038, Aug. 2017.
[25] D. Tran, L. Bourdev, R. Fergus, L. Torresani, and M. Paluri, “Learning Spatiotemporal Features with 3D Convolutional Networks,” in 2015 IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, pp. 4489–4497, Dec. 2015.
[26] M. Lin, Q. Chen, and S. Yan, “Network In Network,” arXiv:1312.4400, Mar. 2014.
[27] S. Ioffe and C. Szegedy, “Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift,” arXiv:1502.03167, Mar. 2015.
[28] C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna, “Rethinking the Inception Architecture for Computer Vision,” in 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, US, pp. 2818–2826, Jun. 2016.
[29] C. Szegedy, S. Ioffe, V. Vanhoucke, and A. Alemi, “Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning,” arXiv:1602.07261, Aug. 2016.
[30] S. S. Farfade, M. J. Saberian, and L.-J. Li, “Multi-view Face Detection Using Deep Convolutional Neural Networks,” in Proceedings of the 5th ACM on International Conference on Multimedia Retrieval - ICMR ’15, Shanghai, China, pp. 643–650, 2015.
[31] S. Bambach, S. Lee, D. J. Crandall, and C. Yu, “Lending A Hand: Detecting Hands and Recognizing Activities in Complex Egocentric Interactions,” Proc IEEE Int Conf Comput Vis, vol. 2015, pp. 1949–1957, Dec. 2015.
[32] G. Farnebäck, “Two-Frame Motion Estimation Based on Polynomial Expansion,” in Image Analysis, vol. 2749, J. Bigun and T. Gustavsson, Eds. Berlin, Heidelberg: Springer Berlin Heidelberg, pp. 363–370, 2003.
[33] W. Kay et al., “The Kinetics Human Action Video Dataset,” arXiv:1705.06950, May 2017.
[34] J. Carreira and A. Zisserman, “Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset,” in 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, pp. 4724–4733, Jul. 2017.
[35] G. E. Hinton, N. Srivastava, A. Krizhevsky, I. Sutskever, and R. R. Salakhutdinov, “Improving neural networks by preventing co-adaptation of feature detectors,” arXiv:1207.0580, Jul. 2012.
[36] C.-H. Weng, Y.-H. Lai, and S.-H. Lai, “Driver Drowsiness Detection via a Hierarchical Temporal Deep Belief Network,” in Computer Vision – ACCV 2016 Workshops, Cham, pp. 117–133, 2017.
[37] S. Abtahi, M. Omidyeganeh, S. Shirmohammadi, and B. Hariri, “YawDD: A yawning detection dataset,” in Proceedings of the 5th ACM Multimedia Systems Conference (MMSys ’14), pp. 24–28, Mar. 2014.
[38] H. Kuehne, H. Jhuang, E. Garrote, T. Poggio, and T. Serre, “HMDB: A large video database for human motion recognition,” in 2011 International Conference on Computer Vision, pp. 2556–2563, Nov. 2011.
[39] A. Buslaev, V. I. Iglovikov, E. Khvedchenya, A. Parinov, M. Druzhinin, and A. A. Kalinin, “Albumentations: Fast and Flexible Image Augmentations,” Information, vol. 11, no. 2, Art. no. 2, Feb. 2020.
Advisor: Jia-Ching Wang (王家慶)   Date of Approval: 2020-7-30
