Master's/Doctoral Thesis 104522113: Detailed Record




Name: Chao-Ming Hung (洪昭銘)    Department: Computer Science and Information Engineering
Thesis Title: Applying Deep Learning to Gait Recognition with Automatic Human Detection
(應用深度學習於結合自動偵測人物的步態辨識)
Related Theses
★ A Lightweight 2D Classifier for 3D Hand Point Cloud Data
  1. This electronic thesis has been approved for immediate open access.
  2. The open-access full text is licensed only for personal, non-commercial retrieval, reading, and printing for academic research purposes.
  3. Please comply with the Copyright Act of the Republic of China (Taiwan); do not reproduce, distribute, adapt, repost, or broadcast it without authorization.

Abstract (Chinese): This thesis concerns the implementation of a gait recognition method based on convolutional neural networks and the improvement of the network model. Person locations are detected in RGB color image sequences to extract continuous walking sequences, and a convolutional neural network is trained to extract gait features from each walking sequence; the extracted features are then used to identify the person.

Gait recognition is a non-contact biometric method that determines a subject's identity or physical condition by analyzing the distinct postures and habits each person exhibits while walking, including skeleton and joint movements. Its greatest strength is that, while precise skeletons can be captured with wearable devices, it can also be performed without demanding much cooperation from the target subject.

Unlike skeleton-detection-based methods, this thesis applies pre-trained models to the input sequence to extract optical flow features and to crop the target person's location, which together form the concrete low-level feature input. Fully convolutional architectures and pre-trained models (YOLOv2 and FlowNet 2.0) are adopted throughout to avoid restrictive input-size constraints. A model built on the Wide Residual Network (WRN) architecture is then trained to extract high-level abstract features, and the thesis focuses on how to design a feature extraction network with higher performance and efficiency.
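To make the WRN building block concrete, here is a minimal sketch in PyTorch of a pre-activation residual block widened by a factor k; the channel counts and widening factor are illustrative assumptions, not the exact configuration used in this thesis.

```python
import torch
import torch.nn as nn

class WideBasicBlock(nn.Module):
    """Pre-activation residual block widened by a factor k (illustrative)."""
    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        self.bn1 = nn.BatchNorm2d(in_ch)
        self.conv1 = nn.Conv2d(in_ch, out_ch, 3, stride=stride, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_ch)
        self.conv2 = nn.Conv2d(out_ch, out_ch, 3, padding=1, bias=False)
        # 1x1 projection on the shortcut when the shape changes
        self.shortcut = (nn.Conv2d(in_ch, out_ch, 1, stride=stride, bias=False)
                         if stride != 1 or in_ch != out_ch else nn.Identity())
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.conv1(self.relu(self.bn1(x)))
        out = self.conv2(self.relu(self.bn2(out)))
        return out + self.shortcut(x)

# Widening a 16-channel group by k = 4 gives 64 channels:
block = WideBasicBlock(16, 16 * 4)
print(block(torch.randn(1, 16, 64, 64)).shape)  # torch.Size([1, 64, 64, 64])
```

Widening each group by k lets the network gain capacity without the extreme depth, and the associated training cost, that a very deep ResNet would require.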

The optical flow feature maps produced by FlowNet 2.0 effectively filter out background information, letting this work focus on the target's motion and preventing the abstract-feature network from learning irrelevant information (including the person's appearance and the background). On top of this, the pre-trained person detector YOLOv2 is added to prune the excess input size and to automate the manual annotation that preprocessing would otherwise require. Finally, the WRN structure, refined from the Residual Network (ResNet) concept, offers better representational power and training efficiency than VGG-like architectures, making it well suited to this task.
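As a rough illustration of this preprocessing pipeline, the sketch below chains the two pre-trained models. Here `detect_person` and `estimate_flow` are hypothetical stand-ins for YOLOv2 and FlowNet 2.0 inference, and the crop size and channel layout are assumptions rather than the settings actually used in the thesis.

```python
import cv2
import numpy as np

def preprocess_sequence(frames, detect_person, estimate_flow, crop_size=(64, 64)):
    """Build the low-level network input: optical flow between consecutive
    frames, cropped to the detected person box and stacked along channels."""
    crops = []
    for prev, curr in zip(frames[:-1], frames[1:]):
        flow = estimate_flow(prev, curr)      # (H, W, 2) flow field, FlowNet 2.0-style
        x, y, w, h = detect_person(curr)      # person bounding box, YOLOv2-style
        roi = flow[y:y + h, x:x + w]          # keep only the walker, drop background
        crops.append(cv2.resize(roi, crop_size))
    # Concatenate per-frame flow maps into one multi-channel input
    return np.concatenate(crops, axis=-1)     # shape (64, 64, 2 * (len(frames) - 1))
```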

Finally, to overcome the difficulty 2D convolutional networks have in capturing local temporal features, this thesis proposes a method that introduces partial 3D convolutional structures into the network, making effective use of limited memory to obtain more useful features and allowing the network to achieve better performance.
Abstract (English): This thesis addresses the implementation and improvement of convolutional neural networks applied to gait recognition. The goal is to train a convolutional neural network to extract gait features from human walking sequences, which are preprocessed by detecting and cropping the person's region of interest in each RGB image sequence. The extracted features are then used to identify people.

Gait recognition is a non-contact biometric method that determines people's identity or physical condition by analyzing the distinct postures and habits they exhibit while walking, including skeleton and joint movements.

Unlike methods based on skeleton detection, this thesis uses pre-trained models (YOLOv2 and FlowNet 2.0) to extract optical flow feature maps and person regions from the input sequence; the cropped optical flow maps are concatenated to form the low-level feature input. We then train a model built on the Wide Residual Network architecture to extract high-level abstract features from this input, and we mainly discuss how to design a feature extraction network with higher performance and efficiency.

Extracting optical flow feature maps with FlowNet 2.0 effectively filters out background information, preventing the model from learning unnecessary cues (including people's appearance and the background). Furthermore, we add YOLOv2 as a person detector, pruning the excess input size and automating the manual annotation process.

To overcome the difficulty 2D convolutional networks have in capturing local temporal features, we propose a method that combines 3D and 2D convolutional structures so that the network achieves better performance.
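The following PyTorch sketch shows one plausible reading of this 3D/2D combination: a 3D convolution first captures short-range temporal structure, then the temporal axis is folded into the channel axis (our reading of "depth compaction") so that ordinary 2D layers can take over. All layer sizes are illustrative, and the thesis's Learnable Depth Compaction may differ.

```python
import torch
import torch.nn as nn

class Partial3DFrontEnd(nn.Module):
    """Illustrative mix of 3D and 2D convolution: a 3D layer captures local
    temporal structure, then the temporal axis is folded into the channel
    axis (an assumed "depth compaction") so 2D layers can take over."""
    def __init__(self, in_ch=2, mid_ch=16, out_ch=32, t=8):
        super().__init__()
        self.conv3d = nn.Conv3d(in_ch, mid_ch, kernel_size=3, padding=1)
        # padding keeps the temporal length t, so the 2D stage sees mid_ch * t channels
        self.conv2d = nn.Conv2d(mid_ch * t, out_ch, kernel_size=3, padding=1)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):                      # x: (N, C, T, H, W), e.g. flow with C=2
        x = self.relu(self.conv3d(x))          # (N, mid_ch, T, H, W)
        n, c, t, h, w = x.shape
        x = x.reshape(n, c * t, h, w)          # compact the temporal depth into channels
        return self.relu(self.conv2d(x))       # standard 2D convolution from here on

x = torch.randn(1, 2, 8, 64, 64)               # 8 optical-flow frames
print(Partial3DFrontEnd()(x).shape)            # torch.Size([1, 32, 64, 64])
```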
Keywords ★ Computer Vision
★ Machine Learning
★ Convolutional Neural Network
★ Gait Recognition
★ Deep Learning
★ Dense Optical Flow
Table of Contents
Abstract (Chinese)
Abstract (English)
Table of Contents
List of Figures
List of Tables
List of Equations
Chapter 1: Introduction
1.1 Motivation
1.2 Related Work
1.3 System Architecture
Chapter 2: Related Techniques
2.1 Dense Optical Flow Feature Map Extraction
2.1.1 Fully Convolutional Networks (FCN)
2.1.2 Optical Flow Network (FlowNet 2.0)
2.2 Person Detection
2.2.1 Object Detection Network (YOLOv2)
2.3 Feature Network Training
2.3.1 Residual Networks
2.3.2 Wide Residual Networks (Wide ResNet)
Chapter 3: Gait Recognition Network
3.1 The TUM-GAID Gait Dataset
3.2 Preprocessing
3.3 Data Labeling and Splitting
3.4 Base Network Architecture
3.5 Network Architecture Adjustments
3.5.1 Adjusting Network Width and Training Batch Size
3.5.2 Adjusting Selected Kernel Sizes
3.5.3 Adding 3D Convolutional Layers and Depth Compaction
3.5.4 Learnable Depth Compaction
Chapter 4: Experimental Results and Discussion
4.1 Model Notation and Experimental Platform
4.2 Learning Rate Scheduling Strategy
4.3 Random Split of All Samples (7:3)
4.3.1 Convergence Comparison
4.3.2 Test Loss and Accuracy Comparison
4.4 Random Split of All Samples (5:5)
4.4.1 Convergence Comparison
4.4.2 Test Loss and Accuracy Comparison
4.5 Sequences Split in Half per Subject
4.5.1 Convergence Comparison
4.5.2 Test Loss and Accuracy Comparison
4.6 Random Split of All Sequences (5:5)
4.6.1 Convergence Comparison
4.6.2 Test Loss and Accuracy Comparison
4.7 Effect of the Number of Sampled Frames
4.8 Comparison with Related Work
Chapter 5: Conclusions and Future Work
5.1 Conclusions
5.2 Future Work
References
References
[1] M. Hofmann, J. Geiger, S. Bachmann, B. Schuller and G. Rigoll, “The TUM Gait from Audio, Image and Depth (GAID) Database: Multimodal Recognition of Subjects and Traits”, Journal of Visual Communication and Image Representation, Special Issue on Visual Understanding and Applications with RGB-D Cameras, vol. 25, no. 1, pp. 195-206, 2014.

[2] K. Simonyan and A. Zisserman, “Two-stream convolutional networks for action recognition in videos”, Conference and Workshop on Neural Information Processing Systems (NIPS), 2014.

[3] J. Donahue, L. A. Hendricks, S. Guadarrama, M. Rohrbach, S. Venugopalan, K. Saenko and T. Darrell, “Long-term recurrent convolutional networks for visual recognition and description”, arXiv:1411.4389 [cs.CV], 2014.

[4] D. Tran, L. Bourdev, R. Fergus, L. Torresani, and M. Paluri, “Learning spatiotemporal features with 3d convolutional networks”, arXiv:1412.0767, 2015.

[5] Y. Feng, Y. Li, and J. Luo, “Learning Effective Gait Features Using LSTM”, International Conference on Pattern Recognition (ICPR), pp. 325–330, 2016.

[6] G. Giacomo, F. Martinelli, A. Saracino and M. S. Alishahi, “Try Walking in My Shoes, if You Can: Accurate Gait Recognition Through Deep Learning”, International Conference on Computer Safety, Reliability, and Security (SAFECOMP) Workshops, 2017.

[7] D. Das and A. Chakrabarty, “Human Gait Recognition using Deep Neural Networks”, Proceedings of the Second International Conference on Information and Communication Technology for Competitive Strategies, no. 132, 2016.

[8] A. Sokolova and A. Konushin, “Pose-based Deep Gait Recognition”, arXiv:1710.06512 [cs.CV], 2017.

[9] F. M. Castro, M. J. Marin-Jimenez, N. Guil and N. Perez de la Blanca, “Automatic learning of gait signatures for people identification”, arXiv:1603.01006v2 [cs.CV], 2016.

[10] S. Zagoruyko and N. Komodakis, “Wide Residual Networks”, arXiv:1605.07146 [cs.CV], 2016.

[11] J. Redmon, S. Divvala, R. Girshick and A. Farhadi, “You only look once: Unified, real-time object detection”, Computer Vision and Pattern Recognition (CVPR), 2016.

[12] J. Redmon and A. Farhadi, “YOLO9000: Better, faster, stronger”, arXiv:1612.08242 [cs.CV], 2016.

[13] E. Ilg, N. Mayer, T. Saikia, M. Keuper, A. Dosovitskiy and T. Brox, “FlowNet 2.0: Evolution of optical flow estimation with deep networks”, arXiv:1612.01925 [cs.CV], 2016.

[14] P. Fischer, A. Dosovitskiy, E. Ilg, P. Häusser, C. Hazırbaş, V. Golkov, P. van der Smagt, D. Cremers and T. Brox, “FlowNet: Learning optical flow with convolutional networks”, IEEE International Conference on Computer Vision (ICCV), 2015.

[15] J. Long, E. Shelhamer and T. Darrell, “Fully convolutional networks for semantic segmentation”, Computer Vision and Pattern Recognition (CVPR), 2015.

[16] V. Dumoulin and F. Visin, “A guide to convolution arithmetic for deep learning”, arXiv:1603.07285v2 [stat.ML], 2018.

[17] K. He, X. Zhang, S. Ren and J. Sun, “Deep residual learning for image recognition”, arXiv:1512.03385 [cs.CV], 2015.

[18] R. Girshick, J. Donahue, T. Darrell, and J. Malik, “Rich feature hierarchies for accurate object detection and semantic segmentation”, Computer Vision and Pattern Recognition (CVPR), 2015.

[19] R. Girshick, “Fast R-CNN”, IEEE International Conference on Computer Vision (ICCV), 2015.

[20] S. Ren, K. He, R. Girshick, and J. Sun. “Faster R-CNN: Towards real-time object detection with region proposal networks”, Conference and Workshop on Neural Information Processing Systems (NIPS), 2015.
Advisors: Kuo-Chin Fan (范國清) and Chih-Lung Lin (林志隆)    Review Date: 2018-07-26
