Master's/Doctoral Thesis 103582002: Detailed Record




Author: 賴慶榮 (Lai, Chin-Rong)    Department: Computer Science and Information Engineering
Thesis Title: Human Body Detection Based on Deep Learning to Facilitate Air Writing and Pedestrian Gait Recognition
(基於深度學習之人形偵測以實現空中手寫與行人姿態辨識)
Related Theses
★ Real-Time Online Identity Verification Using Viseme and Voice Biometric Features
★ An Image-Based Alignment System for SMD Packaging Tape
★ Content Forgery Detection and Deleted-Data Recovery for Handheld Mobile Devices
★ License Plate Verification Based on the SIFT Algorithm
★ Local Pattern Features Based on Dynamic Linear Decision Functions for Face Recognition
★ A GPU-Based SAR Database Simulator: A Parallel Architecture for SAR Echo Signals and Image Databases
★ Personal Identity Verification Using Palm Prints
★ Video Indexing Using Color Statistics and Camera Motion
★ Form Document Classification Using Field Clustering Features and Four-Directional Adjacency Trees
★ Stroke Features for Offline Chinese Character Recognition
★ Motion Vector Estimation Using Adaptive Block Matching Combined with Multi-Image Information
★ Color Image Analysis and Its Applications to Color-Quantized Image Retrieval and Face Detection
★ Extraction and Recognition of Logos on Chinese and English Business Cards
★ Chinese Signature Verification Using Virtual-Stroke Features
★ Face Detection, Face Angle Classification, and Face Recognition Based on Triangle Geometry and Color Features
★ A Complementary Skin-Color-Based Face Detection Strategy
  1. The author has agreed to make this electronic thesis openly available immediately.
  2. For theses whose open-access date has been reached, the full text is authorized for academic research only: personal, non-profit retrieval, reading, and printing.
  3. Please comply with the Copyright Act of the Republic of China; do not reproduce, distribute, adapt, repost, or broadcast the content without authorization.

Abstract (Chinese) With the rapid development of intelligent technologies, human gesture recognition has become one of the most popular research areas. Gesture recognition is the ability of a computer or smart device to detect and interpret the meaning of human gestures. Such gestures, including hand or body movements, facial expressions, and even voice commands, can all be used to control devices or human-machine interfaces. Air writing is a new way for humans to communicate with smart devices, allowing users to interact in a natural, continuous manner. Gait recognition, in turn, serves application areas such as healthcare and security surveillance. Machine learning, a rapidly emerging field, can be applied to the research and development of both techniques and to the analysis and interpretation of the data they capture.
Compared with other writing methods, air writing is more challenging because of its unique characteristics: redundant pen-lifting strokes, multiplicity (a single character can be written in many ways), and confusion (different characters can have similar trajectories). We propose a novel reverse time-ordered algorithm that, without any triggering action or stroke, efficiently filters out unnecessary lifting strokes and simplifies the complex stroke-trajectory matching procedure. We then design a three-tier structure that samples air-writing trajectories at different rates to resolve the multiplicity and confusion problems. The proposed reverse time-ordered stroke-trajectory recognition method achieves an accuracy above 94%.
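The tiered sampling described above can be sketched in a few lines; this is a minimal illustration only, and the sampling rates (1, 2, 4) and the toy trajectory are hypothetical choices, not values taken from the thesis.

```python
def subsample(trajectory, rate):
    """Keep every `rate`-th point of an (x, y) trajectory."""
    return trajectory[::rate]

def tiered_samples(trajectory, rates=(1, 2, 4)):
    """Build one trajectory variant per sampling rate (the 'tiers')."""
    return {rate: subsample(trajectory, rate) for rate in rates}

# A toy air-writing trajectory of eight (x, y) points.
traj = [(0, 0), (1, 1), (2, 1), (3, 2), (4, 2), (5, 3), (6, 3), (7, 4)]
tiers = tiered_samples(traj)
```

Comparing a query against each tier lets coarse tiers separate dissimilar characters cheaply while fine tiers disambiguate similar trajectories.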
For pedestrian gait recognition, we use a deep neural network to achieve automatic detection and recognition. To capture pedestrian skeletal and joint movements, a sequence of pedestrian color images is used as input rather than data from wearable devices. A convolutional neural network (CNN) then locates the pedestrian, and the pedestrian's dense optical flow is extracted as a low-level feature; together these serve as input to the next stage. A fine-tuned wide residual network (Wide ResNet) then extracts high-level abstract features. In addition, to overcome the inability of a two-dimensional (2D) CNN to capture local temporal features, we introduce a partial three-dimensional (3D) convolutional structure. This design yields effective feature extraction in memory-constrained environments and improves the performance of the deep neural network (DNN). Experimental results show that the proposed pedestrian detection and recognition method performs very well.
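As a toy illustration of how low-level motion features arise from consecutive frames, the sketch below computes per-pixel absolute frame differences. This is only a crude stand-in for the dense optical flow used in the pipeline, and the 3x3 grayscale frames are hypothetical.

```python
def motion_magnitude(frame_a, frame_b):
    """Per-pixel absolute intensity change between two grayscale frames
    (a crude stand-in for dense optical-flow magnitude)."""
    return [[abs(b - a) for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(frame_a, frame_b)]

# Two tiny 3x3 frames: a bright pixel moves one column to the right.
f0 = [[0, 0, 0], [255, 0, 0], [0, 0, 0]]
f1 = [[0, 0, 0], [0, 255, 0], [0, 0, 0]]
motion = motion_magnitude(f0, f1)
```

Real dense optical flow additionally estimates the direction of each pixel's motion, not just where intensity changed, which is why it is a richer low-level feature for gait.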
Abstract (English) With the rapid development of intelligent technologies, gesture recognition has become one of the most popular research areas in the world. It is the ability of a computer or smart device to detect and interpret human gestures. Such gestures, including hand or body movements, facial expressions, or even voice commands, can be used to control devices or interfaces. Air writing is a new human-to-smart-device communication approach that permits users to write in a natural and continuous way. Gait recognition is another such application, used in healthcare and surveillance. Machine learning can be applied to both applications to analyze and interpret the captured data.
Compared with other writing methods, air writing is more challenging because of its unique characteristics: redundant lifting strokes, multiplicity, and confusion. We propose a novel reverse time-ordered algorithm that, without any starting trigger, efficiently filters out unnecessary lifting strokes and thus simplifies the matching procedure. A tiered arrangement structure is then proposed that samples the air-writing trajectories at various rates to resolve the multiplicity and confusion problems. The recognition accuracy of the proposed approach exceeds 94%.
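Trajectory matching in the hierarchical classifier rests on dynamic time warping (Section 3.4.1). Below is a minimal pure-Python DTW over 1-D sequences, for illustration only; it is not the thesis implementation, which operates on 2-D stroke-context features.

```python
def dtw_distance(s, t):
    """Classic dynamic-time-warping distance between two 1-D sequences,
    using absolute difference as the local cost."""
    n, m = len(s), len(t)
    INF = float("inf")
    # dp[i][j] = minimal cost of aligning s[:i] with t[:j]
    dp = [[INF] * (m + 1) for _ in range(n + 1)]
    dp[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(s[i - 1] - t[j - 1])
            dp[i][j] = cost + min(dp[i - 1][j],      # insertion
                                  dp[i][j - 1],      # deletion
                                  dp[i - 1][j - 1])  # match
    return dp[n][m]

# Two toy trajectories: the repeated point is absorbed by warping.
d = dtw_distance([1, 2, 3], [1, 2, 2, 3])
```

Because warping lets one point align with several, trajectories written at different speeds (a key source of multiplicity) still compare cleanly.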
As for gait recognition, we apply a deep neural network (DNN) to achieve gait-based automatic pedestrian detection and recognition. Instead of using wearable devices to precisely capture skeletal and joint movements, pedestrian color-image sequences are used as input. A pretrained convolutional neural network (CNN) is then employed to locate the pedestrian, and the pedestrian's dense optical flow is extracted to serve as concrete low-level feature input. Next, a fine-tuned DNN based on the wide residual network (Wide ResNet) extracts high-level abstract features. In addition, to overcome the difficulty of obtaining local temporal features with a 2D CNN, part of a 3D convolutional structure is introduced into the network. This design enables the use of limited memory to acquire more effective features and enhances DNN performance. Experimental results show that the proposed method performs exceptionally well in pedestrian detection and recognition.
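To illustrate why a 3D convolution captures temporal structure that a 2D convolution cannot, the sketch below slides a kernel over a stack of frames as well as over space. It is a naive, single-channel toy; the frame stack and the two-frame difference kernel are hypothetical values, not the network used in the thesis.

```python
def conv3d(volume, kernel):
    """Naive valid-mode 3D convolution (cross-correlation) of a
    (T, H, W) frame stack with a (t, h, w) kernel, single channel."""
    T, H, W = len(volume), len(volume[0]), len(volume[0][0])
    t, h, w = len(kernel), len(kernel[0]), len(kernel[0][0])
    out = []
    for i in range(T - t + 1):          # slide over time
        plane = []
        for j in range(H - h + 1):      # slide over rows
            row = []
            for k in range(W - w + 1):  # slide over columns
                acc = 0.0
                for di in range(t):
                    for dj in range(h):
                        for dk in range(w):
                            acc += volume[i + di][j + dj][k + dk] * kernel[di][dj][dk]
                row.append(acc)
            plane.append(row)
        out.append(plane)
    return out

# Three 2x2 frames of rising intensity, convolved with a temporal
# difference kernel (t=2, h=1, w=1): the output is per-pixel change.
frames = [[[1, 1], [1, 1]],
          [[2, 2], [2, 2]],
          [[4, 4], [4, 4]]]
kernel = [[[-1]], [[1]]]
diff = conv3d(frames, kernel)
```

A 2D convolution applied frame by frame would see each frame in isolation; the temporal extent of the 3D kernel is what makes frame-to-frame motion a learnable feature.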
Keywords (Chinese) ★ 空中手寫 (air writing)
★ 行人姿態 (pedestrian gait)
Keywords (English) ★ Air Writing
★ Pedestrian Gait
Table of Contents
Chapter 1: Introduction
1.1 Motivation
1.2 Organization of the Thesis
Chapter 2: Review of Related Work
2.1 Review of Air Writing
2.2 Review of Gait-Based Pedestrian Detection and Recognition
Chapter 3: Air-Writing Recognition Using Reverse Time-Ordered Stroke Context
3.1 The Proposed Method
3.2 Motion Trajectory Extraction and Normalization
3.2.1 Turning Point Extraction
3.2.2 Turning Point Extraction Algorithm
3.2.3 Normalization
3.3 Air-Writing Representation
3.3.1 Shape Context
3.3.2 Time-Ordered Shape Context
3.3.3 Backward Time-Ordered Stroke Representation
3.4 Hierarchical Air-Writing Recognition
3.4.1 Dynamic Time Warping
3.4.2 Trajectory Comparison with Weighting
3.4.3 Hierarchical Classification
3.5 Experimental Results
Chapter 4: Gait-Based Pedestrian Detection and Recognition
4.1 The Proposed Method
4.2 Pedestrian Color Image Sequence
4.3 Pedestrian Detection and ROI Location
4.4 Pedestrian Dense Optical Flow Extraction
4.5 Pedestrian Dense Optical Flow ROI Processing
4.6 High-Level Abstract-Feature Extraction
4.6.1 Wide Residual Network (Wide ResNet)
4.6.2 Wide ResNet Modification
4.7 Experiments
4.7.1 Deep Neural Network Label and Experiment Platform
4.7.2 Learning-Rate Adjustment Strategy and Loss Function
4.7.3 Dataset Splitting at a 7:3 Ratio
4.7.4 Dataset Splitting at a 5:5 Ratio
4.7.5 Effects of Sample Frame Number
4.7.6 Summary of Related Research
Chapter 5: Conclusions and Future Work
References
Advisor: 范國清    Approval Date: 2024-06-27

For thesis-related questions, please contact the Extension Services Division, National Central University Library, Tel: (03) 422-7151 ext. 57407, or by e-mail. Privacy Policy Statement.