Master's/Doctoral Thesis 108522044 Detailed Record




Name: Chi-Hsuan Huang (黃啟軒)   Department: Computer Science and Information Engineering
Thesis Title: Using Synthetic Data to Construct Deep Learning Datasets for Air-Writing Applications
(original title: 利用虛擬資料建構深度學習訓練集以實現凌空書寫應用)
  1. The author has agreed to make this electronic thesis openly accessible immediately.
  2. The open-access full text is licensed only for personal, non-profit searching, reading, and printing for the purpose of academic research.
  3. Please observe the relevant provisions of the Copyright Act of the Republic of China (Taiwan); do not reproduce, distribute, adapt, repost, or broadcast it without authorization.

Abstract (Chinese) Air-writing is a novel human-computer input method in which users naturally write in the air the characters they wish to enter into a machine or device. Fingertip positions are detected in real time from camera frames, the fingertip coordinates are linked into a trajectory, and the character represented by that trajectory is then recognized. Air-writing can serve as a text-input method for devices such as smart glasses, and its touchless nature also suits hygiene-sensitive settings, for example reducing the risk of virus infection for hospital users who would otherwise touch shared equipment. This research proposes deep-learning-based first-person and third-person air-writing techniques. Because deep learning depends on large amounts of labeled data, we build the training datasets with Unity3D: a virtual hand model is composited onto random images or single-color backgrounds, so that labeled synthetic data can be generated quickly and efficiently. We vary the hand model to simulate the rotation and movement that occur during writing, increasing data diversity. For the more complex third-person scenario, we additionally insert randomly varied faces and torsos to bring the synthetic data closer to real conditions. An object detection model locates the fingertip to form the character trajectory, and redundant strokes produced during writing are removed so that the processed trajectory better matches the character itself. We combine handwritten and printed characters into a composite dataset to train the character recognition model, adopting the ResNeSt architecture to recognize nearly 5,000 traditional Chinese characters. Experimental results show that the large volume of accurately labeled synthetic data trains the models effectively, enabling real-time first-person and third-person air-writing.
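The labeled-data generation described in the abstract relies on one key property of synthetic rendering: because the hand is composited programmatically, the fingertip position is known exactly, so every sample comes with a free, precise label. The sketch below illustrates that idea only; it is not the thesis's Unity3D pipeline, and all shapes, names, and the fingertip offset are hypothetical assumptions.

```python
# Illustrative sketch only (not the thesis's Unity3D pipeline): composite a
# rendered RGBA hand onto a background and derive the fingertip box label
# directly from the render. Shapes/offsets here are hypothetical.
import numpy as np

def composite_sample(hand_rgba, background, top_left, fingertip_offset, box=8):
    """Paste the hand onto the background; return (image, fingertip box)."""
    h, w = hand_rgba.shape[:2]
    y, x = top_left
    out = background.astype(float).copy()
    alpha = hand_rgba[..., 3:4] / 255.0          # per-pixel opacity, (h, w, 1)
    region = out[y:y + h, x:x + w, :3]
    out[y:y + h, x:x + w, :3] = alpha * hand_rgba[..., :3] + (1 - alpha) * region
    # The fingertip location is known exactly from the render,
    # so the detection label requires no manual annotation.
    fy, fx = y + fingertip_offset[0], x + fingertip_offset[1]
    label = (fx - box // 2, fy - box // 2, fx + box // 2, fy + box // 2)
    return out.astype(np.uint8), label

# Toy example: a flat gray "hand" pasted onto a single-color background
bg = np.full((120, 160, 3), 128, np.uint8)
hand = np.zeros((40, 30, 4), np.uint8)
hand[..., :3] = 200
hand[..., 3] = 255                               # fully opaque mask
img, (x0, y0, x1, y1) = composite_sample(hand, bg, (20, 50), (0, 15))
```

Randomizing `top_left`, the background image, and the rendered hand pose is what provides the data diversity the abstract mentions.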
Abstract (English) Air-writing is the practice of waving a finger in the air to write a character. Through real-time fingertip detection in the frames of captured videos, the trajectory of the fingertip can be formed for character recognition. Air-writing may thus serve as a new human-computer interface for entering text on such devices as smart glasses or computers requiring touchless operation. This research proposes deep-learning techniques for first-person and third-person air-writing. We first employ Unity3D to synthesize a hand model, which is superimposed onto randomly chosen images or single-color backgrounds to generate labeled data. An object detection model is trained accordingly to detect fingertip positions. The trajectory can then be extracted to form a single-stroke character, and post-processing is applied to remove redundant connections within the character. A dataset containing handwritten and printed characters is built for training a classification model. The experimental results show that a large volume of high-quality labeled data can effectively train the models, realizing first- and third-person air-writing.
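The trajectory-forming step described in the abstract, linking per-frame fingertip detections into a single-stroke character image that a classifier can consume, can be sketched roughly as follows. This is an illustrative assumption, not the thesis implementation; the function name, image size, and margin are invented for the example.

```python
# Illustrative sketch only (not the thesis code): rasterize a sequence of
# per-frame fingertip detections into a normalized single-stroke image.
import numpy as np

def rasterize_trajectory(points, size=64, margin=4):
    """Normalize (x, y) fingertip points into a size x size binary image."""
    pts = np.asarray(points, dtype=float)
    lo, hi = pts.min(axis=0), pts.max(axis=0)
    span = max((hi - lo).max(), 1e-6)            # preserve aspect ratio
    pts = (pts - lo) / span * (size - 1 - 2 * margin) + margin
    img = np.zeros((size, size), dtype=np.uint8)
    # Connect consecutive detections by dense linear interpolation,
    # approximating the continuous stroke between video frames.
    for (x0, y0), (x1, y1) in zip(pts[:-1], pts[1:]):
        n = int(max(abs(x1 - x0), abs(y1 - y0))) + 1
        xs = np.linspace(x0, x1, n).round().astype(int)
        ys = np.linspace(y0, y1, n).round().astype(int)
        img[ys, xs] = 1
    return img

# Example: a short diagonal stroke from three fingertip detections
demo = rasterize_trajectory([(100, 40), (140, 80), (180, 120)])
```

The redundant-connection removal mentioned in the abstract would run before this step, pruning the transition strokes written between intended strokes; the pruning criterion is specific to the thesis and not reproduced here.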
Keywords (Chinese) ★ fingertip detection (指尖偵測)
★ air-writing (凌空書寫)
★ synthetic data (合成資料)
★ character recognition (文字辨識)
Keywords (English) ★ Fingertip detection
★ air-writing
★ synthetic datasets
★ character recognition
Table of Contents
Abstract (Chinese) VII
Abstract VIII
Contents IX
List of Figures X
List of Tables XII
Chapter 1 Introduction 1
1.1 Significance of the Research 1
1.2 Contributions of the Research 2
1.3 The Organization of the Thesis 3
Chapter 2 Related Work 4
2.1 Traditional Methods 4
2.2 Deep Learning with Depth Information 5
2.3 Deep Learning with RGB Images 6
Chapter 3 Proposed Method 8
3.1 Egocentric-View Synthetic Fingertip Dataset 8
3.2 Third-Person View Synthetic Fingertip Dataset 12
3.3 The Strategy of Character Trajectory Processing 15
3.4 Traditional Chinese Character Training Dataset 17
3.5 Network Architecture 18
3.5.1 Backbone Network Design 19
3.5.2 Object Detection Network Design 21
Chapter 4 Experimental Results 24
4.1 Development Environment 24
4.2 Experimental Results of Hand Detection 24
4.3 Experimental Results of Fingertip Detection 27
4.4 Experimental Results of Character Recognition 33
4.5 Air-writing Examples 37
Chapter 5 Conclusions and Future Work 38
5.1 Conclusions 38
5.2 Future Work 38
REFERENCES 40
References [1] S. Ren, K. He, R. Girshick, and J. Sun. "Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks." IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39, no. 6, pp. 1137–1149, June 2017.
[2] Vincent Girondel, Laurent Bonnaud, and Alice Caplier. "A human body analysis system." EURASIP Journal on Advances in Signal Processing, 2006.
[3] Leonid Sigal, Stan Sclaroff, and Vassilis Athitsos. "Skin color-based video segmentation under time-varying illumination." IEEE Transactions on Pattern Analysis and Machine Intelligence, 2004.
[4] Martin de La Gorce, David J. Fleet, and Nikos Paragios. "Model-Based 3D Hand Pose Estimation from Monocular Video." In IEEE Transactions on Pattern Analysis and Machine Intelligence 2011, 33(9), 1793-1805.
[5] Philip Krejov and Richard Bowden. "Multi-touchless: Real-time fingertip detection and tracking using geodesic maxima." 2013 10th IEEE International Conference and Workshops on Automatic Face and Gesture Recognition (FG), Shanghai 2013, pp. 1–7.
[6] Hui Liang, Junsong Yuan, and Daniel Thalmann. "3D Fingertip and Palm Tracking in Depth Image Sequences." MM 2012 - Proceedings of the 20th ACM International Conference on Multimedia, 2012, pp. 785–788.
[7] Chia-Ping Chen, Yu-Ting Chen, Ping-Han Lee, Yu-Pao Tsai, and Shawmin Lei. "Real-time hand tracking on depth images." In Visual Communications and Image Processing (VCIP), 2011 IEEE 2011, pp. 1–4.
[8] J. S. Supancic, III, Grégory Rogez, Yi Yang, Jamie Shotton, and Deva Ramanan. "Depth-based hand pose estimation: Data, methods, and challenges." In The IEEE International Conference on Computer Vision (ICCV) 2015.
[9] Jonathan Tompson, Murphy Stein, Yann LeCun, and Ken Perlin. "Real-time continuous pose recovery of human hands using convolutional networks." ACM Transactions on Graphics (TOG) 2014, 33(5), 169.
[10] Lorenzo Baraldi, Francesco Paci, Giuseppe Serra, Luca Benini, and Rita Cucchiara. "Gesture recognition in ego-centric videos using dense trajectories and hand segmentation." In Computer Vision and Pattern Recognition Workshops (CVPRW), 2014 IEEE Conference 2014, pp. 702–707.
[11] Aaron Wetzler, Ron Slossberg, and Ron Kimmel. "Rule of thumb: Deep derotation for improved fingertip detection." arXiv:1507.05726 2015.
[12] Sven Bambach, Stefan Lee, David J. Crandall, and Chen Yu. "Lending a hand: Detecting hands and recognizing activities in complex egocentric interactions." In Proceedings of the IEEE International Conference on Computer Vision, 2015, pp. 1949–1957.
[13] Chi Xu, Wendi Cai, Yongbo Li, Jun Zhou, and Longsheng Wei. "Accurate Hand Detection from Single-Color Images by Reconstructing Hand Appearances." Sensors, 2020, 20(1), 192.
[14] Xiaorui Liu, Yichao Huang, Xin Zhang, and Lianwen Jin. "Fingertip in the Eye: A cascaded CNN pipeline for the real-time fingertip detection in egocentric videos." arXiv:1511.02282 2015.
[15] Yichao Huang, Xiaorui Liu, Xin Zhang, and Lianwen Jin. "A Pointing Gesture Based Egocentric Interaction System: Dataset, Approach and Application." 2016 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Las Vegas, NV, 2016, pp. 370–377.
[16] Sohom Mukherjee, Sk. Arif Ahmed, Debi Prosad Dogra, Samarjit Kar, and Partha Pratim Roy. "Fingertip Detection and Tracking for Recognition of Air-Writing in Videos." arXiv:1809.03016 2018.
[17] Mohammad Mahmudul Alam, Mohammad Tariqul Islam, and S. M. Mahbubur Rahman. "Unified Learning Approach for Hand Gesture Recognition and Fingertip Detection." arXiv:2101.02047, 2021.
[18] A. Gupta, A. Vedaldi, and A. Zisserman, "Synthetic Data for Text Localisation in Natural Images." 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, 2016, pp. 2315-2324.
[19] Wang, Qi, et al. "Learning from synthetic data for crowd counting in the wild." Proceedings of the IEEE conference on computer vision and pattern recognition. 2019.
[20] Liu, Ziwei, et al. "Large-scale CelebFaces Attributes (CelebA) Dataset." Retrieved August 15, 2018.
[21] Jeonghun Baek, Geewook Kim, Junyeop Lee, Sungrae Park, Dongyoon Han, Sangdoo Yun, Seong Joon Oh, and Hwalsuk Lee. "What Is Wrong With Scene Text Recognition Model Comparisons? Dataset and Model Analysis." In Proceedings of the IEEE International Conference on Computer Vision, 2019.
[22] Zhang, Hang, et al. "ResNeSt: Split-Attention Networks." arXiv preprint arXiv:2004.08955, 2020.
[23] Traditional Chinese Handwriting Dataset. https://github.com/AI-FREE-Team/Traditional-Chinese-Handwriting-Dataset. Accessed: 2021-01-14.
[24] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. "Deep Residual Learning for Image Recognition." 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, 2016, pp. 770–778.
[25] Gao Huang, Zhuang Liu, Laurens van der Maaten, and Kilian Q. Weinberger. "Densely Connected Convolutional Networks." 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, 2017, pp. 2261–2269.
[26] Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, and Andrew Rabinovich. "Going deeper with convolutions." 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, 2015, pp. 1–9.
[27] Tsung-Yi Lin, Piotr Dollár, Ross Girshick, Kaiming He, Bharath Hariharan, and Serge Belongie. "Feature Pyramid Networks for Object Detection." 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, 2017, pp. 936–944.
[28] Fanqing Lin and Tony Martinez. "Ego2Hands: A Dataset for Egocentric Two-hand Segmentation and Detection." arXiv:2011.07252, 2020.
[29] Kaiming He, Georgia Gkioxari, Piotr Dollár, and Ross Girshick. "Mask R-CNN." IEEE International Conference on Computer Vision (ICCV), 2017, pp. 2961–2969.
[30] O. Ronneberger, P. Fischer, and T. Brox. "U-Net: Convolutional Networks for Biomedical Image Segmentation." In MICCAI, 2015.
[31] V. Badrinarayanan, A. Kendall, and R. Cipolla. "SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation." IEEE Transactions on Pattern Analysis and Machine Intelligence, 39:2481–2495, 2017.
[32] W. Wang, K. Yu, J. Hugonot, P. Fua, and M. Salzmann. "Recurrent U-Net for Resource-Constrained Segmentation." In ICCV, 2019.
[33] W. Wu, C. Li, Z. Cheng, X. Zhang, and L. Jin. "YOLSE: Egocentric fingertip detection from single RGB images." In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 2017, pp. 623–630.
[34] P. Mishra and K. Sarawadekar. "Fingertips detection in egocentric video frames using deep neural networks." In Proc. Int. Conf. on Image and Vision Computing New Zealand (IVCNZ), IEEE, Dunedin, New Zealand, 2019, pp. 1–6.
[35] Henriques, J. F., Caseiro, R., Martins, P., and Batista, J. "High-speed tracking with kernelized correlation filters." IEEE Transactions on Pattern Analysis and Machine Intelligence, 37(3):583–596, 2015.
[36] Kalal, Z., Mikolajczyk, K., and Matas, J. "Tracking-learning-detection." IEEE Transactions on Pattern Analysis and Machine Intelligence, 34(7):1409–1422, 2012.
[37] Babenko, B., Yang, M.-H., and Belongie, S. "Robust object tracking with online multiple instance learning." IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(8):1619–1632, 2011.
[38] Gregory Cohen, Saeed Afshar, Jonathan Tapson, and André van Schaik. "EMNIST: Extending MNIST to handwritten letters." In 2017 International Joint Conference on Neural Networks (IJCNN), 2017.
[39] C. Chiang, R. Wang, and B. Chen. "Recognizing arbitrarily connected and superimposed handwritten numerals in intangible writing interfaces." Pattern Recognition, vol. 61, pp. 15–28, Jan. 2017.
[40] T. Chu and C. Su. "A Kinect-Based Handwritten Digit Recognition for TV Remote Controller." IEEE International Symposium on Intelligent Signal Processing and Communications Systems, pp. 414–419, 2012.
[41] F. Huang, C. Su, and T. Chu. "Kinect-Based Mid-Air Handwritten Digit Recognition using Multiple Segments and Scaled Coding." IEEE International Symposium on Intelligent Signal Processing and Communications Systems, pp. 694–697, Nov. 2013.
[42] T. Murata and J. Shin. "Hand Gesture and Character Recognition Based on Kinect Sensor." International Journal of Distributed Sensor Networks, vol. 10, Jul. 2014. [Online]. Available: http://dx.doi.org/10.1155/2014/543278460.
[43] Saining Xie, Ross Girshick, Piotr Dollár, Zhuowen Tu, and Kaiming He. "Aggregated residual transformations for deep neural networks." arXiv preprint arXiv:1611.05431, 2016.
[44] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi. "You Only Look Once: Unified, Real-Time Object Detection." 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, 2016, pp. 779–788.
Advisor: 蘇柏齊 (Po-Chyi Su)   Date of Approval: 2021-08-03
