Graduate Thesis 106522026 Detailed Record




Name  Shih-Pi Lin (林士筆)    Department  Computer Science and Information Engineering
Thesis Title  In-air Handwriting Chinese Character Recognition Based on RGB Image without Depth Information
(基於RGB無深度影像之中文空中手寫辨識)
Related Theses
★ LSTM-Based In-air Handwritten Chinese Character Recognition
★ A Multi-Object Tracking Association Strategy Based on Object Detection
★ A Lightweight Classification Network Combining Cross-Scale Self-Attention and Split-Mix Layers
  1. The author has agreed to make this electronic thesis openly available immediately.
  2. The open-access full text is licensed to users for personal, non-profit retrieval, reading, and printing for academic research purposes only.
  3. Please comply with the Copyright Act of the Republic of China (Taiwan); do not reproduce, distribute, adapt, repost, or broadcast this work without authorization.

Abstract (Chinese)  As technology advances rapidly, communication between humans and computers is no longer limited to traditional keyboard input. Commercial handwriting input devices, such as writing tablets and touch screens, capture handwriting trajectories that are stable and dense, providing sufficient features for recognition. Historically, in-air handwriting research has focused on English letters and Arabic numerals; today, recognition of Chinese characters, used by a vast population, is also receiving growing attention. The structure of Chinese characters is more complex than that of English letters and digits, and obtaining stable features in the in-air handwriting setting is comparatively difficult.
In the past, in-air hand detection and tracking typically relied on hardware that provides depth information; for example, the Kinect uses two infrared cameras to obtain depth, which raises its price. For this reason, object detection and tracking using only RGB information has become a recent trend. However, using an RGB camera as the human-computer interaction medium for in-air handwriting raises two problems. First, accurate hand detection and stable tracking must be achieved. Second, each character is written in a single continuous stroke, so the captured trajectory contains both real strokes and virtual strokes, which increases the difficulty of recognition.
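The one-stroke property is easiest to see in the shape of the data. Below is a minimal sketch (not taken from the thesis; the type names are illustrative) contrasting pen-based input, where the pen-up signal segments strokes for free, with in-air input, where the whole character arrives as one unsegmented point sequence:

    from dataclasses import dataclass
    from typing import List, Tuple

    Point = Tuple[float, float]  # (x, y) fingertip position in image coordinates

    @dataclass
    class TabletCharacter:
        # Pen-based input: pen-up events separate strokes, so each
        # character arrives as a list of real strokes.
        strokes: List[List[Point]]

    @dataclass
    class InAirCharacter:
        # In-air input: the fingertip never "lifts", so the character is
        # one continuous point sequence. Transitions between real strokes
        # (virtual strokes) are embedded in the same sequence and must be
        # handled by the recognizer.
        trajectory: List[Point]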
Current high-performing RGB object detection methods are built on deep learning, a data-driven approach: besides requiring large amounts of training data, data processing is a critical part of deep learning. This thesis builds a hand database for training the core model by recording videos containing hands and by organizing and collecting existing hand datasets from the Internet. During data processing, the Effective Receptive Field (ERF) concept is incorporated: each ground-truth box is enlarged by a fixed ratio and the enlarged box is treated as a new object, with the goal of making detection more robust. YOLO v3 is used as the core neural network model, and a Convolutional Recurrent Neural Network (CRNN) is added to YOLO, converting it into a temporal model so that tracking becomes stable.
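The abstract describes this data-processing step concretely: enlarge each ground-truth box by a ratio and treat the enlarged box as a new object. A minimal sketch of that label transformation in YOLO's normalized format might look like the following; the function name, scale factor, and class ids are assumptions for illustration, not values published in this record:

    def enlarge_ground_truth(cx, cy, w, h, scale=1.5):
        """Return the original YOLO-format box plus an enlarged copy
        labeled as a separate class.

        Boxes are (class_id, cx, cy, w, h) with all coordinates
        normalized to [0, 1].
        """
        HAND, ENLARGED_HAND = 0, 1  # hypothetical class ids

        def clip(bcx, bcy, bw, bh):
            # Clip to the unit image so the enlarged box stays valid.
            x0, y0 = max(bcx - bw / 2, 0.0), max(bcy - bh / 2, 0.0)
            x1, y1 = min(bcx + bw / 2, 1.0), min(bcy + bh / 2, 1.0)
            return ((x0 + x1) / 2, (y0 + y1) / 2, x1 - x0, y1 - y0)

        return [
            (HAND, cx, cy, w, h),
            (ENLARGED_HAND, *clip(cx, cy, w * scale, h * scale)),
        ]

Under this scheme the detector learns both the tight hand extent and a surrounding context region, which is one plausible way the enlarged boxes could make detection more robust.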
Analysis of the experimental results shows that hand detection becomes more robust after the data is processed with ERF, and the converted YOLO improves the stability of hand tracking. Finally, the captured handwriting trajectories are evaluated with several Chinese character recognition methods, achieving a recognition accuracy of 96.33%.
Abstract (English)  As technology changes rapidly, human-computer interaction (HCI) is no longer limited to the keyboard. Existing handwriting products capture trajectories that are dense and stable, providing sufficient features for recognition. For Chinese characters, it is relatively difficult for machines to obtain a stable trajectory compared to English letters and numerals.
In the past, in-air hand detection and tracking often used devices with depth information. For example, the Kinect uses two infrared cameras to obtain depth information, which raises the price of the device. Using RGB information from a single camera for object detection and tracking has therefore become a trend in recent years. However, an RGB camera as the HCI medium for in-air handwriting must deal with two problems: achieving accurate hand detection and stable tracking, and handling the one-stroke property of the handwriting trajectory, which means it contains both real strokes and virtual strokes, increasing the difficulty of recognition.
The hand database used to build the model contains self-recorded handwriting videos and relevant hand datasets collected from the Internet. During data processing, Multiple Receptive Fields (MRF) are added: each ground-truth box is scaled and the scaled box is regarded as a new object, which increases the robustness of detection. This thesis uses YOLO v3 as the core neural network model and adds a Convolutional Recurrent Neural Network (CRNN) to convert YOLO into a time-sequential neural network that stabilizes tracking.
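The record does not spell out how the CRNN is attached to YOLO, so the following is only a rough sketch under stated assumptions (PyTorch, a GRU run over per-frame grid features, illustrative layer sizes): one plausible reading of converting a per-frame detector into a time-sequential one, not the thesis's exact architecture.

    import torch
    import torch.nn as nn

    class RecurrentDetectionHead(nn.Module):
        """Add recurrence on top of per-frame detector features: a conv
        layer compresses each frame's backbone features, a GRU runs over
        time independently at every spatial cell, and a 1x1 conv emits
        per-cell detection outputs."""

        def __init__(self, in_ch=256, hidden=128, out_ch=18):
            super().__init__()
            self.reduce = nn.Conv2d(in_ch, hidden, kernel_size=3, padding=1)
            self.rnn = nn.GRU(hidden, hidden, batch_first=True)
            self.head = nn.Conv2d(hidden, out_ch, kernel_size=1)

        def forward(self, feats):                     # feats: (B, T, C, H, W)
            b, t, _, h, w = feats.shape
            x = self.reduce(feats.flatten(0, 1))      # (B*T, hid, H, W)
            hid = x.shape[1]
            x = x.reshape(b, t, hid, h, w).permute(0, 3, 4, 1, 2)
            x, _ = self.rnn(x.reshape(b * h * w, t, hid))  # GRU over time
            x = x.reshape(b, h, w, t, hid).permute(0, 3, 4, 1, 2)
            x = self.head(x.reshape(b * t, hid, h, w))     # (B*T, out, H, W)
            return x.reshape(b, t, -1, h, w)

    frames = torch.randn(2, 8, 256, 13, 13)   # two clips of eight frames
    out = RecurrentDetectionHead()(frames)    # -> (2, 8, 18, 13, 13)

Carrying hidden state across frames gives each detection access to where the hand was in previous frames, which is how a temporal head of this kind can stabilize tracking.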
Analysis of the experimental results shows that hand detection becomes more robust after the data is processed with MRF, and the converted YOLO improves the stability of hand tracking. Overall, using several Chinese character recognition methods, the accuracy of recognizing in-air handwritten Chinese character trajectories reaches 96.33%.
Keywords (Chinese) ★ RGB images without depth information
★ Hand detection and tracking
★ In-air handwritten Chinese character recognition
Keywords (English) ★ RGB Image without Depth Information
★ Hand Detection and Tracking
★ In-air Handwriting Chinese Character Recognition
Table of Contents
Abstract (Chinese)
Abstract (English)
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
1.1 Research Motivation
1.2 Research Objectives
1.3 Thesis Organization
Chapter 2 Related Work
2.1 Related Research
2.2 Convolutional Pose Machines (CPM)
2.3 Feature Pyramid Networks (FPN)
2.4 You Only Look Once (YOLO)
2.5 Convolutional Recurrent Neural Network (CRNN)
2.6 Shape Context
Chapter 3 Methodology
3.1 System Architecture
3.2 Hand Database Construction
3.2.1 Conversion of Online Hand Databases
3.2.2 Hand Annotation
3.2.3 Multiple Receptive Fields (MRF)
3.2.4 Hand Dataset Overview
3.3 Model Training
3.3.1 MRF + YOLO v3
3.3.2 MRF + YOLO + CRNN
3.4 In-air Handwritten Chinese Character Recognition
3.4.1 Handwriting Trajectories
3.4.2 Chinese Character Recognition Methods
Chapter 4 Experimental Results
4.1 Experimental Environment
4.2 Experiment Description
4.2.1 Model Evaluation Methods
4.2.2 In-air Handwritten Chinese Character Recognition
4.3 Experimental Data
4.3.1 Comparison of Different Models
4.3.2 Chinese Character Recognition
Chapter 5 Conclusion and Future Work
References
References  [1] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton, “ImageNet Classification with Deep Convolutional Neural Networks,” NIPS'12 Proceedings of the 25th International Conference on Neural Information Processing Systems, Vol. 1, pp. 1097-1105, 2012.
[2] Min Lin, Qiang Chen, and Shuicheng Yan, “Network In Network,” arXiv preprint arXiv:1312.4400, 2013.
[3] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun, “Deep Residual Learning for Image Recognition,” arXiv preprint arXiv:1512.03385, 2015.
[4] Ross Girshick, Jeff Donahue, Trevor Darrell, and Jitendra Malik, “Rich feature hierarchies for accurate object detection and semantic segmentation,” Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition(CVPR), pp. 580-587, 2014.
[5] J. R. R. Uijlings, K. E. A. van de Sande, T. Gevers, and A. W. M. Smeulders, “Selective Search for Object Recognition,” International Journal of Computer Vision, Vol. 104, pp. 154-171, 2013.
[6] Ross Girshick, “Fast R-CNN,” Proceedings of the IEEE International Conference on Computer Vision(ICCV), pp. 1440-1448, 2015.
[7] Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun, “Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks,” arXiv preprint arXiv:1506.01497, 2015.
[8] Shih-En Wei, Varun Ramakrishna, Takeo Kanade, and Yaser Sheikh, “Convolutional Pose Machines,” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition(CVPR), pp. 4724-4732, 2016.
[9] Tsung-Yi Lin, Piotr Dollár, Ross Girshick, Kaiming He, Bharath Hariharan, and Serge Belongie, “Feature pyramid networks for object detection,” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition(CVPR), pp. 2117-2125, 2017.
[10] Joseph Redmon, Santosh Divvala, Ross Girshick, and Ali Farhadi, “You only look once: Unified, real-time object detection,” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition(CVPR), pp. 779-788, 2016.
[11] Joseph Redmon, and Ali Farhadi, “YOLO9000: Better, faster, stronger,” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition(CVPR), pp. 7263-7271, 2017.
[12] The PASCAL Visual Object Classes Challenge 2007 (VOC2007). [Accessed: 15-Jul-2018]. Available from: http://host.robots.ox.ac.uk/pascal/VOC/voc2007/.
[13] Joseph Redmon, and Ali Farhadi, “YOLOv3: An Incremental Improvement,” arXiv preprint arXiv:1804.02767, 2018.
[14] Baoguang Shi, Xiang Bai, and Cong Yao, “An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39, no. 11, pp. 2298-2304, 2017.
[15] S. Belongie, J. Malik, and J. Puzicha, “Shape matching and object recognition using shape contexts,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, no. 4, pp. 509-522, Apr. 2002.
[16] Tomas Simon, Hanbyul Joo, Iain Matthews, and Yaser Sheikh, “Hand keypoint detection in single images using multiview bootstrapping,” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition(CVPR), pp. 1145-1153, 2017.
[17] Hand Keypoint Dataset. [Accessed: 08-Apr-2018]. Available from: http://domedb.perception.cs.cmu.edu/handdb.html.
[18] VIVA Vision for Intelligent Vehicles and Applications. [Accessed: 08-Apr-2018]. Available from: http://cvrr.ucsd.edu/vivachallenge/index.php/hands/hand-detection/.
[19] YOLO mark. [Accessed: 20-Apr-2018]. Available from: https://github.com/AlexeyAB/Yolo_mark.
[20] 鄒佩珊, “In-air Handwritten Chinese Character Recognition (空中手寫中文字辨識),” Master's thesis, Department of Computer Science and Information Engineering, National Central University, 2018.
Advisors  Kuo-Chin Fan (范國清) and Jun-Wei Hsieh (謝君偉)    Review Date  2019-07-25
