Thesis/Dissertation Record 105522618 — Detailed Information




Name: Qi-Lei Zeng (曾其雷)   Department: Computer Science and Information Engineering
Title: Fingertip Detection Based on RGB Images without Depth Information for In-air Handwriting
Related Theses
★ 使用視位與語音生物特徵作即時線上身分辨識
★ 以影像為基礎之SMD包裝料帶對位系統
★ 手持式行動裝置內容偽變造偵測暨刪除內容資料復原的研究
★ 基於SIFT演算法進行車牌認證
★ 基於動態線性決策函數之區域圖樣特徵於人臉辨識應用
★ 基於GPU的SAR資料庫模擬器:SAR回波訊號與影像資料庫平行化架構
★ 利用掌紋作個人身份之確認
★ 利用色彩統計與鏡頭運鏡方式作視訊索引
★ 利用欄位群聚特徵和四個方向相鄰樹作表格文件分類
★ 筆劃特徵用於離線中文字的辨認
★ 利用可調式區塊比對並結合多圖像資訊之影像運動向量估測
★ 彩色影像分析及其應用於色彩量化影像搜尋及人臉偵測
★ 中英文名片商標的擷取及辨識
★ 利用虛筆資訊特徵作中文簽名確認
★ 基於三角幾何學及顏色特徵作人臉偵測、人臉角度分類與人臉辨識
★ 一個以膚色為基礎之互補人臉偵測策略
Full text: available in the library system after 2025-07-31.
Abstract (Chinese): With the emergence and continual advancement of virtual reality (VR) and augmented reality (AR) devices, demand for human-computer interaction systems keeps growing. The hand plays a central role in human-computer interaction; it is our principal means of engaging with the world: we use tools, play instruments, touch, and gesture with our hands. In-air handwriting is one such form of human-computer interaction.
In-air handwriting is the act of writing in free space with the hand, without any wearable aid. Previous in-air handwriting detection and tracking systems usually relied on hardware that provides depth information; the Kinect, for example, obtains depth from an infrared emitter paired with an infrared camera, which makes such devices comparatively expensive. Achieving object detection and tracking from RGB images alone has therefore become a widely pursued goal in this field. Although detection and tracking techniques have improved markedly in recent years, the hand's highly variable shape and the small size of the fingertip still make fingertip detection and tracking difficult.
The best-performing RGB detection and tracking methods today are all based on deep learning. Because deep learning is data driven, it requires not only large amounts of training data but also careful data processing. For this thesis, we recorded in-air handwriting videos ourselves and collected existing online human-body keypoint datasets to build a database suited to our fingertip-localization method. A convolutional neural network learns image features together with the relative positions of body keypoints such as the shoulder, elbow, and wrist, with Convolutional Pose Machines (CPM) as the core network, to detect and track the hand and fingertip.
Experimental results show that the method performs well in detection speed, reaching 43 frames per second, but its detection accuracy is insufficient, possibly because the training data was too small. More training data is needed for further experiments to validate the method.
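The keypoint-detection approach summarized above is usually trained against Gaussian ground-truth belief maps (see Section 3.2.1 in the table of contents). The following sketch illustrates that idea only; the map size, keypoint coordinates, and sigma are illustrative assumptions, not values from the thesis.

```python
import numpy as np

def gaussian_heatmap(height, width, cx, cy, sigma):
    """Ground-truth belief map: a 2-D Gaussian peaked at the keypoint (cx, cy)."""
    ys, xs = np.mgrid[0:height, 0:width]
    return np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2.0 * sigma ** 2))

# One map per keypoint (e.g. shoulder, elbow, wrist); coordinates are made up.
keypoints = {"shoulder": (40, 30), "elbow": (60, 50), "wrist": (80, 70)}
maps = {name: gaussian_heatmap(96, 96, x, y, sigma=3.0)
        for name, (x, y) in keypoints.items()}

# The map is exactly 1.0 at the keypoint and decays with distance from it.
assert abs(maps["wrist"][70, 80] - 1.0) < 1e-9
assert maps["wrist"][0, 0] < 1e-6
```

The network's per-keypoint output maps are then regressed toward these targets, and the predicted keypoint is read off as the arg-max of each map.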
Abstract (English): With the emergence of virtual and augmented reality, the need for natural human-computer interaction (HCI) systems to replace traditional HCI approaches is increasing rapidly. The hand plays an important role in this interaction: we use tools with our hands, we play instruments with our hands, and we touch and gesture with our hands. In-air handwriting is one such form of human-computer interaction.
Air-writing is the process of writing characters or words in free space using finger or hand movements, without the aid of any handheld device. In the past, in-air hand detection and tracking often relied on devices that provide depth information; the Kinect, for example, uses an infrared emitter and an infrared camera to obtain depth, which makes such devices expensive. Detecting and tracking objects from the RGB output of a single camera has therefore become a trend in recent years. However, due to the variable shape of the hand and the small size of the fingertip, fingertip detection and tracking remain challenging.
The best-performing RGB detection methods currently build on deep learning. Deep learning is a data-driven approach that requires large amounts of data, and beyond collecting that training data, data processing is also an important part of the pipeline. This thesis builds a dataset suited to its fingertip-localization method by recording in-air handwriting videos and collecting human-body keypoint datasets from the Internet. Using Convolutional Pose Machines (CPM) as the core neural network, the model learns image features together with the rich implicit spatial relations among body keypoints, such as the shoulder, elbow, and wrist, to detect and track the hand.
Analysis of the experimental results shows that the proposed method performs well in detection speed, reaching 43 frames per second, but its detection accuracy is insufficient, possibly because of the lack of training data. More data is needed for further experiments to verify the method.
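The CPM structure named above works by iterative refinement: stage 1 predicts belief maps from image features alone, and each later stage re-predicts them from the image features concatenated with the previous stage's beliefs, so spatial context among keypoints accumulates. A minimal functional sketch of that data flow, using a random 1x1 channel mixing as a stand-in for the learned convolutions and hypothetical tensor shapes (not the thesis's actual architecture):

```python
import numpy as np

rng = np.random.default_rng(0)

def conv_like(x, out_channels):
    """Stand-in for a learned convolution: a fixed random 1x1 mixing of channels."""
    w = rng.standard_normal((out_channels, x.shape[0]))
    return np.tensordot(w, x, axes=1)  # (out_channels, H, W)

def cpm_forward(image_features, n_keypoints, n_stages=3):
    """Multi-stage belief-map refinement in the spirit of CPM."""
    beliefs = conv_like(image_features, n_keypoints)           # stage 1
    for _ in range(n_stages - 1):
        # Later stages see image features AND the previous beliefs.
        joint = np.concatenate([image_features, beliefs], axis=0)
        beliefs = conv_like(joint, n_keypoints)                # stages 2..T
    return beliefs

features = rng.standard_normal((32, 46, 46))  # hypothetical backbone feature maps
out = cpm_forward(features, n_keypoints=7)
print(out.shape)  # (7, 46, 46): one belief map per keypoint
```

In the real model each stage is a small convolutional sub-network trained with a per-stage loss against the ground-truth belief maps, which is what lets later stages correct earlier mistakes.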
Keywords (Chinese) ★ RGB images without depth information
★ fingertip detection and tracking
★ in-air handwriting recognition
Keywords (English) ★ RGB Image without Depth Information
★ Finger-tip Detection and Tracking
★ In-air Handwriting Character Recognition
Table of Contents
Abstract (Chinese)
Abstract (English)
Contents
List of Figures
List of Tables
Chapter 1 Introduction
1.1 Motivation
1.2 Objectives
1.3 Thesis Organization
Chapter 2 Related Work
2.1 Related Research
2.1.1 Tracking Methods
2.1.2 Detection Methods
2.2 VGG Network
2.3 Convolutional Pose Machines (CPM)
2.4 Hand Keypoint Detection
Chapter 3 Method
3.1 Keypoint Database Construction
3.1.1 Online Keypoint Databases
3.1.2 Keypoint Annotation
3.2 Model Training
3.2.1 Ground Truth
3.2.2 Convolutional Pose Machines
Chapter 4 Experimental Results
4.1 Experimental Environment
4.2 Evaluation Method
4.3 Dataset Split and Training Parameters
4.4 Experimental Data
4.4.1 Detection Accuracy
4.4.2 Model Inference Speed
4.4.3 Model Training Time
4.5 Analysis of Results
Chapter 5 Conclusion and Future Work
References
References
[1] Y. Huang, X. Liu, X. Zhang and L. Jin, "A pointing gesture based egocentric interaction system: dataset, approach and application," 2016 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Las Vegas, NV, 2016, pp. 370-377, doi: 10.1109/CVPRW.2016.53.
[2] 林士筆, "Chinese in-air handwriting recognition based on RGB images without depth information" (基於RGB無深度影像之中文空中手寫辨識), Master's thesis, Department of Computer Science and Information Engineering, National Central University, 2019.
[3]T. Simon et al., "Hand keypoint detection in single images using multiview bootstrapping," Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017.
[4]S. E. Wei, V. Ramakrishna, T. Kanade and Y. Sheikh, "Convolutional pose machines," In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4724-4732, 2016.
[5]W. W. Mayol, A. J. Davison, B. J. Tordoff, N. D. Molton and D. W. Murray, "Interaction between hand and wearable camera in 2D and 3D environments," In Proc. British Machine Vision Conference, 2004.
[6]T. Kurata, T. Okuma, M. Kourogi and K. Sakaue, "The hand mouse: GMM hand-color classification and mean shift tracking," In Proceedings IEEE ICCV Workshop on Recognition, Analysis, and Tracking of Faces and Gestures in Real-Time Systems, pp. 119-124, IEEE, 2001.
[7]N. Wang, J. P. Shi, D. Y. Yeung and J. Jia, "Understanding and diagnosing visual tracking systems," In Proceedings of the IEEE International Conference on Computer Vision, pp. 3101-3109, 2015.
[8]J. F. Henriques, R. Caseiro, P. Martins and J. Batista, "High-speed tracking with kernelized correlation filters," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 37, no. 3, pp. 583-596, 2015.
[9]Z. Kalal, K. Mikolajczyk and J. Matas, "Tracking-Learning-Detection," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 34, no. 7, pp. 1409-1422, 2012.
[10]J. Tompson, M. Stein, Y. Lecun and K. Perlin, "Real-time continuous pose recovery of human hands using convolutional networks," ACM Transactions on Graphics (ToG), vol. 33, no. 5, pp. 1-10, 2014.
[11]L. Baraldi, F. Paci, G. Serra, L. Benini and R. Cucchiara, "Gesture recognition in ego-centric videos using dense trajectories and hand segmentation," In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 688-693, 2014.
[12]J. S. Supancic, G. Rogez, Y. Yang, J. Shotton and D. Ramanan, "Depth-based hand pose estimation: data, methods, and challenges," In Proceedings of the IEEE International Conference on Computer Vision, pp. 1868-1876, 2015.
[13] 鄒佩珊, "In-air handwritten Chinese character recognition" (空中手寫中文字辨識), Master's thesis, Department of Computer Science and Information Engineering, National Central University, 2018.
[14]Y. Wu, J. W. Lim and M. H. Yang, "Online object tracking: A benchmark," In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2411-2418, 2013.
[15]A. Dutta and A. Zisserman, “The VIA annotation software for images, audio and video,” In Proceedings of the 27th ACM International Conference on Multimedia (MM ’19), October 21–25, 2019, Nice, France. ACM, New York, NY, USA, 4 pages, https://doi.org/10.1145/3343031.3350535.
[16]C. Li and K. M. Kitani, "Model recommendation with virtual probes for egocentric hand detection," In Proceedings of the IEEE International Conference on Computer Vision, pp. 2624-2631, 2013.
[17]C. Li and K. M. Kitani, "Pixel-level hand detection in ego-centric videos," In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3570-3577, 2013.
[18]S. Bambach, S. Lee, D. J. Crandall and C. Yu, "Lending a hand: Detecting hands and recognizing activities in complex egocentric interactions," In Proceedings of the IEEE International Conference on Computer Vision, pp. 1949-1957, 2015.
[19]A. Betancourt, P. Morerio, L. Marcenaro, M. Rauterberg and C. Regazzoni, "Filtering SVM frame-by-frame binary classification in a detection framework," In 2015 IEEE International Conference on Image Processing (ICIP), pp. 2552-2556, IEEE, 2015.
[20]W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C. Y. Fu and A. C. Berg, "SSD: Single shot multibox detector," In European Conference on Computer Vision, pp. 21-37, Springer, Cham, 2016.
[21]J. Redmon, S. Divvala, R. Girshick and A. Farhadi, "You only look once: Unified, real-time object detection," In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 779-788, 2016.
[22]K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," arXiv preprint arXiv:1409.1556, 2014.
[23]M. Andriluka, L. Pishchulin, P. Gehler and B. Schiele, "2d human pose estimation: New benchmark and state of the art analysis," In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3686-3693, 2014.
[24]S. Johnson and M. Everingham, "Learning effective human pose estimation from inaccurate annotation," In CVPR 2011, pp. 1465-1472, IEEE, 2011.
[25]B. Sapp and B. Taskar, "Modec: Multimodal decomposable models for human pose estimation," In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3674-3681, 2013.
[26]Hand Keypoint Dataset. [Accessed: 08-Apr-2018]. Available from: http://domedb.perception.cs.cmu.edu/handdb.html.
[27]Z. Cao, G. Hidalgo, T. Simon, S. E. Wei and Y. Sheikh, "OpenPose: realtime multi-person 2D pose estimation using Part Affinity Fields," arXiv preprint arXiv:1812.08008, 2018.
[28]R. Girshick, J. Donahue, T. Darrell and J. Malik, "Rich feature hierarchies for accurate object detection and semantic segmentation," In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 580-587, 2014.
[29]R. Girshick, "Fast R-CNN," In Proceedings of the IEEE International Conference on Computer Vision, pp. 1440-1448, 2015.
[30]S. Ren, K. He, R. Girshick and J. Sun, "Faster R-CNN: Towards real-time object detection with region proposal networks," In Advances in Neural Information Processing Systems, pp. 91-99, 2015.
Advisor: Guo-Qing Fan (范國清)   Approval Date: 2020-08-18
