Using Synthetic Data to Construct Deep Learning Datasets for Air-Writing Applications

Please use this permanent URL to cite or link to this item: http://ir.lib.ncu.edu.tw/handle/987654321/86608

Title:	Using Synthetic Data to Construct Deep Learning Datasets for Air-Writing Applications (original title: 利用虛擬資料建構深度學習訓練集以實現凌空書寫應用)
Author:	Huang, Chi-Hsuan (黃啟軒)
Contributor:	Department of Computer Science and Information Engineering
Keywords:	Fingertip detection; air-writing; synthetic datasets; character recognition
Date:	2021-08-03
Upload time:	2021-12-07 13:01:17 (UTC+8)
Publisher:	National Central University
Abstract:	Air-writing is a novel human-computer interaction input method in which users naturally write in the air the characters they want to enter into a machine or device. Fingertips are detected in real time in the frames captured by a camera, the fingertip coordinates are linked into a trajectory, and the character represented by that trajectory is then recognized. Air-writing can serve as a text input method for devices such as smart glasses, and its touchless nature also suits hygiene-sensitive settings, for example by reducing the risk that hospital users become infected through contact with equipment. This research proposes deep-learning-based techniques for first-person and third-person air-writing. Because deep learning relies on large amounts of labeled data, we build the training dataset with Unity3D, compositing a virtual hand model onto randomly chosen images or single-color backgrounds to generate labeled synthetic data quickly and efficiently. We vary the hand model to simulate the rotation and movement that occur during writing, which increases data diversity. For the more complex third-person scenes, we also add randomly varied faces and torsos so that the synthetic data more closely resembles real conditions. An object detection model detects the fingertip positions to form the character trajectory, and redundant strokes produced during writing are removed so that the processed trajectory better matches the character itself. We combine handwritten and printed characters into a composite dataset to train the character recognition model, which adopts the ResNeSt architecture to recognize nearly 5,000 Chinese characters. Experimental results show that the large volume of accurately labeled synthetic data can effectively train the models and helps realize real-time first-person and third-person air-writing.
Appears in collections:	[Graduate Institute of Computer Science and Information Engineering] Theses and Dissertations
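
The abstract describes compositing a rendered hand onto random or single-color backgrounds so that fingertip labels come for free. The following is a minimal 2D sketch of that idea in Python with NumPy, not the thesis pipeline itself: the thesis renders a 3D hand model in Unity3D, whereas this stand-in assumes the hand is already available as an RGBA sprite with a known fingertip pixel. The function name `composite_hand`, its parameters, and the fixed square label box are all hypothetical choices made for illustration.

```python
import numpy as np


def composite_hand(background, hand_rgba, top_left, fingertip_xy, box_size=16):
    """Alpha-blend an RGBA hand sprite onto an RGB background.

    Returns the composited uint8 image and a fingertip bounding box
    (x_min, y_min, x_max, y_max) in background pixel coordinates.
    Hypothetical helper: assumes the sprite fits fully inside the background.
    """
    out = background.astype(np.float32)  # float copy for blending
    h, w = hand_rgba.shape[:2]
    x0, y0 = top_left

    # Blend only the region covered by the sprite, weighted by its alpha channel.
    region = out[y0:y0 + h, x0:x0 + w]
    alpha = hand_rgba[..., 3:4].astype(np.float32) / 255.0
    region[:] = alpha * hand_rgba[..., :3] + (1.0 - alpha) * region

    # The label follows directly from where the sprite was placed.
    fx, fy = x0 + fingertip_xy[0], y0 + fingertip_xy[1]
    half = box_size // 2
    img_h, img_w = background.shape[:2]
    box = (max(fx - half, 0), max(fy - half, 0),
           min(fx + half, img_w - 1), min(fy + half, img_h - 1))
    return out.astype(np.uint8), box


# Toy usage: a random-noise background and a crude opaque "finger" sprite.
rng = np.random.default_rng(0)
background = rng.integers(0, 256, size=(120, 160, 3), dtype=np.uint8)
hand = np.zeros((40, 30, 4), dtype=np.uint8)   # fully transparent sprite
hand[5:, 10:20] = (224, 172, 105, 255)         # opaque column, tip at row 5
image, box = composite_hand(background, hand, top_left=(50, 60), fingertip_xy=(15, 5))
```

Because the sprite position is chosen by the generator, the label is exact by construction; randomizing the placement, the background, and (in a real renderer) the hand pose is what would give the diversity the abstract mentions.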


All items in NCUIR are protected by original copyright.
