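As technology advances rapidly, communication between people and computers is no longer limited to traditional keyboard input. Commercial handwriting input devices, such as writing tablets and touch screens, capture handwriting trajectories that are stable and dense, which provides enough features for recognition. Research on in-air handwriting has historically focused on English letters and Arabic numerals. Today, recognition of Chinese characters, which are used by a very large population, is receiving growing attention. Chinese characters are structurally more complex than English letters and numerals, and stable features are relatively hard to obtain in an in-air handwriting setting. In the past, in-air hand detection and tracking commonly relied on hardware that provides depth information. Kinect, for example, uses two infrared cameras to obtain depth information, which raises its price. For this reason, detecting and tracking objects from RGB information alone has become a trend in recent years. Using an RGB camera as the human-computer interaction medium for in-air handwriting, however, raises two problems. First, hand detection must be accurate and tracking must be stable. Second, the data are written in one continuous stroke, so the captured character trajectory contains both real strokes and virtual strokes, which makes recognition harder. The best current RGB object detection methods are all built on deep learning, which is data driven: besides requiring a large amount of training data, it makes data processing an extremely important step. This thesis builds the hand database used to train the core model by recording videos that contain hands and by collecting and organizing related hand datasets already available online. During data processing, the Effective Receptive Field (ERF) concept is applied: the ground truth is enlarged proportionally and treated as a new object, with the aim of making detection more robust. This thesis uses YOLO v3 as the core neural network model and adds a Convolutional Recurrent Neural Network (CRNN) to YOLO, turning it into a temporal neural network model to stabilize tracking. Analysis of the experimental results shows that hand detection becomes more robust after the data are processed with ERF, and that the converted YOLO improves the stability of hand tracking. Finally, the captured handwriting trajectories were tested with several Chinese character recognition methods, reaching a recognition accuracy of 96.33%. ;As technology changes rapidly, human-computer interaction (HCI) is no longer limited to the keyboard. Existing handwriting products capture handwriting trajectories that are dense and stable, which provides sufficient features for recognition. For Chinese characters, it is relatively difficult for machines to obtain a stable trajectory compared with English letters and numerals. In the past, in-air hand detection and tracking often used devices that provide depth information. For example, Kinect uses two infrared cameras to obtain depth information, which raises the price of the device. Therefore, using RGB information from a single camera for object detection and tracking has become a trend in recent years. Using an RGB camera as the HCI medium for in-air handwriting requires accurate hand detection and stable tracking. In addition, the handwriting trajectory is completed in a single stroke, so it contains both real strokes and virtual strokes, which increases the difficulty of recognition. The hand database used to build the model contains self-recorded handwriting videos and relevant hand datasets collected from the Internet.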
Applying the Multiple Receptive Field (MRF) during data processing, which scales the ground truth and treats the scaled version as a new object, increases the robustness of detection. This paper uses YOLO v3 as the core neural network model and adds a Convolutional Recurrent Neural Network (CRNN) to convert YOLO into a time-sequential neural network that stabilizes tracking. Analysis of the experimental results shows that hand detection is more robust after the data are processed with MRF, and that the converted YOLO improves the stability of hand tracking. Overall, using several Chinese character recognition methods, the accuracy of recognizing in-air handwritten Chinese characters is about 96.33%.
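
The data-processing step described above (scaling each ground-truth box and treating the scaled box as a new object) could be sketched roughly as follows. This is only an illustration under stated assumptions, not the procedure used in this work: the function name `add_scaled_boxes`, the corner-coordinate box layout, the scale factors, the clipping to the image border, and the rule for assigning class ids to the scaled copies are all hypothetical choices.

```python
from typing import List, Sequence, Tuple

# Hypothetical box layout: (x_min, y_min, x_max, y_max, class_id) in pixels.
Box = Tuple[float, float, float, float, int]


def add_scaled_boxes(
    boxes: Sequence[Box],
    scales: Sequence[float],
    img_w: int,
    img_h: int,
    num_classes: int = 1,
) -> List[Box]:
    """Return the original boxes plus one enlarged copy per scale factor.

    Each copy is scaled about the box centre, clipped to the image, and
    given a new class id so the detector learns it as a separate object.
    The id rule (class_id + k * num_classes) is an assumption.
    """
    out: List[Box] = list(boxes)
    for k, scale in enumerate(scales, start=1):
        for x0, y0, x1, y1, cls in boxes:
            cx, cy = (x0 + x1) / 2.0, (y0 + y1) / 2.0
            half_w = (x1 - x0) * scale / 2.0
            half_h = (y1 - y0) * scale / 2.0
            out.append(
                (
                    max(0.0, cx - half_w),
                    max(0.0, cy - half_h),
                    min(float(img_w), cx + half_w),
                    min(float(img_h), cy + half_h),
                    cls + k * num_classes,
                )
            )
    return out


if __name__ == "__main__":
    hand = [(40.0, 40.0, 60.0, 60.0, 0)]
    for box in add_scaled_boxes(hand, scales=[2.0], img_w=100, img_h=100):
        print(box)
```

With a single hand class and one scale factor, every frame would carry two labels per hand: the tight box and a larger box that includes surrounding context. How many scales to use and how the enlarged detections are consumed downstream are not specified in the abstract.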