With the emergence and continual improvement of virtual reality (VR) and augmented reality (AR) devices, demand for natural human-computer interaction (HCI) systems to replace traditional HCI approaches is increasing rapidly. The hand plays an important role in this interaction because it is one of the main ways we engage with the world: we use our hands to handle tools, play musical instruments, touch objects, and make gestures. Air-writing is one form of human-computer interaction.
Air-writing is the process of writing characters or words in free space using finger or hand movements, without the aid of any handheld or hand-worn device. In the past, air-writing detection and tracking often relied on hardware that provides depth information. For example, Kinect obtains depth information with an infrared emitter and an infrared camera, which makes such devices relatively expensive. Detecting and tracking objects using only RGB images from a single camera has therefore become a more common goal in this field. Although object detection and tracking have improved markedly in recent years, the variable shape of the hand and the small size of the fingertip mean that detecting and tracking the fingertip remains a challenge.
The best-performing RGB detection and tracking methods currently all build on deep learning. Because deep learning is a data-driven approach, it requires a large amount of training data, and data processing is also an important part of the pipeline.
This thesis builds a dataset suited to its fingertip-localization method by recording air-writing videos and by collecting and organizing existing human body keypoint datasets available on the Internet. A convolutional neural network learns image features together with the relative spatial relationships among human body keypoints such as the shoulder, elbow, and wrist, in order to detect and track the hand and fingertip. Convolutional Pose Machines (CPM) serves as the core neural network model.
The experimental results show that the method performs well in detection speed, reaching 43 frames per second, but its detection accuracy is insufficient, possibly because of the limited amount of training data. More training data is needed for further experiments to validate the method.
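As an illustration of how keypoint locations are typically read out of a CPM-style network, the sketch below decodes per-keypoint belief maps by taking the position of the strongest response. It is a minimal example under stated assumptions, not the post-processing used in this thesis: the keypoint names, the `min_score` threshold, and the toy belief maps are all hypothetical.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# One 2-D score grid per keypoint: rows index y, columns index x.
BeliefMap = List[List[float]]
Keypoint = Tuple[int, int, float]  # (x, y, score)


def decode_keypoint(belief: BeliefMap, min_score: float = 0.1) -> Optional[Keypoint]:
    """Return (x, y, score) of the strongest response, or None if too weak.

    min_score is a hypothetical confidence threshold chosen for this sketch.
    """
    best: Optional[Keypoint] = None
    for y, row in enumerate(belief):
        for x, score in enumerate(row):
            if best is None or score > best[2]:
                best = (x, y, score)
    if best is None or best[2] < min_score:
        return None
    return best


def decode_all(
    belief_maps: Sequence[BeliefMap],
    names: Sequence[str],
    min_score: float = 0.1,
) -> Dict[str, Optional[Keypoint]]:
    """Decode one belief map per named keypoint."""
    return {
        name: decode_keypoint(belief, min_score)
        for name, belief in zip(names, belief_maps)
    }


# Hypothetical 3x4 belief maps standing in for network output.
wrist = [
    [0.0, 0.1, 0.0, 0.0],
    [0.0, 0.2, 0.9, 0.1],
    [0.0, 0.0, 0.1, 0.0],
]
fingertip = [
    [0.0, 0.00, 0.0, 0.0],
    [0.0, 0.05, 0.0, 0.0],
    [0.0, 0.00, 0.0, 0.0],
]

keypoints = decode_all([wrist, fingertip], ["wrist", "fingertip"])
print(keypoints)
```

Here the wrist is located at column 2, row 1, while the fingertip map never exceeds the threshold and is reported as missing, which mirrors the difficulty of detecting small fingertips. At 43 frames per second, all per-frame work, including this decoding step, has a budget of roughly 1/43 s, or about 23 ms.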