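|Abstract: ||In recent years, the field of human-computer interaction (HCI) has paid increasing attention to more intuitive ways for people to interact with computers. Hand gestures are one of the principal approaches. However, recognizing gestures accurately remains an open problem. This study proposes a hand gesture recognition system that combines the Bhattacharyya divergence with the Bayesian sensing hidden Markov model (BS-HMM).|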
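The proposed system consists of two stages. In the first stage, we acquire depth images with Microsoft Kinect and use skeleton information to locate the hand region. We then extract the Histogram of Oriented Gradients (HOG) and the Histogram of Oriented 4D Normals (HON4D) from the depth images as features. In the second stage, for each gesture class we propose a Bayesian sensing hidden Markov model combined with the Bhattacharyya divergence (BDBS-HMM), which is trained by treating the temporal features as a sequence of probability distributions. At test time, given the observed features, we take the class label with the highest log-likelihood as the recognition result.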
For the experiments, we use the MSRGesture3D database to compare the proposed BDBS-HMM with the conventional HMM and BS-HMM. The results show that the proposed BDBS-HMM outperforms the other methods. In addition, for real-world data we built a demonstration system: a user only needs to stand in front of the Kinect to have gestures recognized by the proposed system.;In recent years, research in human-computer interaction (HCI) has shown increasing interest in natural ways for humans and computers to interact. One of them is the use of hand gestures. Many approaches have been proposed for hand gesture recognition, but accurate recognition remains a major challenge. The aim of this research is to develop a system that naturally recognizes common hand gestures without requiring the user to wear any sensors. The research includes the implementation of the Bayesian sensing hidden Markov model (BS-HMM), the embedding of the Bhattacharyya divergence in BS-HMM (BDBS-HMM), and the development of a real-world hand gesture recognition system.
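For reference, the Bhattacharyya divergence between two densities is usually written as below. This is the standard textbook form, given together with its closed form for two Gaussians; the symbols $p$, $q$, $\mu_i$, and $\Sigma_i$ are notation assumed here, and how the divergence is embedded in BS-HMM is not specified in this abstract.

$$
D_B(p, q) = -\ln \int \sqrt{p(x)\, q(x)}\, dx
$$

$$
D_B\big(\mathcal{N}(\mu_1, \Sigma_1), \mathcal{N}(\mu_2, \Sigma_2)\big)
= \frac{1}{8} (\mu_1 - \mu_2)^{\top} \Sigma^{-1} (\mu_1 - \mu_2)
+ \frac{1}{2} \ln \frac{\det \Sigma}{\sqrt{\det \Sigma_1 \, \det \Sigma_2}},
\qquad \Sigma = \frac{\Sigma_1 + \Sigma_2}{2}
$$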
The proposed system consists of two stages. First, real-time depth video is captured by Microsoft Kinect, and the hand region is obtained from that video by tracking the position of the hand using skeleton information. Two feature descriptors are then extracted from the segmented hand region: the Histogram of Oriented 4D Normals (HON4D) and the Histogram of Oriented Gradients (HOG). In the second stage, a BDBS-HMM is trained for each class of hand gesture by treating the observations as a sequence of distributions. Finally, the system calculates the log-likelihood of the input under each BDBS-HMM and assigns the label of the model with the highest log-likelihood.
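The final decision rule (one model per gesture class, pick the class whose model gives the highest log-likelihood) can be sketched as follows. This is a minimal illustration that assumes a plain HMM with diagonal-covariance Gaussian emissions, not the BS-HMM or BDBS-HMM of this work; the model dictionary layout, the helper `make_toy_model`, and the labels `gesture_a` and `gesture_b` are hypothetical.

```python
import math


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def log_gaussian(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - mu) ** 2 / v)
        for xi, mu, v in zip(x, mean, var)
    )


def hmm_log_likelihood(obs, model):
    """Forward algorithm in log space: returns log p(obs | model)."""
    log_pi, log_A = model["log_pi"], model["log_A"]
    means, variances = model["means"], model["vars"]
    n = len(log_pi)
    # Initialisation: start probability times emission of the first frame.
    alpha = [log_pi[j] + log_gaussian(obs[0], means[j], variances[j]) for j in range(n)]
    # Recursion: sum over predecessor states, then emit the current frame.
    for x in obs[1:]:
        alpha = [
            log_sum_exp([alpha[i] + log_A[i][j] for i in range(n)])
            + log_gaussian(x, means[j], variances[j])
            for j in range(n)
        ]
    return log_sum_exp(alpha)


def classify(obs, models):
    """Score obs under every class model and return (best_label, all_scores)."""
    scores = {label: hmm_log_likelihood(obs, m) for label, m in models.items()}
    return max(scores, key=scores.get), scores


def make_toy_model(means):
    """Hypothetical two-state model with uniform start/transition probabilities."""
    log_half = math.log(0.5)
    return {
        "log_pi": [log_half, log_half],
        "log_A": [[log_half, log_half], [log_half, log_half]],
        "means": means,
        "vars": [[1.0], [1.0]],
    }


# Toy usage: 1-D features standing in for the HOG/HON4D descriptors.
models = {
    "gesture_a": make_toy_model([[0.0], [1.0]]),
    "gesture_b": make_toy_model([[5.0], [6.0]]),
}
label, scores = classify([[0.1], [0.9], [0.2]], models)
print(label)
```

Working in log space avoids the numerical underflow that products of many small probabilities cause on long sequences, which is why the comparison is made on log-likelihoods.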
We compared the proposed BDBS-HMM with the traditional HMM and BS-HMM on the MSRGesture3D database, which consists of 12 classes of hand gestures. Experimental results showed that the proposed BDBS-HMM yields better performance than both baseline methods. In the real-world application, the user stands in front of the Microsoft Kinect while making a gesture, and the recognized gesture label is displayed on the monitor.