Master's and Doctoral Thesis 102522610: Detailed Record




Name: Ari Hernawan (何納桓)    Department: Computer Science and Information Engineering
Thesis Title: 基於巴塔恰里雅距離與貝氏感測隱藏馬可夫模型之手勢辨認研究
(Hand Gesture Recognition using Bhattacharyya Divergence Bayesian Sensing Hidden Markov Models)
Related Theses:
★ Single and Multi-Label Environmental Sound Recognition with Gaussian Process
★ Embedded System Implementation of Beamforming and Audio Front-End Processing
★ Application and Design of Speech Synthesis and Voice Conversion
★ A Semantics-Based Public Opinion Analysis System
★ Design and Application of a High-Quality Narration System
★ Calcaneal Fracture Recognition and Detection in CT Images Using Deep Learning and Accelerated Robust Features
★ A Personalized Collaborative-Filtering Clothing Recommendation System Based on a Style Vector Space
★ Applying RetinaNet to Face Detection
★ Trend Prediction for Financial Products
★ Integrating Deep Learning Methods to Predict Age and Aging-Related Genes
★ End-to-End Speech Synthesis for Mandarin
★ Application and Improvement of ORB-SLAM2 on the ARM Architecture
★ ETF Trend Prediction Based on Deep Learning
★ Exploring the Correlation Between Financial News and Financial Trends
★ Emotional Speech Analysis Based on Convolutional Neural Networks
★ Using Deep Learning to Predict Alzheimer's Disease Progression and Post-Surgery Survival of Stroke Patients
Full text: permanently restricted (never open for public access)
Abstract (Chinese): In recent years, the field of human-computer interaction (HCI) has seen growing attention to more intuitive ways for people to interact with computers. Hand gestures are one of the primary such means; however, recognizing them accurately remains an open problem. This research proposes a hand gesture recognition system that combines the Bhattacharyya divergence with the Bayesian sensing hidden Markov model (BS-HMM).
The proposed system consists of two stages. In the first stage, we capture depth video with a Microsoft Kinect and localize the hand region using skeleton information. From the depth images we then extract histogram of oriented gradient (HOG) and histogram of oriented 4D normal (HON4D) descriptors as features. In the second stage, for each gesture class we train a Bhattacharyya divergence Bayesian sensing hidden Markov model (BDBS-HMM), treating the temporal features as a sequence of probability distributions. At test time, given the observed features, the class whose model yields the highest log-likelihood is taken as the recognition result.
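As a rough illustration of the first stage, the following sketch extracts a frame-level HOG descriptor from each cropped depth frame, assuming the hand region has already been segmented using the Kinect skeleton joints. It uses scikit-image's hog routine as a stand-in for the thesis's exact descriptor implementation (HON4D has no comparable off-the-shelf routine); the function name, crop size, and parameter values are illustrative assumptions, not the thesis's settings.

import numpy as np
from skimage.feature import hog
from skimage.transform import resize

def extract_hog_sequence(depth_frames, out_size=(64, 64)):
    # depth_frames: iterable of 2D arrays, each a cropped hand region.
    # Returns an array of shape (num_frames, feature_dim).
    features = []
    for frame in depth_frames:
        # Scale depth values to [0, 1] and resize to a fixed window so
        # every frame yields a descriptor of the same length.
        frame = frame.astype(np.float64)
        if frame.max() > 0:
            frame = frame / frame.max()
        frame = resize(frame, out_size, anti_aliasing=True)
        features.append(hog(frame, orientations=9,
                            pixels_per_cell=(8, 8),
                            cells_per_block=(2, 2),
                            block_norm='L2-Hys'))
    return np.asarray(features)

The resulting per-frame descriptors form the observation sequence that is fed to the per-class models in the second stage.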
For the experiments, we used the MSRGesture3D database to compare the proposed BDBS-HMM with the traditional HMM and the BS-HMM. The results show that the proposed BDBS-HMM performs better than the other methods. In addition, we built a demonstration system for real-world data: a user simply stands in front of the Kinect and performs gestures, which the proposed system then recognizes.
Abstract (English): In recent years, research in human-computer interaction (HCI) has shown increasing interest in more natural ways for humans and computers to interact, one of which is the use of hand gestures. Many approaches have been proposed for hand gesture recognition; however, accurate recognition of gestures remains a significant challenge. The aim of this research is to develop a system that naturally recognizes common hand gestures without requiring the user to wear any sensors. The work covers the implementation of the Bayesian sensing hidden Markov model (BS-HMM), the embedding of the Bhattacharyya divergence into the BS-HMM (BDBS-HMM), and the development of a real-world hand gesture recognition system.
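For reference, the Bhattacharyya divergence between two multivariate Gaussians N(mu1, Sigma1) and N(mu2, Sigma2) has the closed form

D_B = (1/8) (mu2 - mu1)^T Sigma^{-1} (mu2 - mu1) + (1/2) ln( det(Sigma) / sqrt(det(Sigma1) det(Sigma2)) ),  with Sigma = (Sigma1 + Sigma2) / 2.

A minimal NumPy sketch of this formula follows; how the thesis embeds the quantity inside the BS-HMM parameter updates is not reproduced here, and the function name is illustrative.

import numpy as np

def bhattacharyya_gaussians(mu1, cov1, mu2, cov2):
    # Closed-form Bhattacharyya divergence between two Gaussians.
    cov = 0.5 * (cov1 + cov2)
    diff = mu2 - mu1
    # Quadratic term between the two means.
    quad = 0.125 * diff @ np.linalg.solve(cov, diff)
    # Log-determinant term; slogdet avoids overflow in high dimensions.
    _, logdet = np.linalg.slogdet(cov)
    _, logdet1 = np.linalg.slogdet(cov1)
    _, logdet2 = np.linalg.slogdet(cov2)
    return quad + 0.5 * (logdet - 0.5 * (logdet1 + logdet2))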
The proposed system consists of two stages. In the first stage, real-time depth video is captured by a Microsoft Kinect, and the hand region is obtained by tracking the position of the hand using skeleton information. Two feature descriptors are then extracted from the segmented hand region: the histogram of oriented 4D normals (HON4D) and the histogram of oriented gradients (HOG). In the second stage, a BDBS-HMM is trained for each class of hand gesture, treating the observed data points as a sequence of distributions. Finally, the system computes the log-likelihood of an observation under each class model and assigns the label of the model with the highest score.
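The decision rule of the second stage is the usual argmax over per-class log-likelihoods, c* = argmax_c log p(O | lambda_c). The sketch below realizes that rule with standard Gaussian HMMs from the hmmlearn package as a stand-in for the proposed BDBS-HMM, whose training updates are specific to the thesis; all names are illustrative.

import numpy as np
from hmmlearn.hmm import GaussianHMM

def train_models(sequences_per_class, n_states=5):
    # sequences_per_class: dict mapping a gesture label to a list of
    # (num_frames, feature_dim) observation arrays, e.g. HOG sequences.
    models = {}
    for label, seqs in sequences_per_class.items():
        X = np.vstack(seqs)               # stacked frame-level features
        lengths = [len(s) for s in seqs]  # frame count of each sequence
        model = GaussianHMM(n_components=n_states,
                            covariance_type='diag', n_iter=50)
        model.fit(X, lengths)
        models[label] = model
    return models

def classify(models, sequence):
    # Assign the label whose model scores the observation sequence
    # with the highest log-likelihood.
    return max(models, key=lambda label: models[label].score(sequence))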
We compared the proposed BDBS-HMM with the traditional HMM and the BS-HMM on the MSRGesture3D database, which consists of 12 classes of hand gestures. Experimental results show that the proposed BDBS-HMM yields better performance than the baseline methods. In the real-world application, a user stands in front of the Microsoft Kinect while making gestures, and the recognized gesture label is displayed on the monitor.
Keywords (Chinese) ★ Hand Gesture Recognition
★ Bayesian Sensing
★ Hidden Markov Model
Keywords (English) ★ Hand Gesture Recognition
★ Bayesian Sensing
★ Hidden Markov Model
Table of Contents  ABSTRACT (CHINESE) i
ABSTRACT ii
ACKNOWLEDGEMENT iii
TABLE OF CONTENTS iv
LIST OF FIGURES vi
LIST OF TABLES vii
I. INTRODUCTION 1
1-1. Background 1
1-2. Aim of Thesis 4
1-3. Technology Constraints 4
1-4. Disposition 4
II. RELATED WORKS 6
2-1. Hand Gesture Recognition 6
2-2. Hidden Markov Models 7
2-3. Relevance Vector Machine 11
2-4. Bayesian Sensing Hidden Markov Models 13
2-5. Histogram of Oriented 4D Normal 15
2-6. Histogram of Oriented Gradient 16
2-7. Bhattacharyya Divergence 18
III. METHODOLOGY 19
3-1. System Overview 19
3-2. Hand Gesture Localization 22
3-3. Feature Extraction 23
3-3-1. Image Normalization 23
3-3-2. Histogram Equalization 24
3-3-3. Median Filter 24
3-3-4. Histogram of Oriented Gradient 25
3-3-5. Histogram of Oriented 4D Normal 26
3-4. Bhattacharyya Divergence for Bayesian Sensing Hidden Markov Model 28
IV. EXPERIMENTAL RESULTS 42
4-1. Dataset 42
4-2. Experimental Setup 45
4-3. Evaluation Method 46
4-4. Hand Gesture Recognition Using Pre-recorded Videos 47
4-4-1. Comparison of Feature Extraction and Classification Approaches 48
4-4-2. Test of Robustness 58
4-5. Real-world Hand Gesture Recognition 59
4-5-1. Application Demo 59
V. CONCLUSION 61
BIBLIOGRAPHY 62
Advisor: Jia-Ching Wang (王家慶)    Approval Date: 2015-08-04
