Thesis/Dissertation 91522065 Detailed Record


Author Chih-Chang Yu (余執彰)   Department Computer Science and Information Engineering
Thesis title 應用塑模與非塑模方式作人類行為分析
(Human Behavior Analysis using Model-based and Model-free Approaches)
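File
  1. The author has granted immediate open access to this electronic thesis.
  2. The open-access full text is licensed to users only for personal, non-profit searching, reading, and printing for academic research purposes.
  3. Please comply with the Copyright Act of the Republic of China; do not reproduce, distribute, adapt, repost, or broadcast the work without authorization, to avoid violating the law.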

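Abstract (Chinese) In recent years, the spread of high-capacity storage devices and networked multimedia technology has driven a sharp increase in the number of multimedia files. Organizing and classifying this material quickly and systematically calls for automated video analysis systems. Researchers have paid particular attention to human behavior analysis, because human activity is usually the focus of attention in video. A detailed analysis of the human behavior in a scene can therefore supply such systems with very rich information. This dissertation examines in depth the two main approaches to recognizing human behavior, describes the problems each approach encounters, and proposes solutions.
  The first is the model-based approach, which divides the human body at its joints into parts such as the head, torso, upper arms, forearms, thighs, and lower legs. For this approach we propose a hierarchical framework that extracts these key parts in the order head, torso, limbs. For limb extraction we further propose two methods: an intuitive line-based matching method and a patch-based probabilistic matching model. The line-based method is faster but cannot handle self-occlusion of the limbs. To address self-occlusion, we propose a probabilistic matching method that finds the most likely limb pose. With it we can resolve the partial limb occlusion that frequently occurs during human body modeling.
  The second is the model-free approach. Because it does not extract limb features, it works mainly from features of the whole foreground object. In this dissertation we analyze only the binarized foreground. In general, the L1-norm is a very common way to compare the similarity of two patterns. However, the matching efficiency of the L1-norm drops as the feature dimension of the patterns grows. To address this, we recast action recognition as a histogram matching problem, which lets us exploit properties of histograms to improve matching efficiency. We propose a new histogram construction that partitions a histogram into multi-resolution histograms. Using the relationship between histograms at different resolutions, we propose a way to speed up matching that cuts matching time to 9% of that of conventional L1-norm matching while preserving the highest recognition rate. We also propose a mechanism that automatically identifies the important parts of the image content, so that each image builds a distinctive multi-resolution histogram around its important regions, which raises recognition accuracy.
  To demonstrate the practicality and robustness of the proposed methods, we run experiments on several common human actions captured from a single viewpoint, and close with conclusions and directions for future improvement.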
Abstract (English) Video archives have grown rapidly in recent years, driven by advances in and the spread of multimedia networking technologies and high-capacity storage devices. Summarizing this multimedia content efficiently requires an automated video understanding system. In video understanding and summarization, researchers are most interested in analyzing human behavior because many applications demand it. A detailed description of human actions can therefore provide rich information for these applications. This dissertation presents a broad study of human behavior analysis, examining the two main categories of approaches to human action recognition. We address the problems that arise in each category and propose solutions.
The first category is the model-based approach, in which several body parts, including the head, torso, arms, and legs, are extracted to build a human body model. We design a hierarchical system that starts with head extraction, proceeds to torso extraction, and finishes with limb extraction. For limb extraction we propose two methods: a line-based method and a patch-based method. The line-based method is simpler and faster but cannot handle partial occlusion. We therefore also propose the patch-based method, which adopts a probabilistic framework to find the best configuration of the limbs. The patch-based method handles the partial occlusion that commonly affects the limbs.
The second category is the model-free approach, which tries to recognize human actions from the video object as a whole. In this dissertation we propose a novel approach based on human silhouettes. The L1-norm is a popular measure of the similarity between two patterns, but its computational cost grows with the feature dimension. We therefore convert the human action recognition problem into a histogram matching problem, so that properties of histogram matching can be used to improve recognition efficiency and accuracy. We further propose a novel histogram matching method that builds multi-resolution histograms whose bins are partitioned unevenly between resolution levels. With this multi-resolution structure, the computation time depends only on the partitioned histogram bins, and recognition time can be reduced to 9% of that of the original L1-norm measurement. This reduced computational cost makes a real-time recognition system feasible.
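The lower-bound idea behind coarse-to-fine histogram matching can be sketched as follows. This is a minimal illustration under stated assumptions, not the dissertation's own algorithm: it merges adjacent bins evenly, whereas the dissertation partitions bins unevenly, and the function names `l1`, `coarsen`, `pyramid`, and `nearest` are hypothetical. The sketch relies only on the fact that merging bins can never increase the L1 distance, so a coarse-level distance that already matches or exceeds the best match found so far rules out a template without a full-resolution comparison.

```python
def l1(a, b):
    """L1 distance between two equal-length histograms."""
    return sum(abs(x - y) for x, y in zip(a, b))


def coarsen(hist, factor=2):
    """Merge every `factor` adjacent bins by summing their counts."""
    return [sum(hist[i:i + factor]) for i in range(0, len(hist), factor)]


def pyramid(hist, levels):
    """Return histograms ordered from coarsest to finest (the input itself)."""
    out = [list(hist)]
    for _ in range(levels - 1):
        out.append(coarsen(out[-1]))
    return out[::-1]


def nearest(query, templates, levels=3):
    """Exact L1 nearest neighbour with coarse-to-fine early rejection.

    Returns (best_index, best_distance, full_resolution_comparisons).
    """
    q_pyr = pyramid(query, levels)
    # In a real system the template pyramids would be built once, offline.
    t_pyrs = [pyramid(t, levels) for t in templates]
    best_idx, best_dist, full = -1, float("inf"), 0
    for idx, t_pyr in enumerate(t_pyrs):
        rejected = False
        for q_lvl, t_lvl in zip(q_pyr[:-1], t_pyr[:-1]):
            # A coarse distance is a lower bound on the fine distance,
            # so this template cannot beat the current best match.
            if l1(q_lvl, t_lvl) >= best_dist:
                rejected = True
                break
        if rejected:
            continue
        full += 1
        d = l1(q_pyr[-1], t_pyr[-1])
        if d < best_dist:
            best_idx, best_dist = idx, d
    return best_idx, best_dist, full


# Toy 8-bin histograms (made-up values, for illustration only).
query = [4, 0, 3, 1, 0, 2, 5, 1]
templates = [
    [4, 1, 3, 1, 0, 2, 4, 1],
    [0, 5, 0, 4, 6, 0, 0, 1],
    [1, 1, 1, 1, 6, 6, 0, 0],
]
print(nearest(query, templates))  # (0, 2, 1)
```

In this toy run only one of the three templates is compared at full resolution; the other two are rejected at the coarsest level, which is the source of the speed-up. The result is still exact, because rejection uses a true lower bound rather than an approximation.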
To demonstrate the feasibility and validity of the proposed approaches, several generic human actions, such as walking, running, jumping, hand waving, and falling, were performed in front of a monocular camera. Given the experimental results, we believe this framework can eventually be applied to all kinds of human-centric event detection and behavior understanding systems.
Keywords ★ Behavior analysis (行為分析)
★ Action recognition (動作辨識)
★ Human body modeling (人體塑模)
Table of contents ABSTRACT
CONTENTS
LIST OF FIGURES
LIST OF TABLES
CHAPTER 1 INTRODUCTION
1.1 Motivation
1.2 Model-based Human Behavior Analysis
1.3 Model-free Human Behavior Analysis
1.4 Organization of the Dissertation
CHAPTER 2 HUMAN BODY MODELING: HEAD AND TORSO EXTRACTION
2.1 Related Works
2.2 System Overview
2.3 Background Extraction with Shadow Removal
2.4 Human Body Parts Decomposition
2.5 Head Region Detection
2.5.1 Head Acquisition
2.5.2 Kalman Filtering
2.5.3 Head Region Tracking
2.6 Torso Estimation
2.7 Conclusion
CHAPTER 3 HUMAN BODY MODELING: HEURISTIC AND PROBABILISTIC LIMB EXTRACTION
3.1 Related Works
3.2 Limb Ends Extraction and Tracking
3.2.1 Limb Ends Extraction
3.2.2 Limb Ends Tracking
3.3 Line-based Limb Modeling
3.4 Patch-based Limb Modeling
3.5 Experiments
3.5.1 Effectiveness on Spline Interpolation Using Different Lengths
3.5.2 Performance of the Line-based Approach
3.5.3 Performance of the Patch-based Approach
3.5.4 Human Body Modeling on Behavior Analysis Application
3.6 Conclusions
CHAPTER 4 HUMAN ACTION RECOGNITION: A MODEL-FREE APPROACH
4.1 Related Works
4.2 Average Motion Energy (AME)
4.3 Histogram Based Approach
4.3.1 Basic Characteristics of Histogram
4.3.2 Characteristic of Multi-resolution Histogram
4.3.3 Motion Energy Histogram (MEH)
4.3.4 Multi-Resolution Motion Energy Histogram (MRMEH)
4.4 Efficient Action Recognition Using MRMEH
4.5 Time Complexity Analysis
4.6 Experiments
4.6.1 Dataset
4.6.2 Recognition
4.6.3 Recognition Accuracy Analysis
4.6.4 Recognition Efficiency Analysis
4.6.5 Real-time Action Recognition Application
4.7 Conclusion
CHAPTER 5 CONCLUSIONS
5.1 Concluding Remarks
5.2 Future Works
REFERENCES
APPENDIX
Advisor Kuo-Chin Fan (范國清)   Approval date 2009-1-12

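For questions about this thesis, please contact the Promotion Services Section of the National Central University Library, TEL: (03)422-7151 ext. 57407, or by e-mail.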