dc.description.abstract | Video archives have grown rapidly in recent years owing to the advancement and popularization of multimedia networking technologies and high-capacity data storage devices. To summarize this multimedia content efficiently, an automated video understanding system is needed. In video understanding and summarization, researchers are most interested in analyzing human behavior because of the high demand from various applications, and a detailed description of human actions provides rich information for these applications. In this dissertation, we present a broad study of human behavior analysis, concentrating on two main categories of approaches to human action recognition. We address the problems that arise in both categories and propose solutions.
The first category is the model-based approach, in which several body parts, including the head, torso, arms, and legs, are extracted to build a human body model. We design a hierarchical system that starts with head extraction, continues with torso extraction, and ends with limb extraction. For limb extraction, we propose two methods: a line-based method and a patch-based method. The line-based method is simpler and faster, but it cannot deal with partial occlusion. We therefore propose the patch-based method, which adopts a probabilistic framework to find the best configuration of limbs. With the patch-based method, we can successfully tackle the partial occlusion problem, which usually occurs on the limbs.
The second category is the model-free approach, which tries to recognize human actions from the video object as a whole. In this dissertation, we propose a novel approach based on human silhouettes. The L1-norm is a popular way to measure the similarity between two patterns, but its computational cost grows with the dimension of the feature. In our work, we convert the human action recognition problem into a histogram matching problem, so that many characteristics of histogram matching can be exploited to improve recognition efficiency and accuracy. Moreover, we propose a novel histogram matching method that builds multi-resolution histograms, whose bins at higher resolution levels are unevenly partitioned into the lower resolution levels. With this multi-resolution structure, the computation time depends only on the partitioned histogram bins, and the recognition time can be reduced to 9% of that required by the original L1-norm measurement. Because of this reduced computational complexity, the proposed approach makes a real-time recognition system feasible.
To demonstrate the feasibility and validity of the proposed approaches, several generic human actions, such as walking, running, jumping, waving hands, and falling, were performed in front of a monocular camera. Given the successful experimental results, we believe that this framework can eventually be applied to all kinds of human-centric event detection and behavior understanding systems.
| en_US |