Graduate Thesis 100522093: Detailed Record




Author: Bo-Ying Huang (黃柏穎)    Department: Computer Science and Information Engineering
Title: Kinect-based Action Recognition and Human Interaction Analysis
(基於Kinect的行為辨識與互動分析)
Related theses
★ Real-time Online Identity Recognition Using Viseme and Speech Biometric Features
★ An Image-based Alignment System for SMD Packaging Tape
★ A Study on Content Forgery Detection and Deleted-data Recovery for Handheld Mobile Devices
★ License Plate Verification Based on the SIFT Algorithm
★ Local Pattern Features with Dynamic Linear Decision Functions for Face Recognition
★ A GPU-based SAR Database Simulator: a Parallel Architecture for SAR Echo Signals and Image Databases
★ Personal Identity Verification Using Palmprints
★ Video Indexing Using Color Statistics and Camera Motion
★ Form Document Classification Using Field Clustering Features and Four-directional Adjacency Trees
★ Stroke Features for Offline Chinese Character Recognition
★ Motion Vector Estimation Using Adaptive Block Matching Combined with Multi-image Information
★ Color Image Analysis with Applications to Color-quantized Image Retrieval and Face Detection
★ Extraction and Recognition of Logos on Chinese and English Business Cards
★ Chinese Signature Verification Using Virtual-stroke Features
★ Face Detection, Pose Classification, and Face Recognition Based on Triangle Geometry and Color Features
★ A Complementary Skin-color-based Face Detection Strategy
  1. Access rights for this electronic thesis: approved for immediate open access.
  2. Open-access electronic full texts are authorized only for personal, non-profit academic research: searching, reading, and printing.
  3. Please comply with the Copyright Act of the Republic of China (Taiwan); do not reproduce, distribute, adapt, repost, or broadcast the work without authorization.

Abstract (Chinese) Human action recognition has long been an important topic in computer vision, and many techniques and applications have been built upon it. The main goal of this thesis is to design a general-purpose classification algorithm for human action recognition. The method extracts action features carrying temporal information from a pre-collected action database and trains them with the Action Forests (AF) model proposed in this thesis, making the model suitable for skeletal features in 3D space. With the aid of a depth camera, the method classifies and votes in real time, without constraints on background or camera position, to produce the final action label.
In the experiments, we used a Kinect sensor to collect depth images and skeletal data for several common single-person and two-person actions, trained the Action Forests algorithm on them, and compared the effects of various parameter combinations as well as the improvement over the original algorithm. In conclusion, the trained model successfully adapts to 3D skeletal features and can effectively classify different human action patterns while running in real time (30 fps).
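The abstract describes extracting features from skeleton sequences captured by a depth camera. As a minimal sketch of what a per-frame skeletal feature could look like, the following uses pairwise 3D joint distances; the joint names and the choice of distance features here are illustrative assumptions, not the thesis's actual feature definition.

```python
import math
from itertools import combinations

def frame_features(joints):
    """Compute pairwise 3D joint distances for one skeleton frame.

    `joints` maps a joint name to an (x, y, z) position in meters.
    The joint set is hypothetical; Kinect v1 reports 20 skeletal joints.
    """
    feats = []
    for a, b in combinations(sorted(joints), 2):
        dx = joints[a][0] - joints[b][0]
        dy = joints[a][1] - joints[b][1]
        dz = joints[a][2] - joints[b][2]
        feats.append(math.sqrt(dx * dx + dy * dy + dz * dz))
    return feats

# A toy 3-joint skeleton yields 3 pairwise distances.
skeleton = {"head": (0.0, 1.8, 2.0),
            "hand_l": (-0.3, 1.0, 2.0),
            "hand_r": (0.3, 1.0, 2.0)}
print(len(frame_features(skeleton)))  # → 3
```

Distance-based features of this kind are invariant to camera translation, which is one plausible way a skeletal representation can avoid constraining the camera position, as the abstract claims.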
Abstract (English) Human action recognition is one of the most important issues in computer vision. In this thesis, we design a general approach to recognizing human actions. The approach builds on a pre-collected action database, in which skeleton sequences are extracted from depth images and used to train the proposed Action Forests (AF) model. The AF model extends the random forest algorithm with decision functions tailored to skeletal features in 3D space. The system achieves real-time classification without constraints on background or camera position.
Experiments were conducted on various examples to verify the validity of the proposed method. Several human actions, covering both single-person actions and two-person interactions, were collected to train the AF model; the skeletal features were retrieved with the Kinect depth sensor. In the experiments, we investigate the effects of several AF training parameters. The results demonstrate that the proposed AF model learns skeletal features efficiently and classifies actions at 30 frames per second with high accuracy.
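Both abstracts (and Section 4.2.2 of the table of contents) mention voting over a history window to obtain the final action label from per-frame classifications. A minimal sketch of such smoothing is a sliding-window majority vote; the window size and action labels below are illustrative assumptions, not values from the thesis.

```python
from collections import Counter, deque

def windowed_vote(frame_labels, window=5):
    """Smooth per-frame action predictions with a majority vote
    over a sliding history window of the most recent frames.
    The window size 5 is an illustrative choice.
    """
    history = deque(maxlen=window)
    smoothed = []
    for label in frame_labels:
        history.append(label)
        # Majority label among the frames currently in the window.
        smoothed.append(Counter(history).most_common(1)[0][0])
    return smoothed

# A spurious single-frame 'kick' inside a run of 'wave' is voted away.
print(windowed_vote(["wave", "wave", "kick", "wave", "wave"]))
# → ['wave', 'wave', 'wave', 'wave', 'wave']
```

Because each frame only appends one label and queries a small counter, this kind of smoothing adds negligible cost on top of per-frame classification, consistent with the 30 fps figure reported in the abstract.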
Keywords ★ Action Forests
★ Action Recognition
★ Random Forests
★ Depth Images
★ Kinect
Table of Contents
Chinese Abstract i
Abstract ii
Table of Contents iii
List of Figures v
List of Tables vi
Chapter 1 Introduction 1
1.1 Motivation 1
1.2 Objectives 3
1.3 Literature Review 5
1.4 System Flow 12
1.5 Thesis Organization 13
Chapter 2 Background 14
2.1 Random Forests 14
2.1.1 Decision Trees 15
2.1.2 Bootstrap Sampling 15
2.1.3 Optimization of Decision Functions 16
2.1.4 Training Procedure 17
2.2 Kinect 18
2.3 Human Pose Estimation 20
2.3.1 Regression of Joint Positions 21
2.3.2 Training Regression Forests 24
Chapter 3 Action Forests 27
3.1 Definition of Actions 27
3.2 Skeletal Information 29
3.3 Feature Selection 31
3.4 Decision Functions 32
3.4.1 Decision Functions for Single-person Actions 33
3.4.2 Decision Functions for Multi-person Interactions 34
3.5 Weight Calibration and Voting 35
3.6 Combined-model Classification 36
Chapter 4 Experimental Results 38
4.1 Experimental Environment and Datasets 38
4.2 Model Parameter Tuning 41
4.2.1 Action Forest Parameters 41
4.2.2 History-window Voting 42
4.3 Result Analysis 43
4.3.1 Action Classification 43
4.3.2 Action Event Detection 46
4.3.3 System Performance 48
Chapter 5 Conclusions and Future Work 50
References 52
Advisor: Kuo-Chin Fan (范國清)    Approval date: 2013-08-02

For questions about this thesis, please contact the Promotion Services Division of the National Central University Library, TEL: (03)422-7151 ext. 57407, or by e-mail.