Master's/Doctoral Thesis 108521035 Detailed Record




Author: Yi-Jhen Luo (羅翊甄)    Department: Electrical Engineering
Thesis title: Monocular Based 3D Human Pose Estimation with Refinement Block and Special Loss Function
Related theses
★ Low-memory hardware design for real-time SIFT feature extraction
★ Access control system with real-time face detection and face recognition
★ Autonomous vehicle with real-time automatic following
★ Lossless compression algorithm and implementation for multi-lead ECG signals
★ Offline customizable voice and speaker wake-word system with embedded implementation
★ Wafer map defect classification and embedded system implementation
★ Speech densely connected convolutional network for small-footprint keyword spotting
★ G2LGAN: data augmentation for imbalanced datasets applied to wafer map defect classification
★ Algorithm design techniques for compensating finite precision in multiplierless digital filters
★ Design and implementation of a programmable Viterbi decoder
★ Low-cost vector rotator IP design based on extended elementary-angle CORDIC
★ Analysis and architecture design of a JPEG2000 still-image coding system
★ Low-power turbo code decoder for communication systems
★ Platform-based design for multimedia communication
★ Design and implementation of a digital watermarking system for MPEG encoders
★ Algorithm development for video error concealment with data-reuse considerations
Full text: available in the system after 2024-08-31
Abstract (Chinese): In recent years, with the growth of GPU computing power and advances in algorithms, deep learning has made significant progress on many tasks. Image-based applications in particular are now widely used in daily life: face-recognition unlocking, license-plate recognition in parking lots, and product defect inspection have all reached maturity, which illustrates the impact of deep learning.
Recently, with the development of deep convolutional neural networks, 3D human pose estimation from a monocular RGB image has attracted much attention. Many algorithms treat the 2.5D heatmap as the 3D coordinate: its X and Y axes correspond to image coordinates, and its Z axis corresponds to camera coordinates. The camera matrix, or the ground-truth distance between the root skeleton joint and the camera, is therefore usually needed to transform the 2.5D coordinates into 3D space, which limits applicability in the real world. The 2.5D heatmap also ignores the conversion between 2D and 3D positions, so some conversion information is lost.
In this thesis, we propose an end-to-end framework that exploits the contextual information in an RGB image to predict the 3D skeleton directly from a monocular image. Specifically, we use a multi-loss method based on 2D heatmaps and volumetric heatmaps, together with a refinement block, to localize the root-relative 3D human pose. Our approach takes the 2D heatmaps and volumetric heatmaps as features for computing losses and combines them with the loss on the relative 3D positions to form the total loss. The model jointly learns the 2D heatmap features and the 3D positions while focusing on the root-relative 3D position in camera coordinates. Experimental results show that our model predicts the root-relative 3D human pose well on Human3.6M.
Abstract (English): In recent years, with the development of GPU computing power and of various algorithms, deep learning has made significant progress in many tasks. In particular, image-based applications have become widespread in our daily life: face recognition unlocking, license plate recognition in parking lots, and product defect detection have all matured, which illustrates the impact of deep learning.
Recently, 3D human pose estimation (HPE) from a monocular RGB image has attracted much attention following the success of deep convolutional neural networks. Many algorithms take the 2.5D heatmap as the 3D coordinate, whose X and Y axes correspond to the image coordinates and whose Z axis corresponds to the camera coordinate. Therefore, the camera matrix, or the distance between the root skeleton and the camera (ground-truth information), is usually adopted to transform the 2.5D coordinates into 3D space, which limits applicability in the real world. Moreover, 2.5D heatmaps ignore the conversion between 2D and 3D positions, which means some conversion information is lost.
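The 2.5D-to-3D conversion that prior work depends on can be sketched as follows. This is an illustrative back-projection under a standard pinhole camera model, not code from the thesis; the function name and parameters are assumptions.

```python
def backproject_25d(pose_25d, fx, fy, cx, cy, root_depth):
    """Lift a 2.5D pose to 3D camera coordinates.

    pose_25d   -- list of (u, v, z_rel) joints: (u, v) in pixels,
                  z_rel the depth relative to the root joint
    fx, fy     -- focal lengths from the camera intrinsic matrix
    cx, cy     -- principal point from the camera intrinsic matrix
    root_depth -- ground-truth distance from the camera to the root joint
    """
    pose_3d = []
    for u, v, z_rel in pose_25d:
        z = root_depth + z_rel          # absolute depth of this joint
        x = (u - cx) * z / fx           # pinhole back-projection
        y = (v - cy) * z / fy
        pose_3d.append((x, y, z))
    return pose_3d
```

Because both the intrinsics (fx, fy, cx, cy) and root_depth must be known at test time, methods built on this conversion require ground-truth information that is unavailable in the wild, which is exactly the limitation described above.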
In this paper, we present an end-to-end framework that utilizes the contextual information in an RGB image to directly predict the 3D skeleton from a monocular image. Specifically, we use a multi-loss method that depends on 2D heatmaps and volumetric heatmaps, together with a refinement block, to locate the root-relative 3D human pose. Our approach takes the 2D heatmaps and volumetric heatmaps as features to compute losses and combines them with the loss from the relative 3D locations to generate the total loss. The model can learn the 2D heatmap features and the 3D locations jointly and focus on the root-relative 3D position in the camera coordinate system. The experimental results show that our model can predict the relative 3D human pose well on Human3.6M.
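Two of the ingredients named above can be sketched minimally: a soft-argmax read-out, the standard differentiable way to turn one axis of a (volumetric) heatmap into a continuous coordinate, and a weighted combination of the three losses. The function names and loss weights are illustrative assumptions, not values from the thesis.

```python
import math

def soft_argmax_1d(scores):
    """Differentiable argmax: softmax-weighted expectation of the index.
    Applying this along each axis of a volumetric heatmap yields a
    continuous (x, y, z) joint coordinate."""
    m = max(scores)                          # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return sum(i * e for i, e in enumerate(exps)) / total

def total_loss(loss_2d, loss_vol, loss_3d, w_2d=1.0, w_vol=1.0, w_3d=1.0):
    """Combine the 2D heatmap, volumetric heatmap, and root-relative 3D
    position losses into one training objective (weights illustrative)."""
    return w_2d * loss_2d + w_vol * loss_vol + w_3d * loss_3d
```

Because the expectation is a smooth function of the heatmap scores, gradients from the 3D position loss can flow back through the volumetric heatmap, which is what allows the heatmap losses and the 3D location loss to be trained jointly.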
Keywords (Chinese)
★ 3D human pose estimation
★ Monocular human pose estimation
★ Convolutional neural network
Keywords (English)
★ 3D human pose estimation
★ Monocular based human pose estimation
★ Convolutional neural network
Table of contents
Abstract (Chinese)
Abstract (English)
1. Introduction
1.1. Research background and motivation
1.2. Research direction and contributions
1.3. Thesis organization
2. Literature review
2.1. 2D human pose estimation
2.2. Single-view 3D human pose estimation
2.3. Multi-view 3D human pose estimation
3. Network model design and experiments
3.1. Neural network architecture
3.2. Volumetric heatmap
3.3. Refinement block
3.4. Combined loss function
4. Experimental results and discussion
4.1. Human3.6M dataset
4.2. Training and implementation details
4.3. Model comparison
4.4. Ablation study
5. Conclusion
References
Advisor: Tsung-Han Tsai (蔡宗漢)    Approval date: 2022-08-03
