Multitask Learning on 3D Hand Pose Estimation with Continuous Joints Heatmap


Name

黃世安 (Shih-An Huang)

Department

Department of Electrical Engineering

Thesis Title

關節連續熱圖之多任務手部骨架神經網路
(Multitask Learning on 3D Hand Pose Estimation with Continuous Joints Heatmap)

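Related Theses

★ Low-Memory Hardware Design for Real-Time SIFT Feature Extraction	★ Access Control System with Real-Time Face Detection and Face Recognition
★ Self-Propelled Vehicle with Real-Time Automatic Following	★ Lossless Compression Algorithm and Implementation for Multi-Lead ECG Signals
★ Offline Customizable Voice and Speaker Wake-Word System and Its Embedded Implementation	★ Wafer Map Defect Classification and Embedded System Implementation
★ Densely Connected Convolutional Network for Speech Applied to Small-Footprint Keyword Spotting	★ G2LGAN: Data Augmentation for Imbalanced Datasets Applied to Wafer Map Defect Classification
★ Algorithm Design Techniques for Compensating the Finite Precision of Multiplierless Digital Filters	★ Design and Implementation of a Programmable Viterbi Decoder
★ Low-Cost Vector Rotator Silicon IP Design Based on Extended Elementary-Angle CORDIC	★ Analysis and Architecture Design of a JPEG2000 Still Image Coding System
★ Low-Power Turbo Code Decoder for Communication Systems	★ Platform-Based Design for Multimedia Communication
★ Design and Implementation of a Digital Watermarking System for MPEG Encoders	★ Algorithm Development for Video Error Concealment and Its Data-Reuse Considerations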

Files


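This electronic thesis is authorized for immediate open access.
For electronic full texts that have reached their open-access date, users are licensed only to search, read, and print them for personal, non-profit academic research.
Please comply with the relevant provisions of the Copyright Act of the Republic of China. Do not reproduce, distribute, adapt, repost, or broadcast the work without authorization, so as to avoid breaking the law.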

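Abstract (Chinese, translated)

In recent years, deep learning algorithms, accelerated by GPUs and other convolution-acceleration hardware, have brought significant improvements to deep neural networks across a wide range of tasks. In areas from basic image preprocessing and image segmentation to face recognition and speech recognition, they have gradually replaced traditional algorithms, which shows that the rise of neural networks is driving change throughout artificial intelligence.
In 3D skeleton estimation, traditional approaches either attach sensors to the body or predict joint positions with random forests. The former requires extra equipment and the latter is not accurate enough. Deep learning methods can predict the skeleton from an RGB or RGB-D camera without any wearable device, which has prompted much recent research on improving model accuracy.
This thesis takes RGB images as input and proposes a multi-task learning scheme based on 2D/3D heatmaps to train a single-stage 3D hand skeleton prediction network. A single backbone network outputs the 2D and 3D heatmaps simultaneously, and the (x, y, z) location of the maximum value in a heatmap gives the joint coordinate. Sharing convolutional weights across tasks avoids redundant computation. We argue that the joints of a single finger are continuously related, so instead of the usual one joint per heatmap, each heatmap predicts 5 joints (that is, all joints of one finger are predicted in the same heatmap). These heatmaps are then used as features to predict separate 3D heatmaps for the left and right hands, and the location of the maximum in each 3D heatmap gives the target (x, y, z) coordinate. Because large hand datasets are mostly collected in laboratories, we also propose a hand segmentation technique that improves on the basic encoder-decoder architecture. It segments the hands out of the dataset images and composites them onto a variety of landscape photographs, so that the trained network generalizes better and is not tied to the dataset backgrounds.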
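The two heatmap ideas described in this abstract can be illustrated with a minimal NumPy sketch. It is not the thesis implementation: the function names `finger_heatmap` and `decode_3d_heatmap`, the Gaussian target shape, the `sigma` value, and the map sizes are all assumptions chosen for illustration.

```python
import numpy as np


def finger_heatmap(joints_xy, height, width, sigma=2.0):
    """Render all joints of one finger into a single 2D heatmap.

    joints_xy: iterable of (x, y) pixel coordinates, one per joint.
    Each joint contributes a Gaussian bump (an assumed target shape).
    Bumps are merged with an element-wise maximum so each peak stays at 1.0.
    """
    ys, xs = np.mgrid[0:height, 0:width]
    heatmap = np.zeros((height, width), dtype=np.float32)
    for x, y in joints_xy:
        bump = np.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2.0 * sigma ** 2))
        heatmap = np.maximum(heatmap, bump.astype(np.float32))
    return heatmap


def decode_3d_heatmap(volume):
    """Return the (x, y, z) index of the maximum of a (D, H, W) volume."""
    z, y, x = np.unravel_index(np.argmax(volume), volume.shape)
    return int(x), int(y), int(z)


# Hypothetical example: five joints of one finger on a 32x32 map.
finger = [(4, 28), (8, 22), (12, 16), (16, 10), (20, 4)]
target = finger_heatmap(finger, 32, 32)

# Hypothetical 3D heatmap with a single peak at x=16, y=10, z=3.
volume = np.zeros((8, 32, 32), dtype=np.float32)
volume[3, 10, 16] = 1.0
print(decode_3d_heatmap(volume))  # (16, 10, 3)
```

Merging the bumps with a maximum rather than a sum is one possible design choice: it keeps every joint peak at the same height even when neighbouring joints of the finger overlap.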

Abstract (English)

In recent years, deep learning algorithms have been accelerated with GPUs and other convolution-acceleration hardware, and deep neural networks have improved significantly on a wide range of tasks. In areas from basic image preprocessing and image segmentation to face recognition and speech recognition, they are gradually replacing traditional algorithms, which shows that the rise of neural networks is driving change throughout artificial intelligence.
In the field of 3D hand pose estimation, traditional algorithms either require sensors attached to the body or use random forests to predict joints. The drawbacks are that the former needs additional equipment and the latter is not accurate enough. Deep learning methods can predict the skeleton from an RGB or RGB-D camera without any wearable device, which has prompted much recent research on improving model accuracy.
We take RGB images as input and propose a multi-task learning approach based on 2D/3D heatmaps to train a single-stage 3D hand skeleton prediction network, which requires only one backbone network to output the 2D and 3D heatmaps simultaneously. The (x, y, z) location of the maximum value in a heatmap gives the joint coordinate, and sharing convolutional weights across tasks avoids redundant computation. We believe that the joints of the same finger are continuously related, so instead of predicting one joint per heatmap we predict 5 joints in one heatmap (i.e., the joints of one finger are predicted in the same heatmap). We use these heatmaps as features to predict the 3D heatmaps of the left and right hands separately, and take the location of the maximum in each 3D heatmap as the target (x, y, z) coordinate. Since large hand datasets are mostly collected in the laboratory, we also propose a hand segmentation technique that improves on the basic encoder-decoder architecture. It segments the hands out of the dataset images and combines them with various landscape photographs, in order to train a network that generalizes better and is not restricted to the backgrounds of the dataset.
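The background-replacement step mentioned at the end of the abstract can be sketched as simple mask-based compositing. This is an assumed illustration, not the author's code: the name `composite_hand`, the array shapes, and the use of a soft mask in [0, 1] are hypothetical, and in practice the mask would come from the segmentation network.

```python
import numpy as np


def composite_hand(hand_rgb, hand_mask, background_rgb):
    """Paste the masked hand pixels onto a background of the same size.

    hand_rgb, background_rgb: uint8 arrays of shape (H, W, 3).
    hand_mask: array of shape (H, W) with values in [0, 1]; 1 marks hand pixels.
    """
    alpha = hand_mask.astype(np.float32)[..., None]  # (H, W, 1) for broadcasting
    blended = (alpha * hand_rgb.astype(np.float32)
               + (1.0 - alpha) * background_rgb.astype(np.float32))
    return np.clip(np.rint(blended), 0, 255).astype(np.uint8)


# Hypothetical 4x4 example: one hand pixel pasted onto a uniform background.
hand = np.full((4, 4, 3), 200, dtype=np.uint8)
scene = np.full((4, 4, 3), 50, dtype=np.uint8)
mask = np.zeros((4, 4), dtype=np.float32)
mask[1, 1] = 1.0
augmented = composite_hand(hand, mask, scene)
```

Because the joint annotations refer to pixel positions of the hand, they stay valid after compositing as long as the hand image itself is not moved or resized.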

Keywords (Chinese)

★ Hand Gesture Segmentation

Keywords (English)

★ 3D Hand Pose Estimation

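Table of Contents

Abstract (Chinese)
Abstract (English)
Acknowledgments
1. Introduction
1.1. Research Background and Motivation
1.2. Thesis Organization
2. Hand Segmentation Model and Results
2.1. Literature Review
2.2. Hand Segmentation Datasets and Synthetic Dataset
2.3. Image Preprocessing
2.4. Segmentation Model Design
2.5. Hand Segmentation Training Process and Experimental Results
3. Single-Stage 3D Hand Skeleton Training Strategy Design and Results
3.1. Literature Review
3.2. 3D Hand Skeleton Datasets
3.3. Image Preprocessing
3.4. Architecture Design
3.5. Training Process and Environment
3.6. 3D Hand Skeleton Prediction Results
4. Conclusion
References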


Advisor

蔡宗漢 (Tsung-Han Tsai)

Approval Date

2021-9-16
