Master's and Doctoral Theses: Detailed Record for Thesis 111522083




Name: Hsu, Chia-Tung (徐嘉彤)    Department: Computer Science and Information Engineering
Thesis title: A Hand Pose Estimation Method Based on EANet with Colored Glove Images
(彩色手套影像下基於 EANet 的手部姿態預測方法)
Related theses
★ A Q-Learning-Based Swarm Intelligence Algorithm and Its Applications
★ Development of a Rehabilitation System for Children with Developmental Delays
★ Comparing Teacher Assessment and Peer Assessment from the Perspective of Cognitive Styles: From English Writing to Game Making
★ A Diabetic Nephropathy Prediction Model Based on Laboratory Test Values
★ Design of a Remote Sensing Image Classifier Based on Fuzzy Neural Networks
★ A Hybrid Clustering Algorithm
★ Development of Assistive Devices for People with Disabilities
★ A Study on Fingerprint Classifiers
★ A Study on Backlit Image Compensation and Color Reduction
★ Application of Neural Networks to Profit-Seeking Enterprise Income Tax Audit Case Selection
★ A New Online Learning System and Its Application to Tax Audit Case Selection
★ An Eye-Tracking System and Its Applications to Human-Machine Interfaces
★ Data Visualization Combining Swarm Intelligence and Self-Organizing Maps
★ Development of a Pupil-Tracking System for Human-Machine Interface Applications for People with Disabilities
★ An Artificial Immune System-Based Online Learning Neuro-Fuzzy System and Its Applications
★ Application of Genetic Algorithms to Speech Descrambling
  1. This electronic thesis is approved for immediate open access.
  2. Electronic full texts that have reached their open-access date are licensed to users only for personal, non-profit retrieval, reading, and printing for the purpose of academic research.
  3. Please comply with the Copyright Act of the Republic of China (Taiwan); do not reproduce, distribute, adapt, repost, or broadcast this work without authorization.

Abstract (Chinese) Taiwan has more than 130,000 people with hearing impairments, and sign language is their primary means of communication. Accurate hand pose estimation models are essential for applications such as sign language translation and sign language recognition. However, because of the interaction between the two hands and hand occlusion, this task is a major challenge for single-camera RGB images. This study therefore aims to improve hand pose estimation in two-hand sign language scenarios.
  This thesis proposes a hand pose estimation method that combines the Extract-and-Adaptation Network (EANet) with colored gloves and is optimized for colored-glove sign language images. We enrich finger information by rendering the dataset with colored gloves, train a Transformer-based EANet on the rendered data, and then apply several image processing techniques to refine the predicted hand keypoints. Experimental results show that, on the colored-glove sign language dataset, the proposed method detects both hands completely with 55% higher stability than Mediapipe, and it also achieves better results on the test set than an EANet trained on the original dataset.
Abstract (English) With over 130,000 hearing-impaired individuals in Taiwan, sign language serves as the primary mode of communication for this population. Accurate hand pose estimation models are crucial for applications such as sign language translation and recognition. However, due to interactions between the two hands and occlusions, this task poses a significant challenge for single-view RGB images. This study aims to enhance hand pose estimation in two-hand sign language scenarios.
This research proposes a hand pose estimation method that uses the Extract-and-Adaptation Network (EANet) together with colored gloves and is optimized for sign language images with colored gloves. We enhance finger information by rendering the dataset with colored gloves and employ a Transformer-based EANet for model training. Multiple image processing techniques are then applied to refine the predicted hand keypoints. Experimental results demonstrate that our method detects both hands with 55% higher stability than Mediapipe on the sign language dataset and yields better results on the test set than an EANet trained on the original dataset.
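
The image preprocessing steps named in the abstract (multi-color mask extraction in the HSV color space, noise removal, and contrast enhancement) map onto standard OpenCV operations. The sketch below is a minimal illustration of that kind of pipeline, not the thesis's actual implementation; the HSV ranges, morphological kernel size, CLAHE parameters, and file names are hypothetical assumptions.

    import cv2
    import numpy as np

    # Hypothetical HSV ranges for two glove colors; the thesis's calibrated
    # thresholds are not reproduced here.
    GLOVE_HSV_RANGES = [
        ((35, 80, 60), (85, 255, 255)),    # greenish fingers
        ((100, 80, 60), (130, 255, 255)),  # bluish fingers
    ]

    def extract_glove_mask(bgr_image: np.ndarray) -> np.ndarray:
        """Union of per-color masks in HSV space, then morphological
        opening/closing to suppress small noise regions."""
        hsv = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2HSV)
        mask = np.zeros(hsv.shape[:2], dtype=np.uint8)
        for lower, upper in GLOVE_HSV_RANGES:
            mask |= cv2.inRange(hsv, np.array(lower), np.array(upper))
        kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))
        mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)   # remove speckles
        mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel)  # fill small holes
        return mask

    def enhance_contrast(bgr_image: np.ndarray) -> np.ndarray:
        """CLAHE on the V channel in HSV space as a simple contrast boost."""
        hsv = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2HSV)
        h, s, v = cv2.split(hsv)
        v = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8)).apply(v)
        return cv2.cvtColor(cv2.merge((h, s, v)), cv2.COLOR_HSV2BGR)

    if __name__ == "__main__":
        frame = cv2.imread("glove_frame.jpg")  # hypothetical input frame
        frame = enhance_contrast(frame)
        glove_mask = extract_glove_mask(frame)
        hands_only = cv2.bitwise_and(frame, frame, mask=glove_mask)
        cv2.imwrite("hands_only.jpg", hands_only)

In practice, the per-color HSV thresholds would need to be calibrated to the specific glove colors and lighting conditions of the recorded sign language videos.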
Keywords (Chinese) ★ 深度學習 (Deep Learning)
★ 電腦視覺 (Computer Vision)
★ 電腦圖學 (Computer Graphics)
★ 影像處理 (Image Processing)
★ 3D 手部姿態辨識 (3D Hand Pose Estimation)
Keywords (English) ★ Deep Learning
★ Computer Vision
★ Computer Graphics
★ Image Processing
★ 3D Hand Pose Estimation
Table of Contents
Abstract (Chinese)
Abstract (English)
Acknowledgements
Table of Contents
1. Introduction
  1.1 Motivation
  1.2 Objectives
  1.3 Thesis Organization
2. Literature Review
  2.1 Applications of Hand Pose Estimation
  2.2 Related Work on Hand Pose Estimation
    2.2.1 Sensor-Based Hand Pose Estimation Methods
    2.2.2 Depth-Based Hand Pose Estimation Methods
    2.2.3 RGB Image-Based Hand Pose Estimation Methods
3. Methodology
  3.1 Dataset Preprocessing
    3.1.1 Dataset
    3.1.2 Dataset Rendering
  3.2 Hand Pose Estimation
  3.3 Sign Language Image Preprocessing
    3.3.1 Multi-Color Mask Extraction in the HSV Color Space
    3.3.2 Noise Removal
    3.3.3 Linear Color Transformation in the HSV Color Space
    3.3.4 Image Contrast Enhancement
4. Experimental Design and Results
  4.1 Sign Language Dataset with Hand Occlusion
    4.1.1 Dataset Recording
    4.1.2 Data Annotation
  4.2 EANet Experimental Results
    4.2.1 Experimental Details
    4.2.2 Evaluation Metrics
    4.2.3 EANet Training Results
  4.3 Sign Language Dataset Experiments
    4.3.1 Results of Different Models
    4.3.2 Two-Hand Detection Experiments
    4.3.3 Analysis of the Image Processing Experiments
  4.4 Training Results of State-of-the-Art Models on the Colored Glove Dataset
5. Conclusion
  5.1 Conclusions
  5.2 Future Work
References
Appendix A: Sign Language Datasets
  A.1 Glove Sign Language Dataset
  A.2 Bare-Hand Sign Language Dataset
References
[1] Department of Statistics, Ministry of Health and Welfare, "身心障礙統計專區" (Disability statistics), 2021. [Online]. Available: https://dep.mohw.gov.tw/dos/cp-5224-62359-113.html (visited on 05/31/2024).
[2] K-12 Education Administration, Ministry of Education, "十二年國民基本教育課程綱要語文領域─臺灣手語" (Curriculum guidelines of 12-year basic education, language arts domain: Taiwan Sign Language), 2021. [Online]. Available: https://www.k12ea.gov.tw/Tw/Common/SinglePage?filter=11C2C6C1-D64E-475E-916B-D20C83896343 (visited on 06/02/2024).
[3] F. Zhang, V. Bazarevsky, A. Vakunov, et al., "MediaPipe Hands: On-device real-time hand tracking," arXiv preprint arXiv:2006.10214, 2020.
[4] J. Park, D. S. Jung, G. Moon, and K. M. Lee, "Extract-and-adaptation network for 3D interacting hand mesh recovery," in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023, pp. 4200–4209.
[5] A. Sinha, C. Choi, and K. Ramani, "DeepHand: Robust hand pose estimation by completing a matrix imputed with deep features," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 4150–4158.
[6] Y. He, R. Yan, K. Fragkiadaki, and S.-I. Yu, "Epipolar transformers," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020, pp. 7779–7788.
[7] M. Nishiyama and K. Watanabe, "Wearable sensing glove with embedded hetero-core fiber-optic nerves for unconstrained hand motion capture," IEEE Transactions on Instrumentation and Measurement, vol. 58, no. 12, pp. 3995–4000, 2009.
[8] Z. Shen, J. Yi, X. Li, et al., "A soft stretchable bending sensor and data glove applications," Robotics and Biomimetics, vol. 3, no. 1, p. 22, 2016.
[9] G. Moon, S.-I. Yu, H. Wen, T. Shiratori, and K. M. Lee, "InterHand2.6M: A dataset and baseline for 3D interacting hand pose estimation from a single RGB image," in Computer Vision – ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part XX, Springer, 2020, pp. 548–564.
[10] National Special Education Information Network, "常用手語辭典手語六十基本手勢" (Sixty basic gestures from the dictionary of common sign language), 2019. [Online]. Available: https://special.moe.gov.tw/signlanguage/basis/detail/7c338ab6-87a9-46cc-a50f34b82b4dac8a (visited on 06/02/2024).
[11] Y. Jang, S.-T. Noh, H. J. Chang, T.-K. Kim, and W. Woo, "3D Finger CAPE: Clicking action and position estimation under self-occlusions in egocentric viewpoint," IEEE Transactions on Visualization and Computer Graphics, vol. 21, no. 4, pp. 501–510, 2015.
[12] O. G. Guleryuz and C. Kaeser-Chen, "Fast lifting for 3D hand pose estimation in AR/VR applications," in 2018 25th IEEE International Conference on Image Processing (ICIP), IEEE, 2018, pp. 106–110.
[13] M.-Y. Wu, P.-W. Ting, Y.-H. Tang, E.-T. Chou, and L.-C. Fu, "Hand pose estimation in object-interaction based on deep learning for virtual reality applications," Journal of Visual Communication and Image Representation, vol. 70, p. 102802, 2020.
[14] Y. Che and Y. Qi, "Detection-guided 3D hand tracking for mobile AR applications," in 2021 IEEE International Symposium on Mixed and Augmented Reality (ISMAR), IEEE, 2021, pp. 386–392.
[15] T. Lee and T. Hollerer, "Multithreaded hybrid feature tracking for markerless augmented reality," IEEE Transactions on Visualization and Computer Graphics, vol. 15, no. 3, pp. 355–368, 2009.
[16] E. Ueda, Y. Matsumoto, M. Imai, and T. Ogasawara, "A hand-pose estimation for vision-based human interfaces," IEEE Transactions on Industrial Electronics, vol. 50, no. 4, pp. 676–684, 2003.
[17] F. Yin, X. Chai, and X. Chen, "Iterative reference driven metric learning for signer independent isolated sign language recognition," in Computer Vision – ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part VII, Springer, 2016, pp. 434–450.
[18] A. Markussen, M. R. Jakobsen, and K. Hornbæk, "Vulture: A mid-air word-gesture keyboard," in Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, 2014, pp. 1073–1082.
[19] H. J. Chang, G. Garcia-Hernando, D. Tang, and T.-K. Kim, "Spatio-temporal Hough forest for efficient detection–localisation–recognition of fingerwriting in egocentric camera," Computer Vision and Image Understanding, vol. 148, pp. 87–96, 2016.
[20] D. Maji, S. Nagori, M. Mathew, and D. Poddar, "YOLO-Pose: Enhancing YOLO for multi-person pose estimation using object keypoint similarity loss," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 2637–2646.
[21] Y. Xu, J. Zhang, Q. Zhang, and D. Tao, "ViTPose: Simple vision transformer baselines for human pose estimation," Advances in Neural Information Processing Systems, vol. 35, pp. 38571–38584, 2022.
[22] Z. Cao, T. Simon, S.-E. Wei, and Y. Sheikh, "Realtime multi-person 2D pose estimation using part affinity fields," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 7291–7299.
[23] T. Sharp, C. Keskin, D. Robertson, et al., "Accurate, robust, and flexible real-time hand tracking," in Proceedings of the 33rd Annual ACM Conference on Human Factors in Computing Systems, 2015, pp. 3633–3642.
[24] T. Simon, H. Joo, I. Matthews, and Y. Sheikh, "Hand keypoint detection in single images using multiview bootstrapping," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 1145–1153.
[25] R. Y. Wang and J. Popović, "Real-time hand-tracking with a color glove," ACM Transactions on Graphics (TOG), vol. 28, no. 3, pp. 1–8, 2009.
[26] C. Zimmermann and T. Brox, "Learning to estimate 3D hand pose from single RGB images," in Proceedings of the IEEE International Conference on Computer Vision, 2017, pp. 4903–4911.
[27] A. Vaswani, N. Shazeer, N. Parmar, et al., "Attention is all you need," Advances in Neural Information Processing Systems, vol. 30, 2017.
[28] A. Dosovitskiy, L. Beyer, A. Kolesnikov, et al., "An image is worth 16x16 words: Transformers for image recognition at scale," arXiv preprint arXiv:2010.11929, 2020.
[29] S. Hampali, S. D. Sarkar, M. Rad, and V. Lepetit, "Keypoint Transformer: Solving joint identification in challenging hands and object interactions for accurate 3D pose estimation," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 11090–11100.
[30] M. Li, L. An, H. Zhang, et al., "Interacting attention graph for single image two-hand reconstruction," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 2761–2770.
[31] J. Romero, D. Tzionas, and M. J. Black, "Embodied hands: Modeling and capturing hands and bodies together," arXiv preprint arXiv:2201.02610, 2022.
[32] N. Ravi, J. Reizenstein, D. Novotny, et al., "Accelerating 3D deep learning with PyTorch3D," arXiv preprint arXiv:2007.08501, 2020.
[33] Blender Development Team, "Blender 4.1 manual," 2024. [Online]. Available: https://docs.blender.org/manual/en/latest/copyright.html (visited on 06/05/2024).
[34] Scratchapixel, "The rasterization stage." [Online]. Available: https://www.scratchapixel.com/lessons/3d-basic-rendering/rasterization-practical-implementation/rasterizationstage.html (visited on 06/14/2024).
[35] Scratchapixel, "An overview of the rasterization algorithm." [Online]. Available: https://www.scratchapixel.com/lessons/3d-basic-rendering/rasterization-practical-implementation/overview-rasterization-algorithm.html (visited on 06/14/2024).
[36] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 770–778.
[37] C.-F. R. Chen, Q. Fan, and R. Panda, "CrossViT: Cross-attention multi-scale vision transformer for image classification," in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021, pp. 357–366.
[38] K-12 Education Administration, Ministry of Education, "臺灣手語教材資源網" (Taiwan Sign Language teaching materials resource website), 2022. [Online]. Available: https://jung-hsingchang.tw/twsl/movies.html (visited on 06/01/2024).
[39] K. Wada, Labelme: Image polygonal annotation with Python, https://github.com/wkentaro/labelme, 2018.
[40] Google, "Hand landmarks detection guide," 2024. [Online]. Available: https://ai.google.dev/edge/mediapipe/solutions/vision/hand_landmarker?hl=zh-tw (visited on 06/02/2024).
[41] A. Paszke, S. Gross, F. Massa, et al., "PyTorch: An imperative style, high-performance deep learning library," Advances in Neural Information Processing Systems, vol. 32, 2019.
[42] C. Ionescu, D. Papava, V. Olaru, and C. Sminchisescu, "Human3.6M: Large scale datasets and predictive methods for 3D human sensing in natural environments," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 36, no. 7, pp. 1325–1339, 2013.
[43] M. Andriluka, L. Pishchulin, P. Gehler, and B. Schiele, "2D human pose estimation: New benchmark and state of the art analysis," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2014, pp. 3686–3693.
[44] Victoria University of Wellington, "NZ sign language exercises," 2010. [Online]. Available: https://www.wgtn.ac.nz/llc/llc_resources/nzsl/ (visited on 07/20/2024).
Advisor: Mu-Chun Su (蘇木春)    Date of approval: 2024-08-12