Name: Axel Yann Velez (維亞彥)
Department: Computer Science and Information Engineering
Thesis Title: Customizable Gesture Recognition System for Natural Human-Machine Interactions (用於自然人機互動的可客製化手勢辨識系統設計)
- The electronic full text of this thesis is approved for immediate open access.
- The open-access electronic full text is licensed to users for academic research purposes only: personal, non-profit retrieval, reading, and printing.
- Please observe the relevant provisions of the Copyright Act of the Republic of China (Taiwan); do not reproduce, distribute, adapt, repost, or broadcast the work without authorization.
Abstract

This thesis presents a customizable hand gesture recognition system designed for natural human-machine interactions. The system is based on the recognition of ASL (American Sign Language) letters and on the tracking of the user's hand movements. It can detect static signs (a single ASL letter), composed gestures (a sequence of ASL letters), and dynamic gestures (an ASL letter combined with a hand movement path). It is also designed to handle the various actions associated with these gestures and to provide feedback to the user. A key feature of the system is its flexibility: users can add new gestures and easily modify existing ones by associating a sign with a movement path or by defining a sequence of static signs. The study begins with the motivations for developing a customizable gesture recognition system and outlines the challenges of existing systems that lack adaptability. It then details the design and implementation of the system, which leverages machine learning techniques including Convolutional Neural Networks (CNNs), Long Short-Term Memory (LSTM) networks, and Dynamic Time Warping (DTW). These techniques are integrated into a hierarchical, modular framework capable of distinguishing and recognizing static, composed, and dynamic gestures. The implementation phase covers data collection and dataset creation, as well as the preprocessing pipeline, including the extraction of hand landmarks and their transformation into data suitable for training the models. The evaluation phase demonstrates the system's high accuracy and robustness across various metrics, including accuracy, loss, recall, and F1-score. Finally, gesture customization and the different human-machine interactions are addressed, demonstrating the system's ease of use and its potential for real-world applications.

Keywords: Computer Vision; Recognition; Gestures
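The abstract outlines a two-stage pipeline: hand landmarks are extracted from each camera frame and turned into features, which a sign classifier (a CNN) and a path classifier (an LSTM, cross-checked with DTW) then consume. The thesis does not publish its code, so the sketch below is a minimal, hypothetical illustration of the preprocessing step; it assumes MediaPipe Hands (which the thesis cites) as the landmark extractor, and the normalization scheme and all function names are illustrative assumptions, not the author's implementation.

```python
# Minimal sketch (assumptions, not the thesis's code): extract 21 hand
# landmarks with MediaPipe Hands and normalize them into a feature vector
# that a trained sign classifier could consume.
import cv2
import mediapipe as mp
import numpy as np

mp_hands = mp.solutions.hands

def extract_landmarks(frame_bgr, hands):
    """Return a flat (63,) vector of 21 (x, y, z) landmarks, normalized for
    hand position and scale, or None if no hand is detected."""
    result = hands.process(cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB))
    if not result.multi_hand_landmarks:
        return None
    landmarks = result.multi_hand_landmarks[0].landmark
    pts = np.array([[p.x, p.y, p.z] for p in landmarks], dtype=np.float32)
    pts -= pts[0]                              # wrist-relative coordinates
    scale = np.linalg.norm(pts, axis=1).max()  # farthest point from the wrist
    if scale > 0:
        pts /= scale                           # rough scale invariance
    return pts.flatten()

with mp_hands.Hands(max_num_hands=1, min_detection_confidence=0.5) as hands:
    cap = cv2.VideoCapture(0)
    ok, frame = cap.read()
    cap.release()
    if ok and (features := extract_landmarks(frame, hands)) is not None:
        pass  # features would be fed to the trained sign classifier (hypothetical)
```

For the movement paths, the extra-validation step the abstract mentions can be illustrated with the textbook Dynamic Time Warping recurrence; this is a generic O(nm) implementation, not the thesis's (possibly optimized) variant.

```python
import numpy as np

def dtw_distance(path_a, path_b):
    """Textbook DTW between two point sequences, e.g. recorded (x, y) wrist
    trajectories; a small distance to a stored template supports the LSTM's
    prediction for that gesture."""
    a = np.asarray(path_a, dtype=float)
    b = np.asarray(path_b, dtype=float)
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)   # accumulated-cost matrix
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])  # local distance
            D[i, j] = cost + min(D[i - 1, j],           # allowed warping steps
                                 D[i, j - 1],
                                 D[i - 1, j - 1])
    return D[n, m]
```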
Table of Contents

Abstract
Résumé
摘要 (Chinese abstract)
Acknowledgements
List of Figures
List of Tables
1. Chapter 1: Introduction
1.1 Background
1.2 Motivation
1.3 Objectives
1.4 Thesis Structure
2. Chapter 2: Literature Review
2.1 Hand Gesture Recognition
2.1.1 Sensor-Based Gesture Recognition
2.1.2 Vision-Based Gesture Recognition
2.2 Static Sign Recognition
2.2.1 Traditional Computer Vision
2.2.2 Convolutional Neural Networks
2.2.3 Vision Transformers
2.3 Dynamic Gesture Recognition
2.3.1 Deep Learning
2.3.2 Dynamic Time Warping
2.4 Synthesis
3. Chapter 3: System Design
3.1 MIAT Methodology
3.1.1 IDEF0
3.1.2 GRAFCET
3.2 IDEF0 Modelization
3.2.1 System Overview
3.2.2 A1: Hand Detection
3.2.3 A2: Sign Recognition
3.2.4 A3: Path Recognition
3.2.5 A4: Gesture Prediction
3.2.6 A5: System Response
3.3 GRAFCET Modelization
3.3.1 System Overview
3.3.2 A1: Hand Detection
3.3.3 A2: Sign Recognition
3.3.4 A3: Path Recognition
3.3.5 A4: Gesture Identification
3.3.6 A5: System Response
4. Chapter 4: Implementation
4.1 Environment
4.2 Data Collection and Preprocessing
4.2.1 ASL Sign Dataset
4.2.2 Dynamic Gestures Dataset
4.3 Static Sign Detection
4.3.1 Model Architecture
4.3.2 Training and Validation
4.4 Composed Sign Detection
4.5 Dynamic Sign Detection
4.5.1 Path Recognition Model Architecture
4.5.2 Training and Validation
4.5.3 Dynamic Time Warping Extra-Validation
4.6 Gesture Segmentation
4.6.1 Dealing with Intentionality
4.6.2 Intentionality Model
4.6.3 Change-Point Boosting
4.7 Systems Integration
5. Chapter 5: Results and Interpretation
5.1 Static Sign Recognition
5.2 Dynamic Sign Recognition
5.3 Gesture Segmentation
6. Chapter 6: Customization and Human-Machine Interactions
6.1 Customization
6.1.1 Adding New Hand Movements
6.1.2 Adding or Modifying Gestures
6.2 HMI
6.2.1 Action Handling
6.2.2 LLM
6.2.3 Voice Synthesis and Feedback
7. Chapter 7: Conclusion
7.1 Challenges and Limitations
7.2 Future Work
7.3 Final Thoughts
8. References
Advisor: Ching-Han Chen (陳慶瀚)
Approval date: 2024-07-24