Master's/Doctoral Thesis 110423036 Detailed Record
Name 王昭元 (Chao-Yuan Wang)   Department Information Management
Thesis Title Non-Touch Cooperation: An Interactive Mechanism Design Based on Mid-Air Gestures
Related Theses
★ Building a Neonatal Health Status Prediction Model Based on Machine Learning Techniques
★ Abbreviation Disambiguation and One-to-Many Classification for Electronic Medical Records
★ Building a Post-Stroke Pneumonia Prediction Model Using Text Mining and Deep Learning Techniques
★ Precision Social-Media Advertising Investment Strategy: A Machine-Learning-Based Social Influence Management Model
★ Realizing Smart Active Aging: A Data-Driven AI Local Friend Recommendation Model for Seniors
★ Exploring Information Emotion Transmissibility by Integrating Deep Learning and SOR Theory: News Generation Characteristics and Emotional Information Dissemination Behavior
★ A Movie Recommendation System Based on Hybrid Filtering
★ Exploring Social Support for Diabetes Care in Reddit User-Generated Content
★ Prevention Before Incident: A Machine-Learning-Based Network Intrusion Detection System
★ A Feasibility Study on Introducing Artificial Intelligence into a Reliability Verification Laboratory: The Case of Company A
★ Realizing Smart Shared Care: A Data-Driven AI Diabetes Case Management Model
★ An Antifragile Social Word-of-Mouth Strategy Based on UGC: The Case of Taiwan's Hotel Industry
★ A Machine-Learning-Based Filtering Mechanism for Incentivized Reviews: Restaurant Reviews as an Example
★ Data-Driven Team Management: A Study of NBA Team Competitiveness and Superteam Strategies
Files Full text available in the system after 2028-7-10
Abstract (Chinese) With the spread of the pandemic and the evolution of society, non-touch services have grown in scale, forming a non-touch "economy"; digital signage is a fast-growing industry well suited to such services. Skeleton-based body and hand-gesture control methods offer more direct control, and because skeletal data protects privacy, they are well positioned to develop within the non-touch economy. However, most current solutions carry hardware limitations and domain requirements, and they rely on too many control movements, raising the barrier to use.
  Combining several open-source packages, this study proposes a multi-person action recognition framework for digital signage that couples arm movements with hand gestures. Adding more body joints to each control movement enriches the body information, so the movements remain functional while diverging further from everyday movements, and fewer movement combinations cover more control functions. Within the framework, a motion-interval detection strategy further reduces confusion between functional and everyday movements and cuts unnecessary computation. In addition, in view of the structure of the human body, this study adapts the data input method of existing 3D convolutional neural networks to the proposed combined recognition method and examines its performance. A further contribution is the combination of public gesture and human-action datasets to simulate real movements.
  Finally, to validate the framework and the synthesized dataset, the same person recorded the related movements multiple times so that they better matched the dataset, and several classic convolutional neural networks were converted into 3D models and evaluated.
Abstract (English) Due to the development of the COVID-19 pandemic and the evolution of our society, non-touch services have gained significant momentum, giving rise to the concept of the "Non-Touch Economy". One industry that has experienced rapid growth and is well suited to non-touch services is digital signage. Skeleton-based action and gesture recognition methods provide a more direct and intuitive means of control, and the use of skeletal data helps protect privacy, making them ideal for the non-touch economy. However, existing solutions often have hardware limitations and domain-specific requirements, and they involve an excessive number of control movements, which steepens the user's learning curve and makes adoption challenging.
  This research proposes a multi-person action recognition framework that combines arm and gesture control, designed specifically for digital signage applications. By incorporating additional body-joint information into gesture control, the framework enhances functionality, widens the differentiation from everyday actions, and achieves a broader range of control functions with fewer gesture combinations. Furthermore, a motion-interval detection strategy is introduced within the framework to reduce false recognition between functional actions and everyday movements, thereby minimizing unnecessary computation. Additionally, considering the structural characteristics of the human body, the data input method of existing 3D convolutional neural networks was adapted to the proposed recognition method and its performance examined. Another contribution of this study is the integration of publicly available gesture and human-action datasets to simulate real movements.
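The abstract does not spell out the extraction or gating details. As a rough illustration only, the sketch below uses MediaPipe Holistic (one of the open-source packages this kind of framework typically builds on) to collect combined arm-and-hand skeletons per frame, then gates them with a simple inter-frame displacement threshold standing in for the motion-interval detection strategy. Holistic tracks a single person, so the multi-person aspect (e.g., a separate person detector) is omitted here; names such as MOTION_THRESHOLD and the input file are hypothetical, not values from the thesis.

```python
# Hedged sketch: per-frame arm + hand keypoints with MediaPipe Holistic,
# gated by a simple motion threshold. Not the thesis implementation.
import cv2
import mediapipe as mp
import numpy as np

mp_holistic = mp.solutions.holistic

# MediaPipe pose landmark indices for the right arm: shoulder, elbow, wrist.
RIGHT_ARM = [12, 14, 16]
MOTION_THRESHOLD = 0.01  # assumed value, in normalized image coordinates

def frame_keypoints(results):
    """Concatenate right-arm pose joints and 21 right-hand joints into a (24, 3) array."""
    if results.pose_landmarks is None or results.right_hand_landmarks is None:
        return None
    arm = [(lm.x, lm.y, lm.z)
           for i, lm in enumerate(results.pose_landmarks.landmark) if i in RIGHT_ARM]
    hand = [(lm.x, lm.y, lm.z) for lm in results.right_hand_landmarks.landmark]
    return np.array(arm + hand, dtype=np.float32)

cap = cv2.VideoCapture("signage_camera.mp4")  # hypothetical input clip
frames, prev = [], None
with mp_holistic.Holistic(min_detection_confidence=0.5) as holistic:
    while True:
        ok, bgr = cap.read()
        if not ok:
            break
        results = holistic.process(cv2.cvtColor(bgr, cv2.COLOR_BGR2RGB))
        kp = frame_keypoints(results)
        if kp is None:
            continue
        # Motion-interval gate: keep only frames whose mean joint displacement
        # exceeds the threshold, so near-still everyday poses never reach the classifier.
        if prev is not None and np.abs(kp - prev).mean() > MOTION_THRESHOLD:
            frames.append(kp)
        prev = kp
cap.release()
# `frames` now holds a candidate motion interval to feed a downstream classifier.
```

A displacement gate like this is the simplest plausible stand-in: it filters frames before any neural network runs, which matches the abstract's stated goal of cutting unnecessary computation.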
  To validate the effectiveness of the framework and the synthesized dataset, this study recorded the related actions multiple times with the same individual to ensure better alignment with the dataset, and it selected several well-known convolutional neural network models, converting them into 3D models for evaluation.
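The abstract mentions converting well-known CNNs into 3D models. As a minimal, hedged sketch of that idea in PyTorch, the toy network below replaces each 2D convolution and pooling layer with its 3D counterpart so that a stack of skeleton frames can be classified over time. The layer sizes, input shape, and class count are illustrative assumptions, not the thesis architectures.

```python
# Hedged sketch: a 2D CNN "inflated" to 3D by swapping Conv2d/MaxPool2d
# for their 3D twins. Sizes are illustrative assumptions.
import torch
import torch.nn as nn

class Tiny3DCNN(nn.Module):
    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(3, 32, kernel_size=3, padding=1),   # was nn.Conv2d(3, 32, 3, padding=1)
            nn.ReLU(inplace=True),
            nn.MaxPool3d(kernel_size=2),                  # was nn.MaxPool2d(2)
            nn.Conv3d(32, 64, kernel_size=3, padding=1),  # was nn.Conv2d(32, 64, 3, padding=1)
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool3d(1),                      # collapse time and space
        )
        self.classifier = nn.Linear(64, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, frames, height, width), e.g. a skeleton sequence
        # rendered as a joint-coordinate "image" stacked over time.
        return self.classifier(self.features(x).flatten(1))

model = Tiny3DCNN(num_classes=10)
clip = torch.randn(1, 3, 16, 24, 24)  # 16 frames of a 24-joint representation (assumed shape)
print(model(clip).shape)              # torch.Size([1, 10])
```

The same mechanical substitution applies to larger classic architectures (ResNet, MobileNetV2, ShuffleNetV2, and the like), adding a temporal dimension in front of the spatial ones.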
Keywords ★ mid-air gesture
★ skeleton
★ action recognition
★ digital signage
★ non-touch economy
★ convolutional neural network
Table of Contents
Chinese Abstract
Abstract
Acknowledgements
List of Tables
List of Figures
1 Introduction
1-1 Background
1-2 Motivation
1-3 Objectives
2 Literature Review
2-1 Non-Touch Services
2-2 Skeleton-Based Gesture and Action Recognition
3 Methodology
3-1 Proposed Framework
3-2 Adjusted Data Reorganization Approach
3-3 Public Datasets
3-4 Data Simulation
3-5 Experiment Design
3-6 Evaluation
4 Results
4-1 Dataset Description
4-2 Experiment Results
4-3 Architecture Validation
4-4 Framework Performance
4-5 Discussion
5 Conclusion
5-1 Academic Impact
5-2 Business Impact
5-3 Limitations
5-4 Future Work
References
Advisor Hsiao-Ting Tseng (曾筱珽)   Date of Approval 2023-7-11
