Abstract (English) |
During the COVID-19 pandemic, quarantined people have been isolated in separate indoor spaces, and the world is going through a difficult time. To give people confined indoors more interaction with others, we propose a system that uses deep learning and image processing to let people in different indoor spaces communicate and play games together normally. The problems addressed include management of skeletal data, interaction with virtual objects, interactive control in virtual reality (VR) or augmented reality (AR), and construction of a 3D virtual space. We also focus on the interaction between people and the environment, and build a 3D model of a dedicated mini golf course so that multiple users can be generated in this 3D space and interact with each other. In the 3D mini golf course, despite the limited viewing angle of a single RGB camera, the reconstructed avatar reflects the same upper-body posture as the user, and different 3D avatars can interact in the AR/VR space. These interactions use MediaPipe skeleton recognition as their core.
In addition, our interaction model considers the relationship between the hand skeleton, objects, and the environment, such as the interaction of an avatar with a golf club or of a golf club with a golf ball. For these high-precision interactive functions, the module combines gesture detection with human skeleton detection, tracks the nodes of the hand skeleton and the spine skeleton, and uses algorithms such as inverse kinematics (IK) to project them into the virtual space. Players need no controller: simple gestures or intuitive movements are enough to control a character moving in the metaverse. With the Mirror connection module as support, we construct a complete mini golf world that offers more player-to-player and player-to-virtual-item interactions, expanding sociality in the metaverse. |
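As an illustration of the IK step mentioned above, the following is a minimal sketch of an analytic two-bone solver in a 2D plane. It is not the solver used in this work, which the abstract does not specify. The function names, segment lengths, and target coordinates are hypothetical; in a real pipeline the target would be a tracked wrist position expressed relative to the avatar's shoulder.

```python
import math


def two_bone_ik(upper_len, fore_len, target_x, target_y):
    """Return (shoulder, elbow) angles in radians for a planar two-segment arm.

    The shoulder sits at the origin. 'elbow' is the bend of the forearm
    relative to the upper arm direction. Hypothetical helper for illustration.
    """
    dist = math.hypot(target_x, target_y)
    # Clamp the distance so unreachable targets resolve to the nearest pose.
    dist = max(abs(upper_len - fore_len), min(upper_len + fore_len, dist))
    dist = max(dist, 1e-9)
    # Law of cosines gives the elbow bend.
    cos_elbow = (dist**2 - upper_len**2 - fore_len**2) / (2 * upper_len * fore_len)
    cos_elbow = max(-1.0, min(1.0, cos_elbow))
    elbow = math.acos(cos_elbow)
    # Aim at the target, then correct for the offset caused by the bent elbow.
    shoulder = math.atan2(target_y, target_x) - math.atan2(
        fore_len * math.sin(elbow), upper_len + fore_len * math.cos(elbow)
    )
    return shoulder, elbow


def forward(upper_len, fore_len, shoulder, elbow):
    """Forward kinematics: return the wrist position for the given angles."""
    elbow_x = upper_len * math.cos(shoulder)
    elbow_y = upper_len * math.sin(shoulder)
    wrist_x = elbow_x + fore_len * math.cos(shoulder + elbow)
    wrist_y = elbow_y + fore_len * math.sin(shoulder + elbow)
    return wrist_x, wrist_y


if __name__ == "__main__":
    # Assumed avatar arm lengths and a tracked wrist target, in metres.
    angles = two_bone_ik(0.30, 0.25, 0.40, 0.20)
    print(forward(0.30, 0.25, *angles))
```

Running the forward pass on the solved angles is a quick way to check that the avatar's wrist lands on the tracked target. A target beyond the arm's reach resolves to a fully extended arm pointing toward it.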