dc.description.abstract | In recent years, the COVID-19 pandemic has fundamentally transformed people’s lifestyles and modes of interaction worldwide. To protect public health, demand for contactless technologies has surged, and gesture recognition, an intuitive form of human-computer interaction, has become increasingly important. Deep learning-based image recognition, combined with readily available web cameras, can recognize dynamic gestures with sufficient accuracy and efficiency to support contactless interaction, thereby reducing physical contact and lowering the risk of virus transmission.
This study uses RGB web cameras with MediaPipe Hands for hand detection and static gesture recognition, and employs a Decouple + Recouple deep learning network to learn 27 defined dynamic gestures. The model is pre-trained on human action recognition datasets. We vary the dataset size and the model configuration and compare the resulting performance. Finally, we develop a self-service ordering interface and integrate it into a real-time scenario that simulates the actual ordering process, yielding a contactless system in which each gesture maps to a customizable control function.
For hand detection, we achieve an average detection confidence of 99 %. For dynamic gesture recognition, we attain an overall average recognition accuracy exceeding 95 % and an average F1-score of 95 % for individual gestures. On a very small custom dataset, the overall average accuracy exceeds 93 %. In real-time recognition, the average execution time per operation is approximately 0.4 seconds, with a gesture prediction time of 0.27 seconds, and the correct recognition rate is 94.07 %. These results indicate that the system is stable, accurate, and fast, making dynamic gesture recognition practical for contactless applications. | en_US |