dc.description.abstract | A fast-responding, precise hand gesture recognition (HGR) system is an important and convenient form of human–computer interaction (HCI). In this paper, we propose two loose hand gesture recognition (LHGR) systems: one using a cascade classifier with geometric relational features, and one using a multi-resolution convolutional neural network. Here, "loose" means that the system tolerates greater variation in the bending degrees of the fingers, the orientation of the palm, and the bending angle of the wrist.
The LHGR system based on geometric relational features uses a depth camera; it not only maintains high accuracy in real-time processing but also allows users to pose loose gestures. The processing of an HGR system is usually divided into three stages: hand detection, feature extraction, and gesture classification, and the proposed method improves all three. In the hand detection stage, we propose a dynamic ROI estimation method and a wrist-cutting method that conform to the characteristics of a human hand. In the feature extraction stage, we use more reliable geometric relational features constructed from local features, global features, and depth coding. In the gesture classification stage, we use a three-layer classifier consisting of finger counting, finger name matching, and coding comparison; together, these layers classify 16 kinds of hand gestures. Finally, the output is refined by an adaptive decision.
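The three-layer cascade above can be illustrated with a small sketch. This is not the thesis implementation: the gesture table, the finger names, and the exact-match depth-coding comparison are all simplified, hypothetical stand-ins; the idea shown is only how each layer progressively narrows the candidate gestures.

```python
# Hypothetical sketch of the three-layer classifier cascade:
# finger counting prunes the gesture set, finger name matching
# refines it, and depth-coding comparison picks the final gesture.
# Gesture definitions below are illustrative, not from the thesis.

# Each gesture: (finger_count, extended_finger_names, depth_code).
GESTURES = {
    "fist":      (0, frozenset(), "00000"),
    "point":     (1, frozenset({"index"}), "01000"),
    "peace":     (2, frozenset({"index", "middle"}), "01100"),
    "thumbs_up": (1, frozenset({"thumb"}), "10000"),
    "open_palm": (5, frozenset({"thumb", "index", "middle",
                                "ring", "pinky"}), "11111"),
}

def classify(finger_count, finger_names, depth_code):
    # Layer 1: finger counting prunes the candidates.
    cands = {g: d for g, d in GESTURES.items() if d[0] == finger_count}
    # Layer 2: finger name matching refines them.
    cands = {g: d for g, d in cands.items()
             if d[1] == frozenset(finger_names)}
    # Layer 3: depth-coding comparison (exact match here; a real
    # system would use a tolerance-based comparison).
    for g, d in cands.items():
        if d[2] == depth_code:
            return g
    return "unknown"
```

For example, `classify(2, ["index", "middle"], "01100")` passes through all three layers and returns `"peace"`.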
A convolutional neural network (CNN) can extract gesture features that adapt to many variations; given adequate training samples, it can overcome lighting and shadow, blur noise, hand rotation, and other factors. The proposed deep-learning-based LHGR system has two input paths, one for color images and one for depth maps. The two paths first learn low-resolution features separately, and these features are then concatenated to learn RGB-D high-resolution features. This design mitigates inaccurate pixel alignment between the color image and the depth map, and also reduces the number of model parameters. In addition, we use multi-resolution features to classify the hand gestures; therefore, the proposed model performs better on smaller, more distant, and blurrier images. During the training stage, we trained the proposed CNN model on a dataset containing many variations of loose hand gestures, so that the CNN can classify loose hand gestures. In the experiments, we compared the proposed CNN model with many different CNN architectures; the mAP of our model reaches 0.997333. The proposed method not only uses color and depth images more effectively and efficiently, but also achieves better accuracy on lower-quality images (even when the training dataset lacks such images): the mAP remains 0.662222 on the 10×10 image dataset. In summary, the proposed method is robust to scaling and rotation of gestures and also accepts lower-resolution images as input. Therefore, the proposed CNN model is well suited to an LHGR system. | en_US |
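The two-path, late-fusion idea described above can be sketched in a few lines. This is a NumPy toy, not the thesis network: `avg_pool` stands in for the convolutional layers of each path, and the sizes (32×32 inputs, 4× downsampling) are arbitrary assumptions. The point shown is that each modality is reduced to low-resolution features on its own path before channel-wise concatenation, which tolerates imperfect RGB/depth pixel alignment and keeps the parameter count down.

```python
import numpy as np

def avg_pool(x, k):
    """Average-pool an HxWxC array by factor k (stand-in for a
    per-modality low-resolution feature path)."""
    h, w, c = x.shape
    return x.reshape(h // k, k, w // k, k, c).mean(axis=(1, 3))

rgb   = np.random.rand(32, 32, 3)   # color image (toy size)
depth = np.random.rand(32, 32, 1)   # depth map (toy size)

# Each modality is processed on its own low-resolution path...
rgb_feat   = avg_pool(rgb, 4)       # -> (8, 8, 3)
depth_feat = avg_pool(depth, 4)     # -> (8, 8, 1)

# ...and the paths are only fused afterwards, by channel-wise
# concatenation of the low-resolution features.
fused = np.concatenate([rgb_feat, depth_feat], axis=-1)  # -> (8, 8, 4)
```

Because fusion happens after downsampling, a misalignment of a few pixels between the color and depth inputs is largely averaged away before the modalities interact.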