Abstract - Deep learning-based sign language recognition usually requires a large number of sign language videos to train neural network models. In this study, we generate effective sign language training data through feature extraction and training data expansion, so that deep learning recognition models can be constructed even when only a small number of sign language videos are available for training. We use MediaPipe to obtain the hand skeleton from each sign language video, analyze several hand-skeleton adjustment policies and color arrangements, and generate hand masks from the skeletons to simulate the hands of different signers. Since hands may be missed by the detector when rapid hand movements cause motion blur, we incorporate optical flow to ensure that hand movement information is retained in every frame. We apply different spatial and temporal processing strategies to simulate different hand sizes, filming angles, and hand speeds. Experimental results show that the proposed approach effectively improves the accuracy of sign language recognition on an American Sign Language dataset.
Index Terms - Sign Language Recognition, Feature Extraction, Deep Learning
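The abstract's first processing step is per-frame hand-skeleton extraction with MediaPipe. The following is a minimal sketch of that step, assuming the standard MediaPipe Hands solution API; the video path and confidence thresholds are illustrative, not taken from the paper.

import cv2
import mediapipe as mp

hands = mp.solutions.hands.Hands(
    static_image_mode=False,       # video mode: track hands across frames
    max_num_hands=2,
    min_detection_confidence=0.5,
    min_tracking_confidence=0.5,
)

cap = cv2.VideoCapture("sign_video.mp4")   # hypothetical input clip
skeletons = []                             # per-frame landmark lists
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    result = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    if result.multi_hand_landmarks:
        # 21 (x, y) landmarks per detected hand, normalized to [0, 1]
        skeletons.append([[(lm.x, lm.y) for lm in h.landmark]
                          for h in result.multi_hand_landmarks])
    else:
        skeletons.append(None)             # detection missed (e.g., motion blur)
cap.release()
hands.close()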
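The abstract also describes generating hand masks from the skeleton to simulate the hands of different signers. The paper does not specify the mask construction, so the sketch below uses one simple assumed approach: filling the convex hull of the 21 landmarks. The function name and image size parameters are hypothetical.

import cv2
import numpy as np

def skeleton_to_mask(landmarks, h, w):
    # landmarks: 21 (x, y) pairs normalized to [0, 1]
    pts = np.array([(int(x * w), int(y * h)) for x, y in landmarks], np.int32)
    mask = np.zeros((h, w), np.uint8)
    # Approximate the hand region as the filled convex hull of the skeleton
    cv2.fillConvexPoly(mask, cv2.convexHull(pts), 255)
    return mask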
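For frames where the detector misses a hand, the abstract incorporates optical flow to keep the motion information. A sketch under the assumption that dense Farneback flow (one of OpenCV's standard optical-flow routines) is used between consecutive frames; the function name and parameter values are illustrative.

import cv2

def flow_magnitude(prev_frame, frame):
    prev_gray = cv2.cvtColor(prev_frame, cv2.COLOR_BGR2GRAY)
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # Dense optical flow: one (dx, dy) displacement vector per pixel
    flow = cv2.calcOpticalFlowFarneback(prev_gray, gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    mag, _ = cv2.cartToPolar(flow[..., 0], flow[..., 1])
    return mag   # large magnitudes mark fast-moving (likely hand) regions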
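Finally, the abstract mentions spatial and temporal processing strategies that simulate different hand sizes, filming angles, and hand speeds. A minimal sketch of such augmentation on landmark sequences, assuming scaling for hand size, in-plane rotation for filming angle, and temporal resampling for signing speed; the exact strategies and parameter ranges in the paper may differ.

import numpy as np

def augment(seq, scale=1.1, angle_deg=10.0, speed=1.25):
    # seq: (T, 21, 2) array of normalized hand landmarks over T frames
    c, s = np.cos(np.radians(angle_deg)), np.sin(np.radians(angle_deg))
    rot = np.array([[c, -s], [s, c]])
    center = seq.mean(axis=(0, 1), keepdims=True)
    # Rotate and scale every landmark about the sequence centroid
    out = (seq - center) @ rot.T * scale + center
    # Resample T frames to ~T/speed frames by linear interpolation in time
    t_new = np.linspace(0, len(out) - 1, max(2, int(len(out) / speed)))
    lo = np.floor(t_new).astype(int)
    hi = np.minimum(lo + 1, len(out) - 1)
    w = (t_new - lo)[:, None, None]
    return out[lo] * (1 - w) + out[hi] * w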