Name: 翁浚銘 (Jun-Ming Wong)   Department: Graduate Institute of Software Engineering
Thesis title: Using a Small Video Dataset to Construct a Taiwanese-Sign-Language Word Classification Model
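Abstract (Chinese): Sign language is a visual language that uses hand shapes, movements, and even facial expressions to convey information, serving hearing-impaired people as …
First, we obtain a series of sign language teaching videos from a video-sharing platform and, using Mask RCNN[1], …
… we then classify a variety of Taiwanese Sign Language vocabulary with an attention-based 3D-ResNet. The experimental results show …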
Abstract (English): Sign languages (SL) are visual languages that use hand
shapes, movements, and even facial expressions to convey information; they
are the primary means of communication for hearing-impaired people. Sign
language recognition (SLR) based on deep learning has attracted much
attention in recent years. However, training neural networks requires a
large number of SL videos, whose preparation is time-consuming and
cumbersome. This research proposes using a small set of SL videos to build
effective training data for classifying Taiwanese Sign Language (TSL)
vocabulary. First, we collect a series of TSL teaching videos from a
video-sharing platform. Then, Mask RCNN[1] is employed to extract the
segmentation masks of the hands and face in every video frame. Next,
spatial-domain data augmentation is applied to create training samples with
varied content, and various temporal-domain sampling strategies are employed
to simulate the signing speeds of different signers. Finally, an
attention-based 3D-ResNet trained on the resulting synthetic dataset is used
to classify a variety of TSL vocabulary. The experimental results show
promising performance and demonstrate the feasibility of this approach for
SLR.
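The abstract names two ways of multiplying a small set of videos: spatial-domain augmentation built on the Mask RCNN hand/face masks, and temporal-domain sampling that imitates faster or slower signers. The sketch below is a minimal, hypothetical illustration of both ideas, not the thesis's implementation. Everything in it is an assumption: frames are modelled as plain integer grids so only the standard library is needed, "different contents" is interpreted as swapping the non-hand/face background, and the three sampling strategies (`uniform_indices`, `jittered_indices`, `warped_indices`) as well as the helper `composite` are illustrative names chosen here.

```python
"""Hypothetical sketch of the spatial and temporal augmentation ideas.

Assumptions, not taken from the thesis: a frame is an H x W grid of ints,
a mask is a 0/1 grid of the same size (e.g. the union of the hand and face
masks from Mask R-CNN), and the classifier consumes clip_len frames.
"""
import random
from typing import List, Optional

Grid = List[List[int]]


def composite(frame: Grid, mask: Grid, background: Grid) -> Grid:
    """Spatial augmentation (assumed): keep the masked hand/face pixels of
    `frame` and take every other pixel from a new `background`."""
    return [
        [f if m else b for f, m, b in zip(f_row, m_row, b_row)]
        for f_row, m_row, b_row in zip(frame, mask, background)
    ]


def uniform_indices(num_frames: int, clip_len: int) -> List[int]:
    """Evenly spaced frame indices spanning the whole video (constant speed)."""
    if num_frames <= 0 or clip_len <= 0:
        raise ValueError("num_frames and clip_len must be positive")
    return [i * num_frames // clip_len for i in range(clip_len)]


def jittered_indices(num_frames: int, clip_len: int,
                     rng: Optional[random.Random] = None) -> List[int]:
    """Split the video into clip_len equal segments and draw one random frame
    from each, so the local signing speed varies between samples."""
    if num_frames <= 0 or clip_len <= 0:
        raise ValueError("num_frames and clip_len must be positive")
    if rng is None:
        rng = random.Random()
    indices = []
    for i in range(clip_len):
        start = i * num_frames // clip_len
        # Guarantee a non-empty segment when the video is shorter than the clip.
        end = max(start + 1, (i + 1) * num_frames // clip_len)
        indices.append(rng.randrange(start, end))
    return indices


def warped_indices(num_frames: int, clip_len: int, gamma: float) -> List[int]:
    """Non-linear time warp t -> t**gamma over normalised time t in [0, 1).
    gamma > 1 packs samples near the start (slow, then fast signing);
    gamma < 1 packs them near the end (fast, then slow signing)."""
    if num_frames <= 0 or clip_len <= 0 or gamma <= 0:
        raise ValueError("num_frames, clip_len and gamma must be positive")
    return [min(int(num_frames * (i / clip_len) ** gamma), num_frames - 1)
            for i in range(clip_len)]
```

For example, `clip = [frames[i] for i in jittered_indices(len(frames), 16)]` would draw one 16-frame training clip (16 is an arbitrary choice here). Calling it repeatedly, or calling `warped_indices` with different `gamma` values, yields clips whose apparent signing speed differs even though they all come from one source video; combining this with `composite` over several backgrounds multiplies the training set further.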
Keywords ★ Taiwanese sign language
★ sign language recognition
★ deep learning
Table of Contents
1 Introduction
1.1 Motivation of Research
1.2 Contribution of Research
1.3 The Organization of Thesis
2 Related Work
2.1 TSL Background
2.1.1 Sign Language
2.1.2 Types of TSL
2.2 Related Solutions
2.2.1 Annotation Type of Data
2.2.2 Methods of Eliciting Features
2.2.3 Deep Learning Models
3 TSL Recognition
3.1 Dataset
3.1.1 TSL Dataset
3.1.2 Data Augmentation
3.2 Model Architecture
3.2.1 Local feature attention
3.2.2 Layer attention
3.2.3 Training
4 Experimental Results
4.1 Development Environment
4.2 Results
4.2.1 Test dataset
4.2.2 Spatial domain results
4.2.3 Temporal domain results
4.2.4 Attention effects
5 Conclusion and Future Work
5.1 Conclusion
5.2 Future Work
References
Advisor: 蘇柏齊 (Po-Chyi Su)   Approval date: 2021-8-4
