Hand Feature Extraction and Gesture Recognition for Taiwan Sign Language by Using Synthetic Datasets

Name

Yu-Jung Chen (陳宥榕)

Department

Department of Computer Science and Information Engineering

Thesis Title

Hand Feature Extraction and Gesture Recognition for Taiwan Sign Language by Using Synthetic Datasets
(Original title: 使用虛擬合成資料實現臺灣手語特徵擷取暨手型辨識)

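The author has granted immediate open access to this electronic thesis.
Once open, the electronic full text is licensed to users only for personal, non-profit searching, reading, and printing for the purpose of academic research.
Please comply with the Copyright Act of the Republic of China: do not reproduce, distribute, adapt, repost, or broadcast this work without authorization.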

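Abstract (Chinese)

This study addresses hand feature extraction and hand shape recognition in Taiwan Sign Language videos. First, we build the training dataset with Unity3D: 3D hand models are composited into natural scenes, scenes containing people, and solid-color backgrounds, so that large quantities of high-quality training data, comprising hand images, hand contours, and hand joint keypoints, can be produced quickly. Using synthetic data reduces the burden and the errors that manual labeling may introduce. We discuss how to make the synthetic images closer to real images, generating diverse images by adjusting background complexity and skin-tone diversity and by adding motion blur, in order to improve model reliability. Next, we compare the completeness of the bounding boxes and semantic segmentation produced by the ResNeSt+Detectron2 model with that of the heatmaps produced by a modified EfficientDet model. We ultimately adopt the bounding box as the extracted feature for hand shape recognition, using it to crop hand images from sign language videos. We again use Unity3D to build the training dataset, posing 3D hand models in several basic Taiwan Sign Language hand shapes, and then classify them with ResNeSt. Experimental results show that the large volume of high-quality synthetic data produced in this study can be applied effectively to hand feature extraction and to hand shape recognition in Taiwan Sign Language.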
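The abstract describes two related steps: synthetic rendering yields a hand contour along with each image, and a bounding box is later used to crop the hand region for recognition. The following is a minimal sketch of that relationship, assuming the contour is available as a binary mask stored as nested Python lists. The names `mask_to_bbox` and `crop` and the `margin` parameter are hypothetical illustrations; the thesis generates its labels inside Unity3D and predicts boxes with a trained detector, which this sketch does not reproduce.

```python
from typing import List, Optional, Sequence, Tuple

# Assumed conventions (not from the thesis): row-major nested lists,
# mask value 1 = hand pixel, 0 = background, inclusive box coordinates.
Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max)


def mask_to_bbox(mask: Sequence[Sequence[int]]) -> Optional[Box]:
    """Return the tight bounding box around all non-zero mask pixels."""
    xs = [x for row in mask for x, value in enumerate(row) if value]
    ys = [y for y, row in enumerate(mask) if any(row)]
    if not xs:
        return None  # the mask contains no hand pixels
    return (min(xs), min(ys), max(xs), max(ys))


def crop(image: Sequence[Sequence], box: Box, margin: int = 0) -> List[list]:
    """Crop the image to the box, padded by `margin` pixels and clipped to the image."""
    height, width = len(image), len(image[0])
    x0, y0, x1, y1 = box
    x0, y0 = max(0, x0 - margin), max(0, y0 - margin)
    x1, y1 = min(width - 1, x1 + margin), min(height - 1, y1 + margin)
    return [list(row[x0:x1 + 1]) for row in image[y0:y1 + 1]]


mask = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 0],
]
box = mask_to_bbox(mask)      # (1, 1, 2, 2)
hand_patch = crop(mask, box)  # [[1, 1], [0, 1]]
```

Because the label is computed from the rendered mask rather than drawn by hand, the box is exact by construction, which is the kind of labeling error the abstract says synthetic data avoids.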

Abstract (English)

Hearing-impaired people rely on sign languages to communicate with each other but may have difficulty interacting with people who do not understand sign languages. Since sign languages are visual languages, computer vision approaches to sign language recognition are usually considered a feasible way to bridge the gap. However, sign language recognition is a complex task that requires classifying hand shapes, hand motions, and facial expressions. Detecting and classifying hand gestures should be the first step because the hands are the most important elements. This research thus focuses on hand feature extraction and gesture recognition for Taiwan Sign Language (TSL) videos.
First, we establish a synthetic dataset using Unity3D. The advantage of synthetic data is that it reduces the effort of manual labeling and avoids possible labeling errors, so a large dataset with high-quality labels can be obtained. The dataset is generated by varying hand shapes, colors, and orientations. The background images are also varied to increase the robustness of the model, and motion blur is added to make the synthetic data look closer to real cases. We compare three kinds of extracted features: the bounding boxes and semantic segmentation generated by ResNeSt+Detectron2, and the heatmaps generated by EfficientDet. The bounding boxes are selected for the subsequent gesture recognition. We also employ Unity3D to create several basic TSL sign gestures, and then use ResNeSt for classification and recognition.
Experimental results demonstrate that the synthetic dataset can effectively help train suitable models for hand feature extraction and gesture recognition in TSL videos.
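Two of the augmentations named in the abstract, changing the background and adding motion blur, can be illustrated with a few lines of image arithmetic. The sketch below assumes grayscale images stored as nested lists of floats and a per-pixel alpha mask for the rendered hand. The functions `composite` and `horizontal_motion_blur` are hypothetical names, and the blur is a simple horizontal box average. The thesis performs these steps in Unity3D, and its actual blur model is not given in this record.

```python
from typing import List, Sequence

Image = List[List[float]]


def composite(foreground: Sequence[Sequence[float]],
              alpha: Sequence[Sequence[float]],
              background: Sequence[Sequence[float]]) -> Image:
    """Alpha-blend a rendered hand over a background: a*fg + (1 - a)*bg."""
    return [
        [a * f + (1.0 - a) * b for f, a, b in zip(f_row, a_row, b_row)]
        for f_row, a_row, b_row in zip(foreground, alpha, background)
    ]


def horizontal_motion_blur(image: Sequence[Sequence[float]], length: int) -> Image:
    """Average each pixel with its horizontal neighbours (odd window `length`).

    Windows are truncated at the image border, so edge pixels average
    fewer samples instead of reading outside the image.
    """
    if length < 1 or length % 2 == 0:
        raise ValueError("length must be a positive odd integer")
    half = length // 2
    blurred = []
    for row in image:
        width = len(row)
        new_row = []
        for x in range(width):
            window = row[max(0, x - half):min(width, x + half + 1)]
            new_row.append(sum(window) / len(window))
        blurred.append(new_row)
    return blurred


# A one-row example: a bright hand pixel pasted onto a dark background,
# then smeared horizontally as if the hand were moving.
hand = [[0.0, 0.0, 9.0, 0.0, 0.0]]
alpha = [[0.0, 0.0, 1.0, 0.0, 0.0]]
scene = [[0.0, 0.0, 0.0, 0.0, 0.0]]
frame = horizontal_motion_blur(composite(hand, alpha, scene), length=3)
```

Swapping `scene` for different backgrounds while keeping `hand` and `alpha` fixed gives many training images from a single render, each with the same known labels.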

Keywords (Chinese)

★ Synthetic data
★ Taiwan Sign Language
★ Feature extraction
★ Hand shape recognition

Keywords (English)

★ synthetic datasets
★ Taiwanese sign language
★ feature extraction
★ gesture recognition

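Table of Contents

Abstract (Chinese)
Abstract
Acknowledgments
Contents
List of Figures
List of Tables
Chapter 1 Introduction
1.1 Research Motivation
1.2 Contributions
1.3 Thesis Organization
Chapter 2 Related Work
2.1 Taiwan Sign Language
2.1.1 Historical Origins
2.1.2 Classification of Taiwan Sign Language
2.1.3 Hand Shapes
2.2 Feature Extraction
2.2.1 Traditional Image Processing
2.2.2 Assistive Devices
2.2.3 Deep Learning and Common Models
2.2.4 Applications of Network Architectures
Chapter 3 Proposed Method
3.1 Introduction to Unity
3.2 Synthetic Image Training Set
3.2.1 Hand Generation
3.2.2 Contour Generation
3.2.3 Hand Shape Generation
3.2.4 Labeling Method
3.3 Feature Extraction Networks
3.3.1 EfficientDet
3.3.2 ResNeSt
3.3.3 Detectron2
Chapter 4 Experimental Results
4.1 Development Environment
4.2 Feature Extraction Results and Evaluation
4.2.1 Evaluation Metrics
4.2.2 Keypoint Evaluation
4.2.3 Contour Evaluation
4.2.4 Bounding Box Evaluation
4.2.5 Overall Evaluation
4.2.6 Hand Localization
4.3 Hand Shape Recognition
Chapter 5 Future Work
5.1 Conclusion
5.2 Future Work
References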

Advisor

蘇柏齊

Approval Date

2020-8-10