Sign Language Word Recognition Algorithm Based on the Channel-wise Topology Refinement Graph Convolutional Network

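Name

Chih-Fu Tung (董致輔)

Department

Department of Computer Science and Information Engineering

Thesis title

Sign Language Word Recognition Algorithm Based on the Channel-wise Topology Refinement Graph Convolutional Network
(A CTRGCN-based model for Isolated Sign Language Recognition)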

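This electronic thesis is authorized for immediate open access.
The open-access full text is licensed to users only for personal, non-profit searching, reading, and printing for the purpose of academic research.
Please comply with the Copyright Act of the Republic of China. Do not reproduce, distribute, adapt, repost, or broadcast this work without authorization, so as to avoid violating the law.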

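Abstract (Chinese)

In recent years, the hearing-impaired population has gradually grown, and public demand for learning sign language has risen year by year. However, sign language is difficult to learn and learning resources are limited, which makes learning it a challenging task.
To address this problem, this thesis proposes a skeleton-based sign language word recognition algorithm built on the Channel-wise Topology Refinement Graph Convolutional Network (CTRGCN). For sign language word recognition, we design an improved CTRGCN model and propose a multi-branch architecture to raise recognition accuracy. We train the model on the WLASL100 dataset and compare it with existing models. The results show that our method outperforms existing techniques in most settings, demonstrating its potential and practicality for sign language word recognition. We hope it will provide more help for sign language learning.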
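The multi-branch design mentioned in the abstract implies that the outputs of several branches are combined into one prediction. This record does not say how the thesis merges them, so the following is only a minimal sketch of one common option, weighted averaging of per-branch softmax scores. The branch names (`joint`, `bone`), the function `fuse_branches`, the equal default weights, and the random logits are all hypothetical placeholders. The only detail taken from the source is the 100-class setting of WLASL100.

```python
import numpy as np


def softmax(logits):
    """Numerically stable softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def fuse_branches(branch_logits, weights=None):
    """Average per-branch class probabilities (score-level fusion).

    branch_logits: dict mapping a branch name to an array of shape
                   (num_clips, num_classes).
    weights:       optional dict of non-negative branch weights;
                   equal weights are assumed when omitted.
    """
    names = sorted(branch_logits)
    if weights is None:
        weights = {name: 1.0 for name in names}
    total = sum(weights[name] for name in names)
    # Weighted mean of probabilities, so each fused row still sums to 1.
    return sum(weights[name] / total * softmax(branch_logits[name])
               for name in names)


# Placeholder logits standing in for real branch outputs (assumption).
rng = np.random.default_rng(0)
num_clips, num_classes = 4, 100  # WLASL100 has 100 sign classes
logits = {
    "joint": rng.normal(size=(num_clips, num_classes)),
    "bone": rng.normal(size=(num_clips, num_classes)),
}

fused = fuse_branches(logits)
predictions = fused.argmax(axis=-1)  # one predicted class index per clip
```

Averaging probabilities rather than raw logits keeps branches with different output scales comparable. The weights could be tuned on a validation set if one branch turns out to be more reliable than the others.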

Abstract (English)

In recent years, the population of hearing-impaired individuals has gradually increased, and public demand for learning sign language has risen steadily as well. However, sign language is difficult to learn and learning resources are limited, which makes learning it a challenging task.
To address this issue, this thesis proposes a skeleton-based sign language word recognition algorithm built on the Channel-wise Topology Refinement Graph Convolutional Network (CTRGCN). The method tackles the challenges of sign language word recognition by designing an improved CTRGCN model that enhances recognition accuracy. We trained the model on the WLASL100 dataset and compared it with existing models. The results demonstrate that our method outperforms existing techniques in most scenarios, showing its potential and practicality for sign language word recognition. We hope this approach will provide more assistance for sign language learning.

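Keywords (Chinese)

★ Deep learning
★ Skeleton recognition
★ Sign language word recognition
★ Graph convolutional neural network

Keywords (English)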

★ Deep learning
★ Skeleton recognition
★ Sign language recognition
★ Graph convolutional neural network

Table of contents

1. Introduction
1.1 Research Motivation
1.2 Research Objectives
1.3 Thesis Organization
2. Background and Literature Review
2.1 Background
2.1.1 Types of Sign Languages
2.1.2 Categories of Sign Language Recognition
2.1.3 Introduction to Graph Convolution (GCN)
2.2 Literature Review
2.2.1 Related Work on Keypoint Detection
2.2.2 Related Work on Skeleton-Based Action Recognition
2.2.3 Related Work on 3D CNN-Based Video Recognition
2.2.4 Related Work on Skeleton-Based Sign Language Word Recognition
3. Methodology
3.1 System Architecture
3.2 Preprocessing
3.3 Model Architecture
3.3.1 CTRGCN Model
3.3.2 Modified CTRGCN Model
3.3.3 Multi-Branch Architecture
3.3.4 Method for Merging Model Results
3.3.5 Fusing RGB Results
4. Experimental Design and Results
4.1 Dataset
4.2 Experimental Setup
4.3 Evaluation of Experimental Results
4.3.1 Comparison of Additional Branch Results
4.3.2 Comparison of Branch-Merging Methods
4.3.3 Effect of Reducing the Number of Layers
4.3.4 Effect of the Module Modifications
4.3.5 Comparison of Branch Combinations
4.3.6 Comparison of Parameters
4.3.7 Comparison with Existing Sign Language Word Recognition Models
5. Conclusion
5.1 Conclusions
5.2 Future Work
References

Advisor

Mu-Chun Su (蘇木春)

Approval date

2024-8-12
