Master's/Doctoral Thesis 108522091 — Detailed Record




Author: Shu-Yu Cheng (鄭書伃)    Department: Computer Science and Information Engineering, National Central University
Title: A Hierarchical Self-Organizing Maps-based Sign Language Recognition Algorithm
(Chinese title: 基於多層自組織映射圖之手語辨識演算法)
Related theses
★ A Q-learning-based swarm intelligence algorithm and its applications
★ Development of a rehabilitation system for children with developmental delays
★ Comparing teacher assessment and peer assessment from the perspective of cognitive style: from English writing to game design
★ A prediction model for diabetic nephropathy based on laboratory test values
★ Design of a fuzzy neural network-based classifier for remote sensing images
★ A hybrid clustering algorithm
★ Development of assistive devices for people with disabilities
★ A study on fingerprint classifiers
★ A study on backlit image compensation and color quantization
★ Application of neural networks to profit-seeking enterprise income tax case selection
★ A new online learning system and its application to tax case selection
★ An eye-tracking system and its application to human-computer interfaces
★ Data visualization combining swarm intelligence and self-organizing maps
★ Development of a pupil-tracking system as a human-computer interface for people with disabilities
★ An artificial immune system-based online learning neuro-fuzzy system and its applications
★ Application of genetic algorithms to voice descrambling
  1. The author has agreed to make this electronic thesis openly available immediately.
  2. The released full text is licensed to users only for personal, non-commercial retrieval, reading, and printing for academic research purposes.
  3. Please comply with the relevant provisions of the Copyright Act of the Republic of China; do not reproduce, distribute, adapt, repost, or broadcast the work without authorization.

Abstract (Chinese) Sign language recognition can benefit many people with hearing and speech impairments and bridge the communication gap between them and their families and friends. Over the years, deep learning has achieved great success in the field of sign language recognition. There are many methods for extracting features of hand skeletons or signs, and many studies use these features as input to deep neural networks (DNNs) to recognize sign language. However, the efficiency of feature extraction and the accuracy of sign language recognition still have room for improvement. In this thesis, we propose a new sign language recognition algorithm that first uses hierarchical self-organizing maps (SOMs) to convert dynamic sign language into static response maps. Since convolutional neural networks (CNNs) have exceptional performance in image classification, we feed these static response maps into a CNN as features to accomplish sign language recognition.

We selected 36 words from the American Sign Language Lexicon Video Dataset (ASLLVD) as our dataset to test the effectiveness of the proposed algorithm, achieving a recognition accuracy of 78.57%.
Abstract (English) Sign language recognition can benefit many deaf and hard-of-hearing people and bridge the communication gap between them and their families and friends. For many years, deep learning has achieved great results in the field of sign language recognition. There are many methods for extracting features of hand shapes or signs, and many studies use these features as input to deep neural networks (DNNs) for sign language recognition. However, the efficiency of feature extraction and the recognition accuracy still have room for improvement. In this study, we propose a novel algorithm for sign language recognition. The algorithm first uses hierarchical self-organizing maps (SOMs) to convert dynamic sign language into static response maps. Since convolutional neural networks (CNNs) have extraordinary performance in image classification, we take the static response maps as input features to a CNN to achieve sign language recognition.

We selected 36 signs from the American Sign Language Lexicon Video Dataset (ASLLVD) as our dataset to test the effectiveness of the proposed algorithm, reaching a recognition accuracy of 78.57% on this dataset.
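The pipeline sketched in the abstract — quantizing per-frame hand features with a SOM, then accumulating each frame's best-matching unit into a single static "response map" image — can be illustrated with a minimal single-layer SOM. This is a simplified sketch, not the thesis's hierarchical/fast SOM: the function names, the 8×8 grid, and the synthetic 42-dimensional frame vectors (standing in for 21 MediaPipe (x, y) hand landmarks) are all illustrative assumptions.

```python
import numpy as np

def train_som(data, grid=(8, 8), epochs=20, lr0=0.5, sigma0=2.0, seed=0):
    """Train a 2-D self-organizing map on feature vectors (rows of `data`)."""
    rng = np.random.default_rng(seed)
    h, w = grid
    weights = rng.random((h * w, data.shape[1]))
    # Grid coordinates of every neuron, used by the neighborhood function.
    coords = np.array([(i, j) for i in range(h) for j in range(w)], dtype=float)
    n_steps = epochs * len(data)
    t = 0
    for _ in range(epochs):
        for x in rng.permutation(data):
            bmu = np.argmin(((weights - x) ** 2).sum(axis=1))  # best-matching unit
            lr = lr0 * np.exp(-t / n_steps)                    # decaying learning rate
            sigma = sigma0 * np.exp(-t / n_steps)              # shrinking neighborhood
            d2 = ((coords - coords[bmu]) ** 2).sum(axis=1)
            theta = np.exp(-d2 / (2 * sigma ** 2))             # Gaussian neighborhood
            weights += (lr * theta)[:, None] * (x - weights)
            t += 1
    return weights.reshape(h, w, -1)

def response_map(weights, sequence):
    """Accumulate BMU hits of a temporal sequence into one static 2-D map."""
    h, w, d = weights.shape
    flat = weights.reshape(-1, d)
    hits = np.zeros(h * w)
    for x in sequence:
        hits[np.argmin(((flat - x) ** 2).sum(axis=1))] += 1
    return (hits / hits.max()).reshape(h, w)  # normalize to [0, 1]

# Synthetic stand-in for per-frame hand-landmark feature vectors.
rng = np.random.default_rng(1)
frames = rng.random((200, 42))            # e.g. 21 (x, y) MediaPipe landmarks
som = train_som(frames, grid=(8, 8))
rmap = response_map(som, frames[:40])     # one "sign" clip -> static image
print(rmap.shape)                         # (8, 8)
```

The resulting normalized map can then be treated as a small grayscale image and classified with an ordinary CNN, which is the role the thesis assigns to the CNN stage.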
Keywords (Chinese) ★ Sign language recognition
★ Self-organizing maps
★ Deep learning
Keywords (English) ★ Sign language recognition
★ Self-organizing maps
★ Deep learning
Contents

Abstract
Contents
List of Figures
List of Algorithms
List of Tables
1 Introduction
1.1 Introduction
2 Background
2.1 Related Works of Sign Language Recognition
2.2 Review of Unsupervised Learning Methods
2.2.1 K-means Clustering
2.2.2 Principal Component Analysis
2.2.3 Singular Value Decomposition
2.2.4 Independent Component Analysis
2.3 Review of Self-Organizing Maps
2.4 Review of Convolutional Neural Networks
2.4.1 Convolution Layers
2.4.2 Activation Layers
2.4.3 Pooling Layers
2.4.4 Batch Normalization
2.4.5 Fully Connected Layers
3 The Proposed Algorithm
3.1 Datasets
3.2 Data Preprocessing
3.2.1 Hand Detection Using MediaPipe
3.3 The Flowchart of the Proposed Algorithm
3.4 Network Configuration
3.4.1 The Architecture of Fast Self-Organizing Maps
3.4.2 The Architecture of Convolutional Neural Networks
4 Results and Discussion
4.1 Experimental Definition and Premise
4.2 Experiments
4.3 Summary and Discussion
5 Conclusions and Perspectives
Advisor: Mu-Chun Su (蘇木春)    Approval date: 2021-08-16
