Master's/Doctoral Thesis 109552029: Detailed Record




Author: Ya-Hsin Cheng (鄭雅馨)
Department: Department of Computer Science and Information Engineering, In-Service Master's Program
Thesis Title: Deep Learning Based Speech Personal Name Recognition System (基於深度學習的語音人名辨識系統)
Related Theses
★ An Intelligent Controller Development Platform Integrating the GRAFCET Virtual Machine
★ Design and Implementation of a Distributed Industrial Electronic Kanban Network System
★ Design and Implementation of a Dual-Point Touch Screen Based on a Two-Camera Vision System
★ An Embedded Computing Platform for Intelligent Robots
★ An Embedded System for Real-Time Moving Object Detection and Tracking
★ A Multiprocessor Architecture and Distributed Control Algorithms for Solid-State Drives
★ A Human-Computer Interaction System Based on Stereo-Vision Gesture Recognition
★ Robot System-on-Chip Design Integrating Bio-Inspired Intelligent Behavior Control
★ Design and Implementation of an Embedded Wireless Image Sensing Network
★ A License Plate Recognition System Based on a Dual-Core Processor
★ Continuous 3D Gesture Recognition Based on Stereo Vision
★ Design and Hardware Implementation of a Miniature, Ultra-Low-Power Wireless Sensor Network Controller
★ Real-Time Face Detection, Tracking and Recognition in Streaming Video: An Embedded System Design
★ Embedded Hardware Design of a Fast Stereo Vision System
★ Design and Implementation of a Real-Time Continuous Image Stitching System
★ An Embedded Gait Recognition System Based on a Dual-Core Platform
Files: full text viewable in the system after 2027-6-25
Abstract (Chinese)  Company switchboard operators frequently need to transfer customer calls to colleagues, a task that is both time-consuming and error-prone. This study uses named entity recognition to automatically extract personal names from speech, and combines a double-array trie and the Aho-Corasick algorithm with an edit-distance measure to match the extracted names against the names of company employees. Using precision, recall, and F1 score as evaluation metrics, we evaluate speech personal name recognition on speech from different sources and different types of corpora. Finally, we designed a speech personal name recognition system that simulates the call-transfer function of a small company in order to verify its recognition performance. The experimental results show an accuracy of 90.2% for recognizing current employees, 88.32% for recognizing that the requested person is not an employee, and 89.73% overall. The results can be applied to automatic call transfer at a company switchboard.

Keywords: speech personal name recognition; automatic speech recognition; named entity recognition; double-array trie; Aho-Corasick algorithm
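The evaluation metrics named in the abstract (precision, recall, and F1 score) can be summarized in a short sketch; the counts below are illustrative examples only, not figures from the thesis.

```python
# Minimal sketch of the evaluation metrics used in the abstract (precision, recall, F1).
# The example counts are hypothetical and are not the thesis's data.

def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Precision = TP/(TP+FP); Recall = TP/(TP+FN); F1 = harmonic mean of the two."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Hypothetical counts: 90 names correctly extracted, 5 spurious names, 10 missed names.
print(precision_recall_f1(tp=90, fp=5, fn=10))  # -> (0.947..., 0.9, 0.923...)
```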
Abstract (English)  Company switchboard operators often need to transfer customer calls to colleagues, a task that is both time-consuming and error-prone. In this study, named entity recognition is used to automatically extract personal names from speech, and a double-array trie and the Aho-Corasick algorithm are combined with an edit-distance measure to match the extracted names against the names of company employees. We use precision, recall, and F1 score to evaluate speech personal name recognition on speech from different sources and different types of corpora. Finally, we designed a speech personal name recognition system that simulates the call-transfer function of a small company to verify its recognition performance. The experimental results show an accuracy of 90.2% for identifying current employees, 88.32% for recognizing that the requested person is not an employee, and 89.73% overall. The results can be applied to automatic call transfer at a company switchboard.

Keywords: speech personal name recognition; automatic speech recognition; named entity recognition; double-array trie; Aho-Corasick algorithm
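As a rough illustration of the matching stage described in the abstract, the sketch below pairs an exact directory lookup (a plain set standing in for the double-array trie and Aho-Corasick automaton) with a Levenshtein edit-distance fallback. The employee names, the candidate name, and the max_distance threshold are hypothetical, not taken from the thesis.

```python
# Minimal sketch of the name-matching stage, assuming a hypothetical employee
# directory and a single NER-extracted candidate name. A plain set stands in for
# the double-array trie / Aho-Corasick automaton described in the abstract; only
# the Levenshtein edit-distance fallback is shown in full.

def levenshtein(a: str, b: str) -> int:
    """Edit distance between strings a and b (insert/delete/substitute, cost 1 each)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]

def match_employee(candidate: str, directory: set[str], max_distance: int = 1):
    """Return the closest employee name, or None if nobody is within max_distance."""
    if candidate in directory:                     # exact dictionary hit
        return candidate
    best = min(directory, key=lambda name: levenshtein(candidate, name))
    return best if levenshtein(candidate, best) <= max_distance else None

# Hypothetical directory; "王小名" is an ASR/NER variant of the employee "王小明".
directory = {"王小明", "李大華", "陳美惠"}
print(match_employee("王小名", directory))  # -> 王小明
print(match_employee("林志玲", directory))  # -> None (no such employee)
```

In the full system, the automaton would scan the whole ASR transcript for dictionary names rather than a single pre-extracted candidate, and (as the thesis's Zhuyin conversion analysis suggests) the comparison could be done on phonetic representations instead of raw characters; both refinements are beyond this sketch.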
Table of Contents
Abstract (Chinese) i
Abstract (English) ii
Table of Contents iii
List of Figures v
List of Tables vii
Chapter 1: Introduction 1
1.1 Research Background 1
1.2 Research Objectives 1
1.3 Thesis Organization 3
Chapter 2: Literature Review 4
2.1 Chinese Word Segmentation and Recognition Techniques 4
2.1.1 Hidden Markov Model (HMM) 4
2.1.2 Conditional Random Fields (CRFs) 5
2.1.3 Differences between HMM and CRF 6
2.1.4 Part-of-Speech Tagging (POS tagging) 6
2.1.5 Named Entity Recognition (NER) 8
2.2 Deep Learning Techniques for NLP 9
2.2.1 Recurrent Neural Network (RNN) 10
2.2.2 Long Short-Term Memory (LSTM) 11
2.2.3 Gated Recurrent Unit (GRU) 12
2.2.4 Sequence-to-Sequence Models (seq2seq) 13
2.2.5 Attention Mechanism 14
Chapter 3: Speech Personal Name Recognition System Design 17
3.1 MIAT Methodology 17
3.1.1 IDEF0 Hierarchical Modular Design 17
3.1.2 GRAFCET Discrete-Event Modeling 19
3.2 Speech Personal Name Recognition System Architecture 21
3.3 Speech Recognition 22
3.4 Personal Name Recognition 25
3.4.1 Bidirectional Gated Recurrent Unit (Bi-GRU) 26
3.4.2 CRF Word Segmentation and Sequence Labeling 27
3.5 Personal Name Matching 29
3.5.1 Double-Array Trie and Aho-Corasick Algorithm 30
3.5.2 Levenshtein Distance 32
Chapter 4: Experiments 37
4.1 Speech Recognition Experiments 37
4.1.1 Evaluation Method 40
4.1.2 Speech Recognition Results 41
4.2 Personal Name Recognition Experiments 43
4.2.1 Evaluation Method 45
4.2.2 Personal Name Recognition Results 45
4.3 Zhuyin Conversion Analysis 46
4.3.1 Evaluation Method 48
4.3.2 Zhuyin Conversion Results 48
4.4 MIAT Speech Personal Name Recognition 49
4.4.1 Evaluation Method 50
4.4.2 MIAT Speech Personal Name Recognition Results 51
4.5 Discussion 52
4.5.1 Analysis of Unrecognized Names 52
4.5.2 Analysis of Scores Above or Below the Threshold 53
Chapter 5: Conclusions and Future Work 57
5.1 Conclusions 57
5.2 Future Work 58
References 59
Advisor: 陳慶瀚    Date of Approval: 2022-7-4