Code-Switching Data Augmentation for Chinese-English Speech Recognition

Name

王騰輝 (Wang Tenghui)

Department

Department of Computer Science and Information Engineering

Thesis Title

Code-Switching Data Augmentation for Chinese-English Speech Recognition
(語碼轉換文本資料增強於中英混合語音辨識)

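Abstract (translated from Chinese)

Code-switching is a common mode of expression in which two or more languages are used alternately within a conversation. Because Chinese and English are both major world languages, mixed Chinese-English speech is an integral part of everyday conversation. However, the composition of code-switched speech varies with the speaker's community, environment, and speaking habits.
Training corpora of code-switched text remain scarce in speech recognition research. This thesis therefore trains a neural-network generator to produce code-switched text, which is used as data-augmentation material for the language model built with the Kaldi toolkit, with the aim of improving recognition accuracy on mixed Chinese-English speech.
The method is based on the Chinese-English text of the SEAME corpus. A BERT-BiLSTM-CRF model is trained to identify code-switching positions, and the positions in monolingual Chinese text that should be code-switched are then translated into English, producing sentences that match the characteristics of this corpus.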

Abstract (English)

Code-switching is a common mode of language expression in which two or more languages are interleaved within a conversation. Chinese and English are both major world languages, and mixed Chinese-English expressions are indispensable in daily conversation. However, differences in ethnic group, environment, and speaking habits all affect the characteristics of code-switching.
Training corpora of code-switched text remain scarce in speech recognition research. This thesis therefore trains a neural-network generator to produce code-switched text and uses it as training data for the Kaldi toolkit language model, with the aim of improving the accuracy of Chinese-English speech recognition.
The method is based on the Chinese-English code-switching text in the SEAME corpus. We train a BERT-BiLSTM-CRF model to identify code-switching positions and then translate the Chinese words at those positions into English, thereby generating sentences that match the characteristics of this corpus.
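The generation step described in the abstract can be illustrated with a minimal sketch. It assumes the tagger has already produced one label per Chinese token; the label names `SW` and `O`, the function `generate_code_switched`, and the toy `lexicon` are hypothetical and chosen only for illustration. The record does not specify the thesis's label scheme or translation component, so a plain dictionary lookup stands in for the translation step here.

```python
from typing import Dict, List

# Hypothetical label scheme (an assumption, not taken from the thesis):
SWITCH = "SW"  # token is predicted to be a code-switching position
KEEP = "O"     # token stays in Chinese


def generate_code_switched(
    tokens: List[str], labels: List[str], lexicon: Dict[str, str]
) -> List[str]:
    """Replace tokens tagged as switch positions with an English equivalent.

    The lexicon is a toy stand-in for a translation step. Tokens without
    a lexicon entry are left unchanged.
    """
    if len(tokens) != len(labels):
        raise ValueError("tokens and labels must have the same length")
    output = []
    for token, label in zip(tokens, labels):
        if label == SWITCH and token in lexicon:
            output.append(lexicon[token])
        else:
            output.append(token)
    return output


# Toy example: labels are written by hand here; in practice they would
# come from a trained sequence tagger.
tokens = ["我", "明天", "有", "會議"]
labels = [KEEP, KEEP, KEEP, SWITCH]
lexicon = {"會議": "meeting", "明天": "tomorrow"}
mixed = generate_code_switched(tokens, labels, lexicon)
print(" ".join(mixed))
```

Sentences generated this way would then be appended to the language-model training text. Note that only tagged positions are translated, so "明天" stays in Chinese even though the lexicon contains it.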

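Keywords (translated from Chinese)

★ Code-switching
★ Speech recognition
★ Bidirectional Encoder Representations from Transformers
★ Natural language processing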

Keywords (English)

★ Code-Switching
★ Speech Recognition
★ BERT
★ NLP

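Table of Contents

Abstract (Chinese)
Abstract (English)
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
1.1 Research Motivation
1.2 Research Direction
1.3 Chapter Overview
Chapter 2 Background Knowledge
2.1 Introduction to Speech Recognition Systems
2.1.1 Feature Vector Extraction
2.1.2 Acoustic Model
2.1.3 Language Model
2.1.4 Pronunciation Lexicon
2.1.5 Speech Recognition Decoding Procedure
2.2 Introduction to Deep Learning
2.2.1 Perceptron
2.2.2 Recurrent Neural Network
2.2.3 Long Short-Term Memory Network
2.2.4 Bidirectional Long Short-Term Memory Network
Chapter 3 Code-Switching Text Generation Method
3.1 Introduction to Code-Switching
3.2 Corpus
3.3 Overview of the Generation Method
3.3.1 Experimental Model of the Main Method
3.3.2 Comparison Method
Chapter 4 Experiments
4.1 Experimental Procedure and Evaluation Methods
4.1.1 Confusion Matrix
4.1.2 Word Error Rate Statistics
4.2 Experimental Results
4.2.1 Experiment 1 Data
4.2.2 Experiment 1 Analysis
4.2.3 Experiment 2 Data
4.2.4 Experiment 2 Analysis
Chapter 5 Conclusions and Future Work
Chapter 6 References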

Advisor

王家慶 (Wang Jia-Ching)

Review Date

2022-1-25
