Thesis/Dissertation 105521050: Detailed Record




Name  郝平正 (Ping-Cheng Hao)   Department  Department of Electrical Engineering
Thesis title  Self-defined Wake-Up-Word Recognition and its Embedded System Implementation
(Original title: 離線自定義語音語者喚醒詞系統與嵌入式開發實現)
Files  Full text in the system: never open to the public
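Abstract (Chinese)  This thesis proposes an offline, self-defined voice and speaker wake-up-word system that lets users define their own spoken wake-up word and use it to wake a device. The system runs in two phases: a training phase and a testing-comparison phase. In the training phase, the user defines and records a wake-up word in any language. Voice activity detection crops out the speech segment, and Mel-frequency cepstral processing serves as the speech front end, extracting acoustic features for later use. The EM algorithm for Gaussian mixture models then trains these features into a voiceprint model, while the Baum-Welch algorithm for a hidden Markov model with Gaussian emissions trains the corresponding speech sequence. Together, the two form the data model for a specific speaker's speech.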
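The cropping step in the training phase can be illustrated with a simple short-time-energy detector. This is only a minimal sketch: the record does not say which voice activity detector the thesis uses, so the energy rule is a stand-in, and the names `frame_energies` and `crop_speech`, the frame length, and the threshold ratio are all assumptions made for illustration.

```python
def frame_energies(samples, frame_len):
    """Mean squared amplitude of each non-overlapping frame."""
    return [
        sum(x * x for x in samples[i:i + frame_len]) / frame_len
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def crop_speech(samples, frame_len=4, ratio=0.1):
    """Return the samples from the first to the last 'active' frame.

    A frame counts as active when its energy exceeds ratio * max energy.
    Both parameters are illustrative assumptions, not values from the thesis.
    """
    energies = frame_energies(samples, frame_len)
    if not energies or max(energies) == 0:
        return []  # silence only: nothing to crop
    threshold = ratio * max(energies)
    active = [i for i, e in enumerate(energies) if e > threshold]
    start = active[0] * frame_len
    end = (active[-1] + 1) * frame_len
    return samples[start:end]


# Toy signal: silence, a burst of "speech", then silence again.
signal = [0.0] * 8 + [0.5, -0.6, 0.7, -0.5, 0.6, -0.4, 0.5, -0.6] + [0.0] * 8
cropped = crop_speech(signal)
```

On real audio the frame length would be tens of milliseconds and the cropped segment would then go on to feature extraction.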
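In the comparison phase, an arbitrary speech segment is input and its features are extracted with the same Mel-frequency cepstral processing. The log probability under the Gaussian mixture models is used to find the matching speaker in the dataset, and the Viterbi algorithm for the hidden Markov model then computes the state sequence of the unknown speech. Finally, the Gaussian mixture model similarity is computed, and the edit distance algorithm measures how well the state sequence of the unknown speech matches that of the enrolled speech. If the threshold is passed, the device is woken successfully.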
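The state-sequence comparison can be sketched with the standard Levenshtein (edit) distance. The normalization by the longer sequence length and the helper name `sequence_match` are assumptions for illustration; the record does not state how the thesis turns the distance into a score or what threshold it applies.

```python
def levenshtein(a, b):
    """Edit distance between two sequences (insert, delete, substitute cost 1)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def sequence_match(a, b):
    """Similarity in [0, 1]: 1 minus the distance over the longer length."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


# Hypothetical HMM state sequences for the enrolled and the unknown utterance.
enrolled = [0, 0, 1, 1, 2, 2, 3]
unknown = [0, 1, 1, 2, 2, 2, 3]
score = sequence_match(enrolled, unknown)
```

A wake-up decision would then compare `score` against a tuned threshold, alongside the speaker-similarity check.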
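The system achieves accurate results with a small amount of training data. In the comparison phase, searching for the voiceprint first and only then comparing the speech avoids the time the hidden Markov model algorithms would otherwise spend exhaustively evaluating the whole dataset. Finally, the system is implemented on an embedded development board to evaluate and verify its performance. The results show that the system achieves high accuracy and a low false wake-up rate while running in real time.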
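The voiceprint-first search can be sketched as follows: score the input against every enrolled Gaussian mixture model, keep the best speaker, and only then run the sequence comparison for that one speaker. This is a minimal sketch under assumptions: diagonal covariances, the tuple layout of each model, and the names `gmm_log_likelihood` and `pick_speaker` are illustrative and not taken from the thesis.

```python
import math


def gmm_log_likelihood(frames, gmm):
    """Average per-frame log-likelihood under a diagonal-covariance GMM.

    gmm is a list of (weight, means, variances) tuples (assumed layout).
    """
    total = 0.0
    for x in frames:
        comp_logs = []
        for weight, means, variances in gmm:
            log_p = math.log(weight)
            for xi, m, v in zip(x, means, variances):
                log_p += -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
            comp_logs.append(log_p)
        peak = max(comp_logs)  # log-sum-exp for numerical stability
        total += peak + math.log(sum(math.exp(c - peak) for c in comp_logs))
    return total / len(frames)


def pick_speaker(frames, voiceprints):
    """Stage 1: return the enrolled speaker whose GMM scores highest."""
    return max(voiceprints,
               key=lambda name: gmm_log_likelihood(frames, voiceprints[name]))


# Hypothetical two-speaker database with 2-D features.
voiceprints = {
    "alice": [(0.5, [0.0, 0.0], [1.0, 1.0]), (0.5, [1.0, 1.0], [1.0, 1.0])],
    "bob": [(1.0, [5.0, 5.0], [1.0, 1.0])],
}
frames = [[4.8, 5.1], [5.2, 4.9]]
best = pick_speaker(frames, voiceprints)
```

With N enrolled speakers, stage 2 then decodes against one hidden Markov model instead of N, which is the saving the abstract describes.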
Abstract (English)  We propose a self-defined wake-up-word recognition system and its embedded system implementation. The system runs in two phases: a training phase and a testing-comparison phase. In the training phase, a wake-up word in any language is recorded, Voice Activity Detection cuts out the voice segment, and Mel-Frequency Cepstral Coefficients are extracted as speech features for later use. The Expectation-Maximization algorithm is used to train the Gaussian Mixture Model, and the Baum-Welch algorithm is used to train the Hidden Markov Model. Together, these two models form the data model for one speaker's speech.
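For reference, the textbook form of the Gaussian Mixture Model and its Expectation-Maximization updates is written out below. The symbols are assumptions introduced here, not notation from the thesis: $x_t$ is the feature vector of frame $t$ (of $T$ frames), and $w_k$, $\mu_k$, $\Sigma_k$ are the weight, mean, and covariance of component $k$ (of $K$).

```latex
% Mixture density
p(x_t \mid \lambda) = \sum_{k=1}^{K} w_k \, \mathcal{N}(x_t ; \mu_k, \Sigma_k),
\qquad \sum_{k=1}^{K} w_k = 1

% E-step: responsibility of component k for frame t
\gamma_{t,k} = \frac{w_k \, \mathcal{N}(x_t ; \mu_k, \Sigma_k)}
                    {\sum_{j=1}^{K} w_j \, \mathcal{N}(x_t ; \mu_j, \Sigma_j)}

% M-step: re-estimate the parameters from the responsibilities
w_k = \frac{1}{T} \sum_{t=1}^{T} \gamma_{t,k}, \qquad
\mu_k = \frac{\sum_{t=1}^{T} \gamma_{t,k} \, x_t}{\sum_{t=1}^{T} \gamma_{t,k}}, \qquad
\Sigma_k = \frac{\sum_{t=1}^{T} \gamma_{t,k} \, (x_t - \mu_k)(x_t - \mu_k)^{\top}}
                {\sum_{t=1}^{T} \gamma_{t,k}}
```

The two steps alternate until the log-likelihood stops improving; the Baum-Welch algorithm applies the same idea to the Hidden Markov Model parameters.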
In the testing-comparison phase, an unknown voice segment is input. Voice Activity Detection and Mel-Frequency Cepstral Coefficients are again used to cut the segment and extract features. The log-likelihood of these features under the Gaussian Mixture Models is then used to find the corresponding speaker, and the Viterbi algorithm computes the state sequence of the unknown speech through the Hidden Markov Model. Finally, we calculate the Gaussian Mixture Model similarity and use the Levenshtein Distance to compare the enrolled state sequence with the state sequence of the unknown speech. If both pass their thresholds, the wake-up succeeds; otherwise it fails.
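The Viterbi step that yields the state sequence can be sketched in the log domain. This is a minimal sketch, assuming the per-frame emission log-likelihoods have already been computed (the thesis uses Gaussian emissions; here they are supplied as a toy table), and the function and variable names are hypothetical.

```python
import math


def viterbi(obs_log_probs, log_start, log_trans):
    """Most likely state path through an HMM.

    obs_log_probs[t][s]: log-likelihood of frame t under state s.
    log_start[s]: log initial probability; log_trans[r][s]: log P(s | r).
    """
    n_states = len(log_start)
    score = [log_start[s] + obs_log_probs[0][s] for s in range(n_states)]
    back = []
    for frame in obs_log_probs[1:]:
        pointers, new_score = [], []
        for s in range(n_states):
            best_prev = max(range(n_states),
                            key=lambda r: score[r] + log_trans[r][s])
            pointers.append(best_prev)
            new_score.append(score[best_prev] + log_trans[best_prev][s] + frame[s])
        back.append(pointers)
        score = new_score
    # Backtrack from the best final state.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]


def log(p):
    """Log that maps probability 0 to -inf instead of raising."""
    return math.log(p) if p > 0 else float("-inf")


# Toy left-to-right model with two states and four frames.
log_start = [log(1.0), log(0.0)]
log_trans = [[log(0.5), log(0.5)], [log(0.0), log(1.0)]]
obs = [[log(0.9), log(0.1)], [log(0.8), log(0.2)],
       [log(0.1), log(0.9)], [log(0.2), log(0.8)]]
path = viterbi(obs, log_start, log_trans)
```

The resulting `path` is the kind of state sequence that would be compared against the enrolled sequence with the Levenshtein Distance.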
This system works well with a small amount of training data and is implemented on an embedded board to test its performance. The results show that the system achieves high accuracy and a low false alarm rate in real-time operation.
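False alarms of this kind are commonly summarized as a false acceptance rate (FAR) and a false rejection rate (FRR) at a chosen threshold. The sketch below shows one way to compute them; the record does not give the thesis's exact metric definitions, and the scores, the threshold, and the name `wake_rates` are made up for illustration and are not results from the thesis.

```python
def wake_rates(trials, threshold):
    """Return (far, frr) for a list of (score, is_genuine) trials.

    FAR: share of impostor trials accepted (false wake-ups).
    FRR: share of genuine trials rejected (missed wake-ups).
    """
    genuine = [s for s, g in trials if g]
    impostor = [s for s, g in trials if not g]
    far = sum(s >= threshold for s in impostor) / len(impostor)
    frr = sum(s < threshold for s in genuine) / len(genuine)
    return far, frr


# Illustrative scores only: four genuine and four impostor attempts.
trials = [(0.9, True), (0.8, True), (0.6, True), (0.4, True),
          (0.7, False), (0.3, False), (0.2, False), (0.1, False)]
far, frr = wake_rates(trials, threshold=0.5)
```

Raising the threshold lowers FAR at the cost of a higher FRR, so the operating point is a trade-off between false wake-ups and missed wake-ups.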
Keywords (Chinese) ★ Self-defined wake-up word
★ Mel cepstral coefficients
★ Gaussian mixture model
★ Hidden Markov model
★ Edit distance
★ Embedded system
Keywords (English) ★ Customized Wake-Up-Word
★ Mel-Frequency Cepstral Coefficients
★ Gaussian Mixture Model
★ Hidden Markov Model
★ Levenshtein Distance
★ Embedded System
Table of contents  Acknowledgments
Abstract (Chinese)
Abstract
Table of contents
List of Figures
List of Tables
Chapter I Introduction
1.1 Background Introduction
1.2 Motivation
1.3 Thesis Organization
Chapter II Basic Knowledge
2.1 Voice Activity Detection
2.2 Mel-Frequency Cepstral Coefficients
2.3 Gaussian Mixture Model
2.4 Hidden Markov Model
2.5 Levenshtein Distance
Chapter III Related Work
Chapter IV System Architecture
4.1 Overall Flow Chart
4.2 Training Step
4.3 Testing Step
Chapter V Experiment Result
5.1 Embedded Board Information
5.2 Result
Chapter VI Conclusion
Reference
Advisor  蔡宗漢 (Tsung-Han Tsai)   Approval date  2019-11-5

For questions about this thesis, please contact the Promotion Services Section of the National Central University Library, TEL: (03)422-7151 ext. 57407, or by e-mail.