Name: 凌欣暉 (Xing-Hung Lan)
Department: Electrical Engineering
Thesis Title: A Study of Robust Speech Recognition and Speaker Verification (強健性語音辨識及語者確認之研究)
- Access to this electronic thesis: approved for immediate open access.
- The released electronic full text is licensed to users only for personal, non-commercial searching, reading, and printing for the purpose of academic research.
- Please comply with the Copyright Act of the Republic of China; do not reproduce, distribute, adapt, repost, or broadcast this work without authorization.
Abstract (Chinese): This thesis consists of three parts: keyword spotting, feature statistics normalization, and speaker verification. For keyword spotting, right-context-dependent sub-syllable phone models are concatenated to build the keyword and filler models.
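As a rough illustration of the keyword/filler decision idea (not the thesis's exact implementation), the sketch below scores a keyword hypothesis by the frame-normalized log-likelihood ratio between the keyword-model path and the competing filler-model path; the function names and the default threshold are hypothetical.

```python
def keyword_confidence(keyword_loglik: float, filler_loglik: float, num_frames: int) -> float:
    """Frame-normalized log-likelihood ratio between the keyword-model path
    and the filler (garbage) path over the same span of frames."""
    return (keyword_loglik - filler_loglik) / max(num_frames, 1)

def accept_keyword(keyword_loglik: float, filler_loglik: float,
                   num_frames: int, threshold: float = 0.0) -> bool:
    """Accept the hypothesis when the normalized ratio exceeds a threshold
    tuned on a development set (the value here is only a placeholder)."""
    return keyword_confidence(keyword_loglik, filler_loglik, num_frames) > threshold
```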
Speech recognition accuracy often degrades sharply under environmental mismatch. Feature statistics normalization techniques have the advantages of low complexity and fast computation. Performance is evaluated on the AURORA 2 corpus: combining histogram equalization with an ARMA low-pass filter raises the recognition rate of histogram equalization alone from 84.93% to 86.37%, and combining it with an adaptive ARMA filter raises it to 86.91%.
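A minimal sketch of this kind of front-end, assuming one utterance's MFCC matrix of shape (frames, coefficients): each cepstral dimension is mapped to a standard-Gaussian reference by histogram equalization and then smoothed along time with a simple ARMA-style low-pass filter. It relies on NumPy and SciPy and only approximates the filters described in the thesis; it is not their exact definitions.

```python
import numpy as np
from scipy.stats import norm

def histogram_equalize(cepstra: np.ndarray) -> np.ndarray:
    """Per-dimension histogram equalization: map the empirical CDF of each
    cepstral coefficient onto a standard Gaussian reference distribution."""
    frames, dims = cepstra.shape
    out = np.empty_like(cepstra, dtype=float)
    for d in range(dims):
        ranks = np.argsort(np.argsort(cepstra[:, d]))   # rank of each frame
        cdf = (ranks + 0.5) / frames                    # empirical CDF in (0, 1)
        out[:, d] = norm.ppf(cdf)                       # Gaussian inverse CDF
    return out

def arma_smooth(cepstra: np.ndarray, order: int = 2) -> np.ndarray:
    """ARMA-style low-pass filtering along time: each output frame averages
    the `order` previously smoothed frames with the current and `order`
    future raw frames (boundary frames are left unfiltered)."""
    frames, _ = cepstra.shape
    out = cepstra.astype(float)
    for t in range(order, frames - order):
        window = np.vstack([out[t - order:t], cepstra[t:t + order + 1]])
        out[t] = window.mean(axis=0)
    return out

# robust_features = arma_smooth(histogram_equalize(mfcc))  # mfcc: (frames, dims)
```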
The speaker verification system combines Gaussian mixture models (GMMs) and support vector machines (SVMs) through a parametric kernel to improve performance. A supervector is built from each speaker's GMM parameters and compensated with nuisance attribute projection (NAP). During training, the supervectors are normalized and then used to train the SVM model. For impostor selection, the top n impostor utterances whose characteristics are most similar to the target speaker are chosen, which makes the trained SVM model more discriminative. At test time, the distance scores are adjusted with test score normalization. Experiments on the NIST 2001 corpus show that the 64-mixture parametric-kernel (NAP) system combined with test score normalization achieves the best equal error rate and decision cost function, 4.17% and 0.0491 respectively.
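The sketch below illustrates, under stated assumptions rather than as the thesis's exact pipeline, how a GMM-mean supervector can be formed, compensated with NAP, scored with a linear SVM, and post-processed with test normalization. It assumes scikit-learn's LinearSVC; the helper names, data shapes, and the nuisance subspace U are hypothetical.

```python
import numpy as np
from sklearn.svm import LinearSVC

def gmm_supervector(means: np.ndarray, weights: np.ndarray, variances: np.ndarray) -> np.ndarray:
    """Stack the MAP-adapted GMM means into one long vector, scaling each
    mixture by sqrt(weight)/sigma as in the common supervector kernel.
    means, variances: (mixtures, dims); weights: (mixtures,)."""
    return (np.sqrt(weights)[:, None] * means / np.sqrt(variances)).ravel()

def nap_compensate(sv: np.ndarray, U: np.ndarray) -> np.ndarray:
    """Project out the nuisance (channel) subspace spanned by the columns of U:
    sv <- (I - U U^T) sv."""
    return sv - U @ (U.T @ sv)

def tnorm(raw_score: float, cohort_scores: np.ndarray) -> float:
    """Test normalization: standardize the raw score against the scores of a
    cohort of impostor models evaluated on the same test utterance."""
    return (raw_score - cohort_scores.mean()) / (cohort_scores.std() + 1e-9)

def train_speaker_svm(target_sv: np.ndarray, impostor_svs: np.ndarray) -> LinearSVC:
    """Train a linear SVM separating the single target supervector from the
    n most similar impostor supervectors, all NAP-compensated beforehand."""
    X = np.vstack([target_sv[None, :], impostor_svs])
    y = np.concatenate([[1], np.zeros(len(impostor_svs))])
    return LinearSVC(C=1.0).fit(X, y)

# score = svm.decision_function(test_sv[None, :])[0]
# final_score = tnorm(score, cohort_scores)
```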
Abstract (English): This thesis consists of three main parts: keyword spotting, cepstral feature normalization, and speaker verification. For keyword spotting, sub-syllable models are used to build the keyword and filler models.
Environmental mismatch is the major source of performance degradation in speech recognition. Cepstral feature normalization techniques are widely used to produce robust features, and a common advantage of these methods is their low computational complexity. Experimental results on the AURORA 2 database show that the histogram equalization plus ARMA filter front-end achieves an 86.37% digit recognition rate, and the histogram equalization plus adaptive ARMA filter front-end achieves 86.91%.
The speaker verification system combines the Gaussian mixture model (GMM) and the support vector machine (SVM) through a kernel function. MAP adaptation from the universal background model (UBM) yields the parameters of each GMM, from which the target and impostor supervectors are built; nuisance attribute projection (NAP) is then applied to compensate the supervectors. In the training stage, the target and impostor supervectors are used to train the SVM model. For impostor selection, the top n speakers whose characteristics are most similar to the target are chosen, which makes the model more discriminative. In the testing stage, test normalization is used to adjust the distance scores. Experiments on the NIST 2001 SRE corpus show that the 64-mixture parametric kernel with NAP combined with test normalization yields the best EER and DCF, 4.17% and 0.0491 respectively.
Keywords (Chinese): ★ Speech Recognition
★ Speaker Verification
★ Support Vector Machine
★ Robust Features
★ Keyword Spotting
Keywords (English): ★ Keyword Spotting
★ Speech Recognition
★ Speaker Verification
★ Support Vector Machine
★ Robust Features
Thesis Table of Contents:
Abstract
English Abstract
Acknowledgments
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
1.1 Research Motivation
1.2 Overview of Robust Speech Techniques
1.3 Overview of Speaker Recognition
1.4 Research Directions
1.5 Thesis Organization
Chapter 2 Fundamentals of Speech Recognition
2.2 Statistical Speech Recognition
2.3 Mel-Frequency Cepstral Coefficients
2.4 Hidden Markov Models
2.4.1 Acoustic Model Parameter Estimation
2.5 Viterbi Algorithm
Chapter 3 Keyword Spotting
3.1 Basic Architecture of the Keyword Spotting System
3.2 Right-Context-Dependent Sub-Syllable Models
3.3 Training Procedure
3.4 Keyword Spotting
3.4.1 Keyword Models
3.4.2 Filler Models
3.4.3 Keyword Spotting Architecture
3.5 One-Stage Dynamic Programming Algorithm
3.6 Keyword Recognition Procedure
Chapter 4 Feature Statistics Normalization
4.1 Noise and the Speech Signal
4.2 Feature Statistics Normalization
4.2.1 Cepstral Mean Subtraction
4.2.2 Cepstral Normalization
4.2.3 Cepstral Gain Normalization
4.2.4 Histogram Equalization
4.3 Temporal Filters
Chapter 5 Speaker Verification
5.1 Gaussian Mixture Models
5.1.1 GMM Training Procedure
5.1.2 Vector Quantization
5.1.3 Expectation-Maximization Algorithm
5.1.4 Bayesian Adaptation
5.1.5 GMM/UBM Speaker Verification System
5.2 Support Vector Machines
5.2.1 Hyperplanes
5.2.2 Linear SVM Classifier
5.2.3 Nonlinear SVM Classifier
5.2.4 Kernel Functions
5.3 GMM/SVM Speaker Verification System
5.3.1 Class Overlap
5.3.2 Generalized Supervectors
5.3.3 KL-Divergence Linear Kernel
5.3.4 Parametric Kernel
5.3.5 Nuisance Attribute Projection
5.3.6 Parametric Kernel with Nuisance Attribute Projection
5.3.7 GMM/SVM Speaker Model Training
5.3.8 GMM/SVM Speaker Verification System
5.3.9 Impostor Data Selection
5.4 Score Normalization
Chapter 6 Experiments and Results
6.1 Speech Recognition Experiments
6.1.1 Speech Corpus
6.1.2 Feature Extraction
6.1.3 Performance Evaluation
6.1.4 Experiment 1: Initial/Final Models
6.1.5 Experiment 2: Increasing the Number of Gaussian Mixtures
6.1.6 Experiment 3: Effect of Robust Features on Recognition Rate
6.1.7 Experiment 4: Keyword Spotting Experiment 1
6.2 Robust Speech Feature Experiments
6.2.1 AURORA 2 Corpus
6.2.2 Acoustic Models and Features
6.2.3 Recognition Performance Evaluation
6.2.4 Experiment 1: Baseline
6.2.5 Experiment 2: Cepstral Mean Subtraction and Cepstral Normalization
6.2.6 Experiment 3: Cepstral Gain Normalization
6.2.7 Experiment 4: Histogram Equalization
6.2.8 Experiment 5: Modified Cepstral Normalization
6.2.9 Experiment 6: Cepstral Gain Normalization Combined with Temporal Filters
6.2.10 Experiment 7: Histogram Equalization Combined with Temporal Filters
6.2.11 Experiment 8: Cepstral Normalization Combined with Adaptive ARMA Filter
6.2.12 Experiment 9: Cepstral Gain Normalization Combined with Adaptive ARMA Filter
6.2.13 Experiment 10: Histogram Equalization Combined with Adaptive ARMA Filter
6.2.14 Section Summary
6.3 Speaker Verification Experiments
6.3.1 Speech Corpus
6.3.2 Feature Extraction
6.3.3 Speaker Verification Performance Evaluation
6.3.4 Experiment 1: GMM/UBM Speaker Verification with Robust Features
6.3.5 Experiment 2: GMM/SVM Speaker Verification with the KL-Divergence Linear Kernel
6.3.6 Experiment 3: GMM/SVM Speaker Verification with the Supervector Kernel
6.3.7 Experiment 4: GMM/SVM Speaker Verification with the Parametric Kernel
6.3.8 Experiment 5: GMM/SVM Speaker Verification with the Parametric Kernel and NAP Compensation
6.3.9 Experiment 6: Parametric-Kernel GMM/SVM Speaker Verification Considering Class Imbalance
6.3.10 Experiment 7: Parametric-Kernel GMM/SVM Speaker Verification Considering Class Imbalance and Normalization
6.3.11 Experiment 8: Parametric-Kernel GMM/SVM with NAP Compensation and Test Normalization
6.3.12 Comparison with the Literature
6.3.13 Section Summary
Chapter 7 Conclusions and Future Work
7.1 Conclusions
7.2 Future Work
References
Advisor: 莊堯棠 (Yau-Tarng Juang)
Date of Approval: 2010-8-3