The Research of Competitive Speakers on MCE for Speaker Identification

Name

Meng-chen Huang (黃夢晨)

Department

Department of Electrical Engineering

Thesis title

The Research of Competitive Speakers on MCE for Speaker Identification
(最小錯誤鑑別式應用於語者辨識之競爭語者探討)

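Related theses

★ A Study of Miniaturized GSM/GPRS Mobile Communication Modules
★ A Study of Speaker Identification
★ Robustness Analysis of Perturbed Singular Systems Using the Projection Method
★ A Study of Speaker Verification Using Support Vector Machine Models to Improve the Alternative-Hypothesis Characteristic Function
★ A Study of Speaker Verification Combining Gaussian Mixture Supervectors and Derivative Kernel Functions
★ An Agile-Movement Particle Swarm Optimization Method
★ Application of an Improved Particle Swarm Method to Lossless Predictive Image Coding
★ A Study of Particle Swarm Optimization Applied to Speaker Model Training and Adaptation
★ A Speaker Verification System Using Particle Swarm Optimization
★ A Study of Improved Mel-Frequency Cepstral Coefficients Combined with Multiple Speech Features
★ A Speaker Verification System Using Speaker-Specific Background Models
★ An Intelligent Remote Monitoring System
★ Stability Analysis and Controller Design for Output Feedback of Positive Systems
★ A Hybrid Interval-Search Particle Swarm Optimization Algorithm
★ A Study of Gesture Recognition Based on Deep Neural Networks
★ A Posture-Correction Necklace with Image-Recognition Auto-Calibration and a Mobile Phone Alert System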

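This electronic thesis is authorized for immediate open access.
The open-access electronic full text is licensed to users only for personal, non-profit searching, reading, and printing for the purpose of academic research.
Please comply with the Copyright Act of the Republic of China; do not reproduce, distribute, adapt, repost, or broadcast this work without authorization, to avoid breaking the law.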

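Abstract (translated from Chinese)

In this thesis, we use Minimum Classification Error (MCE) training to re-train the speaker models. The main difficulty in MCE training of speaker models is deciding by what criterion the set of competitive speakers should be selected. To address this, we propose four methods for selecting competitive speakers: the ranking method, the threshold method, the score classification method, and the model classification method. The score classification and model classification methods both feed speaker parameters into a support vector machine (SVM) for classification: the score classification method uses each speaker's maximum likelihood score as input, whereas the model classification method uses each speaker's model parameters. With these inputs, the SVM's strong classification ability is used to find a more suitable set of competitive speakers in the corpus, which in turn raises the speaker identification rate. The score classification method achieves a 42.27% error improvement rate over a conventional Gaussian mixture model (GMM) speaker identification system. The experiments in this thesis are based on the TIMIT corpus.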
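
The abstract names the ranking and threshold methods but does not define them, so the following is only a minimal sketch of one plausible reading. It assumes each candidate speaker has a likelihood score for the target speaker's data; the function names, the `margin` parameter, and the example scores are all hypothetical and do not come from the thesis.

```python
from typing import Dict, List


def rank_competitors(scores: Dict[str, float], target: str, n: int) -> List[str]:
    """Ranking method (assumed reading): keep the n non-target speakers with the highest scores."""
    others = [(spk, val) for spk, val in scores.items() if spk != target]
    others.sort(key=lambda item: item[1], reverse=True)
    return [spk for spk, _ in others[:n]]


def threshold_competitors(scores: Dict[str, float], target: str, margin: float) -> List[str]:
    """Threshold method (assumed reading): keep non-target speakers scoring within `margin` of the target."""
    cutoff = scores[target] - margin
    return [spk for spk, val in scores.items() if spk != target and val >= cutoff]


# Hypothetical average log-likelihoods of one target utterance under each speaker's GMM.
scores = {"spk_a": -41.2, "spk_b": -43.0, "spk_c": -47.5, "spk_d": -42.1}

top2 = rank_competitors(scores, target="spk_a", n=2)
near = threshold_competitors(scores, target="spk_a", margin=1.5)
print(top2)  # ['spk_d', 'spk_b']
print(near)  # ['spk_d']
```

In this reading the ranking method always returns a fixed number of competitors, while the threshold method returns a variable number that depends on how closely other speakers score to the target.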

Abstract (English)

In this thesis, we re-train speaker models with the Minimum Classification Error (MCE) method. The most important problem in MCE training is how to select the competitive speakers, so we propose four selection methods: the ranking method, the threshold method, the model classification method, and the score classification method. For the model classification and score classification methods, we use each speaker's parameters as inputs to train a Support Vector Machine (SVM), which then separates the target speaker from the competitive speakers. We expect these two methods to raise the speaker recognition rate. The experimental results show that the score classification method obtains a 42.27% error reduction over the Gaussian mixture model (GMM) baseline. The experiments are based on the TIMIT database.
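
For orientation, the textbook MCE formulation with a restricted competitor set can be written as below. This is the standard form from the MCE literature, not necessarily the exact one used in the thesis. The symbols are assumptions for illustration: $C_k$ is the competitive speaker set for target speaker $k$, $\lambda_k$ is that speaker's GMM, and $\eta$, $\gamma$, $\theta$, $\epsilon_t$ are tuning constants.

```latex
\begin{align}
g_k(X;\Lambda) &= \log p(X \mid \lambda_k)
  && \text{(discriminant function)} \\
d_k(X;\Lambda) &= -g_k(X;\Lambda)
  + \frac{1}{\eta}\log\!\left[\frac{1}{|C_k|}\sum_{j \in C_k}
    \exp\bigl(\eta\, g_j(X;\Lambda)\bigr)\right]
  && \text{(misclassification measure)} \\
\ell_k(X;\Lambda) &= \frac{1}{1+\exp\bigl(-\gamma\, d_k(X;\Lambda)+\theta\bigr)}
  && \text{(smoothed error loss)} \\
\Lambda_{t+1} &= \Lambda_t - \epsilon_t \nabla_{\Lambda}\, \ell_k(X_t;\Lambda_t)
  && \text{(GPD update)}
\end{align}
```

Only the speakers in $C_k$ enter the second term of $d_k$, which is why the choice of competitive speakers directly shapes the gradient used to re-train each model.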

Keywords

★ Gaussian mixture model
★ Support Vector Machine
★ Minimum Classification Error

Table of contents

Abstract (Chinese)
Abstract (English)
Acknowledgments
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
1.1 Research Motivation
1.2 Overview of Speaker Identification
1.3 Overview of Speaker Adaptation Techniques
1.4 Research Direction
1.5 Chapter Outline
Chapter 2 Basic Techniques of Speaker Identification
2.1 Feature Extraction
2.2 Speaker Model Construction
2.2.1 Gaussian Mixture Model
2.2.2 Speaker Model Training Procedure
2.2.3 Vector Quantization
2.2.4 EM Algorithm
2.3 Speaker Model Adaptation Techniques
2.3.1 Bayesian Adaptation
2.4 Speaker Identification
Chapter 3 System Architecture
3.1 Minimum Classification Error
3.1.1 Discriminant Function
3.1.2 Misclassification Criterion
3.1.3 Generalized Probabilistic Descent Algorithm
3.1.4 Applications of Minimum Classification Error
3.2 Support Vector Machine
Chapter 4 Experiments and Discussion
4.1 TIMIT Speech Database
4.2 Model Training and Testing
4.3 Experimental Results
4.3.1 Experiment 1: Ranking Method
4.3.2 Experiment 2: Threshold Method
4.3.3 Experiment 3: Model Classification Method
4.3.4 Experiment 4: Score Classification Method
Chapter 5 Conclusions and Future Work
5.1 Conclusions
5.2 Future Work
References

Advisor

Yau-Tarng Juang (莊堯棠)

Date of approval

2008-6-23
