Speaker Identification Based on an Improved Minimum Classification Error Method

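Name

朱映霖 (Ying-Lin Chu)

Department

Department of Electrical Engineering

Thesis Title

A Speaker Identification Method Using Support Vector Machines to Improve Minimum Classification Error (translated from the Chinese title 利用支撐向量機改善最小錯誤鑑別式之語者辨識方法)
(SPEAKER IDENTIFICATION BASED ON AN IMPROVED MINIMUM CLASSIFICATION ERROR METHOD)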

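Related Theses

★ A Study of a Miniaturized GSM/GPRS Mobile Communication Module
★ A Study of Speaker Identification
★ Robustness Analysis of Perturbed Singular Systems Using the Projection Method
★ Speaker Verification Using Support Vector Machine Models to Improve Alternative-Hypothesis Characteristic Functions
★ Speaker Verification Combining Gaussian Mixture Supervectors and Derivative Kernel Functions
★ An Agile-Movement Particle Swarm Optimization Method
★ Application of an Improved Particle Swarm Method to Lossless Predictive Image Coding
★ Particle Swarm Optimization for Speaker Model Training and Adaptation
★ A Speaker Verification System Based on Particle Swarm Optimization
★ Improved Mel-Frequency Cepstral Coefficients Combined with Multiple Speech Features
★ A Speaker Verification System Using Speaker-Specific Background Models
★ An Intelligent Remote Monitoring System
★ Stability Analysis and Controller Design for Output Feedback of Positive Systems
★ A Hybrid Interval-Search Particle Swarm Optimization Algorithm
★ Gesture Recognition Based on Deep Neural Networks
★ A Posture-Correction Necklace with Image-Recognition-Based Automatic Calibration and a Smartphone Warning System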

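This electronic thesis is licensed for immediate open access.
Full texts that have reached their open-access date are licensed to users solely for personal, non-profit searching, reading, and printing for academic research.
Please comply with the relevant provisions of the Copyright Act of the Republic of China, and do not reproduce, distribute, adapt, repost, or broadcast the work without authorization, to avoid violating the law.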

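Abstract (Chinese, translated)

In speaker identification, effective training data are crucial: they are used to build the speaker models and therefore strongly affect identification performance. Conventional speaker models are trained under the maximum likelihood (ML) criterion, which works well when training data are abundant but not when they are very scarce. Moreover, ML estimation trains each speaker's model only on that speaker's own training data, without reference to any other speaker's data. Because training ignores how the speaker models relate to one another at identification time, the models are easily confused during speaker identification. For this reason, discriminative acoustic model training methods have been proposed in recent years; instead of maximizing the likelihood of the training data, they aim to minimize classification errors.
In this thesis, we retrain the speaker models with minimum classification error (MCE) training and use support vector machines (SVMs) to improve MCE. Because MCE is not robust to the chosen number of competing speakers, we take the scores that the speaker models assign to the adaptation data, attach class labels to them, and use them to train an SVM; the competing speakers are then selected from its support vectors. This makes competing-speaker selection more robust than in conventional MCE and also yields higher speaker identification accuracy.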

Abstract (English)

In speaker recognition, effective training data are essential, because they are used to train the speaker models, which strongly affect recognition performance. Traditional speaker models based on maximum likelihood (ML) perform well when training data are abundant, but not when training data are scarce. Moreover, ML estimation trains each speaker's model using only that speaker's training data, independently of all other speakers. Because the relationships among different speaker models are not considered during training, the models are easily confused during speaker recognition. In recent years, discriminative acoustic model training has been proposed, which aims to minimize classification error rather than maximize the likelihood of the training data.
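To make "minimizing classification error" concrete, the sketch below implements the widely used MCE misclassification measure and its sigmoid loss. It assumes each speaker model j yields a discriminant score g_j for an utterance (for example, an average GMM log-likelihood); the function names and the default values of `eta`, `gamma`, and `theta` are illustrative assumptions, not the thesis's settings.

```python
import math


def misclassification_measure(g, k, eta=2.0):
    """MCE misclassification measure for true class k.

    d_k = -g_k + (1/eta) * log( (1/(M-1)) * sum_{j != k} exp(eta * g_j) )
    d_k < 0 means the true speaker outscores a soft maximum of its rivals.
    """
    rivals = [gj for j, gj in enumerate(g) if j != k]
    m = max(rivals)  # shift for a numerically stable log-sum-exp
    soft_max = m + math.log(
        sum(math.exp(eta * (gj - m)) for gj in rivals) / len(rivals)) / eta
    return -g[k] + soft_max


def mce_loss(d, gamma=1.0, theta=0.0):
    """Smoothed 0-1 loss: a sigmoid of the misclassification measure."""
    return 1.0 / (1.0 + math.exp(-gamma * d + theta))


# Hypothetical discriminant scores of three speaker models for one utterance.
g = [-10.0, -12.0, -15.0]
d0 = misclassification_measure(g, 0)
print(d0, mce_loss(d0))
```

A larger `eta` makes the rival term approach the single best competing score. In this sketch every other speaker is a rival; restricting the rival set to a chosen number of competing speakers is, presumably, the setting the abstract describes as non-robust.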
In this thesis, we use minimum classification error (MCE) training for the speaker models and support vector machines (SVMs) to improve MCE. Because MCE is not robust to the setting of the number of competing speakers, we attach class labels to the scores that the speaker models assign to the adaptation data and use them to train SVMs. We then use the support vectors to choose the competing speakers, which makes the selection more robust and yields higher speaker recognition performance than conventional MCE.
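The abstract does not spell out how support vectors become a competitor list, so the following is only a hedged sketch of one plausible reading. It assumes every score a speaker model assigns to one of the target speaker's adaptation utterances is a 1-D training point, labeled +1 for the target's own model and -1 for every other model; that scores are standardized; and that any non-target speaker owning at least one support vector counts as a competitor. The classifier is a linear soft-margin SVM trained by dual coordinate descent, swapped in to keep the example free of third-party libraries and not necessarily the solver used in the thesis. The names `train_linear_svm` and `select_competitors`, the data layout, `c`, and the scores are all hypothetical.

```python
import statistics


def train_linear_svm(xs, ys, c=1.0, passes=500):
    """Soft-margin linear SVM on 1-D inputs, trained by dual coordinate descent.

    Each input x is augmented to z = (x, 1), so the bias b is learned as a
    second (regularized) weight. Returns (w, b, alphas); training points
    with alpha > 0 are the support vectors.
    """
    alphas = [0.0] * len(xs)
    w, b = 0.0, 0.0  # kept equal to sum_i alpha_i * y_i * z_i
    for _ in range(passes):
        for i, (x, y) in enumerate(zip(xs, ys)):
            grad = y * (w * x + b) - 1.0  # dual gradient w.r.t. alpha_i
            q_ii = x * x + 1.0            # z_i . z_i
            new_alpha = min(max(alphas[i] - grad / q_ii, 0.0), c)
            delta = new_alpha - alphas[i]
            w += delta * y * x
            b += delta * y
            alphas[i] = new_alpha
    return w, b, alphas


def select_competitors(scores, target, c=1.0):
    """Hypothetical competitor selection from SVM support vectors.

    scores maps a speaker id to the scores that speaker's model gives the
    target speaker's adaptation utterances.
    """
    xs, ys, owners = [], [], []
    for spk, values in scores.items():
        for s in values:
            xs.append(s)
            ys.append(1.0 if spk == target else -1.0)
            owners.append(spk)
    mu = statistics.mean(xs)
    sd = statistics.pstdev(xs) or 1.0
    xs = [(x - mu) / sd for x in xs]  # standardize so w and b are on similar scales
    _, _, alphas = train_linear_svm(xs, ys, c)
    return {spk for spk, y, a in zip(owners, ys, alphas) if y < 0 and a > 1e-9}


# Hypothetical average log-likelihood scores; "A" is the target speaker.
scores = {
    "A": [-40.0, -41.0, -42.0],
    "B": [-44.0, -45.0, -46.0],
    "C": [-55.0, -56.0, -57.0],
    "D": [-70.0, -71.0, -72.0],
}
print(sorted(select_competitors(scores, "A")))
```

With these made-up scores, speaker B, whose model scores the target's utterances nearly as highly as the target's own model, lands among the support vectors, while the distant speaker D does not. Raising `c` pushes the solution toward a hard margin and tends to shrink the competitor set.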

Keywords (Chinese)

★ 支撐向量機 (support vector machines)
★ 語者辨識 (speaker identification)
★ 最小錯誤鑑別式 (minimum classification error)

Keywords (English)

★ Minimum Classification Error
★ Speaker Identification
★ Support Vector Machines

Table of Contents

Abstract
Acknowledgments
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
1.1 Research Motivation
1.2 Overview of Speaker Recognition
1.3 Overview of Speaker Adaptation Techniques
1.4 Research Direction
1.5 Chapter Outline
Chapter 2 Basic Techniques of Speaker Identification
2.1 Feature Extraction
2.2 Speaker Model Construction
2.2.1 Gaussian Mixture Models
2.2.2 Speaker Model Training Procedure
2.2.3 Vector Quantization
2.2.4 The EM Algorithm
2.3 Speaker Model Adaptation Techniques
2.3.1 Bayesian Adaptation
2.4 Speaker Identification
Chapter 3 System Architecture
3.1 Support Vector Machines
3.1.1 Linear SVM Classifier
3.1.2 The Non-Separable Case
3.1.3 Kernel Functions
3.2 Minimum Classification Error
3.2.1 Discriminant Functions
3.2.2 Classification Error Criterion
3.3 Generalized Probabilistic Descent
Chapter 4 Experiments and Discussion
4.1 Experimental Setup
4.2 MCE and SVM-MCE Experiments
4.2.1 Number of Model Training Iterations vs. Identification Rate
4.2.2 Number of Model Training Iterations vs. Average Number of Competing Speakers
4.2.3 Effect of Adaptation Data Length
4.2.4 Effect of the Total Number of Speakers in the System
Chapter 5 Conclusions and Future Work
5.1 Conclusions
5.2 Future Work
References

Advisor

莊堯棠 (Yau-Tarng Juang)

Approval Date

2007-7-6