Improved Minimum Classification Error Method for Speaker Identification

Name

Shin-Ting Li (李信廷)

Department

Department of Electrical Engineering

Thesis Title

Improved Minimum Classification Error Method for Speaker Identification
(original title: 改善最小錯誤鑑別式之語者辨認方法)

This electronic thesis is authorized for immediate open access.

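Abstract (translated from the Chinese)

In speaker identification, training effectively on the available speech data is very important, because it strongly affects recognition performance. To date, conventional speaker models are still trained under the maximum likelihood criterion. This works well when a large amount of training data is available, but not when training data is extremely scarce. Moreover, maximum likelihood estimation trains each speaker's model using only that speaker's own training data, independently of the training data of other speakers. Such training does not take into account the relationships among the models at identification time, so after the model parameters have been trained, the likelihood of a speech feature vector may become large for both the corresponding acoustic model and unrelated models, which causes confusion in recognition. For this reason, discriminative acoustic model training methods have been proposed over the past decade or so; their goal is not to maximize the likelihood of the training data but to minimize the classification (or recognition) error.
In this thesis, we use the minimum classification error (MCE) criterion to retrain the speaker models, and we propose three methods that improve on the conventional MCE criterion. In addition, we apply MCE to eigenvoice adaptation, because MCE is less affected by poorly approximated models than maximum likelihood is. We therefore propose a method that combines MCE with eigenvoice adaptation, which increases robustness when very little data is available and reduces the influence of the poorly approximated models that arise when the acoustic space is constructed.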
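The contrast between the two training criteria can be written compactly. The block below gives the formulation commonly used in the MCE literature; it is offered as background and is not necessarily the exact form used in the thesis. The symbols $S$ (number of speakers), $g_j$ (discriminant function), $\eta$, $\gamma$, $\theta$ and the step size $\epsilon_t$ are notation assumed here, not taken from the record.

```latex
% Maximum likelihood: each speaker model \lambda_k is fit to its own data X_k only.
\hat{\lambda}_k = \arg\max_{\lambda_k} \log p(X_k \mid \lambda_k)

% MCE: discriminant function for speaker j (assumed to be the log-likelihood).
g_j(X;\Lambda) = \log p(X \mid \lambda_j), \qquad \Lambda = \{\lambda_1,\dots,\lambda_S\}

% Misclassification measure for an utterance X spoken by speaker k:
% negative when the true model outscores the (soft) best competitor.
d_k(X;\Lambda) = -g_k(X;\Lambda)
  + \frac{1}{\eta}\log\!\left[\frac{1}{S-1}\sum_{j\neq k} e^{\eta\, g_j(X;\Lambda)}\right]

% Smooth 0/1 loss and gradient (GPD-style) update of all model parameters.
\ell_k(X;\Lambda) = \frac{1}{1+e^{-\gamma\, d_k(X;\Lambda)+\theta}}, \qquad
\Lambda_{t+1} = \Lambda_t - \epsilon_t \nabla_{\Lambda}\,\ell_k(X;\Lambda)\big|_{\Lambda=\Lambda_t}
```

Because $d_k$ involves the competing speakers' scores, each update moves the true speaker's model and its rivals together, which is the cross-model dependence that maximum likelihood training lacks.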

Abstract (English)

In speaker identification, making effective use of the training data is very important because it strongly influences the identification rate. Up to now, conventional speaker models have been trained with the maximum likelihood criterion. This gives very good results with a large amount of training data, but not with a small amount. Maximum likelihood estimation trains each speaker's model from that speaker's own training data only, independently of the other speakers' training data. Such training does not consider the relationships among the models at identification time. After the parameters have been trained, the likelihood of a feature vector may be large for both the corresponding acoustic model and unrelated models, which causes confusion in identification. For this reason, discriminative acoustic model training has been proposed over the past ten years or so. Its goal is not to maximize the likelihood of the training data but to minimize the classification (or identification) error.
In this thesis, we use minimum classification error (MCE) to retrain the speaker models and propose three methods that improve on conventional MCE. In addition, we apply MCE to eigenvoices, because MCE is less affected by poorly approximated models than maximum likelihood is. We then propose a method that combines MCE and eigenvoices to increase robustness when little data is available and to reduce the influence of poorly approximated models when constructing the acoustic space.
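As a rough illustration of the objective the abstract describes, the following is a minimal Python sketch and not the thesis's implementation. It assumes diagonal-covariance GMM speaker models scored by average per-frame log-likelihood, uses NumPy, and runs on made-up toy data. The function names and the parameters `eta`, `gamma` and `theta` are hypothetical choices for this example.

```python
import numpy as np

def gmm_avg_loglik(frames, weights, means, variances):
    """Average per-frame log-likelihood of frames (T, D) under a diagonal GMM.

    weights: (M,), means: (M, D), variances: (M, D).
    """
    diff = frames[:, None, :] - means[None, :, :]                        # (T, M, D)
    log_norm = -0.5 * np.sum(np.log(2.0 * np.pi * variances), axis=1)    # (M,)
    log_comp = log_norm - 0.5 * np.sum(diff ** 2 / variances, axis=2)    # (T, M)
    log_joint = np.log(weights) + log_comp
    # Log-sum-exp over mixture components, stabilised by the per-frame maximum.
    peak = log_joint.max(axis=1, keepdims=True)
    log_frame = peak[:, 0] + np.log(np.exp(log_joint - peak).sum(axis=1))
    return float(log_frame.mean())

def misclassification_measure(scores, true_idx, eta=1.0):
    """Soft maximum of the competing speakers' scores minus the true speaker's score."""
    scores = np.asarray(scores, dtype=float)
    rivals = np.delete(scores, true_idx)
    peak = rivals.max()
    soft_max = peak + np.log(np.mean(np.exp(eta * (rivals - peak)))) / eta
    return soft_max - scores[true_idx]

def mce_loss(scores, true_idx, eta=1.0, gamma=1.0, theta=0.0):
    """Sigmoid-smoothed error count: near 0 if correct, near 1 if misclassified."""
    d = misclassification_measure(scores, true_idx, eta)
    return 1.0 / (1.0 + np.exp(-gamma * d + theta))

# Toy data (assumption): two single-component speaker models in 2-D,
# and an utterance whose frames are drawn around speaker 0's mean.
rng = np.random.default_rng(0)
frames = rng.normal(loc=0.0, scale=1.0, size=(50, 2))
models = [
    (np.array([1.0]), np.array([[0.0, 0.0]]), np.ones((1, 2))),  # speaker 0
    (np.array([1.0]), np.array([[3.0, 3.0]]), np.ones((1, 2))),  # speaker 1
]
scores = [gmm_avg_loglik(frames, w, m, v) for (w, m, v) in models]

print("ML decision:", int(np.argmax(scores)))
print("MCE loss if the true speaker is 0:", mce_loss(scores, 0))
print("MCE loss if the true speaker is 1:", mce_loss(scores, 1))
```

In an actual MCE training loop, the gradient of this loss with respect to every speaker's GMM parameters would drive the update. The sketch only evaluates the loss, so that the dependence on competing models is visible.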

Keywords (Chinese)

★ Minimum classification error (最小錯誤鑑別式)
★ Speaker identification (語者辨認)

Keywords (English)

★ Speaker Identification
★ Minimum Classification Error

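Table of Contents

Abstract
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
1.1 Research Motivation
1.2 Overview of Speaker Recognition
1.3 Overview of Speaker Adaptation Techniques
1.4 Research Direction
1.5 Chapter Outline
Chapter 2 Basic Techniques of Speaker Recognition
2.1 Feature Extraction
2.2 Speaker Model Construction
2.2.1 Gaussian Mixture Model
2.2.2 Speaker Model Training Procedure
2.2.3 Vector Quantization
2.2.4 EM Algorithm
2.3 Speaker Model Adaptation Techniques
2.3.1 Bayesian Adaptation
2.3.2 Eigenvoice Adaptation
2.4 Speaker Recognition
Chapter 3 Minimum Classification Error
3.1 Discriminant Function
3.2 Generalized Probabilistic Descent Algorithm
3.3 Eigenvoice Adaptation with Minimum Classification Error
Chapter 4 Experimental Results
4.1 Experimental Environment
4.2 MCE Experiments
4.2.1 Experiment on the Number of Model Iterations
4.2.2 Effect of the Threshold on MCE
4.2.3 Number of Competing Speakers per Iteration
4.2.4 Effect of Utterance Length on MCE
4.2.5 Comparison of Improved MCE and Conventional MCE
4.3 Experiments Combining MCE with Eigenvoices
Chapter 5 Conclusions and Future Work
5.1 Conclusions
5.2 Future Work
References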

Advisor

Yau-Tarng Juang (莊堯棠)

Approval Date

2006-7-5
