A Study of Speaker Verification Combining GMM Supervectors and Derivative Kernels

Name

林立源 (Li-yuan Lin)

Department

Department of Electrical Engineering

Thesis title

結合高斯混合超級向量與微分核函數之語者確認研究
(Combine GMM-Supervector and Derivative Kernel for speaker verification)

Related theses

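★ A Study of Miniaturized GSM/GPRS Mobile Communication Modules	★ A Study of Speaker Recognition
★ Robustness Analysis of Perturbed Singular Systems Using the Projection Method	★ Speaker Verification with Improved Alternative-Hypothesis Characteristic Functions Using Support Vector Machine Models
★ An Agile-Movement Particle Swarm Optimization Method	★ Application of an Improved Particle Swarm Method to Lossless Predictive Image Coding
★ Particle Swarm Optimization Applied to Speaker Model Training and Adaptation	★ A Speaker Verification System Using Particle Swarm Optimization
★ A Study of Improved Mel-Frequency Cepstral Coefficients Combined with Multiple Speech Features	★ A Speaker Verification System Using Speaker-Specific Background Models
★ An Intelligent Remote Monitoring System	★ Stability Analysis and Controller Design for Output Feedback of Positive Systems
★ A Hybrid Interval-Search Particle Swarm Optimization Algorithm	★ Gesture Recognition Based on Deep Neural Networks
★ A Posture-Correction Necklace with Image-Recognition Auto-Calibration and a Smartphone Alert System	★ A Study of Unsupervised Fast Speaker Adaptation Algorithms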

Files

Full text in the system: never released

Abstract (Chinese)

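In recent years, support vector machines (SVMs) have been widely used in many fields with good results. This thesis uses the NIST 2001 corpus and applies SVMs to text-independent speaker verification. SVM speaker verification systems usually rely on dynamic kernels to handle the natural characteristics of speech. These kernels fall into two classes, parametric and derivative. This thesis combines the two kinds of kernel when computing scores in order to improve system performance.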
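The parameters of the Gaussian mixture speaker model, obtained by MAP adaptation of the UBM, are mapped through the GMM-supervector kernel and the likelihood kernel to give two sets of supervectors. Nuisance attribute projection (NAP) is then applied to the GMM supervector, and the two sets of supervectors are used to train separate SVM models. For impostor selection, the 20 impostor utterances whose features are most similar to the target speaker are chosen, which makes the trained SVM models more discriminative. At test time, the two sets of supervectors are first computed for all test utterances and then scored against the designated SVM models in turn, giving two sets of scores.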
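As a rough illustration of the first two steps, the sketch below adapts only the means of a diagonal-covariance UBM by MAP and stacks them into a supervector. It is not taken from the thesis: the function names, the relevance factor of 16, and the square-root weight and inverse standard deviation scaling of the means are assumptions, chosen because they are common in the GMM-supervector literature.

```python
import numpy as np


def map_adapt_means(frames, weights, means, variances, relevance=16.0):
    """Mean-only MAP adaptation of a diagonal-covariance UBM (assumed setup)."""
    frames = np.asarray(frames, dtype=float)
    # Log-density of every frame under every Gaussian component: shape (T, M).
    diff = frames[:, None, :] - means[None, :, :]
    log_gauss = -0.5 * (np.sum(diff ** 2 / variances, axis=2)
                        + np.sum(np.log(2.0 * np.pi * variances), axis=1))
    # Component posteriors per frame, normalised in the log domain.
    log_post = np.log(weights) + log_gauss
    log_post -= np.logaddexp.reduce(log_post, axis=1, keepdims=True)
    post = np.exp(log_post)
    # Soft counts and posterior-weighted data means per component.
    counts = post.sum(axis=0)
    data_means = (post.T @ frames) / np.maximum(counts, 1e-10)[:, None]
    # Interpolate between the data means and the UBM means.
    alpha = (counts / (counts + relevance))[:, None]
    return alpha * data_means + (1.0 - alpha) * means


def gmm_supervector(weights, means, variances):
    """Stack the scaled component means into one vector of length M * D."""
    scaled = np.sqrt(weights)[:, None] * means / np.sqrt(variances)
    return scaled.ravel()
```

With 16 frames and a relevance factor of 16, a single-component model moves its mean exactly halfway from the UBM mean to the data mean, which is a convenient sanity check.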
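Experiments on the NIST 2001 corpus show that the system combining the 64-mixture GMM-supervector kernel (with NAP) and the 256-mixture derivative kernel achieves the best equal error rate (EER) and decision cost function (DCF), 6.04% and 0.0777 respectively. Compared with the traditional speaker verification model, at 15.87% and 0.1911, this is an improvement of 9.8 percentage points in EER and 0.1135 in DCF.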
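For readers unfamiliar with the two metrics, here is a minimal sketch of how an EER and a minimum DCF can be computed from lists of target and impostor scores. The function names are hypothetical, and the cost parameters (miss cost 10, false-alarm cost 1, target prior 0.01) are assumed values commonly quoted for NIST evaluations of that period; the record does not state which settings the thesis used.

```python
import numpy as np


def error_rates(target_scores, impostor_scores):
    """Miss and false-alarm rates at every observed score used as a threshold."""
    targets = np.asarray(target_scores, dtype=float)
    impostors = np.asarray(impostor_scores, dtype=float)
    thresholds = np.sort(np.concatenate([targets, impostors]))
    # A trial is accepted when its score is at or above the threshold.
    p_miss = np.array([(targets < t).mean() for t in thresholds])
    p_fa = np.array([(impostors >= t).mean() for t in thresholds])
    return p_miss, p_fa


def equal_error_rate(target_scores, impostor_scores):
    """Error rate at the threshold where miss and false-alarm rates are closest."""
    p_miss, p_fa = error_rates(target_scores, impostor_scores)
    i = np.argmin(np.abs(p_miss - p_fa))
    return 0.5 * (p_miss[i] + p_fa[i])


def min_dcf(target_scores, impostor_scores,
            c_miss=10.0, c_fa=1.0, p_target=0.01):
    """Minimum detection cost over all thresholds (assumed cost parameters)."""
    p_miss, p_fa = error_rates(target_scores, impostor_scores)
    cost = c_miss * p_miss * p_target + c_fa * p_fa * (1.0 - p_target)
    return cost.min()
```

Both functions sweep only the observed scores as thresholds, which is adequate for a sketch but coarser than the interpolation some evaluation tools apply.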

Abstract (English)

Support vector machine (SVM) based speaker verification has become a standard approach in recent years. This thesis evaluates text-independent speaker verification on the NIST 2001 SRE corpus. SVM speaker verification systems usually use dynamic kernels to handle the dynamic nature of speech utterances. Dynamic kernels fall into two general classes, derivative and parametric. This thesis combines the two in the score space to improve system performance.
The parameters of each speaker's GMM are obtained by MAP adaptation of the UBM. The GMM-supervector kernel and the likelihood kernel then map these parameters to two supervectors, and nuisance attribute projection (NAP) is applied to the GMM supervector. The two supervectors are used to train two separate SVM models. For impostor selection, we choose the 20 impostors whose characteristics are most similar to the target, which makes the model more discriminative. At test time, the two supervectors of each test utterance are computed and scored against the corresponding SVM models, giving two scores, which are finally combined.
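The impostor selection step can be pictured with the short sketch below, which ranks a pool of impostor supervectors by similarity to the target and keeps the closest 20. The abstract does not name the similarity measure, so cosine similarity between supervectors is an assumption here, and `select_impostors` is a hypothetical helper name.

```python
import numpy as np


def select_impostors(target_sv, impostor_svs, k=20):
    """Return indices of the k impostor supervectors closest to the target."""
    target = np.asarray(target_sv, dtype=float)
    pool = np.asarray(impostor_svs, dtype=float)
    # Cosine similarity (an assumed measure): normalise, then take dot products.
    target = target / np.linalg.norm(target)
    pool = pool / np.linalg.norm(pool, axis=1, keepdims=True)
    similarity = pool @ target
    # Highest similarity first; a stable sort keeps ties in pool order.
    order = np.argsort(-similarity, kind="stable")
    return order[:k]
```

The selected impostors would then serve as the negative examples when training the target speaker's SVM, so that the decision boundary is shaped by the hardest competitors rather than by a random sample.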
Experiments on NIST 2001 SRE show that a 64-mixture GMM-supervector kernel combined with a 256-mixture derivative kernel gives the best EER and DCF, 6.05% and 0.1023 respectively. Compared with the traditional speaker verification system, at 15.87% and 0.1911, the proposed method improves by 9.8 percentage points and 0.1135 respectively.
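One simple way to combine the scores of the two kernel systems is a weighted sum after putting both score sets on a common scale. The abstract does not specify the fusion rule, so the sketch below is only an assumed example: the function names, the standardisation step, and the default weight of 0.5 are all hypothetical.

```python
import numpy as np


def standardize(scores):
    """Shift scores to zero mean and, when possible, scale to unit variance."""
    scores = np.asarray(scores, dtype=float)
    centred = scores - scores.mean()
    spread = scores.std()
    return centred / spread if spread > 0 else centred


def fuse_scores(supervector_scores, derivative_scores, weight=0.5):
    """Weighted sum of the two standardised score sets (assumed fusion rule)."""
    a = standardize(supervector_scores)
    b = standardize(derivative_scores)
    return weight * a + (1.0 - weight) * b
```

Standardising first keeps one system from dominating merely because its raw SVM outputs span a larger numeric range; the weight would normally be tuned on held-out data.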

Keywords (Chinese)

★ Speaker verification (語者確認)

Keywords (English)

★ speaker verification

Table of contents

Abstract
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
1.1 Motivation
1.2 Overview of Speaker Recognition
1.3 Research Direction
1.4 Chapter Outline
Chapter 2 Fundamental Techniques of Speech Processing and Speaker Recognition
2.1 Feature Extraction
2.2 Speaker Model Construction
2.2.1 Universal Background Model
2.2.2 Gaussian Mixture Speaker Model
2.2.3 Speaker Model Training Procedure
2.2.4 Vector Quantization
2.2.5 EM Algorithm
2.3 Speaker Model Adaptation
2.4 Speaker Verification
2.5 Equal Error Rate and Detection Error Tradeoff Curve
Chapter 3 System Architecture
3.1 Sequence Kernels
3.1.1 GMM Supervector
3.1.2 GMM Supervector Linear Kernel
3.1.3 Nuisance Attribute Projection
3.1.4 Proof of the Session Compensation Equation
3.1.5 Derivative Kernel
3.2 Support Vector Machines
3.2.1 Linear SVM Classifier
3.2.2 The Non-Separable Case
3.2.3 SVM Kernel Functions
3.3 Training the Support Vector Machine Models
3.3.1 Impostor Selection
3.4 Speaker Verification Systems
3.4.1 Speaker Verification System with the GMM Supervector Kernel
3.4.2 Speaker Verification System Combining GMM Supervector and Derivative Kernels
Chapter 4 Speaker Recognition Experiments
4.1 Speech Corpus
4.2 Applying Sequence Kernels to the Speaker Verification System
4.2.1 Experiment 1: Using the GMM Supervector Kernel
4.2.2 Experiment 2: Applying NAP to the GMM Supervector
4.2.3 Experiment 3: Using the Derivative Kernel
4.2.4 Experiment 4: Combining GMM Supervector and Derivative Kernels
4.2.5 Experiment 5: Comparison with the Literature
Chapter 5 Conclusions and Future Work
5.1 Conclusions
5.2 Future Work
References

Advisor

莊堯棠

Approval date

2010-6-21
