Using SVM to Improve the Characterization of the Alternative Hypothesis for Speaker Verification

Name

黃國豪 (Guo-hao Huang)

Department

Department of Electrical Engineering

Thesis Title

利用支撐向量機模型改善對立假設特徵函數之語者確認研究
(Using SVM to Improve the Characterization of the Alternative Hypothesis for Speaker Verification)

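Related Theses

★ A Study of Miniaturized GSM/GPRS Mobile Communication Modules
★ A Study of Speaker Recognition
★ Robustness Analysis of Perturbed Singular Systems Using the Projection Method
★ Speaker Verification Combining Gaussian Mixture Supervectors and Derivative Kernel Functions
★ Agile-Moving Particle Swarm Optimization
★ Application of an Improved Particle Swarm Method to Lossless Predictive Image Coding
★ Particle Swarm Optimization for Speaker Model Training and Adaptation
★ A Speaker Verification System Based on Particle Swarm Optimization
★ Improved Mel-Frequency Cepstral Coefficients Combined with Multiple Speech Features
★ A Speaker Verification System Using Speaker-Specific Background Models
★ An Intelligent Remote Monitoring System
★ Stability Analysis and Controller Design for Output Feedback of Positive Systems
★ Hybrid Interval-Search Particle Swarm Optimization
★ Gesture Recognition Based on Deep Neural Networks
★ A Posture-Correction Necklace with Image-Recognition Auto-Calibration and Smartphone Warning System
★ Unsupervised Fast Speaker Adaptation Algorithms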

Files

Full text (never open access)

Abstract (Chinese)

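This thesis proposes a new verification procedure that improves the performance of speaker verification systems. The framework uses the weighted geometric combination (WGC) and the weighted arithmetic combination (WAC), together with conventional likelihood-ratio measures based on the universal background model (UBM) and the most competitive cohort model (MAX), to form a new target decision function whose weights are computed by a support vector machine (SVM), with the aim of achieving the best verification performance.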
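The two combinations can be made concrete with a short sketch. It assumes the usual definitions (the same combination names appear in [26]), in which the alternative-hypothesis likelihood p(X|λ̄) is either a weighted geometric mean, Π_i p(X|λ_i)^{w_i} (WGC), or a weighted arithmetic mean, Σ_i w_i p(X|λ_i) (WAC), of background models λ_i such as the UBM and cohort models. With weights summing to one, the WGC log-likelihood ratio becomes Σ_i w_i [log p(X|λ) - log p(X|λ_i)], and the WAC ratio test can be rearranged into a threshold on Σ_i w_i p(X|λ_i)/p(X|λ). Either way the decision is linear in a feature vector, which is what lets an SVM supply the weights. The function names are hypothetical, and the inputs are assumed to be frame-averaged log-likelihoods.

```python
import math
from typing import List, Sequence


def wgc_vector(target_ll: float, background_ll: Sequence[float]) -> List[float]:
    """WGC features: log p(X|target) - log p(X|background_i) for each background model."""
    return [target_ll - ll for ll in background_ll]


def wac_vector(target_ll: float, background_ll: Sequence[float]) -> List[float]:
    """WAC features: p(X|background_i) / p(X|target) for each background model.

    Inputs are assumed to be frame-averaged log-likelihoods, so the
    differences stay small enough for math.exp not to overflow.
    """
    return [math.exp(ll - target_ll) for ll in background_ll]


def decision(weights: Sequence[float], bias: float, x: Sequence[float]) -> float:
    """Linear decision value f(x) = w.x + b; the claim is accepted when f(x) >= 0.

    For WAC features, larger values favour the impostor hypothesis, so
    trained weights on those dimensions are expected to be negative.
    """
    return sum(w * xi for w, xi in zip(weights, x)) + bias


# Example: target model vs. a UBM and one cohort model (hypothetical scores).
x = wgc_vector(-40.0, [-42.0, -41.0])
print(x, decision([0.5, 0.5], -1.0, x))
```

Here the WGC vector is [2.0, 1.0], and equal weights of 0.5 with a bias of -1.0 give a decision value of 0.5, so the claim would be accepted.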
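The system scores the mel-frequency cepstral coefficients (MFCCs) of an input utterance against the Gaussian mixture model (GMM) of each registered speaker and, through WGC and WAC, builds the vectors used to train and test the SVM models. Score scaling is added to adjust unreasonable scores, and the scaling range adopted was verified experimentally to improve performance slightly further. For impostor selection, the 60 speakers with the highest scores against the target speaker model are chosen, which keeps the trained model discriminative while effectively saving computation time. Finally, WGC and WAC are integrated with the UBM, MAX and other likelihood-ratio methods to improve system performance further.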
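Score scaling and impostor selection can be sketched as follows. The [-1, 1] default range below is a placeholder, not the thesis's setting; per-dimension min-max scaling fitted on training vectors (similar in spirit to LIBSVM's svm-scale tool) and clipping of out-of-range values are assumptions about how "unreasonable" scores are handled, not the thesis's exact procedure. `select_impostors` keeps the n background speakers (60 in the thesis) that score highest against the target speaker model. All names are hypothetical.

```python
import heapq
from typing import Dict, List, Sequence, Tuple


def select_impostors(scores: Dict[str, float], n: int = 60) -> List[str]:
    """IDs of the n candidate speakers scoring highest against the target model."""
    return heapq.nlargest(n, scores, key=scores.__getitem__)


def fit_minmax(vectors: Sequence[Sequence[float]]) -> Tuple[List[float], List[float]]:
    """Per-dimension minimum and maximum over the training vectors."""
    columns = list(zip(*vectors))
    return [min(c) for c in columns], [max(c) for c in columns]


def scale(x: Sequence[float], mins: Sequence[float], maxs: Sequence[float],
          lo: float = -1.0, hi: float = 1.0) -> List[float]:
    """Map each dimension linearly into [lo, hi] and clip values outside it.

    The [-1, 1] default is a placeholder, not the range used in the thesis.
    """
    out = []
    for xi, mn, mx in zip(x, mins, maxs):
        if mx == mn:  # constant dimension in training data: use the midpoint
            out.append((lo + hi) / 2.0)
            continue
        v = lo + (hi - lo) * (xi - mn) / (mx - mn)
        out.append(min(hi, max(lo, v)))  # clip out-of-range ("unreasonable") scores
    return out
```

Fitting the minima and maxima on training vectors only, then reusing them on test vectors, keeps both sets on the same scale.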
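The experimental results show that, within our framework, using 128-mixture GMMs, 60 selected impostors and the chosen score-scaling range, the system achieves its best equal error rate (EER) and decision cost function (DCF) of 6.06% and 0.0787, respectively. Compared with reference [50], this improves the EER by 2.63% and the DCF by 0.0236; compared with the speaker verification system of reference [51], it improves the EER by 0.70%.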
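The two figures of merit quoted above can be computed from lists of target and impostor trial scores roughly as follows. This sketch assumes higher scores favour the claimed speaker, approximates the EER at the swept threshold where the miss and false-alarm rates are closest, and uses C_miss = 10, C_fa = 1 and P_target = 0.01 as assumed defaults (the values commonly used in NIST SRE evaluations of that period); the function names are hypothetical.

```python
from typing import Sequence, Tuple


def error_rates(target: Sequence[float], impostor: Sequence[float],
                threshold: float) -> Tuple[float, float]:
    """Miss rate (targets rejected) and false-alarm rate (impostors accepted).

    A trial is accepted when its score is >= threshold.
    """
    p_miss = sum(s < threshold for s in target) / len(target)
    p_fa = sum(s >= threshold for s in impostor) / len(impostor)
    return p_miss, p_fa


def eer_and_min_dcf(target: Sequence[float], impostor: Sequence[float],
                    c_miss: float = 10.0, c_fa: float = 1.0,
                    p_target: float = 0.01) -> Tuple[float, float]:
    """Sweep every observed score (plus +inf, i.e. reject all) as the threshold.

    EER is approximated by the mean of P_miss and P_fa at the threshold where
    they are closest; min DCF is the smallest
    C_miss * P_miss * P_target + C_fa * P_fa * (1 - P_target).
    """
    thresholds = sorted(set(target) | set(impostor)) + [float("inf")]
    best_gap, eer, min_dcf = float("inf"), 1.0, float("inf")
    for t in thresholds:
        p_miss, p_fa = error_rates(target, impostor, t)
        if abs(p_miss - p_fa) < best_gap:
            best_gap, eer = abs(p_miss - p_fa), (p_miss + p_fa) / 2.0
        dcf = c_miss * p_miss * p_target + c_fa * p_fa * (1.0 - p_target)
        min_dcf = min(min_dcf, dcf)
    return eer, min_dcf
```

With these cost parameters, rejecting every trial costs 0.1, so any useful operating point has a DCF below that.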

Abstract (English)

This thesis proposes a new verification system to improve speaker verification performance. The proposed system combines the Weighted Geometric Combination (WGC), the Weighted Arithmetic Combination (WAC), the Universal Background Model (UBM) and the Most Competitive Cohort Model (MAX), and uses a Support Vector Machine (SVM) to generate the weight vectors of a new decision function.
We compute the likelihood scores of the input utterance's MFCCs against the registered speaker models and, through WGC and WAC, build the input vectors for training and testing the SVM models. We also propose a scaling method to adjust unreasonable likelihood scores; our experiments show that this method, with the chosen scaling range, improves the system. We then use impostor selection to keep the top 60 impostors, ranked by likelihood score, out of all speakers. These methods not only make the trained model more robust but also reduce computation time.
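Training the SVM on those input vectors can be sketched without external libraries. This swaps in the Pegasos stochastic sub-gradient method for the primal linear soft-margin SVM, chosen only to keep the example dependency-free; it is not necessarily the solver used in the thesis. The regularization constant `lam`, the epoch count and the function names are hypothetical.

```python
import random
from typing import List, Sequence


def train_linear_svm(xs: Sequence[Sequence[float]], ys: Sequence[int],
                     lam: float = 0.01, epochs: int = 200,
                     seed: int = 0) -> List[float]:
    """Pegasos stochastic sub-gradient solver for the primal linear SVM.

    ys holds +1 (target-speaker trials) or -1 (impostor trials). A constant
    1.0 feature is appended, so the last weight acts as a (regularized) bias.
    """
    rng = random.Random(seed)
    data = [list(x) + [1.0] for x in xs]
    w = [0.0] * len(data[0])
    t = 0
    for _ in range(epochs):
        for i in rng.sample(range(len(data)), len(data)):  # one shuffled pass
            t += 1
            eta = 1.0 / (lam * t)
            margin = ys[i] * sum(wj * xj for wj, xj in zip(w, data[i]))
            w = [(1.0 - eta * lam) * wj for wj in w]  # step on the L2 penalty
            if margin < 1.0:  # hinge loss active: move toward y_i * x_i
                w = [wj + eta * ys[i] * xj for wj, xj in zip(w, data[i])]
    return w


def predict(w: Sequence[float], x: Sequence[float]) -> int:
    """+1 (accept) if w.[x, 1] >= 0, else -1 (reject)."""
    score = sum(wj * xj for wj, xj in zip(w, list(x) + [1.0]))
    return 1 if score >= 0.0 else -1
```

A linear kernel fits here because the WGC/WAC decision rule is itself linear in the feature vector, so the learned weight vector can be read as the combination weights.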
The experimental results are based on 128-mixture GMMs, top-60 impostor selection and the chosen scaling range. The proposed system obtains an EER improvement of 2.63% and a DCF improvement of 0.0236 over [50], and an EER improvement of 0.61% over [51].

Keywords (Chinese)

★ speech
★ speaker verification
★ support vector machine

Keywords (English)

★ speech
★ speaker verification
★ SVM

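Table of Contents

Abstract
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
1.1 Research Motivation
1.2 Overview of Speaker Recognition
1.3 Research Direction
1.4 Chapter Overview
Chapter 2 Fundamental Techniques of Speaker Verification
2.1 Feature Extraction
2.2 Speaker Modeling
2.2.1 Gaussian Mixture Models
2.2.2 Speaker Model Training Procedure
2.2.3 Vector Quantization
2.2.4 Expectation-Maximization Algorithm
2.3 Maximum A Posteriori Estimation
2.4 Speaker Identification
2.5 Speaker Verification
2.6 Performance Evaluation of Speaker Verification
Chapter 3 System Architecture
3.1 Target Decision Functions
3.1.1 Weighted Geometric Combination
3.1.2 Weighted Arithmetic Combination
3.1.3 Kernel Discriminant Analysis
3.2 Support Vector Machines
3.2.1 Linear SVM Classifier
3.2.2 The Non-Separable Case
3.2.3 Kernel Functions
3.3 Weight Training for Target Decision Functions with SVM
3.4 Integration of Weighted Likelihood Ratio Values
3.5 Impostor Selection and Score Scaling
3.6 Speaker Verification System Combining Target Decision Functions with SVM
Chapter 4 Experiments and Discussion
4.1 Speech Database
4.2 Speaker Verification System Combining Target Decision Functions with SVM
4.2.1 Experiment 1: Weighted Arithmetic Combination
4.2.2 Experiment 2: Weighted Geometric Combination
4.2.3 Experiment 3: Adding Score Scaling and Impostor Selection
4.2.4 Experiment 4: LR Value Integration
4.2.5 Experiment 5: Comparison with the Literature
Chapter 5 Conclusions and Future Work
5.1 Conclusions
5.2 Future Work
References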

References

[1] X. Huang, et al., Spoken Language Processing: A Guide to Theory, Algorithm and System Development. Prentice Hall, 2001.
[2] L.-S. Lee and Y. Lee, "Voice access of global information for broad-band wireless: technologies of today and challenges of tomorrow," Proceedings of the IEEE, vol. 89, pp. 41-57, 2001.
[3] L. Rabiner and B.-H. Juang, Fundamentals of Speech Recognition. Prentice Hall, 1993.
[4] J. T. Tou and R. C. Gonzalez, Pattern Recognition Principles. Addison-Wesley, 1977.
[5] G. R. Doddington, "Speaker recognition—Identifying people by their voices," Proceedings of the IEEE, vol. 73, pp. 1651-1664, 1985.
[6] D. O'Shaughnessy, "Speaker recognition," IEEE ASSP Magazine, vol. 3, pp. 4-17, 1986.
[7] 鍾偉仁, "A preliminary study of speaker recognition and verification" (in Chinese), Master's thesis, Graduate Institute of Communication Engineering, National Taiwan University, 2001.
[8] 楊璧如, "Speaker and singer identification" (in Chinese), Master's thesis, Graduate Institute of Information Engineering, National Tsing Hua University, 2001.
[9] D. A. Reynolds and R. C. Rose, "Robust text-independent speaker identification using Gaussian mixture speaker models," IEEE Transactions on Speech and Audio Processing, vol. 3, pp. 72-83, 1995.
[10] D. A. Reynolds, "Comparison of background normalization methods for text-independent speaker verification," in Proc. EUROSPEECH, pp. 963-966, 1997.
[11] R. Vergin, et al., "Generalized mel frequency cepstral coefficients for large-vocabulary speaker-independent continuous-speech recognition," IEEE Transactions on Speech and Audio Processing, vol. 7, pp. 525-532, 1999.
[12] J. S. Jang, "Audio Signal Processing and Recognition." Available: http://neural.cs.nthu.edu.tw/jang/books/audioSignalProcessing/
[13] X. Huang, et al., Hidden Markov Models for Speech Recognition. Columbia University Press, 1990.
[14] F. Soong, et al., "A vector quantization approach to speaker recognition," pp. 387-390, 1985.
[15] Y. Linde, et al., "An algorithm for vector quantizer design," IEEE Transactions on Communications, vol. 28, pp. 84-95, 1980.
[16] T. K. Moon, "The expectation-maximization algorithm," IEEE Signal Processing Magazine, vol. 13, pp. 47-60, 1996.
[17] A. P. Dempster, et al., "Maximum likelihood from incomplete data via the EM algorithm," Journal of the Royal Statistical Society, Series B, vol. 39, pp. 1-38, 1977.
[18] J.-L. Gauvain and C.-H. Lee, "Maximum a posteriori estimation for multivariate Gaussian mixture observations of Markov chains," IEEE Transactions on Speech and Audio Processing, vol. 2, pp. 291-298, 1994.
[19] C. J. Leggetter and P. C. Woodland, "Maximum likelihood linear regression for speaker adaptation of continuous density hidden Markov models," Computer Speech & Language, vol. 9, pp. 171-185, 1995.
[20] O. Siohan, et al., "Joint maximum a posteriori adaptation of transformation and HMM parameters," IEEE Transactions on Speech and Audio Processing, vol. 9, pp. 417-428, 2001.
[21] D. A. Reynolds, et al., "Speaker verification using adapted Gaussian mixture models," Digital Signal Processing, vol. 10, pp. 19-41, 2000.
[22] 范世明, "Applications of Gaussian mixture models to speaker recognition and Mandarin speech recognition" (in Chinese), Master's thesis, Institute of Communications Engineering, National Chiao Tung University, 2002.
[23] 陳柏仁, "A study of speaker verification systems using a voting algorithm" (in Chinese), Master's thesis, Graduate Institute of Electrical Engineering, National Central University, 2007.
[24] J.-M. Cheng and H.-C. Wang, "A method of estimating the equal error rate for automatic speaker verification," in Proc. International Symposium on Chinese Spoken Language Processing, pp. 285-288, 2004.
[25] A. Martin, et al., "The DET curve in assessment of detection task performance," in Proc. EUROSPEECH, vol. 4, pp. 1895-1898, 1997.
[26] Y.-H. Chao, et al., "Using kernel discriminant analysis to improve the characterization of the alternative hypothesis for speaker verification," IEEE Transactions on Audio, Speech, and Language Processing, vol. 16, pp. 1675-1684, 2008.
[27] C.-W. Hsu, et al., "A practical guide to support vector classification," 2003.
[28] V. Wan and W. M. Campbell, "Support vector machines for speaker verification and identification," in Proc. IEEE Signal Processing Society Workshop on Neural Networks for Signal Processing, vol. 2, pp. 775-784, 2000.
[29] V. Wan and S. Renals, "SVMSVM: support vector machine speaker verification methodology," in Proc. IEEE ICASSP, vol. 2, pp. II-221-224, 2003.
[30] S. Bengio and J. Mariethoz, "Learning the decision function for speaker verification," in Proc. IEEE ICASSP, vol. 1, pp. 425-428, 2001.
[31] D. A. Reynolds, "Speaker identification and verification using Gaussian mixture speaker models," Speech Communication, vol. 17, pp. 91-108, 1995.
[32] C.-S. Liu, et al., "Speaker verification using normalized log-likelihood score," IEEE Transactions on Speech and Audio Processing, vol. 4, pp. 56-60, 1996.
[33] C. Cortes and V. Vapnik, "Support-vector networks," Machine Learning, vol. 20, pp. 273-297, 1995.
[34] T. Cover and P. Hart, "Nearest neighbor pattern classification," IEEE Transactions on Information Theory, vol. 13, pp. 21-27, 1967.
[35] M. Zeidenberg, Neural Networks in Artificial Intelligence. Ellis Horwood, 1990.
[36] Biano, "Super Vector Machine: an introduction" (in Chinese). Available: http://www.cmlab.csie.ntu.edu.tw/~cyy/learning/tutorials/SVM3.pdf
[37] W. M. Campbell, et al., "Speaker verification using support vector machines and high-level features," IEEE Transactions on Audio, Speech, and Language Processing, vol. 15, pp. 2085-2094, 2007.
[38] S. Raghavan, et al., "Speaker verification using support vector machines," in Proc. IEEE SoutheastCon, pp. 188-191, 2006.
[39] V. N. Vapnik, Statistical Learning Theory. Wiley-Interscience, 1998.
[40] 李根逸, "Support vector machine tutorial (Chinese version)." Available: http://www.cmlab.csie.ntu.edu.tw/~cyy/learning/tutorials/SVM1.pdf
[41] C. J. C. Burges, "A tutorial on support vector machines for pattern recognition," Data Mining and Knowledge Discovery, vol. 2, pp. 121-167, 1998.
[42] R. Herbrich, Learning Kernel Classifiers: Theory and Algorithms. The MIT Press, 2001.
[43] B. Schölkopf, et al., "Comparing support vector machines with Gaussian kernels to radial basis function classifiers," IEEE Transactions on Signal Processing, vol. 45, pp. 2758-2765, 1997.
[44] Y.-H. Chao, et al., "A kernel-based discrimination framework for solving hypothesis testing problems with application to speaker verification," in Proc. International Conference on Pattern Recognition, pp. 229-232, 2006.
[45] A. Higgins, L. Bahler, and J. Porter, "Speaker verification using randomized phrase prompting," Digital Signal Processing, vol. 1, pp. 89-106, 1991.
[46] A. E. Rosenberg, J. DeLong, C.-H. Lee, B.-H. Juang, and F. K. Soong, "The use of cohort normalized scores for speaker verification," in Proc. ICSLP, Banff, AB, Canada, pp. 599-602, 1992.
[47] The NIST Year 2001 Speaker Recognition Evaluation. Available: http://www.itl.nist.gov/iad/mig/tests/sre/2001/index.html
[48] C.-P. Chen and J. A. Bilmes, "MVA processing of speech features," IEEE Transactions on Audio, Speech, and Language Processing, vol. 15, pp. 257-270, 2007.
[49] 黃啟祥, "A study of speaker verification combining Gaussian mixture and support vector machine models" (in Chinese), Master's thesis, Graduate Institute of Electrical Engineering, National Central University, 2009.
[50] H. Wei, et al., "Combination of pitch and MFCC GMM supervectors for speaker verification," pp. 1335-1339, 2008.

Advisor

莊堯棠 (Yau-tarng Juang)

Approval Date

2010-6-21
