Thesis/Dissertation 100521069: Detailed Record




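Name 吳昱弘 (Yu-hung Wu)   Department Electrical Engineering
Thesis title A Study of the Particle Swarm Optimization Algorithm Applied to Speaker Model Training and Adaptation
(PSO Algorithm for Speaker Model Training and Adaptation)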
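Related theses
★ A Study of Miniaturized GSM/GPRS Mobile Communication Modules
★ A Study of Speaker Recognition
★ Robustness Analysis of Perturbed Singular Systems Using the Projection Method
★ Speaker Verification with Alternative-Hypothesis Characterization Functions Improved by Support Vector Machine Models
★ Speaker Verification Combining Gaussian Mixture Supervectors and Derivative Kernel Functions
★ An Agile-Movement Particle Swarm Optimization Method
★ Application of an Improved Particle Swarm Method to Lossless Predictive Image Coding
★ A Speaker Verification System Based on the Particle Swarm Algorithm
★ A Study of Improved Mel-Frequency Cepstral Coefficients Combined with Multiple Speech Features
★ A Speaker Verification System Using Speaker-Specific Background Models
★ An Intelligent Remote Monitoring System
★ Stability Analysis and Controller Design for Output Feedback of Positive Systems
★ A Hybrid Interval-Search Particle Swarm Algorithm
★ A Study of Gesture Recognition Based on Deep Neural Networks
★ A Posture-Correction Necklace with Image-Recognition Auto-Calibration and a Mobile-Phone Alert System
★ A Study of Unsupervised Fast Speaker Adaptation Algorithms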
File Full text permanently closed to public access
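Abstract (Chinese) This thesis applies the particle swarm optimization (PSO) algorithm to speaker model training and adaptation. Because it is conceptually simple, converges quickly, and is easy to implement, PSO handles a wide range of engineering problems more effectively than genetic algorithms. Every use of PSO in this thesis relies on the original, unmodified algorithm: the fitness function is the Gaussian mixture probability density function, whose mathematical form is not overly complex, so the basic algorithm suffices. In conventional speaker verification systems, model parameters are mostly estimated with the Expectation-Maximization (EM) algorithm, which needs considerable time to train a model to convergence. We therefore propose a new training method that uses PSO to converge the model. In our experiments it yields a lower equal error rate and a lower decision cost function than the EM algorithm, and it also trains the model faster, which confirms the effectiveness of the proposed method. In addition, the mean vector is the most important parameter of the speaker-independent model during speaker model adaptation, and this thesis incorporates PSO to obtain the best mean vector. Experimental results show that the proposed method improves speaker verification performance over the original Maximum a Posteriori (MAP) adaptation.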
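The training idea described in the abstract, basic PSO with a Gaussian mixture likelihood as the fitness function, can be illustrated with a minimal sketch. This is not the thesis's implementation: the 1-D data, the fixed weights and variances, the function names `gmm_log_likelihood` and `pso_fit_means`, and the values for swarm size, inertia weight, and acceleration constants are all assumptions chosen for illustration.

```python
import math
import random


def gmm_log_likelihood(data, means, variances, weights):
    """Total log-likelihood of 1-D data under a Gaussian mixture (the fitness)."""
    total = 0.0
    for x in data:
        p = 0.0
        for m, v, w in zip(means, variances, weights):
            p += w * math.exp(-0.5 * (x - m) ** 2 / v) / math.sqrt(2.0 * math.pi * v)
        total += math.log(max(p, 1e-300))  # floor avoids log(0) far from the data
    return total


def pso_fit_means(data, variances, weights, n_particles=20, n_iters=100,
                  inertia=0.7, c1=1.5, c2=1.5, seed=0):
    """Search for mixture means with basic PSO (hypothetical helper).

    Each particle is one candidate mean vector; variances and weights stay fixed.
    """
    rng = random.Random(seed)
    k = len(weights)
    lo, hi = min(data), max(data)

    def fitness(pos):
        return gmm_log_likelihood(data, pos, variances, weights)

    positions = [[rng.uniform(lo, hi) for _ in range(k)] for _ in range(n_particles)]
    velocities = [[0.0] * k for _ in range(n_particles)]
    pbest = [p[:] for p in positions]
    pbest_fit = [fitness(p) for p in positions]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(k):
                r1, r2 = rng.random(), rng.random()
                # Standard update: inertia term + cognitive pull + social pull.
                velocities[i][d] = (inertia * velocities[i][d]
                                    + c1 * r1 * (pbest[i][d] - positions[i][d])
                                    + c2 * r2 * (gbest[d] - positions[i][d]))
                positions[i][d] += velocities[i][d]
            f = fitness(positions[i])
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = positions[i][:], f
                if f > gbest_fit:
                    gbest, gbest_fit = positions[i][:], f
    return gbest, gbest_fit
```

Calling `pso_fit_means(data, [0.25, 0.25], [0.5, 0.5])` on samples drawn from two well-separated clusters returns a mean vector and its log-likelihood. A real speaker model would use multi-dimensional cepstral features and many more mixture components.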
Abstract (English) This thesis applies particle swarm optimization (PSO) techniques to speaker model training and adaptation. Conventionally, the Expectation-Maximization (EM) algorithm is the dominant approach to model parameter estimation in speaker verification. The experimental results demonstrate that the proposed PSO algorithm converges faster in training and yields more accurate speaker verification than the EM algorithm. In addition, this thesis uses the proposed PSO algorithm to adjust the mean parameters during speaker model adaptation. Experimental results again show that the proposed method outperforms Maximum a Posteriori (MAP) adaptation in speaker verification.
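For context on the MAP baseline named in the abstract, the following is a minimal sketch of conventional MAP adaptation of mixture means from a speaker-independent background model. It follows the widely used relevance-factor formulation rather than anything stated in this record. The 1-D data, the function name `map_adapt_means`, and the relevance factor of 16 are assumptions, and the thesis's PSO-based refinement of the mean vector is not reproduced here.

```python
import math


def map_adapt_means(data, ubm_means, ubm_variances, ubm_weights, relevance=16.0):
    """Conventional MAP adaptation of mixture means (hypothetical helper).

    new_mean_i = alpha_i * E_i[x] + (1 - alpha_i) * ubm_mean_i,
    with alpha_i = n_i / (n_i + relevance) and n_i the soft count of component i.
    """
    k = len(ubm_means)
    soft_count = [0.0] * k   # n_i: sum of posteriors for component i
    first_order = [0.0] * k  # sum of posterior-weighted observations

    for x in data:
        likes = [w * math.exp(-0.5 * (x - m) ** 2 / v) / math.sqrt(2.0 * math.pi * v)
                 for m, v, w in zip(ubm_means, ubm_variances, ubm_weights)]
        total = sum(likes)
        if total == 0.0:
            continue  # observation too far from every component to assign
        for i in range(k):
            post = likes[i] / total
            soft_count[i] += post
            first_order[i] += post * x

    adapted = []
    for i in range(k):
        if soft_count[i] == 0.0:
            adapted.append(ubm_means[i])  # no data seen: keep the prior mean
            continue
        alpha = soft_count[i] / (soft_count[i] + relevance)
        data_mean = first_order[i] / soft_count[i]
        adapted.append(alpha * data_mean + (1.0 - alpha) * ubm_means[i])
    return adapted
```

Components that receive little adaptation data stay close to the background model's means, while well-observed components move toward the speaker's data. In this formulation the relevance factor controls that trade-off.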
Keywords (Chinese) ★ Particle swarm optimization algorithm
★ Speaker model
Keywords (English) ★ PSO algorithm
★ Speaker model
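Table of contents Abstract (Chinese)
Abstract (English)
Contents
List of Figures
List of Tables
Chapter 1 Introduction
1.1 Research Motivation
1.2 Overview of Speaker Recognition
1.3 Overview of Speaker Adaptation
1.4 Research Direction
1.5 Literature Review
1.6 Chapter Organization
Chapter 2 Basic Techniques of Speaker Recognition
2.1 Speech Feature Extraction
2.1.1 Framing
2.1.2 Pre-Emphasis
2.1.3 Windowing
2.1.4 Parameter Extraction
2.1.5 Delta Cepstral Parameters
2.2 Gaussian Mixture Model
2.2.1 Model Description
2.2.2 Effect of Corpus Size and Variability on the Choice of the Number of Gaussians
Chapter 3 PSO Algorithm Applied to Speaker Model Training
3.1 Introduction
3.2 Basic Formulas and Models of the PSO Algorithm
3.3 Inertia Weight
3.4 Optimizing Model Parameters with the PSO Algorithm
Chapter 4 Expectation-Maximization Algorithm
4.1 EM Algorithm
Chapter 5 PSO Algorithm Applied to Speaker Model Adaptation
5.1 Bayesian Adaptation
5.2 MAP Adaptation Combined with the PSO Algorithm
Chapter 6 Experiments and Discussion
6.1 Experimental Corpus
6.2 Performance Evaluation Methods
6.2.1 Equal Error Rate (EER)
6.2.2 Decision Cost Function (DCF)
6.3 Comparison of EM and PSO
6.3.1 Experiment 1: Effect of Maximizing the Mean Vector with the EM Algorithm
6.3.2 Experiment 2: Finding the Best Mean Vector with the PSO Algorithm
6.4 Comparison of MAP and MAP-PSO
6.4.1 Experiment 3: Using the PSO Algorithm in Speaker Model Adaptation
6.5 Comparison of EM-MAP and PSO-MAP-PSO
6.5.1 Experiment 4: Using the PSO Algorithm in Speaker Model Training and Adaptation
Chapter 7 Conclusions and Future Work
7.1 Conclusions
7.2 Future Work
References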
Advisor 莊堯棠 (Yau-tarng Juang)   Approval date 2013-7-5