The Study of Speaker Recognition

Name

賴彥輔 (Yen-Fu Lai)

Department

Department of Electrical Engineering

Thesis Title

The Study of Speaker Recognition (語者辨識之研究)

Files

Full text in the system: never open to the public

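Abstract (translated from the Chinese)

In this thesis, we address text-independent speaker recognition and use a Gaussian mixture model (GMM) to represent the voiceprint characteristics of each speaker. A conventional GMM, however, requires a large amount of training data and a long training time. To overcome these drawbacks, we use speaker adaptation to adapt a well-trained speaker-independent model into a speaker-specific model.
We use signal bias removal to eliminate channel effects from the training data and thereby obtain a clean speaker-independent model. In addition, because the training data for the speaker-independent model is very large, we shorten the training time by first clustering the training data with vector quantization and then training one GMM for each cluster.
We also compare the effect of different adaptation methods on the speaker recognition system. When adaptation data is sufficient, Bayesian adaptation performs well; with only a small amount of adaptation data, however, the Gaussian components that are left unadapted degrade recognition performance. For sparse adaptation data, we therefore propose a vector field smoothing algorithm that incorporates a fuzzy controller to improve recognition performance.
Speaker recognition experiments are conducted with 100 speakers. The results show that the approach used in this thesis can train speaker models quickly from a small amount of data while still achieving good recognition performance.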
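The adaptation step summarized in this abstract can be illustrated with a minimal sketch of mean-only Bayesian (MAP) adaptation of a diagonal-covariance GMM. It is not the thesis's implementation: the function name `map_adapt_means`, the relevance factor of 16, and the toy two-component model are assumptions chosen for illustration. The sketch also shows why sparse data is a problem: a Gaussian that receives almost no adaptation frames keeps its speaker-independent mean.

```python
import numpy as np

def map_adapt_means(weights, means, variances, X, relevance=16.0):
    """Mean-only MAP adaptation of a diagonal-covariance GMM.

    weights: (M,), means: (M, D), variances: (M, D), X: (T, D) frames.
    Returns the adapted means (M, D) and the soft frame counts (M,).
    The relevance factor is an assumed illustrative value.
    """
    # Log density of every frame under every Gaussian, shape (T, M).
    diff = X[:, None, :] - means[None, :, :]
    log_gauss = -0.5 * (np.sum(diff**2 / variances, axis=2)
                        + np.sum(np.log(2 * np.pi * variances), axis=1))
    # Posterior probability of each Gaussian given each frame.
    log_post = np.log(weights) + log_gauss
    log_post -= np.logaddexp.reduce(log_post, axis=1, keepdims=True)
    post = np.exp(log_post)

    n = post.sum(axis=0)                              # soft counts per Gaussian
    data_mean = (post.T @ X) / np.maximum(n, 1e-10)[:, None]
    alpha = (n / (n + relevance))[:, None]            # data-dependent mixing weight
    adapted = alpha * data_mean + (1.0 - alpha) * means
    return adapted, n

# Toy speaker-independent model with two well-separated Gaussians.
rng = np.random.default_rng(0)
ubm_w = np.array([0.5, 0.5])
ubm_mu = np.array([[0.0, 0.0], [10.0, 10.0]])
ubm_var = np.ones((2, 2))
# The adaptation data only covers the region of the first Gaussian.
X = rng.normal(loc=1.0, scale=0.5, size=(40, 2))
adapted, n = map_adapt_means(ubm_w, ubm_mu, ubm_var, X)
```

In this toy run the first mean moves part of the way toward the adaptation data, while the second Gaussian sees essentially no frames and is left at its original position.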

Abstract (English)

In this thesis, we focus on text-independent speaker recognition using Gaussian mixture models (GMMs). However, conventional GMMs need large amounts of training data and long training times; to overcome these shortcomings, we replace them with adapted GMMs.
We obtain a clean speaker-independent model by using signal bias removal (SBR), and we reduce the training time with vector quantization (VQ). Furthermore, we apply different adaptation methods to derive the speaker models from the speaker-independent model. Maximum a posteriori (MAP) estimation performs well. However, when adaptation data is sparse, the parameters that are left unadapted may reduce performance. To address this problem, we propose vector field smoothing with a fuzzy controller to improve performance.
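The idea behind vector field smoothing can be sketched as follows: each adapted Gaussian defines a transfer vector (adapted mean minus initial mean), and a Gaussian that saw too little data borrows a weighted average of the transfer vectors of its nearest adapted neighbours. This is a plain version of the technique under stated assumptions, not the fuzzy-controller variant proposed in the thesis, whose design is not given in this record. The function name `vfs_interpolate`, the count threshold, the neighbour count `k`, and the exponential distance weighting are all hypothetical choices.

```python
import numpy as np

def vfs_interpolate(init_means, adapted_means, counts,
                    min_count=1.0, k=2, smooth=1.0):
    """Fill in means of poorly adapted Gaussians from neighbouring transfer vectors.

    init_means, adapted_means: (M, D); counts: (M,) soft frame counts.
    Gaussians with counts below min_count are treated as unadapted.
    """
    trained = counts >= min_count
    if not trained.any():
        return init_means.copy()
    anchors = init_means[trained]                     # positions of adapted Gaussians
    transfer = adapted_means[trained] - anchors       # their transfer vectors
    out = adapted_means.copy()
    for j in np.flatnonzero(~trained):
        # Squared distance to every adapted Gaussian in the initial mean space.
        d2 = np.sum((anchors - init_means[j])**2, axis=1)
        nearest = np.argsort(d2)[:k]
        # Closer neighbours get larger weights (assumed exponential kernel).
        w = np.exp(-(d2[nearest] - d2[nearest].min()) / smooth)
        w /= w.sum()
        out[j] = init_means[j] + w @ transfer[nearest]
    return out

# Two adapted Gaussians shifted by (0.5, 0.5); the third received no data.
init_mu = np.array([[0.0, 0.0], [1.0, 0.0], [5.0, 5.0]])
adapt_mu = np.array([[0.5, 0.5], [1.5, 0.5], [5.0, 5.0]])
soft_counts = np.array([20.0, 15.0, 0.0])
smoothed = vfs_interpolate(init_mu, adapt_mu, soft_counts)
```

Because both neighbours share the same transfer vector here, the unadapted Gaussian is shifted by that same vector, whatever the weights are.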

Keywords (Chinese)

★ adapted Gaussian mixture model
★ speaker recognition
★ speaker identification
★ speaker verification

Keywords (English)

★ adapted Gaussian mixture model
★ speaker verification
★ speaker recognition
★ speaker identification

Table of Contents

Abstract
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
1.1 Research Motivation
1.2 Overview of Speaker Recognition
1.3 Research Direction
1.4 Chapter Outline
Chapter 2 Basic Techniques of Speaker Recognition
2.1 Feature Extraction
2.2 Speaker Model Construction
2.2.1 Gaussian Mixture Model
2.2.2 Vector Quantization
2.2.3 EM Algorithm
2.2.4 Speaker Model Training Procedure
2.3 Speaker Recognition
2.3.1 Speaker Identification
2.3.2 Speaker Verification
2.3.3 Background Speaker Model
2.3.4 Performance Evaluation
Chapter 3 System Architecture
3.1 Training the Speaker-Independent Model
3.1.1 Signal Bias Removal
3.1.2 Vector-Quantized Gaussian Mixture Model
3.2 Adapted Speaker Models
3.2.1 Bayesian Adaptation
3.2.2 Adapted Gaussian Mixture Model
3.2.3 Fast Scoring of the Log-Likelihood Ratio
3.3 Speaker Model Adaptation with Sparse Data
3.3.1 Vector Field Smoothing
3.3.2 Vector Field Smoothing with a Fuzzy Controller
Chapter 4 Speaker Recognition Experiments
4.1 Speech Database
4.2 Speaker-Independent Model Experiments
4.2.1 Effect of the Number of Gaussian Components
4.2.2 Effect of the Vector-Quantized Gaussian Mixture Model
4.2.3 Effect of Signal Bias Removal
4.3 Adapted Speaker Model Experiments
4.3.1 Comparison of Conventional and Adapted Gaussian Mixture Models
4.3.2 Effect of Adaptation Data Length on Bayesian Adaptation
4.3.3 Effect of Adding Vector Field Smoothing
Chapter 5 Conclusions and Future Work
5.1 Conclusions
5.2 Future Work
References

Advisor

莊堯棠 (Yau-Tarng Juang)

Approval Date

2003-6-13
