A Study of Speaker Adaptation Algorithms and Their Online Application

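Name

Ing-Jr Ding (丁英智)

Department

Graduate Institute of Electrical Engineering

Thesis Title

A Study of Speaker Adaptation Algorithms and Their Online Application (語者調適演算法及其應用於線上之研究)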

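The author has consented to immediate open access for this electronic thesis.
Full text that has reached its open-access date is licensed to users solely for personal, non-profit searching, reading, and printing for the purpose of academic research.
Please comply with the Copyright Act of the Republic of China. Do not reproduce, distribute, adapt, repost, or broadcast this work without authorization, to avoid breaking the law.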

Abstract (translated from Chinese)

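The goal of unsupervised fast speaker adaptation is to let a speaker adjust the acoustic models accurately using only a very small amount of speech data, so that the adapted models give better system performance for the current speaker. As the speaker continues to test the system, the acoustic models are adjusted in step, so recognition performance improves without the speaker noticing.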
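This thesis studies speaker adaptation algorithms in depth. The algorithms are Bayesian adaptation (maximum a posteriori, MAP), Maximum Likelihood Linear Regression (MLLR), Modified Maximum Likelihood Linear Regression, estimation of transformation parameters under the maximum likelihood criterion, and estimation of transformation parameters under the Bayesian criterion. Experiments show that the last three algorithms, all of which are transformation-based, adapt well in the unsupervised setting with small amounts of data. Bayesian adaptation, being a fine-tuning method, yields no significant improvement under unsupervised adaptation. In addition, MLLR without the modification can still degrade the acoustic models when adaptation data are extremely scarce (only one or two sentences, or very little data). Furthermore, although Bayesian adaptation is unsuitable for unsupervised speaker adaptation, in self-supervised adaptation combined with a suitably designed fuzzy controller it gives more stable adaptation performance, and it approaches perfect recognition when adaptation data are sufficient.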
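The contrast drawn above between fine-tuning (MAP) and transformation-based adaptation can be illustrated with a toy example. The sketch below is not the thesis's implementation. It assumes the textbook MAP mean update with a prior weight `tau`, and it stands in for the transformation-based methods with the simplest possible shared transform, a single bias vector estimated from hard frame-to-Gaussian assignments under equal variances. All function names and numbers are hypothetical.

```python
from typing import List, Sequence

Vector = List[float]


def map_adapt_mean(prior_mean: Sequence[float],
                   frames: Sequence[Sequence[float]],
                   gammas: Sequence[float],
                   tau: float = 10.0) -> Vector:
    """Textbook MAP update of one Gaussian mean (tau must be > 0).

    new_mean = (tau * prior_mean + sum_t gamma_t * x_t) / (tau + sum_t gamma_t)
    A Gaussian that receives no adaptation frames keeps its prior mean.
    """
    occupancy = sum(gammas)
    new_mean = []
    for d in range(len(prior_mean)):
        weighted_sum = sum(g * x[d] for g, x in zip(gammas, frames))
        new_mean.append((tau * prior_mean[d] + weighted_sum) / (tau + occupancy))
    return new_mean


def estimate_shared_bias(means: Sequence[Sequence[float]],
                         frames: Sequence[Sequence[float]],
                         assignments: Sequence[int]) -> Vector:
    """Average offset between each frame and the mean of its assigned Gaussian.

    Assumption: hard assignments and equal variances, so the estimate
    reduces to a plain average of (frame - assigned mean).
    """
    dim = len(means[0])
    count = len(frames)
    return [sum(x[d] - means[k][d] for x, k in zip(frames, assignments)) / count
            for d in range(dim)]


def apply_bias(means: Sequence[Sequence[float]], bias: Sequence[float]) -> List[Vector]:
    """Shift every Gaussian mean by the same bias vector."""
    return [[m[d] + bias[d] for d in range(len(bias))] for m in means]


# Toy speaker-independent model with two Gaussians.
si_means = [[0.0, 0.0], [10.0, 10.0]]
# Two adaptation frames, both aligned to Gaussian 0; Gaussian 1 is unseen.
frames = [[1.0, 1.0], [1.0, 1.0]]
assignments = [0, 0]

# MAP: only the observed Gaussian moves.
map_means = [
    map_adapt_mean(si_means[0], frames, [1.0, 1.0], tau=2.0),
    map_adapt_mean(si_means[1], [], [], tau=2.0),
]

# Shared transform: every Gaussian moves, including the unseen one.
shared_means = apply_bias(si_means, estimate_shared_bias(si_means, frames, assignments))

print(map_means)
print(shared_means)
```

With so little data, the MAP update leaves the unseen Gaussian where it was, while the shared bias moves both. This is one common explanation for why transformation-based methods respond faster to small amounts of adaptation data.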
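The study first evaluates each adaptation algorithm offline, then tests speaker adaptation online. The online tests also include a simple method for verifying the adaptation data.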
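The abstract does not say how the adaptation data are verified, so the following is only a sketch of one common approach in unsupervised online adaptation. It assumes that an utterance is accepted for adaptation when the score gap between the best and second-best hypotheses is at least a threshold. The names `verify_by_margin`, `online_adapt`, `recognize`, `adapt`, and `min_margin` are hypothetical, and the toy scores are made up for illustration.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Scores = Dict[str, float]  # hypothesis label -> log-likelihood score


def verify_by_margin(scores: Scores, min_margin: float) -> Tuple[str, bool]:
    """Return the best label and whether it passes a score-margin check."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    best_label, best_score = ranked[0]
    if len(ranked) < 2:
        # Only one hypothesis: nothing to compare against, so accept it.
        return best_label, True
    runner_up_score = ranked[1][1]
    return best_label, (best_score - runner_up_score) >= min_margin


def online_adapt(utterances: Iterable[str],
                 recognize: Callable[[str], Scores],
                 adapt: Callable[[str, str], None],
                 min_margin: float = 5.0) -> List[str]:
    """Recognize each utterance and adapt only on verified results."""
    accepted = []
    for utterance in utterances:
        label, verified = verify_by_margin(recognize(utterance), min_margin)
        if verified:
            # The recognized label serves as the (unsupervised) transcription.
            adapt(utterance, label)
            accepted.append(label)
    return accepted


# Toy stand-ins for a recognizer and an adaptation routine.
toy_scores = {
    "utt1": {"yes": -100.0, "no": -120.0},  # clear winner, margin 20
    "utt2": {"yes": -100.0, "no": -101.0},  # ambiguous, margin 1
}
adapted_with: List[Tuple[str, str]] = []

accepted = online_adapt(
    ["utt1", "utt2"],
    recognize=toy_scores.__getitem__,
    adapt=lambda utterance, label: adapted_with.append((utterance, label)),
)
print(accepted)
print(adapted_with)
```

Rejecting ambiguous utterances trades adaptation speed for safety: fewer utterances update the models, but a misrecognized utterance is less likely to pull the models in the wrong direction.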

Keywords (Chinese)

★ speaker adaptation (語者調適)
★ hidden Markov model (隱藏式馬可夫模型)

Keywords (English)

★ HMM
★ speaker adaptation

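Table of Contents

Abstract
List of Figures
List of Tables
Chapter 1 Introduction
1.1 Background
1.2 Motivation
1.3 Research Direction and Objectives
1.4 Thesis Outline
Chapter 2 Speaker Adaptation Techniques
2.1 Bayesian Adaptation (MAP)
2.2 Bayesian Adaptation with a Fuzzy Controller
2.2.1 Overview of Fuzzy Theory
2.2.2 Modified Bayesian Adaptation with a Fuzzy Controller
2.3 Maximum Likelihood Linear Regression (MLLR)
2.3.1 MLLR Theory
2.3.2 Estimation of the MLLR Transformation Matrix for Gaussian Distributions
2.3.3 Derivation of Diagonal MLLR
2.4 Vector Field Smoothing (VFS)
2.5 Modified MLLR Adaptation with Weighting
2.6 Combined Use of the Algorithms
2.7 Estimating Transformation Parameters with the Maximum Likelihood Criterion
2.8 Estimating Transformation Parameters with the Maximum a Posteriori Criterion
2.9 Verification of Adaptation Data
Chapter 3 System Architecture
3.1 Experimental Environment
3.1.1 Equipment
3.1.2 System Settings
3.1.3 Training, Adaptation, and Test Data
3.2 Initial Models (Right-Context-Dependent Sub-syllable Models)
3.3 Composition and Arrangement of the Recognition Modules
3.4 Adaptation Experiment Framework
3.4.1 Initial Models for the Adaptation Experiments
3.4.2 Supervised Batch Adaptation Framework (SB)
3.4.3 Unsupervised Incremental Adaptation Framework
3.4.4 Online Unsupervised Incremental Adaptation Framework
Chapter 4 Implementation and Results
4.1 Speaker-Independent Results
4.2 MAP Self-Adaptation Experiments with a Fuzzy Controller
4.3 Weighted MLLR Adaptation Experiments
4.4 Weighted MLLR + VFS Adaptation Experiments
4.5 Adaptation Experiments Estimating Transformation Parameters with the ML Criterion
4.6 Adaptation Experiments Estimating Transformation Parameters with the MAP Criterion
4.7 Summary of Experimental Results
4.7.1 Adaptation Performance with Very Little Adaptation Data (One Sentence Only)
4.7.2(a) Incremental Adaptation Performance Comparison I
4.7.2(b) Incremental Adaptation Performance Comparison II
4.7.3 Comparison of Adaptation Experiments with Identical Adaptation and Test Content
4.7.4 Comparison of Rank Changes for Misrecognized Utterances under Adaptation
4.7.5 Comparison of Adaptation Time
4.7.6 Experimental Results on Adaptation Data Verification
4.7.7 Summary
4.8 Interfaces for Online Recognition and Adaptation
4.8.1 UI_1 Online Adaptation Interface
4.8.2 UI_2 Online Adaptation Interface
Chapter 5 Conclusions and Future Directions
5.1 Conclusions
5.2 Future Research Directions
References
Appendix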

Advisor

Yau-Tarng Juang (莊堯棠)

Approval Date

2001-6-1
