An Application of Empirical Mode Decomposition Method to Speech Recognition in Noisy Environment


Name

Wen-Jay Chen (陳文杰)

Department

Department of Electrical Engineering

Thesis Title

An Application of Empirical Mode Decomposition Method to Speech Recognition in Noisy Environment
(雜訊環境下經驗模態分解法於語音辨識之應用)

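This electronic thesis is authorized for immediate open access.
Electronic full texts that have reached their open-access date are licensed to users solely for personal, non-profit searching, reading, and printing for academic research purposes.
Please comply with the Copyright Act of the Republic of China. Do not reproduce, distribute, adapt, repost, or broadcast the work without authorization, to avoid violating the law.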

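Abstract (Chinese)

In this thesis, we apply the Empirical Mode Decomposition (EMD) method proposed by Dr. Norden E. Huang (黃鍔). EMD uses the time scales of a signal's internal variations to extract energy and frequency directly, decomposing the signal into a combination of several Intrinsic Mode Functions (IMFs). These basis functions capture the signal's characteristics at different scales and can express its physical properties. We apply the information in the basis to keyword spotting to improve the recognition rate in environments with low signal-to-noise ratio (SNR) and uniformly distributed white noise.
Using the first IMF extracted by EMD, we carry out a preliminary qualitative and quantitative analysis of noise removal for speech signals in noisy environments. Removing part of the noise from the speech signal during front-end processing improves the recognition rate and achieves an effect similar to speech enhancement. In addition, we use the first IMF to estimate the SNR of the signal and perform model switching, which selects the acoustic model used in the recognition stage. Experimental results show that this also improves the recognition rate in low-SNR environments.
Finally, we combine the two methods in the hope of improving the recognition rate further. The experiments show that we can correctly estimate which SNR interval the test data fall into and switch to the better acoustic model for that interval in the recognition stage, with a switching accuracy of up to 97.95%. With this approach, the relative improvement reaches 56.25% at SNR = 0 dB and 27.56% at SNR = 10 dB.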

Abstract (English)

In this thesis, we study Dr. Huang's Empirical Mode Decomposition (EMD) method, which uses the time scales of a signal's internal variations to resolve the signal into a combination of several Intrinsic Mode Functions (IMFs). The IMFs capture characteristics of the signal at different scales and can express its physical properties. We apply the information in the first IMF to keyword spotting and find that it improves the recognition rate at different SNRs in uniformly distributed white noise environments.
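The decomposition described above can be illustrated with a minimal sifting sketch. It is not the thesis's implementation: the fixed number of sifting passes (`n_sift`), the cap on the number of IMFs (`max_imfs`), and the way the envelopes are pinned at the signal ends are all assumptions made here for brevity.

```python
import numpy as np
from scipy.interpolate import CubicSpline
from scipy.signal import argrelextrema


def sift_once(x, t):
    """One sifting pass: subtract the mean of the upper and lower envelopes.

    Returns None when x has too few interior extrema to build envelopes.
    """
    max_idx = argrelextrema(x, np.greater)[0]
    min_idx = argrelextrema(x, np.less)[0]
    if len(max_idx) < 2 or len(min_idx) < 2:
        return None
    # Assumed boundary treatment: pin both envelopes to the end samples.
    max_idx = np.concatenate(([0], max_idx, [len(x) - 1]))
    min_idx = np.concatenate(([0], min_idx, [len(x) - 1]))
    upper = CubicSpline(t[max_idx], x[max_idx])(t)
    lower = CubicSpline(t[min_idx], x[min_idx])(t)
    return x - 0.5 * (upper + lower)


def emd(x, max_imfs=8, n_sift=10):
    """Decompose x into IMFs plus a residue using a fixed sifting count."""
    residue = np.asarray(x, dtype=float).copy()
    t = np.arange(len(residue), dtype=float)
    imfs = []
    for _ in range(max_imfs):
        if sift_once(residue, t) is None:
            break  # residue has too few extrema to extract another IMF
        h = residue.copy()
        for _ in range(n_sift):
            h_next = sift_once(h, t)
            if h_next is None:
                break
            h = h_next
        imfs.append(h)
        residue = residue - h
    return np.array(imfs), residue
```

By construction the IMFs and the residue sum back to the input signal. The first IMF, `imfs[0]`, carries the fastest oscillations, which is the component the abstract refers to.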
We apply the EMD method to speech signals and perform noise reduction in the front-end processing, based on a preliminary qualitative and quantitative analysis of the first IMF. This method improves the recognition rate in noisy conditions and gives results similar to speech enhancement. In addition, we use the information in the first IMF to estimate the SNR of a speech signal and switch the system to the better acoustic model in the recognition stage. Experimental results show that this improves the recognition rate in low-SNR environments.
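The two uses of the first IMF described above can be sketched as follows. The abstract does not give the actual formulas, so everything here is an assumption for illustration: the first IMF's power is treated as the noise power, `alpha` is a hypothetical noise-reduction index, and the model names in the usage note are placeholders.

```python
import numpy as np


def reduce_noise(signal, first_imf, alpha=1.0):
    """Front-end sketch: subtract a scaled copy of the first IMF.

    alpha is a hypothetical noise-reduction index, not taken from the thesis.
    """
    signal = np.asarray(signal, dtype=float)
    first_imf = np.asarray(first_imf, dtype=float)
    return signal - alpha * first_imf


def estimate_snr_db(signal, first_imf, eps=1e-12):
    """Estimate SNR in dB, assuming the first IMF's power is the noise power."""
    total_power = np.mean(np.square(signal))
    noise_power = max(np.mean(np.square(first_imf)), eps)
    speech_power = max(total_power - noise_power, eps)
    return 10.0 * np.log10(speech_power / noise_power)


def select_model(snr_db, models):
    """Model switching: pick the model whose training SNR (dict key) is nearest."""
    nearest = min(models, key=lambda trained_snr: abs(trained_snr - snr_db))
    return models[nearest]
```

For example, with a hypothetical table `{0: "hmm_0db", 10: "hmm_10db", 20: "hmm_20db"}`, an estimate of about 10 dB selects `"hmm_10db"`. A nearest-key lookup is the simplest switching rule. The thesis speaks of SNR intervals, so explicit interval boundaries would be an equally plausible design.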
Finally, the two methods are combined to improve the system's recognition rate further. The results show that we can correctly estimate the SNR condition of the test material and switch the system to the better acoustic model in the recognition stage. In this way, the switching accuracy reaches up to 97.95%, and the relative improvement reaches 56.25% and 27.56% at SNR = 0 dB and SNR = 10 dB, respectively.

Keywords (Chinese)

★ 語音辨識 (Speech Recognition)
★ 經驗模態分解法 (Empirical Mode Decomposition)
★ 白雜訊 (White Noise)

Keywords (English)

★ Speech Recognition
★ Empirical Mode Decomposition
★ White Noise

Table of Contents

Abstract (Chinese)
Abstract (English)
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
1.1 Research Motivation
1.2 Overview of Keyword Spotting
1.3 Research Objectives
1.4 Chapter Overview
Chapter 2 Basic Techniques of Speech Recognition
2.1 Feature Parameter Extraction
2.2 Training and Construction of Sub-syllable Models
2.2.1 Acoustic Models
2.2.2 State Alignment
2.2.3 Viterbi Algorithm
2.2.4 Training Procedure
2.3 Speech Recognition
2.3.1 Continuous Speech Recognition Model
2.3.2 Recognition Procedure
Chapter 3 Empirical Mode Decomposition
3.1 Instantaneous Frequency
3.2 Intrinsic Mode Functions
3.3 Empirical Mode Decomposition Method
3.4 Application of Empirical Mode Decomposition to Speech Recognition
Chapter 4 Experiments and Discussion
4.1 Experimental Environment
4.2 Front-End Processing Experiments and Discussion
4.2.1 Recognition Rate of Conventional Speech Recognition in Noisy Environments
4.2.2 Entropy Analysis of Speech Signals in Noisy Environments
4.2.3 Qualitative Analysis of the Effect of the Global Noise Reduction Index on Recognition Rate
4.2.4 Analysis of the Effect of the Adaptive Noise Reduction Index on Recognition Rate
4.3 Model Switching Experiments and Discussion
4.3.1 Analysis of the Effect of the Number of Model Mixtures on Recognition Rate under Different SNR Conditions
4.3.2 Combining Front-End Processing and Model Switching
Chapter 5 Conclusions and Future Work
5.1 Conclusions
5.2 Future Work
References


Advisor

Yau-Tarng Juang (莊堯棠)

Approval Date

2006-7-5