Improved Mel-scale Frequency Cepstral Coefficients for Keyword Spotting Technique

Name

Yo-zhen Kuo (郭又禎)

Department

Department of Electrical Engineering

Thesis Title

Improved Mel-scale Frequency Cepstral Coefficients for Keyword Spotting Technique
(改良式梅爾倒頻譜參數應用於關鍵字萃取)

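Related Theses

★ A Study of Miniaturized GSM/GPRS Mobile Communication Modules	★ A Study of Speaker Identification
★ Robustness Analysis of Perturbed Singular Systems Using the Projection Method	★ Speaker Verification with Improved Alternative-Hypothesis Characteristic Functions Using Support Vector Machine Models
★ Speaker Verification Combining Gaussian Mixture Supervectors and Derivative Kernel Functions	★ An Agile-Movement Particle Swarm Optimization Method
★ Application of an Improved Particle Swarm Method to Lossless Image Predictive Coding	★ A Study of Particle Swarm Optimization Applied to Speaker Model Training and Adaptation
★ A Speaker Verification System Using Particle Swarm Optimization	★ A Study of Improved Mel-Frequency Cepstral Coefficients Combined with Multiple Speech Features
★ A Speaker Verification System Using Speaker-Specific Background Models	★ An Intelligent Remote Monitoring System
★ Stability Analysis and Controller Design for Output Feedback of Positive Systems	★ A Hybrid Interval-Search Particle Swarm Optimization Algorithm
★ A Study of Gesture Recognition Based on Deep Neural Networks	★ A Posture-Correcting Necklace with Image-Recognition Auto-Calibration and a Mobile-Phone Alert System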

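This electronic thesis is authorized for immediate open access.
For electronic full texts that have reached their open-access date, users are licensed only to search, read, and print them for personal, non-profit academic research.
Please comply with the Copyright Act of the Republic of China; do not reproduce, distribute, adapt, repost, or broadcast the work without authorization, to avoid infringement.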

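Abstract (translated from Chinese)

In speech recognition systems, Mel-frequency cepstral coefficients (MFCCs) are commonly used feature parameters. As MFCCs have come into wide use, many methods for improving them have been proposed. This thesis adjusts the weights of the triangular band-pass filter bank energies, using particle swarm optimization to search for the best filter-bank weights. The fitness function is the difference between the energy statistics curve of the speech corpus and the envelope curve of the filter bank, so that the filter bank better matches the sensitivity of the human ear and recognition improves. Experimental results show that the improved MFCCs achieve better recognition than conventional MFCCs and are also more robust to high-frequency noise.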

Abstract (English)

In speech recognition systems, Mel-frequency cepstral coefficients (MFCCs) are widely used feature parameters. Because MFCCs are applied so widely in audio signal processing, many studies on improving them have been presented. In this study, we use the particle swarm optimization (PSO) algorithm to optimize the weights of the MFCC filter bank. The fitness function is the difference between the energy statistical curve of the speech training database and the envelope of the MFCC filter bank. Experimental results show that the proposed MFCC method improves the recognition rate. In noisy-environment experiments, the proposed method also improves recognition performance.
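
The idea summarized in the abstract can be illustrated with a minimal sketch: build a triangular mel filter bank, scale each filter by a weight, and let a basic PSO search for the weights whose envelope best matches a target curve. This is not the thesis implementation. The mel formula, the pointwise-maximum envelope, the mean-squared-error fitness, the PSO constants, and the synthetic `target` curve (a stand-in for the corpus energy statistics, which the record does not give) are all assumptions made for illustration.

```python
import numpy as np

def hz_to_mel(f_hz):
    # A common mel-scale formula; the thesis may use a different variant.
    return 2595.0 * np.log10(1.0 + f_hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular band-pass filters with edges evenly spaced on the mel scale."""
    n_bins = n_fft // 2 + 1
    freqs = np.linspace(0.0, sample_rate / 2.0, n_bins)
    mel_edges = np.linspace(0.0, hz_to_mel(sample_rate / 2.0), n_filters + 2)
    edges = mel_to_hz(mel_edges)
    bank = np.zeros((n_filters, n_bins))
    for i in range(n_filters):
        lo, mid, hi = edges[i], edges[i + 1], edges[i + 2]
        rising = (freqs - lo) / (mid - lo)
        falling = (hi - freqs) / (hi - mid)
        bank[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank

def envelope(bank, weights):
    # Assumption: the envelope is the pointwise maximum of the weighted filters.
    return (weights[:, None] * bank).max(axis=0)

def pso(fitness, dim, n_particles=30, n_iter=200, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Basic inertia-weight PSO minimizing `fitness` over the box [0, 1]^dim."""
    rng = np.random.default_rng(seed)
    pos = rng.uniform(0.0, 1.0, (n_particles, dim))
    vel = np.zeros_like(pos)
    pbest = pos.copy()
    pbest_val = np.array([fitness(p) for p in pos])
    gbest = pbest[pbest_val.argmin()].copy()
    gbest_val = pbest_val.min()
    for _ in range(n_iter):
        r1 = rng.random(pos.shape)
        r2 = rng.random(pos.shape)
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = np.clip(pos + vel, 0.0, 1.0)  # keep weights inside [0, 1]
        vals = np.array([fitness(p) for p in pos])
        better = vals < pbest_val
        pbest[better] = pos[better]
        pbest_val[better] = vals[better]
        if pbest_val.min() < gbest_val:
            gbest_val = pbest_val.min()
            gbest = pbest[pbest_val.argmin()].copy()
    return gbest, gbest_val

bank = mel_filterbank(n_filters=10, n_fft=512, sample_rate=16000)

# Hypothetical target: NOT real corpus statistics, just a curve that
# de-emphasizes high frequencies so the search has something to fit.
target = envelope(bank, np.linspace(1.0, 0.3, 10))

def fitness(weights):
    # Assumption: "difference between the curves" measured as mean squared error.
    return float(np.mean((envelope(bank, weights) - target) ** 2))

best_w, best_val = pso(fitness, dim=bank.shape[0])
```

In a real experiment, `target` would be replaced by an energy curve measured from training speech, and the weighted bank `best_w[:, None] * bank` would be applied to each frame's power spectrum before the logarithm and discrete cosine transform of the usual MFCC pipeline.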

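Keywords (translated from Chinese)

★ Mel-frequency cepstral coefficients
★ Particle swarm optimization
★ Keyword spotting

Keywords (English)

★ MFCC
★ PSO
★ keyword spotting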

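Table of Contents

Abstract (Chinese)
Abstract (English)
Acknowledgements
Table of Contents
List of Figures
List of Tables
List of Appendices
Chapter 1 Introduction
1-1 Research Motivation
1-2 Literature Review
1-3 Thesis Organization
Chapter 2 Speech Recognition
2-1 Preprocessing
2-2 Feature Extraction
2-3 Hidden Markov Models
2-4 Acoustic Models and Model Training
2-4-1 Acoustic Models
2-4-2 Model Training and Parameter Re-estimation
2-5 Keyword Spotting
2-5-1 Keyword Spotting Architecture
2-5-2 Keyword Recognition Procedure
2-5-3 One-Stage Dynamic Programming Algorithm
Chapter 3 Particle Swarm Optimization Applied to the Filter Bank
3-1 Particle Swarm Optimization
3-2 Mel Filter Bank Weights
3-2-1 Masking Effect
3-2-2 Variable Settings and Fitness Function
3-2-3 Adjusting the Mel Filter Bank Weights
Chapter 4 Experimental Results
4-1 Experimental Environment
4-1 Effect of the Number of Mixtures on Recognition Rate
4-2 Experiments on Adjusting Mel Filter Weights
4-2-1 Adjusting the Triangular Band-Pass Filter Weights
4-2-2 Adjusting Both the Center Frequencies and the Weights of the Triangular Band-Pass Filters
4-3 Noisy-Environment Experiments
Chapter 5 Conclusions and Future Work
5-1 Conclusions
5-2 Future Work
References
Appendix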

Advisor

Yau-Tarng Juang (莊堯棠)

Approval Date

2014-7-15
