Improved Mel Frequency Cepstral Coefficients Combined with Multiple Speech Features


Name

唐曲亮 (Chu-Liang Tang)

Department

Department of Electrical Engineering

Thesis Title

改良式梅爾倒頻譜係數混合多種語音特徵之研究
(Improved Mel Frequency Cepstral Coefficients Combined with Multiple Speech Features)

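Related Theses

★ A Study of Miniaturized GSM/GPRS Mobile Communication Modules
★ A Study of Speaker Recognition
★ Robustness Analysis of Perturbed Singular Systems Using the Projection Method
★ Speaker Verification Using Support Vector Machine Models to Improve the Alternative-Hypothesis Characteristic Function
★ Speaker Verification Combining Gaussian Mixture Supervectors and Derivative Kernel Functions
★ An Agile-Movement Particle Swarm Optimization Method
★ An Improved Particle Swarm Method Applied to Lossless Image Predictive Coding
★ Particle Swarm Optimization Applied to Speaker Model Training and Adaptation
★ A Speaker Verification System Based on Particle Swarm Optimization
★ A Speaker Verification System Using Speaker-Specific Background Models
★ An Intelligent Remote Monitoring System
★ Stability Analysis and Controller Design for Output Feedback of Positive Systems
★ A Hybrid Interval-Search Particle Swarm Optimization Algorithm
★ Gesture Recognition Based on Deep Neural Networks
★ A Posture-Correction Necklace with Image-Recognition Auto-Calibration and a Smartphone Warning System
★ An Unsupervised Fast Speaker Adaptation Algorithm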


Full text: never to be made publicly available

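Abstract (Chinese)

This thesis focuses on two parts of a speech recognition system: feature extraction and feature compensation. The goal of the first part is to combine different speech features; among the combinations tested, concatenating linear prediction cepstral coefficients (LPCC) with Mel-frequency cepstral coefficients (MFCC) performs best. The thesis replaces the triangular filter bank of standard MFCC with a Gaussian Mel filter bank, and experiments show that combining LPCC and MFCC in a 1:1 ratio gives the best results. In addition to feature combination, the thesis applies cepstral mean and variance normalization (CMVN) to compensate the cepstral coefficients, which improves the overall recognition performance of the system.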
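The abstract states that the triangular filter bank of standard MFCC is replaced by a Gaussian Mel filter bank, but this record does not give the Gaussian's width or placement. The sketch below is a minimal Python illustration under stated assumptions, not the thesis's implementation: it places filter centers uniformly on the Mel scale using the common 2595·log10(1 + f/700) mapping, and it uses a hypothetical width σ = (high edge − low edge)/4, so that ±2σ covers the same band as the corresponding triangle. The function names `mel_bank` and `mfcc_frame` are illustrative only.

```python
import math
from typing import List, Optional, Sequence

Bank = List[List[float]]


def hz_to_mel(f: float) -> float:
    # Widely used Mel-scale mapping: 2595 * log10(1 + f / 700)
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m: float) -> float:
    # Inverse of hz_to_mel
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_points(n_filters: int, f_min: float, f_max: float) -> List[float]:
    # n_filters + 2 points equally spaced on the Mel scale, mapped back to Hz;
    # filter j uses points j-1 (low edge), j (center) and j+1 (high edge)
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (hi - lo) / (n_filters + 1)
    return [mel_to_hz(lo + i * step) for i in range(n_filters + 2)]


def mel_bank(n_filters: int, n_fft: int, sample_rate: float,
             shape: str = "triangular", f_min: float = 0.0,
             f_max: Optional[float] = None) -> Bank:
    """Return n_filters rows of weights over the n_fft // 2 + 1 FFT bins."""
    f_max = sample_rate / 2.0 if f_max is None else f_max
    pts = mel_points(n_filters, f_min, f_max)
    freqs = [k * sample_rate / n_fft for k in range(n_fft // 2 + 1)]
    bank: Bank = []
    for j in range(1, n_filters + 1):
        lo, c, hi = pts[j - 1], pts[j], pts[j + 1]
        if shape == "triangular":
            # Standard MFCC filter: rises from lo to c, falls from c to hi, zero outside
            row = [max(0.0, min((f - lo) / (c - lo), (hi - f) / (hi - c))) for f in freqs]
        elif shape == "gaussian":
            # Hypothetical width: +/-2 sigma spans the same band as the triangle
            sigma = (hi - lo) / 4.0
            row = [math.exp(-0.5 * ((f - c) / sigma) ** 2) for f in freqs]
        else:
            raise ValueError(f"unknown filter shape: {shape!r}")
        bank.append(row)
    return bank


def mfcc_frame(power_spectrum: Sequence[float], bank: Bank, n_ceps: int = 13) -> List[float]:
    """Log filter-bank energies followed by a DCT-II; coefficient 0 is included."""
    # Floor the energies so that log() never sees zero
    energies = [math.log(max(sum(w * p for w, p in zip(row, power_spectrum)), 1e-10))
                for row in bank]
    m = len(energies)
    return [sum(e * math.cos(math.pi * k * (i + 0.5) / m) for i, e in enumerate(energies))
            for k in range(n_ceps)]
```

In this sketch, switching `shape="triangular"` to `shape="gaussian"` is the only change; the log-energy and DCT steps of the MFCC pipeline stay the same. The Gaussian weights taper smoothly instead of stopping at hard band edges, so each filter also collects some energy from neighboring bands.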

Abstract (English)

This thesis studies speech feature extraction and feature compensation in speech recognition. Several speech features are selected for combination, and the best combination is the concatenation of Linear Prediction Cepstral Coefficients (LPCC) and Mel-Frequency Cepstral Coefficients (MFCC). The MFCCs used here are obtained with a Gaussian Mel-frequency filter bank instead of a triangular filter bank. Experiments show that the best combination ratio of LPCC to MFCC is 1:1. The thesis also shows that performance improves further when Cepstral Mean and Variance Normalization (CMVN) is added.
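The abstract reports that concatenating LPCC and MFCC in a 1:1 ratio and then applying CMVN gives the best results, but it does not say how the 1:1 ratio is realized. The minimal Python sketch below assumes, consistent with the "dimension allocation" experiment listed in the table of contents, that each stream contributes the same number of coefficients per frame (the hypothetical parameter `dims_each`). It computes LPCC with the textbook chain of autocorrelation, Levinson-Durbin recursion and LPC-to-cepstrum recursion, concatenates the two streams, and applies utterance-level CMVN. The helper names and the use of utterance-level statistics are assumptions, and input frames are assumed to be already pre-emphasized and windowed.

```python
import math
from typing import List, Sequence

Frame = List[float]


def autocorrelation(frame: Sequence[float], order: int) -> List[float]:
    # r[lag] = sum_n s[n] * s[n + lag] for lag = 0..order
    n = len(frame)
    return [sum(frame[i] * frame[i + lag] for i in range(n - lag)) for lag in range(order + 1)]


def levinson_durbin(r: Sequence[float], order: int) -> List[float]:
    """Predictor coefficients a[1..order] with s[n] ~ sum_k a[k] * s[n - k]; a[0] is unused."""
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:
            break  # silent or degenerate frame: remaining coefficients stay zero
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        prev = a[:]
        a[i] = k
        for j in range(1, i):
            a[j] = prev[j] - k * prev[i - j]
        err *= 1.0 - k * k
    return a


def lpc_to_cepstrum(a: Sequence[float], n_ceps: int) -> List[float]:
    """Textbook LPC-to-cepstrum recursion; returns c[1..n_ceps] (gain term c[0] omitted)."""
    p = len(a) - 1
    c = [0.0] * (n_ceps + 1)
    for n in range(1, n_ceps + 1):
        acc = a[n] if n <= p else 0.0
        for k in range(max(1, n - p), n):
            acc += (k / n) * c[k] * a[n - k]
        c[n] = acc
    return c[1:]


def lpcc_frame(frame: Sequence[float], order: int = 12, n_ceps: int = 12) -> List[float]:
    # LPCC for one frame (frame assumed pre-emphasized and windowed)
    return lpc_to_cepstrum(levinson_durbin(autocorrelation(frame, order), order), n_ceps)


def combine_1_to_1(lpcc: List[Frame], mfcc: List[Frame], dims_each: int) -> List[Frame]:
    """Concatenate the first dims_each coefficients of each stream, frame by frame."""
    if len(lpcc) != len(mfcc):
        raise ValueError("LPCC and MFCC streams must have the same number of frames")
    return [list(x[:dims_each]) + list(y[:dims_each]) for x, y in zip(lpcc, mfcc)]


def cmvn(frames: List[Frame], eps: float = 1e-8) -> List[Frame]:
    """Utterance-level CMVN: each dimension shifted to zero mean and scaled to unit variance."""
    n, dim = len(frames), len(frames[0])
    means = [sum(f[d] for f in frames) / n for d in range(dim)]
    stds = [math.sqrt(sum((f[d] - means[d]) ** 2 for f in frames) / n) for d in range(dim)]
    return [[(f[d] - means[d]) / (stds[d] + eps) for d in range(dim)] for f in frames]
```

Because this CMVN normalizes each dimension independently, applying it before or after concatenation gives the same result. Segmental or running-window CMVN variants would change only how `means` and `stds` are estimated.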

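Keywords (Chinese)

★ 語音辨識 (speech recognition)
★ 特徵合併 (feature combination)
★ 梅爾倒頻譜係數 (Mel-frequency cepstral coefficients)
★ 關鍵詞萃取 (keyword spotting)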

Keywords (English)

★ speech recognition
★ feature combination
★ MFCC
★ keyword spotting

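Table of Contents

Abstract (Chinese)
Abstract (English)
Acknowledgments
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
1.1 Research Motivation
1.2 Research Objectives
1.3 Literature Review
1.4 Chapter Overview
Chapter 2 Speech Signal Processing
2.1 Feature Parameter Extraction [17]
2.2 Hidden Markov Models
2.3 Acoustic Models
2.4 Model Training
2.4.1 State Arrangement
2.4.2 Initial Model
2.4.3 Viterbi Algorithm
2.4.4 Parameter Adaptation and Probability Estimation
Chapter 3 Improvement and Combination of Feature Parameters
3.1 LPCC
3.2 MFCC
3.3 PLPCC
3.4 Various Mel Filter Banks
3.5 Methods for Combining Feature Parameters
Chapter 4 Keyword Spotting
4.1 Keyword Spotting Architecture
4.2 Recognition Procedure
Chapter 5 Experimental Results
5.1 Experimental Setup
5.2 Feature Compensation Experiments
5.3 Single-Feature Experiments
5.4 Combined-Feature Experiments
5.5 Dimension Allocation Experiments
Chapter 6 Conclusions and Future Work
6.1 Conclusions
6.2 Future Work
References
Appendix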

References

[1] J. P. Campbell, Jr., “Speaker recognition: a tutorial,” Proceedings of the IEEE, vol. 85, no. 9, pp. 1437-1462, 1997.
[2] 呂易宸, “Voice-based access control system (語音門禁系統),” M.S. thesis, National Central University, Taoyuan, 2011.
[3] B. H. Juang and S. Furui, “Automatic recognition and understanding of spoken language - a first step toward natural human-machine communication,” Proceedings of the IEEE, vol. 88, no. 8, pp. 1142-1165, 2000.
[4] 林品宏, “Keyword spotting system and its application to a voice-controlled car (關鍵詞萃取系統及語音聲控車之應用),” M.S. thesis, National Central University, Taoyuan, 2012.
[5] J. Bradbury, “Linear predictive coding,” online PDF, pp. 1-23, 2000. Available: http://my.fit.edu/~vkepuska/ece5525/lpc_paper.pdf
[6] R. Vergin, D. O'Shaughnessy and A. Farhat, “Generalized mel frequency cepstral coefficients for large-vocabulary speaker-independent continuous-speech recognition,” IEEE Transactions on Speech and Audio Processing, vol. 7, no. 5, pp. 525-532, 1999.
[7] B. J. Shannon and K. K. Paliwal, “Feature extraction from higher-lag autocorrelation coefficients for robust speech recognition,” Speech Communication, vol. 48, pp. 1458-1485, 2006.
[8] J. Wu and J. Yu, “An improved arithmetic of MFCC in speech recognition system,” International Conference on Electronics, Communications and Control (ICECC), pp. 719-722, 2011.
[9] J. G. Wilpon, L. Rabiner, C. H. Lee and E. R. Goldman, “Automatic recognition of keywords in unconstrained speech using hidden Markov models,” IEEE Transactions on Acoustics, Speech and Signal Processing, vol. 38, no. 11, pp. 1870-1878, 1990.
[10] L. Rabiner, “A tutorial on hidden Markov models and selected applications in speech recognition,” Proceedings of the IEEE, vol. 77, no. 2, pp. 257-286, 1989.
[11] R. P. Lippmann, “An introduction to computing with neural nets,” IEEE ASSP Magazine, vol. 4, no. 2, pp. 4-22, 1987.
[12] A. E. Rosenberg, C. H. Lee and F. K. Soong, “Cepstral channel normalization techniques for HMM-based speaker verification,” International Conference on Spoken Language Processing (ICSLP), pp. 1835-1838, 1994.
[13] O. Viikki and K. Laurila, “Cepstral domain segmental feature vector normalization for noise robust speech recognition,” Speech Communication, vol. 25, pp. 133-147, 1998.
[14] C. W. Hsu and L. S. Lee, “Higher order cepstral moment normalization for improved robust speech recognition,” IEEE Transactions on Audio, Speech, and Language Processing, vol. 17, no. 2, pp. 205-220, 2009.
[15] N. V. Prasad and S. Umesh, “Improved cepstral mean and variance normalization using Bayesian framework,” IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU), pp. 156-161, 2013.
[16] F. Hilger and H. Ney, “Quantile based histogram equalization for noise robust large vocabulary speech recognition,” IEEE Transactions on Audio, Speech, and Language Processing, vol. 14, no. 3, pp. 845-854, 2006.
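[17] 王小川, Speech Signal Processing (語音訊號處理), revised 2nd ed., Taipei: Chuan Hwa Book Co. (全華圖書股份有限公司), 2009.
[18] 杜文祥, “A study of combined cepstral statistics normalization for robust speech recognition (組合式倒頻譜統計正規化法於強健性語音辨識之研究),” National Chi Nan University, Nantou, 2009.
[19] 何冠旻, “A study of hybrid cepstral statistics normalization techniques for robust speech recognition (併合式倒頻譜統計正規化技術於強健性語音辨識之研究),” National Chi Nan University, Nantou, 2009.
[20] R. C. Rose and D. B. Paul, “A hidden Markov model based keyword recognition system,” IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pp. 129-132, 1990.
[21] 黃國彰, “A study of keyword spotting and verification (關鍵詞萃取與確認之研究),” M.S. thesis, National Central University, Taoyuan, 1996.
[22] 蔡炎興, “Design and implementation of a keyword spotting and speaker recognition system (關鍵詞萃取及語者辨識系統之研製),” M.S. thesis, National Central University, Taoyuan, 2003.
[23] Mandarin Phonetics (國音學), Taipei: Mandarin Phonetics Editorial Committee, National Taiwan Normal University, 2001.
[24] Big5 character encoding (大五碼), Taipei: Institute for Information Industry, 1983.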
[25] R. W. Schafer and L. R. Rabiner, “Digital representations of speech signals,” Proceedings of the IEEE, vol. 63, no. 4, pp. 662-677, 1975.
[26] R. M. Nickel, “Automatic speech character identification,” IEEE Circuits and Systems Magazine, vol. 6, no. 4, pp. 10-31, 2006.
[27] H. Hermansky, “Perceptual linear predictive analysis of speech,” J. Acoust. Soc. Am., vol. 87, no. 4, pp. 1738-1752, 1990.
[28] X. Zhu, Y. Chen, J. Liu and R. Liu, “Feature selection in Mandarin large vocabulary continuous speech recognition,” IEEE International Conference on Signal Processing, vol. 1, pp. 508-511, 2002.
[29] 張志豪, “Robust and discriminative speech feature extraction techniques for large vocabulary continuous speech recognition (強健性和鑑別力語音特徵擷取技術於大詞彙連續語音辨識之研究),” M.S. thesis, National Taiwan Normal University, Taipei, 2005.
[30] 謝宗學, “Robust speech recognition using feature statistics compensation in additive noise environments (加成性雜訊環境下運用特徵參數統計補償法於強健性語音辨識),” M.S. thesis, National Chi Nan University, Nantou, 2006.
[31] 林銘駿, “Measurement and control strategies for environmental low-frequency noise (環境中低頻噪音之量測及管制策略研究),” M.S. thesis, National Central University, Taoyuan, 2008.
[32] 許時懷, “Improved filters for speech feature extraction (語音特徵值擷取濾波器之改良),” M.S. thesis, National Central University, Taoyuan, 2015.
[33] 張智傑, “Combination of multiple speech features and its application on smartphones (多種語音特徵的合併及其在智慧型手機上之應用),” M.S. thesis, National Central University, Taoyuan, 2014.
[34] J. Junkawitsch, L. Neubauer, H. Höge and G. Ruske, “A new keyword spotting algorithm with pre-calculated optimal thresholds,” Proceedings of the Fourth International Conference on Spoken Language Processing (ICSLP), vol. 4, pp. 2067-2070, 1996.
[35] M. W. Koo, C. H. Lee and B. H. Juang, “Speech recognition and utterance verification based on a generalized confidence score,” IEEE Transactions on Speech and Audio Processing, vol. 9, no. 8, pp. 821-832, 2001.
[36] H. Ney, “The use of a one-stage dynamic programming algorithm for connected word recognition,” IEEE Transactions on Acoustics, Speech and Signal Processing, vol. 32, no. 2, pp. 263-271, 1984.
[37] 郭又偵, “Improved Mel-frequency cepstral parameters applied to keyword spotting (改良式梅爾倒頻譜參數應用於關鍵字萃取),” M.S. thesis, National Central University, Taoyuan, 2014.

Advisor

莊堯棠 (Yau-tarng Juang)

Date of Approval

2015-07-13
