Improved Filter-bank of Speech Feature Coefficient Extraction (語音特徵參數擷取之濾波器改良)

Name

Shih-huai Hsu (許時懷)

Department

Department of Electrical Engineering

Thesis Title

Improved Filter-bank of Speech Feature Coefficient Extraction
(語音特徵參數擷取之濾波器改良)

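This electronic thesis is authorized for immediate open access.
The open-access electronic full text is licensed to users solely for personal, non-profit searching, reading, and printing for the purpose of academic research.
Please comply with the Copyright Act of the Republic of China. Do not reproduce, distribute, adapt, repost, or broadcast this work without authorization, to avoid violating the law.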

Abstract

This thesis improves the feature extraction stage of a speech keyword recognition system. Within the overall keyword recognition framework, feature extraction serves to highlight the distinguishing characteristics of each speech segment while also reducing the amount of data. Many researchers have proposed different ways to extract speech features, or have improved on existing extraction methods.
　　This thesis compares several modified filter banks for mel-frequency cepstral coefficients (MFCC) and replaces the original mel triangular filter bank with the best-performing one. Experimental results show that the modified filter bank raises the recognition rate of the keyword spotting system, which demonstrates that it effectively strengthens the characteristics captured in the extracted speech features.
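
As a rough illustration of what replacing the triangular filter shape can look like, the following sketch builds mel-spaced filter banks with triangular, rectangular, trapezoidal, and Gaussian shapes (the shapes named in Chapter 3 of the table of contents). This record does not give the thesis's actual formulas or settings, so everything specific here is an assumption: the mel-scale mapping is one commonly used form, and the function names, the `flat` fraction of the trapezoid, the Gaussian width, and the default parameters are all hypothetical.

```python
import math

def hz_to_mel(f_hz):
    # One commonly used mel mapping; assumed here, not taken from the thesis.
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def band_edges(num_filters, f_low, f_high):
    # num_filters + 2 frequencies (Hz), equally spaced on the mel scale.
    lo, hi = hz_to_mel(f_low), hz_to_mel(f_high)
    step = (hi - lo) / (num_filters + 1)
    return [mel_to_hz(lo + i * step) for i in range(num_filters + 2)]

def triangle(f, left, center, right):
    # Conventional MFCC shape: linear rise to 1 at the center, linear fall.
    if f <= left or f >= right:
        return 0.0
    if f <= center:
        return (f - left) / (center - left)
    return (right - f) / (right - center)

def rectangle(f, left, center, right):
    # Uniform weight across the whole band.
    return 1.0 if left <= f <= right else 0.0

def trapezoid(f, left, center, right, flat=0.5):
    # Flat top spanning a `flat` fraction of each side (hypothetical, 0 <= flat < 1).
    top_lo = center - flat * (center - left)
    top_hi = center + flat * (right - center)
    if f <= left or f >= right:
        return 0.0
    if f < top_lo:
        return (f - left) / (top_lo - left)
    if f > top_hi:
        return (right - f) / (right - top_hi)
    return 1.0

def gaussian(f, left, center, right):
    # Bell curve centered on the band; the width choice is an assumption.
    sigma = (right - left) / 4.0
    return math.exp(-0.5 * ((f - center) / sigma) ** 2)

def filter_bank(shape, num_filters=20, fft_size=512, sample_rate=16000):
    # One row of weights per filter, one column per FFT bin from 0 Hz to Nyquist.
    edges = band_edges(num_filters, 0.0, sample_rate / 2.0)
    bin_hz = [k * sample_rate / fft_size for k in range(fft_size // 2 + 1)]
    return [[shape(f, edges[m], edges[m + 1], edges[m + 2]) for f in bin_hz]
            for m in range(num_filters)]

if __name__ == "__main__":
    for shape in (triangle, rectangle, trapezoid, gaussian):
        bank = filter_bank(shape)
        print(shape.__name__, len(bank), "filters x", len(bank[0]), "bins")
```

In an MFCC pipeline, each row of the chosen bank would weight the power spectrum of a frame before the logarithm and discrete cosine transform, so swapping `shape` changes only how energy inside each mel band is weighted.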

Keywords

★ mel filter bank (梅爾濾波器組)
★ speech feature (語音特徵)
★ keyword spotting (關鍵詞萃取)

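Table of Contents

Abstract (Chinese)
Abstract (English)
Acknowledgements
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
1.1 Research Motivation
1.2 Literature Review
1.3 Chapter Overview
Chapter 2 Speech Processing
2.1 Speech Feature Extraction
2.2 Feature Compensation
2.3 Hidden Markov Models
2.4 Acoustic Models
2.5 Model Training
Chapter 3 Mel Filter Bank Variants
3.1 Masking Effect
3.2 Conventional MFCC Triangular Filter Bank
3.3 Alternative Mel Filter Banks
3.3.1 Rectangular Filter Bank (Rectangle filter)
3.3.2 Trapezoidal Filter Bank (Trapezoid filter)
3.3.3 Gaussian Filter Bank (Gaussian filter)
Chapter 4 Keyword Spotting
4.1 Keyword Spotting System Architecture
4.2 One-Stage Dynamic Programming System
4.3 Keyword Recognition Procedure
Chapter 5 Experimental Results and Analysis
5.1 Experimental Environment
5.2 Experimental Results
Chapter 6 Conclusions and Future Work
6.1 Conclusions
6.2 Future Work
References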

Advisor

Yau-tarng Juang (莊堯棠)

Date of Approval

2015-7-17
