A Study of Particle Swarm Optimization Applied to the Mel Filterbank

Name

Chih-Chieh Kao (高志杰)

Department

Department of Electrical Engineering

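Thesis Title

A Study of Particle Swarm Optimization Applied to the Mel Filterbank
(original title: 粒子群演算法應用於梅爾濾波器組之研究; author's English title: PSO Algorithm for Mel-Filterbank)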

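Related Theses

★ A Study of a Miniaturized GSM/GPRS Mobile Communication Module
★ A Study of Speaker Recognition
★ Robustness Analysis of Perturbed Singular Systems Using the Projection Method
★ Speaker Verification Using Support Vector Machine Models to Improve the Alternative-Hypothesis Characterization Function
★ Speaker Verification Combining Gaussian Mixture Supervectors and Derivative Kernel Functions
★ An Agile-Movement Particle Swarm Optimization Method
★ Application of an Improved Particle Swarm Method to Lossless Predictive Image Coding
★ A Study of Particle Swarm Optimization Applied to Speaker Model Training and Adaptation
★ A Speaker Verification System Based on Particle Swarm Optimization
★ A Study of Improved Mel-Frequency Cepstral Coefficients Combined with Multiple Speech Features
★ A Speaker Verification System Using Speaker-Specific Background Models
★ An Intelligent Remote Monitoring System
★ Stability Analysis and Controller Design for Output Feedback of Positive Systems
★ A Hybrid Interval-Search Particle Swarm Optimization Algorithm
★ A Study of Gesture Recognition Based on Deep Neural Networks
★ A Posture-Correction Necklace with Image-Recognition Auto-Calibration and a Mobile-Phone Alert System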

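This electronic thesis is authorized for immediate open access.
Electronic full texts that have reached their open-access date are licensed to users only for personal, non-profit searching, reading, and printing for the purpose of academic research.
Please comply with the Copyright Act of the Republic of China. Do not reproduce, distribute, adapt, repost, or broadcast the work without authorization, to avoid breaking the law.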

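Abstract (translated from Chinese)

This thesis studies the mel filterbank used in mel-frequency cepstral coefficient (MFCC) feature extraction. Particle swarm optimization (PSO) is used to optimize the center and edge frequencies of the filterbank. Instead of the usual choice of recognition rate as the fitness function, the proposed method uses the similarity between a statistical curve and the envelope of the filterbank. Based on the characteristics of speech signals in the energy spectrum, an energy statistics chart and an energy-difference statistics chart serve as the reference curves, yielding two optimized filterbanks. Both are evaluated on keyword recognition and under three common noise environments. The experimental results show that the method improves feature extraction, raises the recognition rate of the keyword spotting system, and provides noise robustness in specific environments.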
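For readers unfamiliar with the quantities being optimized, the following is a minimal sketch of a conventional triangular mel filterbank and one possible notion of its envelope. It is not taken from the thesis: the Hz-to-mel formula is the common textbook one, and the function names (`mel_points`, `triangle`, `filterbank_envelope`) and the definition of the envelope as the pointwise maximum over all filters are assumptions made for illustration.

```python
import math


def hz_to_mel(f_hz):
    """Common textbook Hz-to-mel mapping (an assumption; variants exist)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_points(num_filters, f_low, f_high):
    """Return num_filters + 2 frequencies (Hz) equally spaced on the mel scale.

    Consecutive triples (left edge, center, right edge) define one filter.
    These are the values an optimizer could move away from the default layout.
    """
    m_low, m_high = hz_to_mel(f_low), hz_to_mel(f_high)
    step = (m_high - m_low) / (num_filters + 1)
    return [mel_to_hz(m_low + i * step) for i in range(num_filters + 2)]


def triangle(f, left, center, right):
    """Gain of one triangular filter at frequency f (peak 1.0 at center)."""
    if f <= left or f >= right:
        return 0.0
    if f <= center:
        return (f - left) / (center - left)
    return (right - f) / (right - center)


def filterbank_envelope(points, freqs):
    """Hypothetical envelope: the largest filter gain at each frequency."""
    filters = list(zip(points[:-2], points[1:-1], points[2:]))
    return [max(triangle(f, l, c, r) for l, c, r in filters) for f in freqs]


if __name__ == "__main__":
    pts = mel_points(num_filters=4, f_low=0.0, f_high=4000.0)
    print([round(p, 1) for p in pts])
```

In this layout the filters are narrow at low frequencies and widen toward high frequencies. An optimizer that adjusts the center and edge frequencies changes the shape of the envelope, which is what a curve-similarity fitness function would compare against a reference curve.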

Abstract (English)

This thesis studies the filterbank used in mel-frequency cepstral coefficient (MFCC) feature extraction. We propose using particle swarm optimization (PSO) to optimize the parameters of the MFCC filterbank, namely the center and edge frequencies. The proposed PSO algorithm uses the similarity between a statistical curve and the filterbank's envelope as its fitness function. Based on the energy and energy-difference statistical charts, which reflect the characteristics of the speech signal in the energy spectrum, we obtain two optimized filterbanks. They are then tested on keyword recognition and in three noisy environments. The experimental results show that the proposed method improves the recognition rate of the keyword spotting system and its robustness in the tested noisy environments.
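The optimization loop described in the abstract can be illustrated with a generic PSO sketch. This is not the thesis implementation: the linearly decreasing inertia weight, the coefficient values, the position clamping, and the toy cost function `toy_cost` with its `TARGET` frequencies are all assumptions chosen for illustration. In the thesis the cost would instead measure the dissimilarity between a statistical curve and the filterbank envelope.

```python
import random

# Hypothetical target center frequencies (Hz), used only by the toy cost.
TARGET = [300.0, 1000.0, 2500.0]


def toy_cost(freqs):
    """Placeholder cost: squared distance to TARGET (lower is better)."""
    return sum((f - t) ** 2 for f, t in zip(freqs, TARGET))


def pso_minimize(cost, dim, lower, upper, num_particles=20, iterations=100,
                 w_start=0.9, w_end=0.4, c1=2.0, c2=2.0, seed=0):
    """Minimize cost over [lower, upper]^dim with an inertia-weight PSO."""
    rng = random.Random(seed)
    pos = [[rng.uniform(lower, upper) for _ in range(dim)]
           for _ in range(num_particles)]
    vel = [[0.0] * dim for _ in range(num_particles)]
    pbest = [p[:] for p in pos]                # best position of each particle
    pbest_cost = [cost(p) for p in pos]
    g = min(range(num_particles), key=lambda i: pbest_cost[i])
    gbest, gbest_cost = pbest[g][:], pbest_cost[g]   # best position overall

    for t in range(iterations):
        # Inertia weight decreases linearly from w_start to w_end.
        w = w_start - (w_start - w_end) * t / max(1, iterations - 1)
        for i in range(num_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Clamp the new position to the search range.
                pos[i][d] = min(upper, max(lower, pos[i][d] + vel[i][d]))
            c = cost(pos[i])
            if c < pbest_cost[i]:
                pbest[i], pbest_cost[i] = pos[i][:], c
                if c < gbest_cost:
                    gbest, gbest_cost = pos[i][:], c
    return gbest, gbest_cost


if __name__ == "__main__":
    best, best_cost = pso_minimize(toy_cost, dim=3, lower=0.0, upper=4000.0)
    print(best, best_cost)
```

Using a curve-similarity cost rather than recognition rate means each fitness evaluation needs only a curve comparison instead of a full recognition run, which is presumably much cheaper per iteration.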

Keywords (translated from Chinese)

★ Mel filterbank
★ Particle swarm optimization
★ Mel-frequency cepstral coefficients
★ Keyword spotting

Keywords (English)

★ Mel-Filterbank
★ PSO
★ MFCC
★ keyword spotting

Table of Contents

Abstract (Chinese)
Abstract (English)
Acknowledgments
Contents
List of Figures
List of Tables
Appendix
Chapter 1 Introduction
1.1 Motivation
1.2 Literature Review
1.3 Thesis Organization
Chapter 2 Background
2.1 Feature Extraction
2.1.1 MFCC
2.1.2 LPCC
2.2 Feature Compensation
2.2.1 Cepstral Mean Subtraction (CMS)
2.2.2 Cepstral Mean and Variance Normalization (CMVN)
2.3 Hidden Markov Models
2.4 Acoustic Models
Chapter 3 Particle Swarm Optimization Applied to the Filterbank
3.1 Particle Swarm Optimization
3.1.1 PSO Model
3.1.2 Inertia Weight
3.2 PSO for Filterbank Optimization
3.2.1 Variable Settings
3.2.2 Fitness Function
Chapter 4 Experimental Results
4.1 Keyword Spotting
4.1.1 Keyword Spotting Architecture
4.1.2 Recognition Procedure
4.2 Experimental Environment
4.3 Channel Effect Experiments
4.4 PSO Filterbank Optimization Experiments
4.5 Noisy Environment Experiments
Chapter 5 Conclusions and Future Work
5.1 Conclusions
5.2 Future Work
References

Advisor

Y.-T. Juang (莊堯棠)

Approval Date

2013-7-10
