Master's/Doctoral Thesis 101521060: Detailed Record
Author: Chih-chieh Chang (張智傑)    Department: Electrical Engineering
Thesis Title: Combination of Multiple Speech Features and its Application on Smartphone
(多種語音特徵的合併及其在智慧型手機上之應用)
Related Theses:
★ Research on a Miniaturized GSM/GPRS Mobile Communication Module
★ Research on Speaker Identification
★ Robustness Analysis of Perturbed Singular Systems Using a Projection Method
★ Speaker Verification Using Support Vector Machine Models to Improve the Alternative-Hypothesis Characteristic Function
★ Speaker Verification Combining Gaussian Mixture Supervectors and Derivative Kernel Functions
★ An Agile-Movement Particle Swarm Optimization Method
★ Application of an Improved Particle Swarm Method to Lossless Predictive Image Coding
★ Particle Swarm Optimization for Speaker Model Training and Adaptation
★ A Speaker Verification System Based on Particle Swarm Optimization
★ Research on Improved Mel-Frequency Cepstral Coefficients Combined with Multiple Speech Features
★ A Speaker Verification System Using Speaker-Specific Background Models
★ An Intelligent Remote Monitoring System
★ Stability Analysis and Controller Design for Output Feedback of Positive Systems
★ A Hybrid Interval-Search Particle Swarm Optimization Algorithm
★ Research on Gesture Recognition Based on Deep Neural Networks
★ A Posture-Correction Necklace System with Image Recognition, Automatic Calibration, and Smartphone Alert Reception
Full Text:
  1. The author has agreed to make the electronic full text of this thesis openly accessible immediately.
  2. The open-access electronic full text is licensed to users only for personal, non-profit retrieval, reading, and printing for academic research purposes.
  3. Please observe the relevant provisions of the Copyright Act of the Republic of China (Taiwan); do not reproduce, distribute, adapt, repost, or broadcast the work without authorization.

Abstract (Chinese): This thesis focuses on improving the feature extraction stage of speech recognition. Feature extraction is an essential part of speech recognition, with two advantages: it reduces the amount of data and it highlights the characteristics of the speech signal. Many researchers have proposed different feature parameters, or refinements of existing ones, to emphasize different speech characteristics. This thesis proposes a method for combining feature parameters, merging the speech characteristics extracted by different feature methods into a single representation. Experimental results show that features combined in this way effectively raise the recognition rate of a keyword spotting system, demonstrating that the combination method strengthens the representation of the speech signal.
The second part of this thesis applies the keyword spotting system to an iPhone app: a small voice-controlled game that performs real-time speech recognition.
Abstract (English): This thesis deals with improving the feature extraction stage of speech recognition. Feature extraction is a very important part of speech recognition, with two advantages: it reduces the amount of data and it highlights the characteristics of the voice. Many researchers have published different extraction methods, or refinements of existing ones, to highlight different characteristics of the voice. This thesis presents a method for combining different speech features, binding together the characteristics captured by the individual feature extraction methods. Our experimental results show that the proposed method improves the recognition rate of the keyword spotting system, which also demonstrates that the method can effectively strengthen the voice characteristics.
In the second part of this thesis, we apply the keyword spotting system to an iPhone smartphone app and build a voice-controlled game that performs real-time speech recognition.
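The exact combination scheme is developed in Chapter 3 of the thesis and is not reproduced on this record page. As a rough, hypothetical sketch of the weighted feature concatenation described above, the Python snippet below merges per-frame MFCC, LPCC, and PLPCC vectors into one combined vector per frame; the combine_features helper, the weights, and the 13-dimensional streams are illustrative assumptions, not values taken from the thesis.

import numpy as np

def combine_features(mfcc, lpcc, plpcc, weights=(1.0, 1.0, 1.0)):
    """Weighted concatenation of per-frame feature matrices.

    Each input is a (num_frames, dim) array produced by its own
    front-end; each weight scales one feature stream before the
    streams are concatenated frame by frame. The weights and
    dimensions here are assumptions for illustration only.
    """
    streams = (mfcc, lpcc, plpcc)
    if len({s.shape[0] for s in streams}) != 1:
        raise ValueError("all feature streams must cover the same frames")
    return np.hstack([w * s for w, s in zip(weights, streams)])

# Example with random stand-in features: 100 frames, 13 coefficients
# per frame for each of MFCC, LPCC, and PLPCC.
rng = np.random.default_rng(0)
mfcc = rng.standard_normal((100, 13))
lpcc = rng.standard_normal((100, 13))
plpcc = rng.standard_normal((100, 13))
combined = combine_features(mfcc, lpcc, plpcc, weights=(1.0, 0.8, 0.6))
print(combined.shape)  # -> (100, 39)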
Keywords (Chinese): ★ 語音辨識 (speech recognition)
★ 特徵 (feature)
★ 合併 (combination)
★ 智慧型手機 (smartphone)
★ iPhone
★ 關鍵詞萃取 (keyword spotting)
Keywords (English): ★ speech recognition
★ feature
★ combination
★ smartphone
★ iPhone
★ keyword spotting
Table of Contents:
Abstract (in Chinese) I
Abstract II
Acknowledgements III
Table of Contents IV
List of Figures VI
List of Tables VIII
Chapter 1 Introduction 1
1.1 Research Motivation 1
1.2 Research Objectives 1
1.3 Literature Review 2
1.4 Outline of Chapters 4
Chapter 2 System Overview 6
2.1 Feature Extraction 7
2.1.1 LPCC 7
2.1.2 MFCC 11
2.1.3 PLPCC 14
2.2 Feature Compensation 16
2.3 Hidden Markov Models 17
2.4 Acoustic Models 20
2.5 Model Training 25
Chapter 3 Combination of Multiple Features 29
3.1 Speech Characteristics 29
3.1.1 LPCC 29
3.1.2 MFCC 31
3.1.3 PLPCC 32
3.2 Feature Combination Method 33
Chapter 4 Experimental Results and Analysis 37
4.1 Keyword Spotting System 37
4.1.1 Keyword Spotting System Architecture 37
4.1.2 Recognition Algorithm 39
4.2 Experimental Results 41
4.2.1 Experimental Environment 41
4.2.2 Single-Feature Experiments 43
4.2.3 Combined-Feature Experiments 45
4.2.4 Weight Vector Experiments 49
4.2.5 Feature Dimension Experiments 51
Chapter 5 System Application 56
5.1 Development Environment 56
5.1.1 Development Platform 56
5.1.2 Programming Language 59
5.2 System Description 63
5.2.1 Recording Function 65
5.2.2 Recognition Function 69
5.2.3 Screenshots 71
Chapter 6 Conclusion and Future Work 74
6.1 Conclusion 74
6.2 Future Work 75
References 77
Appendix 83
Advisor: Y.T. Juang (莊堯棠)    Date of Approval: 2014-7-4