Multimedia Applications for Speech Recognition System

Name

Chia-chen Wen (溫家誠)

Department

Department of Electrical Engineering

Thesis Title

Multimedia Applications for Speech Recognition System
(多媒體應用之語音辨識系統)



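This electronic thesis is authorized for immediate open access.
Once open, the electronic full text is licensed to users only for personal, non-profit searching, reading, and printing for the purpose of academic research.
Please comply with the Copyright Act of the Republic of China. Do not reproduce, distribute, adapt, repost, or broadcast this work without authorization, to avoid violating the law.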

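Abstract (Chinese)

The rapid development of electronic multimedia systems has opened up nearly unlimited possibilities for multimedia services. Bluetooth in particular has become a new area of wireless communication technology, which means that applications can be integrated through Bluetooth. To let users access these services more conveniently, a keyword-spotting speech recognition system becomes one of the important approaches.
In this thesis, we first design a speech recognition system for multimedia applications, guided by the development concepts of speech recognition, that simulates how a user operates a multimedia system. The services offered are based on the functions drivers most often use in the car, including listening to music, making phone calls, and navigation. A question-and-answer human-machine interface makes the system friendly to operate, and speech synthesis is used to simulate a human voice for the system's responses.
Recognition based on keyword spotting improves the portability and extensibility of the system, and the hierarchical architecture increases the reliability of speech recognition across environments. To cope with environmental noise and interference, we apply robust speech recognition, using robust speech features and model adaptation to reduce the influence of the test environment. Finally, we extend the system with a personalized design: speaker recognition provides user-specific services, and speaker model adaptation further strengthens the system's recognition performance.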

Abstract (English)

The rapid development of in-vehicle electronic multimedia systems opens up immense possibilities for in-car services. Bluetooth wireless technology has become a new area of development, and applications can be integrated through it. Speech recognition plays a crucial role in making these services convenient to use.
In this thesis, we develop a speech recognition system for multimedia applications in the car environment that simulates how the driver and passengers use multimedia functions. The service is based on the most commonly used control modes, including listening to music, making phone calls, and navigation. An interactive question-and-answer approach provides a user-friendly interface, and speech synthesis is adopted to simulate a human voice for the system's responses.
A recognition system based on keyword spotting improves portability and scalability, and a hierarchical design increases the reliability of speech recognition in the car environment. Vehicle noise and interference remain a challenge, so we apply robust speech recognition: robust features and model adaptation methods are adopted to reduce the impact of the test environment. Finally, we build a more personalized system that provides user-specific services through speaker recognition techniques, which we also expect to further strengthen recognition performance.
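
The robust-feature step is not detailed in this abstract; the table of contents lists MVA feature processing (Section 3.2). As an illustration only, the following minimal sketch assumes the usual reading of MVA as per-utterance mean subtraction, variance normalization, and ARMA smoothing applied to each feature dimension. The function name `mva`, the `order` parameter, and the choice to leave edge frames unsmoothed are assumptions made for this sketch, not details taken from the thesis.

```python
import math


def mva(frames, order=2):
    """Normalize one utterance: mean subtraction, variance
    normalization, then ARMA smoothing along the time axis.

    frames: list of equal-length feature vectors (one per frame).
    order:  hypothetical ARMA order M (number of past outputs and
            future inputs averaged together).
    """
    num_frames = len(frames)
    dim = len(frames[0])
    out = [[0.0] * dim for _ in range(num_frames)]

    for d in range(dim):
        track = [frame[d] for frame in frames]

        # Mean subtraction and variance normalization over the utterance.
        mean = sum(track) / num_frames
        var = sum((v - mean) ** 2 for v in track) / num_frames
        std = math.sqrt(var) or 1.0  # avoid dividing by zero on a constant track
        norm = [(v - mean) / std for v in track]

        # ARMA smoothing: average the previous `order` smoothed values with
        # the current and next `order` normalized values. Frames too close
        # to either edge keep their normalized value (an assumption here).
        smooth = list(norm)
        for t in range(order, num_frames - order):
            past = sum(smooth[t - order:t])
            future = sum(norm[t:t + order + 1])
            smooth[t] = (past + future) / (2 * order + 1)

        for t in range(num_frames):
            out[t][d] = smooth[t]

    return out
```

In a recognizer, a routine of this kind would be applied to both training and test features so that the acoustic models and the incoming speech are normalized the same way.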

Keywords (Chinese)

★ Speech Recognition System (語音辨識系統)

Keywords (English)

★ Speech Recognition System

Table of Contents

Chapter 1 Introduction
1.1 Motivation
1.2 Research Objectives
1.3 Chapter Overview
Chapter 2 Fundamental Techniques of Speech Recognition
2.1 Feature Extraction
2.2 Hidden Markov Models
2.3 Acoustic Models
2.4 Model Training and Parameter Estimation
2.4.1 Training Algorithm
2.4.2 Training Flowchart
Chapter 3 System Methods
3.1 Keyword Spotting
3.1.1 Keyword Module
3.1.2 Non-Keyword (Filler) Module
3.1.3 Arrangement of Recognition Modules
3.1.4 Recognition Algorithm
3.1.5 Recognition Procedure
3.2 MVA Feature Processing
3.2.1 MVA Procedure
3.2.2 MVA Processing Results
3.3 Speaker Adaptation
3.3.1 Overview of Speaker Adaptation
3.3.2 Maximum Likelihood Linear Regression
Chapter 4 System Architecture
4.1 System Environment
4.2 Basic System Architecture
4.3 Hierarchical Design
4.4 System Robustness
Chapter 5 System Demonstration
5.1 Demonstration of System Functions
5.2 Function Descriptions
Chapter 6 Conclusions and Future Work
6.1 Conclusions
6.2 Future Work
References


Advisor

Yau-Tarng Juang (莊堯棠)

Approval Date

2008-6-23
