Real-Time Speech Recognition Multimedia System

Name

邱介川 (Chieh-Chuan Chiu)

Department

Department of Electrical Engineering

Thesis Title

即時語音辨識多媒體系統
(Real-Time Speech Recognition Multimedia System)

Files

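This electronic thesis is authorized for immediate open access.
Once the open-access date has passed, the electronic full text is licensed to users only for personal, non-profit searching, reading, and printing for the purpose of academic research.
Please comply with the Copyright Act of the Republic of China. Do not reproduce, distribute, adapt, repost, or broadcast this work without authorization, to avoid breaking the law.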

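Abstract (Chinese)

This thesis develops a real-time speech recognition multimedia system that integrates functions commonly used in a car and provides simple but practical services. An automatic recording technique detects in real time whether a command has been issued, and keyword spotting determines which service the command belongs to. The keyword spotter recognizes speech with pre-trained sub-syllable models, so the models do not have to be retrained when the services change, which improves recognition efficiency and system portability.
The system adopts a hierarchical architecture that progressively guides users as they become familiar with it, and uses text-to-speech (TTS) synthesis to simulate a human voice when interacting with the user. The windowed human-machine interface is implemented with Borland C++ 6.0 and achieves real-time recognition.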

Abstract (English)

This thesis develops a real-time speech recognition multimedia system that provides simple but useful services. The system uses automatic recording to detect whether a command has been issued, then uses keyword spotting to determine which service is requested. Keyword spotting performs recognition with sub-syllable models, which do not need to be retrained, improving recognition efficiency and portability.
The system uses a hierarchical structure for keyword spotting, together with text-to-speech (TTS), to help users become familiar with it. A Windows-based interface built with Borland C++ 6.0 realizes real-time recognition.
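
The abstract says that automatic recording detects whether a command has been issued, and the table of contents lists energy calculation and zero-crossing rate under speech endpoint detection. The following is a minimal Python sketch of that general idea, not the thesis's implementation (the thesis itself used Borland C++ 6.0). The function names, the frame length, the hop size, and both thresholds are hypothetical values chosen only for illustration.

```python
import math

def frame_signal(samples, frame_len, hop):
    """Split samples into overlapping frames; a trailing partial frame is dropped."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]

def short_time_energy(frame):
    """Sum of squared sample values in one frame."""
    return sum(x * x for x in frame)

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    if len(frame) < 2:
        return 0.0
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)

def detect_speech_frames(samples, frame_len=160, hop=80,
                         energy_thresh=1.0, zcr_thresh=0.3):
    """Flag each frame as speech (True) or non-speech (False).

    Assumed rule: a frame counts as speech if its energy is high, or if its
    energy is moderate and its zero-crossing rate is high (a rough cue for
    unvoiced sounds). Both thresholds are illustrative, not from the thesis.
    """
    flags = []
    for frame in frame_signal(samples, frame_len, hop):
        energy = short_time_energy(frame)
        zcr = zero_crossing_rate(frame)
        flags.append(energy > energy_thresh or
                     (energy > 0.1 * energy_thresh and zcr > zcr_thresh))
    return flags

def find_endpoints(flags):
    """Return (first, last) indices of frames flagged as speech, or None."""
    active = [i for i, flag in enumerate(flags) if flag]
    return (active[0], active[-1]) if active else None

if __name__ == "__main__":
    # Synthetic test signal: silence, a 200 Hz tone sampled at 8 kHz, silence.
    tone = [0.5 * math.sin(2 * math.pi * 200 * t / 8000) for t in range(1600)]
    signal = [0.0] * 800 + tone + [0.0] * 800
    print(find_endpoints(detect_speech_frames(signal)))
```

A real recorder would apply the same per-frame test to a live audio stream and would normally require several consecutive speech frames before starting or stopping a recording, so that short clicks are not mistaken for commands.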

Keywords (Chinese)

★ 隱藏式馬可夫模型 (Hidden Markov Model)
★ 關鍵字擷取 (keyword spotting)

Keywords (English)

★ Hidden Markov Model
★ keyword spotting

Table of Contents

Abstract (Chinese)
Abstract
Contents
List of Figures
List of Tables
Chapter 1 Introduction
1.1 Research Motivation
1.2 Research Objectives
1.3 Chapter Overview
Chapter 2 Speech Signal Processing
2.1 Speech Endpoint Detection
2.1.1 Energy Calculation
2.1.2 Zero-Crossing Rate
2.2 Speech Signal Preprocessing
2.2.1 Frame Blocking
2.2.2 Pre-emphasis
2.2.3 Hamming Window
2.3 Feature Extraction
2.4 Hidden Markov Model (HMM)
2.5 Building Sub-syllable Models
2.6 Acoustic Model Training
2.6.1 State Arrangement
2.6.2 Model Initialization
2.6.3 Viterbi Algorithm
2.6.4 Parameter Re-estimation
Chapter 3 System Architecture
3.1 Automatic Recording
3.2 Keyword Spotting Architecture
3.2.1 Filler (Non-keyword) Module
3.2.2 Keyword Module
3.2.3 One-Stage Dynamic Programming Algorithm
3.3 Recognition Flow
3.4 System Functions
3.4.1 Hierarchical Architecture
Chapter 4 Experiments and Results
4.1 Experimental Environment
4.2 System Experiments
Chapter 5 Conclusions and Future Work
5.1 Conclusions
5.2 Future Work
References
Appendix 1
Appendix 2 (I)
Appendix 2 (II)
Appendix 2 (III)

Advisor

莊堯棠 (Y.-T. Juang)

Approval Date

2011-7-19