An Interactive Speech Guidance System

Name

林佑輯 (You-ji Lin)

Department

Department of Electrical Engineering

Thesis title

An Interactive Speech Guidance System
(互動式語音導覽系統)

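This electronic thesis is authorized for immediate open access.
Electronic full texts that have reached their open-access date are licensed to users only for personal, non-profit searching, reading, and printing for the purpose of academic research.
Please comply with the Copyright Act of the Republic of China. Do not reproduce, distribute, adapt, repost, or broadcast this work without authorization, to avoid violating the law.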

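Abstract (Chinese)

This thesis designs an interactive speech guidance system. We simulate visitors using a multimedia system in a museum. The services the system provides include introductions to place names, persons, industries, and scenic spots. A question-and-answer style of human-computer interaction provides a friendly user interface, and speech synthesis is used to produce human-like spoken responses.
Keyword spotting based on sub-syllable units improves the interchangeability and portability of the system. In the endpoint detection work, we add threshold constraints on the leading and trailing segments of speech to raise detection accuracy. Using the relationships within the keyword vocabulary, we classify all keywords into a hierarchical structure, which not only reduces misrecognition of unrelated words but also greatly shortens recognition time. In this thesis, we test the recognition rate on 50 keywords using speech data from 8 male and 8 female speakers. The experiments yield a recognition rate of 95.7%, with an average of 0.25 seconds needed to recognize one sentence.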
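The abstract mentions separate threshold constraints on the leading and trailing segments of speech, but this record does not give the thesis's actual rule. The following is only a minimal sketch of one common energy-based approach, assuming a higher onset threshold, a lower offset threshold, and a minimum run length; the names `frame_energies`, `detect_endpoints`, `start_thr`, `end_thr`, and `min_frames` are all hypothetical.

```python
from typing import List, Optional, Tuple


def frame_energies(samples: List[float], frame_len: int) -> List[float]:
    """Mean squared amplitude of each non-overlapping frame."""
    return [
        sum(x * x for x in samples[i:i + frame_len]) / frame_len
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def _first_run(values: List[float], thr: float, min_frames: int) -> Optional[int]:
    """Index where the first run of min_frames values >= thr begins."""
    run = 0
    for i, value in enumerate(values):
        run = run + 1 if value >= thr else 0
        if run == min_frames:
            return i - min_frames + 1
    return None


def detect_endpoints(
    energies: List[float],
    start_thr: float,
    end_thr: float,
    min_frames: int = 3,
) -> Optional[Tuple[int, int]]:
    """Return (first_frame, last_frame) of speech, or None if none is found.

    Assumed rule (not taken from the thesis): the onset is the start of the
    first sustained run above start_thr, and the offset is the end of the
    last sustained run above end_thr.
    """
    start = _first_run(energies, start_thr, min_frames)
    if start is None:
        return None
    # Scan the reversed sequence so the first run found is the last in time.
    rev = _first_run(energies[::-1], end_thr, min_frames)
    if rev is None:
        return None
    end = len(energies) - 1 - rev
    return (start, end) if end >= start else None
```

Requiring a sustained run rather than a single frame above threshold is what keeps short clicks from being taken as speech; the lower trailing threshold lets a decaying final syllable stay inside the detected segment.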

Abstract (English)

This thesis deals with the design of an interactive speech guidance system for Dasi and Longtan. We use a hierarchical structure for keyword spotting to improve the recognition capability of the system. A user-friendly interface is established through a series of questions and answers. The developed speech guidance system provides information about Dasi and Longtan, including place names, famous persons, industries, scenic spots, and so on.
In our experiments, over 800 utterances spoken by 8 males and 8 females are used to test system performance. On average, identifying a keyword takes 0.25 seconds, and the developed speech guidance system achieves a recognition rate of 95.7%.
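One way to picture the hierarchical keyword structure is as a tree in which only the children of the current dialog node are active candidates. The sketch below is illustrative only: the vocabulary tree is hypothetical (this record does not list the thesis's 50 keywords), and plain string similarity from `difflib` stands in for the HMM acoustic scores a real recognizer would use.

```python
from difflib import SequenceMatcher
from typing import List, Tuple

# Hypothetical vocabulary tree: area -> topic. Not the thesis's keyword list.
TOPICS = ["place names", "persons", "industries", "scenic spots"]
VOCAB = {
    "dasi": {topic: {} for topic in TOPICS},
    "longtan": {topic: {} for topic in TOPICS},
}


def active_keywords(tree: dict, path: List[str]) -> List[str]:
    """Keywords that are valid at the dialog node reached by following path."""
    node = tree
    for key in path:
        node = node[key]
    return sorted(node)


def similarity(a: str, b: str) -> float:
    """Stand-in score; a real system would compare acoustic likelihoods."""
    return SequenceMatcher(None, a, b).ratio()


def spot(utterance: str, candidates: List[str]) -> Tuple[str, int]:
    """Return the best-scoring candidate and how many candidates were scored."""
    scored = [(similarity(utterance, keyword), keyword) for keyword in candidates]
    return max(scored)[1], len(scored)


# Two-step dialog: choose an area first, then a topic within that area.
area, _ = spot("longtan lake", active_keywords(VOCAB, []))
topic, scored_count = spot("scenic spot", active_keywords(VOCAB, [area]))
```

At each step only the keywords under the current node are scored, so words from unrelated branches cannot be confused with the input and fewer models need to be evaluated.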

Keywords (Chinese)

★ voice activity detection
★ keyword spotting
★ speech guidance system

Keywords (English)

★ speech guidance system
★ keyword spotting
★ voice activity detection

Table of Contents

Chinese Abstract
English Abstract
Acknowledgments
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
1.1 Research Motivation
1.2 Research Objectives
1.3 Chapter Overview
Chapter 2 Speech Processing Techniques
2.1 Feature Parameter Extraction
2.2 Hidden Markov Models
2.3 Acoustic Models
2.4 Model Training Algorithms
2.4.1 Training Flowchart
2.4.2 Viterbi Algorithm
Chapter 3 Keyword Spotting Techniques
3.1 Overview
3.2 Keyword Spotting Architecture
3.2.1 Keyword Module
3.2.2 Non-Keyword (Filler) Module
3.3 One-Stage Dynamic Programming Algorithm
3.4 Keyword Recognition Flow
3.5 Hierarchical Keyword Spotting Architecture
Chapter 4 Speech Guidance System Architecture
4.1 Audio Recording and Processing
4.2 Voice Activity Detection
4.3 Real-Time Speech Recognition System
4.3.1 Basic Concepts of the Windows API
4.3.2 Basic System Architecture
4.4 System Functions and Demonstration
Chapter 5 Experiments and Results
5.1 Experimental Environment
5.2 Keyword Spotting Experiments
Chapter 6 Conclusions and Future Work
6.1 Conclusions
6.2 Future Work
References

Advisor

莊堯棠 (Yau-tarng Juang)

Date of approval

2010-6-19
