A System for Keyword Spotting

Name

Yu-sheng Pai (白育昇)

Department

In-service Master's Program, Department of Communication Engineering

Thesis title

A System for Keyword Spotting
(語音關鍵詞辨識擷取系統)

Files

Browse the thesis in the system (never open to the public)

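Abstract (translated from Chinese)

The main goal of this thesis is to study speech recognition techniques and to implement a speech keyword spotting system that is highly portable, flexible, practical, and accurate. The system consists of three main parts. The corpus reading program and the keyword speech extraction program run on Windows XP SP2 and were developed mainly with Borland C++ Builder 5; the speech keyword recognition program runs on Linux Fedora 5 and was developed with HTK 3.3.
In this system we use the HTK toolkit to build HMM-based acoustic models. Using the 411 syllables formed from 21 initials and 36 finals, we trained the best acoustic model, which has 6 HMM states and 17 Gaussian mixtures. On the training corpus it achieves a detection rate as high as 92% and a false alarm rate below 13%. In experiments on non-training speech, the detection rate and false alarm rate of the pure keyword module each change by only about 3%, to 89% and 16% respectively.
Finally, we build a speech keyword spotting system on the acoustic model with 6 HMM states and 17 Gaussian mixtures, and design an interface program that makes the system easy for users to operate.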
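The acoustic model described above is a left-to-right HMM trained with HTK. As a rough illustration only, the following Python sketch writes an HTK-style single-Gaussian prototype definition. Several things here are assumptions rather than details from the thesis: that the 6 states are emitting states (HTK adds a non-emitting entry and exit state), the 39-dimensional `MFCC_0_D_A` feature kind, the initial transition values, and the helper name `make_proto`. The 17 mixtures per state would be produced in a later training step, not in the prototype.

```python
def make_proto(num_emitting: int = 6,
               vec_size: int = 39,
               parm_kind: str = "MFCC_0_D_A") -> str:
    """Return a left-to-right, single-Gaussian HMM prototype as HTK-style text.

    All default values are illustrative assumptions, not thesis settings.
    """
    n = num_emitting + 2  # add non-emitting entry and exit states
    zeros = " ".join(["0.0"] * vec_size)
    ones = " ".join(["1.0"] * vec_size)

    lines = [
        f"~o <VecSize> {vec_size} <{parm_kind}>",
        '~h "proto"',
        "<BeginHMM>",
        f"<NumStates> {n}",
    ]
    # Emitting states are numbered 2 .. n-1.
    for state in range(2, n):
        lines += [
            f"<State> {state}",
            f"<Mean> {vec_size}",
            zeros,
            f"<Variance> {vec_size}",
            ones,
        ]

    lines.append(f"<TransP> {n}")
    for i in range(n):
        row = [0.0] * n
        if i == 0:
            row[1] = 1.0        # entry state jumps to the first emitting state
        elif i < n - 1:
            row[i] = 0.6        # self-loop (placeholder value)
            row[i + 1] = 0.4    # advance to the next state (placeholder value)
        # the exit state's row stays all zeros
        lines.append(" ".join(f"{p:.1f}" for p in row))
    lines.append("<EndHMM>")
    return "\n".join(lines) + "\n"
```

Calling `make_proto()` returns the text of a prototype file; the placeholder means, variances, and transition values would be overwritten once training starts.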

Abstract (English)

The goal of this thesis is to study speech recognition techniques and to develop a speech keyword spotting system that is portable across operating systems and easy to use. The system consists of three parts. The speech data reading program and the keyword spotting program run on Microsoft Windows XP SP2 and were developed with Borland C++ Builder 5. The speech keyword recognition program was developed with HTK 3.3 and runs on Linux Fedora 5.
In this system we use HTK to train HMMs as the acoustic model. Using the 411 syllables formed from 21 initials and 36 finals, we developed an acoustic model with 6 HMM states and 17 Gaussian mixtures. With this model, the detection rate on the training speech reaches 92% and the false alarm rate is below 13%. In experiments with non-training speech on the pure keyword model, the detection rate and the false alarm rate each differ from the training results by only about 3%, at 89% and 16% respectively.
Finally, we use this model to build a speech keyword spotting system and design a user interface program so that operators can use the system easily.
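The abstract reports results as a detection rate and a false alarm rate but does not define them. The sketch below shows one common way to compute such figures. The definitions are assumptions, not the thesis's own: detection rate is taken as correctly spotted keywords over keywords actually spoken, and false alarm rate as wrongly reported keywords over all reported keywords. The class and function names and the counts in the usage note are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SpottingCounts:
    true_keywords: int  # keyword occurrences actually present in the speech
    hits: int           # occurrences the system spotted correctly
    false_alarms: int   # keywords reported where none was spoken


def detection_rate(c: SpottingCounts) -> float:
    """Fraction of spoken keywords that were spotted (assumed definition)."""
    if c.true_keywords <= 0:
        raise ValueError("true_keywords must be positive")
    return c.hits / c.true_keywords


def false_alarm_rate(c: SpottingCounts) -> float:
    """Fraction of reported keywords that were wrong (assumed definition)."""
    reported = c.hits + c.false_alarms
    if reported == 0:
        return 0.0  # nothing was reported, so nothing was falsely reported
    return c.false_alarms / reported
```

For example, with invented counts `SpottingCounts(true_keywords=50, hits=45, false_alarms=5)`, the detection rate is 45/50 and the false alarm rate is 5/50. Other definitions, such as false alarms per keyword per hour, would give different numbers for the same system.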

Keywords (Chinese)

★ speech recognition
★ keyword
★ Markov

Keywords (English)

★ HMM
★ speech
★ keyword
★ spotting
★ HTK

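Table of contents

Abstract (Chinese)
Abstract
Acknowledgments
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
1.1 Research Motivation
1.2 Research Objectives
1.3 Thesis Outline
Chapter 2 Basic Speech Recognition Techniques
2.1 Introduction to the HTK Toolkit
2.1.1 Overview of HTK Operation
2.1.2 Dictionary Editing and Grammar Rules
2.2 Feature Parameter Extraction
2.2.1 Steps of Speech Feature Extraction
2.3 Hidden Markov Models
2.4 Acoustic Models
2.5 HMM Training Procedure and Algorithms
2.5.1 Training Procedure
2.5.2 Viterbi Search Algorithm
Chapter 3 Building the Speech Keyword Spotting System
3.1 System Development Environment
3.2 System Architecture
3.3 Introduction to the Speech Corpus
3.4 Feature Extraction Coefficients
3.5 Building the Speech Recognition Models
3.5.1 Building the Acoustic Model
3.5.2 Building the Keyword Model
3.5.3 Building the Non-keyword Model
3.5 Keyword Spotting Architecture
3.6 Training the Speech Recognizer with HTK Tools
3.7 Recognition with HTK Tools
Chapter 4 Experiments and Results
4.1 Experimental Environment
4.1.1 Experimental Equipment
4.1.2 Experimental Corpus
4.2 Detection Rate and False Alarm Rate
4.3 Keyword Spotting Experiments
4.3.1 Combinations of HMM State Count and Gaussian Mixture Count on the Training Corpus
4.3.2 Tests on Non-training Speech
4.4 Comparison of Experimental Methods and Results
4.4.1 Corpus Comparison
4.4.2 Comparison of Research Methods and Related Parameters
4.4.3 Comparison of Experimental Results
4.4.4 Comparison of System Operation
4.5 System Implementation
Chapter 5 Conclusions and Future Work
5.1 Conclusions
5.2 Future Work
References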

Advisors

林嘉慶, 蔡木金

Approval date

2009-7-8
