Abstract (English) |
This work studies speech recognition techniques and develops a speech keyword spotting system that is portable across operating systems and easy to use. The system consists of three parts. The voice data reading program and the keyword spotting program run on Microsoft Windows XP SP and were developed with Borland C++ Builder 5. The speech keyword recognition program was developed with HTK 3.3 and runs on Linux Fedora 5.
In this system, we use HTK to train HMMs and build the acoustic model. The model covers 411 syllables, composed from 21 initials and 36 finals, and uses 6 HMM states and 17 mixtures. On the training speech, the model reaches a detection rate of 92% with a false alarm rate below 13%. In experiments with practical keyword speech input, the detection rate and the false alarm rate each stay within 3% of the training results: the detection rate reaches 89% and the false alarm rate stays below 16%.
Finally, we use this model to build a speech keyword spotting and recognition system, and we design a user interface program so that operators can use the system easily.
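The detection and false alarm figures quoted above can be illustrated with a small sketch. The abstract does not define either rate, so the definitions here are assumptions: detection rate is taken as correctly spotted keywords over keywords present, and false alarm rate as falsely reported keywords over non-keyword segments. The class name, function names, and all counts are hypothetical and chosen only so that the resulting percentages resemble the reported ones.

```python
from dataclasses import dataclass


@dataclass
class SpottingCounts:
    """Tallies from one keyword spotting run (hypothetical fields)."""
    keywords_present: int     # keyword occurrences in the reference labels
    keywords_detected: int    # occurrences the spotter found correctly
    non_keyword_segments: int # segments that contain no keyword
    false_alarms: int         # non-keyword segments reported as keywords


def detection_rate(c: SpottingCounts) -> float:
    # Assumed definition: percentage of present keywords that were detected.
    return 100.0 * c.keywords_detected / c.keywords_present


def false_alarm_rate(c: SpottingCounts) -> float:
    # Assumed definition: percentage of non-keyword segments flagged as keywords.
    return 100.0 * c.false_alarms / c.non_keyword_segments


def within_margin(train: SpottingCounts, test: SpottingCounts,
                  margin: float = 3.0) -> bool:
    # True when both rates on the test data differ from the training
    # rates by no more than `margin` percentage points.
    d_gap = abs(detection_rate(train) - detection_rate(test))
    f_gap = abs(false_alarm_rate(train) - false_alarm_rate(test))
    return d_gap <= margin and f_gap <= margin


# Made-up counts for illustration only; these are not the thesis data.
train = SpottingCounts(1000, 920, 1000, 125)
test = SpottingCounts(1000, 890, 1000, 155)

print(detection_rate(train), false_alarm_rate(train))
print(detection_rate(test), false_alarm_rate(test))
print(within_margin(train, test))
```

Other definitions of false alarm rate are common in keyword spotting, for example false alarms per keyword per hour, so the denominator above should be replaced with whichever one the evaluation actually uses.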
|