Thesis 985201101: Detailed Record




Name: Yi-Chen Lyu (呂易宸)    Department: Department of Electrical Engineering
Title: Speech Access System based on Speaker Identification (語音門禁系統)
Related Theses
★ A Study of Miniaturized GSM/GPRS Mobile Communication Modules
★ A Study of Speaker Identification
★ Robustness Analysis of Perturbed Singular Systems Using the Projection Method
★ Speaker Verification Using Support Vector Machine Models to Improve the Characteristic Function of the Alternative Hypothesis
★ Speaker Verification Combining Gaussian Mixture Supervectors and Derivative Kernel Functions
★ An Agile-Movement Particle Swarm Optimization Method
★ Lossless Predictive Image Coding Using an Improved Particle Swarm Method
★ A Study of Particle Swarm Optimization for Speaker Model Training and Adaptation
★ A Speaker Verification System Based on Particle Swarm Optimization
★ A Study of Improved Mel-Frequency Cepstral Coefficients Combined with Multiple Speech Features
★ A Speaker Verification System Using Speaker-Specific Background Models
★ An Intelligent Remote Monitoring System
★ Stability Analysis and Controller Design for Positive Systems with Output Feedback
★ A Hybrid Interval-Search Particle Swarm Algorithm
★ Hand Gesture Recognition Based on Deep Neural Networks
★ A Posture-Correction Necklace with Image-Recognition-Based Automatic Calibration and a Mobile-Phone Warning System
  1. This electronic thesis is approved for immediate open access.
  2. The open-access full text is licensed only for personal, non-commercial academic research: searching, reading, and printing.
  3. Please comply with the Copyright Act of the Republic of China; do not reproduce, distribute, adapt, repost, or broadcast it without authorization.

Abstract (Chinese) This thesis designs a speech recognition system for access control. Speaker identification technology decides whether an input voice belongs to an approved user; combined with keyword-spotting technology, the system can also recognize the user's name; and with speech synthesis the system replies not only in plain text but with a simulated human voice. The whole is packaged in a program with a graphical human-machine interface so that it is convenient to operate.
Because this is an access control system, it must work in real time (online), so the time cost of every method has to be considered: not every technique can be included, and users cannot be kept waiting long for a result. Methods therefore had to be selected, which inevitably affects the recognition rate to some degree, but time has to be the prerequisite when choosing suitable algorithms. For speaker identification, self-recorded experiments showed that building a dedicated model directly from each user's own voice works better than a model adapted by Bayesian adaptation. For keywords, since the system supports adding new users, user names cannot be known (and model-trained) in advance; instead, subsyllable models are concatenated into the corresponding name model, which saves per-user training time and improves practicality.
Self-conducted experiments with 38 approved users out of 40 test participants (two of them acting as impostors) gave a speaker identification rate of 94.9%, a false acceptance rate of 0.8%, and a keyword recognition rate of 90.6%; recognizing one utterance takes about 0.5 seconds on average for each task, so recognition meets the real-time requirement.
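The identification step described in the abstract (a dedicated model per enrolled user, with the input voice scored against all of them) can be sketched as follows. This is a minimal illustration, assuming diagonal-covariance Gaussian mixture models whose parameters were already trained on each user's feature vectors; all names are illustrative, and this is not the thesis implementation:

```python
import numpy as np

def gmm_log_likelihood(frames, weights, means, variances):
    """Average per-frame log-likelihood of feature frames under a diagonal GMM."""
    # frames: (T, D); weights: (K,); means/variances: (K, D)
    diff = frames[:, None, :] - means[None, :, :]                    # (T, K, D)
    exponent = -0.5 * np.sum(diff**2 / variances, axis=2)            # (T, K)
    log_norm = -0.5 * np.sum(np.log(2 * np.pi * variances), axis=1)  # (K,)
    log_comp = np.log(weights) + log_norm + exponent                 # (T, K)
    m = log_comp.max(axis=1, keepdims=True)                          # log-sum-exp
    ll = m[:, 0] + np.log(np.exp(log_comp - m).sum(axis=1))
    return ll.mean()

def identify(frames, speaker_models):
    """Return the enrolled speaker whose model scores the input highest."""
    scores = {name: gmm_log_likelihood(frames, *params)
              for name, params in speaker_models.items()}
    return max(scores, key=scores.get)
```

In a real system the frames would be MFCC vectors and the winning score would additionally be compared against a threshold before granting access.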
Abstract (English) The purpose of this thesis is to design a speech-based access control system that uses speaker recognition to determine whether an input voice belongs to a valid user. Combined with keyword-spotting technology, the system can also identify the user's name, and coupled with text-to-speech technology it responds not only in text but also with a synthesized human voice. A windows-based interface built with the Microsoft Foundation Classes (MFC) makes the system easy to operate.
Because an access control system must meet real-time (online) requirements, the time consumed by each method has to be taken into account: users should not wait long for a result. Methods therefore had to be chosen selectively, at some cost to the recognition rate, with time as the prerequisite when selecting the appropriate algorithms.
Forty participants joined the test: 38 were enrolled target users and the other two acted as impostors. The speaker identification rate is 94.9%, the false acceptance rate is 0.8%, and the keyword recognition rate is 90.6%. Recognizing one utterance takes about 0.5 seconds on average for each task, so identification meets the real-time requirement.
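The identification rate and false acceptance rate reported in the abstract can be computed from per-trial decisions as in the following sketch; the function name and data layout are illustrative assumptions, not the thesis code:

```python
def speaker_metrics(trials):
    """trials: list of (true_id, predicted_id, is_enrolled) tuples.
    predicted_id is None when the system rejects the attempt.
    Identification rate: fraction of enrolled-user trials identified correctly.
    False acceptance rate: fraction of impostor trials accepted as some user."""
    enrolled = [(t, p) for t, p, e in trials if e]
    impostor = [(t, p) for t, p, e in trials if not e]
    id_rate = sum(t == p for t, p in enrolled) / len(enrolled) if enrolled else 0.0
    far = sum(p is not None for _, p in impostor) / len(impostor) if impostor else 0.0
    return id_rate, far
```

With the thesis setup, the enrolled trials would come from the 38 target users and the impostor trials from the two participants posing as outsiders.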
Keywords (Chinese) ★ keyword spotting (關鍵字擷取)
★ Gaussian mixture model (高斯混合模型)
★ maximum a posteriori (最大事後機率)
Keywords (English) ★ Maximum a posteriori
★ Gaussian Mixture Model
★ keyword spotting
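The keyword fields name the Gaussian mixture model and maximum a posteriori estimation; one common form, relevance-MAP adaptation of GMM means in the style of Reynolds, Quatieri and Dunn (reference [51] below), can be sketched as follows. The function name and relevance factor are illustrative assumptions, not the thesis implementation:

```python
import numpy as np

def map_adapt_means(ubm_weights, ubm_means, ubm_vars, frames, relevance=16.0):
    """Adapt UBM component means toward a speaker's data (means-only MAP)."""
    # Posterior responsibility of each diagonal-Gaussian component per frame.
    diff = frames[:, None, :] - ubm_means[None, :, :]
    log_comp = (np.log(ubm_weights)
                - 0.5 * np.sum(np.log(2 * np.pi * ubm_vars), axis=1)
                - 0.5 * np.sum(diff**2 / ubm_vars, axis=2))
    log_comp -= log_comp.max(axis=1, keepdims=True)
    gamma = np.exp(log_comp)
    gamma /= gamma.sum(axis=1, keepdims=True)              # (T, K)
    n = gamma.sum(axis=0)                                  # soft counts, (K,)
    ex = gamma.T @ frames / np.maximum(n[:, None], 1e-10)  # per-component data means
    alpha = n / (n + relevance)                            # data-dependent weight
    # Interpolate between the data means and the prior (UBM) means.
    return alpha[:, None] * ex + (1 - alpha)[:, None] * ubm_means
```

Components that see little speaker data (small soft count) stay close to the universal background model, which is the usual motivation for this interpolation.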
Table of Contents
Abstract (Chinese)
Abstract (English)
Table of Contents
List of Figures
List of Tables
Chapter 1: Introduction
1.1 Motivation
1.2 Objectives
1.3 Introduction to Speech Recognition
1.3.1 Dynamic Time Warping
1.3.2 Artificial Neural Networks
1.3.3 Hidden Markov Models
1.4 Introduction to Access Control Systems
1.5 Overview of Speaker Recognition
1.6 Chapter Outline
Chapter 2: Speech Processing and Keyword Spotting
2.1 Feature Extraction
2.2 Hidden Markov Models
2.3 Acoustic Models and Training
2.4 Keyword Spotting
2.4.1 Keyword Spotting Architecture
2.4.2 One-Stage Dynamic Programming Algorithm
2.4.3 Keyword Recognition Flow
2.5 Keyword Verification
2.5.1 Keyword Verification Flow
2.5.2 Subsyllable Hypothesis Testing
2.5.3 Confidence Measures for Keyword Verification
Chapter 3: Speaker Recognition and Verification
3.1 Speech Model Construction
3.1.1 Gaussian Mixture Models
3.1.2 Vector Quantization
3.1.3 The Expectation-Maximization Algorithm
3.2 Speaker Model Adaptation
3.2.1 Universal Background Model
3.2.2 Bayesian Adaptation
3.3 Speaker Identification
3.4 Speaker Verification
Chapter 4: Speech Access System Architecture and Results
4.1 Experimental Environment
4.2 System Architecture
4.3 System Flow
4.4 System Experiments
4.4.1 Speaker Identification Experiments
4.4.2 Keyword Spotting Experiments
4.5 Related Literature
Chapter 5: Conclusions and Future Work
5.1 Conclusions
5.2 Future Work
References
Appendix
References
[1] X. Huang, A. Acero and H. W. Hon, Spoken Language Processing, Prentice Hall, 2001.
[2] G. R. Doddington, “Speaker recognition—Identifying people by their voices,” Proceedings of the IEEE, vol. 73, pp. 1651-1664, 1985.
[3] D. O'Shaughnessy, “Speaker recognition,” ASSP Magazine , IEEE, vol. 3, pp. 4-17, 1986.
[4] J. T. Tou and R. C. Gonzalez, Pattern Recognition Principles, Addison Wesley, 1974.
[5] L. S. Lee and Y. Lee, “Voice Access of Global Information for Broad-Band Wireless: Technologies of Today and Challenges of Tomorrow,” Proceedings of the IEEE, vol. 89, no. 1, pp. 41-57, January 2001.
[6] L. Zao, A. Alcaim and R. Coelho, “Robust Access based on Speaker Identification for Optical Communications Security,” Digital Signal Processing, 2009 16th International Conference on, pp. 1-5, 2009.
[7] Wahyudi, W. Astuti and S. Mohamed, “A Comparison of Gaussian Mixture and Artificial Neural Network Models for Voiced-based Access Control System of Building Security,” Information Technology, 2008. ITSim 2008. International Symposium on, vol. 3, pp. 1-8, 2008.
[8] 蔡仲齡, “含語者驗證之小型場所人臉辨識門禁系統的研發,” Master's thesis, National Cheng Kung University, July 2008.
[9] X. D. Huang and K. F. Lee, “On Speaker-Independent, Speaker-Dependent, and Speaker-Adaptive Speech Recognition,” Speech and Audio Processing, IEEE Transactions on, vol. 1, pp. 150-157, 1993.
[10] H. Sakoe and S. Chiba, “Dynamic Programming Algorithm Optimization for Spoken Word Recognition,” Acoustics, Speech and Signal Processing, IEEE Transactions on, vol. 26, pp. 43-49, 1978.
[11] C. Myers, L. Rabiner and A. Rosenberg, “Performance Tradeoffs in Dynamic Time Warping Algorithms for Isolated Word Recognition,” Acoustics, Speech and Signal Processing, IEEE Transactions on, vol. 28, pp. 623-635, 1980.
[12] D. P. Morgan and C. L. Scofield, Neural Network and Speech Processing, Kluwer Academic, 1991.
[13] P. Pujol, S. Pol, C. Nadeu and A. Hagen, “Comparison and Combination of Features in a Hybrid HMM/MLP and a HMM/GMM Speech Recognition,” Speech and Audio Processing, IEEE Transactions on, vol. 13, pp. 14-22, 2005.
[14] W. Dong-Liang, W.W.Y. Ng, P.P.K. Chan and D. Hai-Lan, “Access control by RFID and face recognition based on neural network,” ICMLC, 2010 International on, vol. 2, pp. 675-680, 2010.
[15] S. Jieun and K. Howon, “The RFID Middleware System Supporting Context-Aware Access Control Service,” ICACT 2006, vol. 1, pp. 863-866, 2006.
[16] Y. Gizatdinova and V. Surakka, “Feature-Based Detection of Facial Landmarks from Neutral and Expressive Facial Images,” Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 28, pp. 135-139, 2006.
[17] P. N. Belhumeur, J. P. Hespanha and D. J. Kriegman, “Eigenfaces vs. Fisherfaces: Recognition Using Class Specific Linear Projection,” Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 19, pp. 711-720, 1997.
[18] M. J. Er, S. Wu, J. Lu and H. L. Toh, “Face Recognition With Radial Basis Function (RBF) Neural Network,” Neural Networks, IEEE Transactions on, vol. 13, no. 3, pp. 697-710, 2002.
[19] D. A. Reynolds and R. C. Rose, “Robust text-independent speaker identification using Gaussian mixture speaker models,” Speech and Audio Processing, IEEE Transactions on, vol. 3, pp. 72-83, 1995.
[20] L. R. Rabiner and B. H. Juang, Fundamentals of Speech Recognition, Prentice Hall, New Jersey, 1993.
[21] 王小川, “語音訊號處理,” 全華, March 2004.
[22] R. Vergin, D. O’Shaughnessy and A. Farhat, “Generalized Mel Frequency Coefficients for Large-Vocabulary Speaker-Independent Continuous-Speech Recognition,” IEEE Trans. Speech and Audio Processing, vol. 7, no. 5, pp. 525-532, September 1999.
[23] John R. Deller, Jr., John G. Proakis and John H. L. Hansen, Discrete-Time Processing of Speech Signals, 1987.
[24] L. R. Rabiner, “A Tutorial on Hidden Markov Models and Selected Application in Speech Recognition,” Proceedings of the IEEE, vol. 77, no. 2, Feb. 1989.
[25] S. E. Levinson, L. R. Rabiner and M. M. Sondhi, “An Introduction to the Application of the Theory of Probabilistic Function of a Markov Process to Automatic Speech Recognition,” The Bell System Technical Journal, vol. 62, no. 4, April 1983.
[26] Changsheng Ai, Xuan Sun, Honghua Zhao and Xueren Dong, “Pipeline damage and leak sound recognition based on HMM,” Proceedings of the 7th World Congress on Intelligent Control and Automation, pp. 1940-1944, June 2008.
[27] 蔡永琪, “基於次音節單元之關鍵詞辨識,” Master's thesis, National Central University, June 1995.
[28] M.-W. Koo, C.-H. Lee and B.-H. Juang, “Speech Recognition and Utterance Verification Based on a Generalized Confidence Score,” IEEE Trans. on Speech and Audio Processing, vol. 9, no. 8, pp. 821-832, Nov. 2001.
[29] J. Zhi-Hua and Y. Zhen, “Voice conversion using Viterbi algorithm based on Gaussian mixture model,” ISPACS 2007, pp. 32-35, Nov. 2007.
[30] 黃國彰, “關鍵詞萃取與確認之研究,” Master's thesis, National Central University, June 1996.
[31] 王維邦, “連續國語語音關鍵詞萃取系統之研究與發展,” Master's thesis, National Central University, June 1997.
[32] H. Bourlard, B. D’hoore and J. M. Boite, “Optimizing recognition and rejection performance in wordspotting systems,” ICASSP-94, vol. 1, pp. I/373-I/376, 1994.
[33] H. Ney, “The use of a one stage dynamic programming algorithm for connected word recognition,” IEEE Trans. on Acoustic, Speech Signal, Processing, vol. 32, no. 2, pp. 263-271, April 1984.
[34] W. Jhing-Fa, W. Chung-Hsien, H. Chaug-Ching and L. Jau-Yien, “Integrating Neural Nets and One-Stage Dynamic Programming for Speaker Independent Continuous Mandarin Digit Recognition,” Acoustics, Speech, and Signal Processing, 1991, vol. 1, pp. 69-72, Apr 1991.
[35] J. Neyman and E. S. Pearson, “On the problem of the most efficient tests of statistical hypotheses,” Phil. Trans. R. Soc. Lond. A, vol. 231, pp. 289-337, 1933.
[36] J. Neyman and E. S. Pearson, “On the use and interpretation of certain test criteria for purpose of statistical inference,” Biometrika, pt I, vol. 20A, pp. 175-240, 1928.
[37] T. Kawahara, C.-H. Lee and B.-H. Juang, “Flexible Speech Understanding Based on Combined Key-Phrase Detection and Verification,” IEEE Trans. on Speech and Audio Processing, vol. 6, no. 6, pp. 558-568, Nov. 1998.
[38] Tatsuya Kawahara, C.-H. Lee and B.-H. Juang, “Combining Key-Phrase Detection and Subword-Based Verification For Flexible Speech Understanding,” Proc. IEEE Int. Conf. Acoustics, Speech, Signal Processing, vol. 2, pp. 1159-1162, Munich, Germany, May 1997.
[39] T. Chee-Ming, S.-H. Salleh, T. Tian-Swee and A. K. Ariff, “Text Independent Speaker Identification Using Gaussian Mixture Model,” ICIAS 2007, pp. 194-198, Nov. 2007.
[40] 黃夢晨, “最小錯誤鑑別式應用於語者辨識之競爭語者探討,” Master's thesis, National Central University, June 2008.
[41] F. Soong, A. Rosenberg, L. Rabiner and B. Juang, “A vector quantization approach to speaker recognition,” Acoustics, Speech, and Signal Processing, IEEE International Conference on ICASSP ‘85, vol. 10, pp. 387-390, 1985.
[42] Y. Linde, A. Buzo and R. Gray, “An Algorithm for Vector Quantizer Design,” Communications, IEEE Transactions on, vol. 28, no. 1, pp. 84-95, 1980.
[43] T. K. Moon, “The Expectation-Maximization Algorithm,” IEEE Signal Processing Magazine, vol. 13, no. 6, pp. 47-60, November 1996.
[44] S. Z. Selim and M. A. Ismail, “K-Means-Type Algorithms: A Generalized Convergence Theorem and Characterization of Local Optimality,” Pattern Analysis and Machine Intelligence, IEEE Transactions on, pp. 81-87, Jan. 1984.
[45] A. P. Dempster, N. M. Laird and D. B. Rubin, “Maximum likelihood from incomplete data via the EM algorithm,” Journal of the Royal Statistical Society, Series B, vol. 39, no. 1, pp. 1-38, 1977.
[46] D. A. Reynolds, “Comparison of background normalization methods for text-independent speaker verification,” EUROSPEECH ‘97, 5th European Conference on Speech Communication and Technology, pp. 963-966, 1997.
[47] J.-L. Gauvain and L. Chin-Hui, “Maximum a posteriori estimation for multivariate Gaussian mixture observations of Markov chains,” Speech and Audio Processing, IEEE Transactions on, vol. 2, pp. 291-298, 1994.
[48] C. J. Leggetter and P. C. Woodland, “Maximum likelihood linear regression for speaker adaptation of continuous density hidden Markov models, ” Computer Speech & Language, vol. 9, pp. 171-185, 1995.
[49] A. Sankar and L. Chin-Hui, “A Maximum-Likelihood Approach to Stochastic Matching for Robust Speech Recognition, ” Speech and Audio Processing, IEEE Transactions on, vol. 4, pp. 190-202, May 1996.
[50] O. Siohan, C. Chesta and Lee Chin-Hui, “Joint maximum a posteriori adaptation of transformation and HMM parameters,” Speech and Audio Processing, IEEE Transactions on, vol. 9, pp. 417-428, 2001.
[51] D. A. Reynolds, T. F. Quatieri and R. B. Dunn, “Speaker verification using Adapted Gaussian mixture models, ” Digital Signal Processing, vol. 10, pp. 19-41, 2000.
[52] 范世明, “高斯混合模型在語者辨識與國語語音辨認之應用,” Master's thesis, National Chiao Tung University, 2002.
[53] 位元文化, “精通MFC視窗程式設計-Visual Studio 2008版,” 文魁資訊, 2008.
[54] R. F. Raposa, “C++與MFC視窗程式設計,” translated by 陳智湧, 歐世亮 and 林志偉, 文魁資訊, 2008.
[55] 溫家誠, “多媒體應用之語音辨識系統,” Master's thesis, National Central University, June 2008.
Advisor: Y.-T. Juang (莊堯棠)    Date of Approval: 2011-07-20
