Master's/Doctoral Thesis 92532009 — Detailed Record




Author: Fu-Rong Cheng (鄭復榕)   Department: Department of Computer Science and Information Engineering (In-Service Master Program)
Thesis Title: A Real-Time Web-Based Personal Authentication System with Visemes and Acoustic Biometric Features
(使用視位與語音生物特徵作即時線上身分辨識)
Related Theses
★ An Image-Based SMD Carrier Tape Alignment System ★ A Study of Content Forgery Detection and Deleted-Data Recovery on Handheld Mobile Devices
★ License Plate Authentication Based on the SIFT Algorithm ★ Local Pattern Features Based on Dynamic Linear Decision Functions for Face Recognition
★ A GPU-Based SAR Database Simulator: A Parallel Architecture for SAR Echo Signal and Image Databases (PASSED) ★ Personal Identity Verification Using Palmprints
★ Video Indexing Using Color Statistics and Camera Motion ★ Form Document Classification Using Field Clustering Features and Four-Directional Adjacency Trees
★ Stroke Features for Offline Chinese Character Recognition ★ Motion Vector Estimation Using Adaptive Block Matching and Multi-Image Information
★ Color Image Analysis with Applications to Color-Quantized Image Retrieval and Face Detection ★ Extraction and Recognition of Logos on Chinese and English Business Cards
★ Chinese Signature Verification Using Virtual-Stroke Features ★ Face Detection, Face Pose Classification, and Face Recognition Based on Triangle Geometry and Color Features
★ A Complementary Skin-Color-Based Face Detection Strategy ★ Automatic Fingerprint Classification Using Ridge Distribution Order and Distribution Models
Files: full text not available in the system (permanently restricted)
Abstract (Chinese) Phonemes are the basic acoustic elements of a language, while visemes are the appearance of the lips when a word is uttered. Studies [8][9] show that the characteristics and manner of utterance are sufficient to identify an individual, so the proposed system uses these biometric uttering features as the main features of a personal authentication system. In this work, the main biometric uttering features include frontal-face visual information [1][2][3], visemes [4], and phonemes [4]. We propose a web-based personal authentication system that relies on biometric uttering features of spoken Chinese: a user needs only a web camera to register an identity and later log in to the system.
The main goals of this work are to compare and analyze these biometric uttering features, to build an identity classifier with GMMs, to evaluate how each feature performs in the classifier, and finally to build a decision classifier for the personal authentication mechanism. In the system design, the scalability of the client-server architecture and network transmission performance were given priority, so the adopted methods may sacrifice some recognition rate. Experimental results show that the biometric features, recognition methods, and system architecture used in this study can process frontal-face video at resolutions of 320*240 and above within a reasonable response time, with a recognition rate above 80%.
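The GMM-based identity classifier described in the abstract can be sketched as follows. This is an illustrative sketch, not the thesis's actual implementation: it assumes scikit-learn's GaussianMixture, uses synthetic stand-ins for the extracted visual/acoustic feature vectors, and scores a trial with a log-likelihood ratio against a background model.

```python
# Hedged sketch of GMM-based speaker verification (not the thesis's code).
# Feature vectors (e.g. MFCCs or lip-geometry measurements) are assumed
# to be rows of a 2-D array; here they are synthetic Gaussians.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Synthetic stand-ins for the enrolled user's features and an impostor pool.
user_feats = rng.normal(loc=0.0, scale=1.0, size=(500, 12))
background_feats = rng.normal(loc=2.0, scale=1.5, size=(2000, 12))

# Enrollment: fit one GMM to the user's features and one background model
# (UBM-like) to the impostor pool.
user_gmm = GaussianMixture(n_components=4, random_state=0).fit(user_feats)
ubm = GaussianMixture(n_components=8, random_state=0).fit(background_feats)

def verify(feats, threshold=0.0):
    """Accept if the mean per-frame log-likelihood ratio exceeds the threshold."""
    llr = user_gmm.score(feats) - ubm.score(feats)
    return llr > threshold

genuine_trial = rng.normal(0.0, 1.0, size=(100, 12))
impostor_trial = rng.normal(2.0, 1.5, size=(100, 12))
print(verify(genuine_trial))   # True
print(verify(impostor_trial))  # False
```

In practice the threshold would be tuned on held-out data to trade off the FRR and FAR reported in Tables 8 and 9.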
Abstract (English) Phonemes are the basic acoustic elements of a language, and visemes are the lip shapes made while uttering a word. Since the uttering characteristics and manner are unique to each individual [8][9], a personal authentication system can make full use of this biometric information. The biometric uttering features considered here include an individual's lip visual features [1-3], visemes [4], and phonemes [4]. In this thesis, we propose an effective web-based personal authentication system that applies a filter model to biometric uttering features in Chinese: a user equipped with a web camera can register and later log in to the system using the registered biometric uttering features.
In our work, we compare and analyze these biometric features, build two GMM classifiers, and finally fuse their outputs into a personal authentication decision. Designing for system scalability and limited network bandwidth may sacrifice some recognition rate. However, experimental results show that the proposed system can handle video streams of a person's frontal view at 320*240 pixels (with lip regions of about 30*15 pixels) within a reasonable response time, achieving a satisfactory recognition rate above 80%.
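The fusion step mentioned above ("fuse their outputs") can be illustrated with a minimal score-level (late) fusion sketch. The weights and threshold below are hypothetical placeholders, not values from the thesis.

```python
# Hedged sketch of late (score-level) fusion: each modality's classifier
# produces a score, and a weighted sum drives the accept/reject decision.
# Weights and threshold are illustrative only.
def fuse_scores(visual_score, acoustic_score, w_visual=0.4, w_acoustic=0.6):
    """Weighted sum of per-modality match scores in [0, 1]."""
    return w_visual * visual_score + w_acoustic * acoustic_score

def authenticate(visual_score, acoustic_score, threshold=0.5):
    """Accept the claimed identity if the fused score clears the threshold."""
    return fuse_scores(visual_score, acoustic_score) >= threshold

print(authenticate(0.9, 0.8))  # both modalities confident -> True
print(authenticate(0.2, 0.3))  # both weak -> False
```

A late-fusion design like this keeps the visual and acoustic classifiers independent, which suits a client-server deployment where the two feature streams may arrive and be scored separately.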
Keywords (Chinese) ★ 視位 (visemes)
Keywords (English) ★ biometric information
★ GMM classifier
★ visemes
Table of Contents
Chapter 1 Introduction 1
Chapter 2 Related Works 4
2-1. Visual feature selection in speaker recognition 4
2-1-2. Model-based features 6
2-1-3. Template-based features 7
2-2. Acoustic feature selection in speaker recognition 9
2-3. A Review of recognition methodologies in speaker recognition 9
2-3-1. Algorithms of speaker recognition 11
2-3-2. Fusion mechanism of speaker recognition 11
Chapter 3 Overview Of The Proposed System 15
3-1. Visual and acoustic information pre-processing 17
3-1-1. Visual information pre-processing 17
3-1-2. Acoustic information pre-processing 26
3-2. Visual and acoustic feature selection 28
3-2-1. Feature selection of visual information 31
3-2-2. Feature selection of acoustic information 33
3-3. Recognition engine(s) and the algorithms used 35
3-3-1. Algorithms 36
3-3-2. Late fusion model 36
3-4. System architecture of personal biometric authentication 37
Chapter 4 Simulation Experiments 42
4-1. Testing by data groups 46
4-2. Testing in three types of pixel resolutions 49
Chapter 5 Conclusions And Future Works 53
5-1. Conclusions 53
5-2. Future works 54
References 56
Appendix A: The Testing Log 59
Appendix B: The Detailed Testing Data 72
List of Figures
Figure 1. The workflow diagram of the RWPAS 16
Figure 2. The visual feature pre-processing workflow diagram of the RWPAS 21
Figure 3. Lip Image pre-processing-1 22
Figure 4. Lip Image pre-processing-2 23
Figure 5. Face extraction 23
Figure 6. Geometric information of lip 24
Figure 7. Shape information of lip (Ellipse model) 24
Figure 8. Inner lip information 25
Figure 9. Movement information of lip and inner lip region 25
Figure 10. The pre-processing of acoustic information 28
Figure 11. The basic network services of RWPAS 37
Figure 12. The workflow diagram of the RWPAS - registration stage-1 38
Figure 13. The workflow diagram of the RWPAS - registration stage-2 39
Figure 14. The workflow diagram of the RWPAS - enrollment stage-1 40
Figure 15. The workflow diagram of the RWPAS - enrollment stage-2 41
Figure 16. Image samples from successful extraction cases in three pixel resolutions 44
Figure 17. Image samples from failed extraction cases 44
Figure 18. Images in different pixel resolutions 45
Figure 19. Images in low pixel resolutions 45
List of Tables
Table 1. A comparison of the four references 13
Table 2. Technologies of image processing 19
Table 3. Technologies of voice processing for acoustic information pre-processing 27
Table 4. A check list [5] for feature selection 29
Table 5. The inside test results of visual features 46
Table 6. The outside test results of visual features-1 48
Table 7. The outside test results of visual features-2 49
Table 8. The FRR, FAR, and recognition rate in different pixel resolutions 51
Table 9. The FRR, FAR, and recognition rate for different data streams 51
References
1. L.L. Mok, W.H. Lau, S.H. Leung, S.L. Wang, and H. Yan, "Lip features selection with application to person authentication", Proc. of IEEE ICASSP, vol. 3, pp. 397-400, May 2004.
2. J.O. Kim, W. Lee, J. Hwang, K.S. Baik, and C.H. Chung, "Lip print recognition for security systems by multi-resolution architecture", Future Generation Computer Systems, vol. 20, no. 2, pp. 295-301, 2004.
3. M.N. Kaynak, Q. Zhi, A.D. Cheok, K. Sengupta, Z. Jian, and K.C. Chung, "Analysis of lip geometric features for audio-visual speech recognition", IEEE Transactions on Systems, Man and Cybernetics, Part A: Systems and Humans, vol. 34, no. 4, pp. 564-570, July 2004.
4. S.W. Foo, Y. Lian, and L. Dong, "Recognition of visual speech elements using adaptively boosted hidden Markov models", IEEE Transactions on Circuits and Systems for Video Technology, vol. 14, no. 5, pp. 693-705, May 2004.
5. I. Guyon and A. Elisseeff, "An introduction to variable and feature selection", Journal of Machine Learning Research, vol. 3, pp. 1157-1182, 2003.
6. D.A. Reynolds and R.C. Rose, "Robust text-independent speaker identification using Gaussian mixture speaker models", IEEE Transactions on Speech and Audio Processing, vol. 3, no. 1, pp. 72-83, January 1995.
7. H. Pan, S.E. Levinson, T.S. Huang, and Z.P. Liang, "A fused hidden Markov model with application to bimodal speech processing", IEEE Transactions on Signal Processing, vol. 52, no. 3, March 2004.
8. X. Zhang, R.M. Mersereau, and M. Clements, "Automatic speechreading with application to speaker verification", Proc. of ICASSP, vol. 1, pp. 685-688, May 2002.
9. J. Luettin, N.A. Thacker, and S.W. Beet, "Active shape models for visual speech feature extraction", in Speechreading by Humans and Machines: Models, Systems, and Applications, Springer-Verlag, New York, pp. 383-390, 1996.
10. S.W. Foo and L. Dong, "Recognition of visual speech elements using hidden Markov models", in 3rd IEEE Pacific Rim Conference on Multimedia, LNCS 2532, pp. 607-614, December 2002.
11. J.S. Jang, "Data Clustering and Pattern Recognition" (in Chinese), available via the on-line course links on the author's homepage at http://www.cs.nthu.edu.tw/~jang.
12. J.M. Zhang, L.M. Wang, D.J. Niu, and Y.Z. Zhan, "Research and implementation of a real time approach to lip detection in video sequences", Proc. of the 2003 International Conference on Machine Learning and Cybernetics, vol. 5, pp. 2795-2799, November 2003.
13. D.A. Reynolds and R.C. Rose, "Robust text-independent speaker identification using Gaussian mixture speaker models", IEEE Transactions on Speech and Audio Processing, vol. 3, no. 1, pp. 72-83, 1995.
14. D.J.C. MacKay, "Introduction to Gaussian Processes", Department of Physics, Cambridge University, May 1998.
15. R.G. Bachrach, A. Navot, and N. Tishby, "Large Margin Principles for Feature Selection - a Tutorial".
16. C. Cortes and V. Vapnik, "Support-vector networks", Machine Learning, vol. 20, pp. 273-297, 1995.
17. R. Begleiter, R. El-Yaniv, and G. Yona, "On prediction using variable order Markov models", Journal of Artificial Intelligence Research, vol. 22, pp. 385-421, 2004.
18. R. Meir, R. El-Yaniv, and S. Ben-David, "Localized boosting", in Proceedings of the 13th Annual Conference on Computational Learning Theory, pp. 190-199, 2000.
19. S.W. Foo, Y. Lian, and L. Dong, "Recognition of visual speech elements using adaptively boosted hidden Markov models", IEEE Transactions on Circuits and Systems for Video Technology, vol. 14, no. 5, pp. 693-705, 2004.
Advisor: Kuo-Chin Fan (范國清)   Date of approval: 2005-07-19
