Face Verification and Lip Reading Systems based on Sparse Representation


Name

Ching-chia Hsu (許徑嘉)

Department

Department of Computer Science and Information Engineering

Thesis Title

Face Verification and Lip Reading Systems based on Sparse Representation
(基於稀疏表示之人臉驗證與唇語辨識系統)


Files

Full text in the system: never open to the public

Abstract (Chinese)

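Face verification has a wide range of applications, and how to make it work under real-world conditions has long been studied by many researchers. We extract SIFT features from face images, which are invariant to rotation, translation, and scale, and use them to build the dictionary for sparse representation. Using K-means and information theory, we propose two methods for expanding the dictionary. Experimental results show that the extended dictionary effectively increases the sparsity of the sparse coefficients and improves both the verification rate and the residual of the reconstructed signal. This thesis uses Bayesian compressive sensing (BCS) to solve the optimization problem. Compared with the previously used orthogonal matching pursuit (OMP) algorithm, BCS not only solves the optimization problem but also yields a covariance that can be used to improve the incremental dictionary, which reduces the uncertainty of the observation vectors. Experimental results show that the incremental dictionary does reduce the residual of the reconstructed signal.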
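To make the residual-based decision concrete, here is a minimal sketch and not the thesis's implementation. It uses plain OMP (the baseline solver named above) instead of BCS, random unit-norm vectors in place of real SIFT descriptors, and hypothetical names (`omp`, `n_nonzero`, `D`) that do not come from the thesis.

```python
import numpy as np

def omp(D, y, n_nonzero):
    """Greedy OMP: approximate y with at most n_nonzero columns (atoms) of D."""
    residual = y.copy()
    support = []
    x = np.zeros(D.shape[1])
    for _ in range(n_nonzero):
        # Pick the atom most correlated with the current residual.
        k = int(np.argmax(np.abs(D.T @ residual)))
        if k in support:
            break  # nothing new to add; the residual is already (near) zero
        support.append(k)
        # Re-fit all selected atoms jointly by least squares.
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coef
        x = np.zeros(D.shape[1])
        x[support] = coef
    return x, float(np.linalg.norm(residual))

# Toy dictionary for one claimed identity: 40 unit-norm atoms of dimension 128
# (128 is the usual SIFT descriptor length; the atoms here are random stand-ins).
rng = np.random.default_rng(0)
D = rng.standard_normal((128, 40))
D /= np.linalg.norm(D, axis=0)

# A "genuine" probe is built from three dictionary atoms; an "impostor" is random.
genuine = D[:, [3, 17, 25]] @ np.array([1.0, -0.8, 0.6])
genuine /= np.linalg.norm(genuine)
impostor = rng.standard_normal(128)
impostor /= np.linalg.norm(impostor)

x_gen, res_gen = omp(D, genuine, n_nonzero=3)
x_imp, res_imp = omp(D, impostor, n_nonzero=3)
print(f"genuine residual:  {res_gen:.3f}")
print(f"impostor residual: {res_imp:.3f}")
```

In a verification setting, the claim would be accepted when the residual falls below a threshold tuned on held-out data; that threshold is an assumption of this sketch and is not specified in the abstract.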

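Traditional lip reading uses ASM or AAM to obtain the lip shape as features, which may lose some useful information. This thesis instead considers the whole lip image and uses SIFT as features. With bag-of-features (BOF), multiple SIFT keypoints are converted into a single vector, which is then used to train HMM models. We test on the English letters A to Z, and the results are also better than those of the baseline system.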
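The BOF step described above can be sketched as follows, under stated assumptions: a small NumPy k-means builds the codebook, random 128-dimensional vectors stand in for SIFT descriptors, and the codebook size `k = 8` and the helper names are illustrative choices, not values from the thesis.

```python
import numpy as np

def assign(X, centers):
    """Index of the nearest center (squared Euclidean distance) for each row of X."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def kmeans(X, k, n_iter=20, seed=0):
    """Plain Lloyd iterations; returns a (k, dim) array of cluster centers."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iter):
        labels = assign(X, centers)
        for j in range(k):
            members = X[labels == j]
            if len(members):  # keep the old center if a cluster is empty
                centers[j] = members.mean(axis=0)
    return centers

def bof_histogram(descriptors, centers):
    """Turn a variable-size set of descriptors into one fixed-length vector."""
    labels = assign(descriptors, centers)
    hist = np.bincount(labels, minlength=len(centers)).astype(float)
    return hist / hist.sum()

rng = np.random.default_rng(1)
k = 8
training_descriptors = rng.standard_normal((500, 128))  # stand-in for pooled SIFT
codebook = kmeans(training_descriptors, k)

# Each video frame yields a different number of keypoints, yet every frame
# maps to a length-k histogram, so a clip becomes a (frames, k) sequence.
frames = [rng.standard_normal((n, 128)) for n in (30, 45, 12)]
sequence = np.stack([bof_histogram(f, codebook) for f in frames])
print(sequence.shape)
```

The resulting per-frame vectors have a fixed length regardless of how many keypoints were detected, which is what makes them usable as an observation sequence for an HMM.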

Abstract (English)

Face verification has many applications. A critical problem that concerns many researchers is how to apply it in the real world. To be robust to the orientation, translation, and scaling of face images, we extract SIFT features from the face images and use them to build the dictionary for sparse representation. We propose two methods for expanding the dictionary via K-means and information theory: the extended dictionary and the incremental dictionary. Experiments show that the extended dictionary effectively increases the sparseness of the sparse coefficients and improves both the verification rate and the reconstruction error. This thesis uses BCS to solve the optimization problem. Compared with the OMP algorithm, BCS not only solves the optimization problem but also provides a covariance that can be used to improve the dictionary, which decreases the uncertainty of the observation vectors. Experiments show that the incremental dictionary does reduce the residual of the reconstructed signal.
In past years, lip reading has used ASM or AAM to obtain features. Because this may lose some useful information, we consider the whole image by extracting SIFT features. To train HMM models on SIFT features, we use BOF to transform the matrices of SIFT features into vectors. We run experiments on the letters A to Z, and the results show that the proposed method performs better than the baseline system.
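As an illustration of how trained HMMs could be used to pick a letter, the sketch below scores a sequence with the scaled forward algorithm and chooses the model with the highest log-likelihood. It assumes discrete observation symbols (for example, the index of the dominant codeword per frame), and the two tiny models and their parameters are invented for the example; the thesis's actual observation model is not described in the abstract.

```python
import numpy as np

def forward_loglik(obs, pi, A, B):
    """Log-likelihood of a discrete observation sequence under an HMM.

    pi: (S,) initial state probabilities
    A:  (S, S) transitions, A[i, j] = P(next state j | current state i)
    B:  (S, V) emissions,   B[j, v] = P(symbol v | state j)
    """
    alpha = pi * B[:, obs[0]]
    loglik = 0.0
    for t in range(len(obs)):
        if t > 0:
            alpha = (alpha @ A) * B[:, obs[t]]
        # Rescale at every step to avoid underflow on long sequences.
        scale = alpha.sum()
        loglik += np.log(scale)
        alpha = alpha / scale
    return float(loglik)

# Two hypothetical left-to-right models sharing the same topology.
pi = np.array([1.0, 0.0])
A = np.array([[0.7, 0.3],
              [0.0, 1.0]])
models = {
    "A": np.array([[0.8, 0.1, 0.1],    # state 0 mostly emits symbol 0
                   [0.1, 0.8, 0.1]]),  # state 1 mostly emits symbol 1
    "B": np.array([[0.1, 0.1, 0.8],    # state 0 mostly emits symbol 2
                   [0.8, 0.1, 0.1]]),  # state 1 mostly emits symbol 0
}

obs = [0, 0, 1, 1]
scores = {name: forward_loglik(obs, pi, A, B) for name, B in models.items()}
best = max(scores, key=scores.get)
print(best, scores)
```

With one model per letter, recognition reduces to evaluating every model on the test sequence and returning the best-scoring label.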

Keywords (Chinese)

★ 稀疏表示 (sparse representation)

Keywords (English)

★ sparse representation

Table of Contents

Abstract (Chinese)
Abstract (English)
List of Figures
List of Tables
Contents
Chapter 1 Introduction
1.1 Preface
1.2 Research Motivation and Objectives
1.3 Thesis Organization
Chapter 2 Literature Review
2.1 Eigenface and Fisherface
2.2 Locality Preserving Projection (LPP)
2.3 Histogram of Gabor Phase Pattern (HGPP)
2.4 Local Binary Patterns (LBP)
2.5 Classifier
Chapter 3 Sparse Representation
3-1 The Sparse Representation Problem
3-2 Sparse Representation Applied to Face Recognition
Chapter 4 Research Methods
4-1 Bayesian Compressive Sensing
4-1-1 Sparseness Prior
4-1-2 Estimating Sparse Coefficients with the Relevance Vector Machine
4-2 SIFT Features
4-2-1 Detect scale-space extrema
4-2-2 Keypoint localization
4-2-3 Orientation assignment and Generate image descriptor
4-3 Building the Dictionary
4-4 Face Verification
4-5 Extended Dictionary
4-5-1 K-Means Clustering Algorithm
4-5-2 Building the Extended Dictionary with K-means
4-6 Face Verification Algorithm
4-7 Incremental Dictionary
Chapter 5 Lip Reading
5-1 Bag-of-Features (BOF)
5-1-1 Applying BOF to SIFT Features
5-2 Hidden Markov Model
5-2-1 Forward Algorithm
5-2-2 EM Algorithm
5-3 Bayesian Sensing Hidden Markov Model
Chapter 6 Experimental Results
6-1 Comparison with Baseline Systems
6-1-1 Extended YaleB Database
6-1-2 LFW Database
6-2 Comparison of Different Numbers of Cluster Centers
6-3 Classifier Performance Comparison
6-4 Sparseness Comparison
6-5 Incremental Dictionary
6-5-1 Residual Comparison for the Incremental Dictionary
6-5-2 Incremental Dictionary versus Random Dictionary
6-5-3 Convergence Behavior of the Incremental Dictionary
6-6 Lip Reading
Chapter 7 Conclusions and Future Work
References
Appendix 1 Extended YaleB Database
Appendix 2 LFW Database


Advisor

Jia-Ching Wang (王家慶)

Approval Date

2013-8-26
