Face Detection and Recognition Based on a Cascaded Convolutional Neural Network

Name

蕭寧諄 (Ning-Chun Hsiao)

Department

Department of Computer Science and Information Engineering

Thesis title

Face detection and recognition based on a cascaded convolutional neural network
(階層卷積神經網路的人臉偵測與辨識)

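Related theses

★ Video error concealment for large-area losses and scene changes
★ Force feedback correction and rendering in a virtual haptic system
★ Multispectral satellite image fusion and infrared image synthesis
★ A laparoscopic cholecystectomy surgery simulation system
★ Dynamically loaded multiresolution terrain modeling in a flight simulation system
★ Wavelet-based multiresolution terrain modeling and texture mapping
★ Multiresolution optical flow analysis and depth computation
★ Volume-preserving deformable modeling for laparoscopic surgery simulation
★ Interactive multiresolution model editing
★ Wavelet-based multiresolution edge tracking for edge detection
★ Multiresolution modeling based on quadric error and attribute criteria
★ Progressive image compression based on the integer wavelet transform and grey theory
★ Tactical simulation built on dynamically loaded multiresolution terrain modeling
★ Face detection and feature extraction using spatial relationships from multilevel segmentation
★ Wavelet-based image watermarking and compression
★ Appearance-preserving and view-dependent multiresolution modeling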

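This electronic thesis is authorized for immediate open access.
The open-access electronic full text is licensed to users only for personal, non-profit searching, reading, and printing for the purpose of academic research.
Please comply with the Copyright Act of the Republic of China; do not reproduce, distribute, adapt, repost, or broadcast the work without permission, to avoid violating the law.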

Abstract (translated from the Chinese)

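In recent years, driven by the development of convolutional neural networks (CNNs), face detection and face recognition have made great progress, and many distinctive and novel network architectures have been proposed to solve various face detection and recognition problems. Different applications need different architectures: passing through customs only requires verifying a face, whereas surveillance and access control systems mostly need to detect faces in a large frame first and then recognize them.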
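We propose a CNN architecture that combines face detection and face recognition. For face detection, a structure similar to the RPN in Faster R-CNN first proposes candidate regions that may be faces; a coarse-to-fine cascaded CNN then confirms whether each candidate region really is a face. We use the RPN structure in place of the original sliding-window proposal method, which avoids spending too much time trying every position and every size one by one. With the RPN, detection on each 1920x1080 image takes only 0.08 seconds, compared with 0.18 seconds before the change, a clear speedup, while detection performance stays about the same.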
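To see why exhaustive sliding-window proposals are costly on a 1920x1080 frame, a rough count may help. This is only a back-of-the-envelope sketch: the 12-pixel base window, the fixed 4-pixel stride, and the 1.414 scale step are illustrative assumptions, not values taken from the thesis. Only the two timings (0.18 s and 0.08 s) come from the abstract.

```python
def count_windows(width, height, window, stride):
    """Number of positions of one square window size that fit in the frame."""
    if window > width or window > height:
        return 0
    cols = (width - window) // stride + 1
    rows = (height - window) // stride + 1
    return cols * rows


def count_all_windows(width, height, base_window=12, stride=4, scale_step=1.414):
    """Total positions over a pyramid of window sizes (all parameters assumed)."""
    total = 0
    window = float(base_window)
    while window <= min(width, height):
        total += count_windows(width, height, int(window), stride)
        window *= scale_step
    return total


total = count_all_windows(1920, 1080)
speedup = 0.18 / 0.08  # timings reported in the abstract
print(f"sliding-window candidates: {total}")
print(f"reported speedup from the RPN: {speedup:.2f}x")
```

Under these assumed settings the smallest window size alone already yields more than a hundred thousand positions, which is the kind of exhaustive search a learned proposal stage is meant to avoid.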
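After face detection, we use FaceNet to extract the features used for recognition. Because of the way the loss function is defined, the distance between the features of two faces directly reflects how similar the faces are. In other words, we can classify simply by computing distances between features, without an additional complex classifier, so the system does not need to retrain its network parameters even when the recognition targets change. Our network reaches a recognition accuracy of 97%. This is slightly lower than that of networks that must be retrained, but given the convenience of not retraining, we consider the benefit clearly greater than the lost accuracy.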
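Classification by feature distance can be sketched as a nearest-neighbor lookup over enrolled embeddings. The sketch below is an illustration under stated assumptions, not the thesis implementation: the function name `nearest_identity`, the gallery names, the 3-D toy vectors, and the 0.5 threshold are all hypothetical, and real embeddings would come from the FaceNet network.

```python
import math


def nearest_identity(query, gallery, threshold):
    """Return the enrolled name closest to `query`, or None if nothing is close enough.

    `gallery` maps a name to one reference embedding; `threshold` is the
    largest Euclidean distance still accepted as a match (an assumed value).
    """
    best_name, best_dist = None, math.inf
    for name, reference in gallery.items():
        dist = math.dist(query, reference)
        if dist < best_dist:
            best_name, best_dist = name, dist
    return best_name if best_dist <= threshold else None


# Toy 3-D embeddings; real face embeddings have many more dimensions.
gallery = {
    "alice": [0.9, 0.1, 0.0],
    "bob": [0.0, 0.8, 0.6],
}
print(nearest_identity([0.85, 0.15, 0.05], gallery, threshold=0.5))

# Changing the recognition targets only means editing the gallery;
# no network weights are touched.
gallery["carol"] = [0.1, 0.0, 0.99]
print(nearest_identity([0.1, 0.05, 0.95], gallery, threshold=0.5))
print(nearest_identity([-1.0, -1.0, -1.0], gallery, threshold=0.5))
```

Adding `carol` requires only one new gallery entry, which mirrors the point that changing the recognition targets does not require retraining.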

Abstract (English)

In recent years, thanks to the development of convolutional neural networks (CNNs), researchers have made great progress in face detection and face recognition. Many novel network architectures have been proposed to solve different face detection and recognition problems. Which architecture to use depends on the application. For example, at customs we only need to perform face recognition on an image that contains a single face. In a surveillance or access control system, however, we need to perform face detection first to find where the faces are and then recognize every face.
We propose a CNN architecture that combines face detection and face recognition. We use an RPN structure similar to the one in Faster R-CNN to propose candidate regions that may be faces. We then use a coarse-to-fine cascaded CNN to check each candidate region and filter out the regions that are not faces. By using the RPN structure instead of a sliding window to propose candidate regions, we avoid checking regions of every size at every position one by one. The system needs only 0.08 seconds per 1920x1080 image with the RPN structure, compared with 0.18 seconds with the sliding-window method, so execution is faster while the detection capability remains nearly the same.
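The coarse-to-fine filtering idea can be sketched as a chain of increasingly strict checks, where a region is discarded as soon as one stage rejects it. This is a minimal illustration only: the three stages, their thresholds, and the precomputed toy scores are hypothetical stand-ins for the cascaded networks, whose actual number and design are not given in the abstract.

```python
def run_cascade(candidates, stages):
    """Pass candidates through (score_fn, threshold) stages, cheapest first.

    A candidate is dropped as soon as one stage scores it below that
    stage's threshold, so later (costlier) stages see fewer regions.
    """
    survivors = list(candidates)
    for score_fn, threshold in stages:
        survivors = [c for c in survivors if score_fn(c) >= threshold]
    return survivors


# Hypothetical stand-ins for coarse, medium, and fine classifiers:
# each "region" is a dict holding precomputed toy scores.
stages = [
    (lambda r: r["coarse"], 0.3),
    (lambda r: r["medium"], 0.5),
    (lambda r: r["fine"], 0.8),
]
regions = [
    {"id": 0, "coarse": 0.9, "medium": 0.9, "fine": 0.95},
    {"id": 1, "coarse": 0.1, "medium": 0.9, "fine": 0.90},
    {"id": 2, "coarse": 0.6, "medium": 0.4, "fine": 0.85},
    {"id": 3, "coarse": 0.7, "medium": 0.7, "fine": 0.60},
]
faces = run_cascade(regions, stages)
print([r["id"] for r in faces])
```

Ordering the stages from cheap to expensive is the design choice that matters here: most non-face regions are rejected before the costliest check runs.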
After face detection, we use FaceNet to extract features for recognition. Because of the way the loss function is defined, the distance between the feature vectors extracted from two facial images directly reflects the similarity of the two faces. That is, we can recognize faces by only calculating the distance between feature vectors, without any complex classifier, which allows us to use the same recognition system for different recognition targets without retraining the network parameters. The recognition accuracy of the proposed method reaches 97%, which is slightly lower than that of methods that need to be retrained. However, considering the convenience of using the same recognition system without retraining, we think the benefit clearly outweighs the loss in accuracy.
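The abstract does not spell out the loss function. For orientation only: FaceNet as published by Schroff et al. is trained with a triplet loss, which in its usual form reads as follows. Whether the thesis uses exactly this form is an assumption.

```latex
L = \sum_{i=1}^{N} \max\!\left(
      \left\| f(x_i^{a}) - f(x_i^{p}) \right\|_2^2
    - \left\| f(x_i^{a}) - f(x_i^{n}) \right\|_2^2
    + \alpha,\; 0 \right)
```

Here $f$ is the embedding network, $x_i^{a}$, $x_i^{p}$, and $x_i^{n}$ are an anchor image, a positive image of the same identity, and a negative image of a different identity, and $\alpha$ is a margin. Minimizing this loss pulls embeddings of the same person together and pushes those of different people apart, which is why the distance between two embeddings can serve directly as a similarity measure.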

Keywords

★ Deep learning
★ Convolutional neural network
★ Face recognition
★ Face detection

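Table of contents

Abstract (Chinese)
Abstract (English)
Acknowledgments
Table of contents
List of figures
List of tables
Chapter 1 Introduction
1.1 Motivation
1.2 System architecture
1.3 Thesis organization
Chapter 2 Related work
2.1 Face detection
2.2 Face recognition
Chapter 3 Face detection
3.1 Proposing candidate regions
3.2 Verifying whether candidate regions are faces
Chapter 4 Face recognition
4.1 Feature extraction
4.2 Face classification
Chapter 5 Experiments
5.1 Experimental equipment and environment
5.2 Demonstration of results
5.3 Face detection experiments and results
5.4 Face recognition experiments and results
Chapter 6 Conclusions
References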

Advisor

曾定章 (Din-Chang Tseng)

Approval date

2018-7-31
