A Single-Stage Face Detection and Face Recognition Deep Neural Network Based on Feature Pyramid and Triplet Loss

Name

紀柏廷(Po-Ting Chi)

Department

Department of Electrical Engineering

Thesis Title

A Single-Stage Face Detection and Face Recognition Deep Neural Network Based on Feature Pyramid and Triplet Loss
(Chinese title: 基於特徵金字塔與三元損失組之單級人臉偵測與人臉辨識神經網路)

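Related Theses

★ Low-Memory Hardware Design for Real-Time SIFT Feature Point Extraction
★ A Real-Time Face Detection and Face Recognition Access Control System
★ An Autonomous Vehicle with Real-Time Automatic Following
★ Lossless Compression Algorithm and Implementation for Multi-Lead ECG Signals
★ An Offline User-Defined Speech and Speaker Wake-Word System with Embedded Implementation
★ Wafer Map Defect Classification and Embedded System Implementation
★ Speech Densely Connected Convolutional Networks for Small-Footprint Keyword Spotting
★ G2LGAN: Data Augmentation on Imbalanced Datasets for Wafer Map Defect Classification
★ Algorithm Design Techniques for Compensating the Finite Precision of Multiplierless Digital Filters
★ Design and Implementation of a Programmable Viterbi Decoder
★ Low-Cost Vector Rotator IP Design Based on Extended Elementary-Angle CORDIC
★ Analysis and Architecture Design of a JPEG2000 Still Image Coding System
★ A Low-Power Turbo Code Decoder for Communication Systems
★ Platform-Based Design for Multimedia Communication
★ Design and Implementation of a Digital Watermarking System for MPEG Encoders
★ Algorithm Development for Video Error Concealment and Its Data Reuse Considerations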

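This electronic thesis is authorized for immediate open access.
Open-access full texts are licensed to users only for personal, non-profit searching, reading, and printing for academic research purposes.
Please comply with the relevant provisions of the Copyright Act of the Republic of China (Taiwan) and do not reproduce, distribute, adapt, repost, or broadcast the work without authorization, to avoid legal liability.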

Abstract (Chinese)

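With the development of technology, artificial intelligence techniques have continued to evolve: from the various philosophical ideas about artificial intelligence that emerged in the 1950s, to the rise of machine learning in the 1980s, when algorithms such as decision trees, random forests, support vector machines, and neural networks were proposed and continually refined to improve their performance. In the deep learning boom of the past decade, accelerated by GPUs and other convolution-acceleration hardware, deep neural networks have achieved significant improvements across a wide variety of tasks.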
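A practical face recognition system, from camera input to identity output, can be divided into four main tasks: face detection, face alignment, feature extraction, and feature matching. Running every one of these tasks on the original image is quite time-consuming. With neural-network optimization, face detection and face alignment can already be merged into a single face detection network, which localizes faces using a feature pyramid combined with anchor boxes and performs alignment through the network's regression layer. Likewise, feature extraction and feature matching can be merged into a face recognition network that extracts features with convolutions and matches them with a fully connected layer followed by a softmax function.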
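The anchor-based localization described above can be sketched in plain Python. This is a minimal sketch, not the thesis's configuration: the strides (8, 16, 32), the two square anchor sizes per pyramid level, and the 0.35 IoU threshold for positive matches are assumptions borrowed from common RetinaFace/S3FD-style detectors, and `generate_anchors`, `iou`, and `match_anchors` are hypothetical helper names.

```python
from itertools import product
from math import ceil

# Assumed pyramid configuration (not taken from the thesis): one stride per
# feature-pyramid level and two square anchor sizes per level, in pixels.
STRIDES = (8, 16, 32)
ANCHOR_SIZES = ((16, 32), (64, 128), (256, 512))


def generate_anchors(image_size=640, strides=STRIDES, sizes=ANCHOR_SIZES):
    """Return square anchors as (cx, cy, w, h) in pixels for every pyramid level."""
    anchors = []
    for stride, level_sizes in zip(strides, sizes):
        fm = ceil(image_size / stride)  # feature-map side length at this level
        for row, col in product(range(fm), range(fm)):
            cx = (col + 0.5) * stride  # anchor centred on its feature-map cell
            cy = (row + 0.5) * stride
            for s in level_sizes:
                anchors.append((cx, cy, float(s), float(s)))
    return anchors


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def to_corners(anchor):
    """Convert (cx, cy, w, h) to (x1, y1, x2, y2)."""
    cx, cy, w, h = anchor
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)


def match_anchors(anchors, gt_boxes, pos_thresh=0.35):
    """Label each anchor with the index of its best ground-truth face, or -1 for background."""
    labels = []
    for a in anchors:
        box = to_corners(a)
        best_idx, best_iou = -1, 0.0
        for i, g in enumerate(gt_boxes):
            v = iou(box, g)
            if v > best_iou:
                best_idx, best_iou = i, v
        labels.append(best_idx if best_iou >= pos_thresh else -1)
    return labels
```

For a 640x640 input these assumed settings give 80x80, 40x40, and 20x20 grids, i.e. 2 x (6400 + 1600 + 400) = 16,800 anchors. A 64x64 face box is matched only by the 64-pixel anchors on the stride-16 level, which is how the pyramid routes each face scale to one level.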
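This thesis proposes a multi-task learning scheme that combines a feature pyramid with triplet loss to train a single-stage face detection and face recognition deep neural network. A single backbone network produces the outputs of all tasks simultaneously, and sharing the convolutional weights avoids redundant computation across tasks. The network localizes faces using a feature pyramid and anchor boxes, outputs face features trained with triplet loss, and finally compares them with a simple mathematical similarity function to obtain the recognition result. Accelerated by an Nvidia RTX 2080 Ti, the system runs at 212 FPS on 640x640 input images.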
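The triplet-loss training and similarity matching described above can be illustrated with a small pure-Python sketch. The abstract does not name its similarity function, so cosine similarity with a fixed threshold is an assumption here, as are the 0.2 margin (the value used by FaceNet) and the helper names `triplet_loss` and `identify`; in the real system the embeddings would come from the network's recognition branch rather than hand-written lists.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def triplet_loss(anchor, positive, negative, margin=0.2):
    """FaceNet-style triplet loss: pull same-identity pairs together,
    push different-identity pairs at least `margin` further apart."""
    a, p, n = (l2_normalize(v) for v in (anchor, positive, negative))
    return max(squared_distance(a, p) - squared_distance(a, n) + margin, 0.0)


def cosine_similarity(a, b):
    """Dot product of the L2-normalized vectors."""
    a, b = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a, b))


def identify(query, gallery, threshold=0.5):
    """Return the best-matching identity in `gallery` (name -> embedding),
    or None if no similarity reaches `threshold` (hypothetical value)."""
    best_name, best_score = None, threshold
    for name, emb in gallery.items():
        score = cosine_similarity(query, emb)
        if score >= best_score:
            best_name, best_score = name, score
    return best_name
```

Because the embeddings are L2-normalized, squared Euclidean distance and cosine similarity are tied by d² = 2 − 2·cos, so a distance threshold and a similarity threshold are interchangeable; either one is the kind of "simple mathematical function" that replaces a per-identity softmax classifier and lets new people be enrolled without retraining.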

Abstract (English)

With the development of technology, artificial intelligence algorithms have continued to evolve, from the various artificial intelligence ideas proposed beginning in the 1950s to the rise of machine learning algorithms in the 1980s. Many algorithms, such as decision trees, random forests, support vector machines, and neural networks, have been proposed and further improved to enhance their performance. With the explosion of deep learning in the past decade, and with acceleration from GPUs or other hardware accelerators, deep neural networks have achieved significant improvements across a wide range of tasks.
A practical deep learning face recognition system can be divided into four main tasks: face detection, face alignment, feature extraction, and feature matching. Running each task on the original image as input can be time-consuming. With deep neural network optimization, face detection and face alignment can be integrated into a single detection network that localizes faces with a feature pyramid combined with anchor boxes and aligns face positions by training the network's regression layer. Likewise, feature extraction and feature matching can be combined by using convolutions to extract face features and a fully connected layer with a softmax function to match the person's identity.
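Alignment through the regression layer amounts to decoding predicted offsets relative to each anchor. The sketch below assumes the SSD/RetinaFace-style parameterization with variances (0.1, 0.2) and five facial landmarks; the abstract does not specify the thesis's encoding, so `decode_box`, `decode_landmarks`, and `eye_roll_degrees` are hypothetical helpers, and anchors are (cx, cy, w, h) tuples in pixels.

```python
import math

# Assumed SSD/RetinaFace-style variances; not values stated in the thesis.
VARIANCES = (0.1, 0.2)


def decode_box(anchor, offsets, variances=VARIANCES):
    """Turn regressed (dx, dy, dw, dh) offsets into an (x1, y1, x2, y2) box."""
    acx, acy, aw, ah = anchor
    dx, dy, dw, dh = offsets
    cx = acx + dx * variances[0] * aw   # centre shift scaled by anchor size
    cy = acy + dy * variances[0] * ah
    w = aw * math.exp(dw * variances[1])  # log-space width/height scaling
    h = ah * math.exp(dh * variances[1])
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)


def decode_landmarks(anchor, offsets, variances=VARIANCES):
    """Turn 2*K regressed offsets into K (x, y) points
    (K = 5 would cover two eyes, nose tip, and two mouth corners)."""
    acx, acy, aw, ah = anchor
    return [
        (acx + offsets[i] * variances[0] * aw, acy + offsets[i + 1] * variances[0] * ah)
        for i in range(0, len(offsets), 2)
    ]


def eye_roll_degrees(landmarks):
    """Roll angle of the line through the first two landmarks,
    assumed here to be the left and right eye centres."""
    (lx, ly), (rx, ry) = landmarks[0], landmarks[1]
    return math.degrees(math.atan2(ry - ly, rx - lx))
```

Once the eye landmarks are decoded, rotating the face crop by the negative of this roll angle levels the eyes, which is one simple way to obtain an aligned face without running a separate alignment model.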
In this thesis, we propose a multi-task training method based on a feature pyramid and triplet loss to train a single-stage face detection and face recognition deep neural network. The data for every task passes through the same backbone network, which avoids duplicate computation by sharing weights and computation. The network uses a feature pyramid and anchor boxes to localize faces, uses triplet loss to train the feature extractor, and finally matches features through a simple mathematical function. On an Nvidia RTX 2080 Ti GPU, the system achieves 212 FPS for 640x640 input.
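The shared-backbone idea above, often called hard parameter sharing in multi-task learning, can be shown with a toy Python structure in which every task head reads the same features, so the expensive backbone runs once per image. `SharedBackboneModel`, `multitask_loss`, and the per-task weights are hypothetical; the abstract does not state how the detection and triplet losses are balanced, and a weighted sum is only one common choice.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict


@dataclass
class SharedBackboneModel:
    """Hard parameter sharing: one backbone pass feeds every task head."""

    backbone: Callable[[Any], Any]          # image -> shared feature maps
    heads: Dict[str, Callable[[Any], Any]]  # task name -> head on shared features
    backbone_calls: int = field(default=0)  # bookkeeping: how often the backbone ran

    def forward(self, image):
        features = self.backbone(image)     # computed once, reused by all heads
        self.backbone_calls += 1
        return {task: head(features) for task, head in self.heads.items()}


def multitask_loss(task_losses, weights):
    """Weighted sum of per-task losses; tasks missing from `weights` get weight 1.0."""
    return sum(weights.get(task, 1.0) * loss for task, loss in task_losses.items())
```

With the reported throughput, 212 FPS corresponds to about 1000 / 212 ≈ 4.7 ms per 640x640 frame for the whole shared pass; a two-stage pipeline would instead add a recognition pass for every detected face.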

Keywords (English)

★ Image Processing
★ Neural Network
★ Deep Learning
★ Face Detection
★ Face Recognition
★ Multi-task Learning

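Table of Contents

Abstract (Chinese)
Abstract (English)
Acknowledgments
1. Introduction
1.1. Research Background and Motivation
1.2. Thesis Organization
2. Literature Review
2.1. Face Detection
2.2. Face Recognition
2.3. Multi-task Learning
2.4. Face Segmentation Task
3. Network Model Design
3.1. Face Detection Dataset
3.2. Face Recognition Dataset
3.3. Face Segmentation Dataset
3.4. Segmentation Model Selection and Results
3.5. Synthetic Data Generation for the Integrated Network
3.6. Network Design
4. Single-Stage Face Recognition Training Strategy and Process
4.1. Image Preprocessing
4.2. Network Training Parameters
4.3. Training Process
4.4. Network Post-processing
4.5. Training Environment
5. Network Implementation and Discussion of Results
5.1. Face Detection Validation Results
5.2. Face Recognition Validation Results
5.3. System Speed
6. Conclusion
References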


Advisor

蔡宗漢(Tsung-Han Tsai)

Approval Date

2020-7-20
