With the development of technology, artificial intelligence has continued to evolve, from the philosophical ideas about artificial intelligence that emerged in the 1950s to the rise of machine learning in the 1980s. Algorithms such as decision trees, random forests, support vector machines, and neural networks were proposed and repeatedly refined to improve their performance. Over the past decade, the surge of deep learning algorithms, together with acceleration from GPUs and other convolution accelerator hardware, has allowed deep neural networks to achieve significant improvements on a wide range of tasks.

A practical face recognition system, from camera image input to identity output, can be divided into four main tasks: face detection, face alignment, feature extraction, and feature matching. Running every one of these tasks on the original image is time-consuming.
With neural network optimization, face detection and face alignment can be integrated into a single face detection network, which localizes faces using a feature pyramid combined with anchor boxes and aligns them through the network's regression layer. Likewise, feature extraction and feature matching can be integrated into a single face recognition network, which extracts face features with convolutions and matches identities through a fully connected layer followed by a softmax function.

In this paper, we propose a multi-task learning method that combines a feature pyramid with triplet loss to train a single-stage deep neural network for face detection and face recognition. A single backbone network outputs the results of all tasks at once, and sharing the convolutional weights avoids repeating computation across tasks. The network localizes faces with the feature pyramid and anchor boxes, outputs face features trained with triplet loss, and obtains the recognition result by comparing features with a simple mathematical similarity function. Accelerated by an NVIDIA RTX 2080 Ti GPU, the system reaches 212 FPS on 640x640 input images.
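
To illustrate the triplet loss and feature-matching steps described above, the following is a minimal sketch and not the implementation used in this work. The margin of 0.2, the use of cosine similarity as the "simple" matching function, the 0.5 acceptance threshold, and all function names are assumptions made for illustration only.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length so comparisons depend only on direction."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge-style triplet loss: max(0, d(a, p) - d(a, n) + margin).

    The loss is zero once the negative is farther from the anchor than
    the positive by at least `margin` (an assumed value here).
    """
    d_ap = squared_distance(anchor, positive)
    d_an = squared_distance(anchor, negative)
    return max(0.0, d_ap - d_an + margin)


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors, in [-1, 1]."""
    a_n, b_n = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a_n, b_n))


def match_identity(query, gallery, threshold=0.5):
    """Return the gallery name most similar to `query`, or None if the
    best score is below `threshold` (an assumed cut-off)."""
    best_name, best_score = None, -1.0
    for name, embedding in gallery.items():
        score = cosine_similarity(query, embedding)
        if score > best_score:
            best_name, best_score = name, score
    return best_name if best_score >= threshold else None


if __name__ == "__main__":
    # Hypothetical 3-dimensional embeddings; real face features are much larger.
    gallery = {"alice": [1.0, 0.0, 0.0], "bob": [0.0, 1.0, 0.0]}
    print(match_identity([0.9, 0.1, 0.0], gallery))
    print(triplet_loss([1.0, 0.0], [0.0, 1.0], [1.0, 0.0]))
```

Because matching reduces to a similarity computation over stored features, new identities can be enrolled by adding an entry to the gallery without retraining the network.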