Abstract: | In recent years, advances in convolutional neural networks (CNNs) have driven great progress in face detection and face recognition, and many novel network architectures have been proposed to solve different detection and recognition problems. Which architecture to use depends on the application. At customs, for example, recognition only has to be performed on an image that contains a single face; a surveillance or access control system, however, usually has to detect the faces in a large frame first and then recognize each of them. We propose a CNN architecture that combines face detection and face recognition. For detection, we use an RPN structure similar to the one in Faster R-CNN to propose candidate regions that may contain faces, and then use a coarse-to-fine cascaded CNN to check each candidate region and filter out those that are not faces. Replacing the original sliding-window proposal method with the RPN avoids testing every position at every scale one by one. With the RPN, detection takes only 0.08 seconds per 1920x1080 image, compared with 0.18 seconds with the sliding-window method, while detection performance remains nearly the same.
After face detection, we use FaceNet to extract features for recognition. Because of how the loss function is defined, the distance between the feature vectors extracted from two facial images directly reflects the similarity of the two faces. That is, we can recognize faces simply by computing distances between feature vectors, without any additional complex classifier, so the system does not need its network parameters retrained even when the recognition targets change. The proposed method reaches a recognition accuracy of 97%, slightly lower than that of methods that must be retrained. However, considering the convenience of not having to retrain, we believe the benefit clearly outweighs the loss in accuracy. |
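
The distance-based recognition step described above can be illustrated with a minimal sketch. It is not the implementation used in this work: it assumes Euclidean distance between feature vectors, and the names `identify`, `gallery`, and `threshold`, the threshold value, and the toy 3-dimensional vectors are all hypothetical stand-ins for real FaceNet features.

```python
import math


def euclidean_distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def identify(probe, gallery, threshold):
    """Return the label of the closest enrolled vector.

    Returns None when no enrolled vector lies within `threshold`,
    i.e. the face is treated as unknown.
    """
    best_label, best_dist = None, float("inf")
    for label, embedding in gallery.items():
        dist = euclidean_distance(probe, embedding)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label if best_dist <= threshold else None


# Hypothetical enrolled identities; the toy vectors stand in for
# features that a network such as FaceNet would produce.
gallery = {
    "alice": [0.0, 1.0, 0.0],
    "bob": [1.0, 0.0, 0.0],
}

print(identify([0.1, 0.9, 0.0], gallery, threshold=0.5))
print(identify([0.0, 0.0, 1.0], gallery, threshold=0.5))
```

In this sketch, changing the recognition targets only means adding or removing entries in `gallery`; nothing is retrained, which is the property the abstract relies on.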