Abstract
The COVID-19 pandemic, which erupted in late 2019, prompted a widespread shift from manual checks to facial recognition technology at entrances and exits. However, as masks became part of daily life, facial occlusion frequently caused recognition errors, undermining the effectiveness of facial recognition systems. In response to these challenges, this study focuses on facial occlusion in the context of rapid facial recognition, which necessitates model compression. It evaluates the impact of compression strategies, specifically pruning, on accuracy and execution efficiency under these conditions.
Model compression is crucial for running efficiently on constrained devices such as smartphones and embedded systems, where computational power and storage are limited. The three main compression methods are pruning, quantization, and knowledge distillation; this paper focuses specifically on pruning.
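To make the pruning idea concrete, the following is a minimal NumPy sketch of magnitude-based unstructured pruning — zeroing the smallest-magnitude fraction of a weight tensor. This is an illustrative example of the general technique, not the thesis implementation; the function name and parameters are invented for this sketch.

```python
import numpy as np

def magnitude_prune(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero out the smallest-magnitude fraction `sparsity` of the weights."""
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)
    if k == 0:
        return weights.copy()
    # k-th smallest magnitude becomes the removal threshold.
    threshold = np.partition(flat, k - 1)[k - 1]
    mask = np.abs(weights) > threshold
    return weights * mask

rng = np.random.default_rng(0)
w = rng.normal(size=(64, 64))
pruned = magnitude_prune(w, 0.9)
print(np.mean(pruned == 0))  # close to 0.9 of the weights are zeroed
```

In practice the surviving weights are stored in a sparse format (or the zeroed filters are physically removed, in structured pruning) to realize the size and speed gains reported below.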
Using ResNet50, VGG16, and MobileNet models and a VGGFace2 database containing 30,000 images, the study runs tests on 100 random cases, each comprising 15 images, for 1,500 balanced samples in total. The goal is to maintain accuracy above 90% while reducing model size across the three networks. The experiment comprises three main parts:
The first part validates the effectiveness of sequential (iterative) pruning compared with single-shot pruning. Sequential pruning, particularly unstructured pruning, proves more effective at reducing model size without significantly compromising accuracy. An analysis of eigenface feature maps yields pruning and accuracy improvements for ResNet50 and VGG16, both reaching 99% accuracy, while MobileNet loses some weight information and reaches 80%.
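The sequential scheme above can be sketched as a loop that ramps the sparsity up over several rounds, with a fine-tuning step between rounds, instead of removing everything at once. This is a toy NumPy sketch under assumed names (`iterative_prune`, the `finetune` callback), not the thesis code; a real fine-tuning step would retrain the network on data.

```python
import numpy as np

def iterative_prune(weights, target_sparsity, rounds, finetune):
    """Reach the target sparsity gradually: prune a little each round,
    then 'fine-tune' the survivors, rather than pruning in one shot."""
    w = weights.copy()
    for r in range(1, rounds + 1):
        sparsity = target_sparsity * r / rounds            # ramp up per round
        k = int(sparsity * w.size)
        threshold = np.partition(np.abs(w).ravel(), k - 1)[k - 1]
        mask = np.abs(w) > threshold
        w = finetune(w * mask) * mask                      # keep pruned weights at zero
    return w

rng = np.random.default_rng(1)
w0 = rng.normal(size=(32, 32))
# Stand-in "fine-tuning": a small rescale of the surviving weights.
w_final = iterative_prune(w0, target_sparsity=0.8, rounds=4, finetune=lambda w: w * 1.01)
print(round(float(np.mean(w_final == 0)), 2))  # ~0.8 of the weights end up zero
```

Because already-pruned weights have magnitude zero, each round's threshold naturally keeps them pruned while extending the cut to the next-smallest survivors.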
The second part analyzes in detail how each layer's accuracy decline varies with its pruning ratio (sensitivity). Pruning ratios are then set according to sensitivity, yielding accuracies of 93.93% for ResNet50 and 92.19% for VGG16, with model sizes of 414,428 KB and 14,142 KB and inference times of 3.87 ms and 1.9 ms, respectively.
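A per-layer sensitivity scan of this kind can be sketched as follows: for one layer, try a ladder of pruning ratios, measure the accuracy drop against the unpruned baseline, and keep the largest ratio whose drop stays within budget. The function name, the ratio ladder, and the toy `evaluate` proxy (weight energy retained, standing in for a real validation run) are all assumptions of this sketch, not the thesis procedure.

```python
import numpy as np

def pick_ratio_by_sensitivity(layer_weights, evaluate, baseline_acc,
                              ratios=(0.1, 0.3, 0.5, 0.7, 0.9),
                              max_drop=0.05):
    """Return the largest pruning ratio for this layer whose accuracy
    drop (vs. the unpruned baseline) stays within `max_drop`."""
    best = 0.0
    for r in ratios:
        k = int(r * layer_weights.size)
        if k == 0:
            continue
        threshold = np.partition(np.abs(layer_weights).ravel(), k - 1)[k - 1]
        pruned = layer_weights * (np.abs(layer_weights) > threshold)
        if baseline_acc - evaluate(pruned) <= max_drop:
            best = r
    return best

# Deterministic toy layer: weights 1..100, so the energy arithmetic is exact.
w = np.arange(1.0, 101.0).reshape(10, 10)
total_energy = float(np.sum(w * w))
# Toy accuracy proxy: baseline accuracy scaled by the weight energy retained.
evaluate = lambda p: 0.95 * float(np.sum(p * p)) / total_energy
best_ratio = pick_ratio_by_sensitivity(w, evaluate, baseline_acc=0.95)
print(best_ratio)  # 0.3 — pruning half the layer would cost too much "accuracy"
```

Running this scan per layer gives a sensitivity profile; insensitive layers get aggressive ratios, sensitive ones are pruned lightly.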
The third part addresses masked facial recognition, introducing a test dataset of simulated mask images and partial facial features. An attention mechanism is incorporated to further improve accuracy over multiple pruning cycles for VGG16 and ResNet50: accuracy rises from 85% and 84% to 95.13% and 96.81%, and then to 95.37% and 96.89%, respectively. The resulting model sizes are 444,001 KB and 16,225 KB, with inference times of 1.51 ms and 1.28 ms.
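As one concrete form such an attention mechanism can take, the following is a squeeze-and-excitation-style channel attention sketch in NumPy: pool each channel to a descriptor, pass it through a small two-layer gate, and rescale the channels by the resulting weights so informative (unoccluded) features are emphasized. This illustrates the general channel-attention idea only; it is not the specific module used in the thesis, and all names and shapes here are invented.

```python
import numpy as np

def channel_attention(feature_map, w1, w2):
    """Squeeze-and-excitation style gate over a (C, H, W) feature map."""
    squeeze = feature_map.mean(axis=(1, 2))          # (C,) per-channel descriptor
    hidden = np.maximum(0.0, w1 @ squeeze)           # ReLU bottleneck
    gate = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))      # sigmoid weights in (0, 1)
    return feature_map * gate[:, None, None], gate

rng = np.random.default_rng(3)
fmap = rng.normal(size=(8, 4, 4))                    # 8 channels, 4x4 spatial
w1 = rng.normal(size=(2, 8)) * 0.5                   # channel reduction 8 -> 2
w2 = rng.normal(size=(8, 2)) * 0.5                   # expansion 2 -> 8
out, gate = channel_attention(fmap, w1, w2)
print(out.shape, gate.shape)                         # (8, 4, 4) (8,)
```

In a mask scenario, the gate can learn to down-weight channels dominated by the occluded lower face and up-weight those encoding the eyes and forehead.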
In conclusion, this paper validates the effectiveness of multiple pruning cycles over a single cycle without sensitivity settings, improves model efficiency by determining pruning ratios from fixed sensitivity, and introduces an attention mechanism that raises accuracy, especially in facial recognition scenarios with masks. The trained models effectively reduce model size, speed up inference, and significantly improve overall accuracy.
References
1. LeCun, Y., Bottou, L., Bengio, Y., & Haffner, P. (1998). "Gradient-Based Learning Applied to Document Recognition," Proceedings of the IEEE, 86(11), 2278-2324.
2. Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). "ImageNet Classification with Deep Convolutional Neural Networks," Advances in Neural Information Processing Systems, 25, 1097-1105.
3. Simonyan, K., & Zisserman, A. (2014). "Very Deep Convolutional Networks for Large-Scale Image Recognition," arXiv preprint arXiv:1409.1556.
4. He, K., Zhang, X., Ren, S., & Sun, J. (2016). "Deep Residual Learning for Image Recognition," Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 770-778.
5. Howard, A. G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., & Adam, H. (2017). "MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications," arXiv preprint arXiv:1704.04861.
6. LeCun, Y., Denker, J. S., & Solla, S. A. (1990). "Optimal Brain Damage," Advances in Neural Information Processing Systems, 2, 598-605.
7. Hassibi, B., & Stork, D. G. (1993). "Second order derivatives for network pruning: Optimal Brain Surgeon," Advances in Neural Information Processing Systems, 5, 164-171.
8. Han, S., Mao, H., & Dally, W. J. (2016). "Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding," International Conference on Learning Representations (ICLR).
9. Molchanov, P., Tyree, S., Karras, T., Aila, T., & Kautz, J. (2017). "Pruning Convolutional Neural Networks for Resource Efficient Inference," International Conference on Learning Representations (ICLR).
10. He, Y., Zhang, X., & Sun, J. (2017). "Channel Pruning for Accelerating Very Deep Neural Networks," Proceedings of the IEEE International Conference on Computer Vision (ICCV), 1389-1397.
11. Liu, Z., Li, J., Shen, Z., Huang, G., Yan, S., & Zhang, C. (2017). "Learning Efficient Convolutional Networks Through Network Slimming," Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2736-2744.
12. Chin, T.-W., Ding, R., Zhang, C., & Marculescu, D. (2020). "Towards Efficient Model Compression via Learned Global Ranking," Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
13. He, Y., Liu, P., Wang, Z., Hu, Z., & Yang, Y. (2019). "Filter Pruning via Geometric Median for Deep Convolutional Neural Networks Acceleration," Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 4340-4349.
14. 黃慶昀 (2020). "Model Pruning Evaluation for Fine-Grained Image Classification with Convolutional Neural Networks" [Master's thesis, National Taipei University]. National Digital Library of Theses and Dissertations in Taiwan.
15. Li, H., Kadav, A., Durdanovic, I., Samet, H., & Graf, H. P. (2017). "Pruning Filters for Efficient ConvNets," arXiv preprint arXiv:1608.08710.
16. Luo, J.-H., Wu, J., & Lin, W. (2017). "ThiNet: A Filter Level Pruning Method for Deep Neural Network Compression," Proceedings of the IEEE International Conference on Computer Vision (ICCV), 5058-5066.