Abstract (English)
Polyester film is a synthetic plastic widely used in the outer packaging of many products, such as equipment, food, freight, pharmaceuticals, and electronic components. Beyond these diverse applications, product quality must also be considered. During polyester film production, quality is traditionally inspected by human eyes. However, the production line moves continuously, and manual inspection of a moving line involves many uncertain factors, such as eye fatigue, unfamiliar newcomers, and the difficulty of judging defect size on moving film. In this research, we propose a polyester film defect classification system based on a convolutional neural network. The system first trains the original network models and then, after evaluation, improves the convolutional neural network, for example by reducing the model, expanding the model, increasing the number of training runs, and performing error analysis. SinGAN is then used to compensate for defect classes with too few samples, which would otherwise leave the data imbalanced, so that the system achieves better precision, recall, and accuracy on the eight polyester film defect classes: bubble, hole, crystal, mosquito, lack, spray, carbon, and hoard. This system can therefore replace manual visual inspection in product checking and maintain product quality and stability.
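To make the classification task concrete, the following is a minimal PyTorch/torchvision sketch of an eight-class defect classifier; the backbone choice, head replacement, and class ordering here are illustrative assumptions, not the exact model used in this work.

```python
# Minimal sketch of an eight-class defect classifier (illustrative only;
# backbone and head choices are assumptions, not the thesis' exact model).
import torch
import torch.nn as nn
from torchvision import models

DEFECT_CLASSES = ["bubble", "hole", "crystal", "mosquito",
                  "lack", "spray", "carbon", "hoard"]

model = models.densenet121(weights=None)               # DenseNet backbone
model.classifier = nn.Linear(model.classifier.in_features,
                             len(DEFECT_CLASSES))      # 8-way output head
model.eval()

x = torch.randn(1, 3, 224, 224)                        # one 224x224 RGB image
with torch.no_grad():
    logits = model(x)
print(DEFECT_CLASSES[logits.argmax(dim=1).item()])     # predicted defect class
```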
This paper is divided into five parts. The first part trains three original network models, VGG16, GoogLeNet, and DenseNet; the model with the best classification results proceeds to the next step (a sketch of this comparison follows below). The second part reduces the model: constrained variants built on DenseNet are trained in order to understand, in detail, whether low-level features, high-level features, or model depth is the key factor affecting accuracy when a convolutional neural network classifies polyester film defects, and the result determines the optimal direction for modification. The third part expands the model, deepening the convolutional neural network to improve the classification results. The fourth part uses SinGAN to generate defect samples, so that the training model does not overfit when the number of defect samples is too small, and high accuracy is maintained. The fifth part increases the number of training runs and performs error analysis, which increases the stability of the model and checks whether the original defect samples contain manual labeling errors, so as to improve the precision, recall, and accuracy of classification.
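The first-stage comparison can be sketched as below, assuming torchvision backbones; the head-replacement details and the omission of GoogLeNet's auxiliary classifiers are assumptions for illustration, and training and evaluation code is elided.

```python
# Sketch of the first-stage comparison: build the three candidate backbones
# with an eight-class head, train each on the same data, and keep the best.
# Exact training settings are assumptions; only model construction is shown.
import torch.nn as nn
from torchvision import models

def build(name, num_classes=8):
    if name == "vgg16":
        m = models.vgg16(weights=None)
        m.classifier[6] = nn.Linear(m.classifier[6].in_features, num_classes)
    elif name == "googlenet":
        m = models.googlenet(weights=None, aux_logits=False, init_weights=True)
        m.fc = nn.Linear(m.fc.in_features, num_classes)
    elif name == "densenet":
        m = models.densenet121(weights=None)
        m.classifier = nn.Linear(m.classifier.in_features, num_classes)
    return m

candidates = {name: build(name) for name in ("vgg16", "googlenet", "densenet")}
# Train and evaluate each candidate here; the best performer (DenseNet in
# this work) moves on to the reduction and expansion experiments.
```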
In the experiments, we collected images of the defects produced during polyester film production. There are eight types of defects: carbon (581 images), bubble (428 images), mosquito (32 images), spray (234 images), lack (98 images), hole (28 images), hoard (430 images), and crystal (460 images), 2291 images in total, each at a resolution of 224×224. Without SinGAN-generated defect samples, data augmentation yields 6414 training images and 2750 test images, 9164 in total; the accuracy of VGG16 is 88.84%, that of GoogLeNet is 69.93%, and that of DenseNet is 91.38%. When SinGAN is used to generate defect samples, the mosquito class is augmented to 266 images, giving 7070 training images and 3030 test images, 10100 in total. With SinGAN, the accuracy rises to 96.57%, an increase of 5.19 percentage points. The improved DenseNet reaches 100% accuracy, 8.62 percentage points higher than the original DenseNet. The execution speed is 275 frames per second, and the model size is 45.5 MB.
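The reported splits and gains can be checked directly from the figures above; the short snippet below only reproduces that arithmetic (both datasets use roughly a 70/30 train/test split) and introduces no new results.

```python
# Arithmetic check of the reported numbers (no new results):
# both datasets split roughly 70/30 into training and test sets.
print(6414 / (6414 + 2750))   # ~0.70 of 9164 images, without SinGAN
print(7070 / (7070 + 3030))   # 0.70 of 10100 images, with SinGAN

print(96.57 - 91.38)          # +5.19 points from SinGAN augmentation
print(100.00 - 91.38)         # +8.62 points for the improved DenseNet
```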