Abstract (English) |
With the continuous advancement of computer vision, human skeleton detection based on two-dimensional images has become increasingly mature, and a growing number of applications have been built on it. However, when the person in the input image is occluded by a large object, or when an object's color is similar to that of the human body, the estimate of the human skeleton degrades significantly. This thesis therefore proposes an algorithm based on the Generative Adversarial Network (GAN) to reduce these two major sources of interference. The proposed algorithm automatically generates the image regions corresponding to the occluded body parts, so that 2-D skeleton detection is greatly improved.
This thesis takes the home environment as its main application scenario. In this scenario we consider eight postures that are common in daily life, and these eight postures are the targets of the subsequent analysis. In a home environment the body is often occluded by various types of furniture, which results in poor estimation of the human skeleton. This thesis therefore trains a generative adversarial network so that the network can automatically generate the body image needed to fill in the area originally occluded by furniture. With this correction, the accuracy of the skeleton detection algorithm can be further improved.
In generalization comparisons across different people and different backgrounds, the proposed method corrects 80% of the misjudgments made by the original skeleton detection algorithm. These simulation results demonstrate that the proposed algorithm can effectively handle the occlusion problem and provide a stable recovered image, thereby improving the performance of the original 2-D skeleton detection algorithm. |