應用生成對抗網路於人體姿態映射與全身風格轉換之演算法;A Generative Adversarial Network-based Framework for Human Pose Mapping and Full Body Style Transformation

Please use this identifier to cite or link to this item: http://ir.lib.ncu.edu.tw/handle/987654321/89950

Title:	應用生成對抗網路於人體姿態映射與全身風格轉換之演算法;A Generative Adversarial Network-based Framework for Human Pose Mapping and Full Body Style Transformation
Authors:	陳逸星;Chen, Yi-Hsin
Contributors:	Department of Computer Science and Information Engineering
Keywords:	Style Transfer;Deep Learning;Generative Adversarial Network;Image Processing;Computer Vision
Date:	2022-08-15
Issue Date:	2022-10-04 12:05:41 (UTC+8)
Publisher:	National Central University (國立中央大學)
Abstract:	In the past, changing the pose and motion of a person in an image relied on skilled visual effects artists and time-consuming post-production. Traditional methods capture a person's motion with 3D camera arrays and build a 3D animation model fitted to the person's keypoints. Today, learning-based tools such as Generative Adversarial Networks (GANs) and other deep neural networks can help generate these images. To capture fine appearance details such as texture, these methods typically rely on a skeleton, a 3D mesh, a semantic segmentation of body parts, or dense UV coordinates. This thesis presents a GAN-based framework that re-renders a person from a single source image in a specified pose. The framework has two stages: (1) a Pix2pix network translates a keypoint skeleton image into the corresponding UV coordinate image; (2) a StyleGAN-based network takes the person's foreground mask, the UV coordinate image, and the original image as input and generates an image of the person in the target pose. In our experiments, the skeleton-to-UV model achieves an SSIM of 0.932 and the pose and style transfer model achieves an SSIM of 0.7524, which suggests that the proposed framework is usable to a certain degree.
Appears in Collections:	[Graduate Institute of Computer Science and Information Engineering] Electronic Thesis & Dissertation
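
Note on the SSIM figures in the abstract: the record does not say how SSIM was computed, so the following is only an illustrative sketch of the metric, not the thesis's evaluation code. It assumes 8-bit grayscale images and uses a simplified single-window variant in which the statistics are taken over the whole image; standard SSIM averages the same expression over local sliding windows. The names `global_ssim`, `flatten`, `target`, and `generated` are hypothetical.

```python
def flatten(image):
    """Turn a 2D list of pixel rows into a flat list of intensities."""
    return [pixel for row in image for pixel in row]


def global_ssim(x, y, data_range=255.0):
    """Single-window SSIM over two equal-length sequences of intensities.

    Assumption: statistics are computed over the whole image rather than
    over local windows, so values will differ from windowed SSIM.
    """
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("inputs must be non-empty and the same length")
    n = len(x)
    # Stabilising constants with the conventional K1 = 0.01 and K2 = 0.03.
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    mu_x = sum(x) / n
    mu_y = sum(y) / n
    var_x = sum((a - mu_x) * (a - mu_x) for a in x) / n
    var_y = sum((b - mu_y) * (b - mu_y) for b in y) / n
    cov = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov + c2)
    denominator = (mu_x * mu_x + mu_y * mu_y + c1) * (var_x + var_y + c2)
    return numerator / denominator


# Toy 2x3 "images" standing in for a ground-truth frame and a generated frame.
target = [[10, 50, 90], [130, 170, 210]]
generated = [[12, 48, 95], [128, 175, 205]]
print(global_ssim(flatten(target), flatten(generated)))
```

A score of 1 means the two images are identical under this measure, and lower scores indicate larger differences in luminance, contrast, or structure. For real evaluations a windowed implementation from an image-processing library would normally be used instead of this sketch.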

Files in This Item:

File	Description	Size	Format
index.html		0Kb	HTML	51	View/Open
