Multi-Scale Feature Fusion on Pose Transfer for Automatic Person Image Generation

Name	陳暐恩 (Wei-En Chen)	Department	Computer Science and Information Engineering
Thesis title	Multi-Scale Feature Fusion on Pose Transfer for Automatic Person Image Generation (多尺度特徵融合之姿勢遷移用於自動人像生成)
Full text	Available in the thesis system after 2026-07-27
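Abstract (Chinese, translated)	Human pose transfer has been applied in many fields, such as data augmentation for person re-identification, action recognition, video synthesis, and video editing. However, it remains a challenging problem: the model must be able to generate a new image while keeping the same body shape and clothing as the source image, especially when the pose of the person in the source image differs completely from the target pose. Sampling in the latent space to generate or edit entirely new images is currently one of the most popular applications of artificial intelligence. This thesis develops a pose transfer system that lets a machine generate, from a person image and a target pose, an image that matches the target pose. It proposes a multi-scale feature fusion pose transfer network architecture that fuses feature maps of different scales to enrich feature information, and progressively compensates for the information lost when the person image is transferred. The Market-1501 dataset is used for training and testing. Compared with previous work, our network shows superior performance in objective quantitative scores. It effectively reduces the influence of the background on pose transfer and generates more detailed images. In future research, we hope to reduce the effect of occluding objects on the overall image, and the details of the clothing also need further optimization.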
Abstract (English)	Human pose transfer has been applied in many fields, such as data augmentation for person re-identification, action recognition, video synthesis, and video editing. However, it remains a challenging problem: the model must generate a new image while preserving the body shape and clothing of the source image, especially when the source and target poses differ substantially. Sampling in the latent space to create new images or edit existing ones is currently one of the most popular applications of AI. This thesis develops a pose transfer system that automatically generates a person image matching a given target pose. It proposes a multi-scale feature fusion pose transfer architecture that fuses feature maps of different scales to enrich the feature information and uses a progressive scheme to compensate for the information lost during transfer. The network is trained and tested on the Market-1501 dataset. Compared with previous work, it achieves superior objective quantitative scores, effectively reduces the influence of the background on pose transfer, and generates more detailed facial images. Future work will aim to reduce the effect of occlusions on the overall image and to further refine clothing details.
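The abstract says the network fuses feature maps of different scales to enrich feature information. This record does not describe the thesis's actual fusion block (MPATN), so the following is only a generic NumPy sketch of multi-scale fusion under stated assumptions: a single (C, H, W) feature map, hypothetical scale factors 1, 2 and 4 implemented with average pooling, nearest-neighbour upsampling back to full resolution, channel concatenation, and a 1x1 projection whose random weights stand in for learned ones. The function names are illustrative, not taken from the thesis.

```python
import numpy as np


def avg_pool(x: np.ndarray, k: int) -> np.ndarray:
    """Average-pool a (C, H, W) map with a k x k window and stride k."""
    c, h, w = x.shape
    assert h % k == 0 and w % k == 0, "H and W must be divisible by k"
    # Split each spatial axis into (blocks, k) and average inside every block.
    return x.reshape(c, h // k, k, w // k, k).mean(axis=(2, 4))


def upsample_nearest(x: np.ndarray, k: int) -> np.ndarray:
    """Nearest-neighbour upsampling of a (C, H, W) map by an integer factor k."""
    return x.repeat(k, axis=1).repeat(k, axis=2)


def multi_scale_fuse(x: np.ndarray, scales=(1, 2, 4)) -> np.ndarray:
    """Hypothetical fusion: summarise the map at several scales, resize each
    branch back to H x W, and stack the branches along the channel axis so
    every position carries both fine and coarse context."""
    branches = [upsample_nearest(avg_pool(x, s), s) for s in scales]
    return np.concatenate(branches, axis=0)


def project_1x1(x: np.ndarray, weight: np.ndarray) -> np.ndarray:
    """Mix channels like a bias-free 1x1 convolution; weight is (C_out, C_in)."""
    return np.einsum("oc,chw->ohw", weight, x)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feat = rng.standard_normal((8, 16, 16))   # assumed 8-channel, 16x16 feature map
    fused = multi_scale_fuse(feat)            # 3 scales x 8 channels = 24 channels
    weight = rng.standard_normal((8, 24))     # stand-in for learned 1x1 weights
    out = project_1x1(fused, weight)
    print(fused.shape, out.shape)
```

Concatenating the branches and then projecting with a 1x1 convolution is one common way to merge scales; element-wise summation is another. Which option, and which scales, the thesis actually uses is not stated in this record.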
Keywords	★ pose transfer ★ generative adversarial network ★ OpenPose ★ multi-scale modeling
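Table of contents
Abstract (Chinese)
Abstract (English)
Contents
List of Figures
List of Tables
Chapter 1 Introduction
  1.1 Research Motivation
  1.2 Research Objectives
  1.3 Thesis Organization
Chapter 2 Related Work
  2.1 Generative Network Models
    2.1.1 AutoEncoder
    2.1.2 U-net
    2.1.3 Generative Adversarial Networks
    2.1.4 Image Translation
  2.2 ResNet
  2.3 Attention Mechanism
  2.4 Multiscale Model
    2.4.1 Multi-Scale Feature Fusion Networks
Chapter 3 Research Method and System Architecture
  3.1 Proposed System Overview
  3.2 Symbol Definitions
  3.3 Generator
    3.3.1 Encoder
    3.3.2 MPATN
    3.3.3 Decoder
  3.4 Discriminator
  3.5 Loss Function
  3.6 Training Details
Chapter 4 Experimental Results
  4.1 Dataset
  4.2 Development Tools and Environment
  4.3 Evaluation Metrics
    4.3.1 Inception Score
    4.3.2 SSIM
  4.4 Method Comparison
    4.4.1 Comparison with State-of-the-Art Methods
    4.4.2 Results of Fusing Features at Different Scales
    4.4.3 Results with Different Numbers of MPATBs
  4.5 Failure Cases
Chapter 5 Conclusion
References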
Advisors	范國清 (Kuo-Chin Fan), 高巧汶 (Chiao-Wen Kao)	Approval date	2021-08-02

Theses and dissertations: record 108522069 (details)