Thesis 107522091: Detailed Record
Author: 陳思頴 (Sih-Ying Chen)    Department: Computer Science and Information Engineering
Title: Multi-Scale Region Reinforcement on Pose Transfer for Automatic Person Image Generation
(多尺度區域強化之姿態遷移用於自動人像生成)
Related theses:
★ 影片指定對象臉部置換系統
★ 以單一攝影機實現單指虛擬鍵盤之功能
★ 基於視覺的手寫軌跡注音符號組合辨識系統
★ 利用動態貝氏網路在空照影像中進行車輛偵測
★ 以視訊為基礎之手寫簽名認證
★ 使用膚色與陰影機率高斯混合模型之移動膚色區域偵測
★ 影像中賦予信任等級的群眾切割
★ 航空監控影像之區域切割與分類
★ 在群體人數估計應用中使用不同特徵與回歸方法之分析比較
★ 以視覺為基礎之強韌多指尖偵測與人機介面應用
★ 在夜間受雨滴汙染鏡頭所拍攝的影片下之車流量估計
★ 影像特徵點匹配應用於景點影像檢索
★ 自動感興趣區域切割及遠距交通影像中的軌跡分析
★ 基於回歸模型與利用全天空影像特徵和歷史資訊之短期日射量預測
★ Analysis of the Performance of Different Classifiers for Cloud Detection Application
★ 全天空影像之雲追蹤與太陽遮蔽預測
Full text: viewable in the system after 2025-07-14.
Abstract (Chinese): With the vigorous development of artificial intelligence and deep learning, these techniques have been widely applied in many fields and have made significant contributions to areas such as semantic analysis and image recognition. The goal of artificial intelligence today is no longer only to make computers intelligent, but also to make them creative, for example writing poems, composing music, or generating images, creating something from nothing. This thesis presents a pose-transfer system that, given a person image and a target pose, lets the computer automatically generate a person image matching the target pose.
This thesis adopts a progressive pose-transfer generative model architecture, which transforms the pose of the person image to the target pose step by step. During the transformation, we propose a Multi-Scale Region Extractor that captures feature maps of specific regions of the person image, which alleviates the information loss of the autoencoder and also reduces the chance of broken limbs during pose transfer. For the Multi-Scale Region Extractor, we further design a Region Style Loss to optimize the training of the generative model. Finally, under this architecture, a single person image is enough to generate videos of different dance styles according to the user's preference.
Abstract (English): With the vigorous development of artificial intelligence and deep learning, these techniques have been widely used in many fields and have made significant contributions to areas such as semantic analysis and image recognition. The goal of artificial intelligence is no longer only to give computers intelligence, but also to make them creative, for example writing poems, composing music, or generating images, creating something out of nothing. This thesis proposes DanceGAN, which makes a computer automatically generate person images that match a target pose.
In this thesis, we use a progressive pose-transfer generative model architecture, which transforms the pose of a person image to the target pose in a step-by-step manner. In the transfer process, we propose the Multi-Scale Region Extractor, which captures features of specific regions of the person image to alleviate the information loss of the autoencoder. We also design the Region Style Loss for the Multi-Scale Region Extractor to improve the training of the generative model. Finally, based on this architecture, videos of different dancing styles can be generated from only one person image, according to the user's preference.
Keywords (Chinese):
★ 生成對抗網路 (Generative Adversarial Network)
★ 姿態轉換 (Pose Transfer)
★ OpenPose
Keywords (English):
★ Generative Adversarial Network
★ Pose Transfer
★ OpenPose
Table of Contents:
Abstract (Chinese) V
Abstract (English) VI
Acknowledgments VII
Table of Contents VIII
List of Figures X
List of Tables XI
Chapter 1 Introduction 1
1.1 Motivation 1
1.2 Related Work 2
1.3 System Architecture 5
1.4 Thesis Organization 6
Chapter 2 Literature Review 7
2.1 DeepFashion Dataset 7
2.2 Feature Extractor of the VGG-19 Network 8
2.3 Semantic Image Segmentation 9
2.4 Generative Network Models 11
2.4.1 AutoEncoder 12
2.4.2 Generative Adversarial Network 13
Chapter 3 Methodology and System 16
3.1 Dataset 17
3.2 Transfer-Based Generative Model 18
3.2.1 Encoder & Decoder 19
3.2.1.1 Multiple Scale Region Extractor 19
3.2.1.2 Learnable Region Normalization 21
3.2.2 Pose-Attentional Transfer Network 25
3.3 Discriminator 27
3.4 Loss Functions 27
Chapter 4 Experimental Results 30
4.1 Hardware Environment 30
4.2 Dataset 30
4.3 Evaluation Metrics 31
4.3.1 Inception Score 31
4.3.2 SSIM (Structural Similarity) 32
4.4 Method Comparison 33
4.4.1 Pose-Transfer GAN vs. DanceGAN 33
4.4.2 Different Region Extractions for MSRE 37
4.4.3 Effect of Different Normalizations 44
4.5 Speed Evaluation 47
Chapter 5 Conclusions and Future Work 48
References 49
Advisor: 鄭旭詠    Date Approved: 2020-07-20