Master's Thesis 110522054: Detailed Record




Name: Shih-Lun Lo (羅士倫)    Department: Computer Science and Information Engineering
Thesis Title: 人臉動畫化:臉部關鍵點辨識與感知色彩距離之特徵加權循環生成對抗網路
(Face Animation: Feature Weighted CycleGAN With Facial Landmark Recognition and Perceptual Color Distance)
Related Theses:
★ Face Replacement System for Designated Targets in Video
★ A Single-Finger Virtual Keyboard Using a Single Camera
★ Vision-Based Recognition of Handwritten Zhuyin Symbol Combinations from Writing Trajectories
★ Vehicle Detection in Aerial Images Using Dynamic Bayesian Networks
★ Video-Based Handwritten Signature Verification
★ Moving Skin-Color Region Detection Using Gaussian Mixture Models of Skin Color and Shadow Probability
★ Crowd Segmentation with Confidence Levels in Images
★ Region Segmentation and Classification of Aerial Surveillance Images
★ Comparative Analysis of Different Features and Regression Methods for Crowd Counting Applications
★ Vision-Based Robust Multi-Fingertip Detection and Its Human-Computer Interface Applications
★ Traffic Flow Estimation from Nighttime Video Captured Through Raindrop-Contaminated Lenses
★ Image Feature Point Matching for Landmark Image Retrieval
★ Automatic Region-of-Interest Segmentation and Trajectory Analysis in Long-Range Traffic Images
★ Short-Term Solar Irradiance Forecasting Based on Regression Models Using All-Sky Image Features and Historical Information
★ Analysis of the Performance of Different Classifiers for Cloud Detection Application
★ Cloud Tracking and Solar Occlusion Prediction from All-Sky Images
Files: full text viewable in the repository system after 2028-07-19.
Abstract (Chinese): With the recent boom of Japanese animation, VTubers, an emerging industry in which performers interact with audiences through virtual animated characters, hold great commercial potential. However, the process of modeling a character is highly complex, and this thesis addresses it by performing face style transfer with generative adversarial networks. Face image style transfer is a difficult task in computer vision: anime faces and real human faces differ markedly in structure, so converting between the two while preserving their shared features is very challenging.
Architecturally, we build on the U-GAT-IT model, modifying its normalization method to capture more feature information, and we propose a Facial Landmark Loss that measures the positional error of facial features between the generated image and the real face, helping the model learn more accurate feature positions. To address U-GAT-IT's inherent color deviation, we use the differentiable CIEDE2000 color-difference formula as a loss function, yielding images that better match human color perception (see the sketch below).
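The full CIEDE2000 formula [18] is lengthy, so as an illustrative sketch only (not the thesis's implementation), the snippet below computes a differentiable per-pixel color difference in CIELAB space using the simpler CIE76 Euclidean distance, which CIEDE2000 refines; all function names here are hypothetical.

    import torch

    def rgb_to_lab(rgb: torch.Tensor) -> torch.Tensor:
        """Differentiable sRGB -> CIELAB conversion for (N, 3, H, W) tensors in [0, 1]."""
        # sRGB -> linear RGB
        lin = torch.where(rgb <= 0.04045, rgb / 12.92, ((rgb + 0.055) / 1.055) ** 2.4)
        r, g, b = lin[:, 0], lin[:, 1], lin[:, 2]
        # linear RGB -> XYZ (D65 white point), normalized by the reference white
        x = (0.4124 * r + 0.3576 * g + 0.1805 * b) / 0.95047
        y = 0.2126 * r + 0.7152 * g + 0.0722 * b
        z = (0.0193 * r + 0.1192 * g + 0.9505 * b) / 1.08883
        xyz = torch.stack((x, y, z), dim=1).clamp(min=1e-8)  # avoid zero-gradient corner cases
        # XYZ -> Lab
        d = 6.0 / 29.0
        f = torch.where(xyz > d ** 3, xyz ** (1.0 / 3.0), xyz / (3.0 * d ** 2) + 4.0 / 29.0)
        L = 116.0 * f[:, 1] - 16.0
        a = 500.0 * (f[:, 0] - f[:, 1])
        b2 = 200.0 * (f[:, 1] - f[:, 2])
        return torch.stack((L, a, b2), dim=1)

    def perceptual_color_loss(fake: torch.Tensor, real: torch.Tensor) -> torch.Tensor:
        """Mean per-pixel CIE76 color difference (Euclidean distance in Lab space)."""
        diff = rgb_to_lab(fake) - rgb_to_lab(real)
        return diff.pow(2).sum(dim=1).add(1e-12).sqrt().mean()

Because every step is a smooth tensor operation, the loss backpropagates into the generator, which is the property the thesis relies on when using a color-difference formula as a training loss.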
For evaluation, since no reasonable metric currently exists for assessing how realistic a generated anime character is, we propose the Fréchet Anime Inception Distance, which measures the distance in a high-dimensional space between the distributions of generated and real anime images, thereby gauging the quality of the generated anime images.
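The description matches the standard Fréchet Inception Distance formulation [15]; assuming FAID keeps that form and only swaps in anime-domain features, the distance between the two feature distributions, each modeled as a Gaussian, can be written as

    \mathrm{FAID} = \lVert \mu_r - \mu_g \rVert_2^2
      + \mathrm{Tr}\!\left( \Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2} \right)

where (\mu_r, \Sigma_r) and (\mu_g, \Sigma_g) are the mean and covariance of the feature embeddings of real and generated anime images, respectively.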
Finally, according to the experimental results and user questionnaire feedback, our proposed method performs better on multiple metrics.
Abstract (English): With the boom of Japanese animation in recent years, the emerging industry of VTubers, who interact with audiences through virtual animated characters, has great commercial potential. However, the process of creating a character model is complicated. Given the significant structural differences between human faces and anime faces, image style transfer is a difficult task in the field of computer vision. In this paper, we solve the face style transfer problem with a generative adversarial network.
Our model is based on U-GAT-IT, with the normalization function modified to obtain more feature information. To make the facial feature positions of the anime face match those of the human face, we propose a Facial Landmark Loss that measures the landmark error between the generated image and the real human face image. Because U-GAT-IT's generated images show obvious color deviation, we introduce a Perceptual Color Loss into the loss function.
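A minimal sketch of such a landmark term, assuming a hypothetical frozen landmark detector landmark_net that maps an image batch (N, 3, H, W) to K landmark coordinates of shape (N, K, 2) (Chapter 2 lists Convolutional Pose Machines as the relevant detector); this is not the thesis's exact implementation:

    import torch
    import torch.nn.functional as F

    def facial_landmark_loss(generated: torch.Tensor,
                             source: torch.Tensor,
                             landmark_net: torch.nn.Module) -> torch.Tensor:
        """Penalize displacement between corresponding facial landmarks.

        The detector stays fixed: only the generator receives gradients,
        pulling the generated face's landmarks toward the source face's.
        """
        lm_fake = landmark_net(generated)      # (N, K, 2), differentiable w.r.t. the generator
        with torch.no_grad():
            lm_real = landmark_net(source)     # target landmark positions, no gradient
        return F.mse_loss(lm_fake, lm_real)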
Since there is no reasonable metric to evaluate the realism of animated images, we propose the Fréchet Anime Inception Distance, which measures the distance between the distributions of generated and real anime images in a high-dimensional space, so as to assess the quality of the generated anime images.
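A minimal sketch of the Fréchet distance computation behind such a metric, assuming feats_real and feats_fake are feature matrices (one row per image) extracted by an Inception-style network from real and generated anime images; the choice of feature extractor, which this sketch leaves abstract, is what would distinguish FAID from the standard FID [15]:

    import numpy as np
    from scipy import linalg

    def frechet_distance(feats_real: np.ndarray, feats_fake: np.ndarray) -> float:
        """Fréchet distance between Gaussians fitted to two (N, D) feature sets."""
        mu_r, mu_f = feats_real.mean(axis=0), feats_fake.mean(axis=0)
        cov_r = np.cov(feats_real, rowvar=False)
        cov_f = np.cov(feats_fake, rowvar=False)
        # Matrix square root of the covariance product; trim numerical imaginary parts.
        covmean = linalg.sqrtm(cov_r @ cov_f)
        if np.iscomplexobj(covmean):
            covmean = covmean.real
        diff = mu_r - mu_f
        return float(diff @ diff + np.trace(cov_r + cov_f - 2.0 * covmean))

Lower values mean the generated distribution sits closer to the real anime distribution in feature space.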
According to the experimental results and user feedback, our proposed method achieves better performance on many metrics.
Keywords (Chinese):
★ 生成對抗網路 (Generative Adversarial Network)
★ 動畫人臉風格轉換 (Anime Face Style Transfer)
Keywords (English):
★ Generative Adversarial Network
★ Anime Face Style Transfer
Table of Contents:
Abstract (Chinese)
Abstract (English)
Acknowledgements
Table of Contents
List of Figures
List of Tables
Chapter 1: Introduction
1.1 Research Background and Motivation
1.2 Thesis Organization
Chapter 2: Related Work
2.1 Datasets
2.1.1 Selfie2Anime
2.1.2 Annotated Facial Landmarks in the Wild (AFLW)
2.2 Generative Adversarial Networks
2.3 CycleGAN
2.4 Class Activation Map
2.5 Convolutional Pose Machines
2.6 CIEDE2000
2.7 PatchGAN Discriminator
Chapter 3: Method
3.1 Datasets
3.2 Model Architecture
3.2.1 Generator Architecture
3.2.1.1 Adaptive Point-wise Layer Instance Normalization (AdaPoLIN)
3.2.1.2 Architecture
3.2.2 Discriminator Architecture
3.3 Loss Functions
3.3.1 Facial Landmark Loss
3.3.2 Perceptual Color Loss
3.3.3 Adversarial Loss
3.3.4 Cycle Loss
3.3.5 Identity Loss
3.3.6 CAM Loss
3.3.7 Total Loss
Chapter 4: Experimental Results
4.1 Hardware and Environment Setup
4.2 Datasets
4.3 Evaluation Metrics
4.3.1 Fréchet Anime Inception Distance (FAID)
4.3.2 Perceptual Color Distance
4.3.3 Facial Landmark Distance
4.3.4 Issues with Similarity Evaluation Criteria for Face Style Transfer
4.3.5 User Questionnaire
4.4 Comparative Experimental Results of the Full Model
4.4.1 Qualitative Results
4.4.2 Quantitative Results
4.4.3 Analysis of Experimental Results
4.5 Ablation Experiments
4.5.1 Effect of Adding Perceptual Color Loss
4.5.2 Effect of Adding Adaptive Point-wise Layer Instance Normalization (AdaPoLIN)
4.5.3 Effect of Adding Facial Landmark Loss
Chapter 5: Conclusion and Future Work
References
References:
[1] T. Nakajo, "Live2D," Live2D Inc., 2006. [Online]. Available: https://www.live2d.com/en/.
[2] "Animaze by FaceRig | Custom Avatars | Create Your Own Avatar," Holotech Studios, Inc. [Online]. Available: https://www.animaze.us/.
[3] Holger Winnemöller, Sven C. Olsen, Bruce Gooch, "Real-time video abstraction," ACM Transactions on Graphics, vol. 25, pp. 1221-1226, 2006.
[4] Alexei A. Efros, Thomas K. Leung, "Texture Synthesis by Non-Parametric Sampling," in Proceedings of the International Conference on Computer Vision (ICCV), 1999.
[5] Aaron Hertzmann, Charles E. Jacobs, Nuria Oliver, Brian Curless, David H. Salesin, "Image Analogies," in Proceedings of the 28th Annual Conference on Computer Graphics and Interactive Techniques (SIGGRAPH), 2001.
[6] Leon A. Gatys, Alexander S. Ecker, Matthias Bethge, "Image Style Transfer Using Convolutional Neural Networks," in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
[7] Justin Johnson, Alexandre Alahi, Li Fei-Fei, "Perceptual Losses for Real-Time Style Transfer and Super-Resolution," in European Conference on Computer Vision (ECCV), 2016.
[8] Phillip Isola, Jun-Yan Zhu, Tinghui Zhou, Alexei A. Efros, "Image-to-Image Translation with Conditional Adversarial Networks," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
[9] Jun-Yan Zhu, Taesung Park, Phillip Isola, Alexei A. Efros, "Unpaired Image-to-Image Translation Using Cycle-Consistent Adversarial Networks," in IEEE International Conference on Computer Vision (ICCV), 2017.
[10] Junho Kim, Minjae Kim, Hyeonwoo Kang, Kwang Hee Lee, "U-GAT-IT: Unsupervised Generative Attentional Networks with Adaptive Layer-Instance Normalization for Image-to-Image Translation," in International Conference on Learning Representations (ICLR), 2020.
[11] Bolei Zhou, Aditya Khosla, Agata Lapedriza, Aude Oliva, Antonio Torralba, "Learning Deep Features for Discriminative Localization," in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
[12] Bing Li, Yuanlue Zhu, Yitong Wang, Chia-Wen Lin, Bernard Ghanem, Linlin Shen, "AniGAN: Style-Guided Generative Adversarial Networks for Unsupervised Anime Face Generation," IEEE Transactions on Multimedia, pp. 4077-4091, 2021.
[13] Shih-En Wei, Varun Ramakrishna, Takeo Kanade, Yaser Sheikh, "Convolutional Pose Machines," in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
[14] Zhengyu Zhao, Zhuoran Liu, Martha Larson, "Towards Large yet Imperceptible Adversarial Image Perturbations with Perceptual Color Distance," in IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020.
[15] Martin Heusel, Hubert Ramsauer, Thomas Unterthiner, Bernhard Nessler, Sepp Hochreiter, "GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium," in Advances in Neural Information Processing Systems 30, 2017.
[16] Martin Koestinger, Paul Wohlhart, Peter M. Roth, Horst Bischof, "Annotated Facial Landmarks in the Wild: A Large-scale, Real-world Database for Facial Landmark Localization," in Proc. First IEEE International Workshop on Benchmarking Facial Image Analysis Technologies, 2011.
[17] Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, Yoshua Bengio, "Generative Adversarial Nets," in Advances in Neural Information Processing Systems 27, 2014.
[18] G. Sharma, W. Wu, E. N. Dalal, "The CIEDE2000 color-difference formula: Implementation notes, supplementary test data, and mathematical observations," Color Research & Application, pp. 21-30, 2005.
[19] Phillip Isola, Jun-Yan Zhu, Tinghui Zhou, Alexei A. Efros, "Image-to-Image Translation with Conditional Adversarial Networks," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
[20] Min Lin, Qiang Chen, Shuicheng Yan, "Network In Network," in International Conference on Learning Representations (ICLR), 2014.
[21] Jonathan Long, Evan Shelhamer, Trevor Darrell, "Fully Convolutional Networks for Semantic Segmentation," in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015.
[22] H. Martin, "Which color differencing equation should be used," International Circular of Graphic Education and Research, no. 6, pp. 20-33, 2013.
[23] Sergey Ioffe, Christian Szegedy, "Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift," in Proceedings of the 32nd International Conference on Machine Learning (ICML), 2015.
[24] Dmitry Ulyanov, Andrea Vedaldi, Victor Lempitsky, "Instance Normalization: The Missing Ingredient for Fast Stylization," arXiv preprint arXiv:1607.08022, 2016.
[25] Jimmy Lei Ba, Jamie Ryan Kiros, Geoffrey E. Hinton, "Layer Normalization," arXiv preprint arXiv:1607.06450, 2016.
[26] nicehuster, "cpm-facial-landmarks," GitHub. [Online]. Available: https://github.com/nicehuster/cpm-facial-landmarks.
[27] kanosawa, "anime_face_landmark_detection," GitHub. [Online]. Available: https://github.com/kanosawa/anime_face_landmark_detection.
[28] Karen Simonyan, Andrew Zisserman, "Very Deep Convolutional Networks for Large-Scale Image Recognition," in International Conference on Learning Representations (ICLR), 2015.
[29] Xudong Mao, Qing Li, Haoran Xie, Raymond Y.K. Lau, Zhen Wang, Stephen Paul Smolley, "Least Squares Generative Adversarial Networks," in Proceedings of the IEEE International Conference on Computer Vision (ICCV), pp. 2794-2802, 2017.
[30] Gwern Branwen, Arfafax, Shawn Presser, Anonymous, Danbooru Community, "Anime Crop Datasets: Faces, Figures, & Hands," 2020. [Online]. Available: https://gwern.net/crop#danbooru2019-figures.
[31] Gregory Griffin, Alex Holub, Pietro Perona, "Caltech-256 Object Category Dataset," California Institute of Technology, 2007. [Online]. Available: https://resolver.caltech.edu/CaltechAUTHORS:CNS-TR-2007-001.
[32] Tim Salimans, Ian Goodfellow, Wojciech Zaremba, Vicki Cheung, Alec Radford, Xi Chen, "Improved Techniques for Training GANs," in Proceedings of the 30th International Conference on Neural Information Processing Systems, 2016.
[33] Yihao Zhao, Ruihai Wu, Hao Dong, "Unpaired Image-to-Image Translation Using Adversarial Consistency Loss," in European Conference on Computer Vision (ECCV), 2020.
[34] Ori Nizan, Ayellet Tal, "Breaking the Cycle - Colleagues Are All You Need," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020.
[35] Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael Bernstein, Alexander C. Berg, Li Fei-Fei, "ImageNet Large Scale Visual Recognition Challenge," International Journal of Computer Vision, pp. 211-252, 2015.
[36] Patryk Chrabaszcz, Ilya Loshchilov, Frank Hutter, "A Downsampled Variant of ImageNet as an Alternative to the CIFAR Datasets," arXiv preprint arXiv:1707.08819, 2017.
[37] Xun Huang, Serge Belongie, "Arbitrary Style Transfer in Real-time with Adaptive Instance Normalization," in IEEE International Conference on Computer Vision (ICCV), 2017.
Advisor: Hsu-Yung Cheng (鄭旭詠)    Date of Approval: 2023-07-20
