Master's and Doctoral Thesis 107522092: Detailed Record
Author: Shun-Cheng Yang (楊舜丞)    Department: Computer Science and Information Engineering
Thesis Title: Inpainting image with image to image translation based on contextual attention and semantic segmentation map (基於內容感知與語義分割圖的圖像轉換用於修復圖像)
Related Theses:
★ Unsupervised Domain Adaptation via Entropy Minimization for Image Semantic Segmentation
Files: full text available in the online system after 2025-07-29
Abstract: In recent years, with the rise of deep learning, deep neural networks have demonstrated excellent performance in image recognition, recommendation systems, visual applications, natural language processing, and autonomous driving. At the intersection of pattern recognition and visual applications, image editing and image generation have both flourished, and image inpainting is an application that combines the two: repairing or reconstructing the missing region of an image according to its background or surrounding content. Beyond restoring old photographs, inpainting is also used for blink correction, identity obfuscation, and filling in plausible content after unwanted or unneeded objects have been removed from an image. Taking object removal followed by repair as an example, the task can be divided into three stages: first, the distribution and composition of the objects in the image are decomposed; second, the specified object is deleted, the so-called editing stage; and finally, content is generated for the missing, i.e. deleted, region.
This thesis proposes an image editing system. From the coordinates of a mouse click, the system obtains the pixel values of the selected object in the label map and the instance map, and thus the region the object covers in the image; this region serves as the deletion range. In the generation stage, DeepFill is used as the base network: its contextual attention module extracts patches from the neighboring background to estimate the color values of the region to be repaired. If these estimated colors are fed directly into the image-to-image translation of the pix2pixHD architecture, however, the output tends to contain broken objects, or even no recognizable object at all, only noise. To solve this problem, we first run a color-mapping procedure on the estimated color values and integrate the instance segmentation annotations, so that the generated content is more complete and reasonable.
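The pipeline described in the abstract can be sketched in a few lines of Python. This is a minimal illustration, not the thesis's released code: the names contextual_attention_inpaint, pix2pixhd_generate, cityscapes_palette, click_x, and click_y are hypothetical stand-ins for the DeepFill-based inpainting network, the pix2pixHD generator, the Cityscapes label palette, and the mouse coordinates.

import numpy as np

def mask_from_click(instance_map: np.ndarray, x: int, y: int) -> np.ndarray:
    """Binary mask of the clicked object: every pixel sharing the
    instance id found at the mouse coordinate (x, y)."""
    instance_id = instance_map[y, x]          # row = y, column = x
    return (instance_map == instance_id).astype(np.uint8)

def snap_to_label_palette(inpainted: np.ndarray, palette: np.ndarray) -> np.ndarray:
    """Color-mapping step: replace each estimated RGB value with the
    nearest color in the semantic-label palette, so the result is a
    valid label map instead of free-form inpainted colors."""
    h, w, _ = inpainted.shape
    flat = inpainted.reshape(-1, 3).astype(np.int32)
    # Squared distance from every pixel to every palette color.
    dist = ((flat[:, None, :] - palette[None, :, :]) ** 2).sum(axis=2)
    nearest = dist.argmin(axis=1)             # index of the closest label color
    return palette[nearest].reshape(h, w, 3).astype(np.uint8)

# Hypothetical usage: delete the clicked object, inpaint the hole with a
# contextual-attention network, then clean the label map for pix2pixHD.
# mask = mask_from_click(instance_map, click_x, click_y)
# estimated = contextual_attention_inpaint(label_rgb, mask)   # e.g. DeepFill
# clean_labels = snap_to_label_palette(estimated, cityscapes_palette)
# output = pix2pixhd_generate(clean_labels, instance_map)

Snapping each estimated color to its nearest palette entry turns the noisy inpainted RGB output back into a valid semantic label map; this is the step the abstract credits with preventing broken objects in the translated result.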
Keywords ★ convolutional neural network
★ image-to-image translation
★ image inpainting
Table of Contents
Abstract (Chinese)
Abstract (English)
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
1.1 Research Background and Motivation
1.2 Research Objectives
1.3 Thesis Organization
Chapter 2 Related Work
2.1 Convolutional Neural Networks
2.1.1 ResNet (Residual Network)
2.1.2 ResNeXt
2.2 Generative Adversarial Networks
2.3 Image-to-Image Translation
2.3.1 pix2pix
2.3.2 pix2pixHD
2.4 Image Inpainting
Chapter 3 Methodology
3.1 System Architecture
3.2 Data Input
3.2.1 Cityscapes
3.2.2 Semantic Segmentation Map Annotations
3.3 Contextual Attention Inpainting Network
3.3.1 Inpainting
3.3.2 Label Mapping
3.4 pix2pixHD-MSF Model
3.4.1 Multi-Scale Feature Extraction and Fusion
3.4.2 Network Architecture
3.5 Editing System
Chapter 4 Experimental Design and Results
4.1 Experimental Environment and Data Analysis
4.1.1 Evaluation Metrics
4.1.2 Experiment 1: Inpainting
4.1.3 Experiment 2: Image Quality Comparison
4.1.4 Experiment 3: Comparison of Different Crop-and-Repair Settings
4.1.5 Experiment 4: Inpainting Results
Chapter 5 Conclusion and Future Work
References
References [1] Jingdong Wang, Ke Sun, Tianheng Cheng, Borui Jiang, Chaorui Deng, Yang Zhao, Dong Liu, Yadong Mu, Mingkui Tan, Xinggang Wang, Wenyu Liu and Bin Xiao, “Deep High-Resolution Representation Learning for Visual Recognition,” IEEE Transactions on Pattern Analysis and Machine Intelligence, pp.1-16, 2020.
[2] Bowen Cheng, Maxwell D. Collins, Yukun Zhu, Ting Liu, Thomas S. Huang, Hartwig Adam and Liang-Chieh Chen, “Panoptic-deeplab: A simple, strong, and fast baseline for bottom-up panoptic segmentation,” Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp.12475-12485, 2020.
[3] Jun Fu, Jing Liu, Haijie Tian, Yong Li, Yongjun Bao, Zhiwei Fang and Hanqing Lu, “Dual attention network for scene segmentation,” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp.3146-3154, 2019.
[4] Liang-Chieh Chen, Yukun Zhu, George Papandreou, Florian Schroff and Hartwig Adam, “Encoder-decoder with atrous separable convolution for semantic image segmentation,” Proceedings of the European Conference on Computer Vision, pp.801-818, 2018.
[5] Hengshuang Zhao, Yi Zhang, Shu Liu, Jianping Shi, Chen Change Loy, Dahua Lin and Jiaya Jia, “Psanet: Point-wise spatial attention network for scene parsing,” Proceedings of the European Conference on Computer Vision, pp.267-283, 2018.
[6] Ishaan Gulrajani, Faruk Ahmed, Martin Arjovsky, Vincent Dumoulin and Aaron Courville, “Improved training of wasserstein gans,” Advances in neural information processing systems, pp.5767-5777, 2017.
[7] Takeru Miyato, Toshiki Kataoka, Masanori Koyama and Yuichi Yoshida, “Spectral normalization for generative adversarial networks,” arXiv preprint arXiv:1802.05957, 2018.
[8] Xing Di, Vishwanath A. Sindagi and Vishal M. Patel, “Gp-gan: Gender preserving gan for synthesizing faces from landmarks,” International Conference on Pattern Recognition, pp.1079-1084. IEEE, 2018.
[9] Tao Xu, Pengchuan Zhang, Qiuyuan Huang, Han Zhang, Zhe Gan, Xiaolei Huang and Xiaodong He, “Attngan: Fine-grained text to image generation with attentional generative adversarial networks,” Proceedings of the IEEE conference on computer vision and pattern recognition, pp.1316-1324, 2018.
[10] Antreas Antoniou, Amos Storkey and Harrison Edwards, “Data augmentation generative adversarial networks,” arXiv preprint arXiv:1711.04340, 2017.
[11] Han Zhang, Ian Goodfellow, Dimitris Metaxas and Augustus Odena, “Self-attention generative adversarial networks,” International Conference on Machine Learning, pp.7354-7363, 2019.
[12] Yantao Lu, Burak Kakillioglu and Senem Velipasalar, “Autonomously and Simultaneously Refining Deep Neural Network Parameters by a Bi-Generative Adversarial Network Aided Genetic Algorithm,” arXiv preprint arXiv:1809.10244, 2018.
[13] Ting-Chun Wang, Ming-Yu Liu, Jun-Yan Zhu, Andrew Tao, Jan Kautz and Bryan Catanzaro, “High-Resolution Image Synthesis and Semantic Manipulation With Conditional GANs,” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp.8798-8807, 2018.
[14] Jun-Yan Zhu, Taesung Park, Phillip Isola and Alexei A. Efros, “Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks,” Proceedings of the IEEE international conference on computer vision, pp.2223-2232, 2017.
[15] Jiahui Yu, Zhe Lin, Jimei Yang, Xiaohui Shen, Xin Lu and Thomas S. Huang, “Generative Image Inpainting With Contextual Attention,” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp.5505-5514, 2018.
[16] Saining Xie, Ross Girshick, Piotr Dollár, Zhuowen Tu and Kaiming He. “Aggregated Residual Transformations for Deep Neural Networks,” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp.1492-1500, 2017.
[17] Liang-Chieh Chen, George Papandreou, Florian Schroff and Hartwig Adam, “Rethinking Atrous Convolution for Semantic Image Segmentation,” arXiv preprint arXiv:1706.05587, 2017.
[18] Yann LeCun, Léon Bottou, Yoshua Bengio and Patrick Haffner, “Gradient-based learning applied to document recognition,” Proceedings of the IEEE, vol. 86, pp.2278-2324, 1998.
[19] Kaiming He, Xiangyu Zhang, Shaoqing Ren and Jian Sun, “Deep residual learning for image recognition,” Proceedings of the IEEE conference on computer vision and pattern recognition, pp.770-778, 2016.
[20] Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke and Andrew Rabinovich, “Going deeper with convolutions,” Proceedings of the IEEE conference on computer vision and pattern recognition, pp.1-9, 2015.
[21] Ian J. Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville and Yoshua Bengio, “Generative adversarial nets,” Advances in neural information processing systems, pp.2672-2680, 2014.
[22] Mehdi Mirza and Simon Osindero, “Conditional generative adversarial nets,” arXiv preprint arXiv:1411.1784, 2014.
[23] Ian Goodfellow, Mehdi Mirza, Aaron Courville and Yoshua Bengio, “Multi-prediction deep Boltzmann machines,” Advances in Neural Information Processing Systems, pp.548-556, 2013.
[24] Christian Ledig, Lucas Theis, Ferenc Huszar, Jose Caballero, Andrew Cunningham, Alejandro Acosta, Andrew Aitken, Alykhan Tejani, Johannes Totz, Zehan Wang and Wenzhe Shi, “Photo-realistic single image super-resolution using a generative adversarial network,” Proceedings of the IEEE conference on computer vision and pattern recognition, pp.4681-4690, 2017.
[25] Scott Reed, Zeynep Akata, Xinchen Yan, Lajanugen Logeswaran, Bernt Schiele and Honglak Lee, “Generative adversarial text to image synthesis,” arXiv preprint arXiv:1605.05396, 2016.
[26] Emily Denton, Soumith Chintala, Arthur Szlam and Rob Fergus, “Deep generative image models using a laplacian pyramid of adversarial networks,” Advances in neural information processing systems, pp.1486-1494, 2015.
[27] Alec Radford, Luke Metz and Soumith Chintala, “Unsupervised representation learning with deep convolutional generative adversarial networks,” arXiv preprint arXiv:1511.06434, 2015.
[29] Phillip Isola, Jun-Yan Zhu, Tinghui Zhou and Alexei A. Efros, “Image-To-Image Translation With Conditional Adversarial Networks,” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp.1125-1134, 2017.
[30] Olaf Ronneberger, Philipp Fischer and Thomas Brox, “U-net: Convolutional networks for biomedical image segmentation,” International Conference on Medical image computing and computer-assisted intervention, pp.234-241. Springer, Cham, 2015.
[31] Alexey Dosovitskiy and Thomas Brox, “Generating images with perceptual similarity metrics based on deep networks,” Advances in Neural Information Processing Systems, 2016.
[32] Leon A. Gatys, Alexander S. Ecker and Matthias Bethge, “Image style transfer using convolutional neural networks,” IEEE Conference on Computer Vision and Pattern Recognition, 2016.
[33] Justin Johnson, Alexandre Alahi and Li Fei-Fei, “Perceptual losses for real-time style transfer and super-resolution,” Proceedings of the European Conference on Computer Vision, 2016.
[34] Jianan Li, Xiaodan Liang, Yunchao Wei, Tingfa Xu, Jiashi Feng and Shuicheng Yan, “Perceptual generative adversarial networks for small object detection,” Proceedings of the IEEE conference on computer vision and pattern recognition, pp.1222-1230, 2017.
[35] Deepak Pathak, Philipp Krahenbuhl, Jeff Donahue, Trevor Darrell and Alexei A. Efros, “Context Encoders: Feature Learning by Inpainting,” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp.2536-2544, 2016.
[36] Chao Yang, Xin Lu, Zhe Lin, Eli Shechtman, Oliver Wang and Hao Li, “High-resolution image inpainting using multi-scale neural patch synthesis,” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp.6721-6729, 2017.
[37] Guilin Liu, Fitsum A. Reda, Kevin J. Shih, Ting-Chun Wang, Andrew Tao and Bryan Catanzaro, “Image inpainting for irregular holes using partial convolutions,” Proceedings of the European Conference on Computer Vision, pp.85-100, 2018.
[38] Hongyu Liu, Bin Jiang, Yi Xiao and Chao Yang, “Coherent semantic attention for image inpainting,” Proceedings of the IEEE International Conference on Computer Vision, pp.4170-4179, 2019.
[39] Marius Cordts, Mohamed Omran, Sebastian Ramos, Timo Rehfeld, Markus Enzweiler, Rodrigo Benenson, Uwe Franke, Stefan Roth and Bernt Schiele, “The Cityscapes Dataset for Semantic Urban Scene Understanding,” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp.3213-3223, 2016.
[40] Jonathan Long, Evan Shelhamer and Trevor Darrell, “Fully Convolutional Networks for Semantic Segmentation,” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp.3431-3440, 2015.
[41] Fisher Yu and Vladlen Koltun, “Multi-scale context aggregation by dilated convolutions,” International Conference on Learning Representations, pp.1-13, 2016.
Advisors: Kuo-Chin Fan (范國清) and 高巧汶    Date of Approval: 2020-07-30