Master's/Doctoral Thesis 103522067: Detailed Record




Name  陳書恆 (Shu-Heng Chen)    Department  Computer Science and Information Engineering
Thesis Title  Removing Embedded Text in Images via Fully Convolutional Networks with Generative Adversarial Learning
(使用生成對抗學習的全卷積網路移除影像中的外嵌文字)
  1. The electronic full text of this thesis is approved for immediate open access.
  2. The open-access electronic full text is licensed for users to search, read, and print for personal, non-profit academic research purposes only.
  3. Please observe the relevant provisions of the Copyright Act of the Republic of China; do not reproduce, distribute, adapt, repost, or broadcast the work without authorization.

Abstract (Chinese)  Images with embedded text are among the most common media on the web. For example, netizens create large numbers of memes for many different purposes. In some cases, however, the added text spoils the appearance of the image and makes other applications, such as scene recognition and object classification, more difficult. The main goal of this study is therefore to propose a system that automatically removes embedded text from an image and restores the image.
With the development of a new generation of computing technology, deep learning techniques can be applied to image processing and outperform traditional image processing methods. To obtain better results, the proposed system uses recent deep learning frameworks to build two modules: a text-mask generation module and an image completion module. The text-mask generation module automatically detects the embedded text in a given image and outputs the corresponding mask. The image completion module takes the corrupted image and the corresponding mask image as input and produces the restored image.
Our experiments compare the proposed method with two well-established image inpainting techniques that do not use deep learning. The results show that the images restored by our method are more natural and contain fewer artifacts than those produced by the traditional inpainting techniques.
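The abstracts describe a two-stage pipeline: a text-mask generation module followed by an image completion module. The sketch below illustrates how such a pipeline could be wired together; it is a minimal illustration in PyTorch with hypothetical layer choices and names (MaskGenerator, ImageCompleter, remove_text), not the architecture actually used in the thesis.

```python
import torch
import torch.nn as nn

class MaskGenerator(nn.Module):
    """Hypothetical fully convolutional text detector: RGB image -> per-pixel text probability."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 1, 1), nn.Sigmoid(),
        )

    def forward(self, image):
        return self.net(image)

class ImageCompleter(nn.Module):
    """Hypothetical completion network: corrupted image plus mask -> inpainted image."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(4, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, 3, padding=2, dilation=2), nn.ReLU(),  # dilation enlarges the receptive field
            nn.Conv2d(64, 3, 1), nn.Sigmoid(),
        )

    def forward(self, image, mask):
        # Blank out the text pixels and append the mask as a fourth channel.
        x = torch.cat([image * (1 - mask), mask], dim=1)
        return self.net(x)

def remove_text(image, mask_net, completion_net, threshold=0.5):
    """Chain the two modules: detect the text mask, then inpaint only the masked region."""
    with torch.no_grad():
        mask = (mask_net(image) > threshold).float()
        restored = completion_net(image, mask)
        # Keep the original pixels outside the mask; use the network output inside it.
        return image * (1 - mask) + restored * mask
```

Compositing only the masked pixels back into the photograph guarantees that regions untouched by text are preserved exactly.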
Abstract (English)
An image with embedded text is one of the most common 2D media on the web; for example, netizens produce many such pictures, or memes, for different purposes. In some situations, the added text spoils an otherwise appealing picture and prevents the image from being used for other purposes, such as scene recognition or object classification. Therefore, in this study we aim to propose a system that can automatically remove the text embedded in a given image and inpaint, or restore, the image.
With a new generation of computing technology, deep learning architectures can be applied to the inpainting problem and achieve better results than several traditional methods. The proposed system consists of two modules built with recent deep learning frameworks. The first module, the mask generation module, automatically detects the embedded text in a given image and produces the corresponding bitmap mask. The second module, the image completion module, inpaints the corrupted image based on the given mask image.
In the experiments, we compare our results with those of two mature methods that do not use deep learning. We show that the proposed method produces more natural and less flawed results than these classic image inpainting methods.
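The title and keywords indicate that the image completion module is trained with generative adversarial learning. The sketch below shows a generic way to combine a masked reconstruction loss with an adversarial loss when training such a completion network against a discriminator; the loss weighting, optimizers, and network interfaces are assumptions for illustration, not the loss functions or learning algorithm reported in Chapter 3.

```python
import torch
import torch.nn.functional as F

def gan_train_step(completion_net, discriminator, g_opt, d_opt,
                   image, mask, adv_weight=0.001):
    """One training step combining a masked reconstruction loss with an adversarial loss.
    `completion_net` is assumed to blank out the masked pixels internally (as in the
    pipeline sketch above); `discriminator` is assumed to end in a sigmoid."""
    fake = completion_net(image, mask)

    # Discriminator update: real images labeled 1, completed images labeled 0.
    d_opt.zero_grad()
    d_real = discriminator(image)
    d_fake = discriminator(fake.detach())
    d_loss = (F.binary_cross_entropy(d_real, torch.ones_like(d_real)) +
              F.binary_cross_entropy(d_fake, torch.zeros_like(d_fake)))
    d_loss.backward()
    d_opt.step()

    # Generator update: reconstruct the masked region and try to fool the discriminator.
    g_opt.zero_grad()
    rec_loss = F.mse_loss(fake * mask, image * mask)
    d_out = discriminator(fake)
    adv_loss = F.binary_cross_entropy(d_out, torch.ones_like(d_out))
    g_loss = rec_loss + adv_weight * adv_loss
    g_loss.backward()
    g_opt.step()
    return g_loss.item(), d_loss.item()
```

In this kind of setup the reconstruction term keeps the filled-in region close to the ground truth, while the adversarial term pushes the completion network toward outputs the discriminator cannot distinguish from real photographs.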
Keywords (Chinese) ★ 影像修復 (image inpainting)
★ 深度學習 (deep learning)
★ 生成對抗網路 (generative adversarial network)
Keywords (English) ★ image inpainting
★ deep learning
★ generative adversarial network
Table of Contents
Abstract i
Table of Contents ii
List of Figures iv
List of Tables vi
Chapter 1 Introduction 1
1.1 Motivation 1
1.2 System overview 2
1.3 Thesis organization 4
Chapter 2 Related Works 5
2.1 Image inpainting 5
2.1.1 Diffusion-based methods 6
2.1.2 Exemplar-based methods 6
2.1.3 Others 7
2.2 Deep learning 8
2.2.1 Convolutional neural networks 8
2.2.2 Fully convolutional networks 9
2.2.3 Generative adversarial nets 9
Chapter 3 Methods 11
3.1 System overview 11
3.1.1 Mask generation module 12
3.1.2 Image completion module 14
3.1.3 Overall architecture 16
3.2 Training 17
3.2.1 Loss functions 17
3.2.2 Learning algorithm 18
Chapter 4 Experiments 20
4.1 Dataset 20
4.1.1 Building the training dataset 20
4.1.2 Preprocessing 22
4.2 Environment setting 22
4.3 Results 23
4.3.1 Results on mask generation module 23
4.3.2 Results on image completion module 26
Chapter 5 Evaluation and Comparison 29
Chapter 6 Conclusions and Future Work 34
References 35
References
[1] C. Guillemot and O. LeMeur, “Image inpainting: overview and recent advances,” IEEE Signal Processing Magazine, vol.31, no.1, pp.127-144, 2014.
[2] M. Bertalmio, G. Sapiro, V. Caselles, et al., “Image inpainting,” in Proc. ACM SIGGRAPH Conf., New Orleans, LA, Sep.23-28, 2000, pp.417-424.
[3] T. F. Chan and J. Shen, “Nontexture inpainting by curvature-driven diffusions,” Journal of Visual Communication and Image Representation, vol.12, no.4, pp.436-449, 2001.
[4] A. Telea, “An image inpainting technique based on the fast marching method,” Journal of Graphics Tools, vol.9, no.1, pp.23-34, 2004.
[5] C. Ballester, M. Bertalmio, V. Caselles, et al., “Filling-in by joint interpolation of vector fields and gray levels,” IEEE Trans. Image Processing, vol.10, no.8, pp.1200-1211, 2001.
[6] T. Chan and J. Shen, “Local inpainting models and tv inpainting,” SIAM Journal on Applied Mathematics, vol.62, no.3, pp.1019-1043, 2001.
[7] A. Levin, A. Zomet, and Y. Weiss, “Learning how to inpaint from global image statistics,” in Proc. IEEE Int. Conf. on Computer Vision, Nice, France, Oct.13-16, 2003, pp.305-312.
[8] L. Y. Wei and M. Levoy, “Fast texture synthesis using tree-structured vector quantization,” in Proc. ACM SIGGRAPH Conf., New Orleans, LA, Sep.23-28, 2000, pp.479-488.
[9] M. Ashikhmin, “Synthesizing natural textures,” in Proc. ACM Symp. on Interactive 3D Graphics, Research Triangle Park, NC, Mar.19-21, 2001, pp.217-226.
[10] A. A. Efros and W. T. Freeman, “Image quilting for texture synthesis and transfer,” in Proc. ACM SIGGRAPH Conf., Los Angeles, CA, Aug.12-17, 2001, pp.341-346.
[11] V. Kolmogorov and R. Zabih, “What energy functions can be minimized via graph cuts?,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol.26, no.2, pp.147-159, 2004.
[12] P. Pérez, M. Gangnet, and A. Blake, “Poisson image editing,” ACM Trans. Graphics, vol.22, no.3, pp.313-318, 2003.
[13] A. Bugeau, M. Bertalmío, V. Caselles, et al., “A comprehensive framework for image inpainting,” IEEE Trans. Image Processing, vol.19, no.10, pp.2634-2645, 2010.
[14] J. Liu, P. Musialski, P. Wonka, et al., “Tensor completion for estimating missing values in visual data,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol.35, no.1, pp.208-220, 2013.
[15] D. L. Donoho, “Compressed sensing,” IEEE Trans. Information Theory, vol.52, no.4, pp.1289-1306, 2006.
[16] M. Elad, J. L. Starck, P. Querre, et al., “Simultaneous cartoon and texture image inpainting using morphological component analysis (MCA),” Applied and Computational Harmonic Analysis, vol.19, no.3, pp.340-358, 2005.
[17] M. Elad and M. Aharon, “Image denoising via sparse and redundant representation over learned dictionaries,” IEEE Trans. Image Processing, vol.15, no.12, pp.3736-3745, 2006.
[18] J. Mairal, M. Elad, and G. Sapiro, “Sparse representation for color image restoration,” IEEE Trans. Image Processing, vol.17, no.1, pp.53-69, 2008.
[19] M. Elad, Sparse and Redundant Representations: From Theory to Applications in Signal and Image Processing, 1st ed., New York, NY, Springer Publishing Company, Incorporated, New York, NY, 2010.
[20] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet classification with deep convolutional neural networks,” in Proc. Advances in Neural Information Processing Systems, Lake Tahoe, NV, Dec.3-6, 2012, pp.1097-1105.
[21] Y. LeCun, B. E. Boser, J. S. Denker, et al., “Handwritten digit recognition with a back-propagation network,” in Proc. Advances in Neural Information Processing Systems, Lake Tahoe, NV, Nov.26-29, 1990, pp.396-404.
[22] Y. LeCun, L. Bottou, Y. Bengio, et al., “Gradient-based learning applied to document recognition,” Proceedings of the IEEE, vol.86, no.11, pp.2278-2323, 1998.
[23] C. Cortes and V. Vapnik, “Support-vector networks,” Machine Learning, vol.20, no.3, pp.273-297, 1995.
[24] N. Srivastava, G. Hinton, A. Krizhevsky, et al., “Dropout: a simple way to prevent neural networks from overfitting,” Journal of Machine Learning Research, vol.15, no.1, pp.1929-1958, 2014.
[25] M. Zeiler, D. Krishnan, G. Taylor, et al., “Deconvolutional networks for feature learning,” in Proc. IEEE Conf. on Computer Vision and Pattern Recognition, San Francisco, CA, Jun.13-18, 2010, pp.2528-2535.
[26] M. D. Zeiler and R. Fergus, “Visualizing and understanding convolutional networks,” in Proc. European Conf. on Computer Vision, Zurich, Switzerland, Sep.8-11, 2014, pp.818-833.
[27] J. Long, E. Shelhamer, and T. Darrell, “Fully convolutional networks for semantic segmentation,” in Proc. IEEE Conf. on Computer Vision and Pattern Recognition, Boston, MA, Jun.7-12, 2015, pp.3431-3440.
[28] A. Nguyen, J. Yosinski, and J. Clune, “Deep neural networks are easily fooled,” in Proc. IEEE Conf. on Computer Vision and Pattern Recognition, Boston, MA, Jun.7-12, 2015, pp.427-436.
[29] I. Goodfellow, J. Pouget-Abadie, M. Mirza, et al., “Generative adversarial nets,” in Proc. Advances in Neural Information Processing Systems, Montreal, Quebec, Canada, Dec.8-13, 2014, pp.2672-2680.
[30] D. Pathak, J. Donahue, and A. A. Efros, “Context encoders : feature learning by inpainting,” in Proc. IEEE Conf. on Computer Vision and Pattern Recognition, Las Vegas, NV, Jun.26-Jul.1, 2016, pp.2536-2544.
[31] V. Badrinarayanan, A. Kendall, and R. Cipolla, “SegNet: a deep convolutional encoder-decoder architecture for image segmentation,” IEEE Trans. Pattern Analysis and Machine Intelligence, Preprint, 2017.
[32] S. Iizuka, E. Simo-Serra, and H. Ishikawa, “Globally and locally consistent image completion,” ACM Trans. Graphics, vol.36, no.4, p.107:1-107:14, 2017.
[33] J. T. Springenberg, A. Dosovitskiy, T. Brox, et al., “Striving for simplicity: the all convolutional net,” in Proc. Int. Conf. for Learning Representations (workshop track), May 7-9, 2015.
[34] F. Yu and V. Koltun, “Multi-scale context aggregation by dilated convolutions,” in Proc. Int. Conf. for Learning Representations, San Juan, Puerto Rico, May 2-4, 2016.
[35] K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” in Proc. Int. Conf. for Learning Representations, San Diego, CA, May 7-9, 2015.
[36] S. Ioffe and C. Szegedy, “Batch normalization: accelerating deep network training by reducing internal covariate shift,” in Proc. Int. Conf. on Machine Learning, Lille, France, Jul.6-11, 2015, pp.448-456.
[37] R. Yeh, C. Chen, T. Y. Lim, et al., “Semantic image inpainting with perceptual and contextual losses,” in Proc. IEEE Conf. on Computer Vision and Pattern Recognition, Honolulu, HI, Jul.21-26, 2017.
[38] D. P. Kingma and J. L. Ba, “Adam: a method for stochastic optimization,” in Proc. Int. Conf. for Learning Representations., San Diego, CA, May 7-9, 2015.
[39] T. Salimans, I. Goodfellow, W. Zaremba, et al., “Improved techniques for training GANs,” in Proc. Advances in Neural Information Processing Systems, Barcelona, Spain, Dec.5-10, 2016, pp.2234-2242.
[40] B. C. Russell, A. Torralba, K. P. Murphy, et al., “Labelme: a database and web-based tool for image annotation,” Int. Journal of Computer Vision, vol.77, no.1-3, pp.157-173, 2008.
[41] J. Deng, W. Dong, R. Socher, et al., “ImageNet: a large-scale hierarchical image database,” in Proc. IEEE Conf. on Computer Vision and Pattern Recognition, Miami, FL, Jun.20-25, 2009, pp.248-255.
[42] Y. Lin, J. B. Michel, E. L. Aiden, et al., “Syntactic annotations for the Google Books Ngram corpus,” in Proc. ACL Conf. System Demonstrations, Jeju Island, Korea, Jul.9-11, 2012, pp.169-174.
[43] M. Abadi, A. Agarwal, P. Barham, et al., TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems, technical report, tensorflow.org, 2015.
[44] C. Barnes, E. Shechtman, A. Finkelstein, et al., “PatchMatch: a randomized correspondence algorithm for structural image editing,” ACM Trans. Graphics, vol.28, no.3, p.24:1-24:11, 2009.
Advisor  曾定章    Date of Approval  2017-08-22
