Master's/Doctoral Thesis 107522066: Detailed Record




Name Hao-Wei Ling (凌浩維)   Department Computer Science and Information Engineering
Thesis Title Entropy-based Unsupervised Domain Adaptation for Semantic Image Segmentation
(以熵最小化之無監督領域自適應用於圖像語義分割)
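Related Theses
★ Image translation based on content awareness and semantic segmentation maps for image inpainting (基於內容感知與語義分割圖的圖像轉換用於修復圖像)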
Files Full text can be browsed in the system (open after 2025-7-24)
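Abstract (translated from Chinese) In recent years, deep learning has flourished in availability and diversity, achieving excellent results in image recognition, speech recognition, natural language processing, image generation, autonomous vehicles, and other fields. However, most deep learning relies on supervised learning, which needs a large number of training samples with corresponding annotations to train a well-performing model. In practical applications, obtaining the training samples and annotations is often the main difficulty. Semantic image segmentation in particular needs large amounts of manually labeled pixel-level annotation, which is both expensive and time-consuming. The data distributions of the training and testing domains also often differ greatly, causing severe performance loss at run time. These problems make training more difficult.
This thesis therefore adopts transfer learning to reduce the labeling cost and mitigate the performance loss. The core idea of transfer learning is to learn knowledge or experience from previous tasks and apply it to new ones. The problem addressed here belongs to a branch of transfer learning called Unsupervised Domain Adaptation (UDA): UDA methods generalize information learned on labeled synthetic datasets to unlabeled real datasets. Synthetic datasets whose semantic annotations are collected automatically from computer games, such as GTA5 and SYNTHIA, save a great deal of manual labeling time. The better the performance when generalizing to the real dataset at test time, the smaller the distribution gap between the training and testing domains.
This thesis proposes an architecture that integrates multiple UDA methods. First, it uses a generative adversarial network (GAN) for data augmentation. Second, it uses a semi-supervised entropy loss based on pixel-wise predictions, with two approaches: direct and indirect entropy minimization. Direct entropy minimization uses the maximum squares loss together with an image-wise weighting factor to address class imbalance. Indirect entropy minimization introduces an adversarial loss on the self-information maps to strengthen the spatial information among local semantics. The experimental results verify that the proposed network architecture transfers effectively in two challenging synthetic-to-real cases, GTA5→Cityscapes and SYNTHIA→Cityscapes.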
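To make the direct entropy minimization concrete, the following is a minimal Python sketch, not the thesis implementation. It compares the Shannon entropy loss with the maximum squares loss of Chen et al. [8] on per-pixel softmax outputs. The function names and the toy logits are assumptions made for illustration, and the image-wise weighting factor is left out.

```python
import math

def softmax(logits):
    """Numerically stable softmax over one pixel's class logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def entropy_loss(probs):
    """Mean Shannon entropy over pixels.

    probs is a list of per-pixel class distributions (hypothetical layout).
    """
    eps = 1e-12  # avoids log(0)
    return -sum(p * math.log(p + eps) for pixel in probs for p in pixel) / len(probs)

def max_squares_loss(probs):
    """Maximum squares loss: -(1 / 2N) * sum over pixels and classes of p^2."""
    return -sum(p * p for pixel in probs for p in pixel) / (2 * len(probs))

# Toy target-domain predictions: one confident pixel, one maximally uncertain pixel.
confident = [softmax([8.0, 0.0, 0.0])]
uncertain = [softmax([0.0, 0.0, 0.0])]

print("entropy:     ", entropy_loss(confident), entropy_loss(uncertain))
print("max squares: ", max_squares_loss(confident), max_squares_loss(uncertain))
```

Both losses are lower for the confident pixel, so minimizing either pushes unlabeled target predictions toward low entropy. The per-class gradient of the maximum squares loss is linear in the probability, which [8] gives as the reason it is less dominated by easy, already confident pixels.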
Abstract (English) In recent years, deep learning has grown rapidly in availability and diversity, with strong results in image recognition, speech recognition, natural language processing, image generation, unmanned vehicles, and more. However, most deep learning methods are supervised, and a well-performing supervised model requires a large set of labeled data for training and testing. In practical applications, the major challenge is collecting the training samples and their corresponding labels. Semantic image segmentation in particular requires massive amounts of pixel-level annotation, which takes a lot of manual work and is expensive and time-consuming. A further challenge is the large gap that often exists between the data distributions of the training and testing domains, which leads to a serious loss of performance. These problems make training more difficult.
Therefore, this thesis proposes a method based on transfer learning to reduce the cost of labeling and mitigate the performance loss. The core concept of transfer learning is to imitate human learning by designing a model that applies previously learned knowledge to solve new problems better. The task in this thesis belongs to a branch of transfer learning called Unsupervised Domain Adaptation (UDA), which adapts features across different domains with similar tasks. Synthetic datasets whose semantic annotations are collected automatically from computer games, such as GTA5 and SYNTHIA, greatly reduce the labeling effort. A better transfer result also indicates a smaller gap between the feature spaces of the training and testing domains.
We propose an architecture that integrates multiple UDA methods. First, this thesis adopts Generative Adversarial Networks (GAN) as a method of data augmentation. Second, a semi-supervised entropy loss is applied to the pixel-wise predictions, in two forms: direct and indirect entropy minimization. Direct minimization combines the maximum squares loss with an image-wise weighting factor to address class imbalance. Indirect minimization applies an adversarial loss to the self-information map to enhance the spatial information between local semantics. In the experiments, the proposed network is trained on synthetic images and tested on real images in two synthetic-to-real settings, GTA5→Cityscapes and SYNTHIA→Cityscapes, and achieves effective transfer performance.
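The indirect route can be illustrated with a minimal Python sketch, assuming the self-information definition I = -p log p used by ADVENT [9]. The discriminator score, the choice of label 1 for the source domain, and all function names are hypothetical placeholders; this record does not describe the discriminator actually used in the thesis.

```python
import math

def self_information_map(probs):
    """Per-entry self-information -p * log(p).

    probs is an H x W x C nested list of softmax outputs (hypothetical layout).
    """
    eps = 1e-12  # avoids log(0)
    return [[[-p * math.log(p + eps) for p in pixel] for pixel in row] for row in probs]

def entropy_map(info_map):
    """Summing self-information over classes gives the per-pixel entropy."""
    return [[sum(pixel) for pixel in row] for row in info_map]

def bce(score, label):
    """Binary cross-entropy for one discriminator score in (0, 1)."""
    eps = 1e-12
    return -(label * math.log(score + eps) + (1 - label) * math.log(1 - score + eps))

# A 1 x 2 image with two classes: an uncertain pixel and a fairly confident one.
target_probs = [[[0.5, 0.5], [0.9, 0.1]]]
info = self_information_map(target_probs)
ent = entropy_map(info)

# Placeholder for a discriminator's output on the target self-information map.
d_score_on_target = 0.5
# The segmentation network is pushed to make target maps look like source maps
# (source label assumed to be 1 here).
adv_loss = bce(d_score_on_target, 1.0)

print(ent, adv_loss)
```

Because the discriminator sees the whole self-information map instead of isolated pixels, the adversarial term can take the spatial layout of uncertain regions into account, which matches the abstract's remark about spatial information between local semantics.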
Keywords ★ Domain adaptation (域自適應)
★ Transfer learning (遷移學習)
★ Semantic image segmentation (圖像語義分割)
★ Deep learning (深度學習)
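Table of Contents Abstract (Chinese)
Abstract (English)
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
1.1 Research Motivation and Objectives
1.2 Thesis Organization
Chapter 2 Related Work
2.1 Semantic Image Segmentation
2.2 Unsupervised Domain Adaptation
2.3 UDA for Semantic Image Segmentation
Chapter 3 Methodology and System Architecture
3.1 Proposed System Overview
3.2 Symbol Define
3.3 Image-to-Image Translation
3.4 Semantic Image Segmentation
3.5 Direct Entropy Minimization
3.6 Indirect Entropy Minimization
3.7 Ensemble Learning
Chapter 4 Experimental Results
4.1 Datasets
4.1.1 GTA5 Dataset (Grand Theft Auto V Dataset)
4.1.2 SYNTHIA Dataset
4.1.3 CITYSCAPES Dataset
4.2 Analysis of Experimental Data
4.2.1 Evaluation Metrics
4.2.2 Training Environment and Model Parameters
4.2.3 Analysis of Model Transfer Results
4.2.4 Analysis Based on Image Style Translation
4.2.5 Hyperparameter Learning for the Semantic Image Segmentation Network
4.2.6 Ablation Study of Different Transfer Methods
4.3 Analysis of Transfer Results
4.3.1 Translating Synthetic Images to Real-Image Style
4.3.2 Comparison of Semantic Segmentation Results Before and After Model Transfer
4.3.3 Comparison of Results of Different Transfer Methods
Chapter 5 Conclusions and Future Work
References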
References [1] Jonathan Long, Evan Shelhamer and Trevor Darrell, “Fully convolutional networks for semantic segmentation,” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp.3431-3440, 2015.
[2] Vijay Badrinarayanan, Alex Kendall and Roberto Cipolla, “Segnet: A deep convolutional encoder-decoder architecture for image segmentation,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39, pp.2481-2495, 2017.
[3] Olaf Ronneberger, Philipp Fischer and Thomas Brox, “U-net: Convolutional networks for biomedical image segmentation,” International Conference on Medical Image Computing and Computer-assisted Intervention, pp.234-241, 2015.
[4] Liang-Chieh Chen, George Papandreou, Iasonas Kokkinos, Kevin Murphy and Alan L. Yuille, “Deeplab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs,” IEEE Transactions on Pattern Analysis and Machine Intelligence, pp.834-848, 2018.
[5] Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville and Yoshua Bengio, “Generative adversarial nets,” Conference on Neural Information Processing Systems, pp.1-9, 2014.
[6] Karen Simonyan and Andrew Zisserman, “Very deep convolutional networks for large-scale image recognition,” International Conference on Learning Representations, pp.1-14, 2015.
[7] Kaiming He, Xiangyu Zhang, Shaoqing Ren and Jian Sun, “Deep residual learning for image recognition,” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp.770-778, 2016.
[8] Minghao Chen, Hongyang Xue and Deng Cai, “Domain Adaptation for Semantic Segmentation with Maximum Squares Loss,” Proceedings of the IEEE International Conference on Computer Vision, pp.2090-2099, 2019.
[9] Tuan-Hung Vu, Himalaya Jain, Maxime Bucher, Matthieu Cord and Patrick Pérez, “ADVENT: adversarial entropy minimization for domain adaptation in semantic segmentation,” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp.2517-2526, 2019.
[10] Claude E. Shannon, “A mathematical theory of communication,” The Bell System Technical Journal, pp.379-423, 1948.
[11] Wei-Chih Hung, Yi-Hsuan Tsai, Xiaohui Shen, Zhe Lin, Kalyan Sunkavalli, Xin Lu and Ming-Hsuan Yang, “Scene parsing with global context embedding,” Proceedings of the IEEE International Conference on Computer Vision, pp.2650-2658, 2017.
[12] Hengshuang Zhao, Jianping Shi, Xiaojuan Qi, Xiaogang Wang and Jiaya Jia, “Pyramid scene parsing network,” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp.2631-2639, 2017.
[13] Fisher Yu and Vladlen Koltun, “Multi-scale context aggregation by dilated convolutions,” International Conference on Learning Representations, pp.1-13, 2016.
[14] Stephan R. Richter, Vibhav Vineet, Stefan Roth and Vladlen Koltun, “Playing for data: Ground truth from computer games,” European Conference on Computer Vision, pp.102-118, 2016.
[15] German Ros, Laura Sellart, Joanna Materzynska, David Vazquez and Antonio M. Lopez, “The SYNTHIA Dataset: A large collection of synthetic images for semantic segmentation of urban scenes,” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp.3234-3243, 2016.
[16] Yaroslav Ganin and Victor Lempitsky, “Unsupervised domain adaptation by backpropagation,” International Conference on Machine Learning, pp.1180-1189, 2015.
[17] Yaroslav Ganin, Evgeniya Ustinova, Hana Ajakan, Pascal Germain, Hugo Larochelle, François Laviolette, Mario Marchand and Victor Lempitsky, “Domain-adversarial training of neural networks,” The Journal of Machine Learning Research, pp.2096-2030, 2016.
[18] Mingsheng Long, Yue Cao, Jianmin Wang and Michael I. Jordan, “Learning transferable features with deep adaptation networks,” International Conference on Machine Learning, pp.97-105, 2015.
[19] Eric Tzeng, Judy Hoffman, Trevor Darrell and Kate Saenko, “Simultaneous deep transfer across domains and tasks,” Proceedings of the IEEE International Conference on Computer Vision, pp.4068-4076, 2015.
[20] Eric Tzeng, Judy Hoffman, Kate Saenko and Trevor Darrell, “Adversarial discriminative domain adaptation,” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp.7167-7176, 2017.
[21] Mingsheng Long, Han Zhu, Jianmin Wang and Michael I. Jordan, “Unsupervised domain adaptation with residual transfer networks,” Conference on Neural Information Processing Systems, pp.136-144, 2016.
[22] Konstantinos Bousmalis, Nathan Silberman, David Dohan, Dumitru Erhan and Dilip Krishnan, “Unsupervised pixel-level domain adaptation with generative adversarial networks,” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp.3722-3731, 2017.
[23] Judy Hoffman, Dequan Wang, Fisher Yu and Trevor Darrell, “FCNs in the wild: Pixel-level adversarial and constraint-based adaptation,” arXiv preprint arXiv:1612.02649, 2016.
[24] Jun-Yan Zhu, Taesung Park, Phillip Isola and Alexei A Efros, “Unpaired image-to-image translation using cycle-consistent adversarial networks,” Proceedings of the IEEE International Conference on Computer Vision, pp.2223-2232, 2017.
[25] Ming-Yu Liu, Thomas Breuel and Jan Kautz, “Unsupervised image-to-image translation networks,” Advances in Neural Information Processing Systems, pp.700-708, 2017.
[26] Xun Huang, Ming-Yu Liu, Serge Belongie and Jan Kautz, “Multimodal unsupervised image-to-image translation,” Proceedings of the European Conference on Computer Vision, pp.172-189, 2018.
[27] Judy Hoffman, Eric Tzeng, Taesung Park, Jun-Yan Zhu, Phillip Isola, Kate Saenko, Alexei Efros and Trevor Darrell, “CyCADA: Cycle-consistent adversarial domain adaptation,” International Conference on Machine Learning, pp.1989-1998, 2018.
[28] Rui Gong, Wen Li, Yuhua Chen and Luc Van Gool, “DLOW: Domain flow for adaptation and generalization,” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp.2477-2488, 2019.
[29] Yunsheng Li, Lu Yuan and Nuno Vasconcelos, “Bidirectional learning for domain adaptation of semantic segmentation,” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp.6936-6945, 2019.
[30] Yiheng Zhang, Zhaofan Qiu, Ting Yao, Dong Liu and Tao Mei, “Fully convolutional adaptation networks for semantic segmentation,” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp.6810-6818, 2018.
[31] Yang Zou, Zhiding Yu, B.V.K. Vijaya Kumar and Jinsong Wang, “Unsupervised domain adaptation for semantic segmentation via class-balanced self-training,” Proceedings of the European Conference on Computer Vision, pp.289-305, 2018.
[32] Qing Lian, Fengmao Lv, Lixin Duan and Boqing Gong, “Constructing self-motivated pyramid curriculums for cross-domain semantic segmentation: A non-adversarial approach,” Proceedings of the IEEE International Conference on Computer Vision, pp.6758-6767, 2019.
[33] Yawei Luo, Liang Zheng, Tao Guan, Junqing Yu and Yi Yang, “Taking a closer look at domain shift: Category-level adversaries for semantics consistent domain adaptation,” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp.2507-2516, 2019.
[34] Yonghao Xu, Bo Du, Lefei Zhang, Qian Zhang, Guoli Wang and Liangpei Zhang, “Self-ensembling attention networks: Addressing domain shift for semantic segmentation,” Proceedings of the AAAI Conference on Artificial Intelligence, vol. 33, pp.5581-5588, 2019.
[35] Jaehoon Choi, Taekyung Kim and Changick Kim, “Self-Ensembling with GAN-based Data Augmentation for Domain Adaptation in Semantic Segmentation,” Proceedings of the IEEE International Conference on Computer Vision, pp.6830-6840, 2019.
[36] Yves Grandvalet and Yoshua Bengio, “Semi-supervised learning by entropy minimization,” Advances in Neural Information Processing Systems, pp.529-536, 2005.
[37] Jost Tobias Springenberg, “Unsupervised and semi-supervised learning with categorical generative adversarial networks,” International Conference on Learning Representations, 2016.
[38] Himalaya Jain, Joaquin Zepeda, Patrick Pérez and Rémi Gribonval, “Subic: A supervised, structured binary code for image search,” Proceedings of the IEEE International Conference on Computer Vision, pp.833-842, 2017.
[39] Himalaya Jain, Joaquin Zepeda, Patrick Pérez and Rémi Gribonval, “Learning a complete image indexing pipeline,” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp.4933-4941, 2018.
[40] Mingsheng Long, Han Zhu, Jianmin Wang and Michael I. Jordan, “Unsupervised domain adaptation with residual transfer networks,” Advances in Neural Information Processing Systems, pp.136-144, 2016.
[41] Takeru Miyato, Toshiki Kataoka, Masanori Koyama and Yuichi Yoshida, “Spectral normalization for generative adversarial networks,” International Conference on Learning Representations, 2018.
[42] Ankit Dixit, Radovan Kavicky and Apeksha Jain, “Ensemble Machine Learning,” Birmingham: Packt Publishing, 2017.
[43] Tsung-Yi Lin, Priya Goyal, Ross B. Girshick, Kaiming He and Piotr Dollár, “Focal loss for dense object detection,” Proceedings of the IEEE International Conference on Computer Vision, pp.2980-2988, 2017.
[44] Yiheng Zhang, Zhaofan Qiu, Ting Yao, Dong Liu and Tao Mei, “Fully convolutional adaptation networks for semantic segmentation,” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp.6810-6818, 2018.
[45] Marius Cordts, Mohamed Omran, Sebastian Ramos, Timo Rehfeld, Markus Enzweiler, Rodrigo Benenson, Uwe Franke, Stefan Roth and Bernt Schiele, “The cityscapes dataset for semantic urban scene understanding,” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp.3213-3223, 2016.
Advisors Kuo-Chin Fan (范國清), 高巧汶   Approval Date 2020-7-24

For questions about this thesis, please contact the Extension Services Section of the National Central University Library, TEL: (03)422-7151 ext. 57407, or by e-mail.