Abstract: | In recent years, deep learning has grown rapidly in availability and variety, achieving excellent results in image recognition, speech recognition, natural language processing, image generation, autonomous vehicles, and other fields. However, most deep learning methods rely on supervised learning, which requires a large number of training samples with corresponding annotations to produce a well-performing model. In practical applications, collecting these samples and their annotations is often the main difficulty. Image semantic segmentation in particular requires large amounts of manually labeled pixel-level annotation, which is both expensive and time-consuming. In addition, the data distributions of the training and testing domains often differ substantially, which causes a severe loss of performance at run time. Both problems make training more difficult. This paper therefore adopts transfer learning to reduce the cost of labeling and to mitigate the performance loss.
The core idea of transfer learning is to learn knowledge or experience from previous tasks and apply it to new ones, much as humans do. The problem addressed here belongs to a branch of transfer learning called Unsupervised Domain Adaptation (UDA), in which information learned from a labeled synthetic dataset is generalized to an unlabeled real dataset for a similar task. Synthetic datasets whose semantic annotations are collected automatically from computer games, such as GTA5 and SYNTHIA, greatly reduce the manual labeling effort. The better a model trained this way performs on the real dataset, the smaller the gap between the distributions of the training and testing domains. We propose an architecture that integrates multiple UDA methods. First, a Generative Adversarial Network (GAN) is used for data augmentation. Second, a semi-supervised entropy loss on pixel-wise predictions is applied in two forms: direct entropy minimization and indirect entropy minimization. Direct minimization combines the maximum squares loss with image-wise weighting to address class imbalance. Indirect minimization introduces an adversarial loss on the self-information maps to strengthen the spatial information between local semantics. Experiments on two challenging synthetic-to-real benchmarks, GTA5→Cityscapes and SYNTHIA→Cityscapes, show that the proposed architecture transfers effectively. |
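
The two entropy-based objectives named in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: the function names are hypothetical, each pixel is assumed to be a plain list of softmax probabilities, and the weighting in `weighted_max_squares_loss` (each pixel scaled by how often its predicted class occurs in the image, controlled by an assumed exponent `alpha`) is only one plausible reading of "image-wise weighting". The adversarial training on self-information maps needs a discriminator network and is left out; only the map itself is computed.

```python
import math
from collections import Counter


def self_information(probs, eps=1e-12):
    """Per-class self-information terms -p * log(p) for one pixel."""
    return [-p * math.log(p + eps) for p in probs]


def entropy_loss(pixels):
    """Mean per-pixel Shannon entropy, divided by log(C) so it lies in [0, 1]."""
    num_classes = len(pixels[0])
    total = sum(sum(self_information(probs)) for probs in pixels)
    return total / (len(pixels) * math.log(num_classes))


def max_squares_loss(pixels):
    """Maximum squares loss: -1 / (2N) times the sum of squared probabilities."""
    total = sum(p * p for probs in pixels for p in probs)
    return -total / (2 * len(pixels))


def weighted_max_squares_loss(pixels, alpha=0.2):
    """Maximum squares loss with an assumed image-wise class weighting.

    A pixel whose predicted class covers N_c of the N pixels in the image
    is weighted by 1 / (N_c**alpha * N**(1 - alpha)), so rarely predicted
    classes contribute more. With alpha = 0 this reduces to max_squares_loss.
    """
    n = len(pixels)
    # Predicted class of each pixel = index of its largest probability.
    predicted = [max(range(len(probs)), key=probs.__getitem__) for probs in pixels]
    counts = Counter(predicted)
    total = 0.0
    for probs, cls in zip(pixels, predicted):
        weight = 1.0 / (counts[cls] ** alpha * n ** (1 - alpha))
        total += weight * sum(p * p for p in probs)
    return -total / 2
```

Minimizing either loss on unlabeled target-domain pixels pushes predictions toward confident, low-entropy outputs: a uniform prediction such as `[0.5, 0.5]` gives the highest entropy and the highest (least negative) maximum squares loss, while a one-hot prediction gives the lowest of both. A real implementation would apply the same formulas to batched tensors in a deep learning framework.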