dc.description.abstract | In recent years, the availability and diversity of deep learning methods have flourished, especially in image recognition, speech recognition, natural language processing, image generation, unmanned vehicles, and more. However, most deep learning methods belong to the field of supervised learning. In supervised learning, obtaining a well-performing model requires a large set of labeled data to guide training and testing. A major challenge in practical applications is collecting training samples together with their corresponding labels, and in image semantic segmentation in particular, massive amounts of pixel-level annotation are essential. Data annotation, however, demands a great deal of manual work and is therefore expensive and time-consuming. Another challenge is that there is often a large gap between the feature distributions of the training and testing domains, which leads to a serious loss of performance. Both issues make the training process more difficult.
Therefore, this paper proposes a method based on transfer learning, a subfield of machine learning, to reduce the cost of labeling and mitigate the loss of performance. The core concept of transfer learning is to imitate the way humans learn and to design a model that can apply previously learned knowledge to solve new problems better. The task in this paper belongs to a branch of transfer learning called Unsupervised Domain Adaptation (UDA), which adapts features across different domains that share similar tasks. Automatically collecting semantically annotated synthetic datasets from computer games, such as GTA5 and SYNTHIA, can substantially reduce the labeling effort, and a better transfer result also demonstrates a smaller gap between the feature spaces of the training and testing domains.
Accordingly, we propose an architecture that integrates multiple UDA methods. First, this paper adopts Generative Adversarial Networks (GAN) as a data augmentation method. A semi-supervised entropy loss is then applied to the pixel-wise predictions, using two loss formulations: direct entropy minimization and indirect entropy minimization. The direct method combines the maximum squares loss with per-image class-frequency weighting to address class imbalance, while the indirect method applies an adversarial loss to the self-information map to exploit spatial relations between local semantics. In the experiments, the proposed network architecture, trained on synthetic images and tested on real images in the GTA5→Cityscapes and SYNTHIA→Cityscapes settings, achieves effective transfer performance. | en_US |
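The entropy-based objectives mentioned above can be sketched as follows. This is a minimal NumPy illustration, assuming softmax probability maps of shape (batch, H, W, classes); the function names and formulation are illustrative, not the thesis's actual implementation. The direct term follows the common maximum squares loss form, and the self-information map is the per-class quantity typically fed to a discriminator in indirect (adversarial) entropy minimization.

```python
import numpy as np

def softmax(logits, axis=-1):
    """Numerically stable softmax over the class axis."""
    z = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def entropy_map(probs, eps=1e-12):
    """Pixel-wise Shannon entropy, normalized to [0, 1] by log(C)."""
    num_classes = probs.shape[-1]
    return -(probs * np.log(probs + eps)).sum(axis=-1) / np.log(num_classes)

def max_squares_loss(probs):
    """Direct minimization: mean of -1/2 * sum_c p_c^2 over all pixels.

    Minimizing this pushes predictions toward confident (low-entropy)
    distributions with a gentler gradient on easy pixels than raw entropy.
    """
    return -0.5 * (probs ** 2).sum(axis=-1).mean()

def self_information_map(probs, eps=1e-12):
    """Indirect minimization input: per-class weighted self-information
    -p * log(p), which a discriminator would score in adversarial training."""
    return -probs * np.log(probs + eps)
```

A confident (one-hot) prediction drives `entropy_map` toward 0 and `max_squares_loss` toward its minimum of -0.5 per pixel, whereas a uniform prediction yields normalized entropy 1; the self-information map keeps the class dimension, so spatial structure across classes is preserved for the discriminator.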