dc.description.abstract | Training a high-accuracy deep-learning model depends on various factors, such as the model architecture and the training method. In addition, a large amount of high-quality labeled data is necessary. However, collecting such large-scale, high-quality datasets is often prohibitively expensive, which becomes a barrier to training a high-accuracy model under the supervised learning framework. Recently, the concept of self-supervised learning has been proposed: a deep learning model can be pre-trained on unlabeled data and then fine-tuned on a small labeled dataset to reach higher accuracy. The aforementioned issue is therefore alleviated by applying the self-supervised learning framework.
In self-supervised learning, most previous works compute the contrastive loss on features extracted from the entire image. Such instance-level measurements are suitable for classification tasks, but they are not ideal for tasks that require pixel-level information, such as object detection and instance segmentation. Therefore, we propose a pixel-level contrastive learning method based on mask attention, called Heuristic Attention Pixel-Level Contrastive Learning (HAPiCL). In HAPiCL, we generate a binary mask through an unsupervised method to split the input image into foreground and background features. During training, the model measures the pixel-level contrastive loss between the foreground and background features. This method yields better performance in object detection as well as instance segmentation. | en_US |
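As a rough illustration of the idea only: the abstract does not give implementation details, so the function names, the mask source, and the InfoNCE-style loss form below are assumptions rather than HAPiCL's actual formulation. The sketch pools features inside and outside a binary foreground mask for two augmented views and contrasts matching regions against mismatched ones (PyTorch).

import torch
import torch.nn.functional as F

def masked_contrastive_loss(feat_a, feat_b, mask, temperature=0.1):
    # feat_a, feat_b: (B, C, H, W) feature maps from two augmented views.
    # mask:           (B, 1, H, W) binary mask (1 = foreground, 0 = background),
    #                 e.g. obtained from an unsupervised heuristic (assumed here).
    def pool(feat, m):
        # Average-pool features over the masked region.
        return (feat * m).sum(dim=(2, 3)) / m.sum(dim=(2, 3)).clamp(min=1.0)

    fg_a, bg_a = pool(feat_a, mask), pool(feat_a, 1.0 - mask)
    fg_b, bg_b = pool(feat_b, mask), pool(feat_b, 1.0 - mask)

    fg_a, bg_a = F.normalize(fg_a, dim=1), F.normalize(bg_a, dim=1)
    fg_b, bg_b = F.normalize(fg_b, dim=1), F.normalize(bg_b, dim=1)

    # Foreground of view A should agree with foreground of view B (positive)
    # and disagree with background of view B (negative); likewise for background.
    pos_fg = torch.exp((fg_a * fg_b).sum(dim=1) / temperature)
    neg_fg = torch.exp((fg_a * bg_b).sum(dim=1) / temperature)
    loss_fg = -torch.log(pos_fg / (pos_fg + neg_fg))

    pos_bg = torch.exp((bg_a * bg_b).sum(dim=1) / temperature)
    neg_bg = torch.exp((bg_a * fg_b).sum(dim=1) / temperature)
    loss_bg = -torch.log(pos_bg / (pos_bg + neg_bg))

    return (loss_fg + loss_bg).mean()

Note that this sketch contrasts region-pooled features, which is a simplification; a truly pixel-level loss would compare individual feature-map locations under the same foreground/background assignment.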