dc.description.abstract | Recent advances in self-supervised learning have shown promise as an alternative to supervised learning, particularly for addressing its critical shortcomings: the need for abundant labeled data and the inability to leverage prior knowledge and skills. Self-supervised learning involves pre-training deep neural networks on pretext tasks using readily available, unlabeled data and then fine-tuning them on downstream tasks of interest, requiring less labeled data than supervised learning. Notably, self-supervised learning has demonstrated success in diverse domains, including text, vision, and speech.
In this thesis, we present several novel self-supervised learning methods for visual representation learning that improve the performance of multiple computer vision downstream tasks. These methods are designed to leverage the input data itself to generate learning targets. Our first method, HAPiCLR, leverages pixel-level information from an object's contextual representation with a contrastive learning objective, allowing it to learn more robust and efficient image representations for downstream tasks. The second method, HARL, introduces a heuristic attention-based approach that maximizes the abstract object-level embedding in vector space, resulting in higher-quality semantic representations. Finally, the MVMA framework combines multiple augmentation pipelines; by leveraging both global and local information from each training sample, it can explore a vast range of image appearances. This approach yields representations that are not only scale-invariant but also invariant to nuisance factors, making them more robust and efficient for downstream tasks.
These methods have notably improved performance on tasks such as image classification, object detection, and semantic segmentation. They demonstrate the ability of self-supervised algorithms to capture high-level image properties, thereby enhancing the efficiency of deep neural networks in various computer vision tasks. This thesis not only introduces new learning algorithms but also provides a comprehensive analysis of self-supervised representations and the distinct factors that differentiate various models. Overall, it presents a suite of innovative, adaptable, and efficient approaches to self-supervised learning for image representation, significantly boosting the robustness and effectiveness of the learned features. | en_US |