Master's/Doctoral Thesis 109582607: Detailed Record




Name: Tran Van Nhiem (陳文研)   Department: Department of Computer Science and Information Engineering
Thesis Title: Deep Learning Foundation Model with Self-Supervised Learning
(深度學習基礎模型與自監督學習)
  1. Access rights: the author has agreed to make this electronic thesis openly available immediately.
  2. The open-access full text is licensed only for individual, non-profit retrieval, reading, and printing for the purpose of academic research.
  3. Please observe the relevant provisions of the Copyright Act of the Republic of China (Taiwan); do not reproduce, distribute, adapt, repost, or broadcast this work without authorization.

Abstract (Chinese): Recent developments in self-supervised learning suggest that it may replace traditional supervised learning, in particular because it addresses two problems of the supervised paradigm: the need for large amounts of labeled data and poor generalization across tasks. Self-supervised learning pre-trains deep neural networks on easily obtainable unlabeled data and then fine-tunes them on downstream tasks, requiring far less labeled data than supervised learning. Notably, self-supervised learning has demonstrated success in multiple domains, including text, vision, and speech.
In this thesis, we propose several novel self-supervised learning methods for visual representation learning that improve performance on multiple downstream computer vision tasks. These methods generate learning targets from the input data itself. Our first method, HAPiCLR, exploits pixel-level information from an image's contextual representation together with a contrastive learning objective, allowing it to learn more effective image representations for downstream tasks. The second method, HARL, introduces a heuristic attention-based approach that maximizes abstract object-level embeddings in the vector space, producing higher-quality semantic representations. Finally, the MVMA framework combines inputs from multiple augmentation pipelines and exploits both the global and the local information in every training sample, allowing it to explore a wide range of image appearances; the resulting representations are highly robust to images at different scales, generalize better to downstream tasks, and make training more efficient.
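As a concrete illustration of the multi-view idea, the sketch below builds an MVMA-style view generator with torchvision: a few large global crops plus several small local crops, each with its own augmentation pipeline. The function name, crop sizes, view counts, and augmentation parameters are illustrative assumptions, not the settings used in Chapter V.

```python
from torchvision import transforms

# Hypothetical MVMA-style view generator (names and parameters are
# illustrative assumptions): a few large "global" crops plus several
# small "local" crops, each with its own augmentation pipeline.
def make_multicrop(n_global=2, n_local=4):
    global_aug = transforms.Compose([
        transforms.RandomResizedCrop(224, scale=(0.4, 1.0)),   # large crop
        transforms.RandomHorizontalFlip(),
        transforms.ColorJitter(0.4, 0.4, 0.2, 0.1),
        transforms.ToTensor(),
    ])
    local_aug = transforms.Compose([
        transforms.RandomResizedCrop(96, scale=(0.05, 0.4)),   # small crop
        transforms.RandomHorizontalFlip(),
        transforms.RandomGrayscale(p=0.2),
        transforms.ToTensor(),
    ])
    def views(img):  # img: a PIL image
        return ([global_aug(img) for _ in range(n_global)] +
                [local_aug(img) for _ in range(n_local)])
    return views

# Usage: pass every view through the encoder and compare global-global
# and global-local pairs in the self-supervised loss.
```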
These methods significantly improve performance on tasks such as image classification, object detection, and semantic segmentation. They demonstrate the ability of self-supervised learning to extract image features, thereby improving the effectiveness and efficiency of deep neural networks across a variety of computer vision tasks. Beyond introducing new learning algorithms, this thesis also provides a comprehensive analysis of self-supervised representations, revealing the factors that distinguish different models. Overall, it presents a set of innovative, efficient, and broadly generalizable self-supervised learning methods that help self-supervised models generalize better to downstream tasks.
Abstract (English): Recent advances in self-supervised learning have shown promise as an alternative to supervised learning, particularly for addressing its critical shortcomings: the need for abundant labeled data and the inability to leverage prior knowledge and skills. Self-supervised learning pre-trains deep neural networks on pretext tasks using easily acquirable, unlabeled data and then fine-tunes them on downstream tasks of interest, requiring far less labeled data than supervised learning. Notably, self-supervised learning has demonstrated success in diverse domains, including text, vision, and speech.
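As a rough sketch of this pretrain-then-fine-tune paradigm, the following PyTorch code pre-trains an encoder on unlabeled image pairs under a generic self-supervised objective and then trains only a linear classifier on a small labeled set (the standard linear-evaluation protocol). The names `encoder`, `ssl_loss`, and both data loaders are placeholders rather than the thesis's actual code.

```python
import torch
import torch.nn as nn

# Stage 1: self-supervised pre-training on unlabeled images.
# `encoder` maps images to feature vectors; `ssl_loss` stands in for any
# self-supervised objective (contrastive, distillation, redundancy
# reduction, ...); `unlabeled_loader` yields two augmented views per image.
def pretrain(encoder, ssl_loss, unlabeled_loader, epochs=100, lr=1e-3):
    opt = torch.optim.SGD(encoder.parameters(), lr=lr, momentum=0.9)
    for _ in range(epochs):
        for view1, view2 in unlabeled_loader:
            loss = ssl_loss(encoder(view1), encoder(view2))
            opt.zero_grad()
            loss.backward()
            opt.step()

# Stage 2: linear evaluation on a (much smaller) labeled set.
# The encoder is frozen; only a linear head is trained with labels.
def linear_eval(encoder, labeled_loader, feat_dim, num_classes, epochs=30):
    head = nn.Linear(feat_dim, num_classes)
    opt = torch.optim.SGD(head.parameters(), lr=0.1, momentum=0.9)
    criterion = nn.CrossEntropyLoss()
    encoder.eval()
    for p in encoder.parameters():
        p.requires_grad_(False)
    for _ in range(epochs):
        for images, labels in labeled_loader:
            with torch.no_grad():
                feats = encoder(images)
            loss = criterion(head(feats), labels)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return head
```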
In this thesis, we present several novel self-supervised learning methods for visual representation learning that improve the performance of multiple downstream computer vision tasks. These methods are designed to generate learning targets from the input data itself. Our first method, HAPiCLR, leverages pixel-level information from an object's contextual representation with a contrastive learning objective, allowing it to learn more robust and efficient image representations for downstream tasks. The second method, HARL, introduces a heuristic attention-based approach that maximizes the abstract object-level embedding in vector space, resulting in higher-quality semantic representations. Finally, by combining multiple augmentation pipelines and leveraging both global and local information from each training sample, the MVMA framework can explore a vast range of image appearances. The resulting representations are invariant not only to scale but also to nuisance factors, making them more robust and efficient for downstream tasks.
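For orientation, the contrastive family that HAPiCLR extends is typified by the SimCLR-style NT-Xent objective, sketched below for paired projections of two augmented views of the same batch. This is a generic sketch of the baseline loss only; the pixel-level and heuristic-mask terms specific to HAPiCLR and HARL are defined in Chapters III and IV and are not reproduced here.

```python
import torch
import torch.nn.functional as F

def nt_xent(z1, z2, temperature=0.5):
    """SimCLR-style NT-Xent loss for a batch of paired embeddings.

    z1, z2: (N, D) projections of two augmented views of the same N
    images. Matching rows of z1/z2 are positives; every other sample
    in the batch acts as a negative.
    """
    n = z1.size(0)
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)   # (2N, D), unit norm
    sim = z @ z.t() / temperature                        # scaled cosine similarities
    sim.fill_diagonal_(float('-inf'))                    # exclude self-pairs
    # Row i's positive is row i+N (and vice versa).
    targets = torch.arange(2 * n, device=z.device)
    targets = (targets + n) % (2 * n)
    return F.cross_entropy(sim, targets)

# Usage: z1 = projector(encoder(view1)); z2 = projector(encoder(view2))
```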
These methods notably improve performance on tasks such as image classification, object detection, and semantic segmentation. They demonstrate the ability of self-supervised algorithms to capture high-level image properties, thereby enhancing the efficiency of deep neural networks in various computer vision tasks. This thesis not only introduces new learning algorithms but also provides a comprehensive analysis of self-supervised representations and the distinct factors that differentiate various models. Overall, it presents a suite of innovative, adaptable, and efficient approaches to self-supervised learning of image representations, significantly boosting the robustness and effectiveness of the learned features.
Keywords (Chinese) ★ Self-Supervised Learning
★ Computer Vision
★ Visual Representation Learning
★ Deep Neural Network
★ Image Analysis
★ Feature Learning
Keywords (English) ★ Self-Supervised Learning
★ Deep Learning Foundation Model
★ Computer Vision Foundation Model
★ Visual Representation Learning
★ Deep Neural Network
★ Image Processing
Table of Contents
List of Figures
List of Tables
List of Abbreviations
Chapter I. Introduction
1-1. Introduction
1-2. Thesis Contributions
1-3. Chapter Guide
Chapter II. Self-Supervised Learning: History, Development, and Current State
2-1. Representation Learning
2-1-1. Foundation Model Representation Learning via Supervised Learning
2-1-2. Foundation Model Representation Learning via Self-Supervised Learning
2-2. History and Evolution of Self-Supervised Learning
2-3. Main Categories of Self-Supervised Learning
2-3-1. Contrastive Learning Methods
2-3-2. Predictive Learning and Distillation-Based Methods
2-3-3. Redundancy Reduction Methods
2-3-4. Reconstruction-Based Self-Supervised Methods
2-3-5. Generative SSL Methods
2-4. Research Gaps and Limitations
Chapter III. Self-Supervised Contrastive Learning at the Pixel Level
3-1. Introduction
3-2. Related Work
3-3. Methodology
3-4. Implementation Details
3-4-1. Dataset and Image Augmentation
3-4-2. Neural Network Architecture
3-4-3. Optimization Objective
3-5. Evaluation Protocol
3-5-1. Performance with Linear Evaluation and Semi-Supervised Learning on the ImageNet Dataset
3-5-2. Transfer Learning to Other Downstream Tasks
3-6. Ablation and Analysis
3-6-1. Mask Cropping Strategies
3-6-2. Objective Loss Functions
3-6-3. Batch Size
3-6-4. Projection Head
3-7. Chapter Summary
3-8. Supplementary Section
3-8-A. Implementation Details
3-8-A-1. Heuristic Mask Proposal Generator
3-8-A-2. Implementation: Data Augmentation
3-8-B. Evaluation on ImageNet and Transfer Learning
3-8-B-1. Linear Evaluation and Semi-Supervised Protocol on ImageNet
3-8-B-2. Transfer Learning
Chapter IV. Heuristic Attention Representation Learning for Predictive Self-Supervised Pretraining
4-1. Introduction
4-2. Related Work
4-3. Methods
4-3-1. HARL Framework
4-3-2. Heuristic Binary Mask
4-4. Experiments
4-5. Evaluation Protocol
4-5-1. Linear Evaluation and Semi-Supervised Learning on the ImageNet Dataset
4-5-2. Transfer Learning to Other Downstream Tasks
4-6. Ablation and Analysis
4-6-1. The Output Spatial Feature Map (Size and Dimension)
4-6-2. Objective Loss Functions
4-6-2-1. Mask Loss
4-6-2-2. Hybrid Loss
4-6-2-3. Mask Loss versus Hybrid Loss
4-6-3. The Impact of Heuristic Mask Quality
4-7. Conclusion
4-8. Supplementary Implementation Details
4-8-1. Implementation: Data Augmentation
4-8-2. Implementation: Masking Feature
4-8-3. Evaluation on ImageNet and Transfer Learning
4-8-3-1. Linear Evaluation and Semi-Supervised Protocol on ImageNet
4-8-3-2. Transfer via Linear Classification and Fine-Tuning
4-8-3-3. Transfer Learning to Other Vision Tasks
4-8-4. Heuristic Mask Proposal Methods
4-8-4-1. Heuristic Binary Mask Generation Using DRFI
4-8-4-2. Heuristic Binary Mask Generation Using Unsupervised Deep Learning
Chapter V. Multi-View and Multi-Augmentation for Self-Supervised Visual Representation Learning
5-1. Introduction
5-2. Related Work
5-2-1. Self-Supervised Learning
5-2-2. Cropping Strategy
5-2-3. Multi-Cropping
5-2-4. Data Augmentation Search
5-3. Methodology
5-3-1. Multi-Cropping
5-3-2. Multi-Data Augmentation
5-3-3. Loss Function
5-4. Experiments
5-4-1. SSL Pre-Training Setup
5-4-2. Evaluation Protocol and Main Results
5-4-2-1. Evaluation on ImageNet
5-4-2-2. Evaluation on Multiple Natural Image Classification Tasks
5-4-2-3. Evaluation on Downstream Task Transfer
5-4-2-4. Discovering Semantic Scene Layouts by Observing the Self-Attention Map
5-5. Ablation Study
5-5-1. Global- and Local-View Crop Ratio and Resolution
5-5-2. Number of Cropped Views
5-5-3. Number of Augmentation Strategies
5-5-4. Global- and Local-View Loss
5-6. Supplementary Implementation Details
5-6-1. Implementation of MVMA Multi-Data Augmentation
5-7. Conclusion
Chapter VI. Conclusion
6-1. Summary
6-2. Discussion
6-2-1. Implications and Applications of Self-Supervised Learning
6-2-2. Limitations
6-3. Future Directions
6-3-1. Improving the Quality of Representations
6-3-2. Building Self-Supervised Multi-Modal Models
6-3-3. Exploring New Self-Supervised Application Domains
Bibliography
Advisors: Jia-Ching Wang (王家慶), Yung-Hui Li (栗永徽)   Date of Approval: 2024-01-16