References
1 Tan, M., and Le, Q.V.: ‘EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks’, ArXiv, 2019, abs/1905.11946
2 He, K., Zhang, X., Ren, S., and Sun, J.: ‘Deep Residual Learning for Image Recognition’, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 770-778
3 Vaswani, A., Shazeer, N.M., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, L., and Polosukhin, I.: ‘Attention is All you Need’, ArXiv, 2017, abs/1706.03762
4 Radford, A., Kim, J.W., Xu, T., Brockman, G., McLeavey, C., and Sutskever, I.: ‘Robust Speech Recognition via Large-Scale Weak Supervision’, ArXiv, 2022, abs/2212.04356
5 Abdel-Hamid, O., Mohamed, A.-r., Jiang, H., Deng, L., Penn, G., and Yu, D.: ‘Convolutional Neural Networks for Speech Recognition’, IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2014, 22, pp. 1533-1545
6 Sun, C., Shrivastava, A., Singh, S., and Gupta, A.K.: ‘Revisiting Unreasonable Effectiveness of Data in Deep Learning Era’, 2017 IEEE International Conference on Computer Vision (ICCV), 2017, pp. 843-852
7 Kolesnikov, A., Beyer, L., Zhai, X., Puigcerver, J., Yung, J., Gelly, S., and Houlsby, N.: ‘Big Transfer (BiT): General Visual Representation Learning’, 2019
8 LeCun, Y., Bengio, Y., and Hinton, G.: ‘Deep Learning’, Nature, 2015, 521, pp. 436-444
9 Eslami, S.M.A., Jimenez Rezende, D., Besse, F., Viola, F., Morcos, A.S., Garnelo, M., Ruderman, A., Rusu, A.A., Danihelka, I., Gregor, K., Reichert, D.P., Buesing, L., Weber, T., Vinyals, O., Rosenbaum, D., Rabinowitz, N.C., King, H., Hillier, C., Botvinick, M.M., Wierstra, D., Kavukcuoglu, K., and Hassabis, D.: ‘Neural scene representation and rendering’, Science, 2018, 360, pp. 1204-1210
10 Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., Huang, Z., Karpathy, A., Khosla, A., Bernstein, M.S., Berg, A.C., and Fei-Fei, L.: ‘ImageNet Large Scale Visual Recognition Challenge’, International Journal of Computer Vision, 2015, 115, pp. 211-252
11 Radford, A., Kim, J.W., Hallacy, C., Ramesh, A., Goh, G., Agarwal, S., Sastry, G., Askell, A., Mishkin, P., Clark, J., Krueger, G., and Sutskever, I.: ‘Learning Transferable Visual Models From Natural Language Supervision’, 2021
12 LeCun, Y., and Misra, I.: ‘Self-supervised learning: The dark matter of intelligence’, 2022
13 Chen, T., Kornblith, S., Norouzi, M., and Hinton, G.: ‘A simple framework for contrastive learning of visual representations’ (PMLR, 2020), pp. 1597-1607
14 Grill, J.-B., Strub, F., Altché, F., Tallec, C., Richemond, P., Buchatskaya, E., Doersch, C., Avila Pires, B., Guo, Z., and Gheshlaghi Azar, M.: ‘Bootstrap your own latent - a new approach to self-supervised learning’, Advances in neural information processing systems, 2020, 33, pp. 21271-21284
15 Goyal, P., Caron, M., Lefaudeux, B., Xu, M., Wang, P., Pai, V., Singh, M., Liptchinsky, V., Misra, I., Joulin, A., and Bojanowski, P.: ‘Self-supervised Pretraining of Visual Features in the Wild’, ArXiv, 2021, abs/2103.01988
16 Caron, M., Touvron, H., Misra, I., Jégou, H., Mairal, J., Bojanowski, P., and Joulin, A.: ‘Emerging Properties in Self-Supervised Vision Transformers’, 2021 IEEE/CVF International Conference on Computer Vision (ICCV), 2021, pp. 9630-9640
17 Xu, D., Xiao, J., Zhao, Z., Shao, J., Xie, D., and Zhuang, Y.: ‘Self-Supervised Spatiotemporal Learning via Video Clip Order Prediction’, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019, pp. 10326-10335
18 Alwassel, H., Mahajan, D.K., Torresani, L., Ghanem, B., and Tran, D.: ‘Self-Supervised Learning by Cross-Modal Audio-Video Clustering’, ArXiv, 2019, abs/1911.12667
19 Baevski, A., Zhou, H., Mohamed, A.-r., and Auli, M.: ‘wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations’, ArXiv, 2020, abs/2006.11477
20 Gong, Y., Lai, C.-I., Chung, Y.-A., and Glass, J.R.: ‘SSAST: Self-Supervised Audio Spectrogram Transformer’, 2021
21 Devlin, J., Chang, M.-W., Lee, K., and Toutanova, K.: ‘BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding’, ArXiv, 2019, abs/1810.04805
22 Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., Levy, O., Lewis, M., Zettlemoyer, L., and Stoyanov, V.: ‘RoBERTa: A Robustly Optimized BERT Pretraining Approach’, ArXiv, 2019, abs/1907.11692
23 Xie, Y., Xu, Z., Wang, Z., and Ji, S.: ‘Self-Supervised Learning of Graph Neural Networks: A Unified Review’, IEEE Transactions on Pattern Analysis and Machine Intelligence, 2021, 45, pp. 2412-2429
24 Goyal, P., Mahajan, D.K., Gupta, A.K., and Misra, I.: ‘Scaling and Benchmarking Self-Supervised Visual Representation Learning’, 2019 IEEE/CVF International Conference on Computer Vision (ICCV), 2019, pp. 6390-6399
25 Goyal, P., Duval, Q., Seessel, I., Caron, M., Misra, I., Sagun, L., Joulin, A., and Bojanowski, P.: ‘Vision Models Are More Robust And Fair When Pretrained On Uncurated Images Without Supervision’, ArXiv, 2022, abs/2202.08360
26 Bengio, Y., Courville, A.C., and Vincent, P.: ‘Representation Learning: A Review and New Perspectives’, IEEE Transactions on Pattern Analysis and Machine Intelligence, 2012, 35, pp. 1798-1828
27 Bottou, L.: ‘Large-Scale Machine Learning with Stochastic Gradient Descent’, 2010
28 Rifai, S., Vincent, P., Muller, X., Glorot, X., and Bengio, Y.: ‘Contractive Auto-Encoders: Explicit Invariance During Feature Extraction’, 2011
29 Goldberg, Y., and Levy, O.: ‘word2vec Explained: deriving Mikolov et al.'s negative-sampling word-embedding method’, ArXiv, 2014, abs/1402.3722
30 Xie, J., Girshick, R.B., and Farhadi, A.: ‘Unsupervised Deep Embedding for Clustering Analysis’, ArXiv, 2015, abs/1511.06335
31 Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A.C., and Bengio, Y.: ‘Generative Adversarial Nets’, 2014
32 Larsson, G., Maire, M., and Shakhnarovich, G.: ‘Colorization as a Proxy Task for Visual Understanding’, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017, pp. 840-849
33 Noroozi, M., and Favaro, P.: ‘Unsupervised Learning of Visual Representations by Solving Jigsaw Puzzles’, 2016
34 Gidaris, S., Singh, P., and Komodakis, N.: ‘Unsupervised Representation Learning by Predicting Image Rotations’, ArXiv, 2018, abs/1803.07728
35 Pathak, D., Krähenbühl, P., Donahue, J., Darrell, T., and Efros, A.A.: ‘Context Encoders: Feature Learning by Inpainting’, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 2536-2544
36 Oord, A.v.d., Li, Y., and Vinyals, O.: ‘Representation Learning with Contrastive Predictive Coding’, ArXiv, 2018, abs/1807.03748
37 He, K., Fan, H., Wu, Y., Xie, S., and Girshick, R.B.: ‘Momentum Contrast for Unsupervised Visual Representation Learning’, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020, pp. 9726-9735
38 Caron, M., Misra, I., Mairal, J., Goyal, P., Bojanowski, P., and Joulin, A.: ‘Unsupervised Learning of Visual Features by Contrasting Cluster Assignments’, ArXiv, 2020, abs/2006.09882
39 He, K., Chen, X., Xie, S., Li, Y., Dollár, P., and Girshick, R.B.: ‘Masked Autoencoders Are Scalable Vision Learners’, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022, pp. 15979-15988
40 Bardes, A., Ponce, J., and LeCun, Y.: ‘VICReg: Variance-Invariance-Covariance Regularization for Self-Supervised Learning’, ArXiv, 2021, abs/2105.04906
41 Baevski, A., Hsu, W.-N., Xu, Q., Babu, A., Gu, J., and Auli, M.: ‘data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language’, 2022
42 Baevski, A., Babu, A., Hsu, W.-N., and Auli, M.: ‘Efficient Self-supervised Learning with Contextualized Target Representations for Vision, Speech and Language’, ArXiv, 2022, abs/2212.07525
43 Misra, I., and Maaten, L.v.d.: ‘Self-Supervised Learning of Pretext-Invariant Representations’, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020, pp. 6706-6716
44 Caron, M., Bojanowski, P., Joulin, A., and Douze, M.: ‘Deep Clustering for Unsupervised Learning of Visual Features’, 2018
45 Cuturi, M.: ‘Sinkhorn Distances: Lightspeed Computation of Optimal Transport’, 2013
46 Hinton, G.E., Vinyals, O., and Dean, J.: ‘Distilling the Knowledge in a Neural Network’, ArXiv, 2015, abs/1503.02531
47 Chen, X., and He, K.: ‘Exploring Simple Siamese Representation Learning’, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021, pp. 15745-15753
48 Gidaris, S., Bursuc, A., Puy, G., Komodakis, N., Cord, M., and Pérez, P.: ‘OBoW: Online Bag-of-Visual-Words Generation for Self-Supervised Learning’, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021, pp. 6826-6836
49 Ermolov, A., Siarohin, A., Sangineto, E., and Sebe, N.: ‘Whitening for Self-Supervised Representation Learning’, 2020
50 Zbontar, J., Jing, L., Misra, I., LeCun, Y., and Deny, S.: ‘Barlow Twins: Self-Supervised Learning via Redundancy Reduction’, 2021
51 Radford, A., and Narasimhan, K.: ‘Improving Language Understanding by Generative Pre-Training’, 2018
52 Radford, A., Wu, J., Child, R., Luan, D., Amodei, D., and Sutskever, I.: ‘Language Models are Unsupervised Multitask Learners’, 2019
53 Brown, T.B., Mann, B., Ryder, N., Subbiah, M., Kaplan, J., Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G., Askell, A., Agarwal, S., Herbert-Voss, A., Krueger, G., Henighan, T.J., Child, R., Ramesh, A., Ziegler, D.M., Wu, J., Winter, C., Hesse, C., Chen, M., Sigler, E., Litwin, M., Gray, S., Chess, B., Clark, J., Berner, C., McCandlish, S., Radford, A., Sutskever, I., and Amodei, D.: ‘Language Models are Few-Shot Learners’, ArXiv, 2020, abs/2005.14165
54 Chen, M., Radford, A., Wu, J., Jun, H., Dhariwal, P., Luan, D., and Sutskever, I.: ‘Generative Pretraining From Pixels’, 2020
55 Bao, H., Dong, L., and Wei, F.: ‘BEiT: BERT Pre-Training of Image Transformers’, ArXiv, 2021, abs/2106.08254
56 Kingma, D.P., and Welling, M.: ‘Auto-Encoding Variational Bayes’, CoRR, 2013, abs/1312.6114
57 Sohl-Dickstein, J.N., Weiss, E.A., Maheswaranathan, N., and Ganguli, S.: ‘Deep Unsupervised Learning using Nonequilibrium Thermodynamics’, ArXiv, 2015, abs/1503.03585
58 Ho, J., Jain, A., and Abbeel, P.: ‘Denoising Diffusion Probabilistic Models’, ArXiv, 2020, abs/2006.11239
59 Chen, T., Kornblith, S., Swersky, K., Norouzi, M., and Hinton, G.E.: ‘Big self-supervised models are strong semi-supervised learners’, Advances in neural information processing systems, 2020, 33, pp. 22243-22255
60 Bardes, A., Ponce, J., and LeCun, Y.: ‘VICReg: Variance-invariance-covariance regularization for self-supervised learning’, arXiv preprint arXiv:2105.04906, 2021
61 Bachman, P., Hjelm, R.D., and Buchwalter, W.: ‘Learning representations by maximizing mutual information across views’, Advances in neural information processing systems, 2019, 32
62 Misra, I., and Maaten, L.v.d.: ‘Self-supervised learning of pretext-invariant representations’, 2020, pp. 6707-6717
63 He, K., Fan, H., Wu, Y., Xie, S., and Girshick, R.: ‘Momentum contrast for unsupervised visual representation learning’, 2020, pp. 9729-9738
64 Tian, Y., Sun, C., Poole, B., Krishnan, D., Schmid, C., and Isola, P.: ‘What makes for good views for contrastive learning?’, Advances in Neural Information Processing Systems, 2020, 33, pp. 6827-6839
65 Caron, M., Misra, I., Mairal, J., Goyal, P., Bojanowski, P., and Joulin, A.: ‘Unsupervised learning of visual features by contrasting cluster assignments’, Advances in Neural Information Processing Systems, 2020, 33, pp. 9912-9924
66 Chen, X., and He, K.: ‘Exploring simple siamese representation learning’, 2021, pp. 15750-15758
67 Gidaris, S., Bursuc, A., Puy, G., Komodakis, N., Cord, M., and Pérez, P.: ‘Online bag-of-visual-words generation for unsupervised representation learning’, arXiv preprint arXiv:2012.11552, 2020
68 Zbontar, J., Jing, L., Misra, I., LeCun, Y., and Deny, S.: ‘Barlow twins: Self-supervised learning via redundancy reduction’ (PMLR, 2021), pp. 12310-12320
69 Putri, W.R., Liu, S.-H., Aslam, M.S., Li, Y.-H., Chang, C.-C., and Wang, J.-C.: ‘Self-Supervised Learning Framework toward State-of-the-Art Iris Image Segmentation’, Sensors, 2022, 22, (6), pp. 2133
70 Bromley, J., Guyon, I., LeCun, Y., Säckinger, E., and Shah, R.: ‘Signature verification using a "Siamese" time delay neural network’, Advances in neural information processing systems, 1993, 6
71 Chopra, S., Hadsell, R., and LeCun, Y.: ‘Learning a similarity metric discriminatively, with application to face verification’ (IEEE, 2005), pp. 539-546
72 Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., and Bengio, Y.: ‘Learning deep representations by mutual information estimation and maximization’, arXiv preprint arXiv:1808.06670, 2018
73 Xie, Z., Lin, Y., Zhang, Z., Cao, Y., Lin, S., and Hu, H.: ‘Propagate yourself: Exploring pixel-level consistency for unsupervised visual representation learning’, 2021, pp. 16684-16693
74 Van Gansbeke, W., Vandenhende, S., Georgoulis, S., and Van Gool, L.: ‘Unsupervised semantic segmentation by contrasting object mask proposals’, 2021, pp. 10052-10062
75 Wang, X., Zhang, R., Shen, C., Kong, T., and Li, L.: ‘Dense contrastive learning for self-supervised visual pre-training’, 2021, pp. 3024-3033
76 Iizuka, S., Simo-Serra, E., and Ishikawa, H.: ‘Let there be color! Joint end-to-end learning of global and local image priors for automatic image colorization with simultaneous classification’, ACM Transactions on Graphics (ToG), 2016, 35, (4), pp. 1-11
77 Larsson, G., Maire, M., and Shakhnarovich, G.: ‘Colorization as a proxy task for visual understanding’, 2017, pp. 6874-6883
78 Zhang, R., Isola, P., and Efros, A.A.: ‘Colorful image colorization’ (Springer, 2016), pp. 649-666
79 Doersch, C., Gupta, A., and Efros, A.A.: ‘Unsupervised visual representation learning by context prediction’, 2015, pp. 1422-1430
80 Mundhenk, T.N., Ho, D., and Chen, B.Y.: ‘Improvements to context based self-supervised learning’, 2018, pp. 9339-9348
81 Noroozi, M., and Favaro, P.: ‘Unsupervised learning of visual representations by solving jigsaw puzzles’ (Springer, 2016), pp. 69-84
82 Noroozi, M., Vinjimoor, A., Favaro, P., and Pirsiavash, H.: ‘Boosting self-supervised learning via knowledge transfer’, 2018, pp. 9359-9367
83 Ren, Z., and Lee, Y.J.: ‘Cross-domain self-supervised multi-task feature learning using synthetic imagery’, 2018, pp. 762-771
84 Asano, Y., Patrick, M., Rupprecht, C., and Vedaldi, A.: ‘Labelling unlabelled videos from scratch with multi-modal self-supervision’, Advances in Neural Information Processing Systems, 2020, 33, pp. 4660-4671
85 Caron, M., Bojanowski, P., Joulin, A., and Douze, M.: ‘Deep clustering for unsupervised learning of visual features’, 2018, pp. 132-149
86 Yan, X., Misra, I., Gupta, A., Ghadiyaram, D., and Mahajan, D.: ‘ClusterFit: Improving generalization of visual representations’, 2020, pp. 6509-6518
87 Bojanowski, P., and Joulin, A.: ‘Unsupervised learning by predicting noise’ (PMLR, 2017), pp. 517-526
88 Jenni, S., and Favaro, P.: ‘Self-supervised feature learning by learning to spot artifacts’, 2018, pp. 2733-2742
89 Donahue, J., Krähenbühl, P., and Darrell, T.: ‘Adversarial feature learning’, arXiv preprint arXiv:1605.09782, 2016
90 Donahue, J., and Simonyan, K.: ‘Large scale adversarial representation learning’, Advances in neural information processing systems, 2019, 32
91 Mahendran, A., Thewlis, J., and Vedaldi, A.: ‘Cross pixel optical-flow similarity for self-supervised learning’ (Springer, 2018), pp. 99-116
92 Zhan, X., Pan, X., Liu, Z., Lin, D., and Loy, C.C.: ‘Self-supervised learning via conditional motion propagation’, 2019, pp. 1881-1889
93 Noroozi, M., Pirsiavash, H., and Favaro, P.: ‘Representation learning by learning to count’, 2017, pp. 5898-5906
94 Gidaris, S., Singh, P., and Komodakis, N.: ‘Unsupervised representation learning by predicting image rotations’, arXiv preprint arXiv:1803.07728, 2018
95 Zhang, L., Qi, G.-J., Wang, L., and Luo, J.: ‘AET vs. AED: Unsupervised representation learning by auto-encoding transformations rather than data’, 2019, pp. 2547-2555
96 Chaitanya, K., Erdil, E., Karani, N., and Konukoglu, E.: ‘Contrastive learning of global and local features for medical image segmentation with limited annotations’, Advances in Neural Information Processing Systems, 2020, 33, pp. 12546-12558
97 Hadsell, R., Chopra, S., and LeCun, Y.: ‘Dimensionality reduction by learning an invariant mapping’ (IEEE, 2006), pp. 1735-1742
98 Li, J., Zhou, P., Xiong, C., and Hoi, S.C.: ‘Prototypical contrastive learning of unsupervised representations’, arXiv preprint arXiv:2005.04966, 2020
99 Tian, Y., Krishnan, D., and Isola, P.: ‘Contrastive multiview coding’ (Springer, 2020), pp. 776-794
100 Wu, Z., Xiong, Y., Yu, S.X., and Lin, D.: ‘Unsupervised feature learning via non-parametric instance discrimination’, 2018, pp. 3733-3742
101 Ye, M., Zhang, X., Yuen, P.C., and Chang, S.-F.: ‘Unsupervised embedding learning via invariant and spreading instance feature’, 2019, pp. 6210-6219
102 Zhan, X., Liu, Z., Luo, P., Tang, X., and Loy, C.: ‘Mix-and-match tuning for self-supervised semantic segmentation’, 2018
103 Oord, A.v.d., Li, Y., and Vinyals, O.: ‘Representation learning with contrastive predictive coding’, arXiv preprint arXiv:1807.03748, 2018
104 Chen, X., Fan, H., Girshick, R., and He, K.: ‘Improved baselines with momentum contrastive learning’, arXiv preprint arXiv:2003.04297, 2020
105 Hénaff, O.: ‘Data-efficient image recognition with contrastive predictive coding’ (PMLR, 2020), pp. 4182-4192
106 Zhuang, C., Zhai, A.L., and Yamins, D.: ‘Local aggregation for unsupervised learning of visual embeddings’, 2019, pp. 6002-6012
107 Cao, Y., Xie, Z., Liu, B., Lin, Y., Zhang, Z., and Hu, H.: ‘Parametric instance classification for unsupervised visual feature learning’, Advances in neural information processing systems, 2020, 33, pp. 15614-15624
108 Ioffe, S., and Szegedy, C.: ‘Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift’, ArXiv, 2015, abs/1502.03167
109 Nair, V., and Hinton, G.E.: ‘Rectified Linear Units Improve Restricted Boltzmann Machines’, 2010
110 Nguyen, D.T., Dax, M., Mummadi, C.K., Ngo, T.-P.-N., Nguyen, T.H.P., Lou, Z., and Brox, T.: ‘DeepUSPS: Deep Robust Unsupervised Saliency Prediction With Self-Supervision’, 2019
111 Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.E., Anguelov, D., Erhan, D., Vanhoucke, V., and Rabinovich, A.: ‘Going deeper with convolutions’, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015, pp. 1-9
112 Zhang, S., Liew, J.H., Wei, Y., Wei, S., and Zhao, Y.: ‘Interactive Object Segmentation With Inside-Outside Guidance’, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020, pp. 12231-12241
113 You, Y., Gitman, I., and Ginsburg, B.: ‘Scaling SGD Batch Size to 32K for ImageNet Training’, ArXiv, 2017, abs/1708.03888
114 Loshchilov, I., and Hutter, F.: ‘SGDR: Stochastic Gradient Descent with Warm Restarts’, ArXiv, 2017, abs/1608.03983
115 Goyal, P., Dollár, P., Girshick, R.B., Noordhuis, P., Wesolowski, L., Kyrola, A., Tulloch, A., Jia, Y., and He, K.: ‘Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour’, ArXiv, 2017, abs/1706.02677
116 Everingham, M., Gool, L.V., Williams, C.K.I., Winn, J.M., and Zisserman, A.: ‘The Pascal Visual Object Classes (VOC) Challenge’, International Journal of Computer Vision, 2009, 88, pp. 303-338
117 Ren, S., He, K., Girshick, R.B., and Sun, J.: ‘Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks’, IEEE Transactions on Pattern Analysis and Machine Intelligence, 2015, 39, pp. 1137-1149
118 Lin, T.-Y., Maire, M., Belongie, S.J., Hays, J., Perona, P., Ramanan, D., Dollár, P., and Zitnick, C.L.: ‘Microsoft COCO: Common Objects in Context’, 2014
119 He, K., Gkioxari, G., Dollár, P., and Girshick, R.B.: ‘Mask R-CNN’, IEEE Transactions on Pattern Analysis and Machine Intelligence, 2020, 42, pp. 386-397
120 Bossard, L., Guillaumin, M., and Gool, L.V.: ‘Food-101 - Mining Discriminative Components with Random Forests’, 2014
121 Krizhevsky, A.: ‘Learning Multiple Layers of Features from Tiny Images’, 2009
122 Xiao, J., Hays, J., Ehinger, K.A., Oliva, A., and Torralba, A.: ‘SUN database: Large-scale scene recognition from abbey to zoo’, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2010, pp. 3485-3492
123 Krause, J., Stark, M., Deng, J., and Fei-Fei, L.: ‘3D Object Representations for Fine-Grained Categorization’, 2013 IEEE International Conference on Computer Vision Workshops, 2013, pp. 554-561
124 Cimpoi, M., Maji, S., Kokkinos, I., Mohamed, S., and Vedaldi, A.: ‘Describing Textures in the Wild’, 2014 IEEE Conference on Computer Vision and Pattern Recognition, 2014, pp. 3606-3613
125 Shu, Y., Kou, Z., Cao, Z., Wang, J., and Long, M.: ‘Zoo-Tuning: Adaptive Transfer from a Zoo of Models’, ArXiv, 2021, abs/2106.15434
126 Yang, Q., Zhang, Y., Dai, W., and Pan, S.J.: ‘Transfer learning’ (Cambridge University Press, 2020)
127 You, K., Kou, Z., Long, M., and Wang, J.: ‘Co-Tuning for Transfer Learning’, 2020
128 Misra, I., Shrivastava, A., Gupta, A., and Hebert, M.: ‘Cross-Stitch Networks for Multi-task Learning’, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 3994-4003
129 Li, X., Xiong, H., Xu, C., and Dou, D.: ‘SMILE: Self-Distilled MIxup for Efficient Transfer LEarning’, ArXiv, 2021, abs/2103.13941
130 Tishby, N., and Zaslavsky, N.: ‘Deep learning and the information bottleneck principle’, 2015 IEEE Information Theory Workshop (ITW), 2015, pp. 1-5
131 Shwartz-Ziv, R., and Tishby, N.: ‘Opening the Black Box of Deep Neural Networks via Information’, ArXiv, 2017, abs/1703.00810
132 Amjad, R.A., and Geiger, B.C.: ‘Learning Representations for Neural Network-Based Classification Using the Information Bottleneck Principle’, IEEE Transactions on Pattern Analysis and Machine Intelligence, 2020, 42, pp. 2225-2239
133 Chen, T., Kornblith, S., Norouzi, M., and Hinton, G.E.: ‘A Simple Framework for Contrastive Learning of Visual Representations’, ArXiv, 2020, abs/2002.05709
134 Misra, I., and Maaten, L.v.d.: ‘Self-Supervised Learning of Pretext-Invariant Representations’, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020, pp. 6706-6716
135 Ermolov, A., Siarohin, A., Sangineto, E., and Sebe, N.: ‘Whitening for Self-Supervised Representation Learning’, 2021
136 Caron, M., Touvron, H., Misra, I., Jégou, H., Mairal, J., Bojanowski, P., and Joulin, A.: ‘Emerging Properties in Self-Supervised Vision Transformers’, ArXiv, 2021, abs/2104.14294
137 Hayhoe, M.M., and Ballard, D.H.: ‘Eye movements in natural behavior’, Trends in Cognitive Sciences, 2005, 9, pp. 188-194
138 Borji, A., Sihite, D.N., and Itti, L.: ‘Quantitative Analysis of Human-Model Agreement in Visual Saliency Modeling’, IEEE Transactions on Image Processing, 2013
139 Benois-Pineau, J., and Callet, P.L.: ‘Visual Content Indexing and Retrieval with Psycho-Visual Models’, 2017
140 Awh, E., Armstrong, K.M., and Moore, T.: ‘Visual and oculomotor selection: links, causes and implications for spatial attention’, Trends in Cognitive Sciences, 2006, 10, pp. 124-130
141 Tian, Y., Chen, X., and Ganguli, S.: ‘Understanding self-supervised Learning Dynamics without Contrastive Pairs’, ArXiv, 2021, abs/2102.06810
142 Vincent, P., Larochelle, H., Bengio, Y., and Manzagol, P.-A.: ‘Extracting and composing robust features with denoising autoencoders’, 2008
143 Bojanowski, P., and Joulin, A.: ‘Unsupervised Learning by Predicting Noise’, ArXiv, 2017, abs/1704.05310
144 Noroozi, M., and Favaro, P.: ‘Unsupervised Learning of Visual Representations by Solving Jigsaw Puzzles’, 2016
145 Zhang, R., Isola, P., and Efros, A.A.: ‘Split-Brain Autoencoders: Unsupervised Learning by Cross-Channel Prediction’, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017, pp. 645-654
146 Mundhenk, T.N., Ho, D., and Chen, B.Y.: ‘Improvements to Context Based Self-Supervised Learning’, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2018, pp. 9339-9348
147 Donahue, J., and Simonyan, K.: ‘Large Scale Adversarial Representation Learning’, 2019
148 Bansal, V., Buckchash, H., and Raman, B.: ‘Discriminative Auto-Encoding for Classification and Representation Learning Problems’, IEEE Signal Processing Letters, 2021, 28, pp. 987-991
149 Chen, T., Kornblith, S., Swersky, K., Norouzi, M., and Hinton, G.E.: ‘Big Self-Supervised Models are Strong Semi-Supervised Learners’, ArXiv, 2020, abs/2006.10029
150 Jaiswal, A., Babu, A.R., Zadeh, M.Z., Banerjee, D., and Makedon, F.: ‘A Survey on Contrastive Self-supervised Learning’, ArXiv, 2020, abs/2011.00362
151 He, K., Fan, H., Wu, Y., Xie, S., and Girshick, R.B.: ‘Momentum Contrast for Unsupervised Visual Representation Learning’, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020, pp. 9726-9735
152 Zhang, X., and Maire, M.: ‘Self-Supervised Visual Representation Learning from Hierarchical Grouping’, ArXiv, 2020, abs/2012.03044
153 Jiang, H., Yuan, Z., Cheng, M.-M., Gong, Y., Zheng, N., and Wang, J.: ‘Salient Object Detection: A Discriminative Regional Feature Integration Approach’, International Journal of Computer Vision, 2013, 123, pp. 251-268
154 Kolesnikov, A., Zhai, X., and Beyer, L.: ‘Revisiting Self-Supervised Visual Representation Learning’, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019, pp. 1920-1929
155 Ye, M., Zhang, X., Yuen, P., and Chang, S.-F.: ‘Unsupervised Embedding Learning via Invariant and Spreading Instance Feature’, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019, pp. 6203-6212
156 Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Trischler, A., and Bengio, Y.: ‘Learning deep representations by mutual information estimation and maximization’, ArXiv, 2019, abs/1808.06670
157 Kornblith, S., Shlens, J., and Le, Q.V.: ‘Do Better ImageNet Models Transfer Better?’, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019, pp. 2656-2666
158 Grill, J.-B., Strub, F., Altché, F., Tallec, C., Richemond, P.H., Buchatskaya, E., Doersch, C., Pires, B.A., Guo, Z.D., Azar, M.G., Piot, B., Kavukcuoglu, K., Munos, R., and Valko, M.: ‘Bootstrap your own latent: a new approach to self-supervised learning’. Proc. 34th International Conference on Neural Information Processing Systems, Vancouver, BC, Canada, 2020
159 Chen, X., and He, K.: ‘Exploring Simple Siamese Representation Learning’, 2021
160 Xie, Z., Lin, Y., Zhang, Z., Cao, Y., Lin, S., and Hu, H.: ‘Propagate Yourself: Exploring Pixel-Level Consistency for Unsupervised Visual Representation Learning’, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021, pp. 16679-16688
161 Chen, X., Fan, H., Girshick, R.B., and He, K.: ‘Improved Baselines with Momentum Contrastive Learning’, ArXiv, 2020, abs/2003.04297
162 Hénaff, O.J., Srinivas, A., De Fauw, J., Razavi, A., Doersch, C., Eslami, S.M.A., and van den Oord, A.: ‘Data-Efficient Image Recognition with Contrastive Predictive Coding’, ArXiv, 2020, abs/1905.09272
163 Borji, A., Cheng, M.-M., Jiang, H., and Li, J.: ‘Salient Object Detection: A Benchmark’, IEEE Transactions on Image Processing, 2015, 24, pp. 5706-5722
164 Wang, W., Lai, Q., Fu, H., Shen, J., and Ling, H.: ‘Salient Object Detection in the Deep Learning Era: An In-Depth Survey’, IEEE Transactions on Pattern Analysis and Machine Intelligence, 2021, PP
165 Zou, W., and Komodakis, N.: ‘HARF: Hierarchy-Associated Rich Features for Salient Object Detection’, 2015 IEEE International Conference on Computer Vision (ICCV), 2015, pp. 406-414
166 Zhang, J., Zhang, T., Dai, Y., Harandi, M., and Hartley, R.I.: ‘Deep Unsupervised Saliency Detection: A Multiple Noisy Labeling Perspective’, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2018, pp. 9029-9038
167 Van Gansbeke, W., Vandenhende, S., Georgoulis, S., and Gool, L.V.: ‘Unsupervised Semantic Segmentation by Contrasting Object Mask Proposals’, ArXiv, 2021, abs/2102.06191
168 Chen, T., Kornblith, S., Norouzi, M., and Hinton, G.: ‘A Simple Framework for Contrastive Learning of Visual Representations’. Proc. 37th International Conference on Machine Learning, Proceedings of Machine Learning Research, 2020
169 Caron, M., Misra, I., Mairal, J., Goyal, P., Bojanowski, P., and Joulin, A.: ‘Unsupervised learning of visual features by contrasting cluster assignments’. Proc. 34th International Conference on Neural Information Processing Systems, Vancouver, BC, Canada, 2020
170 Zhao, Z., Zhang, Z., Chen, T., Singh, S., and Zhang, H.: ‘Image Augmentations for GAN Training’, ArXiv, 2020, abs/2006.02595
171 Howard, A.G.: ‘Some Improvements on Deep Convolutional Neural Network Based Image Classification’, CoRR, 2014, abs/1312.5402
172 Cubuk, E.D., Zoph, B., Mané, D., Vasudevan, V., and Le, Q.V.: ‘AutoAugment: Learning Augmentation Strategies From Data’, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019, pp. 113-123
173 Cubuk, E.D., Zoph, B., Shlens, J., and Le, Q.V.: ‘Randaugment: Practical automated data augmentation with a reduced search space’, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2020, pp. 3008-3017
174 Lim, S., Kim, I., Kim, T., Kim, C., and Kim, S.: ‘Fast AutoAugment’, 2019
175 Caron, M., Bojanowski, P., Joulin, A., and Douze, M.: ‘Deep Clustering for Unsupervised Learning of Visual Features’, 2018
176 Richemond, P.H., Grill, J.-B., Altché, F., Tallec, C., Strub, F., Brock, A., Smith, S., De, S., Pascanu, R., and Piot, B.: ‘BYOL works even without batch statistics’, ArXiv, 2020, abs/2010.10241
177 Xie, Z., Zhang, Z., Cao, Y., Lin, Y., Bao, J., Yao, Z., Dai, Q., and Hu, H.: ‘SimMIM: a Simple Framework for Masked Image Modeling’, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022, pp. 9643-9653
178 Zhou, J., Wei, C., Wang, H., Shen, W., Xie, C., Yuille, A.L., and Kong, T.: ‘iBOT: Image BERT Pre-Training with Online Tokenizer’, ArXiv, 2021, abs/2111.07832
179 Oquab, M., Darcet, T., Moutakanni, T., Vo, H.Q., Szafraniec, M., Khalidov, V., Fernandez, P., Haziza, D., Massa, F., El-Nouby, A., Assran, M., Ballas, N., Galuba, W., Howes, R., Huang, P.-Y., Li, S.-W., Misra, I., Rabbat, M.G., Sharma, V., Synnaeve, G., Xu, H., Jégou, H., Mairal, J., Labatut, P., Joulin, A., and Bojanowski, P.: ‘DINOv2: Learning Robust Visual Features without Supervision’, ArXiv, 2023, abs/2304.07193
180 Tran, V.-N., Huang, C.-E., Liu, S., Yang, K.-L., Ko, T., and Li, Y.-h.: ‘Multi-Augmentation for Efficient Self-Supervised Visual Representation Learning’, 2022 IEEE International Conference on Multimedia and Expo Workshops (ICMEW), 2022, pp. 1-4
181 Krizhevsky, A., Sutskever, I., and Hinton, G.E.: ‘ImageNet classification with deep convolutional neural networks’, Communications of the ACM, 2012, 60, pp. 84-90
182 Touvron, H., Vedaldi, A., Douze, M., and Jégou, H.: ‘Fixing the train-test resolution discrepancy’, Advances in neural information processing systems, 2019, 32
183 Jones, D.R.: ‘A Taxonomy of Global Optimization Methods Based on Response Surfaces’, Journal of Global Optimization, 2001, 21, pp. 345-383
184 Reed, C., Metzger, S., Srinivas, A., Darrell, T., and Keutzer, K.: ‘SelfAugment: Automatic Augmentation Policies for Self-Supervised Learning’, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021, pp. 2673-2682
185 Radosavovic, I., Kosaraju, R.P., Girshick, R.B., He, K., and Dollár, P.: ‘Designing Network Design Spaces’, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020, pp. 10425-10433
186 Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., and Houlsby, N.: ‘An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale’, ArXiv, 2021, abs/2010.11929
187 Salimans, T., and Kingma, D.P.: ‘Weight Normalization: A Simple Reparameterization to Accelerate Training of Deep Neural Networks’, 2016
188 Loshchilov, I., and Hutter, F.: ‘Fixing Weight Decay Regularization in Adam’, ArXiv, 2017, abs/1711.05101
189 Chen, X., Xie, S., and He, K.: ‘An Empirical Study of Training Self-Supervised Vision Transformers’, 2021 IEEE/CVF International Conference on Computer Vision (ICCV), 2021, pp. 9620-9629
190 Lin, T.-Y., Dollár, P., Girshick, R.B., He, K., Hariharan, B., and Belongie, S.J.: ‘Feature Pyramid Networks for Object Detection’, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017, pp. 936-944
191 https://github.com/facebookresearch/detectron2, accessed 2023/11/24
192 Lin, T.-Y., Dollár, P., Girshick, R.B., He, K., Hariharan, B., and Belongie, S.J.: ‘Feature Pyramid Networks for Object Detection’, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017, pp. 936-944
193 https://github.com/facebookresearch/detectron, accessed 2023/11/25
194 Li, Y., Mao, H., Girshick, R.B., and He, K.: ‘Exploring Plain Vision Transformer Backbones for Object Detection’, ArXiv, 2022, abs/2203.16527
195 Pont-Tuset, J., Perazzi, F., Caelles, S., Arbeláez, P., Sorkine-Hornung, A., and Gool, L.V.: ‘The 2017 DAVIS Challenge on Video Object Segmentation’, ArXiv, 2017, abs/1704.00675
196 Jabri, A., Owens, A., and Efros, A.A.: ‘Space-Time Correspondence as a Contrastive Random Walk’, ArXiv, 2020, abs/2006.14613
197 Selvaraju, R.R., Das, A., Vedantam, R., Cogswell, M., Parikh, D., and Batra, D.: ‘Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization’, International Journal of Computer Vision, 2017, 128, pp. 336-359