Alex Nichol and John Schulman. (2018). Reptile: a scalable meta-learning algorithm. arXiv preprint arXiv:1803.02999.
Andrei A. Rusu, Dushyant Rao, Jakub Sygnowski, Oriol Vinyals, Razvan Pascanu, Simon Osindero, and Raia Hadsell. (2018). Meta-learning with latent embedding optimization. arXiv preprint arXiv:1807.05960.
Antreas Antoniou, Harrison Edwards, and Amos Storkey. (2019). How to train your MAML. In Proceedings of the International Conference on Learning Representations.
Samy Bengio, Yoshua Bengio, Jocelyn Cloutier, and Jan Gecsei. (1992). On the optimization of a synaptic learning rule. In Optimality in Artificial and Biological Neural Networks, pp. 6–8.
Boris Oreshkin, Pau Rodríguez López, and Alexandre Lacoste. (2018). TADAM: Task dependent adaptive metric for improved few-shot learning. In Advances in Neural Information Processing Systems.
Brenden M. Lake, Ruslan Salakhutdinov, and Joshua B. Tenenbaum. (2015). Human-level concept learning through probabilistic program induction. Science, 350(6266):1332–1338.
Chelsea Finn, Pieter Abbeel, and Sergey Levine. (2017). Model-agnostic meta-learning for fast adaptation of deep networks. In Proceedings of the 34th International Conference on Machine Learning, Volume 70, pp. 1126–1135. JMLR.org.
Tristan Deleu, Tobias Würfl, Mandana Samiei, Joseph Paul Cohen, and Yoshua Bengio. (2019). Torchmeta: A meta-learning library for PyTorch.
Harrison Edwards and Amos Storkey. (2017). Towards a neural statistician. In Proceedings of the International Conference on Learning Representations (ICLR).
Robert M. French. (1991). Using semi-distributed representations to overcome catastrophic forgetting in connectionist networks. In Proceedings of the 13th Annual Cognitive Science Society Conference, pp. 173–178. Erlbaum.
Gail A. Carpenter and Stephen Grossberg. (1987). A massively parallel architecture for a self-organizing neural pattern recognition machine. Computer Vision, Graphics, and Image Processing, 37(1):54–115.
Guneet Singh Dhillon, Pratik Chaudhari, Avinash Ravichandran, and Stefano Soatto. (2020). A baseline for few-shot image classification. In Proceedings of the International Conference on Learning Representations (ICLR).
Hanul Shin, Jung Kwon Lee, Jaehong Kim, and Jiwon Kim. (2017). Continual learning with deep generative replay. In Advances in Neural Information Processing Systems, pp. 2990–2999.
Xu He, Jakub Sygnowski, Alexandre Galashov, Andrei A. Rusu, Yee Whye Teh, and Razvan Pascanu. (2019). Task agnostic continual learning via meta learning. arXiv preprint arXiv:1906.05201.
Jake Snell, Kevin Swersky, and Richard Zemel. (2017). Prototypical networks for few-shot learning. In Advances in Neural Information Processing Systems, pp. 4077–4087.
James Kirkpatrick, Razvan Pascanu, Neil Rabinowitz, Joel Veness, Guillaume Desjardins, Andrei A. Rusu, Kieran Milan, John Quan, Tiago Ramalho, Agnieszka Grabska-Barwinska, et al. (2017). Overcoming catastrophic forgetting in neural networks. Proceedings of the National Academy of Sciences, 201611835.
Jason Yosinski, Jeff Clune, Yoshua Bengio, and Hod Lipson. (2014). How transferable are features in deep neural networks? In Advances in Neural Information Processing Systems, pp. 3320–3328.
Gregory Koch. (2015). Siamese neural networks for one-shot image recognition. ICML Deep Learning Workshop.
David Lopez-Paz and Marc'Aurelio Ranzato. (2017). Gradient episodic memory for continual learning. In Advances in Neural Information Processing Systems.
Łukasz Kaiser, Ofir Nachum, Aurko Roy, and Samy Bengio. (2017). Learning to remember rare events. In Proceedings of the International Conference on Learning Representations (ICLR).
Tsendsuren Munkhdalai and Hong Yu. (2017). Meta networks. In Proceedings of the International Conference on Machine Learning (ICML).
Oriol Vinyals, Charles Blundell, Tim Lillicrap, Daan Wierstra, et al. (2016). Matching networks for one shot learning. In Advances in Neural Information Processing Systems, pp. 3630–3638.
Risto Vuorio, Dong-Yeon Cho, Daejoong Kim, and Jiwon Kim. (2018). Meta continual learning. arXiv preprint arXiv:1806.06928.
Sachin Ravi and Hugo Larochelle. (2016). Optimization as a model for few-shot learning. In Proceedings of the International Conference on Learning Representations.
Jürgen Schmidhuber. (1987). Evolutionary principles in self-referential learning. On learning how to learn: The meta-meta-... hook. Diploma thesis, Institut für Informatik, Technische Universität München.
 Spyros Gidaris and Nikos Komodakis. (2018). Dynamic few-shot visual learning without forgetting. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4367–4375.
Sumit Chopra, Raia Hadsell, and Yann LeCun. (2005). Learning a similarity metric discriminatively, with application to face verification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 539–546.
Sylvestre-Alvise Rebuffi, Alexander Kolesnikov, Georg Sperl, and Christoph H. Lampert. (2017). iCaRL: Incremental classifier and representation learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
Sebastian Thrun. (1998). Lifelong learning algorithms. In Learning to Learn, pp. 181–209. Springer.
Paul E. Utgoff. (1986). Shift of bias for inductive concept learning. In Machine Learning: An Artificial Intelligence Approach, 2:107–148.
Wickliffe C. Abraham and Anthony Robins. (2005). Memory retention – the synaptic stability versus plasticity dilemma. Trends in Neurosciences, 28(2):73–78.
Matthew Riemer, Ignacio Cases, Robert Ajemian, Miao Liu, Irina Rish, Yuhai Tu, and Gerald Tesauro. (2019). Learning to learn without forgetting by maximizing transfer and minimizing interference. In Proceedings of the International Conference on Learning Representations.
Zhenguo Li, Fengwei Zhou, Fei Chen, and Hang Li. (2017). Meta-SGD: Learning to learn quickly for few-shot learning. arXiv preprint arXiv:1707.09835.