References
[1] S. Hochreiter, “The vanishing gradient problem during learning recurrent neural nets and problem solutions,” International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, vol. 6, no. 2, pp. 107–116, 1998.
[2] M. Jaderberg, W. M. Czarnecki, S. Osindero, et al., “Decoupled neural interfaces using synthetic gradients,” in International Conference on Machine Learning, PMLR, 2017, pp. 1627–1635.
[3] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 770–778.
[4] C. Szegedy, W. Liu, Y. Jia, et al., “Going deeper with convolutions,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 1–9.
[5] Y.-W. Kao and H.-H. Chen, “Associated learning: Decomposing end-to-end backpropagation based on autoencoders and target propagation,” Neural Computation, vol. 33, no. 1, pp. 174–193, 2021.
[6] D. Y. Wu, D. Lin, V. Chen, and H.-H. Chen, “Associated learning: An alternative to end-to-end backpropagation that works on CNN, RNN, and Transformer,” in International Conference on Learning Representations, 2021.
[7] C.-K. Wang, “Decomposing end-to-end backpropagation based on SCPL,” Master’s thesis, Institute of Software Engineering, National Central University, 2022.
[8] M.-Y. Ho, “Realizing synchronized parameter updating, dynamic layer accumulation, and forward shortcuts in supervised contrastive parallel learning,” Master’s thesis, Department of Computer Science and Information Engineering, National Central University, 2023.
[9] T.-H. Lin, “Enabling simultaneous parameter updates in different layers for a neural network—using associated learning and pipeline,” Master’s thesis, Department of Computer Science and Information Engineering, National Central University, 2023.
[10] A. Nøkland and L. H. Eidnes, “Training neural networks with local error signals,” in International Conference on Machine Learning, PMLR, 2019, pp. 4839–4850.
[11] C.-Y. Lee, S. Xie, P. Gallagher, Z. Zhang, and Z. Tu, “Deeply-supervised nets,” in Artificial Intelligence and Statistics, PMLR, 2015, pp. 562–570.
[12] S. A. Siddiqui, D. Krueger, Y. LeCun, and S. Deny, “Blockwise self-supervised learning at scale,” arXiv preprint arXiv:2302.01647, 2023.
[13] P. Khosla, P. Teterwak, C. Wang, et al., “Supervised contrastive learning,” Advances in Neural Information Processing Systems, vol. 33, pp. 18661–18673, 2020.
[14] S. Ozsoy, S. Hamdan, S. Arik, D. Yuret, and A. Erdogan, “Self-supervised learning with an information maximization criterion,” Advances in Neural Information Processing Systems, vol. 35, pp. 35240–35253, 2022.
[15] L. Jing, P. Vincent, Y. LeCun, and Y. Tian, “Understanding dimensional collapse in contrastive self-supervised learning,” arXiv preprint arXiv:2110.09348, 2021.
[16] X. Chen and K. He, “Exploring simple Siamese representation learning,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 15750–15758.
[17] K. He, H. Fan, Y. Wu, S. Xie, and R. Girshick, “Momentum contrast for unsupervised visual representation learning,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020, pp. 9729–9738.
[18] X. Chen, H. Fan, R. Girshick, and K. He, “Improved baselines with momentum contrastive learning,” arXiv preprint arXiv:2003.04297, 2020.
[19] T. Chen, S. Kornblith, M. Norouzi, and G. Hinton, “A simple framework for contrastive learning of visual representations,” in International Conference on Machine Learning, PMLR, 2020, pp. 1597–1607.
[20] J. Zbontar, L. Jing, I. Misra, Y. LeCun, and S. Deny, “Barlow twins: Self-supervised learning via redundancy reduction,” in International Conference on Machine Learning, PMLR, 2021, pp. 12310–12320.
[21] A. van den Oord, Y. Li, and O. Vinyals, “Representation learning with contrastive predictive coding,” arXiv preprint arXiv:1807.03748, 2018.
[22] T. Chen, S. Kornblith, M. Norouzi, and G. Hinton, “A simple framework for contrastive learning of visual representations,” in International Conference on Machine Learning, PMLR, 2020, pp. 1597–1607.
[23] R. Linsker, “Self-organization in a perceptual network,” Computer, vol. 21, no. 3, pp. 105–117, 1988.
[24] T. M. Cover and J. A. Thomas, Elements of Information Theory (Wiley Series in Telecommunications and Signal Processing). USA: Wiley-Interscience, 2006, ISBN: 0471241954.
[25] M. Tschannen, J. Djolonga, P. K. Rubenstein, S. Gelly, and M. Lucic, “On mutual information maximization for representation learning,” arXiv preprint arXiv:1907.13625, 2019.
[26] A. Bardes, J. Ponce, and Y. LeCun, “VICReg: Variance-invariance-covariance regularization for self-supervised learning,” arXiv preprint arXiv:2105.04906, 2021.
[27] S. Teerapittayanon, B. McDanel, and H.-T. Kung, “BranchyNet: Fast inference via early exiting from deep neural networks,” in 2016 23rd International Conference on Pattern Recognition (ICPR), IEEE, 2016, pp. 2464–2469.
[28] M. Elbayad, J. Gu, E. Grave, and M. Auli, “Depth-adaptive transformer,” arXiv preprint arXiv:1910.10073, 2019.
[29] H. Li, H. Zhang, X. Qi, R. Yang, and G. Huang, “Improved techniques for training adaptive deep networks,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2019, pp. 1891–1900.
[30] W. Zhou, C. Xu, T. Ge, J. McAuley, K. Xu, and F. Wei, “BERT loses patience: Fast and robust inference with early exit,” Advances in Neural Information Processing Systems, vol. 33, pp. 18330–18341, 2020.
[31] J. Xin, R. Tang, Y. Yu, and J. Lin, “BERxiT: Early exiting for BERT with better fine-tuning and extension to regression,” in Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume, 2021, pp. 91–104.
[32] Z. Fei, X. Yan, S. Wang, and Q. Tian, “DeeCap: Dynamic early exiting for efficient image captioning,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 12216–12226.
[33] S. Tang, Y. Wang, Z. Kong, et al., “You need multiple exiting: Dynamic early exiting for accelerating unified vision language model,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 10781–10791.
[34] K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” arXiv preprint arXiv:1409.1556, 2014.
[35] Y. You, I. Gitman, and B. Ginsburg, “Large batch training of convolutional networks,” arXiv preprint arXiv:1708.03888, 2017.
[36] A. Krizhevsky, “One weird trick for parallelizing convolutional neural networks,” arXiv preprint arXiv:1404.5997, 2014.
[37] S. Ioffe and C. Szegedy, “Batch normalization: Accelerating deep network training by reducing internal covariate shift,” in International Conference on Machine Learning, PMLR, 2015, pp. 448–456.
[38] S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural Computation, vol. 9, no. 8, pp. 1735–1780, 1997.
[39] A. Vaswani, N. Shazeer, N. Parmar, et al., “Attention is all you need,” in Advances in Neural Information Processing Systems, I. Guyon, U. V. Luxburg, S. Bengio, et al., Eds., vol. 30, Curran Associates, Inc., 2017.
[40] J. Pennington, R. Socher, and C. D. Manning, “GloVe: Global vectors for word representation,” in Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2014, pp. 1532–1543.