References
[1] D. E. Rumelhart, G. E. Hinton, and R. J. Williams, “Learning representations by
back-propagating errors,” Nature, vol. 323, no. 6088, p. 533, 1986.
[2] M. Jaderberg, W. M. Czarnecki, S. Osindero, O. Vinyals, A. Graves, D. Silver,
and K. Kavukcuoglu, “Decoupled neural interfaces using synthetic gradients,” in
Proceedings of the 34th International Conference on Machine Learning - Volume 70,
pp. 1627–1635, JMLR.org, 2017.
[3] S. Hochreiter, Y. Bengio, P. Frasconi, and J. Schmidhuber, “Gradient flow in recurrent
nets: The difficulty of learning long-term dependencies,” in A Field Guide to
Dynamical Recurrent Neural Networks (S. C. Kremer and J. F. Kolen, eds.), IEEE
Press, 2001.
[4] X. Glorot, A. Bordes, and Y. Bengio, “Deep sparse rectifier neural networks,” in
Proceedings of the Fourteenth International Conference on Artificial Intelligence and
Statistics, pp. 315–323, 2011.
[5] S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural Computation,
vol. 9, no. 8, pp. 1735–1780, 1997.
[6] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,”
in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition,
pp. 770–778, 2016.
[7] S. Ioffe and C. Szegedy, “Batch normalization: Accelerating deep network training
by reducing internal covariate shift,” in Proceedings of the 32nd International
Conference on Machine Learning - Volume 37, ICML’15, pp. 448–456, JMLR.org, 2015.
[8] R. Pascanu, T. Mikolov, and Y. Bengio, “On the difficulty of training recurrent
neural networks,” in International Conference on Machine Learning, pp. 1310–1318,
2013.
[9] F. Crick, “The recent excitement about neural networks,” Nature, vol. 337, no. 6203,
pp. 129–132, 1989.
[10] D. Balduzzi, H. Vanchinathan, and J. M. Buhmann, “Kickback cuts backprop’s
red-tape: Biologically plausible credit assignment in neural networks,” in AAAI,
pp. 485–491, 2015.
[11] T. P. Lillicrap, D. Cownden, D. B. Tweed, and C. J. Akerman, “Random synaptic
feedback weights support error backpropagation for deep learning,” Nature Communications,
vol. 7, p. 13276, 2016.
[12] A. Nøkland, “Direct feedback alignment provides learning in deep neural networks,”
in Advances in Neural Information Processing Systems, pp. 1037–1045, 2016.
[13] A. G. Ororbia, A. Mali, D. Kifer, and C. L. Giles, “Conducting credit assignment by
aligning local representations,” arXiv preprint arXiv:1803.01834, 2018.
[14] A. G. Ororbia and A. Mali, “Biologically motivated algorithms for propagating local
target representations,” arXiv preprint arXiv:1805.11703, 2018.
[15] S. Bartunov, A. Santoro, B. Richards, L. Marris, G. E. Hinton, and T. Lillicrap,
“Assessing the scalability of biologically-motivated deep learning algorithms and architectures,”
in Advances in Neural Information Processing Systems, pp. 9390–9400,
2018.
[16] A. Nøkland and L. H. Eidnes, “Training neural networks with local error signals,” in
ICML, vol. 97 of Proceedings of Machine Learning Research, pp. 4839–4850, PMLR,
2019.
[17] Y. Bengio, “How auto-encoders could provide credit assignment in deep networks
via target propagation,” arXiv preprint arXiv:1407.7906, 2014.
[18] D.-H. Lee, S. Zhang, A. Fischer, and Y. Bengio, “Difference target propagation,”
in Joint European Conference on Machine Learning and Knowledge Discovery in
Databases, pp. 498–515, Springer, 2015.
[19] L. van der Maaten and G. Hinton, “Visualizing data using t-SNE,” Journal of Machine
Learning Research, vol. 9, no. Nov, pp. 2579–2605, 2008.
[20] K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale
image recognition,” in ICLR (Y. Bengio and Y. LeCun, eds.), 2015.
[21] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, “Gradient-based learning applied
to document recognition,” Proceedings of the IEEE, vol. 86, no. 11, pp. 2278–2324,
1998.
[22] A. Krizhevsky and G. Hinton, “Learning multiple layers of features from tiny images,”
tech. rep., Citeseer, 2009.
[23] S. Sabour, N. Frosst, and G. E. Hinton, “Dynamic routing between capsules,” in
Advances in Neural Information Processing Systems, pp. 3856–3866, 2017.
[24] M. Michael and W.-C. Lin, “Experimental study of information measure and
inter-intra class distance ratios on feature selection and orderings,” IEEE Transactions
on Systems, Man, and Cybernetics, no. 2, pp. 172–181, 1973.
[25] Y. Luo, Y. Wong, M. Kankanhalli, and Q. Zhao, “G-softmax: Improving intraclass
compactness and interclass separability of features,” IEEE Transactions on Neural
Networks and Learning Systems, 2019.
[26] Y. Bengio, D.-H. Lee, J. Bornschein, T. Mesnard, and Z. Lin, “Towards biologically
plausible deep learning,” arXiv preprint arXiv:1502.04156, 2015.
[27] G. Taylor, R. Burmeister, Z. Xu, B. Singh, A. Patel, and T. Goldstein, “Training
neural networks without gradients: A scalable ADMM approach,” in International
Conference on Machine Learning, pp. 2722–2731, 2016.
[28] Z. Huo, B. Gu, Q. Yang, and H. Huang, “Decoupled parallel backpropagation with
convergence guarantee,” in ICML, vol. 80 of Proceedings of Machine Learning
Research, pp. 2103–2111, PMLR, 2018.
[29] Z. Huo, B. Gu, and H. Huang, “Training neural networks using features replay,” in
Advances in Neural Information Processing Systems, pp. 6659–6668, 2018.
[30] H. Mostafa, V. Ramesh, and G. Cauwenberghs, “Deep supervised learning using
local errors,” Frontiers in Neuroscience, vol. 12, p. 608, 2018.
[31] D. D. Lee and H. S. Seung, “Learning the parts of objects by non-negative matrix
factorization,” Nature, vol. 401, no. 6755, p. 788, 1999.
[32] R. Raina, A. Battle, H. Lee, B. Packer, and A. Y. Ng, “Self-taught learning: transfer
learning from unlabeled data,” in Proceedings of the 24th International Conference
on Machine Learning, pp. 759–766, ACM, 2007.
[33] A. Coates and A. Y. Ng, “Selecting receptive fields in deep networks,” in Advances
in Neural Information Processing Systems, pp. 2528–2536, 2011.
[34] P. Baldi, “Autoencoders, unsupervised learning, and deep architectures,” in
Proceedings of ICML Workshop on Unsupervised and Transfer Learning, pp. 37–49, 2012.