Thesis 106522127: Detailed Record




Author: Yu-Wei Kao (高聿緯)    Department: Computer Science and Information Engineering
Thesis title: Associated Learning: Decomposing End-to-end Backpropagation based on Auto-encoders and Target Propagation
(Chinese title: 關聯式學習:利用自動編碼器與目標傳遞法分解端到端倒傳遞演算法)
Related theses
★ Predicting users' personal information and personality traits from web browsing histories
★ Predicting changes in users' browsing behavior before special holidays with a multi-target matrix-factorization method
★ A study of dynamic multi-model fusion analysis
★ Extending clickstreams: analyzing user behaviors missing from clickstreams
★ A click prediction model based on multi-model ranking fusion
★ Analyzing intentional, unintentional, and missing user behaviors in web logs
★ Adjusting word embeddings with synonym and antonym information via a non-directional sequence encoder based on self-attention
★ Exploring when to use deep learning versus simple models for click-through-rate prediction
★ Fault detection for air quality sensors: an anomaly detection framework based on deep spatio-temporal graph models
★ An empirical study of how word embeddings adjusted with synonym/antonym lexicons affect downstream natural language tasks
★ A semi-supervised model incorporating spatio-temporal data, applied to anomaly detection for PM2.5 air pollution sensors
★ Training neural networks by adjusting DropConnect drop probabilities according to weight gradient magnitudes
★ Detecting low-activity anomalous accounts on PTT with graph neural networks
★ Generating personalized trend lines for individual users from a small number of trend-line samples
★ Two novel probabilistic clustering models based on bivariate and multivariate Beta distributions
★ A new technique for updating the parameters of all neural network layers simultaneously, using Associated Learning and pipelining
Full-text access
  1. This electronic thesis has been approved for immediate open access.
  2. The open-access electronic full text is licensed to users only for personal, non-profit retrieval, reading, and printing for the purpose of academic research.
  3. Please comply with the Copyright Act of the Republic of China (Taiwan); do not reproduce, distribute, adapt, repost, or broadcast the work without authorization.

Abstract (Chinese) Backpropagation is widely used in deep learning, but because of backward locking and vanishing/exploding gradients it is neither efficient nor stable, and these problems become more pronounced in deeper architectures. Moreover, updating all of a network's parameters toward a single objective is not biologically plausible.
In this thesis we propose a novel, biologically inspired learning framework called Associated Learning. It modularizes the original neural network into small components, each with its own local objective. Because these objectives are mutually independent, Associated Learning can train the parameters of different components independently and simultaneously.
Surprisingly, the accuracy achieved with Associated Learning is comparable to that of conventional backpropagation, which fits the target directly. Moreover, probably because the gradient flow within each component is short, Associated Learning can also train deep networks that use sigmoid as the activation function, whereas training such networks with backpropagation easily suffers from vanishing gradients.
By measuring inter-class and intra-class distances in the hidden layers and visualizing them with t-SNE, we also show quantitatively and qualitatively that Associated Learning produces better metafeatures.
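The modular training idea summarized above can be sketched in a few lines of code. The PyTorch snippet below is only a rough, hypothetical illustration with made-up layer sizes and MSE for both local losses; it follows the general recipe (an input-side encoder, a target-side autoencoder, and a bridge per component, each trained on its own local loss) rather than the thesis's exact architecture, which is available through the source code listed in Appendix A.

import torch
import torch.nn as nn

class ALComponent(nn.Module):
    # One component: f encodes the input-side signal, g/h form the target-side
    # autoencoder, and b bridges the input code to the target code.
    # All dimensions here are illustrative assumptions.
    def __init__(self, in_dim, hid_dim, t_in_dim, t_hid_dim):
        super().__init__()
        self.f = nn.Sequential(nn.Linear(in_dim, hid_dim), nn.Sigmoid())
        self.b = nn.Linear(hid_dim, t_hid_dim)
        self.g = nn.Sequential(nn.Linear(t_in_dim, t_hid_dim), nn.Sigmoid())
        self.h = nn.Linear(t_hid_dim, t_in_dim)
        self.mse = nn.MSELoss()

    def forward(self, s, t):
        s_next = self.f(s)
        t_next = self.g(t)
        # Local objectives: associate the bridged input code with the encoded
        # target (associated loss) and keep the target code invertible (inverse loss).
        local_loss = self.mse(self.b(s_next), t_next) + self.mse(self.h(t_next), t)
        # detach() stops gradients at the component boundary, so each
        # component's parameters can be updated independently and in parallel.
        return s_next.detach(), t_next.detach(), local_loss

# Toy setup: three components, 784-dim inputs (e.g. flattened MNIST digits),
# 10-dim one-hot targets; all sizes are placeholders.
dims_s = [784, 256, 128, 64]
dims_t = [10, 32, 32, 32]
components = [ALComponent(dims_s[i], dims_s[i + 1], dims_t[i], dims_t[i + 1]) for i in range(3)]
optimizers = [torch.optim.Adam(c.parameters(), lr=1e-3) for c in components]

x = torch.randn(8, 784)                        # a fake mini-batch
y = torch.eye(10)[torch.randint(0, 10, (8,))]  # fake one-hot labels

s, t = x, y
for comp, opt in zip(components, optimizers):
    s, t, local_loss = comp(s, t)
    opt.zero_grad()
    local_loss.backward()  # the gradient flow never leaves this component
    opt.step()

At inference time, a model of this kind would chain the input-side encoders and then decode back through the target-side decoders; that part is omitted here for brevity.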
Abstract (English) Backpropagation has been widely used in deep learning approaches, but it is inefficient and sometimes unstable because of backward locking and vanishing/exploding gradient problems, especially when the gradient flow is long. Additionally, updating all edge weights based on a single objective seems biologically implausible. In this paper, we introduce a novel biologically motivated learning structure called Associated Learning, which modularizes the network into smaller components, each of which has a local objective. Because the objectives are mutually independent, Associated Learning can learn the parameters independently and simultaneously when these parameters belong to different components. Surprisingly, training deep models by Associated Learning yields accuracies comparable to those of models trained using typical backpropagation, which aims at fitting the target variable directly. Moreover, probably because the gradient flow of each component is short, deep networks can still be trained with Associated Learning even when some of the activation functions are sigmoid, a situation that usually results in the vanishing gradient problem when using typical backpropagation. We also found that Associated Learning generates better metafeatures, which we demonstrated both quantitatively (via inter-class and intra-class distance comparisons in the hidden layers) and qualitatively (by visualizing the hidden layers using t-SNE).
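The metafeature comparison mentioned at the end of the abstract can be reproduced in outline as follows. This is a hypothetical sketch using NumPy and scikit-learn with random placeholder features; the exact distance definitions and plotting details used in Section 3.3 may differ.

import numpy as np
from sklearn.manifold import TSNE

def intra_inter_class_distances(features, labels):
    # Mean distance from samples to their own class centroid (intra-class)
    # versus mean distance between different class centroids (inter-class).
    classes = np.unique(labels)
    centroids = np.stack([features[labels == c].mean(axis=0) for c in classes])
    intra = np.mean([
        np.linalg.norm(features[labels == c] - centroids[i], axis=1).mean()
        for i, c in enumerate(classes)
    ])
    pairwise = np.linalg.norm(centroids[:, None, :] - centroids[None, :, :], axis=-1)
    inter = pairwise[np.triu_indices(len(classes), k=1)].mean()
    return intra, inter

# Placeholder hidden-layer activations and labels; in practice these would be
# taken from the hidden layers of a trained network.
rng = np.random.default_rng(0)
feats = rng.normal(size=(500, 64))
labs = rng.integers(0, 10, size=500)

intra, inter = intra_inter_class_distances(feats, labs)
print(f"intra-class: {intra:.3f}  inter-class: {inter:.3f}")  # smaller intra and larger inter suggest better-separated metafeatures

# Qualitative view: a 2-D t-SNE projection of the same features.
embedding = TSNE(n_components=2, init="pca", random_state=0).fit_transform(feats)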
Keywords
★ Biologically plausible algorithm
★ Deep learning
★ Parallel computing
★ Modularization
Table of Contents
Abstract (Chinese)
Abstract (English)
Contents
List of Figures
List of Tables
1 Introduction
2 Methodology
  2.1 Preliminaries
    2.1.1 Artificial Neural Network
    2.1.2 Backpropagation
    2.1.3 Models
  2.2 Motivation
  2.3 Associated Loss of Associated Learning
  2.4 Inverse Loss of Associated Learning
  2.5 Bridge of Associated Learning
  2.6 Effective Parameters and Hypothesis Space
3 Experiments
  3.1 Datasets
    3.1.1 MNIST
    3.1.2 CIFAR
  3.2 Testing Accuracy
    3.2.1 MNIST
    3.2.2 CIFAR-10
    3.2.3 CIFAR-100
  3.3 Metafeature Visualization and Quantification
4 Related Work
5 Discussion and Future Works
Bibliography
A Source Code
  A.1 Code link
  A.2 Usage
Advisor: Hung-Hsuan Chen (陳弘軒)    Date of approval: 2019-11-08
