Name: Cheng-Kai Wang (王承凱)
Department: Graduate Institute of Software Engineering (軟體工程研究所)
Thesis title: Decomposing End-to-End Backpropagation Based on SCPL (利用 SCPL 分解端到端倒傳遞演算法)
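- This electronic thesis is authorized for immediate open access.
- Open-access electronic full text is licensed to users solely for personal, non-profit searching, reading, and printing for the purpose of academic research.
- Please comply with the Copyright Act of the Republic of China. Do not reproduce, distribute, adapt, repost, or broadcast the work without authorization, to avoid violating the law.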
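Abstract (Chinese, translated) Backpropagation (BP) is the cornerstone of the algorithms that update the weights of today's deep neural networks, but it is inefficient because of the backward locking problem. This study attempts to solve backward locking and names the proposed method Supervised Contrastive Parallel Learning (SCPL). SCPL uses the supervised contrastive loss as the local objective function of each convolutional layer. Because the local objective functions of the layers are isolated from one another, SCPL can learn the weights of different convolutional layers in parallel. The thesis also compares SCPL with earlier work on neural network parallelization, examines the strengths and limitations of the existing methods, and discusses future research directions on this topic.
Abstract (English) Backpropagation (BP) is the cornerstone of today’s deep learning algorithms for updating the weights in deep neural networks, but it is inefficient, partly because of the backward locking problem. This thesis proposes Supervised Contrastive Parallel Learning (SCPL) to address the issue of backward locking. SCPL uses the supervised contrastive loss as the local objective function for each layer. Because the local objective functions in different layers are isolated, SCPL can learn the weights of different layers in parallel. We compare SCPL with recent works on neural network parallelization. We discuss the advantages and limitations of the existing methods. Finally, we suggest future research directions on neural network parallelization.
Keywords (Chinese, translated) ★ Backpropagation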
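★ Backward locking
★ Supervised contrastive loss
★ Parallelized training
★ Supervised contrastive parallel learning
Keywords (English) ★ Backpropagation
★ backward locking
★ supervised contrastive loss
★ parallel learning
★ supervised contrastive parallel learning
Table of Contents
Abstract (Chinese)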
Abstract (English)
Acknowledgements
Table of Contents
1. Introduction
2. Related Work
3. Model and Method
3.1 Mechanism of Contrastive Learning
3.2 Supervised Contrastive Loss
3.3 Learning Mechanism and Network Architecture
3.4 Inference Function and Hypothesis Space
3.5 Comparison with Other Methods
3.6 Model Pseudocode
4. Experimental Results and Analysis
4.1 Experimental Settings and Implementation Details
4.1.1 Experimental Settings
4.1.2 Implementation Details
4.2 Classification Accuracy
4.2.1 CIFAR-10
4.2.2 CIFAR-100
4.2.3 TinyImageNet-val
4.3 Generalization Test
4.4 Ablation Studies
4.4.1 Data Augmentation
4.4.2 Batch Size
4.4.3 Projection Head
4.5 Discussion
4.5.1 Method Comparison and Analysis
4.5.2 Open Issues
5. Summary
5.1 Conclusion
5.2 Future Work
References
Appendix A: Experiment Code
Advisor: Hung-Hsuan Chen (陳弘軒)
Approval date: 2022-7-19
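The abstract states that SCPL uses the supervised contrastive loss as the local objective of each layer. As a rough illustration of that loss only, the sketch below computes it for one small batch of feature vectors in plain Python. It assumes the commonly used formulation of supervised contrastive learning (L2-normalized features, a temperature parameter, and averaging over the positives of each anchor); the function names, the default temperature, and the toy data are hypothetical, and the exact variant used in the thesis may differ. How SCPL isolates the layers from one another during training is not shown here.

```python
import math


def l2_normalize(vector):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def supervised_contrastive_loss(features, labels, temperature=0.1):
    """Mean supervised contrastive loss over anchors that have a positive.

    features: list of equal-length, non-zero feature vectors (one per sample)
    labels:   list of class labels, same length as features
    The temperature default is an assumption made for this sketch.
    """
    z = [l2_normalize(f) for f in features]
    n = len(z)
    total, anchors = 0.0, 0
    for i in range(n):
        # Positives: other samples in the batch that share the anchor's label.
        positives = [p for p in range(n) if p != i and labels[p] == labels[i]]
        if not positives:
            continue  # an anchor without positives contributes nothing
        # Temperature-scaled cosine similarity to every other sample.
        sims = {
            a: sum(x * y for x, y in zip(z[i], z[a])) / temperature
            for a in range(n)
            if a != i
        }
        # Log-sum-exp over all non-anchor samples, computed stably.
        m = max(sims.values())
        log_denominator = m + math.log(sum(math.exp(s - m) for s in sims.values()))
        # Average negative log-probability of picking each positive.
        total += -sum(sims[p] - log_denominator for p in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0


# Toy batch: two tight groups of 2-D features.
toy_features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
loss_matching = supervised_contrastive_loss(toy_features, [0, 0, 1, 1])
loss_mismatched = supervised_contrastive_loss(toy_features, [0, 1, 0, 1])
```

With labels that match the two feature groups, the loss is close to zero. With labels that cut across the groups, it is much larger, which is the signal a layer trained on this objective would use to pull same-class representations together.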