Master's/Doctoral Thesis 110522110: Detailed Record




Name: Ming-Yao Ho (何名曜)    Department: Computer Science and Information Engineering
Thesis Title: 實現監督對比平行學習的參數同步更新、動態累加層、及前向捷徑
(Realizing Synchronized Parameter Updating, Dynamic Layer Accumulation, and Forward Shortcuts in Supervised Contrastive Parallel Learning)
Related Theses
★ Predicting Users' Personal Information and Personality Traits from Web Browsing History
★ A Multi-Objective Prediction Method Based on Matrix Factorization for Predicting Changes in Users' Browsing Behavior before Special Holidays
★ A Study of Dynamic Multi-Model Fusion Analysis
★ Extending Clickstreams: Analyzing User Behaviors Missing from Clickstreams
★ Associated Learning: Decomposing End-to-End Backpropagation with Autoencoders and Target Propagation
★ A Click Prediction Model Fusing Multi-Model Ranking
★ Analyzing Intentional, Unintentional, and Missing User Behaviors in Web Logs
★ Adjusting Word Vectors with Synonym and Antonym Information Using a Non-Directional Sequence Encoder Based on Self-Attention
★ Exploring When to Use Deep Learning versus Simple Models for Click-Through-Rate Prediction
★ Fault Detection for Air-Quality Sensors: An Anomaly Detection Framework Based on Deep Spatio-Temporal Graph Models
★ An Empirical Study of How Word Vectors Adjusted with Synonym/Antonym Lexicons Affect Downstream NLP Tasks
★ A Semi-Supervised Model Combining Spatio-Temporal Data, Applied to Anomaly Detection for PM2.5 Air-Pollution Sensors
★ Training Neural Networks by Adjusting DropConnect Drop Probabilities According to Weight Gradient Magnitudes
★ Detecting Low-Activity Anomalous Accounts on PTT with Graph Neural Networks
★ Generating Personalized Trend Lines for Individual Users from a Few of Their Trend-Line Samples
★ Two Novel Probabilistic Clustering Models Based on Bivariate and Multivariate Beta Distributions
  1. This electronic thesis is authorized for immediate open access.
  2. Electronic full texts that have reached open-access status are authorized only for academic research: personal, non-profit retrieval, reading, and printing.
  3. Please comply with the Copyright Act of the Republic of China (Taiwan); do not reproduce, distribute, adapt, repost, or broadcast this work without authorization.

Abstract (Chinese): End-to-end backpropagation (BP) is a cornerstone of modern deep learning techniques. However, as deep learning networks grow deeper, BP also faces challenges. Supervised Contrastive Parallel Learning (SCPL) is a new approach that decouples BP: through multiple local training objectives and supervised contrastive learning, it converts the single long gradient flow of the original deep network into multiple short gradient flows, and through a pipelined design the parameters in different layers are trained independently. SCPL thereby trains faster than BP, resolving the time inefficiency that backward locking causes during backpropagation.

However, the original SCPL paper neither actually implemented parallelized parameter training nor explored the natural language processing (NLP) domain. This thesis fills these gaps and examines accuracy and parallelized training time on datasets in both the vision and NLP domains, showing that SCPL can serve as a new alternative to BP in both fields. In addition, this research discovered an improved SCPL architecture with the capabilities of dynamic layer accumulation, forward shortcuts, and early exit; this thesis calls the new architecture Dynamic Accumulated Supervised Contrastive Parallel Learning (DASCPL). These properties give DASCPL greater flexibility and adaptability than SCPL while maintaining learning capability consistent with SCPL.
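The supervised contrastive objective that SCPL attaches to each local block follows the loss of Khosla et al. (2020). As an illustrative sketch only (not the thesis's code; the function name, batch shapes, and temperature value are assumptions), the per-batch loss can be written in plain NumPy:

```python
import numpy as np

def supervised_contrastive_loss(z, labels, tau=0.1):
    """Supervised contrastive loss over one batch.
    z: (n, d) embeddings; labels: (n,) integer class labels."""
    z = z / np.linalg.norm(z, axis=1, keepdims=True)       # L2-normalize embeddings
    sim = z @ z.T / tau                                     # temperature-scaled similarities
    n = len(z)
    self_mask = np.eye(n, dtype=bool)
    sim = np.where(self_mask, -np.inf, sim)                 # an anchor never contrasts with itself
    # log-softmax over all other samples in the batch
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    pos = (labels[:, None] == labels[None, :]) & ~self_mask  # same-class pairs are positives
    # negative mean log-probability of positives, averaged over anchors
    per_anchor = -np.where(pos, log_prob, 0.0).sum(axis=1) / np.maximum(pos.sum(axis=1), 1)
    return per_anchor.mean()
```

The loss is small when same-class embeddings cluster together and large when positives are far from their anchors, which is what lets each block train on labels alone, without a gradient from later layers.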
Abstract (English): End-to-end backpropagation (BP) is a cornerstone of modern deep learning techniques. However, as deep learning networks grow deeper, training networks by BP becomes challenging. Supervised Contrastive Parallel Learning (SCPL) is a novel approach that decouples BP through multiple local training objectives and supervised contrastive learning. It transforms the original deep network's long gradient flow into multiple short gradient flows and trains the parameters in different layers independently through a pipelined design. This method achieves faster training speed than BP by addressing the inefficiency caused by backward locking in backpropagation.

However, the original paper on SCPL did not practically implement parallel parameter training, nor did it explore applications in natural language processing (NLP). This thesis fills these gaps and examines the accuracy and parallel training time of SCPL on datasets in both the vision and NLP domains, demonstrating that SCPL can be a new alternative to BP in both. Additionally, we propose an improved SCPL architecture that enables dynamic layer accumulation, forward shortcuts, and early exits, called Dynamic Accumulated Supervised Contrastive Parallel Learning (DASCPL). These features give DASCPL higher flexibility and adaptability than SCPL while maintaining consistent learning capability.
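The decoupled, locally trained blocks and the early-exit behavior described above can be sketched as follows. This is a toy NumPy stand-in under stated assumptions: the blocks are frozen random feature maps, each block trains only its own linear head with a local cross-entropy loss (a simpler stand-in for the thesis's supervised contrastive objective), and no gradient crosses a block boundary. All names and the confidence threshold are illustrative, not the thesis's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(a):
    e = np.exp(a - a.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

# Toy data: two well-separated classes, 4 input features.
X = np.vstack([rng.normal(-2, 1, (40, 4)), rng.normal(2, 1, (40, 4))])
y = np.array([0] * 40 + [1] * 40)
Y = np.eye(2)[y]

# Two frozen random "blocks" stand in for conv/transformer layers.
B1 = rng.normal(size=(4, 8)) / 2
B2 = rng.normal(size=(8, 8)) / 2
h1 = np.tanh(X @ B1)    # block 1 features
h2 = np.tanh(h1 @ B2)   # block 2 features; h1 is treated as detached input,
                        # so no error signal ever flows back into block 1

# Each block's local head trains independently (short gradient flows);
# in SCPL these updates could run in parallel on different devices.
W1 = np.zeros((8, 2))
W2 = np.zeros((8, 2))
for _ in range(300):
    W1 -= 0.5 * h1.T @ (softmax(h1 @ W1) - Y) / len(X)
    W2 -= 0.5 * h2.T @ (softmax(h2 @ W2) - Y) / len(X)

def predict_early_exit(x, threshold=0.9):
    """Exit after block 1 whenever its head is confident enough.
    (For simplicity block 2 is computed for every sample here; a real
    implementation would skip it for samples that exit early.)"""
    f1 = np.tanh(x @ B1)
    p1 = softmax(f1 @ W1)
    exited = p1.max(axis=1) >= threshold
    p2 = softmax(np.tanh(f1 @ B2) @ W2)
    return np.where(exited, p1.argmax(axis=1), p2.argmax(axis=1)), exited
```

Because every block carries its own classifier head, the same structure also supports forward shortcuts and dynamic layer accumulation: blocks can be appended or bypassed without retraining the others.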
Keywords (Chinese) ★ 倒傳遞 (Backpropagation)
★ 反向鎖定 (Backward Locking)
★ 監督對比損失函數 (Supervised Contrastive Loss)
★ 管線化 (Pipelining)
★ 平行化訓練 (Parallel Training)
★ 模型平行化 (Model Parallelism)
★ 前向捷徑 (Forward Shortcuts)
★ 提早退出 (Early Exit)
★ 動態累加層 (Dynamic Layer Accumulation)
★ 監督對比平行學習 (Supervised Contrastive Parallel Learning)
Keywords (English) ★ Backpropagation
★ Backward Locking
★ Supervised Contrastive Loss
★ Pipeline
★ Parallel Learning
★ Model Parallelism
★ Forward Shortcut
★ Early Exit
★ Dynamic Layer Accumulation
★ Supervised Contrastive Parallel Learning
Table of Contents
Abstract (Chinese)
Abstract (English)
Acknowledgments
Contents
1. Introduction
2. Related Work
  2.1 Backward Locking
  2.2 Local Objective Design
  2.3 Contrastive Learning
  2.4 Model Parallelism and Pipelining
  2.5 Summary
3. Methods and Mechanisms
  3.1 The Mechanism of Contrastive Learning
  3.2 The Supervised Contrastive Loss
  3.3 Learning Mechanism and Network Architecture
  3.4 Inference Function and Hypothesis Space
  3.5 Pipelining and Parallelizability
  3.6 SCPL with Dynamic Layer Accumulation
  3.7 Features and Comparison
4. Experiments and Analysis
  4.1 Experiments in the Vision Domain
    4.1.1 Overview
    4.1.2 Implementation Details
    4.1.3 Datasets
    4.1.4 Classification Accuracy
    4.1.5 Training Time Evaluation
  4.2 Experiments in the Natural Language Domain
    4.2.1 Overview
    4.2.2 Implementation Details
    4.2.3 Datasets
    4.2.4 Classification Accuracy
    4.2.5 Training Time Evaluation
  4.3 Parallelization Evaluation
    4.3.1 Strong Scaling
    4.3.2 Weak Scaling
  4.4 Extended SCPL Experiments
    4.4.1 Sensitivity to the Number of Training Epochs
    4.4.2 Sensitivity of SCPL to Network Depth
    4.4.3 Generalization Tests
  4.5 DASCPL Experiments
    4.5.1 Experimental Setup
    4.5.2 Classification Accuracy Evaluation
    4.5.3 Training Time Evaluation
    4.5.4 Forward Shortcuts
    4.5.5 Dynamic Layer Accumulation Experiments
    4.5.6 Ablation Study
  4.6 Discussion
    4.6.1 Differences between Vision and NLP Classification Performance under SCPL
    4.6.2 Analyzing the Main Causes of SCPL's Speedup
    4.6.3 Causes of Anomalous Speedup Phenomena in SCPL
  4.7 Open Issues and Future Work
5. Conclusion
References
Appendix A: Experiment Code
References
[1] D. E. Rumelhart, G. E. Hinton, and R. J. Williams, “Learning representations by
back-propagating errors,” nature, vol. 323, no. 6088, pp. 533–536, 1986.
[2] M. Jaderberg, W. M. Czarnecki, S. Osindero, et al., “Decoupled neural interfaces
using synthetic gradients,” in International conference on machine learning, PMLR,
2017, pp. 1627–1635.
[3] S. Hochreiter, “The vanishing gradient problem during learning recurrent neural
nets and problem solutions,” International Journal of Uncertainty, Fuzziness and
Knowledge-Based Systems, vol. 6, no. 02, pp. 107–116, 1998.
[4] M. A. Nielsen, Neural networks and deep learning. Determination press San Francisco, CA, USA, 2015, vol. 25.
[5] C. Szegedy, W. Liu, Y. Jia, et al., “Going deeper with convolutions,” in Proceedings
of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Jun.
2015.
[6] J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, “BERT: Pre-training of deep
bidirectional transformers for language understanding,” in Proceedings of the 2019
Conference of the North American Chapter of the Association for Computational
Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers),
Minneapolis, Minnesota: Association for Computational Linguistics, Jun. 2019,
pp. 4171–4186. doi: 10.18653/v1/N19-1423.
[7] Y.-W. Kao and H.-H. Chen, “Associated learning: Decomposing end-to-end backpropagation based on autoencoders and target propagation,” Neural Computation,
vol. 33, no. 1, pp. 174–193, 2021.
[8] D. Y. Wu, D. Lin, V. Chen, and H.-H. Chen, “Associated learning: An alternative to end-to-end backpropagation that works on cnn, rnn, and transformer,” in
International Conference on Learning Representations, 2021.
[9] S. Teerapittayanon, B. McDanel, and H.-T. Kung, “Branchynet: Fast inference via
early exiting from deep neural networks,” in 2016 23rd International Conference
on Pattern Recognition (ICPR), IEEE, 2016, pp. 2464–2469.
[10] H. Mostafa, V. Ramesh, and G. Cauwenberghs, “Deep supervised learning using
local errors,” Frontiers in neuroscience, p. 608, 2018.
[11] C.-K. Wang, “Decomposing end-to-end backpropagation based on SCPL,” Master's
thesis, Institute of Software Engineering, National Central University, 2022.
[12] A. Nøkland and L. H. Eidnes, “Training neural networks with local error signals,”
in International conference on machine learning, PMLR, 2019, pp. 4839–4850.
[13] T. Chen, S. Kornblith, M. Norouzi, and G. Hinton, “A simple framework for contrastive learning of visual representations,” in International conference on machine
learning, PMLR, 2020, pp. 1597–1607.
[14] K. He, H. Fan, Y. Wu, S. Xie, and R. Girshick, “Momentum contrast for unsupervised visual representation learning,” in Proceedings of the IEEE/CVF conference
on computer vision and pattern recognition, 2020, pp. 9729–9738.
[15] P. Khosla, P. Teterwak, C. Wang, et al., “Supervised contrastive learning,” Advances in Neural Information Processing Systems, vol. 33, pp. 18 661–18 673, 2020.
[16] Y. Huang, Y. Cheng, A. Bapna, et al., Gpipe: Efficient training of giant neural
networks using pipeline parallelism, 2019. arXiv: 1811.06965 [cs.CV].
[17] A. Paszke, S. Gross, F. Massa, et al., “Pytorch: An imperative style, high-performance
deep learning library,” in Advances in Neural Information Processing Systems, H.
Wallach, H. Larochelle, A. Beygelzimer, F. d’Alché-Buc, E. Fox, and R. Garnett,
Eds., vol. 32, Curran Associates, Inc., 2019.
[18] K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale
image recognition,” arXiv preprint arXiv:1409.1556, 2014.
[19] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,”
in Proceedings of the IEEE conference on computer vision and pattern recognition,
2016, pp. 770–778.
[20] A. Krizhevsky, G. Hinton, et al., “Learning multiple layers of features from tiny
images,” 2009.
[21] Y. Le and X. Yang, “Tiny imagenet visual recognition challenge,” CS 231N, vol. 7,
no. 7, p. 3, 2015.
[22] S. Hochreiter and J. Schmidhuber, “Long Short-Term Memory,” Neural Computation, vol. 9, no. 8, pp. 1735–1780, Nov. 1997, issn: 0899-7667. doi: 10.1162/neco.
1997.9.8.1735.
[23] A. Vaswani, N. Shazeer, N. Parmar, et al., “Attention is all you need,” in Advances
in Neural Information Processing Systems, I. Guyon, U. V. Luxburg, S. Bengio, et
al., Eds., vol. 30, Curran Associates, Inc., 2017.
[24] X. Zhang, J. Zhao, and Y. LeCun, “Character-level convolutional networks for text
classification,” in Advances in Neural Information Processing Systems, C. Cortes, N.
Lawrence, D. Lee, M. Sugiyama, and R. Garnett, Eds., vol. 28, Curran Associates,
Inc., 2015.
[25] A. L. Maas, R. E. Daly, P. T. Pham, D. Huang, A. Y. Ng, and C. Potts, “Learning word vectors for sentiment analysis,” in Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Portland, Oregon, USA: Association for Computational Linguistics, Jun. 2011,
pp. 142–150.
[26] R. Socher, A. Perelygin, J. Wu, et al., “Recursive deep models for semantic compositionality over a sentiment treebank,” in Proceedings of the 2013 Conference
on Empirical Methods in Natural Language Processing, Seattle, Washington, USA:
Association for Computational Linguistics, Oct. 2013, pp. 1631–1642.
[27] J. Pennington, R. Socher, and C. D. Manning, “Glove: Global vectors for word
representation,” in Empirical Methods in Natural Language Processing (EMNLP),
2014, pp. 1532–1543.
[28] J.-B. Grill, F. Strub, F. Altché, et al., “Bootstrap your own latent - a new approach
to self-supervised learning,” in Advances in Neural Information Processing Systems,
H. Larochelle, M. Ranzato, R. Hadsell, M. Balcan, and H. Lin, Eds., vol. 33, Curran
Associates, Inc., 2020, pp. 21 271–21 284.
[29] X. Chen and K. He, “Exploring simple siamese representation learning,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition,
2021, pp. 15 750–15 758.
[30] A. Bardes, J. Ponce, and Y. LeCun, “VICReg: Variance-invariance-covariance regularization for self-supervised learning,” in International Conference on Learning
Representations, 2022.
[31] C.-H. Yeh, C.-Y. Hong, Y.-C. Hsu, T.-L. Liu, Y. Chen, and Y. LeCun, “Decoupled
contrastive learning,” in Computer Vision–ECCV 2022: 17th European Conference,
Tel Aviv, Israel, October 23–27, 2022, Proceedings, Part XXVI, Springer, 2022,
pp. 668–684.
Advisor: Hung-Hsuan Chen (陳弘軒)    Review Date: 2023-07-25
