References
[1] D. E. Rumelhart, G. E. Hinton, and R. J. Williams, “Learning representations by
back-propagating errors,” Nature, vol. 323, no. 6088, pp. 533–536, 1986.
[2] M. Jaderberg, W. M. Czarnecki, S. Osindero, et al., “Decoupled neural interfaces
using synthetic gradients,” in International Conference on Machine Learning, PMLR,
2017, pp. 1627–1635.
[3] W. M. Czarnecki, G. Świrszcz, M. Jaderberg, S. Osindero, O. Vinyals, and K.
Kavukcuoglu, “Understanding synthetic gradients and decoupled neural interfaces,”
in International Conference on Machine Learning, PMLR, 2017, pp. 904–912.
[4] D.-H. Lee, S. Zhang, A. Fischer, and Y. Bengio, “Difference target propagation,” in
Machine Learning and Knowledge Discovery in Databases: European Conference,
ECML PKDD 2015, Porto, Portugal, September 7-11, 2015, Proceedings, Part I
15, Springer, 2015, pp. 498–515.
[5] D. Y. Wu, D. Lin, V. Chen, and H.-H. Chen, “Associated learning: An alternative to
end-to-end backpropagation that works on CNN, RNN, and Transformer,” in
International Conference on Learning Representations, 2021.
[6] Y.-W. Kao and H.-H. Chen, “Associated learning: Decomposing end-to-end backpropagation based on autoencoders and target propagation,” Neural Computation,
vol. 33, no. 1, pp. 174–193, 2021.
[7] C.-Y. Chuang, J. Robinson, Y.-C. Lin, A. Torralba, and S. Jegelka, “Debiased
contrastive learning,” Advances in Neural Information Processing Systems, vol. 33,
pp. 8765–8775, 2020.
[8] C.-K. Wang, “利用 SCPL 分解端到端倒傳遞演算法 [Decomposing end-to-end
backpropagation with SCPL],” M.S. thesis, National Central University, 2022.
[9] C. J. Shallue, J. Lee, J. Antognini, J. Sohl-Dickstein, R. Frostig, and G. E. Dahl,
“Measuring the effects of data parallelism on neural network training,” arXiv preprint
arXiv:1811.03600, 2018.
[10] T. Vogels, S. P. Karimireddy, and M. Jaggi, “PowerSGD: Practical low-rank gradient
compression for distributed optimization,” Advances in Neural Information
Processing Systems, vol. 32, 2019.
[11] Y. Huang, Y. Cheng, A. Bapna, et al., “GPipe: Efficient training of giant neural
networks using pipeline parallelism,” Advances in Neural Information Processing
Systems, vol. 32, 2019.
[12] K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale
image recognition,” arXiv preprint arXiv:1409.1556, 2014.
[13] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,”
in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition,
2016, pp. 770–778.
[14] M. Schuster and K. K. Paliwal, “Bidirectional recurrent neural networks,” IEEE
Transactions on Signal Processing, vol. 45, no. 11, pp. 2673–2681, 1997.
[15] S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural Computation,
vol. 9, no. 8, pp. 1735–1780, 1997.
[16] A. Vaswani, N. Shazeer, N. Parmar, et al., “Attention is all you need,” Advances
in Neural Information Processing Systems, vol. 30, 2017.
[17] D. Narayanan, A. Harlap, A. Phanishayee, et al., “PipeDream: Generalized pipeline
parallelism for DNN training,” in Proceedings of the 27th ACM Symposium on
Operating Systems Principles, 2019, pp. 1–15.
[18] M. Shoeybi, M. Patwary, R. Puri, P. LeGresley, J. Casper, and B. Catanzaro,
“Megatron-LM: Training multi-billion parameter language models using model
parallelism,” arXiv preprint arXiv:1909.08053, 2019.