Name: Yen-Lin Peng (彭彥霖)
Department: Computer Science and Information Engineering (資訊工程學系)
Thesis Title: Adaptive Inference and Dynamic Network Accumulation based on Associated Learning (基於關聯式學習的動態自適應推論及動態網路擴增)
- The author has agreed to make the electronic full text openly available immediately.
- The open-access electronic full text is licensed to users solely for personal, non-profit searching, reading, and printing for academic research purposes.
- Please comply with the relevant provisions of the Copyright Act of the Republic of China; do not reproduce, distribute, adapt, repost, or broadcast it without authorization.
Abstract (Chinese): Associated Learning (AL) modularizes a traditional multi-layer neural network into several smaller blocks, each with its own local objective. Because these objectives are mutually independent, AL can train the parameters of different layers simultaneously, which makes training more efficient. Although the AL architecture has been shown to match traditional neural networks on a variety of tasks, several of its advantages remain experimentally unverified.
The AL architecture allows the number of layers to grow dynamically: the parameters of newly added AL layers can be trained, while already-trained parameters are left unchanged, to reach better prediction accuracy. By contrast, dynamically growing the parameter count of a traditional neural network is very difficult.
In addition, the AL architecture reserves redundant shortcuts in each layer block; these shortcuts give the data flow multiple paths to choose from at inference time.
This thesis investigates AL's Dynamic Layer Accumulation, Early Exit, and Adaptive Inference properties, implements improved versions of AL, and compares various AL inference methods.
We further propose an architecture that dynamically adds training features, allowing appended AL layers to receive features that the original AL layers lacked, for better training results. Our experiments use a variety of classic RNN and CNN models as backbone networks for the AL architecture and evaluate them on public text classification and image classification datasets.

Abstract (English): Associated Learning (AL) modularizes traditional multi-layer neural networks into smaller blocks, each with its own local objective. These independent objectives enable AL to train the parameters of different layers simultaneously and improve training efficiency. Despite achieving performance comparable to traditional neural networks on various tasks, AL possesses several unexplored advantages.
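The block-local training idea can be illustrated with a deliberately tiny, pure-Python sketch (illustrative names only, not the thesis implementation): each block owns its parameter and its own local objective, so updates never require a global backward pass and could in principle run in parallel.

```python
# Minimal sketch of block-local objectives (toy one-parameter "blocks",
# not the thesis code): each block minimizes its own local loss, and no
# gradient flows between blocks.

class ALBlock:
    def __init__(self, weight=1.0):
        self.weight = weight          # the block's only parameter

    def forward(self, x):
        return self.weight * x        # toy "layer"

    def local_update(self, x, target, lr=0.1):
        # gradient of the local loss (w*x - t)^2 w.r.t. w is 2*(w*x - t)*x
        grad = 2 * (self.forward(x) - target) * x
        self.weight -= lr * grad

blocks = [ALBlock(), ALBlock()]
# Each block fits its own local target independently; an update never
# touches another block's parameter, so the loops could be parallelized.
for _ in range(200):
    blocks[0].local_update(x=1.0, target=2.0)
    blocks[1].local_update(x=2.0, target=6.0)

print(round(blocks[0].weight, 2))  # ≈ 2.0
print(round(blocks[1].weight, 2))  # ≈ 3.0
```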
The AL framework allows dynamic layer stacking, enabling the addition of AL layers without modifying the already trained parameters. This approach focuses on training the parameters of the newly added AL layers to achieve better prediction accuracy. In contrast, dynamically increasing the parameter size in traditional neural networks is challenging.
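As a rough sketch of this layer-stacking idea (toy one-parameter "blocks" with hypothetical names, not the thesis code): the already-trained prefix is used only for forward passes, while gradient updates touch the newly appended block alone.

```python
# Sketch of dynamic layer accumulation: append a block to a trained
# model and train only the new block's parameter; earlier weights are
# frozen (never updated).

class Block:
    def __init__(self, w=0.5):
        self.w = w

    def forward(self, x):
        return self.w * x

def predict(blocks, x):
    for b in blocks:
        x = b.forward(x)
    return x

def train_last_block(blocks, x, target, lr=0.05, steps=300):
    *frozen, new = blocks
    h = predict(frozen, x)            # frozen prefix: forward pass only
    for _ in range(steps):            # gradient of (w*h - t)^2 w.r.t. w
        grad = 2 * (new.w * h - target) * h
        new.w -= lr * grad            # only the new block's weight moves

model = [Block(w=2.0)]                # pretend this block is already trained
model.append(Block())                 # dynamically accumulated layer
train_last_block(model, x=1.0, target=6.0)
print(round(model[0].w, 2), round(model[1].w, 2))  # 2.0 3.0
```

Because the frozen prefix is reused as-is, accuracy can be improved by paying only for the new block's training.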
Furthermore, the AL architecture incorporates redundant shortcuts at each layer block, providing multiple paths for data flow during the inference stage.
This thesis explores the characteristics of AL, including Dynamic Layer Accumulation, Early Exit, and Adaptive Inference; it implements improved versions of AL and compares various AL inference methods.
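A minimal sketch of threshold-based adaptive inference (the confidence-threshold mechanism is assumed here by analogy with early-exit networks; names are illustrative): each block's shortcut yields a prediction with a confidence score, and inference stops at the first block that clears the threshold.

```python
# Sketch of adaptive inference with early exit: shallow blocks answer
# easy samples via their shortcuts; hard samples fall through to the
# full path.

def adaptive_inference(block_outputs, threshold=0.9):
    """block_outputs: per-block (label, confidence) pairs, shallow first.
    Returns the chosen label and how many blocks were evaluated."""
    for depth, (label, confidence) in enumerate(block_outputs, start=1):
        if confidence >= threshold:
            return label, depth       # early exit via the shortcut
    return block_outputs[-1][0], len(block_outputs)  # full-path fallback

# "Easy" sample: the first block is already confident -> exit at depth 1.
easy = [("cat", 0.97), ("cat", 0.99), ("cat", 0.99)]
# "Hard" sample: no block clears the threshold -> full path, depth 3.
hard = [("cat", 0.40), ("dog", 0.55), ("dog", 0.80)]

print(adaptive_inference(easy))   # ('cat', 1)
print(adaptive_inference(hard))   # ('dog', 3)
```

Raising the threshold trades inference time for accuracy: more samples travel the full path, fewer exit early.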
We also propose a framework for dynamically adding training features, allowing the appended AL layers to receive features not present in the original AL layers. Because the design incorporates new features without retraining the entire network, it improves training effectiveness in dynamic environments where new features appear over time.
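A toy illustration of the idea (hypothetical names; not the thesis architecture): the appended block consumes both the frozen original block's output and features that did not exist when the original block was trained.

```python
# Sketch of dynamic feature addition: the original block keeps seeing
# only the base features, while the appended block mixes in newly
# available extra features.

class OriginalBlock:
    def __init__(self):
        self.w = 1.5                  # pretend this was already trained

    def forward(self, base_features):
        return self.w * sum(base_features)

class AppendedBlock:
    def __init__(self, n_extra):
        # one weight for the original block's output, plus one weight
        # per newly available feature
        self.w_prev = 1.0
        self.w_extra = [0.0] * n_extra

    def forward(self, prev_out, extra_features):
        return self.w_prev * prev_out + sum(
            w * f for w, f in zip(self.w_extra, extra_features))

base = OriginalBlock()
extended = AppendedBlock(n_extra=2)
extended.w_extra = [0.5, -0.25]       # imagine these were just trained

prev = base.forward([1.0, 2.0])       # original pipeline, base features only
out = extended.forward(prev, extra_features=[4.0, 2.0])
print(out)  # 4.5 + 2.0 - 0.5 = 6.0
```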
Our experiments employ classic RNN and CNN models as the backbone networks for the AL architecture and conduct evaluations on publicly available text classification and image classification datasets.

Keywords
★ Associated Learning
★ Dynamic Neural Networks
★ Early Exit
★ Adaptive Inference
★ Dynamic Layer Accumulation

Table of Contents
Abstract (Chinese)
Abstract (English)
Acknowledgments
Table of Contents
1. Introduction
2. Related Work
2.1 Associated Learning
2.2 Dynamic Architectures
2.2.1 Early Exit
2.2.2 Dynamically Growing Networks
3. Models and Methods
3.1 Inference Paths in AL
3.1.1 Full-path Inference
3.1.2 Shortcut Inference
3.1.3 Adaptive Inference
3.2 Dynamic Layer Addition
3.3 Dynamic Feature Addition
4. Experimental Results and Analysis
4.1 Experimental Setup and Implementation Details
4.1.1 Experimental Setup
4.1.2 Implementation Details
4.2 Early Exit and Dynamic Inference
4.2.1 Text Classification
4.2.2 Image Classification
4.2.3 Accuracy and Inference Time
4.3 Dynamically Adding Training Layers
4.3.1 Text Classification
4.3.2 Image Classification
4.4 Dynamically Adding Training Features
4.4.1 Text Classification
4.5 Discussion
4.5.1 FLOPs per Layer
4.5.2 Sample Distribution across Layers
4.5.3 Threshold Setting for Adaptive Inference
5. Conclusion
5.1 Conclusions
5.2 Future Work
References
Appendix A: Experiment Code
Advisor: Hung-Hsuan Chen (陳弘軒)  Review Date: 2023-07-25