References
[1] S. Hochreiter, “The vanishing gradient problem during learning recurrent neural nets and problem solutions,” International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, vol. 6, no. 2, pp. 107–116, 1998.
[2] M. Jaderberg, W. M. Czarnecki, S. Osindero, et al., “Decoupled neural interfaces using synthetic gradients,” in International Conference on Machine Learning, PMLR, 2017, pp. 1627–1635.
[3] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 770–778.
[4] C. Szegedy, W. Liu, Y. Jia, et al., “Going deeper with convolutions,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 1–9.
[5] Y.-W. Kao and H.-H. Chen, “Associated learning: Decomposing end-to-end backpropagation based on autoencoders and target propagation,” Neural Computation, vol. 33, no. 1, pp. 174–193, 2021.
[6] D. Y. Wu, D. Lin, V. Chen, and H.-H. Chen, “Associated learning: An alternative to end-to-end backpropagation that works on CNN, RNN, and Transformer,” in International Conference on Learning Representations, 2021.
[7] C.-K. Wang, “Decomposing end-to-end backpropagation based on SCPL,” Master’s thesis, Institute of Software Engineering, National Central University, 2022.
[8] M.-Y. Ho, “Realizing synchronized parameter updating, dynamic layer accumulation, and forward shortcuts in supervised contrastive parallel learning,” Master’s thesis, Department of Computer Science and Information Engineering, National Central University, 2023.
[9] T.-H. Lin, “Enabling simultaneous parameter updates in different layers for a neural network—using associated learning and pipeline,” Master’s thesis, Department of Computer Science and Information Engineering, National Central University, 2023.
[10] A. Nøkland and L. H. Eidnes, “Training neural networks with local error signals,” in International Conference on Machine Learning, PMLR, 2019, pp. 4839–4850.
[11] C.-Y. Lee, S. Xie, P. Gallagher, Z. Zhang, and Z. Tu, “Deeply-supervised nets,” in Artificial Intelligence and Statistics, PMLR, 2015, pp. 562–570.
[12] S. A. Siddiqui, D. Krueger, Y. LeCun, and S. Deny, “Blockwise self-supervised learning at scale,” arXiv preprint arXiv:2302.01647, 2023.
[13] P. Khosla, P. Teterwak, C. Wang, et al., “Supervised contrastive learning,” Advances in Neural Information Processing Systems, vol. 33, pp. 18661–18673, 2020.
[14] S. Ozsoy, S. Hamdan, S. Arik, D. Yuret, and A. Erdogan, “Self-supervised learning with an information maximization criterion,” Advances in Neural Information Processing Systems, vol. 35, pp. 35240–35253, 2022.
[15] L. Jing, P. Vincent, Y. LeCun, and Y. Tian, “Understanding dimensional collapse in contrastive self-supervised learning,” arXiv preprint arXiv:2110.09348, 2021.
[16] X. Chen and K. He, “Exploring simple Siamese representation learning,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 15750–15758.
[17] K. He, H. Fan, Y. Wu, S. Xie, and R. Girshick, “Momentum contrast for unsupervised visual representation learning,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020, pp. 9729–9738.
[18] X. Chen, H. Fan, R. Girshick, and K. He, “Improved baselines with momentum contrastive learning,” arXiv preprint arXiv:2003.04297, 2020.
[19] T. Chen, S. Kornblith, M. Norouzi, and G. Hinton, “A simple framework for contrastive learning of visual representations,” in International Conference on Machine Learning, PMLR, 2020, pp. 1597–1607.
[20] J. Zbontar, L. Jing, I. Misra, Y. LeCun, and S. Deny, “Barlow twins: Self-supervised learning via redundancy reduction,” in International Conference on Machine Learning, PMLR, 2021, pp. 12310–12320.
[21] A. van den Oord, Y. Li, and O. Vinyals, “Representation learning with contrastive predictive coding,” arXiv preprint arXiv:1807.03748, 2018.
[22] T. Chen, S. Kornblith, M. Norouzi, and G. Hinton, “A simple framework for contrastive learning of visual representations,” in International Conference on Machine Learning, PMLR, 2020, pp. 1597–1607.
[23] R. Linsker, “Self-organization in a perceptual network,” Computer, vol. 21, no. 3, pp. 105–117, 1988.
[24] T. M. Cover and J. A. Thomas, Elements of Information Theory (Wiley Series in Telecommunications and Signal Processing). USA: Wiley-Interscience, 2006, ISBN: 0471241954.
[25] M. Tschannen, J. Djolonga, P. K. Rubenstein, S. Gelly, and M. Lucic, “On mutual information maximization for representation learning,” arXiv preprint arXiv:1907.13625, 2019.
[26] A. Bardes, J. Ponce, and Y. LeCun, “VICReg: Variance-invariance-covariance regularization for self-supervised learning,” arXiv preprint arXiv:2105.04906, 2021.
[27] S. Teerapittayanon, B. McDanel, and H.-T. Kung, “BranchyNet: Fast inference via early exiting from deep neural networks,” in 2016 23rd International Conference on Pattern Recognition (ICPR), IEEE, 2016, pp. 2464–2469.
[28] M. Elbayad, J. Gu, E. Grave, and M. Auli, “Depth-adaptive transformer,” arXiv preprint arXiv:1910.10073, 2019.
[29] H. Li, H. Zhang, X. Qi, R. Yang, and G. Huang, “Improved techniques for training adaptive deep networks,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2019, pp. 1891–1900.
[30] W. Zhou, C. Xu, T. Ge, J. McAuley, K. Xu, and F. Wei, “BERT loses patience: Fast and robust inference with early exit,” Advances in Neural Information Processing Systems, vol. 33, pp. 18330–18341, 2020.
[31] J. Xin, R. Tang, Y. Yu, and J. Lin, “BERxiT: Early exiting for BERT with better fine-tuning and extension to regression,” in Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume, 2021, pp. 91–104.
[32] Z. Fei, X. Yan, S. Wang, and Q. Tian, “DeeCap: Dynamic early exiting for efficient image captioning,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 12216–12226.
[33] S. Tang, Y. Wang, Z. Kong, et al., “You need multiple exiting: Dynamic early exiting for accelerating unified vision language model,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 10781–10791.
[34] K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” arXiv preprint arXiv:1409.1556, 2014.
[35] Y. You, I. Gitman, and B. Ginsburg, “Large batch training of convolutional networks,” arXiv preprint arXiv:1708.03888, 2017.
[36] A. Krizhevsky, “One weird trick for parallelizing convolutional neural networks,” arXiv preprint arXiv:1404.5997, 2014.
[37] S. Ioffe and C. Szegedy, “Batch normalization: Accelerating deep network training by reducing internal covariate shift,” in International Conference on Machine Learning, PMLR, 2015, pp. 448–456.
[38] S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural Computation, vol. 9, no. 8, pp. 1735–1780, 1997.
[39] A. Vaswani, N. Shazeer, N. Parmar, et al., “Attention is all you need,” in Advances in Neural Information Processing Systems, I. Guyon, U. V. Luxburg, S. Bengio, et al., Eds., vol. 30, Curran Associates, Inc., 2017.
[40] J. Pennington, R. Socher, and C. D. Manning, “GloVe: Global vectors for word representation,” in Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2014, pp. 1532–1543.