Master's/Doctoral Thesis 109522026: Detailed Record




Name: Yuan-Chih Yang (楊緣智)    Department: Computer Science and Information Engineering
Thesis Title: Training a neural network by adjusting the drop probability in DropConnect based on the magnitude of the gradient
(藉由權重之梯度大小調整 DropConnect 的捨棄機率來訓練神經網路)
Related Theses
★ Predicting Users' Personal Information and Personality Traits from Web Browsing Logs
★ Predicting Changes in Users' Browsing Behavior before Special Holidays via Multi-Target Matrix Factorization
★ A Study of Dynamic Multi-Model Fusion Analysis
★ Extending Clickstreams: Analyzing User Behaviors Missing from Clickstreams
★ Associated Learning: Decomposing End-to-End Backpropagation with Autoencoders and Target Propagation
★ A Click Prediction Model Based on Multi-Model Ranking Fusion
★ Analyzing Intentional, Unintentional, and Missing User Behaviors in Web Logs
★ Adjusting Word Embeddings with Synonym and Antonym Information via a Non-Directional Sequence Encoder Based on Self-Attention
★ Exploring When to Use Deep Learning versus Simple Models for Click-Through Rate Prediction
★ Fault Detection for Air Quality Sensors: An Anomaly Detection Framework Based on Deep Spatio-Temporal Graph Models
★ An Empirical Study of the Impact of Thesaurus-Adjusted Word Embeddings on Downstream Natural Language Tasks
★ A Semi-Supervised Model Incorporating Spatio-Temporal Data, Applied to Anomaly Detection for PM2.5 Air Pollution Sensors
★ Detecting Low-Activity Anomalous Accounts on PTT Using Graph Neural Networks
★ Generating Personalized Trendlines for Individual Users from Few Trendline Samples
★ Two Novel Probabilistic Clustering Models Based on Bivariate and Multivariate Beta Distributions
★ A New Technique for Simultaneously Updating the Parameters of All Layers of a Neural Network Using Associated Learning and Pipelining
  1. The author has agreed to make this electronic thesis openly available immediately.
  2. The open-access electronic full text is licensed for searching, reading, and printing by users for personal, non-profit academic research purposes only.
  3. Please comply with the Copyright Act of the Republic of China (Taiwan); do not reproduce, distribute, adapt, repost, or broadcast this work without authorization.

Abstract (Chinese) In deep learning, Dropout and DropConnect are regularization techniques commonly used to address overfitting. By randomly dropping neurons, or the connections into and out of a neuron, with a fixed probability during training, they prevent each neuron from depending too heavily on other neurons, thereby improving the model's ability to generalize.
This thesis proposes a new model, Gradient DropConnect, which uses the gradient of each weight and bias to determine its drop probability during training. We conducted a series of experiments to verify that this approach effectively mitigates overfitting.
Abstract (English) Dropout and DropConnect are regularization techniques often used to address the overfitting issue in deep learning. During training, Dropout randomly discards neurons and DropConnect randomly discards connections, each with a fixed probability, so that no neuron depends too heavily on other neurons, thereby improving the model's generalization ability.
This paper proposes a new model, Gradient DropConnect, which leverages the gradient of each weight and bias to determine its drop probability during training. We conducted thorough experiments to validate that such an approach can effectively mitigate overfitting.
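A minimal NumPy sketch may help make the mechanism concrete, since this record reproduces only the abstract. The dropconnect_forward baseline follows standard DropConnect (Wan et al., 2013), sampling a fixed-probability Bernoulli mask over the weights; gradient_drop_probs is a hypothetical mapping from gradient magnitude to per-weight drop probability. The thesis's exact formula is not shown in this record, so the min-max normalization, the p_min/p_max bounds, and the choice to drop small-gradient weights more aggressively are illustrative assumptions, not the author's method.

import numpy as np

rng = np.random.default_rng(0)

def dropconnect_forward(W, b, x, drop_prob=0.5):
    # Standard DropConnect: during training, zero out each weight
    # independently with a fixed probability drop_prob.
    mask_W = rng.random(W.shape) >= drop_prob  # keep each weight with prob 1 - p
    return (W * mask_W) @ x + b

def gradient_drop_probs(grad, p_min=0.05, p_max=0.5):
    # Hypothetical mapping (an assumption, not the thesis's formula):
    # min-max normalize |grad| and assign higher drop probabilities to
    # weights with smaller gradient magnitudes.
    g = np.abs(grad)
    g = (g - g.min()) / (g.max() - g.min() + 1e-12)  # scale to [0, 1]
    return p_max - (p_max - p_min) * g  # small |grad| -> high drop probability

# Toy usage: one fully connected layer, one training step.
W = rng.normal(size=(4, 8))
b = np.zeros(4)
x = rng.normal(size=8)

grad_W = rng.normal(size=W.shape)    # stand-in for a real backpropagated gradient
p_W = gradient_drop_probs(grad_W)    # per-weight drop probabilities
mask_W = rng.random(W.shape) >= p_W  # Gradient DropConnect mask
y_gradient = (W * mask_W) @ x + b

y_standard = dropconnect_forward(W, b, x)
print(y_standard)
print(y_gradient)

The structural difference from standard DropConnect is that the mask is sampled from a per-parameter probability matrix rather than a single scalar, so the strength of the regularization can vary across weights (and biases) as their gradients evolve during training.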
Keywords: Overfitting, Regularization, Dropout, DropConnect, Generalization
Table of Contents
Abstract (Chinese)
Abstract (English)
Acknowledgments
Table of Contents
List of Figures
List of Tables
1. Introduction
   1.1 Motivation
   1.2 Research Goals
   1.3 Contributions
   1.4 Thesis Organization
2. Related Work
   2.1 Dropout
   2.2 Adaptive Dropout for Training Deep Neural Networks
   2.3 DropConnect
   2.4 Inverted Dropout
3. Model and Methods
   3.1 Model Architecture
   3.2 DropConnect and the Mask Matrix
   3.3 Gradient DropConnect
   3.4 Model Pseudocode
4. Experimental Results
   4.1 Experimental Parameter Details
   4.2 Training Datasets
       4.2.1 Randomly Generated Linear Regression Dataset
       4.2.2 MNIST
       4.2.3 CIFAR10
       4.2.4 CIFAR100
       4.2.5 NORB
   4.3 Experiment 1: Observing the Characteristics of Gradient DropConnect
   4.4 Experiment 2: Comparing the Performance of Gradient DropConnect with Other Regularization Methods
       4.4.1 Results on the MNIST Dataset
       4.4.2 Results on the CIFAR10 Dataset
       4.4.3 Results on the CIFAR100 Dataset
       4.4.4 Results on the NORB Dataset
   4.5 Experiment 3: Performance of Gradient DropConnect versus Standard DropConnect under Different Drop Probabilities
   4.6 Experiment 4: Comparing the Parameter Distributions Learned with DropConnect, Dropout, and Gradient DropConnect
5. Conclusion
   5.1 Conclusions
   5.2 Future Work
References
Appendix A: Supplementary Model Architectures
   A.1 AlexNet
   A.2 VGG
   A.3 Experiment and Model Code
Advisor: Hung-Hsuan Chen (陳弘軒)    Review Date: 2022-07-19
