Master's/Doctoral Thesis 106522062 Detailed Record




Name: Chia-Lin Wu (吳佳霖)    Graduating Department: Department of Computer Science and Information Engineering
Thesis Title: 利用注意力插件改善卷積網路:使用前置與後置方法
(Attention-based plugin for CNN improvement: Front end and Back end)
Related Theses:
★ A grouping mechanism based on social relationships in edX online discussion boards
★ A 3D-visualized Facebook interaction system built with Kinect
★ An assessment system for smart classrooms built with Kinect
★ An intelligent metropolitan route-planning mechanism for mobile device applications
★ Dynamic texture transfer based on analyzing key-momentum correlations
★ A seam carving system that preserves straight-line structures in images
★ A community recommendation mechanism built on an open online social learning environment
★ System design of an interactive situated learning environment for English as a foreign language
★ An emotional color transfer mechanism with skin-color preservation
★ A gesture recognition framework for virtual keyboards
★ Error analysis of fractional-power grey generating prediction models and development of a computer toolbox
★ Real-time human skeleton motion construction using inertial sensors
★ Real-time 3D modeling based on multiple cameras
★ A grouping mechanism for genetic algorithms based on complementarity and social network analysis
★ A virtual musical instrument performance system with real-time hand tracking
★ A real-time virtual musical instrument performance system based on neural networks
  1. The author has agreed to make the electronic full text of this thesis openly available immediately.
  2. The open-access electronic full text is authorized only for personal, non-profit retrieval, reading, and printing for the purpose of academic research.
  3. Please comply with the Copyright Act of the Republic of China (Taiwan); do not reproduce, distribute, adapt, repost, or broadcast the content without authorization.

Abstract (Chinese): A common task handled by convolutional neural networks is image classification, and the model structure can be further extended to different kinds of work. For example, semantic segmentation and object detection are both built on convolutional architectures similar to those used for classification. Owing to the feature recognition ability that convolutional networks provide, they achieve a clear performance gain over traditional methods on these tasks. Most CNN designs take the raw image as the input for both the training and testing phases, because feature extraction and selection in computer vision is not always predictable, and the network's own learning ability can extract more suitable features. However, when the target described by a task does not cover the entire image, the network may take incorrect features into account during training. To improve the correctness and stability of CNN models without discarding any implicit image information, we try to provide attention-mask information to the deep learning model in several different forms. For the comparisons in the later experiments, we design the methods around two main ideas. The first is the front-end approach, which supplies attention information in different forms at the model's input stage, mainly providing additional features at the input for better prediction. The other, back-end approach applies an additional sub-training task during the training phase to improve the model's ability to focus on the correct positions. In comparison, the second kind of method provides more reasonable improvement and better compatibility for the target tasks of our experiments.
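Below is a minimal sketch of the front-end idea described in the abstract: the attention mask is appended to the RGB image as an extra input channel so that the first convolution sees both. The class name, layer sizes, and tensor shapes are illustrative assumptions for this sketch only; the thesis also describes other front-end variants (a separate input path and weighted fusion) that are not shown here.

```python
# Hypothetical sketch: feed the attention mask as a 4th input channel.
import torch
import torch.nn as nn

class FrontEndAttentionCNN(nn.Module):
    def __init__(self, num_classes: int = 200):
        super().__init__()
        # The first convolution accepts 4 channels: RGB + attention mask.
        self.features = nn.Sequential(
            nn.Conv2d(4, 32, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(64, num_classes)

    def forward(self, image: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
        # image: (N, 3, H, W); mask: (N, 1, H, W), 1 inside the annotated region.
        x = torch.cat([image, mask], dim=1)  # append the mask as an extra channel
        x = self.features(x).flatten(1)
        return self.classifier(x)

# Example usage with random data.
model = FrontEndAttentionCNN()
logits = model(torch.randn(2, 3, 224, 224), torch.ones(2, 1, 224, 224))
print(logits.shape)  # torch.Size([2, 200])
```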
Abstract (English): A general task that a convolutional neural network (CNN) deals with is image classification, and the model structure has been further extended to different kinds of work. For example, both semantic segmentation and object detection are based on techniques similar to those that solve the classification problem. Owing to the pattern recognition ability that a CNN provides, it offers a performance improvement over other traditional methods. Most CNN designs take a raw image as the input in both the training and testing phases of these tasks, because the suitable features in computer vision are not always predictable. When the target described by a task does not cover the entire image, the CNN model is free to learn patterns that may not belong to the target objects. To increase the correctness and robustness of a CNN model without losing any possible information in an image, we attempt to supply attention information to the model. For comparison, we use two groups of methods. The front-end methods assign the attention information in different forms at the input stage, providing an additional feature for prediction. The back-end methods aim to increase the model's ability to focus on the correct positions by applying an additional loss function during the training phase. In our experiments, the back-end methods provide more reasonable improvements and better compatibility.
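The back-end idea can be illustrated with an auxiliary term added to the training loss that penalizes intermediate feature-map activations falling outside the attention mask. This is only one plausible formulation written for illustration; the actual variance loss defined in Section 3.3 may differ, and the function and variable names below are assumptions.

```python
# Hypothetical sketch: an auxiliary attention penalty added to the training loss.
import torch
import torch.nn.functional as F

def attention_penalty(feature_map: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    """feature_map: (N, C, h, w) from an intermediate layer.
    mask: (N, 1, H, W), 1 inside the annotated region, 0 elsewhere."""
    # Collapse the channels into a single spatial activation map.
    activation = feature_map.abs().mean(dim=1, keepdim=True)  # (N, 1, h, w)
    # Resize the mask to the feature-map resolution.
    mask_small = F.interpolate(mask, size=activation.shape[-2:], mode="nearest")
    # Penalize activation mass that lies outside the mask.
    return (activation * (1.0 - mask_small)).mean()

# During training (hypothetical usage):
# total_loss = classification_loss + lambda_att * attention_penalty(feat, mask)
```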
Keywords (Chinese): ★ convolutional network
★ plugin
★ attention model
Keywords (English): ★ CNN
★ plugin
★ attention model
Table of Contents: 1. Introduction 1
2. Related work 5
2.1 Feature extraction methods 5
2.1.1 Residual neural network 6
2.1.2 ResNeXt 7
2.1.3 MobileNet 9
2.1.4 SSD: Single Shot Multi-Box Detector 10
2.1.5 FPN: Feature Pyramid Networks 11
2.2 Visual Attention methods 13
2.2.1 Show, Attend and Tell: Neural Image Caption 14
2.2.2 Residual Attention Network for Image Classification 15
2.2.3 Interpretable Convolutional Neural Networks 17
2.2.4 Dual Attention Network for Scene Segmentation 18
2.2.5 Pedestrian detection via Simultaneous Detection & Segmentation 20
3. Architecture 22
3.1 Attention mask 22
3.1.1 Definition of the attention mask 23
3.1.2 The overlapping between the bounding boxes 24
3.2 Front: Additional attention information 25
3.2.1 Additional information for a pre-trained model 26
3.2.2 Additional input path 28
3.2.3 Weighted fusion 29
3.3 Back: Variance Loss function 31
3.3.1 A new method for feature map visualization 31
3.3.2 Variance loss function 38
3.3.3 Modification for object detection 40
3.4 Classification task experiments 42
3.4.1 CUB200 45
3.4.2 Stanford Dog 49
3.5 Object detection task 51
3.5.1 PASCAL-VOC 51
4. Conclusion 56
5. References 58
References: [1] A. Krizhevsky, I. Sutskever, and G. Hinton. “Imagenet classification with deep convolutional neural networks.” In NIPS, 2012.
[2] O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, et al. “Imagenet large scale visual recognition challenge.” arXiv:1409.0575, 2014.
[3] K. He, X. Zhang, S. Ren, and J. Sun. “Identity mappings in deep residual networks.” In ECCV, 2016.
[4] K. He, X. Zhang, S. Ren, and J. Sun. “Deep residual learning for image recognition.” In CVPR, 2016.
[5] S. Zagoruyko and N. Komodakis. “Wide residual networks.” arXiv:1605.07146, 2016.
[6] S. Xie, R. Girshick, P. Dollar, Z. Tu, and K. He. “Aggregated residual transformations for deep neural networks.” In CVPR, 2017.
[7] G. Huang, Z. Liu, K. Q. Weinberger, and L. Maaten. “Densely connected convolutional networks.”, In CVPR, 2017.
[8] C. Szegedy, S. Ioffe, V. Vanhoucke, and A. Alemi. “Inception-v4, Inception-ResNet and the impact of residual connections on learning.” In ICLR Workshop, 2016.
[9] X. Zhang, X. Zhou, M. Lin, and J. Sun. “Shufflenet: An extremely efficient convolutional neural network for mobile devices.” arXiv:1707.01083, 2017.
[10] A. G. Howard, M. Zhu, B. Chen, D. Kalenichenko, W. Wang, T. Weyand, M. Andreetto, and H. Adam. “Mobilenets: Efficient convolutional neural networks for mobile vision applications.” arXiv:1704.04861, 2017.
[11] D. Eigen, C. Puhrsch, and R. Fergus. “Depth map prediction from a single image using a multi-scale deep network.” arXiv:1406.2283, 2014.
[12] D. Eigen and R. Fergus. “Predicting depth, surface normals and semantic labels with a common multi-scale convolutional architecture.” arXiv:1411.4734, 2014.
[13] K. Xu, J. Ba, R. Kiros, K. Cho, A. C. Courville, R. Salakhutdinov, R. S. Zemel, and Y. Bengio. “Show, attend and tell: Neural image caption generation with visual attention.” In ICML, 2015.
[14] Q. Zhang, Y. N. Wu, and S.-C. Zhu. “Interpretable convolutional neural networks.” In CVPR, 2018.
[15] K. Simonyan and A. Zisserman. “Very deep convolutional networks for large-scale image recognition.” In ICLR, 2015.
[16] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich. “Going deeper with convolutions.” In CVPR, 2015
[17] O. Ronneberger, P. Fischer, and T. Brox, “U-Net: Convolutional Networks for Biomedical Image Segmentation,” Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015, Cham, pp. 234-241, 2015.
[18] R. Girshick, J. Donahue, T. Darrell, and J. Malik. “Rich feature hierarchies for accurate object detection and semantic segmentation.” In CVPR, 2014.
[19] R. Girshick. “Fast R-CNN.” In ICCV, 2015.
[20] W. Liu, D. Anguelov, D. Erhan, C. Szegedy, and S. Reed, “SSD: Single shot multibox detector,” arXiv:1512.02325, 2015.
[21] M. Liang and X. Hu. “Recurrent convolutional neural network for object recognition.” In CVPR, 2015.
[22] T.-Y. Lin, P. Dollar, R. Girshick, K. He, B. Hariharan, and S. Belongie. “Feature pyramid networks for object detection.” In CVPR, 2017.
[23] H. Zheng, J. Fu, T. Mei, and J. Luo. “Learning multi-attention convolutional neural network for fine-grained image recognition.” In ICCV, 2017.
[24] G. Brazil, X. Yin, and X. Liu. “Illuminating pedestrians via simultaneous detection & segmentation.” In ICCV, 2017.
[25] F. Wang, M. Jiang, C. Qian, S. Yang, C. Li, H. Zhang, X. Wang, and X. Tang. “Residual attention network for image classification.” In CVPR, 2017.
[26] C. Wah, S. Branson, P. Welinder, P. Perona, and S. Belongie. “The Caltech-UCSD Birds-200-2011 Dataset.” Technical Report CNS-TR-2011-001, California Institute of Technology, 2011
[27] J. Fu, J. Liu, H. Tian, Z. Fang, and H. Lu. “Dual attention network for scene segmentation.” arXiv:1809.02983, 2018.
[28] A. Khosla, N. Jayadevaprakash, B. Yao, and L. Fei-Fei. “Novel dataset for fine-grained image categorization.” In First Workshop on Fine-Grained Visual Categorization (FGVC), CVPR, 2011.
[29] J. Hu, L. Shen, and G. Sun. “Squeeze-and-excitation networks.” arXiv:1709.01507, 2017.
[30] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, “ImageNet: A Large-Scale Hierarchical Image Database.” In CVPR, 2009.
[31] S. Ren, K. He, R. Girshick, and J. Sun. “Faster R-CNN: Towards real-time object detection with region proposal networks.”, In NIPS, 2015.
[32] T.-Y. Lin, P. Goyal, R. Girshick, K. He, and P. Dollar. “Focal loss for dense object detection.” In ICCV, 2017.
[33] M. Everingham, L. Van Gool, C. K. I. Williams, J. Winn, and A. Zisserman. “The PASCAL Visual Object Classes Challenge: A Retrospective.” International Journal of Computer Vision, 111(1), 98-136, 2015.
Advisor: Guo-Chen Shih (施國琛)    Approval Date: 2019-7-15
