Master's/Doctoral Thesis 107522614: Detailed Record




Name  Phanuvich Hirunsirisombut (席朗斯)    Department  Computer Science and Information Engineering
Thesis Title  Attention Based Semantic Segmentation for Object Localization
(基於注意力之用於物件定位的語義分割方法)
Related Theses
★ Automatic Door Detection Based on Graph Convolutional Networks  ★ Classifying Dynamic Patterns in Images with a Deep Learning Method Based on Multimodal Spatio-temporal Modeling
★ Learning Content Generation and Summarization Based on Vocational-Skill and Educational Videos
  1. This electronic thesis has been approved for immediate open access.
  2. The open-access electronic full text is licensed only for personal, non-profit retrieval, reading, and printing for the purpose of academic research.
  3. Please observe the relevant provisions of the Copyright Act of the Republic of China; do not reproduce, distribute, adapt, repost, or broadcast the work without authorization.

Abstract (Chinese)  Many researchers today apply deep learning to problems in computer vision, and semantic segmentation, which labels an image by category at the pixel level, is among the most popular of these problems. U-Net, proposed in 2015 for biomedical image segmentation, is one well-known method. However, U-Net performs poorly when segmenting small objects. In addition, prior work raised an issue with using ReLU when enhancing semantic segmentation with self-attention, because this activation function maps every negative value to zero.
To address these problems, a dilated-attention-based semantic segmentation method for object localization was proposed. The method first uses a standard U-Net to extract features; to prevent relevant information from being lost in the deeper layers, attention modules are placed on the skip connections. Each attention module replaces the ordinary convolution with a dilated convolution to enlarge the receptive field and pass shallow-layer features to the deeper layers. In real environments, objects such as cars may overlap because they are too close together; segmenting such objects is known as the "merging regions" problem. We apply watershed-transform post-processing to separate the two objects. Experimental results show that, compared with the baseline method combined with several different loss functions, this method achieves a better Dice score coefficient on the semantic segmentation task.
Abstract (English)  Nowadays, many studies address problems in computer vision with deep learning algorithms. Semantic segmentation, which labels every pixel in an image with the category it belongs to, is one of the most popular of these problems. A famous approach called "U-Net" was introduced in 2015 for biomedical image segmentation. Unfortunately, U-Net suffers from small receptive fields, which degrades its results. Moreover, previous work on enhancing semantic segmentation with self-attention faces a problem that stems from the Rectified Linear Unit (ReLU): this activation function maps every negative value to zero. To address these problems, a dilated-attention-based semantic segmentation method for object localization is proposed. First, this work uses a standard U-Net as the main network to extract features from the input. Attention modules are then placed on each skip connection of the U-Net to prevent the loss of relevant information as the network goes deeper. Moreover, each attention module uses atrous (dilated) convolution instead of ordinary convolution to enlarge its receptive field, so that it can collect features from coarse layers and pass them to fine layers. In real scenarios, objects such as cars may sit very close to one another; the difficulty that arises when segmenting two or more overlapping objects is called the "merging regions" problem. To solve it, the watershed transform is applied as a post-processing step to separate the two objects. Experimental results show that, measured by the Dice score coefficient (DSC), the proposed method outperforms the baseline model combined with several well-known loss functions on the semantic segmentation task.
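The abstract's central idea, an attention gate on a U-Net skip connection whose internal convolution is dilated to enlarge the receptive field, can be illustrated with a toy one-dimensional sketch. Everything below is illustrative only: the function names (`dilated_conv1d`, `attention_gate`), the single-channel additive-gate form, and the shared dilation rate are simplifying assumptions, not the thesis's actual architecture.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def dilated_conv1d(x, w, rate):
    """Naive 1-D atrous (dilated) convolution with 'same' padding.
    With dilation rate r, a kernel of size k spans r*(k-1)+1 inputs,
    enlarging the receptive field without adding parameters."""
    k = len(w)
    span = rate * (k - 1)
    pad = span // 2
    xp = np.pad(x, (pad, span - pad))
    return np.array([sum(w[j] * xp[i + j * rate] for j in range(k))
                     for i in range(len(x))])

def attention_gate(skip, gating, w_skip, w_gate, rate=2):
    """Additive attention gate on a U-Net skip connection (1-D toy).
    The gating signal from the coarser layer and the skip feature are
    each filtered (here with a dilated kernel), summed, squashed to
    (0, 1), and used to reweight the skip feature before it is passed
    to the decoder."""
    a = dilated_conv1d(skip, w_skip, rate) + dilated_conv1d(gating, w_gate, rate)
    alpha = sigmoid(a)          # attention coefficients in (0, 1)
    return skip * alpha
```

With `rate=1` the kernel behaves like an ordinary convolution; with `rate=2` the same three taps see a five-sample window, which is the receptive-field enlargement the abstract attributes to atrous convolution.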
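The watershed post-processing used to split "merging regions" can be sketched as marker-based flooding: pixels are absorbed into labelled basins in order of increasing elevation, so touching objects are divided along the ridge between their markers. This toy implementation is an assumption-laden sketch, not the thesis's pipeline; the function name and 4-connectivity are choices made here, and a real pipeline would typically call a library routine such as `skimage.segmentation.watershed` on the negated distance transform of the merged mask.

```python
import heapq
import numpy as np

def marker_watershed(elevation, markers):
    """Minimal marker-based watershed flood (toy sketch).
    Starting from the labelled marker pixels, neighbours are flooded
    in order of increasing elevation; each unlabeled pixel inherits
    the label of the basin that reaches it first."""
    labels = markers.copy()
    h, w = elevation.shape
    heap = []
    for y, x in zip(*np.nonzero(markers)):
        heapq.heappush(heap, (elevation[y, x], y, x))
    while heap:
        _, y, x = heapq.heappop(heap)
        for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w and labels[ny, nx] == 0:
                labels[ny, nx] = labels[y, x]   # claim pixel for this basin
                heapq.heappush(heap, (elevation[ny, nx], ny, nx))
    return labels
```

Flooding two markers placed on either side of a high-elevation ridge yields two separate labelled regions, which is exactly how one merged car blob would be split into two instances.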
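The Dice score coefficient (DSC) used as the evaluation measure has the standard definition DSC = 2|A ∩ B| / (|A| + |B|) for predicted mask A and ground-truth mask B. A minimal NumPy version might look like the following; the `eps` smoothing term is a common convention for empty masks, not a detail taken from the thesis.

```python
import numpy as np

def dice_score(pred, target, eps=1e-7):
    """Dice score coefficient between two binary masks.
    Returns a value in [0, 1]; 1.0 means a perfect overlap."""
    pred = pred.astype(bool)
    target = target.astype(bool)
    inter = np.logical_and(pred, target).sum()
    return (2.0 * inter + eps) / (pred.sum() + target.sum() + eps)
```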
Keywords (Chinese) ★ Semantic Segmentation (語意分割)
★ Deep Learning (深度學習)
★ Dilated Convolution (擴張捲積)
★ Attention Network (注意力網路)
Keywords (English) ★ Semantic Segmentation
★ Deep Learning
★ Dilated Convolution
★ Attention Network
Table of Contents

Abstract V
摘要 VIII
Acknowledgement IX
List Of Figures XI
List Of Tables XI
Chapter 1 Introduction 1
1.1 Background 1
1.2 Dissertation Organization 4
Chapter 2 Related Works 5
2.1 Segmentation Task 5
2.2 Attention Networks 14
2.3 Differential Of Activation Functions 17
2.4 Watershed Transform For Image Segmentation 20
Chapter 3 Methodology 24
3.1 Proposed Model 24
3.2 Attention Module For Semantic Segmentation 26
3.3 Watershed Transform For Post-Processing 28
Chapter 4 Experimental Result 29
4.1 Dataset And Annotation 29
4.2 Experimental Setup 30
4.3 Comparison To Baseline Model 30
Chapter 5 Discussion 36
Chapter 6 Conclusion And Future Works 37
6.1 Conclusion 37
6.2 Future Works 38
Reference 39
Advisor  Prof. Timothy K. Shih (施國琛)    Approval Date  2020-07-10
