A Deep-Learning-Based Semantic Segmentation Method That Combines Global and Local Information and Repairs Segmentation Details

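Name

Yi-Cheng Chiu (邱亦成)

Department

Department of Computer Science and Information Engineering

Thesis Title

A Deep-Learning-Based Semantic Segmentation Method That Combines Global and Local Information and Repairs Segmentation Details
(Global and Local context and Coarse to Fine Semantic Segmentation)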

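This electronic thesis is authorized for immediate open access.
The open-access electronic full text is licensed only for users' personal, non-profit searching, reading, and printing for the purpose of academic research.
Please comply with the Copyright Act of the Republic of China. Do not reproduce, distribute, adapt, repost, or broadcast the work without authorization, to avoid breaking the law.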

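Abstract (Chinese version)

Image semantic segmentation is a very popular topic in computer vision and artificial intelligence. Producing training datasets for image segmentation is very time- and labor-intensive, so one goal of this thesis is to train highly accurate segmentation models that reduce the cost of producing such data. In recent deep-learning-based semantic segmentation research, downsampling is commonly applied so that models can run in real time on the road and fit within the memory limits of the GPU card, which causes details in the scene to be lost. In this thesis we examine the methods proposed by several well-known semantic segmentation architectures, from autoencoders to attention models, and analyze their contributions, strengths, and weaknesses. We also modify their network architectures and propose the JCF architecture, which consists of two modules: one module extracts detail information from the high-resolution image, and a final channel-weighting step combines the two feature maps to make the original segmentation result finer.
The network architecture we ultimately propose, GLNet, combines global attention information with local multi-scale context information to help the model understand the relationships between objects across different scenes and reduce classification errors. Through a channel-weighting module, it also brings in information from earlier layers of the convolutional neural network to repair the boundaries and details of segmented objects. Compared with several well-known current methods, the proposed architecture yields improved results.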

Abstract (English version)

Image semantic segmentation is a widely studied problem in computer vision and artificial intelligence. Ground truth for image segmentation is hard to produce and is time- and resource-intensive, so one goal of this thesis is to produce high-precision segmentation results that reduce the cost of generating ground-truth data. Recent deep-learning-based semantic segmentation research, in order to run in real time and stay within the memory capacity of the GPU card, commonly reduces image resolution through downsampling, which loses detail in the scene. In this thesis, we explore well-known semantic segmentation architectures, from autoencoders to attention models, and analyze their contributions, advantages, and disadvantages. In addition, we modify their network architectures and propose a JCF architecture consisting of two modules. One module obtains detailed information from high-resolution images, and the two feature maps are then combined using channel weights to refine the segmentation result from coarse to fine.
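The channel-weighted fusion described above can be illustrated with a minimal sketch. It assumes, hypothetically, that each channel weight is a sigmoid of the globally pooled responses of both feature maps and that the fused map is the coarse map plus the weighted fine map. The record does not specify the actual JCF layers, and every function name here is illustrative only.

```python
import math


def global_average_pool(feature_map):
    """Reduce a C x H x W nested list to one mean value per channel."""
    return [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature_map
    ]


def sigmoid(x):
    """Squash a real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def channel_weighted_fusion(coarse, fine):
    """Fuse two C x H x W feature maps with one weight per channel.

    Assumption: the weight is a sigmoid over the summed pooled responses.
    A trained module would place learned layers before the sigmoid.
    """
    pooled_coarse = global_average_pool(coarse)
    pooled_fine = global_average_pool(fine)
    weights = [sigmoid(c + f) for c, f in zip(pooled_coarse, pooled_fine)]
    # Each fine channel is scaled by its weight and added to the coarse channel.
    fused = [
        [
            [w * f_val + c_val for f_val, c_val in zip(f_row, c_row)]
            for f_row, c_row in zip(f_channel, c_channel)
        ]
        for w, f_channel, c_channel in zip(weights, fine, coarse)
    ]
    return fused, weights
```

The additive form keeps the coarse prediction intact and lets the per-channel weight decide how much fine detail is mixed back in, which is one plausible reading of "coarse to fine" refinement.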
Our proposed network architecture combines global spatial information with local multi-scale context information. This helps the model understand the relationships between objects across various scenes, reduces false alarms, and repairs the boundaries and details of segmented objects through channel attention modules. In our experiments, the proposed architecture improves on the state-of-the-art methods it is compared against.
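One common way to gather local multi-scale context is to apply the same filter at several dilation rates in parallel; the thesis outline lists dilated convolution and multi-scale context information among its methods. The following 1-D sketch assumes that approach purely for illustration. The function names, the zero padding, and the default rates are assumptions and are not taken from the thesis.

```python
def dilated_conv1d(signal, kernel, rate):
    """Apply an odd-length kernel to a 1-D signal with the given dilation rate.

    Positions that fall outside the signal are treated as zero, so the
    output has the same length as the input.
    """
    half = len(kernel) // 2
    output = []
    for i in range(len(signal)):
        acc = 0.0
        for k, weight in enumerate(kernel):
            j = i + (k - half) * rate  # taps are spaced `rate` samples apart
            if 0 <= j < len(signal):
                acc += weight * signal[j]
        output.append(acc)
    return output


def multi_scale_context(signal, kernel, rates=(1, 2, 4)):
    """Return one response per dilation rate, as parallel branches."""
    return [dilated_conv1d(signal, kernel, rate) for rate in rates]
```

With a unit impulse in the middle of a length-7 signal and an all-ones kernel of length 3, the rate-1 branch responds at the three adjacent positions, while the rate-2 branch responds at positions two steps apart. Larger rates therefore widen the receptive field without downsampling the signal.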

Keywords

★ deep learning
★ semantic segmentation
★ convolutional neural network

Table of Contents

1 Introduction
2 Related work
2.1 Convolutional Neural Network
2.1.1 AlexNet
2.1.2 VGGNet
2.1.3 GoogLeNet
2.1.4 ResNet
2.2 Semantic Segmentation with Deep Learning
2.2.1 FCN
2.2.2 U-Net
2.2.3 SegNet
2.2.4 RefineNet
2.2.5 PSPNet
2.2.6 DeepLab
2.3 Attention-Based Neural Network
2.3.1 SENet
2.3.2 Non-local Networks
2.3.3 DANet
3 Method
3.1 Development Environment
3.2 Data Collection Tool
3.3 Fully Convolutional Network for Semantic Segmentation
3.4 Dilated Convolution
3.5 Multi-Scale Context Information
3.6 Joint Coarse-and-Fine Semantic Segmentation
3.7 The Proposed Architecture
3.7.1 Combine Global-and-Local Context Information
3.7.2 Channel Attention for Coarse-and-Fine Fusion
4 Experiment
4.1 Experimental Setup
4.2 Experimental Results
5 Conclusion
6 Reference

Advisor

Timothy K. Shih (施國琛)

Approval Date

2019-7-15
