References
[1] F. Rottensteiner et al., "The ISPRS benchmark on urban object classification and 3D building reconstruction," ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences, vol. I-3, pp. 293-298, 2012.
[2] M. Volpi and V. Ferrari, "Semantic segmentation of urban scenes by learning local class interactions," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2015, pp. 1-9.
[3] M. Kampffmeyer, A.-B. Salberg, and R. Jenssen, "Semantic segmentation of small objects and modeling of uncertainty in urban remote sensing images using deep convolutional neural networks," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2016, pp. 1-9.
[4] R. Kemker, C. Salvaggio, and C. Kanan, "Algorithms for semantic segmentation of multispectral remote sensing imagery using deep learning," ISPRS Journal of Photogrammetry and Remote Sensing, vol. 145, pp. 60-77, 2018.
[5] Z. Deng, H. Sun, S. Zhou, J. Zhao, L. Lei, and H. Zou, "Multi-scale object detection in remote sensing imagery with convolutional neural networks," ISPRS Journal of Photogrammetry and Remote Sensing, vol. 145, pp. 3-22, 2018.
[6] G.-S. Xia et al., "DOTA: A large-scale dataset for object detection in aerial images," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 3974-3983.
[7] Z. Zheng, Y. Zhong, J. Wang, and A. Ma, "Foreground-aware relation network for geospatial object segmentation in high spatial resolution remote sensing imagery," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020, pp. 4096-4105.
[8] S. Waqas Zamir et al., "iSAID: A large-scale dataset for instance segmentation in aerial images," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2019, pp. 28-37.
[9] L.-C. Chen, Y. Yang, J. Wang, W. Xu, and A. L. Yuille, "Attention to scale: Scale-aware semantic image segmentation," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 3640-3649.
[10] F. Yu and V. Koltun, "Multi-scale context aggregation by dilated convolutions," arXiv preprint arXiv:1511.07122, 2015.
[11] H. Zhao, J. Shi, X. Qi, X. Wang, and J. Jia, "Pyramid scene parsing network," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 2881-2890.
[12] T.-Y. Lin, P. Dollár, R. Girshick, K. He, B. Hariharan, and S. Belongie, "Feature pyramid networks for object detection," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 2117-2125.
[13] M.-H. Guo, C.-Z. Lu, Q. Hou, Z. Liu, M.-M. Cheng, and S.-M. Hu, "SegNeXt: Rethinking convolutional attention design for semantic segmentation," Advances in Neural Information Processing Systems, vol. 35, pp. 1140-1156, 2022.
[14] T.-Y. Lin, P. Goyal, R. Girshick, K. He, and P. Dollár, "Focal loss for dense object detection," in Proceedings of the IEEE International Conference on Computer Vision, 2017, pp. 2980-2988.
[15] J. Long, E. Shelhamer, and T. Darrell, "Fully convolutional networks for semantic segmentation," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 3431-3440.
[16] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 770-778.
[17] S. Xie, R. Girshick, P. Dollár, Z. Tu, and K. He, "Aggregated residual transformations for deep neural networks," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 1492-1500.
[18] J. Wang et al., "Deep high-resolution representation learning for visual recognition," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 43, no. 10, pp. 3349-3364, 2020.
[19] S. Zheng et al., "Rethinking semantic segmentation from a sequence-to-sequence perspective with transformers," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 6881-6890.
[20] E. Xie, W. Wang, Z. Yu, A. Anandkumar, J. M. Alvarez, and P. Luo, "SegFormer: Simple and efficient design for semantic segmentation with transformers," Advances in Neural Information Processing Systems, vol. 34, pp. 12077-12090, 2021.
[21] J. Fu et al., "Dual attention network for scene segmentation," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019, pp. 3146-3154.
[22] C. Peng, X. Zhang, G. Yu, G. Luo, and J. Sun, "Large kernel matters--improve semantic segmentation by global convolutional network," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 4353-4361.
[23] H. Ding, X. Jiang, A. Q. Liu, N. M. Thalmann, and G. Wang, "Boundary-aware feature propagation for scene segmentation," in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2019, pp. 6819-6829.
[24] A. Dosovitskiy et al., "An image is worth 16x16 words: Transformers for image recognition at scale," arXiv preprint arXiv:2010.11929, 2020.
[25] M.-H. Guo, C.-Z. Lu, Z.-N. Liu, M.-M. Cheng, and S.-M. Hu, "Visual attention network," Computational Visual Media, vol. 9, no. 4, pp. 733-752, 2023.
[26] Z. Liu, H. Mao, C.-Y. Wu, C. Feichtenhofer, T. Darrell, and S. Xie, "A ConvNet for the 2020s," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 11976-11986.
[27] C. Szegedy et al., "Going deeper with convolutions," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 1-9.
[28] Z. Geng, M.-H. Guo, H. Chen, X. Li, K. Wei, and Z. Lin, "Is attention better than matrix decomposition?" arXiv preprint arXiv:2109.04553, 2021.
[29] A. Kirillov, R. Girshick, K. He, and P. Dollár, "Panoptic feature pyramid networks," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019, pp. 6399-6408.
[30] L.-C. Chen, Y. Zhu, G. Papandreou, F. Schroff, and H. Adam, "Encoder-decoder with atrous separable convolution for semantic image segmentation," in Proceedings of the European Conference on Computer Vision (ECCV), 2018, pp. 801-818.
[31] M. Yin et al., "Disentangled non-local neural networks," in Computer Vision–ECCV 2020, Springer, 2020, pp. 191-207.
[32] X. Li, Z. Zhong, J. Wu, Y. Yang, Z. Lin, and H. Liu, "Expectation-maximization attention networks for semantic segmentation," in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2019, pp. 9167-9176.
[33] Z. Huang, X. Wang, L. Huang, C. Huang, Y. Wei, and W. Liu, "CCNet: Criss-cross attention for semantic segmentation," in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2019, pp. 603-612.
[34] T. Xiao, Y. Liu, B. Zhou, Y. Jiang, and J. Sun, "Unified perceptual parsing for scene understanding," in Proceedings of the European Conference on Computer Vision (ECCV), 2018, pp. 418-434.
[35] Y. Cao, J. Xu, S. Lin, F. Wei, and H. Hu, "GCNet: Non-local networks meet squeeze-excitation networks and beyond," in Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops, 2019.