Thesis Record 111522053 — Detailed Information




Author: Shao-Chieh Cheng (鄭少捷)    Department: Computer Science and Information Engineering
Thesis Title: Landslide Semantic Segmentation of Remote Sensing Images using UNetFormer, Contrastive Learning and GAN Discriminator
Files: [EndNote RIS format]   [BibTeX format]   View the thesis in the system (available after 2029-07-02)
Abstract (Chinese): In recent years, development and expansion in Taiwan's mountainous areas have made slope-safety assessment increasingly important, since landslides occur readily after heavy-rainfall events. Disaster prevention and mitigation have long been government priorities, and since the passage of the Geology Act, assessing landslide potential on the slopes surrounding Taiwan's cities and delineating geologically sensitive areas have become focal concerns. Compiling a landslide inventory covering the whole island of Taiwan is therefore essential. Before deep learning methods became prevalent, landslide regions in remote sensing images were delineated one by one entirely by hand, which required enormous amounts of time and labor. As technology has advanced, deep learning has achieved great success in many fields; with it, software can first delineate most landslide regions, after which domain experts review the results, greatly reducing the time required.
Remote sensing images offer high spatial resolution and rich land-cover information, but Taiwan's complex mountainous terrain and highly variable environment make it difficult for traditional semantic segmentation methods to achieve satisfactory results. To address this, this thesis takes UNetFormer as the backbone and proposes a hybrid model that combines UNetFormer, contrastive learning, and a GAN discriminator, aiming to improve the realism of the segmentation results, the boundary detail, and the model's generalization ability.
In our experiments, the proposed architecture achieved favorable results compared with popular remote sensing semantic segmentation models and with the original UNetFormer. Ablation studies confirm that the two proposed components, contrastive learning and the GAN discriminator, effectively improve the model's segmentation performance.
Abstract (English): In recent years, as development in Taiwan's mountainous regions has expanded, evaluating slope safety has become crucial, especially after heavy rains, which increase landslide risk. Following the Geology Act, assessing landslide potential near urban areas is now a focus, and creating a comprehensive landslide catalog for Taiwan is essential. Before deep learning, manual annotation of landslides in remote sensing images was time-consuming and labor-intensive. With technological advancements, deep learning has greatly improved efficiency in many fields. Using software for preliminary annotations, followed by expert review, saves significant time.
This thesis presents a hybrid model built on the UNetFormer architecture, incorporating contrastive learning and a GAN discriminator, designed to improve the effectiveness of semantic segmentation in complex terrain. Our multi-stage training framework enhances segmentation accuracy, boundary precision, and model adaptability. Experimental results show our model's superior segmentation capability compared to popular methods and the original UNetFormer.
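Section 3.3 of the table of contents lists the training objective as a primary loss plus a contrastive loss (L1) and a GAN discriminator loss, combined into a total loss. A minimal NumPy sketch of how such a weighted combination is typically assembled is shown below; the function names and the weights `lam_con` / `lam_gan` are illustrative assumptions, not values taken from the thesis:

```python
import numpy as np

def primary_loss(pred, target, eps=1e-7):
    # Per-pixel binary cross-entropy between predicted probabilities and labels.
    pred = np.clip(pred, eps, 1 - eps)
    return float(np.mean(-(target * np.log(pred) + (1 - target) * np.log(1 - pred))))

def contrastive_l1_loss(feat_a, feat_b):
    # Mean L1 distance between two feature embeddings (cf. Section 3.3.2).
    return float(np.mean(np.abs(feat_a - feat_b)))

def gan_discriminator_loss(d_real, d_fake, eps=1e-7):
    # Standard discriminator loss on real/fake score maps.
    d_real = np.clip(d_real, eps, 1 - eps)
    d_fake = np.clip(d_fake, eps, 1 - eps)
    return float(np.mean(-np.log(d_real) - np.log(1 - d_fake)))

def total_loss(pred, target, feat_a, feat_b, d_real, d_fake,
               lam_con=0.5, lam_gan=0.1):
    # Weighted sum of the three terms (cf. Section 3.3.4);
    # lam_con and lam_gan are illustrative hyperparameters.
    return (primary_loss(pred, target)
            + lam_con * contrastive_l1_loss(feat_a, feat_b)
            + lam_gan * gan_discriminator_loss(d_real, d_fake))
```

In practice each term would be computed from network outputs during training; this sketch only shows how the terms compose.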
Keywords (Chinese): ★ Remote sensing images
★ Landslide semantic segmentation
★ Contrastive learning
★ Generative adversarial networks
Keywords (English): ★ Remote sensing images
★ semantic segmentation
★ contrastive learning
★ generative adversarial networks
Table of Contents: Abstract (Chinese) i
Abstract (English) ii
Table of Contents iii
List of Figures v
List of Tables vi
Chapter 1 Introduction 1
1.1 Research Background and Motivation 1
1.2 Thesis Organization 2
Chapter 2 Related Work 4
2.1 Taiwan Island-wide Landslide Dataset 4
2.2 Landslide Monitoring and Remote Sensing Techniques 5
2.3 CNN-based Semantic Segmentation Methods 6
2.4 Global Context Modeling 7
2.5 Overview of the Transformer Architecture 9
2.5.1 Self-Attention Mechanism 9
2.5.2 Multi-Head Attention 9
2.5.3 Positional Encoding 10
2.6 Transformer-based Semantic Segmentation Methods 10
Chapter 3 Methodology 13
3.1 Dataset 13
3.2 Model Architecture 13
3.2.1 CNN-based Encoder 14
3.2.2 Transformer-based Decoder 16
3.2.3 Global-Local Transformer Block (GLTB) 16
3.2.4 Feature Refinement Head (FRH) 21
3.2.5 Contrastive Learning 22
3.2.6 GAN Discriminator Network 23
3.3 Loss Functions 24
3.3.1 Primary Loss 24
3.3.2 Contrastive Loss (L1 Loss) 25
3.3.3 GAN Discriminator Loss 25
3.3.4 Total Loss 26
Chapter 4 Experimental Results 27
4.1 Hardware and Environment Setup 27
4.2 Implementation Details 27
4.3 Evaluation Metrics 28
4.3.1 Recall and Dice Coefficient 28
4.3.2 IoU (Intersection over Union) 29
4.4 Comparative Results of the Full Model 29
4.4.1 Performance Evaluation on the Taiwan Island-wide Landslide Dataset 30
4.4.2 Visualized Landslide Segmentation Results for Taiwan 31
4.5 Ablation Experiments 33
4.5.1 Quantitative Results of Component Ablation Experiments 34
4.5.2 Visualized Results of Component Ablation Experiments 35
Chapter 5 Conclusions and Future Research Directions 38
References 39
Appendix 1: Layer-by-layer Parameters of the Model Encoder 43
Appendix 2: Layer-by-layer Parameters of the Model Decoder 45
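The evaluation metrics listed in Section 4.3 (Recall, Dice coefficient, and IoU) have standard definitions for binary segmentation masks. A minimal NumPy sketch, assuming boolean landslide/background masks (an illustrative implementation, not code from the thesis):

```python
import numpy as np

def recall(pred, gt):
    # Recall = TP / (TP + FN): fraction of true landslide pixels recovered.
    tp = np.logical_and(pred, gt).sum()
    fn = np.logical_and(~pred, gt).sum()
    return tp / (tp + fn) if (tp + fn) else 0.0

def dice(pred, gt):
    # Dice = 2|A ∩ B| / (|A| + |B|): overlap weighted toward agreement.
    inter = np.logical_and(pred, gt).sum()
    denom = pred.sum() + gt.sum()
    return 2 * inter / denom if denom else 1.0

def iou(pred, gt):
    # IoU = |A ∩ B| / |A ∪ B| (Intersection over Union, Section 4.3.2).
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return inter / union if union else 1.0
```

The empty-mask guards return the conventional values (perfect score when both masks are empty, zero recall when there are no positives to recover).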
References: [1] L. Wang et al., "UNetFormer: A UNet-like transformer for efficient semantic segmentation of remote sensing urban scene imagery," ISPRS Journal of Photogrammetry and Remote Sensing, vol. 190, pp. 196-214, 2022.
[2] Y. Bengio, "Deep learning of representations: Looking forward," in International conference on statistical language and speech processing, 2013: Springer, pp. 1-37.
[3] I. Goodfellow et al., "Generative adversarial nets," Advances in neural information processing systems, vol. 27, 2014.
[4] J. Long, E. Shelhamer, and T. Darrell, "Fully convolutional networks for semantic segmentation," in Proceedings of the IEEE conference on computer vision and pattern recognition, 2015, pp. 3431-3440.
[5] R. Kemker, C. Salvaggio, and C. Kanan, "Algorithms for semantic segmentation of multispectral remote sensing imagery using deep learning," ISPRS Journal of Photogrammetry and Remote Sensing, vol. 145, pp. 60-77, 2018.
[6] I. Kotaridis and M. Lazaridou, "Remote sensing image segmentation advances: A meta-analysis," ISPRS Journal of Photogrammetry and Remote Sensing, vol. 173, pp. 309-322, 2021.
[7] X.-Y. Tong et al., "Land-cover classification with high-resolution remote sensing images using transferable deep models," Remote Sensing of Environment, vol. 237, p. 111322, 2020.
[8] W. Zhao and S. Du, "Learning multiscale and deep representations for classifying remotely sensed imagery," ISPRS Journal of Photogrammetry and Remote Sensing, vol. 113, pp. 155-165, 2016.
[9] X. X. Zhu et al., "Deep learning in remote sensing: A comprehensive review and list of resources," IEEE Geoscience and Remote Sensing Magazine, vol. 5, no. 4, pp. 8-36, 2017.
[10] O. Ronneberger, P. Fischer, and T. Brox, "U-net: Convolutional networks for biomedical image segmentation," in Medical image computing and computer-assisted intervention–MICCAI 2015: 18th international conference, Munich, Germany, October 5-9, 2015, proceedings, part III 18, 2015: Springer, pp. 234-241.
[11] V. Badrinarayanan, A. Kendall, and R. Cipolla, "SegNet: A deep convolutional encoder-decoder architecture for image segmentation," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39, no. 12, pp. 2481-2495, 2017.
[12] F. I. Diakogiannis, F. Waldner, P. Caccetta, and C. Wu, "ResUNet-a: A deep learning framework for semantic segmentation of remotely sensed data," ISPRS Journal of Photogrammetry and Remote Sensing, vol. 162, pp. 94-114, 2020.
[13] K. Yue, L. Yang, R. Li, W. Hu, F. Zhang, and W. Li, "TreeUNet: Adaptive tree convolutional neural networks for subdecimeter aerial image segmentation," ISPRS Journal of Photogrammetry and Remote Sensing, vol. 156, pp. 1-13, 2019.
[14] Z. Zhou, M. M. Rahman Siddiquee, N. Tajbakhsh, and J. Liang, "Unet++: A nested u-net architecture for medical image segmentation," in Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support: 4th International Workshop, DLMIA 2018, and 8th International Workshop, ML-CDS 2018, Held in Conjunction with MICCAI 2018, Granada, Spain, September 20, 2018, Proceedings 4, 2018: Springer, pp. 3-11.
[15] Q. Liu, M. Kampffmeyer, R. Jenssen, and A.-B. Salberg, "Dense dilated convolutions' merging network for land cover classification," IEEE Transactions on Geoscience and Remote Sensing, vol. 58, no. 9, pp. 6309-6320, 2020.
[16] W. Zhao, S. Du, Q. Wang, and W. J. Emery, "Contextually guided very-high-resolution imagery classification with semantic segments," ISPRS Journal of Photogrammetry and Remote Sensing, vol. 132, pp. 48-60, 2017.
[17] D. Marmanis, K. Schindler, J. D. Wegner, S. Galliani, M. Datcu, and U. Stilla, "Classification with an edge: Improving semantic image segmentation with boundary detection," ISPRS Journal of Photogrammetry and Remote Sensing, vol. 135, pp. 158-172, 2018.
[18] K. Nogueira, M. Dalla Mura, J. Chanussot, W. R. Schwartz, and J. A. Dos Santos, "Dynamic multicontext segmentation of remote sensing images based on convolutional networks," IEEE Transactions on Geoscience and Remote Sensing, vol. 57, no. 10, pp. 7503-7520, 2019.
[19] J. Sherrah, "Fully convolutional networks for dense semantic labelling of high-resolution aerial imagery," arXiv preprint arXiv:1606.02585, 2016.
[20] L. Wang, R. Li, C. Duan, C. Zhang, X. Meng, and S. Fang, "A novel transformer based semantic segmentation scheme for fine-resolution remote sensing images," IEEE Geoscience and Remote Sensing Letters, vol. 19, pp. 1-5, 2022.
[21] J. Fu et al., "Dual attention network for scene segmentation," in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2019, pp. 3146-3154.
[22] Z. Huang, X. Wang, L. Huang, C. Huang, Y. Wei, and W. Liu, "Ccnet: Criss-cross attention for semantic segmentation," in Proceedings of the IEEE/CVF international conference on computer vision, 2019, pp. 603-612.
[23] Y. Yuan, X. Chen, and J. Wang, "Object-contextual representations for semantic segmentation," in Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part VI 16, 2020: Springer, pp. 173-190.
[24] X. Yang et al., "An attention-fused network for semantic segmentation of very-high-resolution remote sensing imagery," ISPRS Journal of Photogrammetry and Remote Sensing, vol. 177, pp. 238-262, 2021.
[25] H. Li, K. Qiu, L. Chen, X. Mei, L. Hong, and C. Tao, "SCAttNet: Semantic segmentation network with spatial and channel attention mechanism for high-resolution remote sensing images," IEEE Geoscience and Remote Sensing Letters, vol. 18, no. 5, pp. 905-909, 2020.
[26] A. Vaswani et al., "Attention is all you need," Advances in neural information processing systems, vol. 30, 2017.
[27] N. He, X. Qu, Z. Yang, L. Xu, and F. Gurkalo, "Disaster Mechanism and Evolution Characteristics of Landslide–Debris-Flow Geohazard Chain Due to Strong Earthquake—A Case Study of Niumian Gully," Water, vol. 15, no. 6, p. 1218, 2023.
[28] A. Dosovitskiy et al., "An image is worth 16x16 words: Transformers for image recognition at scale," arXiv preprint arXiv:2010.11929, 2020.
[29] X. Zhu, W. Su, L. Lu, B. Li, X. Wang, and J. Dai, "Deformable detr: Deformable transformers for end-to-end object detection," arXiv preprint arXiv:2010.04159, 2020.
[30] S. Zheng et al., "Rethinking semantic segmentation from a sequence-to-sequence perspective with transformers," in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2021, pp. 6881-6890.
[31] Y. Bazi, L. Bashmal, M. M. A. Rahhal, R. A. Dayil, and N. A. Ajlan, "Vision transformers for remote sensing image classification," Remote Sensing, vol. 13, no. 3, p. 516, 2021.
[32] D. Hong et al., "SpectralFormer: Rethinking hyperspectral image classification with transformers," IEEE Transactions on Geoscience and Remote Sensing, vol. 60, pp. 1-15, 2021.
[33] R. Li, S. Zheng, C. Duan, J. Su, and C. Zhang, "Multistage attention ResU-Net for semantic segmentation of fine-resolution remote sensing images," IEEE Geoscience and Remote Sensing Letters, vol. 19, pp. 1-5, 2021.
[34] H. Chen, Z. Qi, and Z. Shi, "Remote sensing image change detection with transformers," IEEE Transactions on Geoscience and Remote Sensing, vol. 60, pp. 1-14, 2021.
[35] L. Wang, R. Li, D. Wang, C. Duan, T. Wang, and X. Meng, "Transformer meets convolution: A bilateral awareness network for semantic segmentation of very fine resolution urban scene images," Remote Sensing, vol. 13, no. 16, p. 3065, 2021.
[36] E. Xie, W. Wang, Z. Yu, A. Anandkumar, J. M. Alvarez, and P. Luo, "SegFormer: Simple and efficient design for semantic segmentation with transformers," Advances in neural information processing systems, vol. 34, pp. 12077-12090, 2021.
[37] H. Cao et al., "Swin-unet: Unet-like pure transformer for medical image segmentation," in European conference on computer vision, 2022: Springer, pp. 205-218.
[38] J. Chen et al., "Transunet: Transformers make strong encoders for medical image segmentation," arXiv preprint arXiv:2102.04306, 2021.
[39] Z. Liu et al., "Swin transformer: Hierarchical vision transformer using shifted windows," in Proceedings of the IEEE/CVF international conference on computer vision, 2021, pp. 10012-10022.
[40] T. Panboonyuen, K. Jitkajornwanich, S. Lawawirojwong, P. Srestasathiern, and P. Vateekul, "Transformer-based decoder designs for semantic segmentation on remotely sensed images," Remote Sensing, vol. 13, no. 24, p. 5100, 2021.
[41] J. Naidoo, N. Bates, T. Gee, and M. Nejati, "Pallet Detection from Synthetic Data Using Game Engines," arXiv preprint arXiv:2304.03602, 2023.
[42] H. Zhao, J. Shi, X. Qi, X. Wang, and J. Jia, "Pyramid scene parsing network," in Proceedings of the IEEE conference on computer vision and pattern recognition, 2017, pp. 2881-2890.
[43] X. Tang, Z. Tu, Y. Wang, M. Liu, D. Li, and X. Fan, "Automatic detection of coseismic landslides using a new transformer method," Remote Sensing, vol. 14, no. 12, p. 2884, 2022.
[44] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in Proceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 770-778.
[45] X. Wang, R. Girshick, A. Gupta, and K. He, "Non-local neural networks," in Proceedings of the IEEE conference on computer vision and pattern recognition, 2018, pp. 7794-7803.
[46] S. Ioffe and C. Szegedy, "Batch normalization: Accelerating deep network training by reducing internal covariate shift," in International Conference on Machine Learning, 2015: PMLR, pp. 448-456.
[47] L.-C. Chen, Y. Zhu, G. Papandreou, F. Schroff, and H. Adam, "Encoder-decoder with atrous separable convolution for semantic image segmentation," in Proceedings of the European conference on computer vision (ECCV), 2018, pp. 801-818.
Advisor: Hsu-Yung Cheng (鄭旭詠)    Review Date: 2024-07-11

For questions about this thesis, please contact the Promotion Services Division, National Central University Library, TEL: (03)422-7151 ext. 57407, or by e-mail. - Privacy Policy