Thesis 111522100: Detailed Record




Name 張瑋菱 (Wei-Ling Chang)   Graduate Department Department of Computer Science and Information Engineering
Thesis Title Domain Adaptive Semantic Segmentation Based on a Polarized Attention Network
(PANDA: Polarized Attention Network for Domain Adaptive Semantic Segmentation)
Related Theses
★ Gaze Region Estimation Based on Convolutional Neural Networks ★ Multi-Scale Deformable Convolution Alignment Network for Video Super-Resolution
Files Full text available for browsing in the system after 2029-7-2
Abstract (Chinese) In unsupervised domain adaptation, the goal is to enable a model trained on the source domain to transfer its acquired knowledge to the target domain and perform well there without target-domain annotations. Existing methods mainly focus on reducing the differences in features, pixels, and predictions across domains; however, intra-domain knowledge, such as the contextual relevance within an image, remains underexplored. For the semantic segmentation task, this thesis proposes PANDA, an unsupervised domain adaptation model based on a polarized attention network. It uses polarized attention to capture the local and global structure of images while combining channel and spatial information to strengthen feature perception, thereby improving the model's multi-level feature fusion. The method advances two widely used unsupervised domain adaptation scenarios: PANDA improves the state-of-the-art performance by 0.2 mIoU on GTA→Cityscapes and by 1.4 mIoU on SYNTHIA→Cityscapes, reaching 76.1 and 68.7 mIoU, respectively. The experimental results demonstrate the model's effectiveness in capturing both local details and global features, offering a new solution to the unsupervised domain adaptation problem.
Abstract (English) In Unsupervised Domain Adaptation (UDA), the goal is to enable models trained on a source domain to transfer their acquired knowledge to a target domain and exhibit robust performance without target-domain annotations. Existing methods primarily focus on reducing differences in features, pixels, and predictions between domains; however, domain-internal knowledge, such as contextual relevance within images, remains underexplored. For semantic segmentation, we propose PANDA, a novel UDA method that leverages Polarized Self-Attention (PSA) to capture both the local and global structure of images, integrating channel and spatial information to enhance feature perception and improve multi-level feature fusion within the model. The proposed method advances two widely used UDA scenarios: PANDA improves the state-of-the-art performance by 0.2 mIoU on GTA→Cityscapes and by 1.4 mIoU on SYNTHIA→Cityscapes, reaching 76.1 and 68.7 mIoU, respectively. Experimental results illustrate the effectiveness of PANDA in capturing both local details and global features, offering a new solution for UDA problems.
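The Polarized Self-Attention named in the abstract keeps full resolution along one dimension (channel or spatial) while collapsing, or "polarizing," the other to size one. The sketch below is a rough NumPy illustration of that idea only, not the thesis's implementation: the random matrices `Wq`, `Wv`, `Wq2`, `Wv2`, and `Wz` are hypothetical stand-ins for the module's learned 1×1 convolutions.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def polarized_self_attention(x, rng):
    """Simplified sketch of Polarized Self-Attention (Liu et al., 2021).
    x: feature map of shape (C, H, W). Random matrices stand in for the
    learned 1x1 convolutions of the real module."""
    C, H, W = x.shape
    Ci = C // 2                                  # reduced inner dimension
    flat = x.reshape(C, H * W)                   # (C, HW)

    # Channel-only branch: polarize the spatial dimension to size 1.
    Wq = rng.standard_normal((1, C))
    Wv = rng.standard_normal((Ci, C))
    q = softmax(Wq @ flat, axis=-1)              # (1, HW) spatial softmax
    v = Wv @ flat                                # (Ci, HW)
    z = v @ q.T                                  # (Ci, 1) pooled channel descriptor
    Wz = rng.standard_normal((C, Ci))
    ch_gate = 1.0 / (1.0 + np.exp(-(Wz @ z)))    # (C, 1) sigmoid channel gate
    x_ch = flat * ch_gate                        # channel-attended features

    # Spatial-only branch: polarize the channel dimension to size 1.
    Wq2 = rng.standard_normal((Ci, C))
    Wv2 = rng.standard_normal((Ci, C))
    q2 = softmax((Wq2 @ flat).mean(axis=1))      # (Ci,) channel softmax of pooled query
    v2 = Wv2 @ flat                              # (Ci, HW)
    sp_gate = 1.0 / (1.0 + np.exp(-(q2 @ v2)))   # (HW,) sigmoid spatial map
    out = x_ch * sp_gate                         # (C, HW)
    return out.reshape(C, H, W)
```

The two branches can be composed in parallel (summed) or in sequence; the thesis's Section 4.4 compares such arrangements.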
Keywords (Chinese) ★ Unsupervised Domain Adaptation (無監督領域自適應)
★ Semantic Segmentation (語義分割)
★ Attention Mechanism (注意力機制)
Keywords (English) ★ Unsupervised Domain Adaptation
★ Semantic Segmentation
★ Attention Mechanism
Table of Contents
Abstract (Chinese) i
Abstract (English) ii
Table of Contents iii
List of Figures iv
List of Tables v
Chapter 1 Introduction 1
1.1 Background and Motivation 1
1.2 Research Objectives 2
1.3 Thesis Organization 3
Chapter 2 Literature Review 4
2.1 Unsupervised Domain Adaptation 4
2.2 Attention Mechanisms 7
2.3 Spatial Attention 8
2.4 Channel Attention 9
2.5 Dual Attention 10
Chapter 3 Methodology 13
3.1 Model Architecture 13
3.2 Source-Domain Training 14
3.3 Target-Domain Training 16
3.4 Cross-Domain Training 17
3.5 Polarized Attention Network 18
Chapter 4 Experimental Results 22
4.1 Experimental Environment 22
4.2 Datasets 23
4.2.1 GTA 23
4.2.2 SYNTHIA 25
4.2.3 Cityscapes 26
4.2.4 Class Selection 27
4.3 Evaluation Metrics 30
4.4 Comparison of Different Polarized Attention Arrangements 30
4.5 Comparison of Different Convolution Modules 31
4.6 Effect of Different Attention Mechanisms on the Model 32
4.7 Comparison of Different UDA Methods 33
4.8 Computational Efficiency Analysis 36
Chapter 5 Conclusion and Future Work 37
References 38
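The evaluation metric of Section 4.3, mean Intersection over Union (mIoU), averages the per-class IoU = TP / (TP + FP + FN) over the evaluated classes. The following is a generic sketch of the standard Cityscapes-style computation, not the thesis's exact evaluation code; the function name and `ignore_index` convention are assumptions.

```python
import numpy as np

def mean_iou(pred, gt, num_classes, ignore_index=255):
    """Mean IoU over classes that occur in prediction or ground truth.
    pred, gt: integer label arrays of the same shape.
    Pixels labeled ignore_index in gt are excluded (Cityscapes convention)."""
    mask = gt != ignore_index
    pred, gt = pred[mask], gt[mask]
    ious = []
    for c in range(num_classes):
        tp = np.sum((pred == c) & (gt == c))   # true positives
        fp = np.sum((pred == c) & (gt != c))   # false positives
        fn = np.sum((pred != c) & (gt == c))   # false negatives
        denom = tp + fp + fn
        if denom > 0:                          # skip classes absent everywhere
            ious.append(tp / denom)
    return float(np.mean(ious))
```

For example, with ground truth `[0, 0, 1, 1]` and prediction `[0, 1, 1, 1]`, class 0 has IoU 1/2 and class 1 has IoU 2/3, giving mIoU ≈ 0.583.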
References
[1] M. Toldo, A. Maracani, U. Michieli, and P. Zanuttigh, “Unsupervised domain adaptation in semantic segmentation: a review,” Technologies, vol. 8, no. 2, p. 35, 2020.
[2] H. Shimodaira, “Improving predictive inference under covariate shift by weighting the log-likelihood function,” Journal of Statistical Planning and Inference, vol. 90, no. 2, pp. 227–244, 2000.
[3] M. Wang and W. Deng, “Deep Visual Domain Adaptation: A Survey,” Neurocomputing, vol. 312, pp. 135–153, Oct. 2018, doi: 10.1016/j.neucom.2018.05.083.
[4] X. Yang, C. Deng, T. Liu and D. Tao, “Heterogeneous Graph Attention Network for Unsupervised Multiple-Target Domain Adaptation,” in IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 44, no. 4, pp. 1992-2003, 2022.
[5] H. Liu, F. Liu, X. Fan, and D. Huang, “Polarized self-attention: Towards high-quality pixel-wise regression,” 2021, arXiv:2107.00782.
[6] L. Hoyer, D. Dai, H. Wang, and L. Van Gool, “MIC: Masked Image Consistency for Context-Enhanced Domain Adaptation,” in 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023, pp. 11721–11732.
[7] M. Long, H. Zhu, J. Wang, and M. I. Jordan, “Deep Transfer Learning with Joint Adaptation Networks,” in Proceedings of the 34th International Conference on Machine Learning, 2017, pp. 2208–2217.
[8] B. Sun and K. Saenko, “Deep coral: Correlation alignment for deep domain adaptation,” in European conference on computer vision (ECCV), 2016, pp. 443–450.
[9] M. Long, H. Zhu, J. Wang, and M. I. Jordan, “Unsupervised Domain Adaptation with Residual Transfer Networks,” Advances in Neural Information Processing Systems (NeurIPS), 2016, pp. 136–144.
[10] J. Hoffman, E. Tzeng, T. Park, J.-Y. Zhu, P. Isola, K. Saenko, A. Efros, and T. Darrell, “Cycada: cycle-consistent adversarial domain adaptation,” in Proceedings of the 35th International Conference on Machine Learning (ICML), 2018, pp. 1989–1998.
[11] T.-H. Vu, H. Jain, M. Bucher, M. Cord, and P. Perez, “ADVENT: Adversarial Entropy Minimization for Domain Adaptation in Semantic Segmentation,” in Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), 2019, pp. 2517–2526.
[12] Y. Luo, L. Zheng, T. Guan, J. Yu, and Y. Yang, “Taking a Closer Look at Domain Shift: Category-Level Adversaries for Semantics Consistent Domain Adaptation,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019, pp. 2507–2516.
[13] Y. Yang and S. Soatto, “FDA: Fourier domain adaptation for semantic segmentation,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020, pp. 4085–4095.
[14] K. Mei, C. Zhu, J. Zou, and S. Zhang, “Instance adaptive self-training for unsupervised domain adaptation,” in Proceedings of the European Conference on Computer Vision (ECCV), 2020, pp. 415–430.
[15] Y. Zou, Z. Yu, B. V. K. Vijaya Kumar, and J. Wang, “Unsupervised domain adaptation for semantic segmentation via class-balanced self-training,” in Proceedings of the European Conference on Computer Vision (ECCV), 2018, pp. 289–305.
[16] P. Zhang, B. Zhang, T. Zhang, D. Chen, Y. Wang, and F. Wen, “Prototypical Pseudo Label Denoising and Target Structure Learning for Domain Adaptive Semantic Segmentation,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021, pp. 12414–12424.
[17] Q. Zhang, J. Zhang, W. Liu, and D. Tao, “Category Anchor-Guided Unsupervised Domain Adaptation for Semantic Segmentation,” Advances in Neural Information Processing Systems (NeurIPS), 2019, pp. 435–445.
[18] M. Sajjadi, M. Javanmardi, and T. Tasdizen, “Regularization With Stochastic Transformations and Perturbations for Deep Semi-Supervised Learning,” Advances in Neural Information Processing Systems (NeurIPS), 2016, pp. 1163–1171.
[19] K. Sohn, D. Berthelot, N. Carlini, Z. Zhang, H. Zhang, C. A. Raffel, E. D. Cubuk, A. Kurakin, and C.-L. Li, “Fixmatch: Simplifying semi-supervised learning with consistency and confidence,” Advances in Neural Information Processing Systems (NeurIPS), 2020, pp. 596–608.
[20] A. Tarvainen and H. Valpola, “Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results,” Advances in Neural Information Processing Systems (NeurIPS), 2017, pp. 1195–1204.
[21] W. Tranheden, V. Olsson, J. Pinto, and L. Svensson, “DACS: Domain adaptation via cross-domain mixed sampling,” in Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2021, pp. 1379–1389.
[22] L. Hoyer, D. Dai, and L. Van Gool, “DAFormer: Improving Network Architectures and Training Strategies for Domain-Adaptive Semantic Segmentation,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022, pp. 9914–9925.
[23] L. Hoyer, D. Dai, and L. Van Gool, “HRDA: Context-Aware High-Resolution Domain-Adaptive Semantic Segmentation,” in Proceedings of the European Conference on Computer Vision (ECCV), 2022, pp. 372–391.
[24] E. Xie, W. Wang, Z. Yu, A. Anandkumar, J. M. Alvarez, and P. Luo, “SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers,” Advances in Neural Information Processing Systems (NeurIPS), 2021, pp. 12077-12090.
[25] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A.N. Gomez, Ł. Kaiser, and I. Polosukhin, “Attention is all you need,” Advances in Neural Information Processing Systems (NeurIPS), vol. 30, 2017.
[26] X. Wang, R. Girshick, A. Gupta, and K. He, “Non-local Neural Networks,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2018, pp. 7794–7803.
[27] Z. Shen, M. Zhang, H. Zhao, S. Yi, and H. Li, “Efficient attention: Attention with linear complexities,” in Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2021, pp. 3531-3539.
[28] J. Hu, L. Shen, S. Albanie, G. Sun, and E. Wu, “Squeeze-and-Excitation Networks,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2018, pp. 7132–7141.
[29] Y. Cao, J. Xu, S. Lin, F. Wei, and H. Hu, “GCNet: Non-Local Networks Meet Squeeze-Excitation Networks and Beyond,” in Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops (ICCVW), 2019, pp. 1971-1980.
[30] J. Fu, J. Liu, H. Tian, and Y. Li, “Dual Attention Network for Scene Segmentation,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019, pp. 3146-3154.
[31] S. Woo, J. Park, J.-Y. Lee, and I. S. Kweon, “CBAM: Convolutional Block Attention Module,” in Proceedings of the European conference on computer vision (ECCV), 2018, pp. 3-19.
[32] H. Zhang, Q. Lian, J. Zhao, Y. Wang, Y. Yang, and S. Feng, “RatUNet: residual U-Net based on attention mechanism for image denoising,” PeerJ Comput. Sci., vol. 8, p. e970, May 2022.
[33] P. Song, J. Li, and H. Fan, “Attention based multi-scale parallel network for polyp segmentation,” Comput. Biol. Med., vol. 146, p. 105476, Jul. 2022.
[34] Z. Lv, H. Huang, W. Sun, T. Lei, J. A. Benediktsson, and J. Li, “Novel Enhanced UNet for Change Detection Using Multimodal Remote Sensing Image,” IEEE Geosci. Remote Sens. Lett., vol. 20, pp. 1–5, 2023.
[35] Q. Yu, W. Wei, Z. Pan, J. He, S. Wang, and D. Hong, “GPF-Net: Graph-polarized fusion network for hyperspectral image classification,” IEEE Trans. Geosci. Remote Sens., vol. 61, pp. 1–22, 2023, Art. no. 5519622.
[36] N. Araslanov and S. Roth, “Self-supervised augmentation consistency for adapting semantic segmentation,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 15384–15394, 2021.
[37] V. Olsson, W. Tranheden, J. Pinto, and L. Svensson, “ClassMix: Segmentation-based data augmentation for semi-supervised learning,” in Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2021, pp. 1369–1378.
[38] L.-C. Chen, Y. Zhu, G. Papandreou, F. Schroff, and H. Adam, “Encoder-decoder with atrous separable convolution for semantic image segmentation,” in Proceedings of the European conference on computer vision (ECCV), 2018, pp. 801-818.
[39] S. R. Richter, V. Vineet, S. Roth, and V. Koltun, “Playing for data: Ground truth from computer games,” in Proceedings of the European Conference on Computer Vision (ECCV), 2016, pp. 102–118.
[40] G. Ros, L. Sellart, J. Materzynska, D. Vazquez, and A. M. Lopez, “The SYNTHIA Dataset: A Large Collection of Synthetic Images for Semantic Segmentation of Urban Scenes,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 3234–3243.
[41] M. Cordts, M. Omran, S. Ramos, T. Rehfeld, M. Enzweiler, R. Benenson, U. Franke, S. Roth, B. Schiele, “The Cityscapes Dataset for Semantic Urban Scene Understanding,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 3213-3223.
Advisors 范國清, 高巧汶   Date of Approval 2024-7-2
