Registration of Infrared and Visible Images Using Style Transfer-Based Semantic Segmentation

Name

Si-Ting Lin (林思婷)

Department

Computer Science and Information Engineering

Thesis title

使用基於風格轉換之語意分割實現紅外光與可見光影像融合畫面對齊
(Registration of Infrared and Visible Images Using Style Transfer-Based Semantic Segmentation)

File

Full text viewable in the system (open after 2026-8-6)

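Abstract (Chinese)

Infrared and visible image fusion extracts complementary information from the two types of image sensors to generate a single image that carries the characteristics of both, with the aim of producing a fused image that better matches human visual perception or that supports subsequent high-level vision tasks such as scene semantic segmentation and object detection. Most current fusion algorithms assume that paired infrared and visible images are available. However, different sensing devices often leave the image content spatially misaligned, or drop frames and thereby introduce misalignment in the time domain. Recent studies can remove slight displacements and deformations between the two input images provided that the inputs share the same resolution, but images captured in practice may differ greatly in resolution and field of view, so a more effective alignment method is needed. Existing image fusion datasets lack object and semantic segmentation annotations, which hinders the training of related models, and the differing infrared and visible content across datasets makes it hard for traditional feature matching methods to give satisfactory results.
This thesis proposes a method for building an infrared and visible image fusion dataset with semantic segmentation information. Images from an existing semantic segmentation dataset are converted by style transfer into corresponding infrared and visible images, and these images are then used to retrain a semantic segmentation model, yielding an image dataset that suits the application scenario and includes the corresponding semantic segmentation labels and masks. Depending on whether the background contains the classic segmentation classes, we use either the semantic segmentation labels or masks of important objects, and we compute the scaling and translation in the frequency domain through a log-polar transform combined with the Fourier transform to achieve global spatial alignment. A deep learning method can then fine-tune slight local displacements so that objects in the image are aligned more precisely. For temporal alignment, we combine spatial alignment with mask comparison and examine the infrared and visible target frames one by one to find the corresponding frame with the largest object overlap, which overcomes temporal misalignment caused by dropped frames or device settings. Finally, we propose an image fusion design with an ultra-low parameter count to reduce the demand for computing resources while improving fusion performance and efficiency.
Keywords: image fusion, image alignment, deep learning, semantic segmentation, style transfer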
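
The frequency-domain alignment mentioned in the abstract can be motivated with two standard Fourier identities. The derivation below is a sketch under the assumption of a pure translation $(t_x, t_y)$ or a pure uniform scaling $s$ between two images $f$ and $g$; the symbols are introduced here for illustration and are not taken from the thesis.

```latex
% Translation: the shift theorem turns a spatial shift into a phase ramp,
% and the normalized cross-power spectrum inverts to a single peak.
\[
g(x,y) = f(x - t_x,\, y - t_y)
\;\Longrightarrow\;
G(u,v) = F(u,v)\, e^{-j 2\pi (u t_x + v t_y)}
\]
\[
\frac{G(u,v)\, F^{*}(u,v)}{\lvert G(u,v)\, F^{*}(u,v) \rvert}
= e^{-j 2\pi (u t_x + v t_y)}
\;\xrightarrow{\;\mathcal{F}^{-1}\;}\;
\delta(x - t_x,\, y - t_y)
\]

% Uniform scaling: the magnitude spectrum is scaled inversely.
\[
g(x,y) = f(s x,\, s y)
\;\Longrightarrow\;
\lvert G(u,v) \rvert = \frac{1}{s^{2}}
\left\lvert F\!\left(\frac{u}{s}, \frac{v}{s}\right) \right\rvert
\]

% In log-polar frequency coordinates the scaling becomes a shift along rho.
\[
\rho = \ln\sqrt{u^{2} + v^{2}},\quad \theta = \operatorname{atan2}(v, u):
\qquad
\lvert G \rvert(\rho, \theta) = \frac{1}{s^{2}}\,
\lvert F \rvert(\rho - \ln s,\, \theta)
\]
```

Because a translation changes only the phase of the spectrum, the magnitude spectra are unaffected by it. The scale can therefore be estimated first as a shift of $\ln s$ along the $\rho$ axis, and the translation recovered afterwards from the rescaled image.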

Abstract (English)

Infrared and visible image fusion integrates complementary information from the two types of sensors into a single image that carries the features of both. The fused image is intended to better match human visual perception or to assist high-level vision tasks such as semantic segmentation and object detection. Most current fusion algorithms assume that paired infrared and visible images are available. In practice, however, different sensing devices often capture spatially misaligned content, and dropped frames introduce temporal misalignment. Recent methods can correct slight displacements and distortions between the input images, but only under the assumption that both inputs have the same resolution. Real captures can differ substantially in resolution and field of view and therefore need a more effective alignment method. Existing image fusion datasets also lack object and semantic segmentation annotations, which hampers the training of related models, and the differing infrared and visible content across datasets makes traditional feature matching methods less effective.
This thesis proposes a method for creating an infrared and visible image fusion dataset with semantic segmentation information. Style transfer is applied to images from an existing semantic segmentation dataset to generate corresponding infrared and visible images. These images are then used to retrain a semantic segmentation model, resulting in a dataset that matches the application scenario and includes the corresponding semantic segmentation annotations and masks. Depending on whether the background contains common segmentation classes, we use either the semantic segmentation annotations or masks of important objects. We achieve global spatial alignment by computing the image scaling and translation in the frequency domain with a log-polar transform and the Fourier transform. Slight local displacements can optionally be refined with a deep learning method to align objects more accurately. To address temporal alignment, we combine spatial alignment with mask comparison, checking the infrared and visible target frames one by one to find the corresponding frame with the maximum object overlap, which overcomes temporal misalignment caused by dropped frames or device settings. Finally, we propose an image fusion design with a very low parameter count to reduce computational resource requirements while improving fusion performance and efficiency.
Keywords: image fusion, image alignment, deep learning, semantic segmentation, style transfer
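
The translation part of the frequency-domain alignment described in the abstract can be illustrated with phase correlation. The following is a minimal sketch, not the thesis implementation: it assumes NumPy, single-channel images of equal size, and a purely circular integer shift. The function name `phase_correlation_shift` is hypothetical.

```python
import numpy as np


def phase_correlation_shift(reference, moved):
    """Estimate the integer (dy, dx) shift that maps `reference` onto `moved`.

    Assumption for this sketch: `moved` is a circularly shifted copy of
    `reference` and both are 2-D arrays of the same shape.
    """
    f_ref = np.fft.fft2(reference)
    f_mov = np.fft.fft2(moved)

    # Normalized cross-power spectrum: keeps only the phase difference.
    cross_power = f_mov * np.conj(f_ref)
    cross_power /= np.maximum(np.abs(cross_power), 1e-12)

    # Its inverse transform peaks at the displacement.
    correlation = np.fft.ifft2(cross_power).real
    peak = np.unravel_index(np.argmax(correlation), correlation.shape)

    # Peaks past the midpoint correspond to negative shifts (wrap-around).
    return tuple(
        int(p) if p <= n // 2 else int(p) - n
        for p, n in zip(peak, correlation.shape)
    )


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    image = rng.random((32, 32))
    shifted = np.roll(image, shift=(3, -5), axis=(0, 1))
    print(phase_correlation_shift(image, shifted))
```

In the setting the abstract describes, the inputs would be semantic segmentation labels or object masks rather than raw intensities, and a scale estimate from the log-polar step would be applied before this translation estimate.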

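Keywords (Chinese)

★ 影像融合 (image fusion)
★ 影像對齊 (image alignment)
★ 深度學習 (deep learning)
★ 語意分割 (semantic segmentation)
★ 風格轉換 (style transfer)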

Keywords (English)

★ Image fusion
★ Image alignment
★ Deep learning
★ Semantic segmentation
★ Style transfer

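Table of Contents

Acknowledgments
List of Figures
List of Tables
Chapter 1. Introduction
1.1. Research Motivation
1.2. Research Contributions
1.3. Thesis Organization
Chapter 2. Related Work
2.1. Typical Infrared and Visible Image Fusion Methods
2.1.1. Autoencoder (AE)-Based Fusion Methods
2.1.2. Convolutional Neural Network (CNN)-Based Fusion Methods
2.1.3. Generative Adversarial Network (GAN)-Based Fusion Methods
2.2. Fusion Datasets: Overview and Construction
2.2.1. Infrared and Visible Image Fusion Datasets
2.2.2. Building Datasets with Semantic Segmentation
2.3. Existing Infrared and Visible Image Alignment Methods
2.3.1. Image Alignment Based on Multi-Feature Consistency and Mutual Guidance
2.3.2. Precise Image Alignment by Combining Multiple Modalities
Chapter 3. Proposed Method
3.1. Dataset Construction
3.1.1. Cityscapes Dataset
3.1.2. Style Transfer Networks
3.1.2.1. Versatile Style Transfer Network That Preserves Content Consistency
3.1.2.2. Cross-Modality Perceptual Style Transfer Network
3.1.3. Training the Style Transfer Networks on Different Datasets
3.1.4. Semantic Segmentation
3.2. Image Alignment
3.2.1. Scale and Translation Alignment Based on Classical Image Processing Techniques
3.2.2. Slight Displacement and Deformation Alignment Based on Deep Learning
3.2.3. Temporal Synchronization Based on Intersection over Union (IoU)
3.3. Image Fusion
3.3.1. Network Architecture
3.3.2. Fusion Strategy
3.3.3. Loss Functions
3.3.3.1. Fusion Loss
3.3.3.2. Gradient Loss
3.3.4. Model Quantization
Chapter 4. Experimental Results
4.1. Development Environment
4.2. Test Datasets
4.3. Visible and Infrared Style-Transferred Image Generation Results
4.4. Semantic Segmentation Results
4.5. Infrared and Visible Alignment Results
4.6. Infrared and Visible Image Fusion Results
4.6.1. Training Details
4.6.2. Metric Evaluation
4.6.3. Fusion Results
Chapter 5. Conclusion and Future Work
5.1. Conclusion
5.2. Future Work
References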
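
Section 3.2.3 of the outline names Intersection over Union (IoU) as the basis for temporal synchronization. A rough sketch of that idea follows, assuming binary object masks that have already been spatially aligned and are stored as nested lists of 0/1. The helper names `mask_iou` and `best_matching_frame` are hypothetical, and the thesis may score or search the frames differently.

```python
def mask_iou(mask_a, mask_b):
    """Intersection over Union of two equal-sized binary masks (nested lists)."""
    intersection = 0
    union = 0
    for row_a, row_b in zip(mask_a, mask_b):
        for a, b in zip(row_a, row_b):
            if a and b:
                intersection += 1
            if a or b:
                union += 1
    # Two empty masks have no overlap to measure; report 0.0 by convention.
    return intersection / union if union else 0.0


def best_matching_frame(infrared_mask, visible_masks):
    """Return (index, IoU) of the visible frame that overlaps the infrared mask most."""
    scores = [mask_iou(infrared_mask, mask) for mask in visible_masks]
    best_index = max(range(len(scores)), key=scores.__getitem__)
    return best_index, scores[best_index]


if __name__ == "__main__":
    infrared = [[0, 1, 1, 0],
                [0, 1, 1, 0]]
    visible_frames = [
        [[1, 1, 0, 0], [1, 1, 0, 0]],  # object one column to the left
        [[0, 1, 1, 0], [0, 1, 1, 0]],  # same position as the infrared mask
        [[0, 0, 1, 1], [0, 0, 1, 1]],  # object one column to the right
    ]
    print(best_matching_frame(infrared, visible_frames))
```

Scanning a window of candidate frames and keeping the one with the highest IoU is what allows a dropped frame in one stream to be skipped instead of being paired with the wrong moment.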

Advisor

Po-Chyi Su (蘇柏齊)

Approval date

2024-8-9
