The purpose of image fusion is to integrate different types of input images and generate a single image with more complete scene representation and better visual perception, supporting downstream vision tasks such as object detection and semantic segmentation. Infrared and visible image fusion is a widely studied research area, but training fusion models with deep learning methods typically requires a large amount of annotated data. Existing infrared and visible image fusion datasets provide only the images themselves, without precise object annotations or semantic segmentation labels, which degrades fusion results and limits further development of the field. In this study, we propose a method for constructing an infrared and visible image fusion dataset with semantic segmentation information. We take visible images from existing semantic segmentation datasets and generate the corresponding infrared images with style transfer, producing a labeled fusion dataset in which every infrared and visible image pair carries its own semantic segmentation labels. This construction method improves image fusion performance. It also addresses the differing resolutions and content misalignment that commonly occur between infrared and visible images captured in the real world: using the semantic segmentation masks, the infrared and visible images are resampled and aligned, which saves substantial time and manual effort in the alignment preprocessing step common to this line of research.
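The abstract does not name a particular style-transfer model. As a minimal sketch, assuming a CycleGAN-style visible-to-infrared generator is available as a TorchScript checkpoint (the path `vis2ir_generator.pt`, the directory layout, and the 256x256 preprocessing are all hypothetical), the pseudo-infrared counterpart of each segmentation-dataset image could be produced along these lines:

```python
# Minimal sketch: generate pseudo-infrared images from visible images with a
# pretrained image-to-image translation (style transfer) generator.
# Assumptions: a CycleGAN-style visible->infrared generator saved as a
# TorchScript module ("vis2ir_generator.pt" is a hypothetical path), and
# visible images stored alongside their segmentation labels.
from pathlib import Path

import torch
from PIL import Image
from torchvision import transforms
from torchvision.utils import save_image

device = "cuda" if torch.cuda.is_available() else "cpu"
generator = torch.jit.load("vis2ir_generator.pt").to(device).eval()

# Preprocessing used by most CycleGAN implementations: resize, scale to [-1, 1].
to_tensor = transforms.Compose([
    transforms.Resize((256, 256)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.5] * 3, std=[0.5] * 3),
])

def visible_to_infrared(vis_path: Path, out_path: Path) -> None:
    """Translate one visible image into its pseudo-infrared counterpart."""
    vis = to_tensor(Image.open(vis_path).convert("RGB")).unsqueeze(0).to(device)
    with torch.no_grad():
        ir = generator(vis)                  # (1, C, H, W), values in [-1, 1]
    out_path.parent.mkdir(parents=True, exist_ok=True)
    save_image(ir * 0.5 + 0.5, out_path)     # rescale to [0, 1] before saving

# Each generated infrared image reuses the segmentation label of its source
# visible image, so every (visible, infrared) pair is labeled by construction.
for vis_path in sorted(Path("segdata/images").glob("*.png")):
    visible_to_infrared(vis_path, Path("segdata/infrared") / vis_path.name)
```

Because the infrared image is synthesized from the visible one, the pair is pixel-aligned and shares one label map, which is exactly what makes the resulting dataset labeled "for free".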
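The mask-based alignment step is likewise described only at a high level. One plausible reading, sketched below with NumPy and OpenCV, is to match per-class mask centroids between the two modalities, estimate a similarity transform from them, and resample the infrared image onto the visible image's grid; the centroid-correspondence scheme is an illustrative assumption, not the method stated in the abstract.

```python
# Minimal sketch: align an infrared image to a visible image by resampling,
# using semantic segmentation masks as the only correspondence cue.
# The per-class centroid matching below is an assumption for illustration;
# it needs at least two semantic classes shared by both masks.
import cv2
import numpy as np

def class_centroids(mask: np.ndarray, class_ids: list) -> np.ndarray:
    """Centroid (x, y) of each listed semantic class in a label mask."""
    pts = []
    for cid in class_ids:
        ys, xs = np.nonzero(mask == cid)
        pts.append((xs.mean(), ys.mean()))
    return np.float32(pts)

def align_ir_to_visible(ir_img, ir_mask, vis_img, vis_mask):
    """Estimate a similarity transform from mask centroids and resample."""
    shared = sorted(set(np.unique(ir_mask)) & set(np.unique(vis_mask)) - {0})
    src = class_centroids(ir_mask, shared)   # points in the infrared frame
    dst = class_centroids(vis_mask, shared)  # matching points, visible frame
    # A similarity transform (scale + rotation + translation) handles the
    # resolution difference and the content misalignment in one resampling.
    M, _ = cv2.estimateAffinePartial2D(src, dst)
    h, w = vis_img.shape[:2]
    return cv2.warpAffine(ir_img, M, (w, h))  # IR resampled onto the VIS grid
```

Estimating the transform from segmentation masks rather than from raw intensities is the design point the abstract emphasizes: infrared and visible pixel values correlate poorly, but the semantic layout of the scene is shared, so the masks give modality-independent anchors for registration.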