Infrared and visible image fusion aims to integrate the complementary information captured by the two sensor types into a single image that combines the strengths of both. The fused result is intended to better match human visual perception or to support high-level visual tasks such as semantic segmentation and object detection. Most current fusion algorithms assume that aligned pairs of infrared and visible images are available. In practice, however, differences between sensor devices often misalign image content, and dropped frames introduce temporal misalignment. Recent research handles slight displacements and distortions between input images under the assumption that both share the same resolution, but the significant differences in resolution and field of view found in actually captured images call for more effective alignment methods. In addition, existing image fusion datasets lack object and semantic segmentation annotations, which hampers the training of related models, and because the content of infrared and visible images differs across datasets, traditional feature matching methods are less effective.
This paper proposes a method for constructing an infrared and visible image fusion dataset with semantic segmentation information. By applying style transfer to images from existing semantic segmentation datasets, we generate corresponding infrared and visible images; these are then used to retrain semantic segmentation models, yielding a dataset that matches the application scenario and carries the relevant semantic segmentation annotations and masks. Depending on whether the background contains common segmentation classes, we rely on either the semantic segmentation annotations or the masks of salient objects. For global spatial alignment, we estimate image scaling and translation using the log-polar coordinate transformation and the Fourier transform, and slight local displacements can optionally be refined with deep learning methods for more accurate object alignment. To address temporal alignment, we combine spatial alignment with mask comparison, identifying for each infrared frame the visible frame with the maximum object overlap; this overcomes temporal misalignment caused by frame drops or device settings. Finally, we propose a low-parameter image fusion design that reduces computational resource requirements while improving fusion performance and efficiency.
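To make the global alignment step concrete, the sketch below estimates scale and translation by phase correlation in log-polar spectral coordinates, the standard Fourier-Mellin-style approach implied by the abstract's combination of log-polar transformation and Fourier transforms. It is a minimal illustration rather than the paper's implementation: it assumes single-channel float32 inputs already resampled to a common working size (so the recovered scale reflects the field-of-view difference), and the function names are ours.

```python
import cv2
import numpy as np

def phase_correlation(ref, mov):
    """Integer-pixel (dx, dy) shift that aligns `mov` to `ref`
    via the normalized cross-power spectrum."""
    cross = np.fft.fft2(ref) * np.conj(np.fft.fft2(mov))
    corr = np.fft.ifft2(cross / (np.abs(cross) + 1e-12)).real
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    h, w = ref.shape
    return (dx - w if dx > w // 2 else dx,   # wrap shifts into [-w/2, w/2)
            dy - h if dy > h // 2 else dy)

def log_polar_spectrum(img):
    """Log-polar resampling of the centered log-magnitude spectrum.
    Returns the resampled spectrum and M, the pixels-per-ln(radius) factor."""
    win = cv2.createHanningWindow(img.shape[::-1], cv2.CV_32F)  # reduce edge leakage
    mag = np.log1p(np.abs(np.fft.fftshift(np.fft.fft2(img * win))))
    h, w = mag.shape
    center = (w / 2.0, h / 2.0)
    max_radius = min(center)
    lp = cv2.warpPolar(mag.astype(np.float32), (w, h), center, max_radius,
                       cv2.WARP_POLAR_LOG | cv2.INTER_LINEAR)
    return lp, w / np.log(max_radius)

def align_global(ir, vis):
    """Warp `vis` onto `ir` by recovering scale, then translation.
    A spatial scaling by s shifts the log-polar spectrum by M*ln(s)
    along the radial axis, so phase correlation there yields s."""
    lp_ir, m = log_polar_spectrum(ir)
    lp_vis, _ = log_polar_spectrum(vis)
    d_log, _ = phase_correlation(lp_ir, lp_vis)
    scale = np.exp(d_log / m)
    h, w = ir.shape
    # Undo the scale about the image center, then phase-correlate the
    # rescaled image against the reference to recover the translation.
    A = cv2.getRotationMatrix2D((w / 2.0, h / 2.0), 0.0, 1.0 / scale)
    dx, dy = phase_correlation(ir, cv2.warpAffine(vis, A, (w, h)))
    A[0, 2] += dx
    A[1, 2] += dy
    return cv2.warpAffine(vis, A, (w, h)), scale, (dx, dy)
```

Rotation could be recovered from the angular axis of the same log-polar correlation, and subpixel peak refinement is omitted here for brevity.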
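The temporal matching can be sketched just as compactly: after spatial alignment, each infrared frame is paired with the visible frame whose object mask overlaps it most, which tolerates dropped frames. Here IoU serves as the overlap score, and the search window and function names are illustrative assumptions rather than details from the paper.

```python
import numpy as np

def mask_iou(a: np.ndarray, b: np.ndarray) -> float:
    """Intersection-over-union of two boolean object masks."""
    inter = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    return inter / union if union else 0.0

def match_frames(ir_masks, vis_masks, window=5):
    """For each IR frame i, search visible frames in [i-window, i+window]
    and return the index of the frame with maximum mask overlap."""
    pairs = []
    for i, m_ir in enumerate(ir_masks):
        lo, hi = max(0, i - window), min(len(vis_masks), i + window + 1)
        scores = [mask_iou(m_ir, vis_masks[j]) for j in range(lo, hi)]
        pairs.append(lo + int(np.argmax(scores)))
    return pairs
```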
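Since the abstract does not describe the fusion architecture, the block below is purely an illustrative example of a low-parameter design, not the proposed network: depthwise-separable convolutions keep a two-branch encoder-decoder at a few thousand parameters.

```python
import torch
import torch.nn as nn

class DSConv(nn.Module):
    """Depthwise-separable conv block: a common way to cut parameter count."""
    def __init__(self, c_in, c_out):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(c_in, c_in, 3, padding=1, groups=c_in, bias=False),
            nn.Conv2d(c_in, c_out, 1, bias=False),
            nn.BatchNorm2d(c_out),
            nn.ReLU(inplace=True),
        )
    def forward(self, x):
        return self.block(x)

class TinyFusionNet(nn.Module):
    """Illustrative low-parameter fusion model: per-modality encoders,
    channel concatenation, and a shallow decoder to one fused image."""
    def __init__(self, width=16):
        super().__init__()
        self.enc_ir = nn.Sequential(DSConv(1, width), DSConv(width, width))
        self.enc_vis = nn.Sequential(DSConv(1, width), DSConv(width, width))
        self.dec = nn.Sequential(
            DSConv(2 * width, width),
            nn.Conv2d(width, 1, 1),
            nn.Sigmoid(),  # fused image in [0, 1]
        )
    def forward(self, ir, vis):
        return self.dec(torch.cat([self.enc_ir(ir), self.enc_vis(vis)], dim=1))

# Parameter count check, e.g.:
# sum(p.numel() for p in TinyFusionNet().parameters())  # on the order of 10^3
```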
Keywords: Image fusion, Image alignment, Deep learning, Semantic segmentation, Style transfer.