Abstract: | In image deblurring, most methods use pixel-level loss functions to reduce the distortion between the restored result and the ground truth. However, these methods overlook human perception of image quality, which leaves the restored results lacking in detail. Recently, diffusion models, which have achieved impressive success in image synthesis, have also been applied to image deblurring. Although existing diffusion-based deblurring methods can improve perceptual quality, they require more computation or processing time during inference. In this thesis, we propose a method that uses a pre-trained large latent diffusion model to enhance an existing image deblurring network (e.g., FFTformer). The latent diffusion model is used only during training, where it improves the perceptual quality of the deblurring network's output, so it adds no cost at inference. The pre-trained latent diffusion model is first adapted to this role with the prompt tuning method proposed in this thesis. Compared with fine-tuning the entire model, prompt tuning requires fewer trainable parameters and effectively preserves the prior knowledge the latent diffusion model acquired during pre-training. In experiments on the GoPro dataset, the proposed method lowers PSNR by 0.64 dB relative to the original FFTformer but improves the perceptual metrics: LPIPS decreases by 0.012, NIQE decreases by 0.51, FID decreases by 0.63, CLIP-IQA increases by 0.002, and CLIP-IQA+ increases by 0.01. |