採用潛在擴散模型的影像去模糊;Image Deblurring Using Latent Diffusion Models


Please use this permanent URL to cite or link to this item: http://ir.lib.ncu.edu.tw/handle/987654321/95335

Title:	採用潛在擴散模型的影像去模糊;Image Deblurring Using Latent Diffusion Models
Author:	吳政佳;Wu, Cheng-Chia
Contributor:	Department of Communication Engineering
Keywords:	Image deblurring; diffusion model; pre-trained model; prompt tuning
Date:	2024-07-22
Upload time:	2024-10-09 16:40:43 (UTC+8)
Publisher:	National Central University
Abstract:	Most image deblurring work uses pixel-level loss functions to reduce the distortion between the restored result and the ground truth. However, such methods overlook human perception of image quality, which leaves the restored results short on detail. Recently, diffusion models, which have achieved impressive success in image synthesis, have also been applied to image deblurring. Although existing diffusion-based deblurring methods can address the perception issue, they require more computation or processing time during inference. This thesis proposes a method that employs a large pre-trained latent diffusion model to enhance an existing image deblurring network. The latent diffusion model is used only during training, to improve the perceptual quality of the original deblurring network's output (e.g., FFTformer). The pre-trained latent diffusion model is first adapted to assist the deblurring network through the prompt tuning method proposed in this thesis. Compared with fine-tuning the entire model, this requires fewer trainable parameters and effectively preserves the prior knowledge the latent diffusion model acquired during pre-training. On the GoPro dataset, compared with the original FFTformer, the proposed method lowers PSNR by 0.64 dB but improves the perceptual metrics: LPIPS decreases by 0.012, NIQE by 0.51, and FID by 0.63, while CLIP-IQA increases by 0.002 and CLIP-IQA^+ by 0.01.
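To put the reported 0.64 dB PSNR decrease in context, the following is a minimal sketch of the standard PSNR definition and the change in mean squared error that such a drop implies. The function names and the peak value of 1.0 are illustrative assumptions, not taken from the thesis.

```python
import math


def psnr(mse: float, max_value: float = 1.0) -> float:
    """Standard PSNR in dB for a given mean squared error and peak signal value."""
    return 10.0 * math.log10(max_value ** 2 / mse)


def mse_ratio_for_psnr_drop(drop_db: float) -> float:
    """Factor by which MSE grows when PSNR falls by drop_db decibels."""
    return 10.0 ** (drop_db / 10.0)


# The abstract reports a 0.64 dB PSNR decrease relative to FFTformer.
ratio = mse_ratio_for_psnr_drop(0.64)
print(f"MSE grows by a factor of about {ratio:.3f}")
```

Under this definition, a 0.64 dB decrease corresponds to roughly a 16% increase in pixel-level MSE, which is the distortion cost the abstract weighs against the gains in perceptual metrics.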
Appears in Collections:	[Graduate Institute of Communication Engineering] Electronic Thesis & Dissertation

Files in This Item:

File	Description	Size	Format	Views
index.html		0Kb	HTML	12	View/Open

All items in NCUIR are protected by original copyright.
