Master's/Doctoral Thesis 111523063: Detailed Record




Name  吳政佳 (Cheng-Chia Wu)   Department  Department of Communication Engineering
Thesis Title  Image Deblurring Using Latent Diffusion Models
(Chinese title: 採用潛在擴散模型的影像去模糊)
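Related Theses
★ Design of an Illumination-Adaptive Video Compression Encoder for In-Vehicle Video
★ An Improved Head Tracking System Based on Particle Filtering
★ A Fast Mode Decision Algorithm for Spatial and CGS Scalable Video Encoders
★ A Robust Active Appearance Model Search Algorithm for Facial Expression Recognition
★ Multi-View Video Coding Combining Epipolar Geometry-Based Inter-View Prediction with Fast Inter-Frame Prediction Direction Decision
★ A Stereo Matching Algorithm Based on Improved Belief Propagation for Homogeneous Regions
★ Baseball Trajectory Recognition Based on a Hierarchical Boosting Algorithm
★ Fast Reference Frame Direction Decision for Multi-View Video Coding
★ Fast Mode Decision for CGS Scalable Encoders Based on Online Statistics
★ An Improved Active Shape Model Matching Algorithm for Lip Shape Recognition
★ Object Tracking on Moving Platforms Based on a Motion Compensation Model
★ Matching Cost-Based Occlusion Detection for Asymmetric Stereo Matching
★ Momentum-Based Fast Mode Decision for Multi-View Video Coding
★ A Fast Local L-SVMs Ensemble Classifier for Place Image Recognition
★ Fast Depth Video Coding Mode Decision Oriented Toward High-Quality Synthesized Views
★ Multi-Object Tracking with a Moving Camera Based on a Motion Compensation Model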
Files  Full text viewable in the system (open after 2027-8-1)
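Abstract (Chinese, translated) Image deblurring methods mostly adopt pixel-level loss functions to reduce the distortion between the restored result and the ground truth. This practice ignores the subjective perception of image quality by the human eye, so the restored results lack detail. In recent years, diffusion models, which have succeeded in image synthesis, have also begun to be applied to image deblurring. Existing diffusion-based methods help with the perception problem, but they require more computation or processing time at inference. This thesis therefore designs a method in which a large pre-trained latent diffusion model assists an existing image deblurring network. The latent diffusion model is used only during training, where it raises the perceptual quality of the original deblurring network's output. The pre-trained latent diffusion model is first adapted by the prompt tuning method proposed in this thesis so that it is suited to assisting the deblurring network. Compared with fine-tuning the entire model, this requires fewer trainable parameters and effectively preserves the prior knowledge that the latent diffusion model acquired during pre-training. On the GoPro dataset, compared with the original FFTformer, the proposed scheme lowers PSNR by 0.64 dB but improves the perceptual metrics: LPIPS decreases by 0.012, NIQE by 0.51, and FID by 0.63, while CLIP-IQA increases by 0.002 and CLIP-IQA^+ by 0.01.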
Abstract (English) In image deblurring, most methods use pixel-level loss functions to reduce the distortion between the restored result and the ground truth. However, such methods overlook human perception of image quality, leading to restored results with insufficient detail. Recently, diffusion models, which have achieved impressive success in image synthesis, have also been applied to image deblurring. Although existing diffusion-based deblurring methods can address the perception issue, they require more computation or processing time during inference. This thesis proposes a method that employs a large pre-trained latent diffusion model to enhance an existing image deblurring network (e.g., FFTformer). The latent diffusion model is used only during training, to improve the perceptual quality of the deblurring network's output. The pre-trained latent diffusion model is first adapted to assist the deblurring network by the prompt tuning method proposed in this thesis. Compared with fine-tuning the entire model, the proposed method trains fewer parameters and preserves the prior knowledge that the latent diffusion model acquired during pre-training. In experiments on the GoPro dataset, compared with the original FFTformer, the proposed method lowers PSNR by 0.64 dB but improves the perceptual metrics: LPIPS decreases by 0.012, NIQE by 0.51, and FID by 0.63, while CLIP-IQA increases by 0.002 and CLIP-IQA^+ by 0.01.
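To give the reported 0.64 dB PSNR change some scale, the following is a minimal sketch of the standard PSNR definition, not code from the thesis. The function names `psnr` and `mse_ratio_for_psnr_drop` are illustrative, and the example assumes 8-bit pixel values (peak 255) supplied as flat sequences.

```python
import math


def psnr(reference, restored, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel sequences.

    Assumes 8-bit pixels by default (max_value=255); this is an illustrative
    helper, not the evaluation code used in the thesis.
    """
    if len(reference) != len(restored):
        raise ValueError("inputs must have the same number of pixels")
    if not reference:
        raise ValueError("inputs must not be empty")
    # Mean squared error over all pixels.
    mse = sum((r - x) ** 2 for r, x in zip(reference, restored)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


def mse_ratio_for_psnr_drop(drop_db):
    """Factor by which MSE grows when PSNR falls by drop_db decibels."""
    return 10.0 ** (drop_db / 10.0)


if __name__ == "__main__":
    print(round(psnr([100, 100], [110, 90]), 2))
    print(round(mse_ratio_for_psnr_drop(0.64), 3))
```

For a single image pair, a 0.64 dB drop in PSNR corresponds to a mean squared error about 1.16 times larger. Dataset-level PSNR is usually averaged per image, so this factor is only a rough guide to the size of the distortion trade-off.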
Keywords (Chinese, translated) ★ Image deblurring
★ Diffusion model
★ Pre-training
★ Prompt tuning
Keywords (English) ★ Image deblurring
★ diffusion model
★ pre-trained model
★ prompt tuning
Table of Contents
Abstract (Chinese)
Abstract
Acknowledgments
Table of Contents
List of Figures
List of Tables
1. Introduction
1-1 Preface
1-2 Research Motivation
1-3 Research Method
1-4 Thesis Organization
2. Diffusion Models
2-1 Denoising Diffusion Probabilistic Models
2-2 Latent Diffusion Models
2-3 ControlNet
2-4 Summary
3. Image Deblurring Networks
3-1 CNN-Based Image Deblurring Networks
3-2 Transformer-Based Image Deblurring Networks
3-3 Diffusion-Based Image Deblurring Networks
3-4 Summary
4. Proposed Image Deblurring Using Latent Diffusion Models
4-1 Proposed Training Framework for Improving the Visual Quality of Deblurring Networks
4-2 Prompt Tuning of a Large Pre-Trained Latent Diffusion Model
4-3 Latent Diffusion Model-Assisted Image Deblurring
4-4 Summary
5. Experiments
5-1 Datasets and Parameter Settings
5-1-1 Training and Test Datasets
5-1-2 Evaluation Metrics
5-1-3 Parameter Settings
5-2 Experimental Results
5-2-1 Comparison of Evaluation Metrics on the Test Datasets
5-2-2 Subjective Visual Comparison
5-3 Summary
6. Conclusion and Future Work
References
List of Symbols
Advisor  唐之瑋 (Chih-Wei Tang)   Approval Date  2024-7-22

For questions about this thesis, contact the Promotion Services Section of the National Central University Library: TEL (03)422-7151 ext. 57407, or by e-mail.