Image inpainting aims to fill missing regions of an image with content that is semantically coherent and visually realistic. Although traditional VQ-VAE-based approaches are effective at capturing structural and textural representations, they often suffer from quantization errors and low codebook utilization (the dead-code problem). In this work, we propose a structure-guided image inpainting framework that combines a hierarchical NSVQ-VAE with a codebook-conditioned texture generation module. Our method introduces a Teacher–Student encoder architecture that predicts structure latent variables from masked inputs, and employs a KSVD-inspired codebook matching mechanism to recover semantically consistent structural representations. In the texture generation stage, we further develop a structure-conditioned texture generator composed of Gated Convolution, Cross-Attention, and CGSM modules, which synthesizes fine-grained, high-quality textures guided by the restored structure tokens; this design substantially reduces the parameter count while preserving generation quality. Experiments on the CelebA-HQ and Places2 datasets show that the proposed method performs strongly in terms of PSNR, SSIM, Inception Score (IS), and FID, while requiring markedly fewer parameters than existing methods such as GConv, MEDFE, LGNet, and AOT. Notably, our approach offers a high performance-to-size ratio, making it well suited to practical deployment. Future work will explore incorporating texture codebook information as an additional condition to further enhance the fidelity of generated textures.
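To make the quantization step referenced above more concrete, the sketch below illustrates the general noise-substitution idea behind NSVQ, which the abstract credits with mitigating quantization errors and dead codes: during training, the quantization error is replaced by a random vector of matching norm, so gradients reach both the encoder and the codebook without a straight-through estimator. This is a minimal, hypothetical PyTorch illustration, not the thesis's implementation; the class name `NSVQQuantizer`, the codebook size, and the latent dimension are assumptions.

```python
import torch
import torch.nn as nn


class NSVQQuantizer(nn.Module):
    """Minimal sketch of noise-substitution vector quantization (NSVQ).

    Illustrative only: sizes and names are assumptions, and the code shows
    the general NSVQ idea rather than the exact module used in the thesis.
    """

    def __init__(self, num_codes: int = 512, dim: int = 64):
        super().__init__()
        # Learnable codebook of discrete structure tokens.
        self.codebook = nn.Parameter(torch.randn(num_codes, dim))

    def forward(self, z):
        # z: (batch, dim) latent vectors from the structure encoder.
        # Nearest-neighbour lookup gives the hard code assignment.
        dists = torch.cdist(z, self.codebook)   # (batch, num_codes)
        indices = dists.argmin(dim=1)           # (batch,)
        z_q = self.codebook[indices]            # (batch, dim)

        if self.training:
            # Replace the quantization error with random noise of the same
            # norm; the norm depends on both z and the codebook entry, so
            # gradients flow to the encoder and the codebook.
            noise = torch.randn_like(z)
            noise = noise / noise.norm(dim=1, keepdim=True)
            err_norm = (z - z_q).norm(dim=1, keepdim=True)
            z_hat = z + err_norm * noise
        else:
            # At inference time, use the hard-quantized codes directly.
            z_hat = z_q
        return z_hat, indices
```

A full NSVQ setup typically also re-initializes rarely used codebook entries during training, which is one common way to address the dead-code issue mentioned above.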