NCU Institutional Repository: Item 987654321/98551

    Please use this identifier to cite or link to this item: https://ir.lib.ncu.edu.tw/handle/987654321/98551


    Title: Structure-Guided Image Inpainting via NSVQ-VQVAE and KSVD-Inspired Codebook Restoration
    Authors: Wang, Shao-Ting (王少廷)
    Contributors: Department of Electrical Engineering
    Keywords: structure-guided generation; cross-attention; hierarchical vector-quantized variational autoencoder; codebook matching
    Date: 2025-08-20
    Issue Date: 2025-10-17 12:55:10 (UTC+8)
    Publisher: National Central University
    Abstract: Image inpainting aims to fill in missing regions of an image with semantically coherent and visually realistic content. While traditional VQ-VAE-based approaches are effective at capturing structural and textural representations, they often suffer from quantization errors and low codebook utilization (the dead-code problem).
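
    The dead-code problem arises because hard nearest-neighbour quantization passes no gradient to codewords that are never selected. NSVQ-style training sidesteps this by replacing the quantization error with random noise of the same norm, keeping the whole pipeline differentiable. The PyTorch sketch below illustrates only the general NSVQ mechanism; class, parameter, and shape choices are illustrative assumptions, not the thesis implementation.

    import torch
    import torch.nn as nn

    class NSVQ(nn.Module):
        """Noise-substitution vector quantization (a minimal sketch)."""
        def __init__(self, num_codes=512, dim=64):
            super().__init__()
            self.codebook = nn.Parameter(torch.randn(num_codes, dim))

        def forward(self, z):                          # z: (batch, dim)
            d = torch.cdist(z, self.codebook)          # distances to all codewords
            idx = d.argmin(dim=1)
            z_q = self.codebook[idx]                   # nearest codeword
            if self.training:
                # Replace the quantization error with Gaussian noise rescaled to
                # the same norm; gradients reach the encoder and codebook alike.
                v = torch.randn_like(z)
                err = (z - z_q).norm(dim=1, keepdim=True)
                return z + err * v / (v.norm(dim=1, keepdim=True) + 1e-8), idx
            return z_q, idx                            # hard quantization at test time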
    In this paper, we propose a structure-guided image inpainting framework that integrates a hierarchical NSVQ-VAE with a codebook-conditioned texture generation module. Our method introduces a Teacher–Student encoder architecture that predicts structural latent variables from masked inputs and employs a KSVD-inspired codebook-matching strategy to restore semantically meaningful structure representations.
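
    The abstract describes the restoration step only as "KSVD-inspired"; for orientation, the sketch below shows the classical K-SVD dictionary update that such a mechanism builds on: a codeword and the coefficients of the samples that use it are refit jointly from a rank-1 SVD of the residual. Shapes and names here are assumptions for illustration, not the thesis code.

    import torch

    def ksvd_update_codeword(X, D, A, k):
        """One classical K-SVD update of codeword k (a sketch).
        X: (dim, n) data, D: (dim, num_codes) codebook, A: (num_codes, n) codes."""
        users = (A[k].abs() > 0).nonzero(as_tuple=True)[0]
        if users.numel() == 0:
            return D, A                       # dead codeword: nothing to refit
        # Residual over the samples that use codeword k, with k's own contribution removed.
        E = X[:, users] - D @ A[:, users] + torch.outer(D[:, k], A[k, users])
        U, S, Vh = torch.linalg.svd(E, full_matrices=False)
        D[:, k] = U[:, 0]                     # best rank-1 fit: new unit-norm codeword
        A[k, users] = S[0] * Vh[0]            # ...and its refit coefficients
        return D, A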
    For the texture generation stage, we further develop a structure-conditioned texture generator composed of Gated Convolution, Cross-Attention, and CGSM modules, which synthesizes fine-grained textures guided by the restored structure tokens. This design substantially reduces the parameter count while preserving generation quality.
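
    Gated convolution, introduced for free-form inpainting by Yu et al. (2019), learns a per-pixel soft gate that modulates the feature response so the network can distinguish valid from masked pixels. A generic sketch follows; the activation choice and layer names are assumptions, not the thesis module.

    import torch
    import torch.nn as nn

    class GatedConv2d(nn.Module):
        """Feature branch modulated by a learned sigmoid gate (a sketch)."""
        def __init__(self, in_ch, out_ch, kernel_size=3, padding=1):
            super().__init__()
            self.feature = nn.Conv2d(in_ch, out_ch, kernel_size, padding=padding)
            self.gate = nn.Conv2d(in_ch, out_ch, kernel_size, padding=padding)

        def forward(self, x):
            # The gate in [0, 1] decides, per location, how much feature passes through.
            return torch.tanh(self.feature(x)) * torch.sigmoid(self.gate(x))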
    Experiments conducted on the CelebA-HQ and Places2 datasets demonstrate that our method achieves superior performance in terms of PSNR, SSIM, Inception Score (IS), and FID, while requiring substantially fewer parameters compared to state-of-the-art methods such as GConv, MEDFE, LGNet, and AOT. Notably, our approach achieves a high performance-to-parameter ratio, making it highly suitable for practical deployment. Future work will explore incorporating texture codebook information as an additional condition to further enhance the fidelity of generated textures.
    Appears in Collections: [Graduate Institute of Electrical Engineering] Electronic Thesis & Dissertation

    Files in This Item:

    File: index.html (0Kb, HTML)


    All items in NCUIR are protected by copyright, with all rights reserved.
