基於狀態空間之輕量化人臉影像修復模型;A Lightweight SSM-based Network for Facial Image Inpainting


Please use this identifier to cite or link to this item: https://ir.lib.ncu.edu.tw/handle/987654321/98393

Title:	基於狀態空間之輕量化人臉影像修復模型;A Lightweight SSM-based Network for Facial Image Inpainting
Authors:	李奕臻;Lee, Yi-Jhen
Contributors:	Department of Computer Science and Information Engineering
Keywords:	Image Inpainting;Lightweight;Mamba
Date:	2025-07-28
Issue Date:	2025-10-17 12:43:36 (UTC+8)
Publisher:	National Central University
Abstract:	Modern image inpainting models capture not only background features but also the appearance and global structure of objects, which lets them restore occluded regions with finer detail and produce plausible, natural results. However, these models rely on large parameter counts and complex architectures, which substantially increase training and inference time as well as computational cost and hardware requirements. This study argues that existing methods for reducing computational cost cannot remove the bottleneck imposed by the architecture itself, and therefore proposes an image inpainting model based on the state space model Mamba that achieves a lightweight design through its architecture and an improved training method. The model follows a coarse-to-fine design: a Coarse Inpainting Block first reconstructs the rough structure of the image, and a Fine Inpainting Block then restores its details. The model is trained with a single-stage, end-to-end process. Experiments on the FFHQ dataset show that the model meets its lightweight goal: it needs only 12% of the parameters of the baseline model ICT and trains 4.5× faster. For mask ratios of 40% to 60%, its inference is 737× faster than ICT, it achieves higher PSNR scores, and its restored details are more plausible and finer. On the CelebA-HQ dataset, compared with several types of models, the proposed model shows clear advantages in both quantitative results and visual quality. Against the two highest-scoring models, SEM-Net and BAT, it achieves comparable or even better scores with only 8% and 12% of their parameters, respectively. On the same hardware and for the same 40% to 60% mask ratios, its inference is 149× and 213× faster than SEM-Net and BAT, respectively. These results show that the proposed model is competitive in inpainting quality while being considerably lighter and faster, offering a more lightweight option for image inpainting.
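Note on the PSNR metric: the abstract reports PSNR improvements but does not define the metric. The sketch below uses the standard definition, PSNR = 10 · log10(MAX² / MSE), assuming 8-bit pixel values (MAX = 255). The function name `psnr`, the flat-list image representation, and the sample pixel values are illustrative assumptions and are not taken from the thesis.

```python
import math


def psnr(reference, restored, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two images.

    Both images are given as flat sequences of pixel values of equal
    length (an illustrative simplification; real images are 2-D or 3-D).
    """
    if len(reference) != len(restored):
        raise ValueError("images must have the same number of pixels")
    if len(reference) == 0:
        raise ValueError("images must not be empty")

    # Mean squared error over all pixels.
    mse = sum((r - s) ** 2 for r, s in zip(reference, restored)) / len(reference)

    # Identical images have zero error, so PSNR is unbounded.
    if mse == 0:
        return math.inf

    return 10.0 * math.log10(max_value ** 2 / mse)


# Hypothetical 4-pixel example: ground truth versus an inpainted result.
reference = [52, 55, 61, 59]
restored = [54, 55, 60, 57]
score = psnr(reference, restored)
print(f"PSNR: {score:.2f} dB")
```

A higher PSNR means the restored image is closer to the ground truth in mean squared error. The metric does not by itself capture perceptual quality, which is consistent with the abstract discussing visual results alongside the scores.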
Appears in Collections:	[Graduate Institute of Computer Science and Information Engineering] Electronic Thesis & Dissertation

Files in This Item:

File	Description	Size	Format
index.html		0Kb	HTML	24
