This paper presents a data-driven approach for generating personalized smiling facial images from a single neutral expression. The objective is to synthesize diverse smile styles while preserving individual facial characteristics. In contrast to conventional generative models that rely heavily on manual annotation of facial attributes and hand-crafted synthesis procedures, the proposed method combines an emotion classifier with Grad-CAM [1] to automatically extract identity-independent facial expression attention regions. These regions then guide a Masked Generative Transformer, enabling expression-consistent and identity-preserving image generation. The framework comprises two stages. In the first stage, a VQGAN [2] extracts latent tokens from the input image; expression attention maps derived from the emotion classifier and Grad-CAM guide a Transformer in reconstructing the masked tokens and generating an initial smiling image. The second stage addresses super-resolution: a higher-resolution version of the image is processed through a second VQGAN to extract high-resolution latent tokens, which are combined with the low-resolution tokens from the first stage and the expression attention maps.
A second Transformer then reconstructs the final high-quality image by fusing both low-resolution and high-resolution information. Experimental results show that the proposed method effectively generates diverse and natural smile styles while preserving individual facial identity, demonstrating its potential in the domain of personalized facial expression synthesis.
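The Grad-CAM step described above (class-specific channel weights combined with the last convolutional feature map, followed by ReLU and normalization) can be sketched as below. This is a minimal illustration, not the paper's implementation: to stay dependency-light, the gradient is computed by hand for a linear head over global-average-pooled features, and the activations and head weights are random stand-ins for a trained emotion classifier.

```python
import numpy as np

def grad_cam_map(fmap, head_weights, target_class):
    """Grad-CAM attention map for one class.

    fmap: [C, H, W] activations of the last conv layer.
    head_weights: [K, C] weights of a linear head over GAP features,
    so logits = head_weights @ fmap.mean(axis=(1, 2)).
    The gradient of the target logit w.r.t. fmap[c] is then the constant
    head_weights[target_class, c] / (H * W), which is exactly Grad-CAM's
    gradient-averaged channel weight alpha_c for this toy classifier.
    """
    C, H, W = fmap.shape
    alpha = head_weights[target_class] / (H * W)                 # channel weights
    cam = np.maximum((alpha[:, None, None] * fmap).sum(axis=0), 0.0)  # weighted sum + ReLU
    cam /= cam.max() + 1e-8                                      # normalize to [0, 1]
    return cam

rng = np.random.default_rng(0)
fmap = np.maximum(rng.normal(size=(16, 8, 8)), 0.0)  # stand-in conv activations
head = rng.normal(size=(7, 16))                      # 7 emotion classes (assumed)
cam = grad_cam_map(fmap, head, target_class=3)       # e.g. the "happy" logit
print(cam.shape)  # (8, 8)
```

In the full method the classifier is a trained CNN and the gradients come from backpropagation, but the combination step (gradient-averaged channel weights, weighted sum, ReLU, normalization) is the same.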
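The masked-token reconstruction performed by the Transformer can be illustrated with a MaskGIT-style iterative decoding loop: start from fully masked tokens, fill in the most confident predictions each step, and re-mask the rest on a cosine schedule. The sketch below is an assumption-laden stand-in: a random number generator plays the role of the Transformer's predictions and confidences, and the expression attention maps (which in the paper bias generation toward expression regions) are omitted.

```python
import numpy as np

def iterative_decode(n_tokens, vocab_size, steps, rng):
    """MaskGIT-style decoding sketch: fill masked tokens over `steps` rounds."""
    tokens = np.full(n_tokens, -1)                    # -1 marks a masked token
    for t in range(steps):
        masked = np.flatnonzero(tokens == -1)
        # Stand-in for the Transformer: random token predictions + confidences.
        preds = rng.integers(0, vocab_size, size=masked.size)
        conf = rng.random(masked.size)
        # Cosine schedule: how many tokens stay masked after this step.
        keep_masked = int(np.floor(n_tokens * np.cos((t + 1) / steps * np.pi / 2)))
        n_fill = masked.size - keep_masked
        # Commit only the most confident predictions; re-mask the rest.
        order = np.argsort(-conf)[:max(n_fill, 0)]
        tokens[masked[order]] = preds[order]
    return tokens

rng = np.random.default_rng(0)
tokens = iterative_decode(n_tokens=256, vocab_size=1024, steps=8, rng=rng)
assert (tokens >= 0).all()  # all tokens committed after the final step
```

In the proposed framework this loop runs twice: once over low-resolution VQGAN tokens for the initial smile, and once in the super-resolution stage, where the second Transformer conditions on both the low-resolution tokens and the attention maps.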