Name Chun-Chen Chiang (江俊辰)
Department Computer Science and Information Engineering (資訊工程學系)
Thesis title Enhancing Human Pose Transfer with Attention Mechanisms, Convolutional Block Attention Module and Facial Loss Optimization
(基於雙重注意力機制與人臉強化機制之人體姿態遷移)
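Abstract (Chinese) In the field of object and scene synthesis, many related techniques are applicable to computer graphics, image reconstruction, photography, and visual data generation. View synthesis often runs into challenges such as occlusion, illumination changes, and geometric distortion. These problems are especially pronounced for deformable objects such as humans, and they significantly increase the complexity of view synthesis.
Architecturally, we build on the Multi-scale Attention Guided Pose Transfer (MAGPT) model and modify its Residual Block: we add a Convolutional Block Attention Module (CBAM) and change the activation function from ReLU to Mish in order to capture more features related to clothing and skin color. In addition, the facial features in images generated by the original architecture differ from those in the source image; to address this, we propose two different facial-feature loss functions, each of which helps the model learn more accurate image features. Finally, with the architecture of this system, a single reference image is enough to convert the user into videos of different actions.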
Abstract (English) In the field of object and scene synthesis, many related techniques can be applied to computer graphics, image reconstruction, photography, and the generation of visual data. View synthesis often encounters challenges such as occlusion, lighting changes, and geometric distortion. These issues are particularly pronounced for deformable objects such as humans, and they significantly increase the complexity of view synthesis.
In modern society, sports and dance not only contribute to physical health and quality of life but also serve as avenues for personal charm and artistic expression. For non-professionals, improving skills efficiently in limited leisure time is a significant challenge. Pose transfer technology in deep learning, which renders the person in a reference image in the poses of a provided reference movement, offers an innovative solution. It lets coaches and students compare movement differences intuitively, so that students can learn and correct their actions even when no coach is present. This thesis presents a pose transfer system that automatically generates action images from a reference image provided by the user and poses selected from those the system offers, and that lets users download the resulting videos locally.
In terms of architecture, our model is based on the Multi-scale Attention Guided Pose Transfer (MAGPT) model. We modify its Residual Block by integrating the Convolutional Block Attention Module (CBAM) and changing the activation function from ReLU to Mish, in order to capture more features related to clothing and skin color. Additionally, because the facial features in images generated by the original architecture differ from those in the source image, we propose two different facial-feature loss functions to help the model learn more precise image features. Ultimately, with our system's architecture, a single reference image is enough to generate videos of the user performing different actions.
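The activation change mentioned in the abstract can be illustrated with a small sketch. It uses the standard definition Mish(x) = x · tanh(softplus(x)) and compares it with ReLU on a few scalar inputs. The function names and sample inputs are chosen here for illustration only; this record does not show how the thesis implements the activation inside the Residual Block.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def mish(x: float) -> float:
    """Standard Mish activation: x * tanh(softplus(x))."""
    return x * math.tanh(softplus(x))


def relu(x: float) -> float:
    """ReLU activation: max(x, 0)."""
    return max(x, 0.0)


# Illustrative inputs (not taken from the thesis).
for x in (-3.0, -1.0, 0.0, 1.0, 3.0):
    print(f"x={x:+.1f}  relu={relu(x):+.4f}  mish={mish(x):+.4f}")
```

Unlike ReLU, Mish is smooth and passes small negative values through instead of zeroing them, which is one commonly cited reason for trying it as a replacement.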
Keywords (Chinese) ★ 姿態轉換 (pose transfer)
★ 生成對抗網路 (generative adversarial network)
Keywords (English) ★ Pose Transfer
★ Generative Adversarial Network
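Table of contents Abstract (Chinese)
Abstract (English)
List of Figures
List of Tables
Chapter 1 Introduction
1.1 Research Background and Motivation
1.2 Related Literature
1.3 System Architecture
1.4 Thesis Organization
Chapter 2 Literature Review
2.1 DeepFashion Dataset
2.2 VGG-19 Network Model
2.3 SqueezeNet Network Model
2.4 Generative Adversarial Networks
2.5 ADGAN
2.6 OpenPose
2.7 PatchGAN Discriminator
Chapter 3 Methodology
3.1 Dataset
3.2 Model Architecture
3.2.1 Generator Architecture
3.2.2 Discriminator Architecture
3.3 Loss Functions
3.3.1 Generator Loss Function
3.3.2 Discriminator Loss Function
Chapter 4 Experimental Results
4.1 Equipment and Environment Setup
4.2 Dataset
4.3 Evaluation Metrics
4.3.1 Structural Similarity Index (SSIM)
4.3.2 Inception Score (IS)
4.3.3 Fréchet Inception Distance (FID)
4.3.4 SSD: Single Shot MultiBox Detector Score (DS)
4.3.5 PCKh
4.3.6 Learned Perceptual Image Patch Similarity (LPIPS)
4.4 Comparative Results of the Full Model
4.4.1 Qualitative Results
4.4.2 Quantitative Results
4.4.3 Analysis of Results
4.5 Ablation Experiments
4.5.1 Effect of Adding CBAM
4.5.2 Effect of Changing the Activation Function to Mish
4.5.3 Effect of Adding Head Region Loss
4.5.4 Effect of Adding Face Focused Loss
4.6 Applications
4.6.1 Application Dataset
4.6.2 Conversion Results on a Custom Action Dataset
Chapter 5 Conclusions and Future Work
References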
Advisor Hsu-Yung Cheng (鄭旭詠) Approval date 2024-7-11
