dc.description.abstract | In recent years, generative AI has become popular in areas such as natural language processing, image generation, and audio synthesis, significantly expanding AI's creative capabilities. In the realm of image generation in particular, Diffusion Models have achieved remarkable success across a variety of applications, such as image synthesis and image transformation. This study therefore introduces a new framework that enables Diffusion Models to perform pose transfer effectively, requiring only a reference image and a human skeleton diagram to achieve precise pose transformations.
However, traditional Diffusion Models learn image features at the pixel level, which inevitably demands substantial computational resources. For organizations with limited resources, merely validating a model's feasibility and testing its performance can take days, which poses a major challenge. To address this issue, this paper integrates a Latent Diffusion Model, ControlNet, and a multi-scale feature extraction module, and incorporates a semantic extraction filter into the attention layers. This allows the model to focus on important image features and the relationships between poses, and the resulting architecture can be trained effectively on an NVIDIA RTX 4090 GPU.
Experimental results demonstrate that, under resource constraints, our proposed method remains competitive with other Diffusion Model-based approaches, significantly improving pose transfer accuracy while effectively reducing the time required for both training and image generation. | en_US |