Thesis 111522116: Full Metadata Record

DC Field Value Language
DC.contributor 資訊工程學系 zh_TW
DC.creator 蘇嘉成 zh_TW
DC.creator Su Chia Cheng en_US
dc.date.accessioned 2024-7-18T07:39:07Z
dc.date.available 2024-7-18T07:39:07Z
dc.date.issued 2024
dc.identifier.uri http://ir.lib.ncu.edu.tw:444/thesis/view_etd.asp?URN=111522116
dc.contributor.department 資訊工程學系 zh_TW
DC.description 國立中央大學 zh_TW
DC.description National Central University en_US
dc.description.abstract 近年來,生成式人工智慧的突出表現吸引了大量學者的研究興趣,在自然語言處理、圖像和音頻等領域掀起了一股熱潮。最為特別的是在圖像生成領域中,Diffusion Model 憑藉其卓越的性能在多個應用中取得了顯著的成果,如文生圖和圖生圖等。有鑑於此,本研究提出了一個全新的架構,使得 Diffusion Model 針對姿態轉換任務(Pose Transfer)擁有良好的表現,僅需憑藉參考圖和人體骨架圖即可實現精確的姿態轉換成果。 然而,傳統的 Diffusion Model 是在像素級別上進行運算,來學習圖像特徵,這通常需要龐大的計算資源,僅僅是驗證模型的可行性和測試其性能就需耗時數日,對資源受限的研究單位而言,是一個重大的難題。為了解決這一瓶頸,本論文結合了 Latent Diffusion Model、ControlNet 和多尺度特徵擷取模組,並在注意力神經網路層中加入語意擷取濾波器,使得模型能夠專注於學習影像中最為重要的特徵和姿態之間的關係的同時,也降低運算資源,使得模型可以在 RTX 4090 上有效地訓練。 實驗結果表明,我們所提出的模型在硬體成本受限的情況下,能與其他基於 Diffusion Model 建構的模型匹敵,不只在姿態轉換準確度上有顯著的提升,也有效地減少了訓練以及圖像生成所耗費的時間。 zh_TW
dc.description.abstract In recent years, generative AI has attracted wide research interest in areas such as natural language processing, images, and audio, significantly expanding AI's creative capabilities. In image generation in particular, Diffusion Models have achieved remarkable results across applications such as text-to-image and image-to-image generation. This study therefore introduces a new framework that enables Diffusion Models to perform well on pose transfer, requiring only a reference image and a human skeleton map to achieve precise pose transformations. However, traditional Diffusion Models operate at the pixel level when learning image features, which typically demands substantial computational resources. For research groups with limited resources, merely validating the feasibility of a model and testing its performance can take days, which is a major obstacle. To address this bottleneck, this thesis combines the Latent Diffusion Model, ControlNet, and a multi-scale feature extraction module, and adds a semantic extraction filter to the attention layers. This lets the model focus on the most important image features and their relationship to the pose while also reducing computational cost, so that it can be trained effectively on an RTX 4090. Experimental results show that, under limited hardware budgets, the proposed model is competitive with other Diffusion Model-based approaches, significantly improving pose transfer accuracy and reducing the time required for training and image generation. en_US
DC.subject 擴散模型 zh_TW
DC.subject 姿態轉換 zh_TW
DC.subject OpenPose zh_TW
DC.subject 生成影像 zh_TW
DC.subject Diffusion Models en_US
DC.subject Pose Transfer en_US
DC.subject OpenPose en_US
DC.subject Image Generation en_US
DC.title 基於多尺度特徵與控制網路的潛在擴散模型達到姿態轉換任務 zh_TW
dc.language.iso zh-TW zh-TW
DC.title Pose Transfer with Multi-Scale Features Combined with Latent Diffusion Model and ControlNet en_US
DC.type 博碩士論文 zh_TW
DC.type thesis en_US
DC.publisher National Central University en_US
