Abstract: In the field of synthesizing objects and scenes, many related techniques apply to computer graphics, image reconstruction, photography, and the generation of visual data. When synthesizing novel views, we often encounter challenges such as occlusion, lighting changes, and geometric distortion. These issues are particularly pronounced when dealing with deformable objects, such as humans, and they significantly increase the complexity of view synthesis.

In modern society, sports and dance not only contribute to physical health and quality of life but also serve as avenues for personal expression and artistry. For non-professionals, improving such skills efficiently during leisure time is a significant challenge. Pose transfer in deep learning, which transfers one person's pose onto a provided reference motion, offers an innovative solution: it enables coaches and students to intuitively compare differences between movements, allowing effective learning and correction of actions even without a coach present. This paper presents a pose transfer system in which the user supplies a reference image and selects one of the poses offered by the system; the system then automatically generates images of the person performing the corresponding motion and allows the user to download the result as a video.

In terms of architecture, our model is based on the Multi-scale Attention Guided Pose Transfer (MAGPT) model, with modifications to its Residual Blocks: we integrate the Convolutional Block Attention Module (CBAM) and replace the ReLU activation function with Mish to capture more features related to clothing and skin color. Additionally, because the facial features of the generated images differed from those of the original image, we propose two different facial-feature loss functions that each help the model learn more precise image features. Ultimately, with our system's architecture, a single reference image is all that is required for users to be transformed into videos of different motions.
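To make the described modification concrete, the following is a minimal PyTorch sketch of a Residual Block that applies CBAM (channel attention followed by spatial attention) and uses Mish in place of ReLU. This is an illustration only, not the thesis's actual code: the channel count, reduction ratio, kernel size, and normalization choice are assumptions.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """CBAM channel attention: shared MLP over avg- and max-pooled descriptors."""
    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        avg = self.mlp(x.mean(dim=(2, 3)))   # global average pooling
        mx = self.mlp(x.amax(dim=(2, 3)))    # global max pooling
        return x * torch.sigmoid(avg + mx).view(b, c, 1, 1)

class SpatialAttention(nn.Module):
    """CBAM spatial attention: conv over channel-wise avg and max maps."""
    def __init__(self, kernel_size: int = 7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        avg = x.mean(dim=1, keepdim=True)
        mx = x.amax(dim=1, keepdim=True)
        return x * torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))

class CBAMResidualBlock(nn.Module):
    """Residual block with Mish activation and CBAM applied before the skip add."""
    def __init__(self, channels: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
            nn.Mish(),  # Mish replaces ReLU, as described above
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
        )
        self.ca = ChannelAttention(channels)
        self.sa = SpatialAttention()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = self.sa(self.ca(self.body(x)))
        return x + out  # identity skip connection
```

Applied inside each Residual Block of the generator, the attention module reweights feature maps channel-wise and spatially, which is the mechanism the abstract credits with recovering more clothing- and skin-color-related features.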