In recent years, deep-learning-based 3D research has advanced at an extremely rapid pace, and the technology has expanded from 2D planar domains into 3D space. As 3D research progresses, many ideas have emerged that exploit capabilities only 3D can provide to further enhance visual presentation and applications. For example, techniques now exist that quickly generate a corresponding 3D model from an image of a person and use it to reproduce that person's real movements and poses. Likewise, 3D reconstruction techniques can rebuild the people and objects captured in images.

However, deep learning typically requires large datasets for AI models to learn from. The quantity and diversity of these datasets strongly affect a model's subsequent performance and its effectiveness in applications, so various methods are usually needed to acquire and make use of data. This problem is more severe in 3D deep learning. Unlike 2D images or speech, for which large datasets are already available, 3D data tends to be scarce. Moreover, 3D space is considerably more complex than 2D space, so a single 2D image is usually insufficient to recover the actual 3D scene. The most common difficulty is making the results converge to an accurate 3D representation.

To address these problems, this paper builds a method for constructing corresponding 3D objects from 2D images. It uses multiple AI models for data processing, combined with a model that has domain-adaptation properties. Finally, it applies a loss function to further constrain the generated results so that, within a certain range, they are similar or close to real-world objects.