    Please use this permanent URL to cite or link to this item: http://ir.lib.ncu.edu.tw/handle/987654321/95480


    Title: 結合Perlin Noise 與 Diffusion Model 的山水風景動畫生成 (Animating Landscapes: Integrating Perlin Noise and Diffusion Model for ShanShui Scenery)
    Author: Hsu, Shou-En (徐紹恩)
    Contributor: Department of Computer Science and Information Engineering
    Keywords: Automatic Animation Generation; Automatic Image Generation; Deep Learning; Artificial Intelligence; ShanShui Landscapes; Diffusion Model; GPT; Perlin Noise; AnimateDiff; Stable Diffusion
    Date: 2024-07-13
    Upload time: 2024-10-09 16:53:39 (UTC+8)
    Publisher: National Central University
    Abstract: Current deep learning models exhibit suboptimal performance in generating ShanShui
    landscape images, primarily due to the limited representation of such data in existing training
    datasets. This deficiency hampers the models’ ability to accurately learn and understand the
    distinctive features and structures of ShanShui landscapes. To address this issue, we propose
    several improvement strategies.
    Firstly, we aim to expand the dataset with more ShanShui landscape images to enhance
    the model’s comprehension of this specific type of scenery. Secondly, we plan to fine-tune
    the Diffusion Model using DreamBooth and introduce additional prompts related to ShanShui
    landscapes, such as terrain features, vegetation distribution, and color styles. Additionally, we
    propose developing a landscape skeleton generation module that employs Perlin Noise to create
    skeleton images. By leveraging ControlNet to constrain the Diffusion Model, we can enrich the
    visual output by adding color to these skeleton images.
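
    As an illustrative sketch of the skeleton-generation idea, the zero-crossings of a 2D Perlin
    Noise field trace ridge-like contours. The snippet below assumes the third-party Python
    "noise" and "Pillow" packages; the scale and threshold values are illustrative choices, not
    parameters taken from the thesis.

        import numpy as np
        from noise import pnoise2      # Perlin noise from the third-party "noise" package
        from PIL import Image

        WIDTH, HEIGHT = 512, 512
        SCALE = 150.0                  # larger scale -> smoother, broader ridges
        # Sample 2D Perlin noise over the image grid.
        field = np.zeros((HEIGHT, WIDTH), dtype=np.float32)
        for y in range(HEIGHT):
            for x in range(WIDTH):
                field[y, x] = pnoise2(x / SCALE, y / SCALE, octaves=4,
                                      persistence=0.5, lacunarity=2.0, base=42)
        # Values near zero mark the noise field's zero-crossings; they trace
        # ridge-like contour lines usable as a rough mountain "skeleton".
        skeleton = (np.abs(field) < 0.02).astype(np.uint8) * 255
        Image.fromarray(255 - skeleton).save("skeleton.png")   # black strokes on white

    The resulting binary contour map is the kind of structural image a scribble- or
    lineart-conditioned ControlNet can accept as its constraint.
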
    To further improve the model’s understanding of the desired output, we utilize GPT-4 to
    augment the prompts used for generating images. Specifically, we input the skeleton images
    into GPT-4 to generate detailed descriptions of their structure, which are then used to refine the
    corresponding style prompts. This approach enhances the Diffusion Model’s comprehension of
    user requirements and mitigates issues related to unclear prompt descriptions.
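
    As a sketch of this prompt-augmentation step with the OpenAI Python client, a skeleton
    image can be sent to a vision-capable GPT-4 model for a structural description; the model
    name and instruction text here are assumptions for illustration.

        import base64
        from openai import OpenAI

        client = OpenAI()              # reads OPENAI_API_KEY from the environment
        with open("skeleton.png", "rb") as f:
            image_b64 = base64.b64encode(f.read()).decode()
        # Ask GPT-4 to describe the skeleton's structure, then rewrite that
        # description as a ShanShui-style generation prompt.
        response = client.chat.completions.create(
            model="gpt-4o",            # illustrative vision-capable model choice
            messages=[{
                "role": "user",
                "content": [
                    {"type": "text",
                     "text": "Describe the mountain and water structure in this "
                             "sketch, then rewrite the description as a ShanShui-"
                             "style image-generation prompt."},
                    {"type": "image_url",
                     "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
                ],
            }],
        )
        augmented_prompt = response.choices[0].message.content
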
    To further enhance the quality of generated images, we incorporate Textual Inversion,
    particularly applied to negative prompts, to improve the Diffusion Model’s understanding of
    low-quality images and help avoid generating them.
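
    One hedged sketch of these pieces combined, using the Hugging Face diffusers library: a
    ControlNet-constrained Stable Diffusion pipeline with a negative Textual Inversion
    embedding loaded. The checkpoint names and the embedding file are assumptions, not the
    exact models used in the thesis.

        import torch
        from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
        from diffusers.utils import load_image

        # A scribble-conditioned ControlNet keeps the output aligned with the skeleton.
        controlnet = ControlNetModel.from_pretrained(
            "lllyasviel/sd-controlnet-scribble", torch_dtype=torch.float16)
        pipe = StableDiffusionControlNetPipeline.from_pretrained(
            "runwayml/stable-diffusion-v1-5",
            controlnet=controlnet, torch_dtype=torch.float16).to("cuda")
        # Load a learned "bad quality" pseudo-token (Textual Inversion) and use it
        # only in the negative prompt, steering generation away from such images.
        pipe.load_textual_inversion("embeddings/bad-quality.pt", token="<bad-quality>")
        image = pipe(
            prompt="ink-wash ShanShui mountains, mist, waterfall",  # or the GPT-4 output
            negative_prompt="<bad-quality>, blurry, oversaturated",
            image=load_image("skeleton.png"),
            num_inference_steps=30,
        ).images[0]
        image.save("colored_shanshui.png")
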
    Furthermore, the colored images generated by the Diffusion Model are processed through
    an I2V encoder to create input data for a video Diffusion Model (AnimateDiff), which ultimately
    produces vivid animations. This integrated workflow allows users to flexibly adjust each step to
    meet specific needs, such as adding objects to the skeleton image, modifying coloring prompts,
    and setting individual prompts for specific animation frames.
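
    A sketch of the final animation stage using diffusers' AnimateDiff integration; the
    motion-adapter and base-model checkpoints are illustrative, and the thesis's I2V encoding
    of the colored image is only approximated here by reusing the coloring prompt.

        import torch
        from diffusers import AnimateDiffPipeline, DDIMScheduler, MotionAdapter
        from diffusers.utils import export_to_gif

        # One publicly released Motion Module version; the thesis compares several.
        adapter = MotionAdapter.from_pretrained(
            "guoyww/animatediff-motion-adapter-v1-5-2", torch_dtype=torch.float16)
        pipe = AnimateDiffPipeline.from_pretrained(
            "runwayml/stable-diffusion-v1-5",
            motion_adapter=adapter, torch_dtype=torch.float16).to("cuda")
        pipe.scheduler = DDIMScheduler.from_config(
            pipe.scheduler.config, beta_schedule="linear", clip_sample=False)
        result = pipe(
            prompt="ink-wash ShanShui mountains, drifting mist, flowing waterfall",
            negative_prompt="low quality, blurry",
            num_frames=16,
            guidance_scale=7.5,
            num_inference_steps=25,
        )
        export_to_gif(result.frames[0], "shanshui.gif")
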
    We also compare and evaluate various versions of the Motion Module within AnimateDiff
    to identify the most suitable version for our system. Experimental results demonstrate that the
    incorporation of Textual Inversion significantly enhances image generation quality. Additionally,
    we compare the effects of different versions of ControlNet on the generated images. In the
    appendix, we provide several methods for flexibly adjusting the generated images, along with
    the corresponding results.
    Through targeted research and the application of innovative methods, we are confident in
    our ability to improve generation performance, thereby enriching the diversity and quality of
    visual experiences in the field of deep learning image generation.
    Appears in Collections: [Graduate Institute of Computer Science and Information Engineering] Master's and Doctoral Theses

    Files in This Item:

    File         Description   Size   Format   Views
    index.html                 0Kb    HTML     14


    All items in NCUIR are protected by copyright, with all rights reserved.
