Name 劉亞昇 (Ya-Sheng Liu)   Department Department of Computer Science and Information Engineering
Thesis Title Automatic Nature Scene Image Generation with Time and Place Descriptions
(符合時間與場景描述之自動影像生成模型)
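Abstract (translated from Chinese) With the rapid development of artificial intelligence, machine learning has achieved excellent results in image recognition, semantic recognition, image generation, and other tasks. As the term suggests, "artificial intelligence" is intelligence created by humans. Letting computers learn, so that machines acquire a certain ability for logical judgment, is what we have achieved so far. Viewed from a microscopic perspective, however, the development of artificial intelligence has still not reached true intelligence.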
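To make the generated images more diverse, rather than limited to a few specific kinds of images, this thesis uses the hidden-layer information of the image to initialize the RNN memory cell and so produce richer images. The experimental results show that, compared with directly applying the network architecture of previous work, adding this method does help to increase the diversity of the generated images.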
Abstract (English) With the rapid development of artificial intelligence, machine learning has achieved excellent results in image recognition, semantic recognition, image generation, and other tasks. As the term suggests, "artificial intelligence" is intelligence created by human beings. Letting computers learn, so that they acquire a certain ability for logical judgment, is what we have achieved at present. If we look at the development of artificial intelligence from a microscopic point of view, however, we have still not reached true intelligence.
This thesis mainly aims to simulate the imagination of the human brain. In the text-to-image field, recent research includes StackGAN, StackGAN++, and AttnGAN, but these models were initially trained and optimized on the bird dataset (CUB-200) and the flower dataset (102Flowers). When people imagine a thing, they usually give a description of it. The ultimate goal of this thesis is to produce a narrative photo from a description. At the present stage, we give a neural network the ability to generate scene photos that correspond to the description, and we enhance their diversity using our dataset.
To make the generated images more diverse, rather than limited to a few specific images, this thesis uses the hidden-layer information of the image to initialize the RNN memory cell to produce a narrative photo. The experimental results show that this works: compared with the original AttnGAN architecture, our proposed method does help to increase the diversity of the generated images.
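The idea of initializing an RNN memory cell from image information can be illustrated with a minimal sketch. This is not the thesis implementation: the abstract does not say which image layer is used, how it is projected, or which RNN is involved, so the `TinyLSTMCell` class, the `encode_text` helper, the tanh projection, and all sizes below are hypothetical choices made only for illustration, written in plain Python with no deep learning library.

```python
import math
import random

def linear(weights, bias, x):
    """Dense layer; weights holds one row per output unit."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

class TinyLSTMCell:
    """Hypothetical minimal LSTM cell (input, forget, output, candidate gates)."""

    def __init__(self, input_size, hidden_size, rng):
        self.hidden_size = hidden_size
        # One weight matrix per gate, applied to the concatenation [x, h].
        self.w = {g: random_matrix(hidden_size, input_size + hidden_size, rng)
                  for g in "ifog"}
        self.b = {g: [0.0] * hidden_size for g in "ifog"}

    def step(self, x, h, c):
        xh = x + h  # list concatenation
        i = [sigmoid(z) for z in linear(self.w["i"], self.b["i"], xh)]
        f = [sigmoid(z) for z in linear(self.w["f"], self.b["f"], xh)]
        o = [sigmoid(z) for z in linear(self.w["o"], self.b["o"], xh)]
        g = [math.tanh(z) for z in linear(self.w["g"], self.b["g"], xh)]
        c_new = [fi * ci + ii * gi for fi, ci, ii, gi in zip(f, c, i, g)]
        h_new = [oi * math.tanh(ci) for oi, ci in zip(o, c_new)]
        return h_new, c_new

def encode_text(cell, word_vectors, image_feature, proj_w, proj_b):
    """Encode a sentence; the initial memory cell c0 comes from the image.

    Assumption: c0 = tanh(W * image_feature + b). The source only states
    that image hidden-layer information initializes the memory cell.
    """
    c = [math.tanh(z) for z in linear(proj_w, proj_b, image_feature)]
    h = [0.0] * cell.hidden_size
    for x in word_vectors:
        h, c = cell.step(x, h, c)
    return h

rng = random.Random(0)
cell = TinyLSTMCell(input_size=4, hidden_size=3, rng=rng)
proj_w = random_matrix(3, 5, rng)   # maps a 5-d image feature to the 3-d cell
proj_b = [0.0] * 3
words = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(3)]
image_a = [rng.uniform(-1, 1) for _ in range(5)]
image_b = [rng.uniform(-1, 1) for _ in range(5)]

emb_a = encode_text(cell, words, image_a, proj_w, proj_b)
emb_b = encode_text(cell, words, image_b, proj_w, proj_b)
```

In this sketch the same word sequence yields a different sentence embedding for each image feature, which is one plausible way that identical descriptions could lead a generator to more varied outputs.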
Keywords (translated from Chinese) ★ Generative adversarial network
★ Image generation
★ Attention mechanism
★ Imagination mechanism
Keywords (English) ★ GAN
★ Image Generation
★ Attention
★ Imagination
Table of Contents Chinese Abstract
Abstract
Acknowledgements
Contents
List of Figures
List of Tables
Chapter 1 Introduction
1.1 Motivation
1.2 Related Work
1.3 Thesis Organization
Chapter 2 Data Collection
2.1 Data Sources
2.1.1 OpenImagesV4
2.1.2 Places
2.2 Data Collection
2.3 Description of SceneDataset
Chapter 3 Methods and System Architecture
3.1 Attentional Generative and Imaginative Networks
3.1.1 Attention Mechanism
3.1.2 Imagination Mechanism
3.1.3 Loss Function
3.2 Deep Attentional Multimodal Similarity Model
3.2.1 Image Encoder
3.2.2 Text Encoder
3.2.3 Attention-driven image-text matching score
3.2.4 Loss Function
Chapter 4 Experimental Results
4.1 Experimental Environment and Equipment
4.2 Evaluation Methods
4.2.1 Inception Score
4.2.2 Fréchet Inception Distance (FID)
4.3 Method Comparison and Experimental Results
4.4 Style Transfer
Chapter 5 Conclusion and Future Work
References
Approval Date 2019-7-25
