Master's/Doctoral Thesis 106522072 Detailed Record




Name: Ya-Sheng Liu (劉亞昇)    Department: Computer Science and Information Engineering
Thesis Title: 符合時間與場景描述之自動影像生成模型
(Automatic Nature Scene Image Generation with Time and Place Descriptions)
Related Theses
★ Face Replacement System for Designated Targets in Video
★ Single-Finger Virtual Keyboard Using a Single Camera
★ Vision-Based Recognition System for Handwritten Zhuyin (Bopomofo) Symbol Combinations
★ Vehicle Detection in Aerial Images Using Dynamic Bayesian Networks
★ Video-Based Handwritten Signature Verification
★ Moving Skin-Color Region Detection Using Gaussian Mixture Models of Skin Color and Shadow Probability
★ Crowd Segmentation with Confidence Levels in Images
★ Region Segmentation and Classification of Aerial Surveillance Images
★ Comparative Analysis of Different Features and Regression Methods for Crowd Counting Applications
★ Vision-Based Robust Multi-Fingertip Detection and Human-Computer Interface Applications
★ Traffic Flow Estimation from Night-Time Videos Captured with Raindrop-Contaminated Lenses
★ Image Feature Point Matching for Landmark Image Retrieval
★ Automatic Region-of-Interest Segmentation and Trajectory Analysis in Long-Range Traffic Images
★ Short-Term Solar Irradiance Forecasting Based on Regression Models Using All-Sky Image Features and Historical Information
★ Analysis of the Performance of Different Classifiers for Cloud Detection Application
★ Cloud Tracking and Solar Occlusion Prediction from All-Sky Images
Files: Full text available for viewing in the repository system after 2024-07-23.
Abstract (Chinese) With the vigorous development of artificial intelligence, machine learning has achieved excellent results in image recognition, semantic recognition, image generation, and other areas. As the name suggests, "artificial intelligence" refers to intelligence created by human beings: by letting computers learn, we give machines a certain capability for logical judgment, and this is what has been achieved so far. Viewed closely, however, the development of artificial intelligence has not yet reached true intelligence.
This thesis aims to simulate the imaginative ability of the human brain in order to increase the diversity of a generative model. In the text-to-image field there has already been some research, such as the recent StackGAN, StackGAN++, and AttnGAN, but these works originally targeted the bird (CUB-200) and flower (102Flowers) datasets for training and optimization. When people imagine something, they usually attach a description to it. The ultimate goal of this work is to use such a description to produce an image with a narrative; at the present stage, we collect scene data so that the neural network can generate a scene image that matches the description while enhancing diversity.
To make the generated images more diverse instead of collapsing to a few specific images, this work uses the latent (hidden-layer) information of an image to initialize the memory cell of the RNN so that richer images are produced. The experimental results show that, compared with directly applying the network architecture of previous work, adding this method indeed helps increase the diversity of the generated images.
Abstract (English) With the rapid development of artificial intelligence, machine learning has achieved excellent results in image recognition, semantic recognition, image generation, and other areas. As the name suggests, "artificial intelligence" means intelligence created by human beings: by letting computers learn, we give machines a certain capability for logical judgment, which is what has been achieved so far. However, viewed closely, the development of artificial intelligence has still not reached true intelligence.
This thesis mainly aims to simulate the imagination of the human brain. In the text-to-image field there has been some research in recent years, such as StackGAN, StackGAN++, and AttnGAN, but their initial goal was to train and optimize on the bird (CUB-200) and flower (102Flowers) datasets. When people imagine a thing, they usually give a description of it. The ultimate goal of this thesis is to produce a narrative image from such a description. At the present stage, we give the neural network the ability to generate scene images corresponding to the description and enhance their diversity with our dataset.
To make the generated images more diverse rather than limited to a few specific images, this thesis uses the hidden-layer information of an image to initialize the RNN memory cell. The experimental results show that, compared with the original AttnGAN architecture, the proposed method does help increase the diversity of the generated images.
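The abstract describes initializing an RNN memory cell from an image's hidden-layer features to encourage diversity. Below is a minimal PyTorch sketch of that general idea; it is not the thesis's actual implementation, and every module name, dimension, and variable here is an illustrative assumption.

import torch
import torch.nn as nn

class ImaginativeInit(nn.Module):
    # Sketch: map pooled image features to the initial hidden and memory
    # states of an LSTM, so the sequence model is conditioned on an image latent.
    def __init__(self, img_feat_dim, hidden_dim):
        super().__init__()
        self.to_h = nn.Linear(img_feat_dim, hidden_dim)   # produces initial hidden state h0
        self.to_c = nn.Linear(img_feat_dim, hidden_dim)   # produces initial memory cell c0
        self.lstm = nn.LSTM(hidden_dim, hidden_dim, batch_first=True)

    def forward(self, img_feat, word_emb):
        # img_feat: (batch, img_feat_dim), e.g. pooled CNN features of a scene image
        # word_emb: (batch, seq_len, hidden_dim), the embedded text description
        h0 = torch.tanh(self.to_h(img_feat)).unsqueeze(0)  # (1, batch, hidden_dim)
        c0 = torch.tanh(self.to_c(img_feat)).unsqueeze(0)  # (1, batch, hidden_dim)
        out, _ = self.lstm(word_emb, (h0, c0))
        return out  # word-level features conditioned on the image latent

# Usage sketch: different conditioning images give different initial memory
# cells, which is the stated route to more diverse generated images.
module = ImaginativeInit(img_feat_dim=2048, hidden_dim=256)
img_feat = torch.randn(4, 2048)          # hypothetical pooled image features
word_emb = torch.randn(4, 12, 256)       # hypothetical embedded description
print(module(img_feat, word_emb).shape)  # torch.Size([4, 12, 256])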
Keywords (Chinese) ★ Generative Adversarial Network
★ Image Generation
★ Attention Mechanism
★ Imagination Mechanism
Keywords (English) ★ GAN
★ Image Generation
★ Attention
★ Imagination
Table of Contents
Abstract (Chinese)
Abstract (English)
Acknowledgements
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
1.1 Motivation
1.2 Related Work
1.3 Thesis Organization
Chapter 2 Data Collection
2.1 Data Sources
2.1.1 OpenImagesV4
2.1.2 Places
2.2 Data Collection
2.3 SceneDataset Description
Chapter 3 Methodology and System Architecture
3.1 Attentional Generative and Imaginative Networks
3.1.1 Attention Mechanism
3.1.2 Imagination Mechanism
3.1.3 Loss Functions
3.2 Deep Attentional Multimodal Similarity Model
3.2.1 Image Encoder
3.2.2 Text Encoder
3.2.3 Attention-Driven Image-Text Matching Score
3.2.4 Loss Functions
Chapter 4 Experimental Results
4.1 Experimental Environment and Equipment
4.2 Evaluation Methods
4.2.1 Inception Score
4.2.2 Fréchet Inception Distance (FID)
4.3 Method Comparison and Experimental Results
4.4 Style Transfer
Chapter 5 Conclusion and Future Work
References
References [1] Ian J. Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville and Yoshua Bengio, "Generative Adversarial Networks," NIPS, 2014.
[2] Diederik P Kingma and Max Welling, “Auto-encoding variational bayes,” ICLR, 2014.
[3] Mehdi Mirza and Simon Osindero, “Conditional Generative Adversarial Nets,” arXiv:1411.1784, 2014.
[4] Karol Gregor, Ivo Danihelka, Alex Graves, Danilo Jimenez Rezende and Daan Wierstra, “DRAW: A Recurrent Neural Network For Image Generation,” ICML, 2015.
[5] Aaron van den Oord, Nal Kalchbrenner, Oriol Vinyals, Lasse Espeholt, Alex Graves and Koray Kavukcuoglu, “Conditional Image Generation with PixelCNN Decoders,” NIPS, 2016.
[6] Alec Radford, Luke Metz and Soumith Chintala, “Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks,” ICLR, 2016.
[7] Scott Reed, Zeynep Akata, Xinchen Yan, Lajanugen Logeswaran, Bernt Schiele and Honglak Lee, “Generative Adversarial Text to Image Synthesis,” ICML, 2016.
[8] Han Zhang, Tao Xu, Hongsheng Li, Shaoting Zhang, Xiaogang Wang, Xiaolei Huang and Dimitris Metaxas, “StackGAN: Text to Photo-realistic Image Synthesis with Stacked Generative Adversarial Networks,” ICCV, 2017.
[9] Han Zhang, Tao Xu, Hongsheng Li, Shaoting Zhang, Xiaogang Wang, Xiaolei Huang and Dimitris Metaxas, “StackGAN++: Realistic Image Synthesis with Stacked Generative Adversarial Networks,” arXiv:1710.10916, 2017.
[10] Tao Xu, Pengchuan Zhang, Qiuyuan Huang, Han Zhang, Zhe Gan, Xiaolei Huang and Xiaodong He, “AttnGAN: Fine-Grained Text to Image Generation With Attentional Generative Adversarial Networks,” CVPR, 2018.
[11] Andrew Brock, Jeff Donahue and Karen Simonyan, “Large Scale GAN Training for High Fidelity Natural Image Synthesis,” arXiv, 2018.
[12] Tero Karras, Timo Aila, Samuli Laine and Jaakko Lehtinen, “Progressive Growing of GANs for Improved Quality, Stability, and Variation,” arXiv:1710.10196, 2017.
[13] Scott Reed, Zeynep Akata, Bernt Schiele and Honglak Lee, “Learning Deep Representations of Fine-grained Visual Descriptions,” CVPR, 2016.
[14] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser and Illia Polosukhin, “Attention Is All You Need,” NIPS, 2017.
[15] Scott Reed, Zeynep Akata, Santosh Mohan, Samuel Tenka, Bernt Schiele and Honglak Lee, “Learning What and Where to Draw,” NIPS, 2016.
[16] Volodymyr Mnih, Nicolas Heess, Alex Graves and Koray Kavukcuoglu, “Recurrent Models of Visual Attention,” NIPS, 2014.
[17] Kelvin Xu, Jimmy Ba, Ryan Kiros, Kyunghyun Cho, Aaron Courville, Ruslan Salakhutdinov, Richard Zemel and Yoshua Bengio, “Show, Attend and Tell: Neural Image Caption Generation with Visual Attention,” ICML, 2015.
[18] Tu Dinh Nguyen, Trung Le, Hung Vu and Dinh Phung, “Dual Discriminator Generative Adversarial Nets,” arXiv:1709.03831, 2017.
[19] Lantao Yu, Weinan Zhang, Jun Wang and Yong Yu, “SeqGAN: Sequence Generative Adversarial Nets with Policy Gradient,” AAAI, 2017.
[20] Jingjing Xu, Xuancheng Ren, Junyang Lin and Xu Sun, “DP-GAN: Diversity-Promoting Generative Adversarial Network for Generating Informative and Diversified Text,” EMNLP, 2018.
[21] Alina Kuznetsova, Hassan Rom, Neil Alldrin, Jasper Uijlings, Ivan Krasin, Jordi Pont-Tuset, Shahab Kamali, Stefan Popov, Matteo Malloci, Tom Duerig and Vittorio Ferrari, “The Open Images Dataset V4: Unified image classification, object detection, and visual relationship detection at scale,” arXiv:1811.00982, 2018.
[22] Bolei Zhou, Agata Lapedriza, Aditya Khosla, Aude Oliva and Antonio Torralba, “Places: A 10 Million Image Database for Scene Recognition,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017.
[23] Martin Arjovsky, Soumith Chintala and Léon Bottou, “Wasserstein GAN,” arXiv:1701.07875, 2017.
[24] Sepp Hochreiter and Jürgen Schmidhuber, “Long Short-Term Memory,” Neural Computation, 1997.
[25] Qiantong Xu, Gao Huang, Yang Yuan, Chuan Guo, Yu Sun, Felix Wu and Kilian Weinberger, “An empirical study on evaluation metrics of generative adversarial networks,” arXiv:1806.07755, 2018.
[26] Martin Heusel, Hubert Ramsauer, Thomas Unterthiner, Bernhard Nessler and Sepp Hochreiter, “GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium,” NIPS, 2017.
[27] Jacob Devlin, Ming-Wei Chang, Kenton Lee and Kristina Toutanova, “BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding,” arXiv:1810.04805v2, 2018.
Advisor: Hsu-Yung Cheng (鄭旭詠)    Approval Date: 2019-07-25
