Defect image synthesis using various generative adversarial nets

Name

蔡明勳 (Ming-Hsun Tsai)

Department

Department of Computer Science and Information Engineering

Thesis Title

Defect image synthesis using various generative adversarial nets

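Related Theses

★ Video error concealment for large damaged areas and scene changes	★ Force-feedback correction and rendering in a virtual haptic system
★ Multispectral satellite image fusion and infrared image synthesis	★ A laparoscopic cholecystectomy surgery simulation system
★ Dynamically loaded multiresolution terrain modeling in a flight simulation system	★ Wavelet-based multiresolution terrain modeling and texture mapping
★ Multiresolution optical flow analysis and depth computation	★ Volume-preserving deformation modeling for laparoscopic surgery simulation
★ Interactive multiresolution model editing techniques	★ Wavelet-based multiresolution edge tracking for edge detection
★ Multiresolution modeling based on quadric error and attribute criteria	★ Progressive image compression based on integer wavelet transform and grey theory
★ Tactical simulation built on dynamically loaded multiresolution terrain modeling	★ Face detection and feature extraction using spatial relationships from multilevel segmentation
★ Wavelet-based image watermarking and compression	★ Appearance-preserving and view-dependent multiresolution modeling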

Full Text

Available online from 2025-07-01.

Abstract (Chinese, translated)

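In deep learning applications, whether for detection or recognition, the biggest factor affecting prediction accuracy is the data, especially the training data. Collecting data is therefore the most important task in deep learning. For a binary classification problem, the two classes must be reasonably balanced: if there are many normal samples but very few defective samples, predictions and test results are easily distorted by the data imbalance. This study therefore uses generative adversarial networks (GANs) to generate suitable defective samples and so address the data-imbalance problem.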
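To make the imbalance argument concrete, the sketch below (using hypothetical sample counts, not data from the thesis) shows how a degenerate classifier that labels everything "normal" still reaches high accuracy when defects are rare, and how many synthetic defect images would be needed to reach a chosen defect-to-normal ratio. The function names and the `target_ratio` parameter are illustrative assumptions, not the author's code.

```python
import math


def always_normal_accuracy(n_normal: int, n_defect: int) -> float:
    """Accuracy of a degenerate classifier that labels every sample 'normal'.

    Its recall on the defect class is 0, yet its accuracy looks high
    whenever defects are rare.
    """
    return n_normal / (n_normal + n_defect)


def synthetic_defects_needed(n_normal: int, n_defect: int,
                             target_ratio: float = 1.0) -> int:
    """Synthetic defect images to add so that defect/normal >= target_ratio."""
    return max(0, math.ceil(target_ratio * n_normal) - n_defect)


if __name__ == "__main__":
    # Hypothetical counts, not data from the thesis.
    n_normal, n_defect = 9500, 500
    print(always_normal_accuracy(n_normal, n_defect))    # 0.95
    print(synthetic_defects_needed(n_normal, n_defect))  # 9000
```

A 95% accuracy here says nothing about defect detection, which is why the thesis targets the minority class with generated samples rather than relying on accuracy alone.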
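The study consists of two main steps: first, generating plausible defective samples; second, validating and comparing the synthesized defective samples.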
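Compared with other generative models, a GAN has an additional discriminator that supervises the training of the network, so it produces good samples but is difficult to train: besides the risk of vanishing or exploding gradients, it easily suffers from mode collapse, in which the generated images lack diversity. We tried several different training methods to make GAN training more stable.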
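For reference, the original GAN objective (Goodfellow et al., 2014) frames training as a two-player minimax game between the discriminator D and the generator G. The abstract does not state which loss or stabilization methods the thesis adopted, so this is only the baseline formulation. Early in training, when D easily rejects generated samples, the term log(1 - D(G(z))) saturates and G receives little gradient; the same paper suggests training G to maximize log D(G(z)) instead (second line).

```latex
\min_G \max_D V(D, G) =
  \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}\left[\log D(x)\right]
  + \mathbb{E}_{z \sim p_z(z)}\left[\log\left(1 - D(G(z))\right)\right]

\text{non-saturating generator objective:}\quad
\max_G \; \mathbb{E}_{z \sim p_z(z)}\left[\log D(G(z))\right]
```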
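Beyond training stability, we also want the generated samples to carry specific defect features, so we designed two different network architectures, from different perspectives, to learn defect features and synthesize defect images. In the first, we use adaptive instance normalization (AdaIN) to learn the features at each resolution layer; because different resolutions express different features, we examined what each resolution represents, which makes the generated samples more meaningful. We also use modulation and demodulation to learn the features of each resolution layer, and improve the quality of the generated samples with skip connections and residual networks. In the second architecture, we use a variational autoencoder GAN: its encoder and decoder learn the defect features, and after training, normal samples are fed in so that they are fused with the defect features, yielding the desired defect images. Finally, we replaced the variational autoencoder's architecture to improve the generation results.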
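A minimal sketch of AdaIN in the form StyleGAN uses, AdaIN(x_i, y) = y_s,i (x_i - mean(x_i)) / std(x_i) + y_b,i: each feature channel is normalized over its spatial positions and then rescaled and shifted by per-channel style values. Assumptions: feature maps are plain Python lists (one flattened H*W list per channel) rather than tensors, and the style scale/bias are passed in directly, whereas in StyleGAN they come from a learned affine transform of the intermediate latent w. This is an illustration of the technique, not the thesis implementation.

```python
import math
from typing import List, Sequence


def adain(features: Sequence[Sequence[float]],
          style_scale: Sequence[float],
          style_bias: Sequence[float],
          eps: float = 1e-5) -> List[List[float]]:
    """AdaIN, StyleGAN form: y_s * (x - mean(x)) / std(x) + y_b per channel.

    `features` holds one flattened H*W list per channel; `style_scale` and
    `style_bias` hold one value per channel.
    """
    if not (len(features) == len(style_scale) == len(style_bias)):
        raise ValueError("need one scale and one bias per channel")
    out = []
    for channel, y_s, y_b in zip(features, style_scale, style_bias):
        n = len(channel)
        mean = sum(channel) / n
        var = sum((v - mean) ** 2 for v in channel) / n
        std = math.sqrt(var + eps)  # eps avoids division by zero on flat channels
        out.append([y_s * (v - mean) / std + y_b for v in channel])
    return out


if __name__ == "__main__":
    # Two 2x2 channels, flattened; values are arbitrary.
    x = [[1.0, 2.0, 3.0, 4.0], [10.0, 10.0, 12.0, 12.0]]
    styled = adain(x, style_scale=[2.0, 0.5], style_bias=[0.5, -1.0])
    for ch in styled:
        print([round(v, 3) for v in ch])
```

After AdaIN each channel's mean equals its style bias and its standard deviation is (up to eps) its style scale, so the style controls per-channel statistics at that resolution while the spatial pattern comes from the input.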
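In the experiments, we used StyleGAN for style mixing and searched random combinations of normal and defective samples for the best pairing. We also combined an autoencoder with a GAN: with the network architecture unmodified, the average SSIM between generated and real samples was 0.3917; adding residual self-attention layers raised the average SSIM to 0.5039, and further adding a residual network architecture raised it to 0.6157.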
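The SSIM figures above compare generated and real samples. The sketch below shows the SSIM formula with the usual constants C1 = (0.01 L)^2 and C2 = (0.03 L)^2, plus an average over image pairs. Assumption: it evaluates SSIM over the whole image as one window, a simplification; standard implementations compute it over local (typically Gaussian 11x11) windows and average the map, and the abstract does not say which variant the thesis used. Images are flattened grayscale lists; the function names are hypothetical.

```python
from typing import Iterable, Sequence, Tuple


def global_ssim(x: Sequence[float], y: Sequence[float],
                data_range: float = 255.0,
                k1: float = 0.01, k2: float = 0.03) -> float:
    """SSIM computed over the whole image as a single window.

    x and y are flattened grayscale images of equal size; data_range is the
    pixel value range (255 for 8-bit images).
    """
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("images must be non-empty and the same size")
    n = len(x)
    mu_x = sum(x) / n
    mu_y = sum(y) / n
    var_x = sum((a - mu_x) ** 2 for a in x) / n
    var_y = sum((b - mu_y) ** 2 for b in y) / n
    cov_xy = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n
    c1 = (k1 * data_range) ** 2  # stabilizes the luminance term
    c2 = (k2 * data_range) ** 2  # stabilizes the contrast/structure term
    return ((2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)) / (
        (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2))


def mean_ssim(pairs: Iterable[Tuple[Sequence[float], Sequence[float]]]) -> float:
    """Average SSIM over (generated, real) image pairs."""
    scores = [global_ssim(gen, real) for gen, real in pairs]
    if not scores:
        raise ValueError("no image pairs given")
    return sum(scores) / len(scores)


if __name__ == "__main__":
    real = [0.0, 64.0, 128.0, 255.0]
    print(global_ssim(real, real))        # identical images: 1 up to rounding
    print(global_ssim(real, real[::-1]))  # anti-correlated structure: negative
```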

Abstract (English)

In deep learning applications, whether for detection or recognition, the biggest factor affecting prediction accuracy is the amount of data, especially training data. Collecting data is therefore the most important issue in deep learning. For binary classification problems, the data need to be balanced: if there are many normal samples and few defective samples, prediction or test results are likely to be wrong because of the data imbalance. In our research, we use generative adversarial networks to generate suitable defective samples to solve the data-imbalance problem.
Our research is divided into two steps: the first generates plausible defective samples, and the second verifies and compares the synthesized defective samples.
Generative adversarial networks produce good results but are difficult to train. Besides possibly encountering vanishing or exploding gradients, they are also prone to mode collapse, in which the generated images lack diversity. We used different training methods to make training more stable.
Beyond training stability, we also want the generated samples to exhibit specific defect features. We created four different network architectures from different perspectives to learn the defect features and synthesize defect images. First, we use adaptive instance normalization to learn the features of each resolution layer; because different resolutions express different features, we examined what each resolution represents, which makes the generated samples more meaningful. Second, we use modulation and demodulation to learn the features of each resolution layer, and improve the quality of the generated samples through skip connections and a residual network. Third, we use an adversarial variational autoencoder to learn the defect features through its encoder and decoder; after training, we input normal samples so that they are fused with the defect features, yielding the desired defect images. Finally, we replace the architecture of the adversarial variational autoencoder to improve the generation quality.
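The "modulation and demodulation" approach comes from StyleGAN2 (Karras et al.), which replaced AdaIN with operations on the convolution weights: each input channel's kernel is scaled by a style value s_i (modulation), then each output channel's weights are rescaled to unit L2 norm (demodulation), which keeps output activations at roughly unit standard deviation under the assumption of unit-variance, i.i.d. inputs. A minimal sketch on nested Python lists; the weight layout `[out][in][k]` and the function name are assumptions for illustration, and in StyleGAN2 `style` would come from a learned affine transform of w, with the resulting weights used in an ordinary convolution.

```python
import math
from typing import List, Sequence

# weight[j][i][k]: output channel j, input channel i, flattened kernel position k
Weight = List[List[List[float]]]


def modulate_demodulate(weight: Weight, style: Sequence[float],
                        eps: float = 1e-8) -> Weight:
    """StyleGAN2-style weight modulation followed by demodulation."""
    out: Weight = []
    for w_j in weight:
        if len(w_j) != len(style):
            raise ValueError("need one style value per input channel")
        # Modulation: w'_ijk = s_i * w_ijk
        mod = [[s_i * w for w in w_ji] for s_i, w_ji in zip(style, w_j)]
        # Demodulation: w''_ijk = w'_ijk / sqrt(sum_{i,k} w'_ijk^2 + eps)
        norm = math.sqrt(sum(w * w for w_ji in mod for w in w_ji) + eps)
        out.append([[w / norm for w in w_ji] for w_ji in mod])
    return out


if __name__ == "__main__":
    # Two output channels, two input channels, 2-tap kernels; arbitrary values.
    weight = [[[1.0, 2.0], [3.0, 4.0]], [[0.5, 0.5], [1.0, -1.0]]]
    for w_j in modulate_demodulate(weight, style=[2.0, 0.1]):
        print(w_j)
```

Because normalization is applied to the weights based on expected statistics rather than to the actual activations, the per-sample feature statistics are no longer explicitly erased, which is the motivation StyleGAN2 gives for this change.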
In our experiments, we used StyleGAN for style mixing and found the best match among random combinations of normal and defective samples. In addition, we modified the architecture of the adversarial variational autoencoder: adding residual self-attention layers and then a residual network structure raised the average SSIM between generated and real samples from 0.3917 to 0.5039 and then to 0.6157.
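The variational autoencoder at the core of the adversarial VAE relies on two standard pieces: the reparameterization trick, which draws the latent code as z = mu + sigma * eps so it stays differentiable with respect to the encoder outputs, and the closed-form KL divergence that pulls the encoder's Gaussian toward N(0, I). The sketch below shows only these two pieces on plain lists; the encoder, decoder, discriminator, and the thesis's actual loss weighting are not given in the abstract and are not modeled here. Function names are hypothetical.

```python
import math
import random
from typing import List, Sequence


def reparameterize(mu: Sequence[float], log_var: Sequence[float],
                   rng: random.Random) -> List[float]:
    """Sample z = mu + sigma * eps with eps ~ N(0, 1), sigma = exp(log_var / 2).

    Writing the sample this way keeps z differentiable w.r.t. mu and log_var,
    which is what lets the encoder be trained by backpropagation.
    """
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu: Sequence[float], log_var: Sequence[float]) -> float:
    """KL(N(mu, diag(sigma^2)) || N(0, I)) in closed form."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv)
                      for m, lv in zip(mu, log_var))


if __name__ == "__main__":
    rng = random.Random(0)           # fixed seed for reproducibility
    mu, log_var = [0.2, -1.0], [0.0, -2.0]  # stand-ins for encoder outputs
    print(reparameterize(mu, log_var, rng))
    print(kl_to_standard_normal(mu, log_var))
```

In a VAE-GAN the decoder doubles as the GAN generator, so z sampled this way is decoded into an image that the discriminator then judges; the KL term keeps the latent space smooth enough to decode inputs it was not trained on.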

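Keywords (Chinese, translated)

★ generative adversarial networks
★ defect image synthesis
★ style-based generative adversarial networks
★ variational autoencoders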

Keywords (English)

★ generative adversarial nets
★ Defect image synthesis
★ StyleGAN
★ VAE

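Table of Contents

Abstract (Chinese)
Abstract
Acknowledgments
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
1.1 Research Motivation
1.2 System Architecture
1.3 Characteristics of the Thesis
1.4 Thesis Organization
Chapter 2 Related Work
2.1 Generative Adversarial Networks
2.2 Imbalanced Data
Chapter 3 Generative Adversarial Networks for Synthesizing Defect Samples
3.1 Generative Adversarial Networks
3.2 Deep Convolutional Generative Adversarial Networks
3.3 Style-Based Generative Adversarial Networks
3.4 Second-Generation Style-Based Generative Adversarial Networks
3.5 Residual Self-Attention Layers
3.6 Variational Autoencoders
3.7 Improved Generative Adversarial Network Architecture
Chapter 4 Experiments and Results
4.1 Experimental Equipment and Environment
4.2 Experimental Data
4.3 Data Preprocessing
4.4 Experimental Methods
4.5 Evaluation Criteria
4.6 Experiments and Results
Chapter 5 Conclusions and Future Work
References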

Advisor

曾定章

Approval Date

2020-7-29
