Master's/Doctoral Thesis 107522116: Detailed Record




Name: Hua-Peng Chuang (莊華澎)    Department: Computer Science and Information Engineering
Title: Self-attention residual U-Net for surface defect segmentation
(自我注意力殘差U網路的物體表面瑕疵分割)
Related theses
★ Video error concealment for large areas and scene changes
★ Force-feedback correction and rendering in a virtual haptic system
★ Multispectral satellite image fusion and infrared image synthesis
★ A laparoscopic cholecystectomy surgery simulation system
★ Dynamically loaded multiresolution terrain modeling in a flight simulation system
★ Wavelet-based multiresolution terrain modeling and texture mapping
★ Multiresolution optical flow analysis and depth computation
★ Volume-preserving deformation modeling for laparoscopic surgery simulation
★ Interactive multiresolution model editing techniques
★ Wavelet-based multiresolution edge tracking for edge detection
★ Multiresolution modeling based on quadric error and attribute criteria
★ Progressive image compression based on integer wavelet transform and grey theory
★ Tactical simulation based on dynamically loaded multiresolution terrain modeling
★ Face detection and feature extraction using spatial relations of multilevel segmentation
★ Wavelet-based image watermarking and compression
★ Appearance-preserving and view-dependent multiresolution modeling
Files: full text viewable in the system after 2025-7-1
Abstract (Chinese): Inspecting products for defects with machine vision is a problem that has been discussed widely and for a long time; its appeal is higher production efficiency and a greater degree of automation. Machine inspection can also replace the human eye in environments unsuited to manual work: besides being more efficient, it does not suffer the visual fatigue that human eyes develop over long hours, so it maintains better efficiency and quality in highly repetitive tasks.
For detecting keycap defects on computer keyboards, we adopt a deep-learning approach. A deep network is built layer by layer in imitation of biological neural networks, and before training its parameters are all randomly initialized values. Learning requires a large amount of training data and a properly defined loss function as the basis for network learning: the parameters are adjusted over repeated training iterations until the model produces the expected inference result for a given input.
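As a concrete illustration of the training procedure just described (random initialization, a loss function, and iterative parameter updates), below is a minimal supervised-training sketch in PyTorch. The tiny model, the dummy data, and every hyperparameter here are placeholders for illustration, not the configuration used in the thesis.

    import torch
    import torch.nn as nn

    # Placeholder network; its parameters start as randomly initialized values.
    model = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                          nn.Conv2d(16, 1, 1))
    criterion = nn.BCEWithLogitsLoss()          # an assumed defect-vs-background loss
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

    # Dummy stand-in for a real (image, ground-truth mask) loader.
    loader = [(torch.randn(2, 3, 64, 64),
               torch.randint(0, 2, (2, 1, 64, 64)).float())]

    for epoch in range(10):                          # repeated training iterations
        for images, masks in loader:
            optimizer.zero_grad()
            loss = criterion(model(images), masks)   # error between prediction and label
            loss.backward()                          # gradients of the loss
            optimizer.step()                         # adjust the parameters step by step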
The goal of this research is to locate defective regions with a semantic segmentation network, for which we modify the symmetric U-Net model. The core of the modification is to adjust how the encoder and decoder are connected and how many convolutional layers they use, and to add a self-attention mechanism that further improves learning. The encoder and decoder keep their symmetric structure but are rebuilt from the residual blocks commonly used in deep convolutional networks, with the number of convolutional layers tuned for efficiency. Self-attention is applied in two places: enhancing the high-level features deep in the network, and fusing in the low-level features during up-sampling. For high-level enhancement, two self-attention modules are applied to the encoder's output feature map to perform position and channel self-attention respectively. For low-level fusion, the decoder enhances the features with self-attention once before the high- and low-level features are concatenated; a sketch of both attention modules follows.
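The sketch below shows minimal PyTorch versions of position (spatial) and channel self-attention, patterned on the dual-attention design the abstract refers to; the 1/8 reduction ratio, the learned residual weight, and the simplified channel module are assumptions, not the thesis's exact layers.

    import torch
    import torch.nn as nn

    class PositionAttention(nn.Module):
        # Spatial self-attention: every position attends to every other position.
        def __init__(self, c):
            super().__init__()
            self.q = nn.Conv2d(c, c // 8, 1)             # 1/8 reduction is an assumption
            self.k = nn.Conv2d(c, c // 8, 1)
            self.v = nn.Conv2d(c, c, 1)
            self.gamma = nn.Parameter(torch.zeros(1))    # learned residual weight

        def forward(self, x):
            b, c, h, w = x.shape
            q = self.q(x).flatten(2).transpose(1, 2)             # B x HW x C/8
            k = self.k(x).flatten(2)                             # B x C/8 x HW
            attn = torch.softmax(q @ k, dim=-1)                  # B x HW x HW affinities
            v = self.v(x).flatten(2)                             # B x C x HW
            out = (v @ attn.transpose(1, 2)).view(b, c, h, w)
            return self.gamma * out + x                          # residual fusion

    class ChannelAttention(nn.Module):
        # Channel self-attention: affinities between whole channel maps
        # (simplified; the original dual-attention design also learns a scale).
        def forward(self, x):
            b, c, h, w = x.shape
            f = x.flatten(2)                                     # B x C x HW
            attn = torch.softmax(f @ f.transpose(1, 2), dim=-1)  # B x C x C
            return (attn @ f).view(b, c, h, w) + x

    # The encoder's output feature map would pass through both modules, e.g.:
    feat = torch.randn(1, 64, 32, 32)
    enhanced = ChannelAttention()(PositionAttention(64)(feat))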
Compared with U-Net, the modified network adds little hardware cost while clearly improving U-Net's segmentation of small defective regions on object surfaces. The dataset used in the experiments consists of 659 keycap images from the same keyboard model: 594 of them form the training set and the remaining 65 the test set, and data augmentation raises the total number of training samples to 4752. On the model side, we compared several network architectures and self-attention mechanisms. The symmetric encoder and decoder composed of residual blocks raise recall by 1%; the position and channel attention modules attached after the encoder raise recall by 4% and 2% respectively; and the global attention upsample added where high- and low-level features are fused during up-sampling raises recall by another 2%. The final version of the defect segmentation network combines all of the above and achieves 85% MIoU and 85% recall on the test set.
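The eightfold growth of the training set (594 images expanded to 4752) is consistent with simple geometric augmentation; the following is a hypothetical sketch using the eight dihedral variants of each image, since the thesis's actual augmentation operations are not stated here.

    import torch

    def augment_8(img: torch.Tensor) -> list:
        # Return the 8 dihedral variants of a C x H x W tensor:
        # 4 rotations x optional horizontal flip; 594 x 8 = 4752 samples.
        # The same transform must also be applied to the ground-truth mask.
        variants = []
        for k in range(4):
            rot = torch.rot90(img, k, dims=(1, 2))
            variants.append(rot)
            variants.append(torch.flip(rot, dims=(2,)))
        return variants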
Abstract (English): The use of computer-vision methods to detect product defects has been widely discussed for a long time; it is characterized by higher production efficiency and a greater degree of automation. It can also replace human-eye inspection in environments that are not suitable for manual work. In addition to improving efficiency, it does not suffer the visual fatigue that affects human eyes after long use, so it maintains better efficiency and quality in long, repetitive work.
To detect tiny defects on object surfaces, we adopt deep-learning methods. Deep learning is a class of algorithms that imitates the structure of neural networks. Before training, the parameters in the model are all randomly initialized values. It requires a large amount of training data and a well-defined loss function for the network to learn: the parameters are adjusted at every training iteration, step by step, until the model gives the expected outputs for a given input.
The goal of this research is to find defective regions on object surfaces using semantic segmentation. We propose a symmetric model based on U-Net. The main ideas are to strengthen the encoder and decoder and to add a self-attention mechanism that further improves learning. The model keeps the symmetric structure, but the plain stacked convolutions, which lack skip connections, are replaced with residual blocks, and the number of convolutional layers is adjusted to suit the task of detecting tiny surface defects. Self-attention is applied in two parts of the model: high-level feature enhancement, and the merging of high- and low-level features during up-sampling. In particular, high-level feature enhancement is added at the encoder output, where two self-attention modules perform position and channel self-attention respectively. The other part is added in the decoder's up-sampling step, where a module applies self-attention to enhance the low-level features before they are concatenated with the high-level features.
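To make the two structural changes concrete, the sketch below shows a residual block standing in for plain stacked convolutions, and an attention-weighted fusion of low- and high-level features before concatenation in the spirit of a global attention upsample; the channel counts and the gating design are assumptions.

    import torch
    import torch.nn as nn

    class ResidualBlock(nn.Module):
        # Two 3x3 convolutions with an identity shortcut.
        def __init__(self, c):
            super().__init__()
            self.body = nn.Sequential(
                nn.Conv2d(c, c, 3, padding=1), nn.BatchNorm2d(c), nn.ReLU(inplace=True),
                nn.Conv2d(c, c, 3, padding=1), nn.BatchNorm2d(c))
            self.relu = nn.ReLU(inplace=True)

        def forward(self, x):
            return self.relu(self.body(x) + x)       # skip connection

    class AttentionFusion(nn.Module):
        # Gate the low-level skip features with global context from the
        # high-level features, then concatenate after up-sampling.
        def __init__(self, c_low, c_high):
            super().__init__()
            self.gate = nn.Sequential(
                nn.AdaptiveAvgPool2d(1),             # global context of the deep map
                nn.Conv2d(c_high, c_low, 1), nn.Sigmoid())
            self.up = nn.Upsample(scale_factor=2, mode='bilinear', align_corners=False)

        def forward(self, low, high):
            low = low * self.gate(high)              # channel-wise attention on skip features
            return torch.cat([low, self.up(high)], dim=1)

    # Usage, assuming the deep map is at half the skip map's resolution:
    low = torch.randn(1, 32, 64, 64)
    high = torch.randn(1, 64, 32, 32)
    fused = AttentionFusion(32, 64)(low, high)       # 1 x 96 x 64 x 64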
Compared with U-Net, our model adds little hardware cost yet improves the segmentation of small defective regions on object surfaces. In the experiments, we choose keycap images as the target of surface defect detection: 594 images are used for training and 65 for testing. We compared many different network architectures and self-attention modules. The residual symmetric encoder and decoder raise recall by 1%; the position and channel attention modules raise recall by 4% and 2% respectively; and the global attention upsample raises recall by another 2%. The final version of our defect segmentation network achieves 85% MIoU and 85% recall.
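For reference, the two reported metrics can be computed from a pixel-level confusion matrix; below is a minimal sketch for the binary (defect vs. background) case.

    import torch

    def recall_and_miou(pred: torch.Tensor, target: torch.Tensor):
        # pred, target: binary {0, 1} tensors of the same shape.
        tp = ((pred == 1) & (target == 1)).sum().float()
        fp = ((pred == 1) & (target == 0)).sum().float()
        fn = ((pred == 0) & (target == 1)).sum().float()
        tn = ((pred == 0) & (target == 0)).sum().float()
        recall = tp / (tp + fn)                      # fraction of defect pixels found
        iou_defect = tp / (tp + fp + fn)             # IoU of the defect class
        iou_background = tn / (tn + fp + fn)         # IoU of the background class
        miou = (iou_defect + iou_background) / 2     # mean IoU over the two classes
        return recall, miou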
Keywords
★ defect segmentation
★ semantic segmentation
★ segmentation
★ U-Net
★ self-attention
★ attention
★ residual
Table of contents
Abstract (Chinese)
Abstract (English)
Acknowledgements
Table of contents
List of figures
List of tables
Chapter 1: Introduction
1.1 Motivation
1.2 System architecture
1.3 Thesis highlights
1.4 Thesis organization
Chapter 2: Related work
2.1 Semantic segmentation
2.2 Fully convolutional networks
2.3 Encoder-decoder architectures
2.4 Multiresolution object analysis modules
2.5 Attention mechanisms
Chapter 3: Self-attention residual U-Net
3.1 U-Net architecture
3.2 Encoder
3.3 Position and channel attention modules
3.4 Decoder
3.5 Loss function
Chapter 4: Experiments and results
4.1 Equipment and development environment
4.2 Training the segmentation network
4.3 Evaluation criteria
4.4 Experiments and results
Chapter 5: Conclusions and future work
References
Advisor: Din-Chang Tseng (曾定章)    Approval date: 2020-7-28
