自我注意力殘差U網路的物體表面瑕疵分割;Self-attention residual U-Net for surface defect segmentation

Please use this identifier to cite or link to this item: http://ir.lib.ncu.edu.tw/handle/987654321/84005

Title:	自我注意力殘差U網路的物體表面瑕疵分割;Self-attention residual U-Net for surface defect segmentation
Authors:	莊華澎;Chuang, Hua-Peng
Contributors:	Department of Computer Science and Information Engineering
Keywords:	defect segmentation;semantic segmentation;segmentation;U-Net;self-attention;attention;residual
Date:	2020-07-28
Issue Date:	2020-09-02 17:53:52 (UTC+8)
Publisher:	National Central University
Abstract:	Detecting product defects with machine vision has long been widely studied because it raises production efficiency and the degree of automation. It can also replace visual inspection by humans in environments unsuited to manual work. Beyond improving efficiency, it does not suffer the visual fatigue that affects human inspectors after long periods of work, so it maintains better efficiency and quality in large volumes of repetitive tasks. To detect defects on computer keyboard keycaps, and more generally tiny defects on object surfaces, we adopt a deep learning approach. Deep learning networks are built layer by layer in imitation of the structure of human neural networks. Before training, the parameters of the model are randomly initialized. Training requires a large amount of data and a well-defined loss function; the parameters are adjusted over successive training iterations until the model produces the expected output for a given input.
The goal of this research is to locate defective regions on object surfaces using semantic segmentation. We propose a symmetric model based on U-Net. The main ideas are to modify how the encoder and decoder are connected and how many convolutional layers they contain, and to add a self-attention mechanism to further improve learning. The model keeps its symmetric structure but replaces plain consecutive convolutions, which have no skip connections, with the residual blocks commonly used in deep convolutional neural networks; the number of convolutional layers is adjusted with efficiency in mind to suit the task of detecting tiny surface defects. Self-attention is applied in two parts of the model: the enhancement of high-level features deep in the network, and the merging of high-level and low-level features during up-sampling. For high-level feature enhancement, two self-attention modules placed at the encoder output perform position and channel self-attention respectively. For feature merging, a module in the up-sampling step of the decoder applies self-attention to the low-level features before they are concatenated with the high-level features. Compared with U-Net, the model adds little hardware cost while improving the segmentation of small defect regions on object surfaces. The experiments use 659 keycap images from a single keyboard model: 594 images for training and the remaining 65 for testing, with data augmentation raising the number of training samples to 4,752. We compared several network architectures and self-attention modules. The symmetric encoder and decoder built from residual blocks improve recall by 1%. The position and channel attention modules attached after the encoder improve recall by 4% and 2% respectively. The global attention upsample module, added where high-level and low-level features are merged during up-sampling, improves recall by 2%. The final defect segmentation network combines all of these components and achieves 85% MIoU and recall on the experimental data set.
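The abstract reports its results as recall and MIoU (mean intersection over union). As a rough illustration of what these two numbers measure, the following minimal sketch computes both from pixel counts on a single pair of masks. It assumes a binary labelling (0 = background, 1 = defect) and the standard definitions recall = TP / (TP + FN) and IoU = TP / (TP + FP + FN); the record does not state how the thesis averages over images or classes, and the function names and the toy masks are hypothetical.

```python
from typing import Sequence, Tuple

# A mask is a 2-D grid of class labels. Assumption: 0 = background, 1 = defect.
Mask = Sequence[Sequence[int]]


def confusion_counts(pred: Mask, truth: Mask, cls: int) -> Tuple[int, int, int]:
    """Return (true positives, false positives, false negatives) for one class."""
    tp = fp = fn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p == cls and t == cls:
                tp += 1
            elif p == cls:
                fp += 1
            elif t == cls:
                fn += 1
    return tp, fp, fn


def recall(pred: Mask, truth: Mask, cls: int = 1) -> float:
    """Fraction of ground-truth pixels of class `cls` that were predicted as `cls`."""
    tp, _, fn = confusion_counts(pred, truth, cls)
    return tp / (tp + fn) if tp + fn else 0.0


def mean_iou(pred: Mask, truth: Mask, classes: Sequence[int] = (0, 1)) -> float:
    """Average of per-class IoU, skipping classes absent from both masks."""
    ious = []
    for cls in classes:
        tp, fp, fn = confusion_counts(pred, truth, cls)
        union = tp + fp + fn
        if union:
            ious.append(tp / union)
    return sum(ious) / len(ious) if ious else 0.0


# Toy 4x4 example (made-up data, not from the thesis): a 2x2 defect,
# of which the prediction finds 3 pixels and adds 1 false alarm.
truth = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
pred = [
    [0, 0, 0, 0],
    [0, 1, 1, 1],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]

print(f"recall = {recall(pred, truth):.3f}")    # recall = 0.750
print(f"MIoU   = {mean_iou(pred, truth):.3f}")  # MIoU   = 0.723
```

On small defects the two metrics can diverge sharply: a single missed pixel costs a quarter of the recall here, while the large, mostly correct background class keeps MIoU comparatively high. This may be why the abstract reports the per-module gains in recall.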
Appears in Collections:	[Graduate Institute of Computer Science and Information Engineering] Electronic Thesis & Dissertation
