References
[1] A. Krizhevsky, I. Sutskever, and G. Hinton, “ImageNet classification with deep convolutional neural networks,” in Proc. of Neural Information Processing Systems (NIPS), Harrahs and Harveys, Lake Tahoe, NV, Dec. 3-8, 2012, pp. 1106-1114.
[2] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin, “Attention is all you need,” arXiv:1706.03762.
[3] Z. Dai, H. Liu, Q. V. Le, and M. Tan, “CoAtNet: marrying convolution and attention for all data sizes,” arXiv:2106.04803.
[4] A. G. Howard, M. Zhu, B. Chen, D. Kalenichenko, W. Wang, T. Weyand, M. Andreetto, and H. Adam, “MobileNets: efficient convolutional neural networks for mobile vision applications,” arXiv:1704.04861.
[5] J. Hu, L. Shen, S. Albanie, G. Sun, and E. Wu, “Squeeze-and-excitation networks,” arXiv:1709.01507v4.
[6] T.-Y. Lin, P. Dollár, R. Girshick, K. He, B. Hariharan, and S. Belongie, “Feature pyramid networks for object detection,” in Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, Jul. 21-26, 2017, pp. 2117-2125.
[7] K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” arXiv:1409.1556.
[8] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich, “Going deeper with convolutions,” arXiv:1409.4842.
[9] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” arXiv:1512.03385.
[10] G. Huang, Z. Liu, L. van der Maaten, and K. Q. Weinberger, “Densely connected convolutional networks,” arXiv:1608.06993.
[11] A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, M. Dehghani, M. Minderer, G. Heigold, S. Gelly, J. Uszkoreit, and N. Houlsby, “An image is worth 16x16 words: transformers for image recognition at scale,” arXiv:2010.11929.
[12] H. Wu, B. Xiao, N. Codella, M. Liu, X. Dai, L. Yuan, and L. Zhang, “CvT: introducing convolutions to vision transformers,” arXiv:2103.15808.
[13] L. Yuan, Q. Hou, Z. Jiang, J. Feng, and S. Yan, “VOLO: vision outlooker for visual recognition,” arXiv:2106.13112.
[14] S. Woo, J. Park, J.-Y. Lee, and I. S. Kweon, “CBAM: convolutional block attention module,” arXiv:1807.06521v2.
[15] Y. Liu, Z. Shao, Y. Teng, and N. Hoffmann, “NAM: normalization-based attention module,” arXiv:2111.12419v1.
[16] M. Sandler, A. Howard, M. Zhu, A. Zhmoginov, and L.-C. Chen, “MobileNetV2: inverted residuals and linear bottlenecks,” arXiv:1801.04381v4.
[17] C. Sun, A. Shrivastava, S. Singh, and A. Gupta, “Revisiting unreasonable effectiveness of data in deep learning era,” in Proc. of IEEE Int. Conf. on Computer Vision (ICCV), Venice, Italy, Oct. 22-29, 2017, pp. 843-852.
[18] D. Hendrycks and K. Gimpel, “Gaussian error linear units (GELUs),” arXiv:1606.08415v4.
[19] A. F. Agarap, “Deep learning using rectified linear units (ReLU),” arXiv:1803.08375v2.
[20] A. Howard, M. Sandler, G. Chu, L.-C. Chen, B. Chen, M. Tan, W. Wang, Y. Zhu, R. Pang, V. Vasudevan, Q. V. Le, and H. Adam, “Searching for MobileNetV3,” arXiv:1905.02244.
[21] M. Zeiler, D. Krishnan, G. Taylor, and R. Fergus, “Deconvolutional networks,” in Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), San Francisco, CA, Jun. 13-18, 2010, pp. 2528-2535.
[22] Z. Zhang and M. R. Sabuncu, “Generalized cross entropy loss for training deep neural networks with noisy labels,” in Proc. of Neural Information Processing Systems (NeurIPS), Palais des Congrès de Montréal, Montréal, Canada, Dec. 2-8, 2018, pp. 8778-8788.
[23] B. Zhou, A. Khosla, A. Lapedriza, A. Oliva, and A. Torralba, “Learning deep features for discriminative localization,” arXiv:1512.04150.
[24] R. R. Selvaraju, M. Cogswell, A. Das, R. Vedantam, D. Parikh, and D. Batra, “Grad-CAM: visual explanations from deep networks via gradient-based localization,” arXiv:1610.02391.
[25] K. Zuiderveld, “Contrast limited adaptive histogram equalization,” in Graphics Gems IV, Academic Press, Amsterdam, 1994, Ch. 5, pp. 474-485.
[26] I. Loshchilov and F. Hutter, “Decoupled weight decay regularization,” arXiv:1711.05101v3.
[27] D. P. Kingma and J. Ba, “Adam: a method for stochastic optimization,” arXiv:1412.6980v9.
[28] M. J. Zhao, N. Edakunni, A. Pocock, and G. Brown, “Beyond Fano’s inequality: bounds on the optimal F-score, BER, and cost-sensitive risk and their implications,” Journal of Machine Learning Research, vol. 14, pp. 1033-1090, 2013.
[29] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and F.-F. Li, “ImageNet: a large-scale hierarchical image database,” in Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), Miami, FL, Jun. 20-25, 2009, pp. 248-255.
[30] T. Ridnik, E. B. Baruch, A. Noy, and L. Zelnik-Manor, “ImageNet-21K pretraining for the masses,” arXiv:2104.10972.
[31] P. Shaw, J. Uszkoreit, and A. Vaswani, “Self-attention with relative position representations,” arXiv:1803.02155.
[32] Z. Dai, Z. Yang, Y. Yang, J. Carbonell, Q. V. Le, and R. Salakhutdinov, “Transformer-XL: attentive language models beyond a fixed-length context,” arXiv:1901.02860.
[33] P. Ramachandran, N. Parmar, A. Vaswani, I. Bello, A. Levskaya, and J. Shlens, “Stand-alone self-attention in vision models,” arXiv:1906.05909.
[34] Y.-H. H. Tsai, S. Bai, M. Yamada, L.-P. Morency, and R. Salakhutdinov, “Transformer dissection: a unified understanding of transformer’s attention via the lens of kernel,” arXiv:1908.11775.
[35] B. Graham, A. El-Nouby, H. Touvron, P. Stock, A. Joulin, H. Jégou, and M. Douze, “LeViT: a vision transformer in ConvNet’s clothing for faster inference,” arXiv:2104.01136.
[36] L. Yuan, Y. Chen, T. Wang, W. Yu, Y. Shi, F. E. H. Tay, J. Feng, and S. Yan, “Tokens-to-token ViT: training vision transformers from scratch on ImageNet,” arXiv:2101.11986.
[37] J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, “BERT: pre-training of deep bidirectional transformers for language understanding,” arXiv:1810.04805.
[38] T. B. Brown, B. Mann, N. Ryder, M. Subbiah, J. Kaplan, P. Dhariwal, A. Neelakantan, P. Shyam, G. Sastry, and A. Askell, “Language models are few-shot learners,” arXiv:2005.14165.