References
[1] R. Ranftl, K. Lasinger, D. Hafner, K. Schindler, and V. Koltun, "Towards robust monocular depth estimation: mixing datasets for zero-shot cross-dataset transfer," arXiv:1907.01341v3.
[2] Y. Zhang and T. Funkhouser, "Deep depth completion of a single RGB-D image," arXiv:1803.09326.
[3] Y. Huang, T. Wu, Y. Liu, and W. Hsu, "Indoor depth completion with boundary consistency and self-attention," arXiv:1908.08344.
[4] D. Senushkin, M. Romanov, I. Belikov, A. Konushin, and N. Patakin, "Decoder modulation for indoor depth completion," arXiv:2005.08607.
[5] K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," arXiv:1409.1556v6.
[6] O. Ronneberger, P. Fischer, and T. Brox, "U-Net: convolutional networks for biomedical image segmentation," arXiv:1505.04597v1.
[7] V. Nekrasov, C. Shen, and I. Reid, "Light-Weight RefineNet for real-time semantic segmentation," arXiv:1810.03272v1.
[8] M. D. Zeiler and R. Fergus, "Visualizing and understanding convolutional networks," in Proc. of ECCV Conf., Zurich, Switzerland, Sep. 6-12, 2014, pp. 818-833.
[9] M. Tan and Q. Le, "EfficientNet: rethinking model scaling for convolutional neural networks," arXiv:1905.11946.
[10] T.-Y. Lin, P. Dollár, R. Girshick, K. He, B. Hariharan, and S. Belongie, "Feature pyramid networks for object detection," arXiv:1612.03144.
[11] M. Zeiler, D. Krishnan, G. Taylor, and R. Fergus, "Deconvolutional networks," in Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), San Francisco, CA, Jun. 13-18, 2010, pp. 2528-2535.
[12] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin, "Attention is all you need," arXiv:1706.03762v5.
[13] H. Zhang, I. Goodfellow, D. Metaxas, and A. Odena, "Self-attention generative adversarial networks," arXiv:1805.08318v2.
[14] J. Hu, L. Shen, S. Albanie, G. Sun, and E. Wu, "Squeeze-and-excitation networks," arXiv:1709.01507v4.
[15] S. Woo, J. Park, J.-Y. Lee, and I. Kweon, "CBAM: convolutional block attention module," arXiv:1807.06521v2.
[16] J. Fu, J. Liu, H. Tian, Y. Li, Y. Bao, Z. Fang, and H. Lu, "Dual attention network for scene segmentation," arXiv:1809.02983v4.
[17] Y. Chen, Y. Kalantidis, J. Li, S. Yan, and J. Feng, "A2-Nets: Double attention networks," arXiv:1810.11579v1.
[18] Q. Wang, B. Wu, P. Zhu, P. Li, W. Zuo, and Q. Hu, "ECA-Net: efficient channel attention for deep convolutional neural networks," arXiv:1910.03151v4.
[19] B. Niu, W. Wen, W. Ren, X. Zhang, L. Yang, S. Wang, K. Zhang, X. Cao, and H. Shen, "Single image super-resolution via a holistic attention network," arXiv:2008.08767v1.
[20] M. Bertalmio, G. Sapiro, V. Caselles, and C. Ballester, "Image inpainting," in Proc. 27th Int. Conf. on Computer Graphics and Interactive Techniques, New Orleans, USA, 2000, pp.417-424.
[21] C. Ballester, M. Bertalmio, V. Caselles, G. Sapiro, and J. Verdera, "Filling-in by joint interpolation of vector fields and gray levels," IEEE Trans. on Image Processing, vol. 10, no. 8, pp. 1200-1211, 2001.
[22] A. Telea, "An image inpainting technique based on the fast marching method," Journal of Graphics Tools, vol. 9, no. 1, pp. 25-36, Jan. 2004.
[23] C. Barnes, E. Shechtman, A. Finkelstein, and D. B. Goldman, "PatchMatch: a randomized correspondence algorithm for structural image editing," ACM Trans. on Graphics (TOG), vol. 28, no. 3, 2009.
[24] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, "Generative adversarial nets," arXiv:1406.2661.
[25] Y. Li, S. Liu, J. Yang, and M.-H. Yang, "Generative face completion," arXiv:1704.05838.
[26] S. Iizuka, E. Simo-Serra, and H. Ishikawa, "Globally and locally consistent image completion," ACM Trans. on Graphics (TOG), vol. 36, no. 4, 2017.
[27] J. Uhrig, N. Schneider, L. Schneider, U. Franke, T. Brox, and A. Geiger, "Sparsity invariant CNNs," arXiv:1708.06500v2.
[28] S. Shivakumar, T. Nguyen, I. Miller, S. Chen, V. Kumar, and C. Taylor, "DFuseNet: deep fusion of RGB and sparse depth information for image guided dense depth completion," arXiv:1902.00761v2.
[29] J. Park, K. Joo, Z. Hu, C. Liu, and I. Kweon, "Non-local spatial propagation network for depth completion," arXiv:2007.10042v1.
[30] Y. Zhang and T. Funkhouser, "Deep depth completion of a single RGB-D image," arXiv:1803.09326.
[31] K. He, X. Zhang, S. Ren, and J. Sun, "Spatial pyramid pooling in deep convolutional networks for visual recognition," arXiv:1406.4729v4.
[32] T. Park, M. Liu, T. Wang, and J. Zhu, "Semantic image synthesis with spatially-adaptive normalization," arXiv:1903.07291.
[33] D. Ulyanov, A. Vedaldi, and V. Lempitsky, "Instance normalization: the missing ingredient for fast stylization," arXiv:1607.08022v3.
[34] S. Ioffe and C. Szegedy, "Batch normalization: accelerating deep network training by reducing internal covariate shift," arXiv:1502.03167v3.
[35] C. Zhao, Q. Sun, C. Zhang, Y. Tang, and F. Qian, "Monocular depth estimation based on deep learning: an overview," arXiv:2003.06620v2.
[36] K. Lore, K. Reddy, M. Giering, and E. Bernal, "Generative adversarial networks for depth map estimation from RGB video," in Proc. IEEE Conf. on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, Jun. 18-22, 2018, pp. 1177-1185.
[37] P. Chakravarty, P. Narayanan, and T. Roussel, "GEN-SLAM: generative modeling for monocular simultaneous localization and mapping," arXiv:1902.02086.
[38] D. Wofk, F. Ma, T. Yang, S. Karaman, and V. Sze, "FastDepth: fast monocular depth estimation on embedded systems," arXiv:1903.03273.
[39] H. Zhao, J. Shi, X. Qi, X. Wang, and J. Jia, "Pyramid scene parsing network," arXiv:1612.01105v2.
[40] G. Lin, A. Milan, C. Shen, and I. Reid, "RefineNet: multi-path refinement networks for high-resolution semantic segmentation," arXiv:1611.06612.
[41] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," arXiv:1512.03385v1.
[42] J. Yu, Z. Lin, J. Yan, X. Shen, X. Lu, and T. S. Huang, "Generative image inpainting with contextual attention," arXiv:1801.07892.
[43] M. Tan, B. Chen, R. Pang, V. Vasudevan, M. Sandler, A. Howard, and Q. Le, "MnasNet: platform-aware neural architecture search for mobile," arXiv:1807.11626v3.
[44] M. Sandler, A. Howard, M. Zhu, A. Zhmoginov, and L. Chen, "MobileNetV2: inverted residuals and linear bottlenecks," arXiv:1801.04381.
[45] K. He, X. Zhang, S. Ren, and J. Sun, "Identity mappings in deep residual networks," arXiv:1603.05027v3, also in Proc. of ECCV Conf., Amsterdam, The Netherlands, Oct. 11-14, 2016, pp. 630-645.
[46] A. Agarap, "Deep learning using rectified linear units (ReLU)," arXiv:1803.08375v2.
[47] D. P. Kingma and J. Ba, "Adam: a method for stochastic optimization," arXiv:1412.6980.
[48] Z. Zhang, T. He, H. Zhang, Z. Zhang, J. Xie, and M. Li, "Bag of freebies for training object detection neural networks," arXiv:1902.04103v3.
[49] Z. Wang, A. Bovik, H. Sheikh, and E. Simoncelli, "Image quality assessment: from error visibility to structural similarity," IEEE Trans. on Image Processing, vol. 13, no. 4, pp. 600-612, 2004.