An Industrial AI Vision System Based on Deep Learning: A Case Study of Industrial Text Localization and Recognition

Name

楊凱霖 (Yang, Kai-Lin)

Department

Department of Computer Science and Information Engineering

Thesis Title

An Industrial AI Vision System Based on Deep Learning: A Case Study of Industrial Text Localization and Recognition
(基於深度學習之工業用智慧型機器視覺系統：以文字定位與辨識為例)

Files

Full text in the system: never open to the public

Abstract (translated from Chinese)

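Text localization and recognition in optical images has a wide range of applications, such as reading production dates, product part numbers, and drug codes. To recognize the text in an image, the bounding box of the text is first localized, and the text inside the bounding box is then recognized.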

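Obtaining highly accurate and robust results with deep learning usually requires a very large amount of data to train the network model. In addition, training and testing a deep learning model require image preprocessing, such as cropping, scaling and deskewing, annotating the images, and using image processing methods to increase the number of images. Image preprocessing, however, is very time-consuming and laborious. To reach good accuracy and robustness with only a small amount of data, this thesis uses transfer learning. Although pre-training the model requires a large amount of data and time, the retrained model can, in subsequent applications, reach a test accuracy above 95% using only a small number of text images.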
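
As an illustration of increasing the number of images by image processing, the following is a minimal sketch and not the preprocessing code of the thesis. The function names (`shift`, `adjust_brightness`, `augment`), the shift offsets, and the brightness deltas are assumptions made for this example, and an image is represented as a plain nested list of grayscale values rather than any particular library type.

```python
from typing import List

Image = List[List[int]]  # grayscale image, pixel values in 0..255


def shift(image: Image, dx: int, dy: int, fill: int = 0) -> Image:
    """Translate the image by (dx, dy) pixels; exposed pixels get `fill`."""
    height = len(image)
    width = len(image[0]) if height else 0
    out = [[fill] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            sx, sy = x - dx, y - dy  # source pixel that lands on (x, y)
            if 0 <= sx < width and 0 <= sy < height:
                out[y][x] = image[sy][sx]
    return out


def adjust_brightness(image: Image, delta: int) -> Image:
    """Add `delta` to every pixel and clamp the result to 0..255."""
    return [[min(255, max(0, p + delta)) for p in row] for row in image]


def augment(image: Image) -> List[Image]:
    """Return the original image plus six assumed variants."""
    variants = [image]
    # Hypothetical one-pixel shifts in the four axis directions.
    for dx, dy in [(1, 0), (-1, 0), (0, 1), (0, -1)]:
        variants.append(shift(image, dx, dy))
    # Hypothetical brightness changes.
    for delta in (-30, 30):
        variants.append(adjust_brightness(image, delta))
    return variants
```

The variants keep the text string of a cropped text image unchanged, so the recognition label can be reused. For localization training, a shifted image would also need its bounding-box annotation shifted by the same offset. Mirroring is left out on purpose, since flipped text no longer matches its label.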

Abstract (English)

Text detection and recognition in optical images has a wide range of applications, for example reading production dates, product part numbers, and drug codes. To recognize the text in an image, one first detects the bounding box of the text and then recognizes the text within the localized region.
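
The two-stage flow described in this paragraph can be sketched as follows. This is only an illustration under stated assumptions: `crop` and `read_text` are hypothetical helper names, the `localize` and `recognize` arguments stand in for trained networks that are not part of this record, and images are plain nested lists of pixel values.

```python
from typing import Callable, List, Tuple

Image = List[List[int]]          # grayscale image as rows of pixel values
Box = Tuple[int, int, int, int]  # (left, top, right, bottom), right/bottom exclusive


def crop(image: Image, box: Box) -> Image:
    """Return the sub-image inside `box`, clipped to the image borders."""
    left, top, right, bottom = box
    height = len(image)
    width = len(image[0]) if height else 0
    left, top = max(0, left), max(0, top)
    right, bottom = min(width, right), min(height, bottom)
    return [row[left:right] for row in image[top:bottom]]


def read_text(
    image: Image,
    localize: Callable[[Image], List[Box]],
    recognize: Callable[[Image], str],
) -> List[Tuple[Box, str]]:
    """Stage 1: localize text boxes. Stage 2: recognize each cropped region."""
    results = []
    for box in localize(image):        # hypothetical detector
        patch = crop(image, box)
        results.append((box, recognize(patch)))  # hypothetical recognizer
    return results
```

Passing the detector and the recognizer as arguments keeps the sketch independent of any specific model; an end-to-end network would instead share features between the two stages.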

However, obtaining accurate and robust results with deep learning usually requires a very large amount of data to train the network model. In addition, before a deep learning model can be trained and tested, the images must be preprocessed, for example by cropping, scaling, and rotating them. Data augmentation, which increases the number of training images, is also important. Image preprocessing, however, is time-consuming and tedious work. In this research, transfer learning is applied so that a model with good accuracy and robustness can be trained from a small amount of data. Although pre-training the model requires a large amount of data and time, the subsequently retrained model achieves an accuracy above 95% using only a small amount of text image data.
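
The idea of retraining on a small dataset can be illustrated with a toy sketch. It is not the network or training procedure of the thesis: `pretrained_features` is a hypothetical stand-in for a frozen, pre-trained backbone, and the retrained part is reduced to a small logistic-regression head fitted by stochastic gradient descent. The learning rate, epoch count, and data are assumptions chosen for illustration.

```python
import math
from typing import Callable, List, Sequence, Tuple

Extractor = Callable[[Sequence[float]], List[float]]


def pretrained_features(x: Sequence[float]) -> List[float]:
    """Hypothetical frozen backbone: its 'weights' are fixed and never updated."""
    return [x[0] + x[1], x[0] - x[1]]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def retrain_head(
    samples: Sequence[Sequence[float]],
    labels: Sequence[int],
    extract: Extractor,
    epochs: int = 200,
    lr: float = 0.5,
) -> Tuple[List[float], float]:
    """Fit only a logistic-regression head on top of the frozen features."""
    feats = [extract(x) for x in samples]  # backbone is applied, not trained
    weights = [0.0] * len(feats[0])
    bias = 0.0
    for _ in range(epochs):
        for f, y in zip(feats, labels):
            p = sigmoid(sum(w * v for w, v in zip(weights, f)) + bias)
            err = p - y  # gradient of the log loss with respect to the logit
            weights = [w - lr * err * v for w, v in zip(weights, f)]
            bias -= lr * err
    return weights, bias


def predict(x: Sequence[float], weights: List[float], bias: float,
            extract: Extractor) -> int:
    """Classify one sample with the frozen backbone and the retrained head."""
    f = extract(x)
    return int(sigmoid(sum(w * v for w, v in zip(weights, f)) + bias) >= 0.5)
```

Because only the head's few parameters are updated, a handful of labeled samples is enough in this toy setting. In practice some of the later backbone layers are often fine-tuned as well, which this sketch does not show.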

Keywords (translated from Chinese)

★ Deep Learning
★ Machine Vision

Keywords (English)

★ Deep Learning
★ Computer Vision

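Table of Contents

Chinese Abstract
English Abstract
Acknowledgements
Contents
List of Figures
List of Tables
1. Introduction
1-1 Preface
1-2 Differences Between Industrial and Traditional OCR
1-3 Thesis Objectives
1-4 Thesis Organization
2. Literature Review
2-1 Object Detection Networks
2-1-1 Faster R-CNN
2-1-2 YOLO
2-1-3 SSD
2-2 Text Localization Networks
2-2-1 TextBoxes and TextBoxes++
2-3 Text Recognition Networks
2-3-1 CRNN
2-4 End-to-End Text Localization and Recognition Networks
2-4-1 Deep TextSpotter
3. Methodology
3-1 Method Architecture
3-1-1 Text Localization
3-1-2 Text Recognition
3-2 Loss Function
4. Experimental Results
4-1 Character-Based Method
4-1-1 Character Localization
4-1-2 Character Recognition
4-2 End-to-End Method
4-2-1 Network Pre-training
4-2-2 Network Retraining
5. Conclusion and Future Work
5-1 Conclusion
5-2 Future Work
6. References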

Advisor

栗永徽 (Yung-Hui Li)

Approval Date

2019-07-26
