Enhancing low-level feature extraction for small data optical character detection and recognition


Name

Guan-Yi Wu (吳冠毅)

Department

Department of Computer Science and Information Engineering

Thesis title

Enhancing low-level feature extraction for small data optical character detection and recognition
(強化低階特徵擷取能力的小數據光學字元偵測與辨識)

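Related theses

★ Video error concealment for large areas and scene changes
★ Force feedback correction and rendering in virtual haptic systems
★ Multispectral satellite image fusion and infrared image synthesis
★ A laparoscopic cholecystectomy surgery simulation system
★ Dynamically loaded multiresolution terrain modeling in flight simulation systems
★ Wavelet-based multiresolution terrain modeling and texture mapping
★ Multiresolution optical flow analysis and depth computation
★ Volume-preserving deformable modeling applied to laparoscopic surgery simulation
★ Interactive multiresolution model editing techniques
★ Wavelet-based multiresolution edge tracking for edge detection
★ Multiresolution modeling based on quadric error and attribute criteria
★ Progressive image compression based on the integer wavelet transform and grey theory
★ Tactical simulation built on dynamically loaded multiresolution terrain modeling
★ Face detection and feature extraction using spatial relations from multilevel segmentation
★ Wavelet-transform-based image watermarking and compression
★ Appearance-preserving and view-dependent multiresolution modeling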


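This electronic thesis is authorized for immediate open access.
The open-access electronic full text is licensed to users only for personal, non-profit searching, reading, and printing for the purpose of academic research.
Please comply with the Copyright Act of the Republic of China. Do not reproduce, distribute, adapt, repost, or broadcast the work without authorization, so as to avoid violating the law.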

Abstract (translated from Chinese)

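Optical character detection and recognition is a traditional problem. Earlier detection and recognition methods required algorithms designed from the characteristics of the images, and a change of background or a slightly different font usually meant redesigning the algorithm. The advantages of traditional algorithms are that they are very fast and that a suitable algorithm can be designed for a given problem fairly easily. The drawback is low generality, and in some complicated cases appropriate features are hard to find. In deep learning, the network model is like one huge algorithm: training adjusts the function curve of this huge model so that, through neural-network learning, it learns on its own to handle complex problems. In this study we therefore use deep learning to solve the problem of optical character detection and recognition. The difficulty of this application is that the foreground and background of the images are extremely similar. The human eye can still read the characters, but computer algorithms struggle to, and even contrast-enhanced images are still disturbed by a great deal of noise. Here the self-learned feature extraction of a convolutional neural network helps us extract features effectively from these contrast-enhanced images.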
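This study has two parts. The first part discusses how offline image processing affects feature extraction. The second part is the online modification of the network architecture, which aims to let the receptive field of the extracted features cover an entire character, to change the residual network architecture so that optimization is easier, and to save the time needed to build a pre-trained model.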
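
The link between the number of convolutional layers and the receptive field can be illustrated with the usual recurrence for stacked convolutions. This is a minimal sketch only: the function name `receptive_field` and the layer stacks below are hypothetical and are not the configuration used in the thesis.

```python
def receptive_field(layers):
    """Return the receptive field, in input pixels, of a stack of conv layers.

    `layers` is a list of (kernel_size, stride) pairs applied in order.
    """
    field = 1  # a single input pixel sees only itself
    jump = 1   # spacing, in input pixels, between adjacent output positions
    for kernel_size, stride in layers:
        field += (kernel_size - 1) * jump
        jump *= stride
    return field


# Hypothetical stacks of 3x3 convolutions (assumed for illustration).
shallow = [(3, 1)] * 2
deeper = [(3, 1)] * 4
with_downsampling = [(3, 1), (3, 2), (3, 1), (3, 1)]

print(receptive_field(shallow))            # 5
print(receptive_field(deeper))             # 9
print(receptive_field(with_downsampling))  # 13
```

Each added 3x3 layer widens the field by two input pixels at stride 1, and by more once an earlier layer has downsampled, which is one way a deeper stack can come to cover a whole character.
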
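In the experiments, the original network architecture gives poor detection and recognition results on the optical character data: the mAP is 67.14%, the recall is 82.52%, and the precision is 97.63%. Many characters are not found, and the detected bounding boxes are poorly localized. The precision is nevertheless high because the system recognizes only the text that has been detected. Text that is not detected does not affect the precision. The improvements to the whole system discussed in this study are as follows: the mAP rises by about 20% to a final 97.27%, the recall rises by about 17% to a final 99.35%, and the precision rises from 97.63% to 99.60%. The gain in precision looks like only 2%, but given the large gain in recall, many more blurred characters are now detected. The proposed optical character detection and recognition system is close to practical use in industrial applications.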
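
The point that missed characters lower the recall but leave the precision untouched can be checked with a small calculation. The sketch below assumes the standard definitions, precision = TP / (TP + FP) and recall = TP / (TP + FN). The counts are hypothetical and are not taken from the thesis, whose exact evaluation protocol is not given in this record.

```python
def precision_recall(true_positives, false_positives, false_negatives):
    """Return (precision, recall) from raw detection counts."""
    detected = true_positives + false_positives  # everything the system reported
    actual = true_positives + false_negatives    # everything that was really there
    return true_positives / detected, true_positives / actual


# Hypothetical counts for 1000 ground-truth characters.
# Weak detector: 200 characters are never detected (false negatives).
weak = precision_recall(true_positives=800, false_positives=20, false_negatives=200)
# Stronger detector: only 10 characters are missed.
strong = precision_recall(true_positives=990, false_positives=5, false_negatives=10)

print(f"weak:   precision={weak[0]:.3f} recall={weak[1]:.3f}")      # 0.976, 0.800
print(f"strong: precision={strong[0]:.3f} recall={strong[1]:.3f}")  # 0.995, 0.990
```

In the weak case the 200 missed characters appear only in the recall denominator, so the precision stays high even though a fifth of the text is lost.
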

Abstract (English)

Optical character detection and recognition is a traditional problem. Traditional detection and recognition methods require algorithms designed around the characteristics of the images, and when the background or the font changes even slightly, the algorithms usually have to be redesigned. Traditional algorithms are fast and relatively easy to tailor to a specific problem, but they are less versatile than deep learning, and in some complicated situations it is difficult to find appropriate features. In deep learning, the network model is like one huge algorithm: training adjusts the parameters of the model so that the neural network learns by itself how to deal with complex problems. In this study, we therefore use deep learning to solve the problem of optical character detection and recognition. The difficulty of this application is that the image foreground and background are highly similar. Even after contrast enhancement, detection and recognition remain difficult because of heavy noise. The self-learned feature extraction of a convolutional neural network is therefore very helpful for extracting features from these contrast-enhanced images.
This thesis has two parts. The first part discusses the effect of offline image processing on feature extraction. The second part is the online modification of the network architecture. Its goals are to enlarge the receptive field of the extracted features so that it covers an entire character, to change the residual network architecture to make optimization easier, and to save the time needed to build pre-trained models.
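
The thesis's table of contents names full pre-activation and exponential linear units (ELU) as the residual-network changes. The sketch below shows one common reading of that combination: normalization, then ELU, then convolution, applied twice, with the input added back and no activation after the addition. It is an assumption-laden illustration on 1-D lists, not the thesis's block: `normalize` is a simplified stand-in for batch normalization, and all function names and kernels are hypothetical.

```python
import math


def elu(values, alpha=1.0):
    """Exponential linear unit: identity for v > 0, alpha * (exp(v) - 1) otherwise."""
    return [v if v > 0 else alpha * (math.exp(v) - 1.0) for v in values]


def normalize(values, eps=1e-5):
    """Simplified stand-in for batch normalization (no learned scale or shift)."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [(v - mean) / math.sqrt(var + eps) for v in values]


def conv1d_same(values, kernel):
    """1-D cross-correlation with zero padding; expects an odd kernel length."""
    half = len(kernel) // 2
    padded = [0.0] * half + list(values) + [0.0] * half
    return [
        sum(k * padded[i + j] for j, k in enumerate(kernel))
        for i in range(len(values))
    ]


def preact_residual_block(x, kernel_a, kernel_b):
    """Assumed order: norm -> ELU -> conv, twice, then add the identity path."""
    out = conv1d_same(elu(normalize(x)), kernel_a)
    out = conv1d_same(elu(normalize(out)), kernel_b)
    return [xi + oi for xi, oi in zip(x, out)]  # no activation after the sum


signal = [1.0, -2.0, 3.0, 0.5]
print(preact_residual_block(signal, [0.1, 0.2, 0.1], [0.3, -0.1, 0.3]))
# With all-zero kernels the residual branch vanishes and the input passes through.
print(preact_residual_block(signal, [0.0] * 3, [0.0] * 3))
```

Because nothing is applied after the addition, the identity path carries the input through unchanged, which is the property usually credited with making deep residual stacks easier to optimize.
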
In the experiments, the original network architecture performs poorly on the optical character data: the mAP is 67.14%, the recall is 82.52%, and the precision is 97.63%, and many characters are not detected. The improvement achieved in this research is significant. We increase the mAP of the whole system by about 20% to 97.27% and the recall by about 17% to 99.35%, and the precision rises from 97.63% to 99.60%. The proposed optical character detection and recognition system is close to practical use in industrial applications.

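Keywords (translated from Chinese)

★ Enhanced low-level features
★ Feature extraction
★ Optical character
★ Detection
★ Convolutional neural network
★ Recognition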

Keywords (English)

★ Enhance
★ feature extraction
★ convolutional
★ optical character
★ detection
★ recognition

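Table of contents

Abstract (Chinese)
Abstract
Acknowledgments
Contents
List of figures
List of tables
Chapter 1 Introduction
1.1 Motivation
1.2 System overview
1.3 Features of the thesis
1.4 Thesis organization
Chapter 2 Related work
2.1 Development of convolutional neural network object detection systems
2.2 Residual networks
2.3 Activation functions
Chapter 3 Network architecture modifications for small OCR datasets
3.1 YOLOv3 network architecture
3.2 Adding convolutional layers to enlarge the receptive field of feature extraction
3.3 Residual networks with full pre-activation and exponential linear units
3.4 Training data and image enhancement
Chapter 4 Experimental results and discussion
4.1 Experimental equipment and evaluation criteria
4.2 Experiments on the effect of image processing on recognition and detection
4.3 Experiments on the effect of pre-trained models on precision
4.4 Experiments on the number of convolutional layers and the residual block structure
Chapter 5 Conclusions and future work
References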

Advisor

Ding-Chang Tseng (曾定章)

Review date

2019-7-25
