Thesis Record 107522099: Detailed Information




Name: Cheng-Tsung Chan (詹振宗)    Department: Computer Science and Information Engineering
Thesis Title: A Robust Two-Stage Pre-processing Method to Improve Vehicle License Recognition
Related Theses
★ Dynamic Overlay Construction for Mobile Target Detection in Wireless Sensor Networks
★ A Simple Detour Strategy for Vehicle Navigation
★ Improving Localization Using Transmitter-Side Voltage
★ Constructing a Virtual Backbone in Vehicular Networks Using Vehicle Classification
★ Why Topology-based Broadcast Algorithms Do Not Work Well in Heterogeneous Wireless Networks?
★ Efficient Wireless Sensor Networks for Mobile Targets
★ An Articulation-Point-Based Distributed Topology Control Method for Wireless Ad Hoc Networks
★ A Review of Existing Web Frameworks
★ A Distributed Algorithm for Partitioning Sensor Networks into Greedy Blocks
★ Range-free Distance Measurement in Wireless Networks
★ Inferring Floor Plan from Trajectories
★ An Indoor Collaborative Pedestrian Dead Reckoning System
★ Dynamic Content Adjustment In Mobile Ad Hoc Networks
★ An Image-Based Localization System
★ Distributed Data Compression and Collection Algorithms for Large-Scale Wireless Sensor Networks
★ Collision Analysis in Vehicular WiFi Networks
Files: [EndNote RIS format]    [BibTeX format]    Full text available in the system after 2024-11-01.
Abstract (Chinese): Various financial, insurance, and investment websites require customers to upload identity documents, such as proof of financial resources and vehicle licenses, to verify their identities. However, manually verifying these documents is costly, so the demand for automatic document recognition keeps growing. This study proposes a robust and effective vehicle license recognition system. Text detection in this study is divided into two stages. In the first stage, a rectification network corrects the distortion that often appears in non-scanned licenses. In the second stage, a locating network accurately locates each data field in the license and crops it into field images. The field images are then passed to commercial text recognition software, which recognizes the text in each field. Because the data in vehicle licenses are sensitive, it is difficult to collect enough training data for model training. Therefore, before model training, we synthesize a dataset of fake license images and pre-process it to avoid overfitting. In addition, before text recognition, a denoising network removes background noise from the license, erases borders crossing the text, and sharpens blurred text. Our method only requires the customer to upload a single photo of the vehicle license; even if the photo is taken poorly, the text in the license fields can still be recognized. Finally, the performance of the system is evaluated on a real dataset, and the recognition accuracy is close to 90%.
Abstract (English): Various financial, insurance, and investment websites require customers to upload identity documents, such as financial certificates and vehicle licenses, to verify their identities. Manual verification of these documents is costly, so there is an increasing demand for automatic document recognition. This study proposes a robust method for pre-processing vehicle licenses before text recognition. The proposed method has two stages. In the first stage, the distortion that often appears in non-scanned documents is repaired. In the second stage, each data field is accurately located. The captured fields are then processed by a commercial text recognition model. Because vehicle licenses contain sensitive data, it is difficult to collect enough entries for model training. Consequently, fake vehicle licenses are synthesized and pre-processed to avoid overfitting during model training. Additionally, before text recognition, an encoder is applied to reduce background noise, remove borders crossing the text, and sharpen blurred text. Our approach only requires the customer to upload a photo of the vehicle license, and the text can be recognized even when the photo is taken poorly. The performance of the proposed method is evaluated on a real dataset, and the accuracy is close to 90%.
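The abstract reports accuracy close to 90% on a real dataset. A common way to score OCR output of this kind is character-level accuracy derived from Levenshtein edit distance. The sketch below is purely illustrative; the exact metric defined in Section 5.2 of the thesis may differ, and the function names are my own.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions, and substitutions to turn a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def char_accuracy(pred: str, truth: str) -> float:
    """1 minus normalized edit distance; a common OCR scoring convention."""
    if not truth:
        return 1.0 if not pred else 0.0
    return max(0.0, 1.0 - levenshtein(pred, truth) / len(truth))
```

For example, an OCR result that misreads one character of a seven-character plate field, such as "A8C-123" against ground truth "ABC-123", scores 6/7 under this convention.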
Keywords (Chinese) ★ 文字辨識 (Text Recognition)
★ 文字偵測 (Text Detection)
★ 光學字元辨識 (Optical Character Recognition)
★ 車輛行照 (Vehicle License)
Keywords (English) ★ Text Recognition
★ Text Detection
★ Optical Character Recognition
★ Vehicle license
Table of Contents
1 Introduction
2 Related Work
2.1 Text Detection
2.1.1 Traditional Optical Character Detection
2.1.2 Deep Learning-based Text Detection
2.2 Text Recognition
2.2.1 Traditional Optical Character Recognition
2.2.2 Deep Learning-based Text Recognition
3 Preliminary
3.1 Image Processing Techniques
3.1.1 Image Binarization
3.1.2 Convolutional Neural Networks
3.1.3 Image Augmentation
3.2 Optical Character Recognition Tools
3.2.1 Tesseract
3.2.2 OpenCV
4 Design
4.1 Motivation
4.2 Problem Definition
4.3 Two-Stage Text Detection
4.3.1 Data Acquisition and Augmentation
4.3.1.1 Data Acquisition
4.3.1.2 Synthesized Datasets
4.3.1.3 Annotation Format
4.3.1.4 Pre-processing
4.3.2 Rectification Network
4.3.3 Locating Network
4.4 Text Recognition
4.4.1 Synthesized Dataset
4.4.2 Denoising Network
4.4.3 Commercial Text Recognition Software
4.4.4 Post-processing
5 Performance
5.1 Experimental Setup
5.2 Evaluation Metrics
5.3 Experimental Results
5.3.1 Two-Stage Detection
5.3.2 Recognition
6 Conclusions
References
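Section 3.1.1 of the outline covers image binarization, a standard pre-processing step before OCR. As an illustration of the general technique, here is a minimal NumPy implementation of Otsu's global thresholding; this is a sketch of the standard method, not necessarily the variant used in the thesis (which could, for instance, rely on OpenCV's built-in Otsu threshold instead).

```python
import numpy as np

def otsu_threshold(gray: np.ndarray) -> int:
    """Return the gray level that maximizes between-class variance (Otsu's method)."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    prob = hist / hist.sum()
    omega = np.cumsum(prob)                   # class-0 probability up to each level
    mu = np.cumsum(prob * np.arange(256))     # cumulative mean up to each level
    mu_t = mu[-1]                             # global mean
    # Between-class variance for every candidate threshold; 0/0 entries become 0.
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = (mu_t * omega - mu) ** 2 / (omega * (1.0 - omega))
    sigma_b = np.nan_to_num(sigma_b)
    return int(np.argmax(sigma_b))

def binarize(gray: np.ndarray) -> np.ndarray:
    """Map pixels above the Otsu threshold to 255 (background), the rest to 0."""
    t = otsu_threshold(gray)
    return (gray > t).astype(np.uint8) * 255
```

On a bimodal image (e.g. dark text on a light license background) the chosen threshold falls between the two intensity clusters, so the output contains only the values 0 and 255.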
Advisor: Min-Te Sun (孫敏德)    Date of Approval: 2020-07-29
