Abstract (English) |
Optical Character Recognition (OCR) is a major challenge in Computer Vision. The task has grown harder over time, from recognizing English letters, digits, and a few symbols in specific fonts to detecting and recognizing text in the wild. Within text detection and recognition, Chinese text is more difficult than English: there are far more Chinese characters than English letters, and their shapes are far more complex. Moreover, unlike English, Chinese can be written both left to right and top to bottom, which makes Chinese text detection and recognition considerably harder. Training an OCR model requires a large amount of labeled data, covering both the position of each character and its identity, and more complex scenes demand even more labeled data. We therefore focus on a simpler task: detecting and recognizing Chinese text in scanned documents. Unlike text in the wild, text blocks in scanned files are highly structured, so a simple network suffices to achieve good detection results. We then only need to separate the detected region into individual lines and feed each line into the text recognizer. Finally, by combining the recognition results with the detected positions, we obtain all the text in the scanned file. These results could further enable applications such as document classification. |
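The pipeline described above can be sketched in a few lines. This is a minimal illustration only: every function here is a hypothetical stand-in operating on a toy page representation, where a real system would plug in the detection network, a line-segmentation step, and the line recognizer (e.g. a CRNN with CTC decoding).

```python
from typing import Dict, List, Tuple

# A detected text block: (x, y, width, height) in page coordinates.
Box = Tuple[int, int, int, int]

def detect_text_regions(page: List[Dict]) -> List[Box]:
    """Stand-in for the text-detection network: one box per text block."""
    return [region["box"] for region in page]

def split_into_lines(page: List[Dict], box: Box) -> List[Dict]:
    """Stand-in for line segmentation inside a detected region."""
    for region in page:
        if region["box"] == box:
            return region["lines"]
    return []

def recognize_line(line: Dict) -> str:
    """Stand-in for the line recognizer (e.g. a CRNN + CTC model)."""
    return line["text"]

def ocr_scanned_page(page: List[Dict]) -> List[Tuple[Box, str]]:
    """Detect regions, split each into lines, recognize each line,
    and pair the recognized text with its detected position."""
    results = []
    for box in detect_text_regions(page):
        lines = split_into_lines(page, box)
        text = "\n".join(recognize_line(line) for line in lines)
        results.append((box, text))
    return results
```

With the toy representation above, `ocr_scanned_page` returns a list of `(box, text)` pairs, which is exactly the combined output (position plus content) that downstream applications such as document classification would consume.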
References |
[1] I. Sutskever, O. Vinyals, Q. V. Le, “Sequence to Sequence Learning with Neural Networks,” Advances in Neural Information Processing Systems, 2014.
[2] R. O’Reilly, “Biologically Plausible Error-driven Learning using Local Activation Differences: The Generalized Recirculation Algorithm,” Neural Computation, 8:5, 895-938, 1996.
[3] Hamza Mahmood, “Activation Functions in Neural Networks,” [Online], Available: https://towardsdatascience.com/activation-functions-in-neural-networks-83ff7f46a6bd, [Accessed: 23-Jul-2019].
[4] Y. Lecun, L. Bottou, Y. Bengio, and P. Haffner, “Gradient-based learning applied to document recognition,” Proceedings of the IEEE, vol. 86, no. 11, pp. 2278-2324, Nov 1998.
[5] A. Krizhevsky, I. Sutskever, G. E. Hinton, “ImageNet Classification with Deep Convolutional Neural Networks,” Proceedings of the 25th International Conference on Neural Information Processing Systems, vol. 1, pp. 1097-1105, Dec 2012.
[6] K. Simonyan, A. Zisserman, “Very Deep Convolutional Networks for Large-Scale Image Recognition,” arXiv:1409.1556 [cs.CV], 2014.
[7] C. Szegedy et al., “Going deeper with convolutions,” 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, 2015, pp. 1-9.
[8] S. Ioffe, C. Szegedy, “Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift,” arXiv:1502.03167, 2015.
[9] C. Szegedy et al., “Rethinking the Inception Architecture for Computer Vision,” 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2818-2826, 2016.
[10] C. Szegedy, S. Ioffe, V. Vanhoucke, “Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning,” CoRR, abs/1602.07261, 2016.
[11] K. He, X. Zhang, S. Ren and J. Sun, “Deep Residual Learning for Image Recognition,” 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, 2016, pp. 770-778.
[12] J. J. Hopfield, “Neural networks and physical systems with emergent collective computational abilities,” Proceedings of the National Academy of Sciences of the USA, vol. 79, no. 8, pp. 2554-2558, April 1982.
[13] S. Hochreiter and J. Schmidhuber, “Long Short-Term Memory,” Neural Computation, vol. 9, no. 8, pp. 1735-1780, 1997.
[14] Christopher Olah, “Understanding LSTM Networks,” [Online], Available: https://colah.github.io/posts/2015-08-Understanding-LSTMs/, [Accessed: 25-Jul-2019].
[15] Alex Graves, Jürgen Schmidhuber, “Framewise phoneme classification with bidirectional LSTM and other neural network architectures,” Neural Networks, Volume 18, Issues 5–6, 2005, Pages 602-610, ISSN 0893-6080.
[16] A. Graves, A. Mohamed and G. Hinton, “Speech recognition with deep recurrent neural networks,” 2013 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 6645-6649, 2013.
[17] R. Girshick, J. Donahue, T. Darrell and J. Malik, “Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation,” 2014 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 580-587, 2014.
[18] R. Girshick, “Fast R-CNN,” 2015 IEEE International Conference on Computer Vision (ICCV), Santiago, 2015, pp. 1440-1448.
[20] K. He, G. Gkioxari, P. Dollár and R. Girshick, “Mask R-CNN,” 2017 IEEE International Conference on Computer Vision (ICCV), pp. 2980-2988, 2017.
[21] “The First OCR System: "GISMO",” [Online], Available: http://www.historyofinformation.com/detail.php?entryid=885, [Accessed: 20-Aug-2019].
[22] R. G. Casey and G. Nagy, “Recognition of Printed Chinese Characters,” IEEE Transactions on Electronic Computers, vol. EC-15, pp. 91-101, 1966.
[23] “Tesseract Ocr,” [Online], Available: https://github.com/tesseract-ocr/, [Accessed: 20-Aug-2019].
[24] Z. Zuo, B. Shuai, G. Wang, X. Liu, X. Wang, B. Wang and Y. Chen, “Convolutional recurrent neural networks: Learning spatial dependencies for image representation,” 2015 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pp. 18-26, 2015.
[25] A. Graves, S. Fernández, F. Gomez and J. Schmidhuber, “Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks,” Proceedings of the 23rd International Conference on Machine Learning (ICML), 2006.
[26] Z. Tian, W. Huang, T. He, P. He and Y. Qiao, “Detecting Text in Natural Image with Connectionist Text Proposal Network,” European Conference on Computer Vision (ECCV), pp. 56-72, 2016.
|