Abstract: | Optical Character Recognition (OCR) is a challenging problem in Computer Vision. The task has grown steadily harder, from recognizing alphanumeric characters and special symbols in specific fonts to recognizing text in photographs of real-world scenes, such as shop signs and road signs. Within text recognition, Chinese text is more complex than English text. First, the number of Chinese characters is far larger than the number of English characters, and the structure of each character is also far more complex. Chinese text also contains embedded English letters and digits. Moreover, unlike English, Chinese can be written horizontally or vertically, and both orientations can even appear in the same document, which further increases the difficulty of Chinese text recognition. Training an OCR system requires a large amount of labeled data, covering both the position and the identity of each character, and the more complex the scene, the more labeled data is needed. This thesis focuses on a simpler task: text recognition in scanned Chinese documents. Unlike text in the wild, text blocks in scanned documents are more structured, so a simple object detection network is sufficient to detect them with good results. The detected text blocks are split into individual lines, which serve as the input to text recognition. Finally, combining the recognition result of every line with the detected position of each block yields all the text in the scanned document. These results may also support further applications, such as document classification. |
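The detect, split, recognize, and combine steps described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the thesis's implementation: the abstract does not say how blocks are split into lines, so the row-projection split used here is an assumption, and `crop`, `split_rows`, `read_document`, and the `recognize_line` callback are hypothetical names. The block boxes are assumed to come from a separate detection network.

```python
from typing import Callable, List, Tuple

Image = List[List[int]]          # binary image: 1 = ink, 0 = background
Box = Tuple[int, int, int, int]  # (top, left, bottom, right), bottom/right exclusive


def crop(image: Image, box: Box) -> Image:
    """Cut one detected text block out of the page image."""
    top, left, bottom, right = box
    return [row[left:right] for row in image[top:bottom]]


def split_rows(block: Image) -> List[Tuple[int, int]]:
    """Return (start, end) row ranges of horizontal text lines in a block.

    Assumption: lines are separated by at least one row without ink
    (a simple projection-profile split, chosen here for illustration).
    """
    ranges: List[Tuple[int, int]] = []
    start = None
    for y, row in enumerate(block):
        has_ink = any(row)
        if has_ink and start is None:
            start = y                      # a text line begins
        elif not has_ink and start is not None:
            ranges.append((start, y))      # a text line ends
            start = None
    if start is not None:                  # line touching the bottom edge
        ranges.append((start, len(block)))
    return ranges


def read_document(
    image: Image,
    boxes: List[Box],
    recognize_line: Callable[[Image], str],
) -> List[Tuple[Box, List[str]]]:
    """Recognize every line of every block and keep each block's position."""
    results = []
    for box in sorted(boxes):              # top-to-bottom, then left-to-right
        block = crop(image, box)
        lines = [recognize_line(block[a:b]) for a, b in split_rows(block)]
        results.append((box, lines))
    return results
```

For vertically written blocks, the same split could be applied to columns instead of rows, for example by transposing the block first; how the orientation of a block is decided is left open here.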