A High Recognition Rate OCR for Specific Fonts

Please use this permanent URL to cite or link to this document: http://ir.lib.ncu.edu.tw/handle/987654321/68999

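Title:	A High Recognition Rate OCR for Specific Fonts
Author:	王祉鈞 (Wang, Chih-Chun)
Contributor:	Department of Computer Science and Information Engineering
Keywords:	Optical Character Recognition
Date:	2015-08-31
Uploaded:	2015-09-23 14:52:41 (UTC+8)
Publisher:	National Central University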
Abstract:	Optical character recognition (OCR) is a technology that has been developed for many years, and many projects and studies have contributed to the field [1]. Even so, no tool can yet recognize every character with 100% accuracy. The reason is simple: a general-purpose OCR tool must cope with too many varying conditions, including the source of the images, the quality of the text images, the layout of the document, the wide variety of fonts, and differences in character size. A change in any one of these factors is a new challenge for the computer. To overcome insufficient recognition rates, this thesis restricts each of these conditions within a reasonable range for our intended application: serving as the test oracle for Korat, our automated testing system. A fixed, high-quality image source (a frame grabber) ensures that the images to be recognized are upright, free of skew, and clean. Operating on a fixed platform constrains the text layout. Only a limited set of fonts is supported, and each font is kept in its own database, which reduces the complexity of classification. Under these constraints, we implement a high-recognition-rate OCR tool, TinyOCR. Experimental results show that in the Linux console, TinyOCR correctly recognizes the characters in the image in most application scenarios while consuming little time. It can therefore give users a high degree of confidence when used for test automation.
Appears in Collections:	[Graduate Institute of Computer Science and Information Engineering] Master's and Doctoral Theses
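The abstract states that each supported font is kept in its own database so that classification becomes simpler. The following is a minimal sketch of that idea, not TinyOCR's actual classifier, which the abstract does not describe. It assumes that glyphs have already been segmented into equally sized binary bitmaps (plausible for the clean, skew-free frame-grabber images and fixed-width console fonts mentioned above) and that classification is nearest-template matching by Hamming distance. The names `FontDatabase`, `classify`, and `hamming` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

# Hypothetical glyph representation: each row is a string of '0'/'1' pixels.
Bitmap = Tuple[str, ...]


def hamming(a: Bitmap, b: Bitmap) -> int:
    """Count differing pixels between two equally sized binary glyph bitmaps."""
    if len(a) != len(b) or any(len(r) != len(s) for r, s in zip(a, b)):
        raise ValueError("glyph bitmaps must have the same dimensions")
    return sum(p != q for r, s in zip(a, b) for p, q in zip(r, s))


@dataclass
class FontDatabase:
    """Hypothetical store of reference glyphs for exactly one font."""
    name: str
    templates: Dict[str, Bitmap] = field(default_factory=dict)

    def add(self, char: str, glyph: Bitmap) -> None:
        self.templates[char] = glyph

    def classify(self, glyph: Bitmap, max_distance: int = 0) -> Optional[str]:
        """Return the nearest character within max_distance pixels, else None.

        Only this font's templates are searched, so glyphs from other fonts
        can never be confused with the input. On ties, the template added
        first wins because the comparison is strict.
        """
        best_char: Optional[str] = None
        best_dist = max_distance + 1
        for char, template in self.templates.items():
            d = hamming(glyph, template)
            if d < best_dist:
                best_char, best_dist = char, d
        return best_char


if __name__ == "__main__":
    db = FontDatabase("toy-3x3")  # toy 3x3 glyphs, for illustration only
    db.add("I", ("010", "010", "010"))
    db.add("L", ("100", "100", "111"))
    print(db.classify(("010", "010", "010")))                  # I
    print(db.classify(("100", "100", "110"), max_distance=1))  # L
    print(db.classify(("111", "111", "111")))                  # None
```

Setting `max_distance=0` demands an exact pixel match, which suits a clean, fixed image source. Raising it tolerates a few noisy pixels but risks confusing similar glyphs.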

All items in NCUIR are protected by their original copyright.