NCU Institutional Repository (中大機構典藏), offering downloads of theses and dissertations, past exam papers, journal articles, and research projects: Item 987654321/9017


    Please use this permanent URL to cite or link to this item: http://ir.lib.ncu.edu.tw/handle/987654321/9017


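    Title: Recognition and Postprocessing of Chinese Business Cards (中文商業名片辨識及後處理)
    Author: Tai-Hung Chen (陳泰宏)
    Contributor: Graduate Institute of Electrical Engineering (電機工程研究所)
    Keywords: hidden Markov model; semantics; postprocessing; Chinese; recognition; business card; Viterbi algorithm; language model; language; linguish; Viterbi; OCR; HMM; card
    Date: 2000-07-10
    Upload time: 2009-09-22 11:39:31 (UTC+8)
    Publisher: National Central University Library (國立中央大學圖書館)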
    Abstract: Business cards convey important personal information. To use this information efficiently, it must be extracted automatically and stored in an electronic database; a system that does this is called a business card recognition system. In general, business card recognition has three stages. First, a preprocessing stage processes the card image and extracts the character images. Second, the card layout is analyzed. Finally, a postprocessing stage applies linguistic information to raise the system's recognition rate.

    This thesis studies the recognition of Chinese business cards. We assume that the characters have already been extracted and that the card layout has already been analyzed. OCR performs poorly on business cards because the characters are small and vary widely in font; our aim is to improve this low recognition rate.

    In our approach, hidden Markov models (HMMs) recognize the characters on Chinese business cards. A left-to-right HMM recognizes each character and outputs its top-10 candidates. In the postprocessing stage, a language model then refines the recognition result: the Viterbi algorithm, using bigrams as linguistic information, searches the top-10 candidates for the correct characters, and the resulting best character sequence is the postprocessing output.

    Our experiments cover the company and address fields of Chinese business cards. The bigram table and the HMMs are trained on telephone directory data, and 100 address fields and 30 company fields are used for testing. The experimental results confirm that the proposed method is effective.
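
    The postprocessing step described in the abstract (a Viterbi search over per-character candidate lists, scored with a character bigram model) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the thesis's implementation: the names `train_bigram` and `viterbi_correct`, the `<s>` start marker, the add-one smoothing, the way recognition and bigram scores are summed, and the toy directory and candidate scores are all hypothetical, and only two candidates per position are shown instead of ten.

```python
import math

START = "<s>"  # hypothetical start-of-field marker (an assumption)

def train_bigram(lines, alpha=1.0):
    """Count character bigrams and return a smoothed log-probability function."""
    counts, vocab = {}, set()
    for line in lines:
        prev = START
        for ch in line:
            row = counts.setdefault(prev, {})
            row[ch] = row.get(ch, 0) + 1
            vocab.add(ch)
            prev = ch
    size = len(vocab)

    def logp(prev, ch):
        # Add-alpha smoothing (an assumption); unseen pairs get a small floor value.
        row = counts.get(prev, {})
        total = sum(row.values())
        return math.log((row.get(ch, 0) + alpha) / (total + alpha * size))

    return logp

def viterbi_correct(candidates, logp, lm_weight=1.0):
    """Pick one character per position so that recognition + bigram score is maximal.

    candidates: one list per character position, each holding
    (character, recognition_log_score) pairs, e.g. a recognizer's top-k output.
    """
    # layer maps a candidate character to (best path score, previous character).
    layer = {ch: (s + lm_weight * logp(START, ch), None) for ch, s in candidates[0]}
    history = [layer]
    for position in candidates[1:]:
        new_layer = {}
        for ch, s in position:
            best_prev, best_score = max(
                ((p, ps + lm_weight * logp(p, ch)) for p, (ps, _) in layer.items()),
                key=lambda item: item[1],
            )
            new_layer[ch] = (best_score + s, best_prev)
        history.append(new_layer)
        layer = new_layer
    # Backtrace from the best final character to the first position.
    ch = max(layer, key=lambda c: layer[c][0])
    path = [ch]
    for step in reversed(history[1:]):
        ch = step[ch][1]
        path.append(ch)
    return "".join(reversed(path))

# Toy data: three made-up directory entries and made-up recognition scores.
directory = ["台北市中山路", "台中市中正路", "台北市中正路"]
logp = train_bigram(directory)
candidates = [
    [("合", -0.5), ("台", -0.9)],
    [("北", -0.2), ("比", -1.5)],
    [("巾", -0.6), ("市", -0.8)],
]
top1 = "".join(position[0][0] for position in candidates)
corrected = viterbi_correct(candidates, logp)
print(top1, "->", corrected)  # 合北巾 -> 台北市
```

    In this toy run the top-1 string is wrong in two positions, and the bigram evidence from the directory entries pulls the search toward the lower-ranked but linguistically plausible candidates. The `lm_weight` parameter is a hypothetical knob for balancing the two score sources.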
    Appears in Collections: [Graduate Institute of Electrical Engineering] Theses and Dissertations



    All items in NCUIR are protected by their original copyright.
