NCU Institutional Repository (中大機構典藏), offering theses and dissertations, past exam papers, journal articles, and research projects for download: Item 987654321/86355


    Please use this permanent URL to cite or link to this item: http://ir.lib.ncu.edu.tw/handle/987654321/86355


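    Title: 以優勢點樹鄰近搜尋方法設計4808個中文常用字分類器; Design of a Classifier for 4808 Commonly Used Chinese Characters Using Vantage-Point Tree Proximity Search
    Author: 張捷; Chang-Jay
    Contributor: In-service Master Program, Department of Communication Engineering
    Keywords: proximity search; Euclidean distance; character recognition; character segmentation; horizontal and vertical projection; Tesseract-OCR; VP-Tree; GLCM
    Date: 2021-10-19
    Upload time: 2021-12-07 12:37:06 (UTC+8)
    Publisher: National Central University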
    Abstract: This thesis proposes replacing the model-training architecture of deep learning with proximity search for Chinese character recognition. It takes a two-stage approach and uses the 4808 commonly used Chinese characters published by the Ministry of Education as the recognition target set. Characters are segmented by morphological image processing combined with horizontal and vertical projection. Features are extracted from each character using the gray-level co-occurrence matrix (GLCM) and spatial moments, then normalized so that each feature value is scaled proportionally into the 0 to 1 interval. The 4808 characters rendered in different fonts serve as the database for a vantage-point tree (VP-Tree) classifier, which recognizes characters by proximity search within a Euclidean distance range. Recognition results on the 4808 characters are compared with those of the open-source Tesseract-OCR optical character recognition software. In the experiments, the VP-Tree classifier was built in under 1 second in every case, far less than the time needed to train a deep learning model. With PMingLiU (新細明體) as the classifier database, proximity search on characters in other fonts achieved an average recognition rate of 79%, better than the Chinese character recognition results of Tesseract-OCR.
    Appears in collections: [In-service Master Program, Department of Communication Engineering] Theses and Dissertations
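
The abstract names the main steps (scaling features into the 0 to 1 interval, then Euclidean proximity search in a vantage-point tree) but gives no implementation details. The following is a minimal Python sketch of those two steps only, not the author's code. The names `min_max_normalize`, `euclidean`, and `VPTree`, the random choice of vantage point, the median split, and the toy two-dimensional feature vectors are all assumptions made for illustration. The sketch returns the single nearest neighbour rather than performing the distance-range query the thesis describes.

```python
import math
import random


def min_max_normalize(rows):
    """Scale every feature column of `rows` into the 0-1 interval."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    highs = [max(c) for c in cols]
    return [
        tuple((v - lo) / (hi - lo) if hi > lo else 0.0
              for v, lo, hi in zip(row, lows, highs))
        for row in rows
    ]


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class VPTree:
    """Vantage-point tree over (feature_vector, label) pairs (illustrative)."""

    def __init__(self, items, rng=None):
        rng = rng or random.Random(0)
        items = list(items)
        # Assumption: the vantage point is picked at random.
        self.point, self.label = items.pop(rng.randrange(len(items)))
        self.radius = 0.0
        self.inside = self.outside = None
        if not items:
            return
        dists = [euclidean(self.point, p) for p, _ in items]
        # Split the remaining points at the median distance.
        self.radius = sorted(dists)[len(dists) // 2]
        near = [it for it, d in zip(items, dists) if d < self.radius]
        far = [it for it, d in zip(items, dists) if d >= self.radius]
        if near:
            self.inside = VPTree(near, rng)
        if far:
            self.outside = VPTree(far, rng)

    def nearest(self, query, best=None):
        """Return (distance, label) of the stored vector closest to `query`."""
        d = euclidean(query, self.point)
        if best is None or d < best[0]:
            best = (d, self.label)
        # Visit the side that contains the query first.
        if d < self.radius:
            first, second = self.inside, self.outside
        else:
            first, second = self.outside, self.inside
        if first is not None:
            best = first.nearest(query, best)
        # By the triangle inequality, the other side can only hold a closer
        # point if the current best ball reaches across the split radius.
        if second is not None and abs(d - self.radius) <= best[0]:
            best = second.nearest(query, best)
        return best


# Hypothetical toy data: three raw 2-D feature vectors with character labels.
raw = [(10.0, 200.0), (20.0, 100.0), (30.0, 400.0)]
labels = ["一", "二", "三"]
feats = min_max_normalize(raw)
tree = VPTree(zip(feats, labels))
dist, label = tree.nearest(feats[1])
print(label, dist)
```

Building the tree needs only pairwise distance computations and no iterative optimization, which is consistent with the sub-second build times reported in the abstract. The recognition rate itself depends on the features, which this sketch does not attempt to reproduce.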



    All items in NCUIR are protected by their original copyright.
