NCU Institutional Repository (downloads of theses and dissertations, past exam papers, journal articles, research projects, and more): Item 987654321/9419


    Please use this permanent URL to cite or link to this item: http://ir.lib.ncu.edu.tw/handle/987654321/9419


    Title: Image Language Identification Using Shapelet Feature-Application in Merging Broken Chinese Characters (以外形特徵為基礎之影像語言分類器-應用於破碎中文字合併)
    Author: Sheng-En Fann (范聖恩)
    Contributor: Graduate Institute of Computer Science and Information Engineering
    Keywords: machine learning; image language identification; image language classification; image processing; shapelet feature
    Date: 2009-01-19
    Uploaded: 2009-09-22 11:47:22 (UTC+8)
    Publisher: National Central University Library
    Abstract: This thesis builds an image language classifier from shapelet features combined with two machine learning algorithms, AdaBoost and SVM. Earlier work took a top-down approach, identifying the language of a whole document image, article, or paragraph. Here, machine learning is used to compute language-discriminative features automatically, so that the language (Chinese or English) of each individual connected component in a document image can be identified quickly and at fine granularity.

    Each input connected-component image is first divided logically into several sub-windows. Within each sub-window, grayscale gradient responses are computed in four directions, and the local averages of these responses around each pixel serve as low-level features. AdaBoost then selects a subset of each sub-window's low-level features to construct a mid-level shapelet feature. Finally, the shapelet features of all sub-windows are combined into a global shapelet feature, which is the input to the final language classifier. In this way, information from every part of the image contributes to the final decision.

    Because of the structural characteristics of Traditional Chinese characters, connected components identified as Chinese radicals or partial Chinese characters are then tentatively merged with their left or right neighboring components to form complete characters. The experiments compare two training schemes, two-stage AdaBoost and AdaBoost + SVM, and also apply the language classifier to images captured with portable cameras. The results show that the proposed method is practical for analyzing present-day multilingual documents: it helps improve downstream OCR accuracy and document content extraction, and it can be extended to language identification for other scripts without requiring knowledge specific to those languages.
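The sub-window, low-level feature, and shapelet pipeline described in the abstract can be sketched in pure Python. This is a minimal illustration under stated assumptions, not the thesis's implementation: the four directions (horizontal, vertical, two diagonals), the 3x3 averaging window (`r=1`), the 2x2 sub-window grid, decision-stump AdaBoost for both stages, and the toy striped images are all choices made here. Each sub-window's boosted real-valued score stands in for its shapelet feature, and the vector of those scores is the global feature passed to the final classifier.

```python
import math
from typing import List, Sequence, Tuple

Image = List[List[float]]
Stump = Tuple[int, float, int, float]  # (feature index, threshold, polarity, alpha)

# Assumption: the four gradient directions are (dy, dx) differences
# horizontally, vertically and along the two diagonals.
DIRECTIONS = [(0, 1), (1, 0), (1, 1), (1, -1)]

def low_level_features(img: Image, grid: int = 2, r: int = 1) -> List[List[float]]:
    """One vector per sub-window: directional gradient magnitudes, each
    averaged over the (2r+1) x (2r+1) neighbourhood of a pixel."""
    h, w = len(img), len(img[0])
    maps = []
    for dy, dx in DIRECTIONS:
        g = [[abs(img[y + dy][x + dx] - img[y][x])
              if 0 <= y + dy < h and 0 <= x + dx < w else 0.0
              for x in range(w)] for y in range(h)]
        avg = [[0.0] * w for _ in range(h)]
        for y in range(h):
            for x in range(w):
                win = [g[j][i]
                       for j in range(max(0, y - r), min(h, y + r + 1))
                       for i in range(max(0, x - r), min(w, x + r + 1))]
                avg[y][x] = sum(win) / len(win)
        maps.append(avg)
    feats = []
    for gy in range(grid):
        for gx in range(grid):
            ys = range(gy * h // grid, (gy + 1) * h // grid)
            xs = range(gx * w // grid, (gx + 1) * w // grid)
            feats.append([m[y][x] for m in maps for y in ys for x in xs])
    return feats

def predict_stump(x: Sequence[float], f: int, thr: float, pol: int) -> int:
    return pol if x[f] >= thr else -pol

def train_adaboost(X: Sequence[Sequence[float]], y: Sequence[int], rounds: int) -> List[Stump]:
    """Discrete AdaBoost over decision stumps; labels are +1 / -1."""
    n = len(X)
    w = [1.0 / n] * n
    model: List[Stump] = []
    for _ in range(rounds):
        best = None
        for f in range(len(X[0])):
            for thr in sorted({x[f] for x in X}):
                for pol in (1, -1):
                    err = sum(wi for xi, yi, wi in zip(X, y, w)
                              if predict_stump(xi, f, thr, pol) != yi)
                    if best is None or err < best[0]:
                        best = (err, f, thr, pol)
        err, f, thr, pol = best
        err = min(max(err, 1e-10), 1 - 1e-10)  # keep log() finite on separable data
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((f, thr, pol, alpha))
        w = [wi * math.exp(-alpha * yi * predict_stump(xi, f, thr, pol))
             for xi, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model

def score(model: List[Stump], x: Sequence[float]) -> float:
    return sum(a * predict_stump(x, f, t, p) for f, t, p, a in model)

def train(images: List[Image], labels: List[int], grid: int = 2, rounds: int = 3):
    low = [low_level_features(img, grid) for img in images]
    # One boosted classifier per sub-window; its real-valued output is the shapelet.
    shapelets = [train_adaboost([f[k] for f in low], labels, rounds)
                 for k in range(grid * grid)]
    # Global shapelet feature: the shapelet outputs of all sub-windows.
    glob = [[score(s, f[k]) for k, s in enumerate(shapelets)] for f in low]
    return shapelets, train_adaboost(glob, labels, rounds)

def classify(shapelets, final, img: Image, grid: int = 2) -> int:
    low = low_level_features(img, grid)
    glob = [score(s, low[k]) for k, s in enumerate(shapelets)]
    return 1 if score(final, glob) >= 0 else -1

def stripes(horizontal: bool, level: float, phase: int, n: int = 8) -> Image:
    """Toy image of alternating rows (horizontal) or columns (vertical)."""
    return [[level if ((y if horizontal else x) + phase) % 2 == 0 else 0.0
             for x in range(n)] for y in range(n)]

# Toy stand-in data, not the thesis's data set: +1 = horizontal, -1 = vertical.
imgs = [stripes(h, lv, ph) for h in (True, False) for lv in (0.5, 1.0) for ph in (0, 1)]
lbls = [1 if h else -1 for h in (True, False) for _ in range(4)]
shapelets, final = train(imgs, lbls)
print(classify(shapelets, final, stripes(True, 0.8, 0)))   # 1
print(classify(shapelets, final, stripes(False, 0.8, 1)))  # -1
```

Replacing the `train_adaboost` call on the last line of `train` with an SVM trained on `glob` would correspond to the AdaBoost + SVM scheme that the abstract compares against two-stage AdaBoost.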
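The step that merges components classified as radicals or partial characters with a neighbor can be illustrated as a greedy pass over one text line. The bounding-box `Component` type, the labels `"zh"`, `"zh_part"` and `"en"`, the `max_gap` limit, and the rule that a merge is accepted only when the union box is roughly square (Chinese characters are typeset in a near-square cell) are hypothetical choices for this sketch; the abstract states only that such components are merged with their left or right neighbors to form complete characters.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Component:
    """Bounding box of one connected component (x1, y1 exclusive)."""
    x0: int
    y0: int
    x1: int
    y1: int
    label: str  # hypothetical labels: "zh", "zh_part" (radical / partial), "en"

def union(a: Component, b: Component) -> Component:
    return Component(min(a.x0, b.x0), min(a.y0, b.y0),
                     max(a.x1, b.x1), max(a.y1, b.y1), "zh")

def near_square(c: Component, tol: float) -> bool:
    # Assumed test for a complete Chinese character: width/height close to 1.
    return abs((c.x1 - c.x0) / (c.y1 - c.y0) - 1.0) <= tol

def merge_partial(components: List[Component], max_gap: int = 3,
                  tol: float = 0.25) -> List[Component]:
    """Greedily merge "zh_part" components with the right, else left, neighbour."""
    comps = sorted(components, key=lambda c: c.x0)  # assumes a single text line
    out: List[Component] = []
    i = 0
    while i < len(comps):
        c = comps[i]
        if c.label == "zh_part":
            if i + 1 < len(comps):  # try the right-hand neighbour first
                r = comps[i + 1]
                u = union(c, r)
                if r.label != "en" and r.x0 - c.x1 <= max_gap and near_square(u, tol):
                    out.append(u)
                    i += 2
                    continue
            if out:  # then the left-hand neighbour, already emitted
                left = out[-1]
                u = union(left, c)
                if left.label != "en" and c.x0 - left.x1 <= max_gap and near_square(u, tol):
                    out[-1] = u
                    i += 1
                    continue
        out.append(c)
        i += 1
    return out

line = [Component(0, 0, 8, 20, "zh_part"),    # e.g. a left-hand radical
        Component(10, 0, 20, 20, "zh_part"),  # the rest of the same character
        Component(25, 0, 45, 20, "zh"),
        Component(50, 5, 58, 20, "en")]
for comp in merge_partial(line):
    print(comp)
# first line printed: Component(x0=0, y0=0, x1=20, y1=20, label='zh')
```

Trying the right neighbor before the left one is an arbitrary ordering chosen here; the squareness check is what prevents a partial component from being absorbed into an already complete character or an English letter.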
    Appears in Collections: [Graduate Institute of Computer Science and Information Engineering] Theses and Dissertations


    All items in NCUIR are protected by their original copyright.


    Copyright National Central University. All rights reserved by the National Central University Library.