A Study of Intelligent Text Recognition and Extraction Techniques for Specific Engineering Chart Images

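Name

Kuan-Fan Chen (陳冠帆)

Department

Executive Master Program, Department of Mechanical Engineering

Thesis Title

A Study of Intelligent Text Recognition and Extraction Techniques for Specific Engineering Chart Images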

Files

Full text: not available through the system (permanently restricted)

Abstract

This study presents a fast method that uses morphological operations and optical character recognition (OCR) to extract the text in each cell of a specific type of engineering chart image and to record the results. The method targets one chart layout; it can be applied to other layouts by adjusting the parameters that encode the rules of the corresponding chart.
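As a rough illustration of how morphology can locate cells, the sketch below binarizes a tiny grayscale grid with Otsu's method and then applies a horizontal morphological opening so that only long table rules survive. It is not the thesis code: the helper names (`otsu_threshold`, `open_horizontal`, `cell_bands`), the toy image, and the kernel width of 5 are all assumptions made for illustration, and a real pipeline would more likely call an image library such as OpenCV.

```python
from typing import List, Tuple

Gray = List[List[int]]    # rows of 0-255 intensities
Binary = List[List[int]]  # rows of 0/1, where 1 marks ink (dark pixels)


def otsu_threshold(image: Gray) -> int:
    """Return the threshold that maximizes the between-class variance."""
    hist = [0] * 256
    for row in image:
        for value in row:
            hist[value] += 1
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels with intensity <= t
    sum0 = 0  # intensity sum of those pixels
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        var_between = w0 * w1 * (mean0 - mean1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize(image: Gray) -> Binary:
    """Mark pixels at or below the Otsu threshold as ink (1)."""
    t = otsu_threshold(image)
    return [[1 if v <= t else 0 for v in row] for row in image]


def open_horizontal(img: Binary, width: int) -> Binary:
    """Opening with a 1 x width line kernel: keep only runs of ink >= width."""
    out = []
    for row in img:
        kept = [0] * len(row)
        start = None
        for x, v in enumerate(row + [0]):  # sentinel closes a trailing run
            if v and start is None:
                start = x
            elif not v and start is not None:
                if x - start >= width:
                    kept[start:x] = [1] * (x - start)
                start = None
        out.append(kept)
    return out


def rule_rows(opened: Binary) -> List[int]:
    """Row indices that still contain ink after the opening (table rules)."""
    return [y for y, row in enumerate(opened) if any(row)]


def cell_bands(rows: List[int]) -> List[Tuple[int, int]]:
    """Inclusive (top, bottom) row ranges lying between consecutive rules."""
    return [(a + 1, b - 1) for a, b in zip(rows, rows[1:]) if b - a > 1]


# Toy 7 x 9 image: three horizontal rules with short "text" strokes between.
LINE, TEXT, PAPER = 20, 40, 230
toy = [
    [LINE] * 9,
    [PAPER, PAPER, TEXT, TEXT] + [PAPER] * 5,
    [PAPER] * 9,
    [LINE] * 9,
    [PAPER] * 5 + [TEXT] + [PAPER] * 3,
    [PAPER] * 9,
    [LINE] * 9,
]
bands = cell_bands(rule_rows(open_horizontal(binarize(toy), width=5)))
```

Running the same opening on the transposed image would pick out vertical rules, and intersecting the two sets of bands gives cell rectangles. The kernel width is the kind of layout-specific parameter that would need retuning for a different chart.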
The implementation is written in Python. In pre-processing, Otsu's thresholding method binarizes the image, and morphological operations locate the cells of the engineering chart. Text is then recognized and extracted with the Tesseract-OCR package in three stages: (1) fully automatic page segmentation with the pre-trained English model, (2) word segmentation with a retrained English model, and (3) character segmentation with a retrained English model. Finally, regular expressions combined with exhaustive enumeration correct recognition errors and content that violates the chart's rules.
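The three recognition stages can be pictured as a fallback chain over Tesseract page segmentation modes. The sketch below is an illustration under stated assumptions, not the thesis code: it guesses that the stages map to `--psm 3` (fully automatic page segmentation), `--psm 8` (single word), and `--psm 10` (single character), it uses the hypothetical model name `eng_retrained` for the retrained English model, and it assumes a stage is accepted as soon as its output satisfies a hypothetical cell rule. The `recognize` callable is injected so the control flow can be tried without Tesseract installed.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional, Sequence, Tuple


@dataclass(frozen=True)
class Stage:
    name: str
    lang: str  # Tesseract model name
    psm: int   # Tesseract page segmentation mode

    @property
    def config(self) -> str:
        return f"--psm {self.psm}"


# Assumed mapping of the three stages to Tesseract settings.
STAGES: Tuple[Stage, ...] = (
    Stage("auto page segmentation", "eng", 3),
    Stage("single word", "eng_retrained", 8),        # hypothetical model name
    Stage("single character", "eng_retrained", 10),  # hypothetical model name
)

# recognize(cell_image, lang, config) -> raw text
Recognizer = Callable[[object, str, str], str]


def read_cell(
    cell_image: object,
    recognize: Recognizer,
    rule: re.Pattern,
    stages: Sequence[Stage] = STAGES,
) -> Tuple[str, Optional[str]]:
    """Try each stage in order and stop at the first rule-conforming output.

    Returns (text, stage name). If no stage satisfies the rule, returns the
    last raw text and None so that post-processing can still inspect it.
    """
    text = ""
    for stage in stages:
        text = recognize(cell_image, stage.lang, stage.config).strip()
        if text and rule.fullmatch(text):
            return text, stage.name
    return text, None
```

With the `pytesseract` wrapper installed, a recognizer such as `lambda img, lang, cfg: pytesseract.image_to_string(img, lang=lang, config=cfg)` could be passed in. Whether the thesis chains its stages as a fallback or assigns them to different cell types is not stated in the abstract.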
The experiments show that the pre-trained English model shipped with Tesseract-OCR recognizes long strings very well but is error-prone on the isolated words and single characters found in cells: running all three stages with the pre-trained model yields an accuracy of only 14.65%. Retraining the English model on a dataset built from the target engineering chart images improves recognition of words and characters in cells and raises the accuracy to 58.04%. In post-processing, listing every error and rule violation according to the chart's rules and replacing each with the correct characters brings the accuracy to 100%.
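One way to read the combination of regular expressions and exhaustive enumeration is sketched below: a regular expression encodes what a valid cell looks like, and every combination of plausible character substitutions is tried until exactly one candidate satisfies it. The confusion table, the example pattern, and the function name `correct_cell` are hypothetical; the thesis instead lists the errors observed under its own chart rules, which are not given in the abstract.

```python
import re
from itertools import product
from typing import Dict, Optional

# Hypothetical look-alike pairs that OCR engines often confuse.
CONFUSIONS: Dict[str, str] = {
    "O": "0", "0": "O",
    "I": "1", "l": "1", "1": "I",
    "S": "5", "5": "S",
    "B": "8", "8": "B",
}


def correct_cell(
    text: str,
    rule: re.Pattern,
    confusions: Dict[str, str] = CONFUSIONS,
) -> Optional[str]:
    """Return a rule-conforming version of text, or None if undecidable.

    Text that already matches is returned unchanged. Otherwise every
    combination of look-alike substitutions is enumerated, and a result is
    returned only when exactly one candidate matches the rule.
    """
    if rule.fullmatch(text):
        return text
    # Each position offers the original character and, if known, its look-alike.
    options = [(ch, confusions[ch]) if ch in confusions else (ch,) for ch in text]
    matches = {
        "".join(candidate)
        for candidate in product(*options)
        if rule.fullmatch("".join(candidate))
    }
    return matches.pop() if len(matches) == 1 else None
```

For a hypothetical rule such as `re.compile(r"[A-Z]{2}-\d{3}")`, the misread `AB-1O5` has a single conforming candidate. The enumeration grows as 2 to the power of the number of ambiguous characters, which stays small for short cell contents; returning `None` on ambiguity leaves such cells for manual review rather than guessing.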

Keywords

★ Text recognition
★ Table extraction
★ Information extraction
★ Morphological operations
★ Optical Character Recognition

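Table of Contents

Abstract (Chinese)
Abstract (English)
Acknowledgments
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
1-1 Research Background
1-2 Research Motivation and Objectives
1-3 Literature Review
1-3-1 Text Recognition and Extraction
1-3-2 Table Detection and Recognition
1-4 Thesis Organization
Chapter 2 Theory and Methods
2-1 Image Processing and Pattern Recognition
2-2 Programming Language and Packages
Chapter 3 Program Architecture
3-1 Preliminary Setup
3-2 Table Detection and Content Extraction Workflow
3-3 Image Pre-processing
3-3-1 Binarization
3-3-2 Table Contour Extraction
3-4 Table Text Extraction
3-4-1 Optical Character Recognition
3-4-2 Retraining the Tesseract-OCR English Model
3-5 Post-processing
Chapter 4 Experimental Results
Chapter 5 Conclusions and Future Work
5-1 Conclusions
5-2 Future Work
References
Appendix
Appendix 1: Steps for Retraining Tesseract-OCR with LSTM [29]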

References

[1] S. Papert, “The Summer Vision Project”, Massachusetts Institute of Technology, Project MAC, Artificial Intelligence Group, Vision Memo No. 100, July 1966.
[2] 莊永裕, “矽眼：電腦視覺初探” [Silicon Eye: A First Look at Computer Vision], 探索基礎科學系列講座 [Exploring Basic Science lecture series], No. 20, 1 December 2018, retrieved from the YouTube channel of the NTU Center for the Advancement of Science Education, https://www.youtube.com/watch?v=7-Mk-VMM9F8
[3] “LINE實用技：掃碼功能隱藏小技巧，一拍輕鬆擷取、翻譯文字” [LINE tips: hidden tricks of the scan feature for capturing and translating text in one shot], 20 July 2021, retrieved from the LINE official blog, https://official-blog-tw.line.me/archives/10528346.html
[4] G. Tauschek, “Reading Machine”, U.S. Patent US2026330A, December 1935.
[5] 林巧敏 and 蔡瀚緯, “光學字元辨識古籍之全文轉置經驗：以明人文集為例” [Full-text conversion of ancient books by optical character recognition: the case of Ming-dynasty literary collections], 圖資與檔案學刊, 12:2 (No. 97), pp. 76-117, December 2020.
[6] J. Shashirangana, H. Padmasiri, D. Meedeniya, et al., “Automated License Plate Recognition: A Survey on Methods and Techniques”, IEEE Access, Vol. 9, pp. 11203-11225, December 2020.
[7] X. Zhi, B. Zhao and Y. Wang, “A Hybrid Framework for Text Recognition Used in Commodity Futures Document Verification”, 2021 6th International Conference on Computational Intelligence and Applications (ICCIA), June 2021.
[8] M. Tamilselvi, G. Ramkumar, G. Anitha, et al., “A Novel Text Recognition Scheme using Classification Assisted Digital Image Processing Strategy”, 2022 International Conference on Advances in Computing, Communication and Applied Informatics (ACCAI), January 2022.
[9] H. Arslan, “End to End Invoice Processing Application Based on Key Fields Extraction”, IEEE Access, Vol. 10, pp. 78398-78413, July 2022.
[10] S.A. Siddiqui, M.I. Malik, S. Agne, et al., “DeCNT: Deep Deformable CNN for Table Detection”, IEEE Access, Vol. 6, pp. 74151-74161, November 2018.
[11] A. Sinha, J. Bayer and S.S. Bukhari, “Table Localization and Field Value Extraction in Piping and Instrumentation Diagram Images”, 2019 International Conference on Document Analysis and Recognition Workshops (ICDARW), September 2019.
[12] N. Sun, Y. Zhu and X. Hu, “Faster R-CNN Based Table Detection Combining Corner Locating”, 2019 International Conference on Document Analysis and Recognition (ICDAR), September 2019.
[13] S.S. Paliwal, V. D, R. Rahul, et al., “TableNet: Deep Learning Model for End-to-end Table Detection and Tabular Data Extraction from Scanned Document Images”, 2019 International Conference on Document Analysis and Recognition (ICDAR), September 2019.
[14] Nidhi, K. Saluja, A. Mahajan, et al., “Table Detection and Extraction using OpenCV and Novel Optimization Methods”, 2021 International Conference on Computational Performance Evaluation (ComPE), December 2021.
[15] S. Uchida, “Image Processing and Recognition for Biological Images”, Development, Growth & Differentiation (DGD), Vol. 55, Issue 4, pp. 523-549, May 2013.
[16] “Python”, 2 April 2023, retrieved from the official Python documentation, https://docs.python.org/zh-tw/3/
[17] “OpenCV ThresholdTypes”, 2 April 2023, retrieved from the official OpenCV documentation, https://docs.opencv.org/4.7.0/d7/d1b/group__imgproc__misc.html
[18] “OpenCV MorphTypes”, 2 April 2023, retrieved from the official OpenCV documentation, https://docs.opencv.org/4.7.0/d4/d86/group__imgproc__filter.html
[19] R. Smith, S. Weil, Z. Podobny, et al., “Tesseract-OCR”, 25 March 2023, retrieved from the Tesseract-OCR GitHub repository, https://github.com/tesseract-ocr/tesseract
[20] “Python Re”, 2 April 2023, retrieved from the official Python documentation, https://docs.python.org/zh-tw/3.11/library/re.html
[21] “Python Os”, 2 April 2023, retrieved from the official Python documentation, https://docs.python.org/zh-tw/3/library/os.html
[22] “Python Statistics”, 2 April 2023, retrieved from the official Python documentation, https://docs.python.org/zh-tw/3/library/statistics.html
[23] “Python Zip”, 2 April 2023, retrieved from the official Python documentation, https://docs.python.org/zh-tw/3/library/functions.html?highlight#zip
[24] “Python Sorted”, 2 April 2023, retrieved from the official Python documentation, https://docs.python.org/zh-tw/3/library/functions.html?highlight#sorted
[25] “Matplotlib Pyplot”, 2 April 2023, retrieved from the Matplotlib website, https://matplotlib.org/stable/gallery/pyplots/index.html
[26] “Numpy”, 2 April 2023, retrieved from the NumPy website, https://numpy.org/
[27] “Pandas DataFrame”, 2 April 2023, retrieved from the pandas website, https://pandas.pydata.org/docs/reference/frame.html
[28] S. Bardhan, “Table_Data_Extraction”, 2 October 2021, retrieved from S. Bardhan's GitHub repository, https://github.com/Soumi7/Table_Data_Extraction
[29] 李文丁, “Tesseract-OCR LSTM模型訓練指南” [Tesseract-OCR LSTM Model Training Guide], 15 June 2021, retrieved from 李文丁's HackMD page, https://hackmd.io/@garyli-wd/rJ619THsO#Case1%EF%BC%9ACompute-CTC-targets-failed
[30] 李馨, 從零開始學Python程式設計 [Learning Python Programming from Scratch], 1st ed., 博碩文化, New Taipei City, 2018.
[31] 繆鵬, CV+深度學習：AI最完整的跨套件Python人工智慧電腦視覺 [CV + Deep Learning: Cross-Package Python AI Computer Vision], 1st ed., 深智數位, Taipei, 2019.
[32] 洪錦魁, OpenCV影像創意邁向AI視覺王者歸來 [OpenCV: From Creative Imaging to AI Vision], 1st ed., 深智數位, Taiwan, 2022.

Advisor

Chi-Feng Chen (陳奇夆)

Approval Date

2023-06-28
