Content-based Text Image Classification with Complex Font Types

Name

黃健興 (Chien-Hsiang Huang)

Department

Department of Computer Science and Information Engineering

Thesis Title

Content-based Text Image Classification with Complex Font Types
(以內容為基準的複雜字型文件影像分類)

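Related Theses

★ Real-time Online Identity Recognition Using Viseme and Voice Biometrics
★ An Image-based Alignment System for SMD Carrier Tape Packaging
★ A Study on Content Forgery and Tampering Detection and Deleted-Data Recovery for Handheld Mobile Devices
★ License Plate Verification Based on the SIFT Algorithm
★ Local Pattern Features Based on Dynamic Linear Decision Functions for Face Recognition
★ A GPU-based SAR Database Simulator: Parallel Architecture for SAR Echo Signal and Image Databases (PASSED)
★ Personal Identity Verification Using Palmprints
★ Video Indexing Using Color Statistics and Camera Motion
★ Form Document Classification Using Field Clustering Features and Four-Directional Adjacency Trees
★ Stroke Features for Off-line Chinese Character Recognition
★ Image Motion Vector Estimation Using Adaptive Block Matching Combined with Multi-frame Information
★ Color Image Analysis and Its Application to Color Quantization, Image Retrieval, and Face Detection
★ Extraction and Recognition of Logos on Chinese and English Business Cards
★ Chinese Signature Verification Using Virtual Stroke Features
★ Face Detection, Face Pose Classification, and Face Recognition Based on Triangle Geometry and Color Features
★ A Complementary Skin-Color-Based Face Detection Strategy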

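This electronic thesis is authorized for immediate open access.
Open-access electronic full texts are licensed to users only for personal, non-profit searching, reading, and printing for the purpose of academic research.
Please comply with the Copyright Act of the Republic of China; do not reproduce, distribute, adapt, repost, or broadcast the work without authorization, to avoid violating the law.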

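Abstract (translated from Chinese)

Text document classification extracts the information contained in a document, analyzes the abstract semantics the document expresses and the message its author intends to convey, and then classifies and manages documents as required. Document analysis provides the supporting techniques: binarization, which separates foreground from background; segmentation, which decomposes the page into block objects; reading order analysis, which converts the geometrical structure obtained from layout analysis into a logical structure that follows reading order; font style detection and font type classification, which identify the typographic attributes of text images; and latent semantic analysis, which measures how similar the contents of documents are.
In the document analysis procedure, a global binarization threshold is selected first and block segmentation is performed. A local binarization threshold is then determined for each block individually, and the character blocks are segmented out one by one. Finally, the logical relations among blocks are defined by a reading order analysis designed around human reading habits.
For each segmented text block, virtual stroke extraction is used to obtain the outer contour of each character image. From the effect that the italic (shear) transformation has on stroke structure, rules are derived for detecting italic characters without character recognition. Character images are classified into three font type categories according to character width and the presence or absence of serifs at stroke ends, and boldface is detected from differences in the overall stroke width of each word.
Finally, feature words likely to represent the document's content are identified from the variations in font type and font style throughout the text, together with a semantic tree built from the changes in relative position between words. Document similarity is evaluated from the relative positions of documents in a semantic space spanned by feature vectors of the relative positions between words, and the documents are classified accordingly.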

Abstract (English)

The task of textual document image classification is to classify and manage textual document images by extracting the information they contain, in order to analyze the abstract meaning embedded in the documents and the message that the authors want to express. Several document analysis techniques have been proposed to perform this information extraction. Among them, binarization separates the foreground from the background; segmentation extracts each object from the foreground; reading order analysis transforms the geometrical structure produced by layout analysis into a logical structure; font style detection and font type classification determine the style and type of the fonts; and latent semantic analysis estimates the similarity between textual documents.
During document analysis, a global binarization threshold is first selected to perform block segmentation. Local binarization thresholds are then determined independently for each paragraph block so that character blocks can be segmented more precisely. Finally, the logical relation between each pair of paragraph blocks is defined by reading order analysis, following human reading habits.
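The two-pass thresholding described in this paragraph can be sketched as follows. This is a minimal illustration, not the thesis's implementation: the abstract does not say how the thresholds are selected, so Otsu's between-class variance criterion is used here as an assumed stand-in, and the block boxes passed to the hypothetical `binarize` function are assumed to come from a separate block segmentation step.

```python
import numpy as np

def otsu_threshold(gray: np.ndarray) -> int:
    """Return t in [0, 255] maximizing between-class variance.

    Pixels with value <= t form one class, pixels > t the other.
    Assumption: `gray` is an 8-bit (uint8) grayscale image.
    """
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
    prob = hist / hist.sum()
    omega = np.cumsum(prob)                    # probability of class "<= t"
    mu = np.cumsum(prob * np.arange(256))      # cumulative mean up to t
    denom = omega * (1.0 - omega)
    numer = (mu[-1] * omega - mu) ** 2
    sigma_b = np.zeros(256)
    valid = denom > 1e-12                      # skip thresholds with an empty class
    sigma_b[valid] = numer[valid] / denom[valid]
    return int(np.argmax(sigma_b))

def binarize(gray: np.ndarray, blocks) -> np.ndarray:
    """Global pass first, then one local threshold per block.

    `blocks` is a list of (top, left, bottom, right) boxes (hypothetical
    format). Dark pixels are treated as foreground (True).
    """
    mask = gray <= otsu_threshold(gray)        # global threshold
    for top, left, bottom, right in blocks:
        patch = gray[top:bottom, left:right]
        mask[top:bottom, left:right] = patch <= otsu_threshold(patch)
    return mask
```

Recomputing the threshold inside each block lets a low-contrast block (for example, gray text on a shaded background) be separated correctly even when a single global threshold would merge its text with the background.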
The contour of each character image is extracted with the proposed virtual stroke extraction technique, and italic characters are detected by structural rules derived from the effect of the shear transformation, without optical character recognition. Font types are classified into three categories according to the width of the character image and the presence of serifs at the ends of strokes. Boldface is detected by checking the average stroke width in each word.
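The boldface check at the end of this paragraph can be illustrated with a rough sketch. The thesis derives stroke widths from its virtual strokes; since that procedure is not described here, the sketch below approximates a word's stroke width by the median length of its horizontal foreground runs, which is an assumption. The function names and the `ratio` cutoff of 1.4 are likewise hypothetical.

```python
import numpy as np

def horizontal_run_lengths(binary: np.ndarray) -> np.ndarray:
    """Lengths of all horizontal runs of foreground (True) pixels."""
    # Pad each row with background so every run has a start and an end.
    padded = np.pad(binary.astype(np.int8), ((0, 0), (1, 1)))
    diff = np.diff(padded, axis=1)
    starts = np.argwhere(diff == 1)    # row-major order: run k starts here
    ends = np.argwhere(diff == -1)     # same order: run k ends here
    return ends[:, 1] - starts[:, 1]

def stroke_width(binary: np.ndarray) -> float:
    """Crude stroke-width estimate for one binary word image."""
    runs = horizontal_run_lengths(binary)
    return float(np.median(runs)) if runs.size else 0.0

def detect_bold_words(word_images, ratio: float = 1.4) -> np.ndarray:
    """Flag words whose stroke width clearly exceeds the typical width.

    The typical width is taken as the median over all words, and the
    cutoff `ratio` is an illustrative value, not one from the thesis.
    """
    widths = np.array([stroke_width(w) for w in word_images])
    baseline = np.median(widths)
    return widths > ratio * baseline
```

Comparing each word against a document-wide baseline, instead of a fixed pixel width, keeps the check independent of scan resolution and font size as long as most words are set in regular weight.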
The feature words that represent the content of a document are selected according to font style and font type information and a semantic tree built from the relative position of each pair of words. The similarity between two textual documents is computed from the included angle between their feature vectors, which are constructed from the relative positions of feature words in each document. Finally, document classification is performed based on the extracted content.
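The included-angle similarity mentioned above can be sketched in a few lines. How the thesis builds its feature vectors from word positions is not specified in the abstract, so the vectors here are assumed to be given; `included_angle` and `classify` are hypothetical names, and assigning a document to the class with the smallest angle is one plausible reading of the final classification step.

```python
import numpy as np

def included_angle(u, v) -> float:
    """Angle in radians between two feature vectors (0 means same direction)."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    nu, nv = np.linalg.norm(u), np.linalg.norm(v)
    if nu == 0.0 or nv == 0.0:
        return float(np.pi / 2)        # treat an empty vector as unrelated
    cos = np.clip(np.dot(u, v) / (nu * nv), -1.0, 1.0)
    return float(np.arccos(cos))

def classify(doc_vec, class_vecs: dict) -> str:
    """Return the label whose reference vector has the smallest included angle."""
    return min(class_vecs, key=lambda label: included_angle(doc_vec, class_vecs[label]))
```

Because the angle depends only on direction, two documents of very different lengths still compare as similar when their feature words occur in the same proportions.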

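Keywords (translated from Chinese)

★ font style detection
★ font type recognition
★ document understanding
★ document analysis

Keywords (English)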

★ font type recognition
★ document understanding
★ document analysis

Table of Contents
Chapter 1 Introduction
1.1 Motivation
1.2 Relative Works
1.2.1 Document Image Processing
1.2.2 Font Style Detection and Font Type Recognition
1.2.3 Content Classification
1.3 Organization of Dissertation
Chapter 2 Document Image Processing
2.1 Binarization
2.2 Segmentation
2.2.1 Slant Character Segmentation
2.2.2 Touching Character Segmentation
2.3 Reading Order Analysis
2.3.1 Geometric Relation Graph Construction
2.3.2 Block Growing Algorithm
Chapter 3 Font Style Detection
3.1 Virtual Stroke Extraction
3.2 Boldface Detection
3.3 Shear Transformation
3.4 Stroke Categorization and Character Classification
3.4.1 Virtual Stroke Categorization
3.4.2 Character Classification
3.5 Italic Detection
3.5.1 Italic Detection of Class 1 Characters
3.5.2 Italic Detection of Class 2 Characters
3.5.3 Italic Detection of Class 3 Characters
3.6 Italic Rectification
3.6.1 Italic Rectification of Class 1 Characters
3.6.2 Italic Rectification of Class 2 Characters
3.6.3 Italic Rectification of Class 3 Characters
Chapter 4 Font Type Classification and Identification
4.1 Typographical Line Extraction
4.2 Font Type Classification
4.3 Font Type Identification
Chapter 5 Content Classification
5.1 The Correlation between Words
5.2 Feature Word Selection and Semantic Tree Generation
5.3 Latent Semantic Space
5.4 Document Classification
Chapter 6 Experimental Results
6.1 Italic Detection
6.2 Font Type Classification and Identification
6.3 Content Classification
Chapter 7 Conclusions and Future Works
7.1 Conclusions
7.2 Future Works
References

Advisor

范國清 (Kuo-Chin Fan)

Approval Date

2005-7-25