A Novel On-Line Learning Wrapper System and Its Automatic Maintenance Mechanism

Name

王紹睿 (Shao-Jui Wang)

Department

Department of Computer Science and Information Engineering

Thesis Title

A Novel On-Line Learning Wrapper System and Its Automatic Maintenance Mechanism
(具線上學習之擷取系統和其自動維護機制)

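This electronic thesis is authorized for immediate open access.
For electronic full texts that have reached their open-access date, users are licensed only to search, read, and print them for personal, non-profit academic research purposes.
Please observe the relevant provisions of the Copyright Act of the Republic of China; do not reproduce, distribute, adapt, repost, or broadcast the work without authorization, to avoid violating the law.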

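Abstract (translated from Chinese)

The rapid growth of information on the Internet has led users to rely more and more on wrappers to extract data from websites. A wrapper extracts information from web pages and stores it in a user-defined format so that the processed data can be put to further use. This thesis proposes two new methods. The first is signal-based: it finds correlation features between the examples marked by the user and the web page, and this thesis calls it the “histogram and tag name-based correlation coefficient” method. The second treats every tag in a web page as a numerical weight and computes the positions of the area barycenters; these positions reveal how the data records are distributed within the page. This thesis calls it the “area barycenter method”. In addition, the system includes a mechanism, based on the adaptive resonance theory (ART) algorithm, that learns and revises extraction rules on its own so that existing rules keep adapting to changes in new web pages. The concept of ontology is then used to integrate the information contained across websites, and this thesis also proposes a neural network-based method for computing the semantic similarity between words. Furthermore, because information on the Internet changes quickly and keeps growing, existing wrappers may stop working; they must then be maintained and updated frequently, or even rewritten entirely. This thesis proposes an automatic maintenance mechanism based on digital filtering that regenerates a correct wrapper and sends a notification to the system developer.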
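To make the area barycenter idea concrete, here is a minimal sketch in Python. It is not the thesis's implementation: the tag weights in `TAG_WEIGHTS`, the fixed window width, and the function names are all assumptions chosen for illustration. It only shows how a page can be turned into a sequence of tag weights and how a weighted center of position can be computed per region.

```python
from html.parser import HTMLParser

# Hypothetical weights per tag name; the thesis does not publish its values here.
TAG_WEIGHTS = {"tr": 3.0, "td": 2.0, "a": 1.0}
DEFAULT_WEIGHT = 0.5  # assumed weight for any tag not listed above


class TagCollector(HTMLParser):
    """Collects the names of opening tags in document order."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def tag_signal(html):
    """Turn a page into a 1-D signal: one weight per opening tag."""
    collector = TagCollector()
    collector.feed(html)
    return [TAG_WEIGHTS.get(t, DEFAULT_WEIGHT) for t in collector.tags]


def barycenters(weights, window):
    """Weighted mean position (barycenter) of each consecutive window."""
    centers = []
    for start in range(0, len(weights), window):
        chunk = weights[start:start + window]
        total = sum(chunk)
        if total == 0:
            continue  # an empty region has no barycenter
        centers.append(sum((start + i) * w for i, w in enumerate(chunk)) / total)
    return centers


page = "<table><tr><td><a>x</a></td></tr><tr><td><a>y</a></td></tr></table>"
signal = tag_signal(page)[1:]          # skip the enclosing <table> tag
centers = barycenters(signal, window=3)
print(centers)
```

In this toy page the two rows have the same tag structure, so their barycenters sit at the same offset inside each window and are evenly spaced. Regular spacing of this kind is the sort of distribution pattern the abstract says can reveal where the records lie.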

Abstract (English)

The amount of information available on the World Wide Web has increased dramatically in recent years; however, many information resources are formatted for human browsing rather than for software programs. It is a demanding task to develop a tool that automatically extracts information from semi-structured Web information sources and thereby increases the utility of the Web for value-added services. Such a tool is usually called a wrapper. In this paper, we develop two signal-based methods to implement the wrapper. The first is called the “histogram and tag name-based correlation coefficient” (HNCC) method. It discovers correlation features between the template marked by the user and the web page, and uses them to implement the extraction system. In our method, templates for records with different tag structures are generated incrementally by an ART-like algorithm, which follows the basic idea of the ART1 algorithm. Records in a Web page can then be detected efficiently by matching against the generated templates. The second method assigns a weight to every tag in a web page and computes area barycenters from these weights. Once all the area barycenters have been recorded, their distribution helps us recognize the data we want. After that, we propose an ontology-based method that integrates the information extracted from separate wrapped Web sources by evaluating the similarities between their attributes. In this paper, we also propose a neural network-based approach for measuring semantic similarity between words.
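The incremental, ART-like template generation described above can be sketched as follows. This is an illustrative simplification under stated assumptions, not the HNCC method itself: each record is assumed to be summarized as a histogram of tag counts, similarity is assumed to be the plain Pearson correlation coefficient, and the `vigilance` threshold and learning `rate` are hypothetical parameters.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant vector carries no correlation information
    return cov / sqrt(var_x * var_y)


def learn_templates(histograms, vigilance=0.8, rate=0.5):
    """Assign each histogram to the best-matching template, or create a new one.

    As in ART-style clustering, a template is updated only when the match
    passes the vigilance test; otherwise a new template is committed.
    """
    templates, labels = [], []
    for hist in histograms:
        scores = [pearson(hist, t) for t in templates]
        best = max(range(len(scores)), key=scores.__getitem__, default=None)
        if best is not None and scores[best] >= vigilance:
            # Move the winning template toward the new example.
            templates[best] = [(1 - rate) * t + rate * v
                               for t, v in zip(templates[best], hist)]
        else:
            templates.append([float(v) for v in hist])
            best = len(templates) - 1
        labels.append(best)
    return templates, labels


# Tag-count histograms over the (assumed) tag order: tr, td, a, img.
records = [[1, 3, 1, 0], [1, 3, 1, 0], [1, 0, 0, 4], [1, 3, 2, 0]]
templates, labels = learn_templates(records)
print(labels, len(templates))
```

The vigilance threshold controls granularity: a higher value creates more, tighter templates, while a lower value merges records with looser structural resemblance into the same template.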
Because the WWW is extremely dynamic and continually evolving, the structures of Web documents change frequently, and wrappers may stop working as they did before. In this paper, we propose a filtering approach to implement an automatic wrapper maintenance mechanism. The basic idea is to use a band-pass filter to automatically locate the contents of interest and then regenerate templates of records in order to construct a new, correct wrapper.
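The band-pass idea can be illustrated with a small, self-contained sketch. It rests on assumptions that the abstract does not spell out: the page is assumed to have been converted to a 1-D tag signal, repeated records are assumed to show up as a periodic pattern in that signal, and the filter is an idealized one implemented with a naive discrete Fourier transform. The function names and the pass band are hypothetical.

```python
import cmath
import math


def band_pass(signal, low, high):
    """Ideal band-pass filter: keep DFT bins whose frequency index lies in [low, high]."""
    n = len(signal)
    spectrum = [sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(signal))
                for k in range(n)]
    # min(k, n - k) treats positive and negative frequencies symmetrically,
    # so the filtered signal stays real-valued.
    kept = [c if low <= min(k, n - k) <= high else 0
            for k, c in enumerate(spectrum)]
    return [sum(c * cmath.exp(2j * math.pi * k * t / n)
                for k, c in enumerate(kept)).real / n
            for t in range(n)]


def strongest_window(values, width):
    """Start index of the non-overlapping window with the highest energy."""
    starts = range(0, len(values) - width + 1, width)
    return max(starts, key=lambda s: sum(v * v for v in values[s:s + width]))


# Toy tag signal: 16 positions of page chrome, then four records of period 4.
signal = [0] * 16 + [2, 0, 0, 0] * 4
# A period of 4 over 32 samples corresponds to DFT bin 8; pass bins 7 to 9.
filtered = band_pass(signal, low=7, high=9)
print(strongest_window(filtered, width=8))
```

After filtering, the energy concentrates where the periodic record pattern lives, so the strongest window points at the region from which new record templates could be regenerated. A real page would need the pass band estimated from the previously known record structure rather than fixed by hand.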

Keywords

★ wrapper
★ artificial neural network
★ data integration
★ automatic maintenance mechanism
★ semantic similarity
★ ontology

Table of Contents

Abstract (Chinese)
Abstract
Acknowledgments
List of Figures
List of Tables
1. Introduction
1-1 Research Motivation
1-2 Research Objectives
1-3 Thesis Organization
2. Related Work
2-1 Related Work on Web Data Wrappers
2-2 Related Work on Information Integration Mechanisms
2-3 Automatic Maintenance Mechanisms for Wrapper Systems
3. Web Page Data Extraction System
3-1 System Architecture Overview
3-2 Web Page Data Extraction System with On-Line Learning
3-2-1 Preprocessing: Converting Web Page Tags into Signals
3-2-2 Introduction to the Adaptive Resonance Theory (ART) Algorithm
3-2-3 Extracting the Range of Each Record (I): The HNCC Method
3-2-4 Extracting the Range of Each Record (II): The Area Barycenter Method
3-2-5 Extracting Attribute Values
3-2-6 Handling Single-Record Web Pages
3-3 Integration Mechanism for Extracted Information
3-3-2 Ontology-Merging-Based Integration Mechanism for Extracted Information
3-3-2-1 Introduction to Back-Propagation Networks
3-3-2-2 Computing Semantic Similarity with Neural Networks
4. Automatic Maintenance Mechanism for the Wrapper System
4-1 Overall Architecture
4-2 Preprocessing: Converting Web Page Tags into Signals
4-3 Implementing the Automatic Maintenance Mechanism with Spectral Information and Filters
4-3-1 Recovering the Range of Each Record
4-3-1-1 Color Distribution Information
4-3-2 Recovering Attribute Values
5. System Implementation and Experimental Results
5-1 System Implementation
5-2 Experimental Results
5-2-1 Experimental Results of the Extraction System
5-2-2 Experimental Results of Semantic Similarity Computation
5-2-3 Experimental Results of the Automatic Maintenance Mechanism
6. Conclusions and Future Work
6-1 Conclusions
6-2 Future Work
References

Advisor

蘇木春 (Mu-Chun Su)

Approval Date

2007-7-11