On-the-fly Data Integration of Homogeneous Web Data

Name

黃執強 (Chih-Chiang Huang)

Department

Department of Computer Science and Information Engineering

Thesis title

On-the-fly Data Integration of Homogeneous Web Data
(同性質網頁資料整合之自動化研究)

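Related theses

★ A Study on Identifying Schedule Invitation Emails and Extracting Irregular Time Expressions	★ Design of the NCUFree Campus Wireless Network Platform and Development of Application Services
★ Design and Implementation of a Semi-structured Web Data Extraction System	★ Mining and Applications of Non-simple Browsing Paths
★ Improvements to Incremental Association Rule Mining	★ Applying the Chi-square Test of Independence to Associative Classification
★ Design and Study of a Chinese Information Extraction System	★ Visualization of Non-numeric Data and Clustering That Combines Subjective and Objective Views
★ A Study of Associated Word Groups in Document Summarization	★ Cleaning Web Pages: Page Segmentation and Data Region Extraction
★ Design and Study of a Question Answering System Using Sentence Classification and Ranking	★ Efficient Mining of Compact Frequent Continuous Event Patterns in Temporal Databases
★ Axis Arrangement of Star Coordinates for Cluster Visualization	★ Automatic Generation of Web Crawling Programs from Browsing History
★ A Study of Template and Data Analysis for Dynamic Web Pages	★ Mining Asynchronous Periodic Patterns with Unknown Periods in Temporal Databases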

Files

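This electronic thesis is authorized for immediate open access.
Once open, the electronic full text is licensed to users only for personal, non-profit searching, reading, and printing for the purpose of academic research.
Please comply with the Copyright Act of the Republic of China: do not reproduce, distribute, adapt, repost, or broadcast the work without authorization, to avoid breaking the law.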

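Abstract (translated from Chinese)

With the growth of the Internet and the spread of e-commerce, users often order services and goods online. To find the best deal, they frequently need to compare data of the same kind across several websites. The results returned by such queries are dynamic and voluminous, so users must analyze and compare the items of interest one by one, which takes considerable effort. A mechanism is therefore needed that integrates homogeneous data from "deep Web" sites in the same domain and offers users a more convenient service.

We observe that the data returned by these sites are insufficiently labeled with attribute names, yet the data themselves carry highly correlated information. This thesis exploits that correlated information to develop an automated data integration method: attributes are matched to one another without relying on attribute-name labels.

Moreover, because sites in the same domain are built by different authors, the attributes used to describe each record differ from site to site. Information described with n attributes on one site may be described with m attributes on another. The relationship between the attributes of two sites is therefore a relationship between groups of attributes, that is, many-to-many, so attribute matching must support many-to-many correspondences rather than only one-to-one correspondences.

In short, we use identical records retrieved from different websites, together with the characteristics of those data, to develop an automated data analysis and integration system with many-to-many attribute matching. We tested it on several domains, and the results show that the method performs quite well.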
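The idea of matching attribute groups by their values alone can be illustrated with a minimal Python sketch. This is not the thesis's algorithm: the function names, the contiguous-group restriction, the `max_size` parameter, and the toy records are all assumptions made for illustration. It assumes the same records have already been aligned across two sites, and it pairs a group of attributes on one site with a group on the other when their concatenated values agree on every shared record.

```python
def normalize(values):
    """Join the values of an attribute group, ignoring case and non-alphanumerics."""
    joined = "".join(values).lower()
    return "".join(ch for ch in joined if ch.isalnum())


def groups(n_attrs, max_size):
    """All contiguous groups of attribute indices with 1..max_size members."""
    return [tuple(range(start, start + size))
            for size in range(1, max_size + 1)
            for start in range(n_attrs - size + 1)]


def match_attribute_groups(records_a, records_b, max_size=2):
    """Return (group_a, group_b) pairs whose joined values agree on every record.

    records_a[i] and records_b[i] are assumed to describe the same item.
    No attribute names are used, only the values.
    """
    matches = []
    for ga in groups(len(records_a[0]), max_size):
        for gb in groups(len(records_b[0]), max_size):
            if all(normalize([ra[i] for i in ga]) == normalize([rb[j] for j in gb])
                   for ra, rb in zip(records_a, records_b)):
                matches.append((ga, gb))
    return matches


# Hypothetical toy data: site A splits the author name into two attributes,
# site B stores it in one.
site_a = [("Book One", "Ann", "Lee", "2001"),
          ("Book Two", "Bob", "Kim", "2003")]
site_b = [("Book One", "Ann Lee", "2001"),
          ("Book Two", "Bob Kim", "2003")]

print(match_attribute_groups(site_a, site_b))
# → [((0,), (0,)), ((3,), (2,)), ((1, 2), (1,))]
```

In this toy case, attributes 1 and 2 of site A together correspond to attribute 1 of site B, a 2:1 match that a purely one-to-one matcher would miss. A real system would also need tolerant value comparison and a way to choose among competing candidate groups.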

Keywords

★ Data Integration
★ Deep Web

Table of contents

Contents
List of Figures
List of Tables
1. Introduction
1.1 Research Background
1.2 Research Motivation
1.3 Research Results
2. Related Work
2.1 Schema Matching
2.1.1 LSD
2.1.2 Similarity Flooding
2.1.3 DCM
2.2 Category-Tree Mapping
3. System Architecture
3.1 Data Analyzer
3.1.1 Data Type Identification
3.1.2 Finding Identical Record Sequences
3.2 Attribute Matcher
3.2.1 A Worked Example of the Attribute Matcher
3.3 Candidate Selector
3.3.1 A Worked Example of the Candidate Selector
4. Experiments
4.1 Evaluation Method
4.2 Experimental Setup and Results
4.2.1 Matching Performance by Domain (Experiment 1)
4.2.2 Effect of Data Size (Experiment 2)
4.2.3 Effect of the Attribute-Count Parameter (Experiment 3)
5. Conclusion
References

Advisor

張嘉惠 (Chia-Hui Chang)

Approval date

2005-7-13
