Abstract (English) |
The goal of information integration on the Web (IIWeb) is to relieve users of repetitive work and let them mash up Web data as they wish. The Gadget on Demand (GOD) system, which is equipped with automatic Web page fetching and unsupervised Web data extraction, was designed to integrate Web information from a single entry point over multiple runs and to present the results in various forms, including tables, lists, maps, and calendars. In this paper we improve the GOD system by filling in a new query form with data extracted from other Web sources, thus allowing cross-site information integration with multiple entry points. Furthermore, we add a gadget editing function so that a gadget can be modified to use a different presentation. We also address the AJAX problem, in which Web content is changed dynamically by a client-side language such as JavaScript, and solve it via external calls to a Web browser. We describe several real-world applications built on the revised GOD system, including the integration of an online bookstore with a city library for checking library holdings, and a conference CFP calendar built from DBWorld. The system demonstrates potential use in the Web 2.0 era, in which users are given tools to create their own gadgets on demand.
|
References |
[1] M. Alvarez, A. Pan, J. Raposo and J. Hidalgo, Crawling Web Pages with Support for Client-Side Dynamism. WAIM 2006.
[2] R. Baumgartner, S. Flesca and G. Gottlob, Visual Web Information Extraction with Lixto. VLDB 2001.
[3] C.H. Chang, M. Kayed, M. R. Girgis and K. Shaalan, A Survey of Web Information Extraction Systems. TKDE 2006.
[4] K. C.C. Chang, B. He, C. Li, M. Patel and Z. Zhang, Structured Databases on the Web: Observations and Implications. SIGMOD 2004.
[5] M. Dontcheva, S. M. Drucker, D. Salesin and M. F. Cohen, Relations, Cards, and Search Templates: User-Guided Web Data Integration and Layout. UIST 2007.
[6] P. B. Golgher, A. H.F. Laender, A. S. da Silva and B. Ribeiro-Neto, An Example-Based Environment for Wrapper Generation. ER Workshop 2000.
[7] J. Han, D. Han, C. Lin, H.J. Zeng, Z. Chen and Y. Yu, Homepage Live: Automatic Block Tracing for Web Personalization. WWW 2007.
[8] M. Kayed, C.C. Chang, K. Shaalan and M. R. Girgis, FiVaTech: Page-Level Web Data Extraction from Template Pages. ICDMW 2007.
[9] J. P. Lage, A. S. da Silva, P. B. Golgher and A. H.F. Laender, Automatic Generation of Agents for Collecting Hidden Web Pages for Data Extraction. Data & Knowledge Engineering 2004.
[10] S. Lawrence and C. L. Giles, Accessibility of Information on the Web. Intelligence 2000.
[11] S. Lingam and S. Elbaum, Supporting End-Users in the Creation of Dependable Web Clips. WWW 2007.
[12] Y.H. Lu, Y. Hong, J. Varia and D. Lee, Pollock: Automatic Generation of Virtual Web Services from Web Sites. SAC 2005.
[13] A. Thor, D. Aumueller and E. Rahm, Data Integration Support for Mashups. IIWeb 2007.
[14] R. Tuchinda, P. Szekely and C. A. Knoblock, Building Mashups By Example. IUI 2008.
[15] G. Vossen and S. Hagemann, Unleashing Web 2.0: From Concepts to Creativity. Oxford: Elsevier 2007.
[16] Shih-Feng Yang, Multiple Source Data Management for Gadget Creation on Web Portals. 2008.
[17] J. Yu, B. Benatallah, F. Casati and F. Daniel, A Framework for Rapid Integration of Presentation Components. WWW 2007.
[18] BrightPlanet.com, The Deep Web: Surfacing Hidden Value. Accessible at http://brightplanet.com, July 2000.
[19] Dapper, http://www.dapper.net.
[20] Google Maps, http://maps.google.com.
[21] Google Maps API, http://code.google.com/apis/maps/.
[22] Google, http://www.google.com.
[23] Openkapow, http://openkapow.com.
[24] Yahoo Pipes, http://pipes.yahoo.com.
|