References
[1] A. Z. Broder, S. C. Glassman, and M. S. Manasse. Syntactic clustering of the web. In Proceedings of the 6th International World Wide Web Conference (WWW6), pp. 1157-1166, 1997.
[2] A. Arasu and H. Garcia-Molina. Extracting structured data from web pages. In Proceedings of the 2003 ACM SIGMOD International Conference on Management of Data, pp. 337-348, San Diego, California, USA, June 9-12, 2003.
[3] B. Liu, R. Grossman, and Y. Zhai. Mining data records in web pages. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (KDD-2003), pp. 601-606, Washington, DC, USA, August 24-27, 2003.
[4] C.-H. Chang and S.-C. Lui. IEPAD: Information extraction based on pattern discovery. In Proceedings of the 10th International Conference on World Wide Web, pp. 681-688, Hong Kong, May 2-6, 2001.
[5] C.-H. Chang and S.-C. Kuo. OLERA: A semi-supervised approach for web data extraction with visual support. IEEE Intelligent Systems, 2003.
[6] Document Object Model (DOM) – W3C Recommendation.
http://www.w3c.org/DOM/
[7] J. Wang and F. H. Lochovsky. Data-rich section extraction from HTML pages. In Proceedings of IEEE Computer Society 2002. 3rd International Conference on Web Information Systems Engineering (WISE 2002), pp. 313-322, Singapore, December 12-14, 2002.
[8] J. Wang and F. H. Lochovsky. Data extraction and label assignment for web databases. In Proceedings of the Twelfth International World Wide Web Conference, WWW2003, pp. 187-196, Budapest, Hungary, May 20-24, 2003.
[9] J. M. Kleinberg. Authoritative sources in a hyperlinked environment. In Proceedings of the 9th ACM-SIAM Symposium on Discrete Algorithms, 1998, and IBM Research Report RJ 10076, May 1997. Extended version in Journal of the ACM 46 (1999), pp. 604-632.
[10] L. Ramaswamy, A. Iyengar, L. Liu, and F. Douglis. Automatic detection of fragments in dynamically generated web pages. In Proceedings of the Thirteenth International World Wide Web Conference, WWW2004, New York, USA, May 17-22, 2004.
[11] L. Yi, B. Liu, and X. Li. Eliminating noisy information in web pages for data mining. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (KDD-2003), pp. 296-305, Washington, DC, USA, August 24-27, 2003.
[12] S.-H. Lin and J.-M. Ho. Discovering informative content blocks from web documents. In Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 588-593, Edmonton, Alberta, Canada, July 23-26, 2002.
[13] S. Yu, D. Cai, J.-R. Wen, and W.-Y. Ma. Improving pseudo-relevance feedback in web information retrieval using web page segmentation. In Proceedings of the Twelfth International World Wide Web Conference, WWW2003, pp. 11-18, Budapest, Hungary, May 20-24, 2003.
[14] S. Gupta, G. Kaiser, D. Neistadt, and P. Grimm. DOM-based content extraction of HTML documents. In Proceedings of the Twelfth International World Wide Web Conference, WWW2003, pp. 207-214, Budapest, Hungary, May 20-24, 2003.
[15] T. Munzner, F. Guimbretiere, S. Tasiran, L. Zhang, and Y. Zhou. TreeJuxtaposer: Scalable tree comparison using Focus+Context with guaranteed visibility. In Proceedings of ACM SIGGRAPH 2003, pp. 453-462, July 2003.
[16] V. Crescenzi, G. Mecca, and P. Merialdo. ROADRUNNER: Towards automatic data extraction from large web sites. In Proceedings of the 27th International Conference on Very Large Data Bases, pp. 109-118, Roma, Italy, September 11-14, 2001.
[17] Y. Chen, W.-Y. Ma, and H.-J. Zhang. Detecting web page structure for adaptive viewing on small form factor devices. In Proceedings of the Twelfth International World Wide Web Conference, WWW2003, pp. 225-266, Budapest, Hungary, May 20-24, 2003.
[18] Z. Bar-Yossef and S. Rajagopalan. Template detection via data mining and its applications. In Proceedings of the Eleventh International World Wide Web Conference, WWW2002, pp. 580-591, Honolulu, Hawaii, USA, May 7-11, 2002.