References
[1] L. Bing, Y. Wang, Y. Zhang and H. Wang. “Primary Content Extraction with Mountain Model”, CIT, IEEE, 2008, pp. 479–484.
[2] D. Cai, S. Yu, J. R. Wen and W. Y. Ma. “VIPS: a Vision-based Page Segmentation Algorithm”, Microsoft Technical Report, MSR-TR-2003-79, 2003.
[3] D. Cao, X. Liao and S. Bai. “Blog Post and Comment Extraction Using Information Quantity of Web Format”, AIRS, ACM, 2008, pp. 298–309.
[4] S. Debnath, P. Mitra, and C. L. Giles. “Automatic extraction of informative blocks from webpages”, SAC, ACM, 2005, pp. 1722–1726.
[5] S. Debnath, P. Mitra, and C. L. Giles. “Identifying content blocks from web documents”, ISMIS, 2005, pp. 285–293.
[6] E. Elgersma and M. de Rijke. “Learning to Recognize Blogs: A Preliminary Exploration”, EACL Workshop, 2006.
[7] A. Finn, N. Kushmerick, and B. Smyth. “Fact or fiction: Content classification for digital libraries”, DELOS Workshop, 2001.
[8] J. Gibson, B. Wellner, and S. Lubar. “Adaptive Web-page Content Identification”, WIDM, ACM, 2007, pp. 105–112.
[9] T. Gottron. “Evaluating content extraction on html documents”, ITA, 2007, pp. 123–132.
[10] T. Gottron. “Combining content extraction heuristics: the combine system”, iiWAS, ACM, 2008, pp. 591–595.
[11] T. Gottron. “Content code blurring: A new approach to content extraction”, DEXA, IEEE, 2008, pp. 29–33.
[12] Y. Guo, H. Tang, L. Song, Y. Wang and G. Ding. “ECON: An Approach to Extract Content from Web News Page”, APWEB, IEEE, 2010, pp. 314–320.
[13] S. Gupta, G. E. Kaiser, P. Grimm, M. F. Chiang, and J. Starren. “Automating content extraction of html documents”, WWW, ACM, 2005, pp. 179–224.
[14] S. Gupta, G. E. Kaiser, D. Neistadt, and P. Grimm. “Dom-based content extraction of html documents”, WWW, ACM, 2003, pp. 207–214.
[15] S. Gupta, G. E. Kaiser, and S. J. Stolfo. “Extracting context to improve accuracy for html content extraction”, WWW, ACM, 2005, pp. 1114–1115.
[16] W. Han, D. Buttler, and C. Pu. “Wrapping web data into xml”, SIGMOD, ACM, 2001, pp. 33–38.
[17] P. Kolari, A. Java, T. Finin, T. Oates and A. Joshi. “Detecting Spam Blogs: A Machine Learning Approach”, AAAI, ACM, 2006, pp. 1351–1356.
[18] J. Lafferty, A. McCallum, and F. Pereira. “Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data”, ICML, ACM, 2001, pp. 282–289.
[19] J. Liu, L. Birnbaum and B. Pardo. “Categorizing Blogger’s Interests Based on Short Snippets of Blog Posts”, CIKM, ACM, 2008, pp. 1525–1526.
[20] C. Mantratzis, M. A. Orgun, and S. Cassidy. “Separating XHTML content from navigation clutter using DOM-structure block analysis”, Hypertext, ACM, 2005, pp. 145–147.
[21] M. Marek, P. Pecina and M. Spousta. “Web Page Cleaning with Conditional Random Fields”, WWW, vol. 5, 2007, pp. 1–8.
[22] G. Mishne and M. de Rijke. “Deriving Wishlists from Blogs”, WWW, ACM, 2006, pp. 925–926.
[23] I. Ounis, M. de Rijke, C. Macdonald, G. Mishne, and I. Soboroff. “Overview of the TREC-2006 Blog Track”, TREC, 2006.
[24] J. Pasternack and D. Roth. “Extracting article text from the web with maximum subsequence segmentation”, WWW, ACM, 2009, pp. 971–980.
[25] D. Pinto, M. Branstein, R. Coleman, W. B. Croft, M. King, W. Li, and X. Wei. “Quasm: a system for question answering using semi-structured data”, JCDL, ACM, 2002, pp. 46–55.
[26] M. F. Porter. “An algorithm for suffix stripping”, Program, vol. 14, no. 3, 1980, pp. 130–137.
[27] A. F. R. Rahman, H. Alam and R. Hartono. “Content Extraction from HTML Documents”, WDA, 2001, pp. 7–10.
[28] W. L. Ruzzo and M. Tompa. “A Linear Time Algorithm for Finding All Maximal Scoring Subsequences”, AAAI Press, ACM, 1999, pp. 234–241.
[29] L. Song, X. Cheng, Y. Guo, B. Wu and Y. Wang. “Blog Post Extraction Using Title Finding”, Chinese Academy of Sciences, 2009.
[30] R. Song, H. Liu, J. R. Wen, and W. Y. Ma. “Learning Important Models for Web Page Blocks based on Layout and Content Analysis”, SIGKDD, ACM, 2004, pp. 14–23.
[31] H. M. Wallach. “Efficient Training of Conditional Random Fields”, CLUK Research Colloquium, University of Edinburgh, 2002.
[32] H. M. Wallach. “Conditional Random Fields: An Introduction”, Technical Report MS-CIS-04-21, Univ. of Pennsylvania, 2004.
[33] T. Weninger and W. H. Hsu. “Text Extraction from the Web via Text-to-Tag Ratio”, iiWAS, ACM, 2008, pp. 23–28.
[34] T. Weninger, W. H. Hsu and J. Han. “CETR – Content Extraction via Tag Ratios”, WWW, ACM, 2010, pp. 971–980.
[35] L. Yang, C. Li and M. Gu. “Extracting Content from Web Pages Using the Sliding Window”, CSA, IEEE, 2009, pp. 1–6.
[36] P. H. Yang and C. H. Chang. “Automatic Labeling for Blog Post Extraction”, NCS, Taiwan, 2009.