Abstract (English) |
The blogosphere, the collection of all blogs, forms a social network, and the number of blogs among the most popular websites has grown year by year. Blog pages cover a wide variety of topics, and their posts contain subjective as well as objective opinions. In the past, users who wanted to learn about a specific issue relied on TV, magazines, or search engines, but these channels are time-consuming and usually yield limited information. For these reasons, this paper presents an opinion search engine for the blogosphere that combines blogs with search technology and focuses on specific topics to show public opinion. The engine returns opinions in two ways: an online system responds quickly using a small number of pages from fixed domains, while a background system periodically updates opinions from a large number of blog pages in any domain, so that users can see newer information. Because manually writing extraction patterns for every blog website is infeasible, we use machine learning to extract posted content. Non-blog pages mixed into the input reduce extraction performance, so we build a blog/non-blog classifier with an F-measure of 90.7%, which filters non-blog pages efficiently and raises extraction performance by more than 10% in F-measure. Furthermore, the positive and negative blocks in a blog page are unbalanced (the imbalanced data problem), and we adopt a different approach to handle this. For filtering irrelevant pages, we add expansion words to the original method, which improves the F-measure by about 61%.
|