Thesis/Dissertation 965202050 Detailed Record




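Name Ping-hua Yang (楊萍華)   Department Computer Science and Information Engineering
Thesis Title Design of a Blog Opinion Retrieval System: Blog Post Extraction and Irrelevant Blog Filtering
(Blog Post Extraction and Irrelevant Blog Filtering for Opinion Search Engine)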
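  1. The access permission for this electronic thesis is: approved for immediate open access.
  2. Electronic full texts that have reached open access are licensed to users only for personal, non-profit searching, reading, and printing for the purpose of academic research.
  3. Please comply with the relevant provisions of the Copyright Act of the Republic of China; do not reproduce, distribute, adapt, repost, or broadcast the content without authorization, to avoid breaking the law.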

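Abstract (Chinese) The blogosphere is the community formed by blogs, and the share of blogs among the top 100 most popular web pages has been rising year by year. Blog posts can cover diverse topics and contain both objective facts (objective opinions) and subjective opinions. In the past, users who needed information on a particular topic could obtain it through television, newspapers and magazines, or search engines, but doing so costs more time and the information obtained is relatively limited. In this thesis we therefore integrate blogs and search engines into a blog opinion retrieval system that presents the public's objective and subjective opinions on a given topic and makes finding opinions convenient and fast. Our blog search engine returns blog opinions in two ways and periodically updates the blog pages for each topic so that users can keep up with the latest opinions. The first is an online system, which returns opinions quickly from a small number of pages in fixed domains. The second runs in the background and increases the number of opinions by searching a large number of blog pages; it uses several blog search engines and does not restrict the blog domain. Because the crawled blog pages come from heterogeneous sites, extracting post content by hand is hardly feasible, so we extract the post-content blocks of blogs with machine learning. The large set of returned pages, however, contains many non-blog pages, which degrade post extraction, so we also use machine learning to build a classifier that separates blog from non-blog pages; it reaches 90.7% (F-measure). Results for post extraction after filtering show that filtering out non-blog pages improves performance by more than about 10% (F-measure). In addition, given the unbalanced ratio of content blocks to non-content blocks within a blog page, i.e. imbalanced data, we apply different methods to handle it. Finally, to filter out posts with low relevance, we add topic-word expansion, which improves the original filtering by about 61% (F-measure).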
Abstract (English) The blogosphere is the social network formed by blogs, and the share of blogs among the most popular websites has been growing year by year. Blog pages cover a wide variety of topics, and their posts contain objective as well as subjective opinions. In the past, users who wanted to learn about a specific issue could turn to TV, magazines, or search engines, but these channels are time-consuming and usually yield limited information. For these reasons, this thesis presents an opinion search engine for the blogosphere that combines blogs with search engines and shows public opinion on specific topics. The engine returns opinions in two ways: an online system responds quickly using a small number of pages from fixed domains, and a background system periodically updates the opinions from a large number of blog pages in any domain, so that users can see newer information. Because it is impractical to extract blog post content by manually adding patterns for each blog website, we use machine learning to extract the post content. However, the retrieved pages include non-blog pages, which reduce extraction performance, so we build a blog/non-blog classifier that reaches an F-measure of 90.7%, filters non-blog pages effectively, and raises extraction performance by more than 10% F-measure. Furthermore, because the positive blocks and negative blocks in a blog page are unbalanced, i.e. the data are imbalanced, we adopt different methods to handle this. For filtering irrelevant pages, we add expansion words to the original method, yielding an improvement of about 61% F-measure.
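The abstract reports its results as F-measure and notes that content blocks are far outnumbered by non-content blocks. The following is a minimal sketch of how an F-measure is computed from classification counts, assuming the balanced F1 variant (the record does not state which beta the thesis uses). The function name `f_measure` and all counts are hypothetical and are not taken from the thesis.

```python
def f_measure(tp: int, fp: int, fn: int, beta: float = 1.0) -> float:
    """F-beta score from true-positive, false-positive, false-negative counts.

    Returns 0.0 when precision and recall are both zero or undefined.
    beta=1.0 (balanced F1) is an assumption, not a detail from the thesis.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical blog/non-blog classifier: 90 blogs found, 10 false alarms, 10 missed.
balanced_case = f_measure(tp=90, fp=10, fn=10)

# Hypothetical imbalanced page: 1 content block among 20 blocks.
# Labelling every block "non-content" is 95% accurate but finds no content block.
accuracy_all_negative = 19 / 20
f_all_negative = f_measure(tp=0, fp=0, fn=1)

print(round(balanced_case, 3), accuracy_all_negative, f_all_negative)
```

The second case illustrates why F-measure is a more informative score than accuracy when the positive class is rare, which is the situation the abstract describes for content blocks within a page.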
Keywords (Chinese) ★ post content extraction
★ opinion retrieval
★ blog
Keywords (English) ★ blog post extract
★ opinion retrieval
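Table of Contents Contents
List of Figures
List of Tables
1. Introduction
2. Related Work
2.1. Web Page Content Extraction
2.2. Opinion Retrieval
2.3. Sentiment Classification
3. System Architecture
3.1. Online System
3.2. Background Processing
4. Blog Post Extraction
4.1. Crawling Blog Pages
4.2. Blog Post Extraction
4.2.1. Blog and Non-Blog Classifier
5. Opinion Extraction and Retrieval
5.1. Query Expansion
5.2. Filtering Irrelevant Blog Pages
5.3. Opinion Extraction
5.4. Sentiment Analysis
6. Experiments
6.1. Blog Post Extraction
6.1.1. One-Stage Experiment
6.1.2. Two-Stage Experiment
6.2. Filtering Topic-Irrelevant Blog Pages
7. Conclusion and Future Work
7.1. Conclusion
7.2. Future Work
8. References
Appendix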
Advisor Chia-hui Chang (張嘉惠)   Approval Date 2009-7-28