NCU Institutional Repository - theses and dissertations, past exam papers, journal articles, and research projects available for download: Item 987654321/9786
RC Version 7.0 © Powered By DSPACE, MIT. Enhanced by NTU Library IR team.


    Please use this identifier to cite or link to this item: http://ir.lib.ncu.edu.tw/handle/987654321/9786


    Title: 部落格意見檢索系統之設計-部落格內文之擷取與不相關部落格之過濾;Blog Post Extraction and Irrelevant Blog Filtering for Opinion Search Engine
    Authors: 楊萍華;Ping-hua Yang
    Contributors: Graduate Institute of Computer Science and Information Engineering
    Keywords: blog; blog post extraction; opinion retrieval
    Date: 2009-07-23
    Issue Date: 2009-09-22 11:56:09 (UTC+8)
    Publisher: National Central University Library
    Abstract: The blogosphere is the community formed by blogs, and the share of blogs among the 100 most popular websites has been growing year by year. Blog posts cover diverse topics and contain both objective facts and subjective opinions. In the past, users who wanted information on a specific topic could turn to television, newspapers and magazines, or search engines, but these routes cost more time and yield limited information. In this thesis we therefore combine blogs with search engines to build a blog opinion retrieval system that presents the public's objective and subjective opinions on a given topic and makes opinions quick and convenient to find. The system returns blog opinions in two ways and periodically updates the blog pages for each topic so that users can keep up with the latest opinions. The first is an online system that returns opinions quickly from a small number of pages on fixed domains. The second is a background system that increases the number of opinions by querying several blog search engines for large numbers of blog pages without restricting the blog domain.
    Because the pages come from heterogeneous sites, writing extraction patterns by hand for each blog site is impractical, so we use machine learning to extract the post content block. The large set of returned pages also contains many non-blog pages, which degrade extraction performance, so we train a classifier that separates blog pages from non-blog pages; it reaches an F-measure of 90.7%. Filtering out non-blog pages improves post content extraction by more than 10% in F-measure. In addition, content blocks (positive examples) are far outnumbered by non-content blocks (negative examples) within a blog page, a case of imbalanced data, and we apply several methods to handle it. Finally, to filter out posts with low relevance to the topic, we add expanded topic words to the original filtering method, which improves its F-measure by about 61%.
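    The abstract reports every result as an F-measure. As a reading aid, here is a minimal sketch of how that metric is computed from classification counts. It assumes the balanced form (beta = 1), which the abstract does not specify; the function name `f_measure` and the counts are hypothetical and not taken from the thesis.

```python
def f_measure(tp: int, fp: int, fn: int, beta: float = 1.0) -> float:
    """Return the F-measure given true positives, false positives, false negatives.

    beta = 1.0 gives the balanced F1 score (an assumption; the abstract
    does not say which beta the thesis uses).
    """
    if tp == 0:
        # No correct positive predictions: precision and recall are both 0.
        return 0.0
    precision = tp / (tp + fp)  # fraction of predicted positives that are correct
    recall = tp / (tp + fn)     # fraction of actual positives that were found
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical counts for a blog / non-blog classifier, for illustration only.
score = f_measure(tp=90, fp=10, fn=10)
print(f"F-measure: {score:.3f}")
```

    Because the F-measure combines precision and recall on the positive class only, it stays informative when positive examples are rare, which is the situation the abstract describes for content blocks within a blog page.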
    Appears in Collections:[Graduate Institute of Computer Science and Information Engineering] Electronic Thesis & Dissertation



    All items in NCUIR are protected by copyright, with all rights reserved.
