Name Hui-ling Chou (周慧鈴)
Thesis Title 從評論中找出最具代表性的K個句子
(Find the most K informative sentences from comments)
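Abstract (Chinese) This study addresses the summarization of hotel reviews. Early work on document summarization dealt mainly with single documents, so the analysis seldom had to consider whether the text contained conflicting opinions or when the document was published. With advances in technology and the rise of online forums, however, a growing number of people share their views and experiences online, and these sites receive tens of thousands of new reviews every day. Summarizing such reviews with the earlier approaches can run into problems of conflicting opinions and of differences in time and author. This study therefore proposes a new summarization method for this type of document and outputs the K most representative sentences as the summary. We consider four main factors: the credibility of each author, the influence of the posting time, the helpfulness of the review itself, and the analysis of conflicting opinions.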
Abstract (English) This study focuses on summarizing hotel comments. Early work on summarization mostly analyzed single documents, so it did not need to consider conflicting opinions or the time at which a document was posted. However, with advances in technology and the flourishing of online forums, more and more people write about their opinions and experiences and post them on the internet, and tens of thousands of new comments appear on these websites every day. Applying the earlier summarization methods to such comments can cause problems with conflicting opinions and with differences in time and author. This study therefore proposes a new summarization method for this type of document and outputs the K most informative sentences as the summary. It takes four main factors into account: the credibility of each author, the influence of differences in posting time, the helpfulness of each comment, and the analysis of conflicting opinions.
This study uses hotel reviews from Tripadvisor as its dataset and asks twenty subjects to compare the sentences chosen by three different methods and to rank the methods by effectiveness. The three methods are: considering only the external features of sentences (A), considering only the analysis of context (B), and our proposed method (C). If the experimental results show that method C performs better than methods A and B, this would support the view that differences in authors and time need to be taken into account and that the proposed method is feasible and effective.
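The selection step can be pictured with a small sketch. The record lists K-means as a keyword and the outline mentions outputting the sentences at the K cluster centers, so the code below clusters sentences and returns the one closest to each centroid. Everything else is an assumption made for illustration: the bag-of-words vectors, the Euclidean distance, the seeding with the first k sentences, and the function names `vectorize`, `kmeans`, and `pick_representatives` are hypothetical and stand in for the thesis's own importance and similarity measures, which this record does not spell out.

```python
import re
from collections import Counter


def vectorize(sentences):
    """Assumed feature: term-frequency vectors over a shared vocabulary."""
    tokenized = [re.findall(r"[a-z]+", s.lower()) for s in sentences]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        vectors.append([float(counts[t]) for t in vocab])
    return vectors


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, max_iterations=50):
    """Lloyd's k-means, seeded with the first k vectors so runs are repeatable."""
    centroids = [list(v) for v in vectors[:k]]
    assignment = None
    for _ in range(max_iterations):
        # Assignment step: attach each vector to its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda c: squared_distance(v, centroids[c]))
            for v in vectors
        ]
        if new_assignment == assignment:
            break  # no vector changed cluster, so the centroids are stable
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, a in zip(vectors, assignment) if a == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assignment


def pick_representatives(sentences, k):
    """Return, for each non-empty cluster, the sentence nearest its centroid."""
    vectors = vectorize(sentences)
    centroids, assignment = kmeans(vectors, k)
    chosen = []
    for c in range(k):
        members = [i for i, a in enumerate(assignment) if a == c]
        if members:
            best = min(members, key=lambda i: squared_distance(vectors[i], centroids[c]))
            chosen.append(sentences[best])
    return chosen
```

Calling `pick_representatives(sentences, k)` on a list of review sentences returns at most k sentences, one per non-empty cluster. A version closer to the abstract would presumably also weight sentences by author credibility, posting time, and helpfulness before clustering; those weights are left out here because the record does not give their definitions.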
Keywords (Chinese) ★ Review summarization (評論摘要)
★ K-means
Keywords (English) ★ Comments summarization
★ K-means
Table of Contents Abstract (Chinese) i
Abstract ii
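List of Figures vi
List of Tables vii
Chapter 1. Introduction 1
1.1 Research Background and Motivation 1
1.2 Research Objectives 4
Chapter 2. Literature Review 5
2.1 Definition and Content of Document Summarization 5
2.2 Approaches to Document Summarization 6
2.3 Application Topics of Document Summarization 7
Chapter 3. Research Method 10
3.1 Comment Preprocessing 11
3.1.1 Sentence Segmentation 11
3.1.2 Part-of-Speech (POS) Tagging 11
3.1.3 Stop-Word Removal 12
3.1.4 Part-of-Speech Filtering 12
3.1.5 Sentence Selection 12
3.2 Importance Calculation 12
3.2.1 Comment Author 13
3.2.2 Comment Helpfulness 15
3.2.3 Comment Time 16
3.2.4 Comment Sentence 17
3.2.5 Sentence Importance 19
3.2.6 Output of This Stage 20
3.3 Similarity Calculation 20
3.3.1 Content-Similarity 20
3.3.2 Sentiment-Similarity 23
3.3.3 Sentence-Similarity 25
3.4 Selecting K Sentences 28
3.4.1 K-means Algorithm 28
3.4.2 Outputting the Sentences at the K Cluster Centers 29
Chapter 4. Experimental Design 30
4.1 Data Collection 30
4.2 Experimental Comparison 30
4.3 Parameter Settings 31
4.4 Evaluation Method 31
4.5 Experimental Results 32
Chapter 5. Conclusion and Future Research 42
References 44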
Advisor Yen-liang Chen (陳彥良) Approval Date 2013-6-21
