Finding the K Most Representative Sentences from Comments

Name

Hui-ling Chou (周慧鈴)

Department

Department of Information Management

Thesis Title

Finding the K Most Representative Sentences from Comments
(Find the most K informative sentences from comments)

File

Browse the thesis in the system (never open to the public)

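Abstract (Chinese)

This study addresses the summarization of hotel reviews. Early document summarization focused mainly on single documents, so the analysis did not need to consider whether the text contained conflicting opinions or when the document was published. With advances in technology and the rise of online forums, however, a growing number of people share their views and experiences online, and these websites receive tens of thousands of new comments every day. Summarizing these comments with the earlier approaches may cause problems with conflicting opinions and with differences in time and author. This study therefore proposes a new summarization method for this type of document and outputs the K most representative sentences as the summary. Four main factors are considered: the credibility of different authors, the influence of different points in time, whether the comment itself is helpful, and the analysis of conflicting opinions.
The study analyzes hotel reviews from Tripadvisor and asks twenty subjects to compare the sentences selected by three methods and to rank the methods by how well they perform. The three methods are: considering only the external features of sentences (A), considering only content analysis (B), and the method proposed in this study (C). If the results show that C outperforms A and B, this would demonstrate that accounting for author and time differences is necessary and that the proposed method is feasible and effective.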

Abstract (English)

This study addresses the summarization of hotel comments. Early summarization work dealt mainly with single documents, so it did not need to consider conflicting opinions or the time at which a document was posted. However, with advances in technology and the rise of online forums, more and more people post their opinions and experiences on the internet, and these websites receive tens of thousands of new comments every day. Applying early summarization methods to such comments may cause problems with conflicting opinions and with differences in time and author. This study therefore proposes a new summarization method for this type of document and outputs the K most informative sentences as the summary. The method rests on four main considerations: the credibility of each author, the influence of posting time, the helpfulness of each comment, and the analysis of conflicting opinions.
This study uses hotel reviews from Tripadvisor as its dataset and asks twenty subjects to compare the sentences chosen by three different methods and to rank the methods by effectiveness. The three methods are: considering only the external features of sentences (A), considering only the analysis of the text content (B), and the proposed method (C). If the experimental results show that method C outperforms methods A and B, this would demonstrate that accounting for differences in authors and time is necessary and that the proposed method is feasible and effective.
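The abstract describes subjects ranking methods A, B, and C by effectiveness. One simple way to aggregate such rankings is the mean rank per method. The sketch below assumes that aggregation; this record does not state how the thesis scored the rankings. The function name `mean_ranks` and the toy rankings are hypothetical and are not the thesis's data or results.

```python
from collections import defaultdict

def mean_ranks(rankings):
    """Average rank position of each method (1 = best) over all subjects.

    `rankings` is a list of orderings, one per subject, best method first.
    This aggregation rule is an assumption, not taken from the thesis.
    """
    totals = defaultdict(int)
    for order in rankings:
        for position, method in enumerate(order, start=1):
            totals[method] += position
    return {method: totals[method] / len(rankings) for method in sorted(totals)}

# Hypothetical toy input: four subjects, each ordering the three methods.
toy_rankings = [
    ["C", "A", "B"],
    ["C", "B", "A"],
    ["A", "C", "B"],
    ["C", "A", "B"],
]
result = mean_ranks(toy_rankings)
```

A lower mean rank means the method was placed closer to first on average, so "C is better than A and B" would correspond to C having the lowest value.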

Keywords (Chinese)

★ Comment summarization
★ NGD
★ PMI
★ K-means

Keywords (English)

★ Comments summarization
★ NGD
★ PMI
★ K-means
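NGD (Normalized Google Distance) and PMI (pointwise mutual information) both measure how strongly two terms are associated, based on occurrence counts. The sketch below shows their standard definitions. The function and parameter names are illustrative assumptions, and this record does not say which counts or corpus the thesis uses.

```python
import math

def ngd(fx, fy, fxy, n):
    """Normalized Google Distance from hit counts.

    fx, fy: number of pages containing term x / term y
    fxy:    number of pages containing both terms
    n:      total number of indexed pages
    Smaller values mean the terms are more closely related.
    """
    lx, ly, lxy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(lx, ly) - lxy) / (math.log(n) - min(lx, ly))

def pmi(fx, fy, fxy, n):
    """Pointwise mutual information (base 2) from the same kind of counts.

    Positive values mean x and y co-occur more often than chance.
    """
    p_x, p_y, p_xy = fx / n, fy / n, fxy / n
    return math.log2(p_xy / (p_x * p_y))
```

For instance, two terms that always appear together (`fx == fy == fxy`) have an NGD of 0, and two terms that co-occur exactly as often as independence predicts have a PMI of 0.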

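Table of Contents

Abstract (Chinese)
Abstract (English)
List of Figures
List of Tables
Chapter 1. Introduction
1.1 Research Background and Motivation
1.2 Research Objectives
Chapter 2. Literature Review
2.1 Definition and Content of Document Summarization
2.2 Approaches to Document Summarization
2.3 Application Topics of Document Summarization
Chapter 3. Research Method
3.1 Comment Preprocessing
3.1.1 Sentence Segmentation of Comments
3.1.2 Part-of-Speech (POS) Tagging
3.1.3 Stop-Word Removal
3.1.4 Part-of-Speech Filtering
3.1.5 Sentence Selection
3.2 Importance Calculation
3.2.1 Comment Author
3.2.2 Comment Helpfulness
3.2.3 Comment Time
3.2.4 Comment Sentence
3.2.5 Sentence Importance
3.2.6 Output of This Stage
3.3 Similarity Calculation
3.3.1 Content Similarity
3.3.2 Sentiment Similarity
3.3.3 Sentence Similarity
3.4 Selecting K Sentences
3.4.1 K-means Algorithm
3.4.2 Outputting the Sentences at the Centers of the K Clusters
Chapter 4. Experimental Design
4.1 Data Collection
4.2 Experimental Comparison
4.3 Parameter Settings
4.4 Evaluation Method
4.5 Experimental Results
Chapter 5. Conclusions and Future Research
References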
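Section 3.4 of the outline selects K sentences with K-means and outputs the sentence at the center of each cluster. The sketch below illustrates that general idea under stated assumptions: each sentence is already represented by a numeric feature vector, distances are Euclidean, and initial centroids are chosen by a farthest-point rule so the result is deterministic. The thesis's actual features, distance measure, and initialization are not given in this record, and the sentences and vectors are hypothetical toy data.

```python
import math

def kmeans(points, k, max_iters=100):
    """Plain K-means with deterministic farthest-point initialization."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        # Next centroid: the point farthest from all centroids chosen so far.
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(far))
    assign = None
    for _ in range(max_iters):
        # Assign every point to its nearest centroid.
        new_assign = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_assign == assign:
            break  # assignments stable, so the centroids have converged
        assign = new_assign
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assign

def pick_representatives(sentences, vectors, k):
    """Return, for each cluster, the sentence closest to the cluster center."""
    centroids, assign = kmeans(vectors, k)
    chosen = []
    for c in range(k):
        members = [i for i, a in enumerate(assign) if a == c]
        if members:
            best = min(members, key=lambda i: math.dist(vectors[i], centroids[c]))
            chosen.append(sentences[best])
    return chosen

# Hypothetical toy data: two obvious groups of sentence vectors.
sentences = ["room was clean", "room was very clean", "spotless room",
             "staff were rude", "rude front desk staff", "unfriendly staff"]
vectors = [(0, 0), (0, 1), (0, 2), (10, 0), (10, 1), (10, 2)]
summary = pick_representatives(sentences, vectors, k=2)
```

Picking the member nearest the centroid, rather than the centroid itself, keeps the output an actual sentence from the comments.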

Advisor

Yen-liang Chen (陳彥良)

Approval Date

2013-6-21
