Extracting Multi-Document Summaries with a Sentence-Network Clustering Framework (以文句網路分群架構萃取多文件摘要)

DC field	Value	Language
DC.contributor	Department of Information Management	zh_TW
DC.creator	黃嘉偉	zh_TW
DC.creator	Jia-Wei Huang	en_US
dc.date.accessioned	2014-07-15T07:39:07Z
dc.date.available	2014-07-15T07:39:07Z
dc.date.issued	2014
dc.identifier.uri	http://ir.lib.ncu.edu.tw:88/thesis/view_etd.asp?URN=101423017
dc.contributor.department	Department of Information Management	zh_TW
DC.description	國立中央大學	zh_TW
DC.description	National Central University	en_US
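dc.description.abstract	In recent years, rapid advances in information technology have greatly increased the number of electronic documents. To spare readers from spending too much time absorbing what a document means, important sentences can be extracted from it to form a summary that is quick to read. However, traditional extractive summarization methods judge a sentence only by whether it contains important terms and lack higher-level concepts such as topics; the extracted sentences also fail to give a comprehensive account of the whole news event. This study extracts multi-document summaries with a graph-based summarization method, one of the indicator representation approaches, which split documents into smaller fragments; this study uses sentences as the fragments. After a graph network is built from these fragments, clustering and several link analysis methods are used to score the nodes, the cluster weights are incorporated into the scores, and the selected nodes are used to produce the summary. The experiments use the DUC 2002 and TAC 2010 datasets to test system performance and ROUGE to measure summary quality. The results show that the proposed multi-document summarization method achieves reasonable quality across different summarization tasks. On DUC 2002, the ROUGE-1 scores for 50-word and 100-word multi-document summaries reach 0.2996 and 0.3412, comparable to the participants in that year's conference, and the ROUGE-1 score for 200-word summaries reaches 0.4559, a mid-range result. On the first part of TAC 2010 Guided Summarization, the ROUGE-1 score reaches 0.3513, exceeding all participants that year, and the ROUGE-2 score reaches 0.0707, also a mid-range result.	zh_TW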
dc.description.abstract	Information technology has developed rapidly in recent years, and the number of electronic documents has increased with it. To keep readers from spending too much time working out the content of an article, it is useful to extract its important sentences and assemble them into a summary. However, traditional extraction methods judge only whether a sentence contains important terms and do not use the concept of a topic. Nor do they aim to give a comprehensive account of the whole news event. This study uses a graph-based summarization method to extract multi-document summaries. It is one of the indicator representation approaches, which divide documents into smaller fragments; this study uses sentences as the fragments. After building a graph-based network from these fragments, the study scores the nodes with clustering and several link analysis methods. It then takes the cluster weight into account in the score and uses the selected sentence nodes to produce the summary. The experiments use the DUC 2002 and TAC 2010 datasets and evaluate summary quality with ROUGE. The results show that all the methods reach a good level of quality. The ROUGE-1 scores for the DUC 2002 50-word and 100-word summaries reach 0.2996 and 0.3412, which is close to the peers in DUC 2002. The ROUGE-1 score on the first part of TAC 2010 Guided Summarization reaches 0.3513, which is higher than the other peers. Finally, the ROUGE-2 score reaches 0.0707, which is of medium quality.	en_US
DC.subject	文字探勘	zh_TW
DC.subject	圖形網路	zh_TW
DC.subject	分群方法	zh_TW
DC.subject	多文件摘要	zh_TW
DC.subject	Text mining	en_US
DC.subject	Graph-based network	en_US
DC.subject	Clustering method	en_US
DC.subject	Multi-document Summarization	en_US
DC.title	Extracting Multi-Document Summaries with a Sentence-Network Clustering Framework	zh_TW
dc.language.iso	zh-TW	zh-TW
DC.type	博碩士論文	zh_TW
DC.type	thesis	en_US
DC.publisher	National Central University	en_US
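
The pipeline outlined in the abstract (sentence graph, clustering, link analysis, cluster-weighted scoring, sentence selection) can be illustrated with a minimal sketch. The record does not say which similarity measure, clustering algorithm, link analysis methods, or cluster-weight formula the thesis uses, so everything below is an assumption chosen for illustration: bag-of-words cosine similarity for edges, connected components over a similarity threshold as the clustering step, PageRank as the link analysis method, and relative cluster size as the cluster weight. All function names and parameters are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(sentence):
    """Lowercase word tokens; a stand-in for real preprocessing."""
    return re.findall(r"[a-z0-9]+", sentence.lower())


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_graph(sentences):
    """Dense similarity matrix: one node per sentence, no self-loops."""
    vecs = [Counter(tokenize(s)) for s in sentences]
    n = len(vecs)
    return [[cosine(vecs[i], vecs[j]) if i != j else 0.0 for j in range(n)]
            for i in range(n)]


def pagerank(weights, damping=0.85, iterations=50):
    """Weighted PageRank by power iteration (assumed link analysis method)."""
    n = len(weights)
    scores = [1.0 / n] * n
    out_sum = [sum(row) for row in weights]
    for _ in range(iterations):
        scores = [
            (1 - damping) / n + damping * sum(
                scores[i] * weights[i][j] / out_sum[i]
                for i in range(n) if out_sum[i] > 0)
            for j in range(n)
        ]
    return scores


def cluster(weights, threshold=0.3):
    """Assumed clustering: connected components over edges >= threshold."""
    n = len(weights)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and weights[i][j] >= threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


def summarize(sentences, max_words=100, threshold=0.3):
    """Pick top-scoring sentences within a word budget, in document order."""
    if not sentences:
        return []
    weights = build_graph(sentences)
    scores = pagerank(weights)
    labels = cluster(weights, threshold)
    sizes = Counter(labels)
    n = len(sentences)
    # Assumed cluster weight: the share of sentences in the node's cluster.
    final = [scores[i] * sizes[labels[i]] / n for i in range(n)]
    chosen, used = [], 0
    for i in sorted(range(n), key=lambda k: final[k], reverse=True):
        length = len(sentences[i].split())
        if used + length <= max_words:
            chosen.append(i)
            used += length
    return [sentences[i] for i in sorted(chosen)]
```

Calling `summarize(sentences, max_words=50)` on the pooled sentences of a document set would correspond to the 50-word task mentioned in the abstract; the word budget is the only part of this sketch taken directly from the record.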

Thesis 101423017: full metadata record
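
The ROUGE-1 and ROUGE-2 figures quoted in the abstract are n-gram overlap scores between a system summary and human reference summaries. As a rough illustration of what such a score measures, the sketch below computes a simplified ROUGE-N recall. It is not the official ROUGE toolkit: it assumes plain lowercase tokenization and omits stemming, stopword handling, and the other options the toolkit offers, so it would not reproduce the reported numbers. The function names are hypothetical.

```python
import re
from collections import Counter


def ngrams(text, n):
    """Count the n-grams of lowercase word tokens in a text."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(candidate, references, n=1):
    """Clipped n-gram matches divided by the total reference n-grams."""
    cand = ngrams(candidate, n)
    matched = 0
    total = 0
    for ref in references:
        ref_counts = ngrams(ref, n)
        total += sum(ref_counts.values())
        # Each reference n-gram can match at most as often as it occurs
        # in the candidate summary.
        matched += sum(min(count, cand[gram])
                       for gram, count in ref_counts.items())
    return matched / total if total else 0.0
```

With `n=1` this counts shared unigrams and with `n=2` shared bigrams, which is why ROUGE-2 scores are typically much lower than ROUGE-1 scores for the same summary.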