Master's/Doctoral Thesis 101423017: Complete Metadata Record

DC field | Value | Language
DC.contributor | Department of Information Management | zh_TW
DC.creator | 黃嘉偉 | zh_TW
DC.creator | Jia-Wei Huang | en_US
dc.date.accessioned | 2014-7-15T07:39:07Z
dc.date.available | 2014-7-15T07:39:07Z
dc.date.issued | 2014
dc.identifier.uri | http://ir.lib.ncu.edu.tw:88/thesis/view_etd.asp?URN=101423017
dc.contributor.department | Department of Information Management | zh_TW
DC.description | National Central University | zh_TW
DC.description | National Central University | en_US
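dc.description.abstract | Information technology has developed rapidly in recent years, and the number of electronic documents has grown sharply. To spare readers from spending too much time absorbing the meaning of a document, extracting important sentences from the documents to produce a summary can help them absorb it quickly. However, traditional extractive summarization methods judge a sentence only by whether it contains important terms and lack higher-level concepts such as topics; moreover, the extracted sentences do not give a comprehensive account of the whole news event. This study extracts multi-document summaries with a graph-based summarization method, one kind of indicator representation approach, which splits documents into smaller fragments; this study uses sentences as the fragments. After building a graph network of relations from these fragments, the study scores the nodes using clustering and several link analysis methods, takes the cluster weights into account in the scores, and then builds the summary from the selected nodes. The experiments test system performance on the DUC 2002 and TAC 2010 datasets and measure summary quality with ROUGE. The experiments show that the proposed multi-document summarization method achieves a reasonable level of quality across different summarization tasks. On DUC 2002, the ROUGE-1 scores for the 50-word and 100-word multi-document summaries reach 0.2996 and 0.3412 respectively, comparable to the participants in that year's conference, and the ROUGE-1 score for 200-word multi-document summaries is 0.4559, a mid-level performance. On the first part of the TAC 2010 Guided Summarization task, the ROUGE-1 score reaches 0.3513, exceeding all participants that year, and the ROUGE-2 score reaches 0.0707, also a mid-level performance. | zh_TW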
dc.description.abstract | Information technology has developed rapidly in recent years, and the number of electronic documents has increased accordingly. To keep readers from spending too much time understanding the content of an article, it is useful to extract important sentences and assemble them into a summary. However, traditional extraction methods only judge whether a sentence contains important terms and do not use the concept of topics. In addition, traditional extraction methods do not give a comprehensive account of the whole news event. This study uses a graph-based summarization method to extract multi-document summaries. It is a kind of indicator representation approach, which divides a document into smaller fragments; this study uses sentences as the fragments. After building a graph-based network from these fragments, the study uses clustering and several link analysis methods to score the nodes. It then takes the cluster weights into consideration in the scoring and uses the selected sentence nodes to build the summary. The experiments use the DUC 2002 and TAC 2010 datasets and use ROUGE to evaluate summary quality. The results show that all the methods reach a reasonable level of quality. The ROUGE-1 scores for the DUC 2002 50-word and 100-word summaries reach 0.2996 and 0.3412, approximately matching the peers in DUC 2002. The ROUGE-1 score for the first part of TAC 2010 Guided Summarization reaches 0.3513, higher than the other peers. Finally, the ROUGE-2 score reaches 0.0707, which is of medium quality. | en_US
DC.subject | Text mining | zh_TW
DC.subject | Graph network | zh_TW
DC.subject | Clustering methods | zh_TW
DC.subject | Multi-document summarization | zh_TW
DC.subject | Text mining | en_US
DC.subject | Graph-based network | en_US
DC.subject | Clustering method | en_US
DC.subject | Multi-document Summarization | en_US
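DC.title | 以文句網路分群架構萃取多文件摘要 [Extracting multi-document summaries with a sentence-network clustering framework] | zh_TW
dc.language.iso | zh-TW | zh-TW
DC.type | Master's/doctoral thesis | zh_TW
DC.type | thesis | en_US
DC.publisher | National Central University | en_US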

For questions about this thesis, contact the Outreach Services Section of the National Central University Library, TEL: (03) 422-7151 ext. 57407, or by e-mail.