Extracting Multi-Document Summaries with a Sentence Network Clustering Framework (以文句網路分群架構萃取多文件摘要)

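Name

Jia-Wei Huang (黃嘉偉)

Department

Department of Information Management

Thesis title

Extracting Multi-Document Summaries with a Sentence Network Clustering Framework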

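This electronic thesis is authorized for immediate open access.
The open-access full text is licensed to users solely for personal, non-profit searching, reading, and printing for academic research purposes.
Please comply with the Copyright Act of the Republic of China. Do not reproduce, distribute, adapt, repost, or broadcast the work without authorization, to avoid breaking the law.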

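Abstract (Chinese)

With the rapid development of information technology in recent years, the number of electronic documents has grown sharply. To spare readers from spending too much time absorbing a document's meaning, a summary built from the important sentences extracted from the documents can help them grasp the content quickly. Traditional extractive summarization methods, however, judge a sentence only by whether it contains important terms and lack higher-level concepts such as topics; moreover, the extracted sentences do not give a comprehensive account of the whole news event. This study extracts multi-document summaries with a graph-based summarization method, a kind of indicator representation approach in which documents are split into smaller fragments; this study uses sentences as the fragments. After a graph network of relations is built from these fragments, clustering and several link analysis methods are used to score the nodes, each node's cluster weight is factored into its score, and the selected nodes are used to produce the summary.
The experiments use the DUC 2002 and TAC 2010 datasets to test system performance and use ROUGE to measure summary quality. The results show that the proposed multi-document summarization method achieves reasonable quality across different summarization tasks. On DUC 2002, the ROUGE-1 scores for 50-word and 100-word multi-document summaries reach 0.2996 and 0.3412 respectively, comparable to the participants in that year's conference, and the ROUGE-1 score for 200-word multi-document summaries is 0.4559, a medium level of performance. On the first part of the TAC 2010 Guided Summarization task, the ROUGE-1 score reaches 0.3513, exceeding all of that year's participants, and the ROUGE-2 score reaches 0.0707, also a medium level of performance.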
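The pipeline described in the abstract (sentence graph, clustering, link analysis, cluster-weighted scoring, sentence selection) can be sketched roughly as follows. This is a minimal illustration under several assumptions that do not come from the thesis: the tiny `STOPWORDS` set, the raw term-count vectors, the `threshold` value, and the cluster weight (cluster size divided by the number of sentences) are all hypothetical choices. Connected components stand in for the thesis's clustering step, and PageRank alone stands in for the several link analysis methods it combines.

```python
import math
import re
from collections import Counter

# Hypothetical stopword list; the thesis's own term filtering is not reproduced here.
STOPWORDS = {"the", "a", "an", "in", "on", "of", "and", "to", "after"}


def tokenize(sentence):
    """Lowercase the sentence, split into word tokens, drop stopwords."""
    return [t for t in re.findall(r"[a-z0-9]+", sentence.lower()) if t not in STOPWORDS]


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_graph(sentences, threshold=0.1):
    """Weighted undirected graph: graph[i][j] is the similarity of sentences i and j."""
    vectors = [Counter(tokenize(s)) for s in sentences]
    graph = {i: {} for i in range(len(sentences))}
    for i in range(len(sentences)):
        for j in range(i + 1, len(sentences)):
            sim = cosine(vectors[i], vectors[j])
            if sim >= threshold:  # assumed threshold for keeping an edge
                graph[i][j] = sim
                graph[j][i] = sim
    return graph


def connected_components(graph):
    """Stand-in clustering: each connected component is one cluster."""
    seen, clusters = set(), []
    for start in graph:
        if start in seen:
            continue
        stack, members = [start], []
        seen.add(start)
        while stack:
            node = stack.pop()
            members.append(node)
            for neighbor in graph[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        clusters.append(sorted(members))
    return clusters


def pagerank(graph, damping=0.85, iterations=50):
    """Weighted PageRank by power iteration (isolated nodes keep only the base score)."""
    n = len(graph)
    rank = {v: 1.0 / n for v in graph}
    strength = {v: sum(graph[v].values()) for v in graph}
    for _ in range(iterations):
        rank = {
            v: (1 - damping) / n
            + damping * sum(rank[u] * graph[u][v] / strength[u] for u in graph[v])
            for v in graph
        }
    return rank


def summarize(sentences, max_sentences=2, threshold=0.1):
    """Score each sentence as PageRank times its cluster weight, keep the top ones."""
    graph = build_graph(sentences, threshold)
    rank = pagerank(graph)
    scores = {}
    for cluster in connected_components(graph):
        weight = len(cluster) / len(sentences)  # assumed definition of cluster weight
        for v in cluster:
            scores[v] = rank[v] * weight
    chosen = sorted(scores, key=scores.get, reverse=True)[:max_sentences]
    return [sentences[i] for i in sorted(chosen)]  # restore original order
```

Calling `summarize(list_of_sentences, max_sentences=2)` returns the selected sentences in their original order. Multiplying by a cluster weight favors sentences from large groups of mutually similar sentences, which is one plausible reading of how a cluster score can push the summary toward the main topics of an event.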

Abstract (English)

Information technology has developed rapidly in recent years, and the number of electronic documents has increased with it. To keep readers from spending too much time working out the content of an article, it is useful to extract the important sentences and build a summary from them. However, traditional extraction methods only judge whether a sentence contains important terms and do not use higher-level concepts such as topics. Nor do they aim at a comprehensive account of the whole news event. This study uses a graph-based summarization method to extract multi-document summaries. It is a kind of indicator representation approach, which divides a document into smaller fragments; this study uses sentences as the fragments. After building a graph network from these fragments, the study scores the nodes using clustering and several link analysis methods, then takes cluster weight into account in the score and uses the selected sentence nodes to build the summary.
The experiments use the DUC 2002 and TAC 2010 datasets and evaluate summary quality with ROUGE. The results show that the method reaches a reasonable level of quality on every task. On DUC 2002, the ROUGE-1 scores for 50-word and 100-word summaries reach 0.2996 and 0.3412, close to those of the DUC 2002 peers. On the first part of TAC 2010 Guided Summarization, the ROUGE-1 score reaches 0.3513, higher than the other peers, and the ROUGE-2 score reaches 0.0707, which is of medium quality.
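For readers unfamiliar with the metric, ROUGE-N is based on n-gram overlap between a system summary and a reference summary. The sketch below computes a simplified recall-oriented ROUGE-N against a single reference. It is an illustration only: the official ROUGE toolkit offers further options (such as stemming and multiple references) that are not modeled here, and the abstract does not say which configuration produced the reported scores. The function names are hypothetical.

```python
import re
from collections import Counter


def ngrams(text, n):
    """Count the word n-grams of a lowercased text."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(candidate, reference, n=1):
    """Clipped n-gram matches divided by the number of n-grams in the reference."""
    cand, ref = ngrams(candidate, n), ngrams(reference, n)
    total = sum(ref.values())
    if total == 0:
        return 0.0
    # Each reference n-gram can be matched at most as often as it occurs in the candidate.
    overlap = sum(min(count, cand[gram]) for gram, count in ref.items())
    return overlap / total
```

With the candidate "the army seized power" and the reference "the army seized control of the capital", three of the seven reference unigrams are matched, so the ROUGE-1 recall is 3/7; two of the six reference bigrams are matched, so the ROUGE-2 recall is 1/3. Bigram scores are typically much lower than unigram scores, which is consistent with the gap between the ROUGE-1 and ROUGE-2 figures reported above.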

Keywords (Chinese)

★ Text mining
★ Graph network
★ Clustering method
★ Multi-document summarization

Keywords (English)

★ Text mining
★ Graph-based network
★ Clustering method
★ Multi-document Summarization

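Table of Contents

Abstract (Chinese)
Abstract
Acknowledgements
1. Introduction
1-1 Research Background
1-2 Research Motivation
1-3 Research Objectives
1-4 Thesis Structure
2. Literature Review
2-1 Automatic Document Summarization
2-2 Guided Summarization
2-3 Differences Between Related Work and This Study
2-4 Feature Analysis Methods
2-4-1 1-gram filtering
2-4-2 Relevance Between Document Content and Title
2-4-3 Term Frequency-Inverse Sentence Frequency
2-4-4 Studies of Sentence Length
2-5 Vector Similarity Measures
2-6 Edge Betweenness Clustering
2-7 Link Analysis Methods
2-7-1 Degree
2-7-2 Strength
2-7-3 K-Core
2-7-4 PageRank
2-7-5 Locality Index
2-8 Borda Count
3. Research Method and System Workflow
3-1 System Workflow
3-2 Document Preprocessing
3-2-1 1-gram filtering
3-2-2 Keyword Relevance
3-2-3 Converting Sentences to Vectors
3-2-4 Sentence Filtering
3-3 Sentence Scoring
3-3-1 Building the Sentence Relation Network
3-3-2 Sentence Clustering and Cluster Scoring
3-3-3 Sentence Node Scoring
3-4 Sentence Selection
4. Experimental Design and Discussion of Results
4-1 Datasets and Experimental Setup
4-1-1 DUC and TAC
4-1-2 Datasets Used
4-1-3 Experimental Environment
4-1-4 Input Documents
4-2 Criteria for Evaluating Summaries
4-3 Experimental Procedure
4-4 Experimental Data and Discussion
4-4-1 Experiment 1: Thresholds and Screening for Individual Link Analysis Methods
4-4-2 Experiment 2: Thresholds for Combined Link Analysis Methods
4-4-3 Experiment 3: Implementing Part One of Guided Summarization
4-4-4 Experiment 4: System Performance Comparison
5. Conclusions and Future Work
5-1 Conclusions
5-2 Future Work
References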

Advisor

林熙禎

Approval date

2014-7-15
