Electronic Thesis 102423017: Detailed Record




Author: Lian-Jing Wang (王蓮淨)    Department: Information Management
Thesis Title: Summary Extraction Based on Topic Event Tracking (以主題事件追蹤為基礎之摘要擷取)
Access: The author has agreed to immediate open access of the electronic full text. The full text is licensed only for personal, non-profit academic retrieval, reading, and printing, in accordance with the Copyright Act of the Republic of China.

Abstract (Chinese): With the rapid development of the Internet in recent years, users can obtain the information they need online, but the sheer volume of information causes information overload. How to extract the important information from this flood for users to read has therefore become an important issue. Traditional summarization is usually static and cannot dynamically update a daily summary for a specific topic, so this study introduces a forgetting factor that allows the summary content to be updated every day. A topic-keyword-based approach is adopted to produce summaries for a specific topic, and query-oriented summarization is used to extract multi-document summaries.
This study uses a graph-network clustering architecture to analyze the latent semantic relations between sentences. The clustering method is K-Medoids: the similarities between all sentence nodes in the graph network are examined, and the sentences are clustered to capture the latent semantics between them and improve summary quality.
The experiments use the DUC 2002 data set, with summary quality measured by ROUGE, together with self-collected CNN news articles on the Nepal earthquake, the Islamic State, and MERS, to observe whether the resulting summaries achieve topic event tracking. The experiments show that the proposed K-Medoids-based multi-document summarization method reaches ROUGE-1 scores of 0.2948, 0.3435, and 0.4375 for 50-, 100-, and 200-word multi-document summaries on DUC 2002; the 50- and 100-word summaries outperform nearly all of that year's workshop participants, and the 200-word summaries are on par with the participants. The topic event tracking experiments also confirm that the system achieves topic-event-tracking summarization.
Keywords: query-oriented summarization, extractive summarization, K-Medoids, forgetting factor, multi-document summarization, topic event tracking.
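The abstract above describes clustering the sentence nodes of a similarity graph with K-Medoids, so that each cluster is represented by an actual (medoid) sentence. The sketch below is only a minimal illustration of that idea, not the thesis implementation: it assumes the sentences have already been turned into term-weight vectors (the thesis derives its vectors from features such as TF-ISF, per the table of contents, which are not reproduced here), and the toy vectors, the use of cosine distance, and k = 2 are all assumptions made for the example.

```python
# Minimal K-Medoids sketch over sentence vectors (illustration only).
import numpy as np

def cosine_distance_matrix(X):
    """Pairwise cosine distance between row vectors of X."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    norms[norms == 0] = 1.0            # avoid division by zero for empty sentences
    sim = (X / norms) @ (X / norms).T  # cosine similarity
    return 1.0 - sim                   # turn similarity into distance

def k_medoids(dist, k, max_iter=100, seed=0):
    """Basic alternating K-Medoids on a precomputed distance matrix."""
    rng = np.random.default_rng(seed)
    n = dist.shape[0]
    medoids = rng.choice(n, size=k, replace=False)   # random initial medoid sentences
    for _ in range(max_iter):
        labels = np.argmin(dist[:, medoids], axis=1)  # assign each sentence to nearest medoid
        new_medoids = medoids.copy()
        for c in range(k):                            # re-pick each medoid inside its cluster
            members = np.where(labels == c)[0]
            if len(members) == 0:
                continue
            within = dist[np.ix_(members, members)].sum(axis=1)
            new_medoids[c] = members[np.argmin(within)]
        if np.array_equal(new_medoids, medoids):      # stop when medoids no longer change
            break
        medoids = new_medoids
    labels = np.argmin(dist[:, medoids], axis=1)
    return medoids, labels

# Toy usage: 6 sentence vectors over a 5-term vocabulary, grouped into 2 clusters.
sentence_vectors = np.array([
    [1, 1, 0, 0, 0],
    [1, 2, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 0, 0, 1, 1],
    [0, 0, 1, 1, 2],
    [0, 0, 0, 2, 1],
], dtype=float)
medoids, labels = k_medoids(cosine_distance_matrix(sentence_vectors), k=2)
print("medoid sentences:", medoids, "cluster labels:", labels)
```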
Abstract (English): In recent years, network technology has developed rapidly and users can obtain information through the Internet, but this creates the problem of information overload, so delivering the important information to users has become essential. Traditional summarization, however, is static: it cannot track a specific topic and update the summary every day. For this reason a damping (forgetting) factor is introduced in this research so that the summary can be updated daily. A topic-term-based approach is used to create the summary for a specific topic, and query-oriented summarization is applied to produce multi-document summaries.
A graph-network clustering architecture is used to analyze the latent semantic relations between sentences. The clustering method is K-Medoids: the similarities between all sentences in the graph network are computed, and the sentences are clustered to uncover the latent semantic relations and raise the quality of the summary.
The experiments use the DUC 2002 data set, with summary quality evaluated by ROUGE, and a second data set of CNN news articles on the Nepal earthquake, the Islamic State, and MERS, to observe whether the resulting summaries achieve topic event tracking. With the K-Medoids clustering architecture, the 50-, 100-, and 200-word multi-document summaries on DUC 2002 reach ROUGE-1 scores of 0.2948, 0.3435, and 0.4375. The 50- and 100-word summaries score higher than those of nearly all DUC 2002 participants, and the 200-word summaries are on par with the participants. The topic event tracking experiment further shows that the proposed system achieves topic-event-tracking summarization.
Keywords: Query-oriented Summarization, Extractive Summarization, K-Medoids, damping factor, Multi-document Summarization, topic event tracking
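The ROUGE-1 figures reported in both abstracts (0.2948, 0.3435, and 0.4375 for the 50-, 100-, and 200-word DUC 2002 summaries) measure unigram overlap between a system summary and human reference summaries. The toy function below sketches ROUGE-1 recall against a single reference only; the official DUC evaluation uses the ROUGE toolkit with stemming options and multiple reference summaries, which this illustration does not reproduce.

```python
# Toy ROUGE-1 recall: overlapping unigrams divided by unigrams in the reference.
from collections import Counter

def rouge1_recall(candidate: str, reference: str) -> float:
    """Unigram recall of the candidate summary against one reference summary."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum(min(cand[w], ref[w]) for w in ref)  # clipped unigram matches
    return overlap / max(sum(ref.values()), 1)

# Example with a single (hypothetical) candidate/reference pair.
print(rouge1_recall(
    "a strong earthquake hit nepal on saturday morning",
    "a powerful earthquake struck nepal on saturday"))
```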
Keywords (Chinese) ★ Query-oriented summarization
★ Extractive summarization
★ K-Medoids
★ Forgetting factor
★ Multi-document summarization
★ Topic event tracking
Keywords (English) ★ Query-oriented Summarization
★ Extractive Summarization
★ K-Medoids
★ damping factor
★ Multi-document Summarization
★ topic event tracking
Table of Contents
Abstract (Chinese)
Abstract (English)
Table of Contents
List of Figures
List of Tables
1. Introduction
1-1 Research Background
1-2 Research Motivation
1-3 Research Objectives
1-4 Thesis Organization
2. Literature Review
2-1 Automatic Document Summarization
2-2 Related Approaches and Differences from This Study
2-3 NGD
2-4 Feature Analysis Methods
2-4-1 1-gram Filtering
2-4-2 Relevance between Document Content and Title
2-4-3 Relevance between Document Content and Topic Keywords
2-4-4 Term Frequency-Inverse Sentence Frequency
2-4-5 Sentence Length
2-5 Vector Similarity Measures
2-6 K-Medoids Sentence Clustering
2-7 Link Scoring Methods
2-7-1 Degree
2-7-2 Strength
2-7-3 K-Core
2-7-4 PageRank
2-7-5 Locality Index
2-8 Borda Count
3. System Architecture
3-1 System Concept and Workflow
3-2 Preprocessing
3-2-1 1-gram Filtering
3-2-2 Candidate Keyword Selection
3-2-2-1 Candidate Keywords
3-2-2-2 Candidate Keyword Vectors
3-2-2-3 Candidate Keyword Active Weight (AW)
3-2-2-4 Computing Candidate Keyword Weights
3-2-3 Sentence Filtering
3-2-4 Updating Candidate Keyword AW Weights
3-2-5 Converting Sentences to Vectors
3-3 Sentence Scoring
3-3-1 K-Medoids Sentence Clustering and Sentence Scoring
3-3-2 Building the Sentence Relation Network
3-3-3 K-Medoids Sentence Clustering Implementation and Cluster Scoring
3-3-4 Link Scoring
3-3-4-1 Single Link Scoring
3-3-4-2 Combined Link Scoring
3-4 Sentence Selection
4. Experimental Design and Results
4-1 Data Sets and Experimental Setup
4-1-1 Experimental Environment
4-2 Summary Evaluation Criteria
4-3 Experimental Procedure
4-4 Experimental Results and Discussion
4-4-1 Experiment 1: Single Link Scoring Evaluation on the DUC 2002 Data Set
4-4-2 Experiment 2: Combined Link Scoring Evaluation on the DUC 2002 Data Set
4-4-3 Experiment 3: Topic Event Tracking Evaluation on the CNN News Data Set
4-4-4 Experiment 4: System Performance Comparison
5. Conclusions and Future Work
5-1 Conclusions
5-2 Future Research Directions
References
Advisor: Shi-Jen Lin (林熙禎)    Approval Date: 2015-7-27