A Method for Applying Term-Relationship Networks to Multi-Document Summarization


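Name

朱家霈 (Chia-Pei Chu)

Department

Department of Information Management

Thesis title

A Method for Applying Term-Relationship Networks to Multi-Document Summarization
(Applying relevance terms on graph-based multiple documents summarization)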


Files

Full text in the repository system: never open to the public

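Abstract (translated from Chinese)

In recent years, the growth of the Internet has sped up the spread of information, and people can receive new information at any time. News in particular is updated very quickly, but when there is too much news and it changes too fast, readers must spend more time reading every article in full to grasp the key points. The purpose of this study is therefore to propose a multi-document summarization method that applies a graph network of term associations within sentences, extracting the key points of the articles so that readers can understand the news in less time. Combinations of terms that frequently appear together in documents may carry information of their own, so this study uses association rules to find terms that frequently co-occur within sentences and treats them as term association items. These items serve as the nodes of a graph network, and graph centrality is used to identify the more important nodes. Each sentence is scored according to the association-rule items it covers, and the highest-scoring sentences are selected as the summary. The study uses the DUC 2004 news collection and runs the Task 2 experiment, producing summaries of 665 bytes, and evaluates summary quality with ROUGE against expert summaries.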
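The pipeline described in this abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation: the abstract does not say how edges are defined, which centrality measure is used, or how terms are weighted, so the choices here are assumptions made only for illustration. Specifically, association items are simplified to term pairs that meet a hypothetical `min_support` threshold, two items are linked when a single sentence covers both, and importance is plain degree centrality. All function names are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def frequent_pairs(sentences, min_support=2):
    """Return term pairs that co-occur in at least min_support sentences.

    Assumption: association items are simplified to frequent 2-term sets.
    """
    counts = Counter()
    for terms in sentences:
        counts.update(combinations(sorted(set(terms)), 2))
    return {pair for pair, n in counts.items() if n >= min_support}


def degree_centrality(sentences, items):
    """Link two items when one sentence covers both; score each item by degree.

    Assumption: the edge rule and the centrality measure are illustrative.
    """
    neighbours = defaultdict(set)
    for terms in sentences:
        term_set = set(terms)
        covered = [p for p in items if set(p) <= term_set]
        for a, b in combinations(covered, 2):
            neighbours[a].add(b)
            neighbours[b].add(a)
    denom = max(len(items) - 1, 1)
    return {p: len(neighbours[p]) / denom for p in items}


def summarize(raw_sentences, max_bytes=665, min_support=2):
    """Pick the highest-scoring sentences that fit in the byte budget."""
    tokenized = [s.lower().split() for s in raw_sentences]
    items = frequent_pairs(tokenized, min_support)
    centrality = degree_centrality(tokenized, items)
    # A sentence's score is the summed centrality of the items it covers.
    scores = [
        sum(c for p, c in centrality.items() if set(p) <= set(terms))
        for terms in tokenized
    ]
    order = sorted(range(len(raw_sentences)), key=lambda i: -scores[i])
    chosen, used = [], 0
    for i in order:
        size = len(raw_sentences[i].encode("utf-8")) + 1  # +1 for a separator
        if scores[i] > 0 and used + size <= max_bytes:
            chosen.append(i)
            used += size
    # Emit the selected sentences in their original order.
    return [raw_sentences[i] for i in sorted(chosen)]
```

Calling `summarize(list_of_sentences)` returns the selected sentences in their original order. The default budget of 665 bytes matches the summary length stated in the abstract; the whitespace tokenizer and the lack of stop-word removal are simplifications.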

Abstract (English)

The Internet has developed quickly and allows information to spread worldwide. However, because information is updated within minutes, people spend a great deal of time reading news. The purpose of this research is therefore to generate an extractive summary that gives readers an overview of the news. We apply association rules to extract relevance terms from the sentences of the documents and use a graph-based method to calculate the scores of the relevance terms and sentences; we then select the highest-scoring sentences to produce a multi-document summary. The experimental results, measured by ROUGE, suggest that applying relevance terms to graph-based multi-document summarization can be effective.
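Since summary quality here is reported as a ROUGE value, a small Python sketch of ROUGE-N recall may help clarify what that number measures: the fraction of the reference summary's n-grams that also appear in the generated summary, with counts clipped. This is a simplified illustration against a single reference, not the official ROUGE package, which also supports stemming, stop-word handling, and multiple references. The function names are hypothetical.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(candidate, reference, n=1):
    """Clipped n-gram overlap divided by the number of reference n-grams."""
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    total = sum(ref.values())
    if total == 0:
        return 0.0
    # Each reference n-gram is matched at most as often as it occurs
    # in the candidate summary.
    overlap = sum(min(count, cand[gram]) for gram, count in ref.items())
    return overlap / total
```

For example, `rouge_n_recall(system_summary, expert_summary, n=2)` gives a bigram-level score between 0 and 1, where higher means the system summary recovers more of the expert summary's wording.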

Keywords

★ Multi-document Summarization
★ Association Rule
★ Graph-based Summarization Method

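Table of Contents

Chinese Abstract
English Abstract
Acknowledgments
Table of Contents
List of Figures
List of Tables
1. Introduction
1-1 Research Background
1-2 Research Motivation
1-3 Research Objectives
1-4 Research Scope and Limitations
1-4-1 Research Scope
1-4-2 Research Limitations
1-5 Thesis Organization
2. Literature Review
2-1 Document Summarization
2-1-1 Classification of Document Summarization
2-1-2 Document Summarization Methods
2-2 Feature-Based Summarization Methods
2-3 Graph-Based Summarization Methods
2-3-1 Applications of Graph-Based Summarization Methods
2-3-2 Association Rules
2-3-3 Cosine Similarity
2-3-4 Graph Centrality Methods
3. Research Method
3-1 System Architecture
3-2 Document Preprocessing
3-2-1 Document Analysis
3-2-2 Computing Term TF-IDF
3-3 Computing Associated Terms
3-4 Term Relationship Network
3-4-1 Building the Term Relationship Network
3-4-2 Scoring Association Items
3-5 Sentence Selection
4. Experimental Analysis and Results
4-1 Experimental Environment
4-2 Experimental Dataset
4-3 Evaluation Metrics
4-4 Experimental Design and Procedure
4-4-1 Procedure for Experiment 1
4-4-2 Procedure for Experiment 2
4-5 Experimental Results
4-5-1 Results of Experiment 1
4-5-2 Results of Experiment 2
4-6 Discussion of Experimental Results
5. Conclusions and Future Research Directions
5-1 Conclusions
5-2 Future Research Directions
References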

Advisor

周世傑 (Shih-Chieh Chou)

Approval date

2017-7-6
