A Multi-Document Automatic Summarization Method Applying Semantic Word Clustering

Name

Li-Tsen Lin (林栗岑)

Department

Department of Information Management

Thesis Title

Applying semantic clustering of words on multiple documents summarization method
(Original title: 應用語意之字詞分群於多文件自動摘要之方法)

Files

Full text: never open to the public

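Abstract (translated from Chinese)

The spread of the internet has changed how we receive information. Information has become easier to obtain, but its ready availability also creates problems: faced with an enormous volume of information, people cannot quickly and effectively find what they need. This study therefore proposes a multi-document automatic summarization method that applies semantic word clustering to automatically identify the key points of documents and produce a summary, so that readers can quickly grasp the content. Documents generally cover many subtopics, so this study uses WordNet to compute the semantic relations between words and applies clustering to find the latent concepts of the documents. The weight of each concept represents its importance to the documents, and sentence scores are obtained by combining sentence word weights, sentence concepts, and sentence position. Finally, sentences that contain important concepts and are rich in information are extracted as the summary. The experiments use the DUC 2004 news document collection for Task 2, producing 665-byte summaries, and evaluate summary quality with ROUGE metrics.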
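As an illustration of the concept extraction step described above, the following Python sketch groups keywords by pairwise similarity and derives a normalized weight for each resulting concept. The hand-written similarity table stands in for a WordNet-based similarity measure, and the greedy threshold grouping, the function names, the word weights, and the threshold value are all assumptions made for illustration; the abstract does not say which clustering algorithm or weighting formula the thesis uses.

```python
from typing import Callable, Dict, List

# Hypothetical pairwise similarities standing in for WordNet-based scores.
TOY_SIM: Dict[frozenset, float] = {
    frozenset(("storm", "hurricane")): 0.9,
    frozenset(("storm", "flood")): 0.6,
    frozenset(("election", "vote")): 0.8,
}


def toy_sim(a: str, b: str) -> float:
    """Return the toy similarity of two words (1.0 for identical words)."""
    if a == b:
        return 1.0
    return TOY_SIM.get(frozenset((a, b)), 0.0)


def cluster_words(
    words: List[str], sim: Callable[[str, str], float], threshold: float
) -> List[List[str]]:
    """Greedy grouping: a word joins the first cluster that already holds
    a word at least `threshold` similar to it, else it starts a new cluster."""
    clusters: List[List[str]] = []
    for word in words:
        for cluster in clusters:
            if any(sim(word, member) >= threshold for member in cluster):
                cluster.append(word)
                break
        else:
            clusters.append([word])
    return clusters


def concept_weights(
    clusters: List[List[str]], word_weight: Dict[str, float]
) -> List[float]:
    """Weight each concept by its share of the total keyword weight."""
    totals = [sum(word_weight[w] for w in cluster) for cluster in clusters]
    grand_total = sum(totals) or 1.0
    return [t / grand_total for t in totals]


words = ["storm", "hurricane", "election", "flood", "vote"]
word_weight = {"storm": 3.0, "hurricane": 2.0, "flood": 1.0,
               "election": 2.0, "vote": 2.0}

clusters = cluster_words(words, toy_sim, threshold=0.5)
weights = concept_weights(clusters, word_weight)
print(clusters)
print(weights)
```

In this toy run the weather-related words and the election-related words end up in separate clusters, and the weather concept receives the larger weight because its keywords carry more total weight.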

Abstract (English)

The spread of the internet has made information faster and easier to obtain. However, it has also created problems: when faced with a huge amount of information, people cannot find what they need quickly and efficiently. We therefore propose a multi-document summarization method that applies semantic clustering of words to automatically identify the important content of documents and give readers a quick overview of the news. A document usually covers several subtopics, so we use WordNet to compute the semantic relationships between words and cluster the words to identify the latent concepts of the documents. We then use the weight of each concept to represent its importance to the documents. Finally, we combine sentence concepts, sentence position, and sentence word weights into a sentence score and output the highest-scoring sentences as the summary. In the experiments, we use the DUC 2004 news document set for Task 2, generate summaries of 665 bytes, and evaluate their quality with ROUGE measures.
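The sentence scoring and selection steps in the abstract can be sketched as follows. This is a minimal illustration, not the thesis's formula: the linear combination, the coefficients `alpha`, `beta`, and `gamma`, the linear position score, the greedy selection, and the example sentences and feature values are all hypothetical. Only the three feature types and the 665-byte limit come from the abstract.

```python
from typing import List


def position_score(index: int, total: int) -> float:
    """Assumed position feature: 1.0 for the first sentence, decreasing linearly."""
    return 1.0 - index / total


def sentence_score(word_score: float, concept_score: float, pos_score: float,
                   alpha: float = 0.4, beta: float = 0.4,
                   gamma: float = 0.2) -> float:
    """Combine the three features with hypothetical weights."""
    return alpha * word_score + beta * concept_score + gamma * pos_score


def select_summary(sentences: List[str], scores: List[float],
                   budget_bytes: int = 665) -> str:
    """Greedily keep the highest-scoring sentences that fit the byte budget,
    then emit them in their original order, separated by single spaces."""
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    chosen: List[int] = []
    used = 0
    for i in ranked:
        # One extra byte for the joining space once a sentence is already chosen.
        size = len(sentences[i].encode("utf-8")) + (1 if chosen else 0)
        if used + size <= budget_bytes:
            chosen.append(i)
            used += size
    return " ".join(sentences[i] for i in sorted(chosen))


sentences = [
    "A storm hit the coast on Monday.",
    "Officials ordered evacuations.",
    "The weather was pleasant last year.",
]
word_scores = [0.9, 0.6, 0.1]      # hypothetical per-sentence word weights
concept_scores = [0.8, 0.7, 0.1]   # hypothetical sentence-concept similarities

scores = [
    sentence_score(w, c, position_score(i, len(sentences)))
    for i, (w, c) in enumerate(zip(word_scores, concept_scores))
]

# A small budget is used here so that the toy example has to drop a sentence.
summary = select_summary(sentences, scores, budget_bytes=70)
print(summary)
```

Counting bytes rather than characters or words mirrors the 665-byte summary length stated in the abstract; the default `budget_bytes=665` applies when no smaller budget is passed.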

Keywords (Chinese)

★ Multi-document summarization (多文件摘要)
★ Extractive summarization (摘錄式摘要)
★ WordNet
★ Concept extraction (概念萃取)

Keywords (English)

★ Multi-document summarization
★ Extract-based summarization
★ WordNet
★ Concept extraction

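Table of Contents

Chinese Abstract
English Abstract
Acknowledgments
Table of Contents
List of Figures
List of Tables
1. Introduction
1-1 Research Background
1-2 Research Motivation
1-3 Research Objectives
1-4 Research Scope and Limitations
1-4-1 Research Scope
1-4-2 Research Limitations
1-5 Thesis Organization
2. Literature Review
2-1 Multi-Document Automatic Summarization
2-1-1 Types of Multi-Document Automatic Summarization
2-1-2 Sentence Feature Analysis Methods
2-2 Concept Extraction
2-2-1 Semantic Approaches
2-2-2 Probabilistic Models
2-3 WordNet
2-3-1 WordNet Structure
2-3-2 WordNet Semantic Similarity
2-4 Clustering Methods
3. Research Method
3-1 System Architecture
3-2 Document Preprocessing
3-3 Document Analysis
3-3-1 Computing Word Weights
3-3-2 Building the Keyword Set
3-4 Concept Extraction
3-4-1 Computing the Word Semantic Similarity Matrix
3-4-2 Semantic Clustering of Keywords
3-4-3 Computing Concept Weights
3-5 Sentence Selection
3-5-1 Computing Sentence-Concept Similarity
3-5-2 Computing Sentence Scores
3-5-3 Outputting Sentences as the Summary
4. Experimental Analysis
4-1 Experimental Environment
4-2 Experimental Dataset
4-3 Summary Evaluation Metrics
4-4 Experimental Parameter Settings
4-4-1 Weight Ratio Optimization
4-4-2 Concept Ratio Adjustment
4-5 Experimental Design and Procedure
4-5-1 Experiment 1 Procedure Design
4-5-2 Experiment 2 Procedure Design
4-6 Experimental Results
4-6-1 Experiment 1 Results
4-6-2 Experiment 2 Results
4-7 Discussion of Experimental Results
5. Conclusion
5-1 Research Conclusions and Contributions
5-2 Future Research Directions
References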

Advisor

Shih-Chieh Chou (周世傑)

Review Date

2017-7-6
