A Study on Automatic Document Summarization Using a Sentence Relationship Network

Name

楊佩臻 (Pei-Chen Yang)

Department

Department of Information Management

Thesis Title

A Study on Automatic Document Summarization Using a Sentence Relationship Network
(Using Sentence Network to Automatic Document Summarization)

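Related Theses

★ A Web-Based Collaborative Teaching Design Platform: The Case of the Grade 1-9 Curriculum at the Junior High School Level
★ Applying Content Management Mechanisms to Frequently Asked Questions (FAQ)
★ Applying Mobile Multi-Agent Technology to Course Scheduling Systems
★ A Study of Access Control Mechanisms and Domestic Information Security Regulations
★ A Study of Introducing NFC Mobile Phone Transactions into Credit Card Systems
★ App-Based Recommendation Services in E-Commerce: The Case of Company P
★ Building a Service-Oriented System to Improve Production Processes: The Case of Company W's PMS System
★ Planning and Deployment of a TSM Platform for NFC Mobile Payment
★ Keyword Marketing for Semiconductor Distributors: The Case of Company G
★ A Study of Domestic Track and Field Competition Information Systems: The Case of the ROC Year 103 (2014) National Intercollegiate Track and Field Open Information System
★ Evaluating the Deployment of a Pallet and Container Tracking Management System for Airline Ground-Handling Ramp Operations: The Case of Company F
★ A Study of Information Security Management Maturity after Adopting an Information Security Management System: The Case of Company B
★ Applying Data Mining Techniques to Movie Recommendation: The Case of Online Video Platform F
★ Using BI Visualization Tools for Security Log Analysis: The Case of Company S
★ An Empirical Study of a Real-Time Analysis System for Privileged Account Login Behavior
★ Detecting and Handling Abnormal Usage Behavior in Mail Systems: The Case of Company T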

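This electronic thesis is authorized for immediate open access.
Once open, the electronic full text is licensed to users only for personal, non-profit searching, reading, and printing for academic research purposes.
Please comply with the Copyright Act of the Republic of China. Do not reproduce, distribute, adapt, repost, or broadcast the work without authorization, to avoid violating the law.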

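Abstract (Chinese)

This study proposes a generic, extractive, graph-based summarization method built on NGD. Because NGD requires only the information in the document itself and the number of results returned by a search engine, it removes the dependence on external resources such as corpora and semantic dictionaries. The study uses NGD to compute the relatedness between the words in a document, identifies the document's keywords, and uses them to build a vector space. A sentence relationship network is then built from the cosine similarity between sentences in that vector space, and link analysis is applied to find the important sentence nodes in the network, which form the summary.
Evaluated with ROUGE, the proposed sentence-network scoring method achieves better results on single-document summaries and on 50-word multi-document summaries than the methods from DUC2001 and DUC2002 that combined machine-learning summaries. On 100-word and 200-word multi-document summaries, it falls only slightly behind a few participants of those years who used machine learning. This shows that the study has established an effective, generic, unsupervised extractive summarization method that does not depend on related corpora or semantic dictionaries.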
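
The abstract relies on NGD without stating its formula. As a minimal sketch, assuming the standard Normalized Google Distance definition of Cilibrasi and Vitanyi and hit counts that have already been obtained from a search engine, the distance between two terms could be computed as follows. The function name `ngd` and its parameters are illustrative and are not taken from the thesis.

```python
import math


def ngd(fx: int, fy: int, fxy: int, n: int) -> float:
    """Normalized Google Distance computed from search hit counts.

    fx, fy -- number of pages containing term x and term y (assumed inputs)
    fxy    -- number of pages containing both terms
    n      -- total number of pages indexed by the search engine
    """
    # Terms that never occur, or never co-occur, are treated as maximally distant.
    if fx <= 0 or fy <= 0 or fxy <= 0:
        return float("inf")
    if n <= min(fx, fy):
        raise ValueError("n must exceed the smaller of the two hit counts")

    log_fx, log_fy, log_fxy = math.log(fx), math.log(fy), math.log(fxy)
    numerator = max(log_fx, log_fy) - log_fxy
    denominator = math.log(n) - min(log_fx, log_fy)
    return numerator / denominator
```

A value near 0 means the two terms almost always appear together, and larger values mean they are less related. How the thesis turns these distances into keywords (for example, the NGD threshold examined in Experiment 2) is not specified in this record, so that step is left out of the sketch.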

Abstract (English)

This paper proposes a graph-based summarization method that builds a sentence network representing the relations between sentences using NGD. By using only the words in the documents and search result counts, the method removes the dependence on external resources such as corpora and lexical databases. A Wiki engine is used to calculate NGD and find the relations between words, from which the keywords of the documents are identified. A vector space model is built from the keywords, and the similarity between sentences is calculated to build a sentence network. The most important sentences are then extracted using link analysis. The experimental results show that the ROUGE score of the proposed graph-based single-document summarization method is better than that of other machine learning methods, and that the ROUGE score of the proposed graph-based multi-document summarization method is lower than that of only a few peers using machine learning methods. This shows that the proposed method is an effective unsupervised document summarization method that needs no external resources such as corpora or lexical databases.
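
The pipeline in the abstract (keyword vectors, cosine similarity, a sentence network, link analysis) can be illustrated with a small sketch. It rests on several assumptions that are not confirmed by this record: the keywords are supplied as input rather than derived from NGD, sentences are tokenized by whitespace, edges are created when cosine similarity meets a threshold, and a simplified PageRank stands in for the unspecified link analysis. All function names are hypothetical.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def sentence_vectors(sentences, keywords):
    """Represent each sentence by the count of each keyword it contains."""
    vectors = []
    for sentence in sentences:
        counts = Counter(sentence.lower().split())
        vectors.append([counts[k] for k in keywords])
    return vectors


def build_network(vectors, threshold):
    """Link two sentence nodes when their cosine similarity meets the threshold."""
    adj = {i: set() for i in range(len(vectors))}
    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            if cosine(vectors[i], vectors[j]) >= threshold:
                adj[i].add(j)
                adj[j].add(i)
    return adj


def pagerank(adj, damping=0.85, iterations=50):
    """Simplified PageRank on an undirected graph (assumed stand-in for link analysis).

    Isolated nodes only receive the teleport share, so scores need not sum to 1.
    """
    n = len(adj)
    score = {i: 1.0 / n for i in adj}
    for _ in range(iterations):
        score = {
            i: (1 - damping) / n
            + damping * sum(score[j] / len(adj[j]) for j in adj[i])
            for i in adj
        }
    return score


def summarize(sentences, keywords, threshold=0.3, k=2):
    """Return the k highest-scoring sentences in their original order."""
    vectors = sentence_vectors(sentences, keywords)
    scores = pagerank(build_network(vectors, threshold))
    top = sorted(scores, key=scores.get, reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]
```

For example, calling `summarize(sentences, keywords, threshold=0.6, k=1)` on a handful of sentences returns the single sentence that is most strongly connected to the others in the network. The threshold value plays the role of the cosine similarity threshold studied in Experiment 3, although the values used in the thesis are not given here.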

Keywords (Chinese)

★ Automatic document summarization
★ Sentence relationship network
★ Graph-based summarization method

Keywords (English)

★ NGD
★ Graph-based Summarization

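Table of Contents

Abstract (Chinese)
Abstract (English)
Table of Contents
List of Figures
List of Tables
1. Introduction
1-1 Research Background
1-2 Research Motivation
1-3 Research Objectives
1-4 Thesis Organization
2. Literature Review
2-1 Automatic Document Summarization and Its Types
2-2 Sentence-Feature Summarization Methods
2-2-1 Document Title
2-2-2 Sentence Length
2-2-3 Sentence Position
2-2-4 Numerical Data
2-2-5 Thematic Words
2-2-6 Cue Phrases
2-2-7 Summary
2-3 Supervised Summarization Methods
2-3-1 Support Vector Machines
2-3-2 Hidden Markov Models
2-3-3 Summary
2-4 Graph-Based Summarization Methods
2-4-1 Link Analysis
2-4-2 Complex Network Graphs
2-4-3 Cosine Similarity
2-4-4 NGD
2-4-5 Summary
2-5 Rank Combination Methods
2-5-1 Simple Combination Methods
2-5-2 Stability Selection
2-5-3 Exponential Weighting
2-5-4 Borda Count
2-5-5 Round Robin
2-5-6 Summary
3. System Design and Architecture
3-1 System Architecture
3-2 Document Preprocessing
3-2-1 Parsing the Original DUC Files
3-2-2 Parsing Document Content
3-2-3 Part-of-Speech Combinations
3-2-4 Word Length
3-2-5 Number of Wiki Search Results
3-2-6 Relatedness to the Document Title
3-3 Sentence Importance Scoring
3-3-1 Converting Sentences to Vectors
3-3-2 Building the Sentence Relationship Network
3-3-3 Scoring Sentence Nodes
3-4 Ranking Sentences by Score
3-4-1 Single-Method Ranking
3-4-2 Rank Combination
3-5 Generating the Document Summary
4. Experimental Results and Discussion
4-1 Datasets
4-1-1 DUC
4-1-2 Dataset Contents
4-2 Evaluation Criteria
4-3 Experimental Environment
4-4 Experimental Results and Discussion
4-4-1 Experiment 1: Comparison of the Sentence Network Summarization Method on Google and Wiki
4-4-2 Experiment 2: NGD Threshold for Title-Related Words
4-4-3 Experiment 3: Cosine Similarity Threshold for Creating Node Links
4-4-4 Experiment 4: Summarization Performance of Single Methods
4-4-5 Experiment 5: Summary Quality Evaluation of the Proposed Method
5. Conclusions and Future Work
5-1 Conclusions
5-2 Future Work
References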

Advisor

林熙禎 (Shi-Jen Lin)

Approval Date

2013-7-16
