Automatic Text Summarization Using Relationship between Words and Sentences

Name

Yin-Tung Kuo (郭映彤)

Department

Department of Information Management

Thesis Title

Automatic Text Summarization Using Relationship between Words and Sentences
(Original title: 運用字詞與語句關係自動萃取文件摘要之研究)

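Related Theses

★ A Web-Based Collaborative Team-Teaching Design Platform: The Case of the Grade 1-9 Curriculum in Junior High Schools
★ Applying Content Management Mechanisms to Frequently Asked Questions (FAQ)
★ Applying Mobile Multi-Agent Technology to a Course Scheduling System
★ A Study of Access Control Mechanisms and Domestic Information Security Regulations
★ A Study of Introducing an NFC Mobile Phone Transaction Mechanism into a Credit Card System
★ App-Based Recommendation Services in E-Commerce: The Case of Company P
★ Building a Service-Oriented System to Improve Production Processes: The Case of Company W's PMS System
★ Planning and Deployment of a TSM Platform for NFC Mobile Payment
★ Keyword Marketing for a Semiconductor Distributor: The Case of Company G
★ A Study of Domestic Track and Field Competition Information Systems: The Case of the 2014 National Intercollegiate Track and Field Open Information System
★ Evaluating the Effectiveness of Introducing a Pallet and Container Tracking Management System for Airline Ground-Handling Ramp Operations: The Case of Company F
★ A Study of Information Security Management Maturity after Adopting an Information Security Management System: The Case of Company B
★ Applying Data Mining Techniques to Movie Recommendation: The Case of Online Video Platform F
★ Using BI Visualization Tools for Security Log Analysis: The Case of Company S
★ An Empirical Study of a Real-Time Analysis System for Privileged Account Login Behavior
★ Detecting and Handling Anomalous Usage Behavior in a Mail System: The Case of Company T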

Files

Full text: browse in the system (never open to the public)

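Abstract (Chinese)

This study uses NGD (Normalized Google Distance) to build two summarization methods: a feature-based method that relies on a word relationship network, and a graph-based method that relies on a sentence cohesion network. Because computing NGD requires only the words contained in the document itself and their Google search result counts, the approach removes any dependence on domain-specific corpora and word-association dictionaries. The results of the two methods are then combined by an unsupervised preferential-voting scheme to produce a final summary that reflects the consensus of both. Evaluated with ROUGE, the proposed feature-based method, which scores sentences with the word relationship network, outperforms TF-IDF scoring based on word statistics. The sentence cohesion network method and the overall rank-score combination method perform only slightly worse than a DUC 2002 system that combined summaries using machine learning. These results show that the study establishes an effective unsupervised, single-document, extractive summarization method that does not depend on related corpora or semantic-relation dictionaries.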
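The NGD computation mentioned in the abstract can be sketched as follows. This is a minimal illustration of the standard Normalized Google Distance formula of Cilibrasi and Vitanyi, not the thesis's own code. The function name `ngd`, its parameter names, and the hit counts in the usage note are all hypothetical, and how the counts are obtained from a search engine is left out.

```python
import math


def ngd(fx, fy, fxy, n):
    """Normalized Google Distance from search hit counts.

    fx, fy : hit counts for each term alone (assumed inputs)
    fxy    : hit count for pages containing both terms
    n      : assumed total number of indexed pages (normalizing constant)

    NGD(x, y) = (max(log fx, log fy) - log fxy)
                / (log n - min(log fx, log fy))
    """
    # A zero count makes the logarithm undefined; treat the terms as unrelated.
    if fx <= 0 or fy <= 0 or fxy <= 0:
        return math.inf
    lx, ly, lxy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(lx, ly) - lxy) / (math.log(n) - min(lx, ly))
```

For example, `ngd(100, 100, 10, 10**4)` evaluates to about 0.5, and two terms that always occur together, as in `ngd(1000, 1000, 1000, 10**6)`, get distance 0. Smaller values indicate more closely related terms, which is what allows an edge between two words to be kept or dropped by comparing the distance with a threshold.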

Abstract (English)

This study proposes a feature-based and a graph-based summarization method. Both build graphs that represent the text, with the connections among words and among sentences measured by NGD. The methods do not rely on a text corpus or a lexical database, because NGD is calculated using only the words in the document and the Google search result counts of word pairs. We also propose an aggregate method that combines the results of the two summarization methods to generate better summaries. The experimental results show that the ROUGE value of the proposed feature-based summarization method is higher than that of a feature-based summarization method using TF-IDF. The ROUGE values of the proposed graph-based and aggregate summarization methods are only slightly lower than those of one of the DUC 2002 peer systems. These results show that the proposed approach is an effective unsupervised single-document summarization method that needs neither a text corpus nor a lexical database.
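The aggregation step can be illustrated with a small rank-combination sketch. The abstract says only that the two rankings are combined by an unsupervised preferential-voting method. The Borda count used here is one well-known preferential-voting scheme chosen as a stand-in, and it may differ from the scheme and thresholds used in the thesis. The function name `borda_combine` and the sample rankings are hypothetical.

```python
def borda_combine(rankings):
    """Combine several sentence rankings with a Borda count.

    rankings: list of rankings, each a list of sentence ids, best first.
    Returns the sentence ids ordered by total Borda points, highest first;
    ties are broken by the smaller sentence id.
    """
    scores = {}
    for ranking in rankings:
        n = len(ranking)
        for position, sentence_id in enumerate(ranking):
            # The top-ranked sentence earns n points, the last earns 1.
            scores[sentence_id] = scores.get(sentence_id, 0) + (n - position)
    return sorted(scores, key=lambda s: (-scores[s], s))


# Hypothetical rankings from a feature-based and a graph-based scorer.
feature_ranking = [2, 0, 1, 3]
graph_ranking = [2, 3, 0, 1]
combined = borda_combine([feature_ranking, graph_ranking])

# Keep the two best sentences and restore document order for the summary.
summary_ids = sorted(combined[:2])
```

With these sample rankings, sentence 2 leads both lists and stays first in the combined order, so the extracted summary consists of sentences 0 and 2 in their original order.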

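Keywords (Chinese)

★ Word relationship network
★ Automatic text summarization
★ Sentence relationship network
★ Graph-based summarization method

Keywords (English)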

★ Graph-based summarization
★ NGD
★ Single-document summarization

Table of Contents

1. Introduction
1-1 Research Background
1-2 Research Motivation
1-3 Research Objectives
1-4 Research Method
1-5 Thesis Organization
2. Literature Review
2-1 Automatic Text Summarization and Its Types
2-2 Feature-Based Summarization Methods
2-2-1 Term Frequency
2-2-2 Term Frequency and Inverse Document Frequency (TF-IDF)
2-2-3 Clue Words
2-2-4 Sentence Position and Length
2-2-5 Summary
2-3 Supervised Approaches
2-3-1 Naive Bayes Classifier
2-3-2 Support Vector Machines
2-3-3 Summary
2-4 Graph-Based Summarization
2-4-1 Contextual Overlap
2-4-2 Cosine Similarity
2-4-3 Lexical Cohesion
2-4-4 WordNet
2-4-5 Term Co-occurrence
2-4-6 NGD (Normalized Google Distance)
2-4-7 Graph-Based Ranking Algorithms
2-4-8 Complex Network Measurements
3. System Design and Architecture
3-1 System Architecture
3-2 Document Preprocessing
3-2-1 Part-of-Speech Combination
3-2-2 Word Length
3-2-3 Google Search Result Count
3-3 Sentence Importance Scoring
3-3-1 Feature-Based Sentence Scoring with the Word Relationship Network
3-3-2 Graph-Based Scoring with the Sentence Cohesion Network
3-4 Combining Sentence Score Rankings
3-5 Generating the Document Summary
4. Experimental Results and Discussion
4-1 Dataset
4-2 Evaluation Criteria
4-3 Experimental Environment
4-4 Results and Discussion
4-4-1 Experiment 1: NGD Threshold for Node Relationships
4-4-2 Experiment 2: Summarization Performance of Each Single Method
4-4-3 Experiment 3: The Word Relationship Network (wNET) as a Feature-Based Summarization Method
4-4-4 Experiment 4: Threshold for the Rank Combination Method
4-4-5 Experiment 5: Summary Quality Evaluation of the Proposed Method
5. Conclusions and Future Work
5-1 Conclusions
5-2 Future Work
References

Advisor

Shi-Jen Lin (林熙禎)

Approval Date

2012-7-19
