Name: 郭映彤 (Yin-Tung Kuo)   Department: Department of Information Management
Thesis title: A Study of Automatically Extracting Document Summaries Using Relationships between Words and Sentences
(Automatic Text Summarization Using Relationship between Words and Sentences)
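Abstract (Chinese): This study uses NGD to build two methods. The first is a sentence-feature summarization method based on a word relationship network. The second is a graph-based summarization method based on a sentence cohesion network. Computing NGD requires only the words contained in the document itself and the number of Google search results. As a result, the methods do not depend on domain-specific datasets or on word-association dictionaries. The results of the two methods are then combined with an unsupervised preference-voting method, which produces a final summary that reflects the consensus of the methods. Summary quality was evaluated with ROUGE. The proposed sentence-feature method, which scores sentences using the word relationship network, outperforms TF-IDF scoring based on word statistics. The sentence cohesion network method and the overall rank-score aggregation method perform only slightly worse than a DUC 2002 method that combined summaries using machine learning. This shows that the study has built an effective unsupervised, single-document, extractive summarization method that does not rely on related corpora or semantic-relation dictionaries.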
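The abstract states that NGD needs only the document's own words and Google hit counts. The standard NGD definition (Cilibrasi and Vitányi) makes this concrete: NGD(x, y) = (max{log f(x), log f(y)} − log f(x, y)) / (log N − min{log f(x), log f(y)}). The sketch below is a minimal illustration under stated assumptions and is not the thesis's code. The function name `ngd`, the handling of zero counts, and the value of `n` are all assumptions. Here `n` stands for the total number of pages indexed by the search engine.

```python
import math


def ngd(f_x: int, f_y: int, f_xy: int, n: float) -> float:
    """Normalized Google Distance between terms x and y (illustrative sketch).

    f_x, f_y: hit counts for x alone and y alone.
    f_xy:     hit count for a query containing both x and y.
    n:        ASSUMED total number of indexed pages (a normalizing constant).
    """
    if f_x <= 0 or f_y <= 0 or f_xy <= 0:
        # Assumption: terms that never (co-)occur are treated as infinitely distant.
        return math.inf
    lx, ly, lxy = math.log(f_x), math.log(f_y), math.log(f_xy)
    # Smaller values mean x and y tend to appear on the same pages.
    return (max(lx, ly) - lxy) / (math.log(n) - min(lx, ly))
```

Take hypothetical counts f(x) = f(y) = 100, f(x, y) = 10 and n = 10^6. Then `ngd(100, 100, 10, 1e6)` evaluates to 0.25, up to floating-point rounding. This record does not state how the thesis turns these distances into edge weights of the word or sentence networks.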
Abstract (English): This study proposes a feature-based and a graph-based summarization method. Both build graphs that represent the text and the interconnections between words and sentences using NGD (Normalized Google Distance). The methods remove the reliance on text corpora and lexical databases, because only the words in the document and the Google search result counts of word pairs are used to calculate NGD. We also propose an aggregation method that combines the results of the two summarization methods to generate better summaries. The experimental results show that the ROUGE score of the proposed feature-based method is higher than that of a feature-based method using TF-IDF. The ROUGE scores of the proposed graph-based and aggregation methods are only slightly lower than those of one of the DUC 2002 peer systems. These results show that the study provides an effective unsupervised single-document summarization method that uses neither a text corpus nor a lexical database.
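The aggregation step combines the sentence rankings of the feature-based and graph-based methods through an unsupervised preference vote. The abstract does not name the voting rule. The sketch below therefore uses a plain Borda count as an assumed stand-in. The function `borda_aggregate` and its input format are hypothetical.

```python
from collections import defaultdict


def borda_aggregate(rankings: list[list[int]], k: int) -> list[int]:
    """Combine several sentence rankings with a Borda count (assumed voting rule).

    Each ranking lists sentence ids from best to worst. A sentence at
    position p in a ranking of length m earns m - p points.
    Returns the ids of the k highest-scoring sentences, in document order.
    """
    scores: dict[int, int] = defaultdict(int)
    for ranking in rankings:
        m = len(ranking)
        for pos, sid in enumerate(ranking):
            scores[sid] += m - pos
    # Sort by total points (descending); break ties by the smaller sentence id.
    best = sorted(scores, key=lambda s: (-scores[s], s))[:k]
    # Extractive summaries usually keep the original sentence order.
    return sorted(best)
```

Consider a feature-based ranking [2, 0, 3, 1] and a graph-based ranking [0, 2, 1, 3]. Sentences 0 and 2 each earn 7 points, so `borda_aggregate([[2, 0, 3, 1], [0, 2, 1, 3]], 2)` returns [0, 2]. A Borda count needs no training data, which fits the unsupervised setting described in the abstract.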
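Keywords (Chinese): ★ Word relationship network
★ Automatic text summarization
★ Sentence relationship network
★ Graph-based summarization method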
Keywords (English): ★ Graph-based summarization
★ Single-document summarization
Table of Contents:
1 Introduction
2-1 Automatic Text Summarization and Its Types
2-2-1 Term Frequency
2-2-3 Clue Words
2-3 Supervised Approaches
2-3-1 Naive Bayes Classifier
2-3-2 Support Vector Machines
2-4 Graph-Based Summarization
2-4-1 Contextual Overlap
2-4-2 Cosine Similarity
2-4-3 Lexical Cohesion
2-4-5 Term Co-occurrence
2-4-6 NGD (Normalized Google Distance)
2-4-7 Graph-Based Ranking Algorithm
2-4-8 Complex Network Measurements
3-2-1 Part-of-Speech Combination
3-2-2 Word Length
3-2-3 Number of Google Search Results
4-4-3 Experiment 3: Word Relationship Network (wNET) as a Sentence-Feature Summarization Method
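Section 2-4-7 covers graph-based ranking algorithms. The sketch below illustrates the general technique. It scores sentences on a weighted, undirected sentence graph with a PageRank-style power iteration, in the manner of TextRank. Several details are assumptions, and the thesis's exact formulation may differ:
- the edge weights (for example, a similarity derived from NGD or cosine similarity);
- the damping factor of 0.85;
- the convergence settings;
- the function name `weighted_pagerank`.

```python
def weighted_pagerank(w: list[list[float]], d: float = 0.85,
                      tol: float = 1e-8, max_iter: int = 100) -> list[float]:
    """PageRank-style scores for the nodes of a weighted, undirected graph.

    w[i][j] >= 0 is the (assumed) similarity weight between sentences i and j,
    with w[i][i] == 0. Returns one score per sentence; higher means more central.
    """
    n = len(w)
    out = [sum(row) for row in w]  # total edge weight leaving each node
    scores = [1.0 / n] * n
    for _ in range(max_iter):
        new = []
        for i in range(n):
            # Each neighbour j passes on its score in proportion to edge weight.
            # Nodes with no edges (out[j] == 0) pass nothing on.
            rank = sum(w[j][i] / out[j] * scores[j] for j in range(n) if out[j] > 0)
            new.append((1 - d) / n + d * rank)
        if sum(abs(a - b) for a, b in zip(new, scores)) < tol:
            return new
        scores = new
    return scores
```

Take a star graph `[[0, 1, 1], [1, 0, 0], [1, 0, 0]]`, where sentence 0 links to sentences 1 and 2. Sentence 0 gets the highest score (about 0.486), and the two leaves tie. The top-scoring sentences would then be selected for the extract.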
Advisor: 林熙禎 (Shi-Jen Lin)   Approval date: 2012-7-19
