Name Han-Wen Chi (紀涵文)   Department Department of Information Management
Thesis Title Segment Similarity Based on Text Similarity: A Case Study of Four Gospels
(以文本相似度為基礎的段落相似度分析:聖經四福音書之案例研究)
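Abstract (Chinese) Text mining applies data mining methods to the textual data in documents. These analyses reveal the relationships between texts, which in turn support classification, comparison, and discrimination. Over the past decade, search engines have risen to prominence, and text mining techniques have been applied more effectively, creating new business value. As the Internet changes rapidly, the accumulation of online data has accelerated the development of search engines and rewritten the long-standing rules of data retrieval.
Text similarity assigns weights (also called distances) between text types, computes the degree of similarity between them, and aggregates and compares the results to obtain information, classifications, or binary judgments. With this approach, large numbers of article passages can be analyzed to extract valuable information.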
Abstract (English) Text mining is the analysis of textual data in documents using data mining methods. Its main purpose is to obtain the relevance between texts and, through these analyses, to support classification, comparison, and discrimination. Over the past decade, search engines have emerged, and text mining techniques have been applied more effectively to create new business value. With the ever-changing Internet, the accumulation of information on the network has accelerated the development of search engines and substantially changed data retrieval.
Text similarity is the degree of similarity between text types, calculated by assigning weights (distances) between them. The resulting similarities are aggregated and compared to obtain information, classifications, or binary judgments, so that analyzing a large quantity of articles yields valuable information.
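As an illustration of weighting-based similarity, the sketch below computes cosine similarity over raw term-frequency vectors. This is a generic technique chosen for illustration, not necessarily the measure used in the thesis; the function name `cosine_similarity` and the whitespace tokenization are assumptions.

```python
import math
from collections import Counter


def cosine_similarity(text_a: str, text_b: str) -> float:
    """Cosine similarity between term-frequency vectors of two texts.

    Assumption: tokens are lowercase whitespace-separated words; a real
    pipeline would add stop-word removal, stemming, or other weighting.
    """
    vec_a = Counter(text_a.lower().split())
    vec_b = Counter(text_b.lower().split())
    # Dot product over the terms the two texts share.
    dot = sum(vec_a[t] * vec_b[t] for t in vec_a.keys() & vec_b.keys())
    norm_a = math.sqrt(sum(c * c for c in vec_a.values()))
    norm_b = math.sqrt(sum(c * c for c in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text is treated as similar to nothing
    return dot / (norm_a * norm_b)
```

The score ranges from 0 (no shared terms) to 1 (identical term distributions), so scores from different sentence pairs can be ranked against each other.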
In this research, we propose a new method of similarity calculation. We treat any run of consecutive sentences in a document as a segment. The segment is compared with the other sentences to obtain scores, and the similar target segment in the same document is found from the rank and distribution of those scores. We use the four Gospels of the Bible as a case study. The case studies demonstrate the operation of the algorithm and the expected results.
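The record does not spell out the algorithm, so the following is only a minimal sketch of the general idea: score sentences against an input segment, group high-scoring neighbors into candidate segments, and pick one as the target. Every specific here is an assumption made for illustration: the Jaccard word-overlap score, the best-match scoring rule, the `threshold` and `max_gap` parameters, and all function names.

```python
from typing import Dict, List, Optional, Tuple


def jaccard(a: str, b: str) -> float:
    """Word-set overlap between two sentences (illustrative scoring only)."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def score_sentences(sentences: List[str], seg_start: int, seg_end: int) -> Dict[int, float]:
    """Score each sentence outside [seg_start, seg_end) by its best match
    inside the input segment. Assumes the segment is non-empty."""
    segment = sentences[seg_start:seg_end]
    return {
        i: max(jaccard(s, q) for q in segment)
        for i, s in enumerate(sentences)
        if not (seg_start <= i < seg_end)
    }


def candidate_segments(scores: Dict[int, float], threshold: float,
                       max_gap: int = 1) -> List[Tuple[int, int]]:
    """Group sentence indices scoring >= threshold into runs, tolerating up
    to max_gap low-scoring sentences between neighbors."""
    hits = sorted(i for i, sc in scores.items() if sc >= threshold)
    groups: List[List[int]] = []
    for i in hits:
        if groups and i - groups[-1][-1] <= max_gap + 1:
            groups[-1].append(i)
        else:
            groups.append([i])
    return [(g[0], g[-1]) for g in groups]


def best_target(scores: Dict[int, float],
                candidates: List[Tuple[int, int]]) -> Optional[Tuple[int, int]]:
    """Pick the candidate (inclusive index range) with the highest total score."""
    if not candidates:
        return None
    return max(candidates,
               key=lambda c: sum(scores.get(i, 0.0) for i in range(c[0], c[1] + 1)))
```

In a setting like the one described, `sentences` would hold the verses of a document, the input segment would be a passage such as Matthew 10:1-16, and the returned index range would point at a candidate parallel passage. The threshold and gap tolerance would need tuning against the observed score distribution.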
Keywords (Chinese) ★ Text similarity
★ Segment similarity
★ Bible verses
Keywords (English) ★ Text Similarity
★ Segment Similarity
★ Bible
★ Latent semantic analysis (LSA)
Table of Contents 1. Introduction
1.1 Research Background and Motivation
1.2 Research Objectives
1.3 Thesis Structure
2. Literature Review
2.1 Data Preprocessing
2.2 Feature Selection
2.3 Vector Construction
2.4 Dimensionality Reduction
2.5 Similarity Computation
2.6 Sentence Similarity
2.7 Word Similarity
3. Research Method
3.1 Research Data
3.2 Stage 1: Data Preprocessing
3.3 Stage 2: Computing Similarity
3.4 Stage 3: Computing Input Segment Similarity
3.5 Stage 4: Finding Verse Clusters as Candidate Segments
3.6 Stage 5: Obtaining the Target Segment
4. Cases
4.1 Case 1: Matthew 10:1-16
4.2 Case 2: Mark 10:46-52
4.3 Evaluation Metrics
5. Conclusion
6. References
Advisor Yen-Liang Chen (陳彥良) Approval Date 2017-7-25
