Author 顏微珊 (Wei-San Yen)  Department Department of Information Management
Thesis Title Improving Performance of Link Prediction with Ensemble Model
(以Ensemble Model改善鏈結預測準確度)
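Abstract (Chinese) Link prediction is widely applied in fields such as recommender systems, information retrieval, and even bioinformatics. Its core idea is to find the relationships between entities. Link prediction makes it possible to infer the complete shape of a network, for example by finding missing links and uncovering network information that has not yet been discovered.
Most link prediction research centers on computing the similarity between two nodes. Similarity measures can be further divided into global indices (global information) and local indices (local information). Methods based on global information predict better, but because they must search the entire network they consume excessive resources. Methods based on local information are simple and easy to use and can be applied to large real-world social networks, but because the computation is simple their predictions are less satisfactory.
To address this problem, this thesis exploits the low complexity of local information and proposes an ensemble model that combines common local information measures with classifiers, assigning each classifier a weight corresponding to its predictive strength in order to improve prediction accuracy. Experimental results show that the ensemble model outperforms any single local information similarity measure for link prediction, and that its performance does not differ significantly from that of the global information measures in the prior literature.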
Abstract (English) Link prediction has been widely used in recommender systems, information retrieval, and even bioinformatics. The key concept in link prediction is to find the connections between two nodes. Studying link prediction lets us see the whole picture of a network: finding missing links and uncovering undiscovered information are two examples.

Most studies of link prediction focus on calculating the similarity between nodes in the network. These similarity measures fall into two groups: those based on global information and those based on local information. Although measures based on global information predict considerably better, they are more time-consuming than measures based on local information. Because of its simplicity, local information is suitable for large and complex networks. As networks extracted from real life become more complex than they used to be, local information is the more practical choice.
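
To make the idea of a local information measure concrete, the following minimal Python sketch computes three widely used local similarity indices (common neighbors, Jaccard, and Adamic-Adar) for a pair of nodes. The abstract does not say which indices the thesis uses, so the choice of these three, the toy graph `GRAPH`, and all function names are assumptions made for illustration only.

```python
import math

# Toy undirected graph as an adjacency map (hypothetical data, not from the thesis).
GRAPH = {
    "a": {"b", "c"},
    "b": {"a", "c", "d"},
    "c": {"a", "b", "d"},
    "d": {"b", "c", "e"},
    "e": {"d"},
}


def common_neighbors(graph, u, v):
    """Number of neighbors shared by u and v."""
    return len(graph[u] & graph[v])


def jaccard(graph, u, v):
    """Shared neighbors divided by the size of the combined neighborhood."""
    union = graph[u] | graph[v]
    return len(graph[u] & graph[v]) / len(union) if union else 0.0


def adamic_adar(graph, u, v):
    """Sum of 1 / log(degree) over shared neighbors.

    A shared neighbor of two distinct nodes has degree >= 2, so the
    logarithm is always positive here.
    """
    return sum(1.0 / math.log(len(graph[z])) for z in graph[u] & graph[v])


if __name__ == "__main__":
    # Score the unlinked pair (a, d) using only their immediate neighborhoods.
    print(common_neighbors(GRAPH, "a", "d"))
    print(jaccard(GRAPH, "a", "d"))
    print(adamic_adar(GRAPH, "a", "d"))
```

Each score depends only on the neighborhoods of the two nodes, which is why such measures stay cheap on large networks, whereas a global measure would need to traverse the whole graph.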

Building on the low complexity of local information, this thesis proposes an ensemble model that combines common local information similarity measures with classifiers, giving each classifier a different weight based on its predictive strength. Our experimental results show that the ensemble model improves link prediction performance over any single local information similarity measure. Although the ensemble model cannot completely outperform global information, the difference is not significant, which means that acceptable performance can be reached with fewer computational resources.
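
One simple way to weight classifiers by predictive strength is an accuracy-weighted vote, sketched below in Python. This is not the thesis's actual two-stage design, which the abstract does not detail. The threshold predictors, the feature names `cn`, `jaccard`, and `aa`, and the validation samples are all hypothetical.

```python
def accuracy(predict, samples):
    """Fraction of (features, label) samples that predict() gets right."""
    correct = sum(1 for features, label in samples if predict(features) == label)
    return correct / len(samples)


def fit_weights(predictors, validation):
    """Weight each base predictor by its accuracy on held-out validation data."""
    return [accuracy(p, validation) for p in predictors]


def weighted_vote(predictors, weights, features):
    """Predict a link (1) when the 'yes' votes carry more than half the total weight."""
    yes = sum(w for p, w in zip(predictors, weights) if p(features) == 1)
    return 1 if yes > sum(weights) / 2 else 0


# Hypothetical base predictors: each thresholds one local similarity score.
def by_common_neighbors(f):
    return 1 if f["cn"] >= 2 else 0


def by_jaccard(f):
    return 1 if f["jaccard"] >= 0.5 else 0


def by_adamic_adar(f):
    return 1 if f["aa"] >= 1.0 else 0


PREDICTORS = [by_common_neighbors, by_jaccard, by_adamic_adar]

# Hypothetical validation pairs: (similarity scores, 1 if the link exists).
VALIDATION = [
    ({"cn": 3, "jaccard": 0.6, "aa": 1.8}, 1),
    ({"cn": 2, "jaccard": 0.3, "aa": 0.9}, 0),
    ({"cn": 1, "jaccard": 0.5, "aa": 1.2}, 1),
    ({"cn": 0, "jaccard": 0.0, "aa": 0.0}, 0),
]

WEIGHTS = fit_weights(PREDICTORS, VALIDATION)

if __name__ == "__main__":
    candidate = {"cn": 1, "jaccard": 0.6, "aa": 1.1}
    print(WEIGHTS, weighted_vote(PREDICTORS, WEIGHTS, candidate))
```

In this toy setup the common-neighbors predictor is right on only half of the validation pairs, so it receives a lower weight and the two stronger predictors can outvote it.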
Keywords (Chinese) ★ link prediction (鏈結預測)
★ similarity calculation (相似度計算)
★ ensemble model (整合模型)
Keywords (English) ★ link prediction
★ similarity
★ ensemble model
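Table of Contents Abstract (Chinese) i
Abstract ii
Contents iii
List of Figures v
List of Tables vi
1. Introduction 1
1-1 Research Background 1
1-2 Research Motivation 1
1-3 Research Objectives 2
2. Literature Review 4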
2-1 Link prediction 4
2-2 Classifiers 6
2-2-1 Decision Tree 6
2-2-2 Naïve Bayes 7
2-2-3 K Nearest Neighbors 7
2-2-4 Random Forest 8
2-2-5 Support Vector Machines 8
2-2-6 Logistic Regression 9
2-3 Ensemble Model 9
2-3-1 Bagging 10
2-3-2 Boosting 12
2-3-3 Stacking 13
3. Method Design 14
3-1 Local Information Index 15
3-2 Global Information Index 20
3-3 Classifiers setting 24
3-3-1 Stage 1 classifiers 25
3-3-2 Stage 2 classifiers 25
4. Experimental Design 29
4-1 Data Collection 30
4-2 Experimental Result 31
5. Conclusion and Discussion 35
5-1 Research Limitations 35
5-2 Conclusion 35
5-3 Future Research Directions 37
References 38
Advisor 陳彥良  Approval Date 2019-6-19
