Improving Performance of Link Prediction with Ensemble Model

Name

顏微珊 (Wei-San Yen)

Department

Department of Information Management

Thesis Title

Improving Performance of Link Prediction with Ensemble Model
(以Ensemble Model改善鏈結預測準確度)

Full text: available for viewing in the system after 2029-6-18

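Abstract (translated from Chinese)

Link prediction is widely applied in fields such as recommender systems, information retrieval, and even bioinformatics. Its core idea is to identify relationships between entities. Through link prediction, the complete shape of a network can be inferred, for example by finding missing links in the network and uncovering network information that has not yet been discovered.
Most link prediction research focuses on computing the similarity between two nodes. Similarity measures can be further divided into those based on global information and those based on local information. Methods based on global information give better predictions, but because they must search the entire network, they consume excessive resources. Methods based on local information are simple and easy to use and can be applied to large real-world social networks, but because the computation is simple, their predictions are less satisfactory.
To address this issue, this thesis exploits the low complexity of local information and proposes an ensemble model that combines common local information measures with classifiers, assigning each classifier a weight corresponding to its predictive strength in order to improve prediction accuracy. Experimental results show that the ensemble model outperforms any single local information similarity measure for link prediction, and that its performance does not differ significantly from that of the global information methods in the literature.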

Abstract (English)

Link prediction has been widely used in fields such as recommender systems, information retrieval, and even bioinformatics. The key concept in link prediction is to find the connections between nodes. Studying link prediction gives a fuller picture of a network, for example by finding missing links and uncovering undiscovered information.

Most link prediction studies focus on computing the similarity between nodes in a network. Similarity measures fall into two groups: those based on global information and those based on local information. Measures based on global information predict better, but they are more time-consuming than measures based on local information. Because of their simplicity, local information measures are suitable for large and complex networks. As networks extracted from real life become more complex, local information measures are the more practical choice.
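To make the idea of a local information measure concrete, here is a minimal Python sketch of three widely used indices: common neighbors, Jaccard, and Adamic-Adar. This record does not state which indices the thesis actually uses, so the choice of indices, the toy graph `GRAPH`, and the helper `score_non_edges` are all assumptions made for illustration.

```python
import math
from itertools import combinations

# Hypothetical undirected toy graph stored as an adjacency map.
# This is illustrative data, not a dataset from the thesis.
GRAPH = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"a", "c", "e"},
    "e": {"d"},
}


def common_neighbors(g, x, y):
    """Number of neighbors shared by x and y."""
    return len(g[x] & g[y])


def jaccard(g, x, y):
    """Shared neighbors divided by the size of the combined neighborhood."""
    union = g[x] | g[y]
    return len(g[x] & g[y]) / len(union) if union else 0.0


def adamic_adar(g, x, y):
    """Sum of 1 / log(degree) over shared neighbors.

    A shared neighbor of two distinct nodes has degree >= 2,
    so the logarithm is always positive here.
    """
    return sum(1.0 / math.log(len(g[z])) for z in g[x] & g[y])


def score_non_edges(g, index):
    """Score every unconnected node pair with the given local index."""
    scores = {}
    for x, y in combinations(sorted(g), 2):
        if y not in g[x]:
            scores[(x, y)] = index(g, x, y)
    return scores


# Rank candidate links from most to least likely under one index.
ranked = sorted(
    score_non_edges(GRAPH, common_neighbors).items(),
    key=lambda item: item[1],
    reverse=True,
)
```

Each index only inspects the immediate neighborhoods of the two nodes, which is why this family of measures stays cheap on large networks. Global measures, by contrast, need information from the whole graph.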

Building on the low complexity of local information, this thesis proposes an ensemble model that combines common local information similarity measures with classifiers and gives each classifier a weight based on its predictive strength. Our experimental results show that the ensemble model improves link prediction performance over using a single local information similarity measure. Although the ensemble model does not completely outperform global information measures, the difference is not significant, which means that acceptable performance can be reached with fewer computational resources.
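The weighting idea can be sketched as weighted soft voting, in which each base classifier's predicted link probability is averaged with a weight equal to that classifier's validation accuracy. This is only one plausible reading of "weights based on predictive strength". The record does not give the thesis's actual combination rule, and the classifier names, validation labels, and probabilities below are hypothetical.

```python
def accuracy(predictions, labels):
    """Fraction of predictions that match the true labels."""
    return sum(int(p == y) for p, y in zip(predictions, labels)) / len(labels)


def weighted_vote(probabilities, weights):
    """Weighted average of per-classifier link probabilities."""
    total = sum(weights)
    if total == 0:
        raise ValueError("at least one weight must be positive")
    return sum(p * w for p, w in zip(probabilities, weights)) / total


# Hypothetical validation data: 1 = link exists, 0 = no link.
val_labels = [1, 0, 1, 0]
val_preds = {
    "clf_a": [1, 0, 1, 0],
    "clf_b": [1, 1, 1, 0],
    "clf_c": [0, 1, 1, 0],
}

# Assumption: predictive strength is measured as validation accuracy.
weights = {name: accuracy(preds, val_labels) for name, preds in val_preds.items()}

# Hypothetical link probabilities from each classifier for one candidate pair.
test_probs = {"clf_a": 0.8, "clf_b": 0.6, "clf_c": 0.2}

names = sorted(weights)
score = weighted_vote(
    [test_probs[n] for n in names],
    [weights[n] for n in names],
)
predicted_link = score >= 0.5
```

In this toy case the weakest classifier, `clf_c`, votes against the link but carries the smallest weight, so the combined score is higher than a plain unweighted average of the three probabilities would be.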

Keywords (translated from Chinese)

★ link prediction
★ similarity calculation
★ ensemble model

Keywords (English)

★ link prediction
★ similarity
★ ensemble model

Table of Contents

Abstract (Chinese)
Abstract
Table of Contents
List of Figures
List of Tables
1. Introduction
1-1 Research Background
1-2 Research Motivation
1-3 Research Objectives
2. Literature Review
2-1 Link prediction
2-2 Classifiers
2-2-1 Decision Tree
2-2-2 Naïve Bayes
2-2-3 K Nearest Neighbors
2-2-4 Random Forest
2-2-5 Support Vector Machines
2-2-6 Logistic Regression
2-3 Ensemble Model
2-3-1 Bagging
2-3-2 Boosting
2-3-3 Stacking
3. Method Design
3-1 Local Information Index
3-2 Global Information Index
3-3 Classifiers setting
3-3-1 Stage 1 classifiers
3-3-2 Stage 2 classifiers
4. Experimental Design
4-1 Data Collection
4-2 Experimental Result
5. Conclusion and Discussion
5-1 Research Limitations
5-2 Conclusion
5-3 Future Research Directions
References

Advisor

陳彥良

Approval Date

2019-6-19
