An ensemble model for link prediction based on graph embedding

Name

Chen-Hsin Hsiao (蕭宸欣)

Department

Department of Information Management

Thesis title

An ensemble model for link prediction based on graph embedding
(基於網絡嵌入的集成學習以改善鏈結預測準確度)

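The author has authorized immediate open access to this electronic thesis.
Open-access full text is licensed to users only for personal, non-profit searching, reading, and printing for the purpose of academic research.
Please comply with the Copyright Act of the Republic of China: do not reproduce, distribute, adapt, repost, or broadcast this work without authorization.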

Abstract (translated from Chinese)

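A network is a form of data representation that is widely used in many fields. In a social network, for example, nodes represent individuals or groups, and the edges between nodes are called links. The core idea of link prediction is to analyze the interactions between nodes in order to infer whether a new relationship will form between them, or to uncover hidden links in the network. Effective network analysis gives us a deeper understanding of what lies behind the data.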
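
To make the task concrete, the following minimal sketch scores unlinked node pairs by their number of common neighbors. This is a classic baseline heuristic shown for illustration only, not the embedding-based method the thesis proposes. The function name `common_neighbor_scores` and the toy friendship network are hypothetical.

```python
from itertools import combinations


def common_neighbor_scores(edges):
    """Score every non-adjacent node pair by its number of shared neighbors."""
    # Build an undirected adjacency map from the edge list.
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)

    scores = {}
    for u, v in combinations(sorted(neighbors), 2):
        if v in neighbors[u]:
            continue  # already linked, so there is nothing to predict
        scores[(u, v)] = len(neighbors[u] & neighbors[v])
    return scores


# Hypothetical toy network (assumption, not data from the thesis).
toy_edges = [
    ("ann", "bob"), ("ann", "cat"), ("bob", "cat"),
    ("cat", "dan"), ("bob", "dan"), ("dan", "eve"),
]
scores = common_neighbor_scores(toy_edges)

# Rank candidate links: more shared neighbors means a more likely link.
ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
for pair, score in ranked:
    print(pair, score)
```

In this toy network the pair (ann, dan) ranks first because both are connected to bob and cat.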
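Link prediction is now widely applied in social networks, e-commerce, bioinformatics, and other fields. It helps researchers understand the shape of a network and extract information that indirectly reflects real-world conditions. For link prediction, graph embedding projects the node information of a network into a low-dimensional vector space while effectively preserving the network structure. In this thesis we adopt three families of graph embedding: matrix factorization based methods, random walk based methods, and deep learning based methods. Each family has its own strengths and weaknesses, so we propose an ensemble learning model that retains the characteristics of each embedding by learning node representations through the different embedding methods. We run experiments on five datasets. The results show that learning from multiple graph embedding representations, training several different classifiers on them, and using a deep neural network for the final prediction effectively improves link prediction accuracy.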
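
The two-level design described above resembles stacked generalization, which can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the thesis implementation. It trains one first-level model per embedding "view", and it swaps in a plain logistic regression as the second-level model where the thesis uses a deep neural network. All names (`train_logistic`, `fit_stacking`) and the synthetic data are hypothetical.

```python
import math
import random


def sigmoid(z):
    # Clamp the input so math.exp never overflows.
    return 1.0 / (1.0 + math.exp(-max(-30.0, min(30.0, z))))


def train_logistic(rows, labels, epochs=100, lr=0.1):
    """Fit logistic regression with plain SGD and return a scoring function."""
    weights = [0.0] * len(rows[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(rows, labels):
            error = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x))) - y
            weights = [w - lr * error * xi for w, xi in zip(weights, x)]
            bias -= lr * error
    return lambda x: sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


def fit_stacking(views, labels, folds=3):
    """views[k][i] is the feature vector of sample i under embedding method k."""
    n = len(labels)
    meta_rows = [[0.0] * len(views) for _ in range(n)]
    for k, view in enumerate(views):
        for fold in range(folds):
            train = [i for i in range(n) if i % folds != fold]
            model = train_logistic([view[i] for i in train],
                                   [labels[i] for i in train])
            for i in range(fold, n, folds):
                # Out-of-fold score: the model never saw sample i in training.
                meta_rows[i][k] = model(view[i])
    # Refit each first-level model on all data for use at prediction time.
    base_models = [train_logistic(view, labels) for view in views]
    # Second level (assumption: logistic regression instead of a DNN).
    meta_model = train_logistic(meta_rows, labels)
    return lambda sample: meta_model([m(x) for m, x in zip(base_models, sample)])


# Synthetic stand-in data (assumption): label 1 = link, 0 = no link.
rng = random.Random(0)
labels = [i % 2 for i in range(60)]
views = [
    [[(2 * y - 1) + rng.gauss(0, 0.5), rng.gauss(0, 1.0)] for y in labels]
    for _ in range(3)  # three views, one per embedding family
]

predict = fit_stacking(views, labels)
probabilities = [predict([view[i] for view in views]) for i in range(len(labels))]
# Training accuracy on toy data, useful only as a sanity check.
accuracy = sum((p > 0.5) == bool(y) for p, y in zip(probabilities, labels)) / len(labels)
print(f"training accuracy: {accuracy:.2f}")
```

Building the second-level inputs from out-of-fold scores keeps the meta-model from simply trusting first-level models that have memorized their own training samples.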

Abstract (English)

A network is a form of data representation that is widely used in many fields. In a social network, for example, nodes represent individuals or groups, and the edges between nodes, called links, represent interactions between people. Analyzing these interactions reveals more about the relationships in the network. The core idea of link prediction is to predict whether a new relationship will form between a pair of nodes, or to discover hidden links in the network. Link prediction is now used in social networks, e-commerce, bioinformatics, and other fields. Researchers also use graph embedding for link prediction, because it maps node information into a low-dimensional vector space while effectively preserving the network structure. In this study, we use three families of graph embedding methods: matrix factorization based methods, random walk based methods, and deep learning based methods. Each has its own strengths and weaknesses, so we propose an ensemble model that combines these graph embeddings into a new representation for each node. The new representations serve as the input to our link prediction model. We evaluate performance on multiple datasets. Experimental results show that using multiple graph embeddings as representations effectively improves link prediction performance.
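
One simple way to combine several embeddings into a single node representation, and then into features for a node pair, is sketched below. The abstract does not say how the embeddings are combined, so concatenation per node and the Hadamard (element-wise) product per pair are assumptions chosen for illustration. The function names and the toy vectors are hypothetical.

```python
def combine_node_embeddings(*embedding_tables):
    """Concatenate each node's vectors from several embedding methods."""
    # Keep only nodes that every embedding method produced a vector for.
    shared_nodes = set(embedding_tables[0])
    for table in embedding_tables[1:]:
        shared_nodes &= set(table)
    return {
        node: [value for table in embedding_tables for value in table[node]]
        for node in sorted(shared_nodes)
    }


def pair_features(representation, u, v):
    """Hadamard (element-wise) product of two node representations."""
    return [a * b for a, b in zip(representation[u], representation[v])]


# Hypothetical toy vectors standing in for the three embedding families.
factorization = {"a": [1.0, 0.0], "b": [0.5, 2.0]}
random_walk = {"a": [2.0], "b": [-1.0]}
deep = {"a": [0.5, 0.5], "b": [4.0, 0.0]}

combined = combine_node_embeddings(factorization, random_walk, deep)
features = pair_features(combined, "a", "b")
print(combined["a"])  # [1.0, 0.0, 2.0, 0.5, 0.5]
print(features)       # [0.5, 0.0, -2.0, 2.0, 0.0]
```

The resulting pair vector is the kind of input a link prediction classifier would take, with label 1 for an existing link and 0 otherwise.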

Keywords

★ Link prediction
★ Graph embedding
★ Ensemble learning

Table of contents

Abstract (Chinese)
Abstract (English)
List of Figures
List of Tables
1. Introduction
2. Related work
2-1 Link prediction
2-2 Ensemble learning
2-2-1 Bagging
2-2-2 Boosting
2-2-3 Stacking
2-3 Graph embedding
2-3-1 Factorization based methods
2-3-2 Random walk based methods
2-3-3 Deep learning based methods
2-3-4 Other
2-3-5 Graph embedding summary
3. Proposed approach
3-1 Model structure
3-2 Graph embedding
3-2-1 Node combination
3-3 Classifiers setting
3-3-1 First level classifiers
3-3-2 Second level classifier
4. Experiments and results
4-1 Datasets
4-2 Data preprocessing
4-3 Experimental setting
4-4 Baseline setting
4-5 Experimental results
4-6 Statistical tests
4-7 Experimental summary
5. Conclusion
5-1 Limitations and future work
Reference

Advisor

Yen-Liang Chen (陳彥良)

Approval date

2021-06-29
