Effective Android minor malware family detection using multiple feature integration approach and deep learning augmentation technique

Name

邱柏嘉(Po-Chia Chiu)

Department

Department of Information Management

Thesis Title

Effective Android minor malware family detection using multiple feature integration approach and deep learning augmentation technique

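This electronic thesis is authorized for immediate open access.
The open-access full text is licensed to users only for searching, reading, and printing for personal, non-profit academic research.
Please comply with the Copyright Act of the Republic of China (Taiwan); do not reproduce, distribute, adapt, repost, or broadcast it without authorization.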

Abstract (Chinese)

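In recent years, driven by the rapid growth of hardware computing power, research on detecting malware with deep learning has steadily increased, and its detection results are more accurate than those of traditional techniques. Android malware attack methods keep changing and have produced many different attack types. Researchers group malware with similar attack targets and behaviors into the same malware family to facilitate later analysis. However, some malware families contain only a few samples, so deep-learning-based detection methods cannot effectively learn their features, which lowers the accuracy of deep learning in recognizing these specific kinds of malware. This research attempts to alleviate this problem. It uses multiple features of Android applications (Opcode, API, and Permission), generates three feature vectors with different pre-processing methods, and combines the three feature vectors into an RGB image. A Deep Convolutional Generative Adversarial Network (DCGAN) is then used to augment the samples of the small malware families, and the images are finally fed into a Convolutional Neural Network (CNN) for malware family classification, raising the detection rate of deep learning on small malware families. Experimental results show that combining multiple features with a DCGAN effectively improves the ability of deep learning to identify small Android malware families.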
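The abstract does not specify the pre-processing behind each feature vector. As a rough illustration of how three 1-D vectors can be packed into one RGB image, the sketch below assumes that each vector is min-max scaled to 0-255, zero-padded to the smallest square that holds the longest vector, and used as one colour channel (opcode as R, API as G, permission as B). The function names, the scaling, the padding, and the channel order are all hypothetical choices for illustration. They are not the thesis's actual pre-processing.

```python
import math


def to_channel(vec, side):
    """Min-max scale a non-empty 1-D feature vector to 0..255 and lay it out
    row by row as a side x side grid (hypothetical pre-processing).

    Vectors shorter than side*side are zero-padded; longer ones are truncated.
    """
    lo, hi = min(vec), max(vec)
    span = hi - lo
    scaled = [0 if span == 0 else round(255 * (v - lo) / span) for v in vec]
    cells = (scaled + [0] * (side * side))[: side * side]
    return [cells[r * side:(r + 1) * side] for r in range(side)]


def to_rgb_image(opcode_vec, api_vec, perm_vec):
    """Use three feature vectors as the R, G and B channels of one square image.

    Returns a list of rows; each pixel is an (R, G, B) tuple of ints in 0..255.
    """
    longest = max(len(opcode_vec), len(api_vec), len(perm_vec))
    side = math.ceil(math.sqrt(longest))  # smallest square holding the longest vector
    r, g, b = (to_channel(v, side) for v in (opcode_vec, api_vec, perm_vec))
    return [[(r[i][j], g[i][j], b[i][j]) for j in range(side)] for i in range(side)]


# Toy input: opcode counts, binary API-call flags, binary permission flags.
image = to_rgb_image([0, 5, 10, 5], [1, 0, 1, 1], [1, 1, 0, 0])
print(image)
```

In this toy case every vector has four entries, so the result is a 2x2 image. Real feature vectors would be far longer and would yield a larger image, which a CNN (and a DCGAN generator producing images of the same shape) could then consume.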

Abstract (English)

As malicious attack methods continue to evolve, the imbalance of Android malware family datasets has become a major problem: deep learning models cannot effectively learn the features of small families, which reduces the effectiveness of malware detection. This research uses three static features of Android applications (opcode, API, and permission) and applies a different pre-processing method to each to generate feature vectors that together form an RGB image. After the RGB images are generated, a DCGAN (Deep Convolutional Generative Adversarial Network) is used to augment the samples of small families, and the images are then input to a Convolutional Neural Network (CNN) for family classification. The experimental results show that combining multiple features with a DCGAN effectively improves the ability of the CNN to identify small families, increasing the F1-score of small families by 2% to 20%.
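Because the reported gains are per-family F1-scores, it may help to recall how a per-class F1 is computed one-vs-rest: F1 = 2TP / (2TP + FP + FN), the harmonic mean of precision and recall. Scoring each family separately, rather than reporting overall accuracy, is what exposes weak performance on small families in an imbalanced dataset. The stdlib sketch below is a generic illustration with hypothetical family labels. It is not the thesis's evaluation code.

```python
from collections import Counter


def per_family_f1(y_true, y_pred):
    """Return {family: F1} computed one-vs-rest for each label in y_true or y_pred."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted family p, but the sample is not p
            fn[t] += 1  # true family t was missed
    scores = {}
    for fam in set(y_true) | set(y_pred):
        denom = 2 * tp[fam] + fp[fam] + fn[fam]
        scores[fam] = 2 * tp[fam] / denom if denom else 0.0
    return scores


# Hypothetical labels: "A" is a large family, "B" a small one.
y_true = ["A", "A", "A", "A", "B", "B"]
y_pred = ["A", "A", "A", "B", "B", "A"]
print(per_family_f1(y_true, y_pred))
```

Here family A gets 2*3 / (6 + 1 + 1) = 0.75 and family B gets 2*1 / (2 + 1 + 1) = 0.5. The small family scores lower even though overall accuracy is 4/6.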

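Keywords (Chinese)

★ Android malware analysis
★ Multi-feature
★ Malware visualization
★ Malware family classification
★ Deep convolutional generative adversarial network
★ Convolutional neural network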

Keywords (English)

★ Android Malware Detection
★ Multi-feature
★ Malware Visualization
★ Malware Family Classification
★ DCGAN
★ CNN

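Table of Contents

Abstract (Chinese)
Abstract
Acknowledgements
Table of Contents
List of Figures
List of Tables
1. Introduction
1.1 Research Background
1.2 Research Motivation
1.3 Research Contributions
1.4 Thesis Organization
2. Related Work
2.1 Android Malware Analysis
2.1.1 Malware Features
2.1.2 Analysis Methods
2.1.3 Malware Family Classification
2.2 Malware Visualization and Augmentation Techniques
2.2.1 Malware Visualization
2.2.2 Convolutional Neural Networks
2.2.3 Generative Adversarial Networks
2.2.4 Malware Sample Augmentation
3. System Design
3.1 Decompile Module
3.2 Feature Vectorization
3.3 RGB Image Generation
3.4 Augmentation Module
3.5 Classification Module
4. Experimental Results
4.1 Experimental Environment
4.2 Dataset
4.3 Experimental Design
4.3.1 Experiment 1
4.3.2 Experiment 2
4.3.3 Experiment 3
4.3.4 Experiment 4
4.3.5 Experiment 5
4.3.6 Experiment 6
4.3.7 Experiment 7
4.3.8 Experiment 8
5. Conclusions and Future Research
5.1 Conclusions and Contributions
5.2 Research Limitations
5.3 Future Research
References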

Advisor

陳奕明(Yi-Ming Chen)

Date of Approval

2021-7-26