An Android Malware Detection System Integrating Block Feature Extraction and a Multi-head Attention Mechanism

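Name

An-Chi He (何岸錡)

Department

Department of Information Management

Thesis Title

An Android Malware Detection System Integrating Block Feature Extraction and a Multi-head Attention Mechanism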

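This electronic thesis is authorized for immediate open access.
The open-access full text is licensed to users only for personal, non-profit searching, reading, and printing for the purpose of academic research.
Please comply with the Copyright Act of the Republic of China. Do not reproduce, distribute, adapt, repost, or broadcast the work without authorization, to avoid violating the law.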

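Abstract (translated from Chinese)

With the rapid development of deep learning, the detection of mobile malware has made breakthrough progress. However, deep learning models based on time series still suffer from vanishing gradients when given long feature sequences, because of the memory limitations of recurrent neural networks. Many subsequent studies have therefore proposed feature compression and extraction methods for long sequences, but we have found no study that compresses the sequence while still covering the complete feature information of the original sequence and the temporal relationships of its semantics. This study therefore proposes a multi-model malware detection architecture that covers the global features while letting the compressed features retain part of their temporal relationships, and that integrates a Multi-head Attention mechanism to mitigate the memory problem of recurrent neural networks. The model runs in two stages. In the pre-processing stage, the Android low-level operation codes (Dalvik opcodes) are segmented and summarized statistically, then fed into a Bi-LSTM for semantic extraction; this stage compresses the original opcode sequence into a sequence of semantic blocks that carries temporal meaning and serves as the classification feature for the downstream classifier. In the classification stage, this study modifies the Transformer model: the Multi-head Attention mechanism attends efficiently to the sequence features, and a Global Pooling Layer is then added to strengthen the model's sensitivity to the data and to reduce dimensionality, which reduces overfitting. Experimental results show a detection accuracy of 99.30% for multi-family classification, and the performance of binary classification and small-sample classification improves significantly over existing studies. In addition, this study conducts several ablation tests to confirm the importance of each model in the overall architecture.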
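The pre-processing stage described in the abstract segments the Dalvik opcode sequence and computes statistics per segment before semantic extraction. The following Python sketch illustrates that idea only. The abstract does not state the block size or which statistics are used, so fixed-size non-overlapping blocks and relative opcode frequencies are assumptions here, and the function names `split_into_blocks`, `block_histogram`, and `vectorize` are hypothetical.

```python
from collections import Counter
from typing import List, Sequence


def split_into_blocks(opcodes: Sequence[str], block_size: int) -> List[List[str]]:
    """Cut an opcode sequence into consecutive, non-overlapping blocks.

    Assumption: fixed-size blocks; the last block may be shorter.
    """
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    return [list(opcodes[i:i + block_size])
            for i in range(0, len(opcodes), block_size)]


def block_histogram(block: Sequence[str], vocabulary: Sequence[str]) -> List[float]:
    """Relative frequency of each vocabulary opcode inside one block."""
    if not block:
        return [0.0] * len(vocabulary)
    counts = Counter(block)
    return [counts[op] / len(block) for op in vocabulary]


def vectorize(opcodes: Sequence[str], vocabulary: Sequence[str],
              block_size: int) -> List[List[float]]:
    """Turn a long opcode sequence into a shorter sequence of block vectors."""
    return [block_histogram(block, vocabulary)
            for block in split_into_blocks(opcodes, block_size)]


if __name__ == "__main__":
    vocab = ["move", "invoke-virtual", "const", "return-void"]
    sequence = ["const", "move", "invoke-virtual", "move", "const", "return-void"]
    for row in vectorize(sequence, vocab, block_size=4):
        print(row)
    # [0.5, 0.25, 0.25, 0.0]
    # [0.0, 0.0, 0.5, 0.5]
```

Each block vector summarizes every opcode in its block, so the shortened sequence still covers the whole program while the order of the blocks keeps a coarse temporal structure. In the described architecture, a sequence like this would then be passed to a Bi-LSTM, which is not shown here.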

Abstract (English)

With the rapid development of deep learning, the detection of mobile malware has made breakthrough progress. However, deep learning models based on time series still suffer from vanishing gradients when the input is a long feature sequence, owing to the memory limitation of recurrent neural networks. Many researchers have proposed feature compression and extraction methods for long sequences, but we have found no work that compresses the sequence while retaining both the global features of the original sequence and its semantic relationships. We therefore propose a multi-model malware detection architecture that preserves the global features while retaining part of the temporal relationships among the compressed features. We also apply the Multi-head Attention mechanism to mitigate the memory problem of the recurrent neural network. The model runs in two stages. The pre-processing stage segments the Android low-level operation codes (Dalvik opcodes) and computes statistics over them, then feeds the result into a Bi-LSTM for semantic extraction. This stage compresses the original opcode sequence into a sequence of semantic blocks rich in temporal information, which serves as the classification feature for the downstream classifier. In the classification stage, we modify the Transformer model: the Multi-head Attention mechanism attends to the block sequence features efficiently, and an added Global Pooling Layer strengthens the model's sensitivity to the block features and reduces dimensionality, which limits overfitting. Experimental results show a detection accuracy of 99.30% for multi-family classification, and the performance of binary classification and small-sample classification is significantly improved. In addition, we conducted multiple ablation tests to confirm the importance of each model in the overall architecture.
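The classification stage described in the abstract combines Multi-head Attention with a global pooling layer. The sketch below shows the general mechanism in plain Python and is not the thesis's model. It assumes self-attention with the learned projection matrices omitted (each head simply takes a slice of the feature dimension) and assumes average pooling, since the abstract does not say which pooling variant is used. All function names are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V."""
    scale = math.sqrt(len(q[0]))
    out = []
    for q_row in q:
        scores = [sum(a * b for a, b in zip(q_row, k_row)) / scale for k_row in k]
        weights = softmax(scores)
        out.append([sum(w * v_row[c] for w, v_row in zip(weights, v))
                    for c in range(len(v[0]))])
    return out


def multi_head_self_attention(x: Matrix, num_heads: int) -> Matrix:
    """Split features into heads, attend within each head, concatenate.

    Assumption: learned projections (W_Q, W_K, W_V, W_O) are left out,
    so each head works directly on its slice of the input features.
    """
    dim = len(x[0])
    if dim % num_heads != 0:
        raise ValueError("feature dimension must be divisible by num_heads")
    head_dim = dim // num_heads
    heads = []
    for h in range(num_heads):
        part = [row[h * head_dim:(h + 1) * head_dim] for row in x]
        heads.append(attention(part, part, part))
    # Concatenate the per-head outputs for every sequence position.
    return [sum((heads[h][t] for h in range(num_heads)), [])
            for t in range(len(x))]


def global_average_pool(x: Matrix) -> List[float]:
    """Average over the sequence axis, leaving one value per feature."""
    return [sum(row[c] for row in x) / len(x) for c in range(len(x[0]))]


if __name__ == "__main__":
    blocks = [[0.5, 0.25, 0.25, 0.0], [0.0, 0.0, 0.5, 0.5]]
    attended = multi_head_self_attention(blocks, num_heads=2)
    print(global_average_pool(attended))
```

Pooling collapses a variable-length block sequence into one fixed-length vector, which is what allows a small classifier head to follow and is consistent with the dimensionality reduction the abstract mentions.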

Keywords

★ Deep learning
★ Multi-head Attention
★ Transformer
★ Bi-LSTM
★ Static analysis

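Table of Contents

Abstract (Chinese)
Abstract (English)
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
1-1 Research Motivation
1-2 Research Contributions
1-3 Thesis Organization
Chapter 2 Related Work
2-1 Research on Static Feature Analysis of Dalvik Opcodes
2-2 Research on RNN-based Deep Learning Models
2-3 Research on the Transformer Multi-head Attention Mechanism
2-4 Summary
Chapter 3 System Architecture
3-1 System Architecture
3-1-1 Decompile Module
3-1-2 Feature Vectorization Module
3-1-3 Semantic Extraction Module
3-1-4 Attention-based Classification Module
3-2 System Workflow
Chapter 4 Experimental Results
4-1 Experimental Environment
4-2 Comparison of Sequence Compression Methods
4-2-1 Experiment 1: Binary Malware Classification
4-2-2 Experiment 2: Multi-class Malware Classification
4-2-3 Experiment 3: Small-sample Malware Classification
4-2-4 Experiment 4: Comparison with Image Compression Methods
4-3 Ablation Tests of the Modules in the Multi-model Architecture
4-3-1 Experiment 5: Block Compression Method
4-3-2 Experiment 6: Semantic Extraction Module
4-3-3 Experiment 7: Global Pooling
4-3-4 Experiment 8: Vanishing Gradient Test
4-4 Experimental Results and Discussion
Chapter 5 Conclusions and Future Contributions
5-1 Conclusions and Contributions
5-2 Research Limitations and Future Research
References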

Advisor

Yi-Ming Chen (陳奕明)

Approval Date

2020-7-28
