Accelerating Android Malware Analysis by Combining Natural Language Processing and Interpretability Technique

Name

Li-Kai Chen (陳立凱)

Department

Department of Information Management

Thesis Title

Accelerating Android Malware Analysis by Combining Natural Language Processing and Interpretability Technique
(Original title: 結合自然語言處理與可解釋性技術之Android惡意程式分析加速研究)

Files

Full text viewable in the system (open from 2025-7-31 onward)

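Abstract (Chinese)

With the rapid advance of technology, daily life has become inseparable from the internet. People go online through computers, smartphones, smart bands, and similar products, and smartphones are the most frequently used of these. This has been accompanied by steady growth in mobile malware, which poses a serious threat to the use of mobile devices. This study focuses on Android, the mobile operating system with the largest market share. To keep pace with the rapid growth of mobile malware, the system uses static analysis to extract opcodes from APK (Android Application Package) files and uses them to build a natural language processing model that learns the relationships among opcodes. This strengthens the feature representation, so that an opcode sequence can be expressed with fewer features. The opcodes are then converted into vectors by the natural language model and fed to a classifier, which is trained to decide whether an APK is a malicious application. Because fewer features are used, training is faster and training cost falls accordingly. As malware grows rapidly, more and more unknown samples appear. When false positives are possible, researchers have to check samples one by one, but limited manpower cannot cope with such a large volume of malicious applications. This study therefore applies the interpretability technique SHAP to the trained model to produce explanation data, and builds indicators from that data to filter out the samples that are more likely to be false positives. Researchers can analyze these valuable samples first, which improves their efficiency. Once analyzed, these previously unknown samples can be added to the training set for retraining, so that the model can handle such samples.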
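The static feature-extraction step described above can be sketched as follows. This is a minimal illustration, not the thesis's implementation: it assumes the APK has already been disassembled into smali text (for example with Apktool), and the function name `extract_opcodes`, the regular expression, and the `SAMPLE` input are all hypothetical. It keeps only the instruction mnemonics inside method bodies and drops operands, directives, labels, and annotation blocks.

```python
import re

# Assumption: Dalvik mnemonics are lowercase words that may contain digits,
# '-' and '/' (for example "invoke-virtual" or "add-int/2addr").
OPCODE_RE = re.compile(r"^[a-z][a-z0-9]*(?:[-/][a-z0-9]+)*$")

# Hypothetical smali fragment used only to exercise the function.
SAMPLE = """
.class public Lcom/example/Demo;
.super Ljava/lang/Object;

.method public static greet()V
    .locals 2
    const-string v0, "hi"
    sget-object v1, Ljava/lang/System;->out:Ljava/io/PrintStream;
    invoke-virtual {v1, v0}, Ljava/io/PrintStream;->println(Ljava/lang/String;)V
    :done
    return-void
.end method
"""


def extract_opcodes(smali_text):
    """Return the opcode mnemonics found inside method bodies, in order."""
    opcodes = []
    in_method = False
    in_annotation = False
    for raw in smali_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment
        if line.startswith(".method"):
            in_method = True
            continue
        if line.startswith(".end method"):
            in_method = False
            continue
        if not in_method:
            continue  # class-level directives and fields carry no opcodes
        if line.startswith(".annotation"):
            in_annotation = True
            continue
        if line.startswith(".end annotation"):
            in_annotation = False
            continue
        if in_annotation or line[0] in ".:":
            continue  # annotation body, other directive, or label
        token = line.split()[0]
        # Payload data such as "0x1" fails the pattern and is skipped.
        if OPCODE_RE.match(token):
            opcodes.append(token)
    return opcodes
```

For `SAMPLE`, the function returns `['const-string', 'sget-object', 'invoke-virtual', 'return-void']`. A list like this can be treated as a sentence of tokens when training an embedding model on opcode sequences.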

Abstract (English)

With the rapid development of technology, people's lives are closely tied to the internet. People go online through computers, smartphones, and smartwatches, and smartphones are used most frequently. This has led to growth in malicious software on mobile devices, which puts the use of those devices at serious risk. This study focuses on Android, the mobile operating system with the highest market share, to address the rapid growth of mobile malware. The system uses static analysis to extract opcodes from the APK file and builds a natural language processing (NLP) model to learn the relationships between opcodes, enhancing the feature representation so that opcode sequences can be expressed with fewer features. The opcodes are then converted into vectors by the NLP model and fed into a classifier, which is trained to detect whether the APK is a malicious application. Because fewer features are used, training is faster and training costs are lower. As malware grows rapidly, there will be more and more unknown samples. When facing possible false positives, researchers can only check them one by one. Therefore, this study uses the interpretability technique SHAP to analyze the trained models and generate explanation (XAI) data, and then builds indicators from these data to filter out the samples that are more likely to be false positives. Researchers can then analyze these valuable samples first, which increases their efficiency.
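The SHAP-based triage idea can be illustrated with a small, self-contained sketch. Everything here is an assumption made for illustration: the abstract does not define its indicators, so `conflict_score` is a hypothetical one, and the exact Shapley computation against a single baseline stands in for the SHAP library, which estimates the same kind of attribution for real models. The function names and the idea of ranking by conflicting attributions are likewise invented for this example.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values of predict at x, relative to a baseline input.

    Enumerates every feature subset, so it is only practical for a
    handful of features.
    """
    n = len(x)

    def value(subset):
        # Features in `subset` keep their real value; the rest use the baseline.
        mixed = [x[i] if i in subset else baseline[i] for i in range(n)]
        return predict(mixed)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for subsets of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for combo in combinations(others, size):
                s = set(combo)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def conflict_score(phi):
    """Hypothetical indicator: 0 when attributions agree, near 1 when they cancel."""
    total = sum(abs(p) for p in phi)
    if total == 0:
        return 0.0  # no evidence in either direction, so nothing conflicts
    return 1.0 - abs(sum(phi)) / total


def rank_for_review(predict, samples, baseline, threshold=0.5):
    """Order the samples flagged as malicious, most conflicted explanation first."""
    flagged = []
    for name, x in samples.items():
        if predict(x) >= threshold:
            phi = shapley_values(predict, x, baseline)
            flagged.append((conflict_score(phi), name))
    # Highest conflict first; ties are broken by sample name.
    flagged.sort(key=lambda item: (-item[0], item[1]))
    return [name for _, name in flagged]
```

Calling `rank_for_review` with a scoring function, a dictionary of named feature vectors, and a baseline vector returns the flagged sample names in suggested review order. A detection that rests on strongly conflicting attributions is placed ahead of one where every feature points the same way, which is one plausible way to surface candidates for a false-positive check.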

Keywords (Chinese)

★ Android malware
★ Deep learning
★ Opcode
★ Natural language processing
★ Interpretability

Keywords (English)

★ Android malware
★ Deep learning
★ Opcode
★ Natural Language Processing
★ Explainable AI

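Table of Contents

Abstract (Chinese)
Abstract
Acknowledgments
Table of Contents
List of Figures
List of Tables
1. Introduction
1.1. Research Background
1.2. Research Motivation and Objectives
1.3. Research Contributions
1.4. Chapter Organization
2. Related Work
2.1. Android Malware Detection
2.2. Detection Based on Natural Language Processing
2.3. Applications of Interpretability Techniques
2.4. Summary of Related Work
3. Research Method
3.1. System Architecture
3.1.1. Feature Extraction Module
3.1.2. Natural Language Processing Module
3.1.3. Classification Module
3.1.4. Explainability Analysis Module (XAI Module)
3.2. Evaluation Metrics
3.3. System Workflow
3.3.1. Training Workflow
3.3.2. Testing Workflow
4. Experiments and Evaluation
4.1. Experimental Environment and Datasets
4.2. Performance Comparison of Different Classifiers
4.2.1. Experiment 1 (Drebin)
4.2.2. Experiment 2 (CICMalDroid2020)
4.2.3. Experiment 3: Top-N Opcode Performance Comparison
4.3. Effectiveness of the Proposed Method
4.3.1. Experiment 4: Performance Comparison
4.3.2. Experiment 5: Family Classification
4.4. Applications of Interpretability Techniques
4.4.1. Experiment 6: Filtering Incorrect Predictions
4.4.2. Experiment 7: Retraining After Removing Specific Samples
4.5. Summary of Experiments
5. Conclusions and Future Work
5.1. Research Summary
5.2. Limitations and Future Work
References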

Advisor

Yi-Ming Chen (陳奕明)

Approval Date

2023-7-28
