A Research of Android Anti-Obfuscated Malware Detection Combined with Function Call Graph Semantic Feature and Domain Adaptation


Please use this permanent URL to cite or link to this item: http://ir.lib.ncu.edu.tw/handle/987654321/92666

Title:	A Research of Android Anti-Obfuscated Malware Detection Combined with Function Call Graph Semantic Feature and Domain Adaptation
Author:	Yang, Hui-Hsuan (楊蕙瑄)
Contributor:	Department of Information Management
Keywords:	obfuscate attack; deep learning; transfer learning; Android malware detection; static analysis
Date:	2023-07-28
Upload time:	2023-10-04 16:07:54 (UTC+8)
Publisher:	National Central University
Abstract:	In recent years, artificial intelligence (AI) techniques have been widely applied to Android malware detection. Malware developers, however, use various methods to evade detection. A common one is the obfuscation attack, which alters the structure of an APK so that the features extracted by the detection system change and the model misclassifies the sample. According to prior research, a malware detection model that reaches 97.7% accuracy drops to roughly 50% (50.3% per the Chinese abstract, 51.3% per the English one) when given data obfuscated with API Call Obfuscation. This study considers defenses against obfuscation at both the feature level and the model level. At the feature level, although obfuscation changes an APK's features, the APK must still exhibit its pre-obfuscation behavior; if feature preprocessing can capture that behavior, the impact of obfuscation on the detection system is reduced. This study therefore uses the Function Call Graph (FCG) as the feature basis and applies Node Embedding to learn the semantic information expressed between nodes, modeling the software's behavior. At the model level, even though Node Embedding can learn an APK's semantic information, some advanced obfuscation techniques modify the code so that different semantics express the same behavior. This study therefore trains the model with Domain Adaptation, a Transfer Learning technique, so that the model pulls the pre- and post-obfuscation datasets closer together in feature space and can classify obfuscated data, achieving obfuscation resistance. The proposed detection system reaches a detection accuracy of 0.9888 without obfuscation and maintains an average detection accuracy of 0.9672 under multiple obfuscation techniques. In particular, Domain Adaptation raises the detection accuracy under CallIndirection obfuscation from 87% to 95%.
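The two ideas in the abstract, embedding nodes of a function call graph and shrinking the distance between clean and obfuscated feature distributions, can be illustrated with a toy sketch. The abstract does not name the embedding algorithm or the domain adaptation loss, so everything here is an assumption made for illustration: DeepWalk-style random walks stand in for the node embedding input, a linear-kernel maximum mean discrepancy (MMD) stands in for the domain distance, and the graph, function names, and parameters are hypothetical.

```python
import random

# Hypothetical toy function call graph: caller -> list of callees.
CALL_GRAPH = {
    "onCreate": ["readContacts", "openSocket"],
    "readContacts": ["encode"],
    "openSocket": ["send"],
    "encode": ["send"],
    "send": [],
}


def random_walks(graph, walk_length, walks_per_node, seed=0):
    """Generate DeepWalk-style random walks over the call graph.

    Each walk is a sequence of function names following call edges.
    Such sequences could be fed to a skip-gram model to obtain node
    embeddings (an assumed choice, not confirmed by the abstract).
    """
    rng = random.Random(seed)
    walks = []
    for start in sorted(graph):
        for _ in range(walks_per_node):
            walk = [start]
            # Stop early when the current function calls nothing.
            while len(walk) < walk_length and graph[walk[-1]]:
                walk.append(rng.choice(graph[walk[-1]]))
            walks.append(walk)
    return walks


def mean_vector(rows):
    """Column-wise mean of a list of equal-length feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def linear_mmd2(source, target):
    """Squared MMD with a linear kernel (biased estimate).

    This equals the squared Euclidean distance between the mean
    feature vectors of the two domains. A domain adaptation loss
    could minimize a term like this (an assumed stand-in).
    """
    mu_s, mu_t = mean_vector(source), mean_vector(target)
    return sum((a - b) ** 2 for a, b in zip(mu_s, mu_t))


if __name__ == "__main__":
    for walk in random_walks(CALL_GRAPH, walk_length=4, walks_per_node=1):
        print(" -> ".join(walk))
    clean = [[0.0, 0.0], [2.0, 2.0]]       # features before obfuscation
    obfuscated = [[1.0, 1.0], [3.0, 3.0]]  # features after obfuscation
    print(linear_mmd2(clean, obfuscated))
```

In this sketch a training objective would add the discrepancy term to the usual classification loss, so that features of obfuscated samples are pushed toward those of the original samples.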
Appears in Collections:	[Graduate Institute of Information Management] Theses and Dissertations


All items in NCUIR are protected by the original copyright.
