Using Generative Adversarial Networks for Data Augmentation in Android Malware Detection


Please use this identifier to cite or link to this item: http://ir.lib.ncu.edu.tw/handle/987654321/84074

Title:	Using Generative Adversarial Networks for Data Augmentation in Android Malware Detection
Authors:	楊竣憲;Yang, Chun-Hsien
Contributors:	Department of Information Management
Keywords:	Generative adversarial network (GAN);Data augmentation;Deep learning;Android
Date:	2020-07-29
Issue Date:	2020-09-02 18:01:25 (UTC+8)
Publisher:	National Central University
Abstract:	As malicious attack techniques keep evolving and novel malware continues to appear, malware datasets frequently suffer from class imbalance: for some classes, the classifier does not have enough training data to learn their latent malicious features. This study applies Generative Adversarial Networks (GANs), a deep learning architecture that is trained on images and generates new image data, to Android malware analysis; GANs have already been widely used for data augmentation in other computer vision and image recognition research. The thesis converts the features of Android apps into image representations and uses a GAN to generate additional samples for malware families that have few samples, thereby balancing and augmenting the original dataset. The study also compares traditional data augmentation techniques to examine whether they help identify malware classes with few samples. Experiments confirm that both traditional image augmentation methods and GANs improve classification accuracy, but GANs are more effective at improving the model's detection of malware families that originally had low recognition accuracy because of their small sample counts. On two datasets, Drebin (4,000 samples) and AMD (20,000 samples), the accuracy on the sparsely represented classes after GAN augmentation differs from the accuracy before augmentation by as much as 5% to 20%.
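The abstract describes the pipeline only at a high level. The sketch below illustrates two of its steps under stated assumptions: packing a binary feature vector (for example, which permissions or API calls an app uses) into a square grayscale image, and adding synthetic samples until every malware family matches the size of the largest one. The thesis uses a GAN as the generator; training one is out of scope here, so a simple traditional augmentation (rolling each pixel row by a random offset) stands in for it. The 0/255 pixel encoding, zero-padding to a square, the "match the largest family" target, and all function and family names are hypothetical choices for illustration, not the author's documented method.

```python
import math
import random
from collections import Counter
from typing import Callable, Dict, List, Tuple

# A grayscale image as rows of 0..255 pixel values.
Image = List[List[int]]
# Hypothetical generator interface: given one family's real images and an RNG,
# return one synthetic image. A GAN trained on that family could ignore the
# real images and sample from noise instead.
Generator = Callable[[List[Image], random.Random], Image]


def features_to_image(features: List[int]) -> Image:
    """Pack a binary feature vector into a square grayscale image.

    Assumed encoding: 1 -> 255, 0 -> 0, zero-padded to the next perfect square.
    """
    side = math.isqrt(len(features) - 1) + 1 if features else 1  # ceil(sqrt(n))
    padded = features + [0] * (side * side - len(features))
    return [[255 * padded[r * side + c] for c in range(side)] for r in range(side)]


def shift_augment(family: List[Image], rng: random.Random) -> Image:
    """Traditional-augmentation stand-in for the GAN: pick one real image of the
    family and roll every row right by the same random offset (k = 0 keeps it)."""
    img = rng.choice(family)
    k = rng.randrange(len(img[0]))
    return [row[-k:] + row[:-k] for row in img]


def balance_families(
    images: List[Image], labels: List[str], generate: Generator, seed: int = 0
) -> Tuple[List[Image], List[str]]:
    """Add synthetic samples until every family matches the largest one."""
    rng = random.Random(seed)
    by_family: Dict[str, List[Image]] = {}
    for img, lab in zip(images, labels):
        by_family.setdefault(lab, []).append(img)
    target = max(len(v) for v in by_family.values())
    out_images, out_labels = list(images), list(labels)
    for lab, family in by_family.items():
        # Synthetic samples are drawn from real images only, never from
        # previously generated ones.
        for _ in range(target - len(family)):
            out_images.append(generate(family, rng))
            out_labels.append(lab)
    return out_images, out_labels


if __name__ == "__main__":
    # Toy, made-up feature vectors for two hypothetical families.
    raw = {
        "FamilyA": [[1, 0, 1, 1], [1, 1, 0, 1], [0, 1, 1, 1]],
        "FamilyB": [[0, 0, 1, 0]],
    }
    images = [features_to_image(v) for vs in raw.values() for v in vs]
    labels = [fam for fam, vs in raw.items() for _ in vs]
    new_images, new_labels = balance_families(images, labels, shift_augment)
    print(Counter(new_labels))  # per-family counts after balancing
```

Keeping `generate` as a parameter lets the same balancing loop compare a traditional augmentation against a GAN-based generator, which mirrors the comparison the abstract describes.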
Appears in Collections:	[Graduate Institute of Information Management] Electronic Thesis & Dissertation

Files in This Item:

File	Description	Size	Format
index.html		0Kb	HTML
