Abstract: | Classifying malware by type is important because it reveals the characteristics of each malware class, and those characteristics are what allow corresponding protective measures to be designed. Malware is not only growing in number but also constantly mutating into new variants, so a single malware sample may fit more than one class and may also carry characteristics of other malware classes. This study therefore does more than classify the malware under inspection: it also checks whether the sample carries risk values for other classes. The study uses static analysis, which saves time and offers high coverage. For feature extraction, prior static-analysis work has almost always relied on permissions, API calls, components, and similar features to detect malware. However, these features must be filtered through expert analysis before they can be used. Opcodes, by contrast, require no expert analysis, can be analyzed directly from the raw data, and are closely tied to the application's code, so this study uses opcodes as its static-analysis feature. We propose an application detection platform that classifies the applications under inspection using opcode sequences and machine learning. We train five classification algorithms commonly used in the static-analysis literature, J48, RandomForest (RF), NaiveBayes, LibSVM, and Partial Decision Tree (PART), and evaluate them with 10-fold cross-validation. RandomForest with 4-gram opcode sequences achieves the highest F-Measure, 97.5%.
After classification, a risk value is computed to determine whether the inspected application also contains characteristics of other malware classes, and this value is reported as a percentage that serves as a basis for judgment. |
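The abstract names 4-gram opcode sequences as the feature that works best with RandomForest, but it does not say how the sequences become classifier input. The sketch below shows one common way to do it. Every function and sample name is hypothetical, and several parts are assumptions rather than the author's method: the opcode lists are assumed to be already extracted from the disassembled application (that step is not shown), and relative n-gram frequency is only one possible encoding (binary presence is another). Each application's opcode list is split into overlapping windows of length n. The windows are counted, and each count vector is aligned to a shared vocabulary built from the training samples.

```python
from collections import Counter
from typing import Iterable, List, Tuple

NGram = Tuple[str, ...]

def opcode_ngrams(opcodes: List[str], n: int = 4) -> Counter:
    """Count contiguous opcode n-grams using a sliding window of length n."""
    return Counter(tuple(opcodes[i:i + n]) for i in range(len(opcodes) - n + 1))

def build_vocabulary(samples: Iterable[Counter]) -> List[NGram]:
    """Assumed design: the feature space is the sorted union of n-grams seen in training."""
    vocab = set()
    for counts in samples:
        vocab.update(counts)
    return sorted(vocab)

def to_feature_vector(counts: Counter, vocabulary: List[NGram]) -> List[float]:
    """Encode one app as relative n-gram frequencies (an assumed encoding)."""
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(vocabulary)
    return [counts[g] / total for g in vocabulary]

# Hypothetical opcode lists for two applications (disassembly step not shown).
apps = {
    "app_a": ["const/4", "invoke-virtual", "move-result", "if-eqz", "return-void"],
    "app_b": ["const/4", "invoke-virtual", "move-result", "if-eqz", "goto"],
}
counts = {name: opcode_ngrams(ops, n=4) for name, ops in apps.items()}
vocab = build_vocabulary(counts.values())
X = [to_feature_vector(counts[name], vocab) for name in apps]
print(X)  # [[0.5, 0.0, 0.5], [0.5, 0.5, 0.0]]
```

Each row of `X` is one application, so the rows together with their class labels can be passed to any of the five classifiers named above and evaluated with 10-fold cross-validation. Larger n captures longer instruction patterns but makes the vocabulary grow quickly, which is one plausible reason a mid-sized window such as 4 performs well.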