Abstract: | In recent years, as smartphones have become widespread, Android malware has become an increasingly serious problem, leaking users' private information and in turn causing actual financial losses. To address this, researchers have adopted various techniques to identify and classify malware, including static analysis, dynamic analysis, and machine learning. However, many obfuscated malware samples now in circulation can bypass existing detection methods, lowering detection rates. In response, many researchers have turned to dynamic analysis to handle obfuscated malware. Dynamic analysis, however, requires actually executing the application to capture dynamic features, and preprocessing becomes very time-consuming when the dataset is large. Static analysis, in contrast, does not require executing the application, so its preprocessing time is much shorter, but commonly used features such as API_CALL are susceptible to obfuscation, which reduces model accuracy. To overcome this problem, this study proposes a dedicated preprocessing method that applies a vector transformation to static features, minimizing the effect of obfuscation on them.
This study also incorporates taint analysis to improve the accuracy and efficiency of Android malware detection. The method achieves 99% accuracy on the unobfuscated dataset and 97.8% accuracy on the obfuscated dataset, and reduces preprocessing time nearly 20-fold compared with dynamic analysis. |