Master's and Doctoral Theses: Detailed Record for 106423003




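Author  Cheng-Hao Hsu (徐振皓)   Department  Department of Information Management
Title  A Novel Preprocessing Dimensionality-Reduction Method for the LSTM Long-Sequence Problem: A Case Study of Android Malware Analysis
(A Novel Preprocessing Method for Solving Long Sequence Problem in Android Malware Detection)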
Files  Full text viewable in the system (open after 2021-7-31)
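Abstract (Chinese)  Android currently holds the largest share of the smartphone market, while the volume of malware has been growing by multiples. Traditional malware detection methods apply machine learning to a variety of features, such as APIs, system calls, control flow, and permissions; however, these features are easily modified and obfuscated by attackers. In addition, traditional machine learning mostly relies on N-grams followed by feature selection, which is computationally expensive and requires features to be re-extracted whenever new samples appear. Sequence-based deep learning models such as LSTM also run into the long sequence problem when raw data is fed into the model. The long sequence problem means that the longer the input, the harder it is for the model to remember early features, a phenomenon known as vanishing gradients. Some studies therefore reduce dimensionality by training an Embedding layer or an Autoencoder, that is, by projecting the features into another space, but the trained result changes whenever the dataset changes. This thesis proposes an Android malware detection architecture based on deep learning and a novel preprocessing compression technique. It uses low-level opcodes as features, which are semantically rich and hard to tamper with, and proposes a novel preprocessing dimensionality-reduction method that reduces the amount of model input data during preprocessing. This addresses the long sequence problem encountered in deep learning and enables fast detection and flexible model training. When new features and new samples emerge in the future, the existing model can also be extended easily. This study feeds the preprocessed opcode feature vectors into an LSTM model; experimental results show that a family classification model with an accuracy of up to 95.58% can be trained in under 3 minutes.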
Abstract (English)  Traditional machine learning mostly uses N-gram methods to make predictions on sequential data, which is time-consuming in preprocessing and computationally expensive for the model. Current malware detection methods commonly apply machine learning to a variety of features such as APIs, system calls, control flow, and permissions. However, these features depend on expert analysis, and extracting multiple features is also time-consuming. This study uses Dalvik opcodes as features, which are information-rich and easy to extract. However, because opcode features form long sequences of variable length, LSTM and other sequence models need an effective dimensionality-reduction approach; otherwise the long sequence problem leads to poor training performance and long training times. Some studies train an Embedding layer or an Autoencoder to reduce the feature dimension, which adds the training time of an extra network layer. Another approach is feature selection, but its results change whenever the dataset changes, and the sequence semantics can be lost after selection. To solve these problems, this thesis proposes a new preprocessing method that addresses the long sequence problem encountered by the LSTM model, so as to achieve fast training and high accuracy. Combining the new preprocessing approach with an LSTM model to detect malware, the study achieves 95.58% accuracy on the Drebin 10-family classification task and takes only 45 seconds to train a model. The experiments also show that the approach remains effective on the small-training-sample problem common in deep learning.
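The N-gram baseline that the abstract contrasts against can be illustrated with a minimal sketch. It is not the thesis's preprocessing method, which this record does not describe. The function names `opcode_ngrams` and `max_vocabulary` and the toy opcode sequence are hypothetical. The alphabet size of 256 is an assumed upper bound, based on a Dalvik opcode occupying one byte.

```python
from collections import Counter
from typing import Sequence


def opcode_ngrams(opcodes: Sequence[str], n: int) -> Counter:
    """Count the contiguous n-grams in one opcode sequence."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return Counter(
        tuple(opcodes[i:i + n]) for i in range(len(opcodes) - n + 1)
    )


def max_vocabulary(alphabet_size: int, n: int) -> int:
    """Upper bound on the number of distinct n-grams over an alphabet."""
    return alphabet_size ** n


# Hypothetical toy sequence: the opcode names are Dalvik mnemonics,
# but the ordering is made up for illustration.
sample = [
    "const/4", "invoke-virtual", "move-result",
    "const/4", "invoke-virtual", "move-result",
    "return-void",
]

bigrams = opcode_ngrams(sample, 2)
for gram, count in bigrams.most_common():
    print(gram, count)

# Assumed alphabet of 256 opcodes: the candidate feature space
# grows exponentially with n.
for n in (1, 2, 3):
    print(n, max_vocabulary(256, n))
```

Even for n = 3, the bound already exceeds sixteen million candidate features, which is one way to see why N-gram pipelines need a feature-selection step that must be redone when the dataset changes.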
Keywords (Chinese) ★ Android
★ Static analysis
★ Opcode
★ Preprocessing
★ Malware classification
★ LSTM
Keywords (English) ★ Android
★ Static analysis
★ opcode
★ Preprocessing
★ LSTM
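Table of contents  Abstract (Chinese)
Abstract
Acknowledgments
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
1-1 Research Background
1-2 Research Motivation
1-3 Research Contributions
1-4 Chapter Organization
Chapter 2 Related Work
2-1 Recent Literature Using Traditional Machine Learning Methods
2-2 Literature Using Deep Learning
2-3 Related Research on Long Sequences and Data Dimensionality Reduction
2-4 Summary
Chapter 3 System Architecture
3-1 System Architecture
3-1-1 Decompile Module
3-1-2 Vector Extraction Module
3-1-3 Classification Module
3-2 System Workflow
Chapter 4 Experimental Results
4-1 Experimental Environment and Datasets
4-2 Experiment 1: Malware Family Classification
4-3 Experiment 2: Malware Binary Classification
4-4 Experiment 3: Malware Family Classification with Few Samples
4-5 Experiment 4: Comparison with Deep Learning
4-6 Experiment 5: Comparison with Dynamic Deep Learning
4-7 Experiment 6: Small-Sample Test
4-8 Experimental Results and Discussion
Chapter 5 Conclusions and Future Contributions
5-1 Conclusions and Contributions
5-2 Research Limitations and Future Research
References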
Advisor  Yi-Ming Chen (陳奕明)   Approval date  2019-7-29

For questions about this thesis, please contact the Outreach Services Section of the National Central University Library, TEL: (03)422-7151 ext. 57407, or by e-mail.