結合微調大型語言模型與圖神經網路之 Android 原始碼漏洞類型檢測;Android Source Code Vulnerability Type Detection via Fine-tuned Large Language Model and Graph Neural Network Integration


Please use this permanent URL to cite or link to this item: https://ir.lib.ncu.edu.tw/handle/987654321/98408

Title:	結合微調大型語言模型與圖神經網路之 Android 原始碼漏洞類型檢測; Android Source Code Vulnerability Type Detection via Fine-tuned Large Language Model and Graph Neural Network Integration
Author:	塗家瑋; Tu, Chia-Wei
Contributor:	Department of Information Management (資訊管理學系)
Keywords:	Large Language Model; Graph Neural Network; Android; Vulnerability Detection; Source Code
Date:	2025-08-01
Uploaded:	2025-10-17 12:45:17 (UTC+8)
Publisher:	National Central University (國立中央大學)
Abstract:	As mobile devices proliferate and software systems grow more complex, Android application security has become increasingly critical, and source code vulnerabilities are a major source of risk. Most existing work on Android source code vulnerability detection still relies on traditional machine learning, whose feature engineering depends on expert experience and struggles to model complex semantic and structural information. Large Language Models (LLMs) have recently shown strong potential on code understanding tasks, but their limited grasp of program structure constrains their vulnerability detection performance. This study uses the open-source Code LLaMA as the basis for semantic features and fine-tunes it with QLoRA to reduce training cost and resource consumption. To compensate for the LLM's weak structural understanding, it further learns structural features from Code Property Graphs (CPGs) with a Graph Convolutional Network (GCN). To improve the efficiency and accuracy of graph learning, it proposes a graph slicing strategy centered on sensitive behaviors and common vulnerability patterns, which condenses key nodes and dependencies and removes redundant information unrelated to vulnerabilities. Finally, a fusion model that combines semantic and structural features performs vulnerability type classification. Experiments show that the proposed method reaches a 97.23% F1-score on multi-class vulnerability type classification, about 3% higher than traditional baselines such as ACVED.
Appears in collections:	[Graduate Institute of Information Management] Theses and Dissertations
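
The abstract mentions slicing the code graph around vulnerability-relevant behavior before graph learning, but this record does not give the algorithm. The following is only a minimal sketch of one plausible reading, assuming the slice keeps every node within a fixed number of hops of a "seed" node such as a sensitive API call. The function name `slice_graph`, the `max_hops` parameter, and the toy node labels are all hypothetical and are not taken from the thesis.

```python
from collections import deque


def slice_graph(edges, seeds, max_hops=2):
    """Keep nodes within max_hops of any seed, following edges in both directions.

    Assumption: a hop-limited neighborhood stands in for the thesis's
    slicing strategy, whose actual rules are not stated in this record.
    """
    # Build an undirected adjacency map so both predecessors and
    # successors of a seed are reachable.
    neighbors = {}
    for src, dst in edges:
        neighbors.setdefault(src, set()).add(dst)
        neighbors.setdefault(dst, set()).add(src)

    kept = set(seeds)
    queue = deque((seed, 0) for seed in seeds)
    while queue:
        node, dist = queue.popleft()
        if dist == max_hops:
            continue  # do not expand beyond the hop limit
        for nxt in neighbors.get(node, ()):
            if nxt not in kept:
                kept.add(nxt)
                queue.append((nxt, dist + 1))

    # Retain only edges whose endpoints both survive the slice.
    sliced_edges = [(s, d) for s, d in edges if s in kept and d in kept]
    return kept, sliced_edges


# Hypothetical toy dependency graph: one chain touches a sensitive call,
# the other is unrelated UI code.
edges = [
    ("read_input", "build_query"),
    ("build_query", "rawQuery"),
    ("rawQuery", "show_result"),
    ("init_ui", "set_theme"),
    ("set_theme", "log_theme"),
]
kept, sliced = slice_graph(edges, seeds={"rawQuery"}, max_hops=2)
print(sorted(kept))
```

In this toy graph the unrelated UI chain is dropped and the three edges around the seed are kept, which is the kind of noise reduction the abstract describes. A real slice would more likely follow typed data and control dependencies in the CPG than plain hop counts.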

Files in this item:

File	Description	Size	Format	Views
index.html		0Kb	HTML	56

All items in NCUIR are protected by original copyright.
