Investigating Android Vulnerability Line Detection and Explainability via Retrieval-Augmented Generation with Large Language Models

Please use this identifier to cite or link to this item: https://ir.lib.ncu.edu.tw/handle/987654321/98403

Title:	Investigating Android Vulnerability Line Detection and Explainability via Retrieval-Augmented Generation with Large Language Models
Authors:	楊晴閔;Yang, Qing-Min
Contributors:	Department of Information Management
Keywords:	Retrieval-Augmented Generation (RAG); Large Language Models; Data Flow Graph; Android Vulnerability Detection; Few-shot Prompting; Explainability
Date:	2025-08-01
Issue Date:	2025-10-17 12:44:50 (UTC+8)
Publisher:	National Central University
Abstract:	With the rapid growth of Android applications, strengthening source-level vulnerability detection has become a pressing concern in cybersecurity. Although Large Language Models (LLMs) have shown promise in code analysis and vulnerability localization, they remain prone to hallucinations (irrelevant or incorrect outputs), especially in semantically ambiguous scenarios, which lowers both prediction accuracy and interpretability. To address this issue, this study proposes a line-level vulnerability detection and explanation framework based on Retrieval-Augmented Generation (RAG). The approach constructs Data Flow Graphs (DFGs) to represent code semantics and builds an external knowledge base of semantically similar code snippets, which are supplied to the LLM as context for line-level localization. In addition, few-shot prompts tailored to the target CWE-ID type guide the LLM toward the relevant patterns, yielding more accurate localization and explanation. To evaluate explanation quality, BERTScore is used to measure the semantic similarity between each generated explanation and the ground truth.
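The retrieval and few-shot prompting steps described above can be illustrated with a minimal Python sketch. It is not the thesis implementation: the `Example` record, the `retrieve` and `build_prompt` helpers, the sample knowledge-base entries, and the use of Jaccard similarity over identifier tokens (a crude stand-in for the DFG-based semantic representation, whose details are not given here) are all hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass
class Example:
    """Hypothetical knowledge-base entry: snippet, CWE-ID, vulnerable line, explanation."""
    code: str
    cwe_id: str
    vulnerable_line: int
    explanation: str


IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def tokens(code: str) -> set[str]:
    # Identifier tokens; an assumed stand-in for a DFG-based representation.
    return set(IDENT.findall(code))


def jaccard(a: set[str], b: set[str]) -> float:
    # |a & b| / |a | b|, defined as 0.0 when both sets are empty.
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def retrieve(query: str, kb: list[Example], cwe_id: str, k: int = 2) -> list[Example]:
    # Keep only entries of the target CWE-ID, rank them by similarity, return the top k.
    q = tokens(query)
    same_cwe = [ex for ex in kb if ex.cwe_id == cwe_id]
    same_cwe.sort(key=lambda ex: jaccard(q, tokens(ex.code)), reverse=True)
    return same_cwe[:k]


def build_prompt(query: str, shots: list[Example], cwe_id: str) -> str:
    # Few-shot prompt: task statement, retrieved examples, then the numbered target code.
    parts = [f"Task: report the line number(s) vulnerable to {cwe_id} and explain why."]
    for i, ex in enumerate(shots, 1):
        parts.append(
            f"Example {i}:\n{ex.code}\n"
            f"Vulnerable line: {ex.vulnerable_line}\nExplanation: {ex.explanation}"
        )
    numbered = "\n".join(f"{n}: {line}" for n, line in enumerate(query.splitlines(), 1))
    parts.append(f"Target code:\n{numbered}\nVulnerable line:")
    return "\n\n".join(parts)


# Hypothetical knowledge base and query, for illustration only.
kb = [
    Example('db.rawQuery("SELECT * FROM users WHERE name=" + name, null);',
            "CWE-89", 1, "User input is concatenated into a raw SQL query."),
    Example('Log.d("auth", "password=" + password);',
            "CWE-532", 1, "A credential is written to the system log."),
    Example('Cursor c = db.rawQuery("SELECT * FROM notes WHERE id=" + id, null);',
            "CWE-89", 1, "An unsanitized id is concatenated into the query."),
]
query = ('String q = input.getText().toString();\n'
         'Cursor c = db.rawQuery("SELECT * FROM items WHERE name=" + q, null);')
print(build_prompt(query, retrieve(query, kb, "CWE-89"), "CWE-89"))
```

Filtering by CWE-ID before ranking mirrors the idea of steering the model toward knowledge relevant to the target vulnerability type; a real system would more likely rank by embedding or graph similarity than by token overlap.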
Experiments on an Android vulnerability dataset generated with MobSF show that the RAG-based method significantly outperforms the original model: the F1-score improves by 48% and precision by 34%. Furthermore, explanations generated in the few-shot setting achieve a BERTScore above 0.85. These results indicate that RAG not only enhances line-level vulnerability localization but also effectively mitigates hallucinations, contributing to better interpretability and robustness in code understanding tasks.
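The results above are reported as line-level precision and F1-score for localization and BERTScore for explanations. The exact matching rule is not stated here, so the sketch below rests on assumptions: a predicted line counts as correct only if it equals a ground-truth line number, and counts are pooled over all samples (micro-averaging). The BERTScore part shows only the metric's greedy cosine-matching core on toy vectors; the full metric (for example, as computed by the `bert_score` package) uses contextual token embeddings from a pretrained model and optionally adds idf weighting and baseline rescaling, none of which is reproduced. The helper names and sample data are hypothetical.

```python
import math


def line_level_scores(
    predictions: dict[str, set[int]], ground_truth: dict[str, set[int]]
) -> tuple[float, float, float]:
    # Micro-averaged precision/recall/F1 over exact line-number matches (assumed rule).
    tp = fp = fn = 0
    for sample_id, truth in ground_truth.items():
        pred = predictions.get(sample_id, set())  # no prediction: every true line is missed
        tp += len(pred & truth)
        fp += len(pred - truth)
        fn += len(truth - pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def greedy_bertscore(
    ref: list[list[float]], cand: list[list[float]]
) -> tuple[float, float, float]:
    # BERTScore core: each token is matched to its most cosine-similar token on the other side.
    # Assumes non-empty inputs and non-zero vectors.
    def cos(u: list[float], v: list[float]) -> float:
        dot = sum(a * b for a, b in zip(u, v))
        return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

    sim = [[cos(r, c) for c in cand] for r in ref]  # rows: reference, columns: candidate
    recall = sum(max(row) for row in sim) / len(ref)
    precision = sum(max(sim[i][j] for i in range(len(ref))) for j in range(len(cand))) / len(cand)
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical data: sample "b" misses line 5, sample "a" adds a spurious line 30.
truth = {"a": {12}, "b": {5, 9}}
preds = {"a": {12, 30}, "b": {9}}
print(line_level_scores(preds, truth))  # precision, recall and F1 are each 2/3

ref_emb = [[1.0, 0.0], [0.0, 1.0]]  # toy reference-token vectors
cand_emb = [[1.0, 0.0]]             # toy candidate-token vectors
print(greedy_bertscore(ref_emb, cand_emb))  # precision 1.0, recall 0.5, F1 2/3
```

Micro-averaging weights every vulnerable line equally; a per-sample (macro) average would instead weight every file or snippet equally and can give noticeably different numbers on skewed data.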
Appears in Collections:	[Graduate Institute of Information Management] Electronic Thesis & Dissertation

