Fingerprint Similarity Estimation Method for Web Applications Based on Metric Learning

Please use this permanent URL to cite or link to this item: https://ir.lib.ncu.edu.tw/handle/987654321/98262

Title:	Fingerprint Similarity Estimation Method for Web Applications Based on Metric Learning (基於度量學習之網站應用特徵相似度估計方法)
Author:	Chen, Guan-Yu (陳冠宇)
Contributors:	Department of Computer Science and Information Engineering
Keywords:	Website Similarity Estimation;Metric Learning;Deep Sets;Feature Vector Embedding;Penetration Testing Support
Date:	2025-07-17
Uploaded:	2025-10-17 12:33:33 (UTC+8)
Publisher:	National Central University
Abstract:	In penetration testing practice, quickly identifying a website's system type and potential vulnerabilities is critical to investigative efficiency. Prior research indicates that similar websites often share similar vulnerability traits, which suggests strong potential for similarity analysis in security testing. In practice, websites built by the same vendor or from similar templates often share path structures, resource naming, and script design, and so exhibit consistent system behavior and stylistic features. However, existing methods mostly focus on building static databases and matching single-page features, which makes it hard to integrate the site-level features formed by cross-page structure and limits their use for the task studied here. To address this, we propose WSimNet, a website semantic similarity estimation method that combines feature sampling, a set-based neural network, and metric learning. The design draws on Deep Sets to handle varying page counts and uses Triplet Loss to build a semantic embedding space. To support model training and evaluation, we constructed a dataset of 6,174 website instances across 15 categories, collected from CMS platforms and multiple website development vendors. Experimental results show that average intra-class similarity is 156 to 244 times higher than inter-class similarity, and that average test-case recall reaches 0.9819. In Top-1 to Top-9 case recommendation tasks, mPrecision and mNDCG both stay above 0.98; in classification, Precision, Recall, and F1-score all exceed 0.96. These results show that WSimNet builds an effective semantic embedding space and produces high-quality similarity estimates, with stable performance in type identification and case recommendation, which supports its feasibility for practical security testing.
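The two mechanisms the abstract names, a Deep Sets-style set encoder for sites with varying page counts and a Triplet Loss over the resulting embeddings, can be illustrated with a minimal sketch. This is not the thesis's implementation: the mean pooling, the random weights standing in for trained parameters, the feature dimensions, the margin of 0.2, and every function name below are assumptions made for illustration.

```python
import math
import random

def make_linear(n_in, n_out, rng):
    """Random (n_out x n_in) weight matrix; a stand-in for trained parameters."""
    return [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]

def apply_linear(weights, vec, relu=False):
    """Matrix-vector product, optionally followed by ReLU."""
    out = [sum(w * x for w, x in zip(row, vec)) for row in weights]
    return [max(0.0, v) for v in out] if relu else out

def embed_site(pages, phi, rho):
    """Deep Sets-style embedding: rho(mean over pages of phi(page)).

    Mean pooling is an assumption here; it makes the result independent of
    page order and of how many pages the site has.
    """
    encoded = [apply_linear(phi, page, relu=True) for page in pages]
    pooled = [sum(col) / len(encoded) for col in zip(*encoded)]
    out = apply_linear(rho, pooled)
    norm = math.sqrt(sum(v * v for v in out)) or 1.0  # avoid dividing by zero
    return [v / norm for v in out]

def distance(a, b):
    """Euclidean distance between two embeddings."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge-style triplet loss: pull the positive closer than the negative."""
    return max(0.0, distance(anchor, positive) - distance(anchor, negative) + margin)

# Hypothetical data: each page is a 4-dimensional feature vector.
rng = random.Random(0)
phi = make_linear(4, 8, rng)   # per-page encoder
rho = make_linear(8, 3, rng)   # post-pooling encoder
site_a = [[rng.random() for _ in range(4)] for _ in range(5)]  # 5 pages
site_b = [[rng.random() for _ in range(4)] for _ in range(2)]  # 2 pages

emb_a = embed_site(site_a, phi, rho)
emb_a_reversed = embed_site(site_a[::-1], phi, rho)  # same pages, other order
emb_b = embed_site(site_b, phi, rho)
loss = triplet_loss(emb_a, emb_a_reversed, emb_b)
```

Because pooling happens before `rho`, sites with five pages and two pages map to embeddings of the same size, and reordering a site's pages leaves its embedding unchanged up to floating-point rounding. In training, the loss would be minimized over triplets of same-class and different-class sites; that step is omitted here.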
Appears in Collections:	[Graduate Institute of Computer Science and Information Engineering] Electronic Thesis & Dissertation

Files in This Item:

File	Description	Size	Format	Views
index.html		0Kb	HTML	112	View/Open

All items in NCUIR are protected by original copyright.
