基於度量學習之網站應用特徵相似度估計方法;Fingerprint Similarity Estimation Method for Web Applications Based on Metric Learning

Please use this identifier to cite or link to this item: https://ir.lib.ncu.edu.tw/handle/987654321/98262

Title:	基於度量學習之網站應用特徵相似度估計方法;Fingerprint Similarity Estimation Method for Web Applications Based on Metric Learning
Authors:	陳冠宇;Chen, Guan-Yu
Contributors:	Department of Computer Science and Information Engineering
Keywords:	Website Similarity Estimation;Metric Learning;Deep Sets;Feature Vector Embedding;Penetration Testing Support
Date:	2025-07-17
Issue Date:	2025-10-17 12:33:33 (UTC+8)
Publisher:	National Central University
Abstract:	In penetration testing practice, quickly identifying a website's system type and potential vulnerabilities is critical to investigative efficiency. Prior research indicates that similar websites often share similar vulnerability traits, which suggests strong potential for similarity analysis in security testing. In practice, websites built by the same vendor or from similar templates also tend to share similar path structures, resource naming, and script design, and so exhibit consistent system behavior and stylistic features. However, existing methods focus mainly on building static databases and matching single-page features, which makes it difficult to integrate the site-level characteristics formed by cross-page structure and limits their usefulness for the task studied here. To address this, the thesis proposes WSimNet, a website semantic similarity estimation method that combines feature sampling, a set-based neural network, and metric learning. Its design draws on the Deep Sets concept to handle a varying number of pages per site and uses Triplet Loss to build a semantic embedding space. For model training and evaluation, the author collected a dataset of 6,174 website instances across 15 categories, drawn from CMS platforms and multiple website development vendors. Experimental results show that average intra-class similarity is 156 to 244 times higher than inter-class similarity, and that average test-case recall reaches 0.9819. In Top-1 to Top-9 case recommendation tasks, mPrecision and mNDCG both stay above 0.98; in the classification task, Precision, Recall, and F1-score all exceed 0.96. These results indicate that WSimNet builds an effective semantic embedding space and produces high-quality similarity estimates, with stable performance in type identification and case recommendation for security testing.
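The two ideas the abstract names, a Deep Sets-style encoder for a variable number of pages and a Triplet Loss over site embeddings, can be illustrated with a minimal NumPy sketch. This is not the WSimNet implementation: the layer sizes, the random weights, the choice of mean pooling, the margin value, and the names `embed_site` and `triplet_loss` are all assumptions made for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: per-page feature dim, hidden dim, embedding dim.
D_IN, D_HID, D_OUT = 8, 16, 4
W_phi = rng.normal(scale=0.5, size=(D_IN, D_HID))   # per-page encoder (phi), untrained
W_rho = rng.normal(scale=0.5, size=(D_HID, D_OUT))  # set-level map (rho), untrained


def embed_site(pages: np.ndarray) -> np.ndarray:
    """Map a (num_pages, D_IN) array to one unit-length site embedding."""
    h = np.maximum(pages @ W_phi, 0.0)   # phi applied to every page independently
    pooled = h.mean(axis=0)              # order-independent pooling (assumed: mean)
    z = pooled @ W_rho                   # rho applied to the pooled representation
    return z / (np.linalg.norm(z) + 1e-12)


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge loss that pushes the positive closer to the anchor than the negative."""
    d_pos = float(np.sum((anchor - positive) ** 2))
    d_neg = float(np.sum((anchor - negative) ** 2))
    return max(0.0, d_pos - d_neg + margin)


# Three made-up sites with different page counts (5, 12 and 3 pages).
site_a = rng.normal(size=(5, D_IN))
site_b = rng.normal(size=(12, D_IN))
site_c = rng.normal(size=(3, D_IN))

emb_a, emb_b, emb_c = (embed_site(s) for s in (site_a, site_b, site_c))
loss = triplet_loss(emb_a, emb_b, emb_c)
```

Because pooling is a mean over pages, the embedding has the same size regardless of page count and does not change when pages are reordered, which is the property the abstract attributes to the Deep Sets design.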
Appears in Collections:	[Graduate Institute of Computer Science and Information Engineering] Electronic Thesis & Dissertation
