Integrating Filter, Wrapper, and Embedded Methods for Ensemble Feature Selection in Software Defect Prediction (結合過濾法、包裝法及嵌入法之集成式特徵選擇於軟體缺陷預測中之應用)


Please use this permanent URL to cite or link to this item: http://ir.lib.ncu.edu.tw/handle/987654321/92578

Title:	Integrating Filter, Wrapper, and Embedded Methods for Ensemble Feature Selection in Software Defect Prediction (結合過濾法、包裝法及嵌入法之集成式特徵選擇於軟體缺陷預測中之應用)
Author:	Wu, Kuan-Yu (吳冠諭)
Contributor:	Department of Information Management
Keywords:	Software Defect Prediction;Machine Learning;Feature Selection;Ensemble Feature Selection;High-dimensional Data
Date:	2023-07-14
Upload time:	2023-10-04 16:05:34 (UTC+8)
Publisher:	National Central University
Abstract:	Software testing is an important part of the software development life cycle and takes up a large share of its time. If defect-prone modules can be predicted effectively and fixed in advance, considerable cost can be saved and higher-quality products delivered. Software defect prediction techniques are therefore applied to help developers reduce testing costs. Software metrics are one way to obtain an objective description of source code, and the resulting metrics are often used in software debugging. This study uses software defect prediction datasets from NASA MDP and PROMISE, which extract multiple static software metrics from source code as input features for machine learning models. However, the high dimensionality of these datasets can lead to training complexity and overfitting. To address this, the study adopts ensemble feature selection to reduce the dimensionality of the datasets before training. Unlike previous studies in software defect prediction, it combines three types of feature selection techniques (filter, wrapper, and embedded methods) with three aggregation methods for generating feature subsets: intersection, union, and multi-intersection. This combination aims to overcome the limitations of any single feature selection method and thereby improve software defect prediction performance. The results show that union-based ensemble feature selection achieves higher prediction accuracy than single feature selection methods while maintaining a good feature reduction rate.
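The three aggregation methods named in the abstract can be illustrated as plain set operations over the feature subsets returned by the individual selectors. The following is a minimal sketch, not the thesis's implementation. It rests on several assumptions: the record does not define multi-intersection, so it is taken here to mean features chosen by at least two selectors; the feature reduction rate is assumed to be the fraction of original features that were dropped; and the metric names and the total of 20 features are hypothetical placeholders.

```python
from itertools import combinations


def union(subsets):
    """Features chosen by at least one selector."""
    return set().union(*subsets)


def intersection(subsets):
    """Features chosen by every selector (needs at least one subset)."""
    return set.intersection(*map(set, subsets))


def multi_intersection(subsets):
    """Features chosen by at least two selectors.

    Assumed definition: the union of all pairwise intersections.
    """
    result = set()
    for a, b in combinations(subsets, 2):
        result |= set(a) & set(b)
    return result


def reduction_rate(selected, n_features):
    """Assumed definition: share of the original features that were removed."""
    return 1 - len(selected) / n_features


# Hypothetical outputs of a filter, a wrapper, and an embedded selector.
filter_set = {"loc", "v(g)", "branch_count"}
wrapper_set = {"loc", "v(g)", "n_operands"}
embedded_set = {"loc", "branch_count", "n_comments"}
subsets = [filter_set, wrapper_set, embedded_set]

N_FEATURES = 20  # hypothetical number of metrics in the original dataset

for name, aggregate in [
    ("union", union),
    ("intersection", intersection),
    ("multi-intersection", multi_intersection),
]:
    chosen = aggregate(subsets)
    print(name, sorted(chosen), round(reduction_rate(chosen, N_FEATURES), 2))
```

Under these assumptions, union keeps the most features and intersection the fewest, with multi-intersection in between, which is the trade-off between accuracy and feature reduction that the abstract refers to.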
Appears in collections:	[Graduate Institute of Information Management] Theses and Dissertations

