Integrating Filter, Wrapper, and Embedded Methods for Ensemble Feature Selection in Software Defect Prediction

DC Field	Value	Language
DC.contributor	資訊管理學系	zh_TW
DC.creator	吳冠諭	zh_TW
DC.creator	Kuan-Yu Wu	en_US
dc.date.accessioned	2023-07-14T07:39:07Z
dc.date.available	2023-07-14T07:39:07Z
dc.date.issued	2023
dc.identifier.uri	http://ir.lib.ncu.edu.tw:88/thesis/view_etd.asp?URN=110423060
dc.contributor.department	資訊管理學系	zh_TW
DC.description	國立中央大學	zh_TW
DC.description	National Central University	en_US
dc.description.abstract	軟體測試是軟體開發生命週期中一項重要的工作，其在整個週期中佔了大量的時間，如果能針對容易出現缺陷的模組進行有效預測並事先修復，將可節省許多成本並交付更高品質的產品，因此軟體缺陷預測技術被應用於幫助開發人員降低其測試成本，其中，軟體度量是一種獲得原始碼客觀特徵描述的方法，所產生的指標也常被用於軟體偵錯。本研究使用NASA MDP與PROMISE的軟體缺陷預測資料集，這些資料集透過提取原始碼的多項靜態軟體度量指標作為機器學習模型的輸入特徵，然而因資料集屬於高維度資料，容易導致訓練上的複雜性及過擬合(Overfitting)問題。為解決此問題，本研究採用集成式特徵選擇，降低資料集維度再進行訓練，且不同於過往軟體缺陷預測領域的研究，本研究結合三種不同類型的特徵選擇技術，分別為過濾法(Filter)、包裝法(Wrapper)和內嵌法(Embedded)，並搭配三種聚合方法來產生特徵子集，包括交集(Intersection)、聯集(Union)和多重交集(Multi-intersection)，希望藉此克服單一特徵選擇方法的局限性，進而提升軟體缺陷預測的性能表現。研究結果顯示，基於聯集的集成式特徵選擇方法相較於單一特徵選擇擁有更高的預測準確率，同時也維持了良好的特徵縮減率。	zh_TW
dc.description.abstract	Software testing is an important stage of the software development life cycle and takes up a significant share of its time. If defect-prone modules can be predicted and fixed in advance, considerable cost can be saved and higher-quality products delivered. Software defect prediction techniques are therefore applied to help developers reduce testing costs. Software metrics are one way to obtain objective descriptions of source code, and the resulting metrics are often used for software debugging. This study used software defect prediction datasets from NASA MDP and PROMISE. These datasets extract multiple static software metrics from the source code as input features for machine learning models. However, the datasets’ high dimensionality can lead to training complexity and overfitting. To address this, the study adopted ensemble feature selection to reduce the dimensionality of the datasets before training. Unlike previous studies in software defect prediction, it integrates three types of feature selection techniques: filter, wrapper, and embedded methods. Three aggregation methods are used to generate feature subsets: intersection, union, and multi-intersection. This combination aims to overcome the limitations of any single feature selection method and thereby improve software defect prediction performance. The results indicate that union-based ensemble feature selection provides higher prediction accuracy than single feature selection methods while maintaining a good feature reduction rate.	en_US
DC.subject	軟體缺陷預測	zh_TW
DC.subject	機器學習	zh_TW
DC.subject	特徵選擇	zh_TW
DC.subject	集成式特徵選擇	zh_TW
DC.subject	高維度資料	zh_TW
DC.subject	Software Defect Prediction	en_US
DC.subject	Machine Learning	en_US
DC.subject	Feature Selection	en_US
DC.subject	Ensemble Feature Selection	en_US
DC.subject	High-dimensional Data	en_US
DC.title	結合過濾法、包裝法及嵌入法之集成式特徵選擇於軟體缺陷預測中之應用	zh_TW
dc.language.iso	zh-TW	zh-TW
DC.title	Integrating Filter, Wrapper, and Embedded Methods for Ensemble Feature Selection in Software Defect Prediction	en_US
DC.type	博碩士論文	zh_TW
DC.type	thesis	en_US
DC.publisher	National Central University	en_US

Full metadata record for thesis 110423060
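
The aggregation step described in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the function names `aggregate` and `reduction_rate`, the metric names, and the three example subsets are all hypothetical. The record also does not define "multi-intersection"; the sketch assumes it means features chosen by at least two of the selectors.

```python
from __future__ import annotations

from collections import Counter


def aggregate(subsets: list[set[str]], method: str) -> set[str]:
    """Combine the feature subsets chosen by several selectors.

    `method` is one of "union", "intersection", "multi_intersection".
    ASSUMPTION: "multi_intersection" keeps features picked by at least
    two selectors; the source record does not spell out its definition.
    """
    if not subsets:
        return set()
    if method == "union":
        return set().union(*subsets)
    if method == "intersection":
        return set.intersection(*subsets)
    if method == "multi_intersection":
        # Count how many selectors chose each feature.
        counts = Counter(f for s in subsets for f in s)
        return {f for f, c in counts.items() if c >= 2}
    raise ValueError(f"unknown aggregation method: {method}")


def reduction_rate(n_original: int, selected: set[str]) -> float:
    """Fraction of the original features removed by selection."""
    return 1 - len(selected) / n_original


# Hypothetical outputs of a filter, a wrapper, and an embedded selector.
filter_sel = {"loc", "cyclomatic_complexity", "halstead_volume"}
wrapper_sel = {"loc", "branch_count"}
embedded_sel = {"loc", "cyclomatic_complexity", "comment_lines"}
subsets = [filter_sel, wrapper_sel, embedded_sel]

union = aggregate(subsets, "union")
intersection = aggregate(subsets, "intersection")
multi = aggregate(subsets, "multi_intersection")

print(sorted(union))
print(sorted(intersection))
print(sorted(multi))
# Assuming the dataset originally had 10 metrics:
print(reduction_rate(10, union))
```

Under these assumptions the union keeps the most features and the intersection the fewest, which is the trade-off between accuracy and feature reduction rate that the abstract refers to.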