Credit risk management is a core concern for banks, so accurately assessing high-risk loans and building reliable credit scoring models are critical. Traditional machine learning algorithms perform well on balanced data, but when class distributions are imbalanced, these models tend to favor the majority class (i.e., good credit) while neglecting the minority class that matters most (i.e., poor credit). This bias can cause borrowers with poor credit to be misclassified as having good credit, potentially exposing financial institutions to significant losses when those borrowers default.
To address the class imbalance problem, this study combined feature selection and resampling techniques and applied them to five credit risk datasets collected from public platforms. It employed three feature selection methods and eight resampling techniques and conducted extensive experiments with six different classifiers. Through a systematic comparative analysis, the study evaluated the performance of individual and combined preprocessing techniques and examined how the order in which these techniques are applied affects the models' predictions.
This research offers an effective combined preprocessing strategy for credit risk management: first resample the data to balance the classes, then apply feature selection to retain representative features. Compared with applying either technique alone, this strategy effectively improves the models' predictive performance, especially on small, highly imbalanced datasets. The strategy can help improve credit scoring models, enabling more accurate identification and management of high-risk loans.
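To make the two orderings concrete, here is a minimal, dependency-free sketch. It is an illustration under stated assumptions, not the study's implementation. This summary does not name the specific methods, so random oversampling stands in for the resampling step and a binary Fisher-score filter stands in for feature selection. The sketch assumes binary labels (1 = poor credit, the minority class), and every function name is hypothetical. The only difference between the two pipelines is whether features are scored before or after the classes are balanced.

```python
import random
from statistics import mean, pvariance


def random_oversample(X, y, seed=0):
    """Stand-in resampler: duplicate random rows of smaller classes until all match the majority count."""
    rng = random.Random(seed)
    counts = {c: y.count(c) for c in set(y)}
    target = max(counts.values())
    X_out, y_out = [row[:] for row in X], list(y)
    for c in sorted(counts):
        idx = [i for i, label in enumerate(y) if label == c]
        for _ in range(target - counts[c]):
            i = rng.choice(idx)
            X_out.append(X[i][:])
            y_out.append(c)
    return X_out, y_out


def fisher_scores(X, y):
    """Stand-in filter: binary Fisher score per feature, (mean_0 - mean_1)^2 / (var_0 + var_1)."""
    scores = []
    for j in range(len(X[0])):
        a = [row[j] for row, label in zip(X, y) if label == 0]
        b = [row[j] for row, label in zip(X, y) if label == 1]
        # Small constant avoids division by zero when a feature is constant in both classes.
        scores.append((mean(a) - mean(b)) ** 2 / (pvariance(a) + pvariance(b) + 1e-12))
    return scores


def top_k_features(X, y, k):
    """Indices of the k highest-scoring features, returned in original column order."""
    scores = fisher_scores(X, y)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])


def project(X, keep):
    """Keep only the selected feature columns."""
    return [[row[j] for j in keep] for row in X]


def resample_then_select(X, y, k, seed=0):
    """Order the abstract recommends: balance first, then score features on the balanced data."""
    X_bal, y_bal = random_oversample(X, y, seed)
    keep = top_k_features(X_bal, y_bal, k)
    return project(X_bal, keep), y_bal, keep


def select_then_resample(X, y, k, seed=0):
    """Reverse order, for comparison: score features on the imbalanced data, then balance."""
    keep = top_k_features(X, y, k)
    X_bal, y_bal = random_oversample(project(X, keep), y, seed)
    return X_bal, y_bal, keep


# Hypothetical toy training split; resampling should be applied to training data only.
X = [[1.0, 5.0, 0.3], [1.2, 4.8, 0.1], [0.9, 5.1, 0.4],
     [1.1, 5.0, 0.2], [3.0, 5.0, 0.3], [3.2, 4.9, 0.2]]
y = [0, 0, 0, 0, 1, 1]
X_sel, y_bal, keep = resample_then_select(X, y, k=1)
print(keep, len(y_bal), y_bal.count(1))  # prints: [0] 8 4
```

On this toy data both orders happen to keep the same feature. On real credit datasets, scoring features after balancing means the minority class carries equal weight in the feature ranking. That is one plausible reason the resample-first order can help, though the sketch does not reproduce the study's experiments.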