Combining Feature Selection and Resampling Techniques for Credit Risk Prediction

Please use this persistent URL to cite or link to this item: https://ir.lib.ncu.edu.tw/handle/987654321/95453

Title:	Combining Feature Selection and Resampling Techniques for Credit Risk Prediction (結合特徵選取與重採樣技術應用於信用風險預測)
Author:	Chen, Yi-Hsien (陳奕嫻)
Contributor:	Department of Information Management, In-service Master's Program
Keywords:	Credit Risk; Feature Selection; Resampling; Imbalanced Data; Machine Learning; Data Mining
Date:	2024-06-11
Uploaded:	2024-10-09 16:52:04 (UTC+8)
Publisher:	National Central University
Abstract:	Credit risk management is a core concern for banks, so accurately assessing high-risk loans and building reliable credit scoring models are essential. Traditional machine learning algorithms perform well on balanced data, but when the class distribution is imbalanced, these models tend to favor the majority class (good credit) and neglect the minority class that matters most (poor credit). This bias can cause poor-credit applicants to be misclassified as good credit, exposing financial institutions to substantial losses when those borrowers default. To address the imbalance problem, this study combines feature selection and resampling techniques. Five credit risk datasets were collected from public platforms, and extensive experiments were conducted with three feature selection methods, eight resampling techniques, and six different classifiers. Through systematic comparative analysis, the study evaluates individual and combined preprocessing techniques and examines how the order in which they are applied affects prediction results. The study proposes an effective preprocessing strategy for credit risk management: first resample to balance the dataset, then apply feature selection to choose representative features. Compared with applying either technique alone, this strategy effectively improves the predictive performance of the models, particularly on small, highly imbalanced datasets. It can help improve credit scoring models and thereby enable more accurate identification and management of high-risk loans.
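The abstract does not name which three feature selection methods or eight resampling techniques were used, so the following is only a minimal sketch of the recommended ordering (resample first, then select features) under stated assumptions. Random oversampling stands in for the resampling step, and an absolute-Pearson-correlation filter stands in for feature selection. The function names (`random_oversample`, `abs_correlation`, `select_top_k`, `resample_then_select`) and the toy data are hypothetical, and the code uses only the Python standard library.

```python
import random

# Illustrative sketch only: the thesis's actual resampling and feature
# selection methods are not specified in the abstract.


def random_oversample(X, y, seed=0):
    """Randomly duplicate minority-class rows until both classes are equal in size.

    Assumes binary labels (0 = good credit, 1 = poor credit). Original rows are
    kept first in their original order; duplicated rows are appended after them.
    """
    rng = random.Random(seed)
    pos = [i for i, label in enumerate(y) if label == 1]
    neg = [i for i, label in enumerate(y) if label == 0]
    minority, majority = (pos, neg) if len(pos) < len(neg) else (neg, pos)
    if not minority:  # nothing to duplicate; return copies unchanged
        return [list(row) for row in X], list(y)
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    idx = list(range(len(y))) + extra
    return [list(X[i]) for i in idx], [y[i] for i in idx]


def abs_correlation(col, y):
    """Absolute Pearson correlation between one feature column and the 0/1 label."""
    n = len(col)
    mx, my = sum(col) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(col, y))
    sx = sum((a - mx) ** 2 for a in col) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return abs(cov / (sx * sy)) if sx > 0 and sy > 0 else 0.0


def select_top_k(X, y, k):
    """Filter-style selection: sorted indices of the k features most correlated with y."""
    scores = [abs_correlation([row[j] for row in X], y) for j in range(len(X[0]))]
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])


def resample_then_select(X, y, k, seed=0):
    """Balance the classes first, then choose features on the balanced data."""
    Xr, yr = random_oversample(X, y, seed)
    keep = select_top_k(Xr, yr, k)
    return [[row[j] for j in keep] for row in Xr], yr, keep


# Toy data: 8 good-credit rows (label 0) and 2 poor-credit rows (label 1).
# Column 0 separates the classes, column 1 is constant, column 2 is weakly informative.
X = [[0.1, 5, 1], [0.1, 5, 2], [0.1, 5, 3], [0.1, 5, 4],
     [0.1, 5, 1], [0.1, 5, 2], [0.1, 5, 3], [0.1, 5, 4],
     [0.9, 5, 5], [0.9, 5, 6]]
y = [0] * 8 + [1] * 2

Xs, ys, keep = resample_then_select(X, y, k=2)
print(keep)                      # [0, 2]
print(ys.count(0), ys.count(1))  # 8 8
```

In a real evaluation, both steps would be fitted on the training split only (for example, inside each cross-validation fold), so that duplicated minority rows and the chosen feature subset do not leak information into the test data.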
Appears in Collections:	[Department of Information Management, In-service Master's Program] Theses and Dissertations

All items in NCUIR are protected by their original copyright.
