Effectiveness Analysis of Deep Tabular Generation-Based Oversampling Method in Credit Risk and Bankruptcy Prediction


Please use this permanent URL to cite or link to this document: http://ir.lib.ncu.edu.tw/handle/987654321/93266

Title:	Effectiveness Analysis of Deep Tabular Generation-Based Oversampling Method in Credit Risk and Bankruptcy Prediction
Author:	Chen, Hsin-Wei (陳信瑋)
Contributor:	Department of Information Management
Keywords:	Oversampling; Deep learning; Imbalanced dataset; Generative model
Date:	2023-07-24
Upload time:	2024-09-19 16:51:09 (UTC+8)
Publisher:	National Central University
Abstract:	In credit risk assessment and bankruptcy prediction, datasets are often imbalanced because data are difficult to collect and because of the characteristics of the domain. To mitigate the effect of imbalance on model predictions, the usual remedy is to balance the dataset with traditional interpolation-based oversampling methods. In recent years, growing attention to personal data privacy has led to techniques that use generative models to learn the distribution and features of an original dataset and then generate a synthetic dataset, which lets researchers continue their work without disclosing personal information. Because the samples generated this way resemble the original samples in their features and distribution, some researchers have tried applying them to the data imbalance problem; this approach is referred to as deep oversampling. This study uses two representative deep tabular generative models (CopulaGAN and TVAE) as deep oversampling methods and compares them with four representative traditional oversampling methods (SMOTE, polynomial-fit-SMOTE, Borderline SMOTE, and ADASYN) on three credit datasets and three bankruptcy datasets, in order to assess how well the six methods suit credit risk assessment and bankruptcy prediction.
The study found that TVAE outperformed the other five oversampling methods in both domains. It then combined the best deep oversampling method in the experiments (TVAE) with the best traditional oversampling method (ADASYN) and found that, overall, applying deep oversampling first and traditional oversampling second yields a still lower Type II error rate.
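For readers unfamiliar with the interpolation-based oversampling the abstract refers to, the following is a minimal, standard-library sketch of the general SMOTE-style idea: each synthetic minority sample is a random point on the line segment between a minority sample and one of its nearest minority neighbors. It is an illustration only and not the thesis's implementation. The function name `interpolation_oversample`, its parameters, and the toy data are all hypothetical, and the deep generative methods (CopulaGAN, TVAE) are not shown.

```python
import math
import random


def interpolation_oversample(minority, n_new, k=5, seed=0):
    """Create n_new synthetic minority rows by SMOTE-style interpolation.

    Hypothetical helper for illustration; not taken from the thesis.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbors than other rows
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority rows by Euclidean distance to the base row.
        others = [row for j, row in enumerate(minority) if j != i]
        others.sort(key=lambda row: math.dist(base, row))
        neighbor = rng.choice(others[:k])
        # Pick a random point on the segment between base and neighbor.
        u = rng.random()
        synthetic.append([b + u * (n - b) for b, n in zip(base, neighbor)])
    return synthetic


# Toy data (made up): 3 minority rows with 2 features, 8 majority rows.
minority = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.5]]
majority_count = 8
new_rows = interpolation_oversample(minority, majority_count - len(minority))
balanced_minority = minority + new_rows  # now as many rows as the majority class
```

Because every synthetic row lies between two existing minority rows, this kind of method cannot produce samples outside the region the minority class already occupies, which is one reason generative models have been explored as an alternative.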
Appears in collections:	[Graduate Institute of Information Management] Theses and Dissertations

Files in this item:

File	Description	Size	Format	Views
index.html		0Kb	HTML	14

All items in NCUIR are protected by original copyright.
