Effectiveness Analysis of Deep Tabular Generation-Based Oversampling Method in Credit Risk and Bankruptcy Prediction

DC field	Value	Language
DC.contributor	資訊管理學系	zh_TW
DC.creator	陳信瑋	zh_TW
DC.creator	Hsin-Wei Chen	en_US
dc.date.accessioned	2023-07-24T07:39:07Z
dc.date.available	2023-07-24T07:39:07Z
dc.date.issued	2023
dc.identifier.uri	http://ir.lib.ncu.edu.tw:88/thesis/view_etd.asp?URN=110423032
dc.contributor.department	資訊管理學系	zh_TW
DC.description	國立中央大學	zh_TW
DC.description	National Central University	en_US
dc.description.abstract	信用評估及破產預測領域中，由於資料收集的困難與領域特性，經常面臨到資料不平衡的狀況。為解決資料不平衡對於模型預測造成的問題，目前常見的處理方式為使用基於內插法的傳統過採樣方法，對資料集進行平衡。近年來，隨著個人資料隱私的被重視，逐漸發展出使用生成模型學習原始資料集分佈與特徵，並生成合成資料集的技術。該技術得以讓研究學者使用合成的資料集，在不洩漏個人隱私的情況下繼續進行研究。由於該類技術所生成的樣本具有類似於原始樣本特徵與分布的特性，因此有學者嘗試將其應用於解決資料不平衡的問題。本研究將使用兩種具代表性的深度表格生成模型 (CopulaGAN與TVAE) 作為深度過採樣的代表方法，並與四種具代表性的傳統過採樣方法 (SMOTE、polynomial-fit-SMOTE、Borderline SMOTE與ADASYN)，在所蒐集三個信用領域的資料集及三個破產領域的資料集中進行比較，觀察六種方法於信用評估及破產預測領域當中的適用性。本研究發現TVAE在信用評估及破產預測領域當中的表現優於其它五種過採樣方法。最終，本研究進一步將實驗中的最佳深度過採樣方法 (TVAE) ，與最佳傳統過採樣方法 (ADASYN) 進行結合使用。發現以整體而言，先使用深度過採樣方法進行過採樣後，再使用傳統過採樣方法進行過採樣可以進一步獲得更低的TypeII錯誤率。	zh_TW
dc.description.abstract	In credit risk prediction and bankruptcy prediction, data imbalance is a common challenge because data are difficult to collect and because of the characteristics of the domain. To address the problems that imbalance causes for predictive models, the usual approach is to balance the dataset with traditional interpolation-based oversampling methods. In recent years, with the growing emphasis on personal data privacy, techniques have been developed that use deep generative models to learn the distribution of the original samples and the relationships between their features, and then generate synthetic datasets. This allows researchers to continue their studies without compromising individual privacy. Because the samples generated by such methods have characteristics and distributions similar to those of the original samples, some researchers have applied them to the data imbalance problem; this approach is referred to as deep oversampling. Our research compares two representative deep tabular generative models (CopulaGAN and TVAE) with four representative traditional oversampling methods (SMOTE, Polynomial-fit-SMOTE, Borderline SMOTE, and ADASYN) on three credit risk datasets and three bankruptcy datasets, in order to assess how well these six methods suit credit risk and bankruptcy prediction. We found that TVAE outperformed the other five oversampling methods in both domains. We further combined the best deep oversampling method (TVAE) with the best traditional oversampling method (ADASYN) from our experiments and found that, overall, applying deep oversampling first and traditional oversampling second yields an even lower Type II error rate.	en_US
DC.subject	過採樣	zh_TW
DC.subject	深度學習	zh_TW
DC.subject	不平衡資料集	zh_TW
DC.subject	生成模型	zh_TW
DC.subject	Oversampling	en_US
DC.subject	Deep learning	en_US
DC.subject	Imbalanced dataset	en_US
DC.subject	Generative Model	en_US
DC.title	基於深度表格生成模型的過採樣方法於信用及破產預測領域的效能分析	zh_TW
dc.language.iso	zh-TW	zh-TW
DC.title	Effectiveness Analysis of Deep Tabular Generation-Based Oversampling Method in Credit Risk and Bankruptcy Prediction	en_US
DC.type	博碩士論文	zh_TW
DC.type	thesis	en_US
DC.publisher	National Central University	en_US

Thesis 110423032: full metadata record