In credit risk prediction and bankruptcy prediction, data imbalance is a common challenge, owing both to the difficulty of collecting data and to the characteristics of the domain. To address the problems that imbalance causes for model predictions, the usual approach is to balance the dataset with traditional oversampling methods based on interpolation. In recent years, with the growing emphasis on personal data privacy, techniques have emerged that use deep generative models to learn the distribution of an original dataset and the relationships between its features, and then generate a synthetic dataset. Such synthetic datasets allow researchers to continue their studies without leaking personal information. Because the generated samples resemble the original samples in their features and distribution, some researchers have applied these models to the data imbalance problem; this approach is referred to as deep oversampling.
This study uses two representative deep tabular generative models (CopulaGAN and TVAE) as deep oversampling methods and compares them with four representative traditional oversampling methods (SMOTE, polynomial-fit-SMOTE, Borderline-SMOTE, and ADASYN) on three credit risk datasets and three bankruptcy datasets. The goal is to assess how suitable these six methods are for credit risk and bankruptcy prediction.
We found that TVAE outperformed the other five oversampling methods in both the credit risk and bankruptcy prediction domains. We then combined the best deep oversampling method in our experiments (TVAE) with the best traditional oversampling method (ADASYN) and found that, overall, applying deep oversampling first and traditional oversampling second yields an even lower Type II error rate.
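To make the idea of interpolation-based oversampling concrete, the following is a minimal sketch in plain Python of the SMOTE-style step that the traditional methods build on: a synthetic minority sample is placed at a random position on the line segment between a minority sample and one of its nearest minority neighbours. It is an illustration only, not the implementation used in this study; the function name `smote_like_oversample`, its parameters, and the toy data are hypothetical.

```python
import math
import random


def smote_like_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours (SMOTE-style sketch)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the k nearest minority neighbours of the base sample
        # (Euclidean distance, excluding the base sample itself).
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(base, minority[j]),
        )[:k]
        other = minority[rng.choice(neighbours)]
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


# Hypothetical toy minority class with two numeric features.
minority = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2], [1.2, 2.5]]
new_points = smote_like_oversample(minority, n_new=6)
```

Every synthetic point is a convex combination of two existing minority samples, so it always lies between them. Deep oversampling instead draws new samples from a generative model fitted to the data.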