Thesis 110423032 Full Metadata Record

DC Field | Value | Language
DC.contributor | 資訊管理學系 (Department of Information Management) | zh_TW
DC.creator | 陳信瑋 | zh_TW
DC.creator | Hsin-Wei Chen | en_US
dc.date.accessioned | 2023-07-24T07:39:07Z |
dc.date.available | 2023-07-24T07:39:07Z |
dc.date.issued | 2023 |
dc.identifier.uri | http://ir.lib.ncu.edu.tw:88/thesis/view_etd.asp?URN=110423032 |
dc.contributor.department | 資訊管理學系 (Department of Information Management) | zh_TW
DC.description | 國立中央大學 | zh_TW
DC.description | National Central University | en_US
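dc.description.abstract | In credit assessment and bankruptcy prediction, data imbalance arises frequently because data are hard to collect and because of the nature of the domain. To address the problems that imbalance causes for model prediction, the common approach today is to balance the dataset with traditional interpolation-based oversampling methods. In recent years, as personal data privacy has received growing attention, techniques have emerged that use generative models to learn the distribution and features of the original dataset and generate synthetic datasets. These techniques let researchers continue their work on synthetic data without leaking personal information. Because the generated samples resemble the original samples in features and distribution, some researchers have tried applying such techniques to the data imbalance problem. This study uses two representative deep tabular generative models (CopulaGAN and TVAE) as representative deep oversampling methods and compares them with four representative traditional oversampling methods (SMOTE, polynomial-fit-SMOTE, Borderline SMOTE, and ADASYN) on three credit datasets and three bankruptcy datasets, to examine how well the six methods suit credit assessment and bankruptcy prediction. The study finds that TVAE outperforms the other five oversampling methods in both domains. Finally, the study combines the best deep oversampling method in the experiments (TVAE) with the best traditional oversampling method (ADASYN) and finds that, overall, oversampling first with the deep method and then with the traditional method yields a still lower Type II error rate. | zh_TW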
dc.description.abstract | In credit risk and bankruptcy prediction, data imbalance is a common challenge because data are difficult to collect and because of the characteristics of the domain. To address the problems that data imbalance causes for model prediction, the prevailing approach is to balance the dataset with traditional interpolation-based oversampling methods. In recent years, with the increasing emphasis on personal data privacy, methods have been developed that use deep generative models to learn the distribution of the original samples and the relationships between their features, and then generate synthetic datasets. This allows researchers to continue their studies without compromising individual privacy. Because the samples generated by such methods have characteristics and distributions similar to those of the original samples, some researchers have applied them to the data imbalance problem, an approach referred to as deep oversampling. Our research compares two representative deep tabular generative models (CopulaGAN and TVAE) with four representative traditional oversampling methods (SMOTE, Polynomial-fit-SMOTE, Borderline SMOTE, and ADASYN) on three credit risk datasets and three bankruptcy datasets. The goal is to evaluate how these six methods perform in credit risk and bankruptcy prediction. We found that TVAE outperformed the other five oversampling methods in both domains. We further combined the best deep oversampling method (TVAE) with the best traditional oversampling method (ADASYN) and found that, overall, applying deep oversampling first and traditional oversampling second yields an even lower Type II error. | en_US
DC.subject | 過採樣 | zh_TW
DC.subject | 深度學習 | zh_TW
DC.subject | 不平衡資料集 | zh_TW
DC.subject | 生成模型 | zh_TW
DC.subject | Oversampling | en_US
DC.subject | Deep learning | en_US
DC.subject | Imbalance dataset | en_US
DC.subject | Generative Model | en_US
DC.title | 基於深度表格生成模型的過採樣方法 於信用及破產預測領域的效能分析 | zh_TW
dc.language.iso | zh-TW | zh-TW
DC.title | Effectiveness Analysis of Deep Tabular Generation-Based Oversampling Method in Credit Risk and Bankruptcy Prediction | en_US
DC.type | 博碩士論文 | zh_TW
DC.type | thesis | en_US
DC.publisher | National Central University | en_US
