    Please use this permanent URL to cite or link to this item: http://ir.lib.ncu.edu.tw/handle/987654321/93266


    Title: 基於深度表格生成模型的過採樣方法於信用及破產預測領域的效能分析; Effectiveness Analysis of Deep Tabular Generation-Based Oversampling Method in Credit Risk and Bankruptcy Prediction
    Author: 陳信瑋; Chen, Hsin-Wei
    Contributor: Department of Information Management
    Keywords: Oversampling; Deep learning; Imbalance dataset; Generative Model
    Date: 2023-07-24
    Upload time: 2024-09-19 16:51:09 (UTC+8)
    Publisher: National Central University
    Abstract: In credit risk assessment and bankruptcy prediction, datasets are often imbalanced, both because data are difficult to collect and because of the nature of the domain. To mitigate the effect of this imbalance on model predictions, the common approach is to balance the dataset with traditional oversampling methods based on interpolation. In recent years, growing attention to personal data privacy has led to techniques that use deep generative models to learn the distribution of the original dataset and the relationships between its features, and then generate synthetic datasets. These techniques let researchers continue their work on synthetic data without disclosing personal information. Because the generated samples have characteristics and distributions similar to those of the original samples, some researchers have applied them to the data imbalance problem. This approach is referred to as deep oversampling.
    This study takes two representative deep tabular generative models (CopulaGAN and TVAE) as deep oversampling methods and compares them with four representative traditional oversampling methods (SMOTE, polynomial-fit-SMOTE, Borderline SMOTE, and ADASYN) on three credit datasets and three bankruptcy datasets, in order to assess how well the six methods suit credit risk and bankruptcy prediction.
    The study found that TVAE outperformed the other five oversampling methods in both domains. It then combined the best deep oversampling method in the experiments (TVAE) with the best traditional oversampling method (ADASYN) and found that, overall, applying deep oversampling first and traditional oversampling second yields a still lower Type II error rate.
    Appears in collections: [Graduate Institute of Information Management] Master's and doctoral theses

    Files in this item:

    File        Description  Size  Format  Views
    index.html               0Kb   HTML    14


    All items in NCUIR are protected by original copyright.
