Thesis/Dissertation 107423021: Detailed Record




Name Sheng-Ming Lo (羅聖明)   Department Information Management
Thesis title Comparative Analysis of Data Normalization and Discretization for Bankruptcy Prediction and Credit Scoring
(在破產預測與信用評估領域對資料正規化與離散化的比較分析)
File Full text available for viewing in the system after 2025-7-1
Abstract In bankruptcy prediction and credit scoring, many studies normalize data during pre-processing, but most apply only a single normalization method. To assess how well normalization suits these two domains, this study collected four credit datasets (Australia, Japan, Germany, Kaggle) and four bankruptcy datasets (Bankruptcy, Japan, TEJ-Taiwan, USA), applied four normalization methods (MinMax, MaxAbs, Standard, Robust), and classified the data with three classifiers (K-Nearest Neighbor, Logistic Regression, Support Vector Machine) to examine how each normalization method affects the results. In addition, because some recent studies discretize data after normalizing it, this study also examines whether normalization followed by discretization further improves accuracy and performance, using three discretization methods: the Minimum Description Length Principle (MDLP), ChiMerge, and Class-Attribute Interdependence Maximization (CAIM).
On overall average, the normalization methods MaxAbs, Standard, and Robust have a positive effect on AUC and Type II error. Combining normalization with CAIM or MDLP improves AUC and Type II error further. Across all experimental combinations, Robust with MDLP achieves the best AUC with all three classifiers, and Standard with MDLP gives the best Type II error result.
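The first experiment described in the abstract (four normalization methods crossed with three classifiers, scored by AUC and Type II error) can be sketched roughly as follows. This is a minimal illustration, not the thesis's code: it assumes scikit-learn, uses a synthetic imbalanced dataset in place of the credit and bankruptcy datasets, leaves classifier parameters near their defaults, and assumes that Type II error means the share of actual positive (bankrupt or default) cases predicted as negative, since this record does not state the thesis's own definition. The names `scalers`, `classifiers`, and `evaluate` are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import (MaxAbsScaler, MinMaxScaler,
                                   RobustScaler, StandardScaler)
from sklearn.svm import SVC

# Synthetic, imbalanced stand-in for a credit/bankruptcy dataset (assumption).
X, y = make_classification(n_samples=600, n_features=10,
                           weights=[0.8, 0.2], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

# The four normalization methods named in the abstract.
scalers = {
    "MinMax": MinMaxScaler,
    "MaxAbs": MaxAbsScaler,
    "Standard": StandardScaler,
    "Robust": RobustScaler,
}

# The three classifiers named in the abstract; parameters are assumptions.
classifiers = {
    "KNN": lambda: KNeighborsClassifier(),
    "LR": lambda: LogisticRegression(max_iter=1000),
    "SVM": lambda: SVC(probability=True, random_state=0),
}


def evaluate(scaler_cls, make_clf):
    """Fit scaler + classifier on the training split; return (AUC, Type II error)."""
    # The pipeline fits the scaler on training data only, avoiding leakage.
    model = make_pipeline(scaler_cls(), make_clf())
    model.fit(X_train, y_train)
    scores = model.predict_proba(X_test)[:, 1]
    tn, fp, fn, tp = confusion_matrix(y_test, model.predict(X_test)).ravel()
    # Assumed definition: positives (class 1) wrongly predicted as negative.
    type_ii = fn / (fn + tp)
    return roc_auc_score(y_test, scores), type_ii


results = {(s, c): evaluate(scalers[s], classifiers[c])
           for s in scalers for c in classifiers}

for (s, c), (auc, type_ii) in results.items():
    print(f"{s:8s} {c:4s} AUC={auc:.3f} TypeII={type_ii:.3f}")
```

Wrapping each scaler and classifier in a pipeline keeps the scaling statistics tied to the training split. A discretization step (MDLP, ChiMerge, or CAIM) is not part of scikit-learn and would have to come from a separate implementation inserted between the scaler and the classifier.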
Keywords ★ Normalization
★ Discretization
★ Bankruptcy Prediction
★ Credit Scoring
★ Machine Learning
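Table of contents Abstract (Chinese)
Abstract (English)
Acknowledgments
Table of Contents
List of Figures
List of Tables
1. Introduction
1-1 Research Background
1-2 Research Motivation
1-3 Research Objectives
1-4 Thesis Organization
2. Literature Review
2-1 Prior Research on Bankruptcy Prediction and Credit Scoring
2-2 Data Normalization
2-3 Data Discretization
2-4 Classifiers
2-5 Data Balancing
3. Research Design
3-1 Datasets
3-2 Study 1
3-3 Study 2
3-4 Experimental Parameter Settings and Methods
3-5 Evaluation Metrics
4. Experimental Results and Analysis
4-1 Effect of Normalization in the Bankruptcy and Credit Domains
4-1-1 Effect of Normalization on Each Classifier
4-1-2 Best Normalization Method
4-1-3 Best Classifier
4-1-4 Summary
4-2 Effect of Normalization Combined with Discretization in the Bankruptcy and Credit Domains
4-2-1 Effect of Normalization Combined with Discretization under Each Normalization Method
4-2-2 Best Discretization Method When Combined with Normalization
4-2-3 Best Classifier
4-2-4 Summary
4-3 Performance Analysis of Normalization versus Normalization Combined with Discretization
4-4 Analysis and Discussion
4-4-1 Best Combinations for Large and Small Datasets
4-4-2 Best Combinations for Credit and Bankruptcy Datasets
4-4-3 Performance Comparison of Different Sampling Methods
4-4-4 Effect of Different SVM Kernels on the Results
4-4-5 Applying the Best Combination to the TEJ-Taiwan (New) Dataset
4-4-6 Comparison of Each Dataset's Best Results with Prior Studies
5. Conclusion
5-1 Conclusions and Contributions
5-2 Future Research Directions and Suggestions
References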
Advisor Kuen-Liang Sue (蘇坤良)   Approval date 2020-7-29

For questions about this thesis, contact the Promotion Services Section of the National Central University Library, TEL: (03)422-7151 ext. 57407, or by e-mail.