Abstract: Missing value imputation has been discussed extensively in prior research, but it has received comparatively little attention in bankruptcy prediction and credit scoring. Most existing studies experiment only on UCI datasets or examine the effect of imputation only on the training set. In addition, the use of deep neural networks for imputation or for classification is rarely addressed in past work and remains an open question.

To assess the applicability of missing value imputation in bankruptcy prediction and credit scoring, this study collects five credit datasets (Australian credit, Japanese credit, German credit, Kaggle, and PAKDD) and four bankruptcy datasets (Bankruptcy, Japanese bankruptcy, TEJ Taiwan bankruptcy, and US bankruptcy). Four imputation methods are compared: K-nearest neighbor (KNN), Random Forest (RF), Multivariate Imputation by Chained Equations (MICE), and Deep Neural Network (DNN). The imputed data are then classified with four classifiers — Support Vector Machine (SVM), Random Forest, Deep Neural Network, and a Deep Belief Network stacked with a Deep Neural Network (DBN-DNN) — to examine how different imputation methods affect the results. The study further investigates whether combining imputation with data normalization can raise prediction accuracy.

The experiments show that, on average, imputation improves classification accuracy, and that imputation combined with normalization is especially effective for the neural network models; after normalization, the neural networks significantly outperform the machine learning classifiers. Across all experimental combinations after normalization, DBN-DNN is the best classifier. At low missing rates, pairing it with RF imputation achieves the best AUC, while pairing it with MICE yields the lowest Type II error; at high missing rates, pairing it with RF imputation gives both the best AUC and the lowest Type II error.
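To make the experimental pipeline concrete, the following is a minimal sketch of the imputation → normalization → classification → evaluation flow the abstract describes. It assumes scikit-learn's KNNImputer and IterativeImputer as stand-ins for the KNN and MICE imputation, and SVM/Random Forest for two of the four classifiers; the DBN-DNN model and the thesis's credit and bankruptcy datasets are not reproduced here, so a synthetic dataset with artificially injected missing values is used purely for illustration.

```python
# Sketch only: imputation -> normalization -> classification -> AUC / Type II error.
# Assumptions: scikit-learn stand-ins for the methods named in the abstract;
# synthetic data replaces the thesis's credit/bankruptcy datasets.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.experimental import enable_iterative_imputer  # noqa: F401 (enables IterativeImputer)
from sklearn.impute import KNNImputer, IterativeImputer
from sklearn.preprocessing import MinMaxScaler
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score, confusion_matrix

rng = np.random.default_rng(0)

# Synthetic stand-in for a credit/bankruptcy dataset (label 1 = default/bankrupt).
X, y = make_classification(n_samples=1000, n_features=20,
                           weights=[0.8, 0.2], random_state=0)

# Inject missing values at a chosen missing rate (e.g. 10% for "low", 50% for "high").
missing_rate = 0.10
mask = rng.random(X.shape) < missing_rate
X_missing = X.copy()
X_missing[mask] = np.nan

X_train, X_test, y_train, y_test = train_test_split(
    X_missing, y, test_size=0.3, stratify=y, random_state=0)

imputers = {
    "KNN": KNNImputer(n_neighbors=5),
    "MICE": IterativeImputer(random_state=0),  # chained-equations style imputation
}
classifiers = {
    "SVM": SVC(probability=True, random_state=0),
    "RF": RandomForestClassifier(random_state=0),
}

for imp_name, imputer in imputers.items():
    # Fit the imputer and scaler on the training split only, then apply to the test split.
    X_tr = imputer.fit_transform(X_train)
    X_te = imputer.transform(X_test)
    scaler = MinMaxScaler()
    X_tr = scaler.fit_transform(X_tr)
    X_te = scaler.transform(X_te)
    for clf_name, clf in classifiers.items():
        clf.fit(X_tr, y_train)
        prob = clf.predict_proba(X_te)[:, 1]
        pred = (prob >= 0.5).astype(int)
        tn, fp, fn, tp = confusion_matrix(y_test, pred).ravel()
        auc = roc_auc_score(y_test, prob)
        type2 = fn / (fn + tp)  # Type II error: positive cases (e.g. bankrupt firms) missed
        print(f"{imp_name} + {clf_name}: AUC={auc:.3f}, Type II error={type2:.3f}")
```

Note that the imputer and scaler are fit on the training split only and then applied to the test split, mirroring the abstract's concern with evaluating imputation beyond the training data rather than on the training set alone.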