Thesis 105423049: Detailed Record




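Author: 黃星瑋 (Hsing-Wei Huang)   Department: Department of Information Management
Thesis title: A Study of the Applicability of Normalization and Feature Selection in the Bankruptcy Domain (正規化與變數篩選在破產領域的適用性研究)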
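Related theses
★ An Agent-Based Mobile Anonymous Auction and Payment Mechanism
★ Security Analysis of Remote Connections to Network Cameras
★ A Hybrid Cell Handover Mechanism for HSDPA Environments
★ A Tree-Based IP Address Allocation Mechanism for Mobile Ad Hoc Networks
★ Target Region Detection in Planar Environments Using Mobile Sensor Network Technology
★ A P2P File-Sharing Mechanism on Bluetooth Scatternets
★ A Dynamic Rerouting Mechanism for Traffic Congestion Avoidance
★ Using UWB to Improve File-Sharing Performance on MANETs
★ The Effect of a Collaborative Learning Platform on Groupthink and Learning Outcomes: The Case of English Vocabulary Learning
★ An RFID-Based Indoor Positioning Mechanism: A Heuristic Using Virtual Tags
★ A Mobile Price-Comparison System for In-Store Shopping Using Image Recognition
★ Security of Online Credit Card Payments
★ DEAP: A High-Performance Dynamic Authentication Protocol for Mobile RFID Systems
★ A Comparative Analysis of Preprocessing Methods and Classifier Combinations in Bankruptcy Prediction and Credit Scoring
★ One-Class Classification on Imbalanced Datasets, Combined with Missing-Value Imputation and Instance Selection
★ A Study of Clustering-Based Preprocessing Methods for the Class Imbalance Problem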
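  1. This electronic thesis is authorized for immediate open access.
  2. Open-access electronic full texts are licensed to users only for personal, non-profit searching, reading, and printing for the purpose of academic research.
  3. Please comply with the Copyright Act of the Republic of China. Do not reproduce, distribute, adapt, repost, or broadcast the work without authorization, to avoid violating the law.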

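Abstract (Chinese): In bankruptcy analysis, class imbalance is unavoidable, because in the real world bankrupt companies are always fewer than non-bankrupt ones. In the past, lenders relied on traditional statistical methods or personal intuition to decide whether to lend money to another company, but doing so often put the lender itself at risk of bankruptcy. Many researchers have therefore turned to machine learning for such problems, hoping to give banks an accurate classification model that decides whether to lend, thereby reducing the likelihood of bankruptcy.
Many machine learning algorithms apply built-in normalization when building a model, because normalization not only shortens classifier training time but also makes the data easier to read. Many researchers state whether they normalized the bankruptcy dataset in their studies, yet no study has examined whether normalization necessarily improves classification results in the bankruptcy domain, or whether datasets with different class imbalance ratios and different feature selection methods affect the applicability of normalization.
This study takes two real-world datasets, one from Taiwan and one from mainland China, and simulates five class imbalance ratios: 1, 2, 5, 10, and 20. It then compares results before and after normalization to see whether the effect differs across classifiers, and thereby examines the applicability of normalization at different imbalance ratios in the bankruptcy domain. The study also uses three feature selection methods, GA, CART, and Information Gain, to examine how feature selection affects normalization at different imbalance ratios, with the aim of understanding whether normalization is truly suitable for the bankruptcy domain.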
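The five imbalance ratios named in the abstract (1, 2, 5, 10, and 20) can be illustrated with a minimal sketch. The record does not say how the ratios were simulated, so the random undersampling of the majority class shown here is an assumption, and the function name `make_imbalanced` and the toy data are hypothetical.

```python
import random


def make_imbalanced(majority, minority, ratio, seed=0):
    """Return (majority_sample, minority_sample) such that
    len(majority_sample) == ratio * len(minority_sample).

    Assumption: the ratio is reached by randomly undersampling the
    majority (non-bankrupt) class while keeping every minority
    (bankrupt) sample. The thesis may have used a different procedure.
    """
    needed = ratio * len(minority)
    if needed > len(majority):
        raise ValueError("not enough majority samples for this ratio")
    rng = random.Random(seed)  # fixed seed so the subsets are reproducible
    return rng.sample(majority, needed), list(minority)


# Hypothetical toy data: 400 non-bankrupt and 20 bankrupt companies.
non_bankrupt = [("non_bankrupt", i) for i in range(400)]
bankrupt = [("bankrupt", i) for i in range(20)]

# One subset per imbalance ratio listed in the abstract.
subsets = {r: make_imbalanced(non_bankrupt, bankrupt, r) for r in (1, 2, 5, 10, 20)}

for r, (maj, mino) in subsets.items():
    print(f"ratio {r}: {len(maj)} non-bankrupt, {len(mino)} bankrupt")
```

Each subset could then be passed to the same classifiers twice, once with raw features and once with normalized features, to compare the two settings at each ratio.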
Abstract (English): In bankruptcy prediction, class imbalance is unavoidable, because in the real world bankrupt companies are fewer than non-bankrupt companies. In the past, lenders relied on traditional statistical methods or personal intuition to decide whether to lend money to other companies, but this often put the lending company at risk of bankruptcy. Many researchers have begun to use machine learning to solve such problems, hoping to provide banks with an accurate classification model.
Many scholars indicate whether their studies normalized the bankruptcy data. However, no research has examined whether normalization improves classification results. In this study, we simulate five class imbalance ratios from two real-world datasets: 1, 2, 5, 10, and 20. In this way, we examine the relationship between the imbalance ratio and normalization. Our study also considers feature selection, in the hope of learning whether normalization really suits bankruptcy prediction.
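As an illustration of what "normalization" can mean here, the sketch below rescales each feature column to the range [0, 1] (min-max normalization). The abstract does not state which normalization method the thesis uses, so min-max scaling is an assumption, and the function name `min_max_normalize` and the sample values are hypothetical.

```python
def min_max_normalize(rows):
    """Rescale every column of `rows` to [0, 1].

    `rows` is a list of equal-length lists of numbers (one list per
    company, one position per financial feature). A constant column
    is mapped to 0.0 to avoid division by zero.

    Assumption: min-max scaling stands in for the unspecified
    normalization method of the thesis.
    """
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


# Hypothetical features on very different scales:
# total assets (in thousands) and a debt ratio.
raw = [
    [1200.0, 0.15],
    [300.0, 0.40],
    [7500.0, 0.05],
]
scaled = min_max_normalize(raw)

for before, after in zip(raw, scaled):
    print(before, "->", [round(x, 3) for x in after])
```

In a cross-validated experiment, the column minima and maxima would normally be computed on the training folds only and then applied to the test fold, so that no information from the test data leaks into preprocessing.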
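Keywords (Chinese) ★ Machine learning (機器學習)
★ Bankruptcy analysis (破產分析)
★ Normalization (正規化)
★ Class imbalance (類別不平衡)
★ Feature selection (變數篩選)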
Keywords (English) ★ Machine Learning
★ Bankruptcy Prediction
★ Normalize
★ Class Imbalance
★ Feature Selection
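Table of contents
Abstract (Chinese)
Abstract
Acknowledgments
Contents
List of Figures
List of Tables
1. Introduction
1-1 Research Background
1-2 Research Motivation
1-3 Research Objectives
1-4 Thesis Organization
2. Literature Review
2-1 Class Imbalance
2-2 Addressing the Class Imbalance Problem
2-2-1 Reducing the Majority Class
2-2-2 Increasing the Minority Class
2-3 Classifiers
2-3-1 Naive Bayes Classifier
2-3-2 Support Vector Machine (SVM)
2-3-3 Decision Tree (DT)
2-3-4 Artificial Neural Network (ANN)
2-4 Feature Selection (FS)
2-4-2 Genetic Algorithm (GA)
2-4-3 Information Gain
2-4-4 CART Decision Tree (Decision Tree CART, DT)
2-5 Normalization
2-6 Evaluation Metrics
2-6-1 AUC (Area Under ROC Curve)
2-6-2 Type II error
2-7 Related Work
2-8 Summary and Comparison of Related Work on Feature Selection
3. Research Method
3-1 Datasets
3-2 Study 1: The Effect of Normalization at Different Imbalance Ratios
3-2-1 10-Fold Cross-Validation
3-2-2 Evaluation Criteria
3-3 Study 2: Feature Selection and Normalization
4. Experimental Results
4-1 The Effect of Normalization at Different Imbalance Ratios
4-1-1 Mainland China Data: Comparison and Analysis with Normalization
4-1-2 Taiwan Data: Comparison and Analysis with Normalization
4-1-3 Summary: Class Imbalance Ratio and Normalization
4-2 Feature Selection and Normalization
4-2-1 The Order of Feature Selection and Normalization
4-2-2 The Effect of Feature Selection on Normalization at Different Imbalance Ratios
4-3 Comparison of Classification Results: Balanced Classes vs. Original Data
4-4 Validation of the Best Preprocessing Approach
5. Conclusion
5-1 Conclusions and Contributions
5-2 Future Work
6. References
7. Appendix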
Advisor: 蘇坤良   Approval date: 2018-07-23

For questions about this thesis, contact the Extension Services Section of the National Central University Library, TEL: (03)422-7151 ext. 57407, or by e-mail.