Abstract: | The financial crisis prediction (FCP) problem has been widely discussed: the choice of classifier, feature selection methods, and the combined use of multiple classifiers are all topics of discussion in global financial circles and academia. Most earlier researchers analyzed the FCP problem with a static N-fold model and did not consider that the problem has time series characteristics. Problems with time series characteristics have already been discussed in the data mining field, and in recent years researchers have begun to analyze the FCP problem with time series models. In addition, earlier FCP studies of Taiwan mostly adopted the Taiwan Economic Journal (TEJ) classification of crisis firms directly. We argue that only crisis firms that caused actual losses need to be predicted; the remaining crisis firms, which caused no actual losses, can serve as members of the training set to help identify the actual-loss crisis firms, but need not be included in the Type I error calculation. Our experiments show that the proposed modeling strategy improves accuracy for both the N-fold model and the incremental model. We also find that, on the Taiwan dataset, when the recommended modeling strategy is used with the incremental model, the traditional view that machine learning is significantly more accurate than statistical methods does not hold. We tried to identify why this view fails and found that the difference in modeling strategy accounts for little of it; the incremental model is the main cause.;The financial crisis prediction (FCP) problem is important and has been widely studied; many classifiers, feature selection methods, and even ensemble learning have been discussed. In the past, most researchers used a static model, namely the N-fold model, to solve the FCP problem and did not consider that the problem has time series characteristics. In the data mining domain, some scholars have argued that time series problems must be solved with time series models. Recently, some researchers have begun to use time series models on the FCP problem. Furthermore, FCP studies of Taiwan have taken the definition of crisis from the TEJ database, but we find that only firms that cause actual losses should be treated as the crisis firms to be predicted; other kinds of crisis firms can be used only as training data to help find the actual-loss firms and need not be counted when computing the Type I error. This is our proposed modeling strategy (PMS). Finally, our experimental results show that our PMS outperforms the traditional modeling strategy in both the N-fold model and the time series model. Moreover, for the Taiwan dataset, when our PMS is used with the time series model, the conventional wisdom that "machine learning approaches outperform statistical methods" does not hold. 
We also investigate why this conventional wisdom fails to hold; the experimental results show that the main cause is the time series model rather than the difference in modeling strategy. |
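The evaluation rule described in the abstract can be illustrated with a minimal sketch. It assumes that Type I error means a crisis firm predicted as normal (a common convention in bankruptcy prediction; the abstract does not define it) and that the incremental (time series) model can be approximated by an expanding window over years, which is also an assumption. The function names `type_i_error` and `expanding_window_splits` and the flags `is_crisis` and `caused_loss` are hypothetical and do not come from the thesis.

```python
from typing import Iterator, List, Sequence, Tuple


def type_i_error(
    y_pred: Sequence[int],
    is_crisis: Sequence[bool],
    caused_loss: Sequence[bool],
) -> float:
    """Share of actual-loss crisis firms that were predicted as normal (0).

    Assumption: prediction 1 = crisis, 0 = normal. Crisis firms that caused
    no actual loss are skipped here; under the strategy in the abstract they
    only contribute to training, not to this error rate.
    """
    counted = 0
    missed = 0
    for pred, crisis, loss in zip(y_pred, is_crisis, caused_loss):
        if crisis and loss:
            counted += 1
            if pred == 0:
                missed += 1
    if counted == 0:
        raise ValueError("no actual-loss crisis firms in the evaluation set")
    return missed / counted


def expanding_window_splits(years: Sequence[int]) -> Iterator[Tuple[List[int], int]]:
    """Yield (training years, test year) pairs in chronological order.

    Assumption: each model is trained on all earlier years and tested on the
    next one, so no future data leaks into training.
    """
    ordered = sorted(set(years))
    for i in range(1, len(ordered)):
        yield ordered[:i], ordered[i]
```

In this sketch a crisis firm without actual losses never enters the denominator of the error rate, so misclassifying it does not change the reported Type I error. An N-fold split, by contrast, would mix years freely between training and test folds.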