Class imbalance is unavoidable in bankruptcy prediction, because in the real world bankrupt companies are always outnumbered by non-bankrupt ones. In the past, lenders relied on traditional statistical methods or personal intuition to decide whether to lend money to another company, which often put the lending company itself at risk of bankruptcy. Many researchers have therefore turned to machine learning, aiming to give banks an accurate classification model that decides whether a loan should be granted and thereby reduces the likelihood of bankruptcy.

Many machine learning algorithms apply built-in normalization when building a model, since normalization can shorten classifier training time and make the data easier to read. Researchers usually state whether they normalized their bankruptcy dataset, but no study has examined whether normalization actually improves classification results in the bankruptcy domain, or whether the class imbalance ratio of the dataset and the feature selection method affect its suitability.

This study takes two real-world datasets, one from Taiwan and one from mainland China, and simulates five class imbalance ratios from each: 1, 2, 5, 10, and 20. We compare results before and after normalization to see whether the effect differs across classifiers, and thus assess how suitable normalization is at each imbalance ratio. We also apply three feature selection methods, genetic algorithm (GA), CART, and Information Gain, to examine how feature selection interacts with normalization under different imbalance ratios. The goal is to determine whether normalization is truly appropriate for bankruptcy prediction.
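The experimental setup can be illustrated with a minimal Python sketch. It is not the study's implementation and rests on several assumptions the abstract does not confirm: the imbalance ratio is read as non-bankrupt to bankrupt, each ratio is produced by randomly undersampling the non-bankrupt class, and normalization is taken to be min-max scaling to [0, 1]. The helper names `make_imbalanced` and `min_max_normalize` and the toy data are hypothetical.

```python
import random

def make_imbalanced(bankrupt, non_bankrupt, ratio, seed=0):
    """Keep every bankrupt row and draw ratio * len(bankrupt) non-bankrupt rows.

    Assumption: undersampling the majority class; the study may simulate
    the ratios differently.
    """
    n_major = ratio * len(bankrupt)
    if n_major > len(non_bankrupt):
        raise ValueError("not enough non-bankrupt rows for this ratio")
    rng = random.Random(seed)
    return bankrupt + rng.sample(non_bankrupt, n_major)

def min_max_normalize(rows):
    """Rescale each feature column to [0, 1]; constant columns map to 0.0.

    Assumption: min-max scaling stands in for the unspecified normalization.
    """
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]

# Toy data (not from the study): two features on very different scales.
rng = random.Random(42)
bankrupt = [[rng.uniform(0, 1), rng.uniform(0, 1000)] for _ in range(10)]
non_bankrupt = [[rng.uniform(0, 1), rng.uniform(0, 1000)] for _ in range(200)]

RATIOS = [1, 2, 5, 10, 20]  # the five imbalance ratios named in the abstract
datasets = {}
for ratio in RATIOS:
    raw = make_imbalanced(bankrupt, non_bankrupt, ratio)
    # Bankrupt rows come first, so label them 1 and the rest 0.
    labels = [1] * len(bankrupt) + [0] * (len(raw) - len(bankrupt))
    datasets[ratio] = {
        "raw": raw,
        "normalized": min_max_normalize(raw),
        "labels": labels,
    }
```

Each entry of `datasets` then holds a raw and a normalized version of the same sample, so any classifier could be trained on both and the scores compared per ratio. In a real experiment the scaling parameters would be fitted on the training split only, to avoid leaking test information.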