Abstract: | Internet banking is growing rapidly, and online banks earn most of their profit by lending to small and medium-sized enterprises (SMEs). SMEs, however, carry a relatively high default risk, so a risk detection model is needed to identify credit defaults and reduce the risk of nonperforming loans. Such a model must meet three requirements: it must predict defaults early enough for banks to react quickly to bad loans; it must rely on publicly available credit data rather than traditional financial reports or managers' personal records; and it must maintain a large area under the receiver operating characteristic curve (AUC) together with high recall and precision even when the data are highly imbalanced. This study proposes a two-stage model that predicts SME default risk from publicly available external risk-event data, both time-series data (sequences of risk events published on public websites) and cross-sectional data. The first stage introduces the RS-Ripper algorithm: 1) a revised version of PrefixSpan (prefix-projected sequential pattern mining), which extracts frequent patterns from the risk-event sequences, is combined with RIPPER (repeated incremental pruning to produce error reduction), and 2) the event sequences are used as input to generate a rule-based classifier whose performance remains consistent as class imbalance increases. In the second stage, a LightGBM model built on the cross-sectional data is combined with the rule-based classifier to increase recall. On average, the proposed method gives banks 350 days of early warning before a default. In an ideal scenario, where the numbers of default and non-default samples are equal (1:1), the recall, precision, accuracy, and AUC of the method reach 0.92, 0.911, 0.915, and 0.956, respectively. 
In a near-worst-case scenario, with a 1:16 ratio of default to non-default samples, the recall, precision, accuracy, and AUC reach 0.751, 0.618, 0.958, and 0.962, respectively. |
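
The reported accuracy can be cross-checked against the reported recall, precision, and class ratio, since those three quantities fix the confusion matrix up to scale. The following is a minimal sketch of that arithmetic, not the authors' evaluation code; the function name is hypothetical, and it assumes the stated 1:1 and 1:16 ratios describe the evaluation sets, with class sizes normalized to 1 default sample.

```python
def accuracy_from_recall_precision(recall, precision, n_pos, n_neg):
    """Derive accuracy from recall, precision, and class sizes.

    Hypothetical helper for illustration. n_pos is the number of default
    samples and n_neg the number of non-default samples; counts may be
    fractional because only their ratio matters.
    """
    tp = recall * n_pos                    # defaults correctly flagged
    fp = tp * (1 - precision) / precision  # from precision = tp / (tp + fp)
    tn = n_neg - fp                        # non-defaults correctly passed
    return (tp + tn) / (n_pos + n_neg)


# Metrics as reported in the abstract; class ratios assumed to hold at evaluation.
balanced = accuracy_from_recall_precision(0.92, 0.911, n_pos=1, n_neg=1)
skewed = accuracy_from_recall_precision(0.751, 0.618, n_pos=1, n_neg=16)

print(round(balanced, 3), round(skewed, 3))  # prints 0.915 0.958
```

Both derived values agree with the reported accuracies. The skewed case also shows why accuracy stays high while precision drops: at 1:16 the non-default class dominates the total, so correctly passed non-defaults outweigh the false alarms that lower precision.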