Corporate credit ratings serve as a critical reference for capital-market investors evaluating a company. Publicly available ratings make a company's default risk and credit status more transparent, reduce information asymmetry between investors and company management, and promote fairer transactions in the capital market.
Traditional corporate credit rating processes are time-consuming and costly. To respond to market changes promptly and efficiently, applying machine learning methods to predict corporate credit ratings has become a widely discussed research topic in recent years.
This study takes textual data, namely the Management's Discussion and Analysis (MD&A) section that conveys management's views, and processes it with natural language processing (NLP) models to produce text vector representations. These vectors are combined with the output of Retrieval-Augmented Generation (RAG) and with traditional financial data, and the combined features are fed into the Extreme Gradient Boosting (XGBoost) algorithm to predict corporate credit ratings. The aim is a multi-class classification model that balances accuracy and interpretability.
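The feature-fusion step described above can be illustrated with a minimal sketch. It is not the study's implementation: the rating scale, the feature dimensions, the ratio names (`roa`, `leverage`), and the helper functions `build_feature_row`, `feature_names`, and `encode_rating` are all hypothetical, chosen only to show how text vectors, RAG-derived features, and financial data might be concatenated into one row per company.

```python
from typing import Dict, List, Sequence

# Hypothetical ordered rating scale; the study's actual classes are not stated.
RATING_CLASSES = ["AAA", "AA", "A", "BBB", "BB", "B"]


def build_feature_row(
    text_vector: Sequence[float],
    rag_features: Sequence[float],
    financial_ratios: Dict[str, float],
    ratio_names: Sequence[str],
) -> List[float]:
    """Concatenate the three feature groups into one flat feature row.

    ratio_names fixes the column order of the financial features so that
    every company's row has the same layout.
    """
    financial = [financial_ratios[name] for name in ratio_names]
    return list(text_vector) + list(rag_features) + financial


def feature_names(text_dim: int, rag_dim: int, ratio_names: Sequence[str]) -> List[str]:
    """Name each column so feature importances can be traced to their source."""
    names = [f"text_{i}" for i in range(text_dim)]
    names += [f"rag_{i}" for i in range(rag_dim)]
    names += [f"fin_{name}" for name in ratio_names]
    return names


def encode_rating(label: str) -> int:
    """Map a rating label to the integer class index a multi-class model expects."""
    return RATING_CLASSES.index(label)


# Toy example: a 2-dimensional text vector, one RAG-derived score, two ratios.
ratio_order = ["roa", "leverage"]
row = build_feature_row(
    text_vector=[0.1, -0.2],
    rag_features=[1.0],
    financial_ratios={"leverage": 0.5, "roa": 0.03},
    ratio_names=ratio_order,
)
columns = feature_names(text_dim=2, rag_dim=1, ratio_names=ratio_order)
target = encode_rating("A")
```

Rows built this way, stacked into a matrix with the integer targets, are the kind of input a multi-class gradient-boosting classifier such as XGBoost's `XGBClassifier` accepts. Keeping a name for every column is one way to support the interpretability goal, since per-feature importances can then be attributed to the text, RAG, or financial group.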