As financial markets grow increasingly complex, machine learning techniques have become popular in forecasting stock returns. However, the "black-box" nature of such models makes their decision process hard to understand, limiting their practical application in financial decision-making. Conventional feature-importance analyses focus on the marginal effects of individual variables and largely overlook how interactions among features affect predictive accuracy. In reality, market factors rarely operate in isolation; they exhibit intricate interdependencies that shift as the market environment changes. Accordingly, developing forecasting approaches that capture feature interactions and explain their time-varying behavior is essential for enhancing model robustness and explanatory power.
This study adopts an interaction-based analytical framework to examine the effects of feature interactions on stock return prediction. By focusing on prediction errors rather than model outputs, we aim to offer insight into the underlying causes of model underperformance. The empirical analysis draws on Taiwan stock market data spanning the 33 years from 1991 to 2023. The dataset includes 110 variables, covering price information, financial indicators, technical metrics, and market-level features. We implement an XGBoost model combined with a rolling-window approach, using three years of historical data to predict returns in the subsequent year. Feature interaction effects on prediction error are quantified through the SHAP-IQ framework, which extends SHAP (SHapley Additive exPlanations) to interaction effects, allowing us to identify the key feature combinations that drive unstable forecasts. The proposed framework contributes to the literature by providing a systematic approach to model diagnostics and by enhancing the interpretability of machine learning forecasts in financial contexts.
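The two mechanics described above — the rolling-window scheme and pairwise Shapley interaction values — can be sketched in a minimal, self-contained form. This is an illustration rather than the study's pipeline: `rolling_windows`, `shapley_interaction`, and the toy error function `err` are assumed names, and where the study uses XGBoost with the SHAP-IQ estimator, this sketch computes the exact pairwise Shapley interaction index by brute-force coalition enumeration on a three-feature toy payoff.

```python
from itertools import combinations
from math import factorial

def rolling_windows(years, train_len=3):
    """Yield (train_years, test_year) pairs: train on t-3..t-1, test on t."""
    for i in range(train_len, len(years)):
        yield years[i - train_len:i], years[i]

def shapley_interaction(value, features, i, j):
    """Exact pairwise Shapley interaction index (SII) for features i and j.

    `value(S)` maps a frozenset of active features to a payoff; here we
    read it as the model's prediction error with only S available.
    """
    others = [f for f in features if f not in (i, j)]
    n = len(features)
    total = 0.0
    for size in range(len(others) + 1):
        # Standard SII coalition weight: |S|! (n-|S|-2)! / (n-1)!
        w = factorial(size) * factorial(n - size - 2) / factorial(n - 1)
        for S in combinations(others, size):
            S = frozenset(S)
            # Second-order discrete derivative: joint contribution of i and j
            # on top of coalition S, beyond their two separate contributions.
            delta = (value(S | {i, j}) - value(S | {i})
                     - value(S | {j}) + value(S))
            total += w * delta
    return total

# Toy error function with a built-in interaction between features 0 and 1.
def err(S):
    e = 1.0
    if 0 in S:
        e -= 0.2
    if 1 in S:
        e -= 0.1
    if 0 in S and 1 in S:
        e -= 0.3  # joint effect beyond the two marginal effects
    return e

features = [0, 1, 2]
sii_01 = shapley_interaction(err, features, 0, 1)  # recovers the joint -0.3 term
windows = list(rolling_windows(list(range(1991, 1996))))
# windows[0] == ([1991, 1992, 1993], 1994)
```

In the actual study, `value(S)` would be induced by an XGBoost error model rather than a hand-written function, and SHAP-IQ would approximate these interaction indices efficiently instead of enumerating every coalition, which is feasible here only because the toy example has three features.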