dc.description.abstract | In this paper, we consider the relationships among cross-sectional data factors under three types of data arrangement: single, parallel, and panel. We use three machine learning models (Gradient Boosting, XGBoost, and Random Forest), along with LSTM and a Random Effects (RE) model, for stock market prediction. Fundamental, technical, and chip analysis data are selected as features, with log returns as the target variable. Based on the model predictions, we examine the investment performance of portfolios constructed with the traditional Sharpe, selected equal weight (sEW), and selected Sharpe (sSharp) methods. The analysis shows that the data arrangement significantly impacts portfolio performance. The single arrangement performs better for some models and periods, whereas the parallel and panel arrangements are more suitable for others. Gradient Boosting performs well with single data but less so with parallel data. Random Forest shows robust performance across both single and parallel arrangements, with parallel generally outperforming single. XGBoost excels in the short-term single arrangement but is less effective in other configurations. LSTM is better suited to the short-term investment period with the parallel arrangement, while the panel arrangement performs better for the monthly period. The Random Effects model proves more effective for long-term investment periods than for short-term ones. Our study highlights the importance of data arrangements and weighting methods for improving the predictive accuracy and performance of machine learning models in financial markets. | en_US |