Stock price prediction is a central research topic in finance, with direct consequences for investment strategy, risk management, market analysis, trade execution, and portfolio allocation. It is also difficult, because prices respond to many interacting factors. Earlier work predicted prices mainly from historical price information, technical indicators, and time-series models. More recent studies mine financial news and social media text and compare the prediction performance of different machine learning and deep learning techniques. Few studies, however, have examined how continuous (multi-day) data affects prediction. This study examines the effect of data volume on prediction performance and finds that continuous data outperforms single-day data. Models pre-trained on financial-domain vocabulary outperform general-purpose text feature models. Prediction performance differs little among news content, news headlines, and the combination of the two. Removing insignificant data labels improves prediction performance. Finally, news content is combined with same-day stock price labels, and a regression analysis is used to compute the root mean squared error (RMSE). The RMSE is smaller for five consecutive days, meaning that predicted prices are closer to actual prices in that setting. Within the five-day setting, a comparison of different text features paired with different machine learning and deep learning models shows that averaged FinBERT-extracted features yield the lowest RMSE, and that the RF classifier achieves a lower RMSE than the other classifiers.
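For reference, the RMSE used above to compare settings can be computed as in the following minimal Python sketch. The function name `rmse` and the sample prices are illustrative assumptions, not code or data from the study.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences of prices."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    # Square each prediction error, average the squares, then take the root.
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Made-up closing prices, for illustration only (not data from the study).
actual_prices = [100.0, 102.0, 101.0, 105.0, 107.0]
predicted_prices = [101.0, 101.0, 102.0, 104.0, 108.0]

print(rmse(actual_prices, predicted_prices))  # prints 1.0
```

A lower value means the predicted prices sit closer to the actual prices, which is the sense in which the five-day setting is reported to perform better.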