Exploring the Impact of Using Sequential Financial News on Stock Price Prediction: A Case Study of Text Mining and Deep Learning

Name

林崇恩 (Chung-En Lin)

Department

Department of Information Management

Thesis Title

Exploring the Impact of Using Sequential Financial News on Stock Price Prediction: A Case Study of Text Mining and Deep Learning

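Related Theses

★ Building a Sales Forecasting Model for Commercial Multifunction Printers Using Data Mining Techniques
★ Applying Data Mining Techniques to Resource Allocation Prediction: A Case Study of a Computer OEM Support Unit
★ Applying Data Mining Techniques to Flight Delay Analysis in the Airline Industry: A Case Study of Company C
★ Security Control of New Products in a Global Supply Chain: A Case Study of Company C
★ Applying Data Mining to the Semiconductor Laser Industry: A Case Study of Company A
★ Applying Data Mining Techniques to Predicting Warehouse Storage Time for Air Export Cargo: A Case Study of Company A
★ Optimizing YouBike Rebalancing Operations Using Data Mining Classification Techniques
★ The Effect of Feature Selection on Different Data Types
★ Applying Data Mining to B2B Corporate Websites: A Case Study of Company T
★ Customer Investment Analysis and Recommendations for Financial Derivatives: Integrating Clustering and Association Rule Techniques
★ Building an Assistive Classification Model for Liver Ultrasound Images Using Convolutional Neural Networks
★ An Identity Recognition System Based on Convolutional Neural Networks
★ Comparative Analysis of Error Rates of Electricity Data Imputation Methods in Energy Management Systems
★ Development of an Employee Sentiment Analysis and Management System for Enterprises
★ Data Cleaning for Class Imbalance Problems: A Machine Learning Perspective
★ Applying Data Mining Techniques to the Analysis of Passenger Self-Service Check-in: A Case Study of Airline C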

Full text available in the online system after 2028-7-1

Abstract (Chinese)

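Stock price prediction has long played an important role in financial markets and has long been regarded as an important research topic, mainly because it significantly affects investment strategy, risk management, market analysis, trade execution, and capital allocation. However, stock prices are influenced by many complex factors, which makes them difficult to predict. Past research has mainly used historical stock price information, technical indicators, and the widely studied time series models to predict stock prices. In recent years, many studies have also applied text mining to financial news and social media text, pairing various text feature representations with different machine learning and deep learning techniques to evaluate prediction performance, and comprehensively comparing different text features and classifiers. However, few studies have examined whether the data are consecutive. This study therefore investigates the data further, examining how single-day data compared with data spanning consecutive days affects prediction performance.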
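The abstract contrasts single-day news with news spanning several consecutive days but does not describe how the consecutive-day inputs are assembled. The following is a minimal sketch under an assumed scheme: (date, text) news items are grouped by day, and the text for trading day t is the concatenation of all news from the `n_days` trading days ending at t, so `n_days=1` reproduces the single-day setting. The function name `build_windows` and the concatenation strategy are hypothetical illustrations, not the thesis's documented procedure.

```python
from __future__ import annotations

from collections import defaultdict
from datetime import date


def build_windows(news: list[tuple[date, str]],
                  trading_days: list[date],
                  n_days: int) -> dict[date, str]:
    """Return one text per trading day that joins the news of the n_days
    consecutive trading days ending on that day (hypothetical scheme)."""
    if n_days < 1:
        raise ValueError("n_days must be at least 1")
    # Group news items by calendar day; items dated on non-trading days
    # are never looked up below, so they are effectively dropped.
    by_day: dict[date, list[str]] = defaultdict(list)
    for day, text in news:
        by_day[day].append(text)

    windows: dict[date, str] = {}
    # The first n_days - 1 trading days lack enough history and get no window.
    for i in range(n_days - 1, len(trading_days)):
        span = trading_days[i - n_days + 1 : i + 1]
        windows[trading_days[i]] = " ".join(
            text for d in span for text in by_day[d]
        )
    return windows
```

With `n_days=1` each day keeps only its own news; with `n_days=5` each day's text covers the trailing five trading days, which corresponds to the five-day setting reported in the results, under the assumption above.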

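The results show that continuous (consecutive-day) data performs better than single-day data, and that models pre-trained on financial-domain vocabulary outperform general text feature models. Comparing news content, news headlines, and news content plus headlines, the experiments show little difference in predictive performance among the three. Removing insignificant data labels improves prediction performance. Finally, news content was combined with same-day stock price labels for prediction, and a further regression analysis was used to compute the RMSE. The RMSE is smaller for five consecutive days, meaning that predicted and actual stock prices differ less in the five-day setting. Comparing different text features combined with different machine learning and deep learning models over five consecutive days, FinBERT with average extraction yields the lowest RMSE, and the RF classifier outperforms the other classifiers in terms of RMSE.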
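RMSE, the metric used to compare the single-day and five-day settings, is the square root of the mean squared difference between actual prices y_i and predicted prices ŷ_i over n samples: RMSE = sqrt((1/n) Σ (y_i - ŷ_i)^2). A lower value means predictions sit closer to actual prices. A minimal plain-Python sketch follows; the function name `rmse` is illustrative, and common libraries offer equivalent metrics.

```python
from __future__ import annotations

import math


def rmse(actual: list[float], predicted: list[float]) -> float:
    """Root mean squared error between actual and predicted prices."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("inputs must be non-empty")
    # Square each prediction error, average, then take the square root.
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))
```

Because errors are squared before averaging, RMSE penalizes a few large misses more heavily than many small ones, and it is reported in the same unit as the prices themselves.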

Abstract (English)

Stock price prediction plays a crucial role in financial markets and has long been an important research topic, because it directly affects investment strategies, risk management, market analysis, trade execution, and portfolio allocation. However, stock prices are driven by many complex factors, which makes them difficult to predict. Previous research has focused on historical stock prices, technical indicators, and time series models. More recent studies have applied text mining to financial news and social media text, evaluating prediction performance across different text feature representations and machine learning and deep learning techniques. However, few studies have investigated how using news from consecutive days, rather than from a single day, affects prediction.
This study compares single-day data with data spanning consecutive days and finds that continuous (consecutive-day) data performs better. Models pre-trained on financial vocabulary outperform general text feature models. Differences in prediction performance between news content, headlines, and their combination are minimal. Removing insignificant data labels improves prediction performance. Combining news content with same-day stock price labels and conducting a regression analysis shows that the RMSE is smaller for a five-day period, indicating a closer alignment between predicted and actual prices. Comparing different text features and classifiers, FinBERT with average extraction and the RF classifier yield the best performance in terms of RMSE.
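The abstracts report that FinBERT with average extraction gave the lowest RMSE, but the record does not spell out the pooling step. A common reading of "average extraction" is mean pooling: the token embeddings a BERT-style encoder outputs for one text (for example, the `last_hidden_state` of a Hugging Face `transformers` model) are averaged into a single fixed-length vector that a classifier such as RF can consume. The sketch below assumes that reading and takes the token vectors and attention mask as plain lists, so it does not depend on any particular library; `mean_pool` is a hypothetical name.

```python
from __future__ import annotations


def mean_pool(token_vectors: list[list[float]],
              attention_mask: list[int]) -> list[float]:
    """Average token embeddings into one document vector, ignoring
    padding positions (mask value 0)."""
    if len(token_vectors) != len(attention_mask):
        raise ValueError("need one mask value per token vector")
    # Keep only real tokens; padding would otherwise drag the average.
    kept = [vec for vec, keep in zip(token_vectors, attention_mask) if keep]
    if not kept:
        raise ValueError("no unmasked tokens to average")
    dim = len(kept[0])
    # Average each embedding dimension across the kept tokens.
    return [sum(vec[j] for vec in kept) / len(kept) for j in range(dim)]
```

Masking matters because batched inputs are padded to a common length; averaging over padding positions would make the document vector depend on batch composition rather than on the text.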

Keywords

★ text mining
★ natural language processing
★ stock price prediction
★ continuous data
★ machine learning
★ deep learning

Table of Contents

Chinese Abstract
English Abstract
Acknowledgments
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
1.1 Research Background
1.2 Research Motivation
1.3 Research Objectives
1.4 Research Framework
Chapter 2 Literature Review
2.1 Text Mining for Stock Price Prediction
2.2 Text Representation
2.3 Classification Algorithms
2.4 Related Work on Stock Price Prediction
Chapter 3 Research Methodology and Experimental Design
3.1 Overview
3.2 Study One Framework
3.2.1 Experimental Preparation
3.2.2 Experimental Dataset
3.2.3 Dataset Splitting
3.2.4 Data Preprocessing
3.2.5 Feature Representation
3.2.6 Classifiers
3.2.7 Evaluation Metrics
3.2.8 Prediction Model Parameter Settings
3.3 Study Two Framework
3.3.1 Regression Analysis Method
Chapter 4 Experimental Results
4.1 Study One
4.1.1 Study One Results: Next-Day Prediction
4.1.2 Study One Results: Next-Week Prediction
4.1.3 Study One Results: Next-Month Prediction
4.1.4 Study One Summary
4.2 Study Two
4.2.1 Study Two Result 1: Comparison of News Headlines and News Content
4.2.2 Study Two Result 2: Removing Insignificant Data Labels
4.2.3 Study Two Result 3: RMSE
4.2.4 Study Two Summary
Chapter 5 Conclusion
5.1 Summary and Contributions
5.2 Future Research Directions and Suggestions
References

References
Al Amrani, Y., Lazaar, M., & El Kadiri, K. E. (2018). Random forest and support vector machine based hybrid approach to sentiment analysis. Procedia Computer Science, 127, 511-520.
Alanyali, M., Moat, H. S., & Preis, T. (2013). Quantifying the relationship between financial news and the stock market. Scientific Reports, 3(1), 1-6.
Albahli, S., Awan, A., Nazir, T., Irtaza, A., Alkhalifah, A., & Albattah, W. (2022). A deep learning method DCWR with HANet for stock market prediction using news articles. Complex & Intelligent Systems, 8(3), 2471-2487. https://doi.org/10.1007/s40747-022-00658-0
Altszyler, E., Sigman, M., Ribeiro, S., & Slezak, D. F. (2016). Comparative study of LSA vs Word2vec embeddings in small corpora: a case study in dreams database. arXiv preprint arXiv:1610.01520.
Beltagy, I., Lo, K., & Cohan, A. (2019). SciBERT: A pretrained language model for scientific text. arXiv preprint arXiv:1903.10676.
Breiman, L. (2001). Random forests. Machine Learning, 45, 5-32.
Cavalcante, R. C., Brasileiro, R. C., Souza, V. L., Nobrega, J. P., & Oliveira, A. L. (2016). Computational intelligence and financial markets: A survey and future directions. Expert Systems with Applications, 55, 194-211.
Cervantes, J., Garcia-Lamont, F., Rodríguez-Mazahua, L., & Lopez, A. (2020). A comprehensive survey on support vector machine classification: Applications, challenges and trends. Neurocomputing, 408, 189-215.
Chandola, D., Mehta, A., Singh, S., Tikkiwal, V. A., & Agrawal, H. (2022). Forecasting Directional Movement of Stock Prices using Deep Learning. Annals of Data Science, 1-18.
Devlin, J., Chang, M.-W., Lee, K., & Toutanova, K. (2018). BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.
Ferguson, N. J., Philip, D., Lam, H., & Guo, J. M. (2015). Media content and stock returns: The predictive power of press. Multinational Finance Journal, 19(1), 1-31.
Guo, J., & Tuckfield, B. (2020). News-based machine learning and deep learning methods for stock prediction. Journal of Physics: Conference Series.
Hochreiter, S., & Schmidhuber, J. (1997). Long short-term memory. Neural Computation, 9(8), 1735-1780.
Huang, A. H., Wang, H., & Yang, Y. (2023). FinBERT: A large language model for extracting information from financial text. Contemporary Accounting Research, 40(2), 806-841.
Huang, H., Liu, X., Zhang, Y., & Feng, C. (2022). News-driven stock prediction via noisy equity state representation. Neurocomputing, 470, 66-75.
Kilimci, Z. H., & Duvar, R. (2020). An efficient word embedding and deep learning based model to forecast the direction of stock exchange market using Twitter and financial news sites: a case of Istanbul Stock Exchange (BIST 100). IEEE Access, 8, 188186-188198.
Kim, Y., Jeong, S. R., & Ghani, I. (2014). Text opinion mining to analyze news for stock market prediction. Int. J. Advance. Soft Comput. Appl., 6(1), 2074-8523.
Lawrence, S., Giles, C. L., Tsoi, A. C., & Back, A. D. (1997). Face recognition: A convolutional neural-network approach. IEEE Transactions on Neural Networks, 8(1), 98-113.
Lin, W.-C., Tsai, C.-F., & Chen, H. (2022). Factors affecting text mining based stock prediction: Text feature representations, machine learning models, and news platforms. Applied Soft Computing, 130, 109673.
Long, W., Song, L., & Tian, Y. (2019). A new graphic kernel method of stock price trend prediction based on financial news semantic and structural similarity. Expert Systems with Applications, 118, 411-424.
Mikolov, T., Sutskever, I., Chen, K., Corrado, G. S., & Dean, J. (2013). Distributed representations of words and phrases and their compositionality. Advances in Neural Information Processing Systems, 26.
Mishra, P., Pai, P., Singh, P., Kulkarni, S., & Weakey, S. A. (2021). Analysis of Effect of Historical Prices And News on The Stock Market. 2021 International Conference on Communication Information and Computing Technology (ICCICT).
Mondal, P., Shit, L., & Goswami, S. (2014). Study of effectiveness of time series modeling (ARIMA) in forecasting stock prices. International Journal of Computer Science, Engineering and Applications, 4(2), 13.
Nam, K., & Seong, N. (2019). Financial news-based stock movement prediction using causality analysis of influence in the Korean stock market. Decision Support Systems, 117, 100-112.
Navarro, J. M., Martínez-España, R., Bueno-Crespo, A., Martínez Carreras, R., & Cecilia, J. (2020). Sound Levels Forecasting in an Acoustic Sensor Network Using a Deep Neural Network. Sensors, 20, 903. https://doi.org/10.3390/s20030903
Ray, S., Alshouiliy, K., & Agrawal, D. (2020). Dimensionality Reduction for Human Activity Recognition Using Google Colab. Information, 12, 6. https://doi.org/10.3390/info12010006
Rong, X. (2014). word2vec parameter learning explained. arXiv preprint arXiv:1411.2738.
Salton, G., & Buckley, C. (1988). Term-weighting approaches in automatic text retrieval. Information Processing & Management, 24(5), 513-523.
Schuster, M., & Paliwal, K. K. (1997). Bidirectional recurrent neural networks. IEEE Transactions on Signal Processing, 45(11), 2673-2681.
Selvin, S., Vinayakumar, R., Gopalakrishnan, E., Menon, V. K., & Soman, K. (2017). Stock price prediction using LSTM, RNN and CNN-sliding window model. 2017 International Conference on Advances in Computing, Communications and Informatics (ICACCI).
Seong, N., & Nam, K. (2021). Predicting stock movements based on financial news with segmentation. Expert Systems with Applications, 164, 113988.
Shen, J., & Shafiq, M. O. (2020). Short-term stock market price trend prediction using a comprehensive deep learning system. Journal of Big Data, 7(1), 1-33.
Souma, W., Vodenska, I., & Aoyama, H. (2019). Enhanced news sentiment analysis using deep learning methods. Journal of Computational Social Science, 2(1), 33-46.
Stoll, H. R., & Whaley, R. E. (1990). Stock market structure and volatility. The Review of Financial Studies, 3(1), 37-71.
Vijayarani, S., Ilamathi, M. J., & Nithya, M. (2015). Preprocessing techniques for text mining-an overview. International Journal of Computer Science & Communication Networks, 5(1), 7-16.
Xing, F. Z., Cambria, E., & Welsch, R. E. (2018). Natural language based financial forecasting: a survey. Artificial Intelligence Review, 50(1), 49-73.
Xu, D., Xu, Z., Chen, S., & Fujita, H. (2022). A multi-channel cross-residual deep learning framework for news-oriented stock movement prediction. Economic Research-Ekonomska Istraživanja, 1-18.
Zhang, Y., Jin, R., & Zhou, Z.-H. (2010). Understanding bag-of-words model: a statistical framework. International Journal of Machine Learning and Cybernetics, 1, 43-52.

Advisor

蔡志豐 (Chih-Fong Tsai)

Approval Date

2023-7-18
