Name: Hsin-Yu Cheng (鄭心愉)   Department: Information Management
Thesis title: A Study on Integrating Deep Learning and Traditional Classifiers for Predicting High-Volatility Stock Prices Using Financial News
(整合深度學習與傳統分類器於財務新聞對高漲幅股價預測之研究)
Abstract: The Covid-19 pandemic has markedly increased stock market volatility, which makes accurate prediction of stock price movements more important. Traditional forecasting methods, such as technical indicators and statistical analysis, have shown their limitations in a rapidly changing market. This study seeks the best-performing combinations of stock price prediction models, integrating deep learning techniques with traditional machine learning methods, with a particular focus on high-risk, high-reward stocks. Because the existing literature concentrates on low- and medium-risk settings and does not address the needs of high-risk, high-reward investment, the study widens its scope to cover stocks at different risk levels and examines which combinations of textual features and classifiers perform best. Data on large price gains in high-risk stocks are relatively scarce, so the prediction task suffers from a clear class imbalance problem. The study therefore analyzes and compares a range of text feature extraction methods, classifier models, and sampling techniques, with the aim of recommending the best model combination for each scenario or stock type and improving the accuracy and applicability of stock price prediction.

The results indicate that no single combination of feature extraction method and classifier performs best in every scenario. Each scenario has its own optimal combination, so the feature extraction method and the classifier should be chosen according to market conditions and data characteristics when building a stock price prediction model. A second finding is that addressing class imbalance with oversampling and hybrid sampling techniques effectively improves prediction performance, which indicates that appropriate data preprocessing is crucial for prediction models built on financial time series data.
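The abstract names oversampling as one remedy for class imbalance but does not say which variant the thesis uses. As an illustration only, the sketch below shows the simplest form, random oversampling, in plain Python. The function name `random_oversample`, the toy headlines, and the 0/1 labels are all hypothetical and are not taken from the thesis.

```python
import random
from collections import Counter


def random_oversample(samples, labels, seed=0):
    """Duplicate minority-class samples at random until every class
    has as many samples as the largest class."""
    rng = random.Random(seed)

    # Group the samples by their class label.
    by_class = {}
    for sample, label in zip(samples, labels):
        by_class.setdefault(label, []).append(sample)

    # Every class is grown to the size of the largest class.
    target = max(len(group) for group in by_class.values())

    out_samples, out_labels = [], []
    for label, group in by_class.items():
        # Draw the missing samples with replacement from the same class.
        extra = [rng.choice(group) for _ in range(target - len(group))]
        for sample in group + extra:
            out_samples.append(sample)
            out_labels.append(label)
    return out_samples, out_labels


# Hypothetical toy data: 1 = large price rise after the news item, 0 = otherwise.
headlines = ["h1", "h2", "h3", "h4", "h5", "h6"]
labels = [0, 0, 0, 0, 0, 1]

balanced_x, balanced_y = random_oversample(headlines, labels)
print(Counter(balanced_y))
```

In practice, resampling of this kind is applied to the training split only, so that the test split keeps the original class distribution.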
Keywords: ★ Stock Price Prediction
★ Text Mining
★ Natural Language Processing
★ Machine Learning
★ Deep Learning
★ Class Imbalance
Table of contents:
Abstract
Chapter 1 Introduction
1.1 Research Background
1.2 Research Motivation
1.3 Research Objectives
1.4 Thesis Structure
Chapter 2 Literature Review
2.1 Text Mining in Stock Price Prediction
2.2 Text Representation
2.2.1 Term Frequency-Inverse Document Frequency
2.2.2 Word2vec
2.2.3 Bidirectional Encoder Representations from Transformers
2.2.4 FinBERT
2.2.5 Embeddings from Language Models
2.3 Classification Algorithms
2.3.1 Linear Support Vector Classification
2.3.2 Random Forest
2.3.3 Isolation Forest
2.3.4 One-Class Support Vector Machine (One-Class SVM, OCSVM)
2.3.5 Long Short-Term Memory
2.3.6 Convolutional Neural Network
2.4 Related Work on Class Imbalance and Stock Price Prediction
Chapter 3 Research Method and Experimental Design
3.1 Experiment Overview
3.2 Framework of Study One
3.2.1 Experiment Preparation
3.2.2 Datasets
3.2.3 Dataset Splitting
3.2.4 Data Preprocessing
3.2.5 Feature Representation
3.2.6 Classifiers
3.2.7 Evaluation Metrics
3.2.8 Parameter Settings
3.3 Framework of Study Two
Chapter 4 Experimental Results
4.1 Study One
4.1.1 Results of Study One
4.1.2 Summary of Study One
4.2 Study Two
4.2.1 Results of Study Two
4.2.2 Summary of Study Two
Chapter 5 Conclusion
5.1 Summary and Contributions
5.2 Research Limitations
5.3 Future Research Directions and Suggestions
Chapter 6 References
Advisor: Chih-Fong Tsai (蔡志豐)   Approval date: 2024-7-3
