Master's/Doctoral Thesis 111423020: Detailed Record




Name  Hsin-Yu Cheng (鄭心愉)   Department  Department of Information Management
Thesis title  A Study on Integrating Deep Learning and Traditional Classifiers for Predicting High-Volatility Stock Prices Using Financial News
(整合深度學習與傳統分類器於財務新聞對高漲幅股價預測之研究)
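Related theses
★ Building a Sales Forecasting Model for Commercial Multifunction Printers Using Data Mining Techniques
★ A Study on Applying Data Mining Techniques to Resource Allocation Forecasting: The Case of a Support Unit at a Computer Contract Manufacturer
★ Applying Data Mining Techniques to Flight Delay Analysis in the Airline Industry: The Case of Company C
★ Security Control of New Products in the Global Supply Chain: The Case of Company C
★ Applying Data Mining to the Semiconductor Laser Industry: The Case of Company A
★ Applying Data Mining Techniques to Predicting the Warehouse Storage Time of Air Export Cargo: The Case of Company A
★ Optimizing YouBike Rebalancing Operations Using Data Mining Classification Techniques
★ The Effect of Feature Selection on Different Data Types
★ A Data Mining Study of B2B Corporate Websites: The Case of Company T
★ Customer Investment Analysis and Recommendations for Financial Derivatives: Integrating Clustering and Association Rule Techniques
★ Building an Assisted Diagnosis Model for Liver Ultrasound Images Using Convolutional Neural Networks
★ An Identity Recognition System Based on Convolutional Neural Networks
★ Comparative Analysis of Error Rates of Electricity Data Imputation Methods in Energy Management Systems
★ Development of an Employee Sentiment Analysis and Management System
★ Data Cleaning for the Class Imbalance Problem: A Machine Learning Perspective
★ Analysis of Passenger Self-Service Check-in Using Data Mining Techniques: The Case of Airline C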
Files  Full text available for viewing in the system after 2029-7-1
Abstract  The Covid-19 pandemic has markedly increased stock market volatility, underscoring the need for accurate prediction of stock price movements. Traditional forecasting approaches, such as technical indicators and statistical analysis, have shown their limitations in rapidly changing markets. This study seeks the best-performing combinations of stock price prediction models, integrating deep learning techniques with traditional machine learning methods, with particular attention to high-risk, high-reward stocks. Because the existing literature concentrates on low- to medium-risk settings and leaves the needs of high-risk, high-reward investing largely unaddressed, the study broadens its scope to stocks at different risk levels and examines which combinations of text features and classifiers perform best. Large price gains among high-risk stocks are comparatively rare, so the data exhibit clear class imbalance. The study therefore analyzes and compares multiple text feature extraction methods, classifier models, and sampling methods, aiming to identify the best model combination for each scenario or stock type and thereby improve the accuracy and applicability of stock price prediction.

The results show that no single combination of feature extraction method and classifier performs best in every scenario. Each situation has its own optimal combination, which indicates that feature extraction methods and classification algorithms should be chosen according to market conditions and data characteristics. A second finding is that, when the data are imbalanced, balancing them with oversampling or hybrid sampling techniques effectively improves prediction performance. This indicates that an appropriate data preprocessing strategy is a key factor in improving predictive models for financial time series data.
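The abstract does not state which oversampling algorithm the thesis used, so the following is only a minimal sketch of the general idea, assuming plain random oversampling: minority-class samples are duplicated at random until every class is as large as the largest one. The function name `random_oversample`, the toy headlines, and the 0/1 labels are all hypothetical and are not taken from the thesis.

```python
import random
from collections import Counter


def random_oversample(samples, labels, seed=0):
    """Duplicate minority-class samples at random until every class
    has as many samples as the largest class (illustrative only)."""
    rng = random.Random(seed)
    target = max(Counter(labels).values())

    # Group sample indices by class label.
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)

    out_samples, out_labels = list(samples), list(labels)
    for label, indices in by_class.items():
        # Draw, with replacement, the extra copies this class needs.
        for _ in range(target - len(indices)):
            pick = rng.choice(indices)
            out_samples.append(samples[pick])
            out_labels.append(label)
    return out_samples, out_labels


# Hypothetical toy data: 1 = large price gain (rare), 0 = otherwise.
headlines = ["news a", "news b", "news c", "news d", "news e", "news f"]
labels = [0, 0, 0, 0, 0, 1]

balanced_x, balanced_y = random_oversample(headlines, labels)
print(Counter(balanced_y))
```

In a setup like this, resampling would normally be applied to the training split only, so that duplicated samples cannot leak into the evaluation data.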
Keywords  ★ Stock Price Prediction
★ Text Mining
★ Natural Language Processing
★ Machine Learning
★ Deep Learning
★ Class Imbalance
Table of contents
Abstract (Chinese)
Abstract
Acknowledgments
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
1.1 Research Background
1.2 Research Motivation
1.3 Research Objectives
1.4 Research Structure
Chapter 2 Literature Review and Discussion
2.1 Text Mining in Stock Price Prediction
2.2 Text Representation
2.2.1 Term Frequency-Inverse Document Frequency
2.2.2 Word2vec
2.2.3 Bidirectional Encoder Representations from Transformers
2.2.4 FinBERT
2.2.5 Embeddings from Language Models
2.3 Classification Algorithms
2.3.1 Linear Support Vector Classification
2.3.2 Random Forest
2.3.3 Isolation Forest
2.3.4 One-Class SVM (OCSVM)
2.3.5 Long Short-Term Memory
2.3.6 Convolutional Neural Network
2.4 Related Research on Class Imbalance and Stock Price Prediction
Chapter 3 Research Method and Experimental Design
3.1 Overview
3.2 Study One Design
3.2.1 Experimental Preparation
3.2.2 Experimental Datasets
3.2.3 Dataset Splitting
3.2.4 Data Preprocessing
3.2.5 Feature Representation
3.2.6 Classifiers
3.2.7 Evaluation Metrics
3.2.8 Parameter Settings
3.3 Study Two Design
Chapter 4 Experimental Results
4.1 Study One
4.1.1 Study One Results
4.1.2 Study One Summary
4.2 Study Two
4.2.1 Study Two Results
4.2.2 Study Two Summary
Chapter 5 Conclusion
5.1 Summary and Contributions
5.2 Research Limitations
5.3 Future Research Directions and Suggestions
Chapter 6 References
Advisor  Chih-Fong Tsai (蔡志豐)   Approval date  2024-7-3