Application of Q-learning Combined with Supervised Learning in the Stock Market (Q學習結合監督式學習在股票市場的應用)


Name

Hao-Hsin Chien (簡豪新)

Department

Graduate Institute of Statistics

Thesis Title

Application of Q-learning combined with supervised learning in the stock market
(Q學習結合監督式學習在股票市場的應用)

Related Theses

★ Trading Strategy Based on Q-learning and Unsupervised Learning (基於Q-learning與非監督式學習之交易策略)


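This electronic thesis is authorized for immediate open access.
Once open access takes effect, the electronic full text is licensed to users only for personal, non-profit searching, reading, and printing for the purpose of academic research.
Please comply with the Copyright Act of the Republic of China. Do not reproduce, distribute, adapt, repost, or broadcast this work without authorization, to avoid violating the law.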


Abstract

Q-learning is a reinforcement learning algorithm that can learn optimal investment decisions by using historical stock price data as feedback from the environment. Supervised learning can be used to train a classification model that predicts the state of future stock prices from price-related features. This study proposes an investment strategy based on Q-learning and uses supervised learning to classify future stock price trends; these classifications define the state inputs required by the Q-learning process. Finally, the proposed method is applied to stocks listed in Taiwan to evaluate its operational performance. The numerical results show that the proposed method achieves good profit performance when transaction fees are taken into account.
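
The combination described in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation. It assumes that a classifier has already assigned each day a trend label (0 = down, 1 = up), and it uses a hypothetical two-action set (stay in cash or hold the stock), a hypothetical reward (the day's return while holding, minus a proportional fee whenever the position changes), and illustrative hyperparameters. The names `train_q_table` and `greedy_policy` are invented for this example.

```python
import random


def train_q_table(states, returns, fee=0.001, alpha=0.2, gamma=0.9,
                  epsilon=0.2, episodes=500, seed=0):
    """Tabular Q-learning over a sequence of classifier-labelled states.

    states[t]  : trend label predicted for day t (assumed: 0 = down, 1 = up)
    returns[t] : realized simple return of the stock from day t to day t + 1
    Actions    : 0 = stay in cash, 1 = hold the stock (hypothetical action set)
    """
    rng = random.Random(seed)
    labels = sorted(set(states))
    # The Q-table is indexed by (trend label, current position) so that the
    # agent can account for the fee paid when it changes position.
    q = {(s, p): [0.0, 0.0] for s in labels for p in (0, 1)}

    for _ in range(episodes):
        position = 0  # start every pass over the data in cash
        for t in range(len(states) - 1):
            state = (states[t], position)
            # Epsilon-greedy action selection.
            if rng.random() < epsilon:
                action = rng.randrange(2)
            else:
                action = max((0, 1), key=lambda a: q[state][a])
            # Assumed reward: return earned while holding, minus a fee
            # charged only when the position changes.
            reward = action * returns[t] - fee * abs(action - position)
            next_state = (states[t + 1], action)
            # Standard Q-learning update toward the bootstrapped target.
            target = reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            position = action
    return q


def greedy_policy(q):
    """Return the highest-valued action for every state in the Q-table."""
    return {state: max((0, 1), key=lambda a: values[a])
            for state, values in q.items()}


if __name__ == "__main__":
    # Toy data (fabricated for illustration only): labels alternate, and an
    # "up" label is always followed by a +2% return, "down" by -2%.
    toy_labels = [1, 0] * 100
    toy_returns = [0.02 if s == 1 else -0.02 for s in toy_labels]
    policy = greedy_policy(train_q_table(toy_labels, toy_returns))
    for state, action in sorted(policy.items()):
        print(state, "->", "hold stock" if action == 1 else "stay in cash")
```

Including the current position in the state is a design choice of this sketch: it lets the fee enter the reward only when a trade actually occurs, so the learned values reflect the cost of switching positions.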


Keywords

★ Investment strategy
★ Q-Learning
★ Supervised Learning

Table of Contents

1 Introduction
2 Methods
2.1 Q-learning
2.2 Random Forest
2.3 Mean-Variance Model
3 Application
3.1 Establishment of Single Stock Forecast
3.2 Classification of States
3.3 Perform Q-learning
4 Experiment and Result
4.1 Presentation of a single stock
4.2 Comparison of Portfolios & Individual Stock
5 Conclusion
Appendix
Reference


Advisors

Shih-Feng Huang (黃士峰), Shao-Hsuan Wang (王紹宣)

Date of Approval

2023-7-26
