Master's/Doctoral Thesis 110225024: Detailed Record




Name: 李濬紘 (Chun-Hung Lee)    Graduate Institute: Graduate Institute of Statistics
Thesis Title: 基於Q-learning與非監督式學習之交易策略
(A Trading Strategy Based on Q-learning and Unsupervised Learning)
Related Theses
★ Q學習結合監督式學習在股票市場的應用 (Application of Q-learning combined with supervised learning in the stock market)
Files: The full text can be browsed in the system after 2026-07-01.
Abstract (Chinese): In stock trading, designing a profitable trading strategy for different situations is a major challenge. In recent years, the development of artificial intelligence has brought new investment methods to the stock market. Q-learning, a reinforcement learning algorithm, can help investors learn market trends and provide more reasonable investment decisions. In Q-learning, the formulation of states is particularly important, because different formulation methods affect its performance. This thesis proposes a data-driven approach based on unsupervised learning to set the states required by Q-learning: multi-dimensional stock market data serve as features, and Dynamic Time Warping (DTW) together with t-SNE is used to identify the desired states. Taking the Taiwan stock market as an example, this thesis constructs Q-learning investment decisions for a single asset and accordingly proposes an appropriate portfolio composed of multiple assets. The empirical results show that the proposed method delivers good investment performance.
Abstract (English): Designing a profitable trading strategy for different market situations is a major challenge in stock trading. In recent years, the development of artificial intelligence has brought new investment methods to the stock market. Q-learning, a reinforcement learning algorithm, can help investors learn market trends and recommend more reasonable investment decisions. In Q-learning, the formulation of states is particularly important, since different formulation methods can affect its performance. We propose a data-driven approach based on unsupervised learning to set the states required in Q-learning. By using multi-dimensional stock market data as features and leveraging Dynamic Time Warping (DTW) and t-SNE, the proposed approach efficiently identifies the desired states for Q-learning. Using the Taiwan stock market as an example, we obtain Q-learning investment decisions for a single asset and, accordingly, propose an appropriate investment portfolio consisting of multiple assets. The empirical results reveal that the proposed method provides satisfactory investment performance.
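Since the full text is embargoed until 2026-07-01, the pipeline described in the abstracts can be illustrated with a short, self-contained Python sketch. This is only an illustration of the general technique named there (DTW distances between multi-dimensional feature windows, a t-SNE embedding of the resulting distance matrix, clustering of the embedding into discrete Q-learning states, and a tabular Q-learning trading agent), not the author's code; the window length, number of states, two-action trading rule, reward definition, and all hyper-parameters are assumptions, and the data is synthetic.

```python
# A minimal sketch of the pipeline described in the abstract, NOT the thesis's
# implementation: multi-dimensional feature windows are compared with DTW, the
# DTW distance matrix is embedded with t-SNE, the embedding is clustered into
# discrete "states", and a tabular Q-learning agent trades on those states.
# Window length, number of states, action set, reward definition, and all
# hyper-parameters below are assumptions; the data is synthetic.

import numpy as np
from sklearn.manifold import TSNE
from sklearn.cluster import KMeans


def dtw_distance(a, b):
    """Classic DTW between two (T, d) feature windows with Euclidean local cost."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]


def build_states(windows, n_states=5, seed=0):
    """Embed the pairwise DTW distances with t-SNE, then cluster into states."""
    n = len(windows)
    dist = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            dist[i, j] = dist[j, i] = dtw_distance(windows[i], windows[j])
    emb = TSNE(n_components=2, metric="precomputed", init="random",
               perplexity=min(30, n - 1), random_state=seed).fit_transform(dist)
    return KMeans(n_clusters=n_states, n_init=10, random_state=seed).fit_predict(emb)


def q_learning(states, returns, n_states, alpha=0.1, gamma=0.95, eps=0.1,
               episodes=200, seed=0):
    """Tabular Q-learning with actions {0: stay out, 1: hold the asset};
    the one-step reward is the asset return over that step if holding, else 0."""
    rng = np.random.default_rng(seed)
    Q = np.zeros((n_states, 2))
    for _ in range(episodes):
        for t in range(len(states) - 1):
            s, s_next = states[t], states[t + 1]
            a = rng.integers(2) if rng.random() < eps else int(np.argmax(Q[s]))
            r = returns[t] if a == 1 else 0.0
            Q[s, a] += alpha * (r + gamma * np.max(Q[s_next]) - Q[s, a])
    return Q


if __name__ == "__main__":
    # Synthetic stand-in for multi-dimensional daily stock features (price, volume, ...).
    rng = np.random.default_rng(0)
    features = rng.normal(size=(120, 4)).cumsum(axis=0)
    window = 10
    windows = [features[t - window:t] for t in range(window, len(features))]
    returns = np.diff(features[window - 1:, 0]) / 100.0  # one return per window step

    states = build_states(windows, n_states=5)
    Q = q_learning(states, returns, n_states=5)
    print("Greedy action per state (0 = stay out, 1 = hold):", Q.argmax(axis=1))
```

In practice the feature windows would be built from actual Taiwan-market price and volume series, and the greedy policy learned per state would supply the single-asset decisions that the abstract says are then combined into a multi-asset portfolio.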
Keywords (Chinese) ★ Dynamic Time Warping
★ Unsupervised Learning
★ Portfolio Selection
★ Q-learning
★ t-SNE
Keywords (English)
Table of Contents
1 Introduction 1
2 Review 3
2.1 Markov Decisions Process 3
2.2 Q-learning 4
2.3 t-Distributed Stochastic Neighbor Embedding 5
2.4 Dynamic Time Warping-based t-SNE 7
3 Methods 9
3.1 Data Pretreatment 10
3.2 Cluster to represent the state for Q-learning 11
3.3 Portfolio based on Q-learning 14
4 Experiment and results 17
4.1 Datasets 17
4.2 Experiment setting 18
4.3 Experimental results 19
5 Conclusion and discussion 28
Reference 30
A Preliminary clustering for other 16 stocks 32
B Q-learning performance for other 16 stocks 35
References
C. J. C. H. Watkins. Learning from Delayed Rewards. PhD thesis, King's College, Cambridge, England, 1989.
C. J. C. H. Watkins and P. Dayan. Q-learning. Machine Learning, 8(3–4):279–292, 1992.
D. Berndt and J. Clifford. Using dynamic time warping to find patterns in time series. AAAI-94 Workshop on Knowledge Discovery in Databases (KDD-94), Seattle, Washington, 1994.
H. Sakoe and S. Chiba. Dynamic Programming Algorithm Optimization for Spoken Word Recognition. IEEE Transactions on Acoustics, Speech, and Signal Processing, 26:43–49, 1978.
J. B. Chakole et al. A Q-learning agent for automated trading in equity stock markets. Expert Systems with Applications, 163:113761, 2021.
J. Hu and M. P. Wellman. Nash Q-Learning for General-Sum Stochastic Games. Journal of Machine Learning Research, 2003.
K. Y. Wong and F. L. Chung. Visualizing Time Series Data with Temporal Matching Based t-SNE. International Joint Conference on Neural Networks (IJCNN), Budapest, Hungary, 14–19 July 2019.
L. Buşoniu, R. Babuška, and B. De Schutter. Multi-agent reinforcement learning: An overview. Chapter 7 in Innovations in Multi-Agent Systems and Applications – 1 (D. Srinivasan and L. C. Jain, eds.), vol. 310 of Studies in Computational Intelligence, pp. 183–221. Springer, Berlin, Germany, 2010.
L. van der Maaten and G. Hinton. Visualizing Data using t-SNE. Journal of Machine Learning Research, 2008.
M. L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley & Sons, New York, 1994.
R. S. Sutton and A. G. Barto. Reinforcement Learning: An Introduction. MIT Press, 2018.
Advisors: 黃士峰, 王紹宣    Approval Date: 2023-07-26
