Abstract (English)
Financial technology (FinTech) has emerged as one of the key application areas for artificial intelligence (AI), including, but not limited to, the prediction of stock market movements and asset allocation. However, relying solely on stock price forecasting does not guarantee maximal investment returns: an investor must also consider asset allocation strategies to maximize returns or minimize losses.
In such a scenario, where rewards are reaped through interaction with the environment, reinforcement learning (RL) is an ideal fit. Consequently, in this study we propose a stock investment strategy that employs the Actor-Critic techniques of RL.
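The Actor-Critic idea can be illustrated with a toy sketch. This is not the thesis model: the two-asset "bandit" market, its return distributions, and all hyperparameters below are assumptions made up for illustration.

```python
import numpy as np

# Toy Actor-Critic sketch (illustrative only, not the thesis model):
# a one-step "bandit" market with two synthetic assets. The actor is
# a softmax policy over assets; the critic is a scalar reward baseline.
rng = np.random.default_rng(0)

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

theta = np.zeros(2)              # actor parameters (softmax logits)
v = 0.0                          # critic: running estimate of mean reward
lr_actor, lr_critic = 0.05, 0.05

for _ in range(5000):
    pi = softmax(theta)
    a = rng.choice(2, p=pi)                  # sample an asset to hold
    mean = 0.01 if a == 0 else -0.01         # asset 0 pays more on average
    reward = rng.normal(mean, 0.02)          # synthetic daily return
    delta = reward - v                       # advantage (TD error)
    grad_log_pi = -pi
    grad_log_pi[a] += 1.0                    # gradient of log pi(a | theta)
    theta += lr_actor * delta * grad_log_pi  # actor update
    v += lr_critic * delta                   # critic update

print("learned policy:", softmax(theta))    # concentrates on asset 0
```

The critic's baseline reduces the variance of the actor's policy-gradient update, which is the core reason Actor-Critic methods learn more stably than plain policy gradient.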
To enhance the effectiveness of investment decisions, we employed an AutoEncoder (AE) to learn features from various stock technical indicators; these features then inform decisions on stock allocation and return estimation.
However, portfolio management still faces challenges in optimizing allocation strategies and accurately forecasting returns, especially during periods of market volatility. Traditional strategies often focus on either short-term or long-term investment, so the market lacks a model that can adapt flexibly to varied situations. To address this problem, we introduce a novel method that combines reinforcement learning and autoencoders to fill this gap.
We conducted ablation experiments to explore how the AutoEncoder's encoding dimension and the length of historical data affect state encoding. The results indicate that compressing the past 30 days of historical data into five dimensions achieves the best state encoding. We also found that incorporating the predictions of the AutoEncoder Predictor improves cumulative earnings. Furthermore, we investigated three investment strategies: RL+AE Predictor, RL Only, and AE Predictor. Through performance analysis, correlation with the broader market, and error-rate analysis, we evaluated these three strategies in various market environments.
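The 30-day-to-5-dimension compression can be sketched in closed form. The thesis trains a (nonlinear) AutoEncoder; as a self-contained stand-in, the sketch below uses its linear analogue, truncated SVD/PCA, on synthetic random-walk windows, so the data and all names here are illustrative assumptions, not the thesis dataset or architecture.

```python
import numpy as np

# Sketch of the state-compression step: compress a 30-day window of a
# technical indicator into a 5-dimensional code. Truncated SVD is the
# closed-form optimal *linear* autoencoder; the thesis model is nonlinear.
rng = np.random.default_rng(1)

WINDOW, CODE_DIM = 30, 5
# 500 synthetic samples: random walks as stand-ins for 30-day histories.
X = rng.normal(0, 1, (500, WINDOW)).cumsum(axis=1)
X = X - X.mean(axis=0)                  # center each feature

U, S, Vt = np.linalg.svd(X, full_matrices=False)
encode = Vt[:CODE_DIM].T                # (30, 5) encoder weights
Z = X @ encode                          # 5-dim state codes
X_hat = Z @ encode.T                    # linear "decoder" reconstruction

err = np.linalg.norm(X - X_hat) ** 2 / np.linalg.norm(X) ** 2
print(f"code shape: {Z.shape}, relative reconstruction error: {err:.3f}")
```

Because random walks have a rapidly decaying covariance spectrum, a handful of components reconstructs the window almost exactly, which is the same intuition behind compressing 30 days into five dimensions.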
Experimental results reveal that, as a constrained investment strategy, RL+AE Predictor performs best at maximizing assets and exhibits a stable learning process. Especially during significant market changes, this strategy shows superior risk resistance and maintains stable investment returns. Moreover, it has a lower correlation coefficient with the broader market, indicating its independence from market index volatility. In the error-rate analysis, the RL+AE Predictor model has a False Positive Rate (FPR) of 6.46%, lower than AE Predictor at 38.38% and RL Only at 36.09%; it thus achieves the lowest error rate in predicting stock asset allocation.
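The FPR figures above follow the standard definition FPR = FP / (FP + TN). A minimal sketch, using made-up confusion-matrix counts (not the thesis's actual counts):

```python
# False Positive Rate from confusion-matrix counts.
# fp = false positives, tn = true negatives (hypothetical values).
fp, tn = 16, 232
fpr = fp / (fp + tn)
print(f"FPR = {fpr:.2%}")  # prints "FPR = 6.45%" for these counts
```

A false positive here corresponds to allocating capital to a stock that did not in fact rise, so a low FPR directly limits misallocated capital.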
We validated this method in Taiwan's stock market, conducting experiments with Taiwan stock data from 2019 to 2021 and comparing against the TW50 Index, traditional portfolio theory (mean-variance optimization, MVO), and Jiang's research, which uses the reinforcement learning Policy Gradient technique. The experimental results show that the win rate of this study over short-term (3-month), mid-to-long-term (6- to 9-month), and long-term (1- to 2-year) investment periods is superior to the TW50, Jiang's, and MVO benchmarks, reaching the highest total return in the 12- and 24-month long-term investment periods. Even in the two-year fixed-horizon comparison, starting from two different entry points, the bull market of 2019 and the bear market of 2020, the method proposed in this thesis still outperforms the TW50 Index, MVO, and Jiang's.
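The MVO baseline can be sketched with its closed-form minimum-variance special case, w = Σ⁻¹1 / (1ᵀΣ⁻¹1). The return series below is synthetic, not the thesis's Taiwan stock data, and the asset count is an arbitrary assumption.

```python
import numpy as np

# Minimal mean-variance optimization (Markowitz) sketch: the
# closed-form minimum-variance portfolio over fully invested weights.
rng = np.random.default_rng(2)

R = rng.normal(0.0005, 0.01, (250, 4))   # 250 trading days, 4 assets
cov = np.cov(R, rowvar=False)            # sample covariance matrix Σ
ones = np.ones(4)
w = np.linalg.solve(cov, ones)           # Σ⁻¹ 1
w = w / w.sum()                          # normalize: weights sum to 1
print("weights:", np.round(w, 3), "portfolio variance:", w @ cov @ w)
```

By construction these weights have variance no greater than any other fully invested portfolio, e.g. equal weighting, which is the property the MVO benchmark optimizes for.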
In summary, this research provides empirical evidence that combining reinforcement learning and autoencoders for portfolio management outperforms the traditional MVO, the TW50 Index, and Jiang's hybrid deep learning method in both cumulative return rate and Sharpe ratio. It highlights the potential of AI in complex financial decisions and points out the need for a more flexible, universal model to bridge the gap between short-term and long-term investment strategies. These findings provide significant reference value for the development and improvement of investment strategies.
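The Sharpe ratio used in the comparison can be computed from a daily return series as mean excess return over its standard deviation, annualized by √252. The return series and risk-free rate below are assumptions for illustration:

```python
import numpy as np

# Annualized Sharpe ratio from a synthetic daily return series.
rng = np.random.default_rng(3)

daily = rng.normal(0.0008, 0.012, 252)   # one year of daily returns
rf_daily = 0.01 / 252                    # hypothetical 1% annual risk-free rate
excess = daily - rf_daily
sharpe = excess.mean() / excess.std(ddof=1) * np.sqrt(252)
print(f"annualized Sharpe ratio: {sharpe:.2f}")
```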
References
1. Harry Markowitz. Portfolio selection. The Journal of Finance, 7(1):77–91, 1952.
2. Zhengyao Jiang and Jinjun Liang. Cryptocurrency portfolio management with deep
reinforcement learning. In 2017 Intelligent Systems Conference (IntelliSys), pages
905–913, New York, NY, USA, 2017. IEEE.
3. Yuan Qi and Jing Xiao. Fintech: AI powers financial services to improve people's lives. Communications of the ACM, 61(11):65–69, 2018.
4. Ahmet Murat Ozbayoglu, Mehmet Ugur Gudelek, and Omer Berat Sezer. Deep
learning for financial applications: A survey. Applied Soft Computing, 93:106384,
2020.
5. Dmitry Sizykh. Performance indicators comparative analysis of stocks investment
portfolios with various approaches to their formation. In 2020 13th International
Conference "Management of large-scale system development" (MLSD), pages 1–5,
New York, NY, USA, 2020. IEEE.
6. Yash S. Asawa. Modern machine learning solutions for portfolio selection. IEEE
Engineering Management Review, 50(1):94–112, 2021.
7. Weimin Ma, Yingying Wang, and Ningfang Dong. Study on stock price prediction
based on BP neural network. In 2010 IEEE International Conference on Emergency
Management and Management Sciences, pages 57–60, New York, NY, USA, 2010.
IEEE.
8. Timothée Lesort, Natalia Díaz-Rodríguez, Jean-François Goudou, and David Filliat.
State representation learning for control: An overview. Neural Networks, 108:379–
392, 2018.
9. Yue Deng, Feng Bao, Youyong Kong, Zhiquan Ren, and Qionghai Dai. Deep direct reinforcement learning for financial signal representation and trading. IEEE
Transactions on Neural Networks and Learning Systems, 28(3):653–664, 2017.
10. Bo An, Shuo Sun, and Rundong Wang. Deep reinforcement learning for quantitative
trading: Challenges and opportunities. IEEE Intelligent Systems, 37(2):23–26, 2022.
11. Amirhosein Mosavi, Yaser Faghan, Pedram Ghamisi, Puhong Duan, Sina Faizollahzadeh Ardabili, Ely Salwana, and Shahab S. Band. Comprehensive review of
deep reinforcement learning methods and applications in economics. Mathematics,
8(10), 2020.
12. Akhil Raj Azhikodan, Anvitha G. K. Bhat, and Mamatha V. Jadhav. Stock trading
bot using deep reinforcement learning. In H. S. Saini, Rishi Sayal, A. Govardhan,
and Rajkumar Buyya, editors, Innovations in Computer Science and Engineering,
pages 41–49, Singapore, 2019. Springer Singapore.
13. Tarrin Skeepers, Terence L. van Zyl, and Andrew Paskaramoorthy. MA-FDRNN: Multi-asset fuzzy deep recurrent neural network reinforcement learning for portfolio management. In 2021 8th International Conference on Soft Computing & Machine Intelligence (ISCMI), pages 32–37, New York, NY, USA, 2021. IEEE.
14. Qinma Kang, Huizhuo Zhou, and Yunfan Kang. An asynchronous advantage actor-critic reinforcement learning method for stock selection and portfolio management. In Proceedings of the 2nd International Conference on Big Data Research, ICBDR 2018, pages 141–145, New York, NY, USA, 2018. Association for Computing Machinery.
15. Mao Guan and Xiao-Yang Liu. Explainable deep reinforcement learning for portfolio management: An empirical approach. In Proceedings of the Second ACM
International Conference on AI in Finance, ICAIF ’21, New York, NY, USA, 2022.
Association for Computing Machinery.
16. John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, and Oleg Klimov.
Proximal policy optimization algorithms. CoRR, abs/1707.06347, 2017.
17. Richard S. Sutton and Andrew G. Barto. Reinforcement learning: An introduction.
Adaptive computation and machine learning. MIT Press, 1998.
18. Xiao-Yang Liu, Zechu Li, Zhaoran Wang, and Jiahao Zheng. ElegantRL: Massively parallel framework for cloud-native deep reinforcement learning. https://github.com/AI4Finance-Foundation/ElegantRL, 2021.
19. Wei Bao, Jun Yue, and Yulei Rao. A deep learning framework for financial time series using stacked autoencoders and long-short term memory. PLOS ONE, 12(7):1–24, 2017.
20. Zhipeng Liang, Kangkang Jiang, Hao Chen, Junhao Zhu, and Yanran Li. Deep
reinforcement learning in portfolio management. CoRR, abs/1808.09940, 2018.
21. Nicolas Heess, Dhruva TB, Srinivasan Sriram, Jay Lemmon, Josh Merel, Greg
Wayne, Yuval Tassa, Tom Erez, Ziyu Wang, S. M. Ali Eslami, Martin A. Riedmiller,
and David Silver. Emergence of locomotion behaviours in rich environments. CoRR,
abs/1707.02286, 2017.
22. Herman Kahn and Theodore E Harris. Estimation of particle transmission by random sampling. National Bureau of Standards applied mathematics series, 12:27–30,
1951.
23. Taiwan Stock Exchange. Taiwan stock exchange. https://www.twse.com.tw/, 2022.
24. Min-Syue Chang. Application of learning to rank and autoencoder hybrid technology in portfolio strategy. Master’s thesis, National Central University, Taoyuan,
Taiwan, 2021.
25. Investopedia. Investopedia. https://www.investopedia.com/terms/t/technicalindicator.asp, 2022.
26. Luciano Floridi and Massimo Chiriatti. GPT-3: Its nature, scope, limits, and consequences. Minds and Machines, 30:681–694, 2020.
27. Ryan Lowe, Yi Wu, Aviv Tamar, Jean Harb, Pieter Abbeel, and Igor Mordatch.
Multi-agent actor-critic for mixed cooperative-competitive environments. Neural
Information Processing Systems (NIPS), 2017.
28. Antonio C. Briza and Prospero C. Naval. Stock trading system based on the multi-objective particle swarm optimization of technical indicators on end-of-day market data. Applied Soft Computing, 11(1):1191–1201, 2011.