Abstract (English) |
Asset allocation is an important problem in the field of financial investment. Through a sound and effective asset allocation strategy, investors can adjust the proportion of funds allocated to different financial products, maximizing returns while suppressing risk. Traditional asset allocation research often relies on the Mean-Variance Model proposed by Markowitz. However, when applied to the time series data of financial products, the model lacks nonlinear modeling capacity and handles the dynamics of financial markets poorly.
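As a point of reference, the mean-variance idea can be illustrated with the closed-form global minimum-variance portfolio. This is a minimal sketch, not the study's method; the two-asset covariance matrix below is hypothetical, used only to show the computation.

```python
import numpy as np

# Minimal sketch of the mean-variance idea: the global minimum-variance
# portfolio has closed-form weights w = Sigma^{-1} 1 / (1' Sigma^{-1} 1)
# (fully invested, shorting allowed). The covariance matrix is hypothetical.
def min_variance_weights(cov):
    inv = np.linalg.inv(cov)
    ones = np.ones(cov.shape[0])
    w = inv @ ones
    return w / (ones @ w)

cov = np.array([[0.04, 0.01],
                [0.01, 0.09]])   # two assets: variances 0.04 and 0.09
w = min_variance_weights(cov)
print(w)   # the lower-variance asset receives the larger weight
```

Note that these weights are a static, one-shot solution; the nonlinear, dynamic market behavior mentioned above is exactly what this closed form cannot capture.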
In recent years, deep reinforcement learning has emerged, and some researchers have begun to apply it to asset allocation. In most of the existing work, however, the reward function simply measures the change in investment return, and risk is not considered. Risk is nevertheless an important aspect of any asset allocation strategy.
This study uses deep reinforcement learning models to optimize asset allocation, proceeding in two main stages. The first stage tunes the parameters of the CNN neural network inside the deep reinforcement learning models; the second stage compares how seven reward functions and four rebalancing frequencies affect the models' trading performance.
In the first stage, this study designs two models that combine a CNN with the DDPG and PPO algorithms, respectively. By testing various parameter combinations, we examine how the CNN parameters in deep reinforcement learning affect the trading performance of the models.
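The stage-one search can be pictured as a simple grid over the CNN hyperparameters. The candidate values below are illustrative assumptions, not the grids actually used in the study; each combination would be trained with DDPG and PPO and compared on trading performance.

```python
import itertools

# Hypothetical CNN hyperparameter grid for the stage-one tuning
# (illustrative values only, not the study's actual grid).
conv_layers  = [1, 2, 3]     # number of convolutional layers
conv_kernels = [8, 16, 32]   # convolution kernels per layer
fc_layers    = [1, 2]        # number of fully connected layers
fc_neurons   = [32, 64]      # neurons per fully connected layer

grid = list(itertools.product(conv_layers, conv_kernels, fc_layers, fc_neurons))
print(len(grid))   # number of combinations to train per model
```

Even this modest grid already yields dozens of combinations per model, which is why the study fixes the CNN parameters before comparing reward functions.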
In the second stage, the best parameter combination found in the first stage is used to compare the models' trading performance across seven reward functions and four rebalancing frequencies, in order to identify the reward-function factors best suited to asset allocation.
The DDPG and PPO models in this study involve a large number of parameters to tune. Although the reward function and rebalancing frequency may also affect which CNN parameters are best, tuning the CNN parameters separately for every combination of reward function and rebalancing frequency would require an enormous number of experiments. To simplify the experimental process, this study therefore adopts the staged approach described above.
This study finds that both models perform better with shallow convolutional layers. The DDPG model benefits from more convolution kernels per convolutional layer, whereas the PPO model benefits from fewer. In addition, both models perform better with shallow fully connected layers and fewer neurons.
This study uses the average rate of return, volatility, Sharpe ratio, maximum drawdown, and compound annual growth rate of investment returns as reward-function factors, and compares the trading performance of the models under each. For both the DDPG and PPO models, the most suitable reward function is found to be the rate of change of the average investment return. The study also finds that even with the most suitable reward function, an inappropriate combination of CNN parameters in the DDPG and PPO models still prevents good trading performance. This study therefore concludes that the CNN parameters and the reward function both have an important influence on the trading performance of deep reinforcement learning models.
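For concreteness, all of the candidate reward-function factors named above can be computed from a portfolio-value series. The sketch below uses hypothetical daily values and assumes the common convention of 252 trading days per year; it illustrates the quantities compared, not the study's exact implementation.

```python
import numpy as np

# Candidate reward-function factors computed from a (hypothetical)
# daily portfolio-value series; 252 trading days/year is assumed.
values = np.array([100.0, 101.0, 100.5, 102.0, 103.5])
returns = np.diff(values) / values[:-1]          # simple period returns

mean_return = returns.mean()                     # average rate of return
volatility  = returns.std(ddof=1)                # return volatility (risk)
sharpe      = mean_return / volatility * np.sqrt(252)   # annualized Sharpe ratio
running_max = np.maximum.accumulate(values)
max_drawdown = ((running_max - values) / running_max).max()
years = (len(values) - 1) / 252
cagr = (values[-1] / values[0]) ** (1 / years) - 1      # compound annual growth rate

print(round(mean_return, 6), round(max_drawdown, 6))
```

The "rate of change of the average investment return" reward would track how `mean_return` evolves between rebalancing steps, whereas the other four quantities summarize the whole trajectory.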
While volatility, the Sharpe ratio, maximum drawdown, and compound annual growth rate are important and appropriate indicators for measuring the overall performance and risk of a trading strategy, they do not enable deep reinforcement learning models to learn well when used as reward functions. Therefore, when using deep reinforcement learning models to optimize asset allocation, the factors included in the reward function must be weighed carefully and searched for through well-designed experiments. |
References |
[1]. Markowitz, H. M., 1968, Portfolio Selection: Efficient Diversification of Investments, Yale University Press, Vol. 16
[2]. Akita, R., 2016, Deep learning for stock prediction using numerical and textual information, IEEE/ACIS 15th International Conference on Computer and Information Science (ICIS), pp. 1-6
[3]. Jiang, Z., 2017, Cryptocurrency Portfolio Management with Deep Reinforcement Learning, from https://arxiv.org/abs/1612.01277
[4]. Liang, Z., 2018, Adversarial Deep Reinforcement Learning in Portfolio Management, from https://arxiv.org/abs/1808.09940
[5]. Narang, R. K., 2009, Inside the Black Box: The Simple Truth about Quantitative Trading, John Wiley & Sons
[6]. Xiao, L., 2017, A Secure Mobile Crowdsensing Game with Deep Reinforcement Learning, IEEE Transactions on Information Forensics and Security, Vol. 13, pp. 35-47
[7]. Liu, Z., 2019, Towards Understanding Chinese Checkers with Heuristics, Monte Carlo Tree Search, and Deep Reinforcement Learning, from https://arxiv.org/abs/1903.01747
[8]. Sutton, R. S., and Barto, A. G., 2014, Reinforcement Learning: An Introduction, pp. 2-5
[9]. Sharpe, W. F., 1992, Asset Allocation: Management Style and Performance Measurement, Journal of Portfolio Management, pp. 7-19
[10]. Meucci, A., 2009, Risk and Asset Allocation, Springer
[11]. Chollet, F., 2017, The Fundamentals of Deep Learning, Deep Learning with Python
[12]. Cortes, C., 2012, L2 Regularization for Learning Kernels, from https://arxiv.org/abs/1205.2653
[13]. Mahmood, H., 2019, Gradient Descent, from https://towardsdatascience.com/gradient-descent-3a7db7520711
[14]. Pai, A., 2020, Analyzing 3 Types of Neural Networks in Deep Learning, from https://www.analyticsvidhya.com/blog/2020/02/cnn-vs-rnn-vs-mlp-analyzing-3-types-of-neural-networks-in-deep-learning
[15]. Zhan, R., 2017, CS221 Project Final Report Deep Reinforcement Learning in Portfolio Management, from https://pdfs.semanticscholar.org/ec54/b8edf44070bc3166084f59ac9372176d7d86.pdf
[16]. Saha, S., 2018, A Comprehensive Guide to Convolutional Neural Networks, from https://towardsdatascience.com/a-comprehensive-guide-to-convolutional-neural-networks-the-eli5-way-3bd2b1164a53
[17]. Shih, M., 2019, Convolutional Neural Networks, from https://shihs.github.io/blog/machine%20learning/2019/02/25/Machine-Learning-Covolutional-Neural-Networks(CNN)
[18]. Krizhevsky, A., Sutskever, I., and Hinton, G. E., 2012, ImageNet Classification with Deep Convolutional Neural Networks, Proceedings of the 25th International Conference on Neural Information Processing Systems (NIPS'12), Vol. 1, pp. 1097-1105
[19]. Perera, S., 2019, An introduction to Reinforcement Learning, from https://towardsdatascience.com/an-introduction-to-reinforcement-learning-1e7825c60bbe
[20]. Mnih, V., 2015, Human-level control through deep reinforcement learning, Nature, Vol. 518, pp. 529-533
[21]. Hasselt, H., Guez, A., and Silver, D., 2015, Deep Reinforcement Learning with Double Q-learning, from https://arxiv.org/abs/1509.06461
[22]. Wang, Z., 2016, Dueling Network Architectures for Deep Reinforcement Learning, from https://arxiv.org/abs/1511.06581
[23]. Sutton, R., 2000, Policy Gradient Methods for Reinforcement Learning with Function Approximation, Proceedings of the 12th International Conference on Neural Information Processing Systems, pp. 1057-1063
[24]. Rosenstein, M., 2004, Supervised Actor-Critic Reinforcement Learning, from https://www-anw.cs.umass.edu/pubs/2004/rosenstein_b_ADP04.pdf
[25]. Silver, D., 2014, Deterministic Policy Gradient Algorithms, Proceedings of the International Conference on Machine Learning, Vol. 32
[26]. Schulman, J., 2017, Proximal Policy Optimization Algorithms, from https://arxiv.org/abs/1707.06347
[27]. Jiang, Z., 2017, A Deep Reinforcement Learning Framework for the Financial Portfolio Management Problem, from https://arxiv.org/abs/1706.10059
[28]. TensorLayer, https://github.com/tensorlayer/tensorlayer/tree/master/examples/reinforcement_learning
[29]. Szegedy, C., 2015, Going deeper with convolutions, Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1-9 |