

    Please use this permanent URL to cite or link to this item: http://ir.lib.ncu.edu.tw/handle/987654321/83970


    Title: Using Deep Reinforcement Learning on Asset Allocation Optimization via Deep Deterministic Policy Gradient and Proximal Policy Optimization
    Author: Meng, Fan-Chun
    Contributor: Department of Information Management
    Keywords: Deep Reinforcement Learning; Asset Allocation Optimization; CNN; DDPG; PPO
    Date: 2020-06-29
    Date Available: 2020-09-02 17:48:02 (UTC+8)
    Publisher: National Central University
    Abstract: Asset allocation plays an important role in the field of financial investment. Through a reasonable and effective asset allocation strategy, investors can adjust the proportion of capital held in different financial products, maximizing returns while suppressing risk. Traditional asset allocation research often uses the Mean-Variance Model proposed by Markowitz; however, when dealing with the time-series data of financial products, that model lacks sufficient nonlinear expressive power and handles the dynamics of financial markets poorly.
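    For context, here is a minimal NumPy sketch of the classical mean-variance allocation the abstract contrasts against. The closed-form weights w ∝ Σ⁻¹μ are one textbook solution, assumed here purely for illustration; nothing in this snippet comes from the thesis itself.

        import numpy as np

        def mean_variance_weights(returns):
            """Markowitz-style weights: w proportional to inv(Sigma) @ mu.

            returns: T x N array of historical asset returns (hypothetical
            input, not the thesis dataset). The model is linear and static,
            which is exactly the limitation the abstract points out.
            """
            mu = returns.mean(axis=0)              # expected return per asset
            sigma = np.cov(returns, rowvar=False)  # N x N covariance matrix
            raw = np.linalg.solve(sigma, mu)       # inv(Sigma) @ mu, no explicit inverse
            return raw / raw.sum()                 # normalize weights to sum to 1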

    In recent years, deep reinforcement learning has emerged, and researchers have begun to apply it to asset allocation problems. In most of the related research to date, the reward function inside the model simply computes the change in investment return and does not yet take risk into account. However, risk is an important aspect that asset allocation strategies need to consider, and it carries real weight in practice.
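    As an illustration of the return-only reward the abstract attributes to prior work, a hedged sketch follows; the log-return form and the function name are assumptions for clarity, not the thesis's exact definition.

        import math

        def return_change_reward(prev_value, curr_value):
            # Reward depends only on the change in portfolio value; volatility,
            # drawdown and other risk measures are ignored -- the gap this
            # thesis examines when comparing reward function factors.
            return math.log(curr_value / prev_value)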

    This study uses deep reinforcement learning models to investigate asset allocation optimization in two main stages. The first stage tunes the CNN neural network parameters inside the deep reinforcement learning models; the second stage compares the impact of seven reward functions and four rebalancing frequencies on the models' trading performance.

    In the first stage, this study designs two models that pair a CNN with the DDPG and PPO algorithms, respectively. By testing various parameter combinations, it examines how the CNN neural network parameters inside a deep reinforcement learning model affect its trading performance.
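    The thesis does not publish its network code, so the following is a hypothetical PyTorch sketch of the kind of shallow CNN actor paired with DDPG or PPO here: it maps a window of per-asset features to portfolio weights via a softmax, and n_kernels stands in for the convolution-kernel count varied in this stage.

        import torch
        import torch.nn as nn

        class CNNPolicy(nn.Module):
            """Hypothetical shallow CNN actor: price tensor -> portfolio weights."""

            def __init__(self, n_features, n_assets, window, n_kernels=16):
                super().__init__()
                # One shallow convolutional layer, convolving along time only
                self.conv = nn.Sequential(
                    nn.Conv2d(n_features, n_kernels, kernel_size=(1, 3)),
                    nn.ReLU(),
                )
                self.fc = nn.Linear(n_kernels * n_assets * (window - 2), n_assets)

            def forward(self, x):
                # x: (batch, n_features, n_assets, window)
                h = self.conv(x).flatten(start_dim=1)
                return torch.softmax(self.fc(h), dim=-1)  # weights sum to 1

        # Example: weights = CNNPolicy(4, 10, 30)(torch.randn(1, 4, 10, 30))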

    In the second stage, the best parameter combination found in the first stage is fixed, and the models' trading performance is compared across seven reward functions and four rebalancing frequencies, in order to find reward function factors better suited to asset allocation.
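    The abstract does not name the four rebalancing frequencies, so the schedule below (daily/weekly/monthly/quarterly in trading days) is purely illustrative of the mechanism being compared.

        REBALANCE_EVERY = {"daily": 1, "weekly": 5, "monthly": 21, "quarterly": 63}  # assumed

        def maybe_rebalance(day, freq, current_weights, target_weights):
            # Apply the model's new target weights only on scheduled days;
            # between rebalances the portfolio keeps its previous allocation.
            if day % REBALANCE_EVERY[freq] == 0:
                return target_weights
            return current_weights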

    Because the weight allocation models in this study (the DDPG and PPO models) have many parameters to tune, a local optimization approach is adopted. Although the reward function and rebalancing frequency may also affect which CNN parameters are optimal, retuning the CNN parameters for every combination of reward function and rebalancing frequency would multiply the total number of experiments enormously. To simplify the experimental process, this study therefore proceeds in the staged manner described above.

    This study found that both models perform better with shallow convolutional layers; the DDPG model benefits from more convolution kernels per convolutional layer, while the PPO model benefits from fewer. In addition, both models perform better with shallow fully connected layers and fewer neurons.

    In this study, the average rate of change of investment return, volatility, Sharpe ratio, maximum drawdown, and compound annual growth rate are used as reward function factors, and the models' trading performance is compared. For both the DDPG and PPO models, the most suitable reward function turned out to be the average rate of change of investment return. In addition, this study found that even with the most suitable reward function, good trading performance cannot be obtained if the CNN neural network parameter combination in the DDPG and PPO models is inappropriate. The study therefore concludes that both the CNN neural network parameters and the reward function have an important influence on the trading performance of deep reinforcement learning models.
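    For reference, conventional formulas for the candidate reward factors named above, sketched in NumPy; the annualization constant of 252 trading days and a zero risk-free rate are assumptions, and the thesis's exact definitions may differ.

        import numpy as np

        def candidate_factors(daily_returns, years):
            equity = np.cumprod(1 + daily_returns)    # portfolio value curve, start = 1
            vol = daily_returns.std() * np.sqrt(252)  # annualized volatility
            sharpe = daily_returns.mean() / daily_returns.std() * np.sqrt(252)
            peak = np.maximum.accumulate(equity)
            mdd = ((peak - equity) / peak).max()      # maximum drawdown
            cagr = equity[-1] ** (1 / years) - 1      # compound annual growth rate
            return {"volatility": vol, "sharpe": sharpe,
                    "max_drawdown": mdd, "cagr": cagr}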

    While volatility, Sharpe ratio, maximum drawdown, and compound annual growth rate are important and appropriate indicators for measuring the overall performance and risk of a trading strategy, they did not enable the deep reinforcement learning models to learn well when used as reward functions. Therefore, when using deep reinforcement learning models to optimize asset allocation, the factors entering the reward function must be chosen with care and sought out through well-designed experiments.
    Appears in Collections: [Graduate Institute of Information Management] Theses & Dissertations
