Thesis Record 111453012: Detailed Information




Name: Yu-Teng Chang (張譽騰)   Graduate Program: Department of Information Management, In-service Master Program
Thesis Title: Dynamic Asset Allocation Based on Reinforcement Learning with Multi-Agent Architecture
(Chinese title: 以強化學習多代理人架構為基礎之動態資產配置)
Related Theses
★ Trend Analysis of the Taiwan 50: Forecasting Based on a Multiple LSTM Model Architecture
★ Gold Price Forecasting and Analysis Based on Multiple Recurrent Neural Network Models
★ Incremental Learning for Defect Detection in Industry 4.0
★ A Study on Recurrent Neural Networks for Forecasting Computer Component Sales Prices
★ A Study on LSTM Neural Networks for Phishing Website Prediction
★ A Study on Deep Learning-Based Recognition of Frequency-Hopping Signals
★ Opinion Leader Discovery in Dynamic Social Networks
★ Deep Learning Models for Virtual Metrology of Machines in Industry 4.0
★ A Novel NMF-Based Movie Recommendation with Time Decay
★ Category-Based Sequence-to-Sequence Models for POI Travel Itinerary Recommendation
★ A DQN-Based Reinforcement Learning Model for Neural Network Architecture Search
★ Neural Network Architecture Optimization Based on Virtual Reward Reinforcement Learning
★ Generative Adversarial Network Architecture Search
★ Optimizing Neural Architecture Search with Progressive Genetic Algorithms
★ Enhanced Model Agnostic Meta Learning with Meta Gradient Memory
★ Stock Price Prediction Using Recurrent Neural Networks Combined with Leading Industrial Wastewater Indicators
Full Text: Viewable in the repository system after 2029-07-01 (embargoed).
Abstract: This study proposes a dynamic asset allocation method based on reinforcement learning and a multi-agent architecture. After constructing a typical stock-bond asset allocation portfolio, a multi-agent architecture is adopted in which a fund management agent controls the portfolio's asset allocation and distributes capital to a stock trading agent and a bond trading agent, which trade stock ETFs and bond ETFs, respectively. This architecture promotes professional specialization, allowing each agent to focus on its specific task and thereby improving learning efficiency and decision-making quality. The DDPG algorithm, suited to continuous action spaces, is selected to implement fine-grained and precise dynamic asset allocation.
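Since the full text is embargoed, the following is only a minimal sketch of the architecture the abstract describes: a fund management agent that sets the stock/bond split, and two trading agents that size positions in their assigned ETFs. All class names, state dimensions, and layer sizes here are illustrative assumptions, not the thesis's implementation; only the idea of DDPG actors over continuous actions comes from the abstract.

```python
# Illustrative sketch (not the author's code): three DDPG-style actors
# wired in the manager/trader hierarchy described in the abstract.
# State dimensions, layer sizes, and names are assumptions.
import torch
import torch.nn as nn

class Actor(nn.Module):
    """DDPG actor: deterministic policy mapping a state to a continuous action."""
    def __init__(self, state_dim: int, action_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 64),
            nn.ReLU(),
            nn.Linear(64, action_dim),
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        # Softmax keeps the action a valid weight vector:
        # non-negative components that sum to one.
        return torch.softmax(self.net(state), dim=-1)

# Fund management agent: splits capital between the two sleeves.
manager = Actor(state_dim=20, action_dim=2)      # -> [stock weight, bond weight]
# Trading agents: each decides how much of its sleeve to hold in the ETF vs. cash.
stock_agent = Actor(state_dim=10, action_dim=2)  # -> [stock ETF, cash]
bond_agent = Actor(state_dim=10, action_dim=2)   # -> [bond ETF, cash]

def allocate(capital: float,
             market_state: torch.Tensor,
             stock_state: torch.Tensor,
             bond_state: torch.Tensor):
    """One rebalancing step: the manager splits capital, the traders size positions."""
    w_stock, w_bond = manager(market_state)  # continuous split, sums to 1
    stock_positions = stock_agent(stock_state) * (capital * w_stock)
    bond_positions = bond_agent(bond_state) * (capital * w_bond)
    return stock_positions, bond_positions

if __name__ == "__main__":
    positions = allocate(
        capital=1_000_000.0,
        market_state=torch.randn(20),
        stock_state=torch.randn(10),
        bond_state=torch.randn(10),
    )
    print(positions)  # dollar amounts for [stock ETF, cash] and [bond ETF, cash]
```

In full DDPG training, each actor would be paired with a critic and updated off-policy from a replay buffer with target networks; the softmax head is one common way to keep allocation actions on a valid continuous simplex, though clipped or tanh-bounded outputs are also used.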
Keywords ★ Reinforcement Learning
★ Multi-Agent
★ Dynamic Asset Allocation
★ DDPG
Table of Contents
Abstract (Chinese)
Abstract (English)
Contents
List of Figures
List of Tables
Chapter 1: Introduction
 1.1 Research Background
 1.2 Research Motivation and Objectives
 1.3 Thesis Organization
Chapter 2: Literature Review
 2.1 Research Related to Asset Allocation
 2.2 Reinforcement Learning
 2.3 Applications of Reinforcement Learning in Asset Allocation
Chapter 3: Research Methodology
 3.1 Asset Selection and Preprocessing
 3.2 Multi-Agent Model Architecture
 3.3 Reinforcement Learning Algorithm
Chapter 4: Experimental Results and Analysis
 4.1 Dataset Description
 4.2 Model Construction and Baseline Models
 4.3 Model Performance Validation
 4.4 Analysis of the Effect of Window Size on the Model
 4.5 Discussion of the Effect of Reward Settings on the Model
 4.6 Case Study
Chapter 5: Conclusions and Future Research Directions
 5.1 Research Conclusions
 5.2 Research Limitations
 5.3 Future Research Directions
Chapter 6: References
References
[1] H. Markowitz, "Portfolio Selection," The Journal of Finance, vol. 7, no. 1, pp. 77-91, 1952.
[2] 許家榮, "The Application of ETFs in Multi-Asset Allocation," 證券服務, no. 624, pp. 44-46, 2014.
[3] 徐曉薇, "New Trends in Institutional Investors' ETF Asset Allocation: Report on the 2012 Seoul Global ETF Forum," 證交資料, no. 608, pp. 52-58, 2012.
[4] 邱元贊, "Diversified Innovation: Index Investing Reaches the Next Level," 證券服務, no. 683, pp. 5-8, 2021.
[5] F. Black and R. Litterman, "Global portfolio optimization," Financial Analysts Journal, vol. 48, no. 5, pp. 28-43, 1992.
[6] 張士傑 and 杜昌燁, "The Optimal Asset Allocation Problem: Considering Asset Liquidity," 風險管理學報, vol. 20, no. 2, pp. 85-105, 2018.
[7] 陳信宏, 韋伯韜, 蔡憲唐, and 傳懷慧, "A Study on Applying Time-Series ARMA Models to Asset Allocation," 中國統計學報, vol. 43, no. 1, pp. 15-31, 2005.
[8] S. Basak and A. Shapiro, "Value-at-risk-based risk management: optimal policies and asset prices," The Review of Financial Studies, vol. 14, no. 2, pp. 371-405, 2001.
[9] 顔錫銘 and 李美杏, "Optimal Asset Allocation with Extreme-Value and VaR Constraints," 臺大管理論叢, vol. 17, no. 2, pp. 41-68, 2007.
[10] R. G. Ibbotson, "The importance of asset allocation," Financial Analysts Journal, vol. 66, no. 2, pp. 18-20, 2010.
[11] J. Bender, J. Le Sun, and R. Thomas, "Asset Allocation vs. Factor Allocation—Can We Build a Unified Method?," Journal of Portfolio Management, vol. 45, no. 2, pp. 9-22, 2019.
[12] R. S. Sutton and A. G. Barto, "Reinforcement learning: An introduction," Robotica, vol. 17, no. 2, pp. 229-235, 1999.
[13] C. J. Watkins and P. Dayan, "Q-learning," Machine Learning, vol. 8, pp. 279-292, 1992.
[14] H. van Hasselt, "Double Q-learning," in Advances in Neural Information Processing Systems, vol. 23, 2010.
[15] R. Neuneier, "Enhancing Q-learning for optimal asset allocation," in Advances in Neural Information Processing Systems, vol. 10, 1997.
[16] R. Neuneier, "Optimal asset allocation using adaptive dynamic programming," in Advances in Neural Information Processing Systems, vol. 8, 1995.
[17] J. W. Lee, J. Park, J. O., J. Lee, and E. Hong, "A multiagent approach to Q-learning for daily stock trading," IEEE Transactions on Systems, Man, and Cybernetics-Part A: Systems and Humans, vol. 37, no. 6, pp. 864-877, 2007.
[18] V. Mnih et al., "Human-level control through deep reinforcement learning," Nature, vol. 518, no. 7540, pp. 529-533, 2015.
[19] D. Silver et al., "Mastering the game of Go with deep neural networks and tree search," Nature, vol. 529, no. 7587, pp. 484-489, 2016.
[20] D. Silver et al., "Mastering the game of Go without human knowledge," Nature, vol. 550, no. 7676, pp. 354-359, 2017.
[21] S. Levine, C. Finn, T. Darrell, and P. Abbeel, "End-to-end training of deep visuomotor policies," Journal of Machine Learning Research, vol. 17, no. 39, pp. 1-40, 2016.
[22] L. Chen and Q. Gao, "Application of Deep Reinforcement Learning on Automated Stock Trading," in 2019 IEEE 10th International Conference on Software Engineering and Service Science (ICSESS), Beijing, China, 2019, pp. 29-33.
[23] M. Liu, F. Yu, Y. Teng, V. Leung, and M. Song, "Performance optimization for blockchain-enabled Industrial Internet of Things (IIoT) systems: A deep reinforcement learning approach," IEEE Transactions on Industrial Informatics, vol. 15, no. 6, pp. 3559-3570, 2019.
[24] Z. Cao, P. Zhou, R. Li, S. Huang, and D. Wu, "Multiagent deep reinforcement learning for joint multichannel access and task offloading of mobile-edge computing in Industry 4.0," IEEE Internet of Things Journal, vol. 7, no. 7, pp. 6201-6213, 2020.
[25] B. Baker, O. Gupta, N. Naik, and R. Raskar, "Designing Neural Network Architectures using Reinforcement Learning," in Proceedings of the 5th International Conference on Learning Representations (ICLR '17), 2017.
[26] T. P. Lillicrap et al., "Continuous control with deep reinforcement learning," arXiv:1509.02971, 2015.
[27] X. Wu, S. Liu, T. Zhang, L. Yang, Y. Li, and T. Wang, "Motion Control for Biped Robot via DDPG-based Deep Reinforcement Learning," in 2018 WRC Symposium on Advanced Robotics and Automation (WRC SARA), Beijing, China, 2018, pp. 40-45.
[28] Y. Dong and X. Zou, "Mobile Robot Path Planning Based on Improved DDPG Reinforcement Learning Algorithm," in 2020 IEEE 11th International Conference on Software Engineering and Service Science (ICSESS), Beijing, China, 2020, pp. 52-56.
[29] S. Wang, D. Jia, and X. Weng, "Deep reinforcement learning for autonomous driving," arXiv:1811.11329, 2018.
[30] C. C. Chang, J. Tsai, J. H. Lin, and Y. M. Ooi, "Autonomous driving control using the DDPG and RDPG algorithms," Applied Sciences, vol. 11, no. 22, p. 10659, 2021.
[31] Z. Liu, Y. Liu, H. Xu, S. Liao, K. Zhu, and X. Jiang, "Dynamic economic dispatch of power system based on DDPG algorithm," Energy Reports, vol. 8, pp. 1122-1129, 2022.
[32] X. Y. Liu, Z. Xiong, S. Zhong, H. Yang, and A. Walid, "Practical deep reinforcement learning approach for stock trading," arXiv:1811.07522, 2018.
[33] L. Conegundes and A. C. M. Pereira, "Beating the Stock Market with a Deep Reinforcement Learning Day Trading System," in 2020 International Joint Conference on Neural Networks (IJCNN), Glasgow, UK, 2020, pp. 1-8.
[34] 王博弘, "Building an Intelligent Stock Prediction System with Deep Reinforcement Learning," Master's thesis, National Cheng Kung University, 2022.
[35] F. Lin, M. Wang, R. Liu, and Q. Hong, "A DDPG Algorithm for Portfolio Management," in 2020 19th International Symposium on Distributed Computing and Applications for Business Engineering and Science (DCABES), Xuzhou, China, 2020, pp. 222-225.
[36] H. Zhang, Z. Jiang, and J. Su, "A Deep Deterministic Policy Gradient-based Strategy for Stocks Portfolio Management," in 2021 IEEE 6th International Conference on Big Data Analytics (ICBDA), Xiamen, China, 2021, pp. 230-238.
[37] Z. Jiang, D. Xu, and J. Liang, "A deep reinforcement learning framework for the financial portfolio management problem," arXiv:1706.10059, 2017.
[38] Z. Wang, S. Jin, and W. Li, "Research on Portfolio Optimization Based on Deep Reinforcement Learning," in 2022 4th International Conference on Machine Learning, Big Data and Business Intelligence (MLBDBI), Shanghai, China, 2022, pp. 391-395.
[39] 劉上瑋, "Applications of Deep Reinforcement Learning to Dynamic Asset Allocation: The Case of US ETFs," Master's thesis, National Chengchi University, 2017.
[40] 孟繁淳, "A Study on Applying DDPG and PPO Deep Reinforcement Learning to Portfolio Optimization," Master's thesis, National Central University, 2020.
[41] 林冠宇, "Applying Reinforcement Learning and Convolutional Neural Networks to Portfolio Allocation," Master's thesis, National Chengchi University, 2022.
[42] W. F. Sharpe, "Mutual fund performance," The Journal of Business, vol. 39, no. 1, pp. 119-138, 1966.
[43] M. Magdon-Ismail and A. F. Atiya, "Maximum drawdown," Risk Magazine, vol. 17, no. 10, pp. 99-102, 2004.
Advisor: 陳以錚   Date of Approval: 2024-07-06

For questions about this thesis, please contact the Extension Services Division, National Central University Library, TEL: (03)422-7151 ext. 57407, or by e-mail.