Graduate Thesis 111225023: Detailed Record
Name: Long-Bin Lai (賴龍斌)    Department: Graduate Institute of Statistics
Thesis Title: Exploring Participant Strategies in Renewable Energy Trading Markets Using Reinforcement Learning
(利用強化學習探索可再生能源交易市場中的參與者策略)
Related Theses
★ Application of Q-learning Combined with Supervised Learning in the Stock Market
★ Trading Strategies Based on Q-learning and Unsupervised Learning
★ Visualizing State Changes in the Stock Market
★ Theoretical Explanation of the SNF Effect and Identification of High-Influence Clustering Features
★ Portfolios Based on I-score and Q-learning
★ Lagged Multivariate Bayesian Structural GARCH Models under Soft Information and Their Applications
★ Portfolio Optimization Based on Dynamic Networks and Vine Copulas
Files: Full text available in the system after 2029-08-01.
Abstract: This paper explores auction behavior in the energy market using a multi-agent model. We model electricity suppliers and consumers as autonomous agents that make decisions to maximize their utilities in a multi-agent environment. However, because each agent has incomplete information about the others, it is difficult for any agent to reach its optimal decision. To address this issue, we propose Nash Q-learning, which combines Nash equilibrium with Q-learning, to maximize each participant's utility while accounting for the bidding behavior of the other agents. In several case studies, we demonstrate that the Nash Q-learning algorithm ensures that participants eventually reach a Nash equilibrium.
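The method the abstract names is Hu and Wellman's (2003) Nash Q-learning: each agent i keeps a joint-action value table and updates it as Q^i(s, a^1, a^2) <- (1 - alpha) * Q^i(s, a^1, a^2) + alpha * (r^i + beta * NashQ^i(s')), where NashQ^i(s') is agent i's payoff at a Nash equilibrium of the stage game defined by the current Q-tables. The Python sketch below is a minimal, single-state illustration of that loop; the bid levels, toy payoff function, and hyperparameters are illustrative assumptions, not the market design, utility functions, or parameters actually used in the thesis.

import itertools
import random
import numpy as np

# Hypothetical discrete bid levels for a two-agent bidding game.
BIDS = [1.0, 2.0, 3.0]

def payoff(i, bids):
    """Toy stage-game payoff for agent i: undercutting the rival wins."""
    me, other = bids[i], bids[1 - i]
    revenue = me if me < other else me / 2.0  # penalized if not the cheapest
    return revenue - 0.1 * me                 # minus a small bidding cost

def pure_nash(Q0, Q1):
    """First pure-strategy Nash equilibrium of the bimatrix game (Q0, Q1),
    or None if the learned game has no pure equilibrium."""
    n = len(BIDS)
    for a0, a1 in itertools.product(range(n), range(n)):
        if (Q0[a0, a1] >= Q0[:, a1].max() - 1e-9
                and Q1[a0, a1] >= Q1[a0, :].max() - 1e-9):
            return a0, a1
    return None

def nash_values(Q0, Q1):
    """Each agent's payoff at the stage-game equilibrium (maximin fallback)."""
    eq = pure_nash(Q0, Q1)
    if eq is not None:
        return Q0[eq], Q1[eq]
    return Q0.min(axis=1).max(), Q1.min(axis=0).max()

n = len(BIDS)
Q0, Q1 = np.zeros((n, n)), np.zeros((n, n))
alpha, gamma, eps = 0.1, 0.9, 0.2  # illustrative hyperparameters
rng = random.Random(0)

for t in range(5000):
    eq = pure_nash(Q0, Q1) or (rng.randrange(n), rng.randrange(n))
    # Epsilon-greedy play around the current stage-game equilibrium.
    a0 = rng.randrange(n) if rng.random() < eps else eq[0]
    a1 = rng.randrange(n) if rng.random() < eps else eq[1]
    r0 = payoff(0, (BIDS[a0], BIDS[a1]))
    r1 = payoff(1, (BIDS[a0], BIDS[a1]))
    # Single state, so the next-state NashQ term comes from the same tables.
    v0, v1 = nash_values(Q0, Q1)
    Q0[a0, a1] = (1 - alpha) * Q0[a0, a1] + alpha * (r0 + gamma * v0)
    Q1[a0, a1] = (1 - alpha) * Q1[a0, a1] + alpha * (r1 + gamma * v1)

eq = pure_nash(Q0, Q1)
print("equilibrium bids:", BIDS[eq[0]], BIDS[eq[1]])

With these toy payoffs, the learned stage game settles on a pure equilibrium in which one agent undercuts the other, consistent with the abstract's claim that participants eventually reach a Nash equilibrium.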
Keywords ★ Nash equilibrium
★ Q-learning
★ reinforcement learning
Table of Contents
Abstract (Chinese) I
Abstract II
Acknowledgements III
Contents IV
List of Figures V
List of Tables VII

1 Introduction 1

2 Literature Review 3
2.1 Single-Agent Q-learning 3
2.1.1 Markov Decision Process 3
2.1.2 Reinforcement Learning 4
2.2 Nash Equilibrium 4
2.3 Nash Q-learning 5
2.4 Utility Function 5

3 Environment and Market Design 6
3.1 Environment Design 6
3.2 Market Design 6
3.3 Agent Design 7

4 Numerical Study 9
4.1 Dataset 9
4.2 The Result of Nash Q-learning 13
4.2.1 Nash Equilibrium 13
4.2.2 All Possible 16
4.2.3 Comparison 17
4.3 Translation by Utility Function 17
4.4 The Result of Multi-agent Q-learning 21

5 Conclusion and Discussion 23

References 33
參考文獻 An, B., Gatti, N., and Lesser, V. (2016). Alternating-offers bargaining in one-to-many and many-to-many settings. Annals of Mathematics and Artificial Intelligence, 77, 67-103.

Bellman, R. (1957). A Markovian decision process. Journal of Mathematics and Mechanics, 6(5), 679-684.

Berezvai, Z., Hortay, O., and Szőke, T. (2022). The impact of COVID-19 measures on intraday electricity load curves in the European Union: A panel approach. Sustainable Energy, Grids and Networks, 32, 100930.

Bugera, V., Konno, H., and Uryasev, S. (2002). Credit cards scoring with quadratic utility functions. Journal of Multi-Criteria Decision Analysis, 11(4-5), 197-211.

Christensen, L. R., Jorgenson, D. W., and Lau, L. J. (1975). Transcendental logarithmic utility functions. The American Economic Review, 65(3), 367-383.

Devraj, A. M., and Meyn, S. P. (2017). Fastest convergence for Q-learning. arXiv preprint arXiv:1707.03770.

Fink, A. M. (1964). Equilibrium in a stochastic n-person game. Journal of Science of the Hiroshima University, Series A-I (Mathematics), 28(1), 89-93.

Foruzan, E., Soh, L. K., and Asgarpoor, S. (2018). Reinforcement learning approach for optimal distributed energy management in a microgrid. IEEE Transactions on Power Systems, 33(5), 5749-5758.

Gerber, H. U., and Pafumi, G. (1998). Utility functions: from risk theory to finance. North American Actuarial Journal, 2(3), 74-91.

Hu, J., and Wellman, M. P. (1998). Multiagent reinforcement learning: theoretical framework and an algorithm. In Proceedings of the Fifteenth International Conference on Machine Learning (ICML), 242-250.

Hu, J., and Wellman, M. P. (2003). Nash Q-learning for general-sum stochastic games. Journal of Machine Learning Research, 4, 1039-1069.

Myerson, R. B. (1978). Refinements of the Nash equilibrium concept. International Journal of Game Theory, 7, 73-80.

Navarro-González, F. J., and Villacampa, Y. (2021). A foundation for logarithmic utility function of money. Mathematics, 9(6), 665.

Orsborn, S., Cagan, J., and Boatwright, P. (2009). Quantifying aesthetic form preference in a utility function. Journal of Mechanical Design, 131(6), 061001.

Puterman, M. L. (2014). Markov decision processes: discrete stochastic dynamic programming. John Wiley and Sons.

Shen, S., Wu, X., Sun, P., Zhou, H., Wu, Z., and Yu, S. (2023). Optimal privacy preservation strategies with signaling Q-learning for edge-computing-based IoT resource grant systems. Expert Systems with Applications, 225, 120192.

Soeryana, E., Fadhlina, N., Rusyaman, E., and Supian, S. (2017). Mean-variance portfolio optimization by using time series approaches based on logarithmic utility function. IOP Conference Series: Materials Science and Engineering, 166(1), 012003.

Vandael, S., Claessens, B., Ernst, D., Holvoet, T., and Deconinck, G. (2015). Reinforcement learning of heuristic EV fleet charging in a day-ahead electricity market. IEEE Transactions on Smart Grid, 6(4), 1795-1805.

Watkins, C. J., and Dayan, P. (1992). Q-learning. Machine Learning, 8, 279-292.
Advisor: Shih-Feng Huang (黃士峰)    Date of Approval: 2024-07-11