References
An, B., Gatti, N., and Lesser, V. (2016). Alternating-offers bargaining in one-to-many and many-to-many settings. Annals of Mathematics and Artificial Intelligence, 77, 67-103.
Bellman, R. (1957). A Markovian decision process. Journal of Mathematics and Mechanics, 6(5), 679-684.
Berezvai, Z., Hortay, O., and Szőke, T. (2022). The impact of COVID-19 measures on intraday electricity load curves in the European Union: A panel approach. Sustainable Energy, Grids and Networks, 32, 100930.
Bugera, V., Konno, H., and Uryasev, S. (2002). Credit cards scoring with quadratic utility functions. Journal of Multi-Criteria Decision Analysis, 11(4-5), 197-211.
Christensen, L. R., Jorgenson, D. W., and Lau, L. J. (1975). Transcendental logarithmic utility functions. The American Economic Review, 65(3), 367-383.
Devraj, A. M., and Meyn, S. P. (2017). Fastest convergence for Q-learning. arXiv preprint arXiv:1707.03770.
Fink, A. M. (1964). Equilibrium in a stochastic n-person game. Journal of Science of the Hiroshima University, Series A-I (Mathematics), 28(1), 89-93.
Foruzan, E., Soh, L. K., and Asgarpoor, S. (2018). Reinforcement learning approach for optimal distributed energy management in a microgrid. IEEE Transactions on Power Systems, 33(5), 5749-5758.
Gerber, H. U., and Pafumi, G. (1998). Utility functions: from risk theory to finance. North American Actuarial Journal, 2(3), 74-91.
Hu, J., and Wellman, M. P. (1998). Multiagent reinforcement learning: theoretical framework and an algorithm. Proceedings of the Fifteenth International Conference on Machine Learning (ICML), 242-250.
Hu, J., and Wellman, M. P. (2003). Nash Q-learning for general-sum stochastic games. Journal of Machine Learning Research, 4, 1039-1069.
Myerson, R. B. (1978). Refinements of the Nash equilibrium concept. International Journal of Game Theory, 7, 73-80.
Navarro-González, F. J., and Villacampa, Y. (2021). A foundation for logarithmic utility function of money. Mathematics, 9(6), 665.
Orsborn, S., Cagan, J., and Boatwright, P. (2009). Quantifying aesthetic form preference in a utility function. Journal of Mechanical Design, 131(6), 061001.
Puterman, M. L. (2014). Markov decision processes: discrete stochastic dynamic programming. John Wiley and Sons.
Shen, S., Wu, X., Sun, P., Zhou, H., Wu, Z., and Yu, S. (2023). Optimal privacy preservation strategies with signaling Q-learning for edge-computing-based IoT resource grant systems. Expert Systems with Applications, 225, 120192.
Soeryana, E., Fadhlina, N., Rusyaman, E., and Supian, S. (2017). Mean-variance portfolio optimization by using time series approaches based on logarithmic utility function. IOP Conference Series: Materials Science and Engineering, 166(1), 012003.
Vandael, S., Claessens, B., Ernst, D., Holvoet, T., and Deconinck, G. (2015). Reinforcement learning of heuristic EV fleet charging in a day-ahead electricity market. IEEE Transactions on Smart Grid, 6(4), 1795-1805.
Watkins, C. J., and Dayan, P. (1992). Q-learning. Machine Learning, 8, 279-292.