References
[1] D. Silver, J. Schrittwieser, K. Simonyan, I. Antonoglou, A. Huang, A. Guez, T. Hubert, L. Baker, M. Lai, A. Bolton, and others, "Mastering the game of go without human knowledge," Nature, vol. 550, no. 7676, pp. 354-359, 2017.
[2] D. Silver, T. Hubert, J. Schrittwieser, I. Antonoglou, M. Lai, A. Guez, M. Lanctot, L. Sifre, D. Kumaran, T. Graepel, and others, "Mastering chess and shogi by self-play with a general reinforcement learning algorithm," arXiv preprint arXiv:1712.01815, 2017.
[3] Y. Li, "Deep reinforcement learning: An overview," arXiv preprint arXiv:1701.07274, 2017.
[4] R. Nian, J. Liu, and B. Huang, "A review on reinforcement learning: Introduction and applications in industrial process control," Computers & Chemical Engineering, vol. 139, p. 106886, 2020.
[5] M. Van Otterlo and M. Wiering, "Reinforcement learning and Markov decision processes," in Reinforcement learning: State-of-the-art, pp. 3-42, Springer, 2012.
[6] A. Agarwal, S. M. Kakade, J. D. Lee, and G. Mahajan, "Optimality and Approximation with Policy Gradient Methods in Markov Decision Processes," arXiv preprint arXiv:1908.00261, 2019.
[7] W. Li, F. Zhou, K. R. Chowdhury, and W. Meleis, "QTCP: Adaptive congestion control with reinforcement learning," IEEE Transactions on Network Science and Engineering, vol. 6, no. 3, pp. 445-458, 2018.
[8] Z. Xu, J. Tang, C. Yin, Y. Wang, and G. Xue, "Experience-driven congestion control: When multi-path TCP meets deep reinforcement learning," IEEE Journal on Selected Areas in Communications, vol. 37, no. 6, pp. 1325-1336, 2019.
[9] C. Tessler, Y. Shpigelman, G. Dalal, A. Mandelbaum, D. Haritan Kazakov, B. Fuhrer, G. Chechik, and S. Mannor, "Reinforcement learning for datacenter congestion control," ACM SIGMETRICS Performance Evaluation Review, vol. 49, no. 2, pp. 43-46, 2022.
[10] F. Ruffy, M. Przystupa, and I. Beschastnikh, "Iroko: A framework to prototype reinforcement learning for data center traffic control," arXiv preprint arXiv:1812.09975, 2018.
[11] N. Papernot, P. McDaniel, I. Goodfellow, S. Jha, Z. B. Celik, and A. Swami, "Practical black-box attacks against machine learning," in Proceedings of the 2017 ACM on Asia Conference on Computer and Communications Security, pp. 506-519, 2017.
[12] C. Rudin, "Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead," Nature Machine Intelligence, vol. 1, no. 5, pp. 206-215, 2019.
[13] E. Giunchiglia, M. C. Stoian, and T. Lukasiewicz, "Deep learning with logical constraints," arXiv preprint arXiv:2205.00523, 2022.
[14] B. Hu and J. Li, "An adaptive hierarchical energy management strategy for hybrid electric vehicles combining heuristic domain knowledge and data-driven deep reinforcement learning," IEEE Transactions on Transportation Electrification, vol. 8, no. 3, pp. 3275-3288, 2021.
[15] J. Postel, "Transmission control protocol," RFC 793, 1981.
[16] M. Allman, V. Paxson, and E. Blanton, "TCP congestion control," RFC 5681, 2009.
[17] "TCP slow start," Accessed: May 18, 2023. [Online]. Available: https://developer.mozilla.org/en-US/docs/Glossary/TCP_slow_start
[18] S. Sah, "Machine learning: a review of learning types," Preprints, 2020.
[19] R. S. Sutton, D. McAllester, S. Singh, and Y. Mansour, "Policy gradient methods for reinforcement learning with function approximation," Advances in Neural Information Processing Systems, vol. 12, 1999.
[20] C. J. C. H. Watkins and P. Dayan, "Q-learning," Machine Learning, vol. 8, pp. 279-292, 1992.
[21] H. van Hasselt, "Double Q-learning," in Advances in Neural Information Processing Systems, vol. 23, 2010.
[22] T. Haarnoja, A. Zhou, K. Hartikainen, G. Tucker, S. Ha, J. Tan, V. Kumar, H. Zhu, A. Gupta, P. Abbeel, et al., "Soft actor-critic algorithms and applications," arXiv preprint arXiv:1812.05905, 2018.
[23] J. Schulman, F. Wolski, P. Dhariwal, A. Radford, and O. Klimov, "Proximal policy optimization algorithms," arXiv preprint arXiv:1707.06347, 2017.
[24] S. Fujimoto, D. Meger, and D. Precup, "Off-policy deep reinforcement learning without exploration," in International Conference on Machine Learning, pp. 2052-2062, 2019.
[25] C. Gelada and M. G. Bellemare, "Off-policy deep reinforcement learning by bootstrapping the covariate shift," in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 33, no. 1, pp. 3647-3655, 2019.
[26] "The idea behind Actor-Critics and how A2C and A3C improve them," Accessed: May 18, 2023. [Online]. Available: https://theaisummer.com/Actor_critics/#back-to-a2c
[27] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, "Generative adversarial networks," Communications of the ACM, vol. 63, no. 11, pp. 139-144, 2020.
[28] A. Creswell, T. White, V. Dumoulin, K. Arulkumaran, B. Sengupta, and A. A. Bharath, "Generative adversarial networks: An overview," IEEE Signal Processing Magazine, vol. 35, no. 1, pp. 53-65, 2018.
[29] Z. Hu, X. Ma, Z. Liu, E. Hovy, and E. Xing, "Harnessing deep neural networks with logic rules," arXiv preprint arXiv:1603.06318, 2016.
[30] T. Dash, S. Chitlangia, A. Ahuja, and A. Srinivasan, "A review of some techniques for inclusion of domain-knowledge into deep neural networks," Scientific Reports, vol. 12, no. 1, p. 1040, 2022.
[31] K. Kurata, G. Hasegawa, and M. Murata, "Fairness comparisons between TCP Reno and TCP Vegas for future deployment of TCP Vegas," in Proceedings of INET, vol. 2000, no. 2.2, p. 2, 2000.
[32] "TCP Reno and Congestion Management," Accessed: May 19, 2023. [Online]. Available: https://intronetworks.cs.luc.edu/current/html/reno.html
[33] S. Ha, I. Rhee, and L. Xu, "CUBIC: A new TCP-friendly high-speed TCP variant," ACM SIGOPS Operating Systems Review, vol. 42, no. 5, pp. 64-74, 2008.
[34] I. Rhee, L. Xu, S. Ha, A. Zimmermann, L. Eggert, and R. Scheffenegger, "CUBIC for fast long-distance networks," RFC 8312, 2018.
[35] N. Jay, N. Rotman, B. Godfrey, M. Schapira, and A. Tamar, "A deep reinforcement learning perspective on internet congestion control," in International Conference on Machine Learning, pp. 3050-3059, 2019.
[36] S. Abbasloo, C.-Y. Yen, and H. J. Chao, "Classic meets modern: A pragmatic learning-based congestion control for the internet," in Proceedings of the Annual Conference of the ACM Special Interest Group on Data Communication on the Applications, Technologies, Architectures, and Protocols for Computer Communication, pp. 632-647, 2020.
[37] S. Patro and K. K. Sahu, "Normalization: A preprocessing stage," arXiv preprint arXiv:1503.06462, 2015.
[38] Y. Zheng, H. Chen, Q. Duan, L. Lin, Y. Shao, W. Wang, X. Wang, and Y. Xu, "Leveraging domain knowledge for robust deep reinforcement learning in networking," in IEEE INFOCOM 2021-IEEE Conference on Computer Communications, pp. 1-10, 2021.
[39] I. M. Ozcelik and C. Ersoy, "ALVS: Adaptive Live Video Streaming using deep reinforcement learning," Journal of Network and Computer Applications, vol. 205, p. 103451, 2022.
[40] H. Mao, R. Netravali, and M. Alizadeh, "Neural adaptive video streaming with Pensieve," in Proceedings of the Conference of the ACM Special Interest Group on Data Communication (SIGCOMM), pp. 197-210, 2017.
[41] H. Mao, S. B. Venkatakrishnan, M. Schwarzkopf, and M. Alizadeh, "Variance reduction for reinforcement learning in input-driven environments," arXiv preprint arXiv:1807.02264, 2018.
[42] V. Sivakumar, O. Delalleau, T. Rocktäschel, A. H. Miller, H. Küttler, N. Nardelli, M. Rabbat, J. Pineau, and S. Riedel, "MVFST-RL: An asynchronous RL framework for congestion control with delayed actions," arXiv preprint arXiv:1910.04054, 2019.
[43] G. Chen, "A gentle tutorial of recurrent neural network with error backpropagation," arXiv preprint arXiv:1610.02583, 2016.
[44] "ikostrikov/pytorch-a3c - GitHub," Accessed: May 26, 2023. [Online]. Available: https://github.com/ikostrikov/pytorch-a3c
[45] A. Paszke, S. Gross, S. Chintala, G. Chanan, E. Yang, Z. DeVito, Z. Lin, A. Desmaison, L. Antiga, and A. Lerer, "Automatic differentiation in PyTorch," in NIPS Autodiff Workshop, 2017.
[46] P. Gawłowicz and A. Zubow, "ns3-gym: Extending OpenAI Gym for networking research," arXiv preprint arXiv:1810.03943, 2018.
[47] "tkn-tub/ns3-gym - GitHub," Accessed: June 03, 2023. [Online]. Available: https://github.com/tkn-tub/ns3-gym
[48] G. Brockman, V. Cheung, L. Pettersson, J. Schneider, J. Schulman, J. Tang, and W. Zaremba, "OpenAI Gym," arXiv preprint arXiv:1606.01540, 2016.
[49] G. F. Riley and T. R. Henderson, "The ns-3 network simulator," in Modeling and Tools for Network Simulation, pp. 15-34, Springer, 2010.
[50] V. Mnih, A. P. Badia, M. Mirza, A. Graves, T. Lillicrap, T. Harley, D. Silver, and K. Kavukcuoglu, "Asynchronous methods for deep reinforcement learning," in International Conference on Machine Learning, pp. 1928-1937, 2016.
[51] S. Hochreiter and J. Schmidhuber, "Long short-term memory," Neural Computation, vol. 9, no. 8, pp. 1735-1780, 1997.
[52] Y. Yu, X. Si, C. Hu, and J. Zhang, "A review of recurrent neural networks: LSTM cells and network architectures," Neural Computation, vol. 31, no. 7, pp. 1235-1270, 2019.