References
Abe, Naoki, et al. “Cross Channel Optimized Marketing by Reinforcement Learning.” Proceedings of the 2004 ACM SIGKDD International Conference on Knowledge Discovery and Data Mining - KDD ’04, ACM Press, 2004, p. 767. DOI.org (Crossref), doi:10.1145/1014052.1016912.
Agra, Agostinho, et al. “The Robust Vehicle Routing Problem with Time Windows.” Computers & Operations Research, vol. 40, no. 3, Mar. 2013, pp. 856–66. DOI.org (Crossref), doi:10.1016/j.cor.2012.10.002.
Applegate, David L., editor. The Traveling Salesman Problem: A Computational Study. Princeton University Press, 2006.
Atienza, Rowel. Advanced Deep Learning with Keras. Packt Publishing, 2018.
Avella, Pasquale, et al. “Solving a Fuel Delivery Problem by Heuristic and Exact Approaches.” European Journal of Operational Research, vol. 152, no. 1, Jan. 2004, pp. 170–79. DOI.org (Crossref), doi:10.1016/S0377-2217(02)00676-8.
Baldacci, Roberto, et al. “Recent Exact Algorithms for Solving the Vehicle Routing Problem under Capacity and Time Window Constraints.” European Journal of Operational Research, vol. 218, no. 1, Apr. 2012, pp. 1–6. DOI.org (Crossref), doi:10.1016/j.ejor.2011.07.037.
Bellman, Richard. Dynamic Programming. Dover ed, Dover Publications, 2003.
Bello, Irwan, et al. “Neural Combinatorial Optimization with Reinforcement Learning.” ArXiv:1611.09940 [Cs, Stat], Jan. 2017. arXiv.org, http://arxiv.org/abs/1611.09940.
Boutilier, C., et al. “Decision-Theoretic Planning: Structural Assumptions and Computational Leverage.” Journal of Artificial Intelligence Research, vol. 11, July 1999, pp. 1–94. DOI.org (Crossref), doi:10.1613/jair.575.
Cai, Xiaoyan, et al. “HITS-Based Attentional Neural Model for Abstractive Summarization.” Knowledge-Based Systems, vol. 222, June 2021, p. 106996. DOI.org (Crossref), doi:10.1016/j.knosys.2021.106996.
Clarke, G., and J. W. Wright. “Scheduling of Vehicles from a Central Depot to a Number of Delivery Points.” Operations Research, vol. 12, no. 4, Aug. 1964, pp. 568–81. DOI.org (Crossref), doi:10.1287/opre.12.4.568.
Côté, Jean-François, et al. “The Vehicle Routing Problem with Stochastic Two-Dimensional Items.” Transportation Science, Jan. 2020, p. trsc.2019.0904. DOI.org (Crossref), doi:10.1287/trsc.2019.0904.
Dhingra, Bhuwan, et al. “Towards End-to-End Reinforcement Learning of Dialogue Agents for Information Access.” Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Association for Computational Linguistics, 2017, pp. 484–95. DOI.org (Crossref), doi:10.18653/v1/P17-1045.
Dullaert, Wout, and Olli Bräysy. “Routing Relatively Few Customers per Route.” Top, vol. 11, no. 2, Dec. 2003, pp. 325–36. Springer Link, doi:10.1007/BF02579048.
Frank, M. J. “By Carrot or by Stick: Cognitive Reinforcement Learning in Parkinsonism.” Science, vol. 306, no. 5703, Dec. 2004, pp. 1940–43. DOI.org (Crossref), doi:10.1126/science.1102941.
Gaur, Vishal, and Marshall L. Fisher. “A Periodic Inventory Routing Problem at a Supermarket Chain.” Operations Research, vol. 52, no. 6, Dec. 2004, pp. 813–22. DOI.org (Crossref), doi:10.1287/opre.1040.0150.
Goel, Asvin, and Volker Gruhn. “A General Vehicle Routing Problem.” European Journal of Operational Research, vol. 191, no. 3, Dec. 2008, pp. 650–60. ScienceDirect, doi:10.1016/j.ejor.2006.12.065.
Golden, Bruce, et al., editors. The Vehicle Routing Problem: Latest Advances and New Challenges. Springer US, 2008. DOI.org (Crossref), doi:10.1007/978-0-387-77778-8.
Google. or-tools. 2015. GitHub, 2021, https://github.com/google/or-tools.
Haghani, Ali, and Soojung Jung. “A Dynamic Vehicle Routing Problem with Time-Dependent Travel Times.” Computers & Operations Research, vol. 32, no. 11, Nov. 2005, pp. 2959–86. DOI.org (Crossref), doi:10.1016/j.cor.2004.04.013.
Hinton, Geoffrey E., et al. “A Fast Learning Algorithm for Deep Belief Nets.” Neural Computation, vol. 18, no. 7, July 2006, pp. 1527–54. DOI.org (Crossref), doi:10.1162/neco.2006.18.7.1527.
Hwang, C. P., et al. “A Tour Construction Heuristic for the Travelling Salesman Problem.” Journal of the Operational Research Society, vol. 50, no. 8, Aug. 1999, pp. 797–809. DOI.org (Crossref), doi:10.1057/palgrave.jors.2600761.
Ioffe, Sergey, and Christian Szegedy. “Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift.” ArXiv:1502.03167 [Cs], Mar. 2015. arXiv.org, http://arxiv.org/abs/1502.03167.
Jang, Beakcheol, et al. “Q-Learning Algorithms: A Comprehensive Classification and Applications.” IEEE Access, vol. 7, 2019, pp. 133653–67. DOI.org (Crossref), doi:10.1109/ACCESS.2019.2941229.
Kaelbling, L. P., et al. “Reinforcement Learning: A Survey.” ArXiv:Cs/9605103, Apr. 1996. arXiv.org, http://arxiv.org/abs/cs/9605103.
Kober, Jens, et al. “Reinforcement Learning in Robotics: A Survey.” The International Journal of Robotics Research, vol. 32, no. 11, Sept. 2013, pp. 1238–74. DOI.org (Crossref), doi:10.1177/0278364913495721.
La, Prashanth, and Shalabh Bhatnagar. “Reinforcement Learning With Function Approximation for Traffic Signal Control.” IEEE Transactions on Intelligent Transportation Systems, vol. 12, no. 2, June 2011, pp. 412–21. DOI.org (Crossref), doi:10.1109/TITS.2010.2091408.
Laporte, Gilbert. “The Vehicle Routing Problem: An Overview of Exact and Approximate Algorithms.” European Journal of Operational Research, vol. 59, no. 3, June 1992, pp. 345–58. DOI.org (Crossref), doi:10.1016/0377-2217(92)90192-C.
Lin, S., and B. W. Kernighan. “An Effective Heuristic Algorithm for the Traveling-Salesman Problem.” Operations Research, vol. 21, no. 2, Apr. 1973, pp. 498–516. DOI.org (Crossref), doi:10.1287/opre.21.2.498.
Lingaitienė, Olga, et al. “The Model of Vehicle and Route Selection for Energy Saving.” Sustainability, vol. 13, no. 8, Jan. 2021, p. 4528. www.mdpi.com, doi:10.3390/su13084528.
Peters, Jan. “Policy Gradient Methods.” Scholarpedia, vol. 5, no. 11, Nov. 2010, p. 3698. www.scholarpedia.org, doi:10.4249/scholarpedia.3698.
Pisinger, David, and Stefan Ropke. “A General Heuristic for Vehicle Routing Problems.” Computers & Operations Research, vol. 34, no. 8, Aug. 2007, pp. 2403–35. ScienceDirect, doi:10.1016/j.cor.2005.09.012.
Powell, W. B., et al. “Approximate Dynamic Programming for High Dimensional Resource Allocation Problems.” Proceedings. 2005 IEEE International Joint Conference on Neural Networks, 2005., vol. 5, IEEE, 2005, pp. 2989–94. DOI.org (Crossref), doi:10.1109/IJCNN.2005.1556401.
Proper, Scott, and Prasad Tadepalli. “Scaling Model-Based Average-Reward Reinforcement Learning for Product Delivery.” Machine Learning: ECML 2006, edited by Johannes Fürnkranz et al., Springer, 2006, pp. 735–42. Springer Link, doi:10.1007/11871842_74.
Puterman, Martin L. Markov Decision Processes: Discrete Stochastic Dynamic Programming. 2014. Open WorldCat, https://nbn-resolving.org/urn:nbn:de:101:1-201411292603.
---. Markov Decision Processes: Discrete Stochastic Dynamic Programming. Wiley-Interscience, 2005.
Ritzinger, Ulrike, et al. “A Survey on Dynamic and Stochastic Vehicle Routing Problems.” International Journal of Production Research, vol. 54, no. 1, Jan. 2016, pp. 215–31. DOI.org (Crossref), doi:10.1080/00207543.2015.1043403.
Sutton, Richard S., and Andrew G. Barto. “Time-Derivative Models of Pavlovian Reinforcement.” Learning and Computational Neuroscience: Foundations of Adaptive Networks, edited by Michael Gabriel and John Moore, MIT Press, 1990. Semantic Scholar, https://www.semanticscholar.org/paper/Chapter-12-Time-Derivative-Models-of-Pavlovian-Sutton-Barto/78a0be0286ac20ac6ff928b58ca43693526574ea.
Sutton, Richard S., et al. “Policy Gradient Methods for Reinforcement Learning with Function Approximation.” Proceedings of the 12th International Conference on Neural Information Processing Systems, MIT Press, 1999, pp. 1057–63.
Sutton, Richard S., and Andrew G. Barto. Reinforcement Learning: An Introduction. MIT Press, 1998.
Tesauro, G. “Temporal Difference Learning and TD-Gammon.” ICGA Journal, vol. 18, no. 2, June 1995, p. 88. DOI.org (Crossref), doi:10.3233/ICG-1995-18207.
The Interaction of Representations and Planning Objectives for Decision-Theoretic Planning Tasks. http://idm-lab.org/bib/abstracts/Koen02a.html. Accessed 4 July 2021.
Tsitsiklis, J. N., and B. Van Roy. “An Analysis of Temporal-Difference Learning with Function Approximation.” IEEE Transactions on Automatic Control, vol. 42, no. 5, May 1997, pp. 674–90. DOI.org (Crossref), doi:10.1109/9.580874.
van Otterlo, Martijn, and Marco Wiering. “Reinforcement Learning and Markov Decision Processes.” Reinforcement Learning: State-of-the-Art, edited by Marco Wiering and Martijn van Otterlo, Springer, 2012, pp. 3–42. Springer Link, doi:10.1007/978-3-642-27645-3_1.
“What Is the Relation between Q-Learning and Policy Gradients Methods?” Artificial Intelligence Stack Exchange, https://ai.stackexchange.com/questions/6196/what-is-the-relation-between-q-learning-and-policy-gradients-methods. Accessed 5 July 2021.
Wren, Anthony, and Alan Holliday. “Computer Scheduling of Vehicles from One or More Depots to a Number of Delivery Points.” Journal of the Operational Research Society, vol. 23, no. 3, Sept. 1972, pp. 333–44. DOI.org (Crossref), doi:10.1057/jors.1972.53.
Zhao, Yufan, et al. “Reinforcement Learning Design for Cancer Clinical Trials.” Statistics in Medicine, vol. 28, no. 26, Nov. 2009, pp. 3294–315. DOI.org (Crossref), doi:10.1002/sim.3720.