Thesis 108481605: Detailed Record




Author: Vinay Kumar Singh (信念)    Department: Department of Business Administration
Thesis Title: A scalable approach for product delivery at a supermarket chain
Related Theses
★ 以第四方物流經營模式分析博連資訊科技股份有限公司
★ 探討虛擬環境下隱性協調在新產品導入之作用--以電子組裝業為例
★ 動態能力機會擷取機制之研究-以A公司為例
★ 探討以價值驅動之商業模式創新-以D公司為例
★ 物聯網行動支付之探討-以Apple Pay與支付寶錢包為例
★ 企業資訊方案行銷歷程之探討-以MES為例
★ B2C網路黏著度之探討-以博客來為例
★ 組織機制與吸收能力關係之研究-以新產品開發專案為例
★ Revisit the Concept of Exploration and Exploitation
★ 臺灣遠距醫療照護系統之發展及營運模式探討
★ 資訊系統與人力資訊科技資源對供應鏈績效影響之研究-買方依賴性的干擾效果
★ 資訊科技對知識創造影響之研究-探討社會鑲嵌的中介效果
★ 資訊科技對公司吸收能力影響之研究-以新產品開發專案為例
★ 探討買賣雙方鑲嵌關係影響交易績效之機制 ─新產品開發專案為例
★ 資訊技術運用與協調能力和即興能力對新產品開發績效之影響
★ 團隊組成多元性影響任務衝突機制之研究─以新產品開發專案團隊為例
Files: full text available in the repository after 2024-8-1
Abstract (Chinese) Product delivery has been studied in industry from different angles and with different approaches, combining inventory control and vehicle routing. However, most approaches to this problem use classical constrained-optimization techniques, which are either not scalable or too slow when products must be delivered to multiple sites by multiple vehicles. We build on the work of S. Proper and P. Tadepalli (2006), who controlled a simple, discretized setting of five shops and four trucks with ASH learning; their experiments succeed with a small number of trucks and shops, but the approach still suffers from the three curses of dimensionality. Gaur and Fisher (2004) and Avella et al. (2004) solved the same problem with a similar goal but with different methods. This study addresses the dimensionality problem, namely the state-action space, that the traditional approaches face (Powell et al., 2005; Proper and Tadepalli, 2006).
With the growth of computing power and the falling cost of infrastructure in recent years, machine learning has become a popular area of data science. Broadly, machine-learning techniques fall into two categories, supervised learning and unsupervised learning; beyond these two, a third type is distinguished: reinforcement learning. Over the past decades, reinforcement learning has been applied widely across industries, for example robotics and production automation (Kober et al., 2013), medicine and healthcare (Frank et al., 2004; Zhao et al., 2009), media and advertising (Cai et al., 2017; Agarwal et al., 2016; Abe et al., 2004), finance (Bertoluzzo and Corazza, 2014), and text, speech, and dialogue systems (Dhingra et al., 2017). It also provides a framework for modelling a wide range of stochastic optimization problems. However, applying traditional reinforcement-learning methods to significant real-world problems is limited by dimensionality: the state and action spaces explode, and stochasticity yields a huge number of possible successor states (Powell et al., 2005; Proper and Tadepalli, 2006).
This study proposes a new approach that uses deep reinforcement learning to solve the product-delivery problem, overcoming the state- and action-space dimensionality that limits existing classical optimization solutions. We exploit the nonlinearity and complexity of deep neural networks to achieve scalability when the numbers of outlets, retail shops, and delivery vehicles are large. Finally, we evaluate our approach experimentally with real-world data from a German retail chain; compared with the state-of-the-art system used in this specific setting, our approach yields cost savings of up to 14%.
Furthermore, our results show that deep neural networks are better suited than the tabular linear functions used in traditional reinforcement learning for learning to solve complex optimization problems. We also suggest further research on this topic and expect that deep neural networks may be key to overcoming some of the curses of dimensionality, such as the explosion of the state-action space. We used an OpenAI Gym environment for our product-delivery problem.
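As a rough illustration of the tabular-versus-network contrast drawn above, the following sketch compares the memory footprint of a Q-table with a tiny neural Q-function. The state, action, and feature sizes are assumed for illustration only and are not the configuration used in the thesis.

# Contrast between a tabular Q-function and a neural Q-function approximator.
# All sizes here are illustrative assumptions, not the thesis's actual setup.
import numpy as np

n_states, n_actions = 1_000_000, 16     # assumed discretized problem size
state_dim = 12                          # assumed feature-vector length per state

# Tabular Q-learning stores one value per (state, action) pair,
# so memory grows linearly with the number of states.
q_table = np.zeros((n_states, n_actions), dtype=np.float32)

# A small multilayer perceptron maps a state feature vector to action values;
# its parameter count is independent of how many states exist.
rng = np.random.default_rng(0)
w1, b1 = rng.normal(size=(state_dim, 64)) * 0.1, np.zeros(64)
w2, b2 = rng.normal(size=(64, n_actions)) * 0.1, np.zeros(n_actions)

def q_network(state_features: np.ndarray) -> np.ndarray:
    """Forward pass of a tiny two-layer Q-network (ReLU hidden layer)."""
    hidden = np.maximum(0.0, state_features @ w1 + b1)
    return hidden @ w2 + b2

print("Q-table entries   :", q_table.size)                        # 16,000,000 values
print("Network parameters:", w1.size + b1.size + w2.size + b2.size)
print("Q-values for one state:", q_network(rng.normal(size=state_dim)).shape)

The table grows with every extra shop or truck that enlarges the state space, whereas the network's parameter count stays fixed, which is the property the deep-learning approach relies on.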
Abstract (English) Product delivery has been studied and addressed in industry from different perspectives and with different approaches, combining inventory control and vehicle routing. However, most approaches to this problem use classical constrained-optimization techniques that are either not scalable when the product must be delivered to multiple sites via multiple vehicles, or are too slow. We build on the work of Proper and Tadepalli (2006), who controlled a simplified, discretized setting of five shops and four trucks with ASH learning. Their results are successful with a small number of trucks and shops; however, the approach still suffers from the three curses of dimensionality. Gaur and Fisher (2004) and Avella et al. (2004) also solve the same problem with a similar goal but with different methods. This Ph.D. thesis aims to solve the dimensionality problem (state-action spaces) from which the classical approaches suffer (Powell et al., 2005; Proper and Tadepalli, 2006).
With recent growth in computation power and an exponential decrease in infrastructure cost, Machine Learning (ML) has become a popular field in Data Science. Broadly, ML techniques are studied under two categories, namely Supervised Learning and Unsupervised Learning. Beyond these two, however, we must distinguish a third type: Reinforcement Learning (RL). RL techniques have been applied for decades across many areas of Artificial Intelligence, such as robotics and automation in production (Kober et al., 2013), medicine and health care (Frank et al., 2004; Zhao et al., 2009), media and advertising (Cai et al., 2017; Agarwal et al., 2016; Abe et al., 2004), finance (Bertoluzzo and Corazza, 2014), and text, speech, and dialogue systems (Dhingra et al., 2017). Moreover, RL provides a framework with which one can model a variety of stochastic optimization problems. However, applying classical RL approaches to significant real-world problems suffers from the curse of dimensionality: an explosion in the state and action spaces, and a vast number of possible successor states owing to stochasticity (Powell et al., 2005; Proper and Tadepalli, 2006).
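To make the dimensionality argument concrete, the back-of-the-envelope sketch below counts the states a tabular method would need to enumerate for a product-delivery instance. The shop, truck, inventory-level, and location counts are illustrative assumptions rather than figures from the thesis, although the five-shop, four-truck case mirrors the scale studied by Proper and Tadepalli (2006).

# Rough count of the tabular state space for a toy product-delivery instance.
# All discretization sizes are illustrative assumptions, not thesis parameters.

def tabular_state_count(num_shops: int, inventory_levels: int,
                        num_trucks: int, num_locations: int,
                        truck_load_levels: int) -> int:
    """Number of distinct states a tabular RL method would have to store:
    every combination of shop inventories, truck positions, and truck loads."""
    shop_states = inventory_levels ** num_shops          # inventory at each shop
    truck_position_states = num_locations ** num_trucks  # where each truck is
    truck_load_states = truck_load_levels ** num_trucks  # how much each truck carries
    return shop_states * truck_position_states * truck_load_states

if __name__ == "__main__":
    # Five shops and four trucks, as in Proper and Tadepalli (2006); the
    # granularity of the discretization below is assumed for illustration.
    small = tabular_state_count(num_shops=5, inventory_levels=10,
                                num_trucks=4, num_locations=6,
                                truck_load_levels=5)
    # A modestly larger chain already makes the table astronomically big.
    large = tabular_state_count(num_shops=20, inventory_levels=10,
                                num_trucks=10, num_locations=21,
                                truck_load_levels=5)
    print(f"5 shops / 4 trucks  : {small:,} states")
    print(f"20 shops / 10 trucks: {float(large):.3e} states")

Even the small case runs to tens of billions of states, which is why we turn to function approximation instead of enumerating a table.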
Our study presents a novel method that uses Deep Reinforcement Learning to solve the product-delivery problem, overcoming the dimensionality of the state and action spaces that limits existing classical optimization solutions. We exploit the nonlinearity and complexity of a Deep Neural Network for scalability when the number of outlets, shops, and delivery vehicles is enormous. We perform an experimental evaluation of our approach with real-world data from a retail chain based in Germany, comparing it with the state-of-the-art system used in this specific setting. Our approach generates a cost-saving potential of up to 14%.
Furthermore, our findings demonstrate that a Deep Neural Network (DNN) learns to solve complex optimization problems better than the tabular linear functions used so far in classical Reinforcement Learning. We also recommend further research on the topic; we expect that Deep Neural Networks could be the key to overcoming some of the curses of dimensionality, such as the explosion of the state-action spaces. We have used an OpenAI Gym environment for our product-delivery problem.
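The abstract states that the product-delivery problem was wrapped as an OpenAI Gym environment. The skeleton below is a minimal sketch of what such an environment can look like under the classic Gym API (reset returns an observation; step returns observation, reward, done, info). The class name ProductDeliveryEnv, the single-truck setup, the shop count, capacities, demand distribution, and cost weights are all assumptions for illustration, not the thesis's actual specification.

# Minimal sketch of a product-delivery environment in the classic OpenAI Gym API.
# Shop/truck counts, capacities, demand, and cost weights are illustrative assumptions.
import gym
import numpy as np
from gym import spaces

class ProductDeliveryEnv(gym.Env):
    """One truck restocks several shops; negative reward = travel cost + stock-out cost."""

    def __init__(self, n_shops: int = 5, capacity: int = 10, horizon: int = 50):
        super().__init__()
        self.n_shops, self.capacity, self.horizon = n_shops, capacity, horizon
        # Observation: inventory level of each shop plus the truck's position.
        self.observation_space = spaces.Box(
            low=0.0, high=float(capacity), shape=(n_shops + 1,), dtype=np.float32)
        # Action: which shop to drive to and restock next (0 = stay at the depot).
        self.action_space = spaces.Discrete(n_shops + 1)
        self.rng = np.random.default_rng()

    def reset(self):
        self.t = 0
        self.truck_pos = 0
        self.stock = self.rng.integers(2, self.capacity, size=self.n_shops)
        return self._obs()

    def step(self, action: int):
        self.t += 1
        travel_cost = 1.0 if action != self.truck_pos else 0.0
        self.truck_pos = int(action)
        if action > 0:                      # refill the visited shop
            self.stock[action - 1] = self.capacity
        # Stochastic customer demand at every shop.
        demand = self.rng.integers(0, 3, size=self.n_shops)
        stockouts = np.maximum(demand - self.stock, 0).sum()
        self.stock = np.maximum(self.stock - demand, 0)
        reward = -(travel_cost + 5.0 * stockouts)   # assumed cost weights
        done = self.t >= self.horizon
        return self._obs(), float(reward), done, {}

    def _obs(self):
        return np.append(self.stock, self.truck_pos).astype(np.float32)

# Quick rollout with a random policy.
env = ProductDeliveryEnv()
obs, total, done = env.reset(), 0.0, False
while not done:
    obs, reward, done, _ = env.step(env.action_space.sample())
    total += reward
print("random-policy return:", total)

A Q-learning or policy-gradient agent can then be trained against this interface in the usual way; the table of contents indicates that both families of methods were evaluated in the thesis.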
Keywords (Chinese) ★ 強化學習 (Reinforcement Learning)
★ 深度神經網路 (Deep Neural Network)
★ 產品交付 (Product Delivery)
Keywords (English) ★ Reinforcement Learning
★ Deep Neural Network
★ Product delivery
Table of Contents
1. Introduction
2. Problem Description
3. Related Work
4. Background Concepts
5. Methodology
6. Simulations and Results
6.1 Q-Learning
6.2 Policy Gradient
7. Discussion
8. Conclusion
References
References
Abe, Naoki, et al. “Cross Channel Optimized Marketing by Reinforcement Learning.” Proceedings of the 2004 ACM SIGKDD International Conference on Knowledge Discovery and Data Mining - KDD ’04, ACM Press, 2004, p. 767. DOI.org (Crossref), doi:10.1145/1014052.1016912.
Agra, Agostinho, et al. “The Robust Vehicle Routing Problem with Time Windows.” Computers & Operations Research, vol. 40, no. 3, Mar. 2013, pp. 856–66. DOI.org (Crossref), doi:10.1016/j.cor.2012.10.002.
Applegate, David L., editor. The Traveling Salesman Problem: A Computational Study. Princeton University Press, 2006.
Atienza, Rowel. Advanced Deep Learning with Keras. 2018. Open WorldCat, http://sbiproxy.uqac.ca/login?url=https://international.scholarvox.com/book/88865386.
Avella, Pasquale, et al. “Solving a Fuel Delivery Problem by Heuristic and Exact Approaches.” European Journal of Operational Research, vol. 152, no. 1, Jan. 2004, pp. 170–79. DOI.org (Crossref), doi:10.1016/S0377-2217(02)00676-8.
Baldacci, Roberto, et al. “Recent Exact Algorithms for Solving the Vehicle Routing Problem under Capacity and Time Window Constraints.” European Journal of Operational Research, vol. 218, no. 1, Apr. 2012, pp. 1–6. DOI.org (Crossref), doi:10.1016/j.ejor.2011.07.037.
Bellman, Richard. Dynamic Programming. Dover ed, Dover Publications, 2003.
Bello, Irwan, et al. “Neural Combinatorial Optimization with Reinforcement Learning.” ArXiv:1611.09940 [Cs, Stat], Jan. 2017. arXiv.org, http://arxiv.org/abs/1611.09940.
Boutilier, C., et al. “Decision-Theoretic Planning: Structural Assumptions and Computational Leverage.” Journal of Artificial Intelligence Research, vol. 11, July 1999, pp. 1–94. DOI.org (Crossref), doi:10.1613/jair.575.
Cai, Xiaoyan, et al. “HITS-Based Attentional Neural Model for Abstractive Summarization.” Knowledge-Based Systems, vol. 222, June 2021, p. 106996. DOI.org (Crossref), doi:10.1016/j.knosys.2021.106996.
Clarke, G., and J. W. Wright. “Scheduling of Vehicles from a Central Depot to a Number of Delivery Points.” Operations Research, vol. 12, no. 4, Aug. 1964, pp. 568–81. DOI.org (Crossref), doi:10.1287/opre.12.4.568.
Côté, Jean-François, et al. “The Vehicle Routing Problem with Stochastic Two-Dimensional Items.” Transportation Science, Jan. 2020, p. trsc.2019.0904. DOI.org (Crossref), doi:10.1287/trsc.2019.0904.
Dhingra, Bhuwan, et al. “Towards End-to-End Reinforcement Learning of Dialogue Agents for Information Access.” Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Association for Computational Linguistics, 2017, pp. 484–95. DOI.org (Crossref), doi:10.18653/v1/P17-1045.
Dullaert, Wout, and Olli Bräysy. “Routing Relatively Few Customers per Route.” Top, vol. 11, no. 2, Dec. 2003, pp. 325–36. Springer Link, doi:10.1007/BF02579048.
Frank, M. J. “By Carrot or by Stick: Cognitive Reinforcement Learning in Parkinsonism.” Science, vol. 306, no. 5703, Dec. 2004, pp. 1940–43. DOI.org (Crossref), doi:10.1126/science.1102941.
Gaur, Vishal, and Marshall L. Fisher. “A Periodic Inventory Routing Problem at a Supermarket Chain.” Operations Research, vol. 52, no. 6, Dec. 2004, pp. 813–22. DOI.org (Crossref), doi:10.1287/opre.1040.0150.
Goel, Asvin, and Volker Gruhn. “A General Vehicle Routing Problem.” European Journal of Operational Research, vol. 191, no. 3, Dec. 2008, pp. 650–60. www.sciencedirect.com, doi:10.1016/j.ejor.2006.12.065.
Golden, Bruce, et al., editors. The Vehicle Routing Problem: Latest Advances and New Challenges. Springer US, 2008. DOI.org (Crossref), doi:10.1007/978-0-387-77778-8.
Google/or-Tools. 2015. Google, 2021. GitHub, https://github.com/google/or-tools.
Haghani, Ali, and Soojung Jung. “A Dynamic Vehicle Routing Problem with Time-Dependent Travel Times.” Computers & Operations Research, vol. 32, no. 11, Nov. 2005, pp. 2959–86. DOI.org (Crossref), doi:10.1016/j.cor.2004.04.013.
Hinton, Geoffrey E., et al. “A Fast Learning Algorithm for Deep Belief Nets.” Neural Computation, vol. 18, no. 7, July 2006, pp. 1527–54. DOI.org (Crossref), doi:10.1162/neco.2006.18.7.1527.
Hwang, C. P., et al. “A Tour Construction Heuristic for the Travelling Salesman Problem.” Journal of the Operational Research Society, vol. 50, no. 8, Aug. 1999, pp. 797–809. DOI.org (Crossref), doi:10.1057/palgrave.jors.2600761.
Ioffe, Sergey, and Christian Szegedy. “Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift.” ArXiv:1502.03167 [Cs], Mar. 2015. arXiv.org, http://arxiv.org/abs/1502.03167.
Jang, Beakcheol, et al. “Q-Learning Algorithms: A Comprehensive Classification and Applications.” IEEE Access, vol. 7, 2019, pp. 133653–67. DOI.org (Crossref), doi:10.1109/ACCESS.2019.2941229.
Kaelbling, L. P., et al. “Reinforcement Learning: A Survey.” ArXiv:Cs/9605103, Apr. 1996. arXiv.org, http://arxiv.org/abs/cs/9605103.
Kober, Jens, et al. “Reinforcement Learning in Robotics: A Survey.” The International Journal of Robotics Research, vol. 32, no. 11, Sept. 2013, pp. 1238–74. DOI.org (Crossref), doi:10.1177/0278364913495721.
La, Prashanth, and Shalabh Bhatnagar. “Reinforcement Learning With Function Approximation for Traffic Signal Control.” IEEE Transactions on Intelligent Transportation Systems, vol. 12, no. 2, June 2011, pp. 412–21. DOI.org (Crossref), doi:10.1109/TITS.2010.2091408.
Laporte, Gilbert. “The Vehicle Routing Problem: An Overview of Exact and Approximate Algorithms.” European Journal of Operational Research, vol. 59, no. 3, June 1992, pp. 345–58. DOI.org (Crossref), doi:10.1016/0377-2217(92)90192-C.
Lin, S., and B. W. Kernighan. “An Effective Heuristic Algorithm for the Traveling-Salesman Problem.” Operations Research, vol. 21, no. 2, Apr. 1973, pp. 498–516. April 1973, doi:10.1287/opre.21.2.498.
Lingaitienė, Olga, et al. “The Model of Vehicle and Route Selection for Energy Saving.” Sustainability, vol. 13, no. 8, Jan. 2021, p. 4528. www.mdpi.com, doi:10.3390/su13084528.
Peters, Jan. “Policy Gradient Methods.” Scholarpedia, vol. 5, no. 11, Nov. 2010, p. 3698. www.scholarpedia.org, doi:10.4249/scholarpedia.3698.
Pisinger, David, and Stefan Ropke. “A General Heuristic for Vehicle Routing Problems.” Computers & Operations Research, vol. 34, no. 8, Aug. 2007, pp. 2403–35. www.sciencedirect.com, doi:10.1016/j.cor.2005.09.012.
Powell, W. B., et al. “Approximate Dynamic Programming for High Dimensional Resource Allocation Problems.” Proceedings. 2005 IEEE International Joint Conference on Neural Networks, 2005, vol. 5, IEEE, 2005, pp. 2989–94. DOI.org (Crossref), doi:10.1109/IJCNN.2005.1556401.
Proper, Scott, and Prasad Tadepalli. “Scaling Model-Based Average-Reward Reinforcement Learning for Product Delivery.” Machine Learning: ECML 2006, edited by Johannes Fürnkranz et al., Springer, 2006, pp. 735–42. Springer Link, doi:10.1007/11871842_74.
Puterman, Martin L. Markov Decision Processes Discrete Stochastic Dynamic Programming. 2014. Open WorldCat, https://nbn-resolving.org/urn:nbn:de:101:1-201411292603.
Puterman, Martin L. Markov Decision Processes: Discrete Stochastic Dynamic Programming. Wiley-Interscience, 2005.
Ritzinger, Ulrike, et al. “A Survey on Dynamic and Stochastic Vehicle Routing Problems.” International Journal of Production Research, vol. 54, no. 1, Jan. 2016, pp. 215–31. DOI.org (Crossref), doi:10.1080/00207543.2015.1043403.
Sutton, R., and A. Barto. Chapter 12 Time-Derivative Models of Pavlovian Reinforcement. 1990, https://www.semanticscholar.org/paper/Chapter-12-Time-Derivative-Models-of-Pavlovian-Sutton-Barto/78a0be0286ac20ac6ff928b58ca43693526574ea.
Sutton, Richard S., et al. “Policy Gradient Methods for Reinforcement Learning with Function Approximation.” Proceedings of the 12th International Conference on Neural Information Processing Systems, MIT Press, 1999, pp. 1057–63.
Sutton, Richard S., and Andrew G. Barto. Reinforcement Learning: An Introduction. MIT Press, 1998.
Tesauro, G. “Temporal Difference Learning and TD-Gammon.” ICGA Journal, vol. 18, no. 2, June 1995, pp. 88–88. DOI.org (Crossref), doi:10.3233/ICG-1995-18207.
The Interaction of Representations and Planning Objectives for Decision-Theoretic Planning Tasks. http://idm-lab.org/bib/abstracts/Koen02a.html. Accessed 4 July 2021.
Tsitsiklis, J. N., and B. Van Roy. “An Analysis of Temporal-Difference Learning with Function Approximation.” IEEE Transactions on Automatic Control, vol. 42, no. 5, May 1997, pp. 674–90. DOI.org (Crossref), doi:10.1109/9.580874.
van Otterlo, Martijn, and Marco Wiering. “Reinforcement Learning and Markov Decision Processes.” Reinforcement Learning: State-of-the-Art, edited by Marco Wiering and Martijn van Otterlo, Springer, 2012, pp. 3–42. Springer Link, doi:10.1007/978-3-642-27645-3_1.
“What Is the Relation between Q-Learning and Policy Gradients Methods?” Artificial Intelligence Stack Exchange, https://ai.stackexchange.com/questions/6196/what-is-the-relation-between-q-learning-and-policy-gradients-methods. Accessed 5 July 2021.
Wren, Anthony, and Alan Holliday. “Computer Scheduling of Vehicles from One or More Depots to a Number of Delivery Points.” Journal of the Operational Research Society, vol. 23, no. 3, Sept. 1972, pp. 333–44. DOI.org (Crossref), doi:10.1057/jors.1972.53.
Zhao, Yufan, et al. “Reinforcement Learning Design for Cancer Clinical Trials.” Statistics in Medicine, vol. 28, no. 26, Nov. 2009, pp. 3294–315. DOI.org (Crossref), doi:10.1002/sim.3720.
Advisor: Shiuann-Shuoh Chen (陳炫碩)    Date of Approval: 2022-5-5
