Abstract: | Product delivery, which combines inventory control and vehicle routing, has been studied in industry from different perspectives and with different approaches. However, most approaches to this problem rely on classical constrained optimization techniques, which either do not scale when products must be delivered to multiple sites by multiple vehicles or are too slow. We build on the work of S. Proper and P. Tadepalli (2006), who used ASH learning to control a simplified, discretized system of five shops and four trucks. Their results were successful for small numbers of trucks and shops; however, the approach still suffers from the three curses of dimensionality. Gaur and Fisher (2004) and Avella et al. (2014) address the same problem with a similar goal but with different methods. 
This Ph.D. thesis aims to solve the dimensionality problem of the state-action space from which classical approaches suffer (Powell, W. B., et al., 2005; S. Proper and P. Tadepalli, 2006). With recent developments in computing power and an exponential decrease in infrastructure cost, Machine Learning (ML) has become a popular field in Data Science. Broadly, ML techniques are divided into two categories, namely Supervised Learning and Unsupervised Learning. Apart from these two, however, a third type must be distinguished: Reinforcement Learning (RL). RL techniques have been used for decades in different fields of Artificial Intelligence for many applications, such as robotics and production automation (Kober et al., 2013), medicine and health care (Frank et al., 2004; Yufan et al., 2017), media and advertising (Cai et al., 2017; Agarwal et al., 2016; Abe et al., 2004), finance (Bertoluzzo and Corazza, 2014), and text, speech, and dialog systems (Dhingra et al., 2017). Moreover, RL provides a framework for modeling a variety of stochastic optimization problems. However, applying classical RL to significant real-world problems is limited by dimensionality: the state and action spaces explode, and stochasticity produces a vast number of possible successor states for each action (Powell, W. B., et al., 2005; S. Proper and P. Tadepalli, 2006). This study presents a novel Deep Reinforcement Learning method for product delivery problems that overcomes the dimensionality of the state and action spaces, which limits existing classical optimization solutions. We exploit the nonlinearity and complexity of a Deep Neural Network to achieve scalability when the numbers of outlets, shops, and delivery vehicles are very large. We evaluate our approach experimentally on real-world data from a retail chain based in Germany. 
We compare our approach with the state-of-the-art system used in specific situations, and it yields a cost-saving potential of up to 14%. Furthermore, our findings demonstrate that a Deep Neural Network (DNN) is better suited to learning to solve complex optimization problems than the tabular linear functions used so far in classical Reinforcement Learning. We recommend further research on the topic, as we expect that DNNs could be the key to overcoming some of the curses of dimensionality, such as the explosion of the state-action space. We used an OpenAI Gym environment for our product delivery problem. |
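To make the "explosion of the state-action space" concrete, here is a back-of-the-envelope sketch in Python. It is not the thesis's model. The discretization is a hypothetical assumption: each shop's inventory takes one of `levels` values, each truck is at the depot or at one shop, and every truck independently picks a destination at each step. Under these assumptions, a tabular Q-function needs one entry per (state, action) pair, and that count grows exponentially with the number of shops and trucks. For contrast, the sketch counts the parameters of a hypothetical fully connected network with one input per shop and truck and a factored output (one value per truck and destination). That network grows only polynomially. All function names, the hidden size of 256, and the factored-output design are illustrative assumptions.

```python
"""Back-of-the-envelope sizes for a discretized product-delivery problem.

Hypothetical assumptions, for illustration only:
- each shop's inventory is discretized into `levels` values,
- each truck is located at the depot or at one of the shops,
- at each step every truck picks a destination (depot or any shop).
"""


def state_space_size(n_shops: int, n_trucks: int, levels: int) -> int:
    # Inventory configurations times truck-location configurations.
    return levels ** n_shops * (n_shops + 1) ** n_trucks


def action_space_size(n_shops: int, n_trucks: int) -> int:
    # Joint action: every truck chooses one of n_shops + 1 destinations.
    return (n_shops + 1) ** n_trucks


def q_table_entries(n_shops: int, n_trucks: int, levels: int) -> int:
    # A tabular Q-function stores one value per (state, action) pair.
    return state_space_size(n_shops, n_trucks, levels) * action_space_size(n_shops, n_trucks)


def mlp_parameters(layer_sizes: list[int]) -> int:
    # Weights plus biases of a fully connected network with the given layer widths.
    return sum(i * o + o for i, o in zip(layer_sizes, layer_sizes[1:]))


def mlp_for(n_shops: int, n_trucks: int, hidden: int = 256) -> int:
    # Hypothetical network: one input per shop inventory and per truck position,
    # two hidden layers, one output per (truck, destination) pair.
    n_in = n_shops + n_trucks
    n_out = n_trucks * (n_shops + 1)
    return mlp_parameters([n_in, hidden, hidden, n_out])


if __name__ == "__main__":
    for shops, trucks in [(5, 4), (20, 10), (50, 20)]:
        print(
            f"{shops:>3} shops, {trucks:>2} trucks: "
            f"|S|*|A| = {q_table_entries(shops, trucks, levels=5):.3e}, "
            f"MLP params = {mlp_for(shops, trucks):,}"
        )
```

A smaller network does not by itself make the learning problem easy. The point of the sketch is only that a parametric function approximator avoids storing one value per state-action pair, while a table cannot.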