A Robust Deep Reinforcement Learning System for The Allocation of Epidemic Prevention Materials


Name

林孟宏 (Meng-Hong Lin)

Department

Department of Computer Science and Information Engineering

Thesis Title

(A Robust Deep Reinforcement Learning System for The Allocation of Epidemic Prevention Materials)

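Related Theses

★ Dynamic Overlay Construction for Mobile Target Detection in Wireless Sensor Networks
★ A Simple Detour Strategy for Vehicle Navigation
★ Improving Localization Using Transmitter-Side Voltage
★ Constructing a Virtual Backbone in Vehicular Networks Using Vehicle Classification
★ Why Topology-based Broadcast Algorithms Do Not Work Well in Heterogeneous Wireless Networks?
★ An Efficient Wireless Sensor Network for Mobile Targets
★ An Articulation-Point-Based Distributed Topology Control Method for Wireless Ad Hoc Networks
★ A Review of Existing Web Frameworks
★ A Distributed Algorithm for Partitioning Sensor Networks into Greedy Blocks
★ Range-free Distance Measurement in Wireless Networks
★ Inferring Floor Plan from Trajectories
★ An Indoor Collaborative Pedestrian Dead Reckoning System
★ Dynamic Content Adjustment In Mobile Ad Hoc Networks
★ An Image-Based Localization System
★ A Distributed Data Compression and Collection Algorithm for Large-Scale Wireless Sensor Networks
★ Collision Analysis in Vehicular WiFi Networks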


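The author has authorized immediate open access to this electronic thesis.
Once open, the full text is licensed to users only for personal, non-commercial searching, reading, and printing for academic research purposes.
Please comply with the Copyright Act of the Republic of China (Taiwan); do not reproduce, distribute, adapt, repost, or broadcast the work without authorization.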

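Abstract (Chinese)

Since the end of 2019, Coronavirus Disease 2019 (COVID-19) has spread rapidly worldwide, and demand for epidemic prevention materials (e.g., medical-grade masks) has risen sharply. If mask quantities are not properly controlled, stock shortages and price gouging will follow. In Taiwan, even before the pandemic became widespread, medical-grade masks were centrally managed by the government and sold to all residents at a fixed price. Under these conditions, supply chain optimization is an important problem: for example, if the government allocates too many masks to one region, residents of other regions may suffer from shortages. For effective COVID-19 prevention, it is essential that the number of masks allocated to each region be close to that region's daily consumption.
In this study, we propose a medical-grade mask allocation system. The proposed system adopts a reinforcement learning framework that uses the daily supply and demand of masks as the environment, updates the agent with the DDPG algorithm, and uses the daily shortage as the reward and penalty. Through experiments, we compare this system with machine learning methods used for supply chain demand forecasting; the results show that the proposed system obtains more reward in the environment. In addition, our reinforcement learning framework performs consistently under different total numbers of masks.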
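The shortage-based reward described above can be illustrated with a minimal Python sketch. Everything in it is an illustrative assumption rather than the thesis's actual environment: the agent's action is taken to be a vector of non-negative per-region weights, the day's fixed mask supply is split in proportion to those weights (with largest-remainder rounding so whole masks are allocated), and the reward is the negative total shortage across regions. `allocate` and `step` are hypothetical names.

```python
from typing import List, Tuple


def allocate(weights: List[float], total_masks: int) -> List[int]:
    """Split total_masks across regions in proportion to weights (hypothetical helper).

    Negative weights are clipped to zero; if all weights are zero the split is uniform.
    Largest-remainder rounding keeps the integer allocations summing to total_masks.
    """
    weights = [max(0.0, w) for w in weights]
    s = sum(weights)
    if s <= 0.0:
        weights = [1.0] * len(weights)
        s = float(len(weights))
    shares = [w / s * total_masks for w in weights]
    alloc = [int(x) for x in shares]  # floor, since shares are non-negative
    leftover = total_masks - sum(alloc)
    # Hand leftover masks to the regions with the largest fractional parts.
    order = sorted(range(len(shares)), key=lambda i: shares[i] - alloc[i], reverse=True)
    for i in order[:leftover]:
        alloc[i] += 1
    return alloc


def step(weights: List[float], total_masks: int, demand: List[int]) -> Tuple[List[int], float]:
    """One simulated day: allocate masks, then reward = -(total shortage across regions)."""
    alloc = allocate(weights, total_masks)
    shortage = [max(0, d - a) for d, a in zip(demand, alloc)]
    return alloc, -float(sum(shortage))


# Example day: three regions, 100 masks in total.
alloc, reward = step([1.0, 1.0, 2.0], 100, [30, 20, 60])
```

With weights `[1.0, 1.0, 2.0]`, a supply of 100 masks, and demands `[30, 20, 60]`, the sketch allocates `[25, 25, 50]`; regions 1 and 3 are short by 5 and 10 masks, so the reward is `-15.0`. Surplus in region 2 is not penalized here, which is one possible reading of "daily shortage as the reward and penalty"; a real design might also penalize over-allocation.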

Abstract (English)

Coronavirus Disease 2019 (COVID-19) has spread rapidly around the world since the end of 2019, and demand for epidemic prevention materials such as medical-grade masks has risen sharply as a result. If mask supplies are not properly controlled, understocking and price gouging follow. In Taiwan, medical-grade masks have been collected and managed by the government since the very early stage of the pandemic and sold to all residents at a fixed price. Under this arrangement, supply chain optimization becomes an important issue: if the government allocates too many masks to one region, residents of other regions may suffer from shortages. For effective COVID-19 prevention, it is crucial that the number of masks distributed to each region be close to that region's daily consumption.

In this study, we propose a robust system for allocating medical-grade masks. The proposed system adopts a reinforcement learning framework that takes the daily supply and demand of masks as the environment, uses the Deep Deterministic Policy Gradient (DDPG) algorithm to update the agent, and uses the daily shortage as the basis for rewards and penalties. In experiments, the proposed system is compared with a traditional machine learning approach for supply chain demand forecasting, and the results indicate that the proposed system obtains higher rewards in the environment. Moreover, our reinforcement learning framework performs consistently under different total numbers of masks.
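DDPG agent updates rest on two standard ingredients of the original DDPG formulation (Lillicrap et al.): a critic target computed with slowly changing target networks, y = r + γ·Q′(s′, μ′(s′)), and a soft (Polyak) update of the target parameters, θ′ ← τ·θ + (1 − τ)·θ′. The plain-Python sketch below shows only these two steps. The function names, the list-of-floats parameter representation, and the toy linear actor and critic in the usage lines are hypothetical and are not the networks used in the thesis.

```python
from typing import Any, Callable, List


def critic_target(reward: float, next_state: Any, done: bool, gamma: float,
                  target_actor: Callable[[Any], Any],
                  target_critic: Callable[[Any, Any], float]) -> float:
    """TD target for the critic: y = r + gamma * Q'(s', mu'(s')).

    Terminal transitions do not bootstrap from the next state.
    """
    if done:
        return reward
    next_action = target_actor(next_state)  # deterministic target policy mu'(s')
    return reward + gamma * target_critic(next_state, next_action)


def soft_update(target: List[float], online: List[float], tau: float) -> List[float]:
    """Polyak averaging: theta' <- tau * theta + (1 - tau) * theta'."""
    return [tau * w + (1.0 - tau) * w_t for w_t, w in zip(target, online)]


# Toy usage with hypothetical linear target networks.
def mu_target(s: float) -> float:
    return 0.5 * s  # mu'(s) = 0.5 * s


def q_target(s: float, a: float) -> float:
    return s + a  # Q'(s, a) = s + a


y = critic_target(-3.0, 2.0, False, 0.9, mu_target, q_target)  # approximately -0.3
new_params = soft_update([0.0, 1.0], [1.0, 3.0], tau=0.1)      # approximately [0.1, 1.2]
```

In a full DDPG loop, the critic is regressed toward y on minibatches drawn from a replay buffer, the actor is moved along the critic's gradient with respect to the action, and `soft_update` is applied to both target networks after each step; a small τ keeps the targets changing slowly, which stabilizes training.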

Keywords

★ Supply Chain Management
★ Reinforcement Learning
★ Medical-grade Masks
★ Deep Deterministic Policy Gradient

Table of Contents

1 Introduction
2 Related Work
2.1 Machine Learning-based Customer Demand Forecasting
2.1.1 Support Vector Machine
2.1.2 Support Vector Regression
2.2 Reinforcement Learning
3 Preliminary
3.1 Machine Learning Techniques
3.1.1 Support Vector Machine
3.1.2 Reinforcement Learning
3.2 Deep Learning
3.2.1 Deep Reinforcement Learning
4 Design
4.1 Data Collection
4.2 Data Extraction
4.3 Reinforcement Learning Framework
4.3.1 Environment Design
4.3.2 Actor Network and Critic Network
4.3.3 Deep Deterministic Policy Gradient Algorithm
4.3.4 Feature Scaling
5 Performance
5.1 Data Description
5.2 Experimental Settings
5.3 Performance Evaluation
5.3.1 Evaluation Metrics
5.3.2 Experiment Results
6 Conclusions and Future Works
References


Advisor

孫敏德 (Min-Te Sun)

Approval Date

2021-01-28
