Abstract (English)
Coronavirus disease 2019 (COVID-19) is an infectious disease caused by the SARS-CoV-2 virus. It emerged in Wuhan, China, in early December 2019, and the epidemic quickly spread worldwide. The virus passes through human-to-human transmission, spreads rapidly, and can cause severe symptoms after infection, which has had a profound impact on the world. In the absence of an adequate vaccine, mitigating the epidemic requires significant medical resources and policies that limit human movement and contact, such as restrictions on gatherings. Policies to reduce the spread of SARS-CoV-2 include border controls, mandatory or voluntary lockdowns, quarantines, social distancing, mask-wearing, and vaccination. These measures work by restricting human movement and contact; however, they also seriously impact the economy.

We explore the optimal balance between policy stringency and the economy using reinforcement learning (RL), combining Asynchronous Advantage Actor-Critic (A3C) with Proximal Policy Optimization (PPO). We train the agent in a compartmental SEIR model and adjust the parameters of its four states: susceptible, exposed, infected, and removed. These parameters are chosen so that the basic reproduction number of the SEIR model matches that of COVID-19.

In the experiments, we focus on four prefectures in Japan (Hokkaido, Okinawa, Osaka, and Tokyo) and use positive-test case data from January 2020 to October 2021, which contain five infection peaks. Because the compartmental SEIR model cannot directly simulate the full course of the real epidemic in a single run, we create five environments, one for each peak, and let the optimally trained agent interact with them to reach the goal. We train the agent on an Intel Core i9-10980XE CPU (18 cores, 36 threads) and an RTX 3090 GPU with 24 GB of memory. With 18 multi-threaded A3C workers, the average reward rises during training and plateaus after 500 episodes. The results show that the optimal agent effectively suppresses the growth of active cases. We also find that the agent implements strict policies when the number of infected cases increases, keeps increasing for several days, or remains unchanged; on average, these strict policies are applied in high-risk areas. Finally, population-weighted density represents how densely people in an area actually live better than conventional population density, so it is more accurate to use population-weighted density in studies of pandemic infectivity.

We further modify the SEIR model by adding a quarantined (Q) compartment to form an SEIQR model. The experiments show that various situations and various epidemic diseases can be simulated by modifying the traditional SEIR model. However, whether the trained agent can be used generally across different epidemic diseases depends on the states the environment provides. If we can generalize these states across different epidemiological environments and identify the crucial information that is sufficient for the agent to judge whether to implement strict policies, we can construct a general epidemiological reward function from this information and train an agent applicable to different epidemic diseases.
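To make the training environment concrete: the agent interacts with deterministic SEIR dynamics, in which the basic reproduction number is R0 = beta / gamma. The following is a minimal sketch under assumed parameter names (beta, sigma, gamma) and illustrative values, using a daily forward-Euler step; it is not the thesis's exact implementation.

    def seir_step(s, e, i, r, beta, sigma, gamma, n, dt=1.0):
        """One forward-Euler step of the deterministic SEIR model.
        beta: transmission rate, sigma: 1/incubation period,
        gamma: 1/infectious period; R0 = beta / gamma."""
        s_to_e = beta * s * i / n      # new exposures
        e_to_i = sigma * e             # exposed become infectious
        i_to_r = gamma * i             # infectious are removed
        s -= dt * s_to_e
        e += dt * (s_to_e - e_to_i)
        i += dt * (e_to_i - i_to_r)
        r += dt * i_to_r
        return s, e, i, r

    # Illustrative values only: R0 = 0.25 / 0.1 = 2.5.
    n = 1_000_000
    beta, sigma, gamma = 0.25, 1 / 5, 0.1
    state = (n - 10, 0.0, 10.0, 0.0)   # 10 initial active cases
    for day in range(180):
        state = seir_step(*state, beta, sigma, gamma, n)

Adding a quarantine flow to the same loop (I to Q at some detection rate, Q to R at a recovery rate) yields the SEIQR variant described above.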
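The abstract describes a reward that balances suppressing active cases against the economic damage of stricter policies, but does not give its functional form. The sketch below is therefore purely hypothetical: the weights w_health and w_econ and the per-level cost table ECON_COST are invented for illustration.

    # Hypothetical reward: penalize active cases and the economic cost
    # of the chosen stringency level. All weights and costs below are
    # illustrative, not values from the thesis.
    ECON_COST = {0: 0.0, 1: 0.2, 2: 0.5, 3: 1.0}   # cost per policy level

    def reward(active_cases, population, action, w_health=1.0, w_econ=1.0):
        infection_penalty = w_health * (active_cases / population)
        economic_penalty = w_econ * ECON_COST[action]
        return -(infection_penalty + economic_penalty)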
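On the learning side, PPO constrains each policy update with the clipped surrogate objective L = -E[min(r_t * A_t, clip(r_t, 1 - eps, 1 + eps) * A_t)], where r_t is the probability ratio between the new and old policies and A_t is the advantage estimate. A minimal PyTorch sketch, assuming log-probabilities and advantages have already been collected by the parallel A3C-style workers:

    import torch

    def ppo_clip_loss(log_probs_new, log_probs_old, advantages, clip_eps=0.2):
        """Clipped surrogate loss of PPO; minimizing it maximizes
        min(ratio * A, clip(ratio, 1 - eps, 1 + eps) * A)."""
        ratio = torch.exp(log_probs_new - log_probs_old)
        unclipped = ratio * advantages
        clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
        return -torch.min(unclipped, clipped).mean()

Clipping keeps the updated policy close to the one that gathered the data, which stabilizes training when many asynchronous workers feed experience to the same learner.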
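Population-weighted density is the density experienced by the average resident: each sub-area's density is weighted by its population, PWD = sum_i p_i * (p_i / a_i) / sum_i p_i, instead of dividing total population by total area. A small sketch with invented numbers shows why the two measures diverge:

    def population_weighted_density(populations, areas):
        """PWD = sum(p_i * d_i) / sum(p_i), with d_i = p_i / a_i."""
        total = sum(populations)
        return sum(p * (p / a) for p, a in zip(populations, areas)) / total

    # A dense core plus a sparse suburb (illustrative numbers, km^2):
    pops, areas = [900_000, 100_000], [100.0, 900.0]
    conventional = sum(pops) / sum(areas)                 # 1,000 / km^2
    weighted = population_weighted_density(pops, areas)   # ~8,111 / km^2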