dc.description.abstract | Background: Implementing containment measures slowed the spread of COVID-19 but led to a crisis in the world economy. We developed a reinforcement learning (RL) algorithm to support disease management by balancing policies and activities. To shed light on lessons learned from the COVID-19 pandemic for future preparedness, we also conducted a spatiotemporal analysis to examine clustering and risk factors across the 47 prefectures of Japan.
Methods: We designed an RL environment with 4 regions that 1) represented Tokyo, Osaka, Okinawa, and Hokkaido, Japan; 2) were connected through each region’s transport hub; and 3) had 4 separate Susceptible-Exposed-Infectious-Quarantined-Recovered (SEIQR) models. The RL agent was trained by obtaining observations from the environment, granting actions of movement, and receiving feedback from the reward function. The trained agent was introduced into environments mimicking the epidemic waves to observe its performance and action timing. In the spatiotemporal analysis, we applied hierarchical clustering to Pearson correlation coefficients of daily incidences to examine the overall similarity and variation among the 47 prefectures of Japan. We used linear regression to identify risk factors. We also presented each prefecture’s incidence, mortality, and expected value of the reproduction number for each epidemic wave and verified the risk factors using linear regression.
Results: The trained agent flattened the peaks of infectious cases and shortened the epidemics in all 5 epidemic waves covered in the RL study. The agent was often strict on screening but lenient on movement, except for Okinawa, where both actions were generally lenient. Action timing analyses indicated that restriction on movement was elevated when the number of exposed or infectious cases remained high or increased rapidly. Stringency on screening was eased when the number of exposed or infectious cases dropped quickly or reached a regional low. For Okinawa, action on screening was tightened when the number of exposed or infectious cases increased rapidly. The spatiotemporal analysis demonstrated variations in epidemic patterns, with Okinawa and the major metropolitan areas having relatively higher risks and the Tohoku-Chubu prefectures having relatively lower risks. Latitude and vaccination were strong discriminants. The comparison among waves also revealed significant deviations and showed signs of achieving herd immunity in early hotspot prefectures.
Conclusions: The RL experiments exhibited the potential to assist policy-making and demonstrated how the semi-connected SEIQR models created an interactive environment for imitating movement behaviors. Moreover, findings from the spatiotemporal analysis provide critical information regarding regional risk and can support authorities in future resource allocation. | en_US |