This study uses a deep reinforcement learning algorithm to estimate the best time to extubate mechanically ventilated patients. Mechanical ventilation is a common intervention in intensive care units and is usually accompanied by sedative and analgesic drugs that relieve patient discomfort. The decision to extubate has traditionally relied on the physician's experience, with no clear quantitative criterion. It varies with the clinician's expertise and with the patient's physical condition, such as age and weight. Because human judgment is often influenced by subjective factors, support from artificial intelligence can make the decision more objective. From the Medical Information Mart for Intensive Care (MIMIC)-III demo database (v1.4), we select patients who received mechanical ventilation and organize their medication records and their physiological measurements at different time points. These data are used to train a deep Q-network (DQN), and the model's hyperparameters are tuned to improve training performance. Once the agent has learned a policy for choosing the extubation time, we apply off-policy evaluation to assess whether the learned policy outperforms the clinicians' decisions.
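The abstract does not say which off-policy evaluation estimator is used. As an illustration only, the sketch below assumes weighted importance sampling, one common choice, applied to logged trajectories of (state, action, reward) steps. The function name `wis_estimate`, the callables `target_prob` and `behavior_prob`, the discount `gamma`, and the action encoding (0 = continue ventilation, 1 = extubate) are all hypothetical and are not taken from the study.

```python
from typing import Callable, List, Sequence, Tuple

# One logged step: (state features, action, reward).
# Assumed action encoding: 0 = continue ventilation, 1 = extubate.
Step = Tuple[Sequence[float], int, float]
PolicyProb = Callable[[Sequence[float], int], float]


def wis_estimate(
    trajectories: List[List[Step]],
    target_prob: PolicyProb,
    behavior_prob: PolicyProb,
    gamma: float = 0.99,
) -> float:
    """Weighted importance sampling estimate of the target policy's value.

    target_prob(s, a): probability that the learned policy takes a in s.
    behavior_prob(s, a): probability that the clinicians took a in s
    (assumed to be estimated separately and to be non-zero on logged steps).
    """
    weights: List[float] = []
    returns: List[float] = []
    for trajectory in trajectories:
        rho = 1.0  # cumulative importance ratio for this trajectory
        g = 0.0    # discounted return observed under the clinicians
        for t, (state, action, reward) in enumerate(trajectory):
            rho *= target_prob(state, action) / behavior_prob(state, action)
            g += (gamma ** t) * reward
        weights.append(rho)
        returns.append(g)

    total_weight = sum(weights)
    if total_weight == 0.0:
        # No logged trajectory is consistent with the target policy.
        return 0.0
    return sum(w * g for w, g in zip(weights, returns)) / total_weight
```

Under these assumptions, the estimate can be compared with the average discounted return of the logged clinician trajectories. Such a comparison is only as reliable as the estimated behavior probabilities and the overlap between the two policies.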