Cerebrovascular accident, commonly known as stroke, is the second leading cause of death worldwide and the third leading cause of disability. Atrial fibrillation (AF) is an underlying cause of ischemic stroke and is strongly associated with it, yet it is difficult to detect: paroxysmal episodes are often mistaken for an absence of symptoms, so patients go without proper treatment. When AF is detected in a patient with ischemic stroke, the secondary prevention strategy usually changes, because oral anticoagulants are generally more effective than oral antiplatelet drugs in this setting and can reduce the risk of stroke recurrence by two thirds. The primary aim of this study is to use unstructured text data and machine learning algorithms to build an early prediction model for post-stroke AF in patients who have already had an ischemic stroke, and to validate the model on data from electronic medical records. The secondary aim is to compare the predictive performance of models built on structured data with that of models built on unstructured data. We hope the resulting model can support physicians' clinical decisions and help allocate medical resources more effectively.

In the AF prediction experiments, Experiment 1 showed that logistic regression achieved the best evaluation metrics across all feature sets, with structured features combined with text features and a logistic regression classifier performing best (AUC = 0.8324). In Experiment 2, models were built on data from each of two hospitals and validated on the other; the results show that AF prediction models built on unstructured data from a different hospital performed below expectations. Overall, the study shows that adding text features to structured features improves model performance compared with using structured features alone.