A Hybrid Approach to Identifying Heart Disease Risk Factors and Progression in Electronic Medical Records


Name

Chou-Yang Chien (簡舟陽)

Department

Department of Computer Science and Information Engineering

Thesis title

A hybrid approach to identifying heart disease risk factors and progression in electronic medical records
(混合式心臟疾病危險因子與其病程辨識於電子病歷之研究)

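Related theses

★ A Real-time Embedding Increasing for Session-based Recommendation with Graph Neural Networks	★ Modifying the Training Objective Based on Principal Diagnosis for ICD-10 Coding of Discharge Summaries
★ A Rapid Import Method for Requirements Analysis Output Based on PowerDesigner Specifications	★ Question Retrieval in Community Forums
★ Unsupervised Event Type Identification in Historical Texts: A Case Study of Wei-Suo Events in the Ming Shilu (《明實錄》)	★ Analyzing Character Relationships in Literary Novels with Natural Language Processing: An Interactive Visualization
★ Extracting Function-Level Biological Expression Language Statements from Biomedical Text: A K-Nearest-Neighbor Algorithm Inspired by Principal Component Analysis	★ Building Article Representation Vectors from Category Systems for Cross-Lingual Online Encyclopedia Linking
★ Code-Mixing Language Model for Sentiment Analysis in Code-Mixing Data	★ Improving Dialogue State Tracking by Incorporating Multiple Speech Recognition Results
★ A Dialogue System for Chinese Online Customer Service Assistants: A Case Study in the Telecommunications Domain	★ Applying Recurrent Neural Networks to Answer Questions at the Appropriate Time
★ Improving User Intent Classification with Multi-Task Learning	★ Using Transfer Learning to Improve the Pivot Language Approach to Named Entity Transliteration
★ Finding Experts on Community Question Answering Sites with Historical Information Vectors and Topic Expertise Vectors	★ Improving User Intent Classification Performance with the YMCL Model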

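This electronic thesis is authorized for immediate open access.
The open-access full text is licensed to users only for personal, non-profit searching, reading, and printing for the purpose of academic research.
Please comply with the Copyright Act of the Republic of China. Do not reproduce, distribute, adapt, repost, or broadcast the work without authorization, to avoid violating the law.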

Abstract (Chinese)

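Electronic medical records contain a great deal of patient health information, and disease risk factors are a major threat to patient health. Detecting risk factors has therefore become an important goal of medical document mining. Coronary artery disease, a form of heart disease, was the leading cause of death worldwide in 2012-2013, so detecting heart disease risk factors in electronic medical records and tracking their progression can give medical staff a reference for preventing the disease. In medical records, heart disease risk factors are expressed mainly as named entities, tables, parts of sentences, and multi-sentence passages, so it is difficult to determine whether they are present using a single method.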

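In this work, we propose a hybrid approach to this task. We develop a risk factor detection system based on the conditional random fields (CRF) model to identify three major risk factor categories: disease mentions, medication mentions, and smoking status expressions. To identify the other risk factor categories that this detection system cannot find, we additionally design syntactic rules based on physiological indicator values and risk factor keywords. Finally, we apply post-processing based on medical knowledge to filter out inappropriate risk factors. To track the progression of a patient's risk factors, we use a maximum entropy model to label each detected risk factor with its time relative to the document creation time.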
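As a rough illustration of what a rule based on physiological indicator values might look like, the sketch below flags abnormal measurements with regular expressions. The patterns, the label names, and the numeric cutoffs (blood pressure over 140/90, A1c over 6.5, BMI over 30) are assumptions chosen for illustration; the abstract does not list the actual rules used in the thesis.

```python
import re

# Hypothetical rules: (label, pattern capturing the numeric values, abnormality test).
# The cutoffs are illustrative assumptions, not values taken from the thesis.
RULES = [
    ("HYPERTENSION",
     re.compile(r"\b(?:BP|blood pressure)\D{0,15}(\d{2,3})\s*/\s*(\d{2,3})", re.I),
     lambda m: int(m.group(1)) > 140 or int(m.group(2)) > 90),
    ("DIABETES",
     re.compile(r"\b(?:HbA1c|A1c)\D{0,15}(\d{1,2}(?:\.\d+)?)", re.I),
     lambda m: float(m.group(1)) > 6.5),
    ("OBESE",
     re.compile(r"\bBMI\D{0,15}(\d{2}(?:\.\d+)?)", re.I),
     lambda m: float(m.group(1)) > 30),
]


def detect_measurements(text):
    """Return (label, matched text, start offset) for every abnormal measurement."""
    hits = []
    for label, pattern, is_abnormal in RULES:
        for match in pattern.finditer(text):
            if is_abnormal(match):
                hits.append((label, match.group(0), match.start()))
    # Report hits in document order.
    return sorted(hits, key=lambda hit: hit[2])


note = "BP: 150/95 today. HbA1c was 7.2. BMI 24.5."
for label, span, offset in detect_measurements(note):
    print(label, repr(span), offset)
```

Rules of this kind complement a sequence labeler because a measurement only counts as a risk factor indicator when its value crosses a cutoff, which is a numeric comparison rather than a pattern over tokens.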

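We evaluate the system on the test set of Track 2 of the 2014 i2b2 natural language processing challenge. The CRF-based system achieves an F-score of 88.27%. Adding the syntactic rules raises the F-score to 89.74%, an improvement of 1.47 percentage points, and adding post-processing yields our best F-score to date, 91.74%, a further improvement of 2.00 percentage points.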

Abstract (English)

Patients' electronic medical records provide detailed health information, and disease risk factors are a major threat to patient health, which makes them an important target for medical text mining. Coronary artery disease was the leading cause of death from 2012 to 2013, so detecting heart disease risk factors and tracking their progression over sets of longitudinal records can help clinicians monitor and prevent the disease. Risk factors appear in medical records as named entities, part-of-sentence expressions, tabular entries, and multi-sentence expressions; it is therefore difficult to detect them with a single approach.

In this paper, we present a hybrid approach to this task. We develop three systems based on the conditional random fields (CRF) model, each targeting one of three major risk factor categories: disease, medication, and smoker. To recognize risk factors not found by our CRF-based systems, we formulate syntactic rules based on physiological indicators and risk factor keywords. To track patient progression longitudinally, we also use a maximum entropy model to label the identified risk factor mentions with tags that describe their relation to the document creation time.
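The temporal labeling step can be pictured with a small maximum entropy classifier (equivalently, multinomial logistic regression) over binary features. The sketch below is a minimal pure-Python version trained by gradient ascent. The label names, the feature strings, and the toy training examples are all hypothetical; the abstract does not describe the features or the optimizer used in the thesis.

```python
import math
from collections import defaultdict

# Hypothetical tag set: relation of a risk factor mention to the document creation time (DCT).
LABELS = ["before_dct", "during_dct", "after_dct"]


def predict_proba(weights, feats, labels):
    """Softmax over the summed weights of the active (feature, label) pairs."""
    scores = {y: sum(weights.get((f, y), 0.0) for f in feats) for y in labels}
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {y: math.exp(s - top) for y, s in scores.items()}
    z = sum(exps.values())
    return {y: e / z for y, e in exps.items()}


def train_maxent(samples, labels, epochs=200, lr=0.5):
    """Gradient ascent on the conditional log-likelihood of the training samples."""
    weights = defaultdict(float)
    for _ in range(epochs):
        grad = defaultdict(float)
        for feats, gold in samples:
            probs = predict_proba(weights, feats, labels)
            for f in feats:
                grad[(f, gold)] += 1.0        # empirical feature count
                for y in labels:
                    grad[(f, y)] -= probs[y]  # expected count under the model
        for key, g in grad.items():
            weights[key] += lr * g / len(samples)
    return weights


def predict(weights, feats, labels):
    probs = predict_proba(weights, feats, labels)
    return max(labels, key=lambda y: probs[y])


# Toy, made-up training data: feature lists paired with a gold tag.
TOY = [
    (["section=past_medical_history", "tense=past"], "before_dct"),
    (["section=past_medical_history", "cue=history_of"], "before_dct"),
    (["section=physical_exam", "tense=present"], "during_dct"),
    (["section=vitals", "tense=present"], "during_dct"),
    (["section=plan", "cue=will"], "after_dct"),
    (["section=plan", "cue=continue"], "after_dct"),
]

weights = train_maxent(TOY, LABELS)
print(predict(weights, ["section=plan", "cue=will"], LABELS))
```

A maximum entropy model suits this step because each mention is classified independently from overlapping contextual cues, such as the section it appears in or nearby tense markers, without requiring those cues to be independent of one another.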

Our experimental results show that our CRF-based systems achieve an F-score of 88.27% on the i2b2 2014 Track 2 test dataset. Adding the rules improves the F-score by 1.47 percentage points, to 89.74%. Finally, combining this system with post-processing improves the F-score by a further 2.00 percentage points, to 91.74%.
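For readers who want to check the arithmetic, the sketch below computes precision, recall, and F-score from match counts and recomputes the gains between the three reported configurations. It assumes the F-score is the usual balanced F1; the counts passed to `f_score` are invented for illustration, and only the three percentages come from the abstract.

```python
def f_score(tp, fp, fn):
    """Return (precision, recall, F1) from true positive, false positive, and false negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Hypothetical counts, used only to show the formula in action.
precision, recall, f1 = f_score(tp=8, fp=2, fn=2)
print(f"P={precision:.2f} R={recall:.2f} F1={f1:.2f}")

# F-scores (%) reported in the abstract, in the order the components were added.
reported = [
    ("CRF", 88.27),
    ("CRF + rules", 89.74),
    ("CRF + rules + post-processing", 91.74),
]

# Gain of each configuration over the previous one, in percentage points.
gains = [
    (name, round(score - reported[i][1], 2))
    for i, (name, score) in enumerate(reported[1:])
]
for name, gain in gains:
    print(f"{name}: +{gain:.2f} points")
```

The gains are differences between percentages, so they are percentage points rather than relative improvements.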

Keywords (Chinese)

★ Biomedical mining
★ Natural language processing
★ Machine learning

Keywords (English)

★ Biomedical information
★ Natural language processing
★ Machine learning

Table of contents

Chinese Abstract
English Abstract
Acknowledgments
Contents
List of Figures
List of Tables
1. Introduction
1.1 Background
1.1.1 Medical Records
1.1.2 Electronic Medical Records
1.1.3 Heart Disease
1.1.4 Coronary Heart Disease
1.1.5 Tracking Heart Disease Risk Factors
1.2 Research Objectives
1.3 Problem Description
2. Literature Review
2.1 Medical Named Entity Recognition
2.2 Relative Temporal Relation Detection
3. Methods
3.1 Problem Design
3.2 Dataset Annotation Processing
3.3 System Architecture
3.4 Preprocessing
3.5 CRF-Based Risk Factor Detection
3.5.1 Conditional Random Fields Model
3.5.2 Feature Extraction
3.6 Rule-Based Risk Factor Detection
3.6.1 Physiological Measurement Detection Rules
3.6.2 Risk Factor Keywords
3.6.3 Drug Dictionary-Based Tagging
3.7 Post-processing
3.7.1 Disease Mention Filtering
3.7.2 Medication Normalization
3.7.3 Smoking Status Expression Classification
3.7.4 Negated Risk Factor Filtering
3.8 Detecting the Temporal Relation Between Risk Factors and the Document Creation Time
3.8.1 Maximum Entropy Model
3.8.2 Feature Extraction
4. Experiments and Evaluation
4.1 Dataset
4.2 Evaluation
4.3 Results
4.3.1 CRF-Based Risk Factor Detection
4.3.2 Risk Factor Detection
5. Discussion and Analysis
5.1 i2b2 2014 NLP shared task 2
5.2 Discussion
5.3 Error Analysis
5.3.1 Errors in CRF-Based Risk Factor Recognition
5.3.2 Errors in Rule-Based Risk Factor Recognition
6. Conclusion
6.1 Future Work
References

Advisor

Richard Tzong-Han Tsai (蔡宗翰)

Approval date

2015-8-25
