Thesis 103522078: Detailed Record




Author: 羅玉燕 (Yu-Yan Lo)   Department: Computer Science and Information Engineering
Title: 基於生醫文本擷取功能性層級之生物學表徵語言敘述:由主成分分析發想之K近鄰算法
(Extracting Function-Level Statements in Biological Expression Language from Biomedical Literature: A K Nearest Neighbor Approach Inspired by Principal Component Analysis)
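Related Theses
★ A Real-time Embedding Increasing for Session-based Recommendation with Graph Neural Networks
★ Modifying Training Objectives Based on the Principal Diagnosis for ICD-10 Coding of Discharge Summaries
★ Hybrid Identification of Heart Disease Risk Factors and Their Progression in Electronic Medical Records
★ A Rapid Adoption Method Based on Requirements Analysis Outputs Specified with PowerDesigner
★ Question Retrieval in Community Forums
★ Unsupervised Event Type Identification in Historical Texts: A Case Study of Garrison (Weisuo) Events in the Ming Shilu
★ Analyzing Character Relationships in Literary Novels with Natural Language Processing: An Interactive Visualization
★ Building Article Representation Vectors from a Classification System for Cross-Language Online Encyclopedia Linking
★ Code-Mixing Language Model for Sentiment Analysis in Code-Mixing Data
★ Improving Dialogue State Tracking by Incorporating Multiple Speech Recognition Results
★ Applying Dialogue Systems to a Chinese Online Customer Service Assistant: A Case Study in the Telecommunications Domain
★ Applying Recurrent Neural Networks to Answering Questions at the Appropriate Time
★ Improving User Intent Classification with Multi-Task Learning
★ Using Transfer Learning to Improve Pivot-Language Methods for Named Entity Transliteration
★ Finding Experts on Community Question Answering Sites Using Historical Information Vectors and Topic Expertise Vectors
★ Improving User Intent Classification Performance with the YMCL Model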
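  1. The author has authorized immediate open access to this electronic thesis.
  2. Open-access full texts are licensed to users solely for personal, non-profit searching, reading, and printing for academic research purposes.
  3. Please comply with the Copyright Act of the Republic of China (Taiwan): do not reproduce, distribute, adapt, repost, or broadcast this work without authorization.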

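Abstract (Chinese, translated): Understanding protein signaling pathways in living organisms has long been one of the main goals of biomedical research, because these pathways involve many regulatory mechanisms. Different combinations of regulatory interactions form different protein signaling pathways, and these pathways are interrelated, linking into an intricate signal transduction network. In recent years, advances in biomedical experimental techniques and easier information exchange have caused the biomedical literature to grow sharply, and the demand for biomedical text mining has grown with it. The Biological Expression Language (BEL) is a notation for describing biomedical signaling networks. It can describe not only positive and negative regulatory relationships between two biomedical entities (genes, proteins, chemical compounds, and so on) but also function-level information about those entities, for example whether an entity is a complex, a chaperone protein, or plays a catalytic role. In related research, the most recently reported performance for extracting function-level BEL statements is 30.5%, and this step limits the completeness of downstream automatic BEL extraction. To improve the completeness of BEL statements, we propose a K-nearest neighbor (KNN) approach inspired by principal component analysis (PCA) to recognize function-level biomedical entities automatically. In our experiments we also present a function-level entity classification method designed for imbalanced datasets, and we compare the strengths and weaknesses of a support vector machine (SVM) approach with those of the PCA-inspired KNN approach. The results show that the PCA-inspired KNN approach classifies the imbalanced dataset better, reaching a score of 59.70%. We expect this automatic method for recognizing function-level entities to make future biomedical signaling networks more complete and thereby accelerate biomedical and pharmaceutical research.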
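The abstract mentions a classification method designed for the imbalanced dataset, and the table of contents lists a "Solution of Imbalanced Data" and a "Voting Mechanism" for the SVM-based method, but the record gives no details. One common way to realize such a scheme, sketched here only as an assumption, is to split the majority-class samples into chunks the size of the minority class, train one base classifier per balanced subset, and combine the classifiers by majority vote. To stay dependency-free, the sketch swaps the SVM for a nearest-centroid classifier. Every function name and the toy data are hypothetical.

```python
from collections import Counter


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def fit_centroids(points, labels):
    """Stand-in base learner (NOT the SVM used in the thesis): one mean vector per label."""
    groups = {}
    for p, lbl in zip(points, labels):
        groups.setdefault(lbl, []).append(p)
    return {lbl: [sum(col) / len(ps) for col in zip(*ps)] for lbl, ps in groups.items()}


def predict_centroid(model, x):
    """Return the label whose centroid is closest to x."""
    return min(model, key=lambda lbl: sq_dist(model[lbl], x))


def fit_balanced_ensemble(points, labels, minority):
    """Pair all minority samples with consecutive, equally sized chunks of the
    majority samples; a trailing chunk shorter than the minority class is dropped."""
    pos = [p for p, lbl in zip(points, labels) if lbl == minority]
    neg = [(p, lbl) for p, lbl in zip(points, labels) if lbl != minority]
    size = len(pos)
    if size == 0:
        raise ValueError("no minority samples")
    models = []
    for start in range(0, len(neg) - size + 1, size):
        chunk = neg[start:start + size]
        subset_points = pos + [p for p, _ in chunk]
        subset_labels = [minority] * size + [lbl for _, lbl in chunk]
        models.append(fit_centroids(subset_points, subset_labels))
    return models


def predict_vote(models, x):
    """Majority vote over the base models (ties go to the label counted first)."""
    return Counter(predict_centroid(m, x) for m in models).most_common(1)[0][0]


# Hypothetical 2-D feature vectors: 2 "complex" terms vs. 6 "none" terms.
POINTS = [(1.0, 1.0), (0.9, 1.1),
          (0.0, 0.0), (0.1, 0.2), (0.2, 0.0), (0.0, 0.1), (0.1, 0.1), (0.3, 0.2)]
LABELS = ["complex", "complex"] + ["none"] * 6

if __name__ == "__main__":
    models = fit_balanced_ensemble(POINTS, LABELS, minority="complex")
    print(len(models), predict_vote(models, (0.95, 1.0)), predict_vote(models, (0.1, 0.0)))
    # prints: 3 complex none
```

Drawing a single random undersample would discard most of the majority class. Partitioning it instead lets the ensemble use most of the majority data while each base model still trains on balanced classes. Shuffling the majority samples before chunking would be a natural refinement.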
Abstract (English): Understanding biological pathways is one of the main goals of biomedical research, because these pathways involve a wide range of regulatory mechanisms. Many of these mechanisms have been discovered and reported in the biomedical literature, which is how life scientists keep up with the latest results. As a result, text mining for biomedical research is in high demand within the scientific community. The Biological Expression Language (BEL) is designed to capture relationships between biological entities, such as genes, proteins, and chemicals, described in the scientific literature. BEL can describe not only positive and negative relationships between biomedical entities but also function-level information, such as complex abundance, chaperone proteins, and catalysts. In related research, the most recently reported performance for function-level classification is 30.5%, and this step limits the performance of full BEL statement extraction. To improve the completeness of full BEL statements, we propose a K-nearest neighbor (KNN) approach inspired by Principal Component Analysis (PCA) to recognize function-level terms automatically. In our experiments, the combination of PCA and KNN outperforms an SVM-based method, achieving an F-score of 59.70%. We hope that better function-level classification will not only make full BEL statements more complete but also help construct more complete biological networks and accelerate biomedical research for life scientists.
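The record names the stages of the second method only in the table of contents (one-hot vector converter, PCA, KNN classifier) and does not say how they are connected. The Python sketch below assumes the simplest reading. Each candidate term is described by a set of string features and encoded as a one-hot vector. The vector is then projected onto the top principal components and labeled by a majority vote of its nearest training neighbors. The feature names, the toy labels (loosely modeled on BEL's `complex()`, `chap()`, and `cat()` functions), and all function names are hypothetical. PCA uses a plain Jacobi eigen-decomposition so that the example needs only the standard library.

```python
import math
from collections import Counter


def one_hot(samples, vocab):
    """Encode each sample (a set of feature strings) as a 0/1 vector over vocab.
    Features missing from vocab are ignored."""
    index = {f: i for i, f in enumerate(vocab)}
    vectors = []
    for feats in samples:
        v = [0.0] * len(vocab)
        for f in feats:
            if f in index:
                v[index[f]] = 1.0
        vectors.append(v)
    return vectors


def jacobi_eigen(a, sweeps=50, tol=1e-12):
    """Eigen-decompose a symmetric matrix with cyclic Jacobi rotations.
    Returns (eigenvalues, V), where column i of V is the i-th eigenvector."""
    n = len(a)
    a = [row[:] for row in a]
    v = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(sweeps):
        if sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j) < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # columns p, q:  A <- A J
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # rows p, q:  A <- J^T A
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # accumulate eigenvectors:  V <- V J
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    return [a[i][i] for i in range(n)], v


def pca_fit(x, n_components):
    """Center the data and keep the covariance eigenvectors with the largest eigenvalues."""
    n, d = len(x), len(x[0])
    mean = [sum(row[j] for row in x) / n for j in range(d)]
    centered = [[row[j] - mean[j] for j in range(d)] for row in x]
    denom = max(n - 1, 1)
    cov = [[sum(r[i] * r[j] for r in centered) / denom for j in range(d)] for i in range(d)]
    eigvals, vecs = jacobi_eigen(cov)
    order = sorted(range(d), key=lambda i: eigvals[i], reverse=True)[:n_components]
    components = [[vecs[r][i] for r in range(d)] for i in order]
    return mean, components


def pca_transform(vec, mean, components):
    """Project a vector onto the principal components after centering it."""
    centered = [a - b for a, b in zip(vec, mean)]
    return [sum(c * z for c, z in zip(comp, centered)) for comp in components]


def knn_predict(points, labels, query, k=1):
    """Majority vote among the k training points closest to query (Euclidean)."""
    ranked = sorted(zip((math.dist(p, query) for p in points), labels))
    return Counter(lbl for _, lbl in ranked[:k]).most_common(1)[0][0]


# Hypothetical toy data: feature sets for candidate terms and their function labels.
VOCAB = ["w:bind", "w:complex", "w:chaperone", "w:fold", "w:catalyze", "w:enzyme"]
TRAIN = [
    ({"w:bind", "w:complex"}, "complex"),
    ({"w:chaperone", "w:fold"}, "chap"),
    ({"w:catalyze", "w:enzyme"}, "cat"),
]


def classify(feature_sets, n_components=2, k=1):
    """One-hot encode, fit PCA on the training set, then label queries by KNN."""
    x = one_hot([feats for feats, _ in TRAIN], VOCAB)
    labels = [lbl for _, lbl in TRAIN]
    mean, comps = pca_fit(x, n_components)
    projected = [pca_transform(row, mean, comps) for row in x]
    queries = one_hot(feature_sets, VOCAB)
    return [knn_predict(projected, labels, pca_transform(q, mean, comps), k) for q in queries]


if __name__ == "__main__":
    print(classify([{"w:bind"}, {"w:fold"}, {"w:enzyme", "w:unseen"}]))
    # prints ['complex', 'chap', 'cat']
```

With three toy training samples, the centered data span a two-dimensional subspace, so keeping two components loses no information about the training points. On real feature sets, `n_components` and `k` would have to be tuned.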
Keywords: ★ Biomedical text mining
★ Biological Expression Language
★ Machine learning
★ Principal component analysis
★ K-nearest neighbor
Table of Contents:
Chinese Abstract
English Abstract
Contents
List of Figures
List of Tables
Chapter 1 Introduction
Chapter 2 Literature Reviews
2.1 Biomedical Text Mining for Pathway Event Extraction
2.2 Extraction of Function-Level Information in Biological Expression Language (BEL)
Chapter 3 Method
3.1 Problem Statement of Function-Level Classification
3.2 Feature Extraction
3.2.1 Basic Features
3.2.2 BEL Term-Level Features
3.2.3 Advanced Features
3.3 System Flow of Proposed Method I
3.3.1 Support Vector Machine (SVM)
3.3.2 Solution of Imbalanced Data
3.3.3 Voting Mechanism
3.3.4 Post-Processing
3.4 System Flow of Proposed Method II
3.4.1 One-Hot Vector Converter
3.4.2 Principal Component Analysis (PCA)
3.4.3 K-Nearest Neighbor (KNN) Classifier
Chapter 4 Experiment and Evaluation
4.1 Dataset
4.2 Experimental Setting
4.2.1 Baseline
4.2.2 The Proposed Method I
4.2.3 The Proposed Method II
4.3 Evaluation Metric
4.4 Experimental Results
Chapter 5 Discussion
Chapter 6 Biology Interpretation
Chapter 7 Conclusion
References
Advisor: 蔡宗翰 (Tzong-Han Tsai)   Approval date: 2016-08-19

For questions about this thesis, please contact the Outreach Services Section, National Central University Library, Tel: (03) 422-7151 ext. 57407, or by e-mail.