Thesis record 106522109: details




Name: 張雅萍 (Ya-Ping Chang)   Department: Computer Science and Information Engineering
Thesis title: Prediction System for Protein Lysine Malonylation Sites (蛋白質賴氨酸丙二酰化修飾作用位點之預測系統)
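Related theses
★ Research on the association between air pollutants and diseases, and disease prediction using deep learning
★ Rapid detection of antimicrobial resistance in Staphylococcus aureus using mass spectrometry data
★ Identifying characteristic peaks of antimicrobial resistance in Escherichia coli from mass spectrometry data
★ Machine learning-based prediction of antimicrobial peptide activity and feature analysis
★ A multi-label classifier for predicting multiple functional classes of antimicrobial peptides
★ Predicting groundwater arsenic contamination in the Zhuoshui River alluvial fan using machine learning
★ Predicting antimicrobial resistance of methicillin-resistant Staphylococcus aureus in different regions from mass spectrometry data using deep learning
★ EnHemo: an ensemble framework integrating protein language models for identifying the hemolytic toxicity of highly active antimicrobial peptides
★ Prediction and study of kinase activity profiles integrating phosphoproteomic data and deep learning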
Full text: not open for online access
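Abstract (Chinese) Lysine malonylation is a newly discovered protein post-translational modification (PTM). Proteins carrying this modification play important roles in many biological functions, for example the glucose and fatty acid metabolic pathways of animal proteins, the pathogenesis of type 2 diabetes, and carbon metabolism in plants. Even so, research on the mechanisms of malonylation remains quite limited, and identifying malonylation sites is an important step in analyzing those mechanisms. Traditional verification relies mainly on a series of laboratory experiments and analyses, which are costly in labor, time, and money, so identifying PTM sites with computational biology techniques has become an important research topic. Most existing computational studies of lysine malonylation focus on mammalian proteins, and there is not yet a dedicated tool for predicting lysine malonylation sites in plant proteins. This study therefore proposes deep learning methods to identify malonylation sites in both mammalian and plant proteins. We extracted features from the physicochemical properties, evolutionary information, and sequence information of protein amino acids, and used hybrid deep learning models to identify malonylated positions. On independent test sets, the models for mammalian and plant proteins achieved AUC (area under the receiver operating characteristic curve) values of 0.943 and 0.772, respectively. Finally, a website (Kmalo, http://fdblab.csie.ncu.edu.tw/Kmalo/) was built to provide both prediction models.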
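Both abstracts describe extracting sequence-based features around candidate lysine residues. This record does not give the window size or encoding the thesis used, so the following Python sketch is illustrative only: it assumes a symmetric window of `half_window` residues on each side of every lysine (K), pads windows that run off either terminus with a hypothetical `X` symbol, and applies a simple one-hot (binary) encoding. The names `extract_windows`, `one_hot_encode`, and the default window size are hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
PAD = "X"  # hypothetical symbol for positions beyond either sequence terminus


def extract_windows(sequence, half_window=15):
    """Return (position, window) for every lysine (K) in the sequence.

    Each window is centred on the lysine and has length 2 * half_window + 1.
    The half_window default is an assumption, not the thesis's value.
    """
    padded = PAD * half_window + sequence + PAD * half_window
    windows = []
    for i, residue in enumerate(sequence):
        if residue == "K":
            # residue i of the sequence sits at index i + half_window in padded,
            # so its window spans padded[i : i + 2 * half_window + 1]
            windows.append((i, padded[i:i + 2 * half_window + 1]))
    return windows


def one_hot_encode(window):
    """Binary encoding with 21 bits per position (20 amino acids + PAD)."""
    alphabet = AMINO_ACIDS + PAD
    vector = []
    for residue in window:
        bits = [0] * len(alphabet)
        idx = alphabet.find(residue)
        # non-standard residues (e.g. B, U, Z) share the PAD slot
        bits[idx if idx >= 0 else len(alphabet) - 1] = 1
        vector.extend(bits)
    return vector


if __name__ == "__main__":
    for pos, win in extract_windows("MKAVLKG", half_window=3):
        print(pos, win, len(one_hot_encode(win)))
```

For example, `extract_windows("MKAVLKG", half_window=3)` returns `[(1, "XXMKAVL"), (5, "AVLKGXX")]`, and each 7-residue window encodes to a 147-element vector (7 positions of 21 bits). Physicochemical or evolutionary (PSSM) features could be substituted for the one-hot bits by mapping each residue to property values or profile scores instead.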
Abstract (English) Lysine malonylation is one of the newly recognized post-translational modifications (PTMs) and is involved in many biological functions, such as cellular regulation, disease processes, and carbon fixation. To better understand the mechanisms of malonylation, identifying malonylation sites is an essential step. Traditionally, these sites have been identified mainly through mass spectrometry and biological experiments, which are time-consuming, labor-intensive, and expensive. Recently, several studies have proposed computational approaches to predict malonylation sites in mammalian proteins; however, no predictor yet exists for malonylation sites in plant proteins. In this study, we developed two deep learning-based frameworks for identifying malonylation sites in mammalian and plant proteins, respectively. Physicochemical properties, evolutionary information, and sequence-based features were extracted to train the prediction models, and hybrid deep learning models were used to predict the malonylation sites. On independent test sets, the models for mammalian and plant proteins achieved area under the receiver operating characteristic curve (AUC) values of 0.943 and 0.772, respectively. The prediction models are freely available as an online server, Kmalo, at http://fdblab.csie.ncu.edu.tw/Kmalo/.
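The AUC values reported above (0.943 and 0.772) summarize how well each model ranks malonylation sites above non-sites. As a minimal illustration, not the thesis's own evaluation code, the sketch below computes AUC from its probabilistic definition: the fraction of (positive, negative) pairs in which the positive example receives the higher score, with ties counted as one half. The function name `roc_auc` and the toy data are hypothetical.

```python
def roc_auc(labels, scores):
    """AUC as P(score of a random positive > score of a random negative).

    labels: 1 for a malonylation site, 0 for a non-site.
    scores: predicted scores or probabilities, same length as labels.
    Tied (positive, negative) pairs contribute 0.5.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    labels = [1, 1, 0, 0]
    scores = [0.9, 0.4, 0.6, 0.1]
    print(roc_auc(labels, scores))  # 0.75
```

The pairwise loop is O(n_pos * n_neg), which is acceptable for a sketch; a rank-based formulation, or scikit-learn's `sklearn.metrics.roc_auc_score`, computes the same quantity more efficiently on large test sets.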
Keywords ★ lysine malonylation
★ deep learning
★ convolutional neural network
★ random forest
★ support vector machine
Table of contents
Chinese abstract
Abstract
Acknowledgments
Table of contents
List of figures
List of tables
Chapter 1. Introduction
1.1. Background
1.2. Related works
1.3. Motivation
1.4. Research goal
Chapter 2. Materials and methods
2.1. Data resources
2.2. Data preprocessing
2.3. Feature extraction
2.3.1. Sequence-based features
2.3.2. Physicochemical properties
2.3.3. Evolutionary information
2.4. Construction of predictive models
2.4.1. Random forest (RF)
2.4.2. Support vector machine (SVM)
2.4.3. Convolutional neural network (CNN)
2.4.4. Hybrid models
2.5. Evaluation metrics
Chapter 3. Results
3.1. Sequence analysis
3.2. Feature analysis
3.3. Functional analysis
3.4. Determination of the window size and model for each feature
3.5. Classification performance
3.6. Comparison with existing tools
3.7. Web tool
Chapter 4. Discussion and conclusions
References
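Section 2.4.4 lists hybrid models built from the RF, SVM, and CNN classifiers, but this record does not describe how they are combined. One common way to combine classifiers is weighted soft voting, i.e., averaging each model's predicted probability that a lysine is malonylated. The sketch below shows that technique only as an assumption; the weights, threshold, and function names (`soft_vote`, `predict_labels`) are hypothetical and are not taken from the thesis.

```python
def soft_vote(prob_lists, weights):
    """Weighted average of positive-class probabilities from several models.

    prob_lists[m][i] is model m's probability that sample i is a malonylation site.
    """
    if len(prob_lists) != len(weights) or not prob_lists:
        raise ValueError("need one weight per model and at least one model")
    n_samples = len(prob_lists[0])
    if any(len(p) != n_samples for p in prob_lists):
        raise ValueError("all models must score the same samples")
    total = float(sum(weights))
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return [
        sum(w * probs[i] for w, probs in zip(weights, prob_lists)) / total
        for i in range(n_samples)
    ]


def predict_labels(probabilities, threshold=0.5):
    """Label a site as malonylated (1) when its combined probability reaches the threshold."""
    return [1 if p >= threshold else 0 for p in probabilities]


if __name__ == "__main__":
    cnn_probs = [0.9, 0.2, 0.6]  # hypothetical CNN outputs
    rf_probs = [0.7, 0.4, 0.3]   # hypothetical random forest outputs
    combined = soft_vote([cnn_probs, rf_probs], weights=[0.6, 0.4])
    print(predict_labels(combined))  # [1, 0, 0]
```

Because the weights are normalized, the combined score stays a probability in [0, 1] and can be passed straight to an AUC evaluation. Stacking, which trains a meta-classifier on the base-model outputs, is a common alternative to fixed weights.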
Advisors: 洪炯宗 (Jorng-Tzong Horng), 吳立青 (Li-Ching Wu)   Approval date: 2019-07-10

For questions about this thesis, contact the Outreach Services Section, National Central University Library, Tel: (03) 422-7151 ext. 57407, or by e-mail.