Graduate Thesis 109522091 — Detailed Record




Name  Chung-Yu Chien (簡崇宇)    Department  Computer Science and Information Engineering
Thesis Title  Prediction of Antimicrobial Minimum Inhibitory Concentrations for Specific Strains Using Artificial Intelligence-based Model
Related Theses
★ Identification of Multidrug Resistance in Klebsiella pneumoniae from Mass Spectrometry Data Using Machine Learning
★ Combining Multiple Signal Preprocessing Methods on Mass Spectrometry Data to Identify Bacterial Antibiotic Resistance
★ Predicting Groundwater Levels in the Zhuoshui River Alluvial Fan Area Using Machine Learning
★ Anomaly Detection for Wafer Wire-Saw Machines Using Representation Learning and Machine Learning Methods
★ Identifying Characteristic Peaks of Ciprofloxacin Resistance in Gram-Negative Bacteria from Mass Spectrometry Data Using Artificial Intelligence Methods
★ Applying Digital Twins to Anomaly Detection in Motor Bearings
★ Detecting Drug Resistance in Fluids Based on Optically Induced Dielectrophoresis Image Processing
★ Predicting Land Subsidence in the Zhuoshui River Alluvial Fan Area from Multiple Types of Stratum Monitoring Data Using Machine Learning Methods
★ Identifying Multifunctional Antimicrobial Peptides Using a Deep Learning Method with Language Model Embeddings and Imbalance Adjustment
★ Predicting Land Subsidence in Yunlin County Using a Weighted Combination Model
★ Inferring the Cellular Composition of Peripheral Blood Mononuclear Cells from RNA Sequencing Expression Profiles Using Deep Learning
Files  Electronic full text: not available through the system (access permanently restricted)
Abstract (Chinese)  Due to the overuse of antibiotics, microbial pathogens have developed resistance to them, and there is an urgent need to develop alternative therapies for treating infections. Antimicrobial peptides (AMPs) are small proteins with broad inhibitory activity against bacteria, fungi, parasites, and viruses, and they have therefore become one of the novel classes of anti-infective agents in recent years. From a microbiological perspective, the minimum inhibitory concentration (MIC) is the lowest concentration that inhibits bacterial growth and is an important indicator for evaluating drug activity. The main objective of this study is to build a regression model that predicts exact MIC values of antimicrobial peptides. We used eight different model architectures, combined with a variety of sequence features and genomic features, to verify the robustness of the frameworks. We ultimately adopted the contextual embeddings generated by a protein language model, combined them with genomic features, and applied them in a deep learning architecture, obtaining good evaluation results. We then used an ensemble learning approach to combine the results of the three best supervised learning models and evaluated the ensemble model. Tests on the Staphylococcus aureus ATCC 25923, Escherichia coli ATCC 25922, and Pseudomonas aeruginosa ATCC 27853 datasets yielded Pearson correlation coefficients of 0.756, 0.781, and 0.802, respectively. These three bacteria are listed by the World Health Organization as strains requiring urgent research. These results indicate that our ensemble model predicts the minimum inhibitory concentration with a reasonable level of accuracy and performs well.
Abstract (English)  Due to the excessive use of antibiotics, microbial pathogens have developed resistance to them, necessitating the urgent development of alternative therapies for infections. Antimicrobial peptides (AMPs) are small proteins that exhibit broad inhibitory effects against bacteria, fungi, parasites, and viruses. As a result, AMPs have emerged as a novel class of antimicrobial agents in recent years. In microbiology, the Minimum Inhibitory Concentration (MIC) refers to the lowest concentration that can inhibit bacterial growth and serves as an important indicator of drug activity. The primary objective of this study is to construct a regression model for predicting the MIC values of AMPs. Eight different model architectures were employed, along with various sequence features and genomic features, to assess the robustness of the frameworks. In this study, we ultimately utilized the contextual embeddings generated by a protein language model, combined with genomic features, in a deep learning architecture, achieving good evaluation results. Through an ensemble learning approach, the results of the three top-performing supervised learning models were combined, and the ensemble model was evaluated. Pearson correlation coefficients of 0.756, 0.781, and 0.802 were obtained on the test datasets for Staphylococcus aureus ATCC 25923, Escherichia coli ATCC 25922, and Pseudomonas aeruginosa ATCC 27853, respectively. These three strains are listed by the World Health Organization as requiring urgent research. These results demonstrate that our ensemble model predicts MIC values with a reasonable level of accuracy and performs well.
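The abstract summarizes the modeling workflow: peptide sequence features (including protein-language-model embeddings) and genomic features feed several regressors, the three best supervised models are combined by ensemble learning, and predictions are scored with the Pearson correlation coefficient. The sketch below is a minimal, illustrative version of that ensemble-and-evaluation step only, not the thesis implementation; the three tree-based regressors, the randomly generated feature matrices, and all variable names are placeholders standing in for the thesis's actual top-three models and its embedding/genomic feature tables.

# Minimal sketch (not the thesis code): average the MIC predictions of three
# placeholder regressors and score the ensemble with the Pearson correlation
# coefficient, the evaluation metric reported in the abstract.
import numpy as np
from scipy.stats import pearsonr
from sklearn.ensemble import RandomForestRegressor
from xgboost import XGBRegressor
from lightgbm import LGBMRegressor


def ensemble_predict(models, X):
    """Average the predictions of the individual base models."""
    return np.mean([m.predict(X) for m in models], axis=0)


# Placeholder data: in the thesis these would be peptide-derived features
# (e.g., language-model embeddings concatenated with genomic features) and
# the corresponding MIC values for one target strain.
rng = np.random.default_rng(0)
X_train, y_train = rng.normal(size=(200, 32)), rng.normal(size=200)
X_test, y_test = rng.normal(size=(50, 32)), rng.normal(size=50)

# Three base regressors; the thesis combines its three best supervised models,
# which may differ from the ones shown here.
models = [
    RandomForestRegressor(n_estimators=200, random_state=0).fit(X_train, y_train),
    XGBRegressor(n_estimators=200, random_state=0).fit(X_train, y_train),
    LGBMRegressor(n_estimators=200, random_state=0).fit(X_train, y_train),
]

y_pred = ensemble_predict(models, X_test)
pcc, _ = pearsonr(y_test, y_pred)
print(f"Pearson correlation coefficient: {pcc:.3f}")

Simple prediction averaging is only one way to combine base models; weighted averaging or a stacked meta-learner would follow the same pattern of collecting per-model predictions and scoring the combined output against held-out MIC values.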
Keywords  ★ Antimicrobial peptides
★ Minimum Inhibitory Concentration
★ Regression model
★ Protein language model
Table of Contents
Chinese Abstract i
Abstract ii
Table of Contents iii
List of Figures vi
List of Tables ix
Chapter 1 Introduction 1
1.1 Background 1
1.2 Related Works 2
1.3 Motivation and Goal 4
Chapter 2 Materials and Methods 6
2.1 Data Collection and Preprocessing 6
2.2 Feature Extraction 9
2.2.1 iFeature 9
2.2.2 Sequence Encoding 10
2.2.3 Pre-trained Embeddings 12
2.2.4 Genome Sequence Features 12
2.3 Model Building 13
2.3.1 Random Forest (RF) 14
2.3.2 Extreme Gradient Boosting (XGBoost) 14
2.3.3 Categorical Boosting (CatBoost) 15
2.3.4 Light Gradient Boosting Machine (LGBM) 15
2.3.5 Bi-Directional Long Short-Term Memory (BiLSTM) 16
2.3.6 Convolutional Neural Network (CNN) 17
2.3.7 Transformer 18
2.3.8 Multi-Branch Model 20
2.3.9 Ensemble Model 21
2.4 Evaluation Metrics 21
Chapter 3 Results 23
3.1 Observation of AMP 23
3.2 Models Using Only AMP Sequence-based Features 31
3.3 Models Combining Genomic Features and AMP Sequence-based Features 39
3.4 Comparison of All Models with the Ensemble Model 45
3.5 Comparison with Other Studies 48
Chapter 4 Conclusions 51
References 52
Advisors  Jorng-Tzong Horng (洪炯宗), 吳立青    Date of Approval  2023-07-27
