Abstract: | Overuse of antibiotics has led microbial pathogens to develop resistance, creating an urgent need for alternative therapies to treat infections. Antimicrobial peptides (AMPs) are small proteins with broad inhibitory activity against bacteria, fungi, parasites, and viruses, and in recent years they have emerged as a new class of anti-infective agents. In microbiology, the Minimum Inhibitory Concentration (MIC) is the lowest concentration of an agent that inhibits bacterial growth, and it is an important indicator of drug activity. The primary objective of this study is to build a regression model that predicts the exact MIC values of AMPs. We evaluated eight model architectures in combination with a variety of sequence features and genomic features to assess the robustness of the architectures. The approach we ultimately adopted feeds contextual embeddings generated by a protein language model, combined with genomic features, into a deep learning architecture, which gave good evaluation results. We then applied ensemble learning, combining the outputs of the three best-performing supervised models, and evaluated the resulting ensemble model. Tested on datasets for Staphylococcus aureus ATCC 25923, Escherichia coli ATCC 25922, and Pseudomonas aeruginosa ATCC 27853, the ensemble achieved Pearson correlation coefficients of 0.756, 0.781, and 0.802, respectively. These three bacteria are listed by the World Health Organization as requiring urgent research. The results indicate that the ensemble model predicts MIC values with reasonable accuracy. |
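The ensemble and evaluation steps described in the abstract can be illustrated with a minimal sketch. The abstract does not state how the three models' outputs are combined, so the unweighted averaging below is an assumption, and the function names (`ensemble_mean`, `pearson`) and all numbers are hypothetical toy values rather than data from the study.

```python
import math
from statistics import fmean


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = fmean(x), fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def ensemble_mean(predictions):
    """Average the per-sample predictions of several models.

    Assumption: a simple unweighted mean; the study may combine models differently.
    """
    return [fmean(sample) for sample in zip(*predictions)]


# Toy values only: measured MIC targets for five peptides (hypothetical).
true_mic = [0.5, 1.2, 2.0, 0.9, 1.7]

# Predictions from three hypothetical regression models for the same peptides.
model_preds = [
    [0.7, 1.0, 1.8, 1.1, 1.5],
    [0.4, 1.4, 2.2, 0.6, 1.9],
    [0.6, 1.1, 1.7, 1.0, 1.6],
]

ensemble = ensemble_mean(model_preds)
r = pearson(true_mic, ensemble)
print(f"Pearson r of the averaged ensemble: {r:.3f}")
```

Averaging is only one of several ways to combine regressors; weighted averages or a stacked meta-model would slot into `ensemble_mean` without changing the evaluation step.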