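EnHemo: An Ensemble Framework Incorporating Protein Language Models for Identifying the Hemolytic Toxicity of Highly Active Antimicrobial Peptides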

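Name

Chen-Xuan Wu (吳晨瑄)

Department

Department of Computer Science and Information Engineering

Thesis Title

EnHemo: An Ensemble Framework Incorporating Protein Language Models for Identifying the Hemolytic Toxicity of Highly Active Antimicrobial Peptides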

Files

Full text in the system: never open to the public

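Abstract (Chinese, translated)

Antimicrobial resistance is one of the major public health challenges facing the world today. Antimicrobial peptides (AMPs) are regarded as a promising tool against the growing threat of antibiotic resistance. Despite their many advantages, however, AMPs face a key obstacle in clinical use: hemolytic toxicity toward mammalian cells. To overcome this obstacle, this study introduces an ensemble model named EnHemo that identifies the hemolytic toxicity of highly active AMPs. EnHemo combines several techniques, including extreme gradient boosting, residual algorithms, and transfer learning, and uses iFeature features together with protein language models to improve interpretability and predictive accuracy. The results show that EnHemo reached accuracies of 90.60% and 96.43% on two datasets, significantly outperforming existing classifiers in accuracy and in balanced classification. In addition, its multi-level feature integration and use of advanced algorithms indicate its potential for practical application. In summary, the proposed EnHemo model not only identifies safe and highly active AMPs effectively but also offers a useful reference and guidance for the future design and development of AMPs. This work is expected to promote the safe clinical use of AMPs and to provide a new way to counter the threat of antimicrobial resistance.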
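As a rough illustration of the sequence-derived features mentioned above, the sketch below computes amino acid composition, one of the simplest descriptor types that iFeature provides. It does not call iFeature itself, and the abstract does not say which descriptors EnHemo actually uses; the function name `amino_acid_composition` and the example peptide are assumptions made for illustration only.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def amino_acid_composition(sequence: str) -> list[float]:
    """Return the fraction of each standard amino acid in a peptide.

    Hypothetical helper: a plain-Python stand-in for an AAC-style
    descriptor, not the thesis's feature pipeline.
    """
    counts = Counter(sequence.upper())
    total = sum(counts[aa] for aa in AMINO_ACIDS)
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    # One value per residue, in the fixed order given by AMINO_ACIDS
    return [counts[aa] / total for aa in AMINO_ACIDS]


# Example peptide chosen only for illustration
features = amino_acid_composition("GIGKFLHSAKKFGKAFVGEIMNS")
```

The result is a fixed-length vector of 20 fractions that sums to 1, so peptides of different lengths can be fed to the same classifier.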

Abstract (English)

Antimicrobial resistance is one of the major public health challenges currently facing the world. Antimicrobial peptides (AMPs) are considered promising tools to address the growing threat of antibiotic resistance. However, despite their many advantages, AMPs face a critical challenge in clinical applications: their hemolytic toxicity to mammalian cells. To overcome this challenge, we introduce an ensemble model named EnHemo, designed to identify the hemolytic toxicity of highly active AMPs. EnHemo combines multiple techniques, including Extreme Gradient Boosting, residual algorithms, and transfer learning, and uses iFeature features together with protein language models to enhance interpretability and predictive accuracy. The results show that EnHemo achieved accuracies of 90.60% and 96.43% on two datasets, significantly outperforming existing classifiers in both accuracy and balanced classification. Moreover, the multi-level feature integration and advanced algorithms of EnHemo demonstrate its potential in practical applications. In summary, the EnHemo model effectively identifies safe and highly active AMPs and offers important guidance for the design and development of future AMPs. This research outcome is expected to promote the safe clinical application of AMPs, providing a new solution to combat the threat of antimicrobial resistance.
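The abstract does not state how the three component models are combined, so the following is only a minimal sketch that assumes equal-weight soft voting over per-model hemolytic probabilities, followed by accuracy and balanced accuracy. The probability values, labels, and all function names are hypothetical and are not results from the thesis.

```python
def soft_vote(prob_lists, weights=None):
    """Weighted average of per-model probabilities for each peptide."""
    n_models = len(prob_lists)
    weights = weights or [1.0 / n_models] * n_models
    return [
        sum(w * p for w, p in zip(weights, per_peptide))
        for per_peptide in zip(*prob_lists)
    ]


def accuracy(y_true, y_pred):
    """Fraction of peptides classified correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity (assumes both classes occur)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    return 0.5 * (tp / (tp + fn) + tn / (tn + fp))


# Made-up probabilities of "hemolytic" from three hypothetical models
p_boosting = [0.9, 0.2, 0.6, 0.4]   # gradient-boosting model
p_residual = [0.8, 0.3, 0.4, 0.1]   # residual deep model
p_transfer = [0.7, 0.1, 0.7, 0.6]   # fine-tuned language model

scores = soft_vote([p_boosting, p_residual, p_transfer])
predictions = [1 if s >= 0.5 else 0 for s in scores]  # 0.5 cutoff is an assumption

y_true = [1, 0, 1, 1]  # made-up labels (1 = hemolytic)
acc = accuracy(y_true, predictions)
bal_acc = balanced_accuracy(y_true, predictions)
```

Balanced accuracy is reported alongside plain accuracy here because it weights the hemolytic and non-hemolytic classes equally, which matters when one class dominates the dataset.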

Keywords (Chinese, translated)

★ Hemolytic Toxicity
★ Ensemble Model
★ Deep Learning
★ Machine Learning
★ Transfer Learning
★ Protein Language Models

Keywords (English)

★ Hemolytic Toxicity
★ Ensemble Model
★ Transfer Learning
★ Deep Learning
★ Machine Learning
★ Protein Language Models

Table of Contents

Chinese Abstract
Abstract
Acknowledgements
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
1.1 Background
1.2 Related Works
1.3 Motivation and Goal
Chapter 2 Materials and Methods
2.1 Dataset
2.1.1 Dataset Collection and Preprocessing
2.1.2 Dataset Analysis and Visualization
2.2 Proposed Framework
2.2.1 Model 1-Machine Learning Model
2.2.2 Model 2-Deep Learning Model
2.2.3 Model 3-Transfer Learning Model
2.2.4 Ensemble Model
2.3 Evaluation Metrics
Chapter 3 Results
3.1 Performance of Machine Learning Model
3.1.1 Features Importance
3.2 Performance of Deep Learning Model
3.3 Performance of Transfer Learning Model
3.4 Performance of Ensemble Model
3.5 Comparison of EnHemo with Other Studies
Chapter 4 Discussions and Conclusions
4.1 Discussions
4.2 Conclusions
References

Advisors

Jorng-Tzong Horng (洪炯宗), Li-Ching Wu (吳立青)

Approval Date

2024-07-29
