Master's/Doctoral Thesis 110522146: Detailed Record




Name: Jun-Shen Lin (林駿燊)    Department: Computer Science and Information Engineering
Thesis Title: A Deep Learning Approach with Language Model Embeddings and Imbalance Adjustment for Identifying Multi-Functional Antimicrobial Peptides (使用語言模型嵌入和不平衡調整之深度學習方法識別多功能抗菌肽)
Related Theses
★ Identifying multidrug resistance of Klebsiella pneumoniae from mass spectrometry data using machine learning
★ Combining multiple signal preprocessing methods on mass spectrometry data to identify bacterial antibiotic resistance
★ Predicting groundwater levels in the Zhuoshui River alluvial fan area using machine learning
★ Anomaly detection for wafer wire-saw machines using representation learning and machine learning
★ Identifying characteristic peaks of ciprofloxacin resistance in Gram-negative bacteria from mass spectrometry data using artificial intelligence
★ Applying digital twins to anomaly detection of motor bearings
★ Detecting antimicrobial resistance in fluids based on optically induced dielectrophoresis image processing
★ Predicting land subsidence in the Zhuoshui River alluvial fan area from multiple types of stratum monitoring data using machine learning
★ Predicting the minimum inhibitory concentration of antimicrobial peptides against specific bacterial strains using artificial intelligence models
★ Predicting land subsidence in Yunlin County using a weighted ensemble model
★ Inferring the cellular composition of peripheral blood mononuclear cells from RNA sequencing expression profiles using deep learning
Files: Electronic full text is permanently restricted (not available for public access).
Abstract (Chinese): Antibiotic resistance is a serious problem facing the world today. To address it, alternative therapeutic strategies are needed, and continued research and development of antimicrobial peptides holds great potential for future antimicrobial therapy. Most recent multi-label deep learning studies on antimicrobial peptides aim to distinguish multiple functional activities but do not discuss preprocessing methods or how different loss functions behave under multi-label imbalance; studies using protein language model features are scarce, and the related language model work lacks comparisons of model architectures. In this study, we adopt an algorithm adaptation approach and compare different ways of handling imbalance, covering loss functions, preprocessing, language model embedding layers, model architectures, and feature types, in order to identify a deep learning classifier with better overall performance for predicting peptides with five different functional activities. We adopt an asymmetric imbalance loss function and examine the effect of preprocessing. The results show that our proposed model architecture achieves an absolute true score of 0.625 and an absolute false score of 0.118 in the overall evaluation for classifying activity against bacteria, mammalian cells, fungi, viruses, and cancer cells. In the per-label evaluation, applying multi-label undersampling (MLUL) raises the macro balanced accuracy from 0.780 to 0.801, and the preprocessing experiments highlight the importance of data quantity for deep learning: preprocessing that filters instances by importance helps balance the data while removing only a small amount of it. For the loss function, the asymmetric loss also improves prediction of minority labels, raising the overall absolute true score from 0.601 to 0.625. For the model, features from the protein language embedding layer perform best with a convolutional neural network (CNN) combined with a bidirectional long short-term memory network (BiLSTM), rather than with a simple CNN architecture or a more complex multi-head self-attention mechanism; the overall absolute true score improves from 0.614 and 0.592, respectively, to 0.625.
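The asymmetric loss referred to in the abstract follows Ben-Baruch et al. [18]. As a rough illustration only, a minimal NumPy sketch of the per-label loss is given below; the focusing parameters gamma_pos and gamma_neg and the probability margin are placeholder values, not the settings used in the thesis.

import numpy as np

def asymmetric_loss(y_true, p, gamma_pos=0.0, gamma_neg=4.0, margin=0.05, eps=1e-8):
    """Multi-label asymmetric loss (ASL), after Ben-Baruch et al. [18].

    y_true : binary label matrix, shape (n_samples, n_labels)
    p      : sigmoid probabilities, same shape
    Hyperparameter values here are illustrative, not the thesis settings.
    """
    # Probability shifting for negatives: very easy negatives below the margin are discarded.
    p_neg = np.clip(p - margin, 0.0, 1.0)

    # Focusing terms down-weight easy examples, more aggressively for negatives.
    loss_pos = y_true * ((1.0 - p) ** gamma_pos) * np.log(p + eps)
    loss_neg = (1.0 - y_true) * (p_neg ** gamma_neg) * np.log(1.0 - p_neg + eps)

    return -np.mean(loss_pos + loss_neg)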
Abstract (English): Antibiotic resistance is a serious problem faced by the world today, making the treatment of bacterial infections increasingly challenging. To address this issue, alternative therapeutic strategies need to be explored. Ongoing research and development of antimicrobial peptides (AMPs) holds tremendous potential for future antimicrobial therapies. However, most recent multi-label deep learning studies on AMPs focus on differentiating multi-functional classes without discussing preprocessing methods or how loss functions differ on imbalanced multi-label data. Moreover, there is a lack of research utilizing protein language model features, and existing studies of language model embedding features also lack comparisons of model architectures. To analyze these differences, we employ algorithm adaptation methods and compare various approaches for handling data imbalance, including loss functions, preprocessing techniques, language model embeddings, model architectures, and feature types. The goal is to find a deep learning classifier with superior overall performance for predicting five different functional activities of peptides. We adopted asymmetric loss functions and observed the impact of preprocessing. The results show that our proposed model architecture achieves an absolute true score of 0.625 and an absolute false score of 0.118 in the overall evaluation for the classification of activity against bacteria, mammalian cells, fungi, viruses, and cancer cells. Regarding per-label evaluation, applying multi-label undersampling (MLUL) improves the macro balanced accuracy (BA) from 0.780 to 0.801. The preprocessing experiments also show the influence of data quantity on deep learning: preprocessing that selects instances based on their importance helps balance the data while removing only a small amount of it. Additionally, using an asymmetric loss function (ASL) during training improves the predictive ability on minority labels, raising the overall absolute true score from 0.601 to 0.625. In terms of model architecture, the protein language embedding features perform best when combined with a convolutional neural network (CNN) followed by a bidirectional long short-term memory (BiLSTM) network, rather than with a simple CNN architecture or a more complex multi-head self-attention mechanism; the overall absolute true score improves from 0.614 and 0.592, respectively, to 0.625.
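As a rough sketch of the architecture described in the abstract (per-residue protein language model embeddings passed through a CNN and then a BiLSTM, with independent sigmoid outputs for the five activity labels), a minimal Keras model is given below. The sequence length, embedding dimension, filter counts, and layer sizes are illustrative assumptions, not the hyperparameters used in the thesis.

import tensorflow as tf
from tensorflow.keras import layers, models

# Assumed shapes: peptides padded to 200 residues, 1024-dimensional language model
# embeddings per residue, and five activity labels (bacteria, mammalian cells,
# fungi, viruses, cancer cells).
MAX_LEN, EMB_DIM, N_LABELS = 200, 1024, 5

def build_cnn_bilstm():
    inputs = layers.Input(shape=(MAX_LEN, EMB_DIM))            # LM embeddings per residue
    x = layers.Conv1D(128, kernel_size=5, padding="same",
                      activation="relu")(inputs)               # local sequence motifs
    x = layers.MaxPooling1D(pool_size=2)(x)
    x = layers.Bidirectional(layers.LSTM(64))(x)               # long-range context in both directions
    x = layers.Dropout(0.3)(x)
    outputs = layers.Dense(N_LABELS, activation="sigmoid")(x)  # independent per-label probabilities
    return models.Model(inputs, outputs)

model = build_cnn_bilstm()
model.compile(optimizer="adam", loss="binary_crossentropy")

Training such a model with the asymmetric loss sketched above, rather than plain binary cross-entropy, is one way to realize the imbalance adjustment the abstract describes.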
Keywords
★ Imbalanced preprocessing
★ Language model embeddings
★ Multi-label classifier
★ Antimicrobial peptides
★ Imbalanced loss function
Table of Contents
Chinese Abstract i
Abstract ii
Acknowledgements iv
Table of Contents v
List of Figures vii
List of Tables ix
Chapter 1 Introduction 1
1.1 Background 1
1.2 Related Works 2
1.3 Motivation and Goal 3
Chapter 2 Materials and Methods 5
2.1 Data Sources 7
2.2 Preprocessing 12
2.2.1 CD-HIT 12
2.2.2 Multi-Label Undersampling (MLUL) 13
2.3 Feature Extraction 15
2.3.1 LM Embeddings 15
2.3.2 Physicochemical Properties 18
2.4 The Framework of Our Study 19
2.4.1 Input Layer 20
2.4.2 Convolutional Neural Network (CNN) 20
2.4.3 Bidirectional LSTM 21
2.4.4 Multi-Head Self-Attention Mechanism (MHSA) Layer 22
2.4.5 Classification Layer 24
2.4.6 Model Structures 25
2.5 Multi-Label Classification with Asymmetric Loss 26
2.6 Evaluation Metrics 28
Chapter 3 Results and Discussions 31
3.1 Observation of Basic Properties 31
3.2 Investigations of Different Loss Functions 34
3.3 Analysis of CD-HIT and MLUL 44
3.4 Comparisons of Different Deep Learning Models 60
3.5 Investigations of Different Features 61
3.6 Comparison with Other Studies 66
Chapter 4 Conclusions 71
References 72
References
1. Lei, J., et al., The antimicrobial peptides and their potential clinical applications. American Journal of Translational Research, 2019. 11(7): p. 3919-3931.
2. Magana, M., et al., The value of antimicrobial peptides in the age of resistance. Lancet Infectious Diseases, 2020. 20(9): p. E216-E230.
3. Murray, C.J.L., et al., Global burden of bacterial antimicrobial resistance in 2019: A systematic analysis. Lancet, 2022. 399(10325): p. 629-655.
4. Zasloff, M., Antimicrobial peptides of multicellular organisms. Nature, 2002. 415(6870): p. 389-395.
5. Zasloff, M., Antimicrobial peptides of multicellular organisms: My perspective. Antimicrobial Peptides: Basics for Clinical Application, 2019. 1117: p. 3-6.
6. Wang, S., et al., Antimicrobial peptides as potential alternatives to antibiotics in food animal industry. International Journal of Molecular Sciences, 2016. 17(5).
7. Huan, Y.C., et al., Antimicrobial peptides: Classification, design, application and research progress in multiple fields. Frontiers in Microbiology, 2020. 11.
8. Nikaido, H., Molecular basis of bacterial outer membrane permeability revisited. Microbiology and Molecular Biology Reviews, 2003. 67(4): p. 593+.
9. Hancock, R.E.W. and H.G. Sahl, Antimicrobial and host-defense peptides as new anti-infective therapeutic strategies. Nature Biotechnology, 2006. 24(12): p. 1551-1557.
10. Yan, W.H., et al., PrMFTP: Multi-functional therapeutic peptides prediction based on multi-head self-attention mechanism and class weight optimization. PLoS Computational Biology, 2022. 18(9).
11. Niyonsaba, F., et al., Antimicrobial peptide derived from insulin-like growth factor-binding protein 5 activates mast cells via Mas-related G protein-coupled receptor X2. Allergy, 2020. 75(1): p. 203-207.
12. Grønning, A.G.B., T. Kacprowski, and C. Schéele, MultiPep: A hierarchical deep learning approach for multi-label classification of peptide bioactivities. Biology Methods and Protocols, 2021. 6(1).
13. Pang, Y., et al., Integrating transformer and imbalanced multi-label learning to identify antimicrobial peptides and their functional activities. Bioinformatics, 2022. 38(24): p. 5368-5374.
14. Dee, W., LMPred: Predicting antimicrobial peptides using pre-trained language models and deep learning. Bioinformatics Advances, 2022. 2(1).
15. Elnaggar, A., et al., ProtTrans: Toward understanding the language of life through self-supervised learning. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2022. 44(10): p. 7112-7127.
16. O′Shea, K. and R. Nash, An introduction to convolutional neural networks. arXiv preprint arXiv:1511.08458, 2015.
17. Hochreiter, S. and J. Schmidhuber, Long short-term memory. Neural Computation, 1997. 9(8): p. 1735-1780.
18. Ben-Baruch, E., et al., Asymmetric loss for multi-label classification. arXiv preprint arXiv:2009.14119, 2020.
19. Liu, B., K. Blekas, and G. Tsoumakas, Multi-label sampling based on local label imbalance. Pattern Recognition, 2022. 122.
20. Jhong, J.H., et al., dbAMP 2.0: Updated resource for antimicrobial peptides with an enhanced scanning method for genomic and proteomic data. Nucleic Acids Research, 2022. 50(D1): p. D460-D470.
21. Shi, G.B., et al., DRAMP 3.0: An enhanced comprehensive data repository of antimicrobial peptides. Nucleic Acids Research, 2022. 50(D1): p. D488-D496.
22. Ye, G.Z., et al., LAMP2: A major update of the database linking antimicrobial peptides. Database: The Journal of Biological Databases and Curation, 2020.
23. Gawde, U., et al., CAMPR4: A database of natural and synthetic antimicrobial peptides. Nucleic Acids Res, 2023. 51(D1): p. D377-D383.
24. Pirtskhalava, M., et al., DBAASP v3: Database of antimicrobial/cytotoxic activity and structure of peptides as a resource for development of new therapeutics. Nucleic Acids Research, 2021. 49(D1): p. D288-D297.
25. Kim, H., et al., De novo generation of short antimicrobial peptides with enhanced stability and cell specificity. Journal of Antimicrobial Chemotherapy, 2014. 69(1): p. 121-132.
26. Fu, L.M., et al., CD-HIT: Accelerated for clustering the next-generation sequencing data. Bioinformatics, 2012. 28(23): p. 3150-3152.
27. Tao, Y., D. Papadias, and X. Lian. Reverse kNN search in arbitrary dimensionality. in Proceedings of the Thirtieth International Conference on Very Large Data Bases - Volume 30. 2004. Toronto, Canada: VLDB Endowment.
28. Chen, Z., et al., iFeature: A python package and web server for features extraction and selection from protein and peptide sequences. Bioinformatics, 2018. 34(14): p. 2499-2502.
29. Yang, Z., et al., XLNet: Generalized autoregressive pretraining for language understanding. Advances in Neural Information Processing Systems, 2019. 32.
30. Suzek, B.E., et al., UniRef clusters: A comprehensive and scalable alternative for improving sequence similarity searches. Bioinformatics, 2015. 31(6): p. 926-932.
31. Bateman, A., et al., UniProt: A worldwide hub of protein knowledge. Nucleic Acids Research, 2019. 47(D1): p. D506-D515.
32. Raffel, C., et al., Exploring the limits of transfer learning with a unified text-to-text transformer. Journal of Machine Learning Research, 2020. 21.
33. Spänig, S. and D. Heider, Encodings and models for antimicrobial peptide classification for multi-resistant pathogens. BioData Mining, 2019. 12.
34. Chen, J.R., H.H. Cheong, and S.W.I. Siu, xDeep-AcPEP: Deep learning method for anticancer peptide activity prediction based on convolutional neural network and multitask learning. Journal of Chemical Information and Modeling, 2021. 61(8): p. 3789-3803.
35. Kawashima, S., et al., AAindex: Amino acid index database, progress report 2008. Nucleic Acids Research, 2008. 36: p. D202-D205.
36. Sandberg, M., et al., New chemical descriptors relevant for the design of biologically active peptides. A multivariate characterization of 87 amino acids. Journal of Medicinal Chemistry, 1998. 41(14): p. 2481-2491.
37. Sechidis, K., G. Tsoumakas, and I. Vlahavas. On the stratification of multi-label data. in Machine Learning and Knowledge Discovery in Databases. 2011. Berlin, Heidelberg: Springer Berlin Heidelberg.
38. O′Shea, K. and R. Nash, An introduction to convolutional neural networks. arXiv preprint arXiv:1511.08458, 2015.
39. Vaswani, A., et al., Attention is all you need. Advances in Neural Information Processing Systems, 2017. 30.
40. Abadi, M., et al. TensorFlow: A system for large-scale machine learning. in OSDI. 2016. Savannah, GA, USA.
41. Li, Y., et al., MPMABP: A CNN and Bi-LSTM-based method for predicting multi-activities of bioactive peptides. Pharmaceuticals, 2022. 15(6).
42. Xiao, X., et al., iAMP-2L: A two-level multi-label classifier for identifying antimicrobial peptides and their functional types. Analytical Biochemistry, 2013. 436(2): p. 168-177.
43. Gull, S., N. Shamim, and F. Minhas, AMAP: Hierarchical multi-label prediction of biologically active and antimicrobial peptides. Computers in Biology and Medicine, 2019. 107: p. 172-181.
44. Zhou, J.-P., L. Chen, and Z.-H. Guo, iATC-NRAKEL: An efficient multi-label classifier for recognizing anatomical therapeutic chemical classes of drugs. Bioinformatics, 2019. 36(5): p. 1391-1396.
45. McInnes, L., J. Healy, and J. Melville, UMAP: Uniform manifold approximation and projection for dimension reduction. arXiv preprint arXiv:1802.03426, 2018.
Advisors: Jorng-Tzong Horng (洪炯宗) and 吳立青    Date of Approval: 2023-7-27
