Abstract: | The rapid emergence of drug-resistant bacteria caused by the overuse of antibiotics is a serious threat to global public health, making the search for new therapeutics more urgent. Because antimicrobial peptides (AMPs) act on microbial pathogens through distinctive mechanisms, they are less likely to induce resistance, and they are produced by the innate immune systems of almost all life forms. Many AMPs are broad-spectrum, with activity against various infectious microorganisms such as bacteria, viruses, fungi and parasites. However, the development of AMPs is limited by high development and manufacturing costs. More accurate prediction of the functional classes of AMPs could therefore reduce these costs and provide more information for new drug development. In this study, we constructed multi-label classifiers using binary relevance and algorithm adaptation methods to predict whether AMPs are active against bacteria, mammalian cells, fungi, viruses and cancer cells. 
In addition, we adopted forward feature selection to identify informative features, examined these features from different aspects, and used them to retrain the classifiers. On the independent test set, the retrained classifiers achieved areas under the curve (AUCs) of 0.9066, 0.8568, 0.8492, 0.9126 and 0.8639 for the antibacterial, mammalian cell, antifungal, antiviral and anticancer classes, respectively, with a subset accuracy of 0.4978. These results show that our model can achieve good performance in distinguishing functional classes. |
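
The abstract reports per-class scores near 0.85 to 0.91 alongside a much lower subset accuracy of 0.4978. The sketch below is one way to see why the two can coexist: subset accuracy counts a peptide as correct only when all five labels are predicted exactly. The label names, the toy data, and both helper functions are hypothetical and are not taken from the study; per-label accuracy is used here as a simpler stand-in for the per-label AUC the abstract reports.

```python
# Hypothetical illustration: label order and toy data are assumptions,
# not values from the study.
LABELS = ["antibacterial", "mammalian_cells", "antifungal", "antiviral", "anticancer"]


def subset_accuracy(y_true, y_pred):
    """Fraction of peptides whose full label vector is predicted exactly."""
    exact = sum(1 for t, p in zip(y_true, y_pred) if list(t) == list(p))
    return exact / len(y_true)


def per_label_accuracy(y_true, y_pred):
    """Accuracy computed independently for each label column."""
    n = len(y_true)
    return [
        sum(1 for t, p in zip(y_true, y_pred) if t[j] == p[j]) / n
        for j in range(len(LABELS))
    ]


# Toy ground truth and predictions for four peptides (made up).
y_true = [
    [1, 0, 1, 0, 0],
    [1, 1, 0, 0, 0],
    [0, 0, 0, 1, 0],
    [1, 0, 0, 0, 1],
]
y_pred = [
    [1, 0, 1, 0, 0],  # exact match
    [1, 0, 0, 0, 0],  # misses the mammalian_cells label
    [0, 0, 0, 1, 0],  # exact match
    [1, 0, 1, 0, 1],  # false positive on antifungal
]

if __name__ == "__main__":
    print("subset accuracy:", subset_accuracy(y_true, y_pred))
    for name, acc in zip(LABELS, per_label_accuracy(y_true, y_pred)):
        print(f"{name}: {acc:.2f}")
```

In this toy case every label is right for at least three of the four peptides, yet only two peptides have all five labels correct, so the exact-match score is noticeably lower than any single per-label score.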