Abstract: | Kinases are biomolecules, typically proteins, whose primary function is to catalyze specific chemical reactions within cells; these reactions are essential to key cellular processes such as metabolism, signal transduction, and cell growth. Kinase activity is the rate of these catalyzed reactions, and changes in it can directly affect cellular function and state, so understanding kinase activity is crucial for elucidating disease mechanisms and developing therapeutic strategies. Kinase activity can be identified from mass spectrometry-based phosphoproteomics data. Several tools already exist for predicting kinase activity, but they differ significantly in their data requirements, and some need complex, hard-to-obtain data. This study aims to design a model that uses phosphoprotein data to predict kinase expression levels and, from them, kinase activity profiles. We use deep learning as the framework and build predictive models with different architectures, including Deep Neural Networks (DNN) and Convolutional Neural Networks (CNN), and with different input features, such as kinase-substrate data and phosphoprotein data. By comparing these combinations, we examine which model architecture and feature type are best suited to predicting kinase activity profiles. For the deep learning models, we tested different numbers of layers, neuron counts, and feature types. 
Our experimental results indicate that a CNN with three convolutional layers and neuron counts of (32, 16, 8), taking phosphoprotein data refined by feature selection as input, achieved the best results among all tested model architectures. Compared with other methods, our model achieved a Spearman rank correlation coefficient of 0.4101, higher than the 0.0655 and 0.0080 obtained by the other methods. On the C-index, our model outperformed the other methods once the threshold x exceeded 9 and was relatively stable overall. |