NCU Institutional Repository (theses and dissertations, past exam papers, journal articles, and research projects available for download): Item 987654321/98462
    Please use this identifier to cite or link to this item: https://ir.lib.ncu.edu.tw/handle/987654321/98462


    Title: Inductive Kinase-Substrate Phosphorylation Site Prediction Combining Heterogeneous Graph Attention Networks and Protein Language Models
    Authors: 謝文凱;Hsieh, Wen-Kai
    Contributors: Department of Computer Science and Information Engineering
    Keywords: Protein phosphorylation;Kinase-substrate prediction;Graph attention network;Phosphosite prediction;Protein sequence embeddings;ESM2 protein embeddings
    Date: 2025-07-30
    Issue Date: 2025-10-17 12:48:08 (UTC+8)
    Publisher: National Central University
    Abstract: Protein phosphorylation, a fundamental post-translational modification, regulates nearly all cellular signaling pathways and plays critical roles in diverse biological processes such as cell proliferation, metabolism, differentiation, and apoptosis. Despite extensive high-throughput phosphoproteomics research, fewer than 5% of experimentally validated phosphorylation sites have been linked to a specific upstream kinase, a knowledge gap that limits our understanding of phosphorylation-mediated signaling pathways and related disease mechanisms. Traditional computational approaches, which depend primarily on sequence motifs or local sequence features, generalize poorly and ignore broader biological context and network-level relationships.
In this study, we develop an inductive computational framework that combines heterogeneous graph attention networks (GAT) with a pretrained protein language model (Evolutionary Scale Modeling version 2, ESM2) to systematically predict kinase-substrate phosphorylation relationships. The proposed model constructs a heterogeneous graph in which kinases and phosphosites are represented as distinct node types, connected by experimentally validated kinase-substrate interactions and by similarity-based edges derived from biologically informed embeddings. The ESM2 model provides rich, high-dimensional embeddings that capture evolutionary, biochemical, and structural properties of proteins and phosphopeptides. The GAT then dynamically aggregates these embeddings, learning complex kinase-substrate interaction patterns within local and global graph contexts and enabling inductive inference for novel kinase-substrate pairs. Evaluation on curated benchmark datasets with advanced negative sampling strategies showed superior predictive performance: on an independent test set, our model achieved an area under the receiver operating characteristic curve (AUC) of 0.9635, exceeding state-of-the-art tools such as Phosformer and KinasePhos 3.0. Further analyses showed that the model generalizes well across diverse kinase families, including the CDK and MAPK groups. Through biological case studies, including MAP3K10-mediated SMAD5 phosphorylation and CDC7-mediated AAAS phosphorylation, we provide multi-layered validation (pathway analyses, cross-cohort transcriptomic correlations, clinical outcome assessments, and peptide-protein structural docking) that supports these computationally predicted kinase-substrate interactions as biologically reasonable, experimentally testable hypotheses.
In conclusion, an inductive computational framework integrating deep learning methods, sequence-informed protein embeddings, and graph-based inductive reasoning enhances kinase-substrate phosphorylation site prediction. It helps bridge the gap between phosphoproteomic discoveries and their biological interpretation, facilitating the identification of novel signaling mechanisms and therapeutic targets.
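The graph construction and attention step described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: the node names (K1, S1, ...), the 2-D vectors standing in for ESM2 embeddings, the cosine threshold, the choice to draw similarity edges only between nodes of the same type, and the fixed attention vector are all assumptions made for illustration. A real heterogeneous GAT would also learn type-specific weight matrices, which are omitted here (the weight matrix is taken to be the identity).

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def build_graph(kinase_emb, site_emb, known_pairs, sim_threshold=0.95):
    """Build a two-node-type graph: known kinase-site edges plus
    same-type similarity edges (the same-type restriction is an assumption)."""
    feats = {}
    for name, vec in kinase_emb.items():
        feats[("kinase", name)] = list(vec)
    for name, vec in site_emb.items():
        feats[("site", name)] = list(vec)
    adj = {node: set() for node in feats}

    def link(u, v):
        adj[u].add(v)
        adj[v].add(u)

    # Edges from experimentally validated kinase-substrate pairs.
    for kinase, site in known_pairs:
        link(("kinase", kinase), ("site", site))
    # Similarity edges between embeddings of the same node type.
    for kind in ("kinase", "site"):
        same = sorted(n for n in feats if n[0] == kind)
        for i, u in enumerate(same):
            for v in same[i + 1:]:
                if cosine(feats[u], feats[v]) >= sim_threshold:
                    link(u, v)
    return feats, adj

def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x

def attention_aggregate(node, feats, adj, a):
    """One single-head GAT-style update with an identity weight matrix.
    Scores are LeakyReLU(a . [h_i || h_j]), normalised by a softmax over
    the node itself and its neighbours."""
    h_i = feats[node]
    neigh = [node] + sorted(adj[node])  # include a self-loop
    scores = [leaky_relu(sum(w * x for w, x in zip(a, h_i + feats[j])))
              for j in neigh]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]  # numerically stable softmax
    total = sum(exps)
    alphas = [e / total for e in exps]
    updated = [sum(al * feats[j][d] for al, j in zip(alphas, neigh))
               for d in range(len(h_i))]
    return updated, dict(zip(neigh, alphas))

# Toy data: made-up 2-D vectors, not real ESM2 embeddings.
kinase_emb = {"K1": [1.0, 0.0], "K2": [0.9, 0.1], "K3": [0.0, 1.0]}
site_emb = {"S1": [0.8, 0.2], "S2": [0.1, 0.9]}
known_pairs = [("K1", "S1"), ("K3", "S2")]

feats, adj = build_graph(kinase_emb, site_emb, known_pairs)
attn_vec = [0.5, -0.5, 0.5, -0.5]  # hypothetical attention parameters
updated, weights = attention_aggregate(("kinase", "K2"), feats, adj, attn_vec)
print(weights)
print(updated)
```

In this toy graph, K2 has no known substrate and is connected only through its similarity edge to K1, so its updated representation is built from K1's embedding as well as its own. This is the intuition behind inductive inference for kinases that lack validated substrates.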
    Appears in Collections: [Graduate Institute of Computer Science and Information Engineering] Electronic Thesis & Dissertation



    All items in NCUIR are protected by copyright, with all rights reserved.
