| Abstract: | Spatial transcriptomics (ST) measures gene expression together with its spatial coordinates in tissue, offering new insight into tissue heterogeneity, disease mechanisms, and cellular microenvironments. However, low RNA capture efficiency and technical noise leave many gene-expression values missing in ST data, which limits downstream biological analysis. Existing imputation methods typically depend on external single-cell RNA sequencing data or impute only at measured spots, so they cannot effectively reconstruct unmeasured regions. They also tend to neglect the intrinsic relationship between spatial location and gene-expression similarity, which reduces biological interpretability. To address these challenges, this study constructs a heterogeneous graph that integrates spatial proximity with gene co-expression similarity and applies a graph attention network to produce biologically meaningful embeddings. Contrastive learning is then used to reinforce consistency between spatial and transcriptional information, improving the resolution and biological accuracy of the imputed results.
Evaluations on multiple publicly available datasets (human cerebral cortex, breast cancer, and melanoma) show that the proposed method outperforms existing approaches; on the cerebral cortex dataset, for example, it attains the lowest root mean squared error (RMSE = 0.067) and the highest cosine similarity (0.982). In the breast cancer data, the method recovers cancer-related gene-expression signals such as ERBB2 and BRCA1, improving the biological interpretation of spatial domains. Gene functional analysis further shows improved interpretability of functional annotation, including recovered signals associated with glial cell differentiation and neurotransmission. Cell-cell communication analysis likewise indicates that the imputed data reveal richer and more consistent intercellular signaling, underscoring the biological relevance and interpretive value of the results. Overall, by combining heterogeneous graph construction, a graph attention network, and contrastive learning, the proposed method offers both computational efficiency and high-resolution reconstruction, and can advance the application of spatial transcriptomics in disease research and precision medicine. |
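
The abstract names two edge types (spatial proximity and expression similarity) and two evaluation metrics (RMSE and cosine similarity). The following is a minimal sketch of those pieces only, not the study's implementation. It assumes, for illustration, that both edge types connect spots via k-nearest neighbours and that expression similarity is measured by cosine distance between spot profiles; the function names (`knn_edges`, `build_hetero_edges`) and the toy data are hypothetical, and the graph attention network and contrastive loss are omitted.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cosine_distance(a, b):
    return 1.0 - cosine_similarity(a, b)


def rmse(truth, pred):
    """Root mean squared error over all entries of two spot-by-gene matrices."""
    diffs = [(t - p) ** 2
             for t_row, p_row in zip(truth, pred)
             for t, p in zip(t_row, p_row)]
    return math.sqrt(sum(diffs) / len(diffs))


def knn_edges(vectors, k, dist):
    """Directed edges (i, j) from each item i to its k nearest items j."""
    edges = set()
    for i, vi in enumerate(vectors):
        ranked = sorted((dist(vi, vj), j)
                        for j, vj in enumerate(vectors) if j != i)
        edges.update((i, j) for _, j in ranked[:k])
    return edges


def build_hetero_edges(coords, expr, k_spatial, k_expr):
    """Two edge sets over the same spots: one spatial, one expression-based.

    Assumption: kNN with Euclidean distance on coordinates and cosine
    distance on expression profiles; the source does not specify either.
    """
    return {
        "spatial": knn_edges(coords, k_spatial, math.dist),
        "expression": knn_edges(expr, k_expr, cosine_distance),
    }


# Toy data: four spots in two spatial clusters, two genes each.
coords = [(0, 0), (0, 1), (5, 5), (5, 6)]
expr = [[1, 0], [0, 1], [2, 0], [0, 3]]
edges = build_hetero_edges(coords, expr, k_spatial=1, k_expr=1)
```

In this toy example the spatial edges link neighbouring spots (0 with 1, 2 with 3), while the expression edges link spots with similar profiles across the two clusters (0 with 2, 1 with 3). That is the kind of complementary information a heterogeneous graph is meant to carry.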