NCU Institutional Repository: Item 987654321/97685


    Please use this identifier to cite or link to this item: https://ir.lib.ncu.edu.tw/handle/987654321/97685


    Title: AutoGNN: Genetic-Algorithm-Optimized Graph Neural Networks for Population-Scale Disease Onset Prediction
    Authors: Tsai, Yun-Chen (蔡昀辰)
    Contributors: Institute of Systems Biology and Bioinformatics
    Keywords: Type 2 diabetes; risk prediction; genetic algorithm; graph neural networks; hyperparameter optimization; GNNExplainer
    Date: 2025-08-29
    Issue Date: 2025-10-17 11:47:04 (UTC+8)
    Publisher: National Central University
    Abstract:
    Population-scale prediction of incident type 2 diabetes mellitus (T2DM) is challenged by class imbalance, feature heterogeneity, and i.i.d. pipelines that ignore relational structure. We present AutoGNN, a metadata-driven framework that casts the cohort as a population graph and runs a fixed-budget, dual-objective genetic search over GCN/GAT/GIN, jointly selecting hyperparameters and feature blocks. The study population derives from Taiwan Biobank (TWB; 2012–2024). The incident-risk cohort includes only participants who were non-diabetic at baseline (N = 35,016; positives = 1,187); evaluation uses strict splits with case-anchored age–sex matching and a robustness test on 0→1 (converted to positive) vs. 0→0 (persistently negative) participants.
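The abstract does not spell out the search loop, so the following is only a minimal sketch of what a fixed-budget, dual-objective genetic search over GCN/GAT/GIN with feature-block selection could look like. Everything concrete here is an assumption: the gene ranges, the feature-block names in `BLOCKS`, the choice of (validation score, parsimony) as the two objectives, and the simplified "Pareto front first, then summed objectives" survivor ranking, which stands in for a full NSGA-II style ranking. `toy_evaluate` is a hypothetical placeholder for training a GNN on the population graph; its numbers mean nothing.

```python
import random
from dataclasses import dataclass

# Illustrative search space (assumed values, not the thesis's grid).
ARCHS = ("GCN", "GAT", "GIN")
HIDDEN = (32, 64, 128, 256)
LRS = (1e-3, 3e-3, 1e-2)
DROPOUTS = (0.0, 0.2, 0.5)
BLOCKS = ("glycemic", "anthropometry", "lipids", "lifestyle")  # hypothetical feature blocks


@dataclass(frozen=True)
class Genome:
    arch: str
    hidden: int
    lr: float
    dropout: float
    blocks: tuple  # one bool per entry in BLOCKS: is that feature block used?


def random_genome(rng):
    blocks = tuple(rng.random() < 0.5 for _ in BLOCKS)
    if not any(blocks):  # always keep at least one feature block
        blocks = (True,) + blocks[1:]
    return Genome(rng.choice(ARCHS), rng.choice(HIDDEN), rng.choice(LRS),
                  rng.choice(DROPOUTS), blocks)


def mutate(g, rng, p=0.2):
    """Resample each hyperparameter with probability p; flip each block bit with probability p."""
    fresh = random_genome(rng)
    blocks = tuple(not b if rng.random() < p else b for b in g.blocks)
    if not any(blocks):
        blocks = fresh.blocks
    return Genome(fresh.arch if rng.random() < p else g.arch,
                  fresh.hidden if rng.random() < p else g.hidden,
                  fresh.lr if rng.random() < p else g.lr,
                  fresh.dropout if rng.random() < p else g.dropout,
                  blocks)


def crossover(a, b, rng):
    """Uniform crossover: each gene is copied from one parent chosen at random."""
    def pick(x, y):
        return x if rng.random() < 0.5 else y
    blocks = tuple(pick(x, y) for x, y in zip(a.blocks, b.blocks))
    if not any(blocks):
        blocks = a.blocks
    return Genome(pick(a.arch, b.arch), pick(a.hidden, b.hidden),
                  pick(a.lr, b.lr), pick(a.dropout, b.dropout), blocks)


def dominates(s, t):
    """True if objective vector s Pareto-dominates t (every objective is maximised)."""
    return all(x >= y for x, y in zip(s, t)) and any(x > y for x, y in zip(s, t))


def pareto_front(scored):
    return [gs for gs in scored if not any(dominates(o[1], gs[1]) for o in scored)]


def _total(gs):
    return sum(gs[1])


def survivors(scored, k):
    """Keep non-dominated genomes first, then the rest, each ordered by summed objectives."""
    front = pareto_front(scored)
    rest = [gs for gs in scored if gs not in front]
    ranked = sorted(front, key=_total, reverse=True) + sorted(rest, key=_total, reverse=True)
    return ranked[:k]


def genetic_search(evaluate, budget=60, pop_size=10, seed=0):
    """(mu + lambda) GA whose cost is capped at `budget` calls to evaluate(genome)."""
    if budget < pop_size:
        raise ValueError("budget must cover at least the initial population")
    rng = random.Random(seed)
    scored = [(g, evaluate(g)) for g in (random_genome(rng) for _ in range(pop_size))]
    used = pop_size
    while used + pop_size <= budget:
        children = []
        for _ in range(pop_size):
            (a, _), (b, _) = rng.sample(scored, 2)
            child = mutate(crossover(a, b, rng), rng)
            children.append((child, evaluate(child)))
        used += pop_size
        scored = survivors(scored + children, pop_size)
    return pareto_front(scored)


def toy_evaluate(g):
    """Hypothetical placeholder for 'train g on the population graph, score on validation'."""
    score = 0.70 + 0.05 * g.blocks[0] + 0.02 * (g.arch == "GCN") - 0.05 * g.dropout
    parsimony = -sum(g.blocks)  # second objective (assumed): fewer feature blocks is better
    return (score, parsimony)


if __name__ == "__main__":
    for g, (score, parsimony) in genetic_search(toy_evaluate, budget=60, pop_size=10):
        used = [name for name, on in zip(BLOCKS, g.blocks) if on]
        print(g.arch, g.hidden, g.lr, g.dropout, used, round(score, 3), parsimony)
```

Capping the number of `evaluate` calls, rather than wall-clock time, is one way to make "fixed budget" precise: every candidate architecture pays for the same number of evaluations, which is what makes equal-budget comparisons (for example against an MLP baseline) meaningful.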
    First-run AUROC approaches the state of the art (SOTA): 0.890 (GCN), 0.887 (GIN), and 0.881 (GAT), comparable to widely cited biobank/EHR models that often include near-diagnostic laboratory tests [49, 71]. Retraining with the same configuration gives close results (typical |∆| ≈ 0.02–0.05; modest SD), while shuffled-label controls collapse toward chance (AUROC ∼ 0.45–0.60), indicating genuine signal rather than lucky seeds or leakage. Under equal budgets, MLP occasionally achieves a slightly higher AUROC in some runs, but GCN yields better overall accuracy, underscoring the value of relational inductive bias. Persona analyses (sex × age; education clusters) of models with AUROC > 0.78 show the strongest performance in middle-aged females and acceptable, monitorable performance in older males; macro- and worst-F1 metrics guard against subgroup failure. GNNExplainer highlights HbA1c and fasting glucose across personas, with anthropometry (WHR, BMI) and lipids (TG) weighing more in younger males and older females, which is useful for setting clinical thresholds and for calibration.
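To make the macro-/worst-F1 guard concrete, here is a small sketch that groups predictions by a sex × age persona, computes positive-class F1 within each persona, and reports the unweighted mean across personas (macro) and the minimum (worst). The age cut-points in `age_band`, the 0.5 decision threshold, and reading "macro" as an average over personas rather than over classes are all assumptions, not the thesis's stated definitions; the records in the demo are toy values.

```python
from collections import defaultdict
from statistics import mean


def f1_positive(y_true, y_pred):
    """F1 for the positive (incident T2DM) class: 2TP / (2TP + FP + FN); 0.0 if undefined."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def age_band(age):
    """Hypothetical age cut-points; the thesis's persona bins may differ."""
    if age < 45:
        return "younger"
    if age < 60:
        return "middle-aged"
    return "older"


def persona_report(records, threshold=0.5):
    """records: iterable of (sex, age, y_true, risk_score).

    Returns (per-persona F1 dict, macro-F1 over personas, worst-persona F1).
    """
    groups = defaultdict(lambda: ([], []))
    for sex, age, y, score in records:
        truths, preds = groups[(sex, age_band(age))]
        truths.append(y)
        preds.append(1 if score >= threshold else 0)
    per_persona = {key: f1_positive(t, p) for key, (t, p) in groups.items()}
    values = list(per_persona.values())
    return per_persona, mean(values), min(values)


if __name__ == "__main__":
    toy = [("F", 50, 1, 0.9), ("F", 52, 0, 0.2), ("F", 55, 1, 0.7),
           ("M", 70, 1, 0.4), ("M", 68, 1, 0.8), ("M", 72, 0, 0.6)]
    per, macro, worst = persona_report(toy)
    print(per)           # {('F', 'middle-aged'): 1.0, ('M', 'older'): 0.5}
    print(macro, worst)  # 0.75 0.5
```

Reporting the minimum next to the mean is the point of the guard: a model can show a healthy macro-F1 while one persona lags, and the worst-F1 figure surfaces that lagging subgroup directly.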
    AutoGNN pairs near-SOTA discrimination with reproducibility hygiene (fixed splits and budgets, means and SDs over repeated runs, negative controls), transparent subgroup reporting, and structure-aware explanations suitable for audit and deployment; it readily extends to survival objectives, federated training, and knowledge-guided topology learning.
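The reproducibility checks listed above (repeated runs summarized as mean and SD, plus a shuffled-label negative control) can be sketched as follows. `centroid_trainer` is a hypothetical one-feature stand-in for "train AutoGNN with this seed"; in the control arm the training labels are permuted while the model is still scored against the true test labels, so a model that learned genuine signal should fall toward AUROC ≈ 0.5. The AUROC is computed in its pairwise (Mann–Whitney) form, which costs O(n_pos × n_neg) and is fine for a sketch but not for a full cohort.

```python
import random
from statistics import mean, stdev


def auroc(labels, scores):
    """Pairwise AUROC: P(score of a positive > score of a negative), ties count 1/2."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def summarize(values):
    """Mean and sample SD across repeated runs (SD is 0.0 for a single run)."""
    return mean(values), (stdev(values) if len(values) > 1 else 0.0)


def centroid_trainer(xs, ys, seed):
    """Hypothetical stand-in for 'train AutoGNN with this seed' on a single feature.

    Scores each sample by its feature value, signed so that the positive-class centroid
    scores higher. `seed` is unused here; a real trainer would seed its initialisation.
    """
    pos_mean = mean(x for x, y in zip(xs, ys) if y == 1)
    neg_mean = mean(x for x, y in zip(xs, ys) if y == 0)
    direction = 1.0 if pos_mean >= neg_mean else -1.0
    return lambda x: direction * x


def repeat_with_control(train_fn, x_train, y_train, x_test, y_test, seeds):
    """Return (mean, SD) of test AUROC for real-label runs and for shuffled-label controls."""
    real, control = [], []
    for seed in seeds:
        scorer = train_fn(x_train, y_train, seed)
        real.append(auroc(y_test, [scorer(x) for x in x_test]))
        shuffled = list(y_train)
        random.Random(seed).shuffle(shuffled)  # permute the training labels only
        scorer = train_fn(x_train, shuffled, seed)
        control.append(auroc(y_test, [scorer(x) for x in x_test]))  # judged on true labels
    return summarize(real), summarize(control)


if __name__ == "__main__":
    x_tr, y_tr = [0.1, 0.2, 0.3, 0.7, 0.8, 0.9], [0, 0, 0, 1, 1, 1]
    x_te, y_te = [0.15, 0.25, 0.75, 0.85], [0, 0, 1, 1]
    (m, sd), (cm, csd) = repeat_with_control(centroid_trainer, x_tr, y_tr, x_te, y_te,
                                             seeds=range(5))
    print(f"real AUROC {m:.3f} ± {sd:.3f}; shuffled-label control {cm:.3f} ± {csd:.3f}")
```

With this one-feature toy, each control run lands at an extreme (0 or 1) depending on the shuffle, so only the average is informative; with a real model trained on permuted labels, individual control runs would be expected to cluster near chance, as the 0.45–0.60 range reported above suggests.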
    Keywords: type 2 diabetes; risk prediction; biobank; graph neural networks; genetic algorithm; hyperparameter optimization; robustness; fairness; calibration; GNNExplainer; Taiwan Biobank.
    Appears in Collections: [Institute of Systems Biology and Bioinformatics] Electronic Thesis & Dissertation



    All items in NCUIR are protected by copyright, with all rights reserved.
