Abstract: | Cardiovascular disease (CVD) has long been a leading cause of death worldwide and in Taiwan, posing a serious challenge to public health and healthcare resources. According to the latest statistics from the World Health Organization and Taiwan's Ministry of Health and Welfare, CVD claims millions of lives each year, underscoring the importance of prevention and early detection. Low-density lipoprotein cholesterol (LDL-C) is one of the major risk factors for CVD, and its serum concentration is shaped by both genetic and lifestyle factors. Although numerous studies have investigated genetic influences on LDL-C levels, most have focused on European and American populations, and relatively few have addressed Asian populations, particularly the Han Chinese population in Taiwan. This study used large-scale population data from the Taiwan Biobank to conduct a genome-wide association study (GWAS), identifying single nucleotide polymorphisms (SNPs) significantly associated with LDL-C levels. These SNPs were then used as features for machine learning algorithms to construct prediction models. 
The results identified several SNPs strongly associated with LDL-C levels, including variants located at known lipid metabolism loci such as LDLR and APOE. Feature selection and model training analyses showed that all classifiers performed best in high-dimensional feature settings. In addition, in models with a limited number of genetic features, incorporating the clinical variables body mass index (BMI), age, and sex significantly improved classification performance, demonstrating the practical potential of integrating genetic and clinical data. Although this study constructed and internally tested predictive models using Taiwan Biobank data, further validation is required to determine whether the population sample is externally representative and to assess the models' generalizability and applicability in independent populations. Looking forward, this study not only deepens understanding of the genetic risk of LDL-C in the Taiwanese Han population but also lays a foundation for the development of precision medicine and personalized risk assessment tools, and is expected to offer new strategies for the early prevention and management of cardiovascular disease. |