Generating DNA Sequences of COVID-19 and Trending Fitting in Latent Space

Name	陳祈銘 (CHEN, CHI-MING)	Department	Department of Mathematics
Thesis title	COVID-19的DNA病毒序列在潛空間下趨勢擬合和生成突變新DNA病毒序列 (Generating DNA Sequences of COVID-19 and Trending Fitting in Latent Space)
Files	Full text viewable in the system (open after 2025-7-31)
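Abstract (translated from Chinese)	Since the outbreak of COVID-19 in December 2019, epidemic prevention measures have become increasingly important. Because the novel coronavirus is highly transmissible and mutates readily, new strains keep emerging, which reduces vaccine protection and may even allow breakthrough infections. This study focuses on predicting the next stage of mutation in virus strains. The collected data are classified by Nextstrain clade, and the DNA virus sequences from before January 2021, prior to the emergence of the Delta variant, are extracted. A Variational AutoEncoder is used to extract the biological information of these sequences; Gaussian processes with different kernels are then compared in the latent space to fit the mutation trend and to generate new DNA virus sequences for 1 to 6 months ahead. New virus sequences were successfully generated, and they mutated into the Delta strain. This indicates that the Variational AutoEncoder successfully extracts biological information from DNA sequences and supports modeling their evolution over time, which can help with developing new vaccines in advance and predicting new symptoms.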
Abstract (English)	Since the outbreak of COVID-19 in December 2019, epidemic prevention measures have become increasingly important. Given the high transmissibility and mutation rate of the novel coronavirus, new strains have emerged one after another, which reduces vaccine protection and may even allow breakthrough infections. This study focuses on predicting the next stage of mutation in virus strains. The collected data are classified by Nextstrain clade, and the DNA virus sequences from before January 2021, prior to the emergence of the Delta variant, are extracted and encoded with a Variational AutoEncoder. Gaussian processes with different kernels are then compared in the latent space to fit the mutation trend and to generate new DNA virus sequences for 1 to 6 months ahead. New virus sequences were successfully generated, and they mutated into the Delta strain. This shows that the Variational AutoEncoder can successfully extract the biological information of DNA sequences and that their evolution over time can be modeled, which is helpful for developing new vaccines in advance and predicting new symptoms.
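The latent-space trend-fitting step described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: the 2-D latent coordinates are synthetic stand-ins for VAE encodings, and the function names (`rbf_kernel`, `linear_kernel`, `gp_posterior_mean`), the length scale, and the noise level are all hypothetical choices made for the example.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=2.0):
    """Squared-exponential kernel between 1-D time arrays a and b."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)

def linear_kernel(a, b):
    """Linear kernel with a constant offset term between 1-D time arrays."""
    return 1.0 + a[:, None] * b[None, :]

def gp_posterior_mean(t_train, z_train, t_test, kernel, noise=1e-2):
    """Posterior mean of a zero-mean GP, fitted to every latent dimension at once."""
    K = kernel(t_train, t_train) + noise * np.eye(len(t_train))
    K_star = kernel(t_test, t_train)
    return K_star @ np.linalg.solve(K, z_train)

# Hypothetical data: one mean latent coordinate (2-D) per month of encoded sequences.
rng = np.random.default_rng(0)
t_train = np.arange(12, dtype=float)                    # months 0..11
trend = np.column_stack([0.5 * t_train, -0.2 * t_train])
z_train = trend + 0.05 * rng.standard_normal((12, 2))   # noisy latent trajectory
t_test = np.arange(12, 18, dtype=float)                 # 1 to 6 months ahead

# Compare how two kernels extrapolate the latent trajectory.
for name, kernel in [("rbf", rbf_kernel), ("linear", linear_kernel)]:
    z_pred = gp_posterior_mean(t_train, z_train, t_test, kernel)
    print(name, z_pred.shape)
```

In a full pipeline, the extrapolated latent points would be passed through the VAE decoder to obtain candidate sequences. The choice of kernel matters for extrapolation: a zero-mean GP with an RBF kernel reverts toward zero away from the training range, whereas a linear kernel continues the fitted trend.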
Keywords (Chinese)	★ 新冠病毒 ★ 變分自編碼器 ★ 高斯過程迴歸 ★ DNA 病毒序列突變	Keywords (English)	★ COVID-19 ★ Variational AutoEncoder ★ Gaussian Process Regression ★ DNA virus sequence mutation
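Table of contents	Abstract (Chinese)
	Abstract (English)
	Acknowledgements
	Contents
	1. Introduction
	2. DNA sequence data sources
	  2.1 Data download
	  2.2 Data description
	3. Data preprocessing (multiple sequence alignment)
	  3.1 Edit distance between sequences (Levenshtein distance)
	  3.2 Alignment method used in Clustal Omega
	    3.2.1 Scoring function
	    3.2.2 Gap penalty
	    3.2.3 Combined gap penalty
	4. Models
	  4.1 Variational AutoEncoder (VAE)
	    4.1.1 Overview
	    4.1.2 Mathematical perspective
	    4.1.3 Model construction
	  4.2 Gaussian process
	    4.2.1 Kernel method
	    4.2.2 Weight-space view
	    4.2.3 Function-space view
	    4.2.4 Common kernels
	5. Results
	  5.1 Differences in DNA sequence alignment
	  5.2 Variational AutoEncoder training
	  5.3 Gaussian process kernel comparison results
	    5.3.1 Choice of L and comparison between kernels
	6. Conclusion
	References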
Advisors	洪盟凱, 周世偉	Approval date	2023-6-29

Master's and doctoral theses: detailed record for 109221018