Master's/Doctoral Thesis 106552024: Detailed Record




Author: 宋政洋 (Jheng-yang Sung)    Department: Computer Science and Information Engineering (In-service Master Program)
Title: A Study Integrating Deep Learning Methods to Predict Age and Aging-Associated Genes
(Deep learning approach for predicting aging-associated genes)
Related theses:
★ Predicting Alzheimer's Disease Progression and Stroke Surgery Survival with Deep Learning
★ Predicting Cancer Type, Survival and Mortality, and Cure and Recurrence with Deep Learning
Full text availability: permanently restricted (never open access)
Abstract (Chinese): Deep learning is the foundation of many modern artificial intelligence (AI) applications. Since its breakthrough results in speech recognition and image recognition, its use in other fields has grown at a remarkable pace. Applications in biomedicine have followed, such as cancer detection and bioinformatics analysis, and deep learning has also contributed substantially to aging research in biology. This thesis uses RNA-seq data from The Genotype-Tissue Expression (GTEx) project, produced by next-generation sequencing; because this technology is fast, high-throughput, and covers a wide detection range, it yields more accurate measurements of gene expression.
This thesis has three main directions:
1. Classifying and predicting the age group of tissue donors for lung, liver, cerebellum, heart, and whole-blood samples
2. Comparing deep-neural-network results under different activation functions and loss functions
3. Extracting tissue-associated gene sets by statistical analysis to investigate aging factors
The experiments use machine learning methods such as Ridge Regression, Decision Tree, Random Forest, and Support Vector Machine. To compare the recognition rates of these methods, deep learning approaches, including a Deep Neural Network and an auto-encoder, are also added. Finally, statistical analysis is applied to investigate latent aging-related factors shared across tissues.
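As an illustration of this model comparison (a minimal sketch, not the thesis's actual pipeline), the following scikit-learn snippet trains the four classical classifiers on a gene-expression matrix and reports cross-validated accuracy. The file name `gtex_expression.csv`, the `age_group` label column, and all hyperparameters are hypothetical placeholders.

```python
# Minimal sketch: comparing classical classifiers on gene-expression data.
# Assumes a hypothetical CSV with one row per sample, gene expression
# columns, and an "age_group" label column; not the thesis's actual pipeline.
import pandas as pd
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import RidgeClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC

df = pd.read_csv("gtex_expression.csv")        # hypothetical file
X = df.drop(columns=["age_group"]).values      # gene expression features
y = df["age_group"].values                     # e.g. "20-29", ..., "70-79"

models = {
    "Ridge":         RidgeClassifier(alpha=1.0),
    "Decision Tree": DecisionTreeClassifier(max_depth=10),
    "Random Forest": RandomForestClassifier(n_estimators=200),
    "SVM (RBF)":     SVC(kernel="rbf", C=1.0),
}

for name, model in models.items():
    pipe = make_pipeline(StandardScaler(), model)   # scale, then classify
    scores = cross_val_score(pipe, X, y, cv=5)      # 5-fold cross-validation
    print(f"{name:15s} accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")
```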
Abstract (English): Deep learning is the foundation of many modern artificial intelligence (AI) applications. Since its breakthroughs in speech recognition and image recognition, it has spread to other fields at an extremely fast rate. In biomedicine, deep learning methods are now widely applied, for example in cancer detection and bioinformatics analysis, and they have also contributed significantly to aging research. This thesis uses RNA-seq data from The Genotype-Tissue Expression (GTEx) project, obtained by next-generation sequencing; its high speed, high throughput, and wide detection range give it better accuracy in measuring gene expression.
This thesis has three main directions:
1. Classification and prediction of age groups from normal tissues
2. Comparison of results across different activation functions and loss functions
3. Extraction of tissue-associated gene sets by statistical analysis
For the experiments, machine learning methods such as Ridge Regression, Decision Tree, Random Forest, and Support Vector Machine are used. To compare the recognition rates of each method, deep learning methods including a deep neural network and an auto-encoder are also added. Finally, statistical analysis is used to investigate latent aging-related factors across tissues.
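To make the second direction concrete, here is a minimal Keras sketch of a fully connected network for six-class age-group prediction, written so that the activation function and loss function can be swapped for comparison. The layer sizes, training data, and hyperparameters are illustrative assumptions, not the architecture or settings used in the thesis.

```python
# Minimal sketch: a fully connected network for six-class age-group
# prediction, parameterized so the activation and loss can be swapped.
# Layer sizes and data are illustrative, not the thesis's settings.
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

def build_model(n_genes, n_classes=6, activation="relu",
                loss="categorical_crossentropy"):
    model = keras.Sequential([
        layers.Input(shape=(n_genes,)),
        layers.Dense(512, activation=activation),
        layers.Dropout(0.3),
        layers.Dense(128, activation=activation),
        layers.Dense(n_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam", loss=loss, metrics=["accuracy"])
    return model

# Hypothetical data: 500 samples x 2000 genes, labels in {0, ..., 5}.
X_train = np.random.rand(500, 2000).astype("float32")
y_train = keras.utils.to_categorical(np.random.randint(0, 6, 500), 6)

# Compare two activation functions under the same loss.
for act in ("relu", "selu"):
    model = build_model(n_genes=2000, activation=act)
    history = model.fit(X_train, y_train, epochs=5, batch_size=32, verbose=0)
    print(act, "final training accuracy:",
          round(history.history["accuracy"][-1], 3))
```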
Keywords: ★ Deep learning
★ Machine learning
★ Weighted gene co-expression network analysis
★ Medical prediction
Table of Contents:
Chinese Abstract
Abstract
List of Figures
List of Tables
Table of Contents
Chapter 1: Introduction
1.1 Research Background, Motivation, and Objectives
1.2 Research Methods and Chapter Overview
Chapter 2: Related Work and Literature Review
2.1 Deep Learning Architecture
2.1.1 Perceptron Principles
2.1.2 Back-Propagation Neural Networks
2.1.3 Multilayer Perceptron Architecture
2.2 Classification Methods
2.2.1 Support Vector Machine (SVM)
2.2.2 Ridge Regression
2.2.3 Decision Tree
2.2.4 Random Forest
Chapter 3: Data Feature Processing
3.1 Principal Component Analysis (PCA)
3.2 Auto-encoder
Chapter 4: Overall Experimental Framework and Methods
4.1 GTEx Dataset
4.2 Data Normalization Methods
4.3 Activation Functions
Chapter 5: Experiments and Results
5.1 Experimental Setup and Environment
5.2 Deep Learning Analysis
5.2.1 Experimental Workflow
5.2.2 Six-Class Age-Group Prediction
5.2.3 Two-Class Age-Group Prediction
5.2.4 Comparison of Activation Functions
5.2.5 Comparison of Loss Functions
5.2.6 Discussion of Deep Learning Results
5.3 Weighted Gene Co-expression Network Analysis
5.3.1 Experimental Workflow
5.3.2 Data Preprocessing
5.3.3 Gene Hierarchical Clustering Analysis
5.3.4 Gene Module-Trait Analysis
5.3.5 Inter-Module Correlation Analysis
5.3.6 Downstream Analysis of Gene Modules
5.3.7 Gene Co-expression Network Analysis
Chapter 6: Conclusions and Future Work
References
Advisors: 王家慶 (Jia-Ching Wang), 許藝瓊 (Yi-Chiung Hsu)    Date of approval: 2019-08-12