Developing a CNN Model to Predict Student Dropout: An Exercise in Building AI Models for Detecting SNPs from NGS Short-Read Data

Name

蔡婷安 (Ting-An Tsai)

Department

Institute of Systems Biology and Bioinformatics

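Related Theses

★ Developing an enzyme-unrestricted method for genome-wide analysis of regulatory elements
★ A study of Chinese herbal medicine prescriptions for common complex diseases using the National Health Insurance database
★ Cases of subjectivity influencing healing and a discussion of the importance of subjectivity in medicine
★ Differences in DNA methylation networks between schizophrenia patients and healthy controls
★ Network analysis of DNA methylation in the sperm of bipolar disorder patients
★ Cloud-R: a biostatistics application website based on R and cloud technology
★ The relationship between the medicinal properties of Chinese herbs and their phylogenetic tree
★ Analyzing tissue-specific genes in normal human tissues using hierarchical clustering and different classification methods
★ Analysis of DNase I and histone modifications from the ENCODE project
★ Hair detection and removal in skin mole images
★ Traditional Chinese medicine cancer prescriptions consist mostly of formulas for abscesses and sores, harmonizing formulas, and cold formulas, and their composition changes as the temperature drops
★ Application and implementation of principal component analysis and cluster analysis in preprocessing DNA microarray data
★ Identifying environmental and socioeconomic variables associated with traditional Chinese medicine prescriptions
★ Identification and analysis of the network of socioeconomic variables associated with traditional Chinese medicine prescriptions
★ Deep Q-network learning for sepsis treatment in the intensive care unit
★ Comparing the performance of linear models, multilayer perceptrons, and convolutional neural networks in regression analysis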

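The author has agreed to make this electronic thesis openly available immediately.
Openly available electronic full texts are licensed to users only for personal, non-profit searching, reading, and printing for the purpose of academic research.
Please comply with the Copyright Act of the Republic of China. Do not reproduce, distribute, adapt, repost, or broadcast this work without authorization, to avoid breaking the law.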

Abstract (Chinese)

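Artificial intelligence has developed rapidly in recent years. TensorFlow is one of the major deep learning frameworks: users can readily apply the deep learning algorithms in its library and start building models, which makes deep learning easy to pick up. Using deep learning to analyze big data has become a trend, and the Convolutional Neural Network (CNN) performs very well in image recognition. We believe this technology can also be of practical help in school systems. This thesis describes how to apply CNNs to student grade analysis data and build a model that predicts, from a student's course grades in one semester, whether the student will be dismissed in the next semester.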
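After obtaining the "student grade analysis data", we observed that the "QUIT REASON" column contains two dismissal reasons that are directly related to grades: "failing 1/2 of credits for the second cumulative time" and "failing 2/3 of credits for the second cumulative time". Using R, we took the grade records of all undergraduate students, listed each student's grades for every semester, and converted them into images, labeling each image by whether the student was dismissed in the following semester. We then built CNN models in R with Keras, adjusted the model parameters, and tried the settings one by one to find the model best suited to the "student grade analysis data".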
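The encoding step described above (one semester of grades becomes one labeled image) can be sketched roughly as follows. This is not the thesis code: the thesis works in R, and this record does not reproduce its preprocessing script. The sketch uses Python for brevity and assumes a 4x4 grayscale grid, scores on a 0 to 100 scale, and zero padding for unused cells. The names `grades_to_image` and `make_example` are hypothetical.

```python
from typing import List, Sequence, Tuple


def grades_to_image(
    grades: Sequence[float], side: int = 4, max_score: float = 100.0
) -> List[List[float]]:
    """Pack one semester's course scores into a side x side grid in [0, 1].

    Assumption: one pixel per course, filled row by row.
    """
    if len(grades) > side * side:
        raise ValueError("too many courses for the chosen image size")
    # Clip each score into [0, max_score], then scale it to [0, 1].
    pixels = [min(max(g, 0.0), max_score) / max_score for g in grades]
    # Pad unused cells with 0.0 so every image has the same shape.
    # Note: this makes "no course" look the same as a score of zero.
    pixels += [0.0] * (side * side - len(pixels))
    return [pixels[r * side:(r + 1) * side] for r in range(side)]


def make_example(
    grades: Sequence[float], dismissed_next_semester: bool
) -> Tuple[List[List[float]], int]:
    """Pair the image with a binary label: 1 = dismissed next semester."""
    return grades_to_image(grades), int(dismissed_next_semester)


image, label = make_example([85, 42, 60, 0, 73], dismissed_next_semester=False)
```

A fixed image shape matters because a CNN expects every input to have the same dimensions. How the thesis actually laid out the pixels may differ from this row-by-row layout.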
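After trying models with many different parameter settings, we found no consistent trend in the parameters: for example, increasing the number of filters or epochs does not necessarily improve model performance. Building a CNN model requires trying many settings, guided by experience, and selecting the best model from among them.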
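Because performance does not rise steadily with any single parameter, each combination has to be scored on its own. A minimal sketch of such an exhaustive search is shown below, in Python rather than the R used in the thesis. `grid_search` and `toy_score` are hypothetical names. `toy_score` is a made-up stand-in for "train a CNN and return its validation accuracy", and its numbers are not results from the thesis.

```python
from itertools import product
from typing import Callable, Dict, Iterable, Tuple


def grid_search(
    score_fn: Callable[[int, int], float],
    filters_options: Iterable[int],
    epochs_options: Iterable[int],
) -> Tuple[Tuple[int, int], Dict[Tuple[int, int], float]]:
    """Score every (filters, epochs) pair and return the best pair and all scores."""
    results = {}
    for filters, epochs in product(filters_options, epochs_options):
        results[(filters, epochs)] = score_fn(filters, epochs)
    best = max(results, key=results.get)
    return best, results


def toy_score(filters: int, epochs: int) -> float:
    """Made-up, non-monotonic score used only to exercise grid_search."""
    return 1.0 - abs(filters - 32) / 100 - abs(epochs - 20) / 100


best, results = grid_search(toy_score, [16, 32, 64], [10, 20, 40])
print(best)  # the pair with the highest toy score
```

In real use, `score_fn` would train a model with the given settings and evaluate it on held-out data, so the selected model is not simply the one that memorized the training set.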

Abstract (English)

Artificial intelligence has developed rapidly in recent years. TensorFlow is one of the major deep learning frameworks: users can readily apply its deep learning algorithms and start building models, which makes deep learning easy to get started with. Using deep learning to analyze big data has become a trend, and Convolutional Neural Networks (CNNs) perform very well in image recognition. We believe that applying this technology in the school system can be of practical help. This thesis describes how to apply CNNs to student grade analysis data and build a model that predicts whether a student will be dismissed in the next semester.
After obtaining the student grade analysis data, we observed two dismissal reasons in the "QUIT_REASON" column that are directly related to grades: "two cumulative 1/2 credit failures" and "two cumulative 2/3 credit failures". We used R for all data processing. Working with the records of all undergraduate students, we listed each student's grades for every semester and converted them into images. Each image carries a label indicating whether the student was dismissed in the following semester. We first built CNN models in R with Keras, then adjusted the model parameters, and finally tried the settings one by one to find the model best suited to the student grade analysis data.
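Both grade-related dismissal reasons are thresholds on the share of credits failed in a semester, counted across semesters. The following Python sketch shows one way such a rule could be evaluated. It is an illustration only, not the school's regulation or the thesis code. It assumes a pass mark of 60, that reaching the threshold counts as a strike, and that strikes accumulate without needing to be consecutive. `failed_fraction` and `is_dismissed` are hypothetical names.

```python
from typing import Sequence, Tuple

Course = Tuple[float, float]  # (credits, score)


def failed_fraction(courses: Sequence[Course], pass_mark: float = 60.0) -> float:
    """Return the share of attempted credits that were failed in one semester."""
    total = sum(credits for credits, _ in courses)
    if total == 0:
        return 0.0
    failed = sum(credits for credits, score in courses if score < pass_mark)
    return failed / total


def is_dismissed(
    semesters: Sequence[Sequence[Course]], threshold: float = 0.5, strikes: int = 2
) -> bool:
    """True once the failed-credit share has reached `threshold` in `strikes` semesters."""
    count = 0
    for courses in semesters:
        if failed_fraction(courses) >= threshold:
            count += 1
            if count >= strikes:
                return True
    return False


history = [
    [(3, 40), (3, 50), (2, 90)],  # 6 of 8 credits failed
    [(3, 80), (3, 70)],           # nothing failed
    [(3, 30), (2, 20), (2, 95)],  # 5 of 7 credits failed
]
print(is_dismissed(history))
```

Passing `threshold=2/3` gives the second variant of the rule under the same assumptions.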
After trying models with various parameter settings, we found no consistent trend in the parameters: for example, increasing the number of filters or epochs does not necessarily improve model performance. Building a CNN model therefore requires trying many settings, guided by experience, to find the best model.
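For readers unfamiliar with the terms, "filters" is the number of small kernels a convolutional layer learns, and "filter size" is the height and width of each kernel. The Python sketch below shows what a single filter computes on a single-channel image, with no padding and a stride of 1. It is a generic illustration of the operation, not code from the thesis, and `conv2d_valid` is a hypothetical name.

```python
from typing import List

Matrix = List[List[float]]


def conv2d_valid(image: Matrix, kernel: Matrix) -> Matrix:
    """Slide one kernel over the image and sum the element-wise products.

    As in most deep learning libraries, the kernel is not flipped
    (this is technically cross-correlation).
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(
                image[r + i][c + j] * kernel[i][j]
                for i in range(kh)
                for j in range(kw)
            )
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
kernel = [[1, 0], [0, 1]]  # a 2x2 filter that adds each pixel to its lower-right neighbour
feature_map = conv2d_valid(image, kernel)
print(feature_map)  # [[6, 8], [12, 14]]
```

A layer with more filters produces more feature maps of this kind, and a larger filter size shrinks each map when no padding is used. Both change the number of trainable weights, which is one reason larger values do not automatically give better results.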

Keywords (Chinese)

★ Convolutional neural network
★ Dropout prediction
★ Deep learning

Keywords (English)

★ Dropout prediction
★ Deep Learning
★ Convolutional Neural Network
★ CNN

Table of Contents

Abstract (Chinese)
Abstract (English)
Acknowledgments
Contents
List of Figures
List of Tables
1. Introduction
1-1 Artificial Intelligence
1-2 Deep Learning
1-2-1 Deep Neural Networks
1-2-2 Activation Functions
1-2-3 Loss Function
1-2-4 Optimizer
1-3 Convolutional Neural Network
1-3-1 Deep Convolutional Neural Network: AlexNet
1-4 DeepVariant
1-4-1 Single Nucleotide Polymorphism (SNP)
1-5 Student Grade Analysis Data
1-6 Research Motivation
2. Research Content and Methods
2-1 Data Preprocessing
2-2 Keras
2-2-1 Installing Keras
2-3 CNN Model Implementation
2-3-1 Building the Model
2-3-2 Compiling (compile)
2-3-3 Training the Model (training)
2-3-4 Evaluating the Model (evaluate)
2-3-5 Prediction (predict)
3. Results
3-1 Finding the Best Model
3-1-1 Epochs
3-1-2 Weight
3-1-3 Filter Size
3-1-4 Filters
3-2 Model Performance
4. Conclusion
References
Appendix 1: R Code for Data Preprocessing
Appendix 2: R Code for Building the CNN Model
Appendix 3: Structure of the CNN Model in This Thesis

Advisor

王孫崇 (Sun-Chong Wang)

Approval Date

2020-7-30
