Applying Ensemble Methods to Improve the Accuracy of Students' Academic Performance Prediction Models

Name

Shun-Ze Jheng (鄭舜澤)

Department

Department of Computer Science and Information Engineering

Thesis Title

Applying Ensemble Method to Improve the Performance on Predicting Students' Academic Performance
(Original Chinese title: 透過Ensemble method提升學生學習成效預測模型的準確度)

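Related Theses

★ Applying Intelligent Classification to Improve Article Publishing Efficiency on an Enterprise Knowledge-Sharing Platform
★ Research and Implementation of Smart Home Control
★ Design and Verification of a Search Mechanism for an Open Surveillance Video Management System
★ Applying Data Mining to Build an Early-Warning Mechanism for Slow-Moving Inventory
★ Analysis of Learning Behavior under a Problem-Solving Model
★ A General Management Information System for Information Systems and Electronic Approval Workflows
★ Applying a Manufacturing Execution System to the Analysis and Handling of Semiconductor Equipment Downtime Notifications
★ Research and Implementation of Apple Pay Payments on the iOS Platform
★ Applying Cluster Analysis to Investigate the Effect of Learning Patterns on Learning Outcomes
★ Applying Sequence Mining to Analyze the Effect of Video-Viewing Patterns on Learning Outcomes
★ A QoS-Based Optimization Method for Web Service Selection
★ A Survey of Learner Satisfaction with a Wikipedia Knowledge Recommendation System for e-Portfolio Users
★ Students' Learning Motivation, Internet Self-Efficacy, and System Satisfaction: The Case of e-Portfolio
★ Improving English Learning Outcomes with Automated Conversational Agents in Second Life
★ The Effect of Collaborative Information Searching on Students' Individual Web Search Ability and Strategies
★ The Effect of Digital Annotation on Learners' Reflection Levels in Online Learning Environments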

Full Text


Online access to the thesis: never open to the public

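Abstract (Chinese)

The rapid growth of online education platforms has transformed traditional forms of education. As more educational institutions adopt online platforms to provide learning resources, ensuring the quality of student learning has become increasingly important. To help instructors track students' learning status and intervene with timely tutoring, many researchers have introduced educational data mining (EDM) into educational settings, using machine learning and statistics to explore students' learning behavior and predict their learning performance. Accordingly, this study analyzes learning behavior to build an effective and accurate learning-performance prediction model, with the aim of safeguarding educational quality on online education platforms.
This study collected the learning records of three different calculus classes, including online behavior on the Open edX and MapleTA platforms as well as offline paper-based homework and tests. Prior studies generally treat learning-performance prediction as one of two problems, regression or classification, so this study examines both approaches and compares the proposed ensemble methods against common algorithms, showing that ensemble methods further improve prediction accuracy. For regression, the study first identifies the most stable and accurate of six common regression algorithms, then builds on it by classifying data points and adding indicator variables. For classification, the study adds resampling and a voting mechanism to the model-building process to address class imbalance in the original dataset and the limited predictive performance of any single algorithm. Finally, both ensemble methods were applied to the three calculus classes, showing that they do yield an improvement.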
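The regression approach described above (pick a base regressor, then let a classifier assign each data point an indicator variable) can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: it assumes scikit-learn, uses `LinearRegression` as the base model because the abstract does not name the algorithm that was selected, uses a `DecisionTreeClassifier` to assign the indicator, and defines the groups with a hypothetical score `threshold` (for example a passing mark). The data are synthetic.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.tree import DecisionTreeClassifier


class IndicatorRegression:
    """Base regressor plus a 0/1 indicator variable assigned by a classifier.

    Hypothetical design: the indicator marks scores at or above `threshold`.
    """

    def __init__(self, threshold):
        self.threshold = threshold
        self.classifier = DecisionTreeClassifier(max_depth=3, random_state=0)
        self.regressor = LinearRegression()

    @staticmethod
    def _with_indicator(X, group):
        # Append the indicator as one extra column: each group gets its own intercept.
        return np.column_stack([X, group])

    def fit(self, X, y):
        group = (y >= self.threshold).astype(int)  # true groups, known from the scores
        self.classifier.fit(X, group)              # learn to assign groups from features
        self.regressor.fit(self._with_indicator(X, group), y)
        return self

    def predict(self, X):
        group = self.classifier.predict(X)         # scores unknown: use predicted groups
        return self.regressor.predict(self._with_indicator(X, group))


# Toy usage on synthetic data (not the thesis data): two score levels
# separated by a jump that a single straight line cannot follow.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(200, 1))
y = 2 * X[:, 0] + 10 * (X[:, 0] > 0) + rng.normal(0, 0.5, size=200)
X_train, X_test, y_train, y_test = X[:150], X[150:], y[:150], y[150:]

plain = LinearRegression().fit(X_train, y_train)
ensemble = IndicatorRegression(threshold=5).fit(X_train, y_train)
plain_mse = mean_squared_error(y_test, plain.predict(X_test))
ensemble_mse = mean_squared_error(y_test, ensemble.predict(X_test))
print(f"held-out MSE: plain={plain_mse:.2f}, with indicator={ensemble_mse:.2f}")
```

Because the indicator enters the model as a single extra column, each group gets its own intercept while sharing the slopes; adding interaction terms would let the slopes differ too. At prediction time the true score is unknown, so the classifier's output stands in for the group label, which means classification errors feed directly into regression error.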

Abstract (English)

The rapid development of online education platforms has changed traditional education. With the adoption of Massive Open Online Courses (MOOCs) in many educational institutions, ensuring the quality of student learning has become increasingly important. To help instructors keep track of student progress and provide interventions for at-risk students, many researchers have introduced Educational Data Mining (EDM) into educational environments, applying machine learning and statistics not only to explore students' learning behaviors but also to predict their academic performance. This study therefore analyzes learning behavior to build an effective and accurate model for predicting student academic performance. Real data were collected from three MOOC- and MapleTA-enabled calculus courses and comprise video-viewing behavior, online assessment behavior, homework scores, and exam scores.
Researchers generally predict student academic performance with either regression algorithms or classification algorithms. This study therefore explores both approaches separately and proposes ensemble methods that outperform common algorithms. For regression, the study first identifies the most stable and accurate of six common regression algorithms, then improves it by using a classifier to assign indicator variables to data points. For classification, we add resampling and a voting classifier to address the class imbalance in the data and the weak performance of single algorithms. Finally, both ensemble methods were applied to three calculus classes, demonstrating that they do achieve an improvement.
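The classification approach described above (resample the training set to counter class imbalance, then combine several classifiers by voting) can be sketched in the same way. Again this is an assumption-laden sketch rather than the thesis's code: it uses plain random oversampling through `sklearn.utils.resample` in place of SMOTE- or ADASYN-style resampling, a hard-voting `VotingClassifier` over three of the classifiers the thesis compares (Gaussian Naive Bayes, logistic regression, random forest), and synthetic three-class data with a 70/20/10 class split standing in for real grade levels.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.utils import resample


def oversample(X, y, random_state=0):
    """Randomly duplicate rows of smaller classes until every class matches the largest."""
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    X_parts, y_parts = [], []
    for cls in classes:
        X_cls, y_cls = X[y == cls], y[y == cls]
        extra = target - len(y_cls)
        if extra > 0:
            # Draw the missing rows with replacement from this class only.
            X_extra, y_extra = resample(X_cls, y_cls, replace=True,
                                        n_samples=extra, random_state=random_state)
            X_cls = np.vstack([X_cls, X_extra])
            y_cls = np.concatenate([y_cls, y_extra])
        X_parts.append(X_cls)
        y_parts.append(y_cls)
    return np.vstack(X_parts), np.concatenate(y_parts)


def build_voter():
    """Combine three different classifiers by majority vote."""
    return VotingClassifier(
        estimators=[
            ("gnb", GaussianNB()),
            ("lr", LogisticRegression(max_iter=1000)),
            ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
        ],
        voting="hard",  # each model casts one vote for a label; the majority wins
    )


# Toy usage on synthetic, imbalanced three-class data (not the thesis data).
X, y = make_classification(n_samples=600, n_features=8, n_informative=5,
                           n_classes=3, weights=[0.7, 0.2, 0.1], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

X_bal, y_bal = oversample(X_train, y_train)  # resample the training split only
voter = build_voter().fit(X_bal, y_bal)
y_pred = voter.predict(X_test)
print("macro F1:", round(f1_score(y_test, y_pred, average="macro"), 3))
```

Resampling is applied only to the training split, so the test set keeps the original class distribution; oversampling before splitting would leak duplicated rows into the evaluation and inflate the scores.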

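Keywords (Chinese)

★ Learning performance prediction
★ Multiple regression
★ Principal component regression
★ Indicator function
★ Learning risk identification
★ Multiclass classification
★ Resampling
★ Voting mechanism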

Keywords (English)

★ Students' academic performance prediction
★ Multiple Regression
★ Principal Component Regression
★ Indicator variables
★ At-risk Students Identification
★ Multiclass Classification
★ Resampling
★ VotingClassifier

Table of Contents

Abstract (Chinese)
Abstract (English)
List of Figures
List of Tables
1. Introduction
2. Literature Review
2.1 Multiple Regression Algorithms
2.2 Multi-class Classification Algorithms
2.3 Summary
3. Research Methods
3.1 System Flowchart
3.2 Data Collection
3.2.1 Feature Description
3.2.2 Data Pre-processing
3.2.2.1 Imputing Missing Values
3.2.2.2 Data Integration
3.2.2.3 Data Standardization
3.3 Data Storage
3.4 Information Extraction and Analysis
3.4.1 Feature Selection
3.4.1.1 Pearson Correlation Coefficient
3.4.1.2 Single-Variable Classifier
3.4.2 Multiple Regression
3.4.2.1 Multiple Linear Regression (MLR)
3.4.2.2 Classification and Regression Trees (CART)
3.4.2.3 Quantile Regression
3.4.2.4 Robust Regression
3.4.2.5 Support Vector Regression (SVR)
3.4.2.6 Principal Component Regression (PCR)
3.4.3 Multiclass Classification
3.4.3.1 Gaussian Naive Bayes (GaNB)
3.4.3.2 Support Vector Machine: Linear SVC
3.4.3.3 Support Vector Machine: SVC
3.4.3.4 Logistic Regression
3.4.3.5 Decision Tree
3.4.3.6 Random Forest
3.4.3.7 Neural Network
3.4.4 Cross-Validation
3.4.5 Model Evaluation
3.4.5.1 Predictive Mean Squared Error (pMSE)
3.4.5.2 Adjusted R-squared (Adj. R²)
3.4.5.3 Confusion Matrix
3.4.5.4 Accuracy
3.4.5.5 Precision
3.4.5.6 Recall
3.4.5.7 F1-measure
3.5 Information Application
4. Ensemble Method Design and Experimental Results
4.1 Multiple Regression Prediction Models
4.2 Multiple Regression: Ensemble Method Experimental Design
4.2.1 Existing Problems
4.2.2 Removing Outliers
4.2.2.1 User-Defined Outliers
4.2.2.2 Influence Points
4.2.3 Adding Indicator Variables
4.2.3.1 Assigning Indicator Variables to Data Points with Clustering Algorithms
4.2.3.2 Assigning Indicator Variables to Data Points by Manual Grouping
4.2.3.3 Assigning Indicator Variables to Data Points with Classification Algorithms
4.3 Multiclass Classification Prediction Models
4.4 Multiclass Classification: Ensemble Method Experimental Design
4.4.1 Existing Problems
4.4.2 Voting Classifier
4.4.3 Resampling Techniques
5. Conclusions and Future Research
6. References

Advisor

楊鎮華

Approval Date

2018-07-12
