Abstract: | Learning analytics aims to help learners succeed in a course by drawing on the digital footprints they generate in the learning environment. In practice, many researchers advocate early identification of at-risk students, followed by timely learning interventions, as the main approach. This has led to a trend of training machine-learning models to identify at-risk students. However, a systematic review of the recent literature shows that many studies overlook important details of model training. These include whether learning risk can feasibly be predicted early, and how reducing data dimensionality can both improve model accuracy and reveal the key factors affecting students' performance. They also include how imbalanced risk classes affect the model, that is, the share of failing students produced by the instructor's grading policy. This thesis therefore first examines how learning-risk prediction is currently defined. It then collects data from 12 courses on an online learning platform and applies supervised and unsupervised learning, in the form of classification, regression, and clustering, to fill these gaps, exploring the data through slicing, feature engineering, and resampling. Cross-validation and comparison with current practice confirm that students' key learning behaviors on the platform allow their risk level to be identified within the first third of the semester. The study also identifies three grading policies commonly used by instructors (discriminating, stringent, and lenient) and describes how each affects model performance and how to mitigate these effects. In particular, resampling is necessary to avoid problems caused by the instructor's grading policy.
We also identify two limitations on applying machine learning in real courses. First, a risk identification model can be reused across courses of the same type only if course length, learning materials, homework, quizzes, learning activities, and grading policy are consistent. Second, the model cannot identify at-risk students who leave only a small digital footprint, so instructors still need to apply suitable interventions to this group. If quiz results are used to supplement the model, the quizzes must have good item discrimination for the identification to be accurate. |