dc.description.abstract | With the progress of information technology in the 21st century, the software, hardware, facilities, and teaching resources in schools have been gradually enriched, and e-learning platforms have emerged as a result, promoting the integration of virtual and in-person teaching. In recent years, the COVID-19 pandemic shifted teaching from physical classrooms to online settings and gradually made online learning the norm. This not only protected students’ right to education during home quarantine, but also made the interaction between teachers and students richer and more diverse. While students learn, an online learning platform records their learning-related activities, such as viewing teaching materials or taking online quizzes, in a log; this log is known as the students’ learning process record. These records contain key information that can affect learning results. Therefore, if an effective method can be developed to analyze the correlation between students’ learning behavior and academic performance on an online learning system and to predict learning results, teachers can provide early remedial teaching to students identified as performing poorly, intervene in their learning behaviors, and adjust teaching content appropriately, thereby improving learning results. Most scholars today identify students’ learning results through data mining. However, when a traditional data mining classifier is applied to data in which students with poor learning results form the minority class, it may classify all samples as having good results.
The main reason is that when a classifier is built from imbalanced data, the rules it learns are dominated by the majority class and are unfavorable to the minority class, so the small number of students with poor learning results may be judged as having good learning results. Some studies suggest under-sampling, which reduces the number of majority-class samples, to address the imbalance. However, removing too much data on a cluster basis may discard important information in the majority class. Therefore, some scholars suggest over-sampling, which increases the number of minority-class samples. The Synthetic Minority Oversampling Technique (SMOTE) proposed by Chawla et al. is the most widely used. It increases the number of minority-class samples by synthesizing new data points through linear interpolation between a minority-class sample and randomly chosen minority-class nearest neighbors in the original data set. Additionally, Haibo He, Yang Bai, Edwardo A. Garcia, and Shutao Li proposed the Adaptive Synthetic Sampling Approach (ADASYN) in 2008, which assigns a weight to every minority-class sample: the more majority-class samples among a minority-class sample’s neighbors, the higher its weight. These weights determine how many samples are synthesized for each minority-class sample. SMOTE oversamples all minority-class samples alike, but not all of them are hard to discriminate; the hard ones are those mixed with majority-class samples. After oversampling, minority-class samples close to the class boundary are mixed with majority-class samples, which easily produces noise. If a classifier is trained on these boundary samples, majority-class samples may be misjudged as minority-class samples.
Therefore, two data cleaning methods, Tomek Links and Edited Nearest Neighbours (ENN), have been used to remove overlapping data after sampling, so as to make minority-class samples easier to discriminate.
In this study, the learning records of the Python programming course offered in the first semester of the 2021 academic year at National Central University were used to predict students’ learning effectiveness on the final exam. In the serialized learning process data, the numbers of students who passed and who failed were highly disproportionate. Training and predicting with such imbalanced data is likely to prevent the algorithm from learning effectively and to lead to the accuracy paradox, in which high overall accuracy masks poor performance on the minority class. In this study, over-sampling was applied to balance the proportion of minority-class and majority-class samples, addressing the imbalance in the learning process records so that students’ learning results could be predicted effectively. In addition, overlapping samples between classes were eliminated by under-sampling, so that each sample’s nearest neighbors belong to the same class, thereby improving the classification performance of the algorithms. Then, six algorithms (SVM, Logistic Regression, Random Forest, KNN, Naive Bayes, and Decision Tree) were used to predict the learning results, and their prediction results were compared. The study shows that, for each algorithm, the data sets processed by the different sampling methods achieved significantly better Accuracy, Recall, F1-score, AUC, G-mean, MCC, and other indicators than the raw data set. After the imbalanced data is processed by sampling, the minority-class classification problem can be effectively addressed, severe failures in training can be avoided, and the accuracy of the algorithms’ predictions can be significantly improved. | en_US |