Reproducibility of Enhancement of E-Learning Student’s Performance Based on Ensemble Techniques by Abdulkream A. Alsulami, Abdullah S. AL-Malaise AL-Ghamdi, & Mahmoud Ragab (2023, Electronics)

Author

Amy Tan (amyjxtan@gmail.com)

Published

October 29, 2024

Introduction

The original paper, “Enhancement of E-Learning Student’s Performance Based on Ensemble Techniques”, aimed to improve educational data mining (EDM), specifically with regard to electronic learning (e-learning), since the COVID-19 pandemic saw a surge in e-learning programs. EDM involves developing methods to handle the different types of data in educational systems in order to improve students’ learning outcomes. In particular, the researchers sought to predict student performance using decision trees, naive Bayes, and random forests, and to improve accuracy further through bagging and boosting. They ultimately concluded that the most accurate model was a decision tree combined with boosting, which reached an accuracy of 0.77.

In my project, I sought to reproduce their findings and their visualizations in Python. I anticipated that the largest challenge would be implementing the various EDM techniques, as I wasn’t well-versed in any of them. However, I was confident that the challenge would not prove too difficult and that it would be an extremely valuable learning experience. I chose this particular paper to reproduce because it fits my niche of interests. The work I hope to do in the future lies at the intersection of education, cognitive science, and computational tools, and this paper provided the opportunity to further my knowledge of how computational tools can be, and are, used to better our education system. My work is available in my project repository.

Methods

Materials

The Data/Sample

The dataset used in this project (provided in the data folder in xAPI-Edu-Data.csv, and also available on Kaggle) is the exact one used in the original paper. It was “obtained from the Kalboard 360 E-Learning system via the Experience API (XAPI). The data set in this study consists of 480 records with 17 attributes.” All attributes are either integer or categorical and are generally categorized into three major attribute types: demographic, academic, and behavioral.

Procedure

“First, we collect the data set and prepare it to perform the study. Then, three traditional data mining methods will apply (decision tree (DT), naïve Bayes (NB), and random forest (RF)) to produce a performance model. In addition to the classifiers mentioned earlier, two ensemble methods are used to improve their performance. Boosting, as well as bagging, is applied to enhance the student prediction model’s success. Two and three methods were added to each ensemble technique using the voting process for a more accurate prediction. The model’s last phase will involve evaluating and discussing the results. The data were divided into training and test sets. Each prediction model’s performance was evaluated using K-fold cross-validation. When testing a model, this technique is used to solve the variance problem. In brief, k-fold cross-validation divides the training set into 10 folds. During training, 9 folds are applied before the final fold is tested. As an average of the different accuracies is taken, this better represents the model performance. The method was repeated ten times. All models were run with the WEKA software’s default parameters.” My analysis differs in that the models will be built and run in Python.
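The repeated 10-fold cross-validation described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the WEKA procedure itself: the helper names `k_fold_indices` and `repeated_cv_accuracy`, the `fit_predict` callback, and the toy labels are all hypothetical, and a majority-class baseline stands in for a real classifier.

```python
import random
from collections import Counter


def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle range(n_samples) and split it into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index, starting at position i.
    return [indices[i::k] for i in range(k)]


def repeated_cv_accuracy(labels, fit_predict, k=10, repeats=10):
    """Average accuracy over `repeats` runs of k-fold cross-validation.

    `fit_predict(train_idx, test_idx)` is a hypothetical callback that trains
    on the training indices and returns predicted labels for the test indices.
    """
    scores = []
    for rep in range(repeats):
        folds = k_fold_indices(len(labels), k, seed=rep)
        for i, test_idx in enumerate(folds):
            # Train on the other k - 1 folds, test on the held-out fold.
            train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
            preds = fit_predict(train_idx, test_idx)
            correct = sum(p == labels[j] for p, j in zip(preds, test_idx))
            scores.append(correct / len(test_idx))
    return sum(scores) / len(scores)


# Toy usage with made-up labels and a majority-class baseline as the "model".
toy_labels = ["M"] * 20 + ["L"] * 12 + ["H"] * 8


def majority_baseline(train_idx, test_idx):
    most_common = Counter(toy_labels[j] for j in train_idx).most_common(1)[0][0]
    return [most_common] * len(test_idx)


print(repeated_cv_accuracy(toy_labels, majority_baseline))
```

Averaging over every fold of every repetition is what reduces the variance of the estimate: no single lucky or unlucky split dominates the reported accuracy.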

Analysis Plan

Data Cleaning

“As part of preprocessing, data cleaning is essential for removing irrelevant objects and missing values in the data collection. There are zero missing values in the data set.” Though the authors do not specifically mention this, I cleaned the data set with specific regard to the NationalITy, PlaceofBirth, and StageID columns to ensure that column contents were consistent in their usage of capitalization and abbreviations (or lack thereof). I did not exclude any of the data from the provided data set.
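The kind of consistency cleaning described above might look like the following sketch. The alias table and the sample rows are illustrative assumptions, not the actual contents of the data set or the exact rules I applied; `normalize_label` is a hypothetical helper name.

```python
def normalize_label(value, aliases):
    """Trim whitespace, unify capitalization, and expand known abbreviations."""
    key = value.strip().lower()
    # Fall back to title case when the value has no explicit alias.
    return aliases.get(key, key.title())


# Hypothetical alias table: the keys and spellings are illustrative and should
# be replaced by whatever inconsistencies the real columns contain.
COUNTRY_ALIASES = {"kw": "Kuwait", "usa": "USA", "saudiarabia": "Saudi Arabia"}

# Made-up rows standing in for records read from the CSV file.
rows = [
    {"NationalITy": "KW", "PlaceofBirth": "KuwaIT"},
    {"NationalITy": "lebanon", "PlaceofBirth": "Lebanon "},
]
for row in rows:
    for column in ("NationalITy", "PlaceofBirth"):
        row[column] = normalize_label(row[column], COUNTRY_ALIASES)

print(rows)
```

With pandas, the same function could be applied to a whole column through `Series.map`, which keeps the cleaning rules in one place.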

Features Selection

“Feature selection refers to selecting the relevant features of a dataset based on specific criteria from an original feature set. There are two types of data reduction methods: wrapper methods and filter methods. The filter method ranks the features using variable ranking methods, with the highly ranked features being selected and implemented into the learning algorithm. In this study, the information gain ranking filter and a correlation-ranking filter were used. At each decision tree node, and in order to select the test attribute, the information gain measure is taken into account. The information gain (IG) metric determines features with a large number of values. It is calculated with Equation (1).

\[IG(T, a) = H(T) - H(T \mid a) \tag{1}\]

where \(T\) is a random variable, \(H(T)\) is its entropy, and \(H(T \mid a)\) is the entropy of \(T\) given the value of attribute \(a\).
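As a worked illustration of Equation (1), the sketch below computes information gain for a categorical attribute using Shannon entropy in bits. The function names and the toy attribute and label values are assumptions made for the example; WEKA's ranking filter may differ in details such as how numeric attributes are discretized.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy H(T), in bits, of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())


def information_gain(attribute_values, labels):
    """IG(T, a) = H(T) - H(T | a) for one categorical attribute."""
    total = len(labels)
    groups = {}
    for value, label in zip(attribute_values, labels):
        groups.setdefault(value, []).append(label)
    # H(T | a): entropy within each attribute value, weighted by group size.
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


# Made-up example: one attribute separates the classes perfectly, one not at all.
labels = ["H", "H", "L", "L"]
print(information_gain(["a", "a", "b", "b"], labels))
print(information_gain(["a", "b", "a", "b"], labels))
```

An attribute that splits the classes cleanly removes all uncertainty, so its gain equals \(H(T)\); an attribute unrelated to the class leaves the entropy unchanged, so its gain is zero.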

Correlation coefficients are applied to measure correlations among attributes and classes and inter-correlations between features. It is calculated with Equation (2).

\[\rho(X,Y) = \frac{\text{cov}(X,Y)}{\sigma_X\sigma_Y} = \frac{\sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^n (x_i - \bar{x})^2}\sqrt{\sum_{i=1}^n (y_i - \bar{y})^2}}\]

where:

  • \(X\) and \(Y\) are the two variables being correlated
  • \(n\) is the number of data points
  • \(x_i\) and \(y_i\) are the values of \(X\) and \(Y\) for the \(i\)-th data point
  • \(\bar{x}\) and \(\bar{y}\) are the means of \(X\) and \(Y\)
  • \(\text{cov}(X,Y)\) is the covariance between \(X\) and \(Y\)
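Equation (2) translates almost term by term into code. The following is a minimal sketch with made-up numbers; `pearson_correlation` is a hypothetical helper name, and the function assumes neither variable is constant, since a constant variable makes the denominator zero.

```python
from math import sqrt


def pearson_correlation(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Numerator: sum of products of deviations from the means.
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Denominator: product of the root sums of squared deviations.
    denominator = sqrt(sum((x - mean_x) ** 2 for x in xs)) * sqrt(
        sum((y - mean_y) ** 2 for y in ys)
    )
    return numerator / denominator


# Made-up values: a perfectly increasing and a perfectly decreasing relation.
print(pearson_correlation([1, 2, 3, 4], [2, 4, 6, 8]))
print(pearson_correlation([1, 2, 3, 4], [8, 6, 4, 2]))
```

For a correlation-ranking filter, the same function would be applied between each numerically encoded feature and the class, and the features sorted by the absolute value of the result.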

Data Mining Tool and Model Creation

After the most relevant features were selected, I used them to create decision tree (DT), naïve Bayes (NB), and random forest (RF) classifiers. However, as previously mentioned, my project uses Python rather than WEKA. I anticipated using scikit-learn to build and test these models. Following that, I applied boosting and bagging to all the models to test potential improvements.”
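To make the idea of bagging concrete, here is a small standard-library sketch: each base model is trained on a bootstrap resample of the training rows, and the ensemble predicts by majority vote. The one-feature threshold "stump" is a deliberately simple stand-in for the DT, NB, and RF classifiers, and the training rows are made up. In scikit-learn, the corresponding building blocks would presumably be `BaggingClassifier`, `AdaBoostClassifier`, and `VotingClassifier` from `sklearn.ensemble`.

```python
import random
from collections import Counter


def bootstrap_sample(rows, rng):
    """Draw len(rows) rows with replacement (the bagging resample)."""
    return [rng.choice(rows) for _ in rows]


def fit_stump(rows):
    """Toy base learner: a single threshold on one numeric feature.

    Each row is (feature_value, label). The stump predicts the majority label
    on each side of the threshold that misclassifies the fewest training rows.
    """
    best = None
    for threshold in sorted({x for x, _ in rows}):
        left = [y for x, y in rows if x <= threshold]
        right = [y for x, y in rows if x > threshold]
        left_label = Counter(left).most_common(1)[0][0]
        right_label = Counter(right).most_common(1)[0][0] if right else left_label
        errors = sum(y != left_label for y in left) + sum(y != right_label for y in right)
        if best is None or errors < best[0]:
            best = (errors, threshold, left_label, right_label)
    _, threshold, left_label, right_label = best
    return lambda x: left_label if x <= threshold else right_label


def bagged_predict(models, x):
    """Majority vote over the predictions of all base models."""
    votes = Counter(model(x) for model in models)
    return votes.most_common(1)[0][0]


# Made-up training rows: (feature value, performance label).
rng = random.Random(0)
train = [(5, "L"), (10, "L"), (15, "L"), (60, "H"), (70, "H"), (80, "H")]
models = [fit_stump(bootstrap_sample(train, rng)) for _ in range(25)]
print(bagged_predict(models, 1), bagged_predict(models, 100))
```

Boosting differs in that the base models are trained sequentially, with later models concentrating on the rows earlier models got wrong, rather than independently on resamples.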

Measurement Measures

“Different DM techniques were compared to determine which had higher prediction accuracy than others, and a decision was made based on that. The following common metrics can evaluate a study’s performance: accuracy, precision, recall, and F-Measure.” Thus, my project also examined all four metrics to see whether I could reproduce the finding that decision trees with boosting had the highest accuracy, 0.77. The following subsections describe how the authors calculated each metric; I followed the same definitions.

Accuracy

“This represents the classifier’s accuracy and relates to the classifier’s capacity. The accuracy of a predictor relates to the way it accurately predicts the impact of a predicted feature for new information. The percentage of correct predictions divided by the total number of predictions yields the accuracy. It is calculated with the following Equation (3):

\[Accuracy = \frac{TP + TN}{TP + TN + FP + FN}\]

where:

  • True positives (\(TP\)): cases that are predicted as yes and are actually yes.
  • True negatives (\(TN\)): cases that are predicted as no and are actually no.
  • False positives (\(FP\)): cases that are predicted as yes but are actually no.
  • False negatives (\(FN\)): cases that are predicted as no but are actually yes.
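The four counts and Equation (3) can be sketched as follows. The label lists are made up, and `confusion_counts` is a hypothetical helper name. For a target with more than two classes, one common approach (an assumption here, not something the paper spells out) is to treat each class in turn as the positive class.

```python
def confusion_counts(y_true, y_pred, positive):
    """Return (TP, TN, FP, FN), treating `positive` as the 'yes' class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    """Share of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


# Made-up actual and predicted labels.
y_true = ["yes", "yes", "no", "no", "yes", "no"]
y_pred = ["yes", "no", "no", "yes", "yes", "no"]
tp, tn, fp, fn = confusion_counts(y_true, y_pred, positive="yes")
print(tp, tn, fp, fn, accuracy(tp, tn, fp, fn))
```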

Precision

Precision is calculated as the ratio of correctly classified positive predictions to total positive predictions, whether correctly or incorrectly classified. It is calculated with Equation (4).

\[Precision = \frac{TP}{TP + FP}\]

Recall

The recall is determined by calculating the proportion of correctly classified positive predictions to all actual positive cases. It is calculated with Equation (5).

\[Recall = \frac{TP}{TP + FN}\]

F-Measure

F-measure conveys both recall and precision in a single measure. It is calculated with Equation (6).”

\[F1\text{-}measure = \frac{2 \cdot Recall \cdot Precision}{Recall + Precision}\]
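Equations (4) to (6) can be checked with a short worked example. The counts below are made up purely for illustration, and the function names are hypothetical. The sketch assumes at least one positive prediction and at least one actual positive, since otherwise a denominator is zero.

```python
def precision(tp, fp):
    """Correct positive predictions over all positive predictions."""
    return tp / (tp + fp)


def recall(tp, fn):
    """Correct positive predictions over all actual positives."""
    return tp / (tp + fn)


def f1_measure(tp, fp, fn):
    """Harmonic mean of precision and recall."""
    p = precision(tp, fp)
    r = recall(tp, fn)
    return 2 * r * p / (r + p)


# Made-up counts: 8 true positives, 2 false positives, 4 false negatives.
print(precision(8, 2), recall(8, 4), f1_measure(8, 2, 4))
```

Because the F-measure is a harmonic mean, it sits closer to the lower of the two values, so a model cannot score well by maximizing precision at the expense of recall or vice versa.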

Differences from Original Study

Again, a key difference between this reproducibility project and the original paper is the computational tools used. The original paper used WEKA for visualization and for the machine learning and ensemble methods, whereas my project uses Python. The packages I currently anticipate using include seaborn and scikit-learn. Additionally, I plan to clean the data using pandas to ensure that entries are consistent, but I do not anticipate that these changes will affect the findings.

Methods Addendum (Post Data Collection)

You can comment this section out prior to final report with data collection.

Actual Sample

Sample size, demographics, data exclusions based on rules spelled out in analysis plan

Differences from pre-data collection methods plan

Any differences from what was described as the original plan, or “none”.

Results

Data preparation

Data preparation following the analysis plan.

Confirmatory analysis

The analyses as specified in the analysis plan.

Side-by-side graph with original graph is ideal here

Exploratory analyses

Any follow-up analyses desired (not required).

Discussion

Summary of Replication Attempt

Open the discussion section with a paragraph summarizing the primary result from the confirmatory analysis and the assessment of whether it replicated, partially replicated, or failed to replicate the original result.

Commentary

Add open-ended commentary (if any) reflecting (a) insights from follow-up exploratory analysis, (b) assessment of the meaning of the replication (or not) - e.g., for a failure to replicate, are the differences between original and present study ones that definitely, plausibly, or are unlikely to have been moderators of the result, and (c) discussion of any objections or challenges raised by the current and original authors about the replication attempt. None of these need to be long.