Introduction

What ultimately determines a student’s ability in the classroom? General advice includes increasing the student’s study time, seeing a tutor, and limiting the amount of substances the student consumes. Each of these factors, among others, is within the control of the student. The student can choose to study more, seek out help from a tutor, and drink less alcohol. With other factors, though, the student’s options become more limited. For example, the student’s parents may be able to send their child to a private school, which, in some cases, provides a higher quality education. Furthermore, according to Kelly Musick and Ann Meier, two professors of sociology, students without two parental figures have “lower levels of educational and occupational attainment” than those with both of their parents. While there appear to be two types of variables, one which the student can generally affect and the other further out of the student’s control, both are clearly important for the student’s success.

While it is expected that these factors influence a student’s ability to perform appropriately in the classroom, certain factors may be more important in determining math and science aptitude than language proficiency, or vice versa. It is possible, for instance, that having a tutor could drastically improve a student’s math performance whereas a tutor would only have a marginal impact on the student’s language ability. On the other hand, having parents who are journalists or editors may be crucial to a student’s literacy development but completely inconsequential for the student’s development in mathematics.

Our analysis attempts to determine which factors are important for two categories of education: math and language. Through this analysis we will ascertain whether the factors important for math learning are the same as, or different from, those important to language learning. We acquired our data from Kaggle user Soumyadip Sarkar in a dataset entitled “Student Performance Data Set” on April 30, 2021. This dataset features two CSV files, each containing factors relevant to student performance at local secondary schools in Portugal. The first file contains information about each student’s performance in his or her math class, whereas the second contains the same information with regard to the student’s Portuguese class. Both files cover the same 33 variables. These variables include, but are not limited to, the school the student attends as well as their sex, age, family size, travel time to and from school, and study time. The math dataset contains the final grade the student received in the course, and the Portuguese dataset has the final grade each student earned in his or her Portuguese class. Using the final grade as the response variable, we employ various models to determine the factors relevant to success in math as well as those for Portuguese.

The final grade in each class was initially a numeric value between 0 and 20. In order to classify each student, we grouped the final grade into three bins. The first bin, “poor,” included students whose final score fell in roughly the bottom 25 percent. The second bin, “good,” contained scores within the middle 50 percent. The third bin, “excellent,” contained scores in roughly the top 25 percent. The resulting thresholds were slightly different for each dataset. For the math dataset, scores between 0 and 8 are poor, 9 to 13 are good, and 14 to 20 are excellent. For the Portuguese dataset, scores between 0 and 10 are poor, 11 to 13 are good, and 14 to 20 are excellent.
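The binning rule can be illustrated with a short sketch. It is written in Python purely for illustration (the report's own output appears to come from R), and the names `bin_grade`, `bin_for`, and `CUTOFFS` are hypothetical; only the cutoff values come from the text.

```python
def bin_grade(score, poor_max, good_max):
    """Map a 0-20 final grade to 'poor', 'good', or 'excellent'."""
    if not 0 <= score <= 20:
        raise ValueError("final grade must be between 0 and 20")
    if score <= poor_max:
        return "poor"
    if score <= good_max:
        return "good"
    return "excellent"


# Cutoffs stated in the text: (highest "poor" score, highest "good" score).
CUTOFFS = {"math": (8, 13), "portuguese": (10, 13)}


def bin_for(subject, score):
    """Apply the subject-specific cutoffs to one final grade."""
    poor_max, good_max = CUTOFFS[subject]
    return bin_grade(score, poor_max, good_max)


print(bin_for("math", 9))        # prints "good"
print(bin_for("portuguese", 9))  # prints "poor"
```

The same raw score can land in different bins depending on the subject, which is worth keeping in mind when comparing the two sets of models.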

EDA

When performing initial exploratory data analysis of both files, we noticed some discrepancies between the two datasets as well as within each of the datasets. For example, the math dataset contains fewer records than the Portuguese dataset. There were 395 students listed in math class but 649 were in Portuguese class. To remedy this issue, we decided to undersample the Portuguese dataset so that its size matched that of the math dataset. This may have created a new issue regarding whether or not we had an adequate number of data points to properly generate a classification model. Additionally, there were a few factors that, during early analysis, appeared to be significant in predicting success. The discrepancies in the data and a few of the potentially important factors are listed below.

Data Issues

School

As seen in the graphs above, there is not an equal number of students from both schools in each of the datasets. Our initial exploration shows the math dataset had 349 students from the Gabriel Pereira school as opposed to the 46 from Mouzinho da Silveira. This raises concerns over the representativeness of the sample from the da Silveira school. Similarly, the Portuguese dataset had 423 Gabriel Pereira students compared to 226 from Mouzinho da Silveira. Regardless, there appears to be a difference in Portuguese performance between the two schools. Those who attend Pereira are more likely to achieve an excellent score than those who attend Silveira. This distinction may appear in the final model.

Sex

Another issue within both datasets is that there are fewer male students than female students. This lack of parity between the two factor levels may result in poor model results when predicting success in either class if sex proves to be an important factor. On the other hand, there appears to be a distinction between females’ success in math and that of their male counterparts. Males seem to achieve excellent scores in math at a slightly higher rate than females. Conversely, in Portuguese, females may attain excellent scores at a higher rate than males.

Important Factors

Failures

Unsurprisingly, the number of failures a student has within any given class negatively affects their final course grade. Those with a higher number of failures on assignments or exams are more likely to receive a lower grade. This is true across both datasets, albeit with relatively small samples for those with failures, and is a variable we expect to appear as influential in our final model.

Age

Generally, the older the student is, the more likely they are to succeed. This is especially true for their Portuguese class. The exception to this generalization is students who are over 18 years of age. Students 19 years old or older likely have extenuating circumstances that are keeping them in the secondary school. Those students may have been held back in the past due to performance or behavioral issues. Either of these situations is likely to have a negative effect on the student’s performance in the classroom.

Parental Job - Math

For both datasets the “other” job title has many more entries than the other levels of the parental job type factor. Regardless, there is still important information to glean from the data. Students whose mothers work in the health field appear to score higher than their peers. Curiously, this does not appear to translate to students whose fathers work in the health industry. Instead, students whose fathers are teachers generally have higher math scores than other students. One potential reason for this discrepancy is that fathers’ jobs are classified as “other” more often than mothers’ jobs. The overwhelming number of students with fathers in the “other” category means that the “teacher” classification may rest on an unrepresentative sample. With more data for fathers’ jobs, we may see the teacher factor become less important and see the health factor become more important, similar to the distributions for mothers’ jobs.

Parental Job - Portuguese

For the Portuguese data, there are a few outcomes that contrast with those of the math dataset. For starters, when a parent stays home with their child, the student appears to do worse in their Portuguese class than other students. This goes against the conventional wisdom that a child is more likely to succeed in the future when a parent stays home with them. Furthermore, having a mother as a teacher positively impacts the student’s final grade in Portuguese. This is consistent with the father’s job for the math data, and it is also consistent with the father’s job for the Portuguese data. As seen in the math data, having a mother who works in the health industry may increase the student’s ability in the classroom.

Family Size

Initially, we anticipated that students with a greater family size, more than three members, would have better success in both math and Portuguese. This, however, may not be the case. Even though there are students who have three family members or fewer, indicating a missing parent, these students appear to perform as well as students who have both of their parents and siblings. While we anticipate most families who fall within the greater than three designation have both parents in their household, this may not always be true. It is possible a household consists of a single parent who happens to be raising three or more siblings. In this instance, the student may be negatively affected by not having both parents in his or her life, thus evening out the distribution.

Methods

To answer our initial questions, “are the factors that contribute to success in a math class the same factors as in Portuguese class”, and “if there are any primary factors involved in G3 scores for both subjects, what are they and what are their relative significance levels”, we will build classification models. Classification models use machine learning methods to assess relative variable importance and to predict a response. In our situation, we want to know which variables are of similar and varying importance to math and Portuguese grades. To do so, we will need to build at least two models, one for the math data and one for the Portuguese data. The following section walks through building several different models and identifies the ones we deemed to be the best.

Pre-Model Building

The first step in the model building process was preparation. During exploratory data analysis, we discovered that the math and Portuguese datasets differed in size. The math data contained 395 observations, while the Portuguese data contained 649 observations. In an attempt to combat this imbalance, we chose to undersample the Portuguese data down to the size of the math data. To do so, we took a random sample of 395 rows of the Portuguese data, with replacement.
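A minimal sketch of this undersampling step follows, in Python for illustration only. The function name `undersample`, the seed, and the stand-in row indices are assumptions; only the sizes 649 and 395 and the "with replacement" choice come from the text. The sketch also shows the without-replacement alternative for comparison.

```python
import random


def undersample(rows, target_size, seed=0, replace=True):
    """Draw target_size rows at random, with or without replacement."""
    rng = random.Random(seed)  # fixed seed (an assumption) for reproducibility
    if replace:
        return rng.choices(rows, k=target_size)
    return rng.sample(rows, k=target_size)


# Stand-in for the 649 Portuguese records: row indices only.
portuguese_rows = list(range(649))

with_replacement = undersample(portuguese_rows, 395, replace=True)
without_replacement = undersample(portuguese_rows, 395, replace=False)

# Sampling with replacement can select the same student more than once,
# so the number of distinct students may be smaller than 395.
print(len(with_replacement), len(set(with_replacement)))
print(len(without_replacement), len(set(without_replacement)))
```

Because duplicates are possible when sampling with replacement, the same student could in principle appear in both the training and testing partitions, which is one reason sampling without replacement is often preferred for undersampling.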

Random Forest

With the two datasets being more balanced, we can now move into the model building. The first step in this process is to create testing and training data. Our first model will be a random forest, which requires slightly more training data than other models, so the split will be 90% for training, and 10% for testing. These splits are created for both the math and the Portuguese data.

With our data partition created, we can next move into actually creating the random forest models. Two random forests were created, one for math scores and one for Portuguese scores. The two models were created using the default random forest settings, except that “mtry” was set to 5 for both models, as this is approximately the square root of the number of predictors, and the number of trees was set to 1000 for both models. All of the variables in the prepared dataset were used for both models. Below are the results of each random forest.

## 
## Call:
##  randomForest(formula = G3 ~ ., data = math.training, mtry = 5,      ntree = 1000, importance = TRUE) 
##                Type of random forest: classification
##                      Number of trees: 1000
## No. of variables tried at each split: 5
## 
##         OOB estimate of  error rate: 42.98%
## Confusion matrix:
##           poor good excellent class.error
## poor        31   56         5   0.6630435
## good        19  137        18   0.2126437
## excellent    3   52        35   0.6111111

## 
## Call:
##  randomForest(formula = G3 ~ ., data = por.training, mtry = 5,      ntree = 1000, importance = TRUE) 
##                Type of random forest: classification
##                      Number of trees: 1000
## No. of variables tried at each split: 5
## 
##         OOB estimate of  error rate: 26.05%
## Confusion matrix:
##           poor good excellent class.error
## poor        91   15         7   0.1946903
## good        12  110        18   0.2142857
## excellent    7   34        63   0.3942308

Here, we see that the out-of-bag (OOB) estimates of the error rate for the math and Portuguese data are 42.98% and 26.05%, respectively. The estimated error rate of 42.98% for the math forest is poor, but, as we will see later on, this is a recurring theme for the math models. The estimated error rate of 26.05% for the Portuguese forest is decent.
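The reported error rates can be recovered directly from the two confusion matrices above. The sketch below is in Python for illustration only, and `oob_summary` is a hypothetical helper name. It assumes the usual layout of this output, with rows as the actual class and columns as the predicted class.

```python
def oob_summary(matrix):
    """Return (overall error, per-class error) for a confusion matrix.

    matrix[i][j] is the number of class-i observations predicted as class j.
    """
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    class_error = [1 - row[i] / sum(row) for i, row in enumerate(matrix)]
    return 1 - correct / total, class_error


# Counts copied from the output above (order: poor, good, excellent).
math_oob = [[31, 56, 5], [19, 137, 18], [3, 52, 35]]
por_oob = [[91, 15, 7], [12, 110, 18], [7, 34, 63]]

math_error, math_class_error = oob_summary(math_oob)
por_error, por_class_error = oob_summary(por_oob)

print(f"{math_error:.2%}")  # prints 42.98%
print(f"{por_error:.2%}")   # prints 26.05%
print([round(e, 4) for e in math_class_error])
```

The per-class errors show where the math forest struggles: most of its mistakes come from poor and excellent students being predicted as good.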

Decision Trees

The next series of models we will test will be decision trees. We will be using the same testing and training data as the random forest models, and will be using the CART method of decision trees. The decision trees were created using default settings, with complexity parameters of 0.025 for the math model and 0.03 for the Portuguese model. After creating the models, we can determine the model accuracies to get a glimpse into model performance.

## Confusion Matrix and Statistics
## 
##            Actual
## Prediction  poor good excellent
##   poor         2    0         0
##   good         6   15         6
##   excellent    2    4         4
## 
## Overall Statistics
##                                           
##                Accuracy : 0.5385          
##                  95% CI : (0.3718, 0.6991)
##     No Information Rate : 0.4872          
##     P-Value [Acc > NIR] : 0.31533         
##                                           
##                   Kappa : 0.2095          
##                                           
##  Mcnemar's Test P-Value : 0.03843         
## 
## Statistics by Class:
## 
##                      Class: poor Class: good Class: excellent
## Sensitivity              0.20000      0.7895           0.4000
## Specificity              1.00000      0.4000           0.7931
## Pos Pred Value           1.00000      0.5556           0.4000
## Neg Pred Value           0.78378      0.6667           0.7931
## Prevalence               0.25641      0.4872           0.2564
## Detection Rate           0.05128      0.3846           0.1026
## Detection Prevalence     0.05128      0.6923           0.2564
## Balanced Accuracy        0.60000      0.5947           0.5966

## Confusion Matrix and Statistics
## 
##            Actual
## Prediction  poor good excellent
##   poor         8    0         0
##   good         2    5         2
##   excellent    2   10         9
## 
## Overall Statistics
##                                           
##                Accuracy : 0.5789          
##                  95% CI : (0.4082, 0.7369)
##     No Information Rate : 0.3947          
##     P-Value [Acc > NIR] : 0.01647         
##                                           
##                   Kappa : 0.3809          
##                                           
##  Mcnemar's Test P-Value : 0.02517         
## 
## Statistics by Class:
## 
##                      Class: poor Class: good Class: excellent
## Sensitivity               0.6667      0.3333           0.8182
## Specificity               1.0000      0.8261           0.5556
## Pos Pred Value            1.0000      0.5556           0.4286
## Neg Pred Value            0.8667      0.6552           0.8824
## Prevalence                0.3158      0.3947           0.2895
## Detection Rate            0.2105      0.1316           0.2368
## Detection Prevalence      0.2105      0.2368           0.5526
## Balanced Accuracy         0.8333      0.5797           0.6869

As we can see from the confusion matrices of the two decision trees, these models perform poorly, more so than the random forest models. The math decision tree has an accuracy of 53.85%, while the Portuguese tree has an accuracy of 57.89%. Both of these measures are weak, so for now the decision trees rank below our random forests.
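To make the accuracy figures concrete, the sketch below recomputes accuracy and the no-information rate (the share of the largest actual class) from the two decision tree confusion matrices. It is written in Python for illustration only, and `accuracy_and_nir` is a hypothetical helper name. Rows are predictions and columns are actual classes, as labeled in the output.

```python
def accuracy_and_nir(matrix):
    """Return (accuracy, no-information rate) for a confusion matrix.

    matrix[r][c] is the number of observations predicted as class r
    whose actual class is c.
    """
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    accuracy = sum(matrix[i][i] for i in range(n)) / total
    # Column sums give the number of observations in each actual class.
    actual_totals = [sum(matrix[r][c] for r in range(n)) for c in range(n)]
    return accuracy, max(actual_totals) / total


# Counts copied from the output above (order: poor, good, excellent).
math_tree = [[2, 0, 0], [6, 15, 6], [2, 4, 4]]
por_tree = [[8, 0, 0], [2, 5, 2], [2, 10, 9]]

math_acc, math_nir = accuracy_and_nir(math_tree)
por_acc, por_nir = accuracy_and_nir(por_tree)

print(f"math: accuracy {math_acc:.4f}, NIR {math_nir:.4f}")
print(f"Portuguese: accuracy {por_acc:.4f}, NIR {por_nir:.4f}")
```

Comparing accuracy against the no-information rate is a stricter baseline than comparing against zero: a model that always predicted "good" would already reach the no-information rate.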

KNN

The last set of models we will test are kNN models. For kNN, we had to isolate all of the quantitative variables in the dataset and then scale them. Then, we had to re-partition the data, with an 80% training and 20% testing split. We then created the kNN models for both math and Portuguese using the default settings of the train() function with method ‘knn’. Below, we can see the confusion matrices for both models.

## Confusion Matrix and Statistics
## 
##            Actual
## Prediction  poor good excellent
##   poor         5    7         2
##   good        10   25        11
##   excellent    5    6         7
## 
## Overall Statistics
##                                           
##                Accuracy : 0.4744          
##                  95% CI : (0.3601, 0.5907)
##     No Information Rate : 0.4872          
##     P-Value [Acc > NIR] : 0.6325          
##                                           
##                   Kappa : 0.1347          
##                                           
##  Mcnemar's Test P-Value : 0.3496          
## 
## Statistics by Class:
## 
##                      Class: poor Class: good Class: excellent
## Sensitivity               0.2500      0.6579          0.35000
## Specificity               0.8448      0.4750          0.81034
## Pos Pred Value            0.3571      0.5435          0.38889
## Neg Pred Value            0.7656      0.5938          0.78333
## Precision                 0.3571      0.5435          0.38889
## Recall                    0.2500      0.6579          0.35000
## F1                        0.2941      0.5952          0.36842
## Prevalence                0.2564      0.4872          0.25641
## Detection Rate            0.0641      0.3205          0.08974
## Detection Prevalence      0.1795      0.5897          0.23077
## Balanced Accuracy         0.5474      0.5664          0.58017

## Confusion Matrix and Statistics
## 
##            Actual
## Prediction  poor good excellent
##   poor         5    5         1
##   good        16   17         8
##   excellent    4    9        14
## 
## Overall Statistics
##                                           
##                Accuracy : 0.4557          
##                  95% CI : (0.3431, 0.5717)
##     No Information Rate : 0.3924          
##     P-Value [Acc > NIR] : 0.15000         
##                                           
##                   Kappa : 0.1662          
##                                           
##  Mcnemar's Test P-Value : 0.05454         
## 
## Statistics by Class:
## 
##                      Class: poor Class: good Class: excellent
## Sensitivity              0.20000      0.5484           0.6087
## Specificity              0.88889      0.5000           0.7679
## Pos Pred Value           0.45455      0.4146           0.5185
## Neg Pred Value           0.70588      0.6316           0.8269
## Precision                0.45455      0.4146           0.5185
## Recall                   0.20000      0.5484           0.6087
## F1                       0.27778      0.4722           0.5600
## Prevalence               0.31646      0.3924           0.2911
## Detection Rate           0.06329      0.2152           0.1772
## Detection Prevalence     0.13924      0.5190           0.3418
## Balanced Accuracy        0.54444      0.5242           0.6883

As we can see from the confusion matrices of the two kNN models, these models perform poorly as well. The math kNN model has an accuracy of 47.44%, which is below its no-information rate of 48.72%, while the Portuguese kNN model has an accuracy of 45.57%. The accuracy of the Portuguese kNN model is actually the lowest of all of the models, and the other metrics are poor for both models as well. With this in mind, we will not be using the kNN models.

Final Model

Therefore, with all of this information, we will use the random forests as our final models for both math and Portuguese scores. These models had the best accuracy of the models we tested, with out-of-bag error rates of around 43% for math and 26% for Portuguese. These rates are decent at best in general, but they are by far the best among the models we have looked at using these data.

So, to answer our questions, we will use variable importance measures from our random forest models. The first visual we will look at is a data frame consisting of variables for both models, as well as their MeanDecreaseGini value. The higher this value, the more important the variable is to the model. Here, we display the top five variables for both math and Portuguese scores, listed in ascending order of importance.

##    Variable MeanDecreaseGini
## 8      Fedu         11.11512
## 11   reason         11.15370
## 7      Medu         13.21046
## 9      Mjob         13.57345
## 30 absences         18.11024

##    Variable MeanDecreaseGini
## 7      Medu         12.25974
## 3       age         12.31648
## 30 absences         13.48421
## 8      Fedu         13.48586
## 15 failures         17.34001

With the data frames above, we can spot similarities and differences between the two models. A similarity we see is the presence of “Medu”, “Fedu”, and “absences” in the top five of both models. “Medu” and “Fedu” are mother’s education and father’s education, respectively. Where the two models differ in variable importance is that the math model contains reason for attending the school (reason) and mother’s job (Mjob), while the Portuguese model contains age and number of past class failures (failures).
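The comparison of the two top-five lists amounts to a pair of set operations. The sketch below is in Python for illustration only; the dictionary names are hypothetical, and the values are the MeanDecreaseGini figures reported above.

```python
# MeanDecreaseGini values copied from the two data frames above.
math_top5 = {
    "Fedu": 11.11512,
    "reason": 11.15370,
    "Medu": 13.21046,
    "Mjob": 13.57345,
    "absences": 18.11024,
}
por_top5 = {
    "Medu": 12.25974,
    "age": 12.31648,
    "absences": 13.48421,
    "Fedu": 13.48586,
    "failures": 17.34001,
}

# Variables appearing in both top-five lists, and those unique to each.
shared = sorted(set(math_top5) & set(por_top5))
math_only = sorted(set(math_top5) - set(por_top5))
por_only = sorted(set(por_top5) - set(math_top5))

# Most important variable by MeanDecreaseGini in each model.
math_first = max(math_top5, key=math_top5.get)
por_first = max(por_top5, key=por_top5.get)

print(shared)      # prints ['Fedu', 'Medu', 'absences']
print(math_only)   # prints ['Mjob', 'reason']
print(por_only)    # prints ['age', 'failures']
print(math_first, por_first)  # prints absences failures
```

Note that MeanDecreaseGini values are not directly comparable across the two forests, so the sketch compares membership and rank rather than raw magnitudes.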

To get a further look at variable importance, we can create graphical visuals of variables that contribute to the accuracy and Gini of the models. Below are graphs for both models, with variables towards the top being more significant. The first pair of graphs on the left is for the math model, and the second pair, on the right, is for the Portuguese model.

Given the visuals above, we can point out some significant results. A big similarity we see between the two models is that the most important variable to accuracy is number of past class failures. Other variables that share common significance between the two models are absences, mother’s education (Medu), father’s education (Fedu), mother’s job (Mjob), desire for higher education (higher), weekly alcohol consumption (Walc), and health status (health). One difference we can see between the two models is that for the math model, absences is the most important variable to the Gini value, while number of past class failures is the most important for the Portuguese model. Another difference to note is that father’s education is the second most important variable for accuracy in the Portuguese model, while it is not even in the top ten of the math model.

Evaluation of the Model

To evaluate our models’ performances, we will look at several different metrics. The first step in this process is to generate the models’ predictions on the testing data. The initial error rates were out-of-bag estimates computed from the training data, but we now use the testing data to evaluate the accuracy and other metrics of the models. Seen below are the confusion matrices of both the math and Portuguese models.

## Confusion Matrix and Statistics
## 
##            Actual
## Prediction  poor good excellent
##   poor         3    1         0
##   good         6   17         5
##   excellent    1    1         5
## 
## Overall Statistics
##                                          
##                Accuracy : 0.641          
##                  95% CI : (0.4718, 0.788)
##     No Information Rate : 0.4872         
##     P-Value [Acc > NIR] : 0.03861        
##                                          
##                   Kappa : 0.3788         
##                                          
##  Mcnemar's Test P-Value : 0.06468        
## 
## Statistics by Class:
## 
##                      Class: poor Class: good Class: excellent
## Sensitivity              0.30000      0.8947           0.5000
## Specificity              0.96552      0.4500           0.9310
## Pos Pred Value           0.75000      0.6071           0.7143
## Neg Pred Value           0.80000      0.8182           0.8438
## Precision                0.75000      0.6071           0.7143
## Recall                   0.30000      0.8947           0.5000
## F1                       0.42857      0.7234           0.5882
## Prevalence               0.25641      0.4872           0.2564
## Detection Rate           0.07692      0.4359           0.1282
## Detection Prevalence     0.10256      0.7179           0.1795
## Balanced Accuracy        0.63276      0.6724           0.7155

## Confusion Matrix and Statistics
## 
##            Actual
## Prediction  poor good excellent
##   poor         9    0         0
##   good         2   11         2
##   excellent    1    4         9
## 
## Overall Statistics
##                                           
##                Accuracy : 0.7632          
##                  95% CI : (0.5976, 0.8856)
##     No Information Rate : 0.3947          
##     P-Value [Acc > NIR] : 4.304e-06       
##                                           
##                   Kappa : 0.6426          
##                                           
##  Mcnemar's Test P-Value : 0.2998          
## 
## Statistics by Class:
## 
##                      Class: poor Class: good Class: excellent
## Sensitivity               0.7500      0.7333           0.8182
## Specificity               1.0000      0.8261           0.8148
## Pos Pred Value            1.0000      0.7333           0.6429
## Neg Pred Value            0.8966      0.8261           0.9167
## Precision                 1.0000      0.7333           0.6429
## Recall                    0.7500      0.7333           0.8182
## F1                        0.8571      0.7333           0.7200
## Prevalence                0.3158      0.3947           0.2895
## Detection Rate            0.2368      0.2895           0.2368
## Detection Prevalence      0.2368      0.3947           0.3684
## Balanced Accuracy         0.8750      0.7797           0.8165

With the predictions on the testing data, we actually see increases in accuracy in both models. The model accuracy for the math data is now 64.1%, compared to around 57% from the out-of-bag estimate. The accuracy for the Portuguese model is also a step up, at 76.32%, compared to around 74% from the out-of-bag estimate. Both of these metrics are decent, but they leave plenty of room for improvement. However, the accuracies for both models are well above the accuracy from random guessing, which would be 33.33% in this case, and they also exceed the no-information rates of 48.72% and 39.47% reported in the output.

Another metric we can look at for the models is sensitivity. For the math model, we see that the sensitivities for poor and excellent performance are low, at 0.3 and 0.5, respectively, while the sensitivity is 0.89 for good performance, which is very good. For the Portuguese model, we have much more consistent, good sensitivities. The sensitivities are 0.75 for poor performance, 0.73 for good performance, and 0.82 for excellent performance, all of which are good measures.

Another measure we can point out is the specificity and false positive rate (FPR). The FPR is one minus the specificity. Therefore, we see that the FPRs for the math model are very good for poor and excellent performance, at around 3.4% and 7%, respectively, while the FPR is much worse for good performance, at 55%. The FPRs for the Portuguese model are very good as well. For poor performance, our FPR is 0%, while it is around 17.4% for good performance and 18.5% for excellent performance.

The last measure to look at here is the Kappa value. For the math model, the Kappa is 0.38, which represents a moderate agreement on how to classify outcomes, while for Portuguese, it is 0.6426, which represents a more substantial agreement.
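The sensitivity, false positive rate, and Kappa figures discussed here can all be derived from the two test-set confusion matrices. The sketch below is in Python for illustration only; `per_class_rates` and `cohen_kappa` are hypothetical helper names. Kappa is computed with the standard Cohen formula, observed agreement minus chance agreement, divided by one minus chance agreement.

```python
def per_class_rates(matrix, labels):
    """Sensitivity and false positive rate for each class.

    matrix[r][c] is the count predicted as class r with actual class c.
    """
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    rates = {}
    for k, label in enumerate(labels):
        true_pos = matrix[k][k]
        actual_k = sum(matrix[r][k] for r in range(n))  # column sum
        false_pos = sum(matrix[k]) - true_pos           # row sum minus hits
        negatives = total - actual_k
        rates[label] = {
            "sensitivity": true_pos / actual_k,
            "fpr": false_pos / negatives,  # equals 1 - specificity
        }
    return rates


def cohen_kappa(matrix):
    """Cohen's Kappa: (observed - expected) / (1 - expected)."""
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    observed = sum(matrix[i][i] for i in range(n)) / total
    expected = sum(
        sum(matrix[i]) * sum(matrix[r][i] for r in range(n)) for i in range(n)
    ) / total ** 2
    return (observed - expected) / (1 - expected)


labels = ["poor", "good", "excellent"]
# Counts copied from the test-set output above.
math_rf = [[3, 1, 0], [6, 17, 5], [1, 1, 5]]
por_rf = [[9, 0, 0], [2, 11, 2], [1, 4, 9]]

math_rates = per_class_rates(math_rf, labels)
por_rates = per_class_rates(por_rf, labels)

for name, rates in (("math", math_rates), ("Portuguese", por_rates)):
    for label in labels:
        r = rates[label]
        print(f"{name} {label}: sensitivity {r['sensitivity']:.4f}, FPR {r['fpr']:.4f}")

print(round(cohen_kappa(math_rf), 4), round(cohen_kappa(por_rf), 4))
```

With test sets of only 39 and 38 students, each of these rates moves noticeably when a single prediction changes, so small differences between classes should be read with caution.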

Fairness Assessment

To examine the fairness of our final models, we can apply different fairness assessments to the protected classes within the data. The one protected class in both of the datasets is the sex variable, which is male versus female. To get an introduction to the distribution of sex within the two datasets, we can look at the base rates of sex. The tables below show the counts of female (F) and male (M) students in each dataset, and the base rate is the proportion of students who are male.

## 
##   F   M 
## 208 187

## [1] "Math Male Base Rate: 47.3%"

## 
##   F   M 
## 226 169

## [1] "Portuguese Male Base Rate: 42.8%"

With the tables and base rates above, we see that both the math and Portuguese data have a bigger presence of females. This is seen more in the Portuguese data, where only 42.8% of the observations are male, whereas the math data is 47.3% male. Both datasets seem roughly balanced, but there is some imbalance in favor of females.

After looking at the base rates, we can move on to metrics and visuals of fairness. First, we will look at the proportional parity metrics for the math data. Seen below are the metric and the probability plot for the sex class of the math dataset.

##                              M          F
## Proportion           0.4705882  0.5909091
## Proportional Parity  1.0000000  1.2556818
## Group size          17.0000000 22.0000000

After running the proportional parity function on the testing and prediction data, we see that females are actually slightly favored. The proportional parity metric for males, the reference group, is 1 in the first table, while it is 1.26 for females. This metric should be close to 1 for every group, so the higher value for females indicates a slight favoring. This finding is reinforced by the probability plot, which shows a higher density for females around the middle of the plot, which also indicates favoring of females.

Next, we will complete the same process on the Portuguese data. Below are the proportional parity metrics and the probability plot for the sex class of the Portuguese dataset.

##                              M     F
## Proportion           0.2777778  0.30
## Proportional Parity  1.0000000  1.08
## Group size          18.0000000 20.00

Similar to the math data, we see a slight favoring of females in the proportional parity metric. This value is 1.0 for males and 1.08 for females. Though this difference is very slight, it has the potential to be an indicator of favoring. Despite this metric, the probability plot shows a higher density for males than for females around the peak of the curve. This indicates a favoring of males in the Portuguese data.
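The proportional parity figures in both tables can be reproduced with simple ratios. The sketch below is in Python for illustration only, and `proportional_parity` is a hypothetical helper, not the function used in the analysis. The positive-prediction counts (8 of 17, 13 of 22, 5 of 18, 6 of 20) are an assumption: they are back-calculated from the reported proportions and group sizes, and the report does not state which predicted class counts as the positive outcome.

```python
def proportional_parity(positive_counts, group_sizes, reference):
    """Share of positive predictions per group, and each share's ratio
    to the reference group's share."""
    proportions = {g: positive_counts[g] / group_sizes[g] for g in group_sizes}
    parity = {g: p / proportions[reference] for g, p in proportions.items()}
    return proportions, parity


# Group sizes are from the tables above; positive counts are inferred
# from the reported proportions (an assumption, see the lead-in).
math_props, math_parity = proportional_parity(
    positive_counts={"M": 8, "F": 13},
    group_sizes={"M": 17, "F": 22},
    reference="M",
)
por_props, por_parity = proportional_parity(
    positive_counts={"M": 5, "F": 6},
    group_sizes={"M": 18, "F": 20},
    reference="M",
)

print(round(math_props["F"], 7), round(math_parity["F"], 7))
print(round(por_props["M"], 7), round(por_parity["F"], 2))
```

Under these inferred counts, moving a single positive prediction between groups changes the math ratio substantially, which suggests the parity values rest on very small test groups.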

With all of these metrics and visuals in mind, we can say that there is a very slight unfairness in favor of females in the overall data. The higher proportion of females in both datasets, the higher proportional parity metrics, and the probability plots point towards females in this case. The exception to this is the probability plot for the Portuguese data, but as a whole, we can infer the general trend is in favor of females.

Conclusion

In conclusion, the number of failures and the number of absences were recognized as very important across all models in predicting which bin a student falls into. This was not necessarily surprising given the distributions of G3 scores in our exploratory data analysis, but the fact that both of these variables were identified across all models confirmed the gravity of the expected relationships found during the EDA phase. While these models all generally pointed to the importance of these two variables, the conclusions and reliability of their predictions were very different. For example, some previously unseen differences in the variable importance hierarchy were revealed by the random forest model, especially when the mother’s education variable was listed as third most important in terms of accuracy for both random forest models. This was not emphasized in prior models, and by implementing k-fold cross-validation properly, random forest was able to deliver a fuller and more reliable description of the importance of relevant factors within a limited dataset. The decision tree models were less than 60% accurate, which was still better than the kNN models, whose accuracy was below 50% and whose sensitivity for the poor class was as low as 20%. From testing multiple models, we gained an indication of which types of prediction are effective for this dataset and its size. By learning about their relative efficacies, we should now be able to derive our findings quickly and accurately in the future, allowing us to focus on interpretation rather than implementation.

Future Work

Moving forward, it would be helpful if future iterations of this project revolved around larger datasets. Overfitting and model accuracy were challenges throughout our exercises, and while a mediocre model may not have prevented us from recognizing the importance of large impact indicators such as the number of absences or failures, it crushed our model sensitivity and Kappa values up until our random forest model. These values may even take precedence over specificity given the nature of our central question, which is, once again, trying to establish what makes a good student and identifying those attributes definitively. Our team even discussed finding a way to merge the datasets so that we could further examine the variables across both datasets and their roles in performance across two academic subjects. However, we also recognized the potential need for additional variables to consider and test. This is because many of the most important, non-neutral variables that held relevance towards G3 scores were non-behavioral, and this is a serious issue given the nature of our question. Often, the reason our team or any other group inquires about student performance is that it may yield actionable solutions that can improve education and solve real problems with new tangible insights. The demographic, parental, and school-related features dominated almost every model, and so next time, it might be wise to balance the dataset across all groups of students representatively, but then go on to create models that exclude circumstantial variables, like a father’s or mother’s education. Further, our final model was a strong learner, but boosting our model in further iterations may be a helpful way to confirm our results, as the balanced accuracy of our final model was only around 75%.
Considering the principle that data science is a tool for problem solving, we were also somewhat limited by the narrow sort of data that was used in these exercises in terms of real-world representation. Technically, our insights and subsequent conclusions can only really be rationally applied to the students listed in our two datasets given how heavily non-behavioral variables impacted student performance. Because non-behavioral circumstances are clearly different across varying education environments, one should only really consider the general trends identified by such a model, as applying any specific conclusions would be unsound.

References

Musick, Kelly, and Ann Meier. “Are both parents always better than one? Parental conflict and young adult well-being.” Social science research vol. 39,5 (2010): 814-30. doi:10.1016/j.ssresearch.2010.03.002

Comparing Students’ Grades

Jay Ralyea, William Cull, and John Hope

5/14/2021