Is Students’ Self-Prediction of Success an Effective Tool in Early Intervention?

A Comparison of Findings from Student Self-Report Surveys and Course Outcomes in the Data Science in Education Using R Online Science Class Data

About

Getting students the help they need to succeed with the resources available in the classroom can be challenging, especially when it comes to identifying students in need before it is too late to help them pass or flourish in a course. Although countless methods of identifying struggling students early, such as predictive modeling, have been attempted, many students still “fall through the cracks” and do not get the help they need. The more metrics available to identify students early, the fewer students will be missed.

So what if students were good predictors of their own potential success and were just too nervous to seek out the help that they really need? The intention is to take pre-course self-reported data on students’ perceived competence in their chosen course and see whether it successfully identifies students who ultimately pass or fail.

The data for the analysis is borrowed from Estrellado, Bovee, Mostipak, Rosenberg, and Velásquez: their dataset of self-report survey responses and collected course data for 943 online science class students, as analyzed in their text Data Science in Education Using R. The data comprises students’ final outcomes, data points collected from the students’ time on the online portion of the course, and self-reported survey data.

NOTE that the grading scale is based on that of North Carolina State University in Raleigh, North Carolina, and that NA values, such as missing gender or survey data, are excluded in the relevant sections. Additionally, the data is used both ethically and legally, as all of the information is anonymous and no personally identifiable information (PII) was shared.

Research Question

Is there a link between students’ pre-course perceived competence and students who eventually needed help in the course or failed?

The goal is to understand whether students are honest and largely able to appraise their own chances of success in their chosen course of study, and whether there is a correlation between their self-appraised perceived competence and metrics including time spent on the course, gender, and final grade. If so, the hope is that perceived competence could be another metric in a predictive model that helps identify struggling students as early as possible, setting them up for success and getting them the help they need.

Introduction

The dilemma of how to identify struggling students early enough to intervene is a complex one with many dimensions, and it often leaves some students falling through the cracks. Fifty-eight percent (58%) of responding students in the sample enrolled in the online sections of the science courses for the reason “Course Unavailable at Local School”. As a result, it is likely that these courses were not required for graduation for these students.

The data itself raises many possible analytical questions, such as whether a predictive model could be built, whether student interest is a significant factor, and how time spent and other metrics correlate with final grade.

The following will focus exclusively on whether students’ self-reported perceived competence before the start of the course can be a useful metric in identifying struggling students. If there is a link between student success (time spent and final grade) and self-reported perceived competence, then the metric could be worth exploring in a predictive model, and similar surveys could be issued in classrooms in the future.

Findings

The results showed several interesting trends between students’ self-reported perceived competence and other variables, such as time spent on the online portion of the course, final grade, and gender.

Many of these metrics, such as student progress in the course and how much time students spend on it, are clearly accepted means of gauging a student’s potential for success, but the goal was to see how accurate perceived competence can be in identifying struggling students.

The following chart shows the number of students who received a final grade of A, B, C, D, or F in their online science course, color-coded by the course taken. NOTE that students who did not complete the course are not included. Most students received either an A or a B, generally accepted as good grades. The counts decrease consistently from A through D, but there are more F (failing) grades than D grades. Based on these results, one would expect a majority of students to feel confident in their ability to flourish, with a smaller number needing additional assistance.

The pre-course survey measured students’ perceived competence in their selected class. The survey item was “I consider this topic to be one of my best subjects” and was presented on a five-point Likert scale, with 1 being strongly disagree and 5 strongly agree. As shown in the following chart, most students were neutral, opting for a 3, with more students leaning towards answers of 4 or 5 than towards 1 or 2. These results make sense when we consider students’ reasons for enrolling and the likelihood that this is not a mandatory course for many of them. Importantly, we can see that most students did not feel they were clearly playing to their strengths.

The scatter plot below shows the relationship between the time students invested in the online space for the course and their final grade. Final grades are approximated by the percentage of points earned in the course, plotted against the amount of time spent. As shown, grades rise quickly as time invested increases, with more time spent correlating strongly with a passing course grade.

 

When comparing perceived competence with the time students spend on the course, as shown in the chart below, we can see that there is not much variation. Students with low competence ratings of 1 or 2 averaged the least time on the course, while students rating themselves a 3, 4, or 5 spent very similar, comparatively higher amounts of time.

Despite the sentiments and time spent, many students who rated themselves a 5, equating to high confidence in the course, and who on average put in more time received marginally lower grades. Students who rated themselves a 3 or 4 averaged higher scores, indicating that their appraisal of their own abilities did have some accuracy. Students who rated themselves a 1 or 2 generally scored lower, indicating some validity in student outlooks. Also worth noting, the perceived competence graphs for time spent and final grade have a similar shape.

Additionally, we can see that regardless of self-appraisal, the majority of students passed the course, with some failing grades seen among students at all five levels of perceived competence. This finding raises some concerns: depending on the self-report data alone would cause several students to miss the help they need while focusing energy on students who appear not to have needed it as much.


Also of note, of the 35 students who withdrew from or dropped the course, none rated themselves a 1 for perceived competence. Interestingly, none of the male students in this group gave themselves a perceived competence score of 4 or 5, which is likely just coincidental. The majority of these students rated themselves a 2 or 3. It is also important to NOTE that we do not have the students’ reasons for not completing the course, and those reasons could be completely unrelated to their success in the course.

Additionally, both men and women tended towards a neutral rating of their perceived competence, with 3 and 4 being their most common replies. Both genders gave hardly any 1 replies, and 2 and 5 fell in between in frequency.

Similar findings were apparent when looking at final grades by gender. Among women, the number of students decreased consistently at each lower grade, while men’s grades appeared more evenly distributed.

Conclusions

Students’ pre-course perceived competence may be a valuable metric to include, with others, as part of a predictive model. Although the metric is imperfect, since many students succeeded despite their initial hesitation while others came in with high expectations they would struggle to meet, there still appeared to be a link between perceived competence and eventual success.

It is also important to note that this survey data, like all survey data, can fall victim to bias: respondents may not be totally honest or may not take the time to answer thoughtfully. I feel the metric would be most useful when students are genuinely thoughtful and honest, and least useful when they speed through the survey or do not take it seriously.

Ultimately, I believe it is worth examining other student data sets to test the validity of perceived competence and ensure the results are not simply an anomaly. In short, I see this as an initial observation of the potential for developing a new metric for early prediction of students needing additional assistance.

Creation of Analysis

Background

I had worked with the dataset multiple times in the past and found myself curious, after analysis, about the survey data and what it could illuminate about the students. Through a review of the survey questions, I felt perceived competence would be an interesting metric to work with. The reasoning was that in past work, time spent was often correlated with final grade, as shown in one of the above graphs. As a result, the question of students’ perceived competence in relation to final grade had me intrigued.

As a student, I was often fairly certain about which classes I was going to struggle with and which I did not need to be concerned about. Very rarely was I wrong. My question was: if I have this gut feeling going into a course, do the majority of fellow students have it too? That was the initial inspiration.

Data Cleaning and Wrangling

Most of the effort in cleaning the data involved manipulation of variables since the initial researchers already did a significant amount of cleaning and wrangling. 

Cleaning and wrangling involved:

  • Round percentage grades to the closest whole number
  • Remove NA values from:
    • Perceived competence survey question
    • Gender variable
  • Filter enrollment status to only show “Approved/Enrolled” as their status
  • Mutate to:
    • Create percent grade by multiplying proportion earned by 100
    • Create pass/fail variable by indicating anything below 60% as a failing grade
    • Recode course title abbreviations to the subject name
    • Recode numeric percent grades to a corresponding letter grade
  • Create specific data frames with relevant and mutated variables including:
    • Ones from the original data frame - subject, proportion_earned, q3, time_spent_hours, gender, enrollment_status
    • Manufactured ones - letter_grade, grade_percent, Pass.Fail
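
As an illustration only, the wrangling steps listed above could be sketched as follows. This is not the code used for the analysis: it is a minimal Python sketch on made-up rows, the 90/80/70/60 letter-grade cutoffs are an assumption (only the 60% failing threshold is stated above), and treating q3 as the perceived competence item is inferred from the variable list. Course title recoding is left out.

```python
# Hypothetical rows; field names mirror the variables listed above.
rows = [
    {"proportion_earned": 0.934, "q3": 4, "gender": "F", "enrollment_status": "Approved/Enrolled"},
    {"proportion_earned": 0.552, "q3": 2, "gender": "M", "enrollment_status": "Approved/Enrolled"},
    {"proportion_earned": 0.800, "q3": None, "gender": "F", "enrollment_status": "Approved/Enrolled"},
    {"proportion_earned": 0.710, "q3": 3, "gender": None, "enrollment_status": "Approved/Enrolled"},
    {"proportion_earned": 0.650, "q3": 3, "gender": "M", "enrollment_status": "Dropped"},
]


def letter_grade(percent):
    """Map a whole-number percent to a letter (cutoffs are assumed, not from the source)."""
    if percent >= 90:
        return "A"
    if percent >= 80:
        return "B"
    if percent >= 70:
        return "C"
    if percent >= 60:
        return "D"
    return "F"


def clean(records):
    """Drop NA and non-enrolled rows, then add the manufactured variables."""
    cleaned_rows = []
    for row in records:
        # Remove rows with a missing perceived competence answer or gender.
        if row["q3"] is None or row["gender"] is None:
            continue
        # Keep only students whose status is "Approved/Enrolled".
        if row["enrollment_status"] != "Approved/Enrolled":
            continue
        # Percent grade: proportion earned times 100, rounded to a whole number.
        percent = round(row["proportion_earned"] * 100)
        cleaned_rows.append({
            **row,
            "grade_percent": percent,
            "letter_grade": letter_grade(percent),
            # Anything below 60% counts as failing.
            "Pass.Fail": "Fail" if percent < 60 else "Pass",
        })
    return cleaned_rows


cleaned = clean(rows)
for row in cleaned:
    print(row["grade_percent"], row["letter_grade"], row["Pass.Fail"])
```

Filtering before mutating keeps the manufactured variables from being computed for rows that are later discarded.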


Visualizations

The goal with the visualizations was to select and create relationships that would help illuminate the effectiveness of self-appraised perceived competence. The result was a mix of bar charts, a scatterplot, and smooth plots.  

Bar Graphs

The first set of bar graphs was meant to aid in the comparison between final grade distribution and perceived competence. Both were color-coded by subject. The relationship between the two charts helped to demonstrate that many students who were not totally confident in their abilities still managed to achieve an A or B in the course, suggesting a lack of correlation between students’ appraisal and final score. This relationship was assumed from the side-by-side comparison, however, rather than tested directly.

The second set of bar graphs was meant to show the relationships between perceived competence and gender, as well as between gender and final grades and pass/fail rates. There was also an examination of the perceived competence of the students who either withdrew from or dropped their respective courses. The analysis of these students was interesting because they were excluded from the other graphs, as they did not have a full semester of data to analyze.

Scatterplot

The scatterplot also had a smooth underlay. The goal was to visualize the relationship between students’ final grade and the amount of time they spent on the course. As shown in the graph, there is a high correlation between the two.

It is also important to note that the time spent on the course as shown in the graph is only the time spent on the online course module and does not account for any time the student spent on the course outside of the modules.
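
The strength of the relationship between time spent and final grade can also be summarized as a single number. The sketch below computes a Pearson correlation coefficient by hand in Python; it is offered only as one possible way to quantify what the scatterplot shows, not as the method used in the analysis, and the sample values are invented for illustration.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 values each")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products of deviations, and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cov / sqrt(var_x * var_y)


# Made-up values, not taken from the dataset.
time_spent_hours = [5.0, 12.5, 20.0, 31.0, 44.5]
grade_percent = [48, 62, 75, 83, 91]

r = pearson_r(time_spent_hours, grade_percent)
print(f"r = {r:.2f}")
```

A value near 1 indicates a strong positive linear relationship. Because the coefficient only captures linear association, it may understate a relationship that rises quickly and then levels off.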

Smooth Plots

The smooth plots were meant to show the relationship between time spent on the course and perceived competence, as well as between final grade in the course and perceived competence. The smooth charts make it easier and cleaner to see the trend without the clutter of points or bars, which would have conveyed less information as a whole.
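
A simple numeric counterpart to the smooth plots is the average of each outcome within each perceived competence rating. The Python sketch below shows that grouping step on invented records; it is a plain group mean rather than the smoothing used in the plots, and the field names follow the variable list in the data cleaning section, with q3 assumed to hold the 1 to 5 rating.

```python
from collections import defaultdict


def mean_by_rating(records, value_key):
    """Average the given field within each perceived competence rating (q3)."""
    totals = defaultdict(lambda: [0.0, 0])  # rating -> [running sum, count]
    for rec in records:
        bucket = totals[rec["q3"]]
        bucket[0] += rec[value_key]
        bucket[1] += 1
    # Return ratings in ascending order for easier reading.
    return {rating: total / count for rating, (total, count) in sorted(totals.items())}


# Made-up records, not taken from the dataset.
records = [
    {"q3": 1, "time_spent_hours": 10.0, "grade_percent": 60},
    {"q3": 1, "time_spent_hours": 14.0, "grade_percent": 70},
    {"q3": 3, "time_spent_hours": 30.0, "grade_percent": 85},
    {"q3": 3, "time_spent_hours": 34.0, "grade_percent": 89},
    {"q3": 5, "time_spent_hours": 31.0, "grade_percent": 80},
]

mean_time = mean_by_rating(records, "time_spent_hours")
mean_grade = mean_by_rating(records, "grade_percent")
print(mean_time)
print(mean_grade)
```

Comparing the two resulting tables side by side is one way to check whether time spent and final grade follow a similar pattern across ratings.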