2025-03-30

Introduction

  • When searching for a dataset to base my Midterm Project on, I wanted to make sure the data I worked with was both interesting and personally relevant

  • I came across the ‘Student Performance & Learning Style’ dataset on kaggle.com and knew it was the right one to analyze: it is relevant to me not only as a student, but also as someone who struggles to balance education, a job, and a social life while trying to maintain high grades in all my courses

  • This presentation examines how various factors influence student performance, measured primarily by final exam grades

About This Data Set

  • This dataset is a “synthetic representation” based on true trends in education and career outcomes, as stated in its description on kaggle.com. It helps capture the effect that different learning styles, study habits, and other factors can have on student performance

  • The dataset contains responses from 10,000 students. To make the analysis more digestible, I selected a random sample of 100 students for closer examination

  • Throughout the presentation, I will explore many of the factors that can affect student performance and final exam grades and display these effects in the form of graphical and statistical analysis

  • Since this dataset is a “synthetic representation”, I will make two assumptions. First, I will assume this data represents students in higher education, as the minimum age in the dataset is 18. Second, I will assume that the data covers one semester and that there are a total of 30 lectures over the semester (15 weeks × 2 classes per week)

Cleaning My Data First & Creating a New Variable

  • Before starting the analysis, I removed every row in the dataset with a missing (NA) value, since missing values could skew the results

  • I also created a new column, ‘Classes_Attended’, that keeps track of how many classes each student attended throughout the semester as displayed below:

library(dplyr)  # provides %>%, rename(), and mutate()
library(tidyr)  # provides drop_na()

# remove all rows with NA values in the data set
student_data <- drop_na(student_data)

student_data <- student_data %>%
  rename( # replace column names that end in stray dots with clean ones
    Assignment_Completion_Rate = Assignment_Completion_Rate....,
    Exam_Score = Exam_Score....,
    Attendance_Rate = Attendance_Rate....,
    Time_Spent_on_Social_Media_per_Week = Time_Spent_on_Social_Media..hours.week.
  )
# new variable: classes attended out of the assumed 30 lectures
student_data <- student_data %>%
  mutate(Classes_Attended = round(30 * Attendance_Rate / 100))
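
The random sample of 100 students mentioned earlier could be drawn along the following lines. This is only a sketch under stated assumptions: the presentation does not show how the sample was taken, so the seed, the toy data frame `toy_data`, and the use of base R `sample()` are all illustrative choices rather than the method actually used.

```r
# Toy stand-in for the cleaned data; the values are made up for illustration
set.seed(42)  # hypothetical seed so the same rows are drawn on every run
toy_data <- data.frame(
  Student_ID = 1:10000,
  Exam_Score = round(runif(10000, min = 40, max = 100))
)

# Pick 100 distinct row numbers (sampling without replacement is the default)
sample_rows <- sample(nrow(toy_data), size = 100)

# Keep only the sampled rows
student_sample <- toy_data[sample_rows, ]
```

Fixing the seed is a design choice: without it, every run of the analysis would use a different set of 100 students and the reported statistics would change.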

Relationship Between Study Hours (x), Classes Attended (y), Final Exam Score (z)

  • First, let’s start with the two things I most often associate with my own academic success: a high attendance rate and the number of hours I dedicate to studying

  • I decided to see if these were both keys for a high final exam grade through the following 3D scatter plot visualizing study hours, classes attended, and final exam scores:

Multiple Linear Regression Statistical Analysis

I want to further explore the relationship between study hours, classes attended, and final exam scores through statistical analysis. The goal is to build a multiple linear regression model and determine whether ‘Study_Hours_per_Week’ and ‘Classes_Attended’ can significantly predict ‘Exam_Score’. The following code snippet demonstrates the creation of the model in R:

# Multiple Linear Regression
MultipleLinearRegression <- lm(Exam_Score ~ Study_Hours_per_Week 
                               + Classes_Attended, data = student_sample)

We obtain the following p-values for study hours and classes attended from the model:

  • ‘Study_Hours_per_Week’ p-value: 0.2026602

  • ‘Classes_Attended’ p-value: 0.9893973

Since both p-values are well above the conventional 0.05 threshold, there is no strong statistical evidence that study hours or class attendance have a linear relationship with final exam scores in this dataset. This suggests that other factors may play a more significant role in exam performance. Overall, this model does not effectively predict exam scores from study hours and class attendance alone, which is also visible in the 3D scatter plot on the previous slide.
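
For readers who want to see where such p-values come from, here is a minimal sketch that fits the same kind of model on made-up data and reads the p-values off the coefficient table. The data frame `toy_sample`, its values, and the seed are assumptions for illustration only, so the numbers it prints will not match the ones reported above.

```r
# Toy data with made-up values; the real analysis uses student_sample
set.seed(1)  # hypothetical seed, only to make the toy data reproducible
toy_sample <- data.frame(
  Study_Hours_per_Week = runif(100, min = 5, max = 50),
  Classes_Attended = sample(15:30, size = 100, replace = TRUE),
  Exam_Score = runif(100, min = 40, max = 100)
)

# Same model formula as in the presentation
toy_model <- lm(Exam_Score ~ Study_Hours_per_Week + Classes_Attended,
                data = toy_sample)

# summary() returns a coefficient table with one row per term;
# the "Pr(>|t|)" column holds the p-value of each term
coef_table <- summary(toy_model)$coefficients
p_values <- coef_table[, "Pr(>|t|)"]
print(p_values)
```

Each p-value tests whether the corresponding coefficient differs from zero while the other predictor is held fixed, which is why the two predictors are judged separately.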

So Where to Next? How About Learning Styles!

The way we prefer to learn differs greatly from person to person, which makes it difficult to cater to people across the spectrum of visual, kinesthetic, reading/writing, and auditory learning. This variety of learning preferences is illustrated in the following pie chart, where we see a fairly even spread among learning styles for students.
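
A pie chart of this kind can be sketched in base R as follows. The vector `toy_styles` is made-up data, and the presentation does not show which plotting function was actually used, so treat this as one possible approach.

```r
# Made-up learning styles for eight hypothetical students
toy_styles <- c("Visual", "Auditory", "Kinesthetic", "Reading/Writing",
                "Visual", "Kinesthetic", "Auditory", "Reading/Writing")

# Count how many students prefer each style
style_counts <- table(toy_styles)

# Convert the counts to percentages of the whole group
style_share <- round(100 * prop.table(style_counts), 1)

# Draw the pie chart, labelling each slice with its style and share
pie(style_counts,
    labels = paste0(names(style_counts), " (", style_share, "%)"),
    main = "Preferred learning style (toy data)")
```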

Let’s continue to dive deeper into the true effects that learning styles can have on student performance and see if there is one that stands above all.

Relationship Between Learning Styles (x) and Average Exam Scores (y)

  • This bar plot displays the average exam score for each learning style. We can see that no single learning style separates itself from the others in terms of average exam scores.

  • Every learning style is unique in its own way, and whether learning happens through a visual, kinesthetic, reading/writing, or auditory style, the goal is the same: obtaining knowledge and academic success
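
The averages behind a bar plot like this one can be computed as sketched below. The column name `Preferred_Learning_Style` and all values in `toy_scores` are assumptions for illustration; only `Exam_Score` appears in the code shown in this presentation.

```r
# Made-up scores for six hypothetical students
toy_scores <- data.frame(
  Preferred_Learning_Style = c("Visual", "Visual", "Auditory",
                               "Auditory", "Kinesthetic", "Kinesthetic"),
  Exam_Score = c(70, 80, 60, 90, 72, 76)
)

# Mean exam score within each learning style
avg_scores <- tapply(toy_scores$Exam_Score,
                     toy_scores$Preferred_Learning_Style,
                     mean)
print(avg_scores)

# One bar per learning style, with the y-axis fixed to the 0-100 score range
barplot(avg_scores,
        xlab = "Learning style",
        ylab = "Average exam score",
        ylim = c(0, 100))
```

Starting the y-axis at zero keeps small differences between the bars from looking larger than they are.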

Current Takeaways

  • So far, we have evaluated several factors that could play a role in final exam scores: the amount of time a student spends studying, the number of classes the student attended, and the student’s preferred learning style

  • Our graphical and statistical analysis suggests there is no strong correlation between the analyzed factors and student performance.

  • I think it is fair to conclude that although exam scores can be somewhat useful in displaying a person’s knowledge, many other factors beyond the surface level play a key role in why a student might not perform well on an exam, including mental health, sleep patterns, and external stress

  • Let’s explore a couple of these outside factors like the amount of time a student spends on social media, hours slept, and reported stress level

Exploring Stress Levels (x) and Exam Scores (y)

  • Stress can be a hindrance to our ability to score well on an exam, but it can also be a motivating factor. This graph illustrates the distribution of exam scores by reported stress level.

  • While students with lower stress levels tend to perform slightly better in terms of median scores, the overall difference is minimal.
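
The median comparison described above can be sketched as follows. The presentation does not say which plot type was used; a box plot is assumed here because the slide refers to median scores. The column name `Self_Reported_Stress_Level`, its three levels, and the values in `toy_stress` are likewise assumptions.

```r
# Made-up scores for nine hypothetical students, three per stress level
toy_stress <- data.frame(
  Self_Reported_Stress_Level = factor(
    c("Low", "Low", "Low", "Medium", "Medium", "Medium",
      "High", "High", "High"),
    levels = c("Low", "Medium", "High")  # keep a meaningful order on the axis
  ),
  Exam_Score = c(70, 80, 90, 65, 75, 85, 60, 70, 95)
)

# Median exam score within each stress level
median_by_stress <- tapply(toy_stress$Exam_Score,
                           toy_stress$Self_Reported_Stress_Level,
                           median)
print(median_by_stress)

# One box per stress level; the thick line in each box marks the median
boxplot(Exam_Score ~ Self_Reported_Stress_Level, data = toy_stress,
        xlab = "Reported stress level", ylab = "Exam score")
```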

Graphing Hours Spent on Social Media (x) and Hours Spent Sleeping (y)

For our last graph, we will be comparing the amount of time the students spend on social media versus how much sleep they get and facet the graphs by the final grade to see if there is any correlation between the two factors and their final grade.

As we can see, students who ended up with a final grade of ‘B’ or higher generally spent less time on social media than students who earned a grade of ‘C’ or lower. We can also see a subtle negative correlation between hours spent on social media and hours spent sleeping: students who spend more time on social media tend to sleep fewer hours per night.
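
The two observations above can also be checked numerically. The sketch below uses made-up data in `toy_habits`; the column names `Sleep_Hours_per_Night` and `Final_Grade` are assumptions, while `Time_Spent_on_Social_Media_per_Week` follows the renaming done during data cleaning.

```r
# Made-up habits for six hypothetical students
toy_habits <- data.frame(
  Time_Spent_on_Social_Media_per_Week = c(5, 8, 12, 20, 25, 30),
  Sleep_Hours_per_Night = c(9, 8, 8, 7, 6, 5),
  Final_Grade = c("A", "A", "B", "C", "C", "D")
)

# Pearson correlation between social media hours and sleep hours;
# a negative value means more social media goes with less sleep
sleep_vs_social <- cor(toy_habits$Time_Spent_on_Social_Media_per_Week,
                       toy_habits$Sleep_Hours_per_Night)
print(sleep_vs_social)

# Average weekly social media hours within each final grade
mean_social_by_grade <- tapply(toy_habits$Time_Spent_on_Social_Media_per_Week,
                               toy_habits$Final_Grade,
                               mean)
print(mean_social_by_grade)
```

A correlation coefficient only describes how the two variables move together; it does not show that social media use causes students to sleep less.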

Concluding Thoughts

  • Through both graphical and statistical analysis, we have explored the various factors that may influence student performance and final exam scores

  • The findings suggest that study habits, class attendance, and learning styles do not strongly correlate with final exam scores

  • External factors such as stress, sleep patterns, and social media usage appear to have a greater influence on academic performance than study habits, class attendance, and learning styles

  • Education is an extremely valuable resource that we often overlook, even though it is present throughout our entire lives. There is no single right way to approach education, and measuring our knowledge of a subject with a single grade seems absurd given how many factors can help or hurt that grade

  • Student performance is shaped by a mix of factors that highlight the need to approach education as one of our greatest resources, not a grade or score