2025-11-01

The Dataset

  • This project analyzes a student exam score data set from Kaggle.
  • The data set relates student habits to exam scores. It has six columns: student ID, hours studied, sleep hours, attendance percent, previous score, and exam score.

plotly Plot Comparing Hours Studied, Sleep Hours and Exam Score

This graph shows the relationship between hours studied, sleep hours, and exam scores. There is a clear correlation between hours studied and exam score: the more students studied, the higher their exam scores were. There isn't a strong correlation between sleep hours and exam scores. However, some of the data points with the least sleep have the lowest scores, and some of the points with the highest scores have a lot of sleep, which could imply a slight correlation. Overall, more sleep and more hours studied tend to go with higher exam scores.
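
One way to put numbers on "clear" versus "slight" is a correlation coefficient. The sketch below uses base R's cor() on a small made-up data frame, not the Kaggle data, and the column names hours_studied, sleep_hours, and exam_score are assumptions about how the columns are named.

```r
# Toy values for illustration only; these are NOT rows from the Kaggle data set.
# The column names are assumed, not taken from the original file.
toy <- data.frame(
  hours_studied = c(2, 4, 6, 8, 10, 12),
  sleep_hours   = c(6, 8, 5, 7, 6, 8),
  exam_score    = c(22, 27, 31, 36, 40, 46)
)

# Pearson correlation of each habit with the exam score
cor_hours <- cor(toy$hours_studied, toy$exam_score)
cor_sleep <- cor(toy$sleep_hours, toy$exam_score)

# A value near 1 means a strong positive relationship; near 0 means a weak one
print(round(c(hours_studied = cor_hours, sleep_hours = cor_sleep), 2))
```

Running the same two cor() calls on the real data frame would show how much weaker the sleep relationship is than the study-hours one.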

plotly Plot that Shows the Distribution of Exam Scores

This histogram shows how many students received each score, so you can also see which scores were the most and least common. I found that the most common scores were between 30 and 40, while the least common fell between 50 and 55.
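
The bin counts behind a histogram can also be read off directly. This is a minimal sketch with base R's hist() and plot = FALSE on made-up scores; the bin width of 5 is an assumption and may not match the plotly chart.

```r
# Made-up scores for illustration only, not the real exam scores
scores <- c(18, 24, 27, 31, 33, 34, 36, 38, 42, 47)

# Count scores per 5-point bin without drawing anything
h <- hist(scores, breaks = seq(15, 50, by = 5), plot = FALSE)

# Lower and upper edge of the fullest bin
fullest <- which.max(h$counts)
most_common_bin <- c(lower = h$breaks[fullest], upper = h$breaks[fullest + 1])

print(h$counts)
print(most_common_bin)
```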

ggplot that Compares Attendance and Exam Score

This graph shows the correlation between attendance and exam scores. The more often students went to class, the higher their exam scores were.

ggplot Comparing Hours Studied and Exam Scores

This box plot shows the minimum, maximum, first quartile (Q1), third quartile (Q3), and median for three groups: students who studied 0-5 hours, 6-10 hours, and 11+ hours. The overall values in each group increase the more they studied.
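
The numbers a box plot draws can be computed without plotting. The sketch below groups made-up study hours into the three bands and asks base R's boxplot() for its statistics with plot = FALSE. The column names and the break points (which put any value above 5 and up to 10 in the middle group) are assumptions.

```r
# Made-up values for illustration only; column names are assumed
toy <- data.frame(
  hours_studied = c(0, 3, 5, 6, 9, 10, 11, 12),
  exam_score    = c(20, 24, 28, 30, 35, 37, 41, 45)
)

# Bin study hours into the three groups used in the box plot
toy$study_group <- cut(toy$hours_studied,
                       breaks = c(0, 5, 10, Inf),
                       labels = c("0-5 hours", "6-10 hours", "11+ hours"),
                       include.lowest = TRUE)

# Rows of $stats: lower whisker, Q1, median, Q3, upper whisker (one column per group)
box_stats <- boxplot(exam_score ~ study_group, data = toy, plot = FALSE)$stats
group_medians <- box_stats[3, ]

print(box_stats)
```

The center line of each box is the median, so comparing the third row across groups shows whether the typical score rises with study time.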

Code for Grouping

To find a relationship between attendance and exam scores, I decided to use a bar graph. To do this, I had to group the attendance values so I could show the correlation. Here is the code I used to group attendance.

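# Bin attendance into three labeled groups. include.lowest = TRUE keeps an
# attendance of exactly 0 in the "Low" group instead of turning it into NA.
exam_scores$attendance_group <- cut(exam_scores$attendance_percent,
                                   breaks = c(0, 60, 80, 100),
                                   labels = c("Low (0-60%)",
                                              "Medium (61-80%)",
                                              "High (81-100%)"),
                                   include.lowest = TRUE)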

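It can help to check how cut() treats values that sit exactly on a break. By default its intervals are closed on the right, so 60 lands in the Low group and 80 in Medium, and a value equal to the lowest break (0 here) only gets a group when include.lowest = TRUE. The sketch below uses made-up attendance values to show both behaviors.

```r
# Made-up attendance values chosen to sit on or near the break points
edge_values <- c(0, 60, 60.5, 80, 100)

breaks <- c(0, 60, 80, 100)
labels <- c("Low (0-60%)", "Medium (61-80%)", "High (81-100%)")

# Default behavior: intervals are (0, 60], (60, 80], (80, 100]
default_groups <- cut(edge_values, breaks = breaks, labels = labels)

# With include.lowest = TRUE the first interval becomes [0, 60]
closed_groups <- cut(edge_values, breaks = breaks, labels = labels,
                     include.lowest = TRUE)

print(data.frame(attendance_percent = edge_values,
                 default = default_groups,
                 include_lowest = closed_groups))
```
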
Statistical Analysis

Calculating summary statistics for the exam scores gives the following:

  • Mean: 34.0
  • Median: 34.1
  • Standard Deviation: 6.8
  • Minimum: 17.1
  • Maximum: 51.3
  • Range: 17.1 - 51.3
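
These statistics can be computed with a few base R functions. This is a minimal sketch on made-up scores, so its output will not match the values listed above; on the real data the same calls would be applied to the exam score column.

```r
# Made-up scores for illustration only, not the real exam scores
scores <- c(18, 24, 27, 31, 33, 34, 36, 38, 42, 47)

# Collect the summary statistics in one named vector
score_stats <- c(
  mean   = mean(scores),
  median = median(scores),
  sd     = sd(scores),    # sample standard deviation
  min    = min(scores),
  max    = max(scores)
)

# Round to one decimal place, as in the list above
print(round(score_stats, 1))
```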

Conclusion

Overall, this analysis showed the range of exam scores and the shape of their distribution. We then looked at which factors affect exam scores. Sleep and previous score showed only a mild correlation with exam scores; overall, the less sleep a student got, the more common it was to do poorly on the exam. However, there were stronger correlations with hours studied and attendance: students who studied more hours and attended more classes often had higher exam scores.