Student Performance Analysis

Dalen Sanchez Carreon

2025-03-26

What is my dataset?

    • Data set: Student Performance Prediction
    • 708 students observed
    • Variables: Gender, Study Hours, Attendance, Exam Scores, Parental Education, Internet Access, Pass/Fail, Extracurricular activities.
    • Won’t be using Student_ID
    • I wanted to see if we could predict student performance based on attendance, extracurricular activities, internet availability at home and the education level of their parents.

Initial Thoughts

    • 10 variables, 6 are categorical
    • Average attendance: 82.3%
    • Avg. study hours per week: 26.13
    • Exam scores seem consistent with study time (median is 27 hours)
    • The average final exam score was 99.41%, which is almost equal to the past exam score maximum of 100%.
    • I wanted to see if we could predict student performance based on attendance, extracurricular activities, and the education level of their parents.

Pass vs Fail Proportion

This pie chart shows that 58% of students passed the final exam and 42% failed. Although more than half of the students passed, a large portion didn’t, which may point to a difficult final assessment, a problem with the curriculum, or a gap in teacher instruction or care.

Gender Distribution

Gender distribution is close to even, with a slightly higher proportion of female students (53%). This balance helps ensure that the data isn’t biased heavily toward one gender when analyzing student performance.

Parental Education Level

Extracurricular Activities

The ratio of students who engage in extracurricular activities to those who don’t is 49.01% to 50.99%. This near 50/50 split allows balanced comparisons.

Attendance Distribution

This histogram shows attendance is mostly between 70–95%, with a clear concentration around 75–80%. The distribution looks slightly bimodal, possibly reflecting differences between students who participate in extracurriculars and those who don’t.

After some initial exploratory data…

  • The data set was confusing because some of the results don’t make sense in the real world.
  • In a sample of 708 students, it seems unlikely that 183 have parents with only a high school education and 165 have parents with PhDs — these are surprisingly close.
  • The ratio of students who engage in extracurricular activity is almost exactly 50/50 (50.99% yes, 49.01% no).
  • My primary variables are attendance, parental education level, and final exam scores.
  • Past and final exam scores are very similar, yet the fail rate is 42%, just under half.
  • Near 50/50 gender and activity splits may reflect synthetic or unusually balanced data.

Does internet access affect final exam scores?

  • The correlation coefficient is 0.036, which is positive but very low.
  • Students without internet access at home are predicted to score 74.15 on the final exam. Students with internet access at home score 0.69 points higher on average, a very small difference.
  • The p-value (0.34) is not statistically significant (p > 0.05). This means the result could be due to random chance.
  • Less than 0.13% of the variation in exam scores is explained by internet access, an extremely weak relationship.
  • As you can see, the scatter in each column looks very random and the regression line is almost completely flat.
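
To make the numbers above easier to read: when the predictor is a yes/no variable coded 0/1, the regression intercept is the mean score of the "no" group and the slope is the difference between the two group means. The sketch below illustrates this in plain Python. The toy data and the helper name simple_ols are assumptions made for illustration; they are not the study's data or the tool used for the original analysis.

```python
from statistics import mean


def simple_ols(x, y):
    """Least-squares fit of y = intercept + slope * x; returns (intercept, slope, r)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    r = sxy / (sxx * syy) ** 0.5
    return intercept, slope, r


# Hypothetical toy data (NOT the study's data): 0 = no internet at home, 1 = internet.
internet = [0, 0, 0, 0, 1, 1, 1, 1]
score = [70.0, 78.0, 74.0, 76.0, 75.0, 71.0, 79.0, 77.0]

intercept, slope, r = simple_ols(internet, score)
mean_no = mean(s for i, s in zip(internet, score) if i == 0)
mean_yes = mean(s for i, s in zip(internet, score) if i == 1)

# With a 0/1 predictor, intercept == mean_no and slope == mean_yes - mean_no.
r_squared = r ** 2

# The share of variation explained is the square of the correlation,
# so the reported r = 0.036 corresponds to roughly 0.13% of the variation.
reported_r = 0.036
reported_r_squared = reported_r ** 2
```

Read this way, the 74.15 in the slide is the average score of students without internet access, and 0.69 is the gap between the two group averages.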

Effect of Extracurricular on Attendance

    • Students involved in extracurricular activities have higher attendance rates.
    • A strong upward trend indicates a significant positive relationship between the two variables.
    • Students who don’t participate in extracurricular activities tend to have lower attendance, clustered around 75–80%, while those who do are clustered around 90%.
    • There are some outliers.
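
A correlation between a yes/no variable and a numeric one is often called a point-biserial correlation. It can be computed from the two group means, the overall spread, and the group proportions, and it equals the ordinary Pearson correlation on 0/1 codes. The following is a minimal Python sketch of that idea; the attendance values are hypothetical toy numbers, not the study's data, and the helper name pearson_r is an assumption.

```python
from statistics import mean, pstdev


def pearson_r(x, y):
    """Ordinary Pearson correlation, computed from sums of squares."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / (sxx * syy) ** 0.5


# Hypothetical toy data (NOT the study's data): 1 = takes part in extracurriculars.
activity = [0, 0, 0, 0, 1, 1, 1, 1]
attendance = [75.0, 78.0, 80.0, 77.0, 90.0, 88.0, 92.0, 91.0]

group_no = [a for flag, a in zip(activity, attendance) if flag == 0]
group_yes = [a for flag, a in zip(activity, attendance) if flag == 1]

p = len(group_yes) / len(attendance)  # share of students in the "yes" group
q = 1 - p                             # share of students in the "no" group

# Point-biserial formula: gap in group means, scaled by the overall
# (population) standard deviation and by how evenly the groups are split.
r_pb = (mean(group_yes) - mean(group_no)) / pstdev(attendance) * (p * q) ** 0.5

r_check = pearson_r(activity, attendance)  # should match r_pb
```

The formula shows why a large gap in group means combined with tight clustering inside each group produces a high correlation.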

Effect of Attendance on Final Exam Score

  • The regression line fits the data well, with relatively low scatter around it, especially in the middle range (75–90% attendance).
  • For every 1-percentage-point increase in attendance, the model predicts only a 0.038-point increase in final exam score, a very small effect.
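
To put the slope in perspective, it can be multiplied by a realistic change in attendance. The short Python sketch below uses the reported slope of 0.038 and the 70–95% attendance range mentioned earlier; the function name predicted_change is an assumption for illustration.

```python
SLOPE = 0.038  # reported change in final exam score per 1 point of attendance


def predicted_change(attendance_change):
    """Predicted change in final exam score for a given change in attendance."""
    return SLOPE * attendance_change


# Moving from 70% to 95% attendance spans most of the observed range.
change_across_range = predicted_change(95 - 70)
```

Even across that whole range, the model predicts a change of less than one exam point (about 0.95).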

Parental Education Level and Exam Score

  • Students with PhD-educated parents scored 9.6 points lower than the Bachelor’s group.
  • About 57% of the variation in final exam scores is explained by parental education, a strong relationship.
  • This chart suggests that students with more highly educated parents tend to perform better on the final exam.
  • Variation exists within each group, but those whose parents have Bachelor’s or Master’s degrees show higher median scores than those with only a high school education.
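
When the predictor is a category, a regression on that category predicts each student's score with their group's mean, so the share of variation explained is the between-group variation divided by the total variation. The Python sketch below shows that calculation. The group scores are hypothetical toy numbers chosen for illustration, not the study's data, and they will not reproduce the 57% figure.

```python
from statistics import mean

# Hypothetical toy data (NOT the study's data): final exam scores by parental education.
scores_by_group = {
    "High School": [60.0, 64.0, 62.0],
    "Bachelor's": [78.0, 80.0, 82.0],
    "Master's": [84.0, 86.0, 82.0],
    "PhD": [70.0, 72.0, 68.0],
}

all_scores = [s for group in scores_by_group.values() for s in group]
grand_mean = mean(all_scores)

# Total variation: squared distance of every score from the overall mean.
ss_total = sum((s - grand_mean) ** 2 for s in all_scores)

# Within-group variation: squared distance of every score from its own group mean.
ss_within = sum(
    (s - mean(group)) ** 2 for group in scores_by_group.values() for s in group
)

# Between-group variation: squared distance of each group mean from the
# overall mean, counted once per student in the group.
ss_between = sum(
    len(group) * (mean(group) - grand_mean) ** 2
    for group in scores_by_group.values()
)

# Share of variation explained by group membership.
r_squared = ss_between / ss_total
```

A high value means that most of the spread in scores lies between the groups rather than inside them, which is how a figure such as 57% should be read.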

Conclusion

    • Students with internet access at home scored slightly higher on the final exam, but the relationship is very weak and not statistically significant.
    • Students involved in extracurriculars consistently had higher attendance rates, with a correlation of 0.84 and a large difference in group means.
    • Attendance has a positive, though less significant, effect on exam scores. Perhaps if the data set had more variation we would have seen a stronger effect.
    • Parental education is a strong predictor of academic success. Students with more highly educated parents, especially those with Master’s degrees, performed better on the final exam.
    • Overall, this analysis supports the idea that external support systems play a meaningful role in student performance.
    • Several variables show unrealistic patterns: gender, parental education, pass/fail, and extracurricular activities.
    • Future research could benefit from tracking changes over time or including qualitative factors like student motivation or teacher feedback.