Dataset Overview

Student Performance Factors Dataset: This presentation analyzes data from more than 6,600 students and the factors that influence their exam performance.

Data Source: Kaggle

Key Variables:

  • Exam_Score: Final exam score
  • Hours_Studied: Number of hours spent studying per week
  • Attendance: Percentage of classes attended
  • Sleep_Hours: Average number of hours of sleep per night
  • Gender: Male or Female
  • Family_Income: Low, Medium, or High

Data Preparation (R Code)

This is how the data was prepared:

# load libraries
library(ggplot2)
library(plotly)
library(dplyr)
library(RColorBrewer)

# load data
data <- read.csv("StudentPerformanceFactors.csv")

# convert categorical variables to factors (income levels in the order Low, Medium, High)
data$Gender <- factor(data$Gender)
data$Family_Income <- factor(data$Family_Income, levels = c("Low", "Medium", "High"))

ggplot Boxplot: Exam Score by Family Income

ggplot Bar Chart: Average Exam Score by Income and Gender
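
The plotting code for this chart is not shown. A minimal sketch of how such a chart could be built: average Exam_Score for each income level and gender, then draw side-by-side bars. The helper name mean_score_by_group, the axis labels, and the dodged-bar layout are assumptions for illustration, not the presentation's actual code.

```r
# Hypothetical helper: mean exam score for each income level and gender.
mean_score_by_group <- function(data) {
  aggregate(Exam_Score ~ Family_Income + Gender, data = data, FUN = mean)
}

# Run the plotting part only when the dataset file and ggplot2 are available.
if (file.exists("StudentPerformanceFactors.csv") &&
    requireNamespace("ggplot2", quietly = TRUE)) {
  data <- read.csv("StudentPerformanceFactors.csv")
  data$Family_Income <- factor(data$Family_Income,
                               levels = c("Low", "Medium", "High"))
  avg <- mean_score_by_group(data)

  # One bar per gender within each income level.
  p <- ggplot2::ggplot(
    avg,
    ggplot2::aes(x = Family_Income, y = Exam_Score, fill = Gender)
  ) +
    ggplot2::geom_col(position = "dodge") +
    ggplot2::labs(x = "Family income", y = "Average exam score")
  print(p)
}
```

Aggregating first keeps the chart to one bar per group; the same summary could also be computed with dplyr's group_by() and summarise().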

ggplot Scatter: Hours Studied vs Exam Score

plotly 3D Scatter: Hours Studied, Attendance, Exam Score

3D Plot Interpretation

  • Both hours studied and attendance individually show a positive relationship with exam score
  • Students with high values in both tend to show the highest scores
  • Differences across income levels are less prominent compared to studying habits
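
A 3D scatter like the one interpreted above could be sketched with plotly as follows. The helper name plot_score_3d, the marker size, and colouring the points by Family_Income are assumptions (the colouring is inferred from the remark about income levels), not the presentation's actual code.

```r
# Hypothetical helper: 3D scatter of study hours, attendance, and exam score.
# Colouring by Family_Income is an assumption based on the interpretation above.
plot_score_3d <- function(data) {
  p <- plotly::plot_ly(
    data,
    x = ~Hours_Studied, y = ~Attendance, z = ~Exam_Score,
    color = ~Family_Income,
    type = "scatter3d", mode = "markers",
    marker = list(size = 3)
  )
  # Label the three axes of the 3D scene.
  plotly::layout(p, scene = list(
    xaxis = list(title = "Hours studied"),
    yaxis = list(title = "Attendance (%)"),
    zaxis = list(title = "Exam score")
  ))
}

# Run only when the dataset file and plotly are available.
if (file.exists("StudentPerformanceFactors.csv") &&
    requireNamespace("plotly", quietly = TRUE)) {
  data <- read.csv("StudentPerformanceFactors.csv")
  data$Family_Income <- factor(data$Family_Income,
                               levels = c("Low", "Medium", "High"))
  print(plot_score_3d(data))
}
```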

plotly Scatter: Sleep Hours vs Exam Score

Statistical Analysis: Linear Regression

# linear regression model
model <- lm(Exam_Score ~ Hours_Studied + Attendance + Sleep_Hours + Gender, data = data)

coef(summary(model))
##                  Estimate  Std. Error    t value  Pr(>|t|)
## (Intercept)   45.86583323 0.299882872 152.945825 0.0000000
## Hours_Studied  0.29313418 0.005413033  54.153409 0.0000000
## Attendance     0.19722067 0.002808421  70.224744 0.0000000
## Sleep_Hours   -0.03364123 0.022089741  -1.522935 0.1278229
## GenderMale    -0.03874418 0.065634960  -0.590298 0.5550111
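
The coefficient table can be read as a prediction equation: the intercept plus each estimate multiplied by the corresponding predictor value. As a worked example, the sketch below applies the estimates above to a hypothetical student (20 study hours per week, 80% attendance, 7 hours of sleep, female). The student profile is invented for illustration only.

```r
# Coefficient estimates copied from the regression output above.
coefs <- c(
  "(Intercept)"   = 45.86583323,
  "Hours_Studied" = 0.29313418,
  "Attendance"    = 0.19722067,
  "Sleep_Hours"   = -0.03364123,
  "GenderMale"    = -0.03874418
)

# Hypothetical student: 20 study hours per week, 80% attendance,
# 7 hours of sleep, female (so the GenderMale indicator is 0).
# The leading 1 multiplies the intercept.
student <- c(1, 20, 80, 7, 0)

# Predicted score = sum of estimate * predictor value.
predicted <- sum(coefs * student)
round(predicted, 2)  # about 67.27
```

With the fitted model object available, predict(model, newdata = ...) would give the same value without copying coefficients by hand.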

Linear Regression: Interpretation

Key findings:

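  • Hours Studied: Strong positive effect; each additional hour studied per week is associated with about 0.29 more exam points (p < 0.001)
  • Attendance: Second-largest positive coefficient; each additional percentage point of attendance is associated with about 0.20 more exam points (p < 0.001)
  • Sleep Hours: Not a statistically significant predictor of scores (p ≈ 0.13)
  • Gender: No significant difference in exam scores after controlling for hours studied, attendance, and sleep (p ≈ 0.56)
  • Overall: Studying and attendance matter more than sleep or gender in predicting scores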
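
The raw slopes are in different units (points per weekly study hour versus points per percentage point of attendance), so comparing their sizes directly can mislead. One common way to put them on a shared scale is to standardize the numeric predictors before fitting. The sketch below is one possible approach and was not part of the original analysis; the helper name standardized_coefs is hypothetical.

```r
# Hypothetical helper: refit the same model with the numeric predictors
# rescaled to mean 0 and standard deviation 1, so each slope reads as
# "exam points per one standard deviation of the predictor".
standardized_coefs <- function(data) {
  scaled <- data
  for (v in c("Hours_Studied", "Attendance", "Sleep_Hours")) {
    scaled[[v]] <- as.numeric(scale(data[[v]]))
  }
  fit <- lm(Exam_Score ~ Hours_Studied + Attendance + Sleep_Hours + Gender,
            data = scaled)
  coef(fit)
}

# Run only when the dataset file is available.
if (file.exists("StudentPerformanceFactors.csv")) {
  data <- read.csv("StudentPerformanceFactors.csv")
  print(standardized_coefs(data))
}
```

Each standardized slope equals the raw slope multiplied by that predictor's standard deviation, so the fit itself is unchanged; only the units of comparison differ.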

Conclusion

Study Habits: Hours studied and attendance are the strongest predictors of exam performance

Income: Shows only a small effect on students’ scores in the plots (it was not included in the regression model)

Sleep Hours: Do not significantly influence exam scores in this dataset

Gender: No meaningful difference in scores between male and female students

Overall, among the factors examined, consistent studying and class attendance are most strongly associated with higher exam scores