- Goal: Predict student grades based on factors such as study hours and attendance.
- Objective: Build a predictive model using regression analysis to forecast student grades.
- Dataset: StudentPerformanceFactors.csv
The dataset used for this analysis, StudentPerformanceFactors.csv, contains student exam scores along with several factors that may influence academic performance. The key variables are:
- Exam_Score: the student's final exam score.
- Hours_Studied: the number of hours spent studying for the exam.
- Attendance: the attendance rate, as a percentage.
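Basic statistics for these variables can be computed with a short script. The sketch below is in Python using only the standard library csv and statistics modules (the original analysis appears to have been done in R). It assumes the CSV has a header row containing exactly the column names listed above; blank cells are skipped rather than treated as zero.

```python
import csv
import statistics
from typing import Dict, Iterable, List

# Column names taken from the variable list above.
COLUMNS = ["Exam_Score", "Hours_Studied", "Attendance"]


def summarize_columns(
    lines: Iterable[str], columns: List[str] = COLUMNS
) -> Dict[str, Dict[str, float]]:
    """Return count, mean, sample standard deviation, min and max per column."""
    reader = csv.DictReader(lines)
    values: Dict[str, List[float]] = {col: [] for col in columns}
    for row in reader:
        for col in columns:
            cell = (row.get(col) or "").strip()
            if cell:  # skip blank or missing cells
                values[col].append(float(cell))
    summary: Dict[str, Dict[str, float]] = {}
    for col, vals in values.items():
        summary[col] = {
            "count": len(vals),
            "mean": statistics.mean(vals),
            "stdev": statistics.stdev(vals),  # needs at least two values
            "min": min(vals),
            "max": max(vals),
        }
    return summary
```

Usage, assuming the file is in the working directory: open it with open("StudentPerformanceFactors.csv", newline="") and pass the file object to summarize_columns.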
The histogram indicates that most grades in the class fall in the 65-70% range, with comparatively few outliers. We next look at how certain factors relate to this pattern.
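The "most grades fall in 65-70%" reading can be checked numerically by counting scores into fixed-width bins, which is what a histogram does. This is a minimal sketch; the 5-point bin width is an assumption chosen to match the range quoted above, and bins are half-open, so a score of exactly 70 lands in the 70-75 bin.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def bin_scores(scores: Iterable[float], width: int = 5) -> List[Tuple[str, int]]:
    """Count scores in half-open bins [start, start + width), sorted by bin start."""
    counts = Counter(int(score // width) * width for score in scores)
    return [(f"{start}-{start + width}", counts[start]) for start in sorted(counts)]


def modal_bin(scores: Iterable[float], width: int = 5) -> str:
    """Return the label of the most populated bin (lowest bin wins ties)."""
    bins = bin_scores(scores, width)
    return max(bins, key=lambda item: item[1])[0]
```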
This plot shows that, in most cases, more hours studied is associated with higher exam scores.
As with study hours, attendance appears to be a strong factor in exam success: students with higher attendance tend to have higher exam scores.
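The strength of the relationships seen in these two plots can be summarized with the Pearson correlation coefficient, which ranges from -1 to 1. The sketch below computes it from scratch for two equal-length sequences (for example, Hours_Studied and Exam_Score); on Python 3.10+, statistics.correlation gives the same value.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Covariance and variances share the same 1/n factor, so it cancels out.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)
```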
In this analysis, we constructed a linear regression model to predict student exam scores using three key variables:
- Hours_Studied
- Attendance
- Parental education level
This model helps quantify the relationship between these factors and exam performance. Below is the summary of the model, including the coefficients for each predictor and the overall fit of the model:
The model's Mean Squared Error (MSE) and R-squared are:
## [1] "Mean Squared Error: 6.3"
## [1] "R-squared: 0.54"
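The output above appears to be R console output, where a model like this is typically fitted with lm(). As a language-neutral illustration, the sketch below fits the same kind of ordinary least squares model in plain Python by solving the normal equations (X'X)b = X'y. It assumes parental education level has been converted to a numeric code (a hypothetical ordinal encoding such as 0, 1, 2); the original encoding is not stated, and R would normally dummy-code a categorical predictor instead.

```python
from typing import List, Sequence


def fit_ols(X: Sequence[Sequence[float]], y: Sequence[float]) -> List[float]:
    """Fit y ~ b0 + b1*x1 + ... + bk*xk by ordinary least squares.

    Each row of X holds one student's predictors, e.g. the hypothetical
    [hours_studied, attendance, parental_education_code].
    Returns [intercept, coef_1, ..., coef_k].
    """
    # Prepend a column of ones so the first coefficient is the intercept.
    rows = [[1.0] + [float(v) for v in r] for r in X]
    p = len(rows[0])
    # Augmented matrix [X'X | X'y].
    A = [
        [sum(r[i] * r[j] for r in rows) for j in range(p)]
        + [sum(r[i] * t for r, t in zip(rows, y))]
        for i in range(p)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda k: abs(A[k][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("X'X is singular; check for collinear predictors")
        A[col], A[pivot] = A[pivot], A[col]
        for k in range(p):
            if k != col:
                factor = A[k][col] / A[col][col]
                A[k] = [a - factor * b for a, b in zip(A[k], A[col])]
    # A is now diagonal in its first p columns.
    return [A[i][p] / A[i][i] for i in range(p)]


def predict(coefs: Sequence[float], X: Sequence[Sequence[float]]) -> List[float]:
    """Apply fitted coefficients to new rows of predictors."""
    return [coefs[0] + sum(c * v for c, v in zip(coefs[1:], r)) for r in X]
```

Solving the normal equations directly is easy to follow but can be numerically less stable than the QR decomposition that R's lm() uses; for a handful of predictors on a dataset of this kind, the difference is usually negligible.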
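An MSE of 6.3 means that the average squared difference between the actual and predicted exam scores is 6.3 (in squared score points). Its square root, the root mean squared error (RMSE), is about 2.5, so a typical prediction is off by roughly 2.5 points, which is relatively small on a 100-point scale.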
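The MSE and RMSE definitions used above can be written out directly. This is a minimal sketch over two equal-length sequences of actual and predicted scores; the last line converts the reported MSE back into score points.

```python
import math
from typing import Sequence


def mean_squared_error(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Average of the squared differences between actual and predicted values."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and the same length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def root_mean_squared_error(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Square root of the MSE, expressed in the same units as the scores."""
    return math.sqrt(mean_squared_error(actual, predicted))


# The reported MSE of 6.3 corresponds to an RMSE of about 2.5 score points.
print(round(math.sqrt(6.3), 2))  # prints 2.51
```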
The R-squared value of 0.54 indicates that the model explains 54% of the variance in exam scores. In other words, study hours, attendance, and parental education level together account for 54% of the variation in exam scores, while the remaining 46% is due to factors not included in the model or to random variation.
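R-squared compares the model's squared errors with those of a baseline that always predicts the mean score: R^2 = 1 - SS_res / SS_tot. A minimal sketch of that formula is below; a value of 0.54 means the residual sum of squares is 46% of the total sum of squares.

```python
from typing import Sequence


def r_squared(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_a = sum(actual) / len(actual)
    # Squared error left over after the model's predictions.
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    # Squared error of the baseline that always predicts the mean.
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot
```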
This visual relates the model's predictions to the MSE and R-squared results, showing the variation around the overall trend captured by the predictive model: