This exercise shows how to apply Multiple Linear Regression in R. The objective is to predict students’ final exam scores from several independent variables: Hours Studied, Attendance Rate, and Assignment Score.
We use a small ‘Student Performance Dataset’ of 8 students, created directly in R below, with ‘Final Exam Score’ as the dependent variable. Multiple Linear Regression estimates how each of these variables relates to final exam results while holding the others constant.
# Creating the student dataset
# ----------------------------
student_ds <- data.frame(
Hr_Studied = c(5,8,2,7,6,9,4,10),
Attendance = c(80,90,60,85,75,95,70,98),
Assignment = c(70,85,55,80,78,92,60,96),
Final_Exam = c(65,88,50,82,76,94,62,98)
)
# Display the student dataset
# ---------------------------
print(student_ds)
## Hr_Studied Attendance Assignment Final_Exam
## 1 5 80 70 65
## 2 8 90 85 88
## 3 2 60 55 50
## 4 7 85 80 82
## 5 6 75 78 76
## 6 9 95 92 94
## 7 4 70 60 62
## 8 10 98 96 98
# Structure of the dataset
# ------------------------
str(student_ds)
## 'data.frame': 8 obs. of 4 variables:
## $ Hr_Studied: num 5 8 2 7 6 9 4 10
## $ Attendance: num 80 90 60 85 75 95 70 98
## $ Assignment: num 70 85 55 80 78 92 60 96
## $ Final_Exam: num 65 88 50 82 76 94 62 98
# Summary statistics
# ------------------
summary(student_ds)
## Hr_Studied Attendance Assignment Final_Exam
## Min. : 2.000 Min. :60.00 Min. :55.00 Min. :50.00
## 1st Qu.: 4.750 1st Qu.:73.75 1st Qu.:67.50 1st Qu.:64.25
## Median : 6.500 Median :82.50 Median :79.00 Median :79.00
## Mean : 6.375 Mean :81.62 Mean :77.00 Mean :76.88
## 3rd Qu.: 8.250 3rd Qu.:91.25 3rd Qu.:86.75 3rd Qu.:89.50
## Max. :10.000 Max. :98.00 Max. :96.00 Max. :98.00
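The summary statistics above describe each variable on its own. Checking how the predictors move together helps explain the regression output that follows. The sketch below uses base R's `cor()` and computes variance inflation factors (VIF) by hand with `lm()` instead of an add-on package; the names `predictors`, `cor_matrix` and `vif` are illustrative. A commonly cited rule of thumb treats a VIF above about 10 as a sign of serious multicollinearity.

```r
# Recreate the dataset so this snippet runs on its own
student_ds <- data.frame(
  Hr_Studied = c(5, 8, 2, 7, 6, 9, 4, 10),
  Attendance = c(80, 90, 60, 85, 75, 95, 70, 98),
  Assignment = c(70, 85, 55, 80, 78, 92, 60, 96),
  Final_Exam = c(65, 88, 50, 82, 76, 94, 62, 98)
)

predictors <- c("Hr_Studied", "Attendance", "Assignment")

# Pairwise Pearson correlations between all four variables
cor_matrix <- cor(student_ds)
print(round(cor_matrix, 3))

# Variance inflation factor computed by hand:
# VIF_j = 1 / (1 - R^2_j), where R^2_j comes from regressing
# predictor j on the other two predictors
vif <- sapply(predictors, function(p) {
  others <- setdiff(predictors, p)
  fit <- lm(reformulate(others, response = p), data = student_ds)
  1 / (1 - summary(fit)$r.squared)
})
print(round(vif, 1))
```

In this dataset all three predictors rise together (every pairwise correlation among them is above 0.95), so each VIF lands well above 10.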
# Building the regression model
# -----------------------------
model <- lm(Final_Exam ~ Hr_Studied + Attendance + Assignment, data = student_ds)
# Display model results
# ---------------------
summary(model)
##
## Call:
## lm(formula = Final_Exam ~ Hr_Studied + Attendance + Assignment,
## data = student_ds)
##
## Residuals:
## 1 2 3 4 5 6 7 8
## -1.2306 1.1975 0.2227 1.3683 -0.4524 1.0717 -0.2318 -1.9454
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 59.04408 19.98987 2.954 0.0418 *
## Hr_Studied 8.34302 2.22418 3.751 0.0199 *
## Attendance -0.41186 0.23404 -1.760 0.1533
## Assignment -0.02256 0.29796 -0.076 0.9433
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.586 on 4 degrees of freedom
## Multiple R-squared: 0.9949, Adjusted R-squared: 0.9911
## F-statistic: 260.4 on 3 and 4 DF, p-value: 4.859e-05
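The t-tests in the coefficient table can also be read as confidence intervals: a coefficient whose 95% interval excludes zero corresponds to p < 0.05. A short sketch with base R's `confint()`; the names `ci` and `excludes_zero` are illustrative.

```r
# Recreate the data and model so this snippet runs on its own
student_ds <- data.frame(
  Hr_Studied = c(5, 8, 2, 7, 6, 9, 4, 10),
  Attendance = c(80, 90, 60, 85, 75, 95, 70, 98),
  Assignment = c(70, 85, 55, 80, 78, 92, 60, 96),
  Final_Exam = c(65, 88, 50, 82, 76, 94, 62, 98)
)
model <- lm(Final_Exam ~ Hr_Studied + Attendance + Assignment, data = student_ds)

# 95% t-based confidence intervals for each coefficient (4 residual df)
ci <- confint(model, level = 0.95)
print(round(ci, 3))

# TRUE where the interval lies entirely above or below zero
excludes_zero <- ci[, 1] > 0 | ci[, 2] < 0
print(excludes_zero)
```

Consistent with the p-values above, only the intercept and Hr_Studied intervals exclude zero; the Attendance and Assignment intervals straddle it.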
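The model follows this equation:
\[ Y = \beta_0 + \beta_1X_1 + \beta_2X_2 + \beta_3X_3 + \varepsilon \]
where:
- \(Y\) = Final Exam Score
- \(X_1\) = Hours Studied
- \(X_2\) = Attendance Rate
- \(X_3\) = Assignment Score
- \(\beta_0\) = intercept, \(\beta_1, \beta_2, \beta_3\) = regression coefficients, \(\varepsilon\) = random error term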
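Substituting the estimates from the coefficient table gives the fitted equation, where \(\hat{Y}\) is the predicted final exam score. As a worked check, plugging in student 1 (5 hours studied, attendance 80, assignment score 70) reproduces that student's predicted score of about 66.23 in the comparison table below:
\[ \hat{Y} = 59.04408 + 8.34302X_1 - 0.41186X_2 - 0.02256X_3 \]
\[ \hat{Y}_1 = 59.04408 + 8.34302(5) - 0.41186(80) - 0.02256(70) \approx 66.23 \]
Since student 1 actually scored 65, the residual is \(65 - 66.23 \approx -1.23\), matching the first entry in the Residuals line of the summary.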
# Predict final exam scores
# -------------------------
predicted_scores <- predict(model)
# Create comparison dataset
# -------------------------
results <- data.frame(
Actual = student_ds$Final_Exam,
Predicted = predicted_scores
)
print(results)
## Actual Predicted
## 1 65 66.23058
## 2 88 86.80253
## 3 50 49.77727
## 4 82 80.63165
## 5 76 76.45240
## 6 94 92.92828
## 7 62 62.23184
## 8 98 99.94545
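The same fitted model can score students who were not in the dataset. The sketch below uses `predict()` with `newdata` and `interval = "prediction"` to get a point estimate plus a 95% prediction interval; the student in `new_student` is hypothetical, with values chosen only for illustration. Because the predictors are so strongly correlated, combinations unlike the training data may produce wide intervals. The last lines confirm that Actual minus Predicted gives the residuals reported by `summary(model)`.

```r
# Recreate the data and model so this snippet runs on its own
student_ds <- data.frame(
  Hr_Studied = c(5, 8, 2, 7, 6, 9, 4, 10),
  Attendance = c(80, 90, 60, 85, 75, 95, 70, 98),
  Assignment = c(70, 85, 55, 80, 78, 92, 60, 96),
  Final_Exam = c(65, 88, 50, 82, 76, 94, 62, 98)
)
model <- lm(Final_Exam ~ Hr_Studied + Attendance + Assignment, data = student_ds)

# Hypothetical new student (illustrative values, not from the dataset)
new_student <- data.frame(Hr_Studied = 6, Attendance = 85, Assignment = 75)

# Columns: fit (point prediction), lwr and upr (95% prediction interval)
new_pred <- predict(model, newdata = new_student,
                    interval = "prediction", level = 0.95)
print(new_pred)

# Residuals are Actual minus Predicted
res_manual <- student_ds$Final_Exam - predict(model)
print(round(res_manual, 4))
```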
# Plot actual vs predicted scores
# -------------------------------
plot(results$Actual, results$Predicted, main = "Actual vs Predicted Final Exam Scores",
xlab = "Actual Scores", ylab = "Predicted Scores", pch = 19)
# Add the reference line y = x (perfect predictions)
# --------------------------------------------------
abline(0, 1, col = "yellow", lwd = 2)
# Visualization explanation
# -------------------------
# - Points close to the yellow y = x line indicate accurate predictions.
# - Points far from the line indicate larger prediction errors.
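The actual-vs-predicted plot shows overall fit; a residuals-vs-fitted plot is a common complementary check for curvature or uneven spread. A minimal base-graphics sketch is below (base R's `plot(model, which = 1)` draws a similar diagnostic). With only eight points, any apparent pattern should be read cautiously.

```r
# Recreate the data and model so this snippet runs on its own
student_ds <- data.frame(
  Hr_Studied = c(5, 8, 2, 7, 6, 9, 4, 10),
  Attendance = c(80, 90, 60, 85, 75, 95, 70, 98),
  Assignment = c(70, 85, 55, 80, 78, 92, 60, 96),
  Final_Exam = c(65, 88, 50, 82, 76, 94, 62, 98)
)
model <- lm(Final_Exam ~ Hr_Studied + Attendance + Assignment, data = student_ds)

# Residuals vs fitted values: a patternless band around zero is what
# a linear model with constant error variance would be expected to produce
plot(fitted(model), residuals(model),
     main = "Residuals vs Fitted Values",
     xlab = "Fitted Final Exam Score", ylab = "Residual", pch = 19)
abline(h = 0, lty = 2)  # dashed reference line at zero residual
```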
# Calculate R-squared
# -------------------
R2 <- cor(results$Actual, results$Predicted)^2
print(R2)
## [1] 0.994905
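For a least-squares model with an intercept, the squared correlation between actual and fitted values equals the Multiple R-squared reported by `summary(model)`. With only 8 observations and 3 predictors, the adjusted R-squared is also worth checking; it can be computed from R² as 1 - (1 - R²)(n - 1)/(n - p - 1). A short sketch, where `n`, `p` and `adj_R2` are illustrative names:

```r
# Recreate the data and model so this snippet runs on its own
student_ds <- data.frame(
  Hr_Studied = c(5, 8, 2, 7, 6, 9, 4, 10),
  Attendance = c(80, 90, 60, 85, 75, 95, 70, 98),
  Assignment = c(70, 85, 55, 80, 78, 92, 60, 96),
  Final_Exam = c(65, 88, 50, 82, 76, 94, 62, 98)
)
model <- lm(Final_Exam ~ Hr_Studied + Attendance + Assignment, data = student_ds)

# Squared correlation between actual and fitted values
R2 <- cor(student_ds$Final_Exam, predict(model))^2

# Adjusted R^2 penalises the number of predictors p relative to sample size n
n <- nrow(student_ds)
p <- length(coef(model)) - 1  # exclude the intercept
adj_R2 <- 1 - (1 - R2) * (n - 1) / (n - p - 1)

print(c(R2 = R2, summary_R2 = summary(model)$r.squared,
        adj_R2 = adj_R2, summary_adj_R2 = summary(model)$adj.r.squared))
```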
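# Interpretation
# --------------
# R-squared is the proportion of the variation in final exam scores explained by the independent variables.
# Here R² = 0.9949, meaning about 99.5% of the variation in this sample is explained by the model.
# With only 8 students and 3 predictors, this in-sample fit is likely optimistic; the adjusted R-squared (0.9911) is slightly lower.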
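"Variation explained" has a concrete meaning: R² = 1 - SS_res / SS_tot, where SS_res is the sum of squared residuals and SS_tot is the total squared deviation of the scores from their mean. A minimal sketch; `ss_res`, `ss_tot` and `R2_from_ss` are illustrative names.

```r
# Recreate the data and model so this snippet runs on its own
student_ds <- data.frame(
  Hr_Studied = c(5, 8, 2, 7, 6, 9, 4, 10),
  Attendance = c(80, 90, 60, 85, 75, 95, 70, 98),
  Assignment = c(70, 85, 55, 80, 78, 92, 60, 96),
  Final_Exam = c(65, 88, 50, 82, 76, 94, 62, 98)
)
model <- lm(Final_Exam ~ Hr_Studied + Attendance + Assignment, data = student_ds)

y <- student_ds$Final_Exam
ss_res <- sum(residuals(model)^2)  # variation the model leaves unexplained
ss_tot <- sum((y - mean(y))^2)     # total variation around the mean score
R2_from_ss <- 1 - ss_res / ss_tot
print(c(ss_res = ss_res, ss_tot = ss_tot, R2 = R2_from_ss))
```

Here SS_tot is 1974.875 while SS_res is only about 10, which is why R² is so close to 1.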
This exercise demonstrated the application of Multiple Linear Regression using R programming.
The analysis showed that:
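- Students who study more tend to perform better: Hours Studied is the only statistically significant predictor, adding about 8.34 points per additional hour studied, holding attendance and assignment score constant (p = 0.0199).
- Attendance is not a significant predictor once the other two variables are in the model; its estimated coefficient is even slightly negative (-0.41, p = 0.153), although attendance and exam scores rise together in the raw data.
- Assignment Score adds almost nothing beyond the other predictors (estimate -0.02, p = 0.943).
- Together, the three variables explain about 99.5% of the variation in final exam scores (R² = 0.9949). With only 8 students and strongly correlated predictors, the individual Attendance and Assignment estimates are unstable and should not be over-interpreted.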