We will learn how to predict something using previous data points, plotting them, then finding the best fit line through them, and using that line to make future predictions with confidence.
2025-11-16
The scenario we use for this analysis is eight observations, each pairing the number of hours studied with the exam score obtained:
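```r
library(ggplot2)  # needed for the ggplot() calls below
study_hours = c(1, 2, 3, 4, 5, 6, 7, 8)
scores = c(50, 55, 60, 65, 67, 72, 78, 85)
# Combine into a data frame with one row per observation
data = data.frame(study_hours, scores)
```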
```r
# Scatter plot of the raw observations
ggplot(data, aes(x = study_hours, y = scores)) +
  geom_point(size = 3) +
  labs(title = "Study Hours vs Exam Score", x = "Study Hours", y = "Exam Score")
```
```r
# Add the least-squares line; stating formula = y ~ x silences geom_smooth()'s formula message
ggplot(data, aes(x = study_hours, y = scores)) +
  geom_point(size = 3, color = "blue") +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE, color = "red") +
  labs(title = "Study Hours vs Exam Score (with Regression Line)", x = "Study Hours", y = "Exam Score")
```
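The plot shows the line but not its numbers. Here is a minimal sketch, assuming base R's `lm()`, which fits the same ordinary least-squares model that `geom_smooth(method = "lm")` draws. `coef()` returns the intercept and slope, and `predict()` evaluates the line at a new x. The 9-hour input is a hypothetical value chosen for illustration; it lies just outside the observed range of 1 to 8 hours, so the prediction is a mild extrapolation.

```r
# Same observations as above, repeated so this chunk runs on its own
study_hours = c(1, 2, 3, 4, 5, 6, 7, 8)
scores = c(50, 55, 60, 65, 67, 72, 78, 85)
data = data.frame(study_hours, scores)

# Fit the least-squares line that geom_smooth(method = "lm") draws
model = lm(scores ~ study_hours, data = data)

# Intercept (a) and slope (b): roughly 45.18 and 4.74
coef(model)

# Predicted score for a hypothetical student who studies 9 hours
# (just outside the observed range of 1 to 8 hours)
predict(model, newdata = data.frame(study_hours = 9))
```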
The red line is the least-squares regression line, and its equation has the form:
\[ \hat{y} = a + bx \]
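Where \(x\) is the number of study hours, \(\hat{y}\) is the predicted exam score, and \(\bar{x}\) and \(\bar{y}\) are the means of the observed study hours and scores: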
\(b\) = slope (how much the predicted score changes for each additional hour of study)
\[b = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}\]
\(a\) = intercept (the predicted score when hours studied = 0)
\[a = \bar{y} - b \bar{x}\]
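To connect the two formulas to the fitted line, here is a sketch that evaluates them by hand in base R for the study-hours data and then applies \(\hat{y} = a + bx\). The result should agree with the coefficients that `lm(scores ~ study_hours)` reports. The 9-hour input is a hypothetical value for illustration only, slightly beyond the observed 1 to 8 hour range.

```r
study_hours = c(1, 2, 3, 4, 5, 6, 7, 8)
scores = c(50, 55, 60, 65, 67, 72, 78, 85)

x_bar = mean(study_hours)  # 4.5
y_bar = mean(scores)       # 66.5

# Slope: sum of cross-deviations (199) over sum of squared x-deviations (42)
b = sum((study_hours - x_bar) * (scores - y_bar)) / sum((study_hours - x_bar)^2)

# Intercept: the least-squares line passes through (x_bar, y_bar)
a = y_bar - b * x_bar

c(a = a, b = b)  # about 45.18 and 4.74

# Apply y_hat = a + b * x for a hypothetical 9 hours of study
y_hat = a + b * 9
y_hat  # about 87.82
```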