Understanding relationships between two variables using statistics.
2026-02-09
Simple linear regression models the relationship between:

- One predictor variable \(x\)
- One response variable \(y\)
It assumes that \(y\) depends linearly on \(x\), apart from a random error term.
The simple linear regression model is:
\[ y = \beta_0 + \beta_1 x + \varepsilon \]
Where:

- \(\beta_0\) is the intercept
- \(\beta_1\) is the slope
- \(\varepsilon\) is the random error term
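To make the roles of the three terms concrete, here is a minimal R sketch that draws responses from the model. The helper name `simulate_slr` is hypothetical, and it assumes independent, normally distributed errors with standard deviation `sigma`. The model equation itself does not require normality.

```r
# Hypothetical helper: generate y = beta0 + beta1 * x + epsilon,
# assuming epsilon ~ Normal(0, sigma^2), independent across observations.
simulate_slr <- function(x, beta0, beta1, sigma) {
  epsilon <- rnorm(length(x), mean = 0, sd = sigma)  # random error term
  beta0 + beta1 * x + epsilon                         # linear trend plus noise
}

# With sigma = 0 there is no noise, so every point lies exactly on the line.
simulate_slr(x = c(0, 1, 2), beta0 = 50, beta1 = 5, sigma = 0)
# returns 50 55 60
```

Setting `sigma` above zero scatters the points around the same line, which is the situation the regression has to untangle.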
We generate simulated data where:

- \(x\) represents hours studied
- \(y\) represents exam score

The scores follow a linear trend with intercept 50 and slope 5, plus normally distributed noise with standard deviation 5.
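```r
set.seed(123)  # make the random noise reproducible

# 50 evenly spaced values of hours studied between 1 and 10
data <- data.frame(x = seq(1, 10, length.out = 50))

# Exam score: linear trend (intercept 50, slope 5) plus Normal(0, 5^2) noise
data$y <- 50 + 5 * data$x + rnorm(50, 0, 5)

# Fit the simple linear regression of y on x
model <- lm(y ~ x, data = data)
```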
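Once the model is fitted, base R's `coef()` and `summary()` show the estimated intercept and slope. The sketch below repeats the simulation so that it runs on its own. The final check relies on a general property of least squares: when the model includes an intercept, the residuals sum to zero (up to floating-point error).

```r
set.seed(123)
data <- data.frame(x = seq(1, 10, length.out = 50))
data$y <- 50 + 5 * data$x + rnorm(50, 0, 5)
model <- lm(y ~ x, data = data)

coef(model)     # estimated intercept "(Intercept)" and slope "x"
summary(model)  # adds standard errors, t-tests and R-squared

# Residuals of a least-squares fit with an intercept sum to (numerically) zero
sum(residuals(model))
```

Because the data were simulated, the estimates can be compared directly with the true values of 50 and 5 used to generate them.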
The ordinary least squares (OLS) estimator of the slope is:
\[ \hat{\beta}_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})} {\sum (x_i - \bar{x})^2} \]
The intercept estimator is:
\[ \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} \]
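The two formulas can be checked by computing them by hand and comparing the result with `lm()`. This sketch reuses the same simulated data. It also computes the slope as `cov(x, y) / var(x)`, which is equivalent because the \(n - 1\) denominators cancel.

```r
set.seed(123)
data <- data.frame(x = seq(1, 10, length.out = 50))
data$y <- 50 + 5 * data$x + rnorm(50, 0, 5)

x_bar <- mean(data$x)
y_bar <- mean(data$y)

# Slope: sum of cross-deviations divided by sum of squared x-deviations
beta1_hat <- sum((data$x - x_bar) * (data$y - y_bar)) / sum((data$x - x_bar)^2)
# Same slope written with sample covariance and variance
beta1_cov <- cov(data$x, data$y) / var(data$x)
# Intercept: the fitted line passes through the point (x_bar, y_bar)
beta0_hat <- y_bar - beta1_hat * x_bar

model <- lm(y ~ x, data = data)
all.equal(unname(coef(model)), c(beta0_hat, beta1_hat))  # TRUE
```

`lm()` solves the least-squares problem through a QR decomposition rather than these closed-form sums, so the two results agree only up to floating-point tolerance, which is why `all.equal()` is used instead of `==`.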