- Goal: Model how one variable changes with another
- Example: Predict height from age, sales from advertising
\[ y = \beta_0 + \beta_1 x + \varepsilon \]
We draw a line to get predicted values:
\[ \hat y_i = \hat\beta_0 + \hat\beta_1 x_i \]
The residual for point \(i\) is:
\[ \text{residual}_i = y_i - \hat y_i \]
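To make the two formulas above concrete, here is a minimal sketch in R. The data values are hypothetical, invented only for illustration, and the variable names (`b0`, `b1`, `y_hat`, `res`) are assumptions, not part of the lesson's own code.

```r
# Hypothetical toy data, invented for illustration only
x <- c(1, 2, 3, 4, 5)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1)

fit <- lm(y ~ x)                    # fit a straight line to the data
b0 <- coef(fit)[["(Intercept)"]]    # estimated intercept
b1 <- coef(fit)[["x"]]              # estimated slope

y_hat <- b0 + b1 * x                # predicted value for each point
res <- y - y_hat                    # residual_i = y_i - y_hat_i

# These should agree with R's built-in helpers
all.equal(unname(fitted(fit)), y_hat)
all.equal(unname(residuals(fit)), res)
```

A positive residual means the point lies above the line, and a negative one means it lies below.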
Least squares chooses \(\hat\beta_0, \hat\beta_1\) to make
\[ \sum_{i=1}^n \text{residual}_i^2 \]
as small as possible.
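For a single predictor, this minimization has a well-known closed-form solution: \(\hat\beta_1 = \sum_i (x_i - \bar x)(y_i - \bar y) / \sum_i (x_i - \bar x)^2\) and \(\hat\beta_0 = \bar y - \hat\beta_1 \bar x\). The sketch below, again on hypothetical toy data, checks that these formulas match what `lm()` returns and that moving either coefficient away from the estimate makes the sum of squared residuals larger. The helper `rss()` is an illustrative name, not a built-in function.

```r
# Hypothetical toy data, invented for illustration only
x <- c(1, 2, 3, 4, 5)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1)

# Sum of squared residuals for a candidate line (illustrative helper)
rss <- function(b0, b1) sum((y - (b0 + b1 * x))^2)

# Closed-form least-squares estimates for one predictor
b1_hat <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
b0_hat <- mean(y) - b1_hat * mean(x)

# lm() should give the same intercept and slope
fit <- lm(y ~ x)
all.equal(unname(coef(fit)), c(b0_hat, b1_hat))

# Nudging either coefficient away from the estimate increases the sum
rss(b0_hat, b1_hat)
rss(b0_hat + 0.1, b1_hat)
rss(b0_hat, b1_hat + 0.1)
```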
library(ggplot2)

ggplot(data, aes(x = x, y = y)) +
  geom_point(color = "blue", size = 3) +
  geom_smooth(method = "lm", se = FALSE, color = "red") +
  labs(x = "X", y = "Y") +
  theme_minimal()
Explanation:
geom_smooth(        # Adds a fitted line to the plot
  method = "lm",    # "lm" = linear model: fits a straight line of y on x
  se = FALSE,       # "se" = standard error: TRUE shows the confidence band, FALSE hides it
  color = "red"     # Draws the line in red
)
Linear regression can also work in multiple dimensions…
Look forward to the next lesson, where you will learn to work with 3D data!