- Definition: A method for modeling the linear relationship between a dependent variable and a single independent variable.
- Purpose: To predict the value of the dependent variable based on the independent variable.
The simple linear regression model is given by:
\[ Y = \beta_0 + \beta_1 X + \epsilon \]
Where:

- \(Y\) is the dependent variable.
- \(X\) is the independent variable.
- \(\beta_0\) is the intercept.
- \(\beta_1\) is the slope.
- \(\epsilon\) is the error term.
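To make the roles of these terms concrete, here is a minimal sketch that simulates data from the model and recovers the coefficients with `lm()`. The true values (`beta0 = 2`, `beta1 = 3`), the sample size, the range of `x`, and the noise level are arbitrary assumptions chosen for illustration; none of them come from this document.

```r
# Simulate data from Y = beta0 + beta1 * X + epsilon
# (every numeric value below is an illustrative assumption)
set.seed(42)                             # make the simulation reproducible
n     <- 100                             # hypothetical sample size
beta0 <- 2                               # assumed true intercept
beta1 <- 3                               # assumed true slope

x       <- runif(n, min = 0, max = 10)   # independent variable
epsilon <- rnorm(n, mean = 0, sd = 1)    # random error term
y       <- beta0 + beta1 * x + epsilon   # dependent variable

# Ordinary least squares fit of y on x
sim_fit <- lm(y ~ x)
coef(sim_fit)
```

With this much data and modest noise, the estimated intercept and slope should land close to the assumed `beta0` and `beta1`, though not exactly on them, because of the error term.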
We’ll use the built-in mtcars dataset to explore the relationship between Horsepower (hp) and Miles Per Gallon (mpg).
We fit a simple linear regression model to predict MPG based on HP.
| | Estimate | Std. Error | t value | Pr(>\|t\|) |
|---|---|---|---|---|
| (Intercept) | 30.0988605 | 1.6339210 | 18.421246 | 0e+00 |
| hp | -0.0682283 | 0.0101193 | -6.742388 | 2e-07 |
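The coefficient table above can likely be reproduced with a sketch along these lines. The object names `fit` and `coef_table` are assumptions, since the original fitting code is not shown.

```r
# mtcars ships with base R (datasets package), so no extra packages are needed
fit <- lm(mpg ~ hp, data = mtcars)

# Matrix with one row per coefficient and columns for the estimate,
# standard error, t value, and p-value
coef_table <- summary(fit)$coefficients
coef_table
```

The `0e+00` shown for the intercept is a rounding artifact of the display: the p-value is extremely small, not exactly zero.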
The estimated regression equation from our model is:
\[ \hat{Y} = 30.09886 - 0.06823 \times \text{HP} \]
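As a rough illustration of how the equation is used, the sketch below predicts MPG for two hypothetical cars and checks the result against the equation evaluated by hand. The horsepower values (100 and 150) and the names `fit`, `new_cars`, `pred`, and `by_hand` are assumptions for the example.

```r
# Refit the model so this example runs on its own
fit <- lm(mpg ~ hp, data = mtcars)

# Hypothetical cars to predict for; the hp values are arbitrary illustrations
new_cars <- data.frame(hp = c(100, 150))

# Predicted mpg from the fitted model
pred <- predict(fit, newdata = new_cars)

# The same calculation by hand: intercept + slope * hp
by_hand <- coef(fit)[["(Intercept)"]] + coef(fit)[["hp"]] * new_cars$hp

# Both approaches should agree
all.equal(unname(pred), by_hand)
```

For a 100 hp car, the equation gives about 30.09886 - 0.06823 × 100 ≈ 23.28 mpg. The slope means each additional unit of horsepower is associated with a drop of roughly 0.068 mpg in predicted fuel economy.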
Below is the R code used to create the regression line plot with ggplot2.
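```r
library(ggplot2)  # provides ggplot(), geom_point(), geom_smooth()

# Scatter plot of mpg against hp with the fitted least-squares line
ggplot(mtcars, aes(x = hp, y = mpg)) +
  geom_point(color = 'darkgreen') +
  geom_smooth(method = 'lm', se = TRUE, color = 'red') +  # se = TRUE adds a confidence band
  labs(title = "Linear Regression of MPG on HP",
       x = "Horsepower (hp)",
       y = "Miles Per Gallon (mpg)") +
  theme_minimal()
```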