Linear regression is a statistical method used to model the relationship between a dependent variable and one or more independent variables.
Common uses:
A simple linear regression model is:
\[ y = \beta_0 + \beta_1 x + \epsilon \]
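Where:
\( y \) – the dependent (response) variable
\( x \) – the independent (predictor) variable
\( \beta_0 \) – the intercept
\( \beta_1 \) – the slope
\( \epsilon \) – the random error term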
The coefficients are chosen to minimize the sum of squared errors:
\[ SSE = \sum_{i=1}^{n}(y_i - \hat{y}_i)^2 \]
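For the simple model with one predictor, this minimization has a well-known closed-form solution. Setting the partial derivatives of SSE with respect to \( \beta_0 \) and \( \beta_1 \) to zero gives the standard least-squares estimates. The hats (marking estimates) and the bars (marking sample means) are notation introduced here for illustration, not taken from the presentation:

\[ \hat{\beta}_1 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n}(x_i - \bar{x})^2}, \qquad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} \]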
Predicted value:
\[ \hat{y} = \beta_0 + \beta_1 x \]
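A minimal sketch of how predicted values and SSE fit together, using a small made-up dataset (the `x` and `y` values, and the helper names `predict_line` and `sse`, are hypothetical and not part of the presentation). It compares the SSE of the least-squares line with that of a line whose intercept has been shifted:

```r
# Toy data, made up purely for illustration (not from the presentation)
x <- c(1, 2, 3, 4, 5)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1)

# Predicted values for a candidate line: y_hat = b0 + b1 * x
predict_line <- function(b0, b1, x) b0 + b1 * x

# Sum of squared errors between observed and predicted values
sse <- function(b0, b1, x, y) sum((y - predict_line(b0, b1, x))^2)

# Least-squares coefficients as found by lm()
b <- unname(coef(lm(y ~ x)))

sse_best  <- sse(b[1], b[2], x, y)
sse_other <- sse(b[1] + 0.5, b[2], x, y)  # same slope, shifted intercept

sse_best < sse_other  # the least-squares line has the smaller SSE
```

Any other choice of intercept or slope could be substituted for the shifted line; by definition, none can produce a smaller SSE than the least-squares coefficients.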
In practice, the coefficients that minimize the SSE are found with matrix calculations, which are typically carried out by software rather than by hand.
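As a rough sketch of what those matrix calculations can look like, the example below solves the normal equations \( (X^\top X)\beta = X^\top y \) directly on R's built-in `mtcars` data and compares the result with `lm()`. The presentation does not say which matrix method it has in mind, so the normal equations are an assumption here; `lm()` itself uses a QR decomposition internally, which is numerically more stable.

```r
# Design matrix: a column of 1s for the intercept plus the predictor wt
X <- cbind(Intercept = 1, wt = mtcars$wt)
y <- mtcars$mpg

# Solve the normal equations (X'X) beta = X'y for beta
beta_hat <- solve(t(X) %*% X, t(X) %*% y)

# Compare with R's built-in fitting function
fit <- lm(mpg ~ wt, data = mtcars)
all.equal(as.vector(beta_hat), unname(coef(fit)))
```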
Dataset used in this presentation: mtcars
Variables used:
mpg – miles per gallon
wt – weight of the car
Goal: predict fuel efficiency (mpg) from weight.
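library(ggplot2)  # provides ggplot(), geom_point(), geom_smooth()

ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +                # one point per car
  geom_smooth(method = "lm") +  # least-squares line with confidence band
  labs(title = "Fuel Efficiency vs Car Weight",
       x = "Weight (1000 lbs)",
       y = "Miles per Gallon")
## `geom_smooth()` using formula = 'y ~ x'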
As the graph shows, higher weight is strongly associated with lower fuel efficiency. For example, the fitted line suggests that a car with a weight of 4.5 (4,500 lbs, since wt is measured in units of 1,000 lbs) would get around 13 mpg.
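Rather than reading the value off the graph, the prediction can be computed from the fitted model. The sketch below assumes the same model as the plot (`mpg ~ wt` on `mtcars`); the object names `fit` and `pred` are illustrative.

```r
fit <- lm(mpg ~ wt, data = mtcars)

coef(fit)  # intercept and slope of the fitted line

# Predicted mpg for a hypothetical car with wt = 4.5 (4,500 lbs)
pred <- predict(fit, newdata = data.frame(wt = 4.5))
pred

# Strength of the linear association between weight and mpg
cor(mtcars$wt, mtcars$mpg)
```

A negative slope and a correlation close to -1 are what "heavier cars get fewer miles per gallon" looks like numerically.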
A residual plot shows, for each data point, the difference between the observed value and the value predicted by the line of best fit.
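The presentation's own plotting code for the residuals is not shown, so the following is only one common way to draw such a plot, assuming the `mpg ~ wt` model and ggplot2. The names `res_df` and `residual_plot` are hypothetical.

```r
library(ggplot2)

fit <- lm(mpg ~ wt, data = mtcars)

# Residual = observed mpg minus the mpg predicted by the fitted line
res_df <- data.frame(
  fitted   = fitted(fit),
  residual = residuals(fit)
)

residual_plot <- ggplot(res_df, aes(x = fitted, y = residual)) +
  geom_point() +
  geom_hline(yintercept = 0, linetype = "dashed") +  # zero-error reference
  labs(title = "Residuals vs Fitted Values",
       x = "Fitted mpg",
       y = "Residual")

residual_plot
```

Points scattered evenly around the dashed zero line, with no obvious curve or funnel shape, suggest that a straight line is a reasonable description of the data.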
This version of the plot uses plotly, which makes it interactive: you can zoom in or hover over individual data points to see their values.
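One way to get such an interactive plot is to build it with ggplot2 and convert it with plotly's `ggplotly()`. The sketch assumes the plot in question is a residual plot for the `mpg ~ wt` model, with weight on the horizontal axis; the presentation does not show its code, so both that choice and the object names are assumptions.

```r
library(ggplot2)
library(plotly)

fit <- lm(mpg ~ wt, data = mtcars)
res_df <- data.frame(wt = mtcars$wt, residual = residuals(fit))

p <- ggplot(res_df, aes(x = wt, y = residual)) +
  geom_point() +
  geom_hline(yintercept = 0, linetype = "dashed")

# ggplotly() converts a ggplot object into an interactive plotly widget
interactive_plot <- ggplotly(p)

# Evaluate `interactive_plot` in an R session or R Markdown chunk to render it
```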