Simple Linear Regression

What is Linear Regression?

Linear regression is a statistical method used to model the relationship between a dependent variable and one or more independent variables.

Common uses:

Linear Regression Model

A simple linear regression model is:

\[ y = \beta_0 + \beta_1 x + \epsilon \]

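Where \(y\) is the dependent variable, \(x\) is the independent variable, \(\beta_0\) is the intercept, \(\beta_1\) is the slope, and \(\epsilon\) is the random error term.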

Least Squares Estimation

The coefficients are chosen to minimize the sum of squared errors:

\[ SSE = \sum_{i=1}^{n}(y_i - \hat{y}_i)^2 \]

Predicted value, using the estimated coefficients:

\[ \hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x \]

For simple linear regression the minimizing coefficients can be written in closed form. With several predictors the computation is expressed with matrices and is typically carried out by software.
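The least squares estimates for a single predictor can be checked by hand from sums of deviations around the means. The following is a minimal sketch, assuming the built-in mtcars data used later in this presentation, with wt as the predictor and mpg as the response; the names b0, b1, y_hat, and sse are illustrative and not part of the original slides.

```r
# Closed-form least squares estimates for simple linear regression
x <- mtcars$wt   # predictor: weight (1000 lbs)
y <- mtcars$mpg  # response: miles per gallon

# Slope: sum of cross-deviations divided by sum of squared x-deviations
b1 <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
# Intercept: forces the line through the point (mean(x), mean(y))
b0 <- mean(y) - b1 * mean(x)

# Sum of squared errors for the fitted line
y_hat <- b0 + b1 * x
sse <- sum((y - y_hat)^2)

# Compare with R's built-in fit
fit <- lm(mpg ~ wt, data = mtcars)
print(c(intercept = b0, slope = b1, sse = sse))
print(coef(fit))
```

The hand-computed intercept and slope should agree with the values reported by coef(fit) up to floating-point rounding.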

Example Dataset

Dataset used in this presentation: mtcars

Variables used: mpg (miles per gallon) and wt (weight in 1000 lbs).

Goal: predict fuel efficiency (mpg) from weight.

Scatter Plot with Regression Line

library(ggplot2)

# Scatter plot of mpg against weight with a fitted least squares line
ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  geom_smooth(method = "lm") +
  labs(title = "Fuel Efficiency vs Car Weight",
       x = "Weight (1000 lbs)",
       y = "Miles per Gallon")
## `geom_smooth()` using formula = 'y ~ x'

As the graph shows, higher weight is strongly associated with lower fuel efficiency. For example, a car with a weight of 4.5 (4,500 lbs) would be predicted by the fitted line to get about 13 mpg.
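A reading taken from the graph can be checked against the fitted model directly. This is a minimal sketch, assuming the same mpg ~ wt model on mtcars; new_car is a hypothetical data point introduced only for illustration.

```r
# Fit the simple linear regression of mpg on weight
fit <- lm(mpg ~ wt, data = mtcars)

# Predict mpg for a hypothetical car with wt = 4.5 (4,500 lbs)
new_car <- data.frame(wt = 4.5)
pred <- predict(fit, newdata = new_car)
print(round(pred, 1))
```

With the standard mtcars data the prediction comes out at roughly 13.2 mpg.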

Residual Plot

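A residual plot shows, for each data point, the difference between the observed value and the value predicted by the line of best fit. Residuals scattered randomly around zero suggest that a straight line is a reasonable fit.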
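The code for this slide's plot is not shown here, so the following is only a sketch of one way such a plot could be produced, assuming ggplot2 and the mpg ~ wt model on mtcars. The names resid_df and p_resid are illustrative.

```r
library(ggplot2)

fit <- lm(mpg ~ wt, data = mtcars)

# Collect fitted values and residuals (observed minus predicted)
resid_df <- data.frame(fitted = fitted(fit), residual = residuals(fit))

# Plot residuals against fitted values with a reference line at zero
p_resid <- ggplot(resid_df, aes(x = fitted, y = residual)) +
  geom_point() +
  geom_hline(yintercept = 0, linetype = "dashed") +
  labs(title = "Residuals vs Fitted Values",
       x = "Fitted mpg",
       y = "Residual")
print(p_resid)
```

Points above the dashed line are cars that get better mileage than the model predicts; points below it get worse.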

Interactive Regression Plot

This version of the plot uses plotly, which makes it interactive: viewers can zoom in or hover over data points to see their values.
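The original code for the interactive plot is not shown here. One common approach, sketched below under the assumption that the plotly package is installed, is to build the ggplot2 figure first and pass it to ggplotly(). The names p and p_interactive are illustrative.

```r
library(ggplot2)
library(plotly)

# Build the static scatter plot with a least squares line
p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x) +
  labs(title = "Fuel Efficiency vs Car Weight",
       x = "Weight (1000 lbs)",
       y = "Miles per Gallon")

# Convert the ggplot object into an interactive plotly widget
p_interactive <- ggplotly(p)
p_interactive
```

Converting an existing ggplot object keeps the static and interactive versions of the figure consistent, since both come from the same plot definition.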