2024-11-17

The Linear Regression Equation

Linear regression models the relationship between two variables with a straight line. The fitted line predicts the value of \(y\) for a given value of \(x\), assuming the relationship between the variables is linear. The equation for simple linear regression is:

\[ y = \beta_0 + \beta_1 x + \epsilon \]

where:

  • \(y\) is the dependent variable.
  • \(x\) is the independent variable.
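  • \(\beta_0\) is the intercept, the predicted value of \(y\) when \(x = 0\).
  • \(\beta_1\) is the slope coefficient, the change in \(y\) for a one-unit increase in \(x\).
  • \(\epsilon\) is the error term, the part of \(y\) that the line does not explain.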
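
To make the roles of these terms concrete, the following minimal sketch simulates data from the equation and checks that `lm()` recovers the coefficients. The true values (\(\beta_0 = 2\), \(\beta_1 = 0.5\)), the sample size, the noise level, and the seed are all arbitrary choices made for illustration.

```r
# Simulate data from y = beta_0 + beta_1 * x + epsilon.
# All numbers below are illustrative assumptions, not values from a real dataset.
set.seed(42)
n      <- 200
beta_0 <- 2      # intercept
beta_1 <- 0.5    # slope

x       <- runif(n, min = 0, max = 10)    # independent variable
epsilon <- rnorm(n, mean = 0, sd = 1)     # error term
y       <- beta_0 + beta_1 * x + epsilon  # dependent variable

# Estimate the intercept and slope by ordinary least squares
sim_fit   <- lm(y ~ x)
estimates <- coef(sim_fit)
print(estimates)
```

The estimates should land close to, but not exactly on, the true values, because the error term adds noise to every observation.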

Linear Regression Data Example

We will use R's built-in mtcars dataset to demonstrate simple linear regression, with miles per gallon (mpg) as the dependent variable and horsepower (hp) as the independent variable. The built-in cars dataset is used for polynomial regression afterwards.

# Load data
data(mtcars)
head(mtcars)
##                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
## Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1

R code for Scatter Plot with Line of Best Fit

library(ggplot2)
# Scatter plot with line of best fit 
ggplot(mtcars, aes(x=hp, y=mpg)) + 
  geom_point() + 
  geom_smooth(method="lm", se=FALSE, color="blue") + 
  labs(title="Scatter Plot of mpg vs hp with Line of Best Fit", 
       x="Horsepower (hp)", 
       y="Miles per Gallon (mpg)")

[Figure: Scatter Plot with Line of Best Fit]
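
The line drawn by `geom_smooth(method="lm")` is an ordinary least squares fit, so the same intercept and slope can be obtained numerically with `lm()`. This is a minimal sketch; the object names `fit`, `b`, and `new_cars`, and the horsepower values used for prediction, are chosen here for illustration.

```r
# Fit mpg as a linear function of hp using the built-in mtcars data
fit <- lm(mpg ~ hp, data = mtcars)

# Estimated intercept (beta_0) and slope (beta_1)
b <- coef(fit)
print(b)

# Share of the variance in mpg explained by the line
print(summary(fit)$r.squared)

# Predict mpg for two illustrative horsepower values
new_cars <- data.frame(hp = c(100, 200))
pred <- predict(fit, newdata = new_cars)
print(pred)
```

On the standard `mtcars` data the slope should come out negative, at roughly -0.07 mpg per additional unit of horsepower, which matches the downward-sloping line in the plot.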

The Polynomial Regression Equation

Polynomial regression adds powers of \(x\) to the model, which makes it more flexible than simple linear regression and lets it fit curved relationships better. There is a trade-off, though: each added degree lets the curve follow the sample more closely, but a high-degree polynomial can overfit, tracking noise in the sample and predicting poorly on new data. A 2nd-degree polynomial regression can be represented as:

\[ y = \beta_0 + \beta_1 x + \beta_2 x^2 + \epsilon \]

where:

  • \(y\) is the dependent variable.
  • \(x\) is the independent variable.
  • \(\beta_0\) is the intercept.
  • \(\beta_1\) and \(\beta_2\) are the coefficients of the linear and quadratic terms.
  • \(\epsilon\) is the error term.
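
A plot of such a fit can be produced in the same way as the earlier scatter plot. The sketch below assumes the built-in `cars` dataset (speed in mph, stopping distance in ft) and asks `geom_smooth()` for a 2nd-degree fit through its `formula` argument. The colour and labels are illustrative choices and may differ from the original plotting code.

```r
library(ggplot2)

# Scatter plot of stopping distance vs speed with a quadratic fit
p <- ggplot(cars, aes(x = speed, y = dist)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ poly(x, 2),
              se = FALSE, color = "red") +
  labs(title = "Polynomial Regression: Speed vs Stopping Distance",
       x = "Speed (mph)",
       y = "Stopping Distance (ft)")
print(p)
```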

[Figure: Polynomial Regression: Speed vs Stopping Distance]
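
The coefficients behind a curve like this can be estimated with `lm()` by adding a squared term. The sketch below assumes the built-in `cars` dataset and compares the quadratic model with the straight-line model; the object names are illustrative.

```r
# Straight-line model and 2nd-degree polynomial model on the cars data
linear_fit <- lm(dist ~ speed, data = cars)
quad_fit   <- lm(dist ~ speed + I(speed^2), data = cars)

# Estimates of beta_0, beta_1 and beta_2
print(coef(quad_fit))

# Adjusted R-squared penalises the extra term, so it is a fairer comparison
adj_r2 <- c(linear    = summary(linear_fit)$adj.r.squared,
            quadratic = summary(quad_fit)$adj.r.squared)
print(adj_r2)

# F-test: does the squared term improve the fit significantly?
print(anova(linear_fit, quad_fit))
```

`I(speed^2)` is needed because `^` has a special meaning inside an R formula. If the adjusted R-squared barely moves and the F-test is not significant, the extra degree is adding flexibility without much benefit, which is the trade-off between degrees and usefulness mentioned earlier.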

Limitations of Regression

As we’ve seen, regression is a powerful tool for building a model. However, it isn’t suited to every plotting or graphing task. If, for example, you need to map a surface, as many modern applications do, a 3D plot that approximates the shape of the surface directly is a much better tool than a fitted line or curve.
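
As a minimal sketch of such a plot, base R's `persp()` can draw the built-in `volcano` elevation matrix (Maunga Whau, sampled on a 10 m grid) as a surface. The viewing angles, the colour, and the choice of `persp()` itself are assumptions for illustration; the original figure may have been produced with a different package.

```r
# volcano is a built-in 87 x 61 matrix of elevations (in metres)
x <- 10 * seq_len(nrow(volcano))  # grid positions along rows, 10 m apart
y <- 10 * seq_len(ncol(volcano))  # grid positions along columns, 10 m apart

# Draw the surface; theta and phi set the viewing direction
view <- persp(x, y, volcano,
              theta = 135, phi = 30,
              col = "lightgreen", border = NA, shade = 0.5,
              xlab = "X (m)", ylab = "Y (m)", zlab = "Elevation (m)",
              main = "3D Surface Plot of Volcano Data")
```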

[Figure: 3D Surface Plot of Volcano Data]