2024-06-04

What is Simple Linear Regression?

Simple linear regression is used to estimate the relationship between two quantitative variables.

It fits the line of best fit through the data by searching for the values of the regression coefficients that minimize the total error of the model (in ordinary least squares, the sum of the squared residuals).
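As a minimal sketch of what "minimizing the total error" looks like with a single predictor, the least-squares slope and intercept can be computed directly from means, variance and covariance. The example assumes R's built-in cars dataset; the names b0, b1 and sse are illustrative only.

```r
# Closed-form least-squares estimates for one predictor,
# using R's built-in cars dataset (speed and stopping distance).
x <- cars$speed
y <- cars$dist

# Slope: covariance of x and y divided by the variance of x
b1 <- cov(x, y) / var(x)
# Intercept: makes the line pass through (mean(x), mean(y))
b0 <- mean(y) - b1 * mean(x)

# Sum of squared errors, the quantity least squares minimizes
sse <- sum((y - (b0 + b1 * x))^2)

# lm() should arrive at the same two coefficients
fit <- lm(dist ~ speed, data = cars)
c(b0 = b0, b1 = b1)
coef(fit)
```

Any other choice of slope and intercept would give a larger sse on the same data, which is the sense in which this line is the "best" fit.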

What can it be used for?

It can be used to describe the nature of the relationship between two variables.

It can be used to estimate the value of the dependent variable for a given value of the independent variable.

Simple Linear Regression Formula

\[ y = \beta_0 + \beta_1x + \epsilon \]

y is the value of the dependent variable for any given value of x

\(\beta_0\) is the intercept, the predicted value of y when x is 0

\(\beta_1\) is the regression coefficient (the slope): the change in y for each one-unit increase in x

x is the independent variable

\(\epsilon\) is the error term: the part of y that the straight line does not explain
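To see each symbol of the formula in action, here is a small simulation sketch. The intercept, slope, noise level and seed are all made-up values chosen for illustration, not taken from any dataset.

```r
set.seed(1)                        # arbitrary seed so the run is repeatable
beta0 <- 2                         # hypothetical intercept
beta1 <- 0.5                       # hypothetical slope
x <- seq(0, 10, length.out = 200)  # independent variable

# Random error term: a different value for every observation
epsilon <- rnorm(length(x), mean = 0, sd = 1)

# Dependent variable built exactly as the formula describes
y <- beta0 + beta1 * x + epsilon

# Fitting a line to the simulated data should recover
# estimates close to the chosen beta0 and beta1
sim_fit <- lm(y ~ x)
coef(sim_fit)
```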

Simple Linear Regression Model

Let’s use the cars dataset built into R, which gives us the speed of cars (in mph) and their stopping distances (in ft).

model <- lm(cars$dist~cars$speed)
summary(model)
## 
## Call:
## lm(formula = cars$dist ~ cars$speed)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -29.069  -9.525  -2.272   9.215  43.201 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -17.5791     6.7584  -2.601   0.0123 *  
## cars$speed    3.9324     0.4155   9.464 1.49e-12 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 15.38 on 48 degrees of freedom
## Multiple R-squared:  0.6511, Adjusted R-squared:  0.6438 
## F-statistic: 89.57 on 1 and 48 DF,  p-value: 1.49e-12
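The numbers in this printout can also be pulled out of the model object rather than read off by eye. The sketch below refits the same model with the `data =` argument, which gives an identical fit but labels the slope row `speed` instead of `cars$speed`; the variable name `s` is just a convenience.

```r
# Same regression as above, written with a data argument
model <- lm(dist ~ speed, data = cars)
s <- summary(model)

coef(model)       # named vector: (Intercept) and speed
s$coefficients    # matrix: Estimate, Std. Error, t value, Pr(>|t|)
s$sigma           # residual standard error
s$r.squared       # multiple R-squared
confint(model)    # 95% confidence intervals for both coefficients
```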

Using the summary() output

Derived from the (Intercept) row:

\(\beta_0\) = -17.5791, which is the y-intercept

Derived from the cars$speed row:

\(\beta_1\) = 3.9324, which is the slope: the change in y for each one-unit increase in x

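Taken from the Std. Error column of the cars$speed row:

0.4155 is the standard error of \(\beta_1\), which is the uncertainty in our slope estimate. It is not \(\epsilon\): the error term is different for every observation, so it is not a single constant and does not appear in the fitted equation. Its typical size is given by the residual standard error, 15.38.

Using that, our fitted Linear Regression Equation will be \[ \hat{y} = -17.5791 + 3.9324x \] where \(\hat{y}\) is the predicted stopping distance for a speed of x.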
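As a rough sketch of how the coefficients turn into a prediction, the example below uses a hypothetical speed of 15. The model is refit with `data = cars` because `predict()` can only match `newdata` columns by name when the formula uses plain column names; `new_car`, `pred`, `by_hand`, `res` and `rse` are illustrative names.

```r
model <- lm(dist ~ speed, data = cars)

# Predicted stopping distance for a hypothetical speed of 15
new_car <- data.frame(speed = 15)
pred <- predict(model, newdata = new_car)

# The same number by hand: intercept + slope * x
by_hand <- coef(model)[["(Intercept)"]] + coef(model)[["speed"]] * 15

# Residuals are observed minus fitted values, one per observation
res <- residuals(model)

# Residual standard error: typical size of those residuals
rse <- sqrt(sum(res^2) / df.residual(model))

c(predicted = unname(pred), by_hand = by_hand, rse = rse)
```

Here pred and by_hand agree at roughly 41.4, and rse reproduces the 15.38 reported by summary().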

Linear Regression in ggplot2

Scatterplot of cars dataset in ggplot

library(ggplot2)  # provides ggplot(), geom_point(), geom_smooth()
ggplot(data=cars, aes(x=speed, y=dist)) + geom_point() + xlab("Car Speed") + 
  ylab("Stopping Distance")

Scatterplot with linear regression

ggplot(data=cars, aes(x=speed, y=dist)) + geom_point() + xlab("Car Speed") + 
  ylab("Stopping Distance") + geom_smooth(method=lm)

Conclusion

Linear regression is a useful tool for understanding the relationship between two variables and for using that relationship to predict the dependent variable from values of the independent variable.

These predictions are only reliable within the range of values we have measured, and each one carries some uncertainty, but the method is still very useful.
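Both caveats can be made concrete with a short sketch. The helper `predict_dist` below is hypothetical, written only for this example: it warns when the requested speed falls outside the speeds present in cars, and it returns a 95% prediction interval alongside the point estimate.

```r
model <- lm(dist ~ speed, data = cars)
speed_range <- range(cars$speed)   # smallest and largest measured speed

# Hypothetical helper: predict stopping distance with a prediction
# interval, warning when the speed lies outside the measured range.
predict_dist <- function(fit, speed, observed = speed_range) {
  if (speed < observed[1] || speed > observed[2]) {
    warning("speed is outside the measured range; this is an extrapolation")
  }
  predict(fit, newdata = data.frame(speed = speed), interval = "prediction")
}

predict_dist(model, 15)   # inside the range: columns fit, lwr, upr
```

Calling predict_dist(model, 0) triggers the warning and returns a negative fitted distance (the intercept, about -17.58), a reminder of why the line should not be trusted far from the measured speeds.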