Summary

Linear regression is a statistical procedure used to predict the value of one variable (the dependent variable) from one or more other variables (the independent variables). The output is usually graphed as a scatter plot with a straight line fitted so that the sum of the squared vertical distances between the line and the points is as small as possible. The line can then be used to predict the value of the dependent variable for a given value of the independent variable.

How to calculate

  • A simple linear regression is written in the form \(Y = mX + b\), where X is the independent variable, Y is the dependent variable, m is the slope of the line, and b is the y-intercept.

  • The formula for the slope, where n is the number of observations, is: \(m = \frac{n(\sum XY)-(\sum X)(\sum Y)}{n(\sum X^2)-(\sum X)^2}\)

  • The formula for the y-intercept is: \(b = \frac{\sum Y-m(\sum X)}{n}\)
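
The two formulas above can be checked numerically. The following is a minimal R sketch, assuming that `air` is R's built-in `airquality` data set with the rows that have a missing Ozone value removed (the variable names `x`, `y`, `m`, `b`, and `fit` are illustrative only). It computes the slope and intercept from the sums and compares them with the result of R's `lm()` function.

```r
# Assumption: 'air' is airquality without the rows where Ozone is NA.
air <- airquality[!is.na(airquality$Ozone), ]

x <- as.numeric(air$Ozone)  # independent variable X
y <- as.numeric(air$Temp)   # dependent variable Y
n <- length(x)              # number of observations

# Slope and y-intercept from the summation formulas
m <- (n * sum(x * y) - sum(x) * sum(y)) / (n * sum(x^2) - sum(x)^2)
b <- (sum(y) - m * sum(x)) / n

# Cross-check against R's built-in least-squares fit
fit <- lm(Temp ~ Ozone, data = air)
print(c(slope = m, intercept = b))
print(coef(fit))
```

Both printed results should agree up to floating-point rounding, since `lm()` solves the same least-squares problem.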

Sample data: airquality

head(air)
  Ozone Solar.R Wind Temp Month Day
1    41     190  7.4   67     5   1
2    36     118  8.0   72     5   2
3    12     149 12.6   74     5   3
4    18     313 11.5   62     5   4
5    28      NA 14.9   66     5   6
6    23     299  8.6   65     5   7
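
The output above skips Day 5, which suggests that rows with a missing Ozone value were dropped before the data was displayed. A possible way to build such a data frame is sketched below; the exact preparation steps are an assumption, since the original does not show them.

```r
# Assumption: 'air' was derived from the built-in airquality data set
# by dropping rows where Ozone is missing and renumbering the rows.
air <- airquality[!is.na(airquality$Ozone), ]
rownames(air) <- NULL  # reset row names to 1, 2, 3, ...

head(air)
```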

Scatter plot Temp vs. Ozone

Simple plot before Linear Regression

Regression line equation

Fitting a linear regression to the data in the previous plot gives the following line:

Temp = 0.20(Ozone)+69.41
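
As a rough sketch, the line above can be fitted and drawn in R as follows. It assumes the model was fitted with `lm()` on the `airquality` data with missing Ozone values removed; the plot labels, the colour, and the example Ozone value of 50 are illustrative choices, not taken from the original.

```r
# Assumption: 'air' is airquality without the rows where Ozone is NA.
air <- airquality[!is.na(airquality$Ozone), ]

# Fit Temp as a linear function of Ozone
fit <- lm(Temp ~ Ozone, data = air)
print(coef(fit))  # intercept (b) and slope (m)

# Scatter plot with the fitted regression line on top
plot(air$Ozone, air$Temp,
     xlab = "Ozone", ylab = "Temp",
     main = "Temp vs. Ozone")
abline(fit, col = "red")

# Predict Temp for a hypothetical Ozone reading of 50
predicted <- predict(fit, newdata = data.frame(Ozone = 50))
print(predicted)
```

The call to `predict()` applies the fitted equation to a new value of the independent variable, which is how the line is used for prediction.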

Simple Linear Regression

3D Regression Temp vs. Ozone and Wind

Model Evaluation

To evaluate how well the model fits the data, we can calculate the \(R^2\) value. The \(R^2\) value is the proportion of the variation in the dependent variable that is explained by the model. An \(R^2\) value of 1 means the model explains all of the variation (every point lies exactly on the fitted line), and a value of 0 means it explains none of it. The model previously graphed has an \(R^2\) value of ~0.95.

The formula for calculating \(R^2\) is:

\(R^2 = 1 - \frac{\sum (Y_{actual}-Y_{predicted})^2}{\sum (Y_{actual}-\bar{Y})^2}\)
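
The formula can be applied directly in R. The sketch below assumes the simple model of Temp on Ozone fitted with `lm()` on the `airquality` data with missing Ozone values removed; the variable names are illustrative. It computes \(R^2\) from the residual and total sums of squares and compares it with the value reported by `summary()`.

```r
# Assumption: 'air' is airquality without the rows where Ozone is NA.
air <- airquality[!is.na(airquality$Ozone), ]
fit <- lm(Temp ~ Ozone, data = air)

y_actual    <- air$Temp
y_predicted <- fitted(fit)

# Numerator: variation left unexplained by the model
ss_res <- sum((y_actual - y_predicted)^2)
# Denominator: total variation around the mean of Y
ss_tot <- sum((y_actual - mean(y_actual))^2)

r_squared <- 1 - ss_res / ss_tot
print(r_squared)
print(summary(fit)$r.squared)  # R's own value, for comparison
```

The same steps work for a model with more than one independent variable, because only the actual and predicted values of the dependent variable enter the formula.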