2026-02-05

What is Linear Regression?

Linear regression models the relationship between one or more forecasting variables and an outcome variable. Understanding this model allows us to predict an outcome based on its forecasting variables.

For example, we could take the number of hours per week a student spends studying and use it to predict how they will score on their final exam.

The Linear Equation

For a single forecasting variable, the equation is: \(y=mx+b\)

  • \(x\) is the independent variable (forecasting variable)

  • \(y\) is the dependent variable (outcome variable)

  • \(m\) is the slope of the line

  • \(b\) is the y-intercept
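To make the equation concrete, here is a minimal sketch in R that applies \(y=mx+b\) to the study-hours example. The slope and intercept values are made up purely for illustration and do not come from any real data; `predict_score` is a hypothetical helper name.

```r
# Hypothetical slope and intercept, chosen only for illustration
m <- 2.5   # assumed points gained per extra hour of study per week
b <- 50    # assumed score for a student who studies zero hours

# y = m * x + b, vectorized over x
predict_score <- function(hours) {
  m * hours + b
}

# Predicted scores for 0, 4, and 10 hours of study per week
predicted <- predict_score(c(0, 4, 10))
print(predicted)
```

With these assumed values, each additional hour of study raises the predicted score by 2.5 points, which is exactly what the slope describes.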

Calculating b and m

The formula to find the slope (m) is: \(m = r\frac{S_y}{S_x}\)

  • \(r\) is the correlation coefficient

  • \(S_y\) is the standard deviation of the outcome variable (dependent variable)

  • \(S_x\) is the standard deviation of the forecasting variable (independent variable)
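As a rough sketch, the slope formula can be computed directly in base R. The example below assumes the built-in iris dataset, with Sepal Length as the forecasting variable and Sepal Width as the outcome variable; the variable names are just illustrative.

```r
# Forecasting variable (x) and outcome variable (y) from the built-in iris data
x <- iris$Sepal.Length
y <- iris$Sepal.Width

r   <- cor(x, y)  # correlation coefficient
s_x <- sd(x)      # standard deviation of the forecasting variable
s_y <- sd(y)      # standard deviation of the outcome variable

# Slope: correlation times the ratio of the standard deviations
m <- r * s_y / s_x
print(m)
```

Because \(r\) carries the sign of the relationship, the slope is negative whenever the two variables are negatively correlated.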

The formula to find the y-intercept (b) is: \(b = \bar{y} - m\bar{x}\)

  • \(\bar{y}\) is the mean of the outcome variable (dependent variable)

  • \(\bar{x}\) is the mean of the forecasting variable (independent variable)

  • \(m\) is the slope
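One way to check both formulas is to compute the slope and intercept by hand and compare them with the coefficients that R's built-in `lm()` function returns. This is a minimal sketch assuming the iris dataset, with Sepal Length as the forecasting variable and Sepal Width as the outcome variable.

```r
x <- iris$Sepal.Length
y <- iris$Sepal.Width

# Slope and intercept from the formulas above
m <- cor(x, y) * sd(y) / sd(x)
b <- mean(y) - m * mean(x)
manual <- c(b, m)

# The same model fitted by ordinary least squares
fit <- lm(Sepal.Width ~ Sepal.Length, data = iris)
from_lm <- unname(coef(fit))  # intercept first, then slope

print(manual)
print(from_lm)
```

The two printed vectors should agree up to floating-point rounding, since for a single forecasting variable the formulas give the ordinary least squares solution.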

ggplot2 Linear Regression Example Code

Let's run a simple linear regression using the iris dataset that’s prepackaged with R. We will predict Sepal Width using Sepal Length as the forecasting variable. The code below creates a scatter plot with an ordinary least squares trend line, which represents the linear regression equation that minimizes the sum of squared errors.

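# Load ggplot2 for plotting; the iris dataset ships with R
library(ggplot2)

ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_point() +                 # one point per flower
  geom_smooth(method = "lm") +   # ordinary least squares trend line
  ggtitle("Effect of Sepal Length on Sepal Width") +
  xlab("Sepal Length") +
  ylab("Sepal Width")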

Graph Created by Example Code

[Figure: ggplot2 example plot]

[Figure: Plotly example plot]
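The trend line in such a plot corresponds to a fitted model that can also be inspected numerically. The sketch below, using only base R, fits the same Sepal Width on Sepal Length regression, predicts the width for a hypothetical flower with a sepal length of 6.0, and checks that nudging the slope away from the fitted value increases the sum of squared errors. The `sse` helper and the 6.0 input are illustrative assumptions.

```r
# Fit the same model that geom_smooth(method = "lm") draws
fit <- lm(Sepal.Width ~ Sepal.Length, data = iris)
coefs <- unname(coef(fit))  # coefs[1] is the intercept, coefs[2] is the slope

# Predict Sepal Width for a hypothetical flower with Sepal Length 6.0
new_flower <- data.frame(Sepal.Length = 6.0)
pred <- unname(predict(fit, newdata = new_flower))
print(pred)

# Sum of squared errors for a given intercept and slope
sse <- function(intercept, slope) {
  sum((iris$Sepal.Width - (intercept + slope * iris$Sepal.Length))^2)
}

sse_fitted  <- sse(coefs[1], coefs[2])
sse_shifted <- sse(coefs[1], coefs[2] + 0.01)  # slightly different slope
print(c(sse_fitted, sse_shifted))
```

The fitted line should have the smaller sum of squared errors, which is what "ordinary least squares" means in practice.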

Real-World Applications of Linear Regression

  • Finance and business teams regularly use linear regression to predict the price of securities or other capital holdings, as well as key metrics like sales and shrinkage.
  • Cities use linear regression to predict things like fuel usage in the winter, or how growth will affect water usage and housing prices.
  • Medication dosages can be estimated using things like patient weight, height, and blood pressure levels as forecasting variables.