What is Simple Linear Regression?

Simple linear regression models the relationship between one dependent variable and one independent variable using a line of best fit.

To make sure the model is reliable, there are four assumptions that need to be met:

  1. There is a linear relationship between the dependent and independent variables.
  2. The residuals are independent.
  3. The residuals are normally distributed.
  4. The residuals have constant variance across all values of the independent variable.
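
As a rough illustration of how the linearity and constant-variance assumptions might be checked visually, the sketch below plots residuals against fitted values with base R graphics. It assumes the built-in trees dataset and a model of Volume on Height; the name fit is chosen here for illustration only.

```r
# Fit Volume on Height using the built-in trees dataset (assumed example data)
fit <- lm(Volume ~ Height, data = trees)

# Residuals vs fitted values: a curved pattern suggests non-linearity,
# and a funnel shape suggests non-constant variance
plot(fitted(fit), resid(fit),
     xlab = "Fitted Volume", ylab = "Residual",
     main = "Residuals vs Fitted")
abline(h = 0, lty = 2)  # reference line at zero
```

Points scattered evenly around the dashed line, with no clear pattern, are consistent with assumptions 1 and 4.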

Line of Best Fit

The equation for the line of best fit is:

\(\widehat{y} = \beta_0 + \beta_1 x\)

where

  • \(\widehat{y}\) is the predicted value of the dependent variable
  • \(\beta_0\) is the \(y\)-intercept
  • \(\beta_1\) is the slope
  • \(x\) is the independent variable

How to Find \(\beta_0\) and \(\beta_1\)

The equations for the \(y\)-intercept and the slope are:

\(\beta_0 = \bar{y} - \beta_1 \bar{x}\)

\(\beta_1 = \frac{\sum (x_i - \bar{x}) (y_i - \bar{y})}{\sum (x_i - \bar{x})^2}\)

where

  • \(\bar{y}\) is the mean of the observed values of the dependent variable
  • \(\bar{x}\) is the mean of the observed values of the independent variable
  • \(x_i\) is the \(i\)-th observed value of the independent variable
  • \(y_i\) is the \(i\)-th observed value of the dependent variable
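
The two formulas can be applied directly in R. The sketch below is a minimal example, assuming the built-in trees dataset with Height as the independent variable and Volume as the dependent variable. The variable names (x_bar, beta_1, and so on) are illustrative. The hand-computed values are compared with the coefficients returned by lm().

```r
# Independent and dependent variables from the built-in trees dataset
x <- trees$Height
y <- trees$Volume

# Sample means
x_bar <- mean(x)
y_bar <- mean(y)

# Slope: sum of cross-deviations divided by sum of squared x-deviations
beta_1 <- sum((x - x_bar) * (y - y_bar)) / sum((x - x_bar)^2)

# Intercept: forces the line through the point (x_bar, y_bar)
beta_0 <- y_bar - beta_1 * x_bar

# Compare with the estimates from lm()
fit <- lm(Volume ~ Height, data = trees)
all.equal(unname(coef(fit)), c(beta_0, beta_1))
```

If the formulas are implemented correctly, the comparison should report that the two sets of estimates agree up to floating-point tolerance.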

Q-Q Plot

Using the R dataset trees, a Q-Q plot can be used to check whether the residuals from the regression of Volume on Height are normally distributed.
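
One way such a Q-Q plot might be produced is with base R's qqnorm() and qqline(). This is a sketch that assumes the model is Volume regressed on Height, not necessarily the code used for the original plot.

```r
# Fit the model and extract its residuals
fit <- lm(Volume ~ Height, data = trees)
res <- resid(fit)

# Normal Q-Q plot: sample quantiles of the residuals against theoretical normal quantiles
qqnorm(res, main = "Normal Q-Q Plot of Residuals")

# Reference line through the first and third quartiles
qqline(res, col = "blue")
```

Points that fall close to the reference line are consistent with normally distributed residuals; systematic curvature or heavy departures in the tails suggest otherwise.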

Simple Linear Regression Plot

A simple linear regression plot of Volume against Height using the trees dataset.

Code for Simple Linear Regression Plot

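library(plotly)  # provides plot_ly(), add_lines(), layout(), and the %>% pipe

# Fit the simple linear regression of Volume on Height
best_fit <- lm(Volume ~ Height, data = trees)

# Scatter plot of the observations, with the fitted line overlaid
plot_ly(data = trees, x = ~Height, y = ~Volume, type = "scatter",
        mode = "markers", marker = list(color = "black")) %>%
  add_lines(x = ~Height, y = ~fitted(best_fit),
            line = list(color = "blue"), inherit = FALSE) %>%
  layout(title = "Volume by Height",
         xaxis = list(title = "Height in ft"),
         yaxis = list(title = "Volume in cubic ft"),
         showlegend = FALSE)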

Plot With 95% Confidence Interval Bands

A simple linear regression plot of Volume against Height with 95% confidence interval bands, using the trees dataset.

Code for Plot With 95% Confidence Interval Bands

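library(ggplot2)  # provides ggplot(), geom_point(), geom_smooth(), and labs()

# Scatter plot with the fitted line; se = TRUE shades the 95% confidence band
ggplot(data = trees, aes(x = Height, y = Volume)) +
  geom_point() +
  geom_smooth(formula = y ~ x, method = "lm", se = TRUE) +
  labs(title = "Volume by Height With 95% Bands",
       x = "Height in ft", y = "Volume in cubic ft")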
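
To see the numbers behind such a band, the limits can be computed with predict(). The sketch below assumes the same Volume on Height model; new_heights and band are illustrative names. It requests a 95% confidence interval for the mean response, which should correspond to the shaded region drawn for a linear model at the default level.

```r
# Fit the model
fit <- lm(Volume ~ Height, data = trees)

# Grid of heights spanning the observed range
new_heights <- data.frame(
  Height = seq(min(trees$Height), max(trees$Height), length.out = 50)
)

# 95% confidence interval for the mean Volume at each height;
# the result is a matrix with columns fit, lwr, and upr
band <- predict(fit, newdata = new_heights,
                interval = "confidence", level = 0.95)

# Show the first few rows alongside the heights
head(cbind(new_heights, band))
```

The band is narrowest near the mean height and widens toward the ends of the range, because the mean response is estimated less precisely away from the centre of the data.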

Reference

  • [Equations on Slides 2 and 3] LibreTexts. “5.4.1: Model and Equation for Simple Linear Regression Analysis.” Business LibreTexts, August 25, 2025.