2025-03-22

Simple Linear Regression Introduction

  • Uses a single explanatory variable and a single response variable.
  • Models the linear relationship between the explanatory and response variables.
  • The goal is to use the model to predict the response variable from the explanatory variable.
  • Can be applied in various fields such as business, psychology, and biology.
  • Examples: predicting exam scores from time spent studying, predicting sales from the amount spent on advertising.

Equation

This is the basic linear regression model: \(y = \beta_0 + \beta_1x + \varepsilon\). The fitted line used for prediction drops the error term: \(\hat{y} = \beta_0 + \beta_1x\)

Explanation of Components:

  • \(\hat{y}\) is the predicted value of the response/dependent variable.
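  • \(\beta_0\) is the y-intercept: the predicted value of y when x equals 0.
  • \(\beta_1\) is the slope of the linear regression line: the change in the predicted y for a one-unit increase in x.
  • \(x\) is the independent/explanatory variable.
  • \(\varepsilon\) is the error term: the part of an observed y that the line does not explain.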

Calculations

The linear regression line is also commonly referred to as the line of best fit because it minimizes the sum of the squared residuals. When the model includes an intercept, the residuals of this least-squares line also sum to 0.

Residual = Actual Value - Predicted Value

The slope is calculated by: \(\beta_1 = \frac{\sum(x - \bar{x})(y - \bar{y})}{\sum(x - \bar{x})^2}\)

The y-intercept is calculated by: \(\beta_0 = \bar{y} - \beta_1\bar{x}\)
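
The slope and intercept formulas can be checked by hand on a small data set. The sketch below uses hypothetical toy values for `x` and `y` (not the exam data from this document), computes the slope as the sum of cross-deviations over the sum of squared x-deviations, computes the intercept from the two means, and compares the result with R's built-in `lm()`.

```r
# Hypothetical toy data, chosen only to keep the arithmetic easy to follow
x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 5, 4, 5)

x_bar <- mean(x)  # 3
y_bar <- mean(y)  # 4

# Slope: sum of cross-deviations divided by sum of squared x-deviations
b1 <- sum((x - x_bar) * (y - y_bar)) / sum((x - x_bar)^2)

# Intercept: mean of y minus slope times mean of x,
# which makes the line pass through (x_bar, y_bar)
b0 <- y_bar - b1 * x_bar

# lm() should give the same two coefficients
fit <- lm(y ~ x)

print(c(intercept = b0, slope = b1))
print(coef(fit))
```

For these toy values the cross-deviations sum to 6 and the squared x-deviations sum to 10, so the slope is 0.6 and the intercept is 4 - 0.6 × 3 = 2.2.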

Example

This example applies simple linear regression to the scenario of hours studied predicting a student’s exam score. The first step is to set up the x and y variables.

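library(plotly)
set.seed(42)  # make the simulated data reproducible
# x: hours studied, drawn from a normal distribution (mean 8, sd 2)
x <- rnorm(20, mean = 8, sd = 2)
# y: exam score, 50 plus 5 points per hour plus noise, capped at 100
y <- pmin(50 + 5 * x + rnorm(20, mean = 0, sd = 2), 100)
df <- data.frame(x, y)
head(df)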
##           x         y
## 1 10.741917 100.00000
## 2  6.870604  80.79040
## 3  8.726257  93.28745
## 4  9.265725  98.75798
## 5  8.808537  97.83307
## 6  7.787751  88.07782

Preparing Data

Set the titles and ranges of the x and y axes for the scatterplot.

xax <- list(
  title = "Hours Studied",
  range = c(0, 14)
)
yax <- list(
  title = "Exam Score",
  range = c(40,110)
)

Scatterplot of Data
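
A scatterplot like this one can be drawn with plotly roughly as follows. This is a minimal sketch, not necessarily the code that produced the original figure: it rebuilds `df`, `xax`, and `yax` as defined earlier so that it runs without the rest of the document, and the object name `scatter` is an assumption.

```r
library(plotly)

# Rebuild the simulated data and axis settings from the earlier steps
set.seed(42)
x <- rnorm(20, mean = 8, sd = 2)
y <- pmin(50 + 5 * x + rnorm(20, mean = 0, sd = 2), 100)
df <- data.frame(x, y)
xax <- list(title = "Hours Studied", range = c(0, 14))
yax <- list(title = "Exam Score", range = c(40, 110))

# One marker per student: hours studied on x, exam score on y
scatter <- plot_ly(df, x = ~x, y = ~y, type = "scatter", mode = "markers")
scatter <- layout(scatter, xaxis = xax, yaxis = yax)
scatter
```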

Linear Regression

Fit the linear regression model, then plot the fitted line together with the data.
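
One way to carry out this step is with `lm()` for the fit and plotly's `add_markers()` and `add_lines()` for the plot. The sketch below is an assumption about how it could be done rather than the original code. The column name `y_hat` and the trace names are illustrative, and the data and axis settings are rebuilt so that the block is self-contained.

```r
library(plotly)

# Rebuild the simulated data and axis settings from the earlier steps
set.seed(42)
x <- rnorm(20, mean = 8, sd = 2)
y <- pmin(50 + 5 * x + rnorm(20, mean = 0, sd = 2), 100)
df <- data.frame(x, y)
xax <- list(title = "Hours Studied", range = c(0, 14))
yax <- list(title = "Exam Score", range = c(40, 110))

# Fit exam score as a linear function of hours studied
fit <- lm(y ~ x, data = df)
print(coef(fit))  # estimated intercept and slope

# Store the predicted score for each observation (illustrative column name)
df$y_hat <- fitted(fit)

# Observed points as markers, fitted values as a line
p <- plot_ly(df, x = ~x)
p <- add_markers(p, y = ~y, name = "Observed")
p <- add_lines(p, y = ~y_hat, name = "Fitted line")
p <- layout(p, xaxis = xax, yaxis = yax)
p
```

Because the simulated scores include random noise and are capped at 100, the estimated coefficients need not match the 50 and 5 used to generate the data.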

The Residuals

This plot shows the residuals: the difference between the actual and predicted values for each observation.
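
The residuals can be computed directly from the definition (actual value minus predicted value) and plotted against hours studied. The following is a hedged sketch under the same simulated data. The column name `residual` and the axis title "Residual" are assumptions, not taken from the original figure.

```r
library(plotly)

# Rebuild the simulated data and refit the model
set.seed(42)
x <- rnorm(20, mean = 8, sd = 2)
y <- pmin(50 + 5 * x + rnorm(20, mean = 0, sd = 2), 100)
df <- data.frame(x, y)
fit <- lm(y ~ x, data = df)

# Residual = actual value - predicted value (same values as resid(fit))
df$residual <- unname(df$y - fitted(fit))

# For a least-squares line with an intercept, this is zero
# up to floating-point rounding
print(sum(df$residual))

# Residuals against hours studied
res_plot <- plot_ly(df, x = ~x, y = ~residual, type = "scatter", mode = "markers")
res_plot <- layout(res_plot,
                   xaxis = list(title = "Hours Studied"),
                   yaxis = list(title = "Residual"))
res_plot
```

Points above zero are students who scored higher than the line predicts, and points below zero are students who scored lower.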