2026-02-08

What is Simple Linear Regression?

  • Simple Linear Regression predicts the value of a dependent variable (the response) from a single independent variable (the predictor).
  • Simple Linear Regression does not imply that one variable causes the other; it only describes a linear association between them.
  • Before using the Simple Linear Regression equation, we need to check that the relationship between the two variables is approximately linear.
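
One quick way to check for a linear relationship is to compute the Pearson correlation and look at a scatterplot. The sketch below assumes R's built-in airquality data set (the same data used in the example that follows) and skips rows with missing values; the variable name r is only illustrative.

```r
# Pearson correlation between Temp and Ozone.
# use = "complete.obs" skips rows where either value is missing.
r <- cor(airquality$Temp, airquality$Ozone, use = "complete.obs")
print(r)

# A scatterplot helps spot curvature or outliers that a single
# correlation number can hide.
plot(airquality$Temp, airquality$Ozone,
     xlab = "Temp (F)", ylab = "Ozone (ppb)")
```

A correlation close to 1 or -1 suggests a strong linear trend, but the plot is still worth checking, since a curved relationship can also produce a high correlation.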

Simple Linear Regression Example

  • For example, look at the following graph made from the data set airquality:

  • We can see that there is a positive correlation between Temp, the maximum daily temperature in degrees Fahrenheit at LaGuardia Airport, and Ozone, the mean ozone in parts per billion from 1300 to 1500 hours at Roosevelt Island.

  • While recognizing this correlation is useful, we cannot predict the Ozone given a Temp just using this graph. We need the simple linear regression equation to estimate the Ozone.

Simple Linear Regression Equation

The equation for Simple Linear Regression is

\[\text{y} = \hat{\beta}_0 + \hat{\beta}_1 \cdot \text{x}\]

  • \(\text{y}\) is the predicted value of the dependent variable.

  • \(\hat{\beta}_0\) is the estimated y-intercept.

  • \(\hat{\beta}_1\) is the estimated slope of our model.

  • \(\text{x}\) is the value of the independent variable we are using for our predictions.
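
For reference, the usual ordinary least squares estimates of the two coefficients can be written as follows. The sample notation \(x_i\), \(y_i\), \(\bar{x}\), \(\bar{y}\), and \(n\) (the observed pairs, their means, and the number of observations) is introduced here and does not appear elsewhere in these slides.

\[\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}, \qquad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \cdot \bar{x}\]

The slope is the sample covariance of x and y divided by the sample variance of x, and the intercept is chosen so that the fitted line passes through the point \((\bar{x}, \bar{y})\).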

Slide with Plotly

  • Using the airquality data set from the earlier example, we can write the linear regression equation between Temp and Ozone.

\[\text{Ozone} = \hat{\beta}_0 + \hat{\beta}_1 \cdot \text{Temp}\]
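
A minimal sketch of how the two coefficients might be estimated and used in R, assuming the built-in airquality data set. The names b0, b1, and temp_new, and the temperature of 80 degrees, are hypothetical choices made for illustration.

```r
# Fit Ozone as a linear function of Temp.
# lm() drops rows with missing values by default.
mod <- lm(Ozone ~ Temp, data = airquality)

# Extract the estimated intercept and slope.
b0 <- coef(mod)[["(Intercept)"]]
b1 <- coef(mod)[["Temp"]]

# Predict Ozone at a hypothetical temperature, once by plugging into the
# equation and once with predict(); both should agree.
temp_new <- 80
by_hand <- b0 + b1 * temp_new
by_predict <- unname(predict(mod, newdata = data.frame(Temp = temp_new)))
print(c(by_hand = by_hand, by_predict = by_predict))
```

Using predict() is usually more convenient than plugging in by hand, since it accepts a whole data frame of new Temp values at once.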

The graph in the previous slide was made using this code:

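library(plotly)

# Assumption: air_quality is airquality with incomplete rows dropped, so that
# fitted(mod) has the same length as x and y
air_quality <- na.omit(airquality[, c("Ozone", "Temp")])

mod <- lm(Ozone ~ Temp, data = air_quality)
x <- air_quality$Temp; y <- air_quality$Ozone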

xax <- list(title = "Temp (F)",
            titlefont = list(family = "Modern Computer Roman")
            )
yax <- list(title = "Ozone",
            titlefont = list(family = "Modern Computer Roman")
            )

fig <- plot_ly(x=x, y=y, type="scatter", mode="markers",name="data",
               width=800, height=430) %>%
  add_lines( x=x, y=fitted(mod), name="fitted") %>%
  layout(xaxis = xax, yaxis = yax) %>%
  layout(margin=list(
    l=150,
    r=50,
    b=20,
    t=20
    )
  )

config(fig)

ggplot2

  • To get a better understanding of the data, we can also check which month each data point comes from.
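
A graph of this kind could be sketched with ggplot2 as shown below. This is not necessarily the code behind the original slide; it assumes the built-in airquality data set with its Month column, and the names aq and p are illustrative.

```r
library(ggplot2)

# Keep only the columns we need and drop rows with missing values.
aq <- na.omit(airquality[, c("Ozone", "Temp", "Month")])

# Color each point by month and overlay the fitted regression line.
# Passing formula = y ~ x explicitly avoids the geom_smooth() message.
p <- ggplot(aq, aes(x = Temp, y = Ozone)) +
  geom_point(aes(color = factor(Month))) +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  labs(x = "Temp (F)", y = "Ozone (ppb)", color = "Month")

print(p)
```

Wrapping Month in factor() makes ggplot2 treat it as a category with one color per month, rather than as a continuous scale.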

LR model with 0.99 Confidence

  • This is the same linear regression model with a 99% confidence band added through ggplot2.

  • In this graph, the grey area around the regression line is the 99% confidence interval for the mean Ozone at each Temp. It describes the uncertainty in the fitted line, not the range of individual Ozone readings.

  • Using this, we can judge how precisely the model estimates the average Ozone at a certain Temp.
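
To make the distinction concrete, the sketch below compares a 99% confidence interval for the mean Ozone with a 99% prediction interval for a single new Ozone reading, both from predict(). It assumes the built-in airquality data set; the temperature of 80 degrees and the names new_temp, conf_int, and pred_int are hypothetical. In ggplot2, the width of the band is set with the level argument of geom_smooth().

```r
mod <- lm(Ozone ~ Temp, data = airquality)

# A hypothetical temperature at which to evaluate the model.
new_temp <- data.frame(Temp = 80)

# 99% confidence interval: uncertainty about the MEAN Ozone at this Temp.
conf_int <- predict(mod, newdata = new_temp,
                    interval = "confidence", level = 0.99)

# 99% prediction interval: range for a single NEW Ozone observation.
pred_int <- predict(mod, newdata = new_temp,
                    interval = "prediction", level = 0.99)

# Each result has columns fit, lwr, and upr.
print(conf_int)
print(pred_int)
```

The prediction interval is always wider than the confidence interval at the same level, because it includes the scatter of individual observations around the line as well as the uncertainty in the line itself.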


Summary

  • If two variables in a data set have an approximately linear relationship, you can reasonably predict one of them from the other using simple linear regression.
  • In this presentation, we took the two variables Temp and Ozone and used Simple Linear Regression to predict Ozone from Temp with the equation:

\[\text{Ozone} = \hat{\beta}_0 + \hat{\beta}_1 \cdot \text{Temp}\]