What is Simple Linear Regression?

  • Simple linear regression is a statistical method used to estimate the relationship between two quantitative variables.

  • By using simple linear regression, we can create an equation that predicts the value of the dependent variable from a value of the independent variable that we supply.

\(y=\alpha+ \beta x\), where
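\(y\) = y-coordinate (the dependent variable being predicted)
\(\alpha\) = y-intercept (the value of \(y\) when \(x = 0\))
\(\beta\) = slope (the change in \(y\) for a one-unit increase in \(x\))
\(x\) = x-coordinate (the independent variable we supply)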

Data Set

  • In this lesson, we will use a built-in R data set, “faithful”, to demonstrate simple linear regression.

  • Here, we can see that the data set provides eruption durations and waiting times for Old Faithful.

  • How can we use simple linear regression to predict the waiting times based on the eruption duration?

##   eruptions waiting
## 1     3.600      79
## 2     1.800      54
## 3     3.333      74
## 4     2.283      62
## 5     4.533      85
## 6     2.883      55
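
The preview above has the shape of `head()` output. A minimal sketch of how it could be reproduced, assuming a standard R session in which the `datasets` package is attached (the `str()` call is an extra check, not part of the original lesson):

```r
# faithful ships with R's datasets package, so nothing needs to be installed
data(faithful)

# Show the first six rows (head() defaults to n = 6)
head(faithful)

# Inspect the number of rows and the type of each column
str(faithful)
```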

Graph Eruptions vs. Waiting

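The points show a positive, roughly linear relationship: longer eruptions go with longer waiting times, and the trend looks fairly steady across the range of the data. This makes the data set a good candidate for simple linear regression.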
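
One plausible way to draw this scatter plot and to put a number on the strength of the relationship is sketched below. The names `scatter` and `r` are illustrative and not from the original lesson, and the correlation check is an addition for context.

```r
library(ggplot2)

# Scatter plot of waiting time against eruption duration (no fitted line yet)
scatter <- ggplot(faithful, aes(x = eruptions, y = waiting)) +
  geom_point() +
  labs(title = "Eruption Duration and Waiting Times",
       x = "Eruption Duration", y = "Waiting Times")
print(scatter)

# Pearson correlation: values near +1 indicate a strong positive linear trend
r <- cor(faithful$eruptions, faithful$waiting)
r
```

A correlation close to 1 supports what the plot suggests visually, namely that a straight line is a reasonable summary of the data.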

Adding the Regression Line

library(ggplot2)

# Scatter plot of the data with a fitted linear regression line on top
faithfulGraph = ggplot(faithful, aes(x = eruptions, y = waiting)) +
           geom_smooth(method = "lm", se = FALSE) +
           geom_point() +
           labs(title = "Eruption Duration and Waiting Times",
           x = "Eruption Duration", y = "Waiting Times")
  • The code stays the same, but “geom_smooth(method = "lm", se = FALSE)” is added.
  • method = "lm" fits a straight line with a linear model, and se = FALSE hides the confidence band that is otherwise drawn around the line.

Graph with Regression Line

## `geom_smooth()` using formula = 'y ~ x'

Regression Line Equation

How can we get the actual equation?

equation = lm(waiting ~ eruptions, faithful)
equation
## 
## Call:
## lm(formula = waiting ~ eruptions, data = faithful)
## 
## Coefficients:
## (Intercept)    eruptions  
##       33.47        10.73

As previously mentioned, \(y=\alpha+ \beta x\)
In the output, (Intercept) is \(\alpha\) and the eruptions coefficient is \(\beta\). Therefore, for the Old Faithful data,
\(y=33.47+ 10.73 x\)
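
As a rough sketch of where these two numbers come from, the coefficients can be pulled out of the fitted model with `coef()` and compared against the closed-form least-squares estimates. The names `fit`, `coefs`, `slope`, and `intercept` are illustrative and not from the original lesson.

```r
# Fit the same model as in the lesson
fit <- lm(waiting ~ eruptions, data = faithful)

# Named vector holding the estimated intercept and slope
coefs <- coef(fit)
round(coefs, 2)

# Closed-form least-squares estimates, for comparison:
# slope = cov(x, y) / var(x), intercept = mean(y) - slope * mean(x)
slope <- cov(faithful$eruptions, faithful$waiting) / var(faithful$eruptions)
intercept <- mean(faithful$waiting) - slope * mean(faithful$eruptions)

# Should report TRUE if both approaches agree up to numerical tolerance
all.equal(unname(coefs), c(intercept, slope))
```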

Using this equation

\(y=33.47+ 10.73 x\)

  • What if we want to know the waiting time for an eruption with a duration of 4 minutes?
  • Simply plug in 4 for \(x\)
  • \(y=33.47+ 10.73 (4)\)
  • \(y = 76.39\), so the predicted waiting time is about 76 minutes
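
The same calculation can be delegated to R with `predict()`. This is a minimal sketch; the names `fit`, `new_eruption`, and `predicted_wait` are illustrative and not from the original lesson.

```r
# Fit the same model as in the lesson
fit <- lm(waiting ~ eruptions, data = faithful)

# predict() expects a data frame whose column name matches the predictor
new_eruption <- data.frame(eruptions = 4)

# Predicted waiting time for a 4-minute eruption
predicted_wait <- predict(fit, newdata = new_eruption)
predicted_wait
```

Because `predict()` uses the unrounded coefficients, its result can differ slightly in the later decimal places from a hand calculation based on the rounded values 33.47 and 10.73.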

Interactive Regression Line

  • Now, using this interactive plot, we can look at any point in the graph and predict what the waiting time will be, without manual calculations.
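
The lesson does not say how its interactive plot was built. One possible approach, assuming the `plotly` package is installed, is to pass the ggplot object to `plotly::ggplotly()`. The names `p` and `interactive_plot` are hypothetical.

```r
library(ggplot2)

# Static plot: points plus the fitted regression line
p <- ggplot(faithful, aes(x = eruptions, y = waiting)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  labs(title = "Eruption Duration and Waiting Times",
       x = "Eruption Duration", y = "Waiting Times")

# Assumption: plotly is available; ggplotly() converts a ggplot object
# into an interactive widget with hover tooltips
if (requireNamespace("plotly", quietly = TRUE)) {
  interactive_plot <- plotly::ggplotly(p)
  interactive_plot
}
```

Hovering over the fitted line in such a widget shows the x and y values at that position, which is what makes reading off a predicted waiting time possible without manual calculation.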