What is a simple linear regression?

In simple terms, simple linear regression is a model that estimates the value of a dependent variable (plotted on the y axis) from a single independent variable (plotted on the x axis).

An important thing to note is that the model assumes the relationship between the dependent and the independent variable is linear, that is, a straight line rather than a curve such as a quadratic.

General Formula

The general formula of a simple linear regression model is

\(y = \beta_0 + \beta_1 x + \varepsilon, \qquad \varepsilon \sim \mathcal{N}(0, \sigma^{2})\)

Where:

  • \(\beta_0\) is the y intercept
  • \(\beta_1\) is the gradient (slope) of the line
  • \(\varepsilon\) is the error term, assumed to be normally distributed with mean 0 and variance \(\sigma^{2}\)
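To make the formula concrete, here is a minimal sketch that simulates data from the model and then fits it with lm. The parameter values, sample size, and variable names are assumptions chosen purely for illustration; they do not come from any dataset in these slides.

```r
# Simulate data from y = beta_0 + beta_1 * x + epsilon and refit it.
# The "true" values below are arbitrary and only for illustration.
set.seed(1)
n      <- 200
beta_0 <- 2
beta_1 <- 0.5
sigma  <- 1

x       <- runif(n, min = 0, max = 10)
epsilon <- rnorm(n, mean = 0, sd = sigma)   # epsilon ~ N(0, sigma^2)
y       <- beta_0 + beta_1 * x + epsilon

sim_mod   <- lm(y ~ x)
estimates <- coef(sim_mod)   # estimated intercept and slope
estimates
```

The estimates should land near the chosen values of 2 and 0.5, though not exactly on them, because of the random error term.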

Example #1: Orange Trees

Let us go over an example using orange trees. We will fit a linear regression model of the circumference of orange trees against their age.

Sample of data (R's built-in Orange dataset; age in days, circumference in mm):

  Tree  age circumference
1    1  118            30
2    1  484            58
3    1  664            87
4    1 1004           115
5    1 1231           120
6    1 1372           142

Therefore the formula now becomes: \(\text{circumference} = \beta_0 + \beta_1 \, \text{age} + \varepsilon\)

Formula Expanded

Using

mod <- lm(circumference ~ age, data=Orange)
coef(mod)
(Intercept)         age 
 17.3996502   0.1067703 

We can deduce that

  • \(\beta_0=\) 17.3996502

  • \(\beta_1=\) 0.1067703

Therefore the fitted formula, which predicts the average circumference and so drops the error term, becomes:

\(\widehat{\text{circumference}} = 17.3996502 + 0.1067703 \, \text{age}\)
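For reference, the least-squares estimates that lm reports can also be computed by hand, using \(\hat\beta_1 = \mathrm{cov}(x, y) / \mathrm{var}(x)\) and \(\hat\beta_0 = \bar{y} - \hat\beta_1 \bar{x}\). The sketch below checks this against lm on the Orange data; the variable names are chosen here for illustration.

```r
# Least-squares estimates computed by hand for the Orange data.
x <- Orange$age
y <- Orange$circumference

beta_1_hat <- cov(x, y) / var(x)               # slope
beta_0_hat <- mean(y) - beta_1_hat * mean(x)   # intercept

by_hand <- c(beta_0_hat, beta_1_hat)
by_lm   <- unname(coef(lm(circumference ~ age, data = Orange)))

# Compare the two up to floating-point tolerance.
all.equal(by_hand, by_lm)
```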

Resulting Graph

Linear Regression of Circumference vs Age of Orange Trees
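The fitted line can also be evaluated numerically rather than read off the graph. The sketch below uses predict on a few hypothetical ages (picked here for illustration, not taken from the data) and confirms that the result matches the intercept-plus-slope formula.

```r
mod <- lm(circumference ~ age, data = Orange)

# Hypothetical ages (in days) at which to evaluate the fitted line.
new_trees <- data.frame(age = c(500, 1000, 1500))

predicted <- predict(mod, newdata = new_trees)

# The same values computed directly from the coefficients.
manual <- coef(mod)[["(Intercept)"]] + coef(mod)[["age"]] * new_trees$age

data.frame(age = new_trees$age, predicted = predicted, manual = manual)
```

Ages far outside the observed range should be treated with caution, since the linearity assumption is only supported by the data that was actually measured.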

Example #2: Linear Regression of Weight vs Height of Women in the United States

Example code:

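library(ggplot2)

# Scatter plot of the built-in women dataset with a fitted regression line.
# geom_smooth(method = "lm") also draws a confidence band by default.
ggplot(women, aes(x = height, y = weight)) + 
  geom_point() + 
  geom_smooth(method = "lm") +
  labs(x = "Height (inches)", y = "Weight (pounds)")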

Resulting Graph

Linear Regression of Weight vs Height of Women in the United States
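The linearity assumption mentioned at the start can be checked informally by plotting residuals against fitted values. This is a minimal sketch using base R graphics on the same women dataset; the object names are assumptions made for illustration.

```r
women_mod <- lm(weight ~ height, data = women)

res <- resid(women_mod)    # observed weight minus fitted weight
fit <- fitted(women_mod)   # weights predicted by the line

# If the relationship is linear, the points should scatter around zero
# with no systematic curve.
plot(fit, res, xlab = "Fitted weight (pounds)", ylab = "Residual (pounds)")
abline(h = 0, lty = 2)
```

Base R's plot(women_mod, which = 1) draws a similar diagnostic with a smoothed trend line added.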

Example #3: Penguins!

You can also create interactive graphs showcasing the original data alongside the linear regression model. In the following slide you can find example code that uses plotly to draw the linear regression of flipper length vs body mass of penguins near Palmer Station, Antarctica.

Example Code

  library(plotly)

  # The penguins dataset with these column names ships with R >= 4.5.0.
  # Remove rows with NA
  penguins_na_removed <- 
    na.omit(penguins[, c("body_mass", "flipper_len")])

  # Fit flipper length as a linear function of body mass
  mod <- lm(flipper_len ~ body_mass, data = penguins_na_removed)
  
  x <- penguins_na_removed$body_mass
  y <- penguins_na_removed$flipper_len
  
  xax <- list(title = "Body Mass (g)")
  yax <- list(title = "Flipper Length (mm)")
  
  # Scatter plot of the data with the fitted line overlaid
  fig <- plot_ly(x = x, y = y, type = "scatter", 
                 mode = "markers", name = "data") %>%
         add_lines(x = x, y = fitted(mod), name = "fitted") %>%
         layout(xaxis = xax, yaxis = yax, margin = list(t = 40))
  
  fig

Resulting Graph

Linear regression of flipper length vs body mass of penguins near Palmer Station, Antarctica