2026-04-12

Simple Linear Regression

Simple linear regression is a model that describes the relationship between two variables by fitting a straight line to the data. This line, often called the “line of best fit,” is usually found by least squares: it is the line that minimizes the sum of the squared errors (residuals) between the observed values and the values the line predicts.
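To make “minimizes the errors” concrete, here is a small R sketch using the `trees` data introduced below. It computes the sum of squared residuals (SSE) for the `lm()` fit and for two lines with a slightly different slope. The helper `sse()` and the 0.5 slope nudge are illustrative choices, not part of the original analysis.

```r
data(trees)
fit <- lm(Volume ~ Girth, data = trees)

# Hypothetical helper: sum of squared residuals for any intercept b0 and slope b1
sse <- function(b0, b1, x = trees$Girth, y = trees$Volume) {
  sum((y - (b0 + b1 * x))^2)
}

b <- coef(fit)                       # least squares intercept and slope
sse_fit  <- sse(b[1], b[2])
sse_up   <- sse(b[1], b[2] + 0.5)    # steeper line
sse_down <- sse(b[1], b[2] - 0.5)    # flatter line

# The least squares line has the smallest SSE of the three
c(fit = sse_fit, slope_up = sse_up, slope_down = sse_down)
```

Any other intercept/slope pair gives a larger SSE than the `lm()` coefficients; the two nudged lines are just a quick check of that property.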

Formula for Simple Linear Regression

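\(y = \beta_0 + \beta_1 x + \epsilon\)

  • \(y\): response (dependent) variable

  • \(x\): predictor (independent) variable

  • \(\beta_0\): intercept, the expected value of \(y\) when \(x = 0\)

  • \(\beta_1\): slope, the expected change in \(y\) for a one-unit increase in \(x\)

  • \(\epsilon\): random error term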
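For simple linear regression the least squares estimates have a standard closed form, \(\hat{\beta}_1 = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2}\) and \(\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}\). The sketch below computes them by hand for the `trees` data (Girth as \(x\), Volume as \(y\)) and compares them with `lm()`; the variable names are illustrative.

```r
data(trees)
x <- trees$Girth
y <- trees$Volume

# Closed-form least squares estimates
beta1_hat <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
beta0_hat <- mean(y) - beta1_hat * mean(x)

c(beta0 = beta0_hat, beta1 = beta1_hat)
coef(lm(y ~ x))   # lm() should report the same two numbers
```

Equivalently, \(\hat{\beta}_1\) is `cov(x, y) / var(x)`.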

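Dataset: trees

To illustrate simple linear regression, we will use R's built-in ‘trees’ dataset, which contains measurements of 31 felled black cherry trees:

  • Girth: tree diameter in inches (despite the name)
  • Height: tree height (ft)
  • Volume: volume of timber (cu ft)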
data(trees)
head(trees)
  Girth Height Volume
1   8.3     70   10.3
2   8.6     65   10.3
3   8.8     63   10.2
4  10.5     72   16.4
5  10.7     81   18.8
6  10.8     83   19.7
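Before fitting a line, a quick look at the structure of the data and at the strength of the linear association can help. This is a minimal exploratory sketch using only base R; `cor()` gives the Pearson correlation between Girth and Volume.

```r
data(trees)
str(trees)                          # a data frame with 31 rows and 3 numeric columns
summary(trees)                      # ranges and quartiles of each variable
cor(trees$Girth, trees$Volume)      # strength of the linear relationship
```

A correlation close to 1 suggests that a straight line should describe Volume as a function of Girth fairly well.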

Scatter Plot With Linear Regression Line

Analyzing the Results

\(\hat{y} = -36.94 + 5.07x\)

Here \(\hat{\beta}_0 \approx -36.94\) and \(\hat{\beta}_1 \approx 5.07\): each additional inch of girth is associated with an estimated increase of about 5.07 cubic feet in volume, on average.
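One way to see what the slope means is to compare predictions for two trees whose girths differ by one inch. This sketch fits the model with `lm()` and uses `predict()`; the girths 12 and 13 inches are arbitrary values inside the observed range, chosen only for illustration.

```r
data(trees)
fit <- lm(Volume ~ Girth, data = trees)
round(coef(fit), 2)                 # intercept and slope, as in the fitted equation

# Predicted volume for two hypothetical trees one inch apart in girth
pred <- predict(fit, newdata = data.frame(Girth = c(12, 13)))
diff(pred)                          # difference in predictions equals the slope
```

Because the model is linear, this difference is the same for any pair of girths one inch apart.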

Residual Plot

The residuals look roughly random, with no strong pattern around zero, which suggests that a straight line is a reasonable model for these data.
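A residual plot like this one can be drawn in base R by plotting the residuals against the fitted values. This is a sketch of one common way to do it, not necessarily the code used for the slide; the axis labels are assumptions.

```r
data(trees)
fit <- lm(Volume ~ Girth, data = trees)

# Residuals vs fitted values; a patternless band around 0 supports a linear fit
plot(fitted(fit), resid(fit),
     xlab = "Fitted volume (cu ft)", ylab = "Residual (cu ft)",
     main = "Residuals vs Fitted")
abline(h = 0, lty = 2)              # reference line at zero
```

`plot(fit, which = 1)` produces a similar built-in diagnostic plot with a smoothed trend line added.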

Plotly Plot

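library(plotly)  # provides plot_ly(), add_lines(), layout() and re-exports %>%

# Fit the simple linear regression of Volume on Girth
mod <- lm(Volume ~ Girth, data = trees)
x <- trees$Girth
y <- trees$Volume

# Axis settings; the y-axis is fixed to 0-80 cu ft
xax <- list(title = "Girth (in)")
yax <- list(title = "Volume (cu ft)", range = c(0, 80))

# Scatter of the data plus the fitted line
# (trees is already sorted by Girth, so the line is drawn left to right)
plot_ly(x = x, y = y, type = "scatter", mode = "markers", name = "data",
        width = 800, height = 430) %>%
  add_lines(x = x, y = fitted(mod), name = "fitted") %>%
  layout(xaxis = xax, yaxis = yax)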

3D Plot

library(plotly)

# 3D scatter of Girth, Height and Volume, colored by Volume
plot_ly(data = trees,
        x = ~Girth,
        y = ~Height,
        z = ~Volume,
        color = ~Volume,
        type = "scatter3d",
        mode = "markers")
# Note: this plot renders when run in the console, but the slide is left blank when the document is knit.

Code used for the Original Plot

library(ggplot2)

# Scatter plot of Girth vs Volume with the least squares line (no confidence band)
ggplot(trees, aes(x = Girth, y = Volume)) +
  geom_point(size = 3) +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE, color = "purple") +
  labs(title = "Tree Girth vs Volume",
       x = "Girth (in)",
       y = "Volume (cu ft)") +
  theme_minimal(base_size = 18)

Takeaway

Simple linear regression is a powerful tool for assessing the relationship between two variables in a dataset. It also gives a simple way to predict the value of one variable from another, as long as the relationship is roughly linear and predictions stay within the range of the observed data.
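As an example of using the fitted model for prediction, this sketch estimates the volume of three hypothetical trees with `predict()` and asks for 95% prediction intervals, which describe the uncertainty for an individual tree. The girths 10, 15 and 20 inches are made-up inputs chosen to fall inside the observed range (8.3 to 20.6 in).

```r
data(trees)
fit <- lm(Volume ~ Girth, data = trees)

# Hypothetical new trees (girth in inches)
new_trees <- data.frame(Girth = c(10, 15, 20))

# Point predictions with 95% prediction intervals (columns fit, lwr, upr)
pred_int <- predict(fit, newdata = new_trees, interval = "prediction", level = 0.95)
pred_int
```

Using `interval = "confidence"` instead gives the narrower interval for the mean volume at each girth.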