2026-03-07

Overview

  • Simple linear regression shows us the relationship between one explanatory variable and one response variable.
  • In this example:
    • \(x =\) Girth
    • \(y =\) Volume
  • Goal: Use tree trunk girth to help predict tree volume.

What is Simple Linear Regression

  • Simple linear regression is used to predict the value of one variable (the response) from the value of another variable (the predictor).

Source: IBM, “What is linear regression?”

  • The horizontal axis is the predictor \(x\), and the vertical axis is the response \(y\).
  • Several possible lines, such as L1, L2, and L3, can be drawn through the data.
  • For each line, the vertical distance from a point to the line is known as the error or residual.
  • Linear regression chooses the line that makes these prediction errors as small as possible overall.

Scatterplot of the Data

Regression Equation

\[ \hat{y} = a + bx \]

\[ \widehat{\text{Volume}} = a + b(\text{Girth}) \]

  • \(a\) is the y-intercept.
  • \(b\) is the slope.
  • A positive slope means larger girth tends to predict larger volume.
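
As a minimal sketch of how \(a\) and \(b\) can be obtained in R, the block below fits the model with lm() on the built-in trees data and pulls out the two coefficients. The variable names a, b, and coefs are chosen here for illustration only.

```r
# trees ships with base R (datasets package):
# Girth in inches, Volume in cubic feet
mod <- lm(Volume ~ Girth, data = trees)

coefs <- coef(mod)
a <- coefs[["(Intercept)"]]  # y-intercept of the fitted line
b <- coefs[["Girth"]]        # slope of the fitted line

# Print the fitted equation with rounded coefficients
cat("Predicted Volume =", round(a, 2), "+", round(b, 2), "* Girth\n")
```

A positive value of b here corresponds to the bullet above: larger girth predicts larger volume.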

How the Best Line Is Chosen

\[ e = y - \hat{y} \]

\[ e^2 = (y - \hat{y})^2 \]

\[ \sum_{i=1}^{n}(y_i - \hat{y}_i)^2 \]

  • Many possible lines can be drawn through the data, such as L1, L2, and L3.
  • The residual is the actual value minus the predicted value.
  • Each residual is squared so that no error counts as negative and larger errors count more.
  • The least squares regression line is the line with the smallest sum of squared residuals.
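
The idea in the bullets above can be sketched in R by computing the sum of squared residuals for a few candidate lines and comparing them with the least squares line. The helper sse() and the two guessed lines are hypothetical and chosen only for illustration; they are not the L1, L2, L3 lines from the figure.

```r
# Sum of squared residuals for the line y-hat = a + b * x
sse <- function(a, b, x, y) {
  y_hat <- a + b * x
  sum((y - y_hat)^2)
}

x <- trees$Girth
y <- trees$Volume

# Two arbitrary candidate lines (hypothetical values)
sse_guess1 <- sse(a = -30, b = 4.5, x, y)
sse_guess2 <- sse(a = -40, b = 5.5, x, y)

# The least squares line found by lm()
fit <- lm(Volume ~ Girth, data = trees)
sse_ls <- sse(coef(fit)[["(Intercept)"]], coef(fit)[["Girth"]], x, y)

# The least squares value should be the smallest of the three
c(guess1 = sse_guess1, guess2 = sse_guess2, least_squares = sse_ls)
```

Any other intercept and slope plugged into sse() should give a value at least as large as the least squares result.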

ggplot with Fitted Regression Line

Interactive Plotly Version

Important Math and Interpretation

\[ b = \text{change in predicted } y \text{ for a 1-unit increase in } x \]

\[ R^2 = \text{proportion of variation in } y \text{ explained by the regression model} \]

  • The slope describes how predicted volume changes as girth increases.
  • \(R^2\) tells us how much of the variation in volume is explained by girth.
  • Extrapolation is the use of known data to make predictions outside the observed range.
  • These predictions should be treated with caution, because the pattern seen in the data may not hold outside the observed range.

Source: Eurostat Glossary: Extrapolation
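
A minimal sketch of these quantities in R, assuming the same trees data: it reads the slope and \(R^2\) from a fitted model and flags whether a prediction falls outside the observed range of girth. The two girth values used for prediction are hypothetical.

```r
fit <- lm(Volume ~ Girth, data = trees)

# Slope: change in predicted Volume for a 1-unit increase in Girth
slope <- coef(fit)[["Girth"]]

# R^2: proportion of variation in Volume explained by the model
r2 <- summary(fit)$r.squared

# Observed range of the predictor
girth_range <- range(trees$Girth)

# One girth inside the observed range and one outside it (hypothetical)
new_girths <- data.frame(Girth = c(12, 30))
new_girths$predicted_volume <- predict(fit, newdata = new_girths)

# TRUE marks a prediction made by extrapolation
new_girths$extrapolated <- new_girths$Girth < girth_range[1] |
  new_girths$Girth > girth_range[2]

new_girths
```

The model will return a number for any girth, so a check like the extrapolated column is one way to keep track of which predictions deserve extra caution.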

Residual Plot
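
One way a residual plot of this kind could be drawn is sketched below, assuming ggplot2 is installed. The names res_df and p, and the axis labels, are illustrative choices rather than the code behind the original figure.

```r
library(ggplot2)

fit <- lm(Volume ~ Girth, data = trees)

# Pair each predicted value with its residual (actual - predicted)
res_df <- data.frame(
  fitted   = fitted(fit),
  residual = resid(fit)
)

# Residuals against predicted values, with a reference line at zero
p <- ggplot(res_df, aes(x = fitted, y = residual)) +
  geom_hline(yintercept = 0, linetype = "dashed") +
  geom_point(size = 2) +
  labs(x = "Predicted volume", y = "Residual (actual - predicted)") +
  theme_minimal()

p
```

Points scattered evenly around the zero line suggest a line is a reasonable fit, while a curved pattern suggests it is not.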

R Code Used to Build a Plot

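library(ggplot2)

# Fit the least squares line: Volume as a function of Girth
mod <- lm(Volume ~ Girth, data = trees)

# Scatterplot with the fitted regression line (no confidence band)
ggplot(trees, aes(x = Girth, y = Volume)) +
  geom_point(size = 2) +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  theme_minimal()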

Summary / Takeaways

  • The relationship between girth and volume appears to be linear and positive.
  • A least squares regression line predicts volume from girth.
  • A residual is the actual value minus the predicted value.
  • Simple linear regression finds the line that minimizes \(\sum_{i=1}^{n}(y_i - \hat{y}_i)^2\).
  • Scatterplots, fitted lines, residual plots, and \(R^2\) help us understand the linear regression model.