2026-06-07

What is Simple Linear Regression?

Simple Linear Regression models the linear relationship between:

  • A response variable \(Y\) - what we want to predict
  • A predictor variable \(X\) - what we use to predict it

The examples use the trees dataset built into R, which records three measurements for each of 31 felled black cherry trees:

  • Girth - diameter of the tree (inches)
  • Height - height of the tree (feet)
  • Volume - volume of timber (cubic feet)

The Model

\[Y = \beta_0 + \beta_1 x + \varepsilon\]

where:

  • \(\beta_0\) is the intercept
  • \(\beta_1\) is the slope
  • \(\varepsilon \sim \mathcal{N}(0, \sigma^2)\) is the error term

Least Squares Estimates

The goal is to find \(\hat{\beta}_0\) and \(\hat{\beta}_1\) that minimize the sum of squared errors, where \(\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i\) is the fitted value for observation \(i\):

\[SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2\]

Minimizing SSE gives closed-form estimates, where \(\bar{x}\) and \(\bar{y}\) are the sample means:

\[\hat{\beta}_1 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n}(x_i - \bar{x})^2}\]

\[\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}\]
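
The two formulas above can be checked directly on the trees data. This is a minimal sketch in base R; the variable names x, y, b0 and b1 are chosen here for illustration and are not part of the original slides.

```r
# Manual least squares estimates for Volume ~ Girth on the built-in trees data
x <- trees$Girth
y <- trees$Volume

# Slope: sum of cross-deviations divided by sum of squared x-deviations
b1 <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)

# Intercept: makes the fitted line pass through (mean(x), mean(y))
b0 <- mean(y) - b1 * mean(x)

c(intercept = b0, slope = b1)

# Cross-check against R's built-in fit
all.equal(unname(coef(lm(Volume ~ Girth, data = trees))), c(b0, b1))
```

The values should agree with the lm() estimates reported in the summary output (-36.9435 and 5.0659).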

The Data

head(trees)
##   Girth Height Volume
## 1   8.3     70   10.3
## 2   8.6     65   10.3
## 3   8.8     63   10.2
## 4  10.5     72   16.4
## 5  10.7     81   18.8
## 6  10.8     83   19.7

Scatter Plot: Girth vs Volume
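
A scatter plot like this one could be drawn with base R graphics. The sketch below is an assumption about how the figure might be produced; the axis labels, title and point style are illustrative choices.

```r
# Scatter plot of Volume against Girth using base R graphics
plot(trees$Girth, trees$Volume,
     xlab = "Girth (inches)",
     ylab = "Volume (cubic feet)",
     main = "Girth vs Volume",
     pch = 19)
```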

Regression Line
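
One way to overlay the fitted line on the scatter plot is abline(), which accepts a simple linear model object. This is a sketch under assumptions: the name fit and the colour and line width are illustrative and not taken from the slides.

```r
# Fit the simple linear regression of Volume on Girth
fit <- lm(Volume ~ Girth, data = trees)

# Scatter plot of the data
plot(Volume ~ Girth, data = trees,
     xlab = "Girth (inches)",
     ylab = "Volume (cubic feet)",
     pch = 19)

# Add the least squares line using the fitted intercept and slope
abline(fit, col = "red", lwd = 2)
```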

R Code for the Model

mod <- lm(Volume ~ Girth, data = trees)
summary(mod)
## 
## Call:
## lm(formula = Volume ~ Girth, data = trees)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -8.065 -3.107  0.152  3.495  9.587 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -36.9435     3.3651  -10.98 7.62e-12 ***
## Girth         5.0659     0.2474   20.48  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.252 on 29 degrees of freedom
## Multiple R-squared:  0.9353, Adjusted R-squared:  0.9331 
## F-statistic: 419.4 on 1 and 29 DF,  p-value: < 2.2e-16
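
Several numbers in this summary can be recomputed from the residuals, which may help in reading the output. The sketch below assumes the same model as above; the names sse, sst, rse, r2 and t_slope are introduced here for illustration.

```r
mod <- lm(Volume ~ Girth, data = trees)

res <- residuals(mod)
n   <- nrow(trees)

# Sum of squared errors and total sum of squares
sse <- sum(res^2)
sst <- sum((trees$Volume - mean(trees$Volume))^2)

# Residual standard error: two estimated coefficients leave n - 2 = 29 df
rse <- sqrt(sse / (n - 2))

# R-squared: share of the variation in Volume explained by Girth
r2 <- 1 - sse / sst

# t value for the slope: estimate divided by its standard error
t_slope <- coef(mod)[["Girth"]] / sqrt(diag(vcov(mod)))[["Girth"]]

c(rse = rse, r2 = r2, t_slope = t_slope)
```

These should reproduce the residual standard error (4.252), Multiple R-squared (0.9353) and slope t value (20.48) shown above. With a single predictor, the F-statistic is the square of the slope's t value.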

3D Plot: Volume ~ Girth + Height
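
A 3D view of Volume against Girth and Height can be sketched with base R's persp() and trans3d(). This is only one possible approach and not necessarily how the original figure was made: the fitted plane, the name fit2, the grid size and the viewing angles are all assumptions.

```r
# Two-predictor model matching the plot title
fit2 <- lm(Volume ~ Girth + Height, data = trees)

# Grid over the observed ranges of both predictors
g <- seq(min(trees$Girth),  max(trees$Girth),  length.out = 25)
h <- seq(min(trees$Height), max(trees$Height), length.out = 25)

# Predicted Volume at every (Girth, Height) grid point
z <- outer(g, h, function(gi, hi) {
  predict(fit2, newdata = data.frame(Girth = gi, Height = hi))
})

# Draw the fitted plane; persp() returns the viewing transformation matrix
pm <- persp(g, h, z, theta = 35, phi = 20,
            xlab = "Girth", ylab = "Height", zlab = "Volume",
            ticktype = "detailed")

# Overlay the observed trees, projected into the same 3D view
points(trans3d(trees$Girth, trees$Height, trees$Volume, pmat = pm),
       pch = 19, col = "red")
```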

Interpretation

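From the regression analysis:

  • There is a strong positive linear relationship between Girth and Volume: Girth alone explains about 93.5% of the variation in Volume (\(R^2 = 0.9353\)).
  • The fitted line, Volume = -36.94 + 5.07 × Girth, gives a way to predict Volume from Girth.
  • Each additional inch of Girth is associated with an estimated increase of about 5.07 cubic feet in Volume.
  • The 3D plot suggests that Volume is related to both Girth and Height.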
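
Predicting Volume from Girth can be illustrated with predict(). In this minimal sketch the 12-inch tree is a hypothetical example, chosen to lie within the observed range of Girth, and the names new_tree and pred are illustrative.

```r
mod <- lm(Volume ~ Girth, data = trees)

# Hypothetical new tree with a Girth of 12 inches
new_tree <- data.frame(Girth = 12)

# Point prediction with a 95% prediction interval for a single new tree
pred <- predict(mod, newdata = new_tree, interval = "prediction", level = 0.95)
pred
```

The fit column is the point prediction, roughly 23.8 cubic feet here (-36.9435 + 5.0659 × 12), and lwr and upr bound the interval. Predictions for Girth values far outside the observed data should be treated with caution.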