2026-04-12

What is Simple Linear Regression?

Simple linear regression is a statistical technique that models the relationship between one explanatory variable and one response variable.

It describes the relationship using a straight line:

\[ Y = \beta_0 + \beta_1 X + \varepsilon \]

where \(\beta_0\) is the intercept, \(\beta_1\) is the slope, and \(\varepsilon\) is the error term.
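To make the roles of the three terms concrete, here is a minimal R sketch that simulates data from this model and then fits a line to it. The parameter values (intercept 2, slope 0.5, error standard deviation 1), the sample size, and the seed are arbitrary choices for illustration, not values from these slides.

```r
# Simulate from Y = beta0 + beta1 * X + epsilon.
# All numeric values below are illustrative assumptions.
set.seed(1)
n     <- 200
beta0 <- 2      # assumed intercept
beta1 <- 0.5    # assumed slope

x   <- runif(n, min = 0, max = 10)   # explanatory variable
eps <- rnorm(n, mean = 0, sd = 1)    # random error term
y   <- beta0 + beta1 * x + eps       # response variable

# Fit a straight line to the simulated data
sim_fit <- lm(y ~ x)
coef(sim_fit)   # estimates should land near the assumed 2 and 0.5
```

The estimates will not equal the assumed values exactly, because the error term adds noise to every observation.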

Slope and Intercept

Within this regression:

\[ Y = \beta_0 + \beta_1 X + \varepsilon \]

  • \(\beta_0\) is the intercept: the predicted value of \(Y\) when \(X = 0\).
  • \(\beta_1\) is the slope: the change in the predicted value of \(Y\) when \(X\) increases by one unit.
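These two interpretations can be checked numerically. The sketch below uses the tree model from these slides (volume explained by girth). The girth values 10 and 11 are arbitrary picks for illustration, and the variable names are hypothetical.

```r
# Fit the tree model and pull out the two coefficients
mod <- lm(Volume ~ Girth, data = trees)
b0  <- coef(mod)[["(Intercept)"]]
b1  <- coef(mod)[["Girth"]]

# Intercept: the prediction when X = 0
pred_at_0 <- b0 + b1 * 0

# Slope: the change in the prediction for a one-unit increase in X
pred_10     <- b0 + b1 * 10   # illustrative girth value
pred_11     <- b0 + b1 * 11   # one unit larger
slope_check <- pred_11 - pred_10

c(intercept = pred_at_0, slope = slope_check)
```

In this dataset the estimated intercept is negative, which no real tree volume can be. A girth of 0 lies far outside the observed girths, so the intercept here is better read as the anchor of the line than as a meaningful prediction.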

Scatterplot of Tree Data

We use R's built-in trees dataset, with tree girth as the explanatory variable and tree volume as the response variable.
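A scatterplot like the one on this slide could be drawn with base R graphics, as sketched below. The axis labels, title, and point style are assumptions. The units come from the dataset's documentation (girth in inches, volume in cubic feet).

```r
# Scatterplot of volume against girth using base R graphics
plot(Volume ~ Girth, data = trees,
     xlab = "Girth (inches)",
     ylab = "Volume (cubic feet)",
     main = "Tree volume vs. girth",
     pch  = 19)   # solid points
```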

Fitted Regression Line

The linear regression line summarizes the overall relationship between tree girth and tree volume.
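One way to overlay the fitted line on the scatterplot is with `abline()`, which accepts a fitted `lm` object. This is a sketch in base R graphics. The colors and labels are arbitrary choices.

```r
# Fit the model, then draw the data and the fitted line together
mod <- lm(Volume ~ Girth, data = trees)

plot(Volume ~ Girth, data = trees,
     xlab = "Girth (inches)",
     ylab = "Volume (cubic feet)",
     pch  = 19)
abline(mod, col = "blue", lwd = 2)   # line with the estimated intercept and slope
```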

Least Squares Idea

The best-fitting regression line is the one that minimizes the sum of squared residuals:

\[ SSE = \sum_{i=1}^{n}(y_i - \hat{y}_i)^2 \]

A residual is the difference between an observed value and the value the line predicts for it:

\[ e_i = y_i - \hat{y}_i \]
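For a single explanatory variable, the minimizing slope and intercept have a well-known closed form, which these slides do not show. The sketch below computes them by hand for the tree data, builds the residuals and the SSE from the definitions above, and compares the result with `lm()`. The variable names are hypothetical.

```r
x <- trees$Girth
y <- trees$Volume

# Closed-form least squares estimates for one explanatory variable
b1 <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
b0 <- mean(y) - b1 * mean(x)

y_hat <- b0 + b1 * x   # fitted values
e     <- y - y_hat     # residuals e_i = y_i - y_hat_i
sse   <- sum(e^2)      # sum of squared residuals

# The hand-computed values should agree with lm()
mod <- lm(Volume ~ Girth, data = trees)
all.equal(unname(coef(mod)), c(b0, b1))
all.equal(sse, sum(resid(mod)^2))
```

Any other choice of intercept and slope gives a larger SSE on these data, which is what "best-fitting" means here.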

R Code for Fitting the Model

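# Fit volume as a linear function of girth using the built-in trees data
mod <- lm(Volume ~ Girth, data = trees)
# Print coefficient estimates, standard errors, and fit statistics
summary(mod)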
## 
## Call:
## lm(formula = Volume ~ Girth, data = trees)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -8.065 -3.107  0.152  3.495  9.587 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -36.9435     3.3651  -10.98 7.62e-12 ***
## Girth         5.0659     0.2474   20.48  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.252 on 29 degrees of freedom
## Multiple R-squared:  0.9353, Adjusted R-squared:  0.9331 
## F-statistic: 419.4 on 1 and 29 DF,  p-value: < 2.2e-16
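The numbers in this printout can also be pulled out programmatically, which is handy when they are needed in later calculations. The sketch below uses standard accessors on the fitted model and its summary. The 95% level for the confidence intervals is an assumed choice.

```r
mod <- lm(Volume ~ Girth, data = trees)
s   <- summary(mod)

coef(s)                      # matrix of estimates, std. errors, t values, p-values
s$r.squared                  # multiple R-squared
s$sigma                      # residual standard error
confint(mod, level = 0.95)   # confidence intervals for intercept and slope
```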

Plotly plot: Miles per Gallon vs. Weight

This plot shows the relationship between car weight and miles per gallon in the ‘mtcars’ dataset.
Marker color indicates the number of cylinders, and marker size indicates horsepower.
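A plot with this encoding could be built roughly as follows. This is a sketch that assumes the plotly package is installed. The code behind the original plot is not shown in these slides, so the title and axis labels here are guesses.

```r
# Interactive scatterplot: weight vs. mpg, colored by cylinders, sized by horsepower.
# Runs only if the plotly package is available.
if (requireNamespace("plotly", quietly = TRUE)) {
  library(plotly)

  fig <- plot_ly(
    data  = mtcars,
    x     = ~wt,            # weight, in 1000 lbs
    y     = ~mpg,           # miles per gallon
    color = ~factor(cyl),   # number of cylinders as a discrete color
    size  = ~hp,            # horsepower mapped to marker size
    type  = "scatter",
    mode  = "markers"
  )

  fig <- layout(
    fig,
    title = "Miles per Gallon vs. Weight",
    xaxis = list(title = "Weight (1000 lbs)"),
    yaxis = list(title = "Miles per gallon")
  )
}
```

Typing `fig` at the console in an interactive session displays the plot.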

Interpretation of the Model

The estimated regression line indicates a positive linear association between tree girth and volume.

  • Trees with larger girths tend to have larger volumes.
  • The slope is positive (about 5.07), so the expected volume increases by roughly 5.07 cubic feet for each additional inch of girth.
  • The regression line does not pass through every point exactly, because real data include random variation around the line.
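The interpretation above can be turned into predictions for new trees. In the sketch below, the girths of 10 and 15 inches are hypothetical inputs and the 95% level is an assumed choice. `predict()` with `interval = "prediction"` also reports a range that reflects the random scatter around the line.

```r
mod <- lm(Volume ~ Girth, data = trees)

# Hypothetical new trees, chosen only for illustration
new_trees <- data.frame(Girth = c(10, 15))

# Point predictions plus 95% prediction intervals
pred <- predict(mod, newdata = new_trees,
                interval = "prediction", level = 0.95)
pred   # columns: fit (predicted volume), lwr and upr (interval bounds)
```

With the coefficients reported in the model summary, the predicted volume at a girth of 10 is about -36.94 + 5.07 × 10 ≈ 13.7 cubic feet.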