2026-03-05

What is a Linear Relationship

A linear relationship means that as one variable increases, the other variable changes at a constant rate.

Examples: Height vs Weight, Miles Driven vs Gas Used, Tree Girth vs Tree Volume
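
To make "constant rate" concrete, here is a minimal sketch using a made-up line, y = 2 + 3x (the numbers are illustrative only, not from any dataset). Each one-unit step in x changes y by the same amount.

```r
# Hypothetical line y = 2 + 3x, chosen only for illustration
x <- 0:5
y <- 2 + 3 * x

# Change in y for each one-unit step in x
steps <- diff(y)
print(steps)
```

Every entry of steps is the same, which is what distinguishes a linear relationship from a curved one.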

Scatterplot in R with Trees Data

We can use the built-in trees dataset, with measurements of 31 trees.

Using a scatterplot, we can visualize relationships like Volume vs Girth and Volume vs Height.

Volume vs Girth

plot(trees$Girth, trees$Volume, xlab = "Girth", ylab = "Volume", 
     main = "Trees: Volume vs Girth")

Volume vs Height

plot(trees$Height, trees$Volume, xlab = "Height", ylab = "Volume", 
     main = "Trees: Volume vs Height")
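
As a supplementary check that is not part of the original slides, the strength of each linear relationship can be summarized with the Pearson correlation. This sketch assumes only base R and the built-in trees data.

```r
# Pearson correlation between Volume and each candidate predictor
r_girth  <- cor(trees$Girth,  trees$Volume)
r_height <- cor(trees$Height, trees$Volume)

# Values closer to 1 indicate a tighter upward linear pattern
print(round(c(Girth = r_girth, Height = r_height), 3))
```

Girth shows the stronger correlation with Volume, which matches the tighter point cloud in the Volume vs Girth scatterplot.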

What is Linear Regression

Linear regression finds the straight line that best predicts \(Y\) from \(X\).

Regression Model

The simple linear regression equation is

\[Y = \beta_0 + \beta_1 X + \epsilon\]

where \(Y\) = dependent variable

\(X\) = independent variable

\(\beta_0\) = intercept

\(\beta_1\) = slope

\(\epsilon\) = random error
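
One way to estimate this model in R is lm(). The sketch below assumes Volume as \(Y\) and Girth as \(X\) from the trees data. That pairing is an illustrative choice, not something the slides prescribe.

```r
# Fit Volume = beta_0 + beta_1 * Girth + error by least squares
fit <- lm(Volume ~ Girth, data = trees)

# Estimated intercept (beta_0) and slope (beta_1)
b <- coef(fit)
print(b)
```

The first element of b estimates \(\beta_0\) and the second estimates \(\beta_1\). summary(fit) gives standard errors and further detail.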

Interpreting the Equation

The slope coefficient \(\beta_1\) tells us how much \(Y\) changes, on average, when \(X\) increases by one unit.

If \(\beta_1 > 0\), then \(Y\) increases as \(X\) increases.

If \(\beta_1 < 0\), then \(Y\) decreases as \(X\) increases.
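
The slope can be read as the change in predicted \(Y\) for a one-unit step in \(X\). As a rough illustration, this sketch compares predictions at two girth values one unit apart. The model and the values 10 and 11 are assumptions chosen for the example.

```r
# Illustrative model: Volume predicted from Girth (trees data)
fit <- lm(Volume ~ Girth, data = trees)

# Predictions at two girth values one unit apart (arbitrary choices)
new_trees <- data.frame(Girth = c(10, 11))
pred <- predict(fit, newdata = new_trees)

# The difference between the predictions equals the estimated slope
slope <- coef(fit)[["Girth"]]
print(unname(diff(pred)))
print(slope)
```

Because the estimated slope is positive here, predicted Volume rises as Girth rises.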

ggplot Regression Line
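
The plot from this slide is not available in the text, so the following is only a sketch of how such a figure might be drawn. It assumes the ggplot2 package is installed and reuses the Volume vs Girth pairing from the earlier scatterplot.

```r
library(ggplot2)  # assumed to be installed; not part of base R

# Scatterplot of the trees data with a fitted least squares line
p <- ggplot(trees, aes(x = Girth, y = Volume)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  labs(title = "Trees: Volume vs Girth with Regression Line")

print(p)
```

Setting se = FALSE hides the confidence band. Dropping that argument shows it.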

Least Squares Method

The regression line is found by minimizing the sum of squared residuals.

\[ \min_{\beta_0,\beta_1} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 \]

The residual for each point is the observed value minus the fitted value \(\hat{y}_i = \beta_0 + \beta_1 x_i\):

\[ e_i = y_i - \hat{y}_i \]
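
For simple linear regression this minimization has a well-known closed-form solution: the slope is the sum of cross-deviations divided by the sum of squared deviations of \(X\), and the intercept follows from the means. These formulas are standard results rather than something stated on the slides. The sketch below applies them to the trees data and compares the result with lm().

```r
x <- trees$Girth
y <- trees$Volume

# Closed-form least squares estimates
b1 <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
b0 <- mean(y) - b1 * mean(x)

# Fitted values, residuals, and the minimized sum of squared residuals
y_hat <- b0 + b1 * x
e     <- y - y_hat
sse   <- sum(e^2)

# lm() should give the same coefficients
fit <- lm(Volume ~ Girth, data = trees)
print(c(b0 = b0, b1 = b1))
print(coef(fit))
```

The residuals e sum to zero up to rounding error, which holds for any least squares fit that includes an intercept.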

Application

Linear regression is widely used in biology. Example: relationship between Petal Length and Sepal Length.
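
The slide does not name a dataset. As an assumption, this sketch uses R's built-in iris data, which contains Petal.Length and Sepal.Length columns. Treating Sepal.Length as the response is likewise an arbitrary choice for illustration.

```r
# Assumed dataset: built-in iris (not named in the original slide)
fit_iris <- lm(Sepal.Length ~ Petal.Length, data = iris)
print(coef(fit_iris))

# Scatterplot with the fitted regression line overlaid
plot(iris$Petal.Length, iris$Sepal.Length,
     xlab = "Petal Length", ylab = "Sepal Length",
     main = "Iris: Sepal Length vs Petal Length")
abline(fit_iris)
```

The fitted slope is positive, so flowers with longer petals tend to have longer sepals in this data.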