2026-04-15

Introduction

This presentation applies simple linear regression to a small sample data set from biology.

The goal is to understand whether leaf length can help predict plant height.

This type of analysis is commonly used in biology to study relationships between measurable traits.

Sample Data

leaf_length <- c(4.1, 4.5, 5.0, 5.2, 5.8, 6.0, 6.3, 6.8, 7.1, 7.5)
plant_height <- c(10.2, 11.0, 12.5, 12.9, 13.8, 14.4, 15.1, 16.0, 16.8, 17.5)
plants <- data.frame(leaf_length, plant_height)
plants
##    leaf_length plant_height
## 1          4.1         10.2
## 2          4.5         11.0
## 3          5.0         12.5
## 4          5.2         12.9
## 5          5.8         13.8
## 6          6.0         14.4
## 7          6.3         15.1
## 8          6.8         16.0
## 9          7.1         16.8
## 10         7.5         17.5

Scatterplot of Leaf Length vs Plant Height

library(ggplot2)

ggplot(plants, aes(x = leaf_length, y = plant_height)) +
  geom_point() +
  labs(
    title = "Leaf Length vs. Plant Height",
    x = "Leaf Length (cm)",
    y = "Plant Height (cm)"
  )

Regression Line

ggplot(plants, aes(x = leaf_length, y = plant_height)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +
  labs(
    title = "Regression Line for Plant Height",
    x = "Leaf Length (cm)",
    y = "Plant Height (cm)"
  )
## `geom_smooth()` using formula = 'y ~ x'

Regression Equation

\[ y = \beta_0 + \beta_1 x + \epsilon \]

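In this example, \(y\) is plant height and \(x\) is leaf length. \(\beta_0\) is the intercept, \(\beta_1\) is the slope (the expected change in plant height per one-centimeter increase in leaf length), and \(\epsilon\) is the random error term.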
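
As a minimal sketch of how this equation could be estimated from the sample data, the code below fits the model with base R's `lm()` and reads off the estimated intercept and slope. The object name `fit` is an assumption made for illustration; it does not appear in the original slides.

```r
# Sample data from the "Sample Data" slide
leaf_length <- c(4.1, 4.5, 5.0, 5.2, 5.8, 6.0, 6.3, 6.8, 7.1, 7.5)
plant_height <- c(10.2, 11.0, 12.5, 12.9, 13.8, 14.4, 15.1, 16.0, 16.8, 17.5)
plants <- data.frame(leaf_length, plant_height)

# Fit plant_height = beta_0 + beta_1 * leaf_length + error by least squares
# (`fit` is an illustrative name, not from the original slides)
fit <- lm(plant_height ~ leaf_length, data = plants)

# Named vector: "(Intercept)" estimates beta_0, "leaf_length" estimates beta_1
coef(fit)
```

For this sample the estimated slope comes out at roughly 2.14 cm of plant height per centimeter of leaf length, with an intercept of about 1.57 cm.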

Slope Formula

\[ \hat{\beta}_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} \]

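This formula gives the least-squares estimate of the slope: the sum of the cross-products of the deviations of \(x\) and \(y\) from their means, divided by the sum of the squared deviations of \(x\) from its mean.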
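
The slope formula can be checked by hand against `lm()`. The sketch below is one way to do that; the names `sxy`, `sxx`, `slope_by_hand`, and `slope_lm` are illustrative assumptions rather than part of the original slides.

```r
# Sample data from the "Sample Data" slide (x = leaf length, y = plant height)
x <- c(4.1, 4.5, 5.0, 5.2, 5.8, 6.0, 6.3, 6.8, 7.1, 7.5)
y <- c(10.2, 11.0, 12.5, 12.9, 13.8, 14.4, 15.1, 16.0, 16.8, 17.5)

# Numerator: sum of cross-products of deviations from the means
sxy <- sum((x - mean(x)) * (y - mean(y)))

# Denominator: sum of squared deviations of x from its mean
sxx <- sum((x - mean(x))^2)

# Slope estimate computed directly from the formula
slope_by_hand <- sxy / sxx

# Slope reported by lm(), for comparison
slope_lm <- unname(coef(lm(y ~ x))["x"])

# TRUE when the two agree up to floating-point tolerance
all.equal(slope_by_hand, slope_lm)
```

Both routes should give the same value because `lm()` computes the ordinary least-squares fit, and the formula above is the closed-form least-squares slope for a single predictor.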

Interactive Plot

library(plotly)

# Interactive scatterplot: hover over a point to see its values
plot_ly(
  plants,
  x = ~leaf_length,
  y = ~plant_height,
  type = "scatter",
  mode = "markers"
)

Example R Code

ggplot(plants, aes(x = leaf_length, y = plant_height)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE)

Conclusion

Simple linear regression models the linear relationship between two quantitative variables.

In this biology example, longer leaves are associated with taller plants, which indicates a positive relationship between the two variables in this sample.
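
To put a number on that positive relationship, and to illustrate the prediction goal from the introduction, the following sketch computes the correlation and R-squared for the sample and then predicts the height of a plant with a hypothetical leaf length of 6.5 cm. The value 6.5 and the names `fit`, `r`, `r_squared`, `new_plant`, and `predicted_height` are assumptions chosen for illustration.

```r
# Sample data from the "Sample Data" slide
leaf_length <- c(4.1, 4.5, 5.0, 5.2, 5.8, 6.0, 6.3, 6.8, 7.1, 7.5)
plant_height <- c(10.2, 11.0, 12.5, 12.9, 13.8, 14.4, 15.1, 16.0, 16.8, 17.5)
plants <- data.frame(leaf_length, plant_height)

fit <- lm(plant_height ~ leaf_length, data = plants)

# Pearson correlation: sign gives the direction, magnitude the strength
r <- cor(plants$leaf_length, plants$plant_height)

# Proportion of variance in plant height explained by leaf length
r_squared <- summary(fit)$r.squared

# Predicted height for a hypothetical leaf length of 6.5 cm
# (6.5 lies inside the observed range of 4.1 to 7.5 cm)
new_plant <- data.frame(leaf_length = 6.5)
predicted_height <- predict(fit, newdata = new_plant)

r
r_squared
predicted_height
```

Predictions are most defensible for leaf lengths inside the observed range; with only ten sample points, extrapolating far beyond it would rest on the untested assumption that the relationship stays linear.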