Introduction

This presentation walks through a simple linear regression example using the trees dataset in R.

The main goal is to examine how tree Volume changes as tree Girth increases.

I will be using:

  • ggplot2 for static visualizations
  • plotly for an interactive plot
  • LaTeX math for the regression formulas

The Data

  Girth Height Volume       fit     resid
1   8.3     70   10.3  5.103149 5.1968508
2   8.6     65   10.3  6.622906 3.6770939
3   8.8     63   10.2  7.636077 2.5639226
4  10.5     72   16.4 16.248033 0.1519667
5  10.7     81   18.8 17.261205 1.5387954
6  10.8     83   19.7 17.767790 1.9322098
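The fit and resid columns are not part of the built-in trees data. A minimal sketch of how a table like this could be produced, assuming the two columns are the fitted values and residuals from regressing Volume on Girth (the object name `model` is my own choice for illustration):

```r
# Load the built-in trees data (31 felled black cherry trees)
data(trees)

# Fit Volume on Girth; `model` is a name chosen for this sketch
model <- lm(Volume ~ Girth, data = trees)

# Store fitted values and residuals alongside the raw measurements
trees$fit   <- fitted(model)
trees$resid <- resid(model)

# Show the first six rows
head(trees)
```

Under that assumption the first rows reproduce the fit and resid values shown above.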

Regression Model

I am using the standard simple linear regression model: \[ Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i,\quad i = 1,2,\dots,n. \]

Where:

- \(Y_i\) is tree Volume
- \(X_i\) is tree Girth
- \(\beta_0, \beta_1\) are the regression parameters
- \(\varepsilon_i\) is the random error term

A common assumption is: \[ \varepsilon_i \sim \text{Normal}(0,\sigma^2) \text{ and independent}. \]

Estimation & Interpretation

The fitted regression line is: \[ \hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_i. \]

The least squares method chooses \(\hat{\beta}_0\) and \(\hat{\beta}_1\) to minimize: \[ \text{SSE} = \sum_{i=1}^n (Y_i - \hat{Y}_i)^2. \]
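For simple linear regression the minimizer has a closed form: \(\hat{\beta}_1 = \sum (X_i - \bar{X})(Y_i - \bar{Y}) / \sum (X_i - \bar{X})^2\) and \(\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X}\). As a rough check, the sketch below computes these by hand and compares them with lm(); the variable names x, y, b0, b1, and model are chosen only for this illustration.

```r
data(trees)
x <- trees$Girth
y <- trees$Volume

# Closed-form least squares estimates
b1 <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
b0 <- mean(y) - b1 * mean(x)

# The same estimates from lm()
model <- lm(Volume ~ Girth, data = trees)

# TRUE when the hand-computed and lm() coefficients agree
all.equal(unname(coef(model)), c(b0, b1))
```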

The slope \(\hat{\beta}_1\) estimates the average change in Volume (in cubic feet) for each 1-inch increase in Girth.

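The coefficient of determination, the share of the variation in Volume explained by Girth, is: \[ R^2 = 1 - \frac{\text{SSE}}{\text{SST}}, \quad \text{SST} = \sum_{i=1}^n (Y_i - \bar{Y})^2. \]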
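A short sketch of this calculation, taking SST to be the total sum of squares of Volume around its mean, and checking the result against the value reported by summary(). The names sse, sst, r2, and model are my own.

```r
data(trees)
model <- lm(Volume ~ Girth, data = trees)

# SSE: variation left unexplained by the fitted line
sse <- sum(resid(model)^2)

# SST: total variation of Volume around its mean
sst <- sum((trees$Volume - mean(trees$Volume))^2)

r2 <- 1 - sse / sst

# TRUE when the hand-computed value matches summary()
all.equal(r2, summary(model)$r.squared)
```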

Scatterplot

The scatterplot shows a clear upward trend: trees with larger girth usually have larger volume.
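The code for this slide is not shown, so the following is only a sketch of how such a scatterplot might be drawn with ggplot2. The styling, the axis labels, and the object name p_scatter are assumptions, borrowed loosely from the residual plot code later in the deck.

```r
library(ggplot2)
data(trees)

# Volume against Girth, one point per tree (styling is assumed)
p_scatter <- ggplot(trees, aes(x = Girth, y = Volume)) +
  geom_point(size = 2, color = "darkgreen") +
  labs(
    title = "Volume vs Girth",
    x = "Girth (in)",
    y = "Volume (cubic ft)"
  ) +
  theme_minimal()

p_scatter
```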

Regression Line

The red line shows the fitted trend, and the shaded region gives the confidence band.
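One way a plot like this could be produced is with geom_smooth(), which fits the line and draws a confidence band around it (95% by default). This is a sketch, not necessarily the code behind the slide; the object name p_line and the point styling are assumptions.

```r
library(ggplot2)
data(trees)

# Scatterplot with a least squares line and its confidence band
p_line <- ggplot(trees, aes(x = Girth, y = Volume)) +
  geom_point(size = 2, color = "darkgreen") +
  geom_smooth(method = "lm", formula = y ~ x, se = TRUE, color = "red") +
  labs(x = "Girth (in)", y = "Volume (cubic ft)") +
  theme_minimal()

p_line
```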

Residual Plot (R Code)

Below is the code used to generate the residual plot:

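library(ggplot2)

# fit and resid are the fitted values and residuals of Volume ~ Girth (see The Data)
model <- lm(Volume ~ Girth, data = trees)
trees$fit   <- fitted(model)
trees$resid <- resid(model)

# Residuals against fitted values, with a dashed reference line at zero
ggplot(trees, aes(x = fit, y = resid)) +
  geom_point(size = 2, color = "darkgreen") +
  geom_hline(yintercept = 0, linetype = "dashed") +
  labs(
    title = "Residuals vs Fitted",
    x = "Fitted Volume",
    y = "Residual"
  ) +
  theme_minimal()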

Plot on next slide.

Residual Plot

The residuals show no strong curvature or other systematic pattern, which supports using a linear model.

Interactive Plot

You can hover over points, zoom, and pan.
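The code for the interactive plot is not shown. A minimal sketch of one common approach, assuming the plot is a ggplot2 scatterplot converted with plotly's ggplotly(); the object names p and p_int are hypothetical.

```r
library(ggplot2)
library(plotly)
data(trees)

# Build a static scatterplot first
p <- ggplot(trees, aes(x = Girth, y = Volume)) +
  geom_point(color = "darkgreen") +
  theme_minimal()

# Convert it to an interactive widget with hover, zoom, and pan
p_int <- ggplotly(p)

p_int
```

Calling plot_ly() directly would be an alternative; converting an existing ggplot2 object simply keeps the styling consistent with the static slides.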

Summary

  • Modeled Volume as a function of Girth
  • Regression line captured the upward trend well
  • Residuals looked reasonable
  • Used ggplot2 for clean static plots
  • Used plotly for interactive visualization
  • Used LaTeX to show the regression formulas

This presentation meets all requirements for the assignment.