Slide 1: Title

Welcome: a compact overview of simple linear regression, visual examples, and code.

Slide 2: Learning goals

  • Understand the model \(Y = \beta_0 + \beta_1 X + \varepsilon\)
  • Fit the model by least squares and interpret the slope and intercept
  • Visualize the fit with ggplot2 (2 plots) and plotly (3D)
  • Show the R code used to create the figures

Slide 3: Regression Model (Math)

The simple linear regression model is:

\[ Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i \]

where:

  • \(\beta_0\) is the intercept
  • \(\beta_1\) is the slope
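  • \(\varepsilon_i\) are independent, identically distributed random errors with mean 0 and variance \(\sigma^2\):

\[ \varepsilon_i \overset{\text{iid}}{\sim} (0, \sigma^2) \]

The t-based confidence interval on Slide 7 additionally assumes normal errors, \(\varepsilon_i \sim N(0, \sigma^2)\).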
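Least squares has a closed-form solution: \(\hat{\beta}_1 = \sum (X_i - \bar{X})(Y_i - \bar{Y}) / \sum (X_i - \bar{X})^2\) and \(\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X}\). A minimal base-R sketch, assuming the mtcars data from Slide 4 (the names `x`, `y`, `b0`, `b1` are our own), computes both estimates directly and compares them with lm():

```r
# Closed-form least-squares estimates for mpg ~ wt (base R only)
x <- mtcars$wt   # predictor: weight (1000 lbs)
y <- mtcars$mpg  # response: miles per gallon

# Slope: centered cross-products divided by the centered sum of squares of x
b1 <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
# Intercept: the fitted line passes through the point (mean(x), mean(y))
b0 <- mean(y) - b1 * mean(x)

c(intercept = b0, slope = b1)

# Compare with R's built-in least-squares fit
coef(lm(mpg ~ wt, data = mtcars))
```

Both lines should print the same estimates that appear in the fit summary on Slide 9 (intercept 37.2851, slope -5.3445).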

Slide 4: Example dataset

We use R's built-in mtcars dataset (32 cars).
We predict miles per gallon (mpg) from weight (wt, in 1000 lbs); horsepower (hp) appears as an additional variable in the 3D view on Slide 8.

# Print the first six rows of mtcars
head(mtcars)
##                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
## Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1

Slide 5: Scatter Plot with Regression Line (ggplot)
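The code for this figure is not shown in the deck. A hypothetical reconstruction with ggplot2 could look like the sketch below; the object name `p_scatter`, the title, and the axis labels are our own assumptions, not the original chunk.

```r
library(ggplot2)

# Scatter plot of mpg against weight with the least-squares line
p_scatter <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  # method = "lm" fits y ~ x (here mpg ~ wt); se = TRUE adds a 95% confidence band
  geom_smooth(method = "lm", formula = y ~ x, se = TRUE) +
  labs(
    title = "MPG vs Weight",
    x = "Weight (1000 lbs)",
    y = "Miles per gallon"
  )

p_scatter
```

geom_smooth() uses a 95% level for the band by default; pass `level =` to change it.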

Slide 6: Residual Diagnostics (ggplot)
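A residuals-vs-fitted plot is the usual first diagnostic for the linearity and constant-variance assumptions. The original chunk is not shown; this is a minimal sketch under that assumption, and the names `fit`, `diag_df`, and `p_resid` are hypothetical.

```r
library(ggplot2)

# Fit the model and collect fitted values and residuals in a data frame
fit <- lm(mpg ~ wt, data = mtcars)
diag_df <- data.frame(fitted = fitted(fit), resid = resid(fit))

# Residuals vs fitted: look for curvature or a funnel shape
p_resid <- ggplot(diag_df, aes(x = fitted, y = resid)) +
  geom_point() +
  geom_hline(yintercept = 0, linetype = "dashed") +
  # a loess smoother makes any systematic trend easier to spot
  geom_smooth(method = "loess", formula = y ~ x, se = FALSE) +
  labs(title = "Residuals vs Fitted", x = "Fitted MPG", y = "Residual")

p_resid
```

Points scattered evenly around the dashed zero line, with no curve or widening spread, are consistent with the model assumptions.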

Slide 7: Confidence Interval (Math)

A \((1-\alpha)\times 100\%\) confidence interval for the slope \(\beta_1\) is:

\[ \hat{\beta}_1 \pm t_{n-2,\,1-\alpha/2} \cdot \text{SE}(\hat{\beta}_1) \]

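where

\[ \text{SE}(\hat{\beta}_1) = \sqrt{ \frac{\hat{\sigma}^2}{\sum_{i=1}^{n} (X_i - \bar{X})^2} }, \qquad \hat{\sigma}^2 = \frac{1}{n-2} \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2 \]

and \(t_{n-2,\,1-\alpha/2}\) is the \(1-\alpha/2\) quantile of the t distribution with \(n-2\) degrees of freedom.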
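Plugging the mtcars fit into these formulas by hand should reproduce the slope interval that confint() reports on Slide 11 (about -6.486 to -4.203). A minimal base-R sketch; the variable names are our own:

```r
fit <- lm(mpg ~ wt, data = mtcars)
x <- mtcars$wt
n <- nrow(mtcars)

# Residual variance estimate with n - 2 degrees of freedom
sigma2_hat <- sum(resid(fit)^2) / (n - 2)
# Standard error of the slope, as in the formula above
se_b1 <- sqrt(sigma2_hat / sum((x - mean(x))^2))

# Critical value for a 95% interval
alpha <- 0.05
t_crit <- qt(1 - alpha / 2, df = n - 2)
b1 <- coef(fit)[["wt"]]

ci_manual <- c(lower = b1 - t_crit * se_b1, upper = b1 + t_crit * se_b1)
ci_manual

# R's built-in interval for comparison
confint(fit, "wt", level = 1 - alpha)
```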

Slide 8: 3D Interactive Visualization (plotly)

Rotate and zoom the plot to explore the relationship between MPG, weight, and horsepower.

Slide 8 (continued): 3D Plotly Code

The R code below creates the interactive 3D plot on Slide 8. It is shown for transparency and reproducibility.

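library(plotly)  # plotly also re-exports the %>% pipe

# Interactive 3D scatter: weight (x), horsepower (y), MPG (z)
plot_ly(
  data = mtcars,
  x = ~wt,
  y = ~hp,
  z = ~mpg,
  type = "scatter3d",
  mode = "markers",
  marker = list(size = 4)
) %>%
  # Title and axis labels for the 3D scene
  layout(
    title = "MPG vs Weight and Horsepower",
    scene = list(
      xaxis = list(title = "Weight (1000 lbs)"),
      yaxis = list(title = "Horsepower"),
      zaxis = list(title = "MPG")
    )
  )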

Slide 9: Fit summary

The model of mpg on weight is fitted by least squares with lm(mpg ~ wt, data = mtcars), and summary() of the fit gives:
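The chunk that produced the output below is not shown; judging from its `Call:` line, a minimal sketch that should reproduce it is (the object name `fit` is our choice):

```r
# Fit mpg on weight by ordinary least squares
fit <- lm(mpg ~ wt, data = mtcars)

# Coefficient table, residual standard error, R-squared, and overall F-test
summary(fit)
```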

## 
## Call:
## lm(formula = mpg ~ wt, data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.5432 -2.3647 -0.1252  1.4096  6.8727 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  37.2851     1.8776  19.858  < 2e-16 ***
## wt           -5.3445     0.5591  -9.559 1.29e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.046 on 30 degrees of freedom
## Multiple R-squared:  0.7528, Adjusted R-squared:  0.7446 
## F-statistic: 91.38 on 1 and 30 DF,  p-value: 1.294e-10

Slide 10: Interpretation

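  • The estimated slope is -5.34: each additional 1,000 lbs of weight is associated with about 5.3 fewer miles per gallon on average, and weight explains about 75% of the variation in MPG (R-squared = 0.753)

  • The residuals show no strong pattern, consistent with the linearity and constant-variance assumptions

  • The 95% confidence interval for the slope, about (-6.49, -4.20), quantifies the uncertainty in the estimate and excludes zero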
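To make the slope interpretation concrete, the sketch below predicts MPG for two hypothetical cars weighing 3,000 and 4,000 lbs (the weights and the names `new_cars` and `pred` are illustrative assumptions). `interval = "confidence"` gives an interval for the mean MPG at each weight; `interval = "prediction"` would give a wider interval for a single new car.

```r
fit <- lm(mpg ~ wt, data = mtcars)

# Two hypothetical cars: 3000 lbs and 4000 lbs (wt is in 1000 lbs)
new_cars <- data.frame(wt = c(3, 4))

# Point predictions with 95% confidence intervals for the mean MPG
pred <- predict(fit, newdata = new_cars, interval = "confidence", level = 0.95)
pred

# One extra unit of wt changes the prediction by exactly the slope estimate
pred[2, "fit"] - pred[1, "fit"]
```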

Slide 11: Model Output and 95% Confidence Intervals

##                 2.5 %    97.5 %
## (Intercept) 33.450500 41.119753
## wt          -6.486308 -4.202635

Slide 12: Key Takeaways

  • Simple linear regression models a linear relationship between one predictor and a response

  • Visualization, especially residual plots, helps check model assumptions

  • ggplot2 and plotly produce high-quality graphics

  • R Markdown enables fully reproducible presentations