BS2004 Demo, multiple explanatory variables (built-in + simulated data)

Author

Eamonn Mallon

Published

January 21, 2026

Code
# ============================================================
# Setup
# ============================================================
# install.packages(c("tidyverse","ggfortify","broom","kableExtra","car","patchwork"))

library(tidyverse)
library(ggfortify)
library(broom)
library(kableExtra)
library(car)
library(patchwork)

theme_set(theme_minimal(base_size = 12))
options(max.print = 75)
set.seed(1)

1 Aim of this demo

This demo uses the same techniques as the assessed practical, but applies them to different data so you can practice safely.

2 Part A: Built-in dataset (mtcars)

2.1 Question

Response variable: - mpg (fuel efficiency)

Explanatory variables: - wt (weight) - hp (horsepower)

2.2 Data inspection

Code
data("mtcars")
d <- mtcars |> as_tibble() |> select(mpg, wt, hp)

glimpse(d)
Rows: 32
Columns: 3
$ mpg <dbl> 21.0, 21.0, 22.8, 21.4, 18.7, 18.1, 14.3, 24.4, 22.8, 19.2, 17.8, …
$ wt  <dbl> 2.620, 2.875, 2.320, 3.215, 3.440, 3.460, 3.570, 3.190, 3.150, 3.4…
$ hp  <dbl> 110, 110, 93, 110, 175, 105, 245, 62, 95, 123, 123, 180, 180, 180,…
Code
summary(d)
      mpg              wt              hp       
 Min.   :10.40   Min.   :1.513   Min.   : 52.0  
 1st Qu.:15.43   1st Qu.:2.581   1st Qu.: 96.5  
 Median :19.20   Median :3.325   Median :123.0  
 Mean   :20.09   Mean   :3.217   Mean   :146.7  
 3rd Qu.:22.80   3rd Qu.:3.610   3rd Qu.:180.0  
 Max.   :33.90   Max.   :5.424   Max.   :335.0  

2.3 Scatterplots

Code
p_wt <- ggplot(d, aes(x = wt, y = mpg)) +
  geom_point() +
  labs(x = "Weight (1000 lbs)", y = "Fuel efficiency (mpg)")

p_hp <- ggplot(d, aes(x = hp, y = mpg)) +
  geom_point() +
  labs(x = "Horsepower", y = "Fuel efficiency (mpg)")

p_wt + p_hp

2.4 Single-predictor models

Code
m_wt <- lm(mpg ~ wt, data = d)
m_hp <- lm(mpg ~ hp, data = d)

summary(m_wt)

Call:
lm(formula = mpg ~ wt, data = d)

Residuals:
    Min      1Q  Median      3Q     Max 
-4.5432 -2.3647 -0.1252  1.4096  6.8727 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  37.2851     1.8776  19.858  < 2e-16 ***
wt           -5.3445     0.5591  -9.559 1.29e-10 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 3.046 on 30 degrees of freedom
Multiple R-squared:  0.7528,    Adjusted R-squared:  0.7446 
F-statistic: 91.38 on 1 and 30 DF,  p-value: 1.294e-10
Code
summary(m_hp)

Call:
lm(formula = mpg ~ hp, data = d)

Residuals:
    Min      1Q  Median      3Q     Max 
-5.7121 -2.1122 -0.8854  1.5819  8.2360 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 30.09886    1.63392  18.421  < 2e-16 ***
hp          -0.06823    0.01012  -6.742 1.79e-07 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 3.863 on 30 degrees of freedom
Multiple R-squared:  0.6024,    Adjusted R-squared:  0.5892 
F-statistic: 45.46 on 1 and 30 DF,  p-value: 1.788e-07

2.5 Multiple regression

Code
m_both <- lm(mpg ~ wt + hp, data = d)
summary(m_both)

Call:
lm(formula = mpg ~ wt + hp, data = d)

Residuals:
   Min     1Q Median     3Q    Max 
-3.941 -1.600 -0.182  1.050  5.854 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 37.22727    1.59879  23.285  < 2e-16 ***
wt          -3.87783    0.63273  -6.129 1.12e-06 ***
hp          -0.03177    0.00903  -3.519  0.00145 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2.593 on 29 degrees of freedom
Multiple R-squared:  0.8268,    Adjusted R-squared:  0.8148 
F-statistic: 69.21 on 2 and 29 DF,  p-value: 9.109e-12

2.5.1 Type II sums of squares (adjusted)

Code
Anova(m_both)
Anova Table (Type II tests)

Response: mpg
           Sum Sq Df F value   Pr(>F)    
wt        252.627  1  37.561 1.12e-06 ***
hp         83.274  1  12.381 0.001451 ** 
Residuals 195.048 29                     
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

2.5.2 Type I sums of squares (sequential)

Code
anova(m_both)
Analysis of Variance Table

Response: mpg
          Df Sum Sq Mean Sq F value    Pr(>F)    
wt         1 847.73  847.73 126.041 4.488e-12 ***
hp         1  83.27   83.27  12.381  0.001451 ** 
Residuals 29 195.05    6.73                      
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Code
m_both_swapped <- lm(mpg ~ hp + wt, data = d)
anova(m_both_swapped)
Analysis of Variance Table

Response: mpg
          Df Sum Sq Mean Sq F value    Pr(>F)    
hp         1 678.37  678.37 100.862 5.987e-11 ***
wt         1 252.63  252.63  37.561 1.120e-06 ***
Residuals 29 195.05    6.73                      
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

2.6 Diagnostics

Code
autoplot(m_both)

2.7 Coefficients

Code
tidy(m_both) |>
  mutate(across(where(is.numeric), ~ round(.x, 4))) |>
  kable(caption = "Coefficients for mpg ~ wt + hp") |>
  kable_styling(full_width = FALSE)
Coefficients for mpg ~ wt + hp
term estimate std.error statistic p.value
(Intercept) 37.2273 1.5988 23.2847 0.0000
wt -3.8778 0.6327 -6.1287 0.0000
hp -0.0318 0.0090 -3.5187 0.0015

3 Part B: Simulated correlated predictors

Code
n <- 120
X1 <- rnorm(n)
X2 <- 0.75 * X1 + rnorm(n, 0, 0.7)
Y  <- 3 + 2 * X1 - 1.2 * X2 + rnorm(n)

sim <- tibble(Y = Y, X1 = X1, X2 = X2)

cor(sim$X1, sim$X2)
[1] 0.6801491

3.1 Plots

Code
ggplot(sim, aes(x = X1, y = Y)) +
  geom_point()

Code
ggplot(sim, aes(x = X2, y = Y)) +
  geom_point()

Code
ggplot(sim, aes(x = X1, y = X2)) +
  geom_point()

3.2 Models

Code
m12 <- lm(Y ~ X1 + X2, data = sim)
summary(m12)

Call:
lm(formula = Y ~ X1 + X2, data = sim)

Residuals:
     Min       1Q   Median       3Q      Max 
-2.50970 -0.57540  0.02118  0.67015  2.47001 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  3.07160    0.08955   34.30   <2e-16 ***
X1           2.26970    0.13779   16.47   <2e-16 ***
X2          -1.37668    0.12304  -11.19   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.9713 on 117 degrees of freedom
Multiple R-squared:  0.6987,    Adjusted R-squared:  0.6936 
F-statistic: 135.7 on 2 and 117 DF,  p-value: < 2.2e-16
Code
Anova(m12)
Anova Table (Type II tests)

Response: Y
          Sum Sq  Df F value    Pr(>F)    
X1        255.99   1  271.35 < 2.2e-16 ***
X2        118.11   1  125.19 < 2.2e-16 ***
Residuals 110.38 117                      
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Code
anova(m12)
Analysis of Variance Table

Response: Y
           Df Sum Sq Mean Sq F value    Pr(>F)    
X1          1 137.89 137.887  146.16 < 2.2e-16 ***
X2          1 118.11 118.106  125.19 < 2.2e-16 ***
Residuals 117 110.38   0.943                      
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Code
m21 <- lm(Y ~ X2 + X1, data = sim)
anova(m21)
Analysis of Variance Table

Response: Y
           Df Sum Sq Mean Sq  F value Pr(>F)    
X2          1   0.00   0.000   0.0004 0.9837    
X1          1 255.99 255.993 271.3450 <2e-16 ***
Residuals 117 110.38   0.943                    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

3.3 Diagnostics

Code
autoplot(m12)

4 Reproducibility

Code
sessionInfo()
R version 4.5.0 (2025-04-11)
Platform: x86_64-apple-darwin20
Running under: macOS Sequoia 15.6.1

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.5-x86_64/Resources/lib/libRblas.0.dylib 
LAPACK: /Library/Frameworks/R.framework/Versions/4.5-x86_64/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.1

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: Europe/London
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] patchwork_1.3.1  car_3.1-3        carData_3.0-5    kableExtra_1.4.0
 [5] broom_1.0.8      ggfortify_0.4.18 lubridate_1.9.4  forcats_1.0.0   
 [9] stringr_1.5.2    dplyr_1.1.4      purrr_1.1.0      readr_2.1.5     
[13] tidyr_1.3.1      tibble_3.3.0     ggplot2_4.0.0    tidyverse_2.0.0 

loaded via a namespace (and not attached):
 [1] generics_0.1.4     xml2_1.4.0         stringi_1.8.7      hms_1.1.3         
 [5] digest_0.6.37      magrittr_2.0.4     evaluate_1.0.4     grid_4.5.0        
 [9] timechange_0.3.0   RColorBrewer_1.1-3 fastmap_1.2.0      jsonlite_2.0.0    
[13] backports_1.5.0    Formula_1.2-5      gridExtra_2.3      viridisLite_0.4.2 
[17] scales_1.4.0       textshaping_1.0.1  abind_1.4-8        cli_3.6.5         
[21] rlang_1.1.6        withr_3.0.2        yaml_2.3.10        tools_4.5.0       
[25] tzdb_0.5.0         vctrs_0.6.5        R6_2.6.1           lifecycle_1.0.4   
[29] htmlwidgets_1.6.4  pkgconfig_2.0.3    pillar_1.11.1      gtable_0.3.6      
[33] glue_1.8.0         systemfonts_1.3.1  xfun_0.53          tidyselect_1.2.1  
[37] rstudioapi_0.17.1  knitr_1.50         farver_2.1.2       htmltools_0.5.8.1 
[41] labeling_0.4.3     rmarkdown_2.29     svglite_2.2.2      compiler_4.5.0    
[45] S7_0.2.0