This document responds to a challenge from John Toczek to see ‘how high’ I could push the \(R^2\) of a model for mpg fit to the mtcars dataset.

Challenge accepted.

library(dplyr); library(magrittr); library(ggplot2); library(knitr); library(broom) 
data(mtcars)

head(mtcars) %>% kable()
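| | mpg | cyl | disp | hp | drat | wt | qsec | vs | am | gear | carb |
|:--|--:|--:|--:|--:|--:|--:|--:|--:|--:|--:|--:|
| Mazda RX4 | 21.0 | 6 | 160 | 110 | 3.90 | 2.620 | 16.46 | 0 | 1 | 4 | 4 |
| Mazda RX4 Wag | 21.0 | 6 | 160 | 110 | 3.90 | 2.875 | 17.02 | 0 | 1 | 4 | 4 |
| Datsun 710 | 22.8 | 4 | 108 | 93 | 3.85 | 2.320 | 18.61 | 1 | 1 | 4 | 1 |
| Hornet 4 Drive | 21.4 | 6 | 258 | 110 | 3.08 | 3.215 | 19.44 | 1 | 0 | 3 | 1 |
| Hornet Sportabout | 18.7 | 8 | 360 | 175 | 3.15 | 3.440 | 17.02 | 0 | 0 | 3 | 2 |
| Valiant | 18.1 | 6 | 225 | 105 | 2.76 | 3.460 | 20.22 | 1 | 0 | 3 | 1 |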

Here’s a naive linear model that regresses mpg on all ten other variables:

lm1 <- lm(mpg ~ ., data = mtcars)
lm1 %>% glance() %>% kable()
| r.squared | adj.r.squared | sigma | statistic | p.value | df | logLik | AIC | BIC | deviance | df.residual |
|--:|--:|--:|--:|--:|--:|--:|--:|--:|--:|--:|
| 0.8690158 | 0.8066423 | 2.650197 | 13.93246 | 4e-07 | 11 | -69.85491 | 163.7098 | 181.2986 | 147.4944 | 21 |
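
As a sanity check, the r.squared and adj.r.squared columns can be recomputed by hand from the residuals. This is a minimal sketch using base R only; the helper names (`rss`, `tss`, `n`, `p`, `r2`, `adj_r2`) are mine and not part of the original analysis.

```r
data(mtcars)
lm1 <- lm(mpg ~ ., data = mtcars)

# Residual and total sums of squares
rss <- sum(residuals(lm1)^2)
tss <- sum((mtcars$mpg - mean(mtcars$mpg))^2)

n <- nrow(mtcars)           # number of observations
p <- length(coef(lm1)) - 1  # number of predictors, excluding the intercept

r2     <- 1 - rss / tss
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - p - 1)

print(c(r.squared = r2, adj.r.squared = adj_r2))
```

The adjusted value penalises each extra predictor through the `(n - 1) / (n - p - 1)` factor, which is why it sits below the raw \(R^2\).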

But we can drive up the \(R^2\) by adding interaction terms. Here we add every pairwise interaction among wt, qsec, vs, am, gear and carb:

lm(mpg~cyl + disp + hp + drat + (wt + qsec + vs + am + gear + carb)^2, data = mtcars) %>% glance() %>% kable()
| r.squared | adj.r.squared | sigma | statistic | p.value | df | logLik | AIC | BIC | deviance | df.residual |
|--:|--:|--:|--:|--:|--:|--:|--:|--:|--:|--:|
| 0.9754612 | 0.8732162 | 2.145998 | 9.540427 | 0.0048738 | 26 | -43.05777 | 140.1155 | 179.6904 | 27.63185 | 6 |
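
To see where the df and df.residual values come from, it helps to count what the `^2` in the formula expands to. The sketch below refits the same model and counts its coefficients; `lm2`, `n_obs`, `n_coef` and `n_interactions` are illustrative names of my own.

```r
data(mtcars)

# Same formula as above: four plain main effects, plus six variables
# expanded to main effects and all of their pairwise interactions
lm2 <- lm(mpg ~ cyl + disp + hp + drat + (wt + qsec + vs + am + gear + carb)^2,
          data = mtcars)

n_obs          <- nrow(mtcars)      # rows in the data
n_coef         <- length(coef(lm2)) # intercept + 10 main effects + interactions
n_interactions <- choose(6, 2)      # pairwise interactions among six variables

print(c(observations = n_obs, coefficients = n_coef,
        interactions = n_interactions, residual_df = df.residual(lm2)))
```

With ten main effects, fifteen interactions and an intercept, the model estimates 26 coefficients from 32 rows, which leaves the 6 residual degrees of freedom reported in the table.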

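To be clear, this is bad statistics: the model spends 26 coefficients on 32 observations, leaving only 6 residual degrees of freedom, so much of the apparent fit is likely overfitting. It is only done to show how interaction terms can drive up the fit.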
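
One way to illustrate why a high \(R^2\) alone proves little: adding predictors to a least-squares model can never lower \(R^2\), even when those predictors are pure noise. The sketch below is my own illustration, not part of the original challenge; the noise columns (`noise1` to `noise10`), the `mtcars_noise` data frame and the seed are all made up for the demonstration.

```r
data(mtcars)
set.seed(1)  # arbitrary seed, only so the run is repeatable

# Ten hypothetical predictors of random noise, unrelated to mpg
noise <- as.data.frame(matrix(rnorm(nrow(mtcars) * 10), ncol = 10))
names(noise) <- paste0("noise", 1:10)
mtcars_noise <- cbind(mtcars, noise)

fit_base  <- summary(lm(mpg ~ ., data = mtcars))
fit_noise <- summary(lm(mpg ~ ., data = mtcars_noise))

r2_base  <- fit_base$r.squared
r2_noise <- fit_noise$r.squared

# Compare raw and adjusted R^2 side by side
print(rbind(
  base  = c(r.squared = r2_base,  adj.r.squared = fit_base$adj.r.squared),
  noise = c(r.squared = r2_noise, adj.r.squared = fit_noise$adj.r.squared)
))
```

The raw \(R^2\) of the noise model is guaranteed to be at least as large as the base model's because the two models are nested; the adjusted \(R^2\) column shows how much of that gain survives the penalty for extra terms.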