This document is a response to a challenge issued by John Toczek to see ‘how high’ I could push the \(R^2\) of a linear model fit to the mtcars dataset.

Challenge accepted.

library(dplyr); library(magrittr); library(ggplot2); library(knitr); library(broom) 
data(mtcars)

head(mtcars) %>% kable()

Here’s a naive linear model:

# Regress mpg on all ten remaining columns (main effects only)
lm1 <- lm(mpg ~ ., data = mtcars)
lm1 %>% glance() %>% kable()

r.squared	adj.r.squared	sigma	statistic	p.value	df	logLik	AIC	BIC	deviance	df.residual
0.8690158	0.8066423	2.650197	13.93246	4e-07	11	-69.85491	163.7098	181.2986	147.4944	21
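
For reference, the first two columns are related by the standard textbook definitions, which the post itself does not spell out. Here \(n = 32\) is the number of cars in mtcars, \(p = 10\) is the number of predictors, and \(SS_{res}\) and \(SS_{tot}\) are the residual and total sums of squares (symbols chosen here for illustration):

```latex
\[
R^2 = 1 - \frac{SS_{res}}{SS_{tot}},
\qquad
R^2_{adj} = 1 - \left(1 - R^2\right)\frac{n - 1}{n - p - 1}
\]
\[
R^2_{adj} = 1 - (1 - 0.8690158)\cdot\frac{31}{21} \approx 0.8066
\]
```

The second line reproduces the adj.r.squared value in the table from its r.squared value.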

But we can drive up \(R^2\) by adding interaction terms. Here we add every pairwise interaction among wt, qsec, vs, am, gear, and carb:

lm(mpg~cyl + disp + hp + drat + (wt + qsec + vs + am + gear + carb)^2, data = mtcars) %>% glance() %>% kable()

r.squared	adj.r.squared	sigma	statistic	p.value	df	logLik	AIC	BIC	deviance	df.residual
0.9754612	0.8732162	2.145998	9.540427	0.0048738	26	-43.05777	140.1155	179.6904	27.63185	6
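
To see what the `^2` in that formula expands to, the terms can be listed directly. This is a small sketch using base R's `terms()`; the variable names `f`, `term_labels`, `main_effects`, and `interactions` are introduced here for illustration only.

```r
# Expand the formula used above to see which terms `^2` generates.
f <- mpg ~ cyl + disp + hp + drat + (wt + qsec + vs + am + gear + carb)^2
term_labels <- attr(terms(f), "term.labels")

# Interaction terms are labelled with a colon, e.g. "wt:qsec".
main_effects <- term_labels[!grepl(":", term_labels)]
interactions <- term_labels[grepl(":", term_labels)]

length(main_effects)      # number of main effects
length(interactions)      # number of pairwise interactions, choose(6, 2)
length(term_labels) + 1   # coefficients, counting the intercept

# Rows left over after estimating those coefficients.
nrow(mtcars) - (length(term_labels) + 1)
```

Every column in mtcars is numeric, so each term costs exactly one coefficient, and the last line should agree with the df.residual column above.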

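To be clear, this is bad statistics, done only to show how interaction terms can inflate the fit. In-sample \(R^2\) cannot decrease when terms are added to a least-squares model, and this one leaves only 6 residual degrees of freedom for 32 cars.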
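
One way to check how much of that fit is real is to score both models on cars they were not fitted to. The sketch below uses leave-one-out cross-validation, which is not part of the original challenge. The helper `loo_rmse` and the names `f_main` and `f_int` are hypothetical, and `suppressWarnings()` is there only because some refits of the larger model may be rank-deficient.

```r
data(mtcars)

f_main <- mpg ~ .
f_int  <- mpg ~ cyl + disp + hp + drat + (wt + qsec + vs + am + gear + carb)^2

# Leave-one-out cross-validation: refit without row i, then predict row i.
# Assumes the response column is named mpg, as in both formulas above.
loo_rmse <- function(formula, data) {
  errs <- vapply(seq_len(nrow(data)), function(i) {
    fit  <- lm(formula, data = data[-i, ])
    pred <- suppressWarnings(predict(fit, newdata = data[i, , drop = FALSE]))
    as.numeric(data$mpg[i] - pred)
  }, numeric(1))
  sqrt(mean(errs^2))  # root mean squared held-out error, in mpg
}

c(main = loo_rmse(f_main, mtcars), interactions = loo_rmse(f_int, mtcars))
```

If the interaction model's held-out error comes out larger than the naive model's, that gap is the cost of chasing in-sample \(R^2\).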
