library(tidyverse)
## Loading tidyverse: ggplot2
## Loading tidyverse: tibble
## Loading tidyverse: tidyr
## Loading tidyverse: readr
## Loading tidyverse: purrr
## Loading tidyverse: dplyr
## Conflicts with tidy packages ----------------------------------------------
## filter(): dplyr, stats
## lag(): dplyr, stats
library(broom)
`augment()` is a function in broom that combines a fitted model object (here a linear model) with a data frame, adding per-observation columns such as fitted values and residuals.
I’ll use mtcars to demonstrate how it’s used.
lm1 <- lm(mpg ~ disp, data = mtcars)      # model mpg as a linear function of displacement
alm1 <- augment(lm1, data = mtcars)       # attach model output to the original data
## Warning: Deprecated: please use `purrr::possibly()` instead
str(alm1)
## 'data.frame': 32 obs. of 19 variables:
## $ .rownames : chr "Mazda RX4" "Mazda RX4 Wag" "Datsun 710" "Hornet 4 Drive" ...
## $ mpg : num 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
## $ cyl : num 6 6 4 6 8 6 8 4 4 6 ...
## $ disp : num 160 160 108 258 360 ...
## $ hp : num 110 110 93 110 175 105 245 62 95 123 ...
## $ drat : num 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
## $ wt : num 2.62 2.88 2.32 3.21 3.44 ...
## $ qsec : num 16.5 17 18.6 19.4 17 ...
## $ vs : num 0 0 1 1 0 1 0 1 1 1 ...
## $ am : num 1 1 1 0 0 0 0 0 0 0 ...
## $ gear : num 4 4 4 3 3 3 3 4 4 4 ...
## $ carb : num 4 4 1 1 2 1 4 2 2 4 ...
## $ .fitted : num 23 23 25.1 19 14.8 ...
## $ .se.fit : num 0.664 0.664 0.815 0.589 0.838 ...
## $ .resid : num -2.01 -2.01 -2.35 2.43 3.94 ...
## $ .hat : num 0.0418 0.0418 0.0629 0.0328 0.0663 ...
## $ .sigma : num 3.29 3.29 3.28 3.27 3.22 ...
## $ .cooksd : num 0.00865 0.00865 0.01868 0.00983 0.05581 ...
## $ .std.resid: num -0.63 -0.63 -0.746 0.761 1.253 ...
Note that the result is a data frame containing both the original data and the per-observation model output (fitted values, residuals, leverage, Cook's distance, and so on). This makes it easy to compare fitted and actual values.
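To make clear what those extra columns are, here is a minimal base-R sketch that rebuilds them from the same model with standard extractor functions (`fitted()`, `predict()`, `residuals()`, `hatvalues()`, `influence()`, `cooks.distance()`, `rstandard()`). It is an illustration under the assumption that these correspond to the `augment()` columns shown above, not broom's own implementation; the data frame name `manual` is hypothetical.

```r
# Hypothetical base-R reconstruction of the columns augment() adds for an lm fit
lm1 <- lm(mpg ~ disp, data = mtcars)

manual <- data.frame(
  .rownames  = rownames(mtcars),
  mpg        = mtcars$mpg,
  .fitted    = fitted(lm1),                         # predicted mpg
  .se.fit    = predict(lm1, se.fit = TRUE)$se.fit,  # standard error of each fitted value
  .resid     = residuals(lm1),                      # observed minus fitted
  .hat       = hatvalues(lm1),                      # leverage
  .sigma     = influence(lm1)$sigma,                # residual SD with that row left out
  .cooksd    = cooks.distance(lm1),                 # Cook's distance
  .std.resid = rstandard(lm1),                      # standardized residual
  row.names  = NULL
)

# Residuals are exactly observed minus fitted
stopifnot(isTRUE(all.equal(manual$.resid, manual$mpg - manual$.fitted)))
head(manual, 3)
```

For the first row (Mazda RX4), the residual is 21.0 - 23.00544 = -2.00544, matching the `.resid` value in the broom output.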
Here's an example that sorts the cars by residual, so the cars whose mpg the model overestimates most come first:
alm1 %>% select(.rownames, mpg, .fitted, .resid) %>% arrange(.resid)
## .rownames mpg .fitted .resid
## 1 Merc 280C 17.8 22.69220 -4.8922007
## 2 Ferrari Dino 19.7 23.62366 -3.9236624
## 3 Merc 280 19.2 22.69220 -3.4922007
## 4 Volvo 142E 21.4 24.61283 -3.2128252
## 5 Toyota Corona 21.5 24.64992 -3.1499188
## 6 Merc 450SLC 15.2 18.23272 -3.0327247
## 7 Datsun 710 22.8 25.14862 -2.3486218
## 8 Valiant 18.1 20.32645 -2.2264528
## 9 Maserati Bora 15.0 17.19410 -2.1941036
## 10 Mazda RX4 21.0 23.00544 -2.0054356
## 11 Mazda RX4 Wag 21.0 23.00544 -2.0054356
## 12 Camaro Z28 13.3 15.17456 -1.8745628
## 13 AMC Javelin 15.2 17.07046 -1.8704583
## 14 Merc 450SE 16.4 18.23272 -1.8327247
## 15 Merc 230 22.8 23.79677 -0.9967659
## 16 Dodge Challenger 15.5 16.49345 -0.9934466
## 17 Merc 450SL 17.3 18.23272 -0.9327247
## 18 Duster 360 14.3 14.76241 -0.4624116
## 19 Lincoln Continental 10.4 10.64090 -0.2408996
## 20 Cadillac Fleetwood 10.4 10.14632 0.2536819
## 21 Ford Pantera L 15.8 15.13335 0.6666524
## 22 Merc 240D 24.4 23.55360 0.8464033
## 23 Fiat X1-9 27.3 26.34386 0.9561397
## 24 Porsche 914-2 26.0 24.64168 1.3583242
## 25 Hornet 4 Drive 21.4 18.96635 2.4336462
## 26 Chrysler Imperial 14.7 11.46520 3.2347980
## 27 Honda Civic 30.4 26.47987 3.9201298
## 28 Hornet Sportabout 18.7 14.76241 3.9375884
## 29 Lotus Europa 30.4 25.68030 4.7197032
## 30 Fiat 128 32.4 26.35622 6.0437752
## 31 Pontiac Firebird 19.2 13.11381 6.0861932
## 32 Toyota Corolla 33.9 26.66946 7.2305403
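Sorting by the signed residual splits the worst-fitted cars between the two ends of the list. A small variation ranks them by absolute residual instead; in dplyr that would be `arrange(desc(abs(.resid)))`. The sketch below does the same in base R so it runs without broom; the names `res` and `worst` are hypothetical.

```r
# Rank cars by the size of their residual, ignoring sign (base R only)
lm1 <- lm(mpg ~ disp, data = mtcars)

res <- data.frame(
  car     = rownames(mtcars),
  mpg     = mtcars$mpg,
  .fitted = fitted(lm1),
  .resid  = residuals(lm1),
  row.names = NULL
)

# Largest absolute residuals first: the cars the disp-only model fits worst
worst <- res[order(abs(res$.resid), decreasing = TRUE), ]
head(worst, 5)
```

Based on the table above, the top five should be Toyota Corolla, Pontiac Firebird, Fiat 128, Merc 280C and Lotus Europa; all but Merc 280C get better mileage than the model predicts.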
It is also useful for screening other candidate predictors: look for variables that are correlated with the residuals from the first model.
cor(alm1$.resid,alm1$wt)
## [1] -0.2167851
cor(alm1$.resid,alm1$hp)
## [1] -0.1993521
cor(alm1$.resid,alm1$drat)
## [1] 0.149288
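Of the three variables checked, wt has the largest absolute correlation with the residuals (about -0.22), so it looks like the most promising addition to the model, although none of these correlations is strong.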
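Rather than checking variables one at a time, the same screen can be run over every column not already in the model, and the leading candidate can then be tested by fitting the larger model. The sketch below uses base R's `sapply()`, `lm()` and `anova()`; the names `candidates`, `cors` and `lm2` are hypothetical. It is a rough screen only: a more careful version would correlate the residuals with the part of each candidate that disp does not already explain (the idea behind an added-variable plot).

```r
lm1 <- lm(mpg ~ disp, data = mtcars)
r <- residuals(lm1)

# Candidate predictors: every column not already in the model
candidates <- setdiff(names(mtcars), c("mpg", "disp"))
cors <- sapply(mtcars[candidates], function(x) cor(r, x))

# Sort by absolute correlation, strongest first
cors[order(abs(cors), decreasing = TRUE)]

# Follow-up check for wt: add it and compare the nested models with an F-test
lm2 <- lm(mpg ~ disp + wt, data = mtcars)
anova(lm1, lm2)
```

The `wt`, `hp` and `drat` entries of `cors` reproduce the three correlations computed above.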