C. Donovan
We've seen a number of ways of increasing the complexity of the model of signal and how we might select between different models
Here we return to model diagnostics and then apply all we've covered to an analysis problem
We model using the following - where the design matrix X can have cunning form to allow a wide range of curvature, interactions, factor variables etc
\[ y = \mathbf{X}\boldsymbol{\beta} + \mathbf{e} \]
So, we've fitted a model. Do we believe it?
We check the model for signal (can our design matrix X capture it?)
We check the model for noise (does the noise look iid Normal?)
The primary means to check both are the residuals
As for simpler models, want to check:
We'll use a little dataset from Wood's (2006) GAM book. Data is on a number of different makes/models of car, mainly for modelling fuel efficiency:
library(tidyverse)
library(car)
carData <- read_csv("data/mpg.csv")
carData <- carData %>% select(-city.mpg, -loss)
head(carData)
# A tibble: 6 x 24
symbol make fuel aspir doors style drive eng.loc wb length width
<dbl> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl>
1 3 alfa~ gas std two conv~ rwd front 88.6 169. 64.1
2 3 alfa~ gas std two conv~ rwd front 88.6 169. 64.1
3 1 alfa~ gas std two hatc~ rwd front 94.5 171. 65.5
4 2 audi gas std four sedan fwd front 99.8 177. 66.2
5 2 audi gas std four sedan 4wd front 99.4 177. 66.4
6 2 audi gas std two sedan fwd front 99.8 177. 66.3
# ... with 13 more variables: height <dbl>, weight <dbl>, eng.type <chr>,
# cylinders <chr>, eng.cc <dbl>, fuel.sys <chr>, bore <dbl>,
# stroke <dbl>, comp.ratio <dbl>, hp <dbl>, rpm <dbl>, hw.mpg <dbl>,
# price <dbl>
carData <- na.omit(carData)
carModel <- lm(hw.mpg ~ ., data = carData)
summary(carModel)
Call:
lm(formula = hw.mpg ~ ., data = carData)
Residuals:
Min 1Q Median 3Q Max
-7.4665 -1.0138 0.0000 0.8741 10.0551
Coefficients: (3 not defined because of singularities)
Estimate Std. Error t value Pr(>|t|)
(Intercept) 43.6843785 24.2290090 1.803 0.073590 .
symbol 0.1353545 0.3650639 0.371 0.711382
makeaudi 1.8115468 3.2796769 0.552 0.581605
makebmw 4.1339222 3.1813149 1.299 0.195976
makechevrolet 9.7704927 2.9539487 3.308 0.001202 **
makedodge 5.4576878 2.5950907 2.103 0.037287 *
makehonda 1.0326973 3.0152128 0.342 0.732503
makeisuzu 0.8437792 3.3642575 0.251 0.802339
makejaguar 7.3937479 3.7612187 1.966 0.051346 .
makemazda 3.1229455 2.3196991 1.346 0.180438
makemercedes-benz 2.9225176 3.4091267 0.857 0.392797
makemercury 4.5979902 3.9959076 1.151 0.251870
makemitsubishi 5.2088118 2.6149771 1.992 0.048368 *
makenissan 4.9401008 2.2794148 2.167 0.031943 *
makepeugot -4.4386591 6.1719266 -0.719 0.473262
makeplymouth 5.8813652 2.5417841 2.314 0.022163 *
makeporsche -0.4360866 4.0568461 -0.107 0.914554
makesaab 1.9324417 2.9519237 0.655 0.513798
makesubaru 1.3023219 2.6272432 0.496 0.620901
maketoyota 4.3685821 2.1244222 2.056 0.041646 *
makevolkswagen 3.6078869 2.5307096 1.426 0.156246
makevolvo 5.9318178 2.9132783 2.036 0.043664 *
fuelgas 16.6191311 9.1883955 1.809 0.072689 .
aspirturbo -2.7709378 1.1168920 -2.481 0.014315 *
doorstwo -0.2250915 0.6880396 -0.327 0.744055
stylehardtop 1.7103202 1.6374963 1.044 0.298106
stylehatchback 2.9727311 1.5375638 1.933 0.055249 .
stylesedan 3.4966761 1.6543631 2.114 0.036361 *
stylewagon 3.6930945 1.7811426 2.073 0.040004 *
drivefwd 2.0511821 1.2586652 1.630 0.105473
driverwd 1.4852133 1.6848122 0.882 0.379575
eng.locrear 1.6195561 3.8399158 0.422 0.673855
wb -0.2885159 0.1292921 -2.232 0.027272 *
length -0.0468275 0.0715338 -0.655 0.513810
width 0.3968397 0.3179012 1.248 0.214046
height 0.0119531 0.2025530 0.059 0.953028
weight -0.0070225 0.0023340 -3.009 0.003122 **
eng.typel 8.8880828 5.7047931 1.558 0.121540
eng.typeohc -1.0931503 1.6389832 -0.667 0.505914
eng.typeohcf NA NA NA NA
eng.typeohcv -0.3687564 1.7155451 -0.215 0.830126
cylindersfive 2.4649711 3.9672687 0.621 0.535417
cylindersfour 5.0305364 4.8358107 1.040 0.300048
cylinderssix 0.6094793 3.6838421 0.165 0.868836
cylindersthree NA NA NA NA
cylinderstwelve -9.1650619 7.1394520 -1.284 0.201407
eng.cc -0.0134897 0.0364468 -0.370 0.711864
fuel.sys2bbl -4.5292855 2.0180164 -2.244 0.026409 *
fuel.sysidi NA NA NA NA
fuel.sysmfi -6.9314111 3.6450686 -1.902 0.059325 .
fuel.sysmpfi -5.0116028 2.1110270 -2.374 0.018983 *
fuel.sysspdi -5.5823819 2.4816961 -2.249 0.026081 *
fuel.sysspfi -2.9327732 4.2016856 -0.698 0.486361
bore -2.1766848 2.5285287 -0.861 0.390825
stroke 0.1328865 1.3648168 0.097 0.922578
comp.ratio 1.4214909 0.6746679 2.107 0.036945 *
hp 0.0011179 0.0340103 0.033 0.973826
rpm -0.0032222 0.0008884 -3.627 0.000404 ***
price 0.0002182 0.0001158 1.885 0.061567 .
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 2.384 on 137 degrees of freedom
Multiple R-squared: 0.9128, Adjusted R-squared: 0.8777
F-statistic: 26.06 on 55 and 137 DF, p-value: < 2.2e-16
Call:
lm(formula = hw.mpg ~ ., data = carData)
Residuals:
Min 1Q Median 3Q Max
-7.4665 -1.0138 0.0000 0.8741 10.0551
Coefficients: (3 not defined because of singularities)
Estimate Std. Error t value Pr(>|t|)
(Intercept) 43.6843785 24.2290090 1.803 0.073590 .
symbol 0.1353545 0.3650639 0.371 0.711382
makeaudi 1.8115468 3.2796769 0.552 0.581605
makebmw 4.1339222 3.1813149 1.299 0.195976
makechevrolet 9.7704927 2.9539487 3.308 0.001202 **
makedodge 5.4576878 2.5950907 2.103 0.037287 *
makehonda 1.0326973 3.0152128 0.342 0.732503
makeisuzu 0.8437792 3.3642575 0.251 0.802339
makejaguar 7.3937479 3.7612187 1.966 0.051346 .
makemazda 3.1229455 2.3196991 1.346 0.180438
makemercedes-benz 2.9225176 3.4091267 0.857 0.392797
makemercury 4.5979902 3.9959076 1.151 0.251870
makemitsubishi 5.2088118 2.6149771 1.992 0.048368 *
makenissan 4.9401008 2.2794148 2.167 0.031943 *
makepeugot -4.4386591 6.1719266 -0.719 0.473262
makeplymouth 5.8813652 2.5417841 2.314 0.022163 *
makeporsche -0.4360866 4.0568461 -0.107 0.914554
makesaab 1.9324417 2.9519237 0.655 0.513798
makesubaru 1.3023219 2.6272432 0.496 0.620901
maketoyota 4.3685821 2.1244222 2.056 0.041646 *
makevolkswagen 3.6078869 2.5307096 1.426 0.156246
makevolvo 5.9318178 2.9132783 2.036 0.043664 *
fuelgas 16.6191311 9.1883955 1.809 0.072689 .
aspirturbo -2.7709378 1.1168920 -2.481 0.014315 *
doorstwo -0.2250915 0.6880396 -0.327 0.744055
stylehardtop 1.7103202 1.6374963 1.044 0.298106
stylehatchback 2.9727311 1.5375638 1.933 0.055249 .
stylesedan 3.4966761 1.6543631 2.114 0.036361 *
stylewagon 3.6930945 1.7811426 2.073 0.040004 *
drivefwd 2.0511821 1.2586652 1.630 0.105473
driverwd 1.4852133 1.6848122 0.882 0.379575
eng.locrear 1.6195561 3.8399158 0.422 0.673855
wb -0.2885159 0.1292921 -2.232 0.027272 *
length -0.0468275 0.0715338 -0.655 0.513810
width 0.3968397 0.3179012 1.248 0.214046
height 0.0119531 0.2025530 0.059 0.953028
weight -0.0070225 0.0023340 -3.009 0.003122 **
eng.typel 8.8880828 5.7047931 1.558 0.121540
eng.typeohc -1.0931503 1.6389832 -0.667 0.505914
eng.typeohcf NA NA NA NA
eng.typeohcv -0.3687564 1.7155451 -0.215 0.830126
cylindersfive 2.4649711 3.9672687 0.621 0.535417
cylindersfour 5.0305364 4.8358107 1.040 0.300048
cylinderssix 0.6094793 3.6838421 0.165 0.868836
cylindersthree NA NA NA NA
cylinderstwelve -9.1650619 7.1394520 -1.284 0.201407
eng.cc -0.0134897 0.0364468 -0.370 0.711864
fuel.sys2bbl -4.5292855 2.0180164 -2.244 0.026409 *
fuel.sysidi NA NA NA NA
fuel.sysmfi -6.9314111 3.6450686 -1.902 0.059325 .
fuel.sysmpfi -5.0116028 2.1110270 -2.374 0.018983 *
fuel.sysspdi -5.5823819 2.4816961 -2.249 0.026081 *
fuel.sysspfi -2.9327732 4.2016856 -0.698 0.486361
bore -2.1766848 2.5285287 -0.861 0.390825
stroke 0.1328865 1.3648168 0.097 0.922578
comp.ratio 1.4214909 0.6746679 2.107 0.036945 *
hp 0.0011179 0.0340103 0.033 0.973826
rpm -0.0032222 0.0008884 -3.627 0.000404 ***
price 0.0002182 0.0001158 1.885 0.061567 .
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 2.384 on 137 degrees of freedom
Multiple R-squared: 0.9128, Adjusted R-squared: 0.8777
F-statistic: 26.06 on 55 and 137 DF, p-value: < 2.2e-16
Anova(carModel)
Anova Table (Type II tests)
Response: hw.mpg
Sum Sq Df F value Pr(>F)
symbol 0.78 1 0.1375 0.7113821
make 164.68 18 1.6104 0.0653575 .
fuel 0
aspir 34.97 1 6.1550 0.0143145 *
doors 0.61 1 0.1070 0.7440545
style 28.41 4 1.2504 0.2926659
drive 16.63 2 1.4636 0.2349912
eng.loc 0
wb 28.29 1 4.9796 0.0272724 *
length 2.43 1 0.4285 0.5138097
width 8.85 1 1.5583 0.2140462
height 0.02 1 0.0035 0.9530282
weight 51.43 1 9.0529 0.0031223 **
eng.type 2.56 2 0.2251 0.7987346
cylinders 37.28 4 1.6404 0.1676106
eng.cc 0.78 1 0.1370 0.7118644
fuel.sys 37.21 5 1.3100 0.2633417
bore 4.21 1 0.7411 0.3908246
stroke 0.05 1 0.0095 0.9225782
comp.ratio 25.22 1 4.4392 0.0369448 *
hp 0.01 1 0.0011 0.9738258
rpm 74.73 1 13.1548 0.0004036 ***
price 20.18 1 3.5527 0.0615671 .
Residuals 778.31 137
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
carModel <- step(carModel)
Start: AIC=381.13
hw.mpg ~ symbol + make + fuel + aspir + doors + style + drive +
eng.loc + wb + length + width + height + weight + eng.type +
cylinders + eng.cc + fuel.sys + bore + stroke + comp.ratio +
hp + rpm + price
Step: AIC=381.13
hw.mpg ~ symbol + make + fuel + aspir + doors + style + drive +
wb + length + width + height + weight + eng.type + cylinders +
eng.cc + fuel.sys + bore + stroke + comp.ratio + hp + rpm +
price
Step: AIC=381.13
hw.mpg ~ symbol + make + aspir + doors + style + drive + wb +
length + width + height + weight + eng.type + cylinders +
eng.cc + fuel.sys + bore + stroke + comp.ratio + hp + rpm +
price
Df Sum of Sq RSS AIC
- eng.type 3 3.762 782.07 376.06
- hp 1 0.006 778.32 379.13
- height 1 0.020 778.33 379.13
- stroke 1 0.054 778.37 379.14
- doors 1 0.608 778.92 379.28
- eng.cc 1 0.778 779.09 379.32
- symbol 1 0.781 779.09 379.32
- length 1 2.435 780.75 379.73
- style 4 28.415 806.73 380.05
- bore 1 4.210 782.52 380.17
- make 19 167.246 945.56 380.69
<none> 778.31 381.13
- drive 2 16.630 794.94 381.21
- width 1 8.853 787.17 381.31
- fuel.sys 6 53.772 832.08 382.02
- cylinders 4 37.276 815.59 382.16
- price 1 20.183 798.50 384.07
- comp.ratio 1 25.220 803.53 385.28
- wb 1 28.290 806.60 386.02
- aspir 1 34.968 813.28 387.61
- weight 1 51.431 829.74 391.48
- rpm 1 74.734 853.05 396.82
Step: AIC=376.06
hw.mpg ~ symbol + make + aspir + doors + style + drive + wb +
length + width + height + weight + cylinders + eng.cc + fuel.sys +
bore + stroke + comp.ratio + hp + rpm + price
Df Sum of Sq RSS AIC
- stroke 1 0.092 782.17 374.08
- height 1 0.285 782.36 374.13
- doors 1 0.531 782.61 374.19
- hp 1 0.867 782.94 374.27
- symbol 1 0.924 783.00 374.29
- length 1 1.294 783.37 374.38
- eng.cc 1 2.057 784.13 374.56
- bore 1 2.162 784.24 374.59
- style 4 28.467 810.54 374.96
- width 1 7.159 789.23 375.82
- drive 2 15.582 797.66 375.86
<none> 782.07 376.06
- make 20 180.637 962.71 376.16
- cylinders 5 42.096 824.17 376.18
- fuel.sys 6 59.316 841.39 378.17
- price 1 27.135 809.21 380.64
- wb 1 31.948 814.02 381.78
- comp.ratio 1 32.891 814.97 382.01
- aspir 1 36.099 818.17 382.77
- weight 1 74.887 856.96 391.71
- rpm 1 78.550 860.63 392.53
Step: AIC=374.08
hw.mpg ~ symbol + make + aspir + doors + style + drive + wb +
length + width + height + weight + cylinders + eng.cc + fuel.sys +
bore + comp.ratio + hp + rpm + price
Df Sum of Sq RSS AIC
- height 1 0.257 782.42 372.14
- doors 1 0.550 782.72 372.22
- hp 1 0.819 782.99 372.28
- symbol 1 0.969 783.14 372.32
- length 1 1.334 783.50 372.41
- eng.cc 1 2.228 784.39 372.63
- bore 1 2.362 784.53 372.66
- style 4 28.425 810.59 372.97
- width 1 7.074 789.24 373.82
- drive 2 15.640 797.81 373.90
<none> 782.17 374.08
- cylinders 5 43.631 825.80 374.56
- make 20 185.143 967.31 375.08
- fuel.sys 6 59.958 842.12 376.33
- price 1 27.641 809.81 378.78
- wb 1 31.863 814.03 379.79
- comp.ratio 1 32.875 815.04 380.03
- aspir 1 36.419 818.59 380.86
- weight 1 75.434 857.60 389.85
- rpm 1 79.671 861.84 390.80
Step: AIC=372.14
hw.mpg ~ symbol + make + aspir + doors + style + drive + wb +
length + width + weight + cylinders + eng.cc + fuel.sys +
bore + comp.ratio + hp + rpm + price
Df Sum of Sq RSS AIC
- doors 1 0.495 782.92 370.27
- symbol 1 0.823 783.25 370.35
- hp 1 0.868 783.29 370.36
- length 1 1.709 784.13 370.56
- eng.cc 1 2.112 784.54 370.66
- bore 1 2.651 785.07 370.80
- style 4 28.637 811.06 371.08
- width 1 6.833 789.26 371.82
- drive 2 15.415 797.84 371.91
<none> 782.42 372.14
- cylinders 5 46.765 829.19 373.35
- make 20 186.325 968.75 373.37
- fuel.sys 6 61.368 843.79 374.72
- price 1 27.600 810.02 376.83
- comp.ratio 1 32.619 815.04 378.03
- aspir 1 36.648 819.07 378.98
- wb 1 37.874 820.30 379.27
- weight 1 77.838 860.26 388.45
- rpm 1 79.474 861.90 388.81
Step: AIC=370.27
hw.mpg ~ symbol + make + aspir + style + drive + wb + length +
width + weight + cylinders + eng.cc + fuel.sys + bore + comp.ratio +
hp + rpm + price
Df Sum of Sq RSS AIC
- symbol 1 0.529 783.45 368.40
- hp 1 0.901 783.82 368.49
- length 1 1.642 784.56 368.67
- eng.cc 1 2.329 785.25 368.84
- bore 1 2.439 785.36 368.87
- width 1 7.098 790.02 370.01
- drive 2 15.958 798.88 370.16
- style 4 33.083 816.00 370.25
<none> 782.92 370.27
- make 20 187.168 970.09 371.64
- cylinders 5 48.406 831.33 371.84
- fuel.sys 6 64.333 847.25 373.51
- price 1 28.027 810.95 375.05
- comp.ratio 1 33.361 816.28 376.32
- aspir 1 36.871 819.79 377.15
- wb 1 38.619 821.54 377.56
- weight 1 77.343 860.26 386.45
- rpm 1 81.209 864.13 387.31
Step: AIC=368.4
hw.mpg ~ make + aspir + style + drive + wb + length + width +
weight + cylinders + eng.cc + fuel.sys + bore + comp.ratio +
hp + rpm + price
Df Sum of Sq RSS AIC
- hp 1 0.751 784.20 366.58
- length 1 1.797 785.25 366.84
- eng.cc 1 1.928 785.38 366.87
- bore 1 2.537 785.98 367.02
- drive 2 15.446 798.89 368.16
- width 1 7.451 790.90 368.22
<none> 783.45 368.40
- style 4 34.356 817.80 368.68
- make 20 189.343 972.79 370.17
- cylinders 5 54.255 837.70 371.32
- fuel.sys 6 63.904 847.35 371.53
- price 1 27.515 810.96 373.06
- comp.ratio 1 34.731 818.18 374.77
- aspir 1 36.681 820.13 375.23
- wb 1 45.884 829.33 377.38
- weight 1 77.828 861.28 384.67
- rpm 1 80.776 864.22 385.33
Step: AIC=366.58
hw.mpg ~ make + aspir + style + drive + wb + length + width +
weight + cylinders + eng.cc + fuel.sys + bore + comp.ratio +
rpm + price
Df Sum of Sq RSS AIC
- eng.cc 1 1.272 785.47 364.89
- length 1 1.614 785.81 364.98
- bore 1 2.430 786.63 365.18
- width 1 7.510 791.71 366.42
- drive 2 16.325 800.52 366.56
<none> 784.20 366.58
- style 4 35.201 819.40 367.06
- make 20 195.636 979.83 369.57
- fuel.sys 6 66.261 850.46 370.24
- cylinders 5 58.622 842.82 370.49
- price 1 28.657 812.86 371.51
- comp.ratio 1 35.630 819.83 373.16
- aspir 1 47.958 832.16 376.04
- wb 1 48.878 833.08 376.25
- weight 1 77.245 861.44 382.71
- rpm 1 99.983 884.18 387.74
Step: AIC=364.89
hw.mpg ~ make + aspir + style + drive + wb + length + width +
weight + cylinders + fuel.sys + bore + comp.ratio + rpm +
price
Df Sum of Sq RSS AIC
- length 1 2.026 787.50 363.39
- drive 2 15.669 801.14 364.71
- width 1 7.805 793.28 364.80
<none> 785.47 364.89
- bore 1 9.426 794.90 365.20
- style 4 37.173 822.64 365.82
- make 20 200.231 985.70 368.72
- fuel.sys 6 70.758 856.23 369.54
- price 1 27.871 813.34 369.62
- comp.ratio 1 40.411 825.88 372.58
- aspir 1 47.930 833.40 374.33
- wb 1 48.959 834.43 374.56
- weight 1 84.098 869.57 382.52
- cylinders 5 130.658 916.13 384.59
- rpm 1 103.077 888.55 386.69
Step: AIC=363.39
hw.mpg ~ make + aspir + style + drive + wb + width + weight +
cylinders + fuel.sys + bore + comp.ratio + rpm + price
Df Sum of Sq RSS AIC
- drive 2 14.149 801.65 362.83
- width 1 6.556 794.05 362.99
<none> 787.50 363.39
- bore 1 9.976 797.47 363.82
- style 4 35.191 822.69 363.83
- price 1 28.758 816.26 368.31
- fuel.sys 6 72.928 860.43 368.48
- comp.ratio 1 40.028 827.53 370.96
- aspir 1 46.992 834.49 372.58
- make 20 249.001 1036.50 376.42
- wb 1 82.661 870.16 380.66
- cylinders 5 129.180 916.68 382.71
- rpm 1 108.872 896.37 386.38
- weight 1 120.268 907.77 388.82
Step: AIC=362.83
hw.mpg ~ make + aspir + style + wb + width + weight + cylinders +
fuel.sys + bore + comp.ratio + rpm + price
Df Sum of Sq RSS AIC
- bore 1 7.251 808.90 362.57
<none> 801.65 362.83
- style 4 34.034 835.68 362.85
- width 1 13.043 814.69 363.94
- fuel.sys 6 68.476 870.12 366.65
- price 1 28.881 830.53 367.66
- comp.ratio 1 44.355 846.00 371.22
- aspir 1 48.226 849.87 372.10
- wb 1 74.163 875.81 377.90
- make 20 273.428 1075.07 379.47
- cylinders 5 122.658 924.30 380.31
- rpm 1 116.552 918.20 387.03
- weight 1 192.270 993.92 402.32
Step: AIC=362.57
hw.mpg ~ make + aspir + style + wb + width + weight + cylinders +
fuel.sys + comp.ratio + rpm + price
Df Sum of Sq RSS AIC
<none> 808.90 362.57
- width 1 12.59 821.49 363.55
- style 4 45.98 854.87 365.23
- fuel.sys 6 72.29 881.19 367.09
- price 1 28.25 837.14 367.19
- aspir 1 43.89 852.79 370.76
- comp.ratio 1 49.01 857.90 371.92
- wb 1 76.67 885.57 378.04
- cylinders 5 115.73 924.63 378.37
- rpm 1 112.51 921.40 385.70
- make 20 333.52 1142.42 389.20
- weight 1 291.55 1100.45 419.97
Anova(carModel)
Anova Table (Type II tests)
Response: hw.mpg
Sum Sq Df F value Pr(>F)
make 333.52 20 3.0924 4.448e-05 ***
aspir 43.89 1 8.1395 0.0049432 **
style 45.98 4 2.1314 0.0796671 .
wb 76.67 1 14.2180 0.0002337 ***
width 12.59 1 2.3345 0.1286416
weight 291.55 1 54.0649 1.170e-11 ***
cylinders 115.73 5 4.2922 0.0011189 **
fuel.sys 72.29 6 2.2343 0.0428853 *
comp.ratio 49.01 1 9.0878 0.0030224 **
rpm 112.51 1 20.8627 1.022e-05 ***
price 28.25 1 5.2378 0.0234978 *
Residuals 808.90 150
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Once all the signal is removed, only iid Normal stuff should remain.
Shape
qqnorm(resid(carModel))
qqline(resid(carModel))
shapiro.test(resid(carModel))
Shapiro-Wilk normality test
data: resid(carModel)
W = 0.92131, p-value = 1.166e-08
We can try to track down the extreme residuals
hist(resid(carModel))
We can try to track down the extreme residuals
bigResid <- which(abs(resid(carModel))>5)
carData[bigResid,]
# A tibble: 7 x 24
symbol make fuel aspir doors style drive eng.loc wb length width
<dbl> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl>
1 2 honda gas std two hatc~ fwd front 86.6 145. 63.9
2 1 mazda gas std two hatc~ fwd front 93.1 159. 64.2
3 1 niss~ dies~ std two sedan fwd front 94.5 165. 63.8
4 0 suba~ gas std four sedan 4wd front 97 172 65.4
5 0 toyo~ dies~ std four sedan fwd front 95.7 166. 64.4
6 0 toyo~ gas std four sedan fwd front 95.7 166. 64.4
7 -1 toyo~ dies~ turbo four sedan fwd front 102. 176. 66.5
# ... with 13 more variables: height <dbl>, weight <dbl>, eng.type <chr>,
# cylinders <chr>, eng.cc <dbl>, fuel.sys <chr>, bore <dbl>,
# stroke <dbl>, comp.ratio <dbl>, hp <dbl>, rpm <dbl>, hw.mpg <dbl>,
# price <dbl>
Once all the signal is removed, only iid Normal stuff should remain.
Spread
carResid <- resid(carModel)
plot(fitted(carModel), carResid, ylab = 'residuals', xlab = 'Fitted values')
plot(carData$hp, carResid, ylab = 'residuals', xlab = 'Horse power')
ncvTest
- non-constant variance test). ncvTest(carModel)
Non-constant Variance Score Test
Variance formula: ~ fitted.values
Chisquare = 43.44292, Df = 1, p = 4.3651e-11
We can be explicit about which covariate to test against:
ncvTest(carModel, ~hp)
Non-constant Variance Score Test
Variance formula: ~ hp
Chisquare = 35.76066, Df = 1, p = 2.2311e-09
Once all the signal is removed, only iid Normal stuff should remain.
Independence
plot(carData$width, carResid, ylab = 'residuals', xlab = 'car width')
durbinWatsonTest(carModel)
lag Autocorrelation D-W Statistic p-value
1 -0.07326368 2.146045 0.454
Alternative hypothesis: rho != 0
plot(carModel, which = 1:2)
While it looks like a lot of tests for different things, it is simply one assumption:
the errors are iid Normal
If we have modelled all the signal, only iid Normal stuff should remain. So, it follows that problems in the residuals may be linked to a poor model for the signal
Signal
This is a little different - it is not strictly signal or noise.
We can examine this using Variance Inflation Factors (VIFs).
library(GGally)
numericOnly <- carData %>% select_if(is.numeric)
ggpairs(numericOnly)
vif(carModel)
GVIF Df GVIF^(1/(2*Df))
make 3.095147e+05 20 1.371725
aspir 3.301658e+00 1 1.817046
style 6.468664e+00 4 1.262850
wb 9.620137e+00 1 3.101635
width 1.140642e+01 1 3.377339
weight 1.870658e+01 1 4.325111
cylinders 2.600441e+02 5 1.743832
fuel.sys 2.906645e+04 6 2.354775
comp.ratio 1.696585e+02 1 13.025303
rpm 3.038367e+00 1 1.743091
price 2.055915e+01 1 4.534220
alteredModel <- update(carModel, .~.-comp.ratio)
vif(alteredModel)
GVIF Df GVIF^(1/(2*Df))
make 1.873647e+05 20 1.354619
aspir 2.767284e+00 1 1.663516
style 6.267025e+00 4 1.257861
wb 9.606971e+00 1 3.099511
width 1.107595e+01 1 3.328055
weight 1.803180e+01 1 4.246387
cylinders 1.688878e+02 5 1.670167
fuel.sys 3.578376e+02 6 1.632327
rpm 2.936689e+00 1 1.713677
price 1.951306e+01 1 4.417359
qqnorm(resid(alteredModel))
shapiro.test(resid(alteredModel))
Shapiro-Wilk normality test
data: resid(alteredModel)
W = 0.93355, p-value = 1.011e-07
ncvTest(alteredModel)
Non-constant Variance Score Test
Variance formula: ~ fitted.values
Chisquare = 39.0538, Df = 1, p = 4.1229e-10
summary(alteredModel)
Call:
lm(formula = hw.mpg ~ make + aspir + style + wb + width + weight +
cylinders + fuel.sys + rpm + price, data = carData)
Residuals:
Min 1Q Median 3Q Max
-7.883 -1.177 -0.063 1.184 9.715
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 5.839e+01 1.565e+01 3.730 0.000270 ***
makeaudi 2.376e+00 2.522e+00 0.942 0.347648
makebmw 2.131e+00 2.050e+00 1.039 0.300434
makechevrolet 9.216e+00 2.452e+00 3.759 0.000244 ***
makedodge 4.617e+00 1.983e+00 2.328 0.021213 *
makehonda 5.195e-01 2.435e+00 0.213 0.831323
makeisuzu -8.328e-01 3.033e+00 -0.275 0.783987
makejaguar 8.003e+00 2.772e+00 2.887 0.004454 **
makemazda 1.603e+00 1.860e+00 0.862 0.390234
makemercedes-benz 4.533e+00 3.020e+00 1.501 0.135519
makemercury 3.178e+00 3.062e+00 1.038 0.301020
makemitsubishi 4.185e+00 1.990e+00 2.103 0.037094 *
makenissan 3.646e+00 1.770e+00 2.060 0.041094 *
makepeugot 4.577e+00 2.159e+00 2.120 0.035638 *
makeplymouth 5.075e+00 2.019e+00 2.513 0.013020 *
makeporsche -1.999e-01 2.367e+00 -0.084 0.932797
makesaab 3.041e+00 1.977e+00 1.539 0.125989
makesubaru -1.801e-02 1.827e+00 -0.010 0.992149
maketoyota 3.686e+00 1.704e+00 2.163 0.032111 *
makevolkswagen 3.461e+00 1.817e+00 1.905 0.058657 .
makevolvo 5.219e+00 2.003e+00 2.605 0.010097 *
aspirturbo -3.205e+00 7.408e-01 -4.327 2.74e-05 ***
stylehardtop 8.437e-01 1.484e+00 0.569 0.570486
stylehatchback 2.137e+00 1.397e+00 1.529 0.128264
stylesedan 2.666e+00 1.421e+00 1.876 0.062632 .
stylewagon 3.188e+00 1.536e+00 2.076 0.039631 *
wb -3.092e-01 8.666e-02 -3.567 0.000484 ***
width 5.404e-01 2.678e-01 2.018 0.045371 *
weight -1.091e-02 1.387e-03 -7.864 6.56e-13 ***
cylindersfive 1.601e+00 2.155e+00 0.743 0.458758
cylindersfour 5.246e+00 2.912e+00 1.802 0.073603 .
cylinderssix 3.360e+00 2.864e+00 1.173 0.242464
cylindersthree 1.045e+01 4.218e+00 2.478 0.014329 *
cylinderstwelve -3.537e+00 4.179e+00 -0.846 0.398681
fuel.sys2bbl -4.231e+00 1.906e+00 -2.219 0.027949 *
fuel.sysidi 3.566e+00 2.129e+00 1.675 0.096010 .
fuel.sysmfi -8.147e+00 3.342e+00 -2.437 0.015952 *
fuel.sysmpfi -4.979e+00 1.959e+00 -2.542 0.012026 *
fuel.sysspdi -6.582e+00 2.302e+00 -2.860 0.004844 **
fuel.sysspfi -8.275e-01 3.960e+00 -0.209 0.834738
rpm -2.503e-03 6.290e-04 -3.980 0.000107 ***
price 1.511e-04 9.394e-05 1.609 0.109781
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 2.384 on 151 degrees of freedom
Multiple R-squared: 0.9038, Adjusted R-squared: 0.8777
F-statistic: 34.62 on 41 and 151 DF, p-value: < 2.2e-16
Anova(alteredModel)
Anova Table (Type II tests)
Response: hw.mpg
Sum Sq Df F value Pr(>F)
make 354.26 20 3.1177 3.856e-05 ***
aspir 106.36 1 18.7200 2.741e-05 ***
style 35.54 4 1.5641 0.1867782
wb 72.30 1 12.7261 0.0004836 ***
width 23.13 1 4.0720 0.0453711 *
weight 351.40 1 61.8503 6.555e-13 ***
cylinders 77.47 5 2.7272 0.0217222 *
fuel.sys 622.63 6 18.2648 6.971e-16 ***
rpm 89.99 1 15.8391 0.0001068 ***
price 14.70 1 2.5878 0.1097809
Residuals 857.90 151
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
confint(alteredModel)
2.5 % 97.5 %
(Intercept) 2.746083e+01 89.3169847817
makeaudi -2.606924e+00 7.3587282908
makebmw -1.920726e+00 6.1818500668
makechevrolet 4.371420e+00 14.0609259892
makedodge 6.992340e-01 8.5338880841
makehonda -4.291007e+00 5.3300193726
makeisuzu -6.824992e+00 5.1592988875
makejaguar 2.526847e+00 13.4790899822
makemazda -2.072309e+00 5.2777851596
makemercedes-benz -1.434954e+00 10.5003940472
makemercury -2.871971e+00 9.2271074253
makemitsubishi 2.537551e-01 8.1161413725
makenissan 1.494042e-01 7.1430080285
makepeugot 3.113616e-01 8.8418359891
makeplymouth 1.084842e+00 9.0644387695
makeporsche -4.876275e+00 4.4764402764
makesaab -8.641071e-01 6.9465280115
makesubaru -3.627642e+00 3.5916310912
maketoyota 3.190514e-01 7.0520317932
makevolkswagen -1.282779e-01 7.0505709494
makevolvo 1.261057e+00 9.1768666447
aspirturbo -4.668568e+00 -1.7414069576
stylehardtop -2.088202e+00 3.7756562640
stylehatchback -6.238389e-01 4.8981490905
stylesedan -1.423027e-01 5.4734466081
stylewagon 1.532166e-01 6.2227268622
wb -4.803811e-01 -0.1379280095
width 1.127866e-02 1.0695021269
weight -1.364723e-02 -0.0081668691
cylindersfive -2.656893e+00 5.8580823389
cylindersfour -5.072034e-01 10.9987135560
cylinderssix -2.297719e+00 9.0186063776
cylindersthree 2.116376e+00 18.7826300635
cylinderstwelve -1.179524e+01 4.7203963415
fuel.sys2bbl -7.997001e+00 -0.4644003955
fuel.sysidi -6.404693e-01 7.7724981337
fuel.sysmfi -1.475087e+01 -1.5430860747
fuel.sysmpfi -8.849510e+00 -1.1092831268
fuel.sysspdi -1.112945e+01 -2.0340271154
fuel.sysspfi -8.650982e+00 6.9959366661
rpm -3.745829e-03 -0.0012604514
price -3.448908e-05 0.0003367182
library(effects)
carData <- carData %>% mutate(aspir = factor(aspir), style = factor(style),
make = factor(make), fuel.sys = factor(fuel.sys), cylinders = factor(cylinders))
alteredModel <- update(alteredModel, .~., data = carData)
plot(effect(term = "style", mod = alteredModel))
plot(effect(term = "rpm", mod = alteredModel))
plot(effect(term = "weight", mod = alteredModel))
We've covered:
Upcoming: