MT5762 Lecture 18

C. Donovan

More on linear models

We've seen a number of ways of increasing the complexity of the model of signal and how we might select between different models

Here we return to model diagnostics and then apply all we've covered to an analysis problem

The linear model form - recap

We model using the following - where the design matrix X can have cunning form to allow a wide range of curvature, interactions, factor variables etc

\[ y = \mathbf{X}\boldsymbol{\beta} + \mathbf{e} \]

  • Additionally we assume the errors (e) are iid Normal.
  • On this basis we're optimising the \( \hat{\boldsymbol{\beta}} \) - minimising the Residual Sums of Squares (RSS, i.e. OLS estimates) or maximising the (log-)likelihood. These are equivalent for Normal errors.

Model diagnostics

So, we've fitted a model. Do we believe it?

We check the model for signal (can our design matrix X capture it?)

We check the model for noise (does the noise look iid Normal?)

The primary means to check both are the residuals

Model diagnostics

As for simpler models, want to check:

  • Because we're assuming iid Normal errors:
    • Shape of error distribution
    • Independence of errors
    • Constant spread everywhere
  • Because X should capture all the signal
    • No pattern in the errors
  • (New) we assume the covariates offer independent information about \( y \):
    • Collinearity (relationships between the individal x) is acceptable

Data

We'll use a little dataset from Wood's (2006) GAM book. Data is on a number of different makes/models of car, mainly for modelling fuel efficiency:

  • symbol Insurers measure of relative riskiness of car from -3 (safe) to 3 (risky)
  • loss average insurance loss payment per insured vehicle per year.
  • hw.mpg Fuel consumption on highway US MPG.
  • city.mpg Fuel consumption in town as US MPG .
  • make Name of car maker.
  • fuel 2 level factor. gas or diesel.
  • aspir 2 level factor. std or turbo.
  • doors 2 level factor. two or four.
  • style Factor indicating style of car.
  • drive 3 level factor indicating front, rear or all wheel drive: fwd, rwd or 4wd.
  • eng.loc Engine location
  • wb wheel base in inches
  • length in inches
  • width in inches
  • height in inches
  • weight in pounds
  • eng.type Factor giving engine type
  • cylinders Factor for number of cylinders
  • eng.cc cubic capicity of engine in cubic inches.
  • fuel.sys fuel system
  • bore in inches
  • stroke in inches
  • comp.ratio compression ratio
  • hp horse power
  • rpm maximum RPM
  • price in US dollars

An initial model

  library(tidyverse)
  library(car)
  carData <- read_csv("data/mpg.csv")

  carData <- carData %>% select(-city.mpg, -loss)
  head(carData)
# A tibble: 6 x 24
  symbol make  fuel  aspir doors style drive eng.loc    wb length width
   <dbl> <chr> <chr> <chr> <chr> <chr> <chr> <chr>   <dbl>  <dbl> <dbl>
1      3 alfa~ gas   std   two   conv~ rwd   front    88.6   169.  64.1
2      3 alfa~ gas   std   two   conv~ rwd   front    88.6   169.  64.1
3      1 alfa~ gas   std   two   hatc~ rwd   front    94.5   171.  65.5
4      2 audi  gas   std   four  sedan fwd   front    99.8   177.  66.2
5      2 audi  gas   std   four  sedan 4wd   front    99.4   177.  66.4
6      2 audi  gas   std   two   sedan fwd   front    99.8   177.  66.3
# ... with 13 more variables: height <dbl>, weight <dbl>, eng.type <chr>,
#   cylinders <chr>, eng.cc <dbl>, fuel.sys <chr>, bore <dbl>,
#   stroke <dbl>, comp.ratio <dbl>, hp <dbl>, rpm <dbl>, hw.mpg <dbl>,
#   price <dbl>

An initial model

  carData <- na.omit(carData)

  carModel <- lm(hw.mpg ~ ., data = carData) 

  summary(carModel)

Call:
lm(formula = hw.mpg ~ ., data = carData)

Residuals:
    Min      1Q  Median      3Q     Max 
-7.4665 -1.0138  0.0000  0.8741 10.0551 

Coefficients: (3 not defined because of singularities)
                    Estimate Std. Error t value Pr(>|t|)    
(Intercept)       43.6843785 24.2290090   1.803 0.073590 .  
symbol             0.1353545  0.3650639   0.371 0.711382    
makeaudi           1.8115468  3.2796769   0.552 0.581605    
makebmw            4.1339222  3.1813149   1.299 0.195976    
makechevrolet      9.7704927  2.9539487   3.308 0.001202 ** 
makedodge          5.4576878  2.5950907   2.103 0.037287 *  
makehonda          1.0326973  3.0152128   0.342 0.732503    
makeisuzu          0.8437792  3.3642575   0.251 0.802339    
makejaguar         7.3937479  3.7612187   1.966 0.051346 .  
makemazda          3.1229455  2.3196991   1.346 0.180438    
makemercedes-benz  2.9225176  3.4091267   0.857 0.392797    
makemercury        4.5979902  3.9959076   1.151 0.251870    
makemitsubishi     5.2088118  2.6149771   1.992 0.048368 *  
makenissan         4.9401008  2.2794148   2.167 0.031943 *  
makepeugot        -4.4386591  6.1719266  -0.719 0.473262    
makeplymouth       5.8813652  2.5417841   2.314 0.022163 *  
makeporsche       -0.4360866  4.0568461  -0.107 0.914554    
makesaab           1.9324417  2.9519237   0.655 0.513798    
makesubaru         1.3023219  2.6272432   0.496 0.620901    
maketoyota         4.3685821  2.1244222   2.056 0.041646 *  
makevolkswagen     3.6078869  2.5307096   1.426 0.156246    
makevolvo          5.9318178  2.9132783   2.036 0.043664 *  
fuelgas           16.6191311  9.1883955   1.809 0.072689 .  
aspirturbo        -2.7709378  1.1168920  -2.481 0.014315 *  
doorstwo          -0.2250915  0.6880396  -0.327 0.744055    
stylehardtop       1.7103202  1.6374963   1.044 0.298106    
stylehatchback     2.9727311  1.5375638   1.933 0.055249 .  
stylesedan         3.4966761  1.6543631   2.114 0.036361 *  
stylewagon         3.6930945  1.7811426   2.073 0.040004 *  
drivefwd           2.0511821  1.2586652   1.630 0.105473    
driverwd           1.4852133  1.6848122   0.882 0.379575    
eng.locrear        1.6195561  3.8399158   0.422 0.673855    
wb                -0.2885159  0.1292921  -2.232 0.027272 *  
length            -0.0468275  0.0715338  -0.655 0.513810    
width              0.3968397  0.3179012   1.248 0.214046    
height             0.0119531  0.2025530   0.059 0.953028    
weight            -0.0070225  0.0023340  -3.009 0.003122 ** 
eng.typel          8.8880828  5.7047931   1.558 0.121540    
eng.typeohc       -1.0931503  1.6389832  -0.667 0.505914    
eng.typeohcf              NA         NA      NA       NA    
eng.typeohcv      -0.3687564  1.7155451  -0.215 0.830126    
cylindersfive      2.4649711  3.9672687   0.621 0.535417    
cylindersfour      5.0305364  4.8358107   1.040 0.300048    
cylinderssix       0.6094793  3.6838421   0.165 0.868836    
cylindersthree            NA         NA      NA       NA    
cylinderstwelve   -9.1650619  7.1394520  -1.284 0.201407    
eng.cc            -0.0134897  0.0364468  -0.370 0.711864    
fuel.sys2bbl      -4.5292855  2.0180164  -2.244 0.026409 *  
fuel.sysidi               NA         NA      NA       NA    
fuel.sysmfi       -6.9314111  3.6450686  -1.902 0.059325 .  
fuel.sysmpfi      -5.0116028  2.1110270  -2.374 0.018983 *  
fuel.sysspdi      -5.5823819  2.4816961  -2.249 0.026081 *  
fuel.sysspfi      -2.9327732  4.2016856  -0.698 0.486361    
bore              -2.1766848  2.5285287  -0.861 0.390825    
stroke             0.1328865  1.3648168   0.097 0.922578    
comp.ratio         1.4214909  0.6746679   2.107 0.036945 *  
hp                 0.0011179  0.0340103   0.033 0.973826    
rpm               -0.0032222  0.0008884  -3.627 0.000404 ***
price              0.0002182  0.0001158   1.885 0.061567 .  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2.384 on 137 degrees of freedom
Multiple R-squared:  0.9128,    Adjusted R-squared:  0.8777 
F-statistic: 26.06 on 55 and 137 DF,  p-value: < 2.2e-16

An initial model


Call:
lm(formula = hw.mpg ~ ., data = carData)

Residuals:
    Min      1Q  Median      3Q     Max 
-7.4665 -1.0138  0.0000  0.8741 10.0551 

Coefficients: (3 not defined because of singularities)
                    Estimate Std. Error t value Pr(>|t|)    
(Intercept)       43.6843785 24.2290090   1.803 0.073590 .  
symbol             0.1353545  0.3650639   0.371 0.711382    
makeaudi           1.8115468  3.2796769   0.552 0.581605    
makebmw            4.1339222  3.1813149   1.299 0.195976    
makechevrolet      9.7704927  2.9539487   3.308 0.001202 ** 
makedodge          5.4576878  2.5950907   2.103 0.037287 *  
makehonda          1.0326973  3.0152128   0.342 0.732503    
makeisuzu          0.8437792  3.3642575   0.251 0.802339    
makejaguar         7.3937479  3.7612187   1.966 0.051346 .  
makemazda          3.1229455  2.3196991   1.346 0.180438    
makemercedes-benz  2.9225176  3.4091267   0.857 0.392797    
makemercury        4.5979902  3.9959076   1.151 0.251870    
makemitsubishi     5.2088118  2.6149771   1.992 0.048368 *  
makenissan         4.9401008  2.2794148   2.167 0.031943 *  
makepeugot        -4.4386591  6.1719266  -0.719 0.473262    
makeplymouth       5.8813652  2.5417841   2.314 0.022163 *  
makeporsche       -0.4360866  4.0568461  -0.107 0.914554    
makesaab           1.9324417  2.9519237   0.655 0.513798    
makesubaru         1.3023219  2.6272432   0.496 0.620901    
maketoyota         4.3685821  2.1244222   2.056 0.041646 *  
makevolkswagen     3.6078869  2.5307096   1.426 0.156246    
makevolvo          5.9318178  2.9132783   2.036 0.043664 *  
fuelgas           16.6191311  9.1883955   1.809 0.072689 .  
aspirturbo        -2.7709378  1.1168920  -2.481 0.014315 *  
doorstwo          -0.2250915  0.6880396  -0.327 0.744055    
stylehardtop       1.7103202  1.6374963   1.044 0.298106    
stylehatchback     2.9727311  1.5375638   1.933 0.055249 .  
stylesedan         3.4966761  1.6543631   2.114 0.036361 *  
stylewagon         3.6930945  1.7811426   2.073 0.040004 *  
drivefwd           2.0511821  1.2586652   1.630 0.105473    
driverwd           1.4852133  1.6848122   0.882 0.379575    
eng.locrear        1.6195561  3.8399158   0.422 0.673855    
wb                -0.2885159  0.1292921  -2.232 0.027272 *  
length            -0.0468275  0.0715338  -0.655 0.513810    
width              0.3968397  0.3179012   1.248 0.214046    
height             0.0119531  0.2025530   0.059 0.953028    
weight            -0.0070225  0.0023340  -3.009 0.003122 ** 
eng.typel          8.8880828  5.7047931   1.558 0.121540    
eng.typeohc       -1.0931503  1.6389832  -0.667 0.505914    
eng.typeohcf              NA         NA      NA       NA    
eng.typeohcv      -0.3687564  1.7155451  -0.215 0.830126    
cylindersfive      2.4649711  3.9672687   0.621 0.535417    
cylindersfour      5.0305364  4.8358107   1.040 0.300048    
cylinderssix       0.6094793  3.6838421   0.165 0.868836    
cylindersthree            NA         NA      NA       NA    
cylinderstwelve   -9.1650619  7.1394520  -1.284 0.201407    
eng.cc            -0.0134897  0.0364468  -0.370 0.711864    
fuel.sys2bbl      -4.5292855  2.0180164  -2.244 0.026409 *  
fuel.sysidi               NA         NA      NA       NA    
fuel.sysmfi       -6.9314111  3.6450686  -1.902 0.059325 .  
fuel.sysmpfi      -5.0116028  2.1110270  -2.374 0.018983 *  
fuel.sysspdi      -5.5823819  2.4816961  -2.249 0.026081 *  
fuel.sysspfi      -2.9327732  4.2016856  -0.698 0.486361    
bore              -2.1766848  2.5285287  -0.861 0.390825    
stroke             0.1328865  1.3648168   0.097 0.922578    
comp.ratio         1.4214909  0.6746679   2.107 0.036945 *  
hp                 0.0011179  0.0340103   0.033 0.973826    
rpm               -0.0032222  0.0008884  -3.627 0.000404 ***
price              0.0002182  0.0001158   1.885 0.061567 .  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2.384 on 137 degrees of freedom
Multiple R-squared:  0.9128,    Adjusted R-squared:  0.8777 
F-statistic: 26.06 on 55 and 137 DF,  p-value: < 2.2e-16

An initial model

  Anova(carModel)
Anova Table (Type II tests)

Response: hw.mpg
           Sum Sq  Df F value    Pr(>F)    
symbol       0.78   1  0.1375 0.7113821    
make       164.68  18  1.6104 0.0653575 .  
fuel                0                      
aspir       34.97   1  6.1550 0.0143145 *  
doors        0.61   1  0.1070 0.7440545    
style       28.41   4  1.2504 0.2926659    
drive       16.63   2  1.4636 0.2349912    
eng.loc             0                      
wb          28.29   1  4.9796 0.0272724 *  
length       2.43   1  0.4285 0.5138097    
width        8.85   1  1.5583 0.2140462    
height       0.02   1  0.0035 0.9530282    
weight      51.43   1  9.0529 0.0031223 ** 
eng.type     2.56   2  0.2251 0.7987346    
cylinders   37.28   4  1.6404 0.1676106    
eng.cc       0.78   1  0.1370 0.7118644    
fuel.sys    37.21   5  1.3100 0.2633417    
bore         4.21   1  0.7411 0.3908246    
stroke       0.05   1  0.0095 0.9225782    
comp.ratio  25.22   1  4.4392 0.0369448 *  
hp           0.01   1  0.0011 0.9738258    
rpm         74.73   1 13.1548 0.0004036 ***
price       20.18   1  3.5527 0.0615671 .  
Residuals  778.31 137                      
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

An initial model

  carModel <- step(carModel)
Start:  AIC=381.13
hw.mpg ~ symbol + make + fuel + aspir + doors + style + drive + 
    eng.loc + wb + length + width + height + weight + eng.type + 
    cylinders + eng.cc + fuel.sys + bore + stroke + comp.ratio + 
    hp + rpm + price


Step:  AIC=381.13
hw.mpg ~ symbol + make + fuel + aspir + doors + style + drive + 
    wb + length + width + height + weight + eng.type + cylinders + 
    eng.cc + fuel.sys + bore + stroke + comp.ratio + hp + rpm + 
    price


Step:  AIC=381.13
hw.mpg ~ symbol + make + aspir + doors + style + drive + wb + 
    length + width + height + weight + eng.type + cylinders + 
    eng.cc + fuel.sys + bore + stroke + comp.ratio + hp + rpm + 
    price

             Df Sum of Sq    RSS    AIC
- eng.type    3     3.762 782.07 376.06
- hp          1     0.006 778.32 379.13
- height      1     0.020 778.33 379.13
- stroke      1     0.054 778.37 379.14
- doors       1     0.608 778.92 379.28
- eng.cc      1     0.778 779.09 379.32
- symbol      1     0.781 779.09 379.32
- length      1     2.435 780.75 379.73
- style       4    28.415 806.73 380.05
- bore        1     4.210 782.52 380.17
- make       19   167.246 945.56 380.69
<none>                    778.31 381.13
- drive       2    16.630 794.94 381.21
- width       1     8.853 787.17 381.31
- fuel.sys    6    53.772 832.08 382.02
- cylinders   4    37.276 815.59 382.16
- price       1    20.183 798.50 384.07
- comp.ratio  1    25.220 803.53 385.28
- wb          1    28.290 806.60 386.02
- aspir       1    34.968 813.28 387.61
- weight      1    51.431 829.74 391.48
- rpm         1    74.734 853.05 396.82

Step:  AIC=376.06
hw.mpg ~ symbol + make + aspir + doors + style + drive + wb + 
    length + width + height + weight + cylinders + eng.cc + fuel.sys + 
    bore + stroke + comp.ratio + hp + rpm + price

             Df Sum of Sq    RSS    AIC
- stroke      1     0.092 782.17 374.08
- height      1     0.285 782.36 374.13
- doors       1     0.531 782.61 374.19
- hp          1     0.867 782.94 374.27
- symbol      1     0.924 783.00 374.29
- length      1     1.294 783.37 374.38
- eng.cc      1     2.057 784.13 374.56
- bore        1     2.162 784.24 374.59
- style       4    28.467 810.54 374.96
- width       1     7.159 789.23 375.82
- drive       2    15.582 797.66 375.86
<none>                    782.07 376.06
- make       20   180.637 962.71 376.16
- cylinders   5    42.096 824.17 376.18
- fuel.sys    6    59.316 841.39 378.17
- price       1    27.135 809.21 380.64
- wb          1    31.948 814.02 381.78
- comp.ratio  1    32.891 814.97 382.01
- aspir       1    36.099 818.17 382.77
- weight      1    74.887 856.96 391.71
- rpm         1    78.550 860.63 392.53

Step:  AIC=374.08
hw.mpg ~ symbol + make + aspir + doors + style + drive + wb + 
    length + width + height + weight + cylinders + eng.cc + fuel.sys + 
    bore + comp.ratio + hp + rpm + price

             Df Sum of Sq    RSS    AIC
- height      1     0.257 782.42 372.14
- doors       1     0.550 782.72 372.22
- hp          1     0.819 782.99 372.28
- symbol      1     0.969 783.14 372.32
- length      1     1.334 783.50 372.41
- eng.cc      1     2.228 784.39 372.63
- bore        1     2.362 784.53 372.66
- style       4    28.425 810.59 372.97
- width       1     7.074 789.24 373.82
- drive       2    15.640 797.81 373.90
<none>                    782.17 374.08
- cylinders   5    43.631 825.80 374.56
- make       20   185.143 967.31 375.08
- fuel.sys    6    59.958 842.12 376.33
- price       1    27.641 809.81 378.78
- wb          1    31.863 814.03 379.79
- comp.ratio  1    32.875 815.04 380.03
- aspir       1    36.419 818.59 380.86
- weight      1    75.434 857.60 389.85
- rpm         1    79.671 861.84 390.80

Step:  AIC=372.14
hw.mpg ~ symbol + make + aspir + doors + style + drive + wb + 
    length + width + weight + cylinders + eng.cc + fuel.sys + 
    bore + comp.ratio + hp + rpm + price

             Df Sum of Sq    RSS    AIC
- doors       1     0.495 782.92 370.27
- symbol      1     0.823 783.25 370.35
- hp          1     0.868 783.29 370.36
- length      1     1.709 784.13 370.56
- eng.cc      1     2.112 784.54 370.66
- bore        1     2.651 785.07 370.80
- style       4    28.637 811.06 371.08
- width       1     6.833 789.26 371.82
- drive       2    15.415 797.84 371.91
<none>                    782.42 372.14
- cylinders   5    46.765 829.19 373.35
- make       20   186.325 968.75 373.37
- fuel.sys    6    61.368 843.79 374.72
- price       1    27.600 810.02 376.83
- comp.ratio  1    32.619 815.04 378.03
- aspir       1    36.648 819.07 378.98
- wb          1    37.874 820.30 379.27
- weight      1    77.838 860.26 388.45
- rpm         1    79.474 861.90 388.81

Step:  AIC=370.27
hw.mpg ~ symbol + make + aspir + style + drive + wb + length + 
    width + weight + cylinders + eng.cc + fuel.sys + bore + comp.ratio + 
    hp + rpm + price

             Df Sum of Sq    RSS    AIC
- symbol      1     0.529 783.45 368.40
- hp          1     0.901 783.82 368.49
- length      1     1.642 784.56 368.67
- eng.cc      1     2.329 785.25 368.84
- bore        1     2.439 785.36 368.87
- width       1     7.098 790.02 370.01
- drive       2    15.958 798.88 370.16
- style       4    33.083 816.00 370.25
<none>                    782.92 370.27
- make       20   187.168 970.09 371.64
- cylinders   5    48.406 831.33 371.84
- fuel.sys    6    64.333 847.25 373.51
- price       1    28.027 810.95 375.05
- comp.ratio  1    33.361 816.28 376.32
- aspir       1    36.871 819.79 377.15
- wb          1    38.619 821.54 377.56
- weight      1    77.343 860.26 386.45
- rpm         1    81.209 864.13 387.31

Step:  AIC=368.4
hw.mpg ~ make + aspir + style + drive + wb + length + width + 
    weight + cylinders + eng.cc + fuel.sys + bore + comp.ratio + 
    hp + rpm + price

             Df Sum of Sq    RSS    AIC
- hp          1     0.751 784.20 366.58
- length      1     1.797 785.25 366.84
- eng.cc      1     1.928 785.38 366.87
- bore        1     2.537 785.98 367.02
- drive       2    15.446 798.89 368.16
- width       1     7.451 790.90 368.22
<none>                    783.45 368.40
- style       4    34.356 817.80 368.68
- make       20   189.343 972.79 370.17
- cylinders   5    54.255 837.70 371.32
- fuel.sys    6    63.904 847.35 371.53
- price       1    27.515 810.96 373.06
- comp.ratio  1    34.731 818.18 374.77
- aspir       1    36.681 820.13 375.23
- wb          1    45.884 829.33 377.38
- weight      1    77.828 861.28 384.67
- rpm         1    80.776 864.22 385.33

Step:  AIC=366.58
hw.mpg ~ make + aspir + style + drive + wb + length + width + 
    weight + cylinders + eng.cc + fuel.sys + bore + comp.ratio + 
    rpm + price

             Df Sum of Sq    RSS    AIC
- eng.cc      1     1.272 785.47 364.89
- length      1     1.614 785.81 364.98
- bore        1     2.430 786.63 365.18
- width       1     7.510 791.71 366.42
- drive       2    16.325 800.52 366.56
<none>                    784.20 366.58
- style       4    35.201 819.40 367.06
- make       20   195.636 979.83 369.57
- fuel.sys    6    66.261 850.46 370.24
- cylinders   5    58.622 842.82 370.49
- price       1    28.657 812.86 371.51
- comp.ratio  1    35.630 819.83 373.16
- aspir       1    47.958 832.16 376.04
- wb          1    48.878 833.08 376.25
- weight      1    77.245 861.44 382.71
- rpm         1    99.983 884.18 387.74

Step:  AIC=364.89
hw.mpg ~ make + aspir + style + drive + wb + length + width + 
    weight + cylinders + fuel.sys + bore + comp.ratio + rpm + 
    price

             Df Sum of Sq    RSS    AIC
- length      1     2.026 787.50 363.39
- drive       2    15.669 801.14 364.71
- width       1     7.805 793.28 364.80
<none>                    785.47 364.89
- bore        1     9.426 794.90 365.20
- style       4    37.173 822.64 365.82
- make       20   200.231 985.70 368.72
- fuel.sys    6    70.758 856.23 369.54
- price       1    27.871 813.34 369.62
- comp.ratio  1    40.411 825.88 372.58
- aspir       1    47.930 833.40 374.33
- wb          1    48.959 834.43 374.56
- weight      1    84.098 869.57 382.52
- cylinders   5   130.658 916.13 384.59
- rpm         1   103.077 888.55 386.69

Step:  AIC=363.39
hw.mpg ~ make + aspir + style + drive + wb + width + weight + 
    cylinders + fuel.sys + bore + comp.ratio + rpm + price

             Df Sum of Sq     RSS    AIC
- drive       2    14.149  801.65 362.83
- width       1     6.556  794.05 362.99
<none>                     787.50 363.39
- bore        1     9.976  797.47 363.82
- style       4    35.191  822.69 363.83
- price       1    28.758  816.26 368.31
- fuel.sys    6    72.928  860.43 368.48
- comp.ratio  1    40.028  827.53 370.96
- aspir       1    46.992  834.49 372.58
- make       20   249.001 1036.50 376.42
- wb          1    82.661  870.16 380.66
- cylinders   5   129.180  916.68 382.71
- rpm         1   108.872  896.37 386.38
- weight      1   120.268  907.77 388.82

Step:  AIC=362.83
hw.mpg ~ make + aspir + style + wb + width + weight + cylinders + 
    fuel.sys + bore + comp.ratio + rpm + price

             Df Sum of Sq     RSS    AIC
- bore        1     7.251  808.90 362.57
<none>                     801.65 362.83
- style       4    34.034  835.68 362.85
- width       1    13.043  814.69 363.94
- fuel.sys    6    68.476  870.12 366.65
- price       1    28.881  830.53 367.66
- comp.ratio  1    44.355  846.00 371.22
- aspir       1    48.226  849.87 372.10
- wb          1    74.163  875.81 377.90
- make       20   273.428 1075.07 379.47
- cylinders   5   122.658  924.30 380.31
- rpm         1   116.552  918.20 387.03
- weight      1   192.270  993.92 402.32

Step:  AIC=362.57
hw.mpg ~ make + aspir + style + wb + width + weight + cylinders + 
    fuel.sys + comp.ratio + rpm + price

             Df Sum of Sq     RSS    AIC
<none>                     808.90 362.57
- width       1     12.59  821.49 363.55
- style       4     45.98  854.87 365.23
- fuel.sys    6     72.29  881.19 367.09
- price       1     28.25  837.14 367.19
- aspir       1     43.89  852.79 370.76
- comp.ratio  1     49.01  857.90 371.92
- wb          1     76.67  885.57 378.04
- cylinders   5    115.73  924.63 378.37
- rpm         1    112.51  921.40 385.70
- make       20    333.52 1142.42 389.20
- weight      1    291.55 1100.45 419.97

An initial model

  Anova(carModel)
Anova Table (Type II tests)

Response: hw.mpg
           Sum Sq  Df F value    Pr(>F)    
make       333.52  20  3.0924 4.448e-05 ***
aspir       43.89   1  8.1395 0.0049432 ** 
style       45.98   4  2.1314 0.0796671 .  
wb          76.67   1 14.2180 0.0002337 ***
width       12.59   1  2.3345 0.1286416    
weight     291.55   1 54.0649 1.170e-11 ***
cylinders  115.73   5  4.2922 0.0011189 ** 
fuel.sys    72.29   6  2.2343 0.0428853 *  
comp.ratio  49.01   1  9.0878 0.0030224 ** 
rpm        112.51   1 20.8627 1.022e-05 ***
price       28.25   1  5.2378 0.0234978 *  
Residuals  808.90 150                      
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Model diagnostics - error distribution

Once all the signal is removed, only iid Normal stuff should remain.

Shape

  • Check shape of errors by the distribution of residuals (\( y_i-\hat{y}_i \))
  • Use QQ-Norm plots and tests of Normality (e.g. Shapiro-Wilks, \( H_0 \) the data (errors) are Normally distributed)

Model diagnostics - error shape

  qqnorm(resid(carModel))
  qqline(resid(carModel))

plot of chunk unnamed-chunk-7

  shapiro.test(resid(carModel))

    Shapiro-Wilk normality test

data:  resid(carModel)
W = 0.92131, p-value = 1.166e-08

Model diagnostics - error shape

We can try to track down the extreme residuals

  hist(resid(carModel))

plot of chunk unnamed-chunk-10

Model diagnostics - error shape

We can try to track down the extreme residuals

  bigResid <- which(abs(resid(carModel))>5)
  carData[bigResid,]
# A tibble: 7 x 24
  symbol make  fuel  aspir doors style drive eng.loc    wb length width
   <dbl> <chr> <chr> <chr> <chr> <chr> <chr> <chr>   <dbl>  <dbl> <dbl>
1      2 honda gas   std   two   hatc~ fwd   front    86.6   145.  63.9
2      1 mazda gas   std   two   hatc~ fwd   front    93.1   159.  64.2
3      1 niss~ dies~ std   two   sedan fwd   front    94.5   165.  63.8
4      0 suba~ gas   std   four  sedan 4wd   front    97     172   65.4
5      0 toyo~ dies~ std   four  sedan fwd   front    95.7   166.  64.4
6      0 toyo~ gas   std   four  sedan fwd   front    95.7   166.  64.4
7     -1 toyo~ dies~ turbo four  sedan fwd   front   102.    176.  66.5
# ... with 13 more variables: height <dbl>, weight <dbl>, eng.type <chr>,
#   cylinders <chr>, eng.cc <dbl>, fuel.sys <chr>, bore <dbl>,
#   stroke <dbl>, comp.ratio <dbl>, hp <dbl>, rpm <dbl>, hw.mpg <dbl>,
#   price <dbl>

Model diagnostics - error distribution

Once all the signal is removed, only iid Normal stuff should remain.

Spread

  • Check variance of residuals (variance of the error distribution) is constant WRT \( \hat{y} \) and the x
  • Plots residuals against \( \hat{y} \) and the x
  • Test with Breusch-Pagan (\( H_0 \) the errors are homoscedastic/have constant variance)

Model diagnostics - error spread

  carResid <- resid(carModel)

  plot(fitted(carModel), carResid, ylab = 'residuals', xlab = 'Fitted values')

plot of chunk unnamed-chunk-12

  plot(carData$hp, carResid, ylab = 'residuals', xlab = 'Horse power')

plot of chunk unnamed-chunk-13

Model diagnostics - error spread

  • Testing using the Breusch-Pagan test (ncvTest - non-constant variance test).
  • Without arguments beyond the model, the test is WRT the fitted values
  ncvTest(carModel)
Non-constant Variance Score Test 
Variance formula: ~ fitted.values 
Chisquare = 43.44292, Df = 1, p = 4.3651e-11

We can be explicit about which covariate to test against:

   ncvTest(carModel, ~hp)
Non-constant Variance Score Test 
Variance formula: ~ hp 
Chisquare = 35.76066, Df = 1, p = 2.2311e-09

Model diagnostics - error distribution

Once all the signal is removed, only iid Normal stuff should remain.

Independence

  • Check serial correlation of residuals, when ordered in some logical way e.g. order of collection, \( \hat{y} \), or an x
  • Plots residuals against \( \hat{y} \) and the x
  • Test with Durbin-Watson (\( H_0 \) the errors are uncorrelated)

Model diagnostics - error independence

   plot(carData$width, carResid, ylab = 'residuals', xlab = 'car width')

plot of chunk unnamed-chunk-16

  durbinWatsonTest(carModel)
 lag Autocorrelation D-W Statistic p-value
   1     -0.07326368      2.146045   0.454
 Alternative hypothesis: rho != 0

Model diagnostics - default plots

   plot(carModel, which = 1:2)

plot of chunk unnamed-chunk-18plot of chunk unnamed-chunk-18

Model diagnostics - error distribution

While it looks like a lot of tests for different things, it is simply one assumption:

the errors are iid Normal

Model diagnostics - model for signal

If we have modelled all the signal, only iid Normal stuff should remain. So, it follows that problems in the residuals may be linked to a poor model for the signal

Signal

  • Check pattern of residuals, when ordered in some logical way e.g. order of collection, \( \hat{y} \), or an x
  • Patterns in the residuals may indicate the model for signal has the incorrect complexity - e.g. maybe curves are warranted where lines were used?

Model diagnostics - collinearity

This is a little different - it is not strictly signal or noise.

  • Ideally the covariates are independent of one another
  • This would give good coverage of our covariate space and good support for the model
  • Strong dependencies between covariates lead to unstable parameter estimates

Model diagnostics - collinearity

We can examine this using Variance Inflation Factors (VIFs).

  • These measure the instability of parameter estimates due to dependencies between covariates
  • They are non-negative and a rule of thumb is that >10 is problematic
  • If we have this, we should consider removing one of the offending covariates from the model
  • (or use something fancier, like ridge regresson or principal components regression)

Model diagnostics - collinearity

  library(GGally)
  numericOnly <- carData %>% select_if(is.numeric)

  ggpairs(numericOnly)

plot of chunk unnamed-chunk-21

Model diagnostics - collinearity

  vif(carModel)
                   GVIF Df GVIF^(1/(2*Df))
make       3.095147e+05 20        1.371725
aspir      3.301658e+00  1        1.817046
style      6.468664e+00  4        1.262850
wb         9.620137e+00  1        3.101635
width      1.140642e+01  1        3.377339
weight     1.870658e+01  1        4.325111
cylinders  2.600441e+02  5        1.743832
fuel.sys   2.906645e+04  6        2.354775
comp.ratio 1.696585e+02  1       13.025303
rpm        3.038367e+00  1        1.743091
price      2.055915e+01  1        4.534220

Model diagnostics - collinearity

  alteredModel <- update(carModel, .~.-comp.ratio)
  vif(alteredModel)
                  GVIF Df GVIF^(1/(2*Df))
make      1.873647e+05 20        1.354619
aspir     2.767284e+00  1        1.663516
style     6.267025e+00  4        1.257861
wb        9.606971e+00  1        3.099511
width     1.107595e+01  1        3.328055
weight    1.803180e+01  1        4.246387
cylinders 1.688878e+02  5        1.670167
fuel.sys  3.578376e+02  6        1.632327
rpm       2.936689e+00  1        1.713677
price     1.951306e+01  1        4.417359

Model diagnostics - collinearity

  qqnorm(resid(alteredModel))

plot of chunk unnamed-chunk-24

  shapiro.test(resid(alteredModel))

    Shapiro-Wilk normality test

data:  resid(alteredModel)
W = 0.93355, p-value = 1.011e-07
  ncvTest(alteredModel)
Non-constant Variance Score Test 
Variance formula: ~ fitted.values 
Chisquare = 39.0538, Df = 1, p = 4.1229e-10

Model diagnostics - collinearity

  summary(alteredModel)

Call:
lm(formula = hw.mpg ~ make + aspir + style + wb + width + weight + 
    cylinders + fuel.sys + rpm + price, data = carData)

Residuals:
   Min     1Q Median     3Q    Max 
-7.883 -1.177 -0.063  1.184  9.715 

Coefficients:
                    Estimate Std. Error t value Pr(>|t|)    
(Intercept)        5.839e+01  1.565e+01   3.730 0.000270 ***
makeaudi           2.376e+00  2.522e+00   0.942 0.347648    
makebmw            2.131e+00  2.050e+00   1.039 0.300434    
makechevrolet      9.216e+00  2.452e+00   3.759 0.000244 ***
makedodge          4.617e+00  1.983e+00   2.328 0.021213 *  
makehonda          5.195e-01  2.435e+00   0.213 0.831323    
makeisuzu         -8.328e-01  3.033e+00  -0.275 0.783987    
makejaguar         8.003e+00  2.772e+00   2.887 0.004454 ** 
makemazda          1.603e+00  1.860e+00   0.862 0.390234    
makemercedes-benz  4.533e+00  3.020e+00   1.501 0.135519    
makemercury        3.178e+00  3.062e+00   1.038 0.301020    
makemitsubishi     4.185e+00  1.990e+00   2.103 0.037094 *  
makenissan         3.646e+00  1.770e+00   2.060 0.041094 *  
makepeugot         4.577e+00  2.159e+00   2.120 0.035638 *  
makeplymouth       5.075e+00  2.019e+00   2.513 0.013020 *  
makeporsche       -1.999e-01  2.367e+00  -0.084 0.932797    
makesaab           3.041e+00  1.977e+00   1.539 0.125989    
makesubaru        -1.801e-02  1.827e+00  -0.010 0.992149    
maketoyota         3.686e+00  1.704e+00   2.163 0.032111 *  
makevolkswagen     3.461e+00  1.817e+00   1.905 0.058657 .  
makevolvo          5.219e+00  2.003e+00   2.605 0.010097 *  
aspirturbo        -3.205e+00  7.408e-01  -4.327 2.74e-05 ***
stylehardtop       8.437e-01  1.484e+00   0.569 0.570486    
stylehatchback     2.137e+00  1.397e+00   1.529 0.128264    
stylesedan         2.666e+00  1.421e+00   1.876 0.062632 .  
stylewagon         3.188e+00  1.536e+00   2.076 0.039631 *  
wb                -3.092e-01  8.666e-02  -3.567 0.000484 ***
width              5.404e-01  2.678e-01   2.018 0.045371 *  
weight            -1.091e-02  1.387e-03  -7.864 6.56e-13 ***
cylindersfive      1.601e+00  2.155e+00   0.743 0.458758    
cylindersfour      5.246e+00  2.912e+00   1.802 0.073603 .  
cylinderssix       3.360e+00  2.864e+00   1.173 0.242464    
cylindersthree     1.045e+01  4.218e+00   2.478 0.014329 *  
cylinderstwelve   -3.537e+00  4.179e+00  -0.846 0.398681    
fuel.sys2bbl      -4.231e+00  1.906e+00  -2.219 0.027949 *  
fuel.sysidi        3.566e+00  2.129e+00   1.675 0.096010 .  
fuel.sysmfi       -8.147e+00  3.342e+00  -2.437 0.015952 *  
fuel.sysmpfi      -4.979e+00  1.959e+00  -2.542 0.012026 *  
fuel.sysspdi      -6.582e+00  2.302e+00  -2.860 0.004844 ** 
fuel.sysspfi      -8.275e-01  3.960e+00  -0.209 0.834738    
rpm               -2.503e-03  6.290e-04  -3.980 0.000107 ***
price              1.511e-04  9.394e-05   1.609 0.109781    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2.384 on 151 degrees of freedom
Multiple R-squared:  0.9038,    Adjusted R-squared:  0.8777 
F-statistic: 34.62 on 41 and 151 DF,  p-value: < 2.2e-16

Model diagnostics - collinearity

  Anova(alteredModel)
Anova Table (Type II tests)

Response: hw.mpg
          Sum Sq  Df F value    Pr(>F)    
make      354.26  20  3.1177 3.856e-05 ***
aspir     106.36   1 18.7200 2.741e-05 ***
style      35.54   4  1.5641 0.1867782    
wb         72.30   1 12.7261 0.0004836 ***
width      23.13   1  4.0720 0.0453711 *  
weight    351.40   1 61.8503 6.555e-13 ***
cylinders  77.47   5  2.7272 0.0217222 *  
fuel.sys  622.63   6 18.2648 6.971e-16 ***
rpm        89.99   1 15.8391 0.0001068 ***
price      14.70   1  2.5878 0.1097809    
Residuals 857.90 151                      
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Model diagnostics - collinearity

  confint(alteredModel)
                          2.5 %        97.5 %
(Intercept)        2.746083e+01 89.3169847817
makeaudi          -2.606924e+00  7.3587282908
makebmw           -1.920726e+00  6.1818500668
makechevrolet      4.371420e+00 14.0609259892
makedodge          6.992340e-01  8.5338880841
makehonda         -4.291007e+00  5.3300193726
makeisuzu         -6.824992e+00  5.1592988875
makejaguar         2.526847e+00 13.4790899822
makemazda         -2.072309e+00  5.2777851596
makemercedes-benz -1.434954e+00 10.5003940472
makemercury       -2.871971e+00  9.2271074253
makemitsubishi     2.537551e-01  8.1161413725
makenissan         1.494042e-01  7.1430080285
makepeugot         3.113616e-01  8.8418359891
makeplymouth       1.084842e+00  9.0644387695
makeporsche       -4.876275e+00  4.4764402764
makesaab          -8.641071e-01  6.9465280115
makesubaru        -3.627642e+00  3.5916310912
maketoyota         3.190514e-01  7.0520317932
makevolkswagen    -1.282779e-01  7.0505709494
makevolvo          1.261057e+00  9.1768666447
aspirturbo        -4.668568e+00 -1.7414069576
stylehardtop      -2.088202e+00  3.7756562640
stylehatchback    -6.238389e-01  4.8981490905
stylesedan        -1.423027e-01  5.4734466081
stylewagon         1.532166e-01  6.2227268622
wb                -4.803811e-01 -0.1379280095
width              1.127866e-02  1.0695021269
weight            -1.364723e-02 -0.0081668691
cylindersfive     -2.656893e+00  5.8580823389
cylindersfour     -5.072034e-01 10.9987135560
cylinderssix      -2.297719e+00  9.0186063776
cylindersthree     2.116376e+00 18.7826300635
cylinderstwelve   -1.179524e+01  4.7203963415
fuel.sys2bbl      -7.997001e+00 -0.4644003955
fuel.sysidi       -6.404693e-01  7.7724981337
fuel.sysmfi       -1.475087e+01 -1.5430860747
fuel.sysmpfi      -8.849510e+00 -1.1092831268
fuel.sysspdi      -1.112945e+01 -2.0340271154
fuel.sysspfi      -8.650982e+00  6.9959366661
rpm               -3.745829e-03 -0.0012604514
price             -3.448908e-05  0.0003367182

Model diagnostics - collinearity

  library(effects)

  carData <- carData %>% mutate(aspir = factor(aspir), style = factor(style), 
                                make = factor(make), fuel.sys = factor(fuel.sys), cylinders = factor(cylinders))
  alteredModel <- update(alteredModel, .~., data = carData)

  plot(effect(term = "style", mod = alteredModel))

plot of chunk unnamed-chunk-30

Model diagnostics - collinearity

  plot(effect(term = "rpm", mod = alteredModel))

plot of chunk unnamed-chunk-31

  plot(effect(term = "weight", mod = alteredModel))

plot of chunk unnamed-chunk-32

Recap and look-forwards

We've covered:

  • Various diagnostics for noise, signal and collinearity

Upcoming:

  • Computer intensive inference - bootstrapping and randomisation tests