
Statistical elimination

  • Including a third variable in the model allows its influence to be eliminated, so we call the process statistical elimination
  • Although this is not strictly how it is done in practice, the logic of statistical elimination can be understood by looking at residuals

Elimination by considering residuals

Are the children who are taller than expected for their age also better than average for their age at maths?
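
As a minimal sketch of this residual logic (using a hypothetical data frame kids with columns AGE, HEIGHT and MATHS, not part of the lecture data), we can regress both height and maths score on age and then relate the two sets of residuals:

# Hypothetical data: one row per child, columns AGE, HEIGHT, MATHS
height_resid <- resid(lm(HEIGHT ~ AGE, data = kids))  # height not explained by age
maths_resid  <- resid(lm(MATHS ~ AGE, data = kids))   # maths score not explained by age
summary(lm(maths_resid ~ height_resid))  # are taller-than-expected children also better than expected at maths?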

But in what order should I eliminate explanatory variables?

  • The order doesn’t matter if your data are balanced (an equal number of observations for every combination of factor levels) and orthogonal (Dr. Ott, future lecture)
  • Otherwise the order matters, and this leads to different ways of calculating sums of squares (see the sketch below)
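
As a small simulated sketch (not the lecture data) of why order is irrelevant in a balanced, orthogonal design, the sequential sums of squares below are the same whichever factor is fitted first:

set.seed(1)
d <- expand.grid(A = factor(c("a1", "a2")), B = factor(c("b1", "b2")), rep = 1:10)  # equal cell sizes
d$y <- as.numeric(d$A) + 2 * as.numeric(d$B) + rnorm(nrow(d))
anova(lm(y ~ A + B, data = d))  # sequential SS for A and B ...
anova(lm(y ~ B + A, data = d))  # ... are identical in either order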

Two types of sum of squares

  • Type I or sequential SS
    • Amount of variation explained by a variable once the terms listed before it in the model have been statistically eliminated
    • In R: anova() or aov()
  • Type II or adjusted SS
    • Amount of variation explained by a variable once all other explanatory variables in the model have been statistically eliminated
    • In R: Anova() in the car package (both calls are sketched below)
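
As a hedged sketch of the two calls on some fitted additive model (fit, y, x1, x2 and mydata are placeholder names, not lecture objects):

library(car)
fit <- lm(y ~ x1 + x2, data = mydata)  # placeholder model
anova(fit)   # Type I: each term adjusted only for the terms listed before it
Anova(fit)   # Type II: each term adjusted for all other terms in the model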

Eliminating a third variable makes the second less informative

One explanatory variable

anova(lm(WGHT ~ RLEG, data = legs))
## Analysis of Variance Table
## 
## Response: WGHT
##           Df Sum Sq Mean Sq F value    Pr(>F)    
## RLEG       1 3627.7  3627.7  125.75 < 2.2e-16 ***
## Residuals 98 2827.1    28.8                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Eliminating a third variable makes the second less informative

Sequential SS (Type I)

anova(lm(WGHT ~ RLEG + LLEG, data = legs))
## Analysis of Variance Table
## 
## Response: WGHT
##           Df Sum Sq Mean Sq  F value Pr(>F)    
## RLEG       1 3627.7  3627.7 127.4416 <2e-16 ***
## LLEG       1   66.0    66.0   2.3173 0.1312    
## Residuals 97 2761.1    28.5                    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
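
These sequential SS depend on the order of the terms. Reversing it (a sketch; output not shown) would credit LLEG with most of the variation and RLEG only with what LLEG leaves unexplained:

anova(lm(WGHT ~ LLEG + RLEG, data = legs))  # same model, but LLEG is now eliminated first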

Eliminating a third variable makes the second less informative

Adjusted SS (Type II)

library(car)
## Loading required package: carData
Anova(lm(WGHT ~ RLEG + LLEG, data = legs))
## Anova Table (Type II tests)
## 
## Response: WGHT
##            Sum Sq Df F value  Pr(>F)  
## RLEG        83.33  1  2.9275 0.09028 .
## LLEG        65.96  1  2.3173 0.13120  
## Residuals 2761.14 97                  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
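
Right and left leg lengths are presumably very strongly correlated, so each explains very little of the weight variation once the other has been statistically eliminated, and both adjusted SS are small. For an additive model like this one, the same adjusted F-tests can also be obtained (a sketch; output not shown) by dropping each term from the full model in turn:

drop1(lm(WGHT ~ RLEG + LLEG, data = legs), test = "F")  # each term tested after eliminating the other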

Eliminating a third variable makes the second more informative

One explanatory variable

anova(lm(POETSAGE ~ BYEAR, data = poets))
## Analysis of Variance Table
## 
## Response: POETSAGE
##           Df Sum Sq Mean Sq F value Pr(>F)
## BYEAR      1    1.2    1.16  0.0035 0.9541
## Residuals 10 3333.5  333.35

Eliminating a third variable makes the second more informative

Sequential SS (Type I)

anova(lm(POETSAGE ~ BYEAR + DYEAR, data = poets))
## Analysis of Variance Table
## 
## Response: POETSAGE
##           Df Sum Sq Mean Sq    F value    Pr(>F)    
## BYEAR      1    1.2     1.2     3.5657   0.09158 .  
## DYEAR      1 3330.6  3330.6 10222.2001 4.596e-15 ***
## Residuals  9    2.9     0.3                         
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Eliminating a third variable makes the second more informative

Adjusted SS (Type II)

library(car)
Anova(lm(POETSAGE ~ BYEAR + DYEAR, data = poets))
## Anova Table (Type II tests)
## 
## Response: POETSAGE
##           Sum Sq Df F value    Pr(>F)    
## BYEAR     3299.7  1   10127 4.793e-15 ***
## DYEAR     3330.6  1   10222 4.596e-15 ***
## Residuals    2.9  9                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
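
A simulated analogue (not the lecture data) of why elimination helps here: if the response is essentially the difference between two strongly related explanatory variables, as a poet’s age is presumably roughly DYEAR minus BYEAR, then neither variable looks informative on its own, but each becomes highly informative once the other has been statistically eliminated:

set.seed(1)
byear <- round(runif(12, 1700, 1900))      # simulated birth years
dyear <- byear + round(rnorm(12, 60, 15))  # death years = birth year + lifespan
age   <- dyear - byear + rnorm(12, 0, 1)   # response is (almost) the difference
anova(lm(age ~ byear))                     # byear alone should explain little: age is essentially the lifespan
car::Anova(lm(age ~ byear + dyear))        # adjusted for dyear, byear becomes highly informative (and vice versa)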