W3 Practical Support

Author

Swidbert R. Ott

Scenario

Study aim: Does anxiety in lab mice depend on social environment and cage enrichment?

Study design: Mice from three litters were kept either in solitary confinement, in pairs, or in groups of five. The mice were either kept in bare cages, or had a running wheel and/or toys in their cage. After a set number of weeks, the mice were tested in a behavioural assay that produces an anxiety score.

  • anxiety score (fear): dependent variable.
  • litter (litter): candidate blocking variable: 1 = litter 1, 2 = litter 2, 3 = litter 3.
  • social environment (social): 1 = solitary, 2 = in pairs, 3 = in group of five.
  • cage enrichment (play): 1 = bare cage, 2 = running wheel, 3 = toys, 4 = wheel + toys.

Loading the data

Mice <- read.csv(file = "w3-classdemo.csv")

Setting up for the analysis

summary(Mice)
      fear           litter      social       play     
 Min.   :2.217   Min.   :1   Min.   :1   Min.   :1.00  
 1st Qu.:3.542   1st Qu.:1   1st Qu.:1   1st Qu.:1.75  
 Median :4.107   Median :2   Median :2   Median :2.50  
 Mean   :4.059   Mean   :2   Mean   :2   Mean   :2.50  
 3rd Qu.:4.547   3rd Qu.:3   3rd Qu.:3   3rd Qu.:3.25  
 Max.   :5.517   Max.   :3   Max.   :3   Max.   :4.00  

Aha. As expected, R has treated our predictors litter, social and play as continuous variables rather than categorical variables (factors), because it sees the values (1, 2, 3) as numbers, not as ‘factor levels’.

So, I first need to turn litter, social and play into factors (categorical variables):

# Simple way - repeat this for all factors:
Mice$litter <- factor(Mice$litter)

# Fancy way -- avoids having to write `Mice$` over and over:
Mice <- within(Mice,{
  litter <- factor(litter)
  social <- factor(social)
  play <- factor(play)
})

Next, I set up sum contrasts. This can be done in two ways:

either by changing the global default setting in R from treatment contrasts to sum contrasts,

# Change global default (Eamonn Mallon style)
options(contrasts = rep ("contr.sum", 2))

or by assigning sum contrasts to each factor variable in the data frame,

# Repeat this for all factors:
contrasts(Mice$litter) <- contr.sum

# or, more fancy way:
Mice <- within(Mice, {
  contrasts(litter) <- contr.sum
  contrasts(social) <- contr.sum
  contrasts(play) <- contr.sum
})

R has a large number of global settings – you can see them all with options(). This includes the default for what contrasts to use with factors. A factor can also be assigned its own contrasts, and these will override the global default. Think of each factor as having a little drawer or slot where you can put its own contrasts: so, a factor variable has slots for each of its data values, and an extra slot where it can store the contrasts. When you first declare a variable as a factor, that extra slot is empty – R doesn’t put a copy of the current default contrasts into that slot. Therefore, when you fit a model that includes a factor, R will use whatever contrasts are the default at the time you fit a model – unless you’ve ‘put something into the factor’s contrast slot’ by assigning it its own contrast setting. This is why we can declare litter a factor first, and later change the global default contrasts.

Assessing orthogonality

Using the anova function in base R

NoteCode annotations

Below, I’m using Quarto code annotations rather than R code comments. Below the code blocks, you’ll see a numbered list. If you click on the circled numbers ①, ②, ③ in the list below the code, it will highlight the code that the annotation refers to (clicking on the circled numbers in the code block won’t do anything).

1fit.1 <- lm(fear ~ litter + social + play, data = Mice)
2fit.1a <- update(fit.1, . ~ play + litter + social)

3anova(fit.1)
anova(fit.1a)
1
fitting the full model.
2
using update() to make a new model with predictors in different order. This is just an alternative to writing out a new model from scratch with lm(...).
3
print the ANOVA tables for both: recall that anova gives sequential SSQ.
Analysis of Variance Table

Response: fear
          Df Sum Sq Mean Sq F value    Pr(>F)    
litter     2 5.1923  2.5962 13.3558 8.455e-05 ***
social     2 6.6047  3.3023 16.9886 1.476e-05 ***
play       3 1.5403  0.5134  2.6413   0.06885 .  
Residuals 28 5.4428  0.1944                      
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Analysis of Variance Table

Response: fear
          Df Sum Sq Mean Sq F value    Pr(>F)    
play       3 1.5403  0.5134  2.6413   0.06885 .  
litter     2 5.1923  2.5962 13.3558 8.455e-05 ***
social     2 6.6047  3.3023 16.9886 1.476e-05 ***
Residuals 28 5.4428  0.1944                      
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

TASK: Print the ANOVA tables from the two versions of the model to obtain the sequential SSQs for both. Are they different? Write down in your notes what this means for the orthogonality of the data.

Comparing the ANOVA tables, I see that for each of the three predictors, the SSQ (Sum Sq) values are the same in fit.1 and fit.1a (and so are the mean squares, F- and p- values.). So my dataset is fully orthogonal.

Using Anova from car package

Note

If you have trouble remembering the package name car – it is short for Correlation And Regression, not automobiles.

1library(car)
2Anova(fit.1)
1
getting adjusted SSQ requires the Anova() function from package car, which isn’t loaded by default so we need to load it first.
2
print the adjusted SSQ ANOVA table, and compare to table from anova()
Anova Table (Type II tests)

Response: fear
          Sum Sq Df F value    Pr(>F)    
litter    5.1923  2 13.3558 8.455e-05 ***
social    6.6047  2 16.9886 1.476e-05 ***
play      1.5403  3  2.6413   0.06885 .  
Residuals 5.4428 28                      
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

I see that the sequential and the adjusted SSQ (Sum Sq) values are the same (and hence they give the same mean squares, F- and p- values). So the conclusion from this is the same as above – my dataset is fully orthogonal.

Remember that even in a sequential SSQ table (“Type I ANOVA”), the SSQ of any predictor are adjusted for all predictors that come before it in the model formula (and are above it in the table). Only the SSQ for the first predictor in the model are ‘fully unadjusted.’ The SSQ for the second predictor are adjusted for the first predictor, but not for the later predictors, and so on. So when you compare the output from anova and Anova for the same model fit, the bottom row will always be the same, even if the dataset is not orthogonal.

Assessing the merit of blocking

I fit a new model that omits the blocking variable (litter) – I’m using update() but you could just as well write out a new lm() from scratch:

fit.2 <- update(fit.1, . ~ social + play)

TASK: In your notes, write down how dropping [the blocking variable] has changed the model results.

  • How have the residual SSQ changed, and why?
  • How have the residual DoF changed, and why?
  • What about the adjusted SSQ and the \(F\)-ratios for [the predictors of interest]?
  • What happened to the \(P\)-values for [the predictors of interest]?

I print the ANOVA tables for the full model and for the model that omits the blocking variable.

anova(fit.1)  # original model from earlier
Analysis of Variance Table

Response: fear
          Df Sum Sq Mean Sq F value    Pr(>F)    
litter     2 5.1923  2.5962 13.3558 8.455e-05 ***
social     2 6.6047  3.3023 16.9886 1.476e-05 ***
play       3 1.5403  0.5134  2.6413   0.06885 .  
Residuals 28 5.4428  0.1944                      
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
anova(fit.2)  # reduced model without `litter`
Analysis of Variance Table

Response: fear
          Df  Sum Sq Mean Sq F value    Pr(>F)    
social     2  6.6047  3.3023  9.3154 0.0007131 ***
play       3  1.5403  0.5134  1.4483 0.2484202    
Residuals 30 10.6352  0.3545                      
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

To assess the effect of dropping the blocking variable on the standard errors, we need to look at the model summaries, not the ANOVA tables.

summary(fit.1)

Call:
lm(formula = fear ~ litter + social + play, data = Mice)

Residuals:
    Min      1Q  Median      3Q     Max 
-0.8682 -0.1909  0.1149  0.2218  0.8132 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  4.05852    0.07348  55.231  < 2e-16 ***
litter1      0.15804    0.10392   1.521  0.13951    
litter2      0.36551    0.10392   3.517  0.00151 ** 
social1     -0.53797    0.10392  -5.177 1.71e-05 ***
social2      0.51009    0.10392   4.909 3.56e-05 ***
play1        0.12363    0.12727   0.971  0.33968    
play2        0.20561    0.12727   1.615  0.11743    
play3        0.00770    0.12727   0.060  0.95219    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.4409 on 28 degrees of freedom
Multiple R-squared:  0.7102,    Adjusted R-squared:  0.6377 
F-statistic: 9.802 on 7 and 28 DF,  p-value: 3.967e-06
summary(fit.2)

Call:
lm(formula = fear ~ social + play, data = Mice)

Residuals:
     Min       1Q   Median       3Q      Max 
-1.23242 -0.39949 -0.00796  0.36757  0.97121 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  4.05852    0.09923  40.899  < 2e-16 ***
social1     -0.53797    0.14034  -3.833 0.000602 ***
social2      0.51009    0.14034   3.635 0.001031 ** 
play1        0.12363    0.17188   0.719 0.477526    
play2        0.20561    0.17188   1.196 0.240977    
play3        0.00770    0.17188   0.045 0.964565    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.5954 on 30 degrees of freedom
Multiple R-squared:  0.4337,    Adjusted R-squared:  0.3393 
F-statistic: 4.595 on 5 and 30 DF,  p-value: 0.003132

Now, simply copy / paste from the (rounded) values in the summary tables, and round (again) to 2 decimals. Here’s what I get with my data, for the fold-change of the SE for social:

round(0.14034 / 0.10392, 2)
[1] 1.35

So in my data, the SE for social is \(1.35\times\) larger after omitting the blocking variable (litter). That’s bad – we want small SEs.

The Raid by Animal Liberation Activists

Well-meaning activists have broken into our animal facility and set free three of our mice! We need to see how this has affected our experiment, in terms of balance and orthogonality, and the conclusions we can draw.

Set up the new data

Mice.2 <- read.csv(file = "w3-classdemo-2.csv")

1Mice.2 <- within(Mice.2,{
  litter <- factor(litter)
  social <- factor(social)
  play <- factor(play)
})

2Mice.2 <- within(Mice.2, {
  contrasts(litter) <- contr.sum
  contrasts(social) <- contr.sum
  contrasts(play) <- contr.sum
})
1
don’t forget you need to again set up all predictors as factors..
2
… and set up sum contrasts (you don’t need to do this if you set up sum contrasts as the default; I prefer doing it explicitly).

Checking balance

table(Mice.2$litter)

 1  2  3 
11 10 12 

My data are no longer balanced for litter.

table(Mice.2$social)

 1  2  3 
11 11 11 

My data are still balanced for social.

table(Mice.2$play)

1 2 3 4 
8 8 8 9 

My data are no longer balanced for play.

Checking orthogonality

TASK: Now use one of the two SSQ methods that we used earlier to assess whether the design is still orthogonal after removal of [the three observations]. Analyse [the new data set] and write down in your notes whether this fundamentally alters your conclusions about the effect of [the predictors of interest].

Method 1, based on sequential SSQ (Type I ANOVAs) and swapping the order of EVs:

fit.3 <- lm(fear ~ litter + social + play, Mice.2)
fit.4 <- update(fit.3, . ~ play + litter + social)
anova(fit.3)
Analysis of Variance Table

Response: fear
          Df Sum Sq Mean Sq F value    Pr(>F)    
litter     2 4.4176  2.2088 12.6214 0.0001625 ***
social     2 7.0951  3.5475 20.2712 5.858e-06 ***
play       3 1.7602  0.5867  3.3527 0.0348751 *  
Residuals 25 4.3751  0.1750                      
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
anova(fit.4)
Analysis of Variance Table

Response: fear
          Df Sum Sq Mean Sq F value    Pr(>F)    
play       3 1.2554  0.4185  2.3911 0.0925466 .  
litter     2 4.5673  2.2837 13.0492 0.0001316 ***
social     2 7.4502  3.7251 21.2858 4.001e-06 ***
Residuals 25 4.3751  0.1750                      
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Method 2, comparing sequential and adjusted SSQ (Type I vs. Type II):

anova(fit.3)  # seq. / Type I
Analysis of Variance Table

Response: fear
          Df Sum Sq Mean Sq F value    Pr(>F)    
litter     2 4.4176  2.2088 12.6214 0.0001625 ***
social     2 7.0951  3.5475 20.2712 5.858e-06 ***
play       3 1.7602  0.5867  3.3527 0.0348751 *  
Residuals 25 4.3751  0.1750                      
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Anova(fit.3)  # adj. / Type II
Anova Table (Type II tests)

Response: fear
          Sum Sq Df F value    Pr(>F)    
litter    4.4429  2 12.6937 0.0001568 ***
social    7.4502  2 21.2858 4.001e-06 ***
play      1.7602  3  3.3527 0.0348751 *  
Residuals 4.3751 25                      
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

In this example, the escape of the three mice means that the design / the dataset is no longer orthogonal. The conclusions that we can draw from the data are broadly the same as before, but note a somewhat perplexing detail:

If I put play first in the model, the F-value from a sequential ANOVA is smaller than in the original data, and the p-value is even ‘less significant’ than before. But the F-value based on adjusted SSQ is larger than in the original data, and play comes out ‘significant’!

This just goes to show that using a hard significance cutoff of e.g. \(\alpha = 0.05\) is not sensible.