W3 Practical Support

Anxious Mice

Author

Swidbert R. Ott

The Scenario: Design, Data, and the Model

I’ve come up with a fictive scenario that has the same structure as what you do in the practical.

Research question: Does anxiety in lab mice depend on social environment and cage enrichment?

Study design: Mice from three litters were kept either in solitary confinement, in pairs, or in groups of five. The mice were either kept in bare cages, or had a running wheel and/or toys in their cage. After a set number of weeks, the mice were tested in a behavioural assay that produces an anxiety score.

Outcome: Anxiety score (fear)

Predictors:

  • litter (litter): candidate blocking variable: 1 = litter 1, 2 = litter 2, 3 = litter 3.
  • social environment (social): 1 = solitary, 2 = in pairs, 3 = in group of five.
  • cage enrichment (play): 1 = bare cage, 2 = running wheel, 3 = toys, 4 = wheel + toys.

Loading the data

Mice <- read.csv(file = "w3-classdemo.csv")

Setting up for the analysis

summary(Mice)
      fear           litter      social       play     
 Min.   :2.217   Min.   :1   Min.   :1   Min.   :1.00  
 1st Qu.:3.542   1st Qu.:1   1st Qu.:1   1st Qu.:1.75  
 Median :4.107   Median :2   Median :2   Median :2.50  
 Mean   :4.059   Mean   :2   Mean   :2   Mean   :2.50  
 3rd Qu.:4.547   3rd Qu.:3   3rd Qu.:3   3rd Qu.:3.25  
 Max.   :5.517   Max.   :3   Max.   :3   Max.   :4.00  

Aha. As expected, R has treated our predictors litter, social and play as continuous variables rather than categorical variables (factors), because it sees the values (1, 2, 3) as numbers, not as ‘factor levels’.

So, I first need to turn litter, social and play into factors (categorical variables):

TipCode annotations

Below, I’m using Quarto code annotations in addition to plain R code comments. Below the code blocks, you’ll see a numbered list. If you click on the circled numbers ①, ②, ③ in the list below the code, it will highlight the code that the annotation refers to (clicking on the circled numbers in the code block won’t do anything).

# Simple way - repeat this for all factors:
1Mice$litter <- factor(Mice$litter)

# Fancy way -- avoids having to write `Mice$` over and over:
2Mice <- within(Mice,{
3  litter <- factor(litter)
  social <- factor(social)
  play <- factor(play)
})
1
The downside of this is that you have to reference the data frame (i.e., write Mice$) every time you refer to one of the variables. This gets a bit tedious, and forgetting about the Mice$ part is a common source of error.
2
The within() syntax provides a more elegant way. It literally means that everything within the brackets refer to variables within the data frame (Mice). On its own, within() will return a ‘copy’ of the data i.e. it doesn’t alter the data in Mice itself, so we need to overwrite the old version of Mice, hence the Mice <- within(Mice, ...)
3
Since we are ‘within’ data frame Mice, we can refer to the variables within it without the Mice$ bit.

Next, I set up sum contrasts. This can be done in two ways: either by changing the global default setting in R from treatment contrasts to sum contrasts,

# Change global default (Eamonn Mallon style)
options(contrasts = rep ("contr.sum", 2))

or by assigning sum contrasts to each factor variable in the data frame (nothing bad will happen if you were to do both, but it’s obviously unnecessary):

# Repeat this for all factors:
contrasts(Mice$litter) <- contr.sum

# or, more fancy way (see above):
Mice <- within(Mice, {
  contrasts(litter) <- contr.sum
  contrasts(social) <- contr.sum
  contrasts(play) <- contr.sum
})

R has a large number of global settings – you can see them all with options(). This includes the default for what contrasts to use with factors. A factor can also be assigned its own contrasts, and these will override the global default. Think of each factor as having a little drawer or slot where you can put its own contrasts: so, a factor variable has slots for each of its data values, and an extra slot where it can store the contrasts. When you first declare a variable as a factor, that extra slot is empty – R doesn’t put a copy of the current default contrasts into that slot. Therefore, when you fit a model that includes a factor, R will use whatever contrasts are the default at the time you fit a model – unless you’ve ‘put something into the factor’s contrast slot’ by assigning it its own contrast setting. This is why we can declare litter a factor first, and later change the global default contrasts.

Fitting the model

Let’s start by fitting a model that is appropriate for this experimental design. In a blocked design, the blocking factor (here, litter) should always come first in the model formula.

m.1 <- lm(fear ~ litter + social + play, data = Mice)

Assessing orthogonality

Using just the anova function in base R

1m.1a <- update(m.1, . ~ play + social + litter)
1
using update() to make a new model m.1a based on m.1 but with predictors in different order. This is just an alternative to writing out a new model from scratch with lm(...).

Print the ANOVA tables from the two versions of the model to obtain the sequential SSQs for both. Are they different? Write down in your notes what this means for the orthogonality of the data.

# recall that `anova()` gives sequential SSQ
anova(m.1)
Analysis of Variance Table

Response: fear
          Df Sum Sq Mean Sq F value    Pr(>F)    
litter     2 5.1923  2.5962 13.3558 8.455e-05 ***
social     2 6.6047  3.3023 16.9886 1.476e-05 ***
play       3 1.5403  0.5134  2.6413   0.06885 .  
Residuals 28 5.4428  0.1944                      
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
anova(m.1a)
Analysis of Variance Table

Response: fear
          Df Sum Sq Mean Sq F value    Pr(>F)    
play       3 1.5403  0.5134  2.6413   0.06885 .  
social     2 6.6047  3.3023 16.9886 1.476e-05 ***
litter     2 5.1923  2.5962 13.3558 8.455e-05 ***
Residuals 28 5.4428  0.1944                      
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Comparing the ANOVA tables, I see that for each of the three predictors, the SSQ (Sum Sq) values are the same in m.1 and m.1a (and so are the mean squares, F- and p- values.). So my dataset is fully orthogonal.

What is the value of the sequential SSQ for [the blocking variable – in this example: litter] in your model m.1?

For this, you need to run anova(m.1) and look at the column Sum Sq in the output. In my data, the sequential SSQ for litter is 5.1923.

Using Anova from car package

Note

If you have trouble remembering the package name car – it is short for Correlation And Regression, not automobiles.

library(car)   # needs loading first
Anova(m.1)     # print the adjusted SSQ ANOVA table
Anova Table (Type II tests)

Response: fear
          Sum Sq Df F value    Pr(>F)    
litter    5.1923  2 13.3558 8.455e-05 ***
social    6.6047  2 16.9886 1.476e-05 ***
play      1.5403  3  2.6413   0.06885 .  
Residuals 5.4428 28                      
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

I see that the sequential and the adjusted SSQ (Sum Sq) values are the same (and hence they give the same mean squares, F- and p- values). So the conclusion from this is the same as above – my dataset is fully orthogonal.

Remember that even in a sequential SSQ table (“Type I ANOVA”), the SSQ of any predictor are adjusted for all predictors that come before it in the model formula (and are above it in the table). Only the SSQ for the first predictor in the model are ‘fully unadjusted.’ The SSQ for the second predictor are adjusted for the first predictor, but not for the later predictors, and so on. So when you compare the output from anova and Anova for the same model fit, the bottom row will always be the same, even if the dataset is not orthogonal. For the same reason, you could get the adjusted SSQ for litter by running base-R anova() on m.1a which has litter last.

What is the value of the adjusted SSQ for [the blocking variable -in this example: litter] in your m.1?

For this, you run Anova(m.1) and look at the column Sum Sq in the output. In my data, the sequential SSQ for litter is again 5.1923 – the same as the sequential SSQ.

What is your overall assessment of the orthogonality of the Mice dataset?

My Mice dataset is fully orthogonal (see above – sequential and adjusted SSQ are the same)

Was Blocking Worthwhile?

Like our lazy flower grower, our mouse researcher wonders whether he could do without blocking (for litter, in this case)

A first answer

Given the \(P\)-value for the blocking variable in your data, what advice should you give the mouse researcher?

The \(P\)-value for the blocking variable (litter) in my data is \(P=8.455\times 10^{-5} < 0.001\); so we’d say that the data provide compelling evidence in favour of blocking future experiments \((P < 0.001)\).

A deeper look…

I fit a new model that omits the blocking variable (litter) – I’m using update() but you could just as well write out a new lm() from scratch:

m.2 <- update(m.1, . ~ social + play)

In your notes, write down how dropping [the blocking variable] has changed the model results.

  • How have the residual SSQ changed, and why?
  • How have the residual DoF changed, and why?
  • What about the adjusted SSQ and the \(F\)-ratios for [the predictors of interest]?
  • What happened to the \(P\)-values for [the predictors of interest]?

I print the ANOVA tables for the full model and for the model that omits the blocking variable.

anova(m.1)  # original model from earlier
Analysis of Variance Table

Response: fear
          Df Sum Sq Mean Sq F value    Pr(>F)    
litter     2 5.1923  2.5962 13.3558 8.455e-05 ***
social     2 6.6047  3.3023 16.9886 1.476e-05 ***
play       3 1.5403  0.5134  2.6413   0.06885 .  
Residuals 28 5.4428  0.1944                      
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
anova(m.2)  # reduced model without `litter`
Analysis of Variance Table

Response: fear
          Df  Sum Sq Mean Sq F value    Pr(>F)    
social     2  6.6047  3.3023  9.3154 0.0007131 ***
play       3  1.5403  0.5134  1.4483 0.2484202    
Residuals 30 10.6352  0.3545                      
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
  • The residual SSQ are much larger without blocking – up from 5.44 to 10.64; this is because there is quite a lot of variation in the anxiety scores that is attributable to litter differences, and without litter in the model, this part of the variation is not accounted for and ends up in the residual SSQ.
  • The residual DoF are also up, from 28 to 30; this is because litter has three levels = 2 DoF, and not including it in the model frees up those 2 DoF.
  • The SSQ for the predictors of interest (social and play) haven’t changed at all (this is because the data are orthogonal), but the \(F\)-ratios are considerably smaller. That’s because the residual Mean Squares have gone up, from 0.19 to 0.35. This in turn is because the ‘savings’ in DoF (from 28 to 30) were fare outweighed by the increase in unexplained variation (residual SSQ, which almost doubled).
  • The \(P\)-values for social and play are larger without blocking – because the \(F\)-ratios are smaller.

What is the fold-change in the standard error of the effect estimates for water due to removing the blocking factor from the model? Use the round function to round your answer to two decimals.

To assess the effect of dropping the blocking variable on the standard errors, we need to look at the model summaries, not the ANOVA tables.

summary(m.1)

Call:
lm(formula = fear ~ litter + social + play, data = Mice)

Residuals:
    Min      1Q  Median      3Q     Max 
-0.8682 -0.1909  0.1149  0.2218  0.8132 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  4.05852    0.07348  55.231  < 2e-16 ***
litter1      0.15804    0.10392   1.521  0.13951    
litter2      0.36551    0.10392   3.517  0.00151 ** 
social1     -0.53797    0.10392  -5.177 1.71e-05 ***
social2      0.51009    0.10392   4.909 3.56e-05 ***
play1        0.12363    0.12727   0.971  0.33968    
play2        0.20561    0.12727   1.615  0.11743    
play3        0.00770    0.12727   0.060  0.95219    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.4409 on 28 degrees of freedom
Multiple R-squared:  0.7102,    Adjusted R-squared:  0.6377 
F-statistic: 9.802 on 7 and 28 DF,  p-value: 3.967e-06
summary(m.2)

Call:
lm(formula = fear ~ social + play, data = Mice)

Residuals:
     Min       1Q   Median       3Q      Max 
-1.23242 -0.39949 -0.00796  0.36757  0.97121 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  4.05852    0.09923  40.899  < 2e-16 ***
social1     -0.53797    0.14034  -3.833 0.000602 ***
social2      0.51009    0.14034   3.635 0.001031 ** 
play1        0.12363    0.17188   0.719 0.477526    
play2        0.20561    0.17188   1.196 0.240977    
play3        0.00770    0.17188   0.045 0.964565    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.5954 on 30 degrees of freedom
Multiple R-squared:  0.4337,    Adjusted R-squared:  0.3393 
F-statistic: 4.595 on 5 and 30 DF,  p-value: 0.003132

Now, simply copy / paste from the (rounded) values in the summary tables, and round (again) to 2 decimals. Note that it doesn’t matter whether you get your SEs from social1 or social2 – they are the same for all factor levels. Here’s what I get with my data, for the fold-change of the SE for social:

round(0.14034 / 0.10392, 2)
[1] 1.35

If you wanted to do this ‘in code’, here’s one way of doing it:

1coef.1 <- coef( summary(m.1) )
coef.2 <- coef( summary(m.2) )
2social_SE.1 <- coef.1["social1", "Std. Error"]
social_SE.2 <- coef.2["social1", "Std. Error"]

3round( round(social_SE.2, 5) / round(social_SE.1, 5), 2)
1
To get the table of coefficients with the SEs, we need to call coef() on the summary() of the model; if we run coef() on the model itself (as in coef(m.1)), we get only the values of the coefficients (try it!).
2
Saving the SEs. I’m indexing the table by row and column names – you’d first want to print the table (just type coef.1) to remind yourself what the row/col names are. Indexing by row/col number would also work, but more error prone.
3
A bit ugly to do this in one line… rounding both SEs to five decimals, then rounding the division to two decimals.
[1] 1.35

In my data, the SE for social is \(1.35\times\) larger after omitting the blocking variable (litter). That’s bad – we want small SEs.

The Raid by Animal Liberation Activists

Well-meaning activists have broken into our animal facility and set free three of our mice! We need to see how this has affected our experiment, in terms of balance and orthogonality, and the conclusions we can draw.

Mice.2 <- read.csv(file = "w3-classdemo-2.csv")

1Mice.2 <- within(Mice.2,{
  litter <- factor(litter)
  social <- factor(social)
  play <- factor(play)
})

2Mice.2 <- within(Mice.2, {
  contrasts(litter) <- contr.sum
  contrasts(social) <- contr.sum
  contrasts(play) <- contr.sum
})
1
don’t forget you need to again set up all predictors as factors..
2
… and set up sum contrasts (you don’t need to do this if you set up sum contrasts as the default; I prefer doing it explicitly).

Checking balance

First, let’s assess how the removal of the three plots from the data has affected the balance of the design. For this, we can look at the number of replicates for each of the three EVs (litter, social and play) using the table function.

For which EVs has the design become unbalanced. Select all statements that apply. To answer correctly, you may need to select more than one!

table(Mice.2$litter)

 1  2  3 
11 10 12 

My data are no longer balanced for litter.

table(Mice.2$social)

 1  2  3 
11 11 11 

My data are still balanced for social.

table(Mice.2$play)

1 2 3 4 
8 8 8 9 

My data are no longer balanced for play.

Checking orthogonality

Now use one of the two SSQ methods that we used earlier to assess whether the design is still orthogonal after removal of [the three observations]. Analyse [the new data set] and write down in your notes whether this fundamentally alters your conclusions about the effect of [the predictors of interest].

Method 1, based on sequential SSQ (Type I ANOVAs) and swapping the order of EVs:

m.3 <- lm(fear ~ litter + social + play, Mice.2)
m.4 <- update(m.3, . ~ play + litter + social)
anova(m.3)
Analysis of Variance Table

Response: fear
          Df Sum Sq Mean Sq F value    Pr(>F)    
litter     2 4.4176  2.2088 12.6214 0.0001625 ***
social     2 7.0951  3.5475 20.2712 5.858e-06 ***
play       3 1.7602  0.5867  3.3527 0.0348751 *  
Residuals 25 4.3751  0.1750                      
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
anova(m.4)
Analysis of Variance Table

Response: fear
          Df Sum Sq Mean Sq F value    Pr(>F)    
play       3 1.2554  0.4185  2.3911 0.0925466 .  
litter     2 4.5673  2.2837 13.0492 0.0001316 ***
social     2 7.4502  3.7251 21.2858 4.001e-06 ***
Residuals 25 4.3751  0.1750                      
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Method 2, comparing sequential and adjusted SSQ (Type I vs. Type II):

anova(m.3)  # seq. / Type I
Analysis of Variance Table

Response: fear
          Df Sum Sq Mean Sq F value    Pr(>F)    
litter     2 4.4176  2.2088 12.6214 0.0001625 ***
social     2 7.0951  3.5475 20.2712 5.858e-06 ***
play       3 1.7602  0.5867  3.3527 0.0348751 *  
Residuals 25 4.3751  0.1750                      
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Anova(m.3)  # adj. / Type II
Anova Table (Type II tests)

Response: fear
          Sum Sq Df F value    Pr(>F)    
litter    4.4429  2 12.6937 0.0001568 ***
social    7.4502  2 21.2858 4.001e-06 ***
play      1.7602  3  3.3527 0.0348751 *  
Residuals 4.3751 25                      
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

In this example, the escape of the three mice means that the design / the dataset is no longer orthogonal. The conclusions that we can draw from the data are broadly the same as before, but note a somewhat perplexing detail (this may or may not happen with your data):

If I put play first in the model, the F-value from a sequential ANOVA is smaller than in the original data, and the p-value is even ‘less significant’ than before. But the F-value based on adjusted SSQ is larger than in the original data, and play comes out ‘significant’!

This just goes to show that using a hard significance cutoff of e.g. \(\alpha = 0.05\) is not sensible.