Mice <- read.csv(file = "w3-classdemo.csv")W3 Practical Support
Scenario
Study aim: Does anxiety in lab mice depend on social environment and cage enrichment?
Study design: Mice from three litters were kept either in solitary confinement, in pairs, or in groups of five. The mice were either kept in bare cages, or had a running wheel and/or toys in their cage. After a set number of weeks, the mice were tested in a behavioural assay that produces an anxiety score.
- anxiety score (
fear): dependent variable. - litter (
litter): candidate blocking variable: 1 = litter 1, 2 = litter 2, 3 = litter 3. - social environment (
social): 1 = solitary, 2 = in pairs, 3 = in group of five. - cage enrichment (
play): 1 = bare cage, 2 = running wheel, 3 = toys, 4 = wheel + toys.
Loading the data
Setting up for the analysis
summary(Mice) fear litter social play
Min. :2.217 Min. :1 Min. :1 Min. :1.00
1st Qu.:3.542 1st Qu.:1 1st Qu.:1 1st Qu.:1.75
Median :4.107 Median :2 Median :2 Median :2.50
Mean :4.059 Mean :2 Mean :2 Mean :2.50
3rd Qu.:4.547 3rd Qu.:3 3rd Qu.:3 3rd Qu.:3.25
Max. :5.517 Max. :3 Max. :3 Max. :4.00
Aha. As expected, R has treated our predictors litter, social and play as continuous variables rather than categorical variables (factors), because it sees the values (1, 2, 3) as numbers, not as ‘factor levels’.
So, I first need to turn litter, social and play into factors (categorical variables):
# Simple way - repeat this for all factors:
Mice$litter <- factor(Mice$litter)
# Fancy way -- avoids having to write `Mice$` over and over:
Mice <- within(Mice,{
litter <- factor(litter)
social <- factor(social)
play <- factor(play)
})Next, I set up sum contrasts. This can be done in two ways:
either by changing the global default setting in R from treatment contrasts to sum contrasts,
# Change global default (Eamonn Mallon style)
options(contrasts = rep ("contr.sum", 2))or by assigning sum contrasts to each factor variable in the data frame,
# Repeat this for all factors:
contrasts(Mice$litter) <- contr.sum
# or, more fancy way:
Mice <- within(Mice, {
contrasts(litter) <- contr.sum
contrasts(social) <- contr.sum
contrasts(play) <- contr.sum
})R has a large number of global settings – you can see them all with options(). This includes the default for what contrasts to use with factors. A factor can also be assigned its own contrasts, and these will override the global default. Think of each factor as having a little drawer or slot where you can put its own contrasts: so, a factor variable has slots for each of its data values, and an extra slot where it can store the contrasts. When you first declare a variable as a factor, that extra slot is empty – R doesn’t put a copy of the current default contrasts into that slot. Therefore, when you fit a model that includes a factor, R will use whatever contrasts are the default at the time you fit a model – unless you’ve ‘put something into the factor’s contrast slot’ by assigning it its own contrast setting. This is why we can declare litter a factor first, and later change the global default contrasts.
Assessing orthogonality
Using the anova function in base R
Below, I’m using Quarto code annotations rather than R code comments. Below the code blocks, you’ll see a numbered list. If you click on the circled numbers ①, ②, ③ in the list below the code, it will highlight the code that the annotation refers to (clicking on the circled numbers in the code block won’t do anything).
- 1
- fitting the full model.
- 2
-
using
update()to make a new model with predictors in different order. This is just an alternative to writing out a new model from scratch withlm(...). - 3
-
print the ANOVA tables for both: recall that
anovagives sequential SSQ.
Analysis of Variance Table
Response: fear
Df Sum Sq Mean Sq F value Pr(>F)
litter 2 5.1923 2.5962 13.3558 8.455e-05 ***
social 2 6.6047 3.3023 16.9886 1.476e-05 ***
play 3 1.5403 0.5134 2.6413 0.06885 .
Residuals 28 5.4428 0.1944
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Analysis of Variance Table
Response: fear
Df Sum Sq Mean Sq F value Pr(>F)
play 3 1.5403 0.5134 2.6413 0.06885 .
litter 2 5.1923 2.5962 13.3558 8.455e-05 ***
social 2 6.6047 3.3023 16.9886 1.476e-05 ***
Residuals 28 5.4428 0.1944
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
TASK: Print the ANOVA tables from the two versions of the model to obtain the sequential SSQs for both. Are they different? Write down in your notes what this means for the orthogonality of the data.
Comparing the ANOVA tables, I see that for each of the three predictors, the SSQ (Sum Sq) values are the same in fit.1 and fit.1a (and so are the mean squares, F- and p- values.). So my dataset is fully orthogonal.
Using Anova from car package
If you have trouble remembering the package name car – it is short for Correlation And Regression, not automobiles.
- 1
-
getting adjusted SSQ requires the
Anova()function from packagecar, which isn’t loaded by default so we need to load it first. - 2
-
print the adjusted SSQ ANOVA table, and compare to table from
anova()
Anova Table (Type II tests)
Response: fear
Sum Sq Df F value Pr(>F)
litter 5.1923 2 13.3558 8.455e-05 ***
social 6.6047 2 16.9886 1.476e-05 ***
play 1.5403 3 2.6413 0.06885 .
Residuals 5.4428 28
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
I see that the sequential and the adjusted SSQ (Sum Sq) values are the same (and hence they give the same mean squares, F- and p- values). So the conclusion from this is the same as above – my dataset is fully orthogonal.
Remember that even in a sequential SSQ table (“Type I ANOVA”), the SSQ of any predictor are adjusted for all predictors that come before it in the model formula (and are above it in the table). Only the SSQ for the first predictor in the model are ‘fully unadjusted.’ The SSQ for the second predictor are adjusted for the first predictor, but not for the later predictors, and so on. So when you compare the output from anova and Anova for the same model fit, the bottom row will always be the same, even if the dataset is not orthogonal.
Assessing the merit of blocking
I fit a new model that omits the blocking variable (litter) – I’m using update() but you could just as well write out a new lm() from scratch:
fit.2 <- update(fit.1, . ~ social + play)TASK: In your notes, write down how dropping [the blocking variable] has changed the model results.
- How have the residual SSQ changed, and why?
- How have the residual DoF changed, and why?
- What about the adjusted SSQ and the \(F\)-ratios for [the predictors of interest]?
- What happened to the \(P\)-values for [the predictors of interest]?
I print the ANOVA tables for the full model and for the model that omits the blocking variable.
anova(fit.1) # original model from earlierAnalysis of Variance Table
Response: fear
Df Sum Sq Mean Sq F value Pr(>F)
litter 2 5.1923 2.5962 13.3558 8.455e-05 ***
social 2 6.6047 3.3023 16.9886 1.476e-05 ***
play 3 1.5403 0.5134 2.6413 0.06885 .
Residuals 28 5.4428 0.1944
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
anova(fit.2) # reduced model without `litter`Analysis of Variance Table
Response: fear
Df Sum Sq Mean Sq F value Pr(>F)
social 2 6.6047 3.3023 9.3154 0.0007131 ***
play 3 1.5403 0.5134 1.4483 0.2484202
Residuals 30 10.6352 0.3545
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
To assess the effect of dropping the blocking variable on the standard errors, we need to look at the model summaries, not the ANOVA tables.
summary(fit.1)
Call:
lm(formula = fear ~ litter + social + play, data = Mice)
Residuals:
Min 1Q Median 3Q Max
-0.8682 -0.1909 0.1149 0.2218 0.8132
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 4.05852 0.07348 55.231 < 2e-16 ***
litter1 0.15804 0.10392 1.521 0.13951
litter2 0.36551 0.10392 3.517 0.00151 **
social1 -0.53797 0.10392 -5.177 1.71e-05 ***
social2 0.51009 0.10392 4.909 3.56e-05 ***
play1 0.12363 0.12727 0.971 0.33968
play2 0.20561 0.12727 1.615 0.11743
play3 0.00770 0.12727 0.060 0.95219
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.4409 on 28 degrees of freedom
Multiple R-squared: 0.7102, Adjusted R-squared: 0.6377
F-statistic: 9.802 on 7 and 28 DF, p-value: 3.967e-06
summary(fit.2)
Call:
lm(formula = fear ~ social + play, data = Mice)
Residuals:
Min 1Q Median 3Q Max
-1.23242 -0.39949 -0.00796 0.36757 0.97121
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 4.05852 0.09923 40.899 < 2e-16 ***
social1 -0.53797 0.14034 -3.833 0.000602 ***
social2 0.51009 0.14034 3.635 0.001031 **
play1 0.12363 0.17188 0.719 0.477526
play2 0.20561 0.17188 1.196 0.240977
play3 0.00770 0.17188 0.045 0.964565
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.5954 on 30 degrees of freedom
Multiple R-squared: 0.4337, Adjusted R-squared: 0.3393
F-statistic: 4.595 on 5 and 30 DF, p-value: 0.003132
Now, simply copy / paste from the (rounded) values in the summary tables, and round (again) to 2 decimals. Here’s what I get with my data, for the fold-change of the SE for social:
round(0.14034 / 0.10392, 2)[1] 1.35
So in my data, the SE for social is \(1.35\times\) larger after omitting the blocking variable (litter). That’s bad – we want small SEs.
The Raid by Animal Liberation Activists
Well-meaning activists have broken into our animal facility and set free three of our mice! We need to see how this has affected our experiment, in terms of balance and orthogonality, and the conclusions we can draw.
Set up the new data
- 1
- don’t forget you need to again set up all predictors as factors..
- 2
- … and set up sum contrasts (you don’t need to do this if you set up sum contrasts as the default; I prefer doing it explicitly).
Checking balance
table(Mice.2$litter)
1 2 3
11 10 12
My data are no longer balanced for litter.
table(Mice.2$social)
1 2 3
11 11 11
My data are still balanced for social.
table(Mice.2$play)
1 2 3 4
8 8 8 9
My data are no longer balanced for play.
Checking orthogonality
TASK: Now use one of the two SSQ methods that we used earlier to assess whether the design is still orthogonal after removal of [the three observations]. Analyse [the new data set] and write down in your notes whether this fundamentally alters your conclusions about the effect of [the predictors of interest].
Method 1, based on sequential SSQ (Type I ANOVAs) and swapping the order of EVs:
fit.3 <- lm(fear ~ litter + social + play, Mice.2)
fit.4 <- update(fit.3, . ~ play + litter + social)
anova(fit.3)Analysis of Variance Table
Response: fear
Df Sum Sq Mean Sq F value Pr(>F)
litter 2 4.4176 2.2088 12.6214 0.0001625 ***
social 2 7.0951 3.5475 20.2712 5.858e-06 ***
play 3 1.7602 0.5867 3.3527 0.0348751 *
Residuals 25 4.3751 0.1750
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
anova(fit.4)Analysis of Variance Table
Response: fear
Df Sum Sq Mean Sq F value Pr(>F)
play 3 1.2554 0.4185 2.3911 0.0925466 .
litter 2 4.5673 2.2837 13.0492 0.0001316 ***
social 2 7.4502 3.7251 21.2858 4.001e-06 ***
Residuals 25 4.3751 0.1750
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Method 2, comparing sequential and adjusted SSQ (Type I vs. Type II):
anova(fit.3) # seq. / Type IAnalysis of Variance Table
Response: fear
Df Sum Sq Mean Sq F value Pr(>F)
litter 2 4.4176 2.2088 12.6214 0.0001625 ***
social 2 7.0951 3.5475 20.2712 5.858e-06 ***
play 3 1.7602 0.5867 3.3527 0.0348751 *
Residuals 25 4.3751 0.1750
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Anova(fit.3) # adj. / Type IIAnova Table (Type II tests)
Response: fear
Sum Sq Df F value Pr(>F)
litter 4.4429 2 12.6937 0.0001568 ***
social 7.4502 2 21.2858 4.001e-06 ***
play 1.7602 3 3.3527 0.0348751 *
Residuals 4.3751 25
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
In this example, the escape of the three mice means that the design / the dataset is no longer orthogonal. The conclusions that we can draw from the data are broadly the same as before, but note a somewhat perplexing detail:
If I put play first in the model, the F-value from a sequential ANOVA is smaller than in the original data, and the p-value is even ‘less significant’ than before. But the F-value based on adjusted SSQ is larger than in the original data, and play comes out ‘significant’!
This just goes to show that using a hard significance cutoff of e.g. \(\alpha = 0.05\) is not sensible.