For this week’s demo, I’m using the lecture example of the bacterial growth experiment with or without lactoes in the medium. I have simulated a larger sample size than in the lecture dataset, to make it more similar to what you have in the practical.
Research question: Effect of medium (with / without lactose) on bacterial counts over time.
Outcome: Bacterial count (bacteria)
Predictors:
day of counting (day: 1, 2, 3, 4, 5)
growth medium (lactose – 0 = without, 1 = with; in the lecture, it was 1 = without, 2 = with)
Load and plot the data
HINT: if you click on the circled numbers Below the code, it will highlight the code that I’m referring to.
The within( ... ) trick saves me having to say MyBugs$ every time I want to refer to a variable within the dataframe (in this instance, it is just lactose).
2
I’m setting up the contrast for lactose here already.
3
I’m setting up and saving an “empty” ggplot, for reusing later; saves me typing this line over and over below.
4
Only now am I actually plotting it with the points.
Jittering the points sideways (only!)
base.plot +geom_jitter(height =0, width =0.1)
Or use transparency (alpha)
base.plot +geom_point(size=3, alpha=0.5)
In my example, I’d say that the counts are higher with lactose on all days, but the difference clearly gets bigger.
Model fit and interpretation
I’ve already set up my contrasts above, while I declared lactose a factor.
m.bugs <-lm(bacteria ~ day * lactose, MyBugs)anova(m.bugs)
Analysis of Variance Table
Response: bacteria
Df Sum Sq Mean Sq F value Pr(>F)
day 1 1212.04 1212.04 505.526 < 2.2e-16 ***
lactose 1 1772.62 1772.62 739.338 < 2.2e-16 ***
day:lactose 1 97.09 97.09 40.497 1.351e-08 ***
Residuals 76 182.22 2.40
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Remember how to interpret models with interaction terms
Recall, we have two EVs (\(X_1 = \mathtt{day},\ X_2 = \mathtt{lactose}\)), but we are asking three research questions.
if the evidence for an interaction day:lactose is weak (“non-significant” – in the practical, P>0.01), report the separate findings for the two main effects, day and lactose.
if there is clear evidence for an interaction (in the practical, P<0.01), only report the results for the interaction – since this implies that both main effects (day and lactose) are important.
this is also why we don’t do, or need, a “Type III ANOVA”!
Using emmeans for “ANCOVA” interaction plots
library(emmeans)
Warning: package 'emmeans' was built under R version 4.5.2
Welcome to emmeans.
Caution: You lose important information if you filter this package's results.
See '? untidy'
emmip(m.bugs, lactose ~ day, cov.reduce = range)
Predictions ‘by hand’ from coefs
I’d strongly suggest you do it by hand from the model coefficients, it will help you get your head round the meaning of the coefficients.
Important: If the interaction test came out “non-significant” (P > 0.01) in the ANOVA, you usually would not include the interaction effect in the prediction. However, for this practical, I want you to include the interaction effect in the prediction, even if it came out with P > 0.01 in your data.
Call:
lm(formula = bacteria ~ day * lactose, data = MyBugs)
Residuals:
Min 1Q Median 3Q Max
-2.8368 -1.0668 -0.2863 0.8977 4.1417
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 2.6557 0.4060 6.541 6.34e-09 ***
day 2.7523 0.1224 22.484 < 2e-16 ***
lactose1 -2.3702 0.4060 -5.838 1.23e-07 ***
day:lactose1 -0.7790 0.1224 -6.364 1.35e-08 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 1.548 on 76 degrees of freedom
Multiple R-squared: 0.9442, Adjusted R-squared: 0.942
F-statistic: 428.5 on 3 and 76 DF, p-value: < 2.2e-16
Assume we want the bacterial count after 2.5 days (!) with lactose.
Careful: With sum contrasts, the coefficient for the first factor level will be level1. In my example, the first level of lactose is “0” which means no lactose, so this is what lactose1 refers to!
Sanity check: Refer to the plot - without lactose (“0”) comes out lower, so it makes sense that the estimate for lactose1 comes out negative! (Remember that the lactose1 estimate would be for day = 0…)
Same for interaction: The slope without lactose (“0”) is shallower and the day:lactose1 estimate is negative (since it is the deviation from the fictive average slope).
Self-explanatory, we need to include the intercept…
2
Add the estimate for day multiplied by 2.5 (days).
At this point, we’d have the estimate for the mean of with and without lactose 2.5 days into the experiment.
3
Add the estimate for the lactose1 effect multiplied by the value of lactose in terms of its contrast: for “with lactose” this is -1, i.e., we sign-invert.
4
Finally, add the interaction – we need to multiply by both the value of day and the contrast value of lactose.