W5 Practical Demo

Author

Swidbert R. Ott

Mid-module survey, if you haven’t done it yet

Lecture data example

For this week’s demo, I’m using the lecture example of the bacterial growth experiment with or without lactoes in the medium. I have simulated a larger sample size than in the lecture dataset, to make it more similar to what you have in the practical.

Research question: Effect of medium (with / without lactose) on bacterial counts over time.

Outcome: Bacterial count (bacteria)

Predictors:

day of counting (day: 1, 2, 3, 4, 5)
growth medium (lactose – 0 = without, 1 = with; in the lecture, it was 1 = without, 2 = with)

Load and plot the data

HINT: if you click on the circled numbers Below the code, it will highlight the code that I’m referring to.

MyBugs <- read.csv(file = "w5-classdemo.csv")
1MyBugs <- within(MyBugs, {
  lactose <- factor(lactose);             #
2  contrasts(lactose) <- contr.sum
})

library(ggplot2)
3base.plot <- ggplot(MyBugs, aes(x = day, y = bacteria, colour = lactose))
4base.plot + geom_point()

1: The within( ... ) trick saves me having to say MyBugs$ every time I want to refer to a variable within the dataframe (in this instance, it is just lactose).
2: I’m setting up the contrast for lactose here already.
3: I’m setting up and saving an “empty” ggplot, for reusing later; saves me typing this line over and over below.
4: Only now am I actually plotting it with the points.

Jittering the points sideways (only!)

base.plot + geom_jitter(height = 0, width = 0.1)

Or use transparency (`alpha`)

base.plot + geom_point(size=3, alpha=0.5)

In my example, I’d say that the counts are higher with lactose on all days, but the difference clearly gets bigger.

Model fit and interpretation

I’ve already set up my contrasts above, while I declared lactose a factor.

m.bugs <- lm(bacteria ~ day * lactose, MyBugs)
anova(m.bugs)

Analysis of Variance Table

Response: bacteria
            Df  Sum Sq Mean Sq F value    Pr(>F)    
day          1 1212.04 1212.04 505.526 < 2.2e-16 ***
lactose      1 1772.62 1772.62 739.338 < 2.2e-16 ***
day:lactose  1   97.09   97.09  40.497 1.351e-08 ***
Residuals   76  182.22    2.40                      
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Remember how to interpret models with interaction terms

Recall, we have two EVs ($X_1 = \mathtt{day},\ X_2 = \mathtt{lactose}$), but we are asking three research questions.

if the evidence for an interaction day:lactose is weak (“non-significant” – in the practical, P>0.01), report the separate findings for the two main effects, day and lactose.
if there is clear evidence for an interaction (in the practical, P<0.01), only report the results for the interaction – since this implies that both main effects (day and lactose) are important.
this is also why we don’t do, or need, a “Type III ANOVA”!

Using `emmeans` for “ANCOVA” interaction plots

library(emmeans)

Warning: package 'emmeans' was built under R version 4.5.2

Welcome to emmeans.
Caution: You lose important information if you filter this package's results.
See '? untidy'

emmip(m.bugs, lactose ~ day, cov.reduce = range)

Predictions ‘by hand’ from coefs

I’d strongly suggest you do it by hand from the model coefficients, it will help you get your head round the meaning of the coefficients.

Important: If the interaction test came out “non-significant” (P > 0.01) in the ANOVA, you usually would not include the interaction effect in the prediction. However, for this practical, I want you to include the interaction effect in the prediction, even if it came out with P > 0.01 in your data.

1summary(m.bugs)
2bug.coefs <- coef(m.bugs)

1: Not needed - only to see the results.
2: Saving the coefficients for use below.


Call:
lm(formula = bacteria ~ day * lactose, data = MyBugs)

Residuals:
    Min      1Q  Median      3Q     Max 
-2.8368 -1.0668 -0.2863  0.8977  4.1417 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept)    2.6557     0.4060   6.541 6.34e-09 ***
day            2.7523     0.1224  22.484  < 2e-16 ***
lactose1      -2.3702     0.4060  -5.838 1.23e-07 ***
day:lactose1  -0.7790     0.1224  -6.364 1.35e-08 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.548 on 76 degrees of freedom
Multiple R-squared:  0.9442,    Adjusted R-squared:  0.942 
F-statistic: 428.5 on 3 and 76 DF,  p-value: < 2.2e-16

Assume we want the bacterial count after 2.5 days (!) with lactose.

Careful: With sum contrasts, the coefficient for the first factor level will be level1. In my example, the first level of lactose is “0” which means no lactose, so this is what lactose1 refers to!
Sanity check: Refer to the plot - without lactose (“0”) comes out lower, so it makes sense that the estimate for lactose1 comes out negative! (Remember that the lactose1 estimate would be for day = 0…)
Same for interaction: The slope without lactose (“0”) is shallower and the day:lactose1 estimate is negative (since it is the deviation from the fictive average slope).

OK, on with the calculation:

1bug.coefs["(Intercept)"] +
2  bug.coefs["day"] * 2.5 +
3  bug.coefs["lactose1"] * -1 +
4  bug.coefs["day:lactose1"] * -1 * 2.5

1: Self-explanatory, we need to include the intercept…
2: Add the estimate for day multiplied by 2.5 (days).
At this point, we’d have the estimate for the mean of with and without lactose 2.5 days into the experiment.
3: Add the estimate for the lactose1 effect multiplied by the value of lactose in terms of its contrast: for “with lactose” this is -1, i.e., we sign-invert.
4: Finally, add the interaction – we need to multiply by both the value of day and the contrast value of lactose.

(Intercept) 
   13.85415

Easy visual check

Using the interaction plot we did earlier:

1emmip(m.bugs, lactose ~ day, cov.reduce = range) +
2  geom_vline(xintercept = 2.5) +
3  geom_hline(yintercept = 13.85)

1: emmip() is just a ‘wrapper’ around ggplot() so we can use ggplot commands to add stuff to the plot!
2: draw a vertical line at 2.5 along the day axis – the value we’re predicting for.
3: draw a horizontal line at 13.85 along the y-axis – the value that we got for our prediction.

Great, the predicted point is bang on the fit line for “with lactose”

Visualising the slopes

The mean slope – half-way between with and without lactose:

emmip(m.bugs, lactose ~ day, cov.reduce = range) +
  geom_abline(linetype = 2,  # dashed
    intercept = bug.coefs["(Intercept)"],
    slope = bug.coefs["day"]
    )

The slope for “with lactose” (but not yet with the “with lactose” shift in intercept):

emmip(m.bugs, lactose ~ day, cov.reduce = range) +
  geom_abline(linetype = 2,  # dashed
    intercept = bug.coefs["(Intercept)"],
    slope = bug.coefs["day"] + bug.coefs["day:lactose1"] * -1
    )

Correcting the intercept to be “with lactose” as well:

emmip(m.bugs, lactose ~ day, cov.reduce = range) +
  geom_abline(linetype = 2,  # dashed
    intercept = bug.coefs["(Intercept)"] + bug.coefs["lactose1"] * -1,
    slope = bug.coefs["day"] + bug.coefs["day:lactose1"] * -1
    )

Putting the predicted lines on the original data

base.plot + geom_point(size=3, alpha=0.5) +
  geom_abline(
    intercept = bug.coefs["(Intercept)"] + bug.coefs["lactose1"],
    slope = bug.coefs["day"] + bug.coefs["day:lactose1"]
    ) +
  geom_abline(
    intercept = bug.coefs["(Intercept)"] + bug.coefs["lactose1"] * -1,
    slope = bug.coefs["day"] + bug.coefs["day:lactose1"] * -1
    )

Treating dose as categorical

m.bugs2 <- lm(bacteria ~ factor(day) * lactose, MyBugs)

anova(m.bugs)     # `days` as covariate

Analysis of Variance Table

Response: bacteria
            Df  Sum Sq Mean Sq F value    Pr(>F)    
day          1 1212.04 1212.04 505.526 < 2.2e-16 ***
lactose      1 1772.62 1772.62 739.338 < 2.2e-16 ***
day:lactose  1   97.09   97.09  40.497 1.351e-08 ***
Residuals   76  182.22    2.40                      
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

anova(m.bugs2)    # `days` as factor

Analysis of Variance Table

Response: bacteria
                    Df  Sum Sq Mean Sq F value    Pr(>F)    
factor(day)          4 1225.39  306.35 127.757 < 2.2e-16 ***
lactose              1 1772.62 1772.62 739.240 < 2.2e-16 ***
factor(day):lactose  4   98.11   24.53  10.229 1.404e-06 ***
Residuals           70  167.85    2.40                      
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Using emmip() function to make interaction plot for the day-as-factor model.

emmip(m.bugs2, lactose ~ factor(day))

or the other way around – probably less intuitive here:

emmip(m.bugs2, factor(day) ~ lactose)

BONUS: multiple predictions in one go

Lets do this with the model that has day as continuous: predict

without lactose, on days 4, 4.5 and 6 (6 is dangerous – we are extrapolating beyond the range of the available data!)
with lactose, on the same days.

pred.df <- data.frame(
  day = c(4, 4.5, 6, 4, 4.5, 6),
  lactose = factor( c(0, 0, 0, 1, 1, 1) )
)

pred.df

  day lactose
1 4.0       0
2 4.5       0
3 6.0       0
4 4.0       1
5 4.5       1
6 6.0       1

Looking good… could also do this using rep() which comes in handy for building bigger dataframes from scratch:

pred.df <- data.frame(
  day = rep( c(4, 4.5, 6), 2),
  lactose = factor(
    rep(c(0, 1), each=3))
)
pred.df

  day lactose
1 4.0       0
2 4.5       0
3 6.0       0
4 4.0       1
5 4.5       1
6 6.0       1

Now, predict on these data.

predict(m.bugs, newdata = pred.df)

        1         2         3         4         5         6 
 8.178718  9.165378 12.125356 19.151126 20.916784 26.213758

But may be easier to read if we put it together with the data

pred.df$prediction <- predict(m.bugs, newdata = pred.df)
pred.df

  day lactose prediction
1 4.0       0   8.178718
2 4.5       0   9.165378
3 6.0       0  12.125356
4 4.0       1  19.151126
5 4.5       1  20.916784
6 6.0       1  26.213758