An experiment was conducted into the effectiveness of two antidotes (A and B) against four different doses of a toxin. The antidote was given 5 minutes after the toxin, and 25 minutes later the response was measured as the concentration of toxin-related products in the blood; higher values indicate that the antidote was less effective. There were three subjects at each combination of antidote and dose level. (There are no data on how many of the subjects died as a result of the experiment.)
The data are in the file antidotes.csv, with the variables:
antidote - the two antidotes, coded A and B;
dose - the toxin dose administered;
blood - the concentration of the toxin-induced products measured in the subjects’ blood.
As always, start by clarifying in your mind:
The answer to the last question is actually not clear-cut in this study! See last week’s lecture, where we discussed that some variables could be treated as either continuous or categorical.
Make sure you understand the study design: it is simple enough, but maybe different from what you are “primed” to see. (For example, I kept catching myself thinking that one of the EVs is time! But time is not a variable in the experimental design, even though time points are mentioned in the description of the study above. Also, at first I misread the description and thought that the dose being varied was the dose of the antidote!)
Before you proceed to fitting a LM, you should always plot the data.
TASK: Plot the data as a scatterplot (treating
dose as a continuous EV), with different colours for the
two antidotes.
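Here is a minimal ggplot2 sketch of this plot. It assumes the column names listed above and that ggplot2 is installed. The helper name plot_raw is hypothetical; wrapping the plot in a function just keeps the sketch self-contained.

```r
library(ggplot2)

# Hypothetical helper: scatterplot of the raw data, with dose treated as a
# continuous x-axis variable and one colour per antidote.
plot_raw <- function(d) {
  ggplot(d, aes(x = dose, y = blood, colour = antidote)) +
    geom_point(size = 2) +
    labs(x = "Toxin dose",
         y = "Toxin-related products in blood",
         colour = "Antidote")
}
```

For example, run antidotes <- read.csv("antidotes.csv") and then plot_raw(antidotes).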
TASK: Do you notice something problematic about the data, or rather, about fitting a LM to the data as they are? We haven’t really covered this in the module just yet, but think back to last year’s course and the assumptions about variances and residuals: compare the scatter of the response values (blood) across the different dose and antidote combinations.
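One way to put numbers on the scatter is to compute the standard deviation of blood in each dose-by-antidote group (three subjects per group). This sketch assumes the column names above, and the helper name group_sd is hypothetical.

```r
# Hypothetical helper: standard deviation of the response for every
# dose x antidote combination, to compare the spread between groups.
group_sd <- function(d) {
  aggregate(blood ~ dose + antidote, data = d, FUN = sd)
}
```

For example, call group_sd(antidotes). With only three subjects per group these SDs are themselves noisy. Look for a systematic pattern, such as spread growing along with the group means, rather than reading much into any single value.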
We could try a log-transformation of the response BUT this comes with considerable complications in interpreting the model fit — something they don’t tell you when they say, “If you have inhomogeneity of variances, do a log-transform”!!
For this practical, we will proceed as if there were no issues with the data as they are. We will analyse the untransformed response values (not a log-transformed response).
MCQ1: From just looking at the plot of the raw data, what statement best describes the conclusions from the experiment?
[_] The two antidotes, A and B, seem about equally effective.
[_] Antidote B seems to be increasingly harmful at higher doses.
[_] Antidote B seems more effective than A with high toxin doses, but less effective than A when the toxin dose is low.
[_] Antidote B seems generally more effective than A, but the difference in efficacy depends on the toxin dose.
dose: continuous or categorical?
TASK: Look at the plot of the raw data again and consider whether dose would best be treated as continuous or as categorical. Recall what we said about this issue in last week’s lecture.
We will first treat dose as a continuous EV in our
analysis.
MCQ2: What is the most compelling reason for
treating dose as a continuous EV?
[_] The response seems to closely follow a linear trend for both antidotes.
[_] The slopes of the linear trends seem to be about the same for both antidotes.
[_] The different doses tested are equally spaced.
[_] Treating any EV as continuous “saves” degrees of freedom and thus increases power.
OK, as I said above, for now we will treat dose as
continuous, but antidote is obviously a two-level factor.
As always,
Hint: even though antidote is coded
A and B (rather than numerically, as in
previous practicals), you may still need to explicitly declare it as a
factor before you can set up contrasts for it.
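The sketch below covers that step. It assumes sum-to-zero contrasts (contr.sum), the coding under which the dose:antidote1 coefficient described further down is a deviation from a "mean slope". The helper name is hypothetical. Since R 4.0, read.csv() no longer converts text columns to factors by default, which is why the explicit factor() call is needed.

```r
# Hypothetical helper: declare antidote as a factor and give it
# sum-to-zero contrasts, so that A is coded +1 and B is coded -1.
set_antidote_contrasts <- function(d) {
  d$antidote <- factor(d$antidote, levels = c("A", "B"))
  contrasts(d$antidote) <- contr.sum(2)
  d
}
```

After antidotes <- set_antidote_contrasts(antidotes), contrasts(antidotes$antidote) should show a single column with +1 for A and -1 for B.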
TASK: Fit the LM that is most appropriate for answering this research question.
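Here is a sketch of one candidate model, assuming antidote has already been declared a factor with sum-to-zero contrasts as in the hint. It is the full model with both main effects and their interaction, dose * antidote, which is also the model whose coefficients are discussed further down. anova() gives a sequential (Type I) table; for the interaction, which is the last term, the test is the same as a Type III test. The helper name is hypothetical.

```r
# Hypothetical helper: LM with dose as a continuous EV, antidote as a
# factor, and their interaction (dose * antidote expands to
# dose + antidote + dose:antidote).
fit_continuous <- function(d) {
  lm(blood ~ dose * antidote, data = d)
}
```

For example, run fit <- fit_continuous(antidotes). Then use anova(fit) for the ANOVA table and coef(fit) for the estimated coefficients.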
Interpret the results of your LM analysis to answer the following questions:
MCQ3: Which of the following statements is most appropriate given your ANOVA results?
[_] The effectiveness of the treatment depends on both the dose of toxin administered \((P=0.00014)\) and on the antidote used \((P=2.86\times 10^{-5})\).
[_] The two antidotes differ in effectiveness, and this difference in their effectiveness depends on the dose of toxin administered \((P=0.000502)\).
[_] The effectiveness of the treatment depends on both the dose of toxin administered \((P=0.00593)\) and on the antidote used \((P=0.00017)\).
[_] The two antidotes differ in their effectiveness, and this difference in their effectiveness depends on the dose of toxin administered \((P=0.0152)\).
We could do the interaction plot “by hand”, as I do for my lecture slides. In addition to consolidating your understanding of the estimated coefficients, this is a useful exercise in “ggplot-craft”, so if you do have time, I’d encourage you to give it a go, but you don’t have to.
emmeans for “ANCOVA” interaction plots
Instead, for the practical, we’ll give the emmeans package that I introduced in the lecture a spin. The emmip() function works a bit differently if one of the EVs is continuous (“ANCOVA-like models”). I’ll save you the googling and give you the syntax for plotting the estimates in emmip() if you have a factor interacting with a covariate:
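A sketch of such a call follows. It assumes a fitted model object called fit from lm(blood ~ dose * antidote); both that object name and the helper name are hypothetical. In emmip() the formula has the form trace factors ~ x-axis variables, so antidote ~ dose draws one line per antidote against dose. The argument cov.reduce = range evaluates dose at its minimum and maximum instead of only at its mean.

```r
library(emmeans)

# Hypothetical helper: interaction plot for a factor (antidote) interacting
# with a covariate (dose). cov.reduce = range keeps the smallest and largest
# dose in the reference grid; without it, dose would be reduced to its mean
# and each antidote would get a single point.
plot_emm_continuous <- function(fit, ...) {
  emmip(fit, antidote ~ dose, cov.reduce = range, ...)
}
```

plot_emm_continuous(fit) draws the plot. plot_emm_continuous(fit, plotit = FALSE) returns the underlying estimates as a data frame instead.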
If you want to know more, particularly what the cov.reduce = range bit is about and why it is needed, it is well explained in the web documentation of the emmeans package. (Also, if you omit the cov.reduce argument, you’ll see what happens without it…)
TASK: Use the emmip() function to
visualise the model estimates as an interaction plot.
But I won’t let you off that lightly, so you’ll need to think about
the model coefficients after all… Because we have treated
dose as continuous, we can actually make a prediction for a
toxin dose that was not even used in the experiment!
As long as the dose we are predicting for is in the
range of values used in the experiment, this amounts to
interpolation and is safe. What would be dangerous would be
extrapolation beyond the range that we have data for. Can you
see why?
TASK: Use the model coefficients to predict the response for a toxin dose of 17 and antidote B.
IMPORTANT: As we have seen in this week’s Lecture Segment 4, if the interaction is between a continuous EV and a factor, then the coefficient(s) associated with the interaction term refer to differences in the slopes of the regression lines.
Recall the general ‘recipe’ for predicting \(y\) from a line fit: you multiply the slope by the value of the continuous variable on the \(x\)-axis (and then add the intercept). However, here, because of the interaction, the slope of each ‘antidote line’ is described by two coefficients:
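the dose coefficient, which is the estimated ‘mean slope’ of the two ‘antidote lines’; and the dose:antidote1 coefficient, which is the estimated deviation of each actual ‘antidote line’ slope from this ‘fictive’ mean slope.
In your prediction calculation, you therefore need to combine both coefficients to obtain the slope for antidote B (and, in the same way, combine the intercept with the antidote1 coefficient to obtain the intercept for antidote B).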
Looking at the last slide (#8) of my W5 Lecture Pt. 4 may be useful if you’re struggling to visualise what you are doing here.
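Written out as a worked equation, the prediction assumes sum-to-zero (contr.sum) contrasts for antidote, so that A is coded \(+1\) and B is coded \(-1\). This is the coding under which dose:antidote1 is a deviation from the mean slope. The coefficient names follow R's naming for lm(blood ~ dose * antidote):

\[
\hat{y}_{B}(x) = \left(\hat\beta_{0} - \hat\beta_{\text{antidote1}}\right)
  + \left(\hat\beta_{\text{dose}} - \hat\beta_{\text{dose:antidote1}}\right) x,
  \qquad x = 17
\]

For antidote A, the two minus signs become plus signs. Averaging the A and B lines gives back the ‘mean’ line, with intercept \(\hat\beta_{0}\) and slope \(\hat\beta_{\text{dose}}\).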
If you are unsure, come to the practical drop-in session before you submit your answers!
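To check a hand calculation, here is a minimal R sketch. It assumes the model lm(blood ~ dose * antidote) with contr.sum coding for antidote; the helper predict_by_hand and its dose_range argument are hypothetical. It applies the intercept-plus-slope recipe coefficient by coefficient, and it warns when asked to extrapolate beyond the doses in the data. predict() on the fitted model should return the same number.

```r
# Hypothetical helper: predicted response for one dose and one antidote from
# the coefficients of lm(blood ~ dose * antidote), assuming contr.sum coding
# (A = +1, B = -1). Optionally warns when the dose lies outside dose_range.
predict_by_hand <- function(b, dose, antidote, dose_range = NULL) {
  stopifnot(antidote %in% c("A", "B"))
  if (!is.null(dose_range) &&
      (dose < min(dose_range) || dose > max(dose_range))) {
    warning("dose is outside the observed range: this is extrapolation")
  }
  code <- if (antidote == "A") 1 else -1                       # sum-to-zero code
  intercept <- b[["(Intercept)"]] + code * b[["antidote1"]]    # line intercept
  slope <- b[["dose"]] + code * b[["dose:antidote1"]]          # line slope
  intercept + slope * dose
}
```

For example, predict_by_hand(coef(fit), dose = 17, antidote = "B", dose_range = antidotes$dose) can be cross-checked against predict(fit, newdata = data.frame(dose = 17, antidote = "B")).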
MCQ4: Which of the values best matches your prediction for a toxin dose of 17 and antidote B?
[_] 37.40
[_] 1.687
[_] 26.40
[_] 18.49
dose as categorical
We’ll finish off by checking whether it was wise to treat dose as a continuous EV.
TASK: Fit a new model that treats dose
as a factor (categorical EV).
Hint: You could do this from within the model formula, without changing the data frame itself.
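Here is a sketch of the factor version. It converts dose inside the formula with factor(dose), as the hint suggests, and adds one way to line the two models up; the helper names are hypothetical. A straight line in dose is a special case of a separate mean for every dose. The continuous model is therefore nested in the categorical one, and anova() on the pair gives an F-test of whether the extra flexibility improves the fit.

```r
# Hypothetical helper: dose as a factor, converted inside the model formula
# so the data frame itself is left unchanged.
fit_categorical <- function(d) {
  lm(blood ~ factor(dose) * antidote, data = d)
}

# Hypothetical helper: residual SSQ, residual df and residual mean square,
# for comparing the continuous and categorical models side by side.
residual_summary <- function(fit) {
  rss <- deviance(fit)          # residual sum of squares for an lm
  res_df <- df.residual(fit)
  c(rss = rss, df = res_df, ms = rss / res_df)
}
```

For example, after fit_cat <- fit_categorical(antidotes), compare rbind(continuous = residual_summary(fit), categorical = residual_summary(fit_cat)). Then run anova(fit, fit_cat) and anova(fit_cat).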
MCQ5: Which of the following statements about
treating dose as continuous or categorical is
incorrect:
[_] Treating dose as continuous reduces the residual mean squares, despite a slight increase in the residual SSQ.
[_] Treating dose as a factor results in much stronger evidence for its interaction with antidote.
[_] Treating dose as continuous saves four degrees of freedom in the model.
[_] Treating dose as a factor does not appreciably improve model fit.
Finally, and no MCQ for this one (!), let’s see what the interaction
plot looks like if we treat dose as a factor.
TASK: Use emmip() from the
emmeans package to create an interaction plot for the model
that treats dose as a factor. Note there are two ways of
plotting this that amount to the same thing. Can you figure out what the
two ways of plotting are, and which one is better for comparing with the
continuous dose version of the model?
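As a starting point, here is one possible call. It assumes the categorical model is stored as fit_cat (a hypothetical name) and was fitted with factor(dose) in the formula, as the earlier hint suggests. emmeans recognises factor(dose) in the formula, so you can refer to the variable simply as dose. With antidote ~ dose, the layout mirrors the continuous-dose plot: dose on the x-axis and one trace per antidote. The second arrangement is left for you to work out.

```r
library(emmeans)

# Hypothetical helper: interaction plot for the model with dose as a factor.
# The emmip() formula reads trace ~ x-axis, so here each antidote is one
# trace and the dose levels are spread along the x-axis.
plot_emm_categorical <- function(fit_cat, ...) {
  emmip(fit_cat, antidote ~ dose, ...)
}
```

plot_emm_categorical(fit_cat) draws the plot. Adding plotit = FALSE returns the estimated cell means as a data frame.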