1 Study Design & Data

An experiment was conducted into the effectiveness of two antidotes (A and B) to four different doses of toxin. The antidote was given 5 minutes after the toxin, and 25 minutes later the response was measured as the concentration of toxin-related products in the blood; higher values indicate the antidote was less effective. There were three subjects at each combination of the antidote and dose level. (There are no data on how many of the subjects died as a result of the experiment.)

The data are in the file antidotes.csv, with the variables

  • antidote - the two antidotes, coded A and B;
  • dose - the toxin dose administered;
  • blood - the concentration of the toxin-induced products measured in the subjects’ blood.

1.1 Consider the study aims and design

As always, start by clarifying in your mind:

  • what is the dependent variable (outcome, response; \(Y\))?
  • what are the explanatory variables (EVs; \(X_1, \dots, X_i\))?
  • are EVs continuous or categorical?

The answer to the last question is actually not clear-cut in this study! See last week’s lecture, where we discussed that some variables could be treated as either continuous or categorical.

Make sure you understand the study design - it is simple enough, but maybe different from what you are “primed” to see. (For example, I kept catching myself thinking that one of the EVs is time! — but time is not a variable in the experimental design, despite time points being mentioned in the description of the study above. Also, at first, I misread and thought that the dose that was varied was the dose of the antidote!).

2 First plots, and a decision

2.1 Load and plot the data

Before you proceed to fitting a LM, you should always plot the data.

TASK: Plot the data as a scatterplot (treating dose as a continuous EV), with different colours for the two antidotes.
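One possible starting point for this plot is sketched below. It assumes the ggplot2 package and the column names listed in Section 1; `plot_raw` is a hypothetical helper name chosen for illustration, not something from the lecture materials.

```r
library(ggplot2)

# Hypothetical helper: raw-data scatterplot with dose on a continuous x-axis
# and one colour per antidote.
# Assumes `dat` has columns named dose, blood and antidote.
plot_raw <- function(dat) {
  ggplot(dat, aes(x = dose, y = blood, colour = antidote)) +
    geom_point() +
    labs(x = "Toxin dose", y = "Toxin-related products in blood")
}

# Usage, assuming antidotes.csv is in your working directory:
# antidotes <- read.csv("antidotes.csv")
# plot_raw(antidotes)
```

Because dose is left as a number, ggplot places the points at their actual dose values on the x-axis rather than at equally spaced category positions.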

TASK: Do you notice something problematic about the data — or rather, about fitting a LM to the data as they are? We haven’t really covered this in the module just yet, but think back to last year’s course, and assumptions about variances and residuals…: compare the scatter of the response values (blood)

  • between the lowest and highest toxin dose, with antidote A; and
  • between antidote A and B at the highest toxin dose.

We could try a log-transformation of the response BUT this comes with considerable complications in interpreting the model fit — something they don’t tell you when they say, “If you have inhomogeneity of variances, do a log-transform”!!

For this practical, we will proceed as if there were no issues with the data as they are. We will analyse the untransformed response values (not a log-transformed response).

MCQ1: From just looking at the plot of the raw data, what statement best describes the conclusions from the experiment?

[_] The two antidotes, A and B, seem about equally effective.
[_] Antidote B seems to be increasingly harmful at higher doses.
[_] Antidote B seems more effective than A with high toxin doses, but less effective than A when the toxin dose is low.
[_] Antidote B seems generally more effective than A, but the difference in efficacy depends on the toxin dose.

2.2 dose: continuous or categorical?

TASK: Look at the plot of the raw data again and consider whether dose would best be treated as continuous or as categorical. Recall what we said about this issue in last week’s lecture.

We will first treat dose as a continuous EV in our analysis.

MCQ2: What is the most compelling reason for treating dose as a continuous EV?

[_] The response seems to closely follow a linear trend for both antidotes.
[_] The slopes of the linear trends seem to be about the same for both antidotes.
[_] The different doses tested are equally spaced.
[_] Treating any EV as continuous “saves” degrees of freedom and thus increases power.

3 Finally: the LM

3.1 Set up contrasts

OK, as I said above, for now we will treat dose as continuous, but antidote is obviously a two-level factor. As always,

  • remember to first check how categorical variables are coded in your data frame, and set them up correctly if needed; and…
  • use sum contrasts for categorical variables.

Hint: even though antidote is coded A and B (rather than numerically, as in previous practicals), you may still need to explicitly declare it as a factor before you can set up contrasts for it.
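A minimal sketch of those two steps, using base R only. The `toy` data frame is a made-up stand-in containing just an antidote column, not the real data; apply the same two assignments to your own data frame.

```r
# Toy stand-in for the antidote column only (NOT the real data).
toy <- data.frame(antidote = c("A", "B", "A", "B"))

# read.csv() and data.frame() keep text columns as character in R >= 4.0,
# so declare the column as a factor explicitly.
toy$antidote <- factor(toy$antidote)

# Sum-to-zero contrasts for a two-level factor.
contrasts(toy$antidote) <- contr.sum(2)

# Inspect the coding: the first level (A) is coded +1, the second (B) -1.
contrasts(toy$antidote)
```

The levels are ordered alphabetically by default, which is why A gets +1 and B gets -1 here. It is worth checking the output of `contrasts()` on your own data before interpreting any coefficient signs.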

3.2 Fit the LM and interpret the results

TASK: Fit the LM that is most appropriate for answering this research question.

Interpret the results of your LM analysis to answer the following questions:

  1. What question does the test for an interaction relate to in this analysis?
  2. How strong is the evidence that the difference in the efficacy of the two antidotes is dependent on the dose of toxin administered?
  3. Would it have been sensible to fit a purely additive model?
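To make the difference between an interaction model and a purely additive model concrete, the sketch below shows how the two would be written as R formulas and which terms each one contains. The object names are hypothetical, and which formula is appropriate is for you to decide.

```r
# Two candidate model formulas (object names are illustrative only).
f_interaction <- blood ~ dose * antidote   # main effects plus interaction
f_additive    <- blood ~ dose + antidote   # main effects only

# List the terms each formula expands to; no data are needed for this.
attr(terms(f_interaction), "term.labels")
attr(terms(f_additive), "term.labels")

# Usage, assuming `antidotes` is your data frame with contrasts already set:
# fit <- lm(f_interaction, data = antidotes)
# anova(fit)
# summary(fit)
```

The `*` operator is shorthand for both main effects plus their interaction, so the first formula contains one more term (`dose:antidote`) than the second.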

MCQ3: Which of the following statements is most appropriate given your ANOVA results?

[_] The effectiveness of the treatment depends on both the dose of toxin administered \((P=0.00014)\) and on the antidote used \((P=2.86\times 10^{-5})\).
[_] The two antidotes differ in effectiveness, and this difference in their effectiveness depends on the dose of toxin administered \((P=0.000502)\).
[_] The effectiveness of the treatment depends on both the dose of toxin administered \((P=0.00593)\) and on the antidote used \((P=0.00017)\).
[_] The two antidotes differ in their effectiveness, and this difference in their effectiveness depends on the dose of toxin administered \((P=0.0152)\).

4 Plot the estimates and predict

We could do the interaction plot “by hand”, as I do for my lecture slides. In addition to consolidating your understanding of the estimated coefficients, this is a useful exercise in “ggplot-craft”, so if you do have time, I’d encourage you to give it a go, but you don’t have to.

4.1 Using emmeans for “ANCOVA” interaction plots

Instead, for the practical, we’ll give the emmeans package that I introduced in the lecture a spin. The emmip() function works a bit differently if one of the EVs is continuous (“ANCOVA-like models”). I’ll save you the googling and give you the syntax for plotting the estimates in emmip if you have a factor interacting with a covariate:

emmip(model.fit, factor ~ covariate, cov.reduce = range)

If you want to know more, particularly what the cov.reduce = range bit is about and why it is needed, it is well explained in the web documentation of the emmeans package. (Also, if you omit the cov.reduce argument, you’ll see what happens without it…)

TASK: Use the emmip() function to visualise the model estimates as an interaction plot.

4.2 Prediction

But I won’t let you off that lightly, so you’ll need to think about the model coefficients after all… Because we have treated dose as continuous, we can actually make a prediction for a toxin dose that was not even used in the experiment!

As long as the dose we are predicting for is in the range of values used in the experiment, this amounts to interpolation and is safe. What would be dangerous would be extrapolation beyond the range that we have data for. Can you see why?

TASK: Use the model coefficients to predict the response for a toxin dose of 17 and antidote B.

IMPORTANT: As we have seen in this week’s Lecture Segment 4, if the interaction is between a continuous EV and a factor, then the coefficient(s) associated with the interaction term refer to differences in the slopes of the regression lines.

Recall the general ‘recipe’ for predicting \(y\) from a line fit: you multiply the slope by the value of the continuous variable on the \(x\)-axis (and then add the intercept). However, here, because of the interaction, the slope of each ‘antidote line’ is described by two coefficients:

  • the dose coefficient, which is the estimated ‘mean slope’ between the two ‘antidote lines’; and
  • the dose:antidote1 coefficient, which is the estimated deviation of the two actual ‘antidote line’ slopes from the ‘fictive’ mean slope.

In your prediction calculation, you therefore need to

  • either multiply both slope-related coefficients by the dose value you are predicting for, and then add them together with the other coefficients;
  • or first calculate the actual slope for antidote B from the two slope-related coefficients, and then multiply that by the dose value, before adding the other coefficients.
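The two routes can be sketched in R as follows. The coefficient values in `b` are made up purely for illustration and are NOT the estimates from this data set; substitute `coef()` of your own fitted model. The sketch also assumes the model `blood ~ dose * antidote` with sum contrasts and the default alphabetical level order, so that antidote A is coded +1 and antidote B is coded -1.

```r
# Placeholder coefficients, made up for illustration only.
# Replace with coef(your_fit) from your own model.
b <- c("(Intercept)" = 2, dose = 1.5, antidote1 = 0.5, "dose:antidote1" = 0.25)

x <- 17        # toxin dose to predict for
code_B <- -1   # assumed sum-contrast code for antidote B (A would be +1)

# Route 1: multiply each coefficient by its design value, then add them all up.
pred1 <- b[["(Intercept)"]] + b[["antidote1"]] * code_B +
  b[["dose"]] * x + b[["dose:antidote1"]] * code_B * x

# Route 2: first work out the intercept and slope of the antidote-B line.
intercept_B <- b[["(Intercept)"]] + b[["antidote1"]] * code_B
slope_B     <- b[["dose"]] + b[["dose:antidote1"]] * code_B
pred2 <- intercept_B + slope_B * x

# Both routes should give the same prediction.
all.equal(pred1, pred2)
```

Note that the contrast code for B enters twice: once for the antidote coefficient and once for the dose:antidote1 coefficient. Once you have a hand-calculated value, one way to check it is `predict()` with a `newdata` data frame containing dose = 17 and antidote = "B".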

Looking at the last slide (#8) of my W5 Lecture Pt. 4 may be useful if you’re struggling to visualise what you are doing here.

If you are unsure, come to the practical drop-in session before you submit your answers!

MCQ4: Which of the values best matches your prediction for a toxin dose of 17 and antidote B?

[_] 37.40
[_] 1.687
[_] 26.40
[_] 18.49

5 Treating dose as categorical

We’ll finish off by checking whether it was wise to treat dose as a continuous EV.

TASK: Fit a new model that treats dose as a factor (categorical EV).

Hint: You could do this from within the model formula, without changing the data frame itself.
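One way to follow the hint is to wrap dose in `factor()` inside the formula, as in the sketch below. `fit_dose_models` is a hypothetical helper name, and the continuous-dose formula shown is an assumption about the model you fitted earlier.

```r
# Hypothetical helper: fit both versions of the model to the same data frame.
# Assumes `dat` has columns blood, dose (numeric) and antidote (factor).
fit_dose_models <- function(dat) {
  list(
    continuous  = lm(blood ~ dose * antidote, data = dat),
    # factor(dose) converts dose on the fly; the data frame is left unchanged.
    categorical = lm(blood ~ factor(dose) * antidote, data = dat)
  )
}

# Usage, assuming `antidotes` is the data frame from earlier:
# fits <- fit_dose_models(antidotes)
# anova(fits$continuous)
# anova(fits$categorical)
# anova(fits$continuous, fits$categorical)  # compares the two nested models
```

A factor created inside the formula has no contrasts attribute of its own, so its coefficients use whatever `options("contrasts")` currently specifies (treatment contrasts by default). This changes how the individual coefficients are coded, but not the sequential ANOVA table from `anova()`.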

MCQ5: Which of the following statements about treating dose as continuous or categorical is incorrect?

[_] Treating dose as continuous reduces the residual mean squares, despite a slight increase in the residual SSQ.
[_] Treating dose as a factor results in much stronger evidence for its interaction with antidote.
[_] Treating dose as continuous saves four degrees of freedom in the model.
[_] Treating dose as a factor does not appreciably improve model fit.

Finally, and no MCQ for this one (!), let’s see what the interaction plot looks like if we treat dose as a factor.

TASK: Use emmip() from the emmeans package to create an interaction plot for the model that treats dose as a factor. Note there are two ways of plotting this that amount to the same thing. Can you figure out what the two ways of plotting are, and which one is better for comparing with the continuous dose version of the model?