Hello!

This is an RMarkdown document that I’ve created for you to help you through your mixed effects models - it allows me to embed code into a document so you can copy and paste it into your RStudio session and run it without needing me to be there to talk you through it 😀 It’s a very snazzy tool that you can use within R - so I’ve written this document using my RStudio window (and I’m playing with it’s functionality for teaching, so using you as a guineapig!)

Mixed effects models in R (or GLMMs)

So, first things first we will library all your important packages, and then import your data into R in the way that you’ve always done, using the file.choose() argument wrapped inside the read.csv() function…

library(tidyverse) # This has lots of your basic R functionality in - including ggplot2
library(lme4) # This is the package you use to run your mixed effects models
library(car) # This is the package you use to extract your p-values from your models with the Anova() function
library(cAIC4) # This package will help you in your stepwise model selection process - see later on in the document!
library(emmeans) # This is the package you use to run your pairwise comparisons

coviddata <- read.csv(file.choose(), header = TRUE, stringsAsFactors = TRUE, na.strings = "")

Check that all of your columns are the format that you’re expecting them to be, and don’t forget that number of Ocrelizumab infusions will be numeric rather than a factor, so we should convert that with the following code:

coviddata$number.of.Ocrelizumab.infusions <- as.factor(coviddata$number.of.Ocrelizumab.infusions)

Right, now we’ve got our data imported into R, and everything is the correct class, the fun can begin! We need to run what is known as a fully saturated model (i.e. a model that includes ALL fixed terms, all sensible interactions between these fixed terms, and our random factor of patient ID). We use the lmer() function within the lme4 package to do this, like so:

m1 <- lmer(Spike.Bau ~ number.of.Ocrelizumab.infusions*days.between.blood.test.and.last.infusion*infusion.after.vaccine*time.interval.between.covid.vaccine.and.infusion+(1|K.number), data = coviddata)

…and then you can check your AIC

AIC(m1)

Now we get to the clever bit - there is a nifty package called cAIC4 which will do your stepwise deletions and compare your AICs for you - remember that the model with the AIC closest to zero is the model that best explains why your data look as they do (although here we’re using something called a conditional AIC, which is a derivation of a traditional AIC often used specifically for mixed effects models for mathematical reasons that you don’t need to worry too much about.

So, you use the stepcAIC() function from within the cAIC4 package to do this stepwise deletion for you, and produce an output with the best fitting model and its cAIC, we do this as follows:

stepwise <- stepcAIC(m1, direction = "backward", returnResult = TRUE, trace = TRUE)

This looks a bit scary and it’s not obvious whats going on here, so I’ll explain what each of the bits is doing!

When you run this line of code, you should end up with an output that tells you the best fitting model. In my case (and yours!), we end up with the best fitting model m2. When we run m2, and ask the Anova() function to extract us some p-values, we can see some significance in our results!

m2 <- lmer(Spike.Bau ~ number.of.Ocrelizumab.infusions + days.between.blood.test.and.last.infusion + infusion.after.vaccine + time.interval.between.covid.vaccine.and.infusion + (1 | K.number), data = coviddata)

Anova(m2)

Note: you do have to add some bits to the model yourself to make it run properly - the output that stepcAIC gives you is not functional R code

Now we have the best fitting model (write it down, it’s important for you to mention both the above process and what the final model is in your results section!!), we can begin our pairwise comparisons, using the emmeans package. Generally, you would only do pairwise comparisons on significant results, but here we are specifically interested in what might be hiding in the Ocrelizumab treatment numbers factor in our data - it’s possible that a significant result might be being masked by lots of non-significance elsewhere, so we’ll investigate:

pairsm2 <- emmeans(m2, specs = "number.of.Ocrelizuman.infusions", adjust = "none")

This bit of code works like this:

Once you’ve run that, you can ask R to show you a table of the pairwise comparisons

pairs(pairsm2)

You should get a print-out of your results that looks a little like this…

… which contains all your test statistics and p-values and other useful stuff that you need to write about your results properly! You can either copy these into an excel document, or if you’re feeling clever, you can write this output directly to a .csv file to avoid faffing with copying and pasting R output (which is famously a bit of a pain!)

write.csv(pairs(pairsm2), file = "Ocrelizumab_infusions_pairs.csv")

Plotting your data

Now you’ve got your results in your table, you’ll want to produce plots to accompany them (note: it can be useful to produce plots first, but as we learned with the number of Ocrelizumab treatments, it can cause you to be tempted to ‘p-value fish’ in a way that is not statistical best-practise, so be careful of this!)

Here, we will use the package ggplot2 to create you some nice figures to accompany your GLMM results, and convince both yourself and your reader that your results are reflecting what you know to be true of your data. The “gg” in ggplot2 stands for “Grammar of Graphics”, and ggplot2 follows a defined set of grammatical (sort of) rules, some of which you’ll learn about below…

In order to get a basic boxplot of your data, we use the ggplot2 package, and the following code:

ggplot(aes(x = number.of.Ocrelizumab.infusions, y = Spike.Bau), data = coviddata)+
  geom_boxplot()

The way the code for this is structured is as follows:

As you can see, this isn’t a particularly pretty boxplot - we can improve it by adding new aesthetics, like this:

ggplot(aes(x = number.of.Ocrelizumab.infusions, y = Spike.Bau), data = coviddata)+
  geom_boxplot(aes(fill = number.of.Ocrelizumab.infusions))

Here, we’ve added a new aesthetic, or aes to our plot, and we are asking R to create a mapping or a link between the fill colour of the boxes and our variable for number of Ocrelizumab infusions (fill = number.of.Ocrelizumab.infusions).

That looks a bit nicer, although we’ve just found that none of the pairwise comparisons are significantly different, but this graph makes it look as though they are! We can sense-check ourselevs and our results by thinking about what the distribution of points within these boxes looks like. We can do that by adding an extra line of code to our plot…

ggplot(aes(x = number.of.Ocrelizumab.infusions, y = Spike.Bau), data = coviddata)+
  geom_boxplot(aes(fill = number.of.Ocrelizumab.infusions))+
  geom_point()

Notice that we put the geom_point() underneath the geom_boxplot() function - this is beacuse ggplot2 will plot things in the order that you tell it to - if we tell it to plot the points first, our graph looks like this:

ggplot(aes(x = number.of.Ocrelizumab.infusions, y = Spike.Bau), data = coviddata)+
  geom_point()+
  geom_boxplot(aes(fill = number.of.Ocrelizumab.infusions))

Going back to the plot above this last one, where we can actually see the points, we begin to realise that with only 4 points in our 2 Ocrelizumab treatments group, perhaps our initial boxplot might be a little misleading. We can either remove the boxplot all together, leaving only the points, or we can convert the boxplot to a violin plot to give us an idea of the density of the points in each group. We do this by simply swapping geom_boxplot(...) for geom_violin(...)

ggplot(aes(x = number.of.Ocrelizumab.infusions, y = Spike.Bau), data = coviddata)+
  geom_violin(aes(fill = number.of.Ocrelizumab.infusions))+
  geom_point()

This is much better, and helps us to understand why our pairwise comparisons are the way that they are. However, if we look at the results of our best fitting model (m2), we can see that whether a participant has had their vaccine before or after their infusion is significant, so this might be something that we might want to explore graphically. We will run similar code to before, but swap the fill aesthetic from the violins to the points, and tell R that we want the aes() to map the colour of the points to data about whether the patient has had an infusion before their covid vaccine. We do this like this:

ggplot(aes(x = number.of.Ocrelizumab.infusions, y = Spike.Bau), data = coviddata)+
  geom_violin()+
  geom_point(aes(colour = infusion.after.vaccine))

Notice that when we swapped our aes down into the geom_point() we also changed fill = ... to colour = .... This is just because the default point shape that R uses is a solid circle, so you’re changing the colour of a solid line, not the fill of an empty circle!

Now, we’ve got a graph that shows us something potentially quite useful. We could make it a little easier to understand though - the points are crowded on top of oneanother and a bit tricky to see… We will swap the geom_point(...) for geom_jitter() and see if we can improve it!

ggplot(aes(x = number.of.Ocrelizumab.infusions, y = Spike.Bau), data = coviddata)+
  geom_violin()+
  geom_jitter(aes(colour = infusion.after.vaccine), width = 0.1)

We’ve used geom_jitter() because it spreads the points that are on top of oneanother out slightly to make them easier to see. You can play with the width = 0.1 argument to edit how far the points spread. Although this is much more useful than before, it’s still not the best looking plot we could make it. We will play with the aes of the geom_jitter() to try and make it a little neater!

ggplot(aes(x = number.of.Ocrelizumab.infusions, y = Spike.Bau), data = coviddata)+
  geom_violin()+
  geom_jitter(aes(fill = infusion.after.vaccine), pch = 21, size = 2, colour = "black", alpha = 0.7, width = 0.1)

This looks a bit nicer, but what have we changed?

Now, go away and have fun playing with making your plot look pretty! I’ve had a quick go, and am challenging you to make yours look as nice as this one… 😀😀

All the stuff you need to be able to this is available here: http://www.sthda.com/english/wiki/ggplot2-essentials