Overview

In their 2019 Cell paper, Sharon et al claim that mice receiving gut microbiota via faecal transplants from autistic donors show increased “autism-like” behaviours. Several commentators (including myself) had concerns about the inferences drawn from the data, which I summarised in a Medium post.

However, a re-analysis by professor of biostatistics Thomas Lumley goes further, suggesting that there were errors in the SPSS syntax used to analyse the data. Specifically, mice receiving faecal transplants from the same donor were treated as if they were independent. In other words, the numerical effect sizes were evaluated as if there had been >200 donors when in fact there were between 8. Lumley has since summarised his blogpost on PubPeer.

Here, my aim is to extend Lumley’s critique to other analyses reported in the paper. I create two functions based on Lumley’s code.

The first (SharonAnalysis) attempts to replicate as closely as possible Sharon’s SPSS analysis in R. It treats each datapoint incorrectly as being independent. Across all the measures, there is qualitative agreement with the results reported by Sharon et al.

The second function (LumleyAnalysis) provides an alternative analysis that correctly treats Donor as a random factor. In other words, it takes into account the fact that multiple mice had the same donor. Now, all but one of the effects disappears. This is the case, whether we consider the whole available dataset or just the data from the 8 donors featured in Figure 1.

Datasets

Lumley’s re-analyses focus on Figure 1 of the paper which presents data from 8 donors (5 autistic children and 3 neurotypical controls) on three tasks. This dataset is available for upload from the Mendeley repository.

data_8<-read.csv("https://data.mendeley.com/datasets/ngzmj4zkms/1/files/fb7927df-a1ea-4257-bf2f-33fecc4090e9/Fig1EFGH_subset8.csv?dl=1")

However, these 8 donors were a subset of a larger sample of 16. There were also two further tasks completed by at least some of the mice. The full data was presented in Supplementary Figure 2 and is also available from Mendeley.

data_16<-read.csv("https://data.mendeley.com/datasets/ngzmj4zkms/1/files/1d419e86-f908-455e-b32b-9dc8672377b9/FigS2ABCDE_cyc1_3.csv?dl=1")

Libraries

We’ll make use of the following R libraries. I’m adding dplyr to those used by Lumley because itss filter command is far more intuitive than base R.

library(beeswarm)
library(lmerTest)
library(pbkrtest)
library(nlme)
library(dplyr)
library(knitr)

Functions

BeePlot

Here we’re borrowing the code from Lumley’s analysis and turning it into a function.

BeePlot <- function (DV, Data=data_8) {
  Data$DV <- Data[[DV]]
  beeswarm(DV~ASD_diagnosis,data=Data,method="center",pch=19,pwcol=Donor)
}

SharonAnalysis

To replicate Sharon et al.’s SPSS analysis as closely as possible in R, Lumley used the gls from the nlme package. The model (incorrectly) treats each data point as independent.

Here’s an example of the code from Lumley’s post.

cell$ASD_sum<-C(cell$ASD_diagnosis,"contr.sum")
cell$Gender_sum<-C(cell$Gender,"contr.sum")
marbles_gls <- gls(MB_buried~ASD_sum*Gender_sum, weights=varIdent(form=~1|Donor),data=cell)

I’ve turned this into a function that can be run on any dependent variable and dataset.

Also note that the default in this function is for Gender to be included in the model. However, for ultrasonic vocalizations (see below) only male mice were tested so we need the flexibility to remove Gender from the model.

SharonAnalysis <- function (DV, Data=data_8, incGender=TRUE) {
  Data$DV <- Data[[DV]]
  Data <- filter(Data, is.na(DV)==FALSE)
  Data$ASD_sum<-C(Data$ASD_diagnosis,"contr.sum")
  Data$Gender_sum<-C(Data$Gender,"contr.sum")
  if (incGender==TRUE) {
    myGls <- gls(DV~ASD_sum*Gender_sum, weights=varIdent(form=~1|Donor),data=Data)
  } else {
    myGls <- gls(DV~ASD_sum, weights=varIdent(form=~1|Donor),data=Data)
  }
  myAnova <- anova(myGls, type="marginal")
  kable(myAnova, digits=3)
}

LumleyAnalysis

Lumley also re-analyses the data using lmer in the lmerTest package. He includes Donor as a random factor, which captures the non-independence of mice receiving faecal transplants from the same donor. Eventually, he arrives at the following model:

marbles<-lmer(MB_buried~ASD_sum*Gender_sum+(1|Round)+(1|Donor),data=cell)

Note that the model also includes Round as a random factor. Round I think refers to the round of testing and was included because he was attempting to match the description of the analysis in the paper. However, given that each mouse only participated in one Round, I don’t think it makes sense to include it as a random factor. I could be wrong on this, though!

Again, I’ve turned this into a function that we can use for different datasets and dependent variables.

LumleyAnalysis <- function (DV, Data=data_8, incGender=TRUE) {
  Data$DV <- Data[[DV]]
  if (incGender==TRUE) {
    myLmer <- lmer(DV~ASD_diagnosis*Gender+(1|Donor),data=Data)
  } else {
    myLmer <- lmer(DV~ASD_diagnosis+(1|Donor),data=Data)
  }
  myAnova <- anova(myLmer, ddf="Kenward-Roger")
  kable(myAnova, digits=3)  
}

Marble-burying

First of all, the marble-burying test. This provided the most compelling evidence for group differences. Here’s the data from 8 donors.

BeePlot ("MB_buried")

Sharon et al. reported a significant effect of donor diagnosis, which we can get close to replicating.

SharonAnalysis ("MB_buried")
numDF F-value p-value
(Intercept) 1 515.682 0.000
ASD_sum 1 20.476 0.000
Gender_sum 1 5.790 0.017
ASD_sum:Gender_sum 1 3.776 0.053

Under Lumley’s analysis of the same data, including Donor as a random factor, there is still a significant effect of diagnostic group. But it’s only just significant.

LumleyAnalysis ("MB_buried")
Sum Sq Mean Sq NumDF DenDF F value Pr(>F)
ASD_diagnosis 166.916 166.916 1 5.277 8.338 0.032
Gender 94.329 94.329 1 200.378 4.712 0.031
ASD_diagnosis:Gender 58.872 58.872 1 200.378 2.941 0.088

Repeating this analysis on the larger dataset with 16 donors, the group effect is still significant. Note that here we are comparing across three groups of donors - ASD, Mild ASD, and Neurotypical.

BeePlot ("MB_buried", Data=data_16)

LumleyAnalysis ("MB_buried", Data=data_16)
Sum Sq Mean Sq NumDF DenDF F value Pr(>F)
ASD_diagnosis 159.250 79.625 2 13.558 4.080 0.041
Gender 64.998 64.998 1 322.698 3.331 0.069
ASD_diagnosis:Gender 60.719 30.359 2 323.141 1.556 0.213

Direct Social Interaction Test

Sharon et al reported that mice with autistic donors showed reduced sociability on the Direct Social Interaction test. Here are their data.

BeePlot ("DSI_Social_duration")

Using Lumley’s approximation of Sharon et al.’s analysis, we do achieve a significant effect of group.

SharonAnalysis ("DSI_Social_duration")
numDF F-value p-value
(Intercept) 1 385.086 0.000
ASD_sum 1 5.854 0.017
Gender_sum 1 54.050 0.000
ASD_sum:Gender_sum 1 3.024 0.085

Here, however, is Lumley’s analysis replicated. With donor included in the model, there is no suggestion of a difference between mice with autistic and non-autistic donors.

LumleyAnalysis ("DSI_Social_duration")
Sum Sq Mean Sq NumDF DenDF F value Pr(>F)
ASD_diagnosis 999.073 999.073 1 5.935 2.692 0.152
Gender 18364.834 18364.834 1 120.364 49.488 0.000
ASD_diagnosis:Gender 339.476 339.476 1 120.364 0.915 0.341

Sharon et al. only tested mice from the 8 donors so there is no larger dataset to investigate here.

Open field test

Sharon et al reported that mice with autistic donors showed reduced locomotion (i.e., travelled less distance) during a 10 minute session in a large empty box. Here are the data for the 8 donors.

BeePlot("OFT_Distance")

Under Sharon et al.’s analysis, treating each mouse as an independent data point, there is a significant effect of ASD diagnosis

SharonAnalysis("OFT_Distance")
numDF F-value p-value
(Intercept) 1 4336.040 0.000
ASD_sum 1 15.345 0.000
Gender_sum 1 2.378 0.125
ASD_sum:Gender_sum 1 1.871 0.173

However, this disappears under Lumley’s re-analysis.

LumleyAnalysis("OFT_Distance")
Sum Sq Mean Sq NumDF DenDF F value Pr(>F)
ASD_diagnosis 1103717.52 1103717.52 1 5.820 1.265 0.305
Gender 648794.07 648794.07 1 197.444 0.744 0.390
ASD_diagnosis:Gender 20826.11 20826.11 1 197.444 0.024 0.877

And repeated with the full 16 participants. Again, no effect of group.

BeePlot("OFT_Distance", Data=data_16)

LumleyAnalysis("OFT_Distance", Data=data_16)
Sum Sq Mean Sq NumDF DenDF F value Pr(>F)
ASD_diagnosis 270645.5 135322.8 2 13.369 0.153 0.859
Gender 2945604.7 2945604.7 1 318.885 3.335 0.069
ASD_diagnosis:Gender 938860.7 469430.4 2 318.925 0.531 0.588

Sharon et al reported that there was a difference between the ASD group and the NT group, ignoring the Mild ASD cases. We can test that too. But again, we find no difference.

LumleyAnalysis("OFT_Distance", Data=filter(data_16,ASD_diagnosis!="Mild ASD"))
Sum Sq Mean Sq NumDF DenDF F value Pr(>F)
ASD_diagnosis 193937.3 193937.3 1 11.027 0.229 0.642
Gender 4033524.6 4033524.6 1 277.298 4.761 0.030
ASD_diagnosis:Gender 812867.1 812867.1 1 277.298 0.959 0.328

Ultrasonic vocalizations

Now moving onto the additional measures that Lumley did not look at. First, ultrasonic vocalizations. Sharon et al reported a significant group difference between mice with ASD and NT donors, excluding those whose donors had “Mild ASD”. But they didn’t include it in Figure 1, presumably because the donors were not the same as those in the other analyses.

BeePlot("USV_Duration", Data=data_16)

Here’s our approximation of Sharon et al.’s analysis, treating each mouse as independent, excluding mice with Mild ASD donors. Note that mouse gender is not included in the model because only male mice were tested.

SharonAnalysis("USV_Duration", Data=filter(data_16,ASD_diagnosis!="Mild ASD"), incGender=FALSE)
## Warning: contrasts dropped from factor ASD_sum due to missing levels
numDF F-value p-value
(Intercept) 1 49.058 0.000
ASD_sum 1 8.358 0.005

However, when we conduct the analysis with Donor as a random factor, we find no effect.

LumleyAnalysis("USV_Duration", Data=filter(data_16,ASD_diagnosis!="Mild ASD"), incGender=FALSE)
Sum Sq Mean Sq NumDF DenDF F value Pr(>F)
ASD_diagnosis 171.012 171.012 1 6.939 0.938 0.365

There’s also no effect if we make the three-way comparison between ASD, Mild ASD, and NT.

LumleyAnalysis("USV_Duration", Data=data_16, incGender=FALSE)
Sum Sq Mean Sq NumDF DenDF F value Pr(>F)
ASD_diagnosis 229.392 114.696 2 8.915 0.564 0.588

Three chambers test

Finally, we come to the Three Chambers test - a widely used assessment of sociability in mice. Here is the data for the 8 donors.

BeePlot("SI_index")

Sharon et al chose not to feature these data in Figure 1, presumably because even in their analysis, the group difference was not significant.

SharonAnalysis("SI_index")
numDF F-value p-value
(Intercept) 1 54.768 0.000
ASD_sum 1 0.829 0.364
Gender_sum 1 4.547 0.034
ASD_sum:Gender_sum 1 0.095 0.758

It seems almost pointless at this stage. But for completeness, here is Lumley’s analysis applied to the three chambers test. With 8 donors, the group difference is non-significant.

LumleyAnalysis("SI_index", Data=data_8)
## boundary (singular) fit: see ?isSingular
Sum Sq Mean Sq NumDF DenDF F value Pr(>F)
ASD_diagnosis 41.907 41.907 1 4.469 0.578 0.485
Gender 416.914 416.914 1 199.986 5.750 0.017
ASD_diagnosis:Gender 23.188 23.188 1 199.986 0.320 0.572

We can also do this for the full dataset, again showing no group difference.

LumleyAnalysis("SI_index", Data=data_16)
## boundary (singular) fit: see ?isSingular
Sum Sq Mean Sq NumDF DenDF F value Pr(>F)
ASD_diagnosis 340.183 170.092 2 12.848 1.457 0.269
Gender 3.903 3.903 1 324.599 0.034 0.855
ASD_diagnosis:Gender 23.454 11.727 2 325.315 0.101 0.904