Introduction

Recent statistical work within the discipline of neuroscience has been insufficient and lacking in well reasoned assumptions that are otherwise necessary for causal inference and reproducibility. In particular, where cluster data are present they must be analyzed as such, given that multiple measures within a subject will be auto-correlated. To demonstrate the detriments of using naïve measures alone, Moen et al. published an article detailing the comparative approaches of statistical methods within a clustered dataset. In mice models of soma size both pten knockout (within mice) and fatty acid (among mice) exposures were studied and analyzed using a variety of model approaches. Presented here is an attempt to reconstruct findings from the original publication, particularly Tables 3 through 6 and the interclass correlation (ICC) calculation.1

Methods

To reproduce both Tables 3 through 6 and the ICC, data from the original publication was downloaded from GitHub in the form of a CSV file, as made available by the authors. The file was read into R version 3.3.2 using the readr package and variables were coerced to appropriate types using the magrittr and dplyr packages.2, 3, 4, 5 Similarly, subsetting of the experiment data was achieved using the magrittr and dplyr packages, with different models requiring varied subsetting and summary.6, 7 All syntax used for data manipulation is available on GitHub via the following link, https://github.com/schifferl/assignment4.

It was necessary to consider five model approaches in order to reproduce the results of Moen et al., the details of which are as follows.

First, a unweighted linear regression model, as shown in equation 1, was used to reconstruct the completely naïve models. Such models assumed no weighting and complete independence of repeated measures. This type of model was implemented in R using the stats package.8

\[Y_{i,j}=\beta_0+\beta_1X_{i,j}+\epsilon_{i,j}\]

Equation 1 - Unweighted Linear Regression Model

Second, a weighted linear regression model, as shown in Equation 2, was used to reconstruct the naïve models that considered analytic weights. Such models assumed that the mean values of the variables sufficiently captured the variance of the repeated measures. This type of model was implemented in R using the stats package.9

\[Y_i=\beta_0\bar{X_i}+\bar\epsilon_i\]

Equation 2 - Weighted Linear Regression Model

Third, a marginal regression model, as shown in Equation 3, was used to reconstruct the models that recognized the data as multilevel and collapsed over the margins of the levels. Such models, recognized the hierarchical nature of the data and retained inferential power by borrowing strength across the multiple tiers. This type of model was implemented in R using the gee package.10

\[E[Y_{i,j}|X_{i,j}]=\beta_0+\beta_1X_{i,j}\]

Equation 3 - Marginal Regression Model

Fourth, a fixed-effects regression model, as shown in Equation 4, was used to reconstruct the models of fixed-effects. Such models assumed that effect estimates were related to an underlying normal distribution of possible effect estimates. This type of model was implemented in R using the stats package.11

\[Y_{i,j}=\beta_0+\beta_1ind_{i,1}+...+\beta_Kind_{i,K}+\beta_{K+1}X_{i,j}+\epsilon_{i,j}\]

Equation 4 - Fixed-Effects Regression Model

Fifth, a mixed-effects regression model, as shown in Equation 5, was used to reconstruct the models of mixed-effects. Such models assumed that outcomes were best estimated by combining fixed and normally distributed random effects, with normally distributed fixed residuals. This type of model was implemented in R using the lme4 and lmerTest packages.12, 13

\[Y_{i,j}=\beta_0+\beta_1X_{i,j}+\theta_i+\epsilon_{i,j}\]

Equation 5 - Mixed-Effects Regression Model

Additionally, in order to account for differences in methods between Stata, the software of the original publication, and R it was necessary to consider the Huber Sandwich Estimator, as seen in Equation 6. 14

\[\sum^n_{i=1}g_i(Y_i|\hat\theta)^Tg_i(Y_i|\hat\theta)\]

Equation 6 - Huber Sandwich Estimator

From the Huber Sandwich Estimator, robust standard errors could be estimated by computing the roots of the diagonals of the variance matrix, as shown in Equation 7.

\[\hat{V}=(A)^{-1}B(A)^{-1}\]

Equation 7 - Huber Sandwich Estimator Variance Matrix

Finally, three custom functions using the knitr package were written to abstract the results of model objects and statistical tests into tables.15, 16, 17 Methods used to display the table of type 3 ICC drew upon ICC methods written in the psych package.18

Results

As shown in Table 1, it was possible to reproduce the results of fatty acid exposure on soma size from different regression models almost without any deviation from the published values. However, it is noted that the standard errors of marginal and mixed-effect regression did not match the published values, being slightly smaller and larger, respectively.

Coefficient Standard Error P-Value 95% CI Lower Limit 95% CI Upper Limit
Neuron-level linear regression 3.151 1.905 0.099 -0.598 6.900
Mouse-level regression (mean); no weighting 0.612 6.376 0.926 -14.466 15.690
Mouse-level regression (mean); analytic weights 3.151 4.731 0.527 -8.037 14.339
Marginal regression 2.310 4.580 0.614 -6.667 11.287
Mixed-effect regression 1.359 5.768 0.822 -9.818 12.319

Table 1 - Reproduced results of fatty acid exposure on soma size from different regression models (Table 3)

As shown in Table 2, it was possible to reproduce the results of Pten knockdown effect on soma size from different regression models almost without any deviation from the published values. However, it is noted that the standard errors of marginal and mixed-effect regression did not match the published values, being slightly smaller in either case.

Coefficient Standard Error P-Value 95% CI Lower Limit 95% CI Upper Limit
Neuron-level linear regression 11.039 1.976 < 0.001 7.154 14.924
Marginal regression 11.511 2.196 < 0.001 7.206 15.816
Fixed-effect regression 11.540 1.862 < 0.001 7.880 15.200
Mixed-effect regression 11.509 1.859 < 0.001 7.855 15.147
Mixed effect regression with Pten as an additional predictor 11.540 1.862 < 0.001 7.888 15.192

Table 2 - Reproduced results of Pten knockdown effect on soma size from different regression models (Table 4)

As shown in Table 3, it was possible to reproduce the results of (Pten) knockdown effect on soma size from different regression models almost without any deviation from the published values, to the exception of the mixed-effect regression estimate. Additionally, it is noted that the standard errors of marginal and mixed-effect regression did not match the published values, being slightly smaller and larger, respectively.

Coefficient Standard Error P-Value 95% CI Lower Limit 95% CI Upper Limit
Mouse-level regression (Pten); weighting -0.239 0.666 0.744 -2.359 1.881
Marginal regression -0.110 0.177 0.536 -0.457 0.238
Mixed-effect regression 0.011 0.381 0.978 -0.700 0.733
Mixed-effect regression with Pten as an additional predictor -0.102 0.378 0.799 -0.808 0.614

Table 3 - Reproduced results of (Pten) knockdown effect on soma size from different regression models (Table 5)

As shown in Table 4, it was only possible to reproduce the of interaction between Pten knockdown and fatty acid exposure models to a limited extent, with the coefficient for marginal regression alone attaining the published value. Additionally, it is noted that it was not possible to reproduce any of the published standard errors.

Coefficient Standard Error P-Value 95% CI Lower Limit 95% CI Upper Limit
Marginal regression 10.380 4.830 0.032 0.914 19.846
Fixed-effect regression 9.981 3.042 0.001 4.009 15.954
Mixed-effect regression 10.137 3.041 < 0.001 4.219 16.146

Table 4 - Reproduced results of interaction between Pten knockdown and fatty acid exposure models (Table 6)

Finally, it was possible to reproduce the calculated intraclass correlation coefficient measure using type 3 ICC, as shown in Table 5. While the publication had a rounded value of 0.2 and no further information on confidence, the estimate shown below, when rounded, would be equal to the published value.

Type ICC F DF1 DF2 P-Value 95% CI Lower Limit 95% CI Upper Limit
ICC3 0.172 1.416 13.000 13.000 0.270 -0.375 0.630

Table 5 - Reproduced intraclass correlation coefficient (ICC)

Discussion

In general, the results of the original publication were found to be reproducible, at least from a methodological standpoint if not from an accuracy one. Model specification from the Moen et al. publication was sufficiently clear so as to be replicated and data was provided along with some additional information regarding software, a step above and beyond what is traditionally done. These points aside, there was extensive difficulty in replicating robust standard errors and coefficient estimates of marginal and mixed-effects models. The issue is perhaps more related to implementation of statistical methods across software and not specific to this analysis alone, yet should be a principle concern of the original publication as a major aim was to provide an analysis that was reproducible. Regardless, the Moen et al. analysis was important in proving the shortcomings of recent work within the discipline of neuroscience in regards to statistical methods. Not only did it illuminate the necessity to treat clustered data properly but illustrated that reproducibility is not only a technical question but a methodological concern as well.

References


  1. Moen, E. L., Fricano-Kugler, C. J., Luikart, B. W. & O’Malley, A. J. Analyzing Clustered Data: Why and How to Account for Multiple Observations Nested within a Study Participant? PLOS ONE 11, e0146721 (2016).

  2. R Core Team (2016). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL https://www.R-project.org/.

  3. Hadley Wickham, Jim Hester and Romain Francois (2016). readr: Read Tabular Data. R package version 1.0.0. https://CRAN.R-project.org/package=readr

  4. Stefan Milton Bache and Hadley Wickham (2014). magrittr: A Forward-Pipe Operator for R. R package version 1.5. https://CRAN.R-project.org/package=magrittr

  5. Hadley Wickham and Romain Francois (2016). dplyr: A Grammar of Data Manipulation. R package version 0.5.0. https://CRAN.R-project.org/package=dplyr

  6. Stefan Milton Bache and Hadley Wickham (2014). magrittr: A Forward-Pipe Operator for R. R package version 1.5. https://CRAN.R-project.org/package=magrittr

  7. Hadley Wickham and Romain Francois (2016). dplyr: A Grammar of Data Manipulation. R package version 0.5.0. https://CRAN.R-project.org/package=dplyr

  8. R Core Team (2016). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL https://www.R-project.org/.

  9. R Core Team (2016). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL https://www.R-project.org/.

  10. Vincent J Carey. Ported to R by Thomas Lumley and Brian Ripley. Note that maintainers are not available to give advice on using a package they did not author. (2015). gee: Generalized Estimation Equation Solver. R package version 4.13-19. https://CRAN.R-project.org/package=gee

  11. R Core Team (2016). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL https://www.R-project.org/.

  12. Douglas Bates, Martin Maechler, Ben Bolker, Steve Walker (2015). Fitting Linear Mixed-Effects Models Using lme4. Journal of Statistical Software, 67(1), 1-48. doi:10.18637/jss.v067.i01.

  13. Alexandra Kuznetsova, Per Bruun Brockhoff and Rune Haubo Bojesen Christensen (2016). lmerTest: Tests in Linear Mixed Effects Models. R package version 2.0-33. https://CRAN.R-project.org/package=lmerTest

  14. Freedman, D. A. On The So-Called ‘Huber Sandwich Estimator’ and ‘Robust Standard Errors’. The American Statistician 60, 299–302 (2006).

  15. Yihui Xie (2016). knitr: A General-Purpose Package for Dynamic Report Generation in R. R package version 1.15.1.

  16. Yihui Xie (2015) Dynamic Documents with R and knitr. 2nd edition. Chapman and Hall/CRC. ISBN 978-1498716963

  17. Yihui Xie (2014) knitr: A Comprehensive Tool for Reproducible Research in R. In Victoria Stodden, Friedrich Leisch and Roger D. Peng, editors, Implementing Reproducible Computational Research. Chapman and Hall/CRC. ISBN 978-1466561595

  18. Revelle, W. (2016) psych: Procedures for Personality and Psychological Research, Northwestern University, Evanston, Illinois, USA, https://CRAN.R-project.org/package=psych Version = 1.6.9.