Recent statistical work within the discipline of neuroscience has been insufficient and lacking in well reasoned assumptions that are otherwise necessary for causal inference and reproducibility. In particular, where cluster data are present they must be analyzed as such, given that multiple measures within a subject will be auto-correlated. To demonstrate the detriments of using naïve measures alone, Moen et al. published an article detailing the comparative approaches of statistical methods within a clustered dataset. In mice models of soma size both pten knockout (within mice) and fatty acid (among mice) exposures were studied and analyzed using a variety of model approaches. Presented here is an attempt to reconstruct findings from the original publication, particularly Tables 3 through 6 and the interclass correlation (ICC) calculation.1
To reproduce both Tables 3 through 6 and the ICC, data from the original publication was downloaded from GitHub in the form of a CSV file, as made available by the authors. The file was read into R version 3.3.2 using the readr package and variables were coerced to appropriate types using the magrittr and dplyr packages.2, 3, 4, 5 Similarly, subsetting of the experiment data was achieved using the magrittr and dplyr packages, with different models requiring varied subsetting and summary.6, 7 All syntax used for data manipulation is available on GitHub via the following link, https://github.com/schifferl/assignment4.
It was necessary to consider five model approaches in order to reproduce the results of Moen et al., the details of which are as follows.
First, a unweighted linear regression model, as shown in equation 1, was used to reconstruct the completely naïve models. Such models assumed no weighting and complete independence of repeated measures. This type of model was implemented in R using the stats package.8
\[Y_{i,j}=\beta_0+\beta_1X_{i,j}+\epsilon_{i,j}\]
Equation 1 - Unweighted Linear Regression Model
Second, a weighted linear regression model, as shown in Equation 2, was used to reconstruct the naïve models that considered analytic weights. Such models assumed that the mean values of the variables sufficiently captured the variance of the repeated measures. This type of model was implemented in R using the stats package.9
\[Y_i=\beta_0\bar{X_i}+\bar\epsilon_i\]
Equation 2 - Weighted Linear Regression Model
Third, a marginal regression model, as shown in Equation 3, was used to reconstruct the models that recognized the data as multilevel and collapsed over the margins of the levels. Such models, recognized the hierarchical nature of the data and retained inferential power by borrowing strength across the multiple tiers. This type of model was implemented in R using the gee package.10
\[E[Y_{i,j}|X_{i,j}]=\beta_0+\beta_1X_{i,j}\]
Equation 3 - Marginal Regression Model
Fourth, a fixed-effects regression model, as shown in Equation 4, was used to reconstruct the models of fixed-effects. Such models assumed that effect estimates were related to an underlying normal distribution of possible effect estimates. This type of model was implemented in R using the stats package.11
\[Y_{i,j}=\beta_0+\beta_1ind_{i,1}+...+\beta_Kind_{i,K}+\beta_{K+1}X_{i,j}+\epsilon_{i,j}\]
Equation 4 - Fixed-Effects Regression Model
Fifth, a mixed-effects regression model, as shown in Equation 5, was used to reconstruct the models of mixed-effects. Such models assumed that outcomes were best estimated by combining fixed and normally distributed random effects, with normally distributed fixed residuals. This type of model was implemented in R using the lme4 and lmerTest packages.12, 13
\[Y_{i,j}=\beta_0+\beta_1X_{i,j}+\theta_i+\epsilon_{i,j}\]
Equation 5 - Mixed-Effects Regression Model
Additionally, in order to account for differences in methods between Stata, the software of the original publication, and R it was necessary to consider the Huber Sandwich Estimator, as seen in Equation 6. 14
\[\sum^n_{i=1}g_i(Y_i|\hat\theta)^Tg_i(Y_i|\hat\theta)\]
Equation 6 - Huber Sandwich Estimator
From the Huber Sandwich Estimator, robust standard errors could be estimated by computing the roots of the diagonals of the variance matrix, as shown in Equation 7.
\[\hat{V}=(A)^{-1}B(A)^{-1}\]
Equation 7 - Huber Sandwich Estimator Variance Matrix
Finally, three custom functions using the knitr package were written to abstract the results of model objects and statistical tests into tables.15, 16, 17 Methods used to display the table of type 3 ICC drew upon ICC methods written in the psych package.18
As shown in Table 1, it was possible to reproduce the results of fatty acid exposure on soma size from different regression models almost without any deviation from the published values. However, it is noted that the standard errors of marginal and mixed-effect regression did not match the published values, being slightly smaller and larger, respectively.
| Coefficient | Standard Error | P-Value | 95% CI Lower Limit | 95% CI Upper Limit | |
|---|---|---|---|---|---|
| Neuron-level linear regression | 3.151 | 1.905 | 0.099 | -0.598 | 6.900 |
| Mouse-level regression (mean); no weighting | 0.612 | 6.376 | 0.926 | -14.466 | 15.690 |
| Mouse-level regression (mean); analytic weights | 3.151 | 4.731 | 0.527 | -8.037 | 14.339 |
| Marginal regression | 2.310 | 4.580 | 0.614 | -6.667 | 11.287 |
| Mixed-effect regression | 1.359 | 5.768 | 0.822 | -9.818 | 12.319 |
Table 1 - Reproduced results of fatty acid exposure on soma size from different regression models (Table 3)
As shown in Table 2, it was possible to reproduce the results of Pten knockdown effect on soma size from different regression models almost without any deviation from the published values. However, it is noted that the standard errors of marginal and mixed-effect regression did not match the published values, being slightly smaller in either case.
| Coefficient | Standard Error | P-Value | 95% CI Lower Limit | 95% CI Upper Limit | |
|---|---|---|---|---|---|
| Neuron-level linear regression | 11.039 | 1.976 | < 0.001 | 7.154 | 14.924 |
| Marginal regression | 11.511 | 2.196 | < 0.001 | 7.206 | 15.816 |
| Fixed-effect regression | 11.540 | 1.862 | < 0.001 | 7.880 | 15.200 |
| Mixed-effect regression | 11.509 | 1.859 | < 0.001 | 7.855 | 15.147 |
| Mixed effect regression with Pten as an additional predictor | 11.540 | 1.862 | < 0.001 | 7.888 | 15.192 |
Table 2 - Reproduced results of Pten knockdown effect on soma size from different regression models (Table 4)
As shown in Table 3, it was possible to reproduce the results of (Pten) knockdown effect on soma size from different regression models almost without any deviation from the published values, to the exception of the mixed-effect regression estimate. Additionally, it is noted that the standard errors of marginal and mixed-effect regression did not match the published values, being slightly smaller and larger, respectively.
| Coefficient | Standard Error | P-Value | 95% CI Lower Limit | 95% CI Upper Limit | |
|---|---|---|---|---|---|
| Mouse-level regression (Pten); weighting | -0.239 | 0.666 | 0.744 | -2.359 | 1.881 |
| Marginal regression | -0.110 | 0.177 | 0.536 | -0.457 | 0.238 |
| Mixed-effect regression | 0.011 | 0.381 | 0.978 | -0.700 | 0.733 |
| Mixed-effect regression with Pten as an additional predictor | -0.102 | 0.378 | 0.799 | -0.808 | 0.614 |
Table 3 - Reproduced results of (Pten) knockdown effect on soma size from different regression models (Table 5)
As shown in Table 4, it was only possible to reproduce the of interaction between Pten knockdown and fatty acid exposure models to a limited extent, with the coefficient for marginal regression alone attaining the published value. Additionally, it is noted that it was not possible to reproduce any of the published standard errors.
| Coefficient | Standard Error | P-Value | 95% CI Lower Limit | 95% CI Upper Limit | |
|---|---|---|---|---|---|
| Marginal regression | 10.380 | 4.830 | 0.032 | 0.914 | 19.846 |
| Fixed-effect regression | 9.981 | 3.042 | 0.001 | 4.009 | 15.954 |
| Mixed-effect regression | 10.137 | 3.041 | < 0.001 | 4.219 | 16.146 |
Table 4 - Reproduced results of interaction between Pten knockdown and fatty acid exposure models (Table 6)
Finally, it was possible to reproduce the calculated intraclass correlation coefficient measure using type 3 ICC, as shown in Table 5. While the publication had a rounded value of 0.2 and no further information on confidence, the estimate shown below, when rounded, would be equal to the published value.
| Type | ICC | F | DF1 | DF2 | P-Value | 95% CI Lower Limit | 95% CI Upper Limit |
|---|---|---|---|---|---|---|---|
| ICC3 | 0.172 | 1.416 | 13.000 | 13.000 | 0.270 | -0.375 | 0.630 |
Table 5 - Reproduced intraclass correlation coefficient (ICC)
In general, the results of the original publication were found to be reproducible, at least from a methodological standpoint if not from an accuracy one. Model specification from the Moen et al. publication was sufficiently clear so as to be replicated and data was provided along with some additional information regarding software, a step above and beyond what is traditionally done. These points aside, there was extensive difficulty in replicating robust standard errors and coefficient estimates of marginal and mixed-effects models. The issue is perhaps more related to implementation of statistical methods across software and not specific to this analysis alone, yet should be a principle concern of the original publication as a major aim was to provide an analysis that was reproducible. Regardless, the Moen et al. analysis was important in proving the shortcomings of recent work within the discipline of neuroscience in regards to statistical methods. Not only did it illuminate the necessity to treat clustered data properly but illustrated that reproducibility is not only a technical question but a methodological concern as well.
Moen, E. L., Fricano-Kugler, C. J., Luikart, B. W. & O’Malley, A. J. Analyzing Clustered Data: Why and How to Account for Multiple Observations Nested within a Study Participant? PLOS ONE 11, e0146721 (2016).↩
R Core Team (2016). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL https://www.R-project.org/.↩
Hadley Wickham, Jim Hester and Romain Francois (2016). readr: Read Tabular Data. R package version 1.0.0. https://CRAN.R-project.org/package=readr↩
Stefan Milton Bache and Hadley Wickham (2014). magrittr: A Forward-Pipe Operator for R. R package version 1.5. https://CRAN.R-project.org/package=magrittr↩
Hadley Wickham and Romain Francois (2016). dplyr: A Grammar of Data Manipulation. R package version 0.5.0. https://CRAN.R-project.org/package=dplyr↩
Stefan Milton Bache and Hadley Wickham (2014). magrittr: A Forward-Pipe Operator for R. R package version 1.5. https://CRAN.R-project.org/package=magrittr↩
Hadley Wickham and Romain Francois (2016). dplyr: A Grammar of Data Manipulation. R package version 0.5.0. https://CRAN.R-project.org/package=dplyr↩
R Core Team (2016). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL https://www.R-project.org/.↩
R Core Team (2016). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL https://www.R-project.org/.↩
Vincent J Carey. Ported to R by Thomas Lumley and Brian Ripley. Note that maintainers are not available to give advice on using a package they did not author. (2015). gee: Generalized Estimation Equation Solver. R package version 4.13-19. https://CRAN.R-project.org/package=gee↩
R Core Team (2016). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL https://www.R-project.org/.↩
Douglas Bates, Martin Maechler, Ben Bolker, Steve Walker (2015). Fitting Linear Mixed-Effects Models Using lme4. Journal of Statistical Software, 67(1), 1-48. doi:10.18637/jss.v067.i01.↩
Alexandra Kuznetsova, Per Bruun Brockhoff and Rune Haubo Bojesen Christensen (2016). lmerTest: Tests in Linear Mixed Effects Models. R package version 2.0-33. https://CRAN.R-project.org/package=lmerTest↩
Freedman, D. A. On The So-Called ‘Huber Sandwich Estimator’ and ‘Robust Standard Errors’. The American Statistician 60, 299–302 (2006).↩
Yihui Xie (2016). knitr: A General-Purpose Package for Dynamic Report Generation in R. R package version 1.15.1.↩
Yihui Xie (2015) Dynamic Documents with R and knitr. 2nd edition. Chapman and Hall/CRC. ISBN 978-1498716963↩
Yihui Xie (2014) knitr: A Comprehensive Tool for Reproducible Research in R. In Victoria Stodden, Friedrich Leisch and Roger D. Peng, editors, Implementing Reproducible Computational Research. Chapman and Hall/CRC. ISBN 978-1466561595↩
Revelle, W. (2016) psych: Procedures for Personality and Psychological Research, Northwestern University, Evanston, Illinois, USA, https://CRAN.R-project.org/package=psych Version = 1.6.9.↩