suppressPackageStartupMessages({
  library(tidyverse)
  library(ggplot2)
  library(gridExtra)
})

data <- read.csv("~/Downloads/clustered_data_code-master/PtenAnalysisData.csv") %>%
mutate(mouseid = factor(mouseid, levels = c("0", "1", "2", "3", "4", "5")))
data.new2 <- data[data$fa == "0",]

1 Introduction

This assignment is a reproduced data analysis of a published dataset: “Analyzing Clustered Data: Why and How to Account for Multiple Observations Nested within a Study Participant?” by Moen EL, Fricano-Kugler CJ, Luikart BW, O’Malley AJ (2016).

The hypothesized analysis of study tested the effect of Pten knockdown on the soma size of neurons within the brain. Multiple regression models are used to approximate the study effect.

The original dataset was cleaned to include a subset of only mice who had no fatty acid treatment or vehicle control (variable fa = 0).

2 Box Plot of Soma Size by Mouse ID

data.new2  %>% ggplot(aes(x=mouseid, y=somasize)) + 
  geom_boxplot(width=0.5) +
  geom_point(alpha = 0.25)+
  labs(subtitle="Soma Size by Mouse ID")+
  theme_grey(base_size=13)

3 ICC Calculation for Soma Size

library(nlme)
## 
## Attaching package: 'nlme'
## The following object is masked from 'package:dplyr':
## 
##     collapse
fitmix <- nlme::lme(somasize ~ pten, data = data.new2, random = ~1 | mouseid)
summary(fitmix)
## Linear mixed-effects model fit by REML
##  Data: data.new2 
##        AIC     BIC    logLik
##   3428.624 3444.53 -1710.312
## 
## Random effects:
##  Formula: ~1 | mouseid
##         (Intercept) Residual
## StdDev:    7.250348 18.11401
## 
## Fixed effects: somasize ~ pten 
##                Value Std.Error  DF   t-value p-value
## (Intercept) 90.86622  3.649473 390 24.898448       0
## pten        11.50922  1.858526 390  6.192662       0
##  Correlation: 
##      (Intr)
## pten -0.32 
## 
## Standardized Within-Group Residuals:
##        Min         Q1        Med         Q3        Max 
## -2.6346588 -0.6770488 -0.1295800  0.5705812  3.8157103 
## 
## Number of Observations: 396
## Number of Groups: 5
ICC <- function(fixmix){
  cors <- as.numeric(VarCorr(fitmix))
  cors[1] / (cors[1] + cors[2])
}
ICC(fitmix)
## [1] 0.1380868

ICC for Soma Size = 0.1380868.

Intraclass correlation coefficient for Soma Size is small, showing small subject relation effects, or very little correlation in values per each mouse subject. This ICC score is smaller than the authors.

4 Neuron Level Linear Regression

\[ somasize_{ij} = \beta_0 + \beta_1 Pten_{ij} + \epsilon_{ij} \] ## Assumption of Neuron Level Linear Regression \[ \epsilon_{ij} \stackrel{}{\sim} N{(0, \sigma^2)} \] The Neuron Level Linear Regression Model assumes among all observations for error terms to follow a normal distribution around 0 and constant variance of sigma squared. This approach also assumes observations measured independently, which was violated within the study as soma measurements are clustered per mouse id.

4.1 Analysis of Neuron Level Linear Regression

NLLR <- lm(somasize ~ pten, data = data.new2)
summary(NLLR)
## 
## Call:
## lm(formula = somasize ~ pten, data = data.new2)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -43.066 -13.933  -2.285  10.161  68.296 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   93.106      1.513  61.555  < 2e-16 ***
## pten          11.039      1.976   5.586 4.34e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 19.37 on 394 degrees of freedom
## Multiple R-squared:  0.07338,    Adjusted R-squared:  0.07103 
## F-statistic:  31.2 on 1 and 394 DF,  p-value: 4.34e-08
confint(NLLR)
##                 2.5 %   97.5 %
## (Intercept) 90.132249 96.07963
## pten         7.153512 14.92366

The Neuron Level Linear Regression Model shows a predicted soma size increase of 11.039 in neurons with Pten knockdown when compared to control neurons.

5 Marginal Regression Model

\[ somasize_{ij} = \beta_0 + \beta_1 Pten_{ij} + \epsilon_{ij} \] # Assumption of Marginal Regression \[ E[\epsilon_{ij}] = 0, and, var[\epsilon_{ij}]=\sigma^2 \] The Marginal Regression Model does not require conditional distribution of observations to satisfy a particular distribution for the model to hold, instead adjusting for clustering. The Marginal Regression Model also does not assume a particular distribution for the error model.

5.1 Analysis of Marginal Regression

MR <- gee::gee(somasize ~ pten,
                      corstr = "exchangeable",
                      id = mouseid,
                      data = data.new2)
## Beginning Cgee S-function, @(#) geeformula.q 4.13 98/01/27
## running glm to get initial regression estimate
## (Intercept)        pten 
##    93.10594    11.03859

The Marginal Regression Model shows a predicted soma size increase of 11.51057 in neurons with Pten knockdown when compared to control neurons.

6 Fixed-Effect Regression Model

\[ somasize_{ij} = \beta_0 + \beta_1 mouse_{i1}+...\beta_k mouse_{ik} + \beta_{k+1} Pten_{ij} + \epsilon_{ij} \] ## Assumption of Fixed-Effect Regression Model \[ \epsilon_{ij} \stackrel{}{\sim} Normal{(0, \sigma^2)} \] The Fixed-Effect Regression Model assumes a normal distribution around 0 and variance of sigma squared for error terms among observations from the same subject.

6.1 Analysis of Fixed-Effect Regression Model

FER <- lm(somasize ~ as.factor(pten) + as.factor(mouseid), data = data.new2)
summary(FER)
## 
## Call:
## lm(formula = somasize ~ as.factor(pten) + as.factor(mouseid), 
##     data = data.new2)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -48.203 -12.283  -2.359  10.271  69.777 
## 
## Coefficients:
##                     Estimate Std. Error t value Pr(>|t|)    
## (Intercept)           82.605      2.380  34.714  < 2e-16 ***
## as.factor(pten)1      11.540      1.862   6.198 1.45e-09 ***
## as.factor(mouseid)1   10.626      2.656   4.001 7.55e-05 ***
## as.factor(mouseid)2    2.733      3.299   0.828    0.408    
## as.factor(mouseid)3    8.822      5.469   1.613    0.108    
## as.factor(mouseid)4   18.555      2.632   7.051 8.14e-12 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 18.12 on 390 degrees of freedom
## Multiple R-squared:  0.1974, Adjusted R-squared:  0.1871 
## F-statistic: 19.18 on 5 and 390 DF,  p-value: < 2.2e-16
confint(FER)
##                         2.5 %    97.5 %
## (Intercept)         77.926233 87.283050
## as.factor(pten)1     7.879640 15.200476
## as.factor(mouseid)1  5.404246 15.847124
## as.factor(mouseid)2 -3.753574  9.219079
## as.factor(mouseid)3 -1.931182 19.574571
## as.factor(mouseid)4 13.381214 23.729149

The fixed effects regression model shows a predicted soma size increase of 11.540 in neurons with Pten knockdown when compared to control neurons.

7 Mixed-Effect Regression Model

\[ somasize_{ij} = \beta_0 + \beta_1 Pten_{ij} + \theta_{i} + \epsilon_{ij} \] ## Assumption of Mixed-Effect Regression Model \[ \theta_i \stackrel{}{\sim} Normal{(0, \tau_{}^2)} \]

The Mixed-Effect Regression Model assumes a normal distribution around 0 and variance of tau squared for error terms among observations from the same subject.

7.1 Analysis of Mixed-Effect Regression Model

summary(fitmix)
## Linear mixed-effects model fit by REML
##  Data: data.new2 
##        AIC     BIC    logLik
##   3428.624 3444.53 -1710.312
## 
## Random effects:
##  Formula: ~1 | mouseid
##         (Intercept) Residual
## StdDev:    7.250348 18.11401
## 
## Fixed effects: somasize ~ pten 
##                Value Std.Error  DF   t-value p-value
## (Intercept) 90.86622  3.649473 390 24.898448       0
## pten        11.50922  1.858526 390  6.192662       0
##  Correlation: 
##      (Intr)
## pten -0.32 
## 
## Standardized Within-Group Residuals:
##        Min         Q1        Med         Q3        Max 
## -2.6346588 -0.6770488 -0.1295800  0.5705812  3.8157103 
## 
## Number of Observations: 396
## Number of Groups: 5

The mixed effects regression model shows a predicted soma size increase of 11.50922 in neurons with Pten knockdown when compared to control neurons.

par(cex = 2)
plot(ranef(fitmix), cex = 2)

par(mfrow = c(1, 2), cex = 1.5)
qqnorm(residuals(fitmix, type = "pearson"), main = "QQ Plot Residuals")
qqline(residuals(fitmix, type = "pearson"))
qqnorm(ranef(fitmix)[, 1], main = "QQ Plot Random Intercepts")
qqline(ranef(fitmix)[, 1])

8 Mixed-Effect Regression Model with Pten as Additional Predictor

\[ somasize_{ij} = \beta_0 + \beta_1 Pten_{ij} + \beta_2 Pten_{ij} + \theta + \epsilon_{ij} \] ## Assumption of Mixed-Effect Regression Model with Pten as Additional Predictor \[ \theta_i \stackrel{}{\sim} Normal{(0, \tau_{}^2)} \] \[ \epsilon_{ij} \stackrel{}{\sim} Normal{(0, \sigma_{}^2)} \] The Mixed-Effect Regression Model with Pten as an additional predictor assumes a normal distribution around 0 and variance of tau squared for error terms among observations from the same subject.

8.1 Analysis of Mixed Effect Regression with Pten as Additional Predictor

MERPten <- lm(somasize ~ as.factor(pten) + prop, data = data.new2)
summary(MERPten)
## 
## Call:
## lm(formula = somasize ~ as.factor(pten) + prop, data = data.new2)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -42.96 -13.77  -2.36  10.12  67.73 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)      113.5740     9.7375  11.664  < 2e-16 ***
## as.factor(pten)1  11.5401     1.9814   5.824  1.2e-08 ***
## prop              -0.3544     0.1666  -2.128    0.034 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 19.28 on 393 degrees of freedom
## Multiple R-squared:  0.08394,    Adjusted R-squared:  0.07927 
## F-statistic:    18 on 2 and 393 DF,  p-value: 3.3e-08
confint(MERPten)
##                       2.5 %       97.5 %
## (Intercept)      94.4298744 132.71815766
## as.factor(pten)1  7.6445642  15.43555106
## prop             -0.6818573  -0.02690987

The mixed effects regression model adjusted for proportion of neurons with pten knockdown predicted soma size increase of 11.5401 in neurons with Pten knockdown when compared to control neurons. Pten as an additional predictor shows an increased effect of Pten when compared to the Mixed-Effect model without the additional predictor.

suppressPackageStartupMessages({
library(sjPlot)
library(lme4)})

tab_model(NLLR, MR, FER, fitmix, MERPten,
          pred.labels = c("Intercept", "Pten", "MouseID = 1", "MouseID = 2", "MouseID = 3", "MouseID = 4", "MouseID = 5", "Prop"), dv.labels = c("Neuron-Level", "Marginal", "Fixed Effect", "Mixed Effect", "Mixed Effect w/Predictor"), title = "Regression Model Results")
Regression Model Results
  Neuron-Level Marginal Fixed Effect Mixed Effect Mixed Effect w/Predictor
Predictors Estimates CI p Estimates CI p Estimates CI p Estimates CI p Estimates CI p
Intercept 93.11 90.13 – 96.08 <0.001 90.86 83.62 – 98.11 <0.001 82.60 77.93 – 87.28 <0.001 90.87 83.69 – 98.04 <0.001 113.57 94.43 – 132.72 <0.001
Pten 11.04 7.15 – 14.92 <0.001 11.51 7.89 – 15.13 <0.001 11.51 7.86 – 15.16 <0.001
MouseID = 1 11.54 7.88 – 15.20 <0.001 11.54 7.64 – 15.44 <0.001
MouseID = 2 10.63 5.40 – 15.85 <0.001
MouseID = 3 2.73 -3.75 – 9.22 0.408
MouseID = 4 8.82 -1.93 – 19.57 0.108
MouseID = 5 18.56 13.38 – 23.73 <0.001
Prop -0.35 -0.68 – -0.03 0.034
Random Effects
σ2       328.12  
τ00       52.57 mouseid  
N   5 mouseid   5 mouseid  
Observations 396 396 396 396 396
R2 / R2 adjusted 0.073 / 0.071 NA 0.197 / 0.187 0.089 / NA 0.084 / 0.079

9 Conclusion

In conclusion, the analysis outputs I conducted were very similar to those found in the paper “Analyzing Clustered Data: Why and How to Account for Multiple Observations Nested within a Study Participant”, showing pten having an effect on increasing soma size.

Slight differences between my findings and the authors findings are likely due to software used.