suppressPackageStartupMessages({
library(tidyverse)
library(ggplot2)
library(gridExtra)
})
data <- read.csv("~/Downloads/clustered_data_code-master/PtenAnalysisData.csv") %>%
mutate(mouseid = factor(mouseid, levels = c("0", "1", "2", "3", "4", "5")))
data.new2 <- data[data$fa == "0",]
This assignment is a reproduced data analysis of a published dataset: “Analyzing Clustered Data: Why and How to Account for Multiple Observations Nested within a Study Participant?” by Moen EL, Fricano-Kugler CJ, Luikart BW, O’Malley AJ (2016).
The hypothesized analysis of study tested the effect of Pten knockdown on the soma size of neurons within the brain. Multiple regression models are used to approximate the study effect.
The original dataset was cleaned to include a subset of only mice who had no fatty acid treatment or vehicle control (variable fa = 0).
data.new2 %>% ggplot(aes(x=mouseid, y=somasize)) +
geom_boxplot(width=0.5) +
geom_point(alpha = 0.25)+
labs(subtitle="Soma Size by Mouse ID")+
theme_grey(base_size=13)
library(nlme)
##
## Attaching package: 'nlme'
## The following object is masked from 'package:dplyr':
##
## collapse
fitmix <- nlme::lme(somasize ~ pten, data = data.new2, random = ~1 | mouseid)
summary(fitmix)
## Linear mixed-effects model fit by REML
## Data: data.new2
## AIC BIC logLik
## 3428.624 3444.53 -1710.312
##
## Random effects:
## Formula: ~1 | mouseid
## (Intercept) Residual
## StdDev: 7.250348 18.11401
##
## Fixed effects: somasize ~ pten
## Value Std.Error DF t-value p-value
## (Intercept) 90.86622 3.649473 390 24.898448 0
## pten 11.50922 1.858526 390 6.192662 0
## Correlation:
## (Intr)
## pten -0.32
##
## Standardized Within-Group Residuals:
## Min Q1 Med Q3 Max
## -2.6346588 -0.6770488 -0.1295800 0.5705812 3.8157103
##
## Number of Observations: 396
## Number of Groups: 5
ICC <- function(fixmix){
cors <- as.numeric(VarCorr(fitmix))
cors[1] / (cors[1] + cors[2])
}
ICC(fitmix)
## [1] 0.1380868
ICC for Soma Size = 0.1380868.
Intraclass correlation coefficient for Soma Size is small, showing small subject relation effects, or very little correlation in values per each mouse subject. This ICC score is smaller than the authors.
\[ somasize_{ij} = \beta_0 + \beta_1 Pten_{ij} + \epsilon_{ij} \] ## Assumption of Neuron Level Linear Regression \[ \epsilon_{ij} \stackrel{}{\sim} N{(0, \sigma^2)} \] The Neuron Level Linear Regression Model assumes among all observations for error terms to follow a normal distribution around 0 and constant variance of sigma squared. This approach also assumes observations measured independently, which was violated within the study as soma measurements are clustered per mouse id.
NLLR <- lm(somasize ~ pten, data = data.new2)
summary(NLLR)
##
## Call:
## lm(formula = somasize ~ pten, data = data.new2)
##
## Residuals:
## Min 1Q Median 3Q Max
## -43.066 -13.933 -2.285 10.161 68.296
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 93.106 1.513 61.555 < 2e-16 ***
## pten 11.039 1.976 5.586 4.34e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 19.37 on 394 degrees of freedom
## Multiple R-squared: 0.07338, Adjusted R-squared: 0.07103
## F-statistic: 31.2 on 1 and 394 DF, p-value: 4.34e-08
confint(NLLR)
## 2.5 % 97.5 %
## (Intercept) 90.132249 96.07963
## pten 7.153512 14.92366
The Neuron Level Linear Regression Model shows a predicted soma size increase of 11.039 in neurons with Pten knockdown when compared to control neurons.
\[ somasize_{ij} = \beta_0 + \beta_1 Pten_{ij} + \epsilon_{ij} \] # Assumption of Marginal Regression \[ E[\epsilon_{ij}] = 0, and, var[\epsilon_{ij}]=\sigma^2 \] The Marginal Regression Model does not require conditional distribution of observations to satisfy a particular distribution for the model to hold, instead adjusting for clustering. The Marginal Regression Model also does not assume a particular distribution for the error model.
MR <- gee::gee(somasize ~ pten,
corstr = "exchangeable",
id = mouseid,
data = data.new2)
## Beginning Cgee S-function, @(#) geeformula.q 4.13 98/01/27
## running glm to get initial regression estimate
## (Intercept) pten
## 93.10594 11.03859
The Marginal Regression Model shows a predicted soma size increase of 11.51057 in neurons with Pten knockdown when compared to control neurons.
\[ somasize_{ij} = \beta_0 + \beta_1 mouse_{i1}+...\beta_k mouse_{ik} + \beta_{k+1} Pten_{ij} + \epsilon_{ij} \] ## Assumption of Fixed-Effect Regression Model \[ \epsilon_{ij} \stackrel{}{\sim} Normal{(0, \sigma^2)} \] The Fixed-Effect Regression Model assumes a normal distribution around 0 and variance of sigma squared for error terms among observations from the same subject.
FER <- lm(somasize ~ as.factor(pten) + as.factor(mouseid), data = data.new2)
summary(FER)
##
## Call:
## lm(formula = somasize ~ as.factor(pten) + as.factor(mouseid),
## data = data.new2)
##
## Residuals:
## Min 1Q Median 3Q Max
## -48.203 -12.283 -2.359 10.271 69.777
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 82.605 2.380 34.714 < 2e-16 ***
## as.factor(pten)1 11.540 1.862 6.198 1.45e-09 ***
## as.factor(mouseid)1 10.626 2.656 4.001 7.55e-05 ***
## as.factor(mouseid)2 2.733 3.299 0.828 0.408
## as.factor(mouseid)3 8.822 5.469 1.613 0.108
## as.factor(mouseid)4 18.555 2.632 7.051 8.14e-12 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 18.12 on 390 degrees of freedom
## Multiple R-squared: 0.1974, Adjusted R-squared: 0.1871
## F-statistic: 19.18 on 5 and 390 DF, p-value: < 2.2e-16
confint(FER)
## 2.5 % 97.5 %
## (Intercept) 77.926233 87.283050
## as.factor(pten)1 7.879640 15.200476
## as.factor(mouseid)1 5.404246 15.847124
## as.factor(mouseid)2 -3.753574 9.219079
## as.factor(mouseid)3 -1.931182 19.574571
## as.factor(mouseid)4 13.381214 23.729149
The fixed effects regression model shows a predicted soma size increase of 11.540 in neurons with Pten knockdown when compared to control neurons.
\[ somasize_{ij} = \beta_0 + \beta_1 Pten_{ij} + \theta_{i} + \epsilon_{ij} \] ## Assumption of Mixed-Effect Regression Model \[ \theta_i \stackrel{}{\sim} Normal{(0, \tau_{}^2)} \]
The Mixed-Effect Regression Model assumes a normal distribution around 0 and variance of tau squared for error terms among observations from the same subject.
summary(fitmix)
## Linear mixed-effects model fit by REML
## Data: data.new2
## AIC BIC logLik
## 3428.624 3444.53 -1710.312
##
## Random effects:
## Formula: ~1 | mouseid
## (Intercept) Residual
## StdDev: 7.250348 18.11401
##
## Fixed effects: somasize ~ pten
## Value Std.Error DF t-value p-value
## (Intercept) 90.86622 3.649473 390 24.898448 0
## pten 11.50922 1.858526 390 6.192662 0
## Correlation:
## (Intr)
## pten -0.32
##
## Standardized Within-Group Residuals:
## Min Q1 Med Q3 Max
## -2.6346588 -0.6770488 -0.1295800 0.5705812 3.8157103
##
## Number of Observations: 396
## Number of Groups: 5
The mixed effects regression model shows a predicted soma size increase of 11.50922 in neurons with Pten knockdown when compared to control neurons.
par(cex = 2)
plot(ranef(fitmix), cex = 2)
par(mfrow = c(1, 2), cex = 1.5)
qqnorm(residuals(fitmix, type = "pearson"), main = "QQ Plot Residuals")
qqline(residuals(fitmix, type = "pearson"))
qqnorm(ranef(fitmix)[, 1], main = "QQ Plot Random Intercepts")
qqline(ranef(fitmix)[, 1])
\[ somasize_{ij} = \beta_0 + \beta_1 Pten_{ij} + \beta_2 Pten_{ij} + \theta + \epsilon_{ij} \] ## Assumption of Mixed-Effect Regression Model with Pten as Additional Predictor \[ \theta_i \stackrel{}{\sim} Normal{(0, \tau_{}^2)} \] \[ \epsilon_{ij} \stackrel{}{\sim} Normal{(0, \sigma_{}^2)} \] The Mixed-Effect Regression Model with Pten as an additional predictor assumes a normal distribution around 0 and variance of tau squared for error terms among observations from the same subject.
MERPten <- lm(somasize ~ as.factor(pten) + prop, data = data.new2)
summary(MERPten)
##
## Call:
## lm(formula = somasize ~ as.factor(pten) + prop, data = data.new2)
##
## Residuals:
## Min 1Q Median 3Q Max
## -42.96 -13.77 -2.36 10.12 67.73
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 113.5740 9.7375 11.664 < 2e-16 ***
## as.factor(pten)1 11.5401 1.9814 5.824 1.2e-08 ***
## prop -0.3544 0.1666 -2.128 0.034 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 19.28 on 393 degrees of freedom
## Multiple R-squared: 0.08394, Adjusted R-squared: 0.07927
## F-statistic: 18 on 2 and 393 DF, p-value: 3.3e-08
confint(MERPten)
## 2.5 % 97.5 %
## (Intercept) 94.4298744 132.71815766
## as.factor(pten)1 7.6445642 15.43555106
## prop -0.6818573 -0.02690987
The mixed effects regression model adjusted for proportion of neurons with pten knockdown predicted soma size increase of 11.5401 in neurons with Pten knockdown when compared to control neurons. Pten as an additional predictor shows an increased effect of Pten when compared to the Mixed-Effect model without the additional predictor.
suppressPackageStartupMessages({
library(sjPlot)
library(lme4)})
tab_model(NLLR, MR, FER, fitmix, MERPten,
pred.labels = c("Intercept", "Pten", "MouseID = 1", "MouseID = 2", "MouseID = 3", "MouseID = 4", "MouseID = 5", "Prop"), dv.labels = c("Neuron-Level", "Marginal", "Fixed Effect", "Mixed Effect", "Mixed Effect w/Predictor"), title = "Regression Model Results")
| Neuron-Level | Marginal | Fixed Effect | Mixed Effect | Mixed Effect w/Predictor | |||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Predictors | Estimates | CI | p | Estimates | CI | p | Estimates | CI | p | Estimates | CI | p | Estimates | CI | p |
| Intercept | 93.11 | 90.13 – 96.08 | <0.001 | 90.86 | 83.62 – 98.11 | <0.001 | 82.60 | 77.93 – 87.28 | <0.001 | 90.87 | 83.69 – 98.04 | <0.001 | 113.57 | 94.43 – 132.72 | <0.001 |
| Pten | 11.04 | 7.15 – 14.92 | <0.001 | 11.51 | 7.89 – 15.13 | <0.001 | 11.51 | 7.86 – 15.16 | <0.001 | ||||||
| MouseID = 1 | 11.54 | 7.88 – 15.20 | <0.001 | 11.54 | 7.64 – 15.44 | <0.001 | |||||||||
| MouseID = 2 | 10.63 | 5.40 – 15.85 | <0.001 | ||||||||||||
| MouseID = 3 | 2.73 | -3.75 – 9.22 | 0.408 | ||||||||||||
| MouseID = 4 | 8.82 | -1.93 – 19.57 | 0.108 | ||||||||||||
| MouseID = 5 | 18.56 | 13.38 – 23.73 | <0.001 | ||||||||||||
| Prop | -0.35 | -0.68 – -0.03 | 0.034 | ||||||||||||
| Random Effects | |||||||||||||||
| σ2 | 328.12 | ||||||||||||||
| τ00 | 52.57 mouseid | ||||||||||||||
| N | 5 mouseid | 5 mouseid | |||||||||||||
| Observations | 396 | 396 | 396 | 396 | 396 | ||||||||||
| R2 / R2 adjusted | 0.073 / 0.071 | NA | 0.197 / 0.187 | 0.089 / NA | 0.084 / 0.079 | ||||||||||
In conclusion, the analysis outputs I conducted were very similar to those found in the paper “Analyzing Clustered Data: Why and How to Account for Multiple Observations Nested within a Study Participant”, showing pten having an effect on increasing soma size.
Slight differences between my findings and the authors findings are likely due to software used.