The purpose of the analysis was to examine the effect of fatty acid treatment on the soma size of neuron in the brain. The fatty acid exposure was measured at individual neuron level and there was variation fatty acid exposure between mice. The mice were randomized between three groups: no treatment, fatty acid delivery, and vihecle control group. The present analysis was designed to see the differences of soma size between fatty acid delivery and vihecle control groups by using appropriate statistical models. The dataset of the present analysis was obtained from https://raw.githubusercontent.com/elmoen/clustered_data_code/master/PtenAnalysisData.csv.
mdata<-read.csv("https://raw.githubusercontent.com/elmoen/clustered_data_code/master/PtenAnalysisData.csv")
mdata<-mdata%>%
mutate(treatment=as.factor(recode(fa, "2"="Control", "1"="Fatty acid delivery", "0"= "No treatment")))%>%
mutate(Mouseid=as.factor(mouseid))
mouselevel<-mdata%>%
group_by(Mouseid)%>%
summarize(treatment=unique(treatment),
n=n(),
meansoma=mean(somasize))
plotdata<-mdata %>%
filter(fa !=0)
PlotA<-ggplot(plotdata, aes(x=Mouseid, y=somasize, fill=treatment))+
geom_boxplot(varwidth = TRUE, notch = TRUE)+
geom_jitter(width=0.1, alpha=0.9, size=0.9,aes(color=Mouseid))+
labs(title = " Figure A: Each boxplot represents the soma size of individual mouse") +
ylab("Soma size") +
theme_linedraw()+
scale_fill_manual(breaks = c("Fatty acid delivery", "Control"),
values=c("light cyan", "pink"))
PlotB<-ggplot(plotdata,aes(y=somasize, x=treatment))+
geom_boxplot(outlier.size = 0.5)+
geom_jitter(width=0.08, alpha=1.5, size=0.8, aes(color=Mouseid) )+
labs(title ="Figure B. Each point represents the soma size of an individual neuron within a
mouse") +
xlab("Mouse ID") +
theme(axis.title.x = element_text(color = "red", size = 8, face = "italic"))+
theme_bw()
fit.Neuron.level<-lm(somasize~treatment, data=mdata)
fit.mouse.level.nw<-lm(meansoma~treatment, weights = NULL, data=mouselevel)
fit.mouse.level.w<-lm(meansoma~treatment, weights =n, data=mouselevel)
fit.gee<-gee(somasize ~ treatment, id = Mouseid, data = mdata,family = gaussian, corstr = "exchangeable")
## Beginning Cgee S-function, @(#) geeformula.q 4.13 98/01/27
## running glm to get initial regression estimate
## (Intercept) treatmentFatty acid delivery
## 105.086014 3.500386
## treatmentNo treatment
## -5.513024
fit.mixed<-lme(somasize~treatment, data=mdata, random=~1|Mouseid)
Figure A and B showed the within subject and between subject difference of soma size in neurons among fatty acid delivery and control groups. Figure A shows the box plot representing the average soma size of each individual mouse. The size of box plots are respective to the number of neurons within the mouse. While the smaller box shows relatively lower number of neurons, bigger box shows greater number of neurons. Individual difference in the average soma size has been noticed from this box plot. On the other hand, Figure B shows the between group differences in soma size. It also suggested that the average soma size for each group differ from the average soma size of individual mouse.
PlotA
Figure A. Each boxplot represents the soma size of individual mouse
PlotB
Figure B. Each point represents the soma size of an individual neuron within a mouse
Table-1: Simple linear regression for Neuron-level and Mouse-Level
tab_model(fit.Neuron.level,fit.mouse.level.nw,fit.mouse.level.w)
| somasize | meansoma | meansoma | |||||||
|---|---|---|---|---|---|---|---|---|---|
| Predictors | Estimates | CI | p | Estimates | CI | p | Estimates | CI | p |
| (Intercept) | 105.09 | 102.54 – 107.63 | <0.001 | 103.19 | 93.55 – 112.83 | <0.001 | 105.09 | 95.27 – 114.90 | <0.001 |
|
treatment [Fatty acid delivery] |
3.50 | 0.29 – 6.71 | 0.033 | 4.65 | -9.81 – 19.11 | 0.494 | 3.50 | -8.91 – 15.91 | 0.547 |
| treatment [No treatment] | -5.51 | -8.83 – -2.19 | 0.001 | -5.05 | -18.68 – 8.58 | 0.432 | -5.51 | -18.33 – 7.30 | 0.364 |
| Observations | 1142 | 14 | 14 | ||||||
| R2 / R2 adjusted | 0.032 / 0.030 | 0.166 / 0.015 | 0.223 / 0.081 | ||||||
Neuron-level effect of fatty acid on the soma size was estimated by using neuron level data. The mice in the fatty acid group have soma sizes that are 3.5 units larger on average than the mice in the vehicle control group, and the difference are statistically significance (p = 0.0394).
Mouse-level linear regression was performed without weighting and with weighting respectively for ignoring and accounting the number of neurons per mouse. The results from both models suggested an insignificant differences in the soma size between between fatty acid group and veichle control group. Howerver, the standard error from the weighted models was relatively lower compared to the model without weighting. Although the coefficient from neuron-level linear model and mouse-level linear model is exactly same, mouse-level model has relatively higher standard error compared to the neuron-level model as expected because of the significant reduction in the number of observation.
The coefficients obtained from Mouse-level regression (mean); no weighting, Marginal regression, and Mixed-effect regression were relatively larger than the coefficients reported in the published paper (1).
r_int_m<-lme(somasize~1, data=mdata, random=~1|Mouseid)
ICClme<-function(out) {
varests<-as.numeric(VarCorr(out)[1:2])
varests[1]/sum(varests)
}
ICClme(r_int_m)
## [1] 0.1729463
The intraclass correlation coefficient (ICC), was calculated to compares the variance between subjects and within subjects. The calculated ICC was 0.17 which suggested relative dependency on the correlation between and within subject measurements.
Table-2: Results from Marginal model (GEE) and Linear Mixed effect model
tab_model(fit.gee,fit.mixed)
| somasize | somasize | |||||
|---|---|---|---|---|---|---|
| Predictors | Estimates | CI | p | Estimates | CI | p |
| (Intercept) | 103.76 | 97.62 – 109.90 | <0.001 | 103.51 | 95.05 – 111.96 | <0.001 |
|
treatment [Fatty acid delivery] |
4.15 | -4.81 – 13.11 | 0.364 | 4.37 | -9.66 – 18.40 | 0.507 |
| treatment [No treatment] | -5.66 | -14.36 – 3.03 | 0.202 | -5.41 | -18.86 – 8.04 | 0.395 |
| Random Effects | ||||||
| σ2 | 418.69 | |||||
| τ00 | 84.40 Mouseid | |||||
| N | 14 Mouseid | 14 Mouseid | ||||
| Observations | 1142 | 1142 | ||||
| Marginal R2 / Conditional R2 | NA | 0.041 / NA | ||||
Both linear mixed effect model and marginal model suggested there was no significant difference in the soma size between fatty acid group and veichle control group.
There was significant difference in the models fitted by using considering clustering assumption and independent assumption. The ICC provides important information for selecting appropriate model for data analysis. The present analysis reveals how the results could be misleading without specifying the correct modeling assumption. The ICC is greater than 0 indicates that the between subject variability is not completely independent from within subject variability. Therefore, subjective specific variation should be included in addition to between subject variability to capture more accuracy in the estimation. That’s why the current analysis suggests that mixed effect model and marginal model would be the appropriate models for analyzing the current data.
I have encountered the following challenges in this assignment: 1. This is the first time I did heiracchical data analysis. Understanding of how ICC might influence the model selection took a while. 2. The gee model was not running at the first time because of not removing the “No treatment” group from the dataset. I took me hours to figure out why it was not running. I have learned the reasons for getting the error that type of error message by reviewing some articles from google and then abled to solve the problem.