In order to reduce the dimensionality and try to figure out what’s going on, here I use PCA, followed by manova and anova. I only apply it to the control population, which is the typical patients on Day 1.
I filtered the available data so that I only had the Day 1, Typical participants. The function below gave me the flexibility to filter for any group of interest.
###Running PCA, either comparing Day 1 to Day 2, Day 1 to Diabetic, or Day 2 to Diabetic (Group = 0 for typical 1 for diabetic)
PCAgroup<-function(Day1, Day2, Group1, Group2, df){
df_selected<-subset(df, (df$Day == Day1 | df$Day == Day2))
df_selected<-subset(df_selected, (df$Group == Group1 | df$Group == Group2))
df_selected<-na.omit(df_selected)
return(df_selected)
}
I wanted to perform PCA on the control population, and then see if I could use the components to support our idea that the Thalamus was a notably active region. We can see that 6 components explain ~90% of the variance in the data.
dftest<-PCAgroup(1,1,0,0,df)
df_pca <- prcomp(dftest[,5:46], scale = TRUE, rotate = "varimax")
summary(df_pca)
## Importance of components:
## PC1 PC2 PC3 PC4 PC5 PC6
## Standard deviation 5.3934 1.86236 1.45394 1.11922 0.95717 0.9166
## Proportion of Variance 0.6926 0.08258 0.05033 0.02982 0.02181 0.0200
## Cumulative Proportion 0.6926 0.77516 0.82549 0.85532 0.87713 0.8971
## PC7 PC8 PC9 PC10 PC11 PC12
## Standard deviation 0.82065 0.69241 0.68181 0.62671 0.61100 0.54695
## Proportion of Variance 0.01603 0.01141 0.01107 0.00935 0.00889 0.00712
## Cumulative Proportion 0.91317 0.92458 0.93565 0.94500 0.95389 0.96102
## PC13 PC14 PC15 PC16 PC17 PC18
## Standard deviation 0.48261 0.47068 0.41133 0.38683 0.3722 0.32789
## Proportion of Variance 0.00555 0.00527 0.00403 0.00356 0.0033 0.00256
## Cumulative Proportion 0.96656 0.97184 0.97586 0.97943 0.9827 0.98528
## PC19 PC20 PC21 PC22 PC23 PC24
## Standard deviation 0.30552 0.28786 0.26415 0.24837 0.22390 0.21283
## Proportion of Variance 0.00222 0.00197 0.00166 0.00147 0.00119 0.00108
## Cumulative Proportion 0.98751 0.98948 0.99114 0.99261 0.99380 0.99488
## PC25 PC26 PC27 PC28 PC29 PC30
## Standard deviation 0.20216 0.18725 0.16539 0.16250 0.14329 0.12870
## Proportion of Variance 0.00097 0.00083 0.00065 0.00063 0.00049 0.00039
## Cumulative Proportion 0.99585 0.99669 0.99734 0.99797 0.99846 0.99885
## PC31 PC32 PC33 PC34 PC35 PC36
## Standard deviation 0.12135 0.09561 0.08652 0.07388 0.06453 0.04939
## Proportion of Variance 0.00035 0.00022 0.00018 0.00013 0.00010 0.00006
## Cumulative Proportion 0.99920 0.99942 0.99960 0.99973 0.99983 0.99989
## PC37 PC38 PC39 PC40 PC41 PC42
## Standard deviation 0.04435 0.03515 0.03040 0.02200 0.01247 2.39e-15
## Proportion of Variance 0.00005 0.00003 0.00002 0.00001 0.00000 0.00e+00
## Cumulative Proportion 0.99993 0.99996 0.99998 1.00000 1.00000 1.00e+00
These components don’t really differ by glucose level, which is what we would have hoped to see. However, PC3 does show the pattern we would expect to see.
Also, I wanted to see what comprised the components, so I calculated what percentage of each region mapped onto each component
#http://www.sthda.com/english/articles/31-principal-component-methods-in-r-practical-guide/118-principal-component-analysis-in-r-prcomp-vs-princomp/
var_coord_func <- function(loadings, comp.sdev){
loadings*comp.sdev}
loadings <- df_pca$rotation
sdev <- df_pca$sdev
var.coord <- t(apply(loadings, 1, var_coord_func, sdev))
var.cos2 <- var.coord^2
# Compute contributions of each brain region to the component
#result is between 0 and 1...should sum to 1
comp.cos2 <- apply(var.cos2, 2, sum)
contrib <- function(var.cos2, comp.cos2){var.cos2/comp.cos2}
var.contrib <- t(apply(var.cos2,1, contrib, comp.cos2))
Most seem to be a pretty big mix of regions. However, Component 3 is largely dictated by the thalamus (and a few other regions). Component 3 also seems to affect the glucose level, as we saw in the previous figure. So maybe there is something here.
Now that we know what’s in all of the components, let’s see if they actually are able to distinguish between glucose levels. To do this I used a manova and and anova. Threshold for acceptance after a Bonferroni correction is p < .0083
pcaOutput2 <- as.data.frame(df_pca$x)
pcaOutput2$groups <- dftest$Glucose.level
dependent.vars <- cbind(pcaOutput2$PC1, pcaOutput2$PC2, pcaOutput2$PC3, pcaOutput2$PC4, pcaOutput2$PC5, pcaOutput2$PC6)
summary(manova(dependent.vars~dftest$Glucose.level))
## Df Pillai approx F num Df den Df Pr(>F)
## dftest$Glucose.level 1 0.40635 3.9929 6 35 0.003792 **
## Residuals 40
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The manova shows that we are able to distinguish between glucose level on the basis of the principal components. Now we use an ANOVA to see which of the components are significant. It would probably be better to use a non-parametric test here, but this is where we are right now…
summary(aov(dependent.vars~dftest$Glucose.level)) #component 3 is the only significant one
## Response 1 :
## Df Sum Sq Mean Sq F value Pr(>F)
## dftest$Glucose.level 1 6.0 5.9983 0.2022 0.6554
## Residuals 40 1186.6 29.6655
##
## Response 2 :
## Df Sum Sq Mean Sq F value Pr(>F)
## dftest$Glucose.level 1 0.0 0.0000 0 0.9975
## Residuals 40 142.2 3.5551
##
## Response 3 :
## Df Sum Sq Mean Sq F value Pr(>F)
## dftest$Glucose.level 1 26.908 26.9077 18.009 0.0001267 ***
## Residuals 40 59.764 1.4941
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Response 4 :
## Df Sum Sq Mean Sq F value Pr(>F)
## dftest$Glucose.level 1 4.202 4.2024 3.5647 0.06629 .
## Residuals 40 47.156 1.1789
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Response 5 :
## Df Sum Sq Mean Sq F value Pr(>F)
## dftest$Glucose.level 1 0.191 0.19119 0.2046 0.6534
## Residuals 40 37.372 0.93429
##
## Response 6 :
## Df Sum Sq Mean Sq F value Pr(>F)
## dftest$Glucose.level 1 0.136 0.13600 0.1585 0.6926
## Residuals 40 34.312 0.85781
When we perform the same process for Day 2 and diabetics, there’s some stuff that’s kind of interesting to observe.
For the Day 2 participants, the first 5 components take care of 90% of the variance. For them, the change in glucose level is observable for PC4, rather than PC3.
What’s kind of interesting about this is that principal component 3 (the significant one that was largely comprised of the thalamus) looks similar to day 1, but it’s not significant (p = .267). Instead, principal component 4 is significant (p = .0012).
## Warning: Removed 1 rows containing missing values (position_stack).
## Warning: Removed 1 rows containing missing values (position_stack).
The diabetic participants look a lot more like Day 1 than Day 2. It takes 9 components to get to 90% of the variance. Components 2 (p = 0.005) and 5 (p = 0.0005) are significant.
## Warning: Removed 1 rows containing missing values (position_stack).