Preface: You have to log into the Pitt network via Pulse Secure first, and then either use the terminal to access Zeus or a remote desktop program.
library(MissMech)
library(dplyr)
library(DT)
library(foreign)
library(dataMeta)
library(readxl)
library(emmeans)
library(summarytools)
library(tidyverse)
library(stringr)
library(foreign)
library(corrplot)
library(kableExtra)
library(ggplot2)
library(cowplot)
knitr::opts_chunk$set(echo = TRUE)
Step 1: Using the terminal tab in RStudio (not Console tab), ssh into Zeuz by typing the following command (plus enter password):
Step 2: Still in the terminal, move to the the directory where your subject-level data is stored:
cd /Volumes/Disk1/EPICC/SubjectScans
Step 3: Generate a subject ID list using the following command in the terminal. Copy and paste the output into an excel sheet. (I haven't quite figured out how to [simply] assign variables from the terminal into R studio). Name the column SubID. Save the spreadsheet as 'Baseline_Subs_Brain.csv'. We'll continue to add columns to this below.
ls -d -1 7*_1
Step 4: Do the same thing we just did (using the terminal), but this time, list out the subject's left hippocampus .feat directories. Copy and paste the output into a new column in your excel sheet, labeled 'L_HIPP_path'
ls -d -1 /Volumes/Disk1/EPICC/SubjectScans/*/rest/lhip*.feat
Step 5a: Do the same thing we just did, but this time, list out the subject's RIGHT hippocampus .feat directories (or any other seeds that you wish): Copy and paste the output into a new column in your excel sheet, labeled 'R_HIPP_path'
ls -d -1 /Volumes/Disk1/EPICC/SubjectScans/*/rest/rhip*.feat
Step 5b: Continue to do the same thing for each seed path you would like to add to the subject list. *For each seed path, copy/paste the output into a new column in your excel sheet, labeled '
ls -d -1 /Volumes/Disk1/EPICC/SubjectScans/*/rest/tothippocampus_filtered*
ls -d -1 /Volumes/Disk1/EPICC/SubjectScans/*/rest/tothippocampus_*
ls -d -1 /Volumes/Disk1/EPICC/SubjectScans/*/rest/tothippocampus_with*
Step 6: Save as .csv and close 'Baseline_Subs_Brain.csv'
Step 7: Concatinate ROI paths-
Step 7a: Import the .csv you just made.
BRAIN_SUBS<-read.csv("Baseline_Subs_Brain.csv") #Import the .csv you just made.
BRAIN_SUBS$SubID<-as.character(BRAIN_SUBS$SubID) #read SubID as a character variable, not numeric.
Step 7b: First, we need to arrange our column rows so that each row includes the path which matches the SubID column.
* For example, Particpant #7063_1 does not have data processed for the right hippocampus. We can see this because the column for 'R_HIPP_path' does not allign propperly with the rest of the data.
+ To see this for yourself, use the command view(BRAIN_SUBS), and scroll down to SubID 7063_1. + You can see that because we do not have a path for that directory, the cell for that participant's 'R_HIPP_path', is actually the next particpant's (7066_1) 'R_HIPP_path'
BRAIN_SUBS[-1] <-lapply(BRAIN_SUBS[-1], function(x) x[match(BRAIN_SUBS$SubID, str_sub(x, 1, 6))])
# Use the following command to check the output:
# view(BRAIN_SUBS)
Step 7c: Next, we use 'str_split()', to modify our SubID column, -so we can merge our dataframe (BRAIN_SUBS) with other databases (for instance, excel spreadsheets exported from RedCap) which do not use the '_1' nomenclature to separate baseline and follow-up data. I other words, we want to change..
# Here we use a few commands to split the subject ids that we have (e.g., 7001_1) into subject ids that can be merged with other databases (i.e., 7001)
x<-BRAIN_SUBS$SubID
x<-as.character(x)
tmp<-strsplit(x, "_") #strsplit: splits a charachter string in a fixed spot. in this case at '_'
mat <- matrix(unlist(tmp), ncol=2, byrow=TRUE) #this breaks the char string into 2 sep columns.
df<-as.data.frame(mat) #make a data frame
df$ID<-as.character(df$V1) #make var ID
df$Session<-as.character(df$V2) #make var session
df<-df %>% select(ID, Session) #extract ID & session only (e.i., ignore identical colummns V1 & V2)
BRAIN_SUBS<-cbind(df, BRAIN_SUBS) #bind/merge with orginal dataframe
# Run the following command to look at our resulting dataframe (BRAIN_SUBS). Now, we have a column 'ID' , which can be merged with other datasets.
# view(BRAIN_SUBS)
Step 7d: For the sake of organization, use the following commands to clean up your 'global enviroment' in R studio. This will remove all variables we've set thus far (seen in the upper right corner of R Studio), EXCEPT for the dataframe we've been working with (BRAIN_SUBS).
Step 7e: Import MeanFD text file and set-up for merging:
MeanFD<-read.delim2("meanFD.txt", header = F) #Not sure why, but R imports loads of duplicates in txt file.
MeanFD<-MeanFD[!duplicated(MeanFD),] # To remove duplicates
MeanFD<-data.frame(MeanFD) #as dataframe with 34 rows and no duplicates.
#view(MeanFD)
#Step 12b:** Next, we use a few commands to split the subject ids that we have (e.g., 7001_1) into subject ids that can be merged with other databases (i.e., 7001). Same as Step 7c
x<-MeanFD$MeanFD
x<-as.character(x)
tmp<-strsplit(x, " ") #strsplit: splits a charachter string in a fixed spot. in this case at' '[space]
mat <- matrix(unlist(tmp), ncol=2, byrow=TRUE) #this breaks the char string into 2 sep columns.
df<-as.data.frame(mat) #make a data frame
df$ID<-as.character(df$V1) #make var ID
df$meanFD<-as.character(df$V2) #make var session
MeanFD<-df %>% select(ID, meanFD) #extract ID & session only (e.i., ignore identical colummns V1 & V2)
rm(x, mat,tmp, df)
# Now again, but for '_1'
x<-MeanFD$ID
x<-as.character(x)
tmp<-strsplit(x, "_") #strsplit: splits a charachter string in a fixed spot. in this case at'_1'
mat <- matrix(unlist(tmp), ncol=2, byrow=TRUE) #this breaks the char string into 2 sep columns.
df<-as.data.frame(mat) #make a data frame
df$ID<-as.character(df$V1) #make var ID
df<-df %>% select(ID) #extract ID & session only (e.i., ignore identical colummns V1 & V2)
rm(mat,tmp)
MeanFD.df<-cbind(df, MeanFD) #bind/merge with orginal dataframe
MeanFD.df<-MeanFD.df[,-c(2)] # Remove duplicate col
# Step 12c:** Mean center MeanFD values, just in case we'll need them later. Same as Step 11b.
MeanFD.df$meanFD<-as.numeric(MeanFD.df$meanFD)
MeanFD.df$meanFD_mc<-scale(MeanFD.df$meanFD, scale = FALSE)
#Step 13: setup database
# Assign ID variables as character class
BRAIN_SUBS$ID<-as.character(BRAIN_SUBS$ID)
MeanFD.df$ID<-as.character(MeanFD.df$ID)
# join with current sublist
BRAIN_SUBS<-left_join(BRAIN_SUBS, MeanFD.df)
BRAIN_SUBS$meanFD<-as.numeric(BRAIN_SUBS$meanFD)
rm(list=setdiff(ls(), "BRAIN_SUBS"))
Step 8: Import volumetric data (MASTER SPSS Database doesn't include ICV).
* n88 included in mastersheet
* n27 included in our FREESURF output (7001:7063)
#Our FREESURFER OUTPUTS n27: #ASEG[,c(1,13)] #7001_1 == 3878.0
ASEG<-read.delim("/Users/alyshagilmore/Downloads/restingstate_crew/FreeSurfer_Results_sub7001_to_7063/aseg_stats.txt")
x<-ASEG$Measure.volume
x<-as.character(x)
tmp<-strsplit(x, "_") #strsplit: splits a charachter string in a fixed spot. in this case at' '[space]
mat <- matrix(unlist(tmp), ncol=2, byrow=TRUE) #this breaks the char string into 2 sep columns.
df<-as.data.frame(mat) #make a data frame
df$ID<-as.character(df$V1) #make var ID
ASEG$ID<-df$ID
Brain_vols<-ASEG %>%
select(ID, EstimatedTotalIntraCranialVol, CSF, MaskVol,
BrainSegVol,BrainSegVolNotVentSurf,BrainSegVolNotVent,
SubCortGrayVol,
TotalGrayVol ,CorticalWhiteMatterVol, Brain.Stem,
Right.Hippocampus,Left.Hippocampus,
Left.Amygdala ,Right.Amygdala,
Right.Putamen,Left.Putamen,
Right.Caudate, Left.Caudate,
Right.Accumbens.area, Left.Accumbens.area,
Left.VentralDC, Right.VentralDC,
Left.Pallidum,Right.Pallidum,
Left.Thalamus.Proper,Right.Thalamus.Proper,
Left.Cerebellum.Cortex, Right.Cerebellum.Cortex,
Left.Cerebellum.White.Matter,
Right.vessel,Left.vessel,
Left.choroid.plexus,Right.choroid.plexus)
## Merge it all together and write out (check-point)
BRAIN<-left_join(BRAIN_SUBS,Brain_vols , by="ID")
BRAIN$ID<-as.character(BRAIN$ID)
write.csv(BRAIN, "EPICC_rsFCcrew_BrainData.csv")
rm(list=setdiff(ls(), "BRAIN"))
Step 9: Import FULL Sample VO2 Database. n=102
VO2.df<-readxl::read_xlsx("EPICC Data.xlsx", sheet = "VO2 Data") # Import fitness database.
VO2.df$Session<-as.character(VO2.df$`Pre / Post`)
VO2.df<-VO2.df[(VO2.df$Session=="PRE"),]
VO2.df<-VO2.df[,-c(3:4)]
VO2.df<-VO2.df[complete.cases(VO2.df$`Pre / Post`),]
VO2.df$ID<-as.character(VO2.df$`LAB ID`)
# Define variables as numeric values (because R sometimes likes to make them character strings).
VO2.df$AGE<-as.numeric(VO2.df$AGE)
VO2.df$VO2_Time<-as.numeric(VO2.df$`TEST TIME`)
VO2.df$VO2max_last2pts<-as.numeric(VO2.df$`PEAK VO2 LAST TWO PTS.`)
VO2.df$FE_O2<-as.numeric(VO2.df$`FEO2%`)
VO2.df$BMI<-as.numeric(VO2.df$BMI)
VO2.df$FE_CO2<-as.numeric(VO2.df$`FECO2%`)
VO2.df$Age_max_HR_GOAL<-as.numeric(VO2.df$`85% MAX HR`)
VO2.df$VO2MAX_EST<-as.numeric(VO2.df$`ESTIMATED VO2 MAX (Extrapolated)`)
VO2.df$Age_Rank_VO2max<-as.numeric(VO2.df$`AGE PERCENTILE OF VO2 MAX`)
VO2.df$VO2_END_REASON_NOTES<-as.character(VO2.df$`TEST END REASONS`)
VO2.df$CRFrel<-as.numeric(VO2.df$`PEAK VO2/KG`)
VO2.df$CRFabs<-as.numeric(VO2.df$`PEAK VO2`)
VO2.df$BMI<-as.numeric(VO2.df$BMI)
VO2.df$AGE<-as.numeric(VO2.df$AGE)
VO2.df$HR_DIFFERENCE<-as.numeric(VO2.df$`HR DIFFERENCE`)
VO2.df$PEAK_METS<-as.numeric(VO2.df$`PEAK METS`)
VO2.df$PEAK_HR<-as.numeric(VO2.df$`PEAK HR`)
VO2.df$VE_VCO2_SLOPE<-as.numeric(VO2.df$`VE / VCO2 SLOPE`)
VO2.df$PEAK_RER<-as.numeric(VO2.df$`PEAK RER`)
VO2.df$PEAK_RR<-as.numeric(VO2.df$`PEAK RR`)
VO2.df$weight_kg<-as.numeric(VO2.df$`WEIGHT (KG)`)
VO2.df$PEAK_CO2<-as.numeric(VO2.df$`PEAK CO2`)
VO2.df$PEAK_VE<-as.numeric(VO2.df$`PEAK VE`)
VO2.df$PEAK_VT<-as.numeric(VO2.df$`PEAK VT`)
VO2.df$FECO2<-as.numeric(VO2.df$`FECO2%`)
VO2.df$PEAK_RPE<-VO2.df$`PEAK RPE`
VO2.df$weight_lbs<-VO2.df$weight_kg*(2.205)
VO2.df$height_cm<-VO2.df$weight_kg/VO2.df$BMI
VO2.df$BMI_kg<-VO2.df$BMI
VO2.df$ID<-as.character(VO2.df$`LAB ID`)
VO2.df$VO2_END_factor<-as.numeric(VO2.df$`TEST END REASON?`)
VO2.df$VO2_END_REASON<-as.factor(VO2.df$`TEST END REASON?`)
VO2.df$AP_VO2MAX<-as.numeric(VO2.df$`AGE PERCENTILE OF VO2 MAX`)
VO2.df$VO2_END_REASON <- factor(VO2.df$VO2_END_REASON,
levels = c(1 ,2 ,3 , 4 ,5 , 6 ),
labels = c("Achieved HR Goal","Subject Quit", "Safety Concern", "Equipment Malfunction",
"Beta-Blocked- Reached RPE Goal",
"Other"))
Step 10: Define BMI groups (optional)
#Define Groups (Optional)
VO2.df$BMI<-as.numeric(VO2.df$BMI)
VO2.df$BMI_group<-as.character(VO2.df$BMI)
VO2.df$BMI_group<-if_else(VO2.df$BMI < 25, "normal", VO2.df$BMI_group)
VO2.df$BMI_group<-if_else((VO2.df$BMI >=25 & VO2.df$BMI <=30), "overweight", VO2.df$BMI_group)
VO2.df$BMI_group<-if_else((VO2.df$BMI > 30 & VO2.df$BMI < 40), "obese", VO2.df$BMI_group)
VO2.df$BMI_group<-if_else(VO2.df$BMI > 40, "severe obese", VO2.df$BMI_group)
VO2.df$BMI_group <-as.factor(VO2.df$BMI_group)
VO2.df$BMI_group <- ordered(VO2.df$BMI_group, levels = c("normal", "overweight", "obese", "severe obese"))
table(VO2.df$BMI_group)
##
## normal overweight obese severe obese
## 15 38 36 13
Step 11: Make Beta-Blocker a factor, remove invalid VO2 subs, & make a new variable to include "valid" test end reasons.
VO2.df$BetaBlocker<-(grepl(VO2.df$VO2_END_REASON_NOTES, pattern = "beta", ignore.case = TRUE))
VO2.df$BetaBlocker<-if_else(VO2.df$BetaBlocker==TRUE, 1,0)
Step 11a: Let's see who was on a beta-blocker, but still achieved HR goal,
TEXT1<-c("Beta-Blocked- Achieved HR goal")
VO2_END_REASON_char<-as.character(VO2.df$VO2_END_REASON)
VO2.df$VO2_END_REASON<-if_else((VO2.df$BetaBlocker==1 & VO2.df$VO2_END_REASON=="Achieved HR Goal"), TEXT1, VO2_END_REASON_char)
VO2.df$VO2_END_REASON<-as.factor(VO2.df$VO2_END_REASON)
VO2.df$BetaBlocker<-as.factor(VO2.df$BetaBlocker)
table1<-VO2.df %>%
select(`LAB ID`, BetaBlocker,VO2_END_REASON, `TEST END REASONS` )
table1<-as_data_frame(table1)
table<-arrange(table1,by=VO2_END_REASON)
table1<-datatable(table,options = list( pageLength = 5))
summary(table[,2:3])
## BetaBlocker VO2_END_REASON
## 0:95 Achieved HR Goal :90
## 1: 7 Beta-Blocked- Achieved HR goal: 2
## Beta-Blocked- Reached RPE Goal: 5
## Other : 3
## Subject Quit : 2
Step 11b: Remove participants from the sample that ended their fittness test for any of the following reasons.
* Subject Quit
* Safety Concern
* Equipment Malfunction
* Other
This means we'll retain particpants who...
1. Achieved HR Goal
2. Beta-blocked & reached their goal RPE (15)
3. Beta-blocked, yet still achieved their HR goal.
CLEAN.VO2<-VO2.df
CLEAN.VO2<-CLEAN.VO2[!(CLEAN.VO2$VO2_END_REASON=="Other"),] #n==99
CLEAN.VO2<-CLEAN.VO2[!(CLEAN.VO2$VO2_END_REASON=="Subject Quit"),] #n==97
#CLEAN.VO2<-CLEAN.VO2[!(CLEAN.VO2$VO2_END_REASON=="Equipment Malfunction"),] #n==97
#CLEAN.VO2<-CLEAN.VO2[!(CLEAN.VO2$VO2_END_REASON=="Safety Concern"),] #n==97
write.csv(VO2.df, "EPICC_BS_VO2_102.csv")
write.csv(VO2.df, "clean_EPICC_BS_VO2_97.csv")
model<-(lm(CRFrel ~ AGE+ BMI_group+BetaBlocker, data=CLEAN.VO2))
#summary(lm(CRFrel ~AGE+ VO2_END_REASON+BMI, data=CLEAN.VO2))
pairs(emmeans::emmeans(model, "BMI_group"))
## contrast estimate SE df t.ratio p.value
## normal - overweight 0.996 0.913 91 1.091 0.6959
## normal - obese 2.642 0.924 91 2.861 0.0265
## normal - severe obese 5.914 1.122 91 5.271 <.0001
## overweight - obese 1.646 0.659 91 2.498 0.0669
## overweight - severe obese 4.917 0.933 91 5.273 <.0001
## obese - severe obese 3.272 0.942 91 3.474 0.0043
##
## Results are averaged over the levels of: BetaBlocker
## P value adjustment: tukey method for comparing a family of 4 estimates
pairs(emmeans::emmeans(model, "BetaBlocker"))
## contrast estimate SE df t.ratio p.value
## 0 - 1 0.682 1.11 91 0.615 0.5404
##
## Results are averaged over the levels of: BMI_group
#shapiro.test(VO2.df$CRFabs)
#shapiro.test(VO2.df$PEAK_METS)
#shapiro.test(VO2.df$PEAK_RPE)
#shapiro.test(VO2.df$height_cm)
#shapiro.test(VO2.df$PEAK_RER)
#shapiro.test(VO2.df$VO2_Time)
#shapiro.test(VO2.df$VO2MAX_EST) #NOT NORM DIST
#shapiro.test(VO2.df$HR_DIFFERENCE) #NOT NORM DIST
#shapiro.test(VO2.df$BMI) #NOT NORM DIST
#shapiro.test(VO2.df$AGE) #NOT NORM DIST
#shapiro.test(VO2.df$AP_VO2MAX)#NOT NORM DIST
Step 9 IMPORT SPSS DATABASE: Import SPSS database & use 'Codebook' package's Static Label Browser to poke around. Set 'eval=FALSE' to save knit time.
The links:
* Documentation: https://rubenarslan.github.io/codebook
* Example: https://rubenarslan.github.io/codebook/articles/codebook.html
The command: codebook::label_browser_static(), you should see a searchable data dictionary in your R viewer pane.
codebook_data <- rio::import("merged EPICC data with 1024 &1069 & 674 & 7701 & 745.sav")
codebook_data<-codebook_data %>% select(starts_with("POF"))
codebook::label_browser_static(codebook_data)
Below I'll select the following:
1. Demographics: Age, EDU, Rage, Smoking Status, Handedness.
2. I also included a few composite scores from the cognitive variables tab.
3. I included potential a few additional potential covariates.
Step X: BDH010A:HX represents binary logic (1=yes, 2=no) for catigorical racial status (White, Black, Asian, Other). It is likely that these questions were presented as multiple-entry check-boxes, and thus, they're imported as separate variables. Below, we use if_else() commands to label White and Black within their respective columns, and then again, to concatenate into one 'Race' column.
Step X: Let's also adjust our handedness column so that the values are mathmatically executeable. FSL will not recognize left vs right, but it will understand binary logic [1,0].
Step X: Import any other included datasets below. I find the dplyr to very helpful in finding groups of variables that I'm interested. See the following link, and list of cammands below. After all data is separated into respective dataframes, we can use the codebook package add-on to view any/all summary statistics.
Merged Full Sample Report (sources listed below):
VO2_DEMOS<-left_join(VO2.df, DEMOS)
VO2_DEMOS_T1<-left_join(VO2_DEMOS, T1, by="ID")
VO2_DEMOS_T1_EMO<-left_join(VO2_DEMOS_T1, EMO_data, by="ID")
VO2_DEMOS_T1_EMO_COG<-left_join(VO2_DEMOS_T1_EMO, COG_data, by="ID")
VO2_DEMOS_T1_EMO_COG_PAIN<-left_join(VO2_DEMOS_T1_EMO_COG, BPISF_FAT_data, by="ID")
VO2_DEMOS_T1_EMO_COG_PAIN_SLEEP<-left_join(VO2_DEMOS_T1_EMO_COG_PAIN, Sleep_data, by="ID")
VO2_DEMOS_T1_EMO_COG_PAIN_SLEEP_TRI<-left_join(VO2_DEMOS_T1_EMO_COG_PAIN_SLEEP, TRI_data, by="ID")
VO2_DEMOS_T1_EMO_COG_PAIN_SLEEP_BRAIN<-left_join(VO2_DEMOS_T1_EMO_COG_PAIN_SLEEP_TRI, BRAIN, by="ID")
EPICC_FULL<-VO2_DEMOS_T1_EMO_COG_PAIN_SLEEP_BRAIN
Full Sample: Summary Demographics Report:
FULL_SUMMARY<-EPICC_FULL %>%
select(ID, Age, BMI, BMI_group, Race, Smoker, BetaBlocker,
WeeklyMVPA_min, emo_avg, CRFrel, CRFabs, bdiav,
T1_Rest_Syst_BP, T1_Rest_Diast_BP,T1_Rest_HR,
Left.Hippocampus, meanFD)
FULL_SUMMARY<-as.tibble(FULL_SUMMARY)
print(dfSummary(FULL_SUMMARY[,2:17], plain.ascii = TRUE, display.labels = FALSE,
justify="c", graph.magnif = .60, valid.col = TRUE,
labels.col = FALSE, style = "grid",
varnumbers = FALSE, tmp.img.dir = "/tmp"),
method = 'render' )
| Variable | Stats / Values | Freqs (% of Valid) | Graph | Valid | Missing | ||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Age [numeric] | Mean (sd) : 63.8 (5.3) min < med < max: 51 < 64 < 77 IQR (CV) : 6.5 (0.1) | 23 distinct values | 72 (70.59%) | 30 (29.41%) | |||||||||||||||||
| BMI [numeric] | Mean (sd) : 31.1 (6.5) min < med < max: 20.1 < 29.8 < 51.6 IQR (CV) : 8.8 (0.2) | 101 distinct values | 102 (100%) | 0 (0%) | |||||||||||||||||
| BMI_group [ordered, factor] | 1. normal 2. overweight 3. obese 4. severe obese |
|
102 (100%) | 0 (0%) | |||||||||||||||||
| Race [factor] | 1. black 2. white |
|
72 (70.59%) | 30 (29.41%) | |||||||||||||||||
| Smoker [factor] | 1. Yes 2. No |
|
88 (86.27%) | 14 (13.73%) | |||||||||||||||||
| BetaBlocker [factor] | 1. 0 2. 1 |
|
102 (100%) | 0 (0%) | |||||||||||||||||
| WeeklyMVPA_min [numeric] | Mean (sd) : 284.9 (230) min < med < max: 25.8 < 195 < 915 IQR (CV) : 301.5 (0.8) | 67 distinct values | 71 (69.61%) | 31 (30.39%) | |||||||||||||||||
| emo_avg [numeric] | Mean (sd) : 1.5 (0.7) min < med < max: 1 < 1.2 < 3.5 IQR (CV) : 0.9 (0.4) | 18 distinct values | 67 (65.69%) | 35 (34.31%) | |||||||||||||||||
| CRFrel [numeric] | Mean (sd) : 17.1 (3.8) min < med < max: 7.3 < 16.7 < 27.8 IQR (CV) : 4.3 (0.2) | 102 distinct values | 102 (100%) | 0 (0%) | |||||||||||||||||
| CRFabs [numeric] | Mean (sd) : 1.4 (0.3) min < med < max: 0.5 < 1.4 < 2.1 IQR (CV) : 0.4 (0.2) | 102 distinct values | 102 (100%) | 0 (0%) | |||||||||||||||||
| bdiav [numeric] | Mean (sd) : 0.3 (0.2) min < med < max: 0 < 0.2 < 1.3 IQR (CV) : 0.2 (0.9) | 18 distinct values | 72 (70.59%) | 30 (29.41%) | |||||||||||||||||
| T1_Rest_Syst_BP [numeric] | Mean (sd) : 141.7 (16.4) min < med < max: 109 < 140.5 < 190 IQR (CV) : 22.2 (0.1) | 41 distinct values | 74 (72.55%) | 28 (27.45%) | |||||||||||||||||
| T1_Rest_Diast_BP [numeric] | Mean (sd) : 83.6 (9.1) min < med < max: 57 < 85 < 99 IQR (CV) : 10 (0.1) | 32 distinct values | 74 (72.55%) | 28 (27.45%) | |||||||||||||||||
| T1_Rest_HR [numeric] | Mean (sd) : 72.6 (10.5) min < med < max: 52 < 73 < 97 IQR (CV) : 13.8 (0.1) | 35 distinct values | 74 (72.55%) | 28 (27.45%) | |||||||||||||||||
| Left.Hippocampus [numeric] | Mean (sd) : 3843.4 (481.2) min < med < max: 2581.7 < 3878.2 < 4958.4 IQR (CV) : 401.7 (0.1) | 27 distinct values | 27 (26.47%) | 75 (73.53%) | |||||||||||||||||
| meanFD [numeric] | Mean (sd) : 0.1 (0) min < med < max: 0 < 0.1 < 0.2 IQR (CV) : 0.1 (0.4) | 34 distinct values | 34 (33.33%) | 68 (66.67%) |
Generated by summarytools 0.9.6 (R version 3.6.1)
2020-07-04
Full Sample: Test for Normality
* Link: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3124223/
* Hawkins test of normality and homoscedasticity:
# Continuous Variables:
#shapiro.test(CLEAN_EPICC$AGE) # PASSES Shapiro-Wilk normality test
#shapiro.test(EPICC_FULL$AGE) # Fails Shapiro-Wilk normality test
#shapiro.test(EPICC_FULL$CRFrel)
#shapiro.test(EPICC_FULL$CRFabs)
#shapiro.test(EPICC_FULL$weight_kg) # Fails Shapiro-Wilk normality test
#shapiro.test(EPICC_FULL$BMI_kg) # Fails Shapiro-Wilk normality test
#shapiro.test(EPICC_FULL$VO2MAX_EST) # Fails Shapiro-Wilk normality test
#shapiro.test(EPICC_FULL$VO2MAX_EST) # Fails Shapiro-Wilk normality test
CHECK_NORM<-EPICC_FULL %>%
select(Age_max_HR_GOAL,BMI,CRFrel,BetaBlocker, Smoker, EDU, Race, Handedness)
CHECK_NORM[,(1:8)]<-lapply(CHECK_NORM[,(1:8)], as.numeric)
TestMCARNormality(CHECK_NORM)
## Call:
## TestMCARNormality(data = CHECK_NORM)
##
## Number of Patterns: 3
##
## Total number of cases used in the analysis: 102
##
## Pattern(s) used:
## Age_max_HR_GOAL BMI CRFrel BetaBlocker Smoker EDU Race
## group.1 1 1 1 1 1 1 1
## group.2 1 1 1 1 1 NA NA
## group.3 1 1 1 1 NA NA NA
## Handedness Number of cases
## group.1 1 72
## group.2 NA 16
## group.3 NA 14
##
##
## Test of normality and Homoscedasticity:
## -------------------------------------------
##
## Hawkins Test:
##
## P-value for the Hawkins test of normality and homoscedasticity: 1.109013e-26
##
## Either the test of multivariate normality or homoscedasticity (or both) is rejected.
## Provided that normality can be assumed, the hypothesis of MCAR is
## rejected at 0.05 significance level.
##
## Non-Parametric Test:
##
## P-value for the non-parametric test of homoscedasticity: 0.3686507
##
## Reject Normality at 0.05 significance level.
## There is not sufficient evidence to reject MCAR at 0.05 significance level.
Full Sample Complete Cases (n=67): Corrplot
EPIC_COR<-EPICC_FULL %>%
select( ID,Age, Race,BetaBlocker ,Smoker, emo_avg, EDU, T1_Weight_Kg, T1_BMI,T1_Rest_HR,
WeeklyMVPA_min,T1_Peak_VO2_ml_kg_min,bdiav,T1_Peak_VO2_Max_Value,
T1_Peak_VO2_ml_kg_min,T1_Rest_Syst_BP,T1_Rest_Diast_BP)
EPIC_COR<-EPIC_COR[complete.cases(EPIC_COR),]
EPIC_COR[,2:15]<-lapply(EPIC_COR[,2:15],as.numeric)
M <- cor(EPIC_COR[,2:15])
p.mat <- cor.mtest(EPIC_COR[,2:13])$p
res1 <- cor.mtest(EPIC_COR[,2:13], conf.level = .95)
col <- colorRampPalette(c("#BB4444", "#EE9988", "#FFFFFF", "#77AADD", "#4477AA"))
corrplot(M, method = "color", col = col(200),
type = "full", number.cex = .7,
addCoef.col = "black", # Add coefficient of correlation
tl.col = "black", tl.srt = 45, # Text label color and rotation
# Combine with significance
p.mat = p.mat, sig.level = 0.05, insig = "blank",
# hide correlation coefficient on the principal diagonal
diag = FALSE)
You may choose any variabes to report here
EPICC_FULL$TOT_HIPP_VOL<-EPICC_FULL$Left.Hippocampus+EPICC_FULL$Right.Hippocampus
EPICC_FULL$TOT_VentralDC<-EPICC_FULL$Left.VentralDC+EPICC_FULL$Right.VentralDC
EPICC_FULL$TOT_Amygdala<-EPICC_FULL$Left.Amygdala+EPICC_FULL$Left.Amygdala
EPICC_FULL$TOT_Accumbens<-EPICC_FULL$Left.Accumbens.area+EPICC_FULL$Right.Accumbens.area
EPICC_FULL$TOT_Thalamus.Proper<-EPICC_FULL$Left.Thalamus.Proper+EPICC_FULL$Right.Thalamus.Proper
EPICC_FULL$TOT_Pallidum<-EPICC_FULL$Left.Pallidum+EPICC_FULL$Right.Pallidum
EPICC_FULL$TOT_Caudate<-EPICC_FULL$Left.Caudate+EPICC_FULL$Right.Caudate
EPICC_FULL$TOT_Putamen<-EPICC_FULL$Left.Putamen+EPICC_FULL$Right.Putamen
SAMPLE<-EPICC_FULL %>%
select(ID, Race, Handedness, BetaBlocker, Age, EDU, CRFrel,CRFabs,
BMI, WeeklyMVPA_min, bdiav, emo_avg, bdiav,
T1_Rest_HR,T1_Rest_Syst_BP,T1_Rest_Diast_BP, T1_Peak_Resp_Rate.,T1_Heart_Rate,
BPISF_meanpainseverity,FAT_avg,EPS_avg,PCL_tss,
EstimatedTotalIntraCranialVol, meanFD, weight_kg, BMI_group,
Left.Hippocampus, Right.Hippocampus,TOT_HIPP_VOL,
Left.Amygdala, Right.Amygdala,TOT_Amygdala,
Left.VentralDC, Right.VentralDC,TOT_VentralDC,
Left.Accumbens.area,Right.Accumbens.area,TOT_Accumbens ,Brain.Stem,
Right.Hippocampus,Left.Hippocampus,
Left.Amygdala ,Right.Amygdala,TOT_Amygdala,
Right.Putamen,Left.Putamen,TOT_Putamen,
Right.Caudate, Left.Caudate, TOT_Caudate,
Right.Accumbens.area, Left.Accumbens.area,
Left.VentralDC, Right.VentralDC,TOT_VentralDC,
Left.Pallidum,Right.Pallidum,TOT_Pallidum,
Left.Thalamus.Proper,Right.Thalamus.Proper,TOT_Thalamus.Proper,
Left.Cerebellum.Cortex, Right.Cerebellum.Cortex,
Left.Cerebellum.White.Matter,
TotalGrayVol, SubCortGrayVol, BrainSegVolNotVentSurf)
SAMPLE<-SAMPLE[complete.cases(SAMPLE),]
SAMPLE$ETZ<-scale(SAMPLE$EstimatedTotalIntraCranialVol, center=TRUE)
SAMPLE$ETZ<-as.numeric(SAMPLE$ETZ)
SAMPLE$CRFrelZ<-scale(SAMPLE$CRFrel, center=TRUE)
SAMPLE$CRFrelZ<-as.numeric(SAMPLE$CRFrelZ)
SAMPLE$AgeZ<-scale(SAMPLE$Age, center=TRUE)
SAMPLE$AgeZ<-as.numeric(SAMPLE$AgeZ)
SAMPLE$edu_Z<-scale(SAMPLE$EDU, center=TRUE)
SAMPLE$edu_Z<-as.numeric(SAMPLE$edu_Z)
SAMPLE$TotalGrayVol_Z<-scale(SAMPLE$TotalGrayVol, center=TRUE)
SAMPLE$TotalGrayVol_Z<-as.numeric(SAMPLE$TotalGrayVol_Z)
SAMPLE$BMI_Z<-scale(SAMPLE$BMI, center=TRUE)
SAMPLE$BMI_Z<-as.numeric(SAMPLE$BMI_Z)
SAMPLE$weight_kgZ<-scale(SAMPLE$weight_kg, center=TRUE)
SAMPLE$weight_kgZ<-as.numeric(SAMPLE$weight_kgZ)
SAMPLE$RaceZ<-as.numeric(SAMPLE$Race, center=TRUE)
SAMPLE$RaceZ<-scale(SAMPLE$RaceZ, center = TRUE)
SAMPLE$RaceZ<-as.numeric(SAMPLE$RaceZ, center = TRUE)
SAMPLE$HandednessZ<-as.numeric(SAMPLE$Handedness, center=TRUE)
SAMPLE$HandednessZ<-scale(SAMPLE$HandednessZ, center = TRUE)
SAMPLE$HandednessZ<-as.numeric(SAMPLE$HandednessZ)
SAMPLE$BMI_group <- ordered(SAMPLE$BMI_group, levels = c("normal", "overweight", "obese", "severe obese"))
#SAMPLE<-SAMPLE[!(SAMPLE$Smoker=="Yes"),]
#table(SAMPLE$BMI_group, by=SAMPLE$Race)
#table(SAMPLE$Handedness, by=SAMPLE$Race)
Contricted Sample Normality & Model fit:
require("emmeans")
#Continuous Variables: norm dist. vars are #'d out.
#shapiro.test(SAMPLE$Age)
#shapiro.test(SAMPLE$CRFrel)
#shapiro.test(SAMPLE$CRFabs)
#shapiro.test(SAMPLE$weight_kg)
#shapiro.test(SAMPLE$BMI)
#shapiro.test(SAMPLE$EstimatedTotalIntraCranialVol) #Normally dist
#shapiro.test(SAMPLE$edu_Z) #
#shapiro.test(SAMPLE$TOT_HIPP_VOL) #Normally dist
shapiro.test(SAMPLE$WeeklyMVPA_min) # Fails Shapiro-Wilk normality test
##
## Shapiro-Wilk normality test
##
## data: SAMPLE$WeeklyMVPA_min
## W = 0.77336, p-value = 4.893e-05
shapiro.test(SAMPLE$BPISF_meanpainseverity) # Fails Shapiro-Wilk normality test
##
## Shapiro-Wilk normality test
##
## data: SAMPLE$BPISF_meanpainseverity
## W = 0.82139, p-value = 0.0003289
shapiro.test(SAMPLE$emo_avg) # Fails Shapiro-Wilk normality test
##
## Shapiro-Wilk normality test
##
## data: SAMPLE$emo_avg
## W = 0.77311, p-value = 4.848e-05
shapiro.test(SAMPLE$bdiav) # Fails Shapiro-Wilk normality test
##
## Shapiro-Wilk normality test
##
## data: SAMPLE$bdiav
## W = 0.88627, p-value = 0.006579
shapiro.test(SAMPLE$PCL_tss) # Fails Shapiro-Wilk normality test
##
## Shapiro-Wilk normality test
##
## data: SAMPLE$PCL_tss
## W = 0.66441, p-value = 1.287e-06
shapiro.test(SAMPLE$PCL_tss) # Fails Shapiro-Wilk normality test
##
## Shapiro-Wilk normality test
##
## data: SAMPLE$PCL_tss
## W = 0.66441, p-value = 1.287e-06
#shapiro.test(SAMPLE$FAT_avg) # Fails Shapiro-Wilk normality test
cor.test(SAMPLE$CRFrelZ,SAMPLE$BMI_Z ) # R=-0.4477988
##
## Pearson's product-moment correlation
##
## data: SAMPLE$CRFrelZ and SAMPLE$BMI_Z
## t = -2.5041, df = 25, p-value = 0.01917
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.70742958 -0.08168513
## sample estimates:
## cor
## -0.4477988
lm.CRFrel <- lm(CRFrel ~ AgeZ + RaceZ + HandednessZ+ edu_Z + BMI , data = SAMPLE)
summary(lm.CRFrel)
##
## Call:
## lm(formula = CRFrel ~ AgeZ + RaceZ + HandednessZ + edu_Z + BMI,
## data = SAMPLE)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.4457 -2.0000 -0.1877 1.3017 4.3408
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 23.72532 3.03036 7.829 1.16e-07 ***
## AgeZ -0.80889 0.61639 -1.312 0.2036
## RaceZ 0.52044 0.53968 0.964 0.3458
## HandednessZ -0.49153 0.50900 -0.966 0.3452
## edu_Z 0.04416 0.51895 0.085 0.9330
## BMI -0.22284 0.09226 -2.415 0.0249 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.409 on 21 degrees of freedom
## Multiple R-squared: 0.3027, Adjusted R-squared: 0.1367
## F-statistic: 1.823 on 5 and 21 DF, p-value: 0.1518
Step 11: Constricted Sample Corrplot
Explore relative contributions of CRF and BMI on total hippocampus volume
* https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4188138/
Res1<-lm(ETZ~TOT_HIPP_VOL, data=SAMPLE)
SAMPLE$tot_hipp<-Res1$fitted.values
Res2<-lm(ETZ~TOT_Amygdala, data=SAMPLE)
SAMPLE$tot_amy<-Res2$fitted.values
Res3<-lm(ETZ~TOT_VentralDC, data=SAMPLE)
SAMPLE$tot_ventalDC<-Res3$fitted.values
Res4<-lm(ETZ~TOT_Accumbens, data=SAMPLE)
SAMPLE$tot_accumbens<-Res4$fitted.values
Res5<-lm(ETZ~Brain.Stem, data=SAMPLE)
SAMPLE$BrainStem_vol<-Res5$fitted.values
Res6<-lm(ETZ~TOT_Pallidum, data=SAMPLE)
SAMPLE$tot_Pallidum<-Res6$fitted.values
Res7<-lm(ETZ~TOT_Amygdala, data=SAMPLE)
SAMPLE$tot_Amyg<-Res7$fitted.values
Res8<-lm(ETZ~TOT_Thalamus.Proper, data=SAMPLE)
SAMPLE$tot_Thalamus<-Res8$fitted.values
CORR<-SAMPLE %>%
select(ID, Age, BMI, EDU,meanFD, CRFrel,
tot_hipp, tot_ventalDC,tot_accumbens,BrainStem_vol,tot_Pallidum,tot_Amyg,tot_Thalamus,
T1_Rest_HR, T1_Rest_Syst_BP,T1_Rest_Diast_BP,
WeeklyMVPA_min,bdiav, emo_avg, PCL_tss, FAT_avg,BPISF_meanpainseverity )
M <- cor(CORR[,2:22])
p.mat <- cor.mtest(CORR[,2:22])$p
res1 <- cor.mtest(CORR[,2:22], conf.level = .95)
col <- colorRampPalette(c("#BB4444", "#EE9988", "#FFFFFF", "#77AADD", "#4477AA"))
corrplot(M, method = "color",
col = col(200),
type = "full", number.cex = .5,
addCoef.col = "black", # Add coefficient of correlation
tl.col = "black", tl.srt = 45,tl.cex = .7, # Text label color and rotation
# Combine with significance
p.mat = p.mat, sig.level = 0.05, insig = "blank",
# hide correlation coefficient on the principal diagonal
diag = FALSE)
#summary(aov(TOT_HIPP_VOL~ CRFrel+ ETZ,data = SAMPLE)) #NS
summary(aov(TOT_HIPP_VOL~ CRFabs+ ETZ,data = SAMPLE))
## Df Sum Sq Mean Sq F value Pr(>F)
## CRFabs 1 2557586 2557586 3.282 0.0826 .
## ETZ 1 694685 694685 0.891 0.3545
## Residuals 24 18704239 779343
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(aov(TOT_HIPP_VOL~ weight_kg+ ETZ,data = SAMPLE))
## Df Sum Sq Mean Sq F value Pr(>F)
## weight_kg 1 5202339 5202339 8.031 0.00918 **
## ETZ 1 1207046 1207046 1.863 0.18490
## Residuals 24 15547125 647797
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(aov(TOT_HIPP_VOL~ BMI+ ETZ,data = SAMPLE))
## Df Sum Sq Mean Sq F value Pr(>F)
## BMI 1 3580228 3580228 5.175 0.0321 *
## ETZ 1 1773810 1773810 2.564 0.1224
## Residuals 24 16602471 691770
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#lm.bmi<- lm(log(TOT_HIPP_VOL) ~ BMI +ETZ , data = SAMPLE)
#summary(lm.bmi)
#cor.test(lm.bmi$residuals,SAMPLE$CRFrel ) #NS
#cor.test(lm.bmi$residuals,SAMPLE$CRFabs ) #NS
#lm.CRFabs<- lm(log(TOT_HIPP_VOL) ~ CRFabs + ETZ , data = SAMPLE)
#summary(lm.CRFabs)
#cor.test(lm.CRFabs$residuals,SAMPLE$BMI ) #NS
#cor.test(lm.CRFabs$residuals,SAMPLE$weight_kg ) #NS
#summary(lm(TOT_HIPP_VOL ~ AgeZ+RaceZ+edu_Z+HandednessZ+ETZ+BMI , data = SAMPLE)) # NS
#summary(lm(TOT_HIPP_VOL ~ AgeZ+RaceZ+edu_Z+HandednessZ+ETZ+CRFrel , data = SAMPLE)) # NS
#summary(lm(TOT_HIPP_VOL ~ AgeZ+RaceZ+edu_Z+HandednessZ+ETZ+BMI+CRFrel , data = SAMPLE)) # NS
summary(lm(TOT_HIPP_VOL ~ AgeZ+RaceZ+edu_Z+HandednessZ+ETZ+BMI*CRFrelZ , data = SAMPLE))
##
## Call:
## lm(formula = TOT_HIPP_VOL ~ AgeZ + RaceZ + edu_Z + HandednessZ +
## ETZ + BMI * CRFrelZ, data = SAMPLE)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1511.22 -337.19 -31.95 466.49 1093.17
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 6586.17 1171.06 5.624 2.46e-05 ***
## AgeZ -328.60 212.80 -1.544 0.1399
## RaceZ 157.92 224.26 0.704 0.4903
## edu_Z -216.87 182.95 -1.185 0.2513
## HandednessZ 35.94 187.56 0.192 0.8502
## ETZ 300.89 212.01 1.419 0.1729
## BMI 43.01 35.99 1.195 0.2476
## CRFrelZ -1908.41 825.43 -2.312 0.0328 *
## BMI:CRFrelZ 53.12 24.51 2.167 0.0439 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 797.5 on 18 degrees of freedom
## Multiple R-squared: 0.4786, Adjusted R-squared: 0.2468
## F-statistic: 2.065 on 8 and 18 DF, p-value: 0.09603
# BUT THIS DOESNT MAKE SENSE THE MAIN EFFECT OF CRFrel is neg and BMI is pos.
model<-(lm(TOT_HIPP_VOL ~ AgeZ+RaceZ+edu_Z+HandednessZ+ETZ+BMI_Z*CRFrelZ , data = SAMPLE))
par(mfrow = c(2, 2)) # Split the plotting panel into a 2 x 2 grid
visreg::visreg(model, "BMI_Z", "CRFrelZ", overlay=TRUE)
visreg::visreg(model, "CRFrelZ", "BMI_Z", overlay=TRUE)
Import significant ZStats and PSC reports from /ROIoutput
STATS OUTPUT: # subject number, ROI, image, mean, median, max, coordinates (xyz) in standard space.
DATA=EPICC_FULL
ROI_DIR<-c("/Users/alyshagilmore/Downloads/restingstate_crew/ROIoutput/")
PSC<-list.files("/Users/alyshagilmore/Downloads/restingstate_crew/ROIoutput/", pattern = glob2rx("seed_pos*-psc.txt"))
Z<-list.files("/Users/alyshagilmore/Downloads/restingstate_crew/ROIoutput/", pattern = glob2rx("seed_pos*-zscr.txt"))
for (i in PSC) {
setwd(ROI_DIR)
featquery<-read.delim(i , sep = " ", header = FALSE)
colnames(featquery)<-c("ID", "ROI_CLUSTER_NAME", "Type", "L_R" , "POS_NEG","meanZ", "medianZ", "maxZ", "X", "Y", "Z")
featquery[,1:11]<-lapply(featquery[,1:11], as.character)
ZSTAT<-featquery$ROI_CLUSTER_NAME
ZSTAT<-strsplit(ZSTAT, "zstat")
mat <- matrix(unlist(ZSTAT), ncol=2, byrow=TRUE)
ZSTAT<-mat[,2]
POS_NEG<-if_else(ZSTAT=="2", "NEG", "POS")
featquery$L_R<-if_else(featquery$L_R=="left", "L", featquery$L_R)
featquery$L_R<-if_else(featquery$L_R=="right", "R", featquery$L_R)
featquery$L_R<-if_else(featquery$L_R=="0", "BILAT", featquery$L_R)
j<-i
ROI_CLUSTER_NAME<-strsplit(j, split = "-")
mat <- matrix(unlist(ROI_CLUSTER_NAME), ncol=3, byrow=TRUE)
df <- as.data.frame(mat)
ROI_CLUSTER_NAME<-df$V2
COL_NAME<-paste(featquery$L_R,ROI_CLUSTER_NAME, POS_NEG, "PBC", sep="_")
featquery<-featquery %>%
select(ID, meanZ )
names(featquery)<-c("ID", paste(COL_NAME[1]))
DATA <-left_join(featquery, DATA, by="ID" )
}
for (i in Z) {
setwd(ROI_DIR)
featquery<-read.delim(i , sep = " ", header = FALSE)
colnames(featquery)<-c("ID", "ROI_CLUSTER_NAME", "Type", "L_R" , "POS_NEG","meanZ", "medianZ", "maxZ", "X", "Y", "Z")
featquery[,1:11]<-lapply(featquery[,1:11], as.character)
ZSTAT<-featquery$ROI_CLUSTER_NAME
ZSTAT<-strsplit(ZSTAT, "zstat")
mat <- matrix(unlist(ZSTAT), ncol=2, byrow=TRUE)
ZSTAT<-mat[,2]
POS_NEG<-if_else(ZSTAT=="2", "NEG", "POS")
featquery$L_R<-if_else(featquery$L_R=="left", "L", featquery$L_R)
featquery$L_R<-if_else(featquery$L_R=="right", "R", featquery$L_R)
featquery$L_R<-if_else(featquery$L_R=="0", "BILAT", featquery$L_R)
j<-i
ROI_CLUSTER_NAME<-strsplit(j, split = "-")
mat <- matrix(unlist(ROI_CLUSTER_NAME), ncol=3, byrow=TRUE)
df <- as.data.frame(mat)
ROI_CLUSTER_NAME<-df$V2
COL_NAME<-paste(featquery$L_R,ROI_CLUSTER_NAME, POS_NEG, "Zstat", sep="_")
featquery<-featquery %>%
select(ID, meanZ )
names(featquery)<-c("ID", paste(COL_NAME[1]))
DATA <-left_join(featquery, DATA, by="ID" )
}
Full Sample: Corrplots
FULL_SUMMARY<-EPICC_FULL %>%
select(ID,BMI_group, BMI, VO2_END_REASON, Race, Smoker, BetaBlocker,
WeeklyMVPA_min, emo_avg, CRFrel, CRFabs, bdiav,T1_Rest_Syst_BP, T1_Rest_Diast_BP, Left.Hippocampus, meanFD)
COMPLETE<-CHECK_NORM
col_names <- names(COMPLETE)
COMPLETE[,col_names] <- lapply(COMPLETE[,col_names] , as.numeric)
COMPLETE.corr<-COMPLETE[complete.cases(COMPLETE),]
M <- cor(COMPLETE.corr[,3:10])
p.mat <- cor.mtest(COMPLETE.corr[,2:10])$p
res1 <- cor.mtest(COMPLETE.corr[,2:8], conf.level = .95)
col <- colorRampPalette(c("#BB4444", "#EE9988", "#FFFFFF", "#77AADD", "#4477AA"))
corrplot(M, method = "color",
col = col(200),
type = "full", number.cex = .5,
addCoef.col = "black", # Add coefficient of correlation
tl.col = "black", tl.srt = 45,tl.cex = .7, # Text label color and rotation
# Combine with significance
p.mat = p.mat, sig.level = 0.05, insig = "blank",
# hide correlation coefficient on the principal diagonal
diag = FALSE)
Import SPSS & Data Dictionary Standard Import-FULL of SPSS database. * This is quick script will read any SPSS database and automatically create a labeled data dictionary.
* Tutorial Here: https://cran.r-project.org/web/packages/dataMeta/vignettes/dataMeta_Vignette.html
EPIC_data<-read.spss("merged EPICC data with 1024 &1069 & 674 & 7701 & 745.sav")
variable.labels<-attr(EPIC_data, "variable.labels")
linker<-build_linker(EPIC_data, variable.labels, 1)
dict <- build_dict(EPIC_data,linker,option_description = NULL, prompt_varopts = FALSE)
data_desc = "AVAILABLE Baseline Variables -06/26/20 AG Orginal File: MASTER SPSS merged EPICC data with 1024 &1069 & 674 & 7701 & 745.sav"
EPICC <- incorporate_attr(EPIC_data, dict, data_desc)
# Exporting dictionary only:
dict_only <- attributes(EPICC)$dictionary
write_csv(dict_only, "dict_only.csv")
# Saving as .rds (dataset with appended dictionary)
save_it(EPICC,"MASTER_BaselineEPICC_88")
Multiplot Grids: Visualize Sample Normality http://www.cookbook-r.com/Graphs/Multiple_graphs_on_one_page_(ggplot2)/
# Multiple plot function
#
# ggplot objects can be passed in ..., or to plotlist (as a list of ggplot objects)
# - cols: Number of columns in layout
# - layout: A matrix specifying the layout. If present, 'cols' is ignored.
#
# If the layout is something like matrix(c(1,2,3,3), nrow=2, byrow=TRUE),
# then plot 1 will go in the upper left, 2 will go in the upper right, and
# 3 will go all the way across the bottom.
#
multiplot <- function(..., plotlist=NULL, file, cols=1, layout=NULL) {
library(grid)
# Make a list from the ... arguments and plotlist
plots <- c(list(...), plotlist)
numPlots = length(plots)
# If layout is NULL, then use 'cols' to determine layout
if (is.null(layout)) {
# Make the panel
# ncol: Number of columns of plots
# nrow: Number of rows needed, calculated from # of cols
layout <- matrix(seq(1, cols * ceiling(numPlots/cols)),
ncol = cols, nrow = ceiling(numPlots/cols))
}
if (numPlots==1) {
print(plots[[1]])
} else {
# Set up the page
grid.newpage()
pushViewport(viewport(layout = grid.layout(nrow(layout), ncol(layout))))
# Make each plot, in the correct location
for (i in 1:numPlots) {
# Get the i,j matrix positions of the regions that contain this subplot
matchidx <- as.data.frame(which(layout == i, arr.ind = TRUE))
print(plots[[i]], vp = viewport(layout.pos.row = matchidx$row,
layout.pos.col = matchidx$col))
}
}
}
#DATA PLOTS
EPIC<-CONSTRICTED_DAT
p1<-ggstatsplot::ggbetweenstats(
data = EPIC,
x = Race,
y = Age,
effsize.type = "biased", # type of effect size
outlier.coef = 2,
pairwise.comparisons = TRUE,
messages = FALSE,
outlier.tagging = TRUE,
outlier.label = ID,
mean.plotting = TRUE,
# arguments relevant for ggstatsplot::combine_plots
plotgrid.args = list(
align = "h", axis = "b", nrow = 1)
)
p2<-ggstatsplot::ggbetweenstats(
data = EPIC,
x = Race,
y = EDU,
effsize.type = "biased", # type of effect size
outlier.coef = 2,
pairwise.comparisons = TRUE,
messages = FALSE,
outlier.tagging = TRUE,
outlier.label = ID,
mean.plotting = TRUE,
# arguments relevant for ggstatsplot::combine_plots
plotgrid.args = list(
align = "h", axis = "b", nrow = 1)
)
p3<-ggstatsplot::ggbetweenstats(
data = EPIC,
x = Race,
y = T1_BMI,
pairwise.comparisons = TRUE,
effsize.type = "biased", # type of effect size
outlier.coef = 2,
messages = FALSE,
outlier.tagging = TRUE,
outlier.label = ID,
mean.plotting = TRUE, # arguments relevant for ggstatsplot::combine_plots
plotgrid.args = list(
align = "h", axis = "b", nrow = 1)
)
p4<-ggstatsplot::ggbetweenstats(
data = EPIC,
x = Handedness,
y = Age,
pairwise.comparisons = TRUE,
effsize.type = "biased", # type of effect size
outlier.coef = 2,
messages = FALSE,
outlier.tagging = TRUE,
outlier.label = ID,
mean.plotting = TRUE, # arguments relevant for ggstatsplot::combine_plots
plotgrid.args = list(
align = "h", axis = "b", nrow = 1)
)
p5<-ggstatsplot::ggbetweenstats(
data = EPIC,
x = Handedness,
y = EDU,
pairwise.comparisons = TRUE,
effsize.type = "biased", # type of effect size
outlier.coef = 2,
messages = FALSE,
outlier.tagging = TRUE,
outlier.label = ID,
mean.plotting = TRUE, # arguments relevant for ggstatsplot::combine_plots
plotgrid.args = list(
align = "h", axis = "b", nrow = 1)
)
p6<-ggstatsplot::ggbetweenstats(
data = EPIC,
x = Handedness,
y = T1_BMI,
pairwise.comparisons = TRUE,
effsize.type = "biased", # type of effect size
outlier.coef = 2,
messages = FALSE,
outlier.tagging = TRUE,
outlier.label = ID,
mean.plotting = TRUE, # arguments relevant for ggstatsplot::combine_plots
plotgrid.args = list(
align = "h", axis = "b", nrow = 1)
)
multiplot(p1, p2, p3,
p4,p5, p6 ,
cols=3)
StepX Select the variables you wish to include in your analyses and R will export a complete sample. That is, R will only extract particpants with complete cases for all demographics and your variables of interest. In this case, I'll extract all participant who have complete data for the following variables.
Step 11: Mean center any continuous variables:
Step 11a: First, we ensure all continuous variables for mean centering are in their numeric form. Step11b: Next, we use the scale function to mean center those variables.
Great links on mean centering here:
COMPLETE_EPICC<-FITT_DEMOS_Outcomes_BRAIN
# Assign continuous variables classes
COMPLETE_EPICC$AGE<-as.numeric(COMPLETE_EPICC$AGE)
COMPLETE_EPICC$BMI_kg<-as.numeric(COMPLETE_EPICC$BMI_kg)
COMPLETE_EPICC$EDU<-as.numeric(COMPLETE_EPICC$EDU)
COMPLETE_EPICC$CRFabs<-as.numeric(COMPLETE_EPICC$CRFabs)
COMPLETE_EPICC$CRFrel<-as.numeric(COMPLETE_EPICC$CRFrel)
COMPLETE_EPICC$weight_mc<-as.numeric(COMPLETE_EPICC$weight_kg)
COMPLETE_EPICC$Age_mc<-scale(COMPLETE_EPICC$AGE, scale = FALSE)
COMPLETE_EPICC$BMI_mc<-scale(COMPLETE_EPICC$BMI_kg, scale = FALSE)
COMPLETE_EPICC$EDU_mc<-scale(COMPLETE_EPICC$EDU, scale = FALSE)
COMPLETE_EPICC$CRFabs_mc<-scale(COMPLETE_EPICC$CRFabs, scale = FALSE)
COMPLETE_EPICC$CRFrel_mc<-scale(COMPLETE_EPICC$CRFrel, scale = FALSE)
COMPLETE_EPICC$weight_mc<-scale(COMPLETE_EPICC$weight_kg, center = TRUE)
Import feat results: (old version strait from featquery output)
featquery_Lpallidum_meants<-read.delim("/Users/alyshagilmore/Downloads/restingstate_crew/fitness_hipptot.gfeat/cope1.feat/L_Pallidum/mean_mask_ts.txt" ,header = FALSE)
Lpallidum_mts<-featquery_Lpallidum_meants$V1
DATA<-cbind(DATA,Lpallidum_mts)
DATA$Lpallidum_mts<-as.numeric(DATA$Lpallidum_mts)
featquery_Lpallidum_Zstat<-read.delim("/Users/alyshagilmore/Downloads/restingstate_crew/fitness_hipptot.gfeat/cope1.feat/L_Pallidum/max_thresh_zstat2_ts.txt" ,header = FALSE)
Lpallidum_Zstat<-featquery_Lpallidum_Zstat$V1
DATA<-cbind(DATA,Lpallidum_Zstat)
DATA$Lpallidum_Zstat<-as.numeric(DATA$Lpallidum_Zstat)