EPICC Neuroimaging Analyses I:

Preface: You have to log into the Pitt network via Pulse Secure first, and then either use the terminal to access Zeus or a remote desktop program. In the next section, use your local terminal (bash/ctsh) to complete steps 1:6. If you do not see the terminal in your RStudio set-up, use the RStudio menu 'Tools > Terminal > New Terminal'.

From your local terminal...

Step 1: Using the terminal tab in RStudio (not Console tab), ssh into Zeuz by typing the following command (plus enter password):

Step 2: Still in the terminal, move to the the directory where your subject-level data is stored:

cd /Volumes/Disk1/EPICC/SubjectScans

Step 3: Generate a subject ID list using the following command in the terminal. Copy and paste the output into an excel sheet. (I haven't quite figured out how to [simply] assign variables from the terminal into R studio). Name the column SubID. Save the spreadsheet as 'Baseline_Subs_Brain.csv'. We'll continue to add columns to this below.

ls -d -1 7*_1

Step 4: Do the same thing we just did (using the terminal), but this time, list out the subject's left hippocampus .feat directories. Copy and paste the output into a new column in your excel sheet, labeled 'L_HIPP_path'

ls -d -1 /Volumes/Disk1/EPICC/SubjectScans/*/rest/lhip*.feat

Step 5a: Do the same thing we just did, but this time, list out the subject's RIGHT hippocampus .feat directories (or any other seeds that you wish): Copy and paste the output into a new column in your excel sheet, labeled 'R_HIPP_path'

ls -d -1 /Volumes/Disk1/EPICC/SubjectScans/*/rest/rhip*.feat

Step 5b: Continue to do the same thing for each seed path you would like to add to the subject list. *For each seed path, copy/paste the output into a new column in your excel sheet, labeled '_path'*

ls -d -1 /Volumes/Disk1/EPICC/SubjectScans/*/rest/tothippocampus_filtered*
ls -d -1 /Volumes/Disk1/EPICC/SubjectScans/*/rest/tothippocampus_*
ls -d -1 /Volumes/Disk1/EPICC/SubjectScans/*/rest/tothippocampus_with*

Step 6: Save as .csv and close 'Baseline_Subs_Brain.csv'

From the R console...

If you have not installed the these packages, do so now by copy/pasting the following commands to your R Console:
- install.packages("dplyr")
- install.packages("readxl")
- install.packages("tidyverse")
- install.packages("stringr")
- install.packages("foreign")
- install.packages("emmeans")
- install.packages("MissMech")

Step 7: Concatinate .gfeat paths & import meanFD values
Step 7a: Import the .csv you just made (Baseline_Subs_Brain.csv)

setwd("~/Downloads/restingstate_crew/CREATE_SUB_LIST")
BRAIN_SUBS<-read.csv("Baseline_Subs_Brain.csv") #Import the .csv you just made. 
BRAIN_SUBS$SubID<-as.character(BRAIN_SUBS$SubID) #read SubID as a character variable, not numeric.

Step 7b: First, we need to arrange our column rows so that each row includes the path which matches the SubID column.

For example, Particpant #7063_1 does not have data processed for the right hippocampus. We can see this because the column for 'R_HIPP_path' does not allign propperly with the rest of the data.
- To see this for yourself, use the command view(BRAIN_SUBS), and scroll down to SubID 7063_1.
- You can see that because we do not have a path for that directory, the cell for that participant's 'R_HIPP_path', is actually the next particpant's (7066_1) 'R_HIPP_path'
To correct this, we'll use the following sorting command: lapply(BRAIN_SUBS[-1], function(x) x[match(df$MASTER, x)])
- The lapply() command, is a way to simply apply a set of commands, function(x), to a list of vectors.
- The first input in lapply() is the list of vectors you want to apply the command(s) to.
- In our case, that is all columns in our dataframe == BRAIN_SUBS[-1]
- The second input: function(x) x[match(df$MASTER, x)], allows you to set function(x) (to be applied to each vector).
- In this case, we'll the following combination of nested commands: x[match(BRAIN_SUBS$SubID, str_sub(x, 1, 6))], which allows us to replace the 6 leading character strings in the seed path ('7xxx_1') with that of our SubID coulmn, for that specific row.

BRAIN_SUBS[-1]  <-lapply(BRAIN_SUBS[-1], function(x) x[match(BRAIN_SUBS$SubID, str_sub(x, 1, 6))])
# Use the following command to check the output:
# view(BRAIN_SUBS)

Step 7c: Next, we use 'str_split()', to modify our SubID column, -so we can merge our dataframe (BRAIN_SUBS) with other databases (for instance, excel spreadsheets exported from RedCap) which do not use the '_1' nomenclature to separate baseline and follow-up data. I other words, we want to change..

7001_1 --> 7001
7002_1 --> 7002 ... (etc.)

# Here we use a few commands to split the subject ids that we have (e.g., 7001_1) into subject ids that can be merged with other databases (i.e., 7001)
x<-BRAIN_SUBS$SubID 
x<-as.character(x)
tmp<-strsplit(x, "_")   #strsplit: splits a charachter string in a fixed spot. in this case at '_'
mat  <- matrix(unlist(tmp), ncol=2, byrow=TRUE) #this breaks the char string into 2 sep columns.
df<-as.data.frame(mat) #make a data frame
df$ID<-as.character(df$V1) #make var ID
df$Session<-as.character(df$V2) #make var session
df<-df %>% select(ID, Session) #extract ID & session only (e.i., ignore identical colummns V1 & V2)
BRAIN_SUBS<-cbind(df, BRAIN_SUBS) #bind/merge with orginal dataframe 
# Run the following command to look at our resulting dataframe (BRAIN_SUBS). Now, we have a column 'ID' , which can be merged with other datasets. 
# view(BRAIN_SUBS)

Step 7d: For the sake of organization, use the following commands to clean up your 'global enviroment' in R studio. This will remove all variables we've set thus far (seen in the upper right corner of R Studio), EXCEPT for the dataframe we've been working with (BRAIN_SUBS).

Step 7e: Import MeanFD text file and set-up for merging:

MeanFD<-read.delim2("meanFD.txt", header = F) #Not sure why, but R imports loads of duplicates in txt file. 
MeanFD<-MeanFD[!duplicated(MeanFD),] # To remove duplicates
MeanFD<-data.frame(MeanFD) #as dataframe with 34 rows and no duplicates.

#view(MeanFD)
#Step 12b:** Next, we use a few commands to split the subject ids that we have (e.g., 7001_1) into subject ids that can be merged with other databases (i.e., 7001). Same as Step 7c

x<-MeanFD$MeanFD
x<-as.character(x)
tmp<-strsplit(x, " ")   #strsplit: splits a charachter string in a fixed spot. in this case at' '[space]
mat  <- matrix(unlist(tmp), ncol=2, byrow=TRUE) #this breaks the char string into 2 sep columns.
df<-as.data.frame(mat) #make a data frame
df$ID<-as.character(df$V1) #make var ID
df$meanFD<-as.character(df$V2) #make var session
MeanFD<-df %>% select(ID, meanFD) #extract ID & session only (e.i., ignore identical colummns V1 & V2)
rm(x, mat,tmp, df)

# Now again, but for '_1'
x<-MeanFD$ID 
x<-as.character(x)
tmp<-strsplit(x, "_")   #strsplit: splits a charachter string in a fixed spot. in this case at'_1'
mat  <- matrix(unlist(tmp), ncol=2, byrow=TRUE) #this breaks the char string into 2 sep columns.
df<-as.data.frame(mat) #make a data frame
df$ID<-as.character(df$V1) #make var ID
df<-df %>% select(ID) #extract ID & session only (e.i., ignore identical colummns V1 & V2)
rm(mat,tmp)

MeanFD.df<-cbind(df, MeanFD) #bind/merge with orginal dataframe 
MeanFD.df<-MeanFD.df[,-c(2)] # Remove duplicate col

# Step 12c:** Mean center MeanFD values, just in case we'll need them later.  Same as Step 11b.
  MeanFD.df$meanFD<-as.numeric(MeanFD.df$meanFD)
  MeanFD.df$meanFD_mc<-scale(MeanFD.df$meanFD, scale = FALSE)

#Step 13: setup database
  # Assign ID variables as character class
  BRAIN_SUBS$ID<-as.character(BRAIN_SUBS$ID)
  MeanFD.df$ID<-as.character(MeanFD.df$ID)

  # join with current sublist
  BRAIN_SUBS<-left_join(BRAIN_SUBS, MeanFD.df)
  BRAIN_SUBS$meanFD<-as.numeric(BRAIN_SUBS$meanFD)
  rm(list=setdiff(ls(), "BRAIN_SUBS"))

Step 8: Import FreeSurfer Volumetric Data
* n88 included in mastersheet
* n27 included in our FREESURF output (7001:7063)

ASEG<-read.delim("/Users/alyshagilmore/Downloads/restingstate_crew/FreeSurfer_Results_sub7001_to_7063/aseg_stats.txt")
x<-ASEG$Measure.volume
x<-as.character(x)
tmp<-strsplit(x, "_")   #strsplit: splits a charachter string in a fixed spot. in this case at' '[space]
mat  <- matrix(unlist(tmp), ncol=2, byrow=TRUE) #this breaks the char string into 2 sep columns.
df<-as.data.frame(mat) #make a data frame
df$ID<-as.character(df$V1) #make var ID
ASEG$ID<-df$ID
Brain_vols<-ASEG %>% 
  select(ID, EstimatedTotalIntraCranialVol, CSF, MaskVol,
         BrainSegVol,BrainSegVolNotVentSurf,BrainSegVolNotVent, 
         SubCortGrayVol,
         TotalGrayVol ,CorticalWhiteMatterVol, Brain.Stem, 
         Right.Hippocampus,Left.Hippocampus,
         Left.Amygdala ,Right.Amygdala,
         Right.Putamen,Left.Putamen,
         Right.Caudate, Left.Caudate, 
         Right.Accumbens.area, Left.Accumbens.area,
         Left.VentralDC, Right.VentralDC,
         Left.Pallidum,Right.Pallidum,
         Left.Thalamus.Proper,Right.Thalamus.Proper,
         Left.Cerebellum.Cortex, Right.Cerebellum.Cortex,
         Left.Cerebellum.White.Matter, Right.Cerebellum.White.Matter,
         Right.vessel,Left.vessel,
         Left.choroid.plexus,Right.choroid.plexus)

# Sum Total Volume
Brain_vols$TOT_HIPP_VOL<-Brain_vols$Left.Hippocampus+Brain_vols$Right.Hippocampus
Brain_vols$TOT_VentralDC<-Brain_vols$Left.VentralDC+Brain_vols$Right.VentralDC
Brain_vols$TOT_Amygdala<-Brain_vols$Left.Amygdala+Brain_vols$Left.Amygdala
Brain_vols$TOT_Accumbens<-Brain_vols$Left.Accumbens.area+Brain_vols$Right.Accumbens.area
Brain_vols$TOT_Thalamus.Proper<-Brain_vols$Left.Thalamus.Proper+Brain_vols$Right.Thalamus.Proper
Brain_vols$TOT_Pallidum<-Brain_vols$Left.Pallidum+Brain_vols$Right.Pallidum
Brain_vols$TOT_Caudate<-Brain_vols$Left.Caudate+Brain_vols$Right.Caudate
Brain_vols$TOT_Putamen<-Brain_vols$Left.Putamen+Brain_vols$Right.Putamen

## Merge it all together and write out (check-point)
BRAIN<-left_join(BRAIN_SUBS,Brain_vols , by="ID")
BRAIN$ID<-as.character(BRAIN$ID)
write.csv(BRAIN, "Baseline_Brain_data.csv")
rm(list=setdiff(ls(), "BRAIN"))

Step 9: Import FULL Sample VO2 Database n=102

VO2.df<-readxl::read_xlsx("EPICC Data.xlsx", sheet = "VO2 Data") # Import fitness database. 
VO2.df$Session<-as.character(VO2.df$`Pre / Post`)
VO2.df<-VO2.df[(VO2.df$Session=="PRE"),]
VO2.df<-VO2.df[,-c(3:4)]
VO2.df<-VO2.df[complete.cases(VO2.df$`Pre / Post`),]
VO2.df$ID<-as.character(VO2.df$`LAB ID`)

# Define variables as numeric values (because R sometimes likes to make them character strings).
VO2.df$AGE<-as.numeric(VO2.df$AGE)
VO2.df$VO2_Time<-as.numeric(VO2.df$`TEST TIME`)
VO2.df$VO2max_last2pts<-as.numeric(VO2.df$`PEAK VO2 LAST TWO PTS.`)
VO2.df$FE_O2<-as.numeric(VO2.df$`FEO2%`)
VO2.df$BMI<-as.numeric(VO2.df$BMI)
VO2.df$FE_CO2<-as.numeric(VO2.df$`FECO2%`)
VO2.df$Age_max_HR_GOAL<-as.numeric(VO2.df$`85% MAX HR`)
VO2.df$VO2MAX_EST<-as.numeric(VO2.df$`ESTIMATED VO2 MAX (Extrapolated)`)
VO2.df$Age_Rank_VO2max<-as.numeric(VO2.df$`AGE PERCENTILE OF VO2 MAX`)
VO2.df$VO2_END_REASON_NOTES<-as.character(VO2.df$`TEST END REASONS`)
VO2.df$CRFrel<-as.numeric(VO2.df$`PEAK VO2/KG`)
VO2.df$CRFabs<-as.numeric(VO2.df$`PEAK VO2`)
VO2.df$BMI<-as.numeric(VO2.df$BMI)
VO2.df$AGE<-as.numeric(VO2.df$AGE)
VO2.df$HR_DIFFERENCE<-as.numeric(VO2.df$`HR DIFFERENCE`)
VO2.df$PEAK_METS<-as.numeric(VO2.df$`PEAK METS`)
VO2.df$PEAK_HR<-as.numeric(VO2.df$`PEAK HR`)
VO2.df$VE_VCO2_SLOPE<-as.numeric(VO2.df$`VE / VCO2 SLOPE`)
VO2.df$PEAK_RER<-as.numeric(VO2.df$`PEAK RER`)
VO2.df$PEAK_RR<-as.numeric(VO2.df$`PEAK RR`)
VO2.df$weight_kg<-as.numeric(VO2.df$`WEIGHT (KG)`)
VO2.df$PEAK_CO2<-as.numeric(VO2.df$`PEAK CO2`)
VO2.df$PEAK_VE<-as.numeric(VO2.df$`PEAK VE`)
VO2.df$PEAK_VT<-as.numeric(VO2.df$`PEAK VT`)
VO2.df$FECO2<-as.numeric(VO2.df$`FECO2%`)
VO2.df$PEAK_RPE<-VO2.df$`PEAK RPE`
VO2.df$weight_lbs<-VO2.df$weight_kg*(2.205)
VO2.df$height_cm<-VO2.df$weight_kg/VO2.df$BMI
VO2.df$BMI_kg<-VO2.df$BMI
VO2.df$ID<-as.character(VO2.df$`LAB ID`)
VO2.df$VO2_END_factor<-as.numeric(VO2.df$`TEST END REASON?`)
VO2.df$VO2_END_REASON<-as.factor(VO2.df$`TEST END REASON?`)
VO2.df$AP_VO2MAX<-as.numeric(VO2.df$`AGE PERCENTILE OF VO2 MAX`)

VO2.df$VO2_END_REASON <- factor(VO2.df$VO2_END_REASON,
levels = c(1 ,2 ,3 , 4 ,5 , 6 ),
labels = c("Achieved HR Goal","Subject Quit", "Safety Concern", "Equipment Malfunction",
           "Beta-Blocked- Reached RPE Goal",
           "Other"))

# Define BMI groups (optional)
VO2.df$BMI<-as.numeric(VO2.df$BMI)
VO2.df$BMI_group<-as.character(VO2.df$BMI)
VO2.df$BMI_group<-if_else(VO2.df$BMI < 25, "normal", VO2.df$BMI_group)
VO2.df$BMI_group<-if_else((VO2.df$BMI >=25 & VO2.df$BMI <=30), "overweight",  VO2.df$BMI_group)
VO2.df$BMI_group<-if_else((VO2.df$BMI > 30 & VO2.df$BMI < 40), "obese", VO2.df$BMI_group)
VO2.df$BMI_group<-if_else(VO2.df$BMI > 40, "severe obese", VO2.df$BMI_group)
VO2.df$BMI_group <-as.factor(VO2.df$BMI_group)
VO2.df$BMI_group <- ordered(VO2.df$BMI_group, levels = c("normal", "overweight", "obese", "severe obese"))
table(VO2.df$BMI_group)

## 
##       normal   overweight        obese severe obese 
##           15           38           36           13

Step 10: Make Beta-Blocker a factor.

VO2.df$BetaBlocker<-(grepl(VO2.df$VO2_END_REASON_NOTES, pattern = "beta", ignore.case = TRUE))
VO2.df$BetaBlocker<-if_else(VO2.df$BetaBlocker==TRUE, 1,0)

Step 10a: Let's see who was on a beta-blocker, but still achieved HR goal,

##  BetaBlocker                        VO2_END_REASON
##  0:95        Achieved HR Goal              :90    
##  1: 7        Beta-Blocked- Achieved HR goal: 2    
##              Beta-Blocked- Reached RPE Goal: 5    
##              Other                         : 3    
##              Subject Quit                  : 2

Step 10b: Remove participants that ended their fittness test for any of the following reasons.

Subject Quit
Safety Concern
Equipment Malfunction
Other

This means we'll retain particpants who...
1. Achieved HR Goal
2. Beta-blocked & reached their goal RPE (15)
3. Beta-blocked, yet still achieved their HR goal.

CLEAN.VO2<-VO2.df
CLEAN.VO2<-CLEAN.VO2[!(CLEAN.VO2$VO2_END_REASON=="Other"),]  #n==99
CLEAN.VO2<-CLEAN.VO2[!(CLEAN.VO2$VO2_END_REASON=="Subject Quit"),]  #n==97
CLEAN.VO2<-CLEAN.VO2[!(CLEAN.VO2$VO2_END_REASON=="Equipment Malfunction"),]  #n==97
CLEAN.VO2<-CLEAN.VO2[!(CLEAN.VO2$VO2_END_REASON=="Safety Concern"),]  #n==97

write.csv(VO2.df, "EPICC_BS_VO2_102.csv")
write.csv(VO2.df, "clean_EPICC_BS_VO2_97.csv")

Step 10b: Remove participants younger than age 50. (n=91)

CLEAN.VO2$AGE<-as.numeric(CLEAN.VO2$AGE)
table(CLEAN.VO2$AGE>50)

## 
## FALSE  TRUE 
##     6    91

CLEAN.VO2<-CLEAN.VO2[CLEAN.VO2$AGE>50,]

Step 10c: Exploratory CRF models & norm testing

model<-(lm(CRFrel ~ AGE+ BMI_group+BetaBlocker, data=CLEAN.VO2))
#summary(lm(CRFrel ~AGE+ VO2_END_REASON+BMI, data=CLEAN.VO2))
pairs(emmeans::emmeans(model, "BMI_group"))

##  contrast                  estimate    SE df t.ratio p.value
##  normal - overweight          0.592 0.956 85 0.619   0.9257 
##  normal - obese               2.069 0.967 85 2.140   0.1489 
##  normal - severe obese        5.417 1.189 85 4.557   0.0001 
##  overweight - obese           1.477 0.668 85 2.211   0.1285 
##  overweight - severe obese    4.825 0.956 85 5.045   <.0001 
##  obese - severe obese         3.348 0.974 85 3.437   0.0050 
## 
## Results are averaged over the levels of: BetaBlocker 
## P value adjustment: tukey method for comparing a family of 4 estimates

pairs(emmeans::emmeans(model, "BetaBlocker"))

##  contrast estimate  SE df t.ratio p.value
##  0 - 1       0.736 1.1 85 0.670   0.5050 
## 
## Results are averaged over the levels of: BMI_group

#shapiro.test(VO2.df$CRFabs)
#shapiro.test(VO2.df$PEAK_METS) 
#shapiro.test(VO2.df$PEAK_RPE)
#shapiro.test(VO2.df$height_cm) 
#shapiro.test(VO2.df$PEAK_RER)
#shapiro.test(VO2.df$VO2_Time)

#shapiro.test(VO2.df$VO2MAX_EST) #NOT NORM DIST
#shapiro.test(VO2.df$HR_DIFFERENCE) #NOT NORM DIST
#shapiro.test(VO2.df$BMI) #NOT NORM DIST
#shapiro.test(VO2.df$AGE) #NOT NORM DIST
#shapiro.test(VO2.df$AP_VO2MAX)#NOT NORM DIST

Step 11 IMPORT SPSS DATABASE: Import SPSS database & use 'Codebook' package's Static Label Browser to poke around. Set 'eval=FALSE' to save knit time.

The links:
* Documentation: https://rubenarslan.github.io/codebook
* Example: https://rubenarslan.github.io/codebook/articles/codebook.html

The command: codebook::label_browser_static(), you should see a searchable data dictionary in your R viewer pane.

codebook_data <- rio::import("merged EPICC data with 1024 &1069 & 674 & 7701 & 745.sav")
codebook_data<-codebook_data %>% select(starts_with("BDH"))
codebook::label_browser_static(codebook_data)

Import Demographics: Age, EDU, Rage, Smoking Status, Handedness, English_native

# Import .sav data
EPIC_vars<-read.spss("merged EPICC data with 1024 &1069 & 674 & 7701 & 745.sav" , use.value.labels = TRUE, trim.factor.names = FALSE, to.data.frame = TRUE)
EPIC_vars<-as.data.frame(EPIC_vars)
EPIC_vars$ncomorbidities<-as.numeric(EPIC_vars$ncomorbidities)
EPIC_vars$radiation_n<-EPIC_vars$SBD009

# Rename known variables for exported sublist
EPIC_vars$ID<-as.character(EPIC_vars$ID)
EPIC_vars$EDU<-EPIC_vars$BDH004   #Rename variables for exported sublist
EPIC_vars$Smoker<-EPIC_vars$SBD003   #Rename variables for exported sublist
EPIC_vars$Age<-EPIC_vars$BDH001 
EPIC_vars$Handedness<-EPIC_vars$BDH003
EPIC_vars$English_native<-EPIC_vars$BDH006
EPIC_vars$Handedness<-as.factor(EPIC_vars$Handedness)
EPIC_vars$ASA_Status<-EPIC_vars$POF009
EPIC_vars$ASA_Physical_Status<-as.numeric(EPIC_vars$POF009)
EPIC_vars$English_native<-EPIC_vars$BDH006

Step 11a: BDH010A:HX represents binary logic (1=yes, 2=no) for catigorical racial status (White, Black, Asian, Other). It is likely that these questions were presented as multiple-entry check-boxes, and thus, they're imported as separate variables. Below, we use if_else() commands to label White and Black within their respective columns, and then again, to concatenate into one 'Race' column.

EPIC_vars$BDH010A<-as.numeric(EPIC_vars$BDH010A)
EPIC_vars$BDH010A<-if_else(EPIC_vars$BDH010A==1, "White", "Other")
EPIC_vars$BDH010B<-as.numeric(EPIC_vars$BDH010B)
EPIC_vars$BDH010B<-if_else(EPIC_vars$BDH010B==1, "AA", "Other")

Race.df<-EPIC_vars  %>%
  select(ID, BDH010B,BDH010A)
EPIC_vars$Race<-""
EPIC_vars$Race<-if_else(Race.df$BDH010B=="AA", "black", EPIC_vars$Race )
EPIC_vars$Race<-if_else(Race.df$BDH010A=="White", "white", EPIC_vars$Race )
EPIC_vars$Race<-as.factor(EPIC_vars$Race)
EPIC_vars$Race_10<-if_else(EPIC_vars$Race=="white", "1", "0")
rm(Race.df)

Step 11b:: Let's also adjust our handedness column so that the values are mathmatically executeable. FSL will not recognize left vs right, but it will understand binary logic [1,0].

EPIC_vars$BDH003<-as.character(EPIC_vars$BDH003)
EPIC_vars$Handedness_10<- ifelse(EPIC_vars$BDH003=="Right", 1, EPIC_vars$BDH003)
EPIC_vars$Handedness_10<- ifelse(EPIC_vars$Handedness=="Left", 0, EPIC_vars$Handedness_10)
EPIC_vars$Handedness_10<- ifelse(EPIC_vars$Handedness=="Both", "Both", EPIC_vars$Handedness_10)

EPIC_vars$Smoker<-EPIC_vars$SBD003
EPIC_vars$Smoker_10<-if_else(EPIC_vars$SBD003=="Yes", "1", "0")
EPIC_vars$Smoker_10<-as.factor(EPIC_vars$Smoker_10)

DEMOS<-EPIC_vars %>% select(ID,Age,EDU,Race,Race_10, Handedness, Handedness_10,Smoker, Smoker_10, ncomorbidities,ASA_Physical_Status)

Step 11c: Import any other included datasets below. I find the dplyr to very helpful in finding groups of variables that I'm interested. See the following link, and list of cammands below. After all data is separated into respective dataframes, we can use the codebook package add-on to view any/all summary statistics.
Link: https://dplyr.tidyverse.org/

# Time 1 Fitness Data
T1<-EPIC_vars %>% select(ID, starts_with("T1"))
T1$ID<-as.character(T1$ID)

# Clean up a few things T1
T1$T1_Moderate_AVGhrs_perday<-as.numeric(T1$T1_Moderate_AVG)
T1$T1_Moderate_Total_hrs<-as.numeric(T1$T1_Moderate_Total)
T1$T1_ACC_weartime_hrs<-as.numeric(T1$T1_Total_Hrs)

# Weekly minutes of MVPA
T1$WeeklyMVPA_min <-T1$T1_Moderate_Total*60
# shapiro.test(T1$WeeklyMVPA_min) Not normally dist.

# EMOTIONAL/AFFECTIVE Data (One person with PTSD diagnosis)
EMO_data<- EPIC_vars %>%  select(ID, bdiav, emo_avg, PCL_ppd, PCL_tss)
#shapiro.test(EMO_data$bdiav)
#shapiro.test(EMO_data$emo_avg)
#shapiro.test(EMO_data$PCL_tss)
#table(EMO_data$PCL_ppd)

# Sleep Regulation Data
Sleep_data<- EPIC_vars %>%  select(ID, PGHglbsc,EPS_avg )

## Self-Reported Symptoms Scale
TRI_data<- EPIC_vars %>%  select(ID, starts_with("tri_") )
TRI_data<- TRI_data %>%  select(ID, ends_with("score") )
#table(complete.cases(TRI_data$tri_totscore))
#shapiro.test(TRI_data$tri_totscore)

# Pain & Fatigue Data
BPISF_FAT_data<-EPIC_vars  %>%  select(ID, FAT_avg,starts_with("BPISF"))
#shapiro.test(BPISF_FAT_data$FAT_avg) #0.02 p

# Cognitive Data
COG_data<-EPIC_vars %>%
  select(ID,verbalmemory,executivefunctionv1,
        psychomotorspeed, attention ,visualmemory   ,
        executivefunctionv2,visualworkingmemory,
        concentrationv2)

  # Normal Dist
  #shapiro.test(COG_data$verbalmemory)
  #shapiro.test(COG_data$executivefunctionv2)
  #shapiro.test(COG_data$visualworkingmemory)
  #shapiro.test(COG_data$attention)
  #shapiro.test(COG_data$executivefunctionv1)  #Nearly

  # Fails Normal Dist Testing
  #shapiro.test(COG_data$psychomotorspeed)
  #shapiro.test(COG_data$concentrationv2)
  #shapiro.test(COG_data$visualmemory)


write_csv(DEMOS, "DEMOS.csv")
write_csv(COG_data, "COG_data.csv")
write_csv(T1, "T1_FITNESS_MASTER.csv")
write_csv(EMO_data, "EMO_DATA.csv")

Step 12: Merge Full Sample (sources listed below):
1. CLEAN EPICC VO2/Fitness (n=96)
2. T1 MASTER Demographics (n=88)
3. T1 MASTER Fitness/PA data (n=88)
4. T1 MASTER Emotional/Affective data (n=88)
5. T1 MASTER Cognitive data (n=88)
6. T1 MASTER Pain and Fatague (n=88) 7. T1 MASTER Sleep Regulation (n=88) 9. T1 MASTER Self Reported Symptom Data (n=88)
10. gFEAT Path Data (n=34) 11. Brain Volume Data (n=27)

VO2_DEMOS<-left_join(VO2.df, DEMOS)
clean.VO2_DEMOS<-left_join(CLEAN.VO2, DEMOS)

VO2_DEMOS_T1<-left_join(VO2_DEMOS, T1, by="ID")
clean.VO2_DEMOS_T1<-left_join(clean.VO2_DEMOS, T1, by="ID")

VO2_DEMOS_T1_EMO<-left_join(VO2_DEMOS_T1, EMO_data, by="ID")
clean.VO2_DEMOS_T1_EMO<-left_join(clean.VO2_DEMOS_T1, EMO_data, by="ID")

VO2_DEMOS_T1_EMO_COG<-left_join(VO2_DEMOS_T1_EMO, COG_data, by="ID")
clean.VO2_DEMOS_T1_EMO_COG<-left_join(clean.VO2_DEMOS_T1_EMO, COG_data, by="ID")

VO2_DEMOS_T1_EMO_COG_PAIN<-left_join(VO2_DEMOS_T1_EMO_COG, BPISF_FAT_data, by="ID")
clean.VO2_DEMOS_T1_EMO_COG_PAIN<-left_join(clean.VO2_DEMOS_T1_EMO_COG, BPISF_FAT_data, by="ID")

VO2_DEMOS_T1_EMO_COG_PAIN_SLEEP<-left_join(VO2_DEMOS_T1_EMO_COG_PAIN, Sleep_data, by="ID")
clean.VO2_DEMOS_T1_EMO_COG_PAIN_SLEEP<-left_join(clean.VO2_DEMOS_T1_EMO_COG_PAIN, Sleep_data, by="ID")

VO2_DEMOS_T1_EMO_COG_PAIN_SLEEP_TRI<-left_join(VO2_DEMOS_T1_EMO_COG_PAIN_SLEEP, TRI_data, by="ID")
clean.VO2_DEMOS_T1_EMO_COG_PAIN_SLEEP_TRI<-left_join(clean.VO2_DEMOS_T1_EMO_COG_PAIN_SLEEP, TRI_data, by="ID")

VO2_DEMOS_T1_EMO_COG_PAIN_SLEEP_BRAIN<-left_join(VO2_DEMOS_T1_EMO_COG_PAIN_SLEEP_TRI, BRAIN, by="ID")
clean.VO2_DEMOS_T1_EMO_COG_PAIN_SLEEP_BRAIN<-left_join(clean.VO2_DEMOS_T1_EMO_COG_PAIN_SLEEP_TRI, BRAIN, by="ID")

EPICC_FULL<-VO2_DEMOS_T1_EMO_COG_PAIN_SLEEP_BRAIN
CLEAN_EPICC<-clean.VO2_DEMOS_T1_EMO_COG_PAIN_SLEEP_BRAIN
rm(list=setdiff(ls(), c("EPICC_FULL","CLEAN_EPICC")))

Step 12a: Full CLEAN Sample Variable Report: (n=91)

Data Frame Summary

FULL_SUMMARY

Dimensions: 91 x 17
Duplicates: 0

Variable

Stats / Values

Freqs (% of Valid)

Graph

Valid

AGE [numeric]

Mean (sd) : 64.4 (5.8) min < med < max: 51 < 65 < 77 IQR (CV) : 7.5 (0.1)

24 distinct values

91 (100%)

BMI [numeric]

Mean (sd) : 31.3 (6.3) min < med < max: 20.8 < 29.9 < 51.6 IQR (CV) : 8.8 (0.2)

91 distinct values

91 (100%)

BMI_group [ordered, factor]

1. normal 2. overweight 3. obese 4. severe obese

11	(	12.1%	)
36	(	39.6%	)
33	(	36.3%	)
11	(	12.1%	)

91 (100%)

BetaBlocker [factor]

1. 0 2. 1

84	(	92.3%	)
7	(	7.7%	)

91 (100%)

CRFrel [numeric]

Mean (sd) : 16.8 (3.4) min < med < max: 7.3 < 16.6 < 26 IQR (CV) : 4 (0.2)

91 distinct values

91 (100%)

CRFabs [numeric]

Mean (sd) : 1.4 (0.3) min < med < max: 0.5 < 1.4 < 2.1 IQR (CV) : 0.4 (0.2)

91 distinct values

91 (100%)

weight_lbs [numeric]

Mean (sd) : 180.4 (36.4) min < med < max: 117.3 < 178.6 < 303.1 IQR (CV) : 49.9 (0.2)

88 distinct values

91 (100%)

EDU [numeric]

Mean (sd) : 15.5 (2.4) min < med < max: 12 < 16 < 23 IQR (CV) : 3.2 (0.2)

12	:	9	(	13.2%	)
13	:	6	(	8.8%	)
14	:	11	(	16.2%	)
15	:	5	(	7.3%	)
16	:	18	(	26.5%	)
17	:	2	(	2.9%	)
18	:	11	(	16.2%	)
19	:	2	(	2.9%	)
20	:	3	(	4.4%	)
23	:	1	(	1.5%	)

68 (74.73%)

Handedness [factor]

1. Right 2. Left 3. Both

54	(	79.4%	)
14	(	20.6%	)
0	(	0.0%	)

68 (74.73%)

WeeklyMVPA_min [numeric]

Mean (sd) : 291.1 (237.2) min < med < max: 25.8 < 192.6 < 915 IQR (CV) : 324.3 (0.8)

63 distinct values

66 (72.53%)

T1_Rest_Syst_BP [numeric]

Mean (sd) : 141.5 (15.7) min < med < max: 109 < 140 < 190 IQR (CV) : 20 (0.1)

38 distinct values

69 (75.82%)

T1_Rest_Diast_BP [numeric]

Mean (sd) : 83.3 (8.8) min < med < max: 57 < 85 < 99 IQR (CV) : 10 (0.1)

29 distinct values

69 (75.82%)

T1_Rest_HR [numeric]

Mean (sd) : 73 (10.7) min < med < max: 52 < 74 < 97 IQR (CV) : 13 (0.1)

34 distinct values

69 (75.82%)

emo_avg [numeric]

Mean (sd) : 1.6 (0.7) min < med < max: 1 < 1.2 < 3.5 IQR (CV) : 1 (0.4)

18 distinct values

63 (69.23%)

bdiav [numeric]

Mean (sd) : 0.3 (0.2) min < med < max: 0 < 0.2 < 1.3 IQR (CV) : 0.2 (1)

18 distinct values

68 (74.73%)

Left.Hippocampus [numeric]

Mean (sd) : 3843.4 (481.2) min < med < max: 2581.7 < 3878.2 < 4958.4 IQR (CV) : 401.7 (0.1)

27 distinct values

27 (29.67%)

meanFD [numeric]

Mean (sd) : 0.1 (0) min < med < max: 0 < 0.1 < 0.2 IQR (CV) : 0.1 (0.4)

33 distinct values

33 (36.26%)

Generated by summarytools 0.9.6 (R version 3.6.1)
2020-08-13

Step 12a: Full Sample Test for Normality
* Link: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3124223/
* Hawkins test of normality and homoscedasticity:

# Continuous Variables:
#shapiro.test(CLEAN_EPICC$AGE)   # PASSES Shapiro-Wilk normality test 
#shapiro.test(EPICC_FULL$AGE)   # Fails Shapiro-Wilk normality test 

#shapiro.test(CLEAN_EPICC$CRFrel) # p=0.08
#shapiro.test(CLEAN_EPICC$CRFabs)  ## p=0.80

#shapiro.test(CLEAN_EPICC$weight_kg) # Fails Shapiro-Wilk normality test 
#shapiro.test(CLEAN_EPICC$BMI_kg) # Fails Shapiro-Wilk normality test
#shapiro.test(CLEAN_EPICC$VO2MAX_EST) # Fails Shapiro-Wilk normality test
#shapiro.test(CLEAN_EPICC$VO2MAX_EST) # Fails Shapiro-Wilk normality test

CHECK_NORM<-CLEAN_EPICC %>%
  select(AGE,BMI,CRFrel,BetaBlocker, EDU, Race, Handedness) 
CHECK_NORM[,(1:7)]<-lapply(CHECK_NORM[,(1:7)], as.numeric)
TestMCARNormality(CHECK_NORM)

## Call:
## TestMCARNormality(data = CHECK_NORM)
## 
## Number of Patterns:  2 
## 
## Total number of cases used in the analysis:  91 
## 
##  Pattern(s) used:
##           AGE   BMI   CRFrel   BetaBlocker   EDU   Race   Handedness
## group.1     1     1        1             1     1      1            1
## group.2     1     1        1             1    NA     NA           NA
##           Number of cases
## group.1                68
## group.2                23
## 
## 
##     Test of normality and Homoscedasticity:
##   -------------------------------------------
## 
## Hawkins Test:
## 
##     P-value for the Hawkins test of normality and homoscedasticity:  5.655921e-06 
## 
##     Either the test of multivariate normality or homoscedasticity (or both) is rejected.
##     Provided that normality can be assumed, the hypothesis of MCAR is 
##     rejected at 0.05 significance level. 
## 
## Non-Parametric Test:
## 
##     P-value for the non-parametric test of homoscedasticity:  0.3160273 
## 
##     Reject Normality at 0.05 significance level.
##     There is not sufficient evidence to reject MCAR at 0.05 significance level.

CLEAN EPICC Sample: Corrplot;
* Leaving out EDU, Handedness, & Race for now. * Complete cases n=63

Link: https://cran.r-project.org/web/packages/corrplot/vignettes/corrplot-intro.html

##  [1] 81 90 88 76 99 68 79 89 85 99 76 74 91 71 73 89 81 84 73 83 82 81 71 94 88
## [26] 91 85 89 98 69 84 86 86 84 88 80 85 86 91 66 84 90 81 59 57 89 92 85 88 89
## [51] 92 89 83 91 95 75 79 88 71 99 87 87 82 78 81 78 86 74 89 NA NA NA NA NA NA
## [76] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA

Select Constricted Sample Variables:

library(jtools)

SAMPLE<-EPICC_FULL %>%
  select(ID, Race, Handedness, BetaBlocker, AGE, EDU, CRFrel,CRFabs,
         BMI, WeeklyMVPA_min, bdiav, emo_avg, bdiav,
         T1_Rest_HR,T1_Rest_Syst_BP,T1_Rest_Diast_BP, T1_Peak_Resp_Rate.,T1_Heart_Rate,
         BPISF_meanpainseverity,FAT_avg,EPS_avg,PCL_tss, T1_Peak_RPE,
         EstimatedTotalIntraCranialVol, meanFD,  weight_kg, BMI_group, BPISF_meanpaininterference,
         Left.Hippocampus, Right.Hippocampus,TOT_HIPP_VOL, weight_lbs,
         Left.Amygdala, Right.Amygdala,TOT_Amygdala,
         Left.VentralDC,  Right.VentralDC,TOT_VentralDC,
         Left.Accumbens.area,Right.Accumbens.area,TOT_Accumbens ,Brain.Stem, 
         Right.Hippocampus,Left.Hippocampus,
         Left.Amygdala ,Right.Amygdala,TOT_Amygdala,
         Right.Putamen,Left.Putamen,TOT_Putamen,
         Right.Caudate, Left.Caudate, TOT_Caudate,
         Right.Accumbens.area, Left.Accumbens.area,
         Left.VentralDC, Right.VentralDC,TOT_VentralDC,
         Left.Pallidum,Right.Pallidum,TOT_Pallidum,
         Left.Thalamus.Proper,Right.Thalamus.Proper,TOT_Thalamus.Proper,
         Left.Cerebellum.Cortex, Right.Cerebellum.Cortex,
         Left.Cerebellum.White.Matter,
         TotalGrayVol, SubCortGrayVol, BrainSegVolNotVentSurf)
SAMPLE<-SAMPLE[complete.cases(SAMPLE),]
SAMPLE$ETZ<-scale(SAMPLE$EstimatedTotalIntraCranialVol, center=TRUE)
SAMPLE$ETZ<-as.numeric(SAMPLE$ETZ)
SAMPLE$CRFrelZ<-scale(SAMPLE$CRFrel, center=TRUE)
SAMPLE$CRFrelZ<-as.numeric(SAMPLE$CRFrelZ)
SAMPLE$AgeZ<-scale(SAMPLE$AGE, center=TRUE)
SAMPLE$AgeZ<-as.numeric(SAMPLE$AgeZ)
SAMPLE$edu_Z<-scale(SAMPLE$EDU, center=TRUE)
SAMPLE$edu_Z<-as.numeric(SAMPLE$edu_Z)
SAMPLE$TotalGrayVol_Z<-scale(SAMPLE$TotalGrayVol, center=TRUE)
SAMPLE$TotalGrayVol_Z<-as.numeric(SAMPLE$TotalGrayVol_Z)
SAMPLE$BMI_Z<-scale(SAMPLE$BMI, center=TRUE)
SAMPLE$BMI_Z<-as.numeric(SAMPLE$BMI_Z)
SAMPLE$weight_kgZ<-scale(SAMPLE$weight_kg, center=TRUE)
SAMPLE$weight_kgZ<-as.numeric(SAMPLE$weight_kgZ)
SAMPLE$RaceZ<-as.numeric(SAMPLE$Race, center=TRUE)
SAMPLE$RaceZ<-scale(SAMPLE$RaceZ, center = TRUE)
SAMPLE$RaceZ<-as.numeric(SAMPLE$RaceZ, center = TRUE)
SAMPLE$HandednessZ<-as.numeric(SAMPLE$Handedness, center=TRUE)
SAMPLE$HandednessZ<-scale(SAMPLE$HandednessZ, center = TRUE)
SAMPLE$HandednessZ<-as.numeric(SAMPLE$HandednessZ)

SAMPLE$BMI_group <- ordered(SAMPLE$BMI_group, levels = c("normal", "overweight", "obese", "severe obese"))




#SAMPLE<-SAMPLE[!(SAMPLE$Smoker=="Yes"),]
#table(SAMPLE$BMI_group, by=SAMPLE$Race)
#table(SAMPLE$Handedness, by=SAMPLE$Race)

Contricted Sample Normality & Model fit:
* Normally distributed variables are '#'d out. * Continuous vars failing norm test reported * https://cran.r-project.org/web/packages/jtools/vignettes/summ.html

# Continuous Variables: norm dist. vars are #'d out.
  #shapiro.test(SAMPLE$AGE) #Normally dist
  #shapiro.test(SAMPLE$CRFrel) #Normally dist
  #shapiro.test(SAMPLE$CRFabs)    #Normally dist
  #shapiro.test(SAMPLE$weight_kg) #Normally dist
  #shapiro.test(SAMPLE$BMI) #Normally dist
  #shapiro.test(SAMPLE$EstimatedTotalIntraCranialVol) #Normally dist
  #shapiro.test(SAMPLE$edu_Z) #Normally dist
  #shapiro.test(SAMPLE$TOT_HIPP_VOL) #Normally dist
  #shapiro.test(SAMPLE$FAT_avg) #p=0.08 #Normally dist

# NOT Normally distr
  #shapiro.test(SAMPLE$WeeklyMVPA_min) # Fails Shapiro-Wilk normality test 
  #shapiro.test(SAMPLE$BPISF_meanpainseverity) # Fails Shapiro-Wilk normality test 
  #shapiro.test(SAMPLE$emo_avg) # Fails Shapiro-Wilk normality test 
  #shapiro.test(SAMPLE$bdiav) # Fails Shapiro-Wilk normality test 
  #shapiro.test(SAMPLE$PCL_tss) # Fails Shapiro-Wilk normality test 
  #shapiro.test(SAMPLE$PCL_tss) # Fails Shapiro-Wilk normality test 
#cor.test(SAMPLE$CRFrelZ,SAMPLE$BMI_Z) # R=-0.4477988 
summ(lm(CRFrel ~ Race-1 + BMI_Z , data = SAMPLE))

Observations	27
Dependent variable	CRFrel
Type	OLS linear regression

F(3,24)	423.20
R²	0.98
Adj. R²	0.98

	Est.	S.E.	t val.	p
Raceblack	16.26	1.21	13.42	0.00
Racewhite	16.53	0.50	32.86	0.00
BMI_Z	-1.17	0.48	-2.46	0.02
Standard errors: OLS

summ(lm(CRFrel ~ Race-1 + AgeZ , data = SAMPLE))

Observations	27
Dependent variable	CRFrel
Type	OLS linear regression

F(3,24)	336.37
R²	0.98
Adj. R²	0.97

	Est.	S.E.	t val.	p
Raceblack	16.61	1.43	11.65	0.00
Racewhite	16.47	0.57	28.98	0.00
AgeZ	0.07	0.56	0.12	0.90
Standard errors: OLS

fit0<-(lm(CRFrel ~ BMI, data = SAMPLE))
fit1<-(lm(CRFrel ~ AGE, data = SAMPLE))
fit2<-(lm(CRFrel ~ BMI+AGE, data = SAMPLE))

summ(fit0)

Observations	27
Dependent variable	CRFrel
Type	OLS linear regression

F(1,25)	6.27
R²	0.20
Adj. R²	0.17

	Est.	S.E.	t val.	p
(Intercept)	22.09	2.28	9.69	0.00
BMI	-0.17	0.07	-2.50	0.02
Standard errors: OLS

summ(fit1)

Observations	27
Dependent variable	CRFrel
Type	OLS linear regression

F(1,25)	0.01
R²	0.00
Adj. R²	-0.04

	Est.	S.E.	t val.	p
(Intercept)	15.91	5.86	2.71	0.01
AGE	0.01	0.09	0.10	0.92
Standard errors: OLS

summ(fit2)

Observations	27
Dependent variable	CRFrel
Type	OLS linear regression

F(2,24)	3.91
R²	0.25
Adj. R²	0.18

	Est.	S.E.	t val.	p
(Intercept)	30.50	7.37	4.14	0.00
BMI	-0.22	0.08	-2.79	0.01
AGE	-0.11	0.09	-1.20	0.24
Standard errors: OLS

p1<-effect_plot(fit1, pred = AGE, interval = TRUE, plot.points = TRUE)

p2<-effect_plot(fit2, pred = BMI, interval = TRUE, plot.points = TRUE)
cowplot::plot_grid(p1,p2)

#lm.CRFrel <- lm(CRFrelZ ~ weight_kgZ+AgeZ, data = SAMPLE)
#summary(lm.CRFrel) #Non-SIG

Step 11: Constricted Sample Corrplot
Explore relative contributions of CRF and BMI on total hippocampus volume
* Link: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4188138/

Run the app in a pop-up viewer

hipp.gress<-lm(EstimatedTotalIntraCranialVol~TOT_HIPP_VOL , data=SAMPLE)
SAMPLE$hipp.gress<-hipp.gress$fitted.values

putamen.gress<-lm(EstimatedTotalIntraCranialVol~TOT_Putamen , data=SAMPLE)
SAMPLE$putamen.gress<-putamen.gress$fitted.values

pallidum.gress<-lm(EstimatedTotalIntraCranialVol~TOT_Pallidum , data=SAMPLE)
SAMPLE$pallidum.gress<-pallidum.gress$fitted.values

vDC.gress<-lm(EstimatedTotalIntraCranialVol~TOT_VentralDC , data=SAMPLE)
SAMPLE$vDC.gress<-vDC.gress$fitted.values

BrainStem.gress<-lm(EstimatedTotalIntraCranialVol~Brain.Stem , data=SAMPLE)
SAMPLE$BrainStem.gress<-BrainStem.gress$fitted.values

SAMPLE$EDU<-as.numeric(SAMPLE$EDU)
Explore<-SAMPLE %>%
  select(ID, AgeZ,EDU, BMI_Z, CRFrelZ, CRFabs, T1_Peak_Resp_Rate.,PCL_tss,
         T1_Rest_Diast_BP, T1_Rest_Syst_BP, T1_Heart_Rate, T1_Rest_HR,T1_Peak_RPE,
         BrainStem.gress, vDC.gress, putamen.gress, hipp.gress, pallidum.gress,
         bdiav, emo_avg, FAT_avg, BPISF_meanpaininterference, BPISF_meanpainseverity,
         WeeklyMVPA_min, BMI_group)



ExPanD(Explore, cs_id = "SubID",  export_nb_option = TRUE )

#summary(aov(TOT_HIPP_VOL~ CRFrel+ ETZ,data = SAMPLE))  #NS
#lm.bmi<- lm(log(TOT_HIPP_VOL) ~ BMI +ETZ , data = SAMPLE)
#summary(lm.bmi)
#cor.test(lm.bmi$residuals,SAMPLE$CRFrel ) #NS
#cor.test(lm.bmi$residuals,SAMPLE$CRFabs ) #NS

#lm.CRFabs<- lm(log(TOT_HIPP_VOL) ~ CRFabs + ETZ , data = SAMPLE)
#summary(lm.CRFabs)
#cor.test(lm.CRFabs$residuals,SAMPLE$BMI ) #NS
#cor.test(lm.CRFabs$residuals,SAMPLE$weight_kg ) #NS
#summary(lm(TOT_HIPP_VOL ~ AgeZ+RaceZ+edu_Z+HandednessZ+ETZ+BMI , data = SAMPLE))         # NS
#summary(lm(TOT_HIPP_VOL ~ AgeZ+RaceZ+edu_Z+HandednessZ+ETZ+CRFrel , data = SAMPLE))      # NS
#summary(lm(TOT_HIPP_VOL ~ AgeZ+RaceZ+edu_Z+HandednessZ+ETZ+BMI+CRFrel , data = SAMPLE))  # NS
summary(lm(TOT_HIPP_VOL ~ AgeZ+RaceZ+edu_Z+HandednessZ+ETZ+BMI*CRFrelZ , data = SAMPLE))

## 
## Call:
## lm(formula = TOT_HIPP_VOL ~ AgeZ + RaceZ + edu_Z + HandednessZ + 
##     ETZ + BMI * CRFrelZ, data = SAMPLE)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1511.22  -337.19   -31.95   466.49  1093.17 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  6586.17    1171.06   5.624 2.46e-05 ***
## AgeZ         -328.60     212.80  -1.544   0.1399    
## RaceZ         157.92     224.26   0.704   0.4903    
## edu_Z        -216.87     182.95  -1.185   0.2513    
## HandednessZ    35.94     187.56   0.192   0.8502    
## ETZ           300.89     212.01   1.419   0.1729    
## BMI            43.01      35.99   1.195   0.2476    
## CRFrelZ     -1908.41     825.43  -2.312   0.0328 *  
## BMI:CRFrelZ    53.12      24.51   2.167   0.0439 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 797.5 on 18 degrees of freedom
## Multiple R-squared:  0.4786, Adjusted R-squared:  0.2468 
## F-statistic: 2.065 on 8 and 18 DF,  p-value: 0.09603

# BUT THIS DOESNT MAKE SENSE THE MAIN EFFECT OF CRFrel is neg and BMI is pos.
model<-(lm(TOT_HIPP_VOL ~ AGE+Race+EDU+Handedness+EstimatedTotalIntraCranialVol+BMI*CRFrel , data = SAMPLE))
visreg::visreg(model, "CRFrel", "BMI", overlay=TRUE)

visreg::visreg(model, "BMI", "CRFrel", overlay=TRUE)

## 
## Call:
## lm(formula = BrainStem_vol/EstimatedTotalIntraCranialVol ~ T1_Peak_Resp_Rate. * 
##     BMI + weight_kg, data = SAMPLE)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.120302 -0.044017 -0.008144  0.040000  0.145906 
## 
## Coefficients:
##                          Estimate Std. Error t value Pr(>|t|)    
## (Intercept)             1.6099986  0.3313249   4.859 7.41e-05 ***
## T1_Peak_Resp_Rate.     -0.0207602  0.0103877  -1.999   0.0582 .  
## BMI                    -0.0057896  0.0103940  -0.557   0.5831    
## weight_kg              -0.0069411  0.0030954  -2.242   0.0353 *  
## T1_Peak_Resp_Rate.:BMI  0.0008119  0.0003185   2.549   0.0183 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.07726 on 22 degrees of freedom
## Multiple R-squared:  0.3669, Adjusted R-squared:  0.2518 
## F-statistic: 3.188 on 4 and 22 DF,  p-value: 0.03296

## 
## Call:
## lm(formula = TOT_VentralDC/EstimatedTotalIntraCranialVol ~ T1_Rest_HR * 
##     BMI, data = SAMPLE)
## 
## Residuals:
##        Min         1Q     Median         3Q        Max 
## -0.0007493 -0.0002381  0.0000369  0.0002473  0.0005135 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)   
## (Intercept)    -6.035e-03  3.331e-03  -1.812  0.08312 . 
## T1_Rest_HR      1.307e-04  4.522e-05   2.890  0.00826 **
## BMI             3.608e-04  1.043e-04   3.458  0.00214 **
## T1_Rest_HR:BMI -4.382e-06  1.401e-06  -3.128  0.00472 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.000354 on 23 degrees of freedom
## Multiple R-squared:  0.4819, Adjusted R-squared:  0.4143 
## F-statistic:  7.13 on 3 and 23 DF,  p-value: 0.001485

NEW: Chunk to loop through cluster FC results..

Step 12: Import significant ZStats and PSC reports from /ROIoutput dir.
STATS OUTPUT: # subject number, ROI, image, mean, median, max, coordinates (xyz) in standard space.

DATA=EPICC_FULL
ROI_DIR<-c("/Users/alyshagilmore/Downloads/restingstate_crew/ROIoutput/")

PSC<-list.files("/Users/alyshagilmore/Downloads/restingstate_crew/ROIoutput/", pattern = glob2rx("seed_pos*-psc.txt"))

Z<-list.files("/Users/alyshagilmore/Downloads/restingstate_crew/ROIoutput/", pattern = glob2rx("seed_pos*-zscr.txt"))

for (i in PSC) {
  setwd(ROI_DIR)
featquery<-read.delim(i , sep = " ", header = FALSE)
colnames(featquery)<-c("ID", "ROI_CLUSTER_NAME", "Type",  "L_R" , "POS_NEG","meanZ", "medianZ", "maxZ", "X", "Y", "Z")
featquery[,1:11]<-lapply(featquery[,1:11], as.character)
ZSTAT<-featquery$ROI_CLUSTER_NAME
ZSTAT<-strsplit(ZSTAT, "zstat")
mat  <- matrix(unlist(ZSTAT), ncol=2, byrow=TRUE)

ZSTAT<-mat[,2] 
POS_NEG<-if_else(ZSTAT=="2", "NEG", "POS")

featquery$L_R<-if_else(featquery$L_R=="left", "L", featquery$L_R)
featquery$L_R<-if_else(featquery$L_R=="right", "R", featquery$L_R)
featquery$L_R<-if_else(featquery$L_R=="0", "BILAT", featquery$L_R)
j<-i
ROI_CLUSTER_NAME<-strsplit(j, split = "-")
mat  <- matrix(unlist(ROI_CLUSTER_NAME), ncol=3, byrow=TRUE)
df   <- as.data.frame(mat)
ROI_CLUSTER_NAME<-df$V2

COL_NAME<-paste(featquery$L_R,ROI_CLUSTER_NAME, POS_NEG, "PBC", sep="_")
featquery<-featquery %>%
select(ID, meanZ )
names(featquery)<-c("ID", paste(COL_NAME[1]))
DATA <-left_join(featquery, DATA, by="ID" )
}

for (i in Z) {
  setwd(ROI_DIR)
featquery<-read.delim(i , sep = " ", header = FALSE)
colnames(featquery)<-c("ID", "ROI_CLUSTER_NAME", "Type",  "L_R" , "POS_NEG","meanZ", "medianZ", "maxZ", "X", "Y", "Z")
featquery[,1:11]<-lapply(featquery[,1:11], as.character)
ZSTAT<-featquery$ROI_CLUSTER_NAME
ZSTAT<-strsplit(ZSTAT, "zstat")
mat  <- matrix(unlist(ZSTAT), ncol=2, byrow=TRUE)

ZSTAT<-mat[,2] 
POS_NEG<-if_else(ZSTAT=="2", "NEG", "POS")

featquery$L_R<-if_else(featquery$L_R=="left", "L", featquery$L_R)
featquery$L_R<-if_else(featquery$L_R=="right", "R", featquery$L_R)
featquery$L_R<-if_else(featquery$L_R=="0", "BILAT", featquery$L_R)
j<-i
ROI_CLUSTER_NAME<-strsplit(j, split = "-")
mat  <- matrix(unlist(ROI_CLUSTER_NAME), ncol=3, byrow=TRUE)
df   <- as.data.frame(mat)
ROI_CLUSTER_NAME<-df$V2

COL_NAME<-paste(featquery$L_R,ROI_CLUSTER_NAME, POS_NEG, "Zstat", sep="_")
featquery<-featquery %>%
select(ID, meanZ )
names(featquery)<-c("ID", paste(COL_NAME[1]))
DATA <-left_join(featquery, DATA, by="ID" )
}

EXTRA Commands/Walk-Throughs: Pulled for organization:

Import SPSS & Data Dictionary Standard Import-FULL of SPSS database. * This is quick script will read any SPSS database and automatically create a labeled data dictionary. + install.packages("dataMeta")

Tutorial Here: https://cran.r-project.org/web/packages/dataMeta/vignettes/dataMeta_Vignette.html

EPIC_data<-read.spss("merged EPICC data with 1024 &1069 & 674 & 7701 & 745.sav")

variable.labels<-attr(EPIC_data, "variable.labels")

linker<-build_linker(EPIC_data, variable.labels, 1)

dict <- build_dict(EPIC_data,linker,option_description = NULL, prompt_varopts = FALSE)

data_desc = "AVAILABLE Baseline Variables -06/26/20 AG Orginal File: MASTER SPSS merged EPICC data with 1024 &1069 & 674 & 7701 & 745.sav" 

EPICC <- incorporate_attr(EPIC_data, dict, data_desc)

# Exporting dictionary only:
dict_only <- attributes(EPICC)$dictionary
write_csv(dict_only, "dict_only.csv")
# Saving as .rds (dataset with appended dictionary)
save_it(EPICC,"MASTER_BaselineEPICC_88")

Multiplot Grids: Visualize Sample Normality (echo=FALSE, but can be turned on.) * Link: http://www.cookbook-r.com/Graphs/Multiple_graphs_on_one_page_(ggplot2)/

Step 11: Mean center any continuous variables:
Step 11a: First, we ensure all continuous variables for mean centering are in their numeric form. Step11b: Next, we use the scale function to mean center those variables.
Great links on mean centering here:

Why we mean center in neuroimaging analyses: http://mumford.fmripower.org/mean_centering/
How to mean center variables in R: https://www.gastonsanchez.com/visually-enforced/how-to/2014/01/15/Center-data-in-R/

COMPLETE_EPICC<-FITT_DEMOS_Outcomes_BRAIN

# Assign continuous variables classes
COMPLETE_EPICC$AGE<-as.numeric(COMPLETE_EPICC$AGE)
COMPLETE_EPICC$BMI_kg<-as.numeric(COMPLETE_EPICC$BMI_kg)
COMPLETE_EPICC$EDU<-as.numeric(COMPLETE_EPICC$EDU)
COMPLETE_EPICC$CRFabs<-as.numeric(COMPLETE_EPICC$CRFabs)
COMPLETE_EPICC$CRFrel<-as.numeric(COMPLETE_EPICC$CRFrel)
COMPLETE_EPICC$weight_mc<-as.numeric(COMPLETE_EPICC$weight_kg)

COMPLETE_EPICC$Age_mc<-scale(COMPLETE_EPICC$AGE, scale = FALSE)
COMPLETE_EPICC$BMI_mc<-scale(COMPLETE_EPICC$BMI_kg, scale = FALSE)
COMPLETE_EPICC$EDU_mc<-scale(COMPLETE_EPICC$EDU, scale = FALSE)
COMPLETE_EPICC$CRFabs_mc<-scale(COMPLETE_EPICC$CRFabs, scale = FALSE)
COMPLETE_EPICC$CRFrel_mc<-scale(COMPLETE_EPICC$CRFrel, scale = FALSE)
COMPLETE_EPICC$weight_mc<-scale(COMPLETE_EPICC$weight_kg, center = TRUE)

Import feat results: (old version strait from featquery output)

EPICC Neuroimaging Analyses I:

rsFC crew

7/05/2020

From your local terminal...

From the R console...

Data Frame Summary

FULL_SUMMARY

Run the app in a pop-up viewer

NEW: Chunk to loop through cluster FC results..

EXTRA Commands/Walk-Throughs: Pulled for organization: