Create Constricted Samples I:

Preface: You have to log into the Pitt network via Pulse Secure first, and then either use the terminal to access Zeus or a remote desktop program.

If you have not installed the these packages, do so now by copy/pasting the following commands to your R Console:
- install.packages("dplyr")
- install.packages("readxl")
- install.packages("tidyverse")
- install.packages("stringr")
- install.packages("foreign")
- install.packages("dataMeta")

library(MissMech)
library(dplyr)
library(DT)
library(foreign)
library(dataMeta)
library(readxl)
library(emmeans)
library(summarytools)
library(tidyverse)
library(stringr)
library(foreign)
library(corrplot)
library(kableExtra)
library(ggplot2)
library(cowplot)
knitr::opts_chunk$set(echo = TRUE)

From the Terminal...

Step 1: Using the terminal tab in RStudio (not Console tab), ssh into Zeuz by typing the following command (plus enter password):

Step 2: Still in the terminal, move to the the directory where your subject-level data is stored:

cd /Volumes/Disk1/EPICC/SubjectScans

Step 3: Generate a subject ID list using the following command in the terminal. Copy and paste the output into an excel sheet. (I haven't quite figured out how to [simply] assign variables from the terminal into R studio). Name the column SubID. Save the spreadsheet as 'Baseline_Subs_Brain.csv'. We'll continue to add columns to this below.

ls -d -1 7*_1

Step 4: Do the same thing we just did (using the terminal), but this time, list out the subject's left hippocampus .feat directories. Copy and paste the output into a new column in your excel sheet, labeled 'L_HIPP_path'

ls -d -1 /Volumes/Disk1/EPICC/SubjectScans/*/rest/lhip*.feat

Step 5a: Do the same thing we just did, but this time, list out the subject's RIGHT hippocampus .feat directories (or any other seeds that you wish): Copy and paste the output into a new column in your excel sheet, labeled 'R_HIPP_path'

ls -d -1 /Volumes/Disk1/EPICC/SubjectScans/*/rest/rhip*.feat

Step 5b: Continue to do the same thing for each seed path you would like to add to the subject list. *For each seed path, copy/paste the output into a new column in your excel sheet, labeled '_path'*

ls -d -1 /Volumes/Disk1/EPICC/SubjectScans/*/rest/tothippocampus_filtered*
ls -d -1 /Volumes/Disk1/EPICC/SubjectScans/*/rest/tothippocampus_*
ls -d -1 /Volumes/Disk1/EPICC/SubjectScans/*/rest/tothippocampus_with*

Step 6: Save as .csv and close 'Baseline_Subs_Brain.csv'

From the R Console...

Step 7: Concatinate ROI paths-
Step 7a: Import the .csv you just made.

BRAIN_SUBS<-read.csv("Baseline_Subs_Brain.csv") #Import the .csv you just made. 
BRAIN_SUBS$SubID<-as.character(BRAIN_SUBS$SubID) #read SubID as a character variable, not numeric.

Step 7b: First, we need to arrange our column rows so that each row includes the path which matches the SubID column.
* For example, Particpant #7063_1 does not have data processed for the right hippocampus. We can see this because the column for 'R_HIPP_path' does not allign propperly with the rest of the data.
+ To see this for yourself, use the command view(BRAIN_SUBS), and scroll down to SubID 7063_1. + You can see that because we do not have a path for that directory, the cell for that participant's 'R_HIPP_path', is actually the next particpant's (7066_1) 'R_HIPP_path'

To correct this, we'll use the following sorting command: lapply(BRAIN_SUBS[-1], function(x) x[match(df$MASTER, x)])
- The lapply() command, is a way to simply apply a set of commands, function(x), to a list of vectors.
- The first input in lapply() is the list of vectors you want to apply the command(s) to.
- In our case, that is all columns in our dataframe == BRAIN_SUBS[-1]
- The second input: function(x) x[match(df$MASTER, x)], allows you to set function(x) (to be applied to each vector).
- In this case, we'll the following combination of nested commands: x[match(BRAIN_SUBS$SubID, str_sub(x, 1, 6))], which allows us to replace the 6 leading character strings in the seed path ('7xxx_1') with that of our SubID coulmn, for that specific row.

BRAIN_SUBS[-1]  <-lapply(BRAIN_SUBS[-1], function(x) x[match(BRAIN_SUBS$SubID, str_sub(x, 1, 6))])
# Use the following command to check the output:
# view(BRAIN_SUBS)

Step 7c: Next, we use 'str_split()', to modify our SubID column, -so we can merge our dataframe (BRAIN_SUBS) with other databases (for instance, excel spreadsheets exported from RedCap) which do not use the '_1' nomenclature to separate baseline and follow-up data. I other words, we want to change..

7001_1 --> 7001
7002_1 --> 7002 ... (etc.)

# Here we use a few commands to split the subject ids that we have (e.g., 7001_1) into subject ids that can be merged with other databases (i.e., 7001)
x<-BRAIN_SUBS$SubID 
x<-as.character(x)
tmp<-strsplit(x, "_")   #strsplit: splits a charachter string in a fixed spot. in this case at '_'
mat  <- matrix(unlist(tmp), ncol=2, byrow=TRUE) #this breaks the char string into 2 sep columns.
df<-as.data.frame(mat) #make a data frame
df$ID<-as.character(df$V1) #make var ID
df$Session<-as.character(df$V2) #make var session
df<-df %>% select(ID, Session) #extract ID & session only (e.i., ignore identical colummns V1 & V2)
BRAIN_SUBS<-cbind(df, BRAIN_SUBS) #bind/merge with orginal dataframe 
# Run the following command to look at our resulting dataframe (BRAIN_SUBS). Now, we have a column 'ID' , which can be merged with other datasets. 
# view(BRAIN_SUBS)

Step 7d: For the sake of organization, use the following commands to clean up your 'global enviroment' in R studio. This will remove all variables we've set thus far (seen in the upper right corner of R Studio), EXCEPT for the dataframe we've been working with (BRAIN_SUBS).

Step 7e: Import MeanFD text file and set-up for merging:

MeanFD<-read.delim2("meanFD.txt", header = F) #Not sure why, but R imports loads of duplicates in txt file. 
MeanFD<-MeanFD[!duplicated(MeanFD),] # To remove duplicates
MeanFD<-data.frame(MeanFD) #as dataframe with 34 rows and no duplicates.

#view(MeanFD)
#Step 12b:** Next, we use a few commands to split the subject ids that we have (e.g., 7001_1) into subject ids that can be merged with other databases (i.e., 7001). Same as Step 7c

x<-MeanFD$MeanFD
x<-as.character(x)
tmp<-strsplit(x, " ")   #strsplit: splits a charachter string in a fixed spot. in this case at' '[space]
mat  <- matrix(unlist(tmp), ncol=2, byrow=TRUE) #this breaks the char string into 2 sep columns.
df<-as.data.frame(mat) #make a data frame
df$ID<-as.character(df$V1) #make var ID
df$meanFD<-as.character(df$V2) #make var session
MeanFD<-df %>% select(ID, meanFD) #extract ID & session only (e.i., ignore identical colummns V1 & V2)
rm(x, mat,tmp, df)

# Now again, but for '_1'
x<-MeanFD$ID 
x<-as.character(x)
tmp<-strsplit(x, "_")   #strsplit: splits a charachter string in a fixed spot. in this case at'_1'
mat  <- matrix(unlist(tmp), ncol=2, byrow=TRUE) #this breaks the char string into 2 sep columns.
df<-as.data.frame(mat) #make a data frame
df$ID<-as.character(df$V1) #make var ID
df<-df %>% select(ID) #extract ID & session only (e.i., ignore identical colummns V1 & V2)
rm(mat,tmp)

MeanFD.df<-cbind(df, MeanFD) #bind/merge with orginal dataframe 
MeanFD.df<-MeanFD.df[,-c(2)] # Remove duplicate col

# Step 12c:** Mean center MeanFD values, just in case we'll need them later.  Same as Step 11b.
  MeanFD.df$meanFD<-as.numeric(MeanFD.df$meanFD)
  MeanFD.df$meanFD_mc<-scale(MeanFD.df$meanFD, scale = FALSE)

#Step 13: setup database
  # Assign ID variables as character class
  BRAIN_SUBS$ID<-as.character(BRAIN_SUBS$ID)
  MeanFD.df$ID<-as.character(MeanFD.df$ID)

  # join with current sublist
  BRAIN_SUBS<-left_join(BRAIN_SUBS, MeanFD.df)
  BRAIN_SUBS$meanFD<-as.numeric(BRAIN_SUBS$meanFD)
  rm(list=setdiff(ls(), "BRAIN_SUBS"))

Step 8: Import volumetric data (MASTER SPSS Database doesn't include ICV).
* n88 included in mastersheet
* n27 included in our FREESURF output (7001:7063)

#Our FREESURFER OUTPUTS n27: #ASEG[,c(1,13)] #7001_1 == 3878.0
ASEG<-read.delim("/Users/alyshagilmore/Downloads/restingstate_crew/FreeSurfer_Results_sub7001_to_7063/aseg_stats.txt")
x<-ASEG$Measure.volume
x<-as.character(x)
tmp<-strsplit(x, "_")   #strsplit: splits a charachter string in a fixed spot. in this case at' '[space]
mat  <- matrix(unlist(tmp), ncol=2, byrow=TRUE) #this breaks the char string into 2 sep columns.
df<-as.data.frame(mat) #make a data frame
df$ID<-as.character(df$V1) #make var ID
ASEG$ID<-df$ID
Brain_vols<-ASEG %>% 
  select(ID, EstimatedTotalIntraCranialVol, CSF, MaskVol,
         BrainSegVol,BrainSegVolNotVentSurf,BrainSegVolNotVent, 
         SubCortGrayVol,
         TotalGrayVol ,CorticalWhiteMatterVol, Brain.Stem, 
         Right.Hippocampus,Left.Hippocampus,
         Left.Amygdala ,Right.Amygdala,
         Right.Putamen,Left.Putamen,
         Right.Caudate, Left.Caudate, 
         Right.Accumbens.area, Left.Accumbens.area,
         Left.VentralDC, Right.VentralDC,
         Left.Pallidum,Right.Pallidum,
         Left.Thalamus.Proper,Right.Thalamus.Proper,
         Left.Cerebellum.Cortex, Right.Cerebellum.Cortex,
         Left.Cerebellum.White.Matter,
         Right.vessel,Left.vessel,
         Left.choroid.plexus,Right.choroid.plexus)

## Merge it all together and write out (check-point)
BRAIN<-left_join(BRAIN_SUBS,Brain_vols , by="ID")
BRAIN$ID<-as.character(BRAIN$ID)
write.csv(BRAIN, "EPICC_rsFCcrew_BrainData.csv")
rm(list=setdiff(ls(), "BRAIN"))

Step 9: Import FULL Sample VO2 Database. n=102

VO2.df<-readxl::read_xlsx("EPICC Data.xlsx", sheet = "VO2 Data") # Import fitness database. 
VO2.df$Session<-as.character(VO2.df$`Pre / Post`)
VO2.df<-VO2.df[(VO2.df$Session=="PRE"),]
VO2.df<-VO2.df[,-c(3:4)]
VO2.df<-VO2.df[complete.cases(VO2.df$`Pre / Post`),]
VO2.df$ID<-as.character(VO2.df$`LAB ID`)

# Define variables as numeric values (because R sometimes likes to make them character strings).
VO2.df$AGE<-as.numeric(VO2.df$AGE)
VO2.df$VO2_Time<-as.numeric(VO2.df$`TEST TIME`)
VO2.df$VO2max_last2pts<-as.numeric(VO2.df$`PEAK VO2 LAST TWO PTS.`)
VO2.df$FE_O2<-as.numeric(VO2.df$`FEO2%`)
VO2.df$BMI<-as.numeric(VO2.df$BMI)
VO2.df$FE_CO2<-as.numeric(VO2.df$`FECO2%`)
VO2.df$Age_max_HR_GOAL<-as.numeric(VO2.df$`85% MAX HR`)
VO2.df$VO2MAX_EST<-as.numeric(VO2.df$`ESTIMATED VO2 MAX (Extrapolated)`)
VO2.df$Age_Rank_VO2max<-as.numeric(VO2.df$`AGE PERCENTILE OF VO2 MAX`)
VO2.df$VO2_END_REASON_NOTES<-as.character(VO2.df$`TEST END REASONS`)
VO2.df$CRFrel<-as.numeric(VO2.df$`PEAK VO2/KG`)
VO2.df$CRFabs<-as.numeric(VO2.df$`PEAK VO2`)
VO2.df$BMI<-as.numeric(VO2.df$BMI)
VO2.df$AGE<-as.numeric(VO2.df$AGE)
VO2.df$HR_DIFFERENCE<-as.numeric(VO2.df$`HR DIFFERENCE`)
VO2.df$PEAK_METS<-as.numeric(VO2.df$`PEAK METS`)
VO2.df$PEAK_HR<-as.numeric(VO2.df$`PEAK HR`)
VO2.df$VE_VCO2_SLOPE<-as.numeric(VO2.df$`VE / VCO2 SLOPE`)
VO2.df$PEAK_RER<-as.numeric(VO2.df$`PEAK RER`)
VO2.df$PEAK_RR<-as.numeric(VO2.df$`PEAK RR`)
VO2.df$weight_kg<-as.numeric(VO2.df$`WEIGHT (KG)`)
VO2.df$PEAK_CO2<-as.numeric(VO2.df$`PEAK CO2`)
VO2.df$PEAK_VE<-as.numeric(VO2.df$`PEAK VE`)
VO2.df$PEAK_VT<-as.numeric(VO2.df$`PEAK VT`)
VO2.df$FECO2<-as.numeric(VO2.df$`FECO2%`)
VO2.df$PEAK_RPE<-VO2.df$`PEAK RPE`
VO2.df$weight_lbs<-VO2.df$weight_kg*(2.205)
VO2.df$height_cm<-VO2.df$weight_kg/VO2.df$BMI
VO2.df$BMI_kg<-VO2.df$BMI
VO2.df$ID<-as.character(VO2.df$`LAB ID`)
VO2.df$VO2_END_factor<-as.numeric(VO2.df$`TEST END REASON?`)
VO2.df$VO2_END_REASON<-as.factor(VO2.df$`TEST END REASON?`)
VO2.df$AP_VO2MAX<-as.numeric(VO2.df$`AGE PERCENTILE OF VO2 MAX`)

VO2.df$VO2_END_REASON <- factor(VO2.df$VO2_END_REASON,
levels = c(1 ,2 ,3 , 4 ,5 , 6 ),
labels = c("Achieved HR Goal","Subject Quit", "Safety Concern", "Equipment Malfunction",
           "Beta-Blocked- Reached RPE Goal",
           "Other"))

Step 10: Define BMI groups (optional)

#Define Groups (Optional)
VO2.df$BMI<-as.numeric(VO2.df$BMI)
VO2.df$BMI_group<-as.character(VO2.df$BMI)
VO2.df$BMI_group<-if_else(VO2.df$BMI < 25, "normal", VO2.df$BMI_group)
VO2.df$BMI_group<-if_else((VO2.df$BMI >=25 & VO2.df$BMI <=30), "overweight",  VO2.df$BMI_group)
VO2.df$BMI_group<-if_else((VO2.df$BMI > 30 & VO2.df$BMI < 40), "obese", VO2.df$BMI_group)
VO2.df$BMI_group<-if_else(VO2.df$BMI > 40, "severe obese", VO2.df$BMI_group)
VO2.df$BMI_group <-as.factor(VO2.df$BMI_group)

VO2.df$BMI_group <- ordered(VO2.df$BMI_group, levels = c("normal", "overweight", "obese", "severe obese"))
table(VO2.df$BMI_group)

## 
##       normal   overweight        obese severe obese 
##           15           38           36           13

Step 11: Make Beta-Blocker a factor, remove invalid VO2 subs, & make a new variable to include "valid" test end reasons.

VO2.df$BetaBlocker<-(grepl(VO2.df$VO2_END_REASON_NOTES, pattern = "beta", ignore.case = TRUE))
VO2.df$BetaBlocker<-if_else(VO2.df$BetaBlocker==TRUE, 1,0)

Step 11a: Let's see who was on a beta-blocker, but still achieved HR goal,

TEXT1<-c("Beta-Blocked- Achieved HR goal")
VO2_END_REASON_char<-as.character(VO2.df$VO2_END_REASON)

VO2.df$VO2_END_REASON<-if_else((VO2.df$BetaBlocker==1 & VO2.df$VO2_END_REASON=="Achieved HR Goal"), TEXT1, VO2_END_REASON_char)

VO2.df$VO2_END_REASON<-as.factor(VO2.df$VO2_END_REASON)
VO2.df$BetaBlocker<-as.factor(VO2.df$BetaBlocker)

table1<-VO2.df %>%
  select(`LAB ID`, BetaBlocker,VO2_END_REASON, `TEST END REASONS` )
  table1<-as_data_frame(table1)
  table<-arrange(table1,by=VO2_END_REASON)
table1<-datatable(table,options = list( pageLength = 5))

summary(table[,2:3])

##  BetaBlocker                        VO2_END_REASON
##  0:95        Achieved HR Goal              :90    
##  1: 7        Beta-Blocked- Achieved HR goal: 2    
##              Beta-Blocked- Reached RPE Goal: 5    
##              Other                         : 3    
##              Subject Quit                  : 2

Step 11b: Remove participants from the sample that ended their fittness test for any of the following reasons.
* Subject Quit
* Safety Concern
* Equipment Malfunction
* Other

This means we'll retain particpants who...  
    1. Achieved HR Goal  
    2. Beta-blocked & reached their goal RPE (15)    
    3. Beta-blocked, yet still achieved their HR goal.

CLEAN.VO2<-VO2.df
CLEAN.VO2<-CLEAN.VO2[!(CLEAN.VO2$VO2_END_REASON=="Other"),]  #n==99
CLEAN.VO2<-CLEAN.VO2[!(CLEAN.VO2$VO2_END_REASON=="Subject Quit"),]  #n==97
#CLEAN.VO2<-CLEAN.VO2[!(CLEAN.VO2$VO2_END_REASON=="Equipment Malfunction"),]  #n==97
#CLEAN.VO2<-CLEAN.VO2[!(CLEAN.VO2$VO2_END_REASON=="Safety Concern"),]  #n==97

write.csv(VO2.df, "EPICC_BS_VO2_102.csv")
write.csv(VO2.df, "clean_EPICC_BS_VO2_97.csv")

model<-(lm(CRFrel ~ AGE+ BMI_group+BetaBlocker, data=CLEAN.VO2))
#summary(lm(CRFrel ~AGE+ VO2_END_REASON+BMI, data=CLEAN.VO2))
pairs(emmeans::emmeans(model, "BMI_group"))

##  contrast                  estimate    SE df t.ratio p.value
##  normal - overweight          0.996 0.913 91 1.091   0.6959 
##  normal - obese               2.642 0.924 91 2.861   0.0265 
##  normal - severe obese        5.914 1.122 91 5.271   <.0001 
##  overweight - obese           1.646 0.659 91 2.498   0.0669 
##  overweight - severe obese    4.917 0.933 91 5.273   <.0001 
##  obese - severe obese         3.272 0.942 91 3.474   0.0043 
## 
## Results are averaged over the levels of: BetaBlocker 
## P value adjustment: tukey method for comparing a family of 4 estimates

pairs(emmeans::emmeans(model, "BetaBlocker"))

##  contrast estimate   SE df t.ratio p.value
##  0 - 1       0.682 1.11 91 0.615   0.5404 
## 
## Results are averaged over the levels of: BMI_group

#shapiro.test(VO2.df$CRFabs)
#shapiro.test(VO2.df$PEAK_METS) 
#shapiro.test(VO2.df$PEAK_RPE)
#shapiro.test(VO2.df$height_cm) 
#shapiro.test(VO2.df$PEAK_RER)
#shapiro.test(VO2.df$VO2_Time)

#shapiro.test(VO2.df$VO2MAX_EST) #NOT NORM DIST
#shapiro.test(VO2.df$HR_DIFFERENCE) #NOT NORM DIST
#shapiro.test(VO2.df$BMI) #NOT NORM DIST
#shapiro.test(VO2.df$AGE) #NOT NORM DIST
#shapiro.test(VO2.df$AP_VO2MAX)#NOT NORM DIST

Step 9 IMPORT SPSS DATABASE: Import SPSS database & use 'Codebook' package's Static Label Browser to poke around. Set 'eval=FALSE' to save knit time.

The links:
* Documentation: https://rubenarslan.github.io/codebook
* Example: https://rubenarslan.github.io/codebook/articles/codebook.html

The command: codebook::label_browser_static(), you should see a searchable data dictionary in your R viewer pane.

codebook_data <- rio::import("merged EPICC data with 1024 &1069 & 674 & 7701 & 745.sav")
codebook_data<-codebook_data %>% select(starts_with("POF"))
codebook::label_browser_static(codebook_data)

Below I'll select the following:
1. Demographics: Age, EDU, Rage, Smoking Status, Handedness.
2. I also included a few composite scores from the cognitive variables tab.
3. I included potential a few additional potential covariates.

Step X: BDH010A:HX represents binary logic (1=yes, 2=no) for catigorical racial status (White, Black, Asian, Other). It is likely that these questions were presented as multiple-entry check-boxes, and thus, they're imported as separate variables. Below, we use if_else() commands to label White and Black within their respective columns, and then again, to concatenate into one 'Race' column.

Step X: Let's also adjust our handedness column so that the values are mathmatically executeable. FSL will not recognize left vs right, but it will understand binary logic [1,0].

Step X: Import any other included datasets below. I find the dplyr to very helpful in finding groups of variables that I'm interested. See the following link, and list of cammands below. After all data is separated into respective dataframes, we can use the codebook package add-on to view any/all summary statistics.

Merged Full Sample Report (sources listed below):

BACH VO2/Fitness (n=102)
T1 MASTER Demographics (n=88)
T1 MASTER Fitness/PA data (n=88)
T1 MASTER Emotional/Affective data (n=88)
T1 MASTER Cognitive data (n=88)
T1 MASTER Pain and Fatague (n=88)
T1 MASTER Sleep Regulation (n=88)
T1 MASTER Self Reported Symptom Data (n=88)
gFEAT Path Data
Brain Volume Data (n=27)

VO2_DEMOS<-left_join(VO2.df, DEMOS)
VO2_DEMOS_T1<-left_join(VO2_DEMOS, T1, by="ID")
VO2_DEMOS_T1_EMO<-left_join(VO2_DEMOS_T1, EMO_data, by="ID")
VO2_DEMOS_T1_EMO_COG<-left_join(VO2_DEMOS_T1_EMO, COG_data, by="ID")
VO2_DEMOS_T1_EMO_COG_PAIN<-left_join(VO2_DEMOS_T1_EMO_COG, BPISF_FAT_data, by="ID")
VO2_DEMOS_T1_EMO_COG_PAIN_SLEEP<-left_join(VO2_DEMOS_T1_EMO_COG_PAIN, Sleep_data, by="ID")
VO2_DEMOS_T1_EMO_COG_PAIN_SLEEP_TRI<-left_join(VO2_DEMOS_T1_EMO_COG_PAIN_SLEEP, TRI_data, by="ID")
VO2_DEMOS_T1_EMO_COG_PAIN_SLEEP_BRAIN<-left_join(VO2_DEMOS_T1_EMO_COG_PAIN_SLEEP_TRI, BRAIN, by="ID")
EPICC_FULL<-VO2_DEMOS_T1_EMO_COG_PAIN_SLEEP_BRAIN

Full Sample: Summary Demographics Report:

FULL_SUMMARY<-EPICC_FULL %>%
  select(ID, Age, BMI, BMI_group, Race, Smoker, BetaBlocker,
         WeeklyMVPA_min, emo_avg, CRFrel, CRFabs, bdiav,
         T1_Rest_Syst_BP, T1_Rest_Diast_BP,T1_Rest_HR,
         Left.Hippocampus, meanFD)
FULL_SUMMARY<-as.tibble(FULL_SUMMARY)
print(dfSummary(FULL_SUMMARY[,2:17], plain.ascii = TRUE, display.labels = FALSE,
                justify="c", graph.magnif = .60, valid.col = TRUE, 
                labels.col = FALSE, style = "grid", 
                varnumbers = FALSE, tmp.img.dir = "/tmp"), 
      method = 'render' )

Data Frame Summary

FULL_SUMMARY

Dimensions: 102 x 16
Duplicates: 0

Variable

Stats / Values

Freqs (% of Valid)

Graph

Valid

Missing

Age [numeric]

Mean (sd) : 63.8 (5.3) min < med < max: 51 < 64 < 77 IQR (CV) : 6.5 (0.1)

23 distinct values

72 (70.59%)

30 (29.41%)

BMI [numeric]

Mean (sd) : 31.1 (6.5) min < med < max: 20.1 < 29.8 < 51.6 IQR (CV) : 8.8 (0.2)

101 distinct values

102 (100%)

0 (0%)

BMI_group [ordered, factor]

1. normal 2. overweight 3. obese 4. severe obese

15	(	14.7%	)
38	(	37.2%	)
36	(	35.3%	)
13	(	12.8%	)

102 (100%)

0 (0%)

Race [factor]

1. black 2. white

6	(	8.3%	)
66	(	91.7%	)

72 (70.59%)

30 (29.41%)

Smoker [factor]

1. Yes 2. No

1	(	1.1%	)
87	(	98.9%	)

88 (86.27%)

14 (13.73%)

BetaBlocker [factor]

1. 0 2. 1

95	(	93.1%	)
7	(	6.9%	)

102 (100%)

0 (0%)

WeeklyMVPA_min [numeric]

Mean (sd) : 284.9 (230) min < med < max: 25.8 < 195 < 915 IQR (CV) : 301.5 (0.8)

67 distinct values

71 (69.61%)

31 (30.39%)

emo_avg [numeric]

Mean (sd) : 1.5 (0.7) min < med < max: 1 < 1.2 < 3.5 IQR (CV) : 0.9 (0.4)

18 distinct values

67 (65.69%)

35 (34.31%)

CRFrel [numeric]

Mean (sd) : 17.1 (3.8) min < med < max: 7.3 < 16.7 < 27.8 IQR (CV) : 4.3 (0.2)

102 distinct values

102 (100%)

0 (0%)

CRFabs [numeric]

Mean (sd) : 1.4 (0.3) min < med < max: 0.5 < 1.4 < 2.1 IQR (CV) : 0.4 (0.2)

102 distinct values

102 (100%)

0 (0%)

bdiav [numeric]

Mean (sd) : 0.3 (0.2) min < med < max: 0 < 0.2 < 1.3 IQR (CV) : 0.2 (0.9)

18 distinct values

72 (70.59%)

30 (29.41%)

T1_Rest_Syst_BP [numeric]

Mean (sd) : 141.7 (16.4) min < med < max: 109 < 140.5 < 190 IQR (CV) : 22.2 (0.1)

41 distinct values

74 (72.55%)

28 (27.45%)

T1_Rest_Diast_BP [numeric]

Mean (sd) : 83.6 (9.1) min < med < max: 57 < 85 < 99 IQR (CV) : 10 (0.1)

32 distinct values

74 (72.55%)

28 (27.45%)

T1_Rest_HR [numeric]

Mean (sd) : 72.6 (10.5) min < med < max: 52 < 73 < 97 IQR (CV) : 13.8 (0.1)

35 distinct values

74 (72.55%)

28 (27.45%)

Left.Hippocampus [numeric]

Mean (sd) : 3843.4 (481.2) min < med < max: 2581.7 < 3878.2 < 4958.4 IQR (CV) : 401.7 (0.1)

27 distinct values

27 (26.47%)

75 (73.53%)

meanFD [numeric]

Mean (sd) : 0.1 (0) min < med < max: 0 < 0.1 < 0.2 IQR (CV) : 0.1 (0.4)

34 distinct values

34 (33.33%)

68 (66.67%)

Generated by summarytools 0.9.6 (R version 3.6.1)
2020-07-04

Full Sample: Test for Normality
* Link: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3124223/
* Hawkins test of normality and homoscedasticity:

# Continuous Variables:
#shapiro.test(CLEAN_EPICC$AGE)   # PASSES Shapiro-Wilk normality test 
#shapiro.test(EPICC_FULL$AGE)   # Fails Shapiro-Wilk normality test 

#shapiro.test(EPICC_FULL$CRFrel)
#shapiro.test(EPICC_FULL$CRFabs)   
#shapiro.test(EPICC_FULL$weight_kg) # Fails Shapiro-Wilk normality test 
#shapiro.test(EPICC_FULL$BMI_kg) # Fails Shapiro-Wilk normality test
#shapiro.test(EPICC_FULL$VO2MAX_EST) # Fails Shapiro-Wilk normality test
#shapiro.test(EPICC_FULL$VO2MAX_EST) # Fails Shapiro-Wilk normality test

CHECK_NORM<-EPICC_FULL %>%
  select(Age_max_HR_GOAL,BMI,CRFrel,BetaBlocker, Smoker, EDU, Race, Handedness) 
CHECK_NORM[,(1:8)]<-lapply(CHECK_NORM[,(1:8)], as.numeric)
TestMCARNormality(CHECK_NORM)

## Call:
## TestMCARNormality(data = CHECK_NORM)
## 
## Number of Patterns:  3 
## 
## Total number of cases used in the analysis:  102 
## 
##  Pattern(s) used:
##           Age_max_HR_GOAL   BMI   CRFrel   BetaBlocker   Smoker   EDU   Race
## group.1                 1     1        1             1        1     1      1
## group.2                 1     1        1             1        1    NA     NA
## group.3                 1     1        1             1       NA    NA     NA
##           Handedness   Number of cases
## group.1            1                72
## group.2           NA                16
## group.3           NA                14
## 
## 
##     Test of normality and Homoscedasticity:
##   -------------------------------------------
## 
## Hawkins Test:
## 
##     P-value for the Hawkins test of normality and homoscedasticity:  1.109013e-26 
## 
##     Either the test of multivariate normality or homoscedasticity (or both) is rejected.
##     Provided that normality can be assumed, the hypothesis of MCAR is 
##     rejected at 0.05 significance level. 
## 
## Non-Parametric Test:
## 
##     P-value for the non-parametric test of homoscedasticity:  0.3686507 
## 
##     Reject Normality at 0.05 significance level.
##     There is not sufficient evidence to reject MCAR at 0.05 significance level.

Full Sample Complete Cases (n=67): Corrplot

EPIC_COR<-EPICC_FULL %>%
 select( ID,Age, Race,BetaBlocker ,Smoker, emo_avg, EDU, T1_Weight_Kg, T1_BMI,T1_Rest_HR,
         WeeklyMVPA_min,T1_Peak_VO2_ml_kg_min,bdiav,T1_Peak_VO2_Max_Value,
         T1_Peak_VO2_ml_kg_min,T1_Rest_Syst_BP,T1_Rest_Diast_BP)
EPIC_COR<-EPIC_COR[complete.cases(EPIC_COR),]
EPIC_COR[,2:15]<-lapply(EPIC_COR[,2:15],as.numeric)
M <- cor(EPIC_COR[,2:15])
p.mat <- cor.mtest(EPIC_COR[,2:13])$p
res1 <- cor.mtest(EPIC_COR[,2:13], conf.level = .95)

col <- colorRampPalette(c("#BB4444", "#EE9988", "#FFFFFF", "#77AADD", "#4477AA"))
corrplot(M, method = "color", col = col(200),
         type = "full",  number.cex = .7,
         addCoef.col = "black", # Add coefficient of correlation
         tl.col = "black", tl.srt = 45, # Text label color and rotation
         # Combine with significance
         p.mat = p.mat, sig.level = 0.05, insig = "blank", 
         # hide correlation coefficient on the principal diagonal
         diag = FALSE)

You may choose any variabes to report here

EPICC_FULL$TOT_HIPP_VOL<-EPICC_FULL$Left.Hippocampus+EPICC_FULL$Right.Hippocampus
EPICC_FULL$TOT_VentralDC<-EPICC_FULL$Left.VentralDC+EPICC_FULL$Right.VentralDC
EPICC_FULL$TOT_Amygdala<-EPICC_FULL$Left.Amygdala+EPICC_FULL$Left.Amygdala
EPICC_FULL$TOT_Accumbens<-EPICC_FULL$Left.Accumbens.area+EPICC_FULL$Right.Accumbens.area
EPICC_FULL$TOT_Thalamus.Proper<-EPICC_FULL$Left.Thalamus.Proper+EPICC_FULL$Right.Thalamus.Proper
EPICC_FULL$TOT_Pallidum<-EPICC_FULL$Left.Pallidum+EPICC_FULL$Right.Pallidum
EPICC_FULL$TOT_Caudate<-EPICC_FULL$Left.Caudate+EPICC_FULL$Right.Caudate
EPICC_FULL$TOT_Putamen<-EPICC_FULL$Left.Putamen+EPICC_FULL$Right.Putamen

SAMPLE<-EPICC_FULL %>%
  select(ID,  Race, Handedness, BetaBlocker, Age, EDU, CRFrel,CRFabs,
         BMI, WeeklyMVPA_min, bdiav, emo_avg, bdiav,
         T1_Rest_HR,T1_Rest_Syst_BP,T1_Rest_Diast_BP, T1_Peak_Resp_Rate.,T1_Heart_Rate,
         BPISF_meanpainseverity,FAT_avg,EPS_avg,PCL_tss,
         EstimatedTotalIntraCranialVol, meanFD,  weight_kg, BMI_group,
         Left.Hippocampus, Right.Hippocampus,TOT_HIPP_VOL,
         Left.Amygdala, Right.Amygdala,TOT_Amygdala,
         Left.VentralDC,  Right.VentralDC,TOT_VentralDC,
         Left.Accumbens.area,Right.Accumbens.area,TOT_Accumbens ,Brain.Stem, 
         Right.Hippocampus,Left.Hippocampus,
         Left.Amygdala ,Right.Amygdala,TOT_Amygdala,
         Right.Putamen,Left.Putamen,TOT_Putamen,
         Right.Caudate, Left.Caudate, TOT_Caudate,
         Right.Accumbens.area, Left.Accumbens.area,
         Left.VentralDC, Right.VentralDC,TOT_VentralDC,
         Left.Pallidum,Right.Pallidum,TOT_Pallidum,
         Left.Thalamus.Proper,Right.Thalamus.Proper,TOT_Thalamus.Proper,
         Left.Cerebellum.Cortex, Right.Cerebellum.Cortex,
         Left.Cerebellum.White.Matter,
         TotalGrayVol, SubCortGrayVol, BrainSegVolNotVentSurf)
SAMPLE<-SAMPLE[complete.cases(SAMPLE),]
SAMPLE$ETZ<-scale(SAMPLE$EstimatedTotalIntraCranialVol, center=TRUE)
SAMPLE$ETZ<-as.numeric(SAMPLE$ETZ)
SAMPLE$CRFrelZ<-scale(SAMPLE$CRFrel, center=TRUE)
SAMPLE$CRFrelZ<-as.numeric(SAMPLE$CRFrelZ)
SAMPLE$AgeZ<-scale(SAMPLE$Age, center=TRUE)
SAMPLE$AgeZ<-as.numeric(SAMPLE$AgeZ)
SAMPLE$edu_Z<-scale(SAMPLE$EDU, center=TRUE)
SAMPLE$edu_Z<-as.numeric(SAMPLE$edu_Z)
SAMPLE$TotalGrayVol_Z<-scale(SAMPLE$TotalGrayVol, center=TRUE)
SAMPLE$TotalGrayVol_Z<-as.numeric(SAMPLE$TotalGrayVol_Z)
SAMPLE$BMI_Z<-scale(SAMPLE$BMI, center=TRUE)
SAMPLE$BMI_Z<-as.numeric(SAMPLE$BMI_Z)
SAMPLE$weight_kgZ<-scale(SAMPLE$weight_kg, center=TRUE)
SAMPLE$weight_kgZ<-as.numeric(SAMPLE$weight_kgZ)
SAMPLE$RaceZ<-as.numeric(SAMPLE$Race, center=TRUE)
SAMPLE$RaceZ<-scale(SAMPLE$RaceZ, center = TRUE)
SAMPLE$RaceZ<-as.numeric(SAMPLE$RaceZ, center = TRUE)
SAMPLE$HandednessZ<-as.numeric(SAMPLE$Handedness, center=TRUE)
SAMPLE$HandednessZ<-scale(SAMPLE$HandednessZ, center = TRUE)
SAMPLE$HandednessZ<-as.numeric(SAMPLE$HandednessZ)

SAMPLE$BMI_group <- ordered(SAMPLE$BMI_group, levels = c("normal", "overweight", "obese", "severe obese"))

#SAMPLE<-SAMPLE[!(SAMPLE$Smoker=="Yes"),]
#table(SAMPLE$BMI_group, by=SAMPLE$Race)
#table(SAMPLE$Handedness, by=SAMPLE$Race)

Contricted Sample Normality & Model fit:

require("emmeans")
#Continuous Variables: norm dist. vars are #'d out.
#shapiro.test(SAMPLE$Age)
#shapiro.test(SAMPLE$CRFrel)
#shapiro.test(SAMPLE$CRFabs)   
#shapiro.test(SAMPLE$weight_kg)
#shapiro.test(SAMPLE$BMI)
#shapiro.test(SAMPLE$EstimatedTotalIntraCranialVol) #Normally dist
#shapiro.test(SAMPLE$edu_Z) #  
#shapiro.test(SAMPLE$TOT_HIPP_VOL) #Normally dist
shapiro.test(SAMPLE$WeeklyMVPA_min) # Fails Shapiro-Wilk normality test

## 
##  Shapiro-Wilk normality test
## 
## data:  SAMPLE$WeeklyMVPA_min
## W = 0.77336, p-value = 4.893e-05

shapiro.test(SAMPLE$BPISF_meanpainseverity) # Fails Shapiro-Wilk normality test

## 
##  Shapiro-Wilk normality test
## 
## data:  SAMPLE$BPISF_meanpainseverity
## W = 0.82139, p-value = 0.0003289

shapiro.test(SAMPLE$emo_avg) # Fails Shapiro-Wilk normality test

## 
##  Shapiro-Wilk normality test
## 
## data:  SAMPLE$emo_avg
## W = 0.77311, p-value = 4.848e-05

shapiro.test(SAMPLE$bdiav) # Fails Shapiro-Wilk normality test

## 
##  Shapiro-Wilk normality test
## 
## data:  SAMPLE$bdiav
## W = 0.88627, p-value = 0.006579

shapiro.test(SAMPLE$PCL_tss) # Fails Shapiro-Wilk normality test

## 
##  Shapiro-Wilk normality test
## 
## data:  SAMPLE$PCL_tss
## W = 0.66441, p-value = 1.287e-06

shapiro.test(SAMPLE$PCL_tss) # Fails Shapiro-Wilk normality test

## 
##  Shapiro-Wilk normality test
## 
## data:  SAMPLE$PCL_tss
## W = 0.66441, p-value = 1.287e-06

#shapiro.test(SAMPLE$FAT_avg) # Fails Shapiro-Wilk normality test 

cor.test(SAMPLE$CRFrelZ,SAMPLE$BMI_Z ) # R=-0.4477988

## 
##  Pearson's product-moment correlation
## 
## data:  SAMPLE$CRFrelZ and SAMPLE$BMI_Z
## t = -2.5041, df = 25, p-value = 0.01917
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.70742958 -0.08168513
## sample estimates:
##        cor 
## -0.4477988

lm.CRFrel <- lm(CRFrel ~ AgeZ + RaceZ + HandednessZ+ edu_Z + BMI , data = SAMPLE)
summary(lm.CRFrel)

## 
## Call:
## lm(formula = CRFrel ~ AgeZ + RaceZ + HandednessZ + edu_Z + BMI, 
##     data = SAMPLE)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.4457 -2.0000 -0.1877  1.3017  4.3408 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 23.72532    3.03036   7.829 1.16e-07 ***
## AgeZ        -0.80889    0.61639  -1.312   0.2036    
## RaceZ        0.52044    0.53968   0.964   0.3458    
## HandednessZ -0.49153    0.50900  -0.966   0.3452    
## edu_Z        0.04416    0.51895   0.085   0.9330    
## BMI         -0.22284    0.09226  -2.415   0.0249 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.409 on 21 degrees of freedom
## Multiple R-squared:  0.3027, Adjusted R-squared:  0.1367 
## F-statistic: 1.823 on 5 and 21 DF,  p-value: 0.1518

Step 11: Constricted Sample Corrplot
Explore relative contributions of CRF and BMI on total hippocampus volume
* https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4188138/

Res1<-lm(ETZ~TOT_HIPP_VOL, data=SAMPLE)
SAMPLE$tot_hipp<-Res1$fitted.values

Res2<-lm(ETZ~TOT_Amygdala, data=SAMPLE)
SAMPLE$tot_amy<-Res2$fitted.values

Res3<-lm(ETZ~TOT_VentralDC, data=SAMPLE)
SAMPLE$tot_ventalDC<-Res3$fitted.values

Res4<-lm(ETZ~TOT_Accumbens, data=SAMPLE)
SAMPLE$tot_accumbens<-Res4$fitted.values

Res5<-lm(ETZ~Brain.Stem, data=SAMPLE)
SAMPLE$BrainStem_vol<-Res5$fitted.values

Res6<-lm(ETZ~TOT_Pallidum, data=SAMPLE)
SAMPLE$tot_Pallidum<-Res6$fitted.values

Res7<-lm(ETZ~TOT_Amygdala, data=SAMPLE)
SAMPLE$tot_Amyg<-Res7$fitted.values

Res8<-lm(ETZ~TOT_Thalamus.Proper, data=SAMPLE)
SAMPLE$tot_Thalamus<-Res8$fitted.values

CORR<-SAMPLE %>%
  select(ID, Age, BMI, EDU,meanFD, CRFrel,
         tot_hipp, tot_ventalDC,tot_accumbens,BrainStem_vol,tot_Pallidum,tot_Amyg,tot_Thalamus,
         T1_Rest_HR, T1_Rest_Syst_BP,T1_Rest_Diast_BP, 
         WeeklyMVPA_min,bdiav, emo_avg, PCL_tss, FAT_avg,BPISF_meanpainseverity )

M <- cor(CORR[,2:22])
p.mat <- cor.mtest(CORR[,2:22])$p
res1 <- cor.mtest(CORR[,2:22], conf.level = .95)
col <- colorRampPalette(c("#BB4444", "#EE9988", "#FFFFFF", "#77AADD", "#4477AA"))
 corrplot(M, method = "color", 
         col = col(200),
         type = "full",  number.cex = .5,
         addCoef.col = "black", # Add coefficient of correlation
         tl.col = "black", tl.srt = 45,tl.cex = .7, # Text label color and rotation
         # Combine with significance
         p.mat = p.mat, sig.level = 0.05, insig = "blank", 
         # hide correlation coefficient on the principal diagonal
         diag = FALSE)

#summary(aov(TOT_HIPP_VOL~ CRFrel+ ETZ,data = SAMPLE))  #NS
summary(aov(TOT_HIPP_VOL~ CRFabs+ ETZ,data = SAMPLE))

##             Df   Sum Sq Mean Sq F value Pr(>F)  
## CRFabs       1  2557586 2557586   3.282 0.0826 .
## ETZ          1   694685  694685   0.891 0.3545  
## Residuals   24 18704239  779343                 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

summary(aov(TOT_HIPP_VOL~ weight_kg+ ETZ,data = SAMPLE))

##             Df   Sum Sq Mean Sq F value  Pr(>F)   
## weight_kg    1  5202339 5202339   8.031 0.00918 **
## ETZ          1  1207046 1207046   1.863 0.18490   
## Residuals   24 15547125  647797                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

summary(aov(TOT_HIPP_VOL~ BMI+ ETZ,data = SAMPLE))

##             Df   Sum Sq Mean Sq F value Pr(>F)  
## BMI          1  3580228 3580228   5.175 0.0321 *
## ETZ          1  1773810 1773810   2.564 0.1224  
## Residuals   24 16602471  691770                 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

#lm.bmi<- lm(log(TOT_HIPP_VOL) ~ BMI +ETZ , data = SAMPLE)
#summary(lm.bmi)
#cor.test(lm.bmi$residuals,SAMPLE$CRFrel ) #NS
#cor.test(lm.bmi$residuals,SAMPLE$CRFabs ) #NS

#lm.CRFabs<- lm(log(TOT_HIPP_VOL) ~ CRFabs + ETZ , data = SAMPLE)
#summary(lm.CRFabs)
#cor.test(lm.CRFabs$residuals,SAMPLE$BMI ) #NS
#cor.test(lm.CRFabs$residuals,SAMPLE$weight_kg ) #NS
#summary(lm(TOT_HIPP_VOL ~ AgeZ+RaceZ+edu_Z+HandednessZ+ETZ+BMI , data = SAMPLE))         # NS
#summary(lm(TOT_HIPP_VOL ~ AgeZ+RaceZ+edu_Z+HandednessZ+ETZ+CRFrel , data = SAMPLE))      # NS
#summary(lm(TOT_HIPP_VOL ~ AgeZ+RaceZ+edu_Z+HandednessZ+ETZ+BMI+CRFrel , data = SAMPLE))  # NS
summary(lm(TOT_HIPP_VOL ~ AgeZ+RaceZ+edu_Z+HandednessZ+ETZ+BMI*CRFrelZ , data = SAMPLE))

## 
## Call:
## lm(formula = TOT_HIPP_VOL ~ AgeZ + RaceZ + edu_Z + HandednessZ + 
##     ETZ + BMI * CRFrelZ, data = SAMPLE)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1511.22  -337.19   -31.95   466.49  1093.17 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  6586.17    1171.06   5.624 2.46e-05 ***
## AgeZ         -328.60     212.80  -1.544   0.1399    
## RaceZ         157.92     224.26   0.704   0.4903    
## edu_Z        -216.87     182.95  -1.185   0.2513    
## HandednessZ    35.94     187.56   0.192   0.8502    
## ETZ           300.89     212.01   1.419   0.1729    
## BMI            43.01      35.99   1.195   0.2476    
## CRFrelZ     -1908.41     825.43  -2.312   0.0328 *  
## BMI:CRFrelZ    53.12      24.51   2.167   0.0439 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 797.5 on 18 degrees of freedom
## Multiple R-squared:  0.4786, Adjusted R-squared:  0.2468 
## F-statistic: 2.065 on 8 and 18 DF,  p-value: 0.09603

# BUT THIS DOESNT MAKE SENSE THE MAIN EFFECT OF CRFrel is neg and BMI is pos.
model<-(lm(TOT_HIPP_VOL ~ AgeZ+RaceZ+edu_Z+HandednessZ+ETZ+BMI_Z*CRFrelZ , data = SAMPLE))
par(mfrow = c(2, 2)) # Split the plotting panel into a 2 x 2 grid
visreg::visreg(model, "BMI_Z", "CRFrelZ", overlay=TRUE)
visreg::visreg(model, "CRFrelZ", "BMI_Z", overlay=TRUE)

Import significant ZStats and PSC reports from /ROIoutput
STATS OUTPUT: # subject number, ROI, image, mean, median, max, coordinates (xyz) in standard space.

DATA=EPICC_FULL
ROI_DIR<-c("/Users/alyshagilmore/Downloads/restingstate_crew/ROIoutput/")

PSC<-list.files("/Users/alyshagilmore/Downloads/restingstate_crew/ROIoutput/", pattern = glob2rx("seed_pos*-psc.txt"))

Z<-list.files("/Users/alyshagilmore/Downloads/restingstate_crew/ROIoutput/", pattern = glob2rx("seed_pos*-zscr.txt"))

for (i in PSC) {
  setwd(ROI_DIR)
featquery<-read.delim(i , sep = " ", header = FALSE)
colnames(featquery)<-c("ID", "ROI_CLUSTER_NAME", "Type",  "L_R" , "POS_NEG","meanZ", "medianZ", "maxZ", "X", "Y", "Z")
featquery[,1:11]<-lapply(featquery[,1:11], as.character)
ZSTAT<-featquery$ROI_CLUSTER_NAME
ZSTAT<-strsplit(ZSTAT, "zstat")
mat  <- matrix(unlist(ZSTAT), ncol=2, byrow=TRUE)

ZSTAT<-mat[,2] 
POS_NEG<-if_else(ZSTAT=="2", "NEG", "POS")

featquery$L_R<-if_else(featquery$L_R=="left", "L", featquery$L_R)
featquery$L_R<-if_else(featquery$L_R=="right", "R", featquery$L_R)
featquery$L_R<-if_else(featquery$L_R=="0", "BILAT", featquery$L_R)
j<-i
ROI_CLUSTER_NAME<-strsplit(j, split = "-")
mat  <- matrix(unlist(ROI_CLUSTER_NAME), ncol=3, byrow=TRUE)
df   <- as.data.frame(mat)
ROI_CLUSTER_NAME<-df$V2

COL_NAME<-paste(featquery$L_R,ROI_CLUSTER_NAME, POS_NEG, "PBC", sep="_")
featquery<-featquery %>%
select(ID, meanZ )
names(featquery)<-c("ID", paste(COL_NAME[1]))
DATA <-left_join(featquery, DATA, by="ID" )
}

for (i in Z) {
  setwd(ROI_DIR)
featquery<-read.delim(i , sep = " ", header = FALSE)
colnames(featquery)<-c("ID", "ROI_CLUSTER_NAME", "Type",  "L_R" , "POS_NEG","meanZ", "medianZ", "maxZ", "X", "Y", "Z")
featquery[,1:11]<-lapply(featquery[,1:11], as.character)
ZSTAT<-featquery$ROI_CLUSTER_NAME
ZSTAT<-strsplit(ZSTAT, "zstat")
mat  <- matrix(unlist(ZSTAT), ncol=2, byrow=TRUE)

ZSTAT<-mat[,2] 
POS_NEG<-if_else(ZSTAT=="2", "NEG", "POS")

featquery$L_R<-if_else(featquery$L_R=="left", "L", featquery$L_R)
featquery$L_R<-if_else(featquery$L_R=="right", "R", featquery$L_R)
featquery$L_R<-if_else(featquery$L_R=="0", "BILAT", featquery$L_R)
j<-i
ROI_CLUSTER_NAME<-strsplit(j, split = "-")
mat  <- matrix(unlist(ROI_CLUSTER_NAME), ncol=3, byrow=TRUE)
df   <- as.data.frame(mat)
ROI_CLUSTER_NAME<-df$V2

COL_NAME<-paste(featquery$L_R,ROI_CLUSTER_NAME, POS_NEG, "Zstat", sep="_")
featquery<-featquery %>%
select(ID, meanZ )
names(featquery)<-c("ID", paste(COL_NAME[1]))
DATA <-left_join(featquery, DATA, by="ID" )
}

EXTRA Commands/Walk-Throughs: Pulled for organization:

Full Sample: Corrplots

FULL_SUMMARY<-EPICC_FULL %>%
  select(ID,BMI_group, BMI, VO2_END_REASON, Race, Smoker, BetaBlocker,
         WeeklyMVPA_min, emo_avg, CRFrel, CRFabs, bdiav,T1_Rest_Syst_BP, T1_Rest_Diast_BP, Left.Hippocampus, meanFD)
COMPLETE<-CHECK_NORM 
col_names <- names(COMPLETE)
COMPLETE[,col_names] <- lapply(COMPLETE[,col_names] , as.numeric)
COMPLETE.corr<-COMPLETE[complete.cases(COMPLETE),]
M <- cor(COMPLETE.corr[,3:10])
p.mat <- cor.mtest(COMPLETE.corr[,2:10])$p
res1 <- cor.mtest(COMPLETE.corr[,2:8], conf.level = .95)
col <- colorRampPalette(c("#BB4444", "#EE9988", "#FFFFFF", "#77AADD", "#4477AA"))
        
 corrplot(M, method = "color",
         col = col(200),
         type = "full",  number.cex = .5,
         addCoef.col = "black", # Add coefficient of correlation
         tl.col = "black", tl.srt = 45,tl.cex = .7, # Text label color and rotation
         # Combine with significance
         p.mat = p.mat, sig.level = 0.05, insig = "blank", 
         # hide correlation coefficient on the principal diagonal
         diag = FALSE)

Import SPSS & Data Dictionary Standard Import-FULL of SPSS database. * This is quick script will read any SPSS database and automatically create a labeled data dictionary.
* Tutorial Here: https://cran.r-project.org/web/packages/dataMeta/vignettes/dataMeta_Vignette.html

EPIC_data<-read.spss("merged EPICC data with 1024 &1069 & 674 & 7701 & 745.sav")

variable.labels<-attr(EPIC_data, "variable.labels")

linker<-build_linker(EPIC_data, variable.labels, 1)

dict <- build_dict(EPIC_data,linker,option_description = NULL, prompt_varopts = FALSE)

data_desc = "AVAILABLE Baseline Variables -06/26/20 AG Orginal File: MASTER SPSS merged EPICC data with 1024 &1069 & 674 & 7701 & 745.sav" 

EPICC <- incorporate_attr(EPIC_data, dict, data_desc)

# Exporting dictionary only:
dict_only <- attributes(EPICC)$dictionary
write_csv(dict_only, "dict_only.csv")
# Saving as .rds (dataset with appended dictionary)
save_it(EPICC,"MASTER_BaselineEPICC_88")

Multiplot Grids: Visualize Sample Normality http://www.cookbook-r.com/Graphs/Multiple_graphs_on_one_page_(ggplot2)/

# Multiple plot function
#
# ggplot objects can be passed in ..., or to plotlist (as a list of ggplot objects)
# - cols:   Number of columns in layout
# - layout: A matrix specifying the layout. If present, 'cols' is ignored.
#
# If the layout is something like matrix(c(1,2,3,3), nrow=2, byrow=TRUE),
# then plot 1 will go in the upper left, 2 will go in the upper right, and
# 3 will go all the way across the bottom.
#
multiplot <- function(..., plotlist=NULL, file, cols=1, layout=NULL) {
  library(grid)

  # Make a list from the ... arguments and plotlist
  plots <- c(list(...), plotlist)

  numPlots = length(plots)

  # If layout is NULL, then use 'cols' to determine layout
  if (is.null(layout)) {
    # Make the panel
    # ncol: Number of columns of plots
    # nrow: Number of rows needed, calculated from # of cols
    layout <- matrix(seq(1, cols * ceiling(numPlots/cols)),
                    ncol = cols, nrow = ceiling(numPlots/cols))
  }

 if (numPlots==1) {
    print(plots[[1]])

  } else {
    # Set up the page
    grid.newpage()
    pushViewport(viewport(layout = grid.layout(nrow(layout), ncol(layout))))

    # Make each plot, in the correct location
    for (i in 1:numPlots) {
      # Get the i,j matrix positions of the regions that contain this subplot
      matchidx <- as.data.frame(which(layout == i, arr.ind = TRUE))

      print(plots[[i]], vp = viewport(layout.pos.row = matchidx$row,
                                      layout.pos.col = matchidx$col))
    }
  }
}

#DATA PLOTS
EPIC<-CONSTRICTED_DAT
p1<-ggstatsplot::ggbetweenstats(
  data = EPIC,
  x = Race,
  y = Age,
  effsize.type = "biased", # type of effect size
  outlier.coef = 2,
  pairwise.comparisons = TRUE,
  messages = FALSE,
  outlier.tagging = TRUE,
  outlier.label = ID,
  mean.plotting = TRUE,
    # arguments relevant for ggstatsplot::combine_plots
  plotgrid.args = list(
    align = "h", axis = "b", nrow = 1)
)

p2<-ggstatsplot::ggbetweenstats(
  data = EPIC,
  x = Race,
  y = EDU,
  effsize.type = "biased", # type of effect size
  outlier.coef = 2,
  pairwise.comparisons = TRUE,
  messages = FALSE,
  outlier.tagging = TRUE,
  outlier.label = ID,
  mean.plotting = TRUE,
    # arguments relevant for ggstatsplot::combine_plots
  plotgrid.args = list(
    align = "h", axis = "b", nrow = 1)
)

p3<-ggstatsplot::ggbetweenstats(
  data = EPIC,
  x = Race,
  y = T1_BMI,
  pairwise.comparisons = TRUE,
  effsize.type = "biased", # type of effect size
  outlier.coef = 2,
  messages = FALSE,
  outlier.tagging = TRUE,
  outlier.label = ID,
  mean.plotting = TRUE, # arguments relevant for ggstatsplot::combine_plots
  plotgrid.args = list(
    align = "h", axis = "b", nrow = 1)
)

p4<-ggstatsplot::ggbetweenstats(
  data = EPIC,
  x = Handedness,
  y = Age,
  pairwise.comparisons = TRUE,
  effsize.type = "biased", # type of effect size
  outlier.coef = 2,
  messages = FALSE,
  outlier.tagging = TRUE,
  outlier.label = ID,
  mean.plotting = TRUE, # arguments relevant for ggstatsplot::combine_plots
  plotgrid.args = list(
    align = "h", axis = "b", nrow = 1)
)

p5<-ggstatsplot::ggbetweenstats(
  data = EPIC,
  x = Handedness,
  y = EDU,
  pairwise.comparisons = TRUE,
  effsize.type = "biased", # type of effect size
  outlier.coef = 2,
  messages = FALSE,
  outlier.tagging = TRUE,
  outlier.label = ID,
  mean.plotting = TRUE, # arguments relevant for ggstatsplot::combine_plots
  plotgrid.args = list(
    align = "h", axis = "b", nrow = 1)
)
p6<-ggstatsplot::ggbetweenstats(
  data = EPIC,
  x = Handedness,
  y = T1_BMI,
  pairwise.comparisons = TRUE,
  effsize.type = "biased", # type of effect size
  outlier.coef = 2,
  messages = FALSE,
  outlier.tagging = TRUE,
  outlier.label = ID,
  mean.plotting = TRUE, # arguments relevant for ggstatsplot::combine_plots
  plotgrid.args = list(
    align = "h", axis = "b", nrow = 1)
)

multiplot(p1, p2, p3,
          p4,p5, p6 , 
          cols=3)

StepX Select the variables you wish to include in your analyses and R will export a complete sample. That is, R will only extract particpants with complete cases for all demographics and your variables of interest. In this case, I'll extract all participant who have complete data for the following variables.

Step 11: Mean center any continuous variables:
Step 11a: First, we ensure all continuous variables for mean centering are in their numeric form. Step11b: Next, we use the scale function to mean center those variables.
Great links on mean centering here:

Why we mean center in neuroimaging analyses: http://mumford.fmripower.org/mean_centering/
How to mean center variables in R: https://www.gastonsanchez.com/visually-enforced/how-to/2014/01/15/Center-data-in-R/

COMPLETE_EPICC<-FITT_DEMOS_Outcomes_BRAIN

# Assign continuous variables classes
COMPLETE_EPICC$AGE<-as.numeric(COMPLETE_EPICC$AGE)
COMPLETE_EPICC$BMI_kg<-as.numeric(COMPLETE_EPICC$BMI_kg)
COMPLETE_EPICC$EDU<-as.numeric(COMPLETE_EPICC$EDU)
COMPLETE_EPICC$CRFabs<-as.numeric(COMPLETE_EPICC$CRFabs)
COMPLETE_EPICC$CRFrel<-as.numeric(COMPLETE_EPICC$CRFrel)
COMPLETE_EPICC$weight_mc<-as.numeric(COMPLETE_EPICC$weight_kg)

COMPLETE_EPICC$Age_mc<-scale(COMPLETE_EPICC$AGE, scale = FALSE)
COMPLETE_EPICC$BMI_mc<-scale(COMPLETE_EPICC$BMI_kg, scale = FALSE)
COMPLETE_EPICC$EDU_mc<-scale(COMPLETE_EPICC$EDU, scale = FALSE)
COMPLETE_EPICC$CRFabs_mc<-scale(COMPLETE_EPICC$CRFabs, scale = FALSE)
COMPLETE_EPICC$CRFrel_mc<-scale(COMPLETE_EPICC$CRFrel, scale = FALSE)
COMPLETE_EPICC$weight_mc<-scale(COMPLETE_EPICC$weight_kg, center = TRUE)

Import feat results: (old version strait from featquery output)

featquery_Lpallidum_meants<-read.delim("/Users/alyshagilmore/Downloads/restingstate_crew/fitness_hipptot.gfeat/cope1.feat/L_Pallidum/mean_mask_ts.txt" ,header = FALSE)
Lpallidum_mts<-featquery_Lpallidum_meants$V1
DATA<-cbind(DATA,Lpallidum_mts)
DATA$Lpallidum_mts<-as.numeric(DATA$Lpallidum_mts)

featquery_Lpallidum_Zstat<-read.delim("/Users/alyshagilmore/Downloads/restingstate_crew/fitness_hipptot.gfeat/cope1.feat/L_Pallidum/max_thresh_zstat2_ts.txt" ,header = FALSE)
Lpallidum_Zstat<-featquery_Lpallidum_Zstat$V1
DATA<-cbind(DATA,Lpallidum_Zstat)
DATA$Lpallidum_Zstat<-as.numeric(DATA$Lpallidum_Zstat)

Create Constricted Samples I:

EPICC Neuroimaging Analyses

7/04/2020

From the Terminal...

From the R Console...

Data Frame Summary

FULL_SUMMARY

EXTRA Commands/Walk-Throughs: Pulled for organization: