You must first connect to the Pitt network via Pulse Secure, then access Zeus either through the terminal or through a remote desktop program.
If you have not installed the following R packages, do so now by copy/pasting the following commands to your R Console:
install.packages("dplyr")
install.packages("readxl")
install.packages("summarytools")
library(dplyr)
library(summarytools)
library(foreign)
library(readxl)
knitr::opts_chunk$set(echo = TRUE)
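The install commands above reinstall each package every time they are run. A minimal sketch of one way to install only what is missing follows. The helper name `missing_pkgs` is hypothetical, and the package list is an assumption that also includes the packages loaded later in this tutorial.

```r
# Packages loaded in this tutorial (assumed list; edit as needed)
pkgs <- c("dplyr", "readxl", "summarytools", "foreign", "haven", "expss")

# Hypothetical helper: return the packages that are not yet installed
missing_pkgs <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

to_install <- missing_pkgs(pkgs)

# Only install from an interactive session, so knitting never triggers a download
if (interactive() && length(to_install) > 0) {
  install.packages(to_install)
}
```

Running this twice is harmless: the second time `to_install` should be empty.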
Using the Terminal tab in RStudio (not the Console tab), ssh into Zeus by typing the following command (then enter your password when prompted):
ssh ericksonlab@psych-0538.psychology.pitt.edu
Still in the terminal, move to the directory where your subject-level data is stored:
cd /Volumes/Disk1/EPICC/SubjectScans
Generate a subject ID list using the following command in the terminal:
ls -d -1 7*_1
Copy and paste the output into an Excel sheet. (I haven’t quite figured out how to [simply] pass variables from the terminal into RStudio.) Name the column SubID. Save the spreadsheet as ‘Baseline_Subs_Brain.csv’. We’ll continue to add columns to it below.
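As one possible alternative to the Excel step, the pasted terminal output can be turned into the same one-column file from within R. This is only a sketch: the subject IDs below are made up, and the file is written to a temporary folder rather than to the working directory.

```r
# Hypothetical output of `ls -d -1 7*_1`, pasted in as a single string
ls_output <- "7001_1
7002_1
7003_1"

# Split the pasted text into one subject ID per line and drop blank lines
sub_ids <- trimws(strsplit(ls_output, "\n", fixed = TRUE)[[1]])
sub_ids <- sub_ids[nzchar(sub_ids)]

# One-column data frame using the column name from this tutorial
subs <- data.frame(SubID = sub_ids, stringsAsFactors = FALSE)

# Written to a temporary folder here; in practice, save it as
# "Baseline_Subs_Brain.csv" in your working directory
out_file <- file.path(tempdir(), "Baseline_Subs_Brain.csv")
write.csv(subs, out_file, row.names = FALSE)
```

The ROI path columns described in the next steps would still need to be added, either in the spreadsheet or as further columns of `subs`.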
Repeat the process (using the terminal), but this time list each subject’s left hippocampus .feat directory, and paste the output into a new column of your Excel sheet labeled ‘L_HIPP_path’:
ls -d -1 /rest/lhip.feat
Do the same thing again, but this time list each subject’s RIGHT hippocampus .feat directory (or any other seeds that you wish):
ls -d -1 /rest/rhip.feat
Copy and paste the output into a new column in your Excel sheet, labeled ‘R_HIPP_path’.
Save as Baseline_Subs_Brain.csv and close
Import the .csv you just made.
BRAIN_SUBS<-read.csv("Baseline_Subs_Brain.csv") #Import the .csv you just made.
BRAIN_SUBS$SubID<-as.character(BRAIN_SUBS$SubID) #Read SubID as a character variable, not numeric.
Below, we use a few commands to split the subject ids that we have (e.g., 7001_1) into subject IDs that can be merged with other databases (i.e., 7001).
x<-BRAIN_SUBS$SubID
x<-as.character(x)
tmp<-strsplit(x, "_") #strsplit: splits a character string at a fixed pattern, in this case at '_'
mat <- matrix(unlist(tmp), ncol=2, byrow=TRUE) #this breaks each string into 2 separate columns (assumes every SubID contains exactly one '_')
df<-as.data.frame(mat) #make a data frame
df$ID<-as.character(df$V1) #make var ID
df$Session<-as.character(df$V2) #make var Session
df<-df %>% select(ID, Session) #keep ID & Session only (i.e., drop the duplicate columns V1 & V2)
BRAIN_SUBS<-cbind(df, BRAIN_SUBS) #bind/merge with original dataframe
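The same split can also be sketched in base R with `sub()`, which avoids the intermediate matrix. The IDs below are hypothetical examples in the `7001_1` format described above.

```r
x <- c("7001_1", "7002_1", "7105_1")  # hypothetical SubID values

ids      <- sub("_.*$", "", x)      # text before the first underscore
sessions <- sub("^[^_]*_", "", x)   # text after the first underscore

split_ids <- data.frame(ID = ids, Session = sessions, stringsAsFactors = FALSE)
```

Unlike the matrix approach, this does not depend on every SubID splitting into exactly two pieces; an ID without an underscore is simply returned unchanged in both columns, which makes it easy to spot.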
Now that we’ve cleaned up our SubID variable to match other databases, we’ll want to pull all ROI paths that we’ve copied into our spreadsheet, Baseline_Subs_Brain.csv. To do so, the R package ‘dplyr’ has a handy function, ends_with(), that extracts any columns whose names end with a specific set of characters.
# Keep ID, Session, and every column whose name ends in 'path'.
# This is why each ROI path column added to Baseline_Subs_Brain.csv in Steps 4 & 5 above needs a label (first row of the column) that ends in 'path' (e.g., L_HIPP_path, R_HIPP_path, Bilat_HIPP_path).
# ends_with() ignores case by default, but keep the '_path' suffix lower-case anyway: the column names used below (e.g., Seed.df$L_HIPP_path) are case-sensitive.
Seed.df<-BRAIN_SUBS %>%
select(ID, Session, ends_with("path"))
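To see how `ends_with()` picks columns, here is a small sketch on a made-up data frame (the IDs and paths are hypothetical). It also shows that matching ignores case unless `ignore.case = FALSE` is set.

```r
library(dplyr)

# Hypothetical stand-in for BRAIN_SUBS; note the upper-case 'PATH' in one label
toy <- data.frame(
  ID = c("7001", "7002"),
  Session = c("1", "1"),
  SubID = c("7001_1", "7002_1"),
  L_HIPP_path = c("7001_1/rest/lhip.feat", "7002_1/rest/lhip.feat"),
  R_HIPP_PATH = c("7001_1/rest/rhip.feat", ""),
  stringsAsFactors = FALSE
)

# Default behaviour: case is ignored, so both path columns are kept
loose <- names(select(toy, ID, Session, ends_with("path")))

# Strict matching: only the lower-case '_path' column is kept
strict <- names(select(toy, ID, Session, ends_with("_path", ignore.case = FALSE)))
```

Here `loose` contains both path columns, while `strict` drops `R_HIPP_PATH`.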
#The following 4 lines aren't especially relevant right now. However, they may become important when we start to add numerous ROIs. I can likely create a quick loop to verify/count missing .feat paths (if needed). For example, it looks like 7105 has baseline brain data (i.e., they're in /EPICC/SubjectScans); however, they do not have L/R hippocampus .feat directories created from first-level analysis (and may still need individual processing and/or further inquiry).
Seed.df$L_HIPP_path<-as.character(Seed.df$L_HIPP_path)
Seed.df$R_HIPP_path<-as.character(Seed.df$R_HIPP_path)
Seed.df$L_HIPP_path<-if_else(Seed.df$L_HIPP_path=="", NA_character_, Seed.df$L_HIPP_path) #replace empty paths with a missing value
Seed.df$R_HIPP_path<-if_else(Seed.df$R_HIPP_path=="", NA_character_, Seed.df$R_HIPP_path) #replace empty paths with a missing value
rm(list=setdiff(ls(), "Seed.df")) #This cleans up our RStudio Global Environment (variable list) by removing everything but the dataframe we just created, 'Seed.df'.
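The check for missing ROI paths mentioned in the comment above does not need a loop. A minimal sketch, assuming empty entries have already been converted to real `NA` values and using a made-up stand-in for `Seed.df`:

```r
# Hypothetical stand-in for Seed.df; values are invented for illustration
toy_seed <- data.frame(
  ID = c("7001", "7002", "7105"),
  L_HIPP_path = c("7001_1/rest/lhip.feat", "7002_1/rest/lhip.feat", NA),
  R_HIPP_path = c("7001_1/rest/rhip.feat", NA, NA),
  stringsAsFactors = FALSE
)

# Every column whose name ends in 'path'
path_cols <- grep("path$", names(toy_seed), value = TRUE)

# Number of missing paths per ROI column
n_missing <- colSums(is.na(toy_seed[path_cols]))

# IDs with at least one missing ROI path
ids_missing <- toy_seed$ID[rowSums(is.na(toy_seed[path_cols])) > 0]
```

Because `path_cols` is built from the column names, the same lines keep working as more ROI columns are added.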
Import Fitness Data
VO2.df<-readxl::read_xlsx("EPICC Data.xlsx", sheet = "VO2 Data") # Import fitness database.
# Define variables as numeric values (because R sometimes likes to make them character strings).
VO2.df$`PEAK VO2/KG`<-as.numeric(VO2.df$`PEAK VO2/KG`)
VO2.df$`PEAK VO2`<-as.numeric(VO2.df$`PEAK VO2`)
VO2.df$BMI<-as.numeric(VO2.df$BMI)
VO2.df$AGE<-as.numeric(VO2.df$AGE)
# Rename the `LAB ID` column to match the ID variable in other databases (for future merging).
VO2.df$ID<-as.character(VO2.df$`LAB ID`)
# Only include rows of data [x,] from column `Pre / Post` that is equal to "PRE".
VO2.df<-VO2.df[VO2.df$`Pre / Post`=="PRE",]
# Only include rows of data that have a value for ID (this removes the all-NA rows left behind by the filter above, which originally held post data).
VO2.df<-VO2.df[complete.cases(VO2.df$ID),]
#############################################################################################
######## Select the fitness variables you would like to include in your final output #######
#############################################################################################
fittness.df<-VO2.df %>%
select(ID, AGE, BMI, `PEAK VO2`, `PEAK VO2/KG`)
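The `complete.cases()` clean-up after the "PRE" filter is needed because logical indexing keeps an all-NA row wherever the comparison itself is `NA`. A small sketch on invented data shows the effect; `PrePost` is a simplified stand-in for the `Pre / Post` column.

```r
# Hypothetical stand-in for the VO2 sheet; the third row has no Pre/Post entry
toy_vo2 <- data.frame(
  ID = c("7001", "7001", NA),
  PrePost = c("PRE", "POST", NA),
  stringsAsFactors = FALSE
)

# Logical indexing: the NA comparison produces an extra row filled with NA
with_na <- toy_vo2[toy_vo2$PrePost == "PRE", ]

# which() drops NA comparisons, so no follow-up clean-up is needed
pre_only <- toy_vo2[which(toy_vo2$PrePost == "PRE"), ]
```

Either route ends with the same rows; `which()` just gets there in one step.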
Import .sav data
EPIC_vars<-read.spss("merged EPICC data with 1024 &1069 & 674 & 7701 & 745.sav", to.data.frame=TRUE,use.value.labels = TRUE)
# View data if you want to pick other vars:
#Command: View(EPIC_vars) -Viewer in R is slow, so commented out for now
Rename variables for the exported sublist. In a future tutorial, we may be able to automatically rename variables using the ‘codebook’ package. For now, we’ll just rename the variables we’re interested in for analyses.
EPIC_vars$EDU<-EPIC_vars$BDH004 #Rename variables for exported sublist
EPIC_vars$Handedness<-EPIC_vars$BDH003 #Rename variables for exported sublist
Next, select the variables you wish to include in the final output database. These will be included in the merging process below.
##############################################################################
###### Select any IV/DVs that you wish to include in your final output #######
##############################################################################
Demos.df<-EPIC_vars %>%
select(ID,EDU, Handedness)
First, convert the ID variables to character strings for merging. dplyr’s join functions cannot merge dataframes by ID if the ID variable has incompatible classes across them (e.g., character in one and numeric in the other). For example, to merge two dataframes by the shared variable “ID”, “ID” should be a character vector in BOTH dataframes.
Demos.df$ID<-as.character(Demos.df$ID)
Seed.df$ID<-as.character(Seed.df$ID)
fittness.df$ID<-as.character(fittness.df$ID)
Next, use ‘dplyr’ (installed & loaded in set-up) to merge the 3 dataframes we’ve just created. As a side note, dplyr’s various ‘_join’ functions (left_join, right_join, full_join, etc.) are very useful to learn. Here, we’ll use ‘left_join()’ to combine two dataframes at a time.
Below, we enter Seed.df first (on the left), so that our resulting dataframe (data1) will only include subjects from Seed.df. Then, we’ll merge the resulting dataframe, data1, with fittness.df. If any participant from Seed.df has missing data within Demos.df or fittness.df, they’ll just receive NAs for those columns.
data1<-left_join(Seed.df, Demos.df, by = "ID") #join on the shared ID column
BS_EPICC_Group1<-left_join(data1, fittness.df, by = "ID")
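Before relying on the merged dataframe, it can help to list which subjects found no match. A minimal sketch with invented stand-ins for `Seed.df` and `Demos.df`, using dplyr's `anti_join()`:

```r
library(dplyr)

# Hypothetical stand-ins; IDs and EDU values are made up
toy_seed  <- data.frame(ID = c("7001", "7002", "7105"), stringsAsFactors = FALSE)
toy_demos <- data.frame(ID = c("7001", "7002", "7200"),
                        EDU = c(16, 12, 18), stringsAsFactors = FALSE)

# left_join keeps every row of the left table; unmatched rows get NA
joined <- left_join(toy_seed, toy_demos, by = "ID")

# anti_join returns the left-table rows that found no match on the right
unmatched <- anti_join(toy_seed, toy_demos, by = "ID")
```

In this toy case, `7105` stays in `joined` with `NA` for `EDU` and is the only row in `unmatched`, while `7200` is dropped because it is not in the left table.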
Assign numeric/continuous variables classes
BS_EPICC_Group1$AGE<-as.numeric(BS_EPICC_Group1$AGE)
BS_EPICC_Group1$BMI<-as.numeric(BS_EPICC_Group1$BMI)
BS_EPICC_Group1$EDU<-as.numeric(BS_EPICC_Group1$EDU)
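One caveat worth checking with `class()`: if a variable was imported as a factor (which `read.spss()` may do for labelled variables when `use.value.labels = TRUE`), `as.numeric()` returns the internal level codes rather than the recorded values. The sketch below uses an invented factor to show the difference; whether EDU is actually a factor depends on the .sav file.

```r
# Hypothetical factor, standing in for a variable imported with value labels
edu <- factor(c("12", "16", "18", "16"))

codes  <- as.numeric(edu)                # level codes: 1 2 3 2
values <- as.numeric(as.character(edu))  # recorded values: 12 16 18 16
```

Converting through `as.character()` only recovers numbers when the labels themselves are numeric; text labels would become `NA`.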
https://www.gastonsanchez.com/visually-enforced/how-to/2014/01/15/Center-data-in-R/
https://www.theanalysisfactor.com/center-on-the-mean/
Specifically related to group-level neuroimaging analyses: http://mumford.fmripower.org/mean_centering/
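BS_EPICC_Group1$Age_mc<-as.numeric(scale(BS_EPICC_Group1$AGE, scale = FALSE)) #mean-center (subtract the mean); as.numeric() stores a plain vector rather than a one-column matrix
BS_EPICC_Group1$BMI_mc<-as.numeric(scale(BS_EPICC_Group1$BMI, scale = FALSE))
BS_EPICC_Group1$EDU_mc<-as.numeric(scale(BS_EPICC_Group1$EDU, scale = FALSE))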
write.csv(BS_EPICC_Group1, "BS_EPICC_VO2_Demos_052920.csv")
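For reference, mean-centering with `scale(x, scale = FALSE)` is the same as subtracting the mean of the non-missing values by hand. A short check on invented ages:

```r
age <- c(51, 64, 76, NA)  # hypothetical ages, including one missing value

centered <- as.numeric(scale(age, scale = FALSE))
by_hand  <- age - mean(age, na.rm = TRUE)

same        <- isTRUE(all.equal(centered, by_hand))
centre_mean <- mean(centered, na.rm = TRUE)
```

`same` should be `TRUE`, `centre_mean` should be zero up to rounding error, and the missing value stays missing.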
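BS_EPICC_Group<-BS_EPICC_Group1[,-c(1:4,11:13)] #drop columns by position before summarizing: ID, Session, and the ROI paths (1:4) plus the mean-centered variables (11:13); assumes exactly two ROI path columns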
print(dfSummary(BS_EPICC_Group[,2:6], graph.magnif = 0.75, valid.col = FALSE, varnumbers = FALSE, na.col = FALSE , labels.col =FALSE), method = 'render')
| Variable | Stats / Values | Freqs (% of Valid) |
|---|---|---|
| Handedness [factor] | 1. Right 2. Left 3. Both | |
| AGE [numeric] | Mean (sd): 63.6 (5.7); min < med < max: 51 < 64 < 76; IQR (CV): 7.8 (0.1) | 18 distinct values |
| BMI [numeric] | Mean (sd): 31.8 (6.6); min < med < max: 20.2 < 31.4 < 43.3; IQR (CV): 10.7 (0.2) | 34 distinct values |
| PEAK VO2 [numeric] | Mean (sd): 1.4 (0.3); min < med < max: 0.9 < 1.4 < 2.1; IQR (CV): 0.3 (0.2) | 34 distinct values |
| PEAK VO2/KG [numeric] | Mean (sd): 16.8 (2.7); min < med < max: 11.4 < 16.8 < 24; IQR (CV): 3.7 (0.2) | 34 distinct values |
Generated by summarytools 0.9.6 (R version 3.6.1)
2020-06-01
library(haven)
library(expss)
spss_data = haven::read_spss("merged EPICC data with 1024 &1069 & 674 & 7701 & 745.sav")
# add missing 'labelled' class
EPICC_data = add_labelled_class(spss_data)
rm(list=setdiff(ls(), "EPICC_data"))
#Currently Working
DEMOS<-EPICC_data %>%
select(ID, starts_with("BDH"))
SWM<-EPICC_data %>%
select(ID, starts_with("SWM"))
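When choosing variables by prefix, it helps to see their descriptions next to the column names. Data read with haven keeps each SPSS variable label in the column's "label" attribute, so the labels can be collected into a lookup vector. The sketch below uses an invented two-column data frame, and its labels are made up.

```r
# Hypothetical stand-in for EPICC_data; the labels are invented
toy <- data.frame(ID = c("7001", "7002"), BDH004 = c(16, 12),
                  stringsAsFactors = FALSE)
attr(toy$ID, "label") <- "Participant ID"
attr(toy$BDH004, "label") <- "Example label for BDH004"

# Collect each column's label (NA when a column has none)
var_labels <- vapply(toy, function(col) {
  lab <- attr(col, "label", exact = TRUE)
  if (is.null(lab)) NA_character_ else as.character(lab)
}, character(1))
```

`var_labels` is a named character vector, so `var_labels["BDH004"]` looks up a single variable.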
Step 22: Working Codebook Example:
# Best reference aside from protocol publication included: https://github.com/rubenarslan/codebook
# codebook::new_codebook_rmd()