RISH matched participant visualizations

Written: 2020-01-01
Last ran: 2020-01-01
Website: http://rpubs.com/navona/RISH_participants

Purpose: This script summarizes data from the n=13 matched participants on three matching criteria: age, handedness, and WTAR (IQ) score. Participants were additionally matched on sex. Matching was done with the R MatchIt package, and we selected the participants with the smallest distance. The plots show how matched participants (large circles, joined by a line) compare to each other across the three variables, which were equally weighted. Although some distances on individual variables appear large, there are no statistical differences between the matched sets (see http://rpubs.com/navona/RISH_matching). The human phantom data, which were used to bolster the male matches (for a total of n=16), are not visualized here.
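The exact MatchIt call is not reproduced in this document. This is a rough, hypothetical illustration of what "smallest distance, equally weighted, additionally matched on sex" can mean. The base-R sketch below does three things:

- It z-scores the three variables on the pooled sample.
- It restricts candidates to the same sex.
- It picks the reference with the smallest Euclidean distance.

Matching is with replacement, so one reference can serve several targets. This is a swapped-in toy, not MatchIt's algorithm. The IDs, values, and the helper name `nearest_reference` are made up.

```r
# Toy sketch: nearest-neighbour matching with replacement, equal weights,
# exact on sex. ASSUMPTION: base-R illustration only, not the MatchIt call
# used in the analysis; all IDs and values are made up.

vars <- c("age", "handedness", "WTAR")

# hypothetical target participants (to be matched)
tar <- data.frame(id = c("T1", "T2"), sex = c("F", "M"),
                  age = c(30, 45), handedness = c(80, 60), WTAR = c(110, 100),
                  stringsAsFactors = FALSE)

# hypothetical reference pool
pool <- data.frame(id = c("R1", "R2", "R3", "R4"), sex = c("F", "F", "M", "M"),
                   age = c(50, 31, 44, 25), handedness = c(90, 75, 65, 100),
                   WTAR = c(95, 112, 98, 120), stringsAsFactors = FALSE)

# standardize on the pooled sample so each variable carries equal weight
both   <- rbind(tar[, vars], pool[, vars])
centre <- colMeans(both)
spread <- apply(both, 2, sd)
z_tar  <- scale(tar[, vars],  center = centre, scale = spread)
z_pool <- scale(pool[, vars], center = centre, scale = spread)

# hypothetical helper: closest same-sex reference for target row i
nearest_reference <- function(i) {
  ok <- which(pool$sex == tar$sex[i])                   # exact match on sex
  diffs <- sweep(z_pool[ok, , drop = FALSE], 2, z_tar[i, ])
  d <- sqrt(rowSums(diffs^2))                           # Euclidean distance
  pool$id[ok[which.min(d)]]                             # smallest distance wins
}

matches <- data.frame(target = tar$id,
                      reference = vapply(seq_len(nrow(tar)), nearest_reference,
                                         character(1)),
                      stringsAsFactors = FALSE)
print(matches)
```

Z-scoring first keeps a variable with a large numeric range (handedness here) from dominating the distance. That is one simple way to treat the three criteria as equally weighted.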

Note: We considered harmonization separately from the CCA analysis. All eligible HCs were considered for harmonization. In contrast, only eligible participants who also completed all neurocognition and social cognition assessments were included in the CCA. As a result, some matched target participants (n=4: SPN01_CMP_0183, SPN01_CMP_0202, SPN01_CMP_0218, SPN01_ZHP_0108) are part of the matching set but are not included in the CCA analysis. We judged this acceptable, and preferable to matching only within the HCs included in the CCA analysis (a smaller pool).

#data cleaning - for this analysis, remove the human phantoms (we know they match); participant IDs are 14 characters long
rish <- subset(rish, nchar(as.character(reference)) == 14)
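Filtering on `nchar() == 14` relies on every real participant ID having exactly 14 characters, as `SPN01_CMP_0183` does. A pattern check states that assumption explicitly and can double as an assertion after the subset. The regular expression below is an assumption inferred from the IDs shown in this document. The phantom-style ID is hypothetical, since the actual phantom ID format is not shown here.

```r
# ASSUMPTION: participant IDs follow "SPN01_<3-letter site>_<4 digits>",
# inferred from IDs such as SPN01_CMP_0183; the third ID is a made-up
# stand-in for a phantom ID.
ids <- c("SPN01_CMP_0183", "SPN01_ZHP_0108", "SPN01_CMH_PHA_001")

keep_by_length  <- nchar(ids) == 14                          # length rule
keep_by_pattern <- grepl("^SPN01_[A-Z]{3}_[0-9]{4}$", ids)   # explicit format

print(ids[keep_by_pattern])
# the two rules agree on these IDs
stopifnot(identical(keep_by_length, keep_by_pattern))
```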

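#data munging - lists of reference and target IDs (as character, so c() below cannot fall back to factor codes)
referenceIDs <- unique(as.character(rish$reference)) #35 unique -- the same reference participants are reused across matches
targetIDs <- as.character(rish$target) #65
rishIDs <- c(referenceIDs, targetIDs)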
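The phantom filter calls `as.character(reference)`, which suggests the ID columns may be factors. If so, combining them with `c()` depends on the R version. Before R 4.1.0, `c()` on two factors returns their integer codes, so a later `%in%` against real IDs would silently match nothing. The toy vectors below are hypothetical and show the version-independent form.

```r
# Toy demonstration with hypothetical IDs: combining ID vectors safely.
ref_ids <- factor(c("SPN01_CMH_0001", "SPN01_CMH_0002"))
tar_ids <- factor(c("SPN01_CMP_0183", "SPN01_ZHP_0108"))

# converting to character first gives the same result in every R version;
# on R < 4.1.0, c(ref_ids, tar_ids) would return integer codes instead
all_ids <- c(as.character(ref_ids), as.character(tar_ids))
print(all_ids)
```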

#list of participants that are eligible and have complete data for CCA
include <- df$record_id #406

#check -- are all the reference IDs in the include list?
referenceIDs %in% include #all TRUE!

#check -- are all target IDs in the include list?
targetIDs %in% include #4 FALSE!

#which participants were used as a target match, but not included in CCA?
idx <- which(!targetIDs %in% include) # 27 28 29 64

#find ID for those indices -- so all of these are ok to include!
targetIDs[27] #SPN01_CMP_0183 -- unable to contact
targetIDs[28] #SPN01_CMP_0202 -- withdrew consent for further participation
targetIDs[29] #SPN01_CMP_0218 -- didn't complete all scog/ncog
targetIDs[64] #SPN01_ZHP_0108 -- didn't complete all scog/ncog
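Hard-coding positions 27, 28, 29, and 64 ties the lookup to the current row order of `rish`. A position-free alternative is `setdiff()`, which returns the unique elements of the first vector that are absent from the second. The IDs below are made up. If duplicate targets must be kept, `targetIDs[!targetIDs %in% include]` does the same without deduplicating.

```r
# Toy sketch with hypothetical IDs: find targets missing from the CCA list
# without hard-coding their positions.
target_ids  <- c("SPN01_CMP_0101", "SPN01_CMP_0183", "SPN01_ZHP_0108")
include_ids <- c("SPN01_CMP_0101", "SPN01_CMH_0001")

not_in_cca <- setdiff(target_ids, include_ids)
print(not_in_cca)
```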

#also, these participants are included in the 'df_eligible' df

#include only the HCs
df_eligible <- df_eligible[df_eligible$group == 'HC',] #185

#keep only the columns needed for plotting
df_eligible <- df_eligible[, c('record_id', 'site', 'age', 'handedness', 'wtar_std')]

#rename WTAR variable
names(df_eligible)[names(df_eligible)=='wtar_std'] <- 'WTAR'

#make a variable controlling point size: 2 for matched (reference or target) participants, 1 for everyone else
df_eligible$emphasis <- ifelse(df_eligible$record_id %in% rishIDs, 2, 1)

#merge in the match data, to add a target column used to draw the connecting line segments
df_eligible <- merge(df_eligible, rish, by.x = 'record_id', by.y='reference', all.x=TRUE)
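One reference can be matched to several targets, so the left join in this merge produces one row per reference-target pair. Unmatched participants and targets keep a single row with `target` set to `NA`. That layout is what the line segments need. A side effect is that a reused reference is plotted more than once, as overlapping points. The data below are hypothetical.

```r
# Toy sketch (hypothetical IDs and values): left join of participants onto
# matched pairs. A reference used for two targets appears on two rows.
people <- data.frame(record_id = c("REF_A", "REF_B", "TAR_1", "TAR_2", "TAR_3"),
                     age = c(30, 40, 31, 29, 41), stringsAsFactors = FALSE)
pairs  <- data.frame(reference = c("REF_A", "REF_A", "REF_B"),
                     target    = c("TAR_1", "TAR_2", "TAR_3"),
                     stringsAsFactors = FALSE)

joined <- merge(people, pairs, by.x = "record_id", by.y = "reference",
                all.x = TRUE)   # keep every participant, matched or not
print(joined)
```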

#make sure that id variables are a factor
df_eligible$record_id <- as.factor(df_eligible$record_id)
df_eligible$target <- as.factor(df_eligible$target)

#write a function that returns a df containing only one site pairing: CMH (reference) plus the given target site
df_function <- function(site){
  df <- df_eligible[(df_eligible$site == 'CMH' | df_eligible$site == site) , ]
  df$record_id <- droplevels(df$record_id)
  return(df)
}

#write a plotting function - plots each site-pairing df separately
#var is the column name as a string, e.g. 'age'
library(ggplot2)
library(dplyr)

plot_function <- function(df, var, colour, title, yaxis, ymin, ymax){

  #underlying scatter plot; colours are assigned in alphabetical order of site,
  #so this assumes the reference site 'CMH' sorts before the target site
  plt <- ggplot() +
    geom_point(data=df, mapping=aes(x=record_id, y=.data[[var]], color=site, size=emphasis), show.legend = FALSE) +
    scale_color_manual(values=c('#d53628', colour)) +
    theme_bw() + 
    theme(axis.text.x=element_blank(),
          axis.ticks.x=element_blank(),
          panel.grid=element_blank()) + 
    labs(x='participant',
         y=yaxis,
         title=title) + 
    scale_y_continuous(limits = c(ymin, ymax)) +
    scale_size_continuous(range = c(2, 4)) + #set size of emphasized points (matches)
    scale_x_discrete(expand=c(0, 10), #create buffer so x axis points don't overlap y axis
      breaks=levels(df$record_id), #unique IDs (merge duplicated reused references)
      limits=levels(df$record_id))

  #make a new dataframe containing data needed for the line segments,
  #keeping only pairs whose target belongs to this site pairing
  df_new <- df %>% select(record_id, target) %>% filter(!is.na(target), target %in% df$record_id)
  df_new$var_ref <- df[[var]][match(df_new$record_id, df$record_id)]
  df_new$var_tar <- df[[var]][match(df_new$target, df$record_id)]
  
  #plot the line segments on top of scatter plot
  plt + geom_segment(data=df_new, mapping=aes(
    x=record_id, y=var_ref, 
    xend=target, yend=var_tar))
}
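Stripped of styling, the plotting pattern in `plot_function` comes down to two layers. Points are drawn for every participant, and segments run from each reference to its target, both on a shared discrete x scale. Giving each layer its own `data=` keeps external vectors out of `aes()`. The sketch below assumes ggplot2 3.0.0 or later, which provides the `.data` pronoun. The IDs and ages are invented.

```r
# Minimal sketch with hypothetical IDs and values: points for every
# participant, segments joining each reference to its target.
library(ggplot2)

pts <- data.frame(record_id = c("REF_A", "REF_B", "TAR_1", "TAR_2"),
                  site = c("CMH", "CMH", "CMP", "CMP"),
                  age = c(30, 40, 31, 41), stringsAsFactors = FALSE)
segs <- data.frame(record_id = c("REF_A", "REF_B"),
                   target = c("TAR_1", "TAR_2"), stringsAsFactors = FALSE)

var <- "age"  # column chosen by name, as a string
segs$var_ref <- pts[[var]][match(segs$record_id, pts$record_id)]
segs$var_tar <- pts[[var]][match(segs$target, pts$record_id)]

plt <- ggplot() +
  geom_point(data = pts, aes(x = record_id, y = .data[[var]], colour = site)) +
  geom_segment(data = segs, aes(x = record_id, y = var_ref,
                                xend = target, yend = var_tar)) +
  scale_x_discrete(limits = pts$record_id) +   # fixed participant order
  theme_bw()

print(plt)
```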

Age

Handedness

WTAR (IQ)