This experiment will assess the impact of three masking strategies on the calculation of a similarity metric (Pearson's r).
If needed, the analysis scripts are available.
We want to assess how different masking strategies, including pairwise deletion, pairwise inclusion, and a standard brain mask, impact the calculation of a reasonable similarity metric, Pearson's r. This is a solid starting analysis (that perhaps should have been done first) because we have clear expectations about what the output should look like: I hypothesize that introducing "faux zeros" into the calculation, as both pairwise inclusion and the brain mask do, will "dilute" the result, meaning the variance will decrease and the overall correlation scores will be weaker. We think that pairwise deletion with thresholded maps will do best at preserving the "gold standard" differences (unthresholded vs. unthresholded).
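To make the strategies concrete, here is a toy sketch on simulated vectors (hypothetical data, not our real maps): the second "map" is thresholded at +/-1.96, and we compute Pearson's r under pairwise deletion (zeroed values dropped) versus pairwise inclusion (faux zeros kept).
set.seed(1)
x = rnorm(500) # unthresholded "map"
y = x + rnorm(500) # a second, correlated "map"
y_thr = ifelse(abs(y) > 1.96, y, 0) # thresholded version: subthreshold values zeroed
survived = which(y_thr != 0)
cor(x[survived],y_thr[survived]) # pairwise deletion: drop the zeroed values
cor(x,y_thr) # pairwise inclusion: keep the "faux zeros"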
There will be three parts to this experiment.
The threshold of +/-1.96 was chosen because it corresponds to a p-value of 0.05, commonly used in analyses with Z-score maps.
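As a quick check of that correspondence in R:
2 * pnorm(-1.96) # two-tailed p-value for |Z| >= 1.96
## [1] 0.04999579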
First we will generate the following data matrices (output):
For each image *di* in the data:
  For each strategy in *pd*, *pi*, *bm*:
    Calculate Pearson's r for *di* using that strategy against all thresholded maps
We will first look at the overall differences in the distributions of the scores under the three masking strategies. To start, we need to read in all of our data.
# Read in the input files
indir = "/home/vanessa/Documents/Work/BRAINMETA/IMAGE_COMPARISON/analysis/masking_scores"
# Similarity Scores
pearson_pd = read.csv(paste(indir,"/144_masking_pd.tsv",sep=""),sep="\t",stringsAsFactors=FALSE)
pearson_pi = read.csv(paste(indir,"/144_masking_pi.tsv",sep=""),sep="\t",stringsAsFactors=FALSE)
pearson_bm = read.csv(paste(indir,"/144_masking_bm.tsv",sep=""),sep="\t",stringsAsFactors=FALSE)
pearson_gs = read.csv(paste(indir,"/144_masking_gs.tsv",sep=""),sep="\t",stringsAsFactors=FALSE)
# Mask Size Differences
pd_vs_pi = read.csv(paste(indir,"/144_pd_vs_pi_sizediff.tsv",sep=""),sep="\t",stringsAsFactors=FALSE)
pd_vs_bm = read.csv(paste(indir,"/144_pd_vs_bm_sizediff.tsv",sep=""),sep="\t",stringsAsFactors=FALSE)
bm_vs_pi = read.csv(paste(indir,"/144_pi_vs_bm_sizediff.tsv",sep=""),sep="\t",stringsAsFactors=FALSE)
# Write a function to format the data frames
format_df = function(df){
colnames(df)[1] = "ID"
colnames(df) = gsub("X","",colnames(df)) # strip the "X" that R prepends to numeric column names
rownames(df) = df[,1] # use the image IDs as row names
df = df[,-1] # drop the ID column
return(df)
}
# Format all dataframes
pearson_gs = format_df(pearson_gs)
pearson_pd = format_df(pearson_pd)
pearson_pi = format_df(pearson_pi)
pearson_bm = format_df(pearson_bm)
pd_vs_pi = format_df(pd_vs_pi)
pd_vs_bm = format_df(pd_vs_bm)
bm_vs_pi = format_df(bm_vs_pi)
Just for fun / visualization, what do the scores look like?
library(pheatmap)
# Here is the gold standard, we will save the row/col ordering
gs = pheatmap(pearson_gs,main="Gold Standard Pearson Scores for Pairwise Unthresholded Maps")
col_index = gs$tree_col$order
row_index = gs$tree_row$order
# Reorder our remaining data according to the indices above
pearson_pd = pearson_pd[row_index,col_index]
pearson_pi = pearson_pi[row_index,col_index]
pearson_bm = pearson_bm[row_index,col_index]
# Now use pheatmap (with no clustering to preserve ordering)
pheatmap(pearson_pd,main="Pairwise Deletion Pearson Scores for Unthresholded vs +/-1.96 maps",cluster_rows=FALSE,cluster_cols=FALSE)
pheatmap(pearson_pi,main="Pairwise Inclusion Pearson Scores for Unthresholded vs +/-1.96 maps",cluster_rows=FALSE,cluster_cols=FALSE)
pheatmap(pearson_bm,main="Pairwise Brain Mask Pearson Scores for Unthresholded vs +/-1.96 maps",cluster_rows=FALSE,cluster_cols=FALSE)
Wow, right off the bat we see that the pairwise deletion scores have more intense values (darker reds and blues), indicative of stronger correlations. Could the hypothesis that zeros "dilute" the scores be true? Let's quickly look at differences, first from the gold standard:
# Write a function to calculate absolute differences, convert to Z scores
# (using the overall mean and standard deviation of the differences), and plot
z_diff = function(mat1,mat2,title){
differences = abs(mat1 - mat2)
meand = mean(as.vector(as.matrix(differences)))
std = sd(as.vector(as.matrix(differences)))
pheatmap((differences-meand)/std,main=title,cluster_rows=FALSE,cluster_cols=FALSE)
}
# Pairwise Deletion vs Gold Standard
z_diff(pearson_gs,pearson_pd,"Gold Standard - Pairwise Deletion, Z Scores")
# Pairwise Inclusion vs Gold Standard
z_diff(pearson_gs,pearson_pi,"Gold Standard - Pairwise Inclusion, Z Scores")
# Brain Mask vs Gold Standard
z_diff(pearson_gs,pearson_bm,"Gold Standard - Brain Mask, Z Scores")
# Pairwise Deletion vs Pairwise Inclusion
z_diff(pearson_pd,pearson_pi,"Pairwise Deletion - Pairwise Inclusion, Z Scores")
Overall, I don't see much difference between pairwise inclusion and using a brain mask. I also don't see huge differences between pairwise deletion and inclusion, although there are a few scattered points with darker reds that are concerning. Let's use box-and-whisker plots to better demonstrate the differences in the distributions.
We need our data in a flat (long) format.
library(reshape2)
gs_vector = as.vector(as.matrix(pearson_gs))
pd_vector = as.vector(as.matrix(pearson_pd))
pi_vector = as.vector(as.matrix(pearson_pi))
bm_vector = as.vector(as.matrix(pearson_bm))
pearsons_df = data.frame(GS=gs_vector,PD=pd_vector,PI=pi_vector,BM=bm_vector)
pearsons_flat = melt(pearsons_df)
## No id variables; using all as measure variables
head(pearsons_flat)
## variable value
## 1 GS 0.52345568
## 2 GS 0.51691586
## 3 GS 0.94408453
## 4 GS 0.09526037
## 5 GS 0.72440952
## 6 GS 0.52868152
# Now plot!
library(ggplot2)
plot.new() #ggplot2 and RMarkdown bug
gp = ggplot(pearsons_flat, aes(variable,value,fill=variable))
gp + geom_boxplot() + ylab("Pearson's r") + ggtitle("Pearson Correlations for Different Masking Strategies")
These plots are not very convincing for showing differences, because we don't get a sense of how the distributions are shaped. Let's try density plots.
plot.new() #ggplot2 and RMarkdown bug
gp = ggplot(pearsons_flat,aes(x=value, fill=variable)) + geom_density(alpha=0.25)
gp + ggtitle("Comparing Density Plots of Pearson Correlations") + ylab("Density") + xlab("Pearson's r")
There are marginal differences in the actual distributions of the overall scores, and the means would likely be significantly different under a t-test (something to do formally for a paper). We can see that pairwise deletion is most in line with the gold standard distribution, while PI and BM have more "diluted" values (closer to 0).
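Since each strategy's scores are paired with the gold standard scores for the same image pairs, a paired t-test is probably the right flavor of that comparison. A minimal sketch using the vectors above:
# Sketch only: paired t-tests of each strategy's scores against the gold standard
t.test(gs_vector,pd_vector,paired=TRUE)
t.test(gs_vector,pi_vector,paired=TRUE)
t.test(gs_vector,bm_vector,paired=TRUE)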
Next we will look at the overall mask sizes.
pd_vs_bm_vector = as.vector(as.matrix(pd_vs_bm))
pd_vs_pi_vector = as.vector(as.matrix(pd_vs_pi))
bm_vs_pi_vector = as.vector(as.matrix(bm_vs_pi))
mask_df = data.frame(pd_vs_bm_vector,pd_vs_pi_vector,bm_vs_pi_vector)
mask_flat = melt(mask_df)
## No id variables; using all as measure variables
plot.new() #ggplot2 and RMarkdown bug
gp = ggplot(mask_flat,aes(x=variable, y=value,fill=variable)) + geom_boxplot(alpha=0.25)
gp + ggtitle("Comparing Differences in Mask Sizes") + ylab("Differences in Mask Sizes") + xlab("")
As I would have expected, there isn't much difference in size between the brain mask and pairwise inclusion strategies (both introduce lots of extra zeros). There are substantial size differences between pairwise deletion and the other two.
This is the most important part. We were just looking at overall distributions of scores, but what we are really interested in is whether using a different masking strategy changes the ranking of the scores. Here is where we can test the "rank scoring" procedure that was being developed for experiment 1. This time, however, we will use Kendall's tau and Spearman's rho to assess changes in the lists. I have read about both of these metrics, and it seems that rho generally returns higher scores but is more sensitive to "big" changes (e.g., an item moving from the end of the list to the top will decrease rho much more than it decreases Kendall's tau). I don't need to worry about choosing among the tau-a, tau-b, or tau-c variants because we will not have any case of two items having the same ranking.
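As a quick sanity check of that sensitivity claim, take a hypothetical ranking of 20 items and move the last-ranked item to the top: rho drops further than tau does.
original = 1:20
moved = c(20,1:19) # the bottom-ranked item jumps to first place
cor(original,moved,method="spearman")
## [1] 0.7142857
cor(original,moved,method="kendall")
## [1] 0.8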
To reiterate the procedure:
For each *gsr* and each masking strategy *pd*, *pi*, and *bm*:
  Compare *gsr* with each of *pdr*, *pir*, and *bmr* by calculating Spearman's rho and Kendall's tau
  Save these values in lists, then look at the result! Boom!
# We will save our results in a data frame with format
# imageid strategy rho-pvalue rho tau-pvalue tau
results = c()
# For each *gsr* and each masking strategy *pd*,*pi*,and *bm*
for (g in 1:nrow(pearson_gs)){
rowname = rownames(pearson_gs)[g]
# Image IDs ordered by absolute similarity score: one ranking per strategy
# (unlist turns the one-row data frame into a named vector so it sorts cleanly)
gsr = as.numeric(names(sort(abs(unlist(pearson_gs[g,])),decreasing=TRUE)))
pdr = as.numeric(names(sort(abs(unlist(pearson_pd[g,])),decreasing=TRUE)))
pir = as.numeric(names(sort(abs(unlist(pearson_pi[g,])),decreasing=TRUE)))
bmr = as.numeric(names(sort(abs(unlist(pearson_bm[g,])),decreasing=TRUE)))
# PAIRWISE DELETION
rho = cor.test(gsr, pdr, method = "spearman", conf.level = 0.95)
tau = cor.test(gsr, pdr, method = "kendall", conf.level = 0.95)
row = cbind(rowname,"PD",rho$p.value,rho$estimate,tau$p.value,tau$estimate)
results = rbind(results,row)
# PAIRWISE INCLUSION
rho = cor.test(gsr, pir, method = "spearman", conf.level = 0.95)
tau = cor.test(gsr, pir, method = "kendall", conf.level = 0.95)
row = cbind(rowname,"PI",rho$p.value,rho$estimate,tau$p.value,tau$estimate)
results = rbind(results,row)
# BRAIN MASK
rho = cor.test(gsr, bmr, method = "spearman", conf.level = 0.95)
tau = cor.test(gsr, bmr, method = "kendall", conf.level = 0.95)
row = cbind(rowname,"BM",rho$p.value,rho$estimate,tau$p.value,tau$estimate)
results = rbind(results,row)
}
colnames(results) = c("imageid","strategy","rho-pvalue","rho","tau-pvalue","tau")
rownames(results) = seq(1,nrow(results))
results = as.data.frame(results,stringsAsFactors=FALSE)
head(results)
## imageid strategy rho-pvalue rho
## 1 539 PD 0.507820203370622 0.0555702917771883
## 2 539 PI 0.227000418958694 0.101237842617153
## 3 539 BM 0.354901087832974 0.0775862068965517
## 4 183 PD 0.0736754225063921 0.1495337995338
## 5 183 PI 0.358836822462811 0.0769552286793666
## 6 183 BM 0.125094833955927 0.128369905956113
## tau-pvalue tau
## 1 0.489630409515365 0.0388500388500388
## 2 0.218773405623669 0.0691530691530692
## 3 0.336880841888808 0.054001554001554
## 4 0.0769446605729214 0.0994560994560995
## 5 0.35454905091148 0.0520590520590521
## 6 0.109725968599869 0.0899378399378399
Finally, we need to split this large data frame by masking strategy, correct for multiple comparisons, and see whether any images show significant differences in the score rankings.
pd_result = results[results$strategy=="PD",]
pi_result = results[results$strategy=="PI",]
bm_result = results[results$strategy=="BM",]
# Correct each of rho and tau separately
# We have to convert character --> numeric because rbind gave us a character matrix!
# PAIRWISE DELETION
hist(p.adjust(as.numeric(as.character(pd_result[,3])),method="fdr"),main="Rho FDR P-Values for PD",col="tomato") # Rho
hist(p.adjust(as.numeric(as.character(pd_result[,5])),method="fdr"),main="Tau FDR P-Values for PD",col="violet") # Tau
# PAIRWISE INCLUSION
hist(p.adjust(as.numeric(as.character(pi_result[,3])),method="fdr"),main="Rho FDR P-Values for PI",col="tomato") # Rho
hist(p.adjust(as.numeric(as.character(pi_result[,5])),method="fdr"),main="Tau FDR P-Values for PI",col="violet") # Tau
# BRAIN MASK
hist(p.adjust(as.numeric(as.character(bm_result[,3])),method="fdr"),main="Rho FDR P-Values for BM",col="tomato") # Rho
hist(p.adjust(as.numeric(as.character(bm_result[,5])),method="fdr"),main="Tau FDR P-Values for BM",col="violet") # Tau
Finally, we create a table counting the images that are significantly different:
counts = rbind(table(p.adjust(as.numeric(as.character(pd_result[,3])),method="fdr") <= 0.05),
table(p.adjust(as.numeric(as.character(pd_result[,5])),method="fdr") <= 0.05),
table(p.adjust(as.numeric(as.character(pi_result[,3])),method="fdr") <= 0.05),
table(p.adjust(as.numeric(as.character(pi_result[,5])),method="fdr") <= 0.05),
table(p.adjust(as.numeric(as.character(bm_result[,3])),method="fdr") <= 0.05),
table(p.adjust(as.numeric(as.character(bm_result[,5])),method="fdr") <= 0.05))
rownames(counts) = c("PD_RHO","PD_TAU","PI_RHO","PI_TAU","BM_RHO","BM_TAU")
colnames(counts) = c("NOT_SIG","SIG_DIFF")
counts
## NOT_SIG SIG_DIFF
## PD_RHO 133 11
## PD_TAU 135 9
## PI_RHO 142 2
## PI_TAU 142 2
## BM_RHO 135 9
## BM_TAU 138 6
If I were writing this up for a paper, I would next look specifically at the maps with significant differences to better understand what is driving this result!
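For reference, a minimal sketch of how those images could be pulled out, shown here for the pairwise deletion rho p-values:
# Image IDs flagged as significant for PD (rho, FDR-corrected)
pd_fdr = p.adjust(as.numeric(as.character(pd_result[,3])),method="fdr")
pd_result$imageid[pd_fdr <= 0.05]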