Pilot and Affect Data Analysis and Plotting- AN

This is a markdown document for taking a look at the data from both the Pilot version of our Crossmodality Toolkit study, and the Affect data collected as part of that study.

In previous RMDs, we have loaded in the raw data and sanitized it, adding some coding columns.

Here we will further collapse that data using melt() from the reshape package

library(tidyverse)
library(reshape2)
library(plyr)
library(doBy)

pilotdata <- read.csv("F:/Google Drive/GitHub Repos/Crossmodality-Toolkit/data/cleandatatemp.csv")
affectdata <- read.csv("F:/Google Drive/GitHub Repos/Crossmodality-Toolkit/data/affectcleandatatemp.csv")

Now we need to compare the columns and make sure we have the same columns in both data frames

colnames(pilotdata)

##  [1] "X"            "trialNum"     "rt"           "subject"     
##  [5] "condition"    "Focal1"       "Focal2"       "Focal3"      
##  [9] "InducerL"     "InducerR"     "ConcurrentL"  "ConcurrentR" 
## [13] "choice"       "LeftPair"     "Resp"         "MagRespCorr" 
## [17] "IndDomainL2"  "ConDomainL2"  "ConDomainR2"  "Comparison"  
## [21] "Prediction"   "Resp2"        "RespCorrPred"

colnames(affectdata)

##  [1] "X"            "trialNum"     "rt"           "subject"     
##  [5] "condition"    "InducerL"     "InducerR"     "ConcurrentL" 
##  [9] "ConcurrentR"  "choice"       "LeftPair"     "Resp"        
## [13] "MagRespCorr"  "IndDomainL2"  "ConDomainL2"  "ConDomainR2" 
## [17] "Comparison"   "Prediction"   "Resp2"        "RespCorrPred"

setdiff(colnames(pilotdata), colnames(affectdata))

## [1] "Focal1" "Focal2" "Focal3"

So the only columns that are different are ‘focal1’, ‘focal2’, and ‘focal3’, which are in the pilotdata but not the affectdata - to make life easy on ourselves we will just get rid of those columns (we won’t use them in any analyses anyways)

pilotdata <- subset(pilotdata, select = -c(Focal1, Focal2, Focal3))
setdiff(colnames(pilotdata), colnames(affectdata))

## character(0)

Now we are going to melt all of this data into one longform data frame with a single “Correctness” column

First, we add a column to each data frame saying where the data comes from

Second, we rbind the dataframes together

Third, we aggregate (with SummaryBy()) the data per participant

XTH, we melt the two RespCorr columns together, adding a column that says which Prediction set we are using

#Check that the same comparisons are made in both data frames
affectcomps <- sort(unique(affectdata$Comparison))
pilotcomps <- sort(unique(pilotdata$Comparison))

setdiff(affectcomps, pilotcomps[1:44])

## character(0)

affectdata$Source <- "Affect"
pilotdata$Source <- "Pilot"

data <- rbind(pilotdata, affectdata)

dataAgg <- summaryBy(MagRespCorr + RespCorrPred ~ 
                       subject + Source + condition +
                       IndDomainL2 + ConDomainL2 + Comparison,
                     data= data, Fun = c(mean))

colnames(dataAgg) <-c('Subject', "Source", "Condition", "Inducer", "Concurrent", "Comparison",
                      "Magnitude", "Lit Review")

MoltenData <- melt(dataAgg,
                   variable.name = "Correctness",
                   id.vars = c ("Subject", "Source", "Condition", "Inducer", "Concurrent",
                                "Comparison"))

colnames(MoltenData) <- c('Subject', 'Source', 'Condition', 'Inducer', 'Concurrent', 'Comparison',
                          'Prediction', 'pCorrect')

So, that’s our data in Molten format, which means that there is only one depedent variable of “correctness”, and then enough unique identifiers that we can plot where all of that data comes from (i.e. whether the Source is the Pilot or the Affect Data, and whether the “Correctness” is based on Magnitude Symbolism or our predictions.)

Now that means we can start to look at the data visually.

ggplot(data=MoltenData, aes(x = factor(0), y = pCorrect)) + 
  geom_jitter(aes(colour = Source, shape = Prediction),size = 0.8, alpha =0.4) +
  ggtitle("Pilot Data - Proportion Correct by Comparison by Condition") +
  scale_y_continuous("Proportion Correct", breaks = c(0, 0.5, 1), 
                     labels = c("0", "0.5", "1"), limits = c(0, 1)) +
  scale_x_discrete('', breaks = '', labels = '')+
  theme(axis.text = element_text(size=8)) +
  theme(plot.title = element_text(size=16, face="bold", hjust=0, color="#666666")) +
  #facet_grid(IndL ~ ConL) +
  theme(strip.text.x = element_text(size = 8, colour = "black"),
        strip.text.y = element_text(size = 8, colour = "black", angle = 0),
        strip.background = element_rect(fill= "#FFFFFF"))

You can see a graph like this is pretty useless- that’s because the pCorrect column only has a few values in it: ‘r unique(MoltenData$pCorrect)’. Because our current melted data set include Participant as an ID feature though, we just have a lot of data in very few bins - most graphs are much easier to interpret when data is aggregated across participants as well, especially when the data is binomial, so lets do that

dataAgg2 <- summaryBy(MagRespCorr + RespCorrPred ~ 
                       Source + condition +
                       IndDomainL2 + ConDomainL2 + Comparison,
                     data= data, Fun = c(mean))

colnames(dataAgg2) <-c("Source", "Condition", "Inducer", "Concurrent", "Comparison",
                      "Magnitude", "Lit Review")

MoltenData2 <- melt(dataAgg2,
                   variable.name = "Correctness",
                   id.vars = c ("Source", "Condition", "Inducer", "Concurrent",
                                "Comparison"))



colnames(MoltenData2) <- c('Source', 'Condition', 'Inducer', 'Concurrent', 'Comparison',
                          'Prediction', 'pCorrect')

So now we have a data frame that is a little bit more amenable to plotting- lets see what we can do with it.

ggplot(data=MoltenData2, aes(x = factor(0), y = pCorrect)) + 
  geom_jitter(aes(colour = Source, shape = Prediction),size = 0.8, alpha =0.4) +
  ggtitle("Pilot Data - Proportion Correct by Comparison by Condition") +
  scale_y_continuous("Proportion Correct", breaks = c(0, 0.5, 1), 
                     labels = c("0", "0.5", "1"), limits = c(0, 1)) +
  scale_x_discrete('', breaks = '', labels = '')+
  theme(axis.text = element_text(size=8)) +
  theme(plot.title = element_text(size=16, face="bold", hjust=0, color="#666666")) +
  #facet_grid(IndL ~ ConL) +
  theme(strip.text.x = element_text(size = 8, colour = "black"),
        strip.text.y = element_text(size = 8, colour = "black", angle = 0),
        strip.background = element_rect(fill= "#FFFFFF"))

## Warning: Removed 205 rows containing missing values (geom_point).

ggplot(data=MoltenData2, aes(x = factor(0), y = pCorrect)) + 
  geom_boxplot(aes(colour = Source, linetype = Prediction),size = 0.8, alpha =0.4) +
  ggtitle("Pilot Data - Proportion Correct by Comparison by Condition") +
  scale_y_continuous("Proportion Correct", breaks = c(0, 0.25, 0.5, 0.75, 1), 
                     labels = c("0", "0.25", "0.5", "0.75", "1"), limits = c(0, 1)) +
  scale_x_discrete('', breaks = '', labels = '')+
  theme(axis.text = element_text(size=8)) +
  theme(plot.title = element_text(size=16, face="bold", hjust=0, color="#666666")) +
  #facet_grid(IndL ~ ConL) +
  theme(strip.text.x = element_text(size = 8, colour = "black"),
        strip.text.y = element_text(size = 8, colour = "black", angle = 0),
        strip.background = element_rect(fill= "#FFFFFF"))

So it looks like Magnitude Symbolism actually does a pretty damn good job at predicting performance - there is a big spread in the data, to be sure, but the mean is close to 75% correct in the Affect-Only data and ~65% in the Pilot. The predictions generated by our literature review are a little bit higher than an explanation that only involves Magnitude Symbolism, but not by much!

But of course, we aren’t really here for the overall effects- that’s an interesting bit of trivia, but what we want to know about is the associations between all of the various domains.

The first thing we can look at then is the overall performance for each type of comparison

ggplot(data=MoltenData2, aes(x = factor(0), y = pCorrect)) + 
  geom_jitter(aes(colour = Source, shape = Prediction),size = 0.8, alpha =0.4) +
  ggtitle("Pilot Data - Proportion Correct by Comparison by Condition") +
  scale_y_continuous("Proportion Correct", breaks = c(0, 1), 
                     labels = c("0", "1"), limits = c(0, 1)) +
  scale_x_discrete('', breaks = '', labels = '')+
  theme(axis.text = element_text(size=8)) +
  theme(plot.title = element_text(size=16, face="bold", hjust=0, color="#666666")) +
  facet_wrap( ~ Comparison, ncol =12) +
  theme(strip.text.x = element_text(size = 4, colour = "black"),
        strip.text.y = element_text(size = 8, colour = "black", angle = 0),
        strip.background = element_rect(fill= "#FFFFFF"))

## Warning: Removed 188 rows containing missing values (geom_point).

Okay, that doesn’t actually look very impressive - partially because ggplot has a maximum plot size for displaying in a document. We can get around this by saving the file in a larger format and looking at it outside of R Markdown though

ggplot(data=MoltenData2, aes(x = factor(0), y = pCorrect)) + 
  geom_jitter(aes(colour = Source, shape = Prediction),size = 0.8, alpha =0.4) +
  ggtitle("Pilot Data - Proportion Correct by Comparison by Condition") +
  scale_y_continuous("Proportion Correct", breaks = c(0, 1), 
                     labels = c("0", "1"), limits = c(0, 1)) +
  scale_x_discrete('', breaks = '', labels = '')+
  theme(axis.text = element_text(size=8)) +
  theme(plot.title = element_text(size=16, face="bold", hjust=0, color="#666666")) +
  facet_wrap( ~ Comparison, ncol =12) +
  theme(strip.text.x = element_text(size = 4, colour = "black"),
        strip.text.y = element_text(size = 8, colour = "black", angle = 0),
        strip.background = element_rect(fill= "#FFFFFF"))

## Warning: Removed 205 rows containing missing values (geom_point).

ggsave("pCorrbyComparison.png", plot = last_plot(), 
       path = "F:/Google Drive/GitHub Repos/Crossmodality-Toolkit/scripts/figures/",
       width = 12, height = 12, units = c("in", "cm", "mm"),
       dpi = 300)

## Warning: Removed 205 rows containing missing values (geom_point).

So if you want, you can go take a look at that graph, but it’s actually not super informative - making use of the Comparison column is something we may want to do statistically, but it doesn’t lend itself to very pretty graphs.

For that, we want to use a facet grid with Inducer on the rows and Concurrent on the columns

Let’s take a look at that

ggplot(data=MoltenData2, aes(x = factor(0), y = pCorrect)) + 
  geom_jitter(aes(colour = Source, shape = Prediction),size = 0.8, alpha =0.4) +
  ggtitle("Pilot Data - Proportion Correct by Comparison by Condition") +
  scale_y_continuous("Proportion Correct", breaks = c(0, 1), 
                     labels = c("0", "1"), limits = c(0, 1)) +
  scale_x_discrete('', breaks = '', labels = '')+
  theme(axis.text = element_text(size=8)) +
  theme(plot.title = element_text(size=16, face="bold", hjust=0, color="#666666")) +
  facet_grid(Inducer ~ Concurrent, switch = 'y') +
  theme(strip.text.x = element_text(size = 4, colour = "black"),
        strip.text.y = element_text(size = 8, colour = "black", angle = 180),
        strip.background = element_rect(fill= "#FFFFFF"))

## Warning: Removed 190 rows containing missing values (geom_point).

ggsave("pCorrFacet.png", plot = last_plot(), 
       path = "F:/Google Drive/GitHub Repos/Crossmodality-Toolkit/scripts/figures/",
       width = 12, height = 12, units = c("in", "cm", "mm"),
       dpi = 300)

## Warning: Removed 216 rows containing missing values (geom_point).

So this graph is a little bit more interesting - you can see that generally there isn’t much difference depending on the source of the data, so lets get rid of that (and switch to a boxplot)

ggplot(data=MoltenData2, aes(x = factor(0), y = pCorrect)) + 
  geom_boxplot(aes(colour = Prediction),size = 0.8, alpha =0.4) +
  ggtitle("Pilot Data - Proportion Correct by Comparison by Condition") +
  scale_y_continuous("Proportion Correct", breaks = c(0, 1), 
                     labels = c("0", "1"), limits = c(0, 1)) +
  scale_x_discrete('', breaks = '', labels = '')+
  theme(axis.text = element_text(size=8)) +
  theme(plot.title = element_text(size=16, face="bold", hjust=0, color="#666666")) +
  facet_grid(Inducer ~ Concurrent, switch = 'y') +
  theme(strip.text.x = element_text(size = 4, colour = "black"),
        strip.text.y = element_text(size = 8, colour = "black", angle = 180),
        strip.background = element_rect(fill= "#FFFFFF"))

ggsave("pCorrFacet-NS.png", plot = last_plot(), 
       path = "F:/Google Drive/GitHub Repos/Crossmodality-Toolkit/scripts/figures/",
       width = 12, height = 12, units = c("in", "cm", "mm"),
       dpi = 300)

This graph type isn’t seeming super useful- let’s try to take a look at outputting this as a heatmap instead.

Unfortunately, this requires reshaping our data once again

library(stats)
colnames(dataAgg2)[7] <- "LitReview"

dataAgg3 <- aggregate(LitReview ~ Inducer + Concurrent, dataAgg2, mean)

heatmapdata <-as.data.frame(xtabs(LitReview ~ Inducer + Concurrent, data = dataAgg3))
heatmapdata <- subset(heatmapdata, Freq != 0)

heatmapdata$Freq2 <- (heatmapdata$Freq - 0.5) *2

ggplot(data= heatmapdata, aes(x=Concurrent, y=Inducer, fill=Freq2)) +
  geom_tile(color = "white") +
  scale_fill_gradient2(low = "red", high = "blue", mid = "white", 
                       midpoint = 0, limit = c(-1,1),
                       name="Prediction Strength") +
  theme_minimal()+ 
  theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1))+
  coord_fixed()

So that makes for a pretty neat graph, but maybe it doesn’t actually make sense to do a heatmap of what we have called “Correctness” - Instead we want a heatmap of the CHOICES that participants are making. Lets take a look at producing that.

To do this, we’re going to need to re-aggregate the data based on Resp2 (where 1= High goes with High, 0 = Not) rather than our two correctness predictors

We’re also going to recode the data so it is more explicit what participants think goes with what

dataAgg4 <- summaryBy(Resp2 ~ 
                       Source + IndDomainL2 + ConDomainL2 + Comparison,
                     data= data, Fun = c(mean))
colnames(dataAgg4) <-c("Source", "Inducer", "Concurrent", "Comparison", "Response")
dataAgg4$Response2 <- (dataAgg4$Response - 0.5) *2

RespDataAgg <- summaryBy(Resp2 ~ 
                       IndDomainL2 + ConDomainL2 + Comparison,
                     data= data, Fun = c(mean))
colnames(RespDataAgg) <-c("Inducer", "Concurrent", "Comparison", "Response")
RespDataAgg$Response2 <- (RespDataAgg$Response - 0.5) *2

RespDataAggPilot <- subset (dataAgg4, Source == "Pilot")
RespDataAggAffect <- subset(dataAgg4, Source == "Affect")

Domains <- sort(unique(RespDataAgg$Inducer))
HighValues <- c("Excited/Bored", "Happy/Sad", "Pleased/Disgusted", "Stressed/Calm",
                "Loud/Quiet", "Bright/Dark", "Red/Blue", "Red/Green", "Red/Yellow",
                "Yellow/Blue", "Noisy/Tonal", "High Pitch/Low Pitch", "Jagged/Curvy", 
                "Large/Small", "Fast/Slow")

RespDataAgg$Inducer2 <- mapvalues(RespDataAgg$Inducer, from = Domains, to= HighValues)
RespDataAgg$Concurrent2 <- mapvalues(RespDataAgg$Concurrent, from = Domains, to= HighValues)

RespDataAggPilot$Inducer2 <- mapvalues(RespDataAggPilot$Inducer, from = Domains, to= HighValues)
RespDataAggPilot$Concurrent2 <- mapvalues(RespDataAggPilot$Concurrent, from = Domains, to= HighValues)

RespDataAggAffect$Inducer2 <- mapvalues(RespDataAggAffect$Inducer, from = Domains, to= HighValues)
RespDataAggAffect$Concurrent2 <- mapvalues(RespDataAggAffect$Concurrent, from = Domains, to= HighValues)

ggplot(data= RespDataAgg, aes(x=Concurrent2, y=Inducer2, fill=Response2)) +
  geom_tile(color = "white") +
  ggtitle("Biases - All Data") +
  scale_fill_gradient2(low = "red", high = "blue", mid = "white", 
                       midpoint = 0, limit = c(-1,1),
                       name="Direction and Strength of Associaton") +
  theme_minimal()+ 
  theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1))+
  coord_fixed()

ggplot(data= RespDataAggAffect, aes(x=Concurrent2, y=Inducer2, fill=Response2)) +
  geom_tile(color = "white") +
  ggtitle("Biases - Affect Data") +
  scale_fill_gradient2(low = "red", high = "blue", mid = "white", 
                       midpoint = 0, limit = c(-1,1),
                       name="Direction and Strength of Associaton") +
  theme_minimal()+ 
  theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1))+
  coord_fixed()

ggplot(data= RespDataAggPilot, aes(x=Concurrent2, y=Inducer2, fill=Response2)) +
  geom_tile(color = "white") +
  ggtitle("Biases - Pilot") +
  scale_fill_gradient2(low = "red", high = "blue", mid = "white", 
                       midpoint = 0, limit = c(-1,1),
                       name="Direction and Strength of Associaton") +
  theme_minimal()+ 
  theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1))+
  coord_fixed()

Pilot and Affect Data Analysis and Plotting- AN

Alan Nielsen

September 13, 2017