This is a markdown document for basic data analysis and visualisation. At the moment, it only includes analysis and visualisation of our Simulated Data Set.

In previous RMDs, we have loaded in the raw data and sanitized it, adding some coding columns.

Here we will further reshape and collapse that data using melt() from the reshape2 package.

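# Load plyr before tidyverse so that plyr does not mask dplyr functions
library(plyr)
library(tidyverse)
library(reshape2)
library(doBy)
library(scales)
library(lmerTest)  # also attaches lme4, which provides glmer()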

CleanData <- read.csv("F:/Google Drive/GitHub Repos/Crossmodality-Toolkit/data/CleanData.csv")
simdata <- subset(CleanData, DataSet == "Simulated")
pilotdata <- subset(CleanData, DataSet == "Pilot")

Correctness Analysis

Now we can start looking at our "Correctness" data. This requires reshaping our data frame differently than before.

Before, we had our DV (Response) in a single column (long, or "skinny", format), so we just had to aggregate that column.

For Correctness, our data is spread across three columns (wide format), one per set of predictions, so we need to melt the data frame into long format first.
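
To make the wide-to-long step concrete, here is a minimal sketch on a toy data frame. The column names PredA, PredB and PredC are hypothetical stand-ins for our three correctness columns, and the values are made up for illustration only.

```r
library(reshape2)

# Hypothetical wide-format data: one 0/1 correctness column per prediction set
wide <- data.frame(
  Subject  = c(1, 1, 2, 2),
  TrialNum = c(1, 2, 1, 2),
  PredA    = c(1, 0, 1, 1),
  PredB    = c(0, 0, 1, 1),
  PredC    = c(1, 1, 0, 1)
)

# Every column not listed in id.vars is stacked into a single "value" column;
# the original column name is stored in the new "Prediction" column
long <- melt(wide,
             id.vars = c("Subject", "TrialNum"),
             variable.name = "Prediction")

print(long)
```

Each original row becomes three rows, one per prediction set, so the melted frame has three times as many rows as the wide one.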

# Getting rid of numerical response column that we aren't using
CorrData <- subset(pilotdata, select = -c(Response))

# Melting data
CorrData <- melt(CorrData,
                 variable.name = "Prediction",
                 id.vars = c("DataSet", "Subject", "Condition", "TrialNum", "Inducer",
                             "Concurrent", "Comparison"))

# Aggregating Data

CorrDataAgg <- aggregate(value ~ Prediction + Subject + Condition + Inducer + Concurrent + Comparison,
                         CorrData, mean)

That gives us correctness data for our three sets of predictions, aggregated to one mean per Prediction, Subject, Condition, Inducer, Concurrent and Comparison combination, which is a convenient form for fitting GLM/LMER models.
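
As a quick illustration of what the formula interface of aggregate() does, here is a small sketch on made-up long-format data. The subjects, prediction labels and values are hypothetical.

```r
# Hypothetical long-format correctness data (1 = correct, 0 = incorrect)
long <- data.frame(
  Subject    = rep(c("S1", "S2"), each = 4),
  Prediction = rep(c("PredA", "PredB"), times = 4),
  value      = c(1, 0, 1, 1, 0, 0, 1, 0)
)

# One mean per Prediction x Subject cell; the left-hand side of the formula
# is the column being summarised, the right-hand side defines the cells
agg <- aggregate(value ~ Prediction + Subject, data = long, FUN = mean)

print(agg)
```

Each row of the result is the proportion of correct trials within a cell, not a single 0/1 outcome.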

FullModel <- glmer(value ~ Prediction + (1|Subject), data = CorrDataAgg, family = binomial)
## Warning in eval(family$initialize, rho): non-integer #successes in a
## binomial glm!
summary(FullModel)
## Generalized linear mixed model fit by maximum likelihood (Laplace
##   Approximation) [glmerMod]
##  Family: binomial  ( logit )
## Formula: value ~ Prediction + (1 | Subject)
##    Data: CorrDataAgg
## 
##      AIC      BIC   logLik deviance df.resid 
##  13408.4  13437.2  -6700.2  13400.4     9896 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -2.2210 -0.9921  0.5960  0.7622  1.1299 
## 
## Random effects:
##  Groups  Name        Variance Std.Dev.
##  Subject (Intercept) 0.129    0.3591  
## Number of obs: 9900, groups:  Subject, 61
## 
## Fixed effects:
##                     Estimate Std. Error z value Pr(>|z|)    
## (Intercept)          0.22552    0.05821   3.874 0.000107 ***
## PredictionLitReview  0.05824    0.05032   1.157 0.247146    
## PredictionAffect     0.01895    0.05026   0.377 0.706205    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr) PrdcLR
## PrdctnLtRvw -0.431       
## PrdctnAffct -0.431  0.499
anova(FullModel)
## Analysis of Variance Table
##            Df Sum Sq Mean Sq F value
## Prediction  2 1.3317 0.66585  0.6658
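
The "non-integer #successes" warning from the fit above appears because the response is a cell mean (a proportion) while the binomial family expects either 0/1 outcomes or counts of successes. The sketch below reproduces the warning with base glm() on made-up data and shows two common ways around it. The data frame, the column names prop and n, and the values are all hypothetical; glmer() also takes a weights argument, but whether a weighted fit or a trial-level fit on the unaggregated 0/1 data suits our design is a separate decision.

```r
# Hypothetical aggregated data: proportion correct out of n trials per cell
agg <- data.frame(
  Prediction = factor(rep(c("PredA", "PredB"), each = 3)),
  prop       = c(0.50, 0.75, 0.25, 0.75, 1.00, 0.50),
  n          = rep(4, 6)
)

# Fitting bare proportions triggers the warning; capture its message
warn_msg <- tryCatch(
  {
    glm(prop ~ Prediction, data = agg, family = binomial)
    NA_character_
  },
  warning = function(w) conditionMessage(w)
)
print(warn_msg)

# Option 1: keep the proportion as response, pass the trial counts as weights
fit_w <- glm(prop ~ Prediction, data = agg, family = binomial, weights = n)

# Option 2: model successes and failures directly as a two-column response
fit_c <- glm(cbind(prop * n, n - prop * n) ~ Prediction,
             data = agg, family = binomial)

# Both parameterisations describe the same model
print(coef(fit_w))
print(coef(fit_c))
```

Either option tells the model how many trials stand behind each proportion, which the unweighted fit on means cannot know.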