First, we need to collect all of the data into one place.
files <- list.files(path = "C:/Users/Alan/Documents/GitHub/MitsMark/Data/Pilot50/",
                    pattern = "\\.csv$", full.names = TRUE)  #anchored regex so only .csv files match
tables <- lapply(files, read.csv, header = TRUE, sep = ",")
PilotRaw <- do.call(rbind, tables)
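One caveat: do.call(rbind, tables) will error if any of the CSVs has a different set of columns. If that ever happens, plyr’s rbind.fill() (plyr gets loaded further down anyway) is a forgiving alternative; a minimal sketch:
#rbind.fill() pads any missing columns with NA instead of erroring
PilotRaw <- plyr::rbind.fill(tables)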
That’s all the raw data, but jsPsych outputs everything together, including the answers to the explanation trials, the answers to the ethics approval questions, etc. (participants who do not give full consent are not allowed to do the experiment, so we can ignore those things).
So let’s just separate this into Training Trials (which we likely won’t look at) and Testing Trials (which we are interested in).
#Fortunately we can subset these out really easily by their trial_index values
PilotTraining <- subset(PilotRaw, trial_index >= 15 & trial_index < 63)
PilotTesting <- subset(PilotRaw, trial_index >= 64 & trial_index <= 111)
#Now we can label participants and get rid of a bunch of columns
PilotTesting$ParticipantID <- paste0("x", rep(1:(nrow(PilotTesting) / 48), each = 48))
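That labelling assumes every participant contributes exactly 48 testing trials and that the rows are in file order; a cheap guard with base R’s stopifnot() catches the first assumption breaking:
#Guard: the row count must be an exact multiple of 48 trials per participant
stopifnot(nrow(PilotTesting) %% 48 == 0)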
PilotTesting <- subset(PilotTesting, select = c("ParticipantID", "trialNum", "TrialType", "word", "sound",
                                                "WordType", "TrueMeaning", "TargetMeaning", "answer", "correct", "rt"))
colnames(PilotTesting) <- c("ParticipantID", "TrialNum", "TrialType", "English", "Japanese", "WordType",
                            "TrueMeaning", "TargetMeaning", "Answer", "Correct", "RT")
round(mean(PilotTesting$Correct) * 100, digits = 2)
## [1] 64.2
So that’s the basic data. The first thing we can obviously look at is just omnibus performance: how good were people at the task overall?
That’s pretty easy: it’s 64.2%, which isn’t great but is better than the preliminary pilot data.
What we’re really interested in, though, is the difference between the 6 types of words. Most simply, we can look at each type with a single tapply, and we can also visualise it easily enough.
tapply(PilotTesting$Correct, PilotTesting$WordType, mean)
##                NAdj       NRI        RI     RNAdj      RNRI       SRI
##        NA 0.7152778 0.6319444 0.6388889 0.6319444 0.6250000 0.6087963
(The stray unnamed NA entry is most likely an empty factor level left over from subsetting; droplevels() would clear it.)
PilotTestingAgg <- aggregate(Correct ~ WordType, data = PilotTesting, mean)
PilotTestingAggSD <- aggregate(Correct ~ WordType, data = PilotTesting, sd)
PilotTestingAgg$SE <- PilotTestingAggSD$Correct / sqrt(432)  #SE with n = 432 observations per word type
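The 432 is hard-coded; if the design ever changes, it is safer to derive the per-cell n from the data itself. A sketch using a third aggregate call (PilotTestingAggN is a hypothetical name):
PilotTestingAggN <- aggregate(Correct ~ WordType, data = PilotTesting, length)
PilotTestingAgg$SE <- PilotTestingAggSD$Correct / sqrt(PilotTestingAggN$Correct)
This works because all three aggregate calls return their rows in the same WordType order.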
library(ggplot2)
library(ggthemes)
ggplot(data = PilotTestingAgg, aes(x = WordType, y = Correct)) +
  geom_bar(stat = "identity") +  #the data are already aggregated means, so no summary stat is needed
  geom_errorbar(aes(ymin = Correct - SE, ymax = Correct + SE), width = 0.2) +  #width is a fixed parameter, not an aesthetic
  labs(x = "Word Type", y = "Proportion Correct") +
  scale_y_continuous(breaks = c(0, 0.25, 0.5, 0.75, 1.00), limits = c(0, 1)) +
  theme_tufte()
So that’s super surprising: Normal Adjectives are the best, which is definitely not what we’ve found in previous work.
Why might this be the case? It could be the effects of individual words or sets (yoked participants), but we won’t have data to look at those. What we can look at, however, is trial type (targets vs. distractors).
PilotTestingAgg2 <- aggregate(Correct ~ WordType + TrialType, data = PilotTesting, mean)
PilotTestingAggSD2 <- aggregate(Correct ~ WordType + TrialType, data = PilotTesting, sd)
PilotTestingAgg2$SE <- PilotTestingAggSD2$Correct / sqrt(216)  #SE with n = 216 observations per WordType x TrialType cell
ggplot(data = PilotTestingAgg2, aes(x = WordType, y = Correct, fill = TrialType)) +
  geom_bar(stat = "identity", position = position_dodge(width = 0.9)) +
  #geom_errorbar(aes(ymin = Correct - SE, ymax = Correct + SE), width = 0.2) +
  labs(x = "Word Type", y = "Proportion Correct") +
  scale_y_continuous(breaks = c(0, 0.25, 0.5, 0.75, 1.00), limits = c(0, 1)) +
  theme_tufte()
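The error bars are commented out above; if they are wanted later, they need the same dodge as the bars or they will all sit at the undodged x positions. A sketch of the dodged version:
#Same plot with error bars dodged to match the bars
ggplot(data = PilotTestingAgg2, aes(x = WordType, y = Correct, fill = TrialType)) +
  geom_bar(stat = "identity", position = position_dodge(width = 0.9)) +
  geom_errorbar(aes(ymin = Correct - SE, ymax = Correct + SE),
                width = 0.2, position = position_dodge(width = 0.9)) +
  labs(x = "Word Type", y = "Proportion Correct") +
  scale_y_continuous(breaks = c(0, 0.25, 0.5, 0.75, 1.00), limits = c(0, 1)) +
  theme_tufte()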
So that’s actually pretty interesting: the main reason Normal Adjectives appear to be best is that they are robust to distractors. For everything else (except maybe Split Reduplicative Ideophones), Target Trials are performed about equally well, but Normal Adjectives have a pretty clear advantage for Distractor Trials.
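We can put a rough number on that gap with a single tapply (this assumes TrialType is coded "T"/"D", as the model output below suggests):
GapTable <- tapply(PilotTesting$Correct, list(PilotTesting$WordType, PilotTesting$TrialType), mean)
sort(GapTable[, "T"] - GapTable[, "D"])  #smallest gap = most robust to distractors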
The statistical analysis might look a bit like this:
library(lme4)
## Loading required package: Matrix
##
## Attaching package: 'Matrix'
## The following object is masked from 'package:tidyr':
##
## expand
library(lmerTest)
##
## Attaching package: 'lmerTest'
## The following object is masked from 'package:lme4':
##
## lmer
## The following object is masked from 'package:stats':
##
## step
library(plyr)
## -------------------------------------------------------------------------
## You have loaded plyr after dplyr - this is likely to cause problems.
## If you need functions from both plyr and dplyr, please load plyr first, then dplyr:
## library(plyr); library(dplyr)
## -------------------------------------------------------------------------
##
## Attaching package: 'plyr'
## The following objects are masked from 'package:dplyr':
##
## arrange, count, desc, failwith, id, mutate, rename, summarise,
## summarize
## The following object is masked from 'package:purrr':
##
## compact
#First we need to add some columns to the PilotTesting file, recoding WordType into three binary factors (Ideo, Redup, Nat)
PilotTesting$Ideo <- mapvalues(PilotTesting$WordType,
from = c("RI", "SRI", "NRI", "RNRI", "NAdj", "RNAdj"),
to = c("Y", "Y", "Y", "Y", "N", "N"))
PilotTesting$Redup <- mapvalues(PilotTesting$WordType,
from = c("RI", "SRI", "NRI", "RNRI", "NAdj", "RNAdj"),
to = c("Y", "N", "N", "Y", "N", "Y"))
PilotTesting$Nat <- mapvalues(PilotTesting$WordType,
from = c("RI", "SRI", "NRI", "RNRI", "NAdj", "RNAdj"),
to = c("Y", "N", "Y", "N", "Y", "N"))
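Recodings like this are easy to get wrong silently, so it is worth a quick cross-tabulation to confirm each WordType maps onto exactly one level of each new factor:
#Every nonzero count should sit in a single column per row
with(PilotTesting, table(WordType, Ideo))
with(PilotTesting, table(WordType, Redup))
with(PilotTesting, table(WordType, Nat))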
PilotTestingAgg3 <- aggregate(Correct ~ Ideo + Redup + Nat + TrialType + ParticipantID,
                              data = PilotTesting, mean)
FirstModel <- lmer(Correct ~ Ideo * Redup * Nat * TrialType +
                     (1 | ParticipantID) + (1 | Ideo:ParticipantID) + (1 | Redup:ParticipantID),
                   data = PilotTestingAgg3)
## fixed-effect model matrix is rank deficient so dropping 4 columns / coefficients
stepFM <- step(FirstModel)
## fixed-effect model matrix is rank deficient so dropping 4 columns / coefficients
## fixed-effect model matrix is rank deficient so dropping 4 columns / coefficients
## fixed-effect model matrix is rank deficient so dropping 4 columns / coefficients
## fixed-effect model matrix is rank deficient so dropping 4 columns / coefficients
## fixed-effect model matrix is rank deficient so dropping 4 columns / coefficients
## fixed-effect model matrix is rank deficient so dropping 4 columns / coefficients
## fixed-effect model matrix is rank deficient so dropping 4 columns / coefficients
FinalModel <- get_model(stepFM)
anova(FinalModel)
## Missing cells for: IdeoN:RedupY:NatY, IdeoN:RedupN:NatN, IdeoN:RedupY:NatY:TrialTypeD, IdeoN:RedupN:NatN:TrialTypeD, IdeoN:RedupY:NatY:TrialTypeT, IdeoN:RedupN:NatN:TrialTypeT.
## Interpret type III hypotheses with care.
## Type III Analysis of Variance Table with Satterthwaite's method
##                           Sum Sq Mean Sq NumDF DenDF F value   Pr(>F)
## Ideo                     0.22005 0.22005     1   583  3.6249  0.05741 .
## Redup                    0.00815 0.00815     1   583  0.1343  0.71419
## Nat                      0.03704 0.03704     1   583  0.6101  0.43507
## TrialType                1.23633 1.23633     1   583 20.3659 7.74e-06 ***
## Ideo:Redup               0.15755 0.15755     1   583  2.5953  0.10772
## Redup:Nat                0.00231 0.00231     1   583  0.0381  0.84525
## Ideo:TrialType           0.13903 0.13903     1   583  2.2903  0.13073
## Redup:TrialType          0.34843 0.34843     1   583  5.7396  0.01690 *
## Nat:TrialType            0.00000 0.00000     1   583  0.0000  1.00000
## Ideo:Redup:TrialType     0.00709 0.00709     1   583  0.1168  0.73268
## Redup:Nat:TrialType      0.03704 0.03704     1   583  0.6101  0.43507
## Ideo:Nat
## Ideo:Redup:Nat
## Ideo:Nat:TrialType
## Ideo:Redup:Nat:TrialType
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
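The blank rows are the terms dropped for rank deficiency: the six word types only fill six of the eight Ideo x Redup x Nat cells (there is, for instance, no reduplicated natural non-ideophone), so those interactions simply cannot be estimated. The one reliable interaction is Redup:TrialType; a quick cell-means sketch shows its direction:
#Mean accuracy by reduplication and trial type, over the participant-level means
with(PilotTestingAgg3, tapply(Correct, list(Redup, TrialType), mean))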