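# Packages used below; helpers.R may already load them, in which case these calls are harmless.
library(dplyr)    # %>%, group_by(), summarise(), mutate()
library(ggplot2)  # ggplot() and geoms
library(lme4)     # glmer()
source("/Users/titlis/cogsci/teaching/_2016/PSYCH140/assignments/optional/results/rscripts/helpers.R")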
Load data.
d = read.table("/Users/titlis/cogsci/teaching/_2016/PSYCH140/assignments/optional/results/data/rawdata.txt",header=T,sep="\t",quote="")
We have 679 data points.
First, we clean the data so that agents, patients, patient animacy, and gestured order are labeled consistently across experimenters.
d$Agent = tolower(d$Agent)
d$Patient = tolower(d$Patient)
d$Agent = gsub("(^| )(50's woman|50's Woman|50's woman|50s woman|5o's women|woman|Woman|girl|woman|Woman)($| )","50s_woman",as.character(d$Agent))
d$Agent = gsub("(^| )(chef|Chef|chef)($| )","chef",as.character(d$Agent))
d$Agent = gsub("(^| )(mountie|Mountie|mountie )($| )","mountie",as.character(d$Agent))
d$Patient = gsub("(^| )(chair|Chair)($| )","chair",as.character(d$Patient))
d$Patient = gsub("(^| )(chef)($| )","chef",as.character(d$Patient))
d$Patient = gsub("(^| )(conductor)($| )","conductor",as.character(d$Patient))
d$Patient = gsub("(^| )(flight attendant|flight attendent)($| )","flight_attendant",as.character(d$Patient))
d$Patient = gsub("(^| )(coyboy|cowboy)($| )","cowboy",as.character(d$Patient))
d$Patient = gsub("(^| )(mailbox)($| )","mailbox",as.character(d$Patient))
d$Patient = gsub("(^| )(statue)($| )","statue",as.character(d$Patient))
d$Patient = gsub("(^| )(table)($| )","table",as.character(d$Patient))
d$PatientAnimacy = gsub("(^| )(animate|Animate|animate )($| )","animate",as.character(d$PatientAnimacy))
d$PatientAnimacy = gsub("(^| )(inanimate|Inanimate|inanimate |inanimiate|inamimate)($| )","inanimate",as.character(d$PatientAnimacy))
d$GesturedOrder = gsub("(^| )(SOV )($| )","SOV",as.character(d$GesturedOrder))
d$GesturedOrder = gsub("(^| )(SVO )($| )","SVO",as.character(d$GesturedOrder))
d$GesturedOrder = toupper(d$GesturedOrder)
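The same kind of cleanup can also be written as a lookup table. The following is only a minimal sketch, not the script used for this report: the helper `normalize_labels`, the lookup vector, and the toy input are hypothetical, and the variant spellings are copied from the patterns above. Unlike the `gsub()` calls, it matches whole labels after trimming whitespace rather than substrings.

```r
# Hypothetical helper: trim whitespace, lowercase, then map known variant
# spellings onto one canonical label. Unknown labels pass through unchanged.
normalize_labels <- function(x, lookup) {
  cleaned <- tolower(trimws(as.character(x)))
  hit <- cleaned %in% names(lookup)
  cleaned[hit] <- unname(lookup[cleaned[hit]])
  cleaned
}

# Variant spellings taken from the gsub() patterns above
animacy_lookup <- c(
  "inanimiate" = "inanimate",
  "inamimate"  = "inanimate"
)

# Toy input, not taken from the real data file
toy <- c("Animate", "inanimate ", "inanimiate", "inamimate")
normalized <- normalize_labels(toy, animacy_lookup)
print(normalized)
```

Keeping the variants in one named vector makes it easy to see which spellings are being collapsed and to add new ones as they turn up.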
There should be an equal number of animate and inanimate patients if everyone ran the experiment correctly:
table(d$PatientAnimacy)
##
## animate inanimate
## 335 344
Almost! :)
How often was each word order gestured?
sort(table(d$GesturedOrder),decreasing=T)
##
## SVO V SOV OSV
## 194 150 125 90
## SV VO OV V/SOV
## 23 21 19 10
## (S)OV S/O V O/S V
## 7 6 5 3
## OV/SOV (S)VO O(S)V SVO/SOV
## 3 2 2 2
## V/SV VO/SOV VO/SVO (S/O)V
## 2 2 2 1
## CAN'T SEE MISSED THIS ONE OSOV OSV(O)
## 1 1 1 1
## OVS SOV/SOV SOV/SVO SV
## 1 1 1 1
## SVO/SVO UNCLEAR
## 1 1
A few things of note:
Many more orders were gestured than just the two (SVO and SOV) that we’re most interested in. In fact, only 47% of cases (319 of 679) were SVO or SOV.
The second most frequent order was just “V”, and there were many cases where only the verb and one of the event participants were gestured. Looking at some of the videos, this seems to have been the result of common ground issues: participants were more likely to gesture only the verb if the experimenter was sitting on their side of the screen and watching the videos with them. If we did this again, we would need to be very explicit about asking participants to gesture every action and every event participant.
In contrast to Gibson et al. (2013), we don’t find a default preference for the SOV order: SVO was gestured more frequently than SOV (194 vs. 125 cases).
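The share of SVO and SOV responses can be recomputed directly from the frequency table. This is a small sketch using counts copied by hand from the output above; the variable names are illustrative. As a proportion the share is about 0.47, i.e. roughly 47% of all responses.

```r
# Counts copied from the frequency table above
n_svo   <- 194
n_sov   <- 125
n_total <- 679

# Proportion of all responses that were one of the two target orders
prop_target <- (n_svo + n_sov) / n_total
cat(sprintf("%.1f%% of responses were SVO or SOV\n", 100 * prop_target))

# Among the target orders only, the proportion that was SOV
prop_sov <- n_sov / (n_svo + n_sov)
cat(sprintf("%.1f%% of SVO/SOV responses were SOV\n", 100 * prop_sov))
```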
Let’s analyze what we’re interested in: whether SVO is more likely when the patient is animate than when it’s inanimate. First, we restrict the dataset to the SVO and SOV cases:
dd = droplevels(d[d$GesturedOrder %in% c("SVO","SOV"),])
dd$SOV = ifelse(dd$GesturedOrder == "SOV",1,0)
dd$PatientAnimacy = as.factor(as.character(dd$PatientAnimacy))
Let’s visualize the data by plotting the proportion of SOV gestures as a function of patient animacy.
agr = dd %>%
group_by(PatientAnimacy) %>%
summarise(ProportionSOV=mean(SOV),ci.low=ci.low(SOV),ci.high=ci.high(SOV)) %>%
mutate(YMin=ProportionSOV-ci.low,YMax=ProportionSOV+ci.high)
agr = as.data.frame(agr)
ggplot(agr, aes(x=PatientAnimacy,y=ProportionSOV)) +
geom_bar(stat="identity") +
geom_errorbar(aes(ymin=YMin,ymax=YMax),width=.25) +
ylab("Proportion of SOV") +
xlab("Patient animacy")
Numerically, SOV is slightly more frequent when the patient is inanimate than when it’s animate (39.5% vs. 38.9%), which is the predicted direction. However, the difference is tiny relative to the width of the error bars, so these data provide no evidence for a difference between the two conditions.
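The error bars come from `ci.low()` and `ci.high()`, which are defined in helpers.R and not shown here. Because the plotting code computes `YMin` as the mean minus `ci.low` and `YMax` as the mean plus `ci.high`, they presumably return distances from the mean to the bounds of a confidence interval. The sketch below is one way such helpers could be written with a percentile bootstrap; the function names, the bootstrap approach, and the toy data are all assumptions, and the real definitions may differ.

```r
# Hypothetical stand-ins for ci.low() / ci.high(); the real definitions live
# in helpers.R and may differ.

# Means of n_boot resamples (with replacement) of x
boot_means <- function(x, n_boot = 1000) {
  replicate(n_boot, mean(sample(x, length(x), replace = TRUE)))
}

# Distance from the sample mean down to the 2.5th percentile of bootstrap means
boot_ci_low <- function(x, n_boot = 1000) {
  mean(x) - unname(quantile(boot_means(x, n_boot), 0.025))
}

# Distance from the sample mean up to the 97.5th percentile of bootstrap means
boot_ci_high <- function(x, n_boot = 1000) {
  unname(quantile(boot_means(x, n_boot), 0.975)) - mean(x)
}

set.seed(1)
sov <- rep(c(1, 0), times = c(40, 60))  # toy 0/1 responses, not the real data
y_min <- mean(sov) - boot_ci_low(sov)
y_max <- mean(sov) + boot_ci_high(sov)
print(c(lower = y_min, mean = mean(sov), upper = y_max))
```

With helpers of this shape, an interval spanning several percentage points in each direction would dwarf a difference of less than one percentage point between conditions.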
Let’s see if there’s a difference depending on who the agent was:
agr = dd %>%
group_by(PatientAnimacy,Agent) %>%
summarise(ProportionSOV=mean(SOV),ci.low=ci.low(SOV),ci.high=ci.high(SOV)) %>%
mutate(YMin=ProportionSOV-ci.low,YMax=ProportionSOV+ci.high)
agr = as.data.frame(agr)
dodge = position_dodge(.9)
ggplot(agr, aes(x=PatientAnimacy,y=ProportionSOV,fill=Agent)) +
geom_bar(stat="identity",position=dodge) +
geom_errorbar(aes(ymin=YMin,ymax=YMax),width=.25,position=dodge) +
ylab("Proportion of SOV") +
xlab("Patient animacy")
There is some variability across agents: for one agent (50s woman) the effect is in the predicted direction, for another (mountie) it’s in the opposite direction, and for the third (chef) there appears to be no effect at all.
Is there variation by participants? Did everyone use SOV around 40% of the time, or did some people always gesture one order while others always gestured another?
agr = dd %>%
group_by(PatientAnimacy,ParticipantNumber) %>%
summarise(ProportionSOV=mean(SOV),ci.low=ci.low(SOV),ci.high=ci.high(SOV)) %>%
mutate(YMin=ProportionSOV-ci.low,YMax=ProportionSOV+ci.high)
agr = as.data.frame(agr)
dodge = position_dodge(.9)
ggplot(agr, aes(x=PatientAnimacy,y=ProportionSOV)) +
geom_bar(stat="identity",position=dodge) +
geom_errorbar(aes(ymin=YMin,ymax=YMax),width=.25,position=dodge) +
facet_wrap(~ParticipantNumber) +
ylab("Proportion of SOV") +
xlab("Patient animacy")
It appears that most people have a strategy of either always gesturing SOV (e.g., AH1 or EH3) or always gesturing SVO (e.g., IS1, LDL1). Let’s look at the people who don’t have a consistent strategy – do they display the predicted effect?
ddiff = agr %>%
group_by(ParticipantNumber) %>%
summarise(SameStrategy = ProportionSOV[1] == ProportionSOV[2])
ddiff = as.data.frame(ddiff[!is.na(ddiff$SameStrategy),])
row.names(ddiff) = ddiff$ParticipantNumber
ddiffp = as.character(ddiff[!ddiff$SameStrategy,]$ParticipantNumber)
ggplot(agr[agr$ParticipantNumber %in% ddiffp,],aes(x=PatientAnimacy,y=ProportionSOV)) +
geom_bar(stat="identity",position=dodge) +
geom_errorbar(aes(ymin=YMin,ymax=YMax),width=.25,position=dodge) +
facet_wrap(~ParticipantNumber) +
ylab("Proportion of SOV") +
xlab("Patient animacy")
Here we see some weak evidence for the effect: six of the eight participants show a numerical difference in the predicted direction. Taken together, the plots already suggest that any effect is small at best, but we can test it formally. We ask: are the log-odds of SOV over SVO greater with inanimate patients, once we allow for random by-participant and by-agent variability in the overall rate of SOV?
m = glmer(SOV ~ PatientAnimacy + (1|ParticipantNumber) + (1|Agent), data=dd, family="binomial")
summary(m)
## Generalized linear mixed model fit by maximum likelihood (Laplace
## Approximation) [glmerMod]
## Family: binomial ( logit )
## Formula: SOV ~ PatientAnimacy + (1 | ParticipantNumber) + (1 | Agent)
## Data: dd
##
## AIC BIC logLik deviance df.resid
## 194.0 209.1 -93.0 186.0 315
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -3.5980 -0.1231 -0.0805 0.1157 2.0686
##
## Random effects:
## Groups Name Variance Std.Dev.
## ParticipantNumber (Intercept) 32.15 5.67
## Agent (Intercept) 0.00 0.00
## Number of obs: 319, groups: ParticipantNumber, 33; Agent, 3
##
## Fixed effects:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 0.8771 1.4049 0.624 0.532
## PatientAnimacyinanimate 0.6984 0.4995 1.398 0.162
##
## Correlation of Fixed Effects:
## (Intr)
## PtntAnmcynn -0.125
While the coefficient goes in the predicted direction (β = 0.70, SE = 0.50), the effect does not reach significance at the .05 level (z = 1.40, p = .16).
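To make the coefficient easier to interpret, it can be converted from log-odds to an odds ratio. The sketch below uses the estimate and standard error copied by hand from the fixed-effects table above. The interval is a simple Wald approximation, which is an assumption on our part and not something the model summary reports.

```r
# Values copied from the fixed-effects table above
estimate  <- 0.6984   # PatientAnimacyinanimate, in log-odds
std_error <- 0.4995

# Wald z statistic and two-sided p-value
z_value <- estimate / std_error
p_value <- 2 * pnorm(-abs(z_value))

# Odds ratio and approximate 95% Wald interval on the odds scale
odds_ratio <- exp(estimate)
wald_ci    <- exp(estimate + c(-1, 1) * qnorm(0.975) * std_error)

print(round(c(z = z_value, p = p_value, OR = odds_ratio,
              lower = wald_ci[1], upper = wald_ci[2]), 3))
```

An interval on the odds scale that includes 1 is consistent with there being no difference between animate and inanimate patients, which matches the non-significant p-value.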
Welcome to the world of data and thanks for participating as experimenters! :)