Set your working directory here and load the packages needed for visualization (ggplot2), tidy data handling (tidyr and dplyr), and mixed effects modeling (lme4):
setwd("/Users/titlis/cogsci/projects/esslli2016_corpuspragmatics/code_sheets/")
source("/Users/titlis/cogsci/projects/esslli2016_corpuspragmatics/factives/results/rscripts/helpers.R")
library(ggplot2)
library(tidyr)
library(dplyr)
library(lme4)
theme_set(theme_bw())
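The absolute paths above are tied to one machine. A minimal sketch of a more portable setup follows, assuming a hypothetical `project_dir` variable that points at the project root; the variable names and the default location are assumptions, not part of the original scripts.

```r
# Hypothetical project root; adjust to wherever the course materials live.
project_dir <- file.path("~", "cogsci", "projects", "esslli2016_corpuspragmatics")

# Build paths relative to the root instead of repeating the full prefix.
helpers_path <- file.path(project_dir, "factives", "results", "rscripts", "helpers.R")
corpus_path  <- file.path(project_dir, "factives", "factives_corpus_search",
                          "results", "swbdext.tab")

# Only source the helpers if the file is actually present on this machine.
if (file.exists(helpers_path)) source(helpers_path)
```

With this in place, only `project_dir` needs to change when the code moves to another computer.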
Read the database into R and explore it.
d = read.table("/Users/titlis/cogsci/projects/esslli2016_corpuspragmatics/factives/factives_corpus_search/results/swbdext.tab",sep="\t",header=T,quote="")
nrow(d)
## [1] 4705
head(d)
## Item_ID Verb
## 1 89:5 think
## 2 118:5 think
## 3 178:59 thought
## 4 214:7 think
## 5 215:37 think
## 6 236:41 think
## Complement
## 1 when she finally came to the realization that, you know, no, i can not, i can not take care of myself --n40242b
## 2 that they, they had a great deal of, um
## 3 --n404078 should be done --n40408b
## 4 that when she passed away --n404c45 it was probably one of the greatest
## 5 it would be
## 6 that what one thing that they were concerned --n405458 probably was the fact it wasn't necessarily, you know, like the quantity of care but the quality of, care
## Sentence
## 1 i think when she finally came to the realization that, you know, no, i can not, i can not take care of myself --n40242b.
## 2 i think that they, they had a great deal of, um,
## 3 and, uh, fortunately, we agreed, you know, on exactly, you know, what we thought --n404078 should be done --n40408b.
## 4 and i think that when she passed away --n404c45 it was probably one of the greatest,
## 5 um, i, i, i think it would be,
## 6 and, um, i, i, i think that what one thing that they were concerned --n405458 probably was the fact it wasn't necessarily, you know, like the quantity of care but the quality of, care.
## Environment Subject VerbNiteID CompNiteID
## 1 i sw2005_s54_2 sw2005_s54_503
## 2 i sw2005_s71_2 sw2005_s71_503
## 3 we sw2005_s104_21 sw2005_s104_519
## 4 i sw2005_s125_3 sw2005_s125_503
## 5 i sw2005_s126_14 sw2005_s126_514
## 6 i sw2005_s137_16 sw2005_s137_514
str(d)
## 'data.frame': 4705 obs. of 8 variables:
## $ Item_ID : Factor w/ 4705 levels "100037:5","100041:5",..: 4324 506 2369 2472 2476 2531 2630 2742 2770 2861 ...
## $ Verb : Factor w/ 35 levels "admit","admits",..: 32 32 35 32 32 32 32 32 32 32 ...
## $ Complement : Factor w/ 4496 levels "","--n10077 will be doing well this year",..: 4328 2736 317 2882 1496 2875 2241 2485 3036 4471 ...
## $ Sentence : Factor w/ 4516 levels "--ing seems --n40001a to be a, a topic that --n40003d's going --n40004c to probably take about a generation --n40006b --n400072"| __truncated__,..: 2922 2532 762 312 4053 863 2477 3204 1367 2952 ...
## $ Environment: Factor w/ 4 levels "","conditional",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ Subject : Factor w/ 132 levels "","--n10f7c",..: 83 83 128 83 83 83 83 83 83 83 ...
## $ VerbNiteID : Factor w/ 4705 levels "sw2005_s104_21",..: 9 10 1 2 3 4 5 6 7 8 ...
## $ CompNiteID : Factor w/ 4705 levels "sw2005_s104_519",..: 9 10 1 2 3 4 5 6 7 8 ...
summary(d)
## Item_ID Verb Complement
## 100037:5 : 1 think :3462 it is : 19
## 100041:5 : 1 know : 516 it's : 18
## 100166:10: 1 thought : 251 that's true : 12
## 100181:10: 1 believe : 129 you're right: 11
## 10019:18 : 1 remember: 41 that's right: 10
## 10023:9 : 1 realize : 36 it : 9
## (Other) :4699 (Other) : 270 (Other) :4626
## Sentence
## i think it is. : 9
## i think that's true. : 6
## i think you're right. : 6
## i think that's right. : 5
## but i, i really think that in, in terms of like this, i'd, i think that it, it might not be such a bad thing. because, i don't know that anybody, w-, in-, i don't know that anybody would feel good, you know, like if you let someone like that loose in your community.: 4
## i know in jamaica, uh, it think it's jamaica, i think it's jamaica, i know that they have, you know, crimes punishable by death, : 4
## (Other) :4671
## Environment Subject VerbNiteID
## :3946 i :4048 sw2005_s104_21: 1
## conditional: 113 you : 262 sw2005_s125_3 : 1
## negation : 460 they : 88 sw2005_s126_14: 1
## question : 186 we : 54 sw2005_s137_16: 1
## it : 40 sw2005_s153_2 : 1
## : 32 sw2005_s174_12: 1
## (Other): 181 (Other) :4699
## CompNiteID
## sw2005_s104_519: 1
## sw2005_s125_503: 1
## sw2005_s126_514: 1
## sw2005_s137_514: 1
## sw2005_s153_503: 1
## sw2005_s174_513: 1
## (Other) :4699
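Recode some variables (verb type, lemma). Any verb form that is not listed explicitly falls into the default category: "factive" for verb type and "other" for lemma.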
d$VerbType = as.factor(ifelse(d$Verb %in% c("believe","believed","thinking","thinks","thought","think"),"control",ifelse(d$Verb %in% c("saw","see","seeing","seemed","seems","seen","sees","sense"),"sense","factive")))
d$VerbLemma = as.factor(ifelse(d$Verb %in% c("believe","believed"),"believe",ifelse(d$Verb %in% c("thinking","thinks","thought","think"),"think",ifelse(d$Verb %in% c("saw","see","seeing"),"see",ifelse(d$Verb %in% c("know","knows","knew","known"),"know",ifelse(d$Verb %in% c("realize","realized"),"realize",ifelse(d$Verb %in% c("notice","noticed"),"notice","other")))))))
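The nested `ifelse()` calls can be hard to audit. As a sketch of an alternative, the same form-to-lemma mapping can be written as a named lookup vector; `lemma_map` and the toy `verbs` input are illustrative names, not part of the original code.

```r
# Map each inflected form to its lemma with a named character vector.
lemma_map <- c(
  believe = "believe", believed = "believe",
  think = "think", thinking = "think", thinks = "think", thought = "think",
  saw = "see", see = "see", seeing = "see",
  know = "know", knows = "know", knew = "know", known = "know",
  realize = "realize", realized = "realize",
  notice = "notice", noticed = "notice"
)

# Toy input; in the analysis this would be as.character(d$Verb).
verbs <- c("think", "knew", "realizes", "seem", "saw")

# Look up each form; anything not listed falls back to "other".
lemma <- unname(lemma_map[verbs])
lemma[is.na(lemma)] <- "other"
lemma
```

Listing the forms in one place makes it easier to spot forms that end up in the fallback category, such as "realizes" or "seem" in this toy input.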
Output overview of cases.
table(d$Verb,d$VerbType)
##
## control factive sense
## admit 0 17 0
## admits 0 1 0
## believe 129 0 0
## believed 2 0 0
## discover 0 3 0
## discovered 0 4 0
## foresaw 0 1 0
## forget 0 1 0
## forgotten 0 1 0
## knew 0 34 0
## know 0 516 0
## knowing 0 7 0
## known 0 1 0
## knows 0 10 0
## notice 0 9 0
## noticed 0 17 0
## realize 0 36 0
## realized 0 13 0
## realizes 0 1 0
## realizing 0 4 0
## recognize 0 6 0
## remember 0 41 0
## saw 0 0 2
## see 0 0 35
## seeing 0 0 1
## seem 0 4 0
## seemed 0 0 5
## seems 0 0 34
## seen 0 0 3
## sees 0 0 1
## sense 0 0 2
## think 3462 0 0
## thinking 34 0 0
## thinks 17 0 0
## thought 251 0 0
table(d$VerbLemma,d$Environment)
##
## conditional negation question
## believe 98 7 22 4
## know 476 19 55 11
## notice 22 0 1 3
## other 130 3 2 1
## realize 30 4 14 1
## see 29 4 4 1
## think 3161 76 362 165
Restrict to those cases where subject is pronominal or “people”.
d = droplevels(d[d$Subject %in% c("people","she","he","it","we","they","you","i"),])
d$SubjectPerson = as.factor(ifelse(d$Subject %in% c("i","we"), "1st",ifelse(d$Subject == "you","2nd",ifelse(d$Subject %in% c("it","he","she","people","they"),"3rd","other"))))
table(d$Verb,d$SubjectPerson,d$Environment)
## , , =
##
##
## 1st 2nd 3rd
## admit 9 0 0
## believe 81 1 2
## believed 1 0 0
## discover 1 0 0
## discovered 3 0 1
## foresaw 0 0 0
## forgotten 1 0 0
## knew 23 3 6
## know 376 21 23
## known 0 0 1
## knows 0 0 6
## notice 7 0 0
## noticed 14 0 1
## realize 9 3 2
## realized 8 0 2
## realizing 1 0 0
## recognize 2 0 1
## remember 38 1 0
## saw 1 0 0
## see 14 4 2
## seem 0 0 2
## seemed 0 0 4
## seems 0 0 32
## seen 2 0 0
## sees 0 0 1
## sense 1 1 0
## think 2758 38 26
## thinking 15 0 2
## thinks 1 1 7
## thought 204 3 30
##
## , , = conditional
##
##
## 1st 2nd 3rd
## admit 0 0 0
## believe 5 0 0
## believed 0 1 0
## discover 0 1 0
## discovered 0 0 0
## foresaw 0 1 0
## forgotten 0 0 0
## knew 0 0 0
## know 11 2 2
## known 0 0 0
## knows 0 0 1
## notice 0 0 0
## noticed 0 0 0
## realize 0 2 1
## realized 1 0 0
## realizing 0 0 0
## recognize 0 0 0
## remember 0 0 0
## saw 0 1 0
## see 2 0 1
## seem 0 0 0
## seemed 0 0 0
## seems 0 0 0
## seen 0 0 0
## sees 0 0 0
## sense 0 0 0
## think 66 3 2
## thinking 0 0 0
## thinks 0 0 0
## thought 3 0 1
##
## , , = negation
##
##
## 1st 2nd 3rd
## admit 0 0 0
## believe 20 0 2
## believed 0 0 0
## discover 0 0 0
## discovered 0 0 0
## foresaw 0 0 0
## forgotten 0 0 0
## knew 0 0 0
## know 48 1 3
## known 0 0 0
## knows 0 0 0
## notice 0 1 0
## noticed 0 0 0
## realize 12 1 0
## realized 0 0 0
## realizing 0 0 0
## recognize 0 0 0
## remember 0 0 0
## saw 0 0 0
## see 4 0 0
## seem 0 0 2
## seemed 0 0 0
## seems 0 0 0
## seen 0 0 0
## sees 0 0 0
## sense 0 0 0
## think 348 6 5
## thinking 0 0 0
## thinks 0 0 0
## thought 0 0 0
##
## , , = question
##
##
## 1st 2nd 3rd
## admit 0 0 0
## believe 0 4 0
## believed 0 0 0
## discover 0 0 0
## discovered 0 0 0
## foresaw 0 0 0
## forgotten 0 0 0
## knew 0 0 0
## know 0 9 1
## known 0 0 0
## knows 0 0 0
## notice 0 1 0
## noticed 0 2 0
## realize 0 0 0
## realized 0 0 1
## realizing 0 0 0
## recognize 0 0 0
## remember 0 0 0
## saw 0 0 0
## see 0 1 0
## seem 0 0 0
## seemed 0 0 0
## seems 0 0 0
## seen 0 0 0
## sees 0 0 0
## sense 0 0 0
## think 11 146 2
## thinking 0 1 0
## thinks 0 0 0
## thought 1 2 0
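Select verbs and, for each, sample items per environment (two for see, think, and believe; four for know and realize), keeping only complements longer than 12 characters. Questions are excluded for see and realize.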
see = droplevels(d[d$VerbLemma == "see",])
believe = droplevels(d[d$VerbLemma == "believe",])
think = droplevels(d[d$VerbLemma == "think",])
know = droplevels(d[d$VerbLemma == "know",])
realize = droplevels(d[d$VerbLemma == "realize",])
sampledsee = see %>%
filter(Environment != "question") %>%
filter(nchar(as.character(Complement)) > 12) %>%
group_by(Environment) %>%
sample_n(.,2)
sampledsee = as.data.frame(sampledsee)
sampledsee
## Item_ID Verb
## 1 27221:69 see
## 2 48236:7 saw
## 3 43219:23 see
## 4 96908:52 see
## 5 107158:12 see
## 6 151319:13 see
## Complement
## 1 that it's a lot better than what we have --n406ef5 now
## 2 that, it didn't stop crime in that state
## 3 it's a problem in the yard
## 4 that they should stop her
## 5 that it would work for probably the majority of people
## 6 that that's a very, very valid, uh, thing for a company to say --n4006d7
## Sentence
## 1 as much as i didn't like school when i was going through it --n406ea6, from my perspective now i can see that it's a lot better than what we have --n406ef5 now.
## 2 and i saw that, it didn't stop crime in that state.
## 3 so i'll remember that if we see it's a problem in the yard.
## 4 and, you know, if she wanted --n40618c to go to combat, i do not see that they should stop her.
## 5 but i can't see that it would work for probably the majority of people.
## 6 i really don't see that that's a very, very valid, uh, thing for a company to say --n4006d7,
## Environment Subject VerbNiteID CompNiteID VerbType VerbLemma
## 1 i sw2362_s211_23 sw2362_s211_525 sense see
## 2 i sw2540_s7_3 sw2540_s7_503 sense see
## 3 conditional we sw2492_s103_8 sw2492_s103_509 sense see
## 4 conditional i sw3096_s180_18 sw3096_s180_518 sense see
## 5 negation i sw3200_s87_5 sw3200_s87_504 sense see
## 6 negation i sw4049_s5_5 sw4049_s5_505 sense see
## SubjectPerson
## 1 1st
## 2 1st
## 3 1st
## 4 1st
## 5 1st
## 6 1st
nrow(sampledsee)
## [1] 6
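Because the sampling is random, rerunning the code draws different items each time. A minimal base R sketch of a repeatable per-environment sample follows; the seed value, the `sample_per_group` helper, and the toy data frame are all assumptions made for illustration.

```r
# Toy data standing in for one of the per-verb data frames.
toy <- data.frame(
  Environment = rep(c("", "conditional", "negation"), each = 5),
  Item_ID = sprintf("item%02d", 1:15),
  stringsAsFactors = FALSE
)

# Draw n rows per group, mirroring group_by() followed by sample_n().
sample_per_group <- function(df, group_col, n) {
  groups <- split(df, df[[group_col]])
  picked <- lapply(groups, function(g) g[sample.int(nrow(g), n), , drop = FALSE])
  out <- do.call(rbind, unname(picked))
  rownames(out) <- NULL
  out
}

set.seed(2016)  # arbitrary seed, chosen only for illustration
sampled <- sample_per_group(toy, "Environment", 2)
table(sampled$Environment)
```

Calling `set.seed()` with the same value before each sampling step should reproduce the same selection of items.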
sampledthink = think %>%
filter(nchar(as.character(Complement)) > 12) %>%
group_by(Environment) %>%
sample_n(.,2)
sampledthink = as.data.frame(sampledthink)
sampledthink
## Item_ID Verb
## 1 68457:69 think
## 2 13690:15 think
## 3 68161:44 think
## 4 130429:93 think
## 5 170495:12 think
## 6 132556:12 think
## 7 57888:9 think
## 8 37539:14 think
## Complement
## 1 --n407ddf is good because there's a lot of people that --n407e0e are very
## 2 we have probably conversed long enough
## 3 that's where they're, uh, they're coming from --n40086a
## 4 i'd become a vegetarian
## 5 our part of the country is particularly bad compared to some
## 6 it is anymore
## 7 that maybe that's why we had it this time --n401423
## 8 the royals are going --n404c0a to do --n404c19
## Sentence
## 1 so, when we're invited --n407d8d to people's house --n407da4, he will not smoke in their house. which i think --n407ddf is good because there's a lot of people that --n407e0e are very,
## 2 oh, dana, i think we have probably conversed long enough.
## 3 well, uh, if you, you know, compare the figures, i think that's where they're, uh, they're coming from --n40086a.
## 4 i've always said that if i, if i had to kill and clean and do my own, my own meat, i think i'd become a vegetarian.
## 5 and i don't think our part of the country is particularly bad compared to some.
## 6 but i don't think it is anymore,
## 7 don't you think that maybe that's why we had it this time --n401423.
## 8 well how do you think the royals are going --n404c0a to do --n404c19.
## Environment Subject VerbNiteID CompNiteID VerbType VerbLemma
## 1 i sw2734_s214_24 sw2734_s214_523 control think
## 2 i sw2139_s188_6 sw2139_s188_505 control think
## 3 conditional i sw2734_s14_16 sw2734_s14_514 control think
## 4 conditional i sw3509_s140_33 sw3509_s140_530 control think
## 5 negation i sw4633_s124_5 sw4633_s124_504 control think
## 6 negation i sw3541_s29_5 sw3541_s29_504 control think
## 7 question you sw2617_s24_4 sw2617_s24_503 control think
## 8 question you sw2460_s98_5 sw2460_s98_506 control think
## SubjectPerson
## 1 1st
## 2 1st
## 3 1st
## 4 1st
## 5 1st
## 6 1st
## 7 2nd
## 8 2nd
nrow(sampledthink)
## [1] 8
sampledbelieve = believe %>%
filter(nchar(as.character(Complement)) > 12) %>%
group_by(Environment) %>%
sample_n(.,2)
sampledbelieve = as.data.frame(sampledbelieve)
sampledbelieve
## Item_ID Verb
## 1 48405:31 believe
## 2 68509:10 believe
## 3 52611:113 believe
## 4 157530:93 believed
## 5 120299:10 believe
## 6 15981:10 believe
## 7 44862:17 believe
## 8 34213:7 believe
## Complement
## 1 that people should be allowed --n404b5d to carry guns in their vehicles
## 2 they're better
## 3 for murder, uh, rape, i even believe --n404059 incest, things, that --n404074 will permanently damage, uh, the character of the child
## 4 that that was really the proper response
## 5 it could be better
## 6 i was so brazen before
## 7 only fifty percent of the people actually vote
## 8 there ought --n405c0e to be legislation guiding the, uh, buyer and the seller
## Sentence
## 1 i don't, i don't believe that people should be allowed --n404b5d to carry guns in their vehicles.
## 2 well, i believe they're better.
## 3 if i don't think see, if i don't believe that there's not a character change and the authorities agree, that this person needs --n404007 to be excused --n40401a, i believe for murder, uh, rape, i even believe --n404059 incest, things, that --n404074 will permanently damage, uh, the character of the child.
## 4 you, you would hope that if you were in that situation that you'd have the moral fortitude, uh, to hold out if you believed that that was really the proper response.
## 5 i don't believe it could be better
## 6 i can't believe i was so brazen before.
## 7 well, uh, do you believe only fifty percent of the people actually vote.
## 8 do you believe there ought --n405c0e to be legislation guiding the, uh, buyer and the seller.
## Environment Subject VerbNiteID CompNiteID VerbType VerbLemma
## 1 i sw2540_s116_11 sw2540_s116_511 control believe
## 2 i sw2743_s12_4 sw2743_s12_504 control believe
## 3 conditional i sw2571_s95_39 sw2571_s95_537 control believe
## 4 conditional you sw4148_s56_32 sw4148_s56_532 control believe
## 5 negation i sw3345_s120_4 sw3345_s120_504 control believe
## 6 negation i sw2184_s110_4 sw2184_s110_504 control believe
## 7 question you sw2504_s3_7 sw2504_s3_505 control believe
## 8 question you sw2434_s146_3 sw2434_s146_503 control believe
## SubjectPerson
## 1 1st
## 2 1st
## 3 1st
## 4 2nd
## 5 1st
## 6 1st
## 7 2nd
## 8 2nd
nrow(sampledbelieve)
## [1] 8
sampledknow = know %>%
filter(nchar(as.character(Complement)) > 12) %>%
group_by(Environment) %>%
sample_n(.,4)
sampledknow = as.data.frame(sampledknow)
sampledknow
## Item_ID Verb
## 1 146931:7 know
## 2 35747:109 knew
## 3 176936:7 know
## 4 5335:10 know
## 5 2746:43 know
## 6 93863:97 know
## 7 2473:37 know
## 8 101970:46 know
## 9 24644:10 know
## 10 62374:27 know
## 11 45105:17 know
## 12 103965:203 know
## 13 2524:47 know
## 14 12548:9 know
## 15 755:168 know
## 16 160340:7 know
## Complement
## 1 that we very often wore heels despite the fact that, it was tiring
## 2 i was going --n40725a to be able --n40726d to get it for --n407284 in arlington
## 3 that there are a lot of foreigners, uh, here, you know, doing my line of work
## 4 even if you watched a b c, n b c or the other
## 5 --n408b67 are going --n408b76 to be dressed down
## 6 that it's exercise
## 7 that they're going --n40163e to be there
## 8 we're not going --n403ea2 to eat
## 9 that i don't use drugs
## 10 that, um, if you step back from the current issue and look at it more intellectually, there are forever over, as long as we know there are races of people that --n4049ef are dropping out
## 11 that, uh, a lot of people vote in primaries for that very reason
## 12 that anybody would feel good, you know, like if you let someone like that loose in your community
## 13 you're not supposed --n402c93 to do that
## 14 john sununu is, uh, half arab
## 15 it was going --n403324 to be good
## 16 that, like, something like fifty percent of the world's landfills is like paper. filled --n403569 with paper
## Sentence
## 1 and i know that we very often wore heels despite the fact that, it was tiring.
## 2 but i, i guess i got a pretty good deal, because i went back to the town north mazda right off central and offered them the same price as what i knew i was going --n40725a to be able --n40726d to get it for --n407284 in arlington
## 3 and i know that there are a lot of foreigners, uh, here, you know, doing my line of work.
## 4 well, i know even if you watched a b c, n b c or the other,
## 5 if when they're meeting with the engineers who they know --n408b67 are going --n408b76 to be dressed down --n408b8d, if they come in, in, you know, a six hundred dollar three piece suit, it's going --n408bec to make the people they're meeting with --n408c17 feel very uncomfortable, you know,
## 6 i think that's the basic point of it, is i'm not, i, i don't enjoy it if i know that it's exercise,
## 7 and, and, you know, if i know that they're going --n40163e to be there, you know, you, i try --n401671 to really watch it and like you say, you know, really dress up
## 8 so, or i give it to my mom and dad if i know we're not going --n403ea2 to eat,
## 9 they don't know that i don't use drugs.
## 10 i, i, don't know that, um, if you step back from the current issue and look at it more intellectually, there are forever over, as long as we know there are races of people that --n4049ef are dropping out.
## 11 uh, and i don't know that, uh, a lot of people vote in primaries for that very reason.
## 12 but i, i really think that in, in terms of like this, i'd, i think that it, it might not be such a bad thing. because, i don't know that anybody, w-, in-, i don't know that anybody would feel good, you know, like if you let someone like that loose in your community.
## 13 he didn't at, at least say to us, did you know you're not supposed --n402c93 to do that.
## 14 and did you know john sununu is, uh, half arab.
## 15 the, the difficulty with, with dancing with wolves is that when you make a movie like that --n403297, and you produce it --n4032b2, and then you star in it --n4032d5, uh, the question is, did he, did he really know it was going --n403324 to be good
## 16 did you know that, like, something like fifty percent of the world's landfills is like paper. filled --n403569 with paper.
## Environment Subject VerbNiteID CompNiteID VerbType VerbLemma
## 1 i sw3883_s29_3 sw3883_s29_503 factive know
## 2 i sw2439_s220_38 sw2439_s220_536 factive know
## 3 i sw4902_s18_3 sw4902_s18_503 factive know
## 4 i sw2060_s20_4 sw2060_s20_504 factive know
## 5 conditional they sw2027_s215_13 sw2027_s215_519 factive know
## 6 conditional i sw3080_s29_33 sw3080_s29_535 factive know
## 7 conditional i sw2027_s30_13 sw2027_s30_513 factive know
## 8 conditional i sw3138_s148_17 sw3138_s148_514 factive know
## 9 negation they sw2314_s102_4 sw2314_s102_504 factive know
## 10 negation i sw2657_s126_10 sw2657_s126_509 factive know
## 11 negation i sw2504_s150_7 sw2504_s150_505 factive know
## 12 negation i sw3150_s177_69 sw3150_s177_567 factive know
## 13 question you sw2027_s66_17 sw2027_s66_515 factive know
## 14 question you sw2130_s87_4 sw2130_s87_503 factive know
## 15 question he sw2010_s100_57 sw2010_s100_556 factive know
## 16 question you sw4175_s114_3 sw4175_s114_503 factive know
## SubjectPerson
## 1 1st
## 2 1st
## 3 1st
## 4 1st
## 5 3rd
## 6 1st
## 7 1st
## 8 1st
## 9 3rd
## 10 1st
## 11 1st
## 12 1st
## 13 2nd
## 14 2nd
## 15 3rd
## 16 2nd
nrow(sampledknow)
## [1] 16
sampledrealize = realize %>%
filter(nchar(as.character(Complement)) > 12) %>%
filter(Environment != "question") %>%
group_by(Environment) %>%
sample_n(.,4)
sampledrealize = as.data.frame(sampledrealize)
sampledrealize
## Item_ID Verb
## 1 47976:10 realize
## 2 31945:30 realized
## 3 89551:5 realize
## 4 74478:7 realize
## 5 70930:73 realized
## 6 169350:34 realize
## 7 166967:27 realize
## 8 113018:67 realize
## 9 4431:14 realize
## 10 70192:10 realize
## 11 105944:10 realize
## 12 106635:12 realize
## Complement
## 1 that it was better
## 2 that it was eucalyptus
## 3 that he would like his career to develop
## 4 it --n407bc1's important that, uh
## 5 we could never buy a house anyway no matter how much we saved --n40244c
## 6 that what they print --n40223a is stuff that you probably knew --n402259 already and the stuff that you want --n40227c they're not printing --n402293 because the average person doesn't need or want --n4022be to know that much
## 7 that you're subject to paying, uh, income tax on something that you purchase --n400a02 mail order
## 8 they're one short
## 9 that they ha-, were going --n403703 to reach out to people from, all over the country
## 10 that dallas had that same problem
## 11 it was that expensive
## 12 that i needed that
## Sentence
## 1 but now i realize that it was better because, um, they have got into a lot of trouble, because of lack of supervision.
## 2 and then one day i, i realized that it was eucalyptus.
## 3 i realize that he would like his career to develop,
## 4 and you realize it --n407bc1's important that, uh,
## 5 so, we virtually did that for about two years, which --n4023dd worked real well, and then moved from california where we realized we could never buy a house anyway no matter how much we saved --n40244c --n402453, and moved to texas, bought a house --n40247e immediately, you know, which --n40249d, of course, is now devalued --n4024c0 with the housing market,
## 6 if you were deeply involved in it, then you immediately realize that what they print --n40223a is stuff that you probably knew --n402259 already and the stuff that you want --n40227c they're not printing --n402293 because the average person doesn't need or want --n4022be to know that much.
## 7 i mean what if you don't even realize that you're subject to paying, uh, income tax on something that you purchase --n400a02 mail order.
## 8 if they'll take a group of kids to the zoo or somewhere and then come back and not even count them and realize they're one short.
## 9 and, i didn't realize that they ha-, were going --n403703 to reach out to people from, all over the country.
## 10 i didn't realize that dallas had that same problem.
## 11 i didn't realize it was that expensive.
## 12 so i didn't realize that i needed that.
## Environment Subject VerbNiteID CompNiteID VerbType VerbLemma
## 1 i sw2539_s80_4 sw2539_s80_504 factive realize
## 2 i sw2423_s145_11 sw2423_s145_510 factive realize
## 3 i sw3049_s179_2 sw3049_s179_503 factive realize
## 4 you sw2826_s230_3 sw2826_s230_503 factive realize
## 5 conditional we sw2782_s69_25 sw2782_s69_525 factive realize
## 6 conditional you sw4611_s45_12 sw4611_s45_512 factive realize
## 7 conditional you sw4372_s11_9 sw4372_s11_511 factive realize
## 8 conditional they sw3266_s133_24 sw3266_s133_521 factive realize
## 9 negation i sw2041_s116_6 sw2041_s116_504 factive realize
## 10 negation i sw2772_s118_4 sw2772_s118_504 factive realize
## 11 negation i sw3174_s89_4 sw3174_s89_504 factive realize
## 12 negation i sw3188_s32_5 sw3188_s32_504 factive realize
## SubjectPerson
## 1 1st
## 2 1st
## 3 1st
## 4 2nd
## 5 1st
## 6 2nd
## 7 2nd
## 8 3rd
## 9 1st
## 10 1st
## 11 1st
## 12 1st
nrow(sampledrealize)
## [1] 12
Combine all the sampled cases and write them to a stimulus file.
library(jsonlite)
merged = bind_rows(sampledsee,sampledthink,sampledbelieve,sampledknow,sampledrealize) %>%
mutate(NoThatComplement=gsub("^that ","",Complement,perl=T)) %>%
mutate(StrippedComplement=gsub("--n[a-zA-Z0-9]*( |\\.|,|'|$)","",NoThatComplement,perl=T),StrippedSentence = gsub("--n[a-zA-Z0-9]*( |\\.|,|'|$)","",Sentence,perl=T)) %>%
select(Item_ID,StrippedComplement,StrippedSentence,VerbType) %>%
rename(id=Item_ID,complement=StrippedComplement,sentence=StrippedSentence,stimType=VerbType)
## Warning in rbind_all(x, .id): Unequal factor levels: coercing to character
## Warning in rbind_all(x, .id): Unequal factor levels: coercing to character
## Warning in rbind_all(x, .id): Unequal factor levels: coercing to character
## Warning in rbind_all(x, .id): Unequal factor levels: coercing to character
## Warning in rbind_all(x, .id): Unequal factor levels: coercing to character
## Warning in rbind_all(x, .id): Unequal factor levels: coercing to character
## Warning in rbind_all(x, .id): Unequal factor levels: coercing to character
## Warning in rbind_all(x, .id): Unequal factor levels: coercing to character
## Warning in rbind_all(x, .id): Unequal factor levels: coercing to character
merged = as.data.frame(merged)
writeOutJSON <- toJSON(merged,pretty=T)
write(writeOutJSON, "/Users/titlis/cogsci/projects/esslli2016_corpuspragmatics/factives/factives_experiment/js/stimuli.json")
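The pattern used above removes each `--n…` marker together with the single character that follows it, so a following period, comma, or apostrophe is dropped as well. The sketch below compares it with a variant that leaves the following punctuation in place; `strip_keep_punct` is a hypothetical alternative, not the code used to build the stimuli.

```r
# Original pattern: also swallows the character that follows the marker.
strip_original <- function(x) {
  gsub("--n[a-zA-Z0-9]*( |\\.|,|'|$)", "", x, perl = TRUE)
}

# Variant (an assumption, not the author's code): remove the marker plus the
# whitespace before it, and leave any following punctuation untouched.
strip_keep_punct <- function(x) {
  trimws(gsub("\\s*--n[a-zA-Z0-9]+", "", x, perl = TRUE))
}

s <- "and you realize it --n407bc1's important that, uh,"
strip_original(s)
strip_keep_punct(s)
```

On this sentence from the corpus, the original pattern leaves "it s important" while the variant yields "it's important".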
The data analyzed below come from a small projection judgment experiment conducted as part of the ESSLLI 2016 course “Corpus Methods for Research in Pragmatics”.
pd = data.frame()
temp = list.files("/Users/titlis/cogsci/projects/esslli2016_corpuspragmatics/factives/results/data/projection_data")
for (i in 1:length(temp)) {
raw = fromJSON(paste("/Users/titlis/cogsci/projects/esslli2016_corpuspragmatics/factives/results/data/projection_data/",temp[i],sep=""))
td = raw$trials
subj = raw$subject_information
td$language = subj$language
td$enjoyment = subj$enjoyment
td$asses = subj$asses
td$age = subj$age
td$gender = subj$gender
td$education = subj$education
td$comments = subj$comments
td$participant = i
pd = bind_rows(pd,td)
}
pd = as.data.frame(pd)
pd[sapply(pd, is.character)] <- lapply(pd[sapply(pd, is.character)],as.factor)
pd$participant = as.factor(as.character(pd$participant))
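The loop above grows `pd` one participant at a time and iterates over `1:length(temp)`, which would run over `c(1, 0)` if the directory were empty. A sketch of the same idea with `lapply()` and `seq_along()` follows; the mock list stands in for parsed JSON files and its contents are invented for illustration.

```r
# Mock parsed files, standing in for fromJSON() results (hypothetical data).
raw_files <- list(
  list(trials = data.frame(id = c("a:1", "b:2"), response = c(0.2, 0.9)),
       subject_information = list(language = "English", age = "27")),
  list(trials = data.frame(id = c("a:1", "b:2"), response = c(0.5, 1.0)),
       subject_information = list(language = "German", age = "30"))
)

# Build one data frame per participant, then bind once at the end.
per_participant <- lapply(seq_along(raw_files), function(i) {
  td <- raw_files[[i]]$trials
  subj <- raw_files[[i]]$subject_information
  td$language <- subj$language      # recycled across this participant's trials
  td$age <- subj$age
  td$participant <- i
  td
})
pd_toy <- do.call(rbind, per_participant)
pd_toy
```

Binding once at the end avoids repeatedly copying a growing data frame, which matters little for 15 participants but scales better.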
Because we didn’t record all information from the original database in the experiment, we need to merge that information back in by matching the TGrep2 item IDs.
row.names(d) = d$Item_ID
pd$subject = d[as.character(pd$id),]$Subject
pd$verb = d[as.character(pd$id),]$Verb
pd$environment = d[as.character(pd$id),]$Environment
pd$environment = as.character(pd$environment)
pd[pd$environment == "",]$environment = "main"
pd$environment = as.factor(as.character(pd$environment))
pd$subjectperson = d[as.character(pd$id),]$SubjectPerson
summary(pd)
## trial_type response id
## projection:375 Min. :0.0000 96908:52 : 12
## 1st Qu.:0.4000 152345:18: 11
## Median :0.7900 82397:7 : 11
## Mean :0.6485 100957:13: 10
## 3rd Qu.:0.9700 158110:10: 10
## Max. :1.0000 165765:12: 10
## (Other) :311
## sentence
## and, you know, if she wanted to go to combat, i do not see that they should stop her. : 12
## do you know my grandparents live in durant. : 11
## so what else do you think is important. : 11
## and i didn't realize that they were putting dual, uh, air bags in that car now. : 10
## and she said how did you know those are the colors we used : 10
## and they've also see that there's, there's a different way of life and those families are really close.: 10
## (Other) :311
## complement
## they should stop her : 12
## is important : 11
## my grandparents live in durant : 11
## , you know, everybody is sitting here screaming about, we don't want a state income tax: 10
## are going to be dressed down : 10
## it's exercise : 10
## (Other) :311
## stim_type language enjoyment asses age
## control:112 : 25 -1:125 Confused: 25 27 : 75
## factive:210 Dutch : 50 1 :125 No : 25 30 : 50
## sense : 53 English:125 2 :125 Yes :325 31 : 50
## French : 25 32 : 50
## German :100 : 25
## Russian: 25 23 : 25
## Slovene: 25 (Other):100
## gender education comments participant
## Female:250 -1: 50 :350 1 : 25
## Male :125 3 :100 This was so much fun: 25 10 : 25
## 4 :225 11 : 25
## 12 : 25
## 13 : 25
## 14 : 25
## (Other):225
## subject verb environment subjectperson
## i :224 know :122 conditional:110 1st:230
## you : 87 realize : 69 main :106 2nd: 87
## they : 41 think : 48 negation : 95 3rd: 58
## he : 9 see : 46 question : 64
## people : 8 believe : 42
## we : 6 realized: 19
## (Other): 0 (Other) : 29
nrow(pd)
## [1] 375
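The merge above looks up corpus rows by row name. An equivalent approach, sketched here on toy data, uses `match()` and then counts the IDs that found no corpus row; the data frames and values are hypothetical.

```r
# Toy corpus table and toy trial data (hypothetical values).
corpus <- data.frame(
  Item_ID = c("89:5", "118:5", "178:59"),
  Verb = c("think", "think", "thought"),
  stringsAsFactors = FALSE
)
trials <- data.frame(
  id = c("178:59", "89:5", "999:1"),
  response = c(0.8, 0.1, 0.5),
  stringsAsFactors = FALSE
)

# match() returns the corpus row for each trial ID, or NA if the ID is absent.
idx <- match(trials$id, corpus$Item_ID)
trials$verb <- corpus$Verb[idx]

# Count unmatched IDs so that a failed join is noticed.
sum(is.na(trials$verb))
```

A nonzero count here would signal that some experimental items could not be traced back to the corpus table.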
Get unique participant information.
pinfo = unique(pd[,c("language","enjoyment","asses","age","gender","comments")])
nrow(pinfo)
## [1] 15
Is everyone a native speaker of English?
table(pinfo$language)
##
## Dutch English French German Russian Slovene
## 1 2 5 1 4 1 1
Did people love or hate the experiment?
table(pinfo$enjoyment)
##
## -1 1 2
## 5 5 5
Did people understand the experiment?
table(pinfo$asses)
##
## Confused No Yes
## 1 1 13
How old are our participants?
ggplot(pd, aes(x=as.numeric(as.character(age)))) +
geom_histogram()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## Warning: Removed 25 rows containing non-finite values (stat_bin).
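Note that `pd` holds one row per trial (375 rows for 15 participants), so the histogram above counts each participant's age 25 times, and the 25 removed rows correspond to a single participant with a blank age. A sketch of deduplicating before summarising follows, on toy data with invented values.

```r
# Toy trial-level data: two participants, three trials each (hypothetical).
trials <- data.frame(
  participant = rep(c(1, 2), each = 3),
  age = rep(c("27", ""), each = 3),   # blank = age not reported
  stringsAsFactors = FALSE
)

# Keep one row per participant before summarising demographics.
people <- unique(trials[, c("participant", "age")])

# Blank strings become NA; suppressWarnings hides any coercion warning
# that non-numeric entries might trigger.
ages <- suppressWarnings(as.numeric(people$age))
sum(!is.na(ages))   # participants with a usable age
sum(is.na(ages))    # participants who left age blank
```

Plotting `ages` rather than the trial-level column would give one count per participant.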
What is the gender of our participants?
table(pinfo$gender)
##
## Female Male
## 10 5
What did participants have to say about the experiment?
pinfo$comments
## [1]
## [4]
## [7]
## [10] This was so much fun
## [13]
## Levels: This was so much fun
Plot the overall distribution of responses.
ggplot(pd, aes(x=response)) +
geom_histogram()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
ggplot(pd, aes(x=response,fill=gender)) +
geom_histogram(position="dodge")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
This histogram includes cognitive factives (know, realize), sense factives (see), and non-factives (believe, think). Let’s group the histograms by these three types.
ggplot(pd, aes(x=response)) +
geom_histogram() +
facet_wrap(~stim_type)
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Do the distributions vary by whether the subject is 1st, 2nd, or 3rd person?
ggplot(pd, aes(x=response,fill=subjectperson)) +
geom_histogram(position="dodge",binwidth=.1) +
facet_wrap(~stim_type)
Do the distributions vary by environment? Let’s re-order the environment factor levels so “main clause” is in the top row.
pd$env = factor(x=pd$environment, levels=c("main","negation","conditional","question"))
ggplot(pd, aes(x=response)) +
geom_histogram(position="dodge") +
facet_grid(environment~stim_type)
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
ggplot(pd, aes(x=response)) +
geom_histogram(position="dodge") +
facet_grid(env~stim_type)
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
How much variability or agreement is there for each item?
cog = droplevels(pd[pd$stim_type == "factive",])
nrow(cog)
## [1] 210
ggplot(cog, aes(x=response,fill=verb)) +
geom_histogram() +
facet_wrap(~id)
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Examples of sentences that are judged very differently by different participants:
unique(pd[pd$id == "93863:97",]$sentence)
## [1] i think that's the basic point of it, is i'm not, i, i don't enjoy it if i know that it's exercise,
## 50 Levels: and finally they realized that they were, they were abusing them and weren't going to get out of the hole ...
unique(pd[pd$id == "755:168",]$sentence)
## [1] the, the difficulty with, with dancing with wolves is that when you make a movie like that and you produce it and then you star in it uh, the question is, did he, did he really know it was going to be good
## 50 Levels: and finally they realized that they were, they were abusing them and weren't going to get out of the hole ...
Do the distributions vary both by environment and by person?
ggplot(pd, aes(x=response,fill=subjectperson)) +
geom_histogram(position="dodge") +
facet_grid(env~stim_type)
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
This visualization is getting unwieldy. There are too many facets with too many variables being plotted, and the number of observations per cell is unbalanced. Let’s try to compress this information into a simpler form.
agr = pd %>%
group_by(environment,stim_type) %>%
summarise(mean_projection=mean(response))
ggplot(agr, aes(x=environment,y=mean_projection)) +
geom_bar(stat="identity",position="dodge",color="black",fill="gray60") +
facet_wrap(~stim_type)
This is easier to read, but we lost a lot of information about variability in each condition. Let’s instead plot condition means with bootstrapped 95% confidence intervals.
agr = pd %>%
group_by(environment,stim_type) %>%
summarise(mean_speakercommitment=mean(response),ymin=mean(response)-ci.low(response),ymax=mean(response)+ci.high(response))
ggplot(agr, aes(x=environment,y=mean_speakercommitment)) +
geom_bar(stat="identity",position="dodge",color="black",fill="gray60") +
geom_errorbar(aes(ymin=ymin,ymax=ymax),width=.25) +
facet_wrap(~stim_type)
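The `ci.low` and `ci.high` functions come from `helpers.R`, which is not shown in this sheet. As a rough sketch of what such helpers might do (an assumption, not the actual contents of `helpers.R`), a percentile bootstrap of the mean can be written in base R. The names `boot_means`, `boot_ci_low`, and `boot_ci_high` and the default of 1000 resamples are hypothetical.

```r
# Hypothetical stand-ins for ci.low / ci.high (assumed behavior, not the code in helpers.R).
# Each returns the distance from the sample mean to one end of a percentile-bootstrap
# 95% interval, so they can be used as mean(x) - low and mean(x) + high.
boot_means <- function(x, n_boot = 1000) {
  # resample x with replacement n_boot times and record the mean of each resample
  replicate(n_boot, mean(sample(x, length(x), replace = TRUE)))
}
boot_ci_low <- function(x, n_boot = 1000) {
  mean(x) - quantile(boot_means(x, n_boot), 0.025, names = FALSE)
}
boot_ci_high <- function(x, n_boot = 1000) {
  quantile(boot_means(x, n_boot), 0.975, names = FALSE) - mean(x)
}

set.seed(1)        # make the resampling reproducible
x <- runif(50)     # toy ratings on a 0-1 slider scale
c(ymin = mean(x) - boot_ci_low(x), ymax = mean(x) + boot_ci_high(x))
```

Because the interval comes from resampling, the bounds change slightly from run to run unless a seed is set.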
Let’s add the subject information back in.
agr = pd %>%
group_by(environment,subjectperson,stim_type) %>%
summarise(mean_projection=mean(response),ymin=mean(response)-ci.low(response),ymax=mean(response)+ci.high(response))
dodge = position_dodge(.9)
ggplot(agr, aes(x=environment,y=mean_projection,fill=subjectperson)) +
geom_bar(stat="identity",position=dodge,color="black") +
geom_errorbar(aes(ymin=ymin,ymax=ymax),width=.25,position=dodge) +
facet_wrap(~stim_type)
This looks ugly. Let’s try plotting points instead and make the x axis labels more readable.
agr = pd %>%
group_by(environment,subjectperson,stim_type) %>%
summarise(mean_projection=mean(response),ymin=mean(response)-ci.low(response),ymax=mean(response)+ci.high(response))
ggplot(agr, aes(x=environment,y=mean_projection,color=subjectperson)) +
geom_point() +
geom_errorbar(aes(ymin=ymin,ymax=ymax),width=.25) +
facet_wrap(~stim_type) +
theme(axis.text.x = element_text(angle=45,vjust=1,hjust=1))
Does complement length matter?
pd$complement_length = nchar(as.character(pd$complement))
ggplot(pd, aes(x=complement_length,y=response)) +
geom_point() +
geom_smooth(method="lm")
Does complement length matter differently for different verbs?
ggplot(pd, aes(x=complement_length,y=response,color=stim_type)) +
geom_point() +
geom_smooth(method="lm")
We want to know whether a variety of factors predict the projection ratings. Because the slider scale is continuous, we use linear models to address this question. Because there is likely to be random variability by participants and by items, we use mixed effects. Keep in mind that we don’t have nearly enough data to run this analysis properly, and for the sake of the example we are only investigating the effect of two factors (environment and subject) instead of the full slew of factors that we have reason to believe might matter (including, e.g., intonation, frequency of the verb, whether or not the complementizer was omitted, etc.).
First we only look at the factives. We set the contrasts of the environment predictor so “main clause” is the reference level that all other environments are compared to. For the person predictor, “1st person” is the default reference level, and we’ll leave it at that. We want to include random effects both by participants and by items. In this case, because every item necessarily occurs with one environment and with one subject, we can’t include by-item random slopes for environment and subject, so we only include random by-item intercepts. In contrast, each participant did see multiple environments and subjects, so we can include by-participant random slopes for both predictors in addition to random intercepts. Play with this and you will see, however, that the model does not converge if you include both random slopes.
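Whether a predictor varies within items or within participants can be checked before fitting. The following is a minimal sketch on made-up data: the column names mirror the ones used in this sheet, while the data frame `toy` and its values are hypothetical.

```r
# Toy data: column names mirror this sheet, values are invented for illustration.
toy <- data.frame(
  id          = c("i1", "i1", "i2", "i2", "i3", "i3"),
  participant = c("p1", "p2", "p1", "p2", "p1", "p2"),
  environment = c("main", "main", "negation", "negation", "question", "question")
)
# Number of distinct environments seen per item and per participant.
env_per_item <- tapply(toy$environment, toy$id, function(e) length(unique(e)))
env_per_participant <- tapply(toy$environment, toy$participant, function(e) length(unique(e)))
# If every item has exactly one environment, a by-item slope for environment is not estimable.
all(env_per_item == 1)
# If participants see several environments, a by-participant slope can in principle be included.
all(env_per_participant > 1)
```

The same two `tapply()` calls can be run on `factives` with `subjectperson` in place of `environment` to check the person predictor.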
factives = droplevels(pd[pd$stim_type == "factive",])
contrasts(factives$environment) = cbind("conditional.v.main"=c(1,0,0,0),"negation.v.main"=c(0,0,1,0),"question.v.main"=c(0,0,0,1))
m = lmer(response ~ environment + subjectperson + (1|id) + (1+environment|participant), data=factives)
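The hand-built contrast matrix above is ordinary treatment coding with “main” as the reference level. As a sketch of an alternative, assuming R's default contrast options for unordered factors, `relevel()` produces the same coding; only the coefficient names in the model output would differ. The toy factor `env` below is hypothetical.

```r
# Toy factor with the same four environment levels (sorted alphabetically by default).
env <- factor(c("main", "negation", "conditional", "question", "main"))
levels(env)

# Hand-built treatment coding with "main" (the second level) as the reference.
manual <- cbind("conditional.v.main" = c(1, 0, 0, 0),
                "negation.v.main"    = c(0, 0, 1, 0),
                "question.v.main"    = c(0, 0, 0, 1))

# relevel() moves "main" to the front, so the default treatment contrasts
# compare every other level to it.
env2 <- relevel(env, ref = "main")
contrasts(env2)

# After putting the rows back in the original level order, the two codings agree.
same_coding <- isTRUE(all.equal(unname(contrasts(env2)[levels(env), ]), unname(manual)))
same_coding
```

The explicit matrix has the advantage that each coefficient name states which comparison it encodes.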
The model summary tells us: the projection mean at the reference level of both predictors (i.e. main clauses with 1st person subjects) is .91. Compared to that baseline, ratings are lower (i.e. there is less projection) in conditionals. No other fixed effect has a t value that is large enough in absolute value to suggest the presence of an effect.
However, a glance at the correlation matrix tells us that there is a lot of collinearity between fixed effects (absolute values > .4). This points to the effects being unreliable (remember, we have very little data!).
summary(m)
## Linear mixed model fit by REML ['lmerMod']
## Formula:
## response ~ environment + subjectperson + (1 | id) + (1 + environment |
## participant)
## Data: factives
##
## REML criterion at convergence: -33.1
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -3.10814 -0.37398 0.07102 0.46483 2.55281
##
## Random effects:
## Groups Name Variance Std.Dev. Corr
## id (Intercept) 0.0400578 0.20014
## participant (Intercept) 0.0050241 0.07088
## environmentconditional.v.main 0.0127774 0.11304 -0.60
## environmentnegation.v.main 0.0002876 0.01696 -0.77 0.97
## environmentquestion.v.main 0.0055723 0.07465 0.75 0.08 -0.14
## Residual 0.0305469 0.17478
## Number of obs: 210, groups: id, 28; participant, 15
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 0.90983 0.09363 9.717
## environmentconditional.v.main -0.33012 0.11089 -2.977
## environmentnegation.v.main -0.10177 0.11861 -0.858
## environmentquestion.v.main -0.13706 0.15253 -0.899
## subjectperson2nd 0.06620 0.13470 0.491
## subjectperson3rd -0.15920 0.11635 -1.368
##
## Correlation of Fixed Effects:
## (Intr) envrnmntc.. envrnmntn.. envrnmntq.. sbjct2
## envrnmntc.. -0.582
## envrnmntn.. -0.765 0.445
## envrnmntq.. -0.213 0.392 0.181
## sbjctprsn2n -0.386 -0.095 0.305 -0.509
## sbjctprsn3r -0.546 0.060 0.432 -0.157 0.452
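The correlation table printed at the end of the summary can also be computed directly, which makes it easy to flag large entries. The sketch below uses made-up data and a plain `lm()` fit so that it runs without the corpus or lme4. The recipe `cov2cor(vcov(fit))` is standard for `lm`; that the same call works unchanged on the `lmer` fit `m` is an assumption that was not checked here.

```r
set.seed(1)
# Toy unbalanced design: invented data, only to illustrate the computation.
toy <- data.frame(
  environment = factor(rep(c("main", "conditional", "negation"), times = c(20, 5, 5))),
  response = runif(30)
)
toy$environment <- relevel(toy$environment, ref = "main")
fit <- lm(response ~ environment, data = toy)

# Correlation matrix of the coefficient estimates.
fe_cor <- cov2cor(vcov(fit))
round(fe_cor, 2)

# Flag off-diagonal entries whose absolute value exceeds .4.
high <- abs(fe_cor) > 0.4 & row(fe_cor) != col(fe_cor)
which(high, arr.ind = TRUE)
```

In this ordinary least squares toy case the correlations depend only on the design (cell sizes and coding), not on the responses, which is one reason small, unbalanced data sets tend to show large values here.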