Step 1: select stimuli for experiment

Set your path here and load the packages needed for visualization (ggplot2), tidy data handling (tidyr and dplyr), and mixed effects modeling (lme4):

setwd("/Users/titlis/cogsci/projects/esslli2016_corpuspragmatics/code_sheets/")
source("/Users/titlis/cogsci/projects/esslli2016_corpuspragmatics/factives/results/rscripts/helpers.R")
library(ggplot2)
library(tidyr)
library(dplyr)
library(lme4)
theme_set(theme_bw())

Read the database into R and explore it.

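# stringsAsFactors=TRUE reproduces the factor columns shown below (the default became FALSE in R 4.0.0)
d = read.table("/Users/titlis/cogsci/projects/esslli2016_corpuspragmatics/factives/factives_corpus_search/results/swbdext.tab", sep="\t", header=TRUE, quote="", stringsAsFactors=TRUE)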
nrow(d)
## [1] 4705
head(d)
##   Item_ID    Verb
## 1    89:5   think
## 2   118:5   think
## 3  178:59 thought
## 4   214:7   think
## 5  215:37   think
## 6  236:41   think
##                                                                                                                                                         Complement
## 1                                                  when she finally came to the realization that, you know, no, i can not, i can not take care of myself --n40242b
## 2                                                                                                                          that they, they had a great deal of, um
## 3                                                                                                                               --n404078 should be done --n40408b
## 4                                                                                          that when she passed away --n404c45 it was probably one of the greatest
## 5                                                                                                                                                      it would be
## 6 that what one thing that they were concerned --n405458 probably was the fact it wasn't necessarily, you know, like the quantity of care but the quality of, care
##                                                                                                                                                                                   Sentence
## 1                                                                 i think when she finally came to the realization that, you know, no, i can not, i can not take care of myself --n40242b.
## 2                                                                                                                                         i think that they, they had a great deal of, um,
## 3                                                                     and, uh, fortunately, we agreed, you know, on exactly, you know, what we thought --n404078 should be done --n40408b.
## 4                                                                                                     and i think that when she passed away --n404c45 it was probably one of the greatest,
## 5                                                                                                                                                           um, i, i, i think it would be,
## 6 and, um, i, i, i think that what one thing that they were concerned --n405458 probably was the fact it wasn't necessarily, you know, like the quantity of care but the quality of, care.
##   Environment Subject     VerbNiteID      CompNiteID
## 1                   i   sw2005_s54_2  sw2005_s54_503
## 2                   i   sw2005_s71_2  sw2005_s71_503
## 3                  we sw2005_s104_21 sw2005_s104_519
## 4                   i  sw2005_s125_3 sw2005_s125_503
## 5                   i sw2005_s126_14 sw2005_s126_514
## 6                   i sw2005_s137_16 sw2005_s137_514
str(d)
## 'data.frame':    4705 obs. of  8 variables:
##  $ Item_ID    : Factor w/ 4705 levels "100037:5","100041:5",..: 4324 506 2369 2472 2476 2531 2630 2742 2770 2861 ...
##  $ Verb       : Factor w/ 35 levels "admit","admits",..: 32 32 35 32 32 32 32 32 32 32 ...
##  $ Complement : Factor w/ 4496 levels "","--n10077 will be doing well this year",..: 4328 2736 317 2882 1496 2875 2241 2485 3036 4471 ...
##  $ Sentence   : Factor w/ 4516 levels "--ing seems --n40001a to be a, a topic that --n40003d's going --n40004c to probably take about a generation --n40006b --n400072"| __truncated__,..: 2922 2532 762 312 4053 863 2477 3204 1367 2952 ...
##  $ Environment: Factor w/ 4 levels "","conditional",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ Subject    : Factor w/ 132 levels "","--n10f7c",..: 83 83 128 83 83 83 83 83 83 83 ...
##  $ VerbNiteID : Factor w/ 4705 levels "sw2005_s104_21",..: 9 10 1 2 3 4 5 6 7 8 ...
##  $ CompNiteID : Factor w/ 4705 levels "sw2005_s104_519",..: 9 10 1 2 3 4 5 6 7 8 ...
summary(d)
##       Item_ID           Verb             Complement  
##  100037:5 :   1   think   :3462   it is       :  19  
##  100041:5 :   1   know    : 516   it's        :  18  
##  100166:10:   1   thought : 251   that's true :  12  
##  100181:10:   1   believe : 129   you're right:  11  
##  10019:18 :   1   remember:  41   that's right:  10  
##  10023:9  :   1   realize :  36   it          :   9  
##  (Other)  :4699   (Other) : 270   (Other)     :4626  
##                                                                                                                                                                                                                                                                        Sentence   
##  i think it is.                                                                                                                                                                                                                                                            :   9  
##  i think that's true.                                                                                                                                                                                                                                                      :   6  
##  i think you're right.                                                                                                                                                                                                                                                     :   6  
##  i think that's right.                                                                                                                                                                                                                                                     :   5  
##  but i, i really think that in, in terms of like this, i'd, i think that it, it might not be such a bad thing. because, i don't know that anybody, w-, in-, i don't know that anybody would feel good, you know, like if you let someone like that loose in your community.:   4  
##  i know in jamaica, uh, it think it's jamaica, i think it's jamaica, i know that they have, you know, crimes punishable by death,                                                                                                                                          :   4  
##  (Other)                                                                                                                                                                                                                                                                   :4671  
##       Environment      Subject              VerbNiteID  
##             :3946   i      :4048   sw2005_s104_21:   1  
##  conditional: 113   you    : 262   sw2005_s125_3 :   1  
##  negation   : 460   they   :  88   sw2005_s126_14:   1  
##  question   : 186   we     :  54   sw2005_s137_16:   1  
##                     it     :  40   sw2005_s153_2 :   1  
##                            :  32   sw2005_s174_12:   1  
##                     (Other): 181   (Other)       :4699  
##            CompNiteID  
##  sw2005_s104_519:   1  
##  sw2005_s125_503:   1  
##  sw2005_s126_514:   1  
##  sw2005_s137_514:   1  
##  sw2005_s153_503:   1  
##  sw2005_s174_513:   1  
##  (Other)        :4699

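Recode the verb forms into a verb type (control, sense, or factive) and a lemma. Forms not listed in the control or sense sets, including "seem", are coded as factive; forms not assigned to a lemma are coded as "other".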

d$VerbType = as.factor(
  ifelse(d$Verb %in% c("believe","believed","thinking","thinks","thought","think"), "control",
  ifelse(d$Verb %in% c("saw","see","seeing","seemed","seems","seen","sees","sense"), "sense",
         "factive")))
d$VerbLemma = as.factor(
  ifelse(d$Verb %in% c("believe","believed"), "believe",
  ifelse(d$Verb %in% c("thinking","thinks","thought","think"), "think",
  ifelse(d$Verb %in% c("saw","see","seeing"), "see",
  ifelse(d$Verb %in% c("know","knows","knew","known"), "know",
  ifelse(d$Verb %in% c("realize","realized"), "realize",
  ifelse(d$Verb %in% c("notice","noticed"), "notice",
         "other")))))))
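
The nested ifelse calls can also be written as a lookup. The following is only a sketch of that alternative, not the code used above: the names lemma_of and recode_lemma are hypothetical, and the input is a toy vector rather than the corpus data. It maps the same forms to the same lemmas and sends everything else to "other".

```r
# Hypothetical lookup: inflected verb form -> lemma (same sets as the recoding above)
lemma_of <- c(
  believe = "believe", believed = "believe",
  think = "think", thinking = "think", thinks = "think", thought = "think",
  saw = "see", see = "see", seeing = "see",
  know = "know", knows = "know", knew = "know", known = "know",
  realize = "realize", realized = "realize",
  notice = "notice", noticed = "notice"
)

# Hypothetical helper: look each form up by name, fall back to "other"
recode_lemma <- function(verb) {
  verb <- as.character(verb)       # a factor would index by integer code instead
  lemma <- unname(lemma_of[verb])  # NA for forms that are not in the lookup
  lemma[is.na(lemma)] <- "other"
  factor(lemma)
}

# Toy input, not the corpus data
toy <- c("think", "knew", "seems", "believed", "knowing")
recode_lemma(toy)
```

A lookup keeps each form on one line, which makes it easier to spot forms that were left out (for example "knowing", which ends up as "other" in both versions).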

Output an overview of the cases: verb form by verb type, then verb lemma by environment.

table(d$Verb,d$VerbType)
##             
##              control factive sense
##   admit            0      17     0
##   admits           0       1     0
##   believe        129       0     0
##   believed         2       0     0
##   discover         0       3     0
##   discovered       0       4     0
##   foresaw          0       1     0
##   forget           0       1     0
##   forgotten        0       1     0
##   knew             0      34     0
##   know             0     516     0
##   knowing          0       7     0
##   known            0       1     0
##   knows            0      10     0
##   notice           0       9     0
##   noticed          0      17     0
##   realize          0      36     0
##   realized         0      13     0
##   realizes         0       1     0
##   realizing        0       4     0
##   recognize        0       6     0
##   remember         0      41     0
##   saw              0       0     2
##   see              0       0    35
##   seeing           0       0     1
##   seem             0       4     0
##   seemed           0       0     5
##   seems            0       0    34
##   seen             0       0     3
##   sees             0       0     1
##   sense            0       0     2
##   think         3462       0     0
##   thinking        34       0     0
##   thinks          17       0     0
##   thought        251       0     0
table(d$VerbLemma,d$Environment)
##          
##                conditional negation question
##   believe   98           7       22        4
##   know     476          19       55       11
##   notice    22           0        1        3
##   other    130           3        2        1
##   realize   30           4       14        1
##   see       29           4        4        1
##   think   3161          76      362      165
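
Raw counts are hard to compare across lemmas because think dominates the data. As a rough sketch, the counts from the lemma-by-environment table above can be turned into row proportions. The counts are copied from that table; the column label "none" for the unlabeled environment is an assumption made here for readability.

```r
# Counts copied from the lemma-by-environment table above.
# The label "none" for the empty environment is an assumption.
counts <- matrix(
  c(  98,  7,  22,   4,
     476, 19,  55,  11,
      22,  0,   1,   3,
     130,  3,   2,   1,
      30,  4,  14,   1,
      29,  4,   4,   1,
    3161, 76, 362, 165),
  ncol = 4, byrow = TRUE,
  dimnames = list(
    lemma = c("believe", "know", "notice", "other", "realize", "see", "think"),
    environment = c("none", "conditional", "negation", "question")
  )
)

# Row proportions: within each lemma, the share of cases in each environment
props <- prop.table(counts, margin = 1)
round(props, 3)
```

On the full data frame the same idea would be prop.table(table(d$VerbLemma, d$Environment), margin = 1).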

Restrict the data to cases where the subject is a pronoun or “people”, and code the person of the subject.

d = droplevels(d[d$Subject %in% c("people","she","he","it","we","they","you","i"),])
d$SubjectPerson = as.factor(
  ifelse(d$Subject %in% c("i","we"), "1st",
  ifelse(d$Subject == "you", "2nd",
  ifelse(d$Subject %in% c("it","he","she","people","they"), "3rd",
         "other"))))

table(d$Verb,d$SubjectPerson,d$Environment)
## , ,  = 
## 
##             
##               1st  2nd  3rd
##   admit         9    0    0
##   believe      81    1    2
##   believed      1    0    0
##   discover      1    0    0
##   discovered    3    0    1
##   foresaw       0    0    0
##   forgotten     1    0    0
##   knew         23    3    6
##   know        376   21   23
##   known         0    0    1
##   knows         0    0    6
##   notice        7    0    0
##   noticed      14    0    1
##   realize       9    3    2
##   realized      8    0    2
##   realizing     1    0    0
##   recognize     2    0    1
##   remember     38    1    0
##   saw           1    0    0
##   see          14    4    2
##   seem          0    0    2
##   seemed        0    0    4
##   seems         0    0   32
##   seen          2    0    0
##   sees          0    0    1
##   sense         1    1    0
##   think      2758   38   26
##   thinking     15    0    2
##   thinks        1    1    7
##   thought     204    3   30
## 
## , ,  = conditional
## 
##             
##               1st  2nd  3rd
##   admit         0    0    0
##   believe       5    0    0
##   believed      0    1    0
##   discover      0    1    0
##   discovered    0    0    0
##   foresaw       0    1    0
##   forgotten     0    0    0
##   knew          0    0    0
##   know         11    2    2
##   known         0    0    0
##   knows         0    0    1
##   notice        0    0    0
##   noticed       0    0    0
##   realize       0    2    1
##   realized      1    0    0
##   realizing     0    0    0
##   recognize     0    0    0
##   remember      0    0    0
##   saw           0    1    0
##   see           2    0    1
##   seem          0    0    0
##   seemed        0    0    0
##   seems         0    0    0
##   seen          0    0    0
##   sees          0    0    0
##   sense         0    0    0
##   think        66    3    2
##   thinking      0    0    0
##   thinks        0    0    0
##   thought       3    0    1
## 
## , ,  = negation
## 
##             
##               1st  2nd  3rd
##   admit         0    0    0
##   believe      20    0    2
##   believed      0    0    0
##   discover      0    0    0
##   discovered    0    0    0
##   foresaw       0    0    0
##   forgotten     0    0    0
##   knew          0    0    0
##   know         48    1    3
##   known         0    0    0
##   knows         0    0    0
##   notice        0    1    0
##   noticed       0    0    0
##   realize      12    1    0
##   realized      0    0    0
##   realizing     0    0    0
##   recognize     0    0    0
##   remember      0    0    0
##   saw           0    0    0
##   see           4    0    0
##   seem          0    0    2
##   seemed        0    0    0
##   seems         0    0    0
##   seen          0    0    0
##   sees          0    0    0
##   sense         0    0    0
##   think       348    6    5
##   thinking      0    0    0
##   thinks        0    0    0
##   thought       0    0    0
## 
## , ,  = question
## 
##             
##               1st  2nd  3rd
##   admit         0    0    0
##   believe       0    4    0
##   believed      0    0    0
##   discover      0    0    0
##   discovered    0    0    0
##   foresaw       0    0    0
##   forgotten     0    0    0
##   knew          0    0    0
##   know          0    9    1
##   known         0    0    0
##   knows         0    0    0
##   notice        0    1    0
##   noticed       0    2    0
##   realize       0    0    0
##   realized      0    0    1
##   realizing     0    0    0
##   recognize     0    0    0
##   remember      0    0    0
##   saw           0    0    0
##   see           0    1    0
##   seem          0    0    0
##   seemed        0    0    0
##   seems         0    0    0
##   seen          0    0    0
##   sees          0    0    0
##   sense         0    0    0
##   think        11  146    2
##   thinking      0    1    0
##   thinks        0    0    0
##   thought       1    2    0
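
These cell counts matter for the next step, because drawing more items from a cell than it contains fails. A minimal sketch of a size check before sampling, using toy data rather than the corpus; the name n_per_cell and the toy values are hypothetical.

```r
# Toy data, not the corpus: lemma "see" has a single question case
toy <- data.frame(
  VerbLemma   = c(rep("see", 5), rep("think", 6)),
  Environment = c("", "", "negation", "negation", "question",
                  "", "", "negation", "negation", "question", "question")
)

n_per_cell <- 2  # hypothetical number of items wanted per environment

# Count cases per lemma and environment, then flag cells that are too small
cell_sizes <- as.data.frame(
  table(VerbLemma = toy$VerbLemma, Environment = toy$Environment),
  responseName = "n"
)
cell_sizes$enough <- cell_sizes$n >= n_per_cell

# Cells that cannot supply n_per_cell items
cell_sizes[!cell_sizes$enough, ]
```

Cells flagged here either need a smaller sample size or have to be excluded before sampling.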

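Select verbs. For see, think, and believe, sample two items per environment (four for know), keeping only complements longer than 12 characters. For see, questions are excluded, presumably because only one question case remains after the subject restriction, so two cannot be sampled.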

see = droplevels(d[d$VerbLemma == "see",])
believe = droplevels(d[d$VerbLemma == "believe",])
think = droplevels(d[d$VerbLemma == "think",])
know = droplevels(d[d$VerbLemma == "know",])
realize = droplevels(d[d$VerbLemma == "realize",])

sampledsee = see %>%
  filter(Environment != "question") %>%
  filter(nchar(as.character(Complement)) > 12) %>%
  group_by(Environment) %>%
  sample_n(.,2)
sampledsee = as.data.frame(sampledsee)
sampledsee
##     Item_ID Verb
## 1  27221:69  see
## 2   48236:7  saw
## 3  43219:23  see
## 4  96908:52  see
## 5 107158:12  see
## 6 151319:13  see
##                                                                 Complement
## 1                   that it's a lot better than what we have --n406ef5 now
## 2                                 that, it didn't stop crime in that state
## 3                                               it's a problem in the yard
## 4                                                that they should stop her
## 5                   that it would work for probably the majority of people
## 6 that that's a very, very valid, uh, thing for a company to say --n4006d7
##                                                                                                                                                           Sentence
## 1 as much as i didn't like school when i was going through it --n406ea6, from my perspective now i can see that it's a lot better than what we have --n406ef5 now.
## 2                                                                                                              and i saw that, it didn't stop crime in that state.
## 3                                                                                                      so i'll remember that if we see it's a problem in the yard.
## 4                                                                  and, you know, if she wanted --n40618c to go to combat, i do not see that they should stop her.
## 5                                                                                          but i can't see that it would work for probably the majority of people.
## 6                                                                     i really don't see that that's a very, very valid, uh, thing for a company to say --n4006d7,
##   Environment Subject     VerbNiteID      CompNiteID VerbType VerbLemma
## 1                   i sw2362_s211_23 sw2362_s211_525    sense       see
## 2                   i    sw2540_s7_3   sw2540_s7_503    sense       see
## 3 conditional      we  sw2492_s103_8 sw2492_s103_509    sense       see
## 4 conditional       i sw3096_s180_18 sw3096_s180_518    sense       see
## 5    negation       i   sw3200_s87_5  sw3200_s87_504    sense       see
## 6    negation       i    sw4049_s5_5   sw4049_s5_505    sense       see
##   SubjectPerson
## 1           1st
## 2           1st
## 3           1st
## 4           1st
## 5           1st
## 6           1st
nrow(sampledsee)
## [1] 6
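
The draw above is random, so rerunning the code selects different items. The following sketch shows one way to make such a draw repeatable by fixing the seed. It uses base R and toy data; the helper name sample_per_environment and the seed value are hypothetical, and this is not the procedure that produced the items shown here.

```r
# Toy data, not the corpus
toy <- data.frame(
  Item_ID     = sprintf("item%02d", 1:12),
  Environment = rep(c("", "conditional", "negation"), each = 4),
  stringsAsFactors = FALSE
)

# Hypothetical helper: draw n rows per environment with a fixed seed
sample_per_environment <- function(data, n, seed) {
  set.seed(seed)  # fixes the random number stream, so the draw can be repeated
  groups <- split(data, data$Environment)
  drawn <- lapply(groups, function(g) g[sample(nrow(g), n), , drop = FALSE])
  do.call(rbind, drawn)
}

first  <- sample_per_environment(toy, n = 2, seed = 2016)
second <- sample_per_environment(toy, n = 2, seed = 2016)

# Same seed, same data, same call: the two draws match
identical(first, second)
```

In the dplyr pipelines used here, a call to set.seed() before the pipeline should have the same effect.
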
sampledthink = think %>%
  filter(nchar(as.character(Complement)) > 12) %>%
  group_by(Environment) %>%
  sample_n(.,2)
sampledthink = as.data.frame(sampledthink)
sampledthink
##     Item_ID  Verb
## 1  68457:69 think
## 2  13690:15 think
## 3  68161:44 think
## 4 130429:93 think
## 5 170495:12 think
## 6 132556:12 think
## 7   57888:9 think
## 8  37539:14 think
##                                                                  Complement
## 1 --n407ddf is good because there's a lot of people that --n407e0e are very
## 2                                    we have probably conversed long enough
## 3                   that's where they're, uh, they're coming from --n40086a
## 4                                                   i'd become a vegetarian
## 5              our part of the country is particularly bad compared to some
## 6                                                             it is anymore
## 7                       that maybe that's why we had it this time --n401423
## 8                            the royals are going --n404c0a to do --n404c19
##                                                                                                                                                                                   Sentence
## 1 so, when we're invited --n407d8d to people's house --n407da4, he will not smoke in their house. which i think --n407ddf is good because there's a lot of people that --n407e0e are very,
## 2                                                                                                                                oh, dana, i think we have probably conversed long enough.
## 3                                                                        well, uh, if you, you know, compare the figures, i think that's where they're, uh, they're coming from --n40086a.
## 4                                                                      i've always said that if i, if i had to kill and clean and do my own, my own meat, i think i'd become a vegetarian.
## 5                                                                                                          and i don't think our part of the country is particularly bad compared to some.
## 6                                                                                                                                                         but i don't think it is anymore,
## 7                                                                                                                     don't you think that maybe that's why we had it this time --n401423.
## 8                                                                                                                    well how do you think the royals are going --n404c0a to do --n404c19.
##   Environment Subject     VerbNiteID      CompNiteID VerbType VerbLemma
## 1                   i sw2734_s214_24 sw2734_s214_523  control     think
## 2                   i  sw2139_s188_6 sw2139_s188_505  control     think
## 3 conditional       i  sw2734_s14_16  sw2734_s14_514  control     think
## 4 conditional       i sw3509_s140_33 sw3509_s140_530  control     think
## 5    negation       i  sw4633_s124_5 sw4633_s124_504  control     think
## 6    negation       i   sw3541_s29_5  sw3541_s29_504  control     think
## 7    question     you   sw2617_s24_4  sw2617_s24_503  control     think
## 8    question     you   sw2460_s98_5  sw2460_s98_506  control     think
##   SubjectPerson
## 1           1st
## 2           1st
## 3           1st
## 4           1st
## 5           1st
## 6           1st
## 7           2nd
## 8           2nd
nrow(sampledthink)
## [1] 8
sampledbelieve = believe %>%
  filter(nchar(as.character(Complement)) > 12) %>%
  group_by(Environment) %>%
  sample_n(.,2)
sampledbelieve = as.data.frame(sampledbelieve)
sampledbelieve
##     Item_ID     Verb
## 1  48405:31  believe
## 2  68509:10  believe
## 3 52611:113  believe
## 4 157530:93 believed
## 5 120299:10  believe
## 6  15981:10  believe
## 7  44862:17  believe
## 8   34213:7  believe
##                                                                                                                              Complement
## 1                                                               that people should be allowed --n404b5d to carry guns in their vehicles
## 2                                                                                                                        they're better
## 3 for murder, uh, rape, i even believe --n404059 incest, things, that --n404074 will permanently damage, uh, the character of the child
## 4                                                                                              that that was really the proper response
## 5                                                                                                                    it could be better
## 6                                                                                                                i was so brazen before
## 7                                                                                        only fifty percent of the people actually vote
## 8                                                         there ought --n405c0e to be legislation guiding the, uh, buyer and the seller
##                                                                                                                                                                                                                                                                                                             Sentence
## 1                                                                                                                                                                                                                  i don't, i don't believe that people should be allowed --n404b5d to carry guns in their vehicles.
## 2                                                                                                                                                                                                                                                                                    well, i believe they're better.
## 3 if i don't think see, if i don't believe that there's not a character change and the authorities agree, that this person needs --n404007 to be excused --n40401a, i believe for murder, uh, rape, i even believe --n404059 incest, things, that --n404074 will permanently damage, uh, the character of the child.
## 4                                                                                                                                              you, you would hope that if you were in that situation that you'd have the moral fortitude, uh, to hold out if you believed that that was really the proper response.
## 5                                                                                                                                                                                                                                                                                 i don't believe it could be better
## 6                                                                                                                                                                                                                                                                            i can't believe i was so brazen before.
## 7                                                                                                                                                                                                                                           well, uh, do you believe only fifty percent of the people actually vote.
## 8                                                                                                                                                                                                                      do you believe there ought --n405c0e to be legislation guiding the, uh, buyer and the seller.
##   Environment Subject     VerbNiteID      CompNiteID VerbType VerbLemma
## 1                   i sw2540_s116_11 sw2540_s116_511  control   believe
## 2                   i   sw2743_s12_4  sw2743_s12_504  control   believe
## 3 conditional       i  sw2571_s95_39  sw2571_s95_537  control   believe
## 4 conditional     you  sw4148_s56_32  sw4148_s56_532  control   believe
## 5    negation       i  sw3345_s120_4 sw3345_s120_504  control   believe
## 6    negation       i  sw2184_s110_4 sw2184_s110_504  control   believe
## 7    question     you    sw2504_s3_7   sw2504_s3_505  control   believe
## 8    question     you  sw2434_s146_3 sw2434_s146_503  control   believe
##   SubjectPerson
## 1           1st
## 2           1st
## 3           1st
## 4           2nd
## 5           1st
## 6           1st
## 7           2nd
## 8           2nd
nrow(sampledbelieve)
## [1] 8
sampledknow = know %>%
  filter(nchar(as.character(Complement)) > 12) %>%
  group_by(Environment) %>%
  sample_n(.,4)
sampledknow = as.data.frame(sampledknow)
sampledknow
##       Item_ID Verb
## 1    146931:7 know
## 2   35747:109 knew
## 3    176936:7 know
## 4     5335:10 know
## 5     2746:43 know
## 6    93863:97 know
## 7     2473:37 know
## 8   101970:46 know
## 9    24644:10 know
## 10   62374:27 know
## 11   45105:17 know
## 12 103965:203 know
## 13    2524:47 know
## 14    12548:9 know
## 15    755:168 know
## 16   160340:7 know
##                                                                                                                                                                                    Complement
## 1                                                                                                                          that we very often wore heels despite the fact that, it was tiring
## 2                                                                                                             i was going --n40725a to be able --n40726d to get it for --n407284 in arlington
## 3                                                                                                               that there are a lot of foreigners, uh, here, you know, doing my line of work
## 4                                                                                                                                               even if you watched a b c, n b c or the other
## 5                                                                                                                                            --n408b67 are going --n408b76 to be dressed down
## 6                                                                                                                                                                          that it's exercise
## 7                                                                                                                                                    that they're going --n40163e to be there
## 8                                                                                                                                                            we're not going --n403ea2 to eat
## 9                                                                                                                                                                      that i don't use drugs
## 10 that, um, if you step back from the current issue and look at it more intellectually, there are forever over, as long as we know there are races of people that --n4049ef are dropping out
## 11                                                                                                                           that, uh, a lot of people vote in primaries for that very reason
## 12                                                                                          that anybody would feel good, you know, like if you let someone like that loose in your community
## 13                                                                                                                                                   you're not supposed --n402c93 to do that
## 14                                                                                                                                                              john sununu is, uh, half arab
## 15                                                                                                                                                          it was going --n403324 to be good
## 16                                                                               that, like, something like fifty percent of the world's landfills is like paper. filled --n403569 with paper
##                                                                                                                                                                                                                                                                                                Sentence
## 1                                                                                                                                                                                                                        and i know that we very often wore heels despite the fact that, it was tiring.
## 2                                                                 but i, i guess i got a pretty good deal, because i went back to the town north mazda right off central and offered them the same price as what i knew i was going --n40725a to be able --n40726d to get it for --n407284 in arlington
## 3                                                                                                                                                                                                             and i know that there are a lot of foreigners, uh, here, you know, doing my line of work.
## 4                                                                                                                                                                                                                                           well, i know even if you watched a b c, n b c or the other,
## 5  if when they're meeting with the engineers who they know --n408b67 are going --n408b76 to be dressed down --n408b8d, if they come in, in, you know, a six hundred dollar three piece suit, it's going --n408bec to make the people they're meeting with --n408c17 feel very uncomfortable, you know,
## 6                                                                                                                                                                                                   i think that's the basic point of it, is i'm not, i, i don't enjoy it if i know that it's exercise,
## 7                                                                                                                                 and, and, you know, if i know that they're going --n40163e to be there, you know, you, i try --n401671 to really watch it and like you say, you know, really dress up
## 8                                                                                                                                                                                                                        so, or i give it to my mom and dad if i know we're not going --n403ea2 to eat,
## 9                                                                                                                                                                                                                                                               they don't know that i don't use drugs.
## 10                                                                                         i, i, don't know that, um, if you step back from the current issue and look at it more intellectually, there are forever over, as long as we know there are races of people that --n4049ef are dropping out.
## 11                                                                                                                                                                                                               uh, and i don't know that, uh, a lot of people vote in primaries for that very reason.
## 12                           but i, i really think that in, in terms of like this, i'd, i think that it, it might not be such a bad thing. because, i don't know that anybody, w-, in-, i don't know that anybody would feel good, you know, like if you let someone like that loose in your community.
## 13                                                                                                                                                                                                             he didn't at, at least say to us, did you know you're not supposed --n402c93 to do that.
## 14                                                                                                                                                                                                                                                      and did you know john sununu is, uh, half arab.
## 15                                             the, the difficulty with, with dancing with wolves is that when you make a movie like that --n403297, and you produce it --n4032b2, and then you star in it --n4032d5, uh, the question is, did he, did he really know it was going --n403324 to be good
## 16                                                                                                                                                                           did you know that, like, something like fifty percent of the world's landfills is like paper. filled --n403569 with paper.
##    Environment Subject     VerbNiteID      CompNiteID VerbType VerbLemma
## 1                    i   sw3883_s29_3  sw3883_s29_503  factive      know
## 2                    i sw2439_s220_38 sw2439_s220_536  factive      know
## 3                    i   sw4902_s18_3  sw4902_s18_503  factive      know
## 4                    i   sw2060_s20_4  sw2060_s20_504  factive      know
## 5  conditional    they sw2027_s215_13 sw2027_s215_519  factive      know
## 6  conditional       i  sw3080_s29_33  sw3080_s29_535  factive      know
## 7  conditional       i  sw2027_s30_13  sw2027_s30_513  factive      know
## 8  conditional       i sw3138_s148_17 sw3138_s148_514  factive      know
## 9     negation    they  sw2314_s102_4 sw2314_s102_504  factive      know
## 10    negation       i sw2657_s126_10 sw2657_s126_509  factive      know
## 11    negation       i  sw2504_s150_7 sw2504_s150_505  factive      know
## 12    negation       i sw3150_s177_69 sw3150_s177_567  factive      know
## 13    question     you  sw2027_s66_17  sw2027_s66_515  factive      know
## 14    question     you   sw2130_s87_4  sw2130_s87_503  factive      know
## 15    question      he sw2010_s100_57 sw2010_s100_556  factive      know
## 16    question     you  sw4175_s114_3 sw4175_s114_503  factive      know
##    SubjectPerson
## 1            1st
## 2            1st
## 3            1st
## 4            1st
## 5            3rd
## 6            1st
## 7            1st
## 8            1st
## 9            3rd
## 10           1st
## 11           1st
## 12           1st
## 13           2nd
## 14           2nd
## 15           3rd
## 16           2nd
nrow(sampledknow)
## [1] 16
sampledrealize = realize %>%
  filter(nchar(as.character(Complement)) > 12) %>%
  filter(Environment != "question") %>%
  group_by(Environment) %>%
  sample_n(.,4)
sampledrealize = as.data.frame(sampledrealize)
sampledrealize
##      Item_ID     Verb
## 1   47976:10  realize
## 2   31945:30 realized
## 3    89551:5  realize
## 4    74478:7  realize
## 5   70930:73 realized
## 6  169350:34  realize
## 7  166967:27  realize
## 8  113018:67  realize
## 9    4431:14  realize
## 10  70192:10  realize
## 11 105944:10  realize
## 12 106635:12  realize
##                                                                                                                                                                                                                           Complement
## 1                                                                                                                                                                                                                 that it was better
## 2                                                                                                                                                                                                             that it was eucalyptus
## 3                                                                                                                                                                                           that he would like his career to develop
## 4                                                                                                                                                                                                  it --n407bc1's important that, uh
## 5                                                                                                                                                            we could never buy a house anyway no matter how much we saved --n40244c
## 6  that what they print --n40223a is stuff that you probably knew --n402259 already and the stuff that you want --n40227c they're not printing --n402293 because the average person doesn't need or want --n4022be to know that much
## 7                                                                                                                                  that you're subject to paying, uh, income tax on something that you purchase --n400a02 mail order
## 8                                                                                                                                                                                                                  they're one short
## 9                                                                                                                                              that they ha-, were going --n403703 to reach out to people from, all over the country
## 10                                                                                                                                                                                                 that dallas had that same problem
## 11                                                                                                                                                                                                             it was that expensive
## 12                                                                                                                                                                                                                that i needed that
##                                                                                                                                                                                                                                                                                                                                                                   Sentence
## 1                                                                                                                                                                                                                                                   but now i realize that it was better because, um, they have got into a lot of trouble, because of lack of supervision.
## 2                                                                                                                                                                                                                                                                                                                   and then one day i, i realized that it was eucalyptus.
## 3                                                                                                                                                                                                                                                                                                                      i realize that he would like his career to develop,
## 4                                                                                                                                                                                                                                                                                                                       and you realize it --n407bc1's important that, uh,
## 5  so, we virtually did that for about two years, which --n4023dd worked real well, and then moved from california where we realized we could never buy a house anyway no matter how much we saved --n40244c --n402453, and moved to texas, bought a house --n40247e immediately, you know, which --n40249d, of course, is now devalued --n4024c0 with the housing market,
## 6                                                                       if you were deeply involved in it, then you immediately realize that what they print --n40223a is stuff that you probably knew --n402259 already and the stuff that you want --n40227c they're not printing --n402293 because the average person doesn't need or want --n4022be to know that much.
## 7                                                                                                                                                                                                                                 i mean what if you don't even realize that you're subject to paying, uh, income tax on something that you purchase --n400a02 mail order.
## 8                                                                                                                                                                                                                                        if they'll take a group of kids to the zoo or somewhere and then come back and not even count them and realize they're one short.
## 9                                                                                                                                                                                                                                                             and, i didn't realize that they ha-, were going --n403703 to reach out to people from, all over the country.
## 10                                                                                                                                                                                                                                                                                                                     i didn't realize that dallas had that same problem.
## 11                                                                                                                                                                                                                                                                                                                                 i didn't realize it was that expensive.
## 12                                                                                                                                                                                                                                                                                                                                 so i didn't realize that i needed that.
##    Environment Subject     VerbNiteID      CompNiteID VerbType VerbLemma
## 1                    i   sw2539_s80_4  sw2539_s80_504  factive   realize
## 2                    i sw2423_s145_11 sw2423_s145_510  factive   realize
## 3                    i  sw3049_s179_2 sw3049_s179_503  factive   realize
## 4                  you  sw2826_s230_3 sw2826_s230_503  factive   realize
## 5  conditional      we  sw2782_s69_25  sw2782_s69_525  factive   realize
## 6  conditional     you  sw4611_s45_12  sw4611_s45_512  factive   realize
## 7  conditional     you   sw4372_s11_9  sw4372_s11_511  factive   realize
## 8  conditional    they sw3266_s133_24 sw3266_s133_521  factive   realize
## 9     negation       i  sw2041_s116_6 sw2041_s116_504  factive   realize
## 10    negation       i  sw2772_s118_4 sw2772_s118_504  factive   realize
## 11    negation       i   sw3174_s89_4  sw3174_s89_504  factive   realize
## 12    negation       i   sw3188_s32_5  sw3188_s32_504  factive   realize
##    SubjectPerson
## 1            1st
## 2            1st
## 3            1st
## 4            2nd
## 5            1st
## 6            2nd
## 7            2nd
## 8            3rd
## 9            1st
## 10           1st
## 11           1st
## 12           1st
nrow(sampledrealize)
## [1] 12

Combine all the sampled cases and write them to a stimulus file.

library(jsonlite)  

merged = bind_rows(sampledsee,sampledthink,sampledbelieve,sampledknow,sampledrealize) %>% 
  mutate(NoThatComplement=gsub("^that ","",Complement,perl=T)) %>%
  mutate(StrippedComplement=gsub("--n[a-zA-Z0-9]*( |\\.|,|'|$)","",NoThatComplement,perl=T),StrippedSentence = gsub("--n[a-zA-Z0-9]*( |\\.|,|'|$)","",Sentence,perl=T)) %>%
  select(Item_ID,StrippedComplement,StrippedSentence,VerbType) %>%
  rename(id=Item_ID,complement=StrippedComplement,sentence=StrippedSentence,stimType=VerbType)
## Warning in rbind_all(x, .id): Unequal factor levels: coercing to character

## Warning in rbind_all(x, .id): Unequal factor levels: coercing to character

## Warning in rbind_all(x, .id): Unequal factor levels: coercing to character

## Warning in rbind_all(x, .id): Unequal factor levels: coercing to character

## Warning in rbind_all(x, .id): Unequal factor levels: coercing to character

## Warning in rbind_all(x, .id): Unequal factor levels: coercing to character

## Warning in rbind_all(x, .id): Unequal factor levels: coercing to character

## Warning in rbind_all(x, .id): Unequal factor levels: coercing to character

## Warning in rbind_all(x, .id): Unequal factor levels: coercing to character
merged = as.data.frame(merged)

writeOutJSON <- toJSON(merged,pretty=T)
write(writeOutJSON, "/Users/titlis/cogsci/projects/esslli2016_corpuspragmatics/factives/factives_experiment/js/stimuli.json")
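
The two gsub() calls above strip the TGrep2 trace markers (strings such as --n403324) from complements and sentences. The following is a minimal sketch of what that pattern does, applied to two complements from the sampled output above. The helper names strip_traces and strip_traces_keep are hypothetical, and the second pattern is only an assumed alternative, not what was used to generate the stimuli.

```r
# Hypothetical wrapper around the pattern used in the pipeline above.
# The pattern removes the marker AND the single character that follows it
# (space, period, comma, or apostrophe).
strip_traces <- function(x) {
  gsub("--n[a-zA-Z0-9]*( |\\.|,|'|$)", "", x, perl = TRUE)
}

ex1 <- strip_traces("it was going --n403324 to be good")
ex2 <- strip_traces("it --n407bc1's important that, uh")
print(ex1)  # "it was going to be good"
print(ex2)  # "it s important that, uh" (the apostrophe is consumed)

# Assumed alternative: remove only the marker and one preceding space,
# leaving the following punctuation untouched.
strip_traces_keep <- function(x) {
  gsub(" ?--n[a-zA-Z0-9]+", "", x, perl = TRUE)
}

ex3 <- strip_traces_keep("it --n407bc1's important that, uh")
print(ex3)  # "it's important that, uh"
```

Because the original pattern also consumes the character after the marker, a marker followed by a comma leaves two adjacent spaces behind, which is visible in some of the stimulus sentences printed later.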

Step 2: analyze projection data

The data analyzed in the following come from a mini projection judgment experiment conducted as part of the ESSLLI 2016 course “Corpus Methods for Research in Pragmatics”.

pd = data.frame()
temp = list.files("/Users/titlis/cogsci/projects/esslli2016_corpuspragmatics/factives/results/data/projection_data")
for (i in seq_along(temp)) {
  raw = fromJSON(paste("/Users/titlis/cogsci/projects/esslli2016_corpuspragmatics/factives/results/data/projection_data/",temp[i],sep=""))
  td = raw$trials
  subj = raw$subject_information
  td$language = subj$language
  td$enjoyment = subj$enjoyment
  td$asses = subj$asses
  td$age = subj$age
  td$gender = subj$gender
  td$education = subj$education
  td$comments = subj$comments
  td$participant = i
  pd = bind_rows(pd,td)
}
pd = as.data.frame(pd)
pd[sapply(pd, is.character)] <- lapply(pd[sapply(pd, is.character)],as.factor)
pd$participant = as.factor(as.character(pd$participant))

Because we didn’t record all information from the original database in the experiment, we need to merge that information back in by matching the TGrep2 item IDs.

row.names(d) = d$Item_ID
pd$subject = d[as.character(pd$id),]$Subject
pd$verb = d[as.character(pd$id),]$Verb
pd$environment = d[as.character(pd$id),]$Environment
pd$environment = as.character(pd$environment)
pd[pd$environment == "",]$environment = "main"
pd$environment = as.factor(as.character(pd$environment))
pd$subjectperson = d[as.character(pd$id),]$SubjectPerson
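
The lookup above works by using the item IDs as row names of the database and then indexing rows by ID. A minimal sketch on toy data (the IDs are borrowed from the database, but the Subject values and responses here are invented for illustration) shows the mechanism, along with an assumed alternative based on match() that does not require overwriting the row names:

```r
# Toy database and toy trial data; all values are illustrative only.
db <- data.frame(Item_ID = c("89:5", "118:5", "178:59"),
                 Subject = c("i", "you", "they"),
                 stringsAsFactors = FALSE)
trials <- data.frame(id = c("118:5", "89:5", "118:5"),
                     response = c(0.2, 0.9, 0.4),
                     stringsAsFactors = FALSE)

# Approach used above: index database rows by row name.
row.names(db) <- db$Item_ID
by_rowname <- db[as.character(trials$id), ]$Subject

# Assumed alternative: look up row positions with match().
by_match <- db$Subject[match(trials$id, db$Item_ID)]

print(by_rowname)  # "you" "i" "you"
print(by_match)    # "you" "i" "you"
```

Both approaches repeat the database value once per trial, so an item that was rated by several participants receives the same subject, verb, and environment on every row.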
summary(pd)
##       trial_type     response              id     
##  projection:375   Min.   :0.0000   96908:52 : 12  
##                   1st Qu.:0.4000   152345:18: 11  
##                   Median :0.7900   82397:7  : 11  
##                   Mean   :0.6485   100957:13: 10  
##                   3rd Qu.:0.9700   158110:10: 10  
##                   Max.   :1.0000   165765:12: 10  
##                                    (Other)  :311  
##                                                                                                     sentence  
##  and, you know, if she wanted to go to combat, i do not see that they should stop her.                  : 12  
##  do you know my grandparents live in durant.                                                            : 11  
##  so what else do you think is important.                                                                : 11  
##  and i didn't realize that they were putting dual, uh, air bags in that car now.                        : 10  
##  and she said how did you know those are the colors we used                                             : 10  
##  and they've also see that there's, there's a different way of life and those families are really close.: 10  
##  (Other)                                                                                                :311  
##                                                                                    complement 
##  they should stop her                                                                   : 12  
##  is important                                                                           : 11  
##  my grandparents live in durant                                                         : 11  
##  , you know, everybody is sitting here screaming about, we don't want a state income tax: 10  
##  are going to be dressed down                                                           : 10  
##  it's exercise                                                                          : 10  
##  (Other)                                                                                :311  
##    stim_type      language   enjoyment      asses          age     
##  control:112          : 25   -1:125    Confused: 25   27     : 75  
##  factive:210   Dutch  : 50   1 :125    No      : 25   30     : 50  
##  sense  : 53   English:125   2 :125    Yes     :325   31     : 50  
##                French : 25                            32     : 50  
##                German :100                                   : 25  
##                Russian: 25                            23     : 25  
##                Slovene: 25                            (Other):100  
##     gender    education                 comments    participant 
##  Female:250   -1: 50                        :350   1      : 25  
##  Male  :125   3 :100    This was so much fun: 25   10     : 25  
##               4 :225                               11     : 25  
##                                                    12     : 25  
##                                                    13     : 25  
##                                                    14     : 25  
##                                                    (Other):225  
##     subject          verb          environment  subjectperson
##  i      :224   know    :122   conditional:110   1st:230      
##  you    : 87   realize : 69   main       :106   2nd: 87      
##  they   : 41   think   : 48   negation   : 95   3rd: 58      
##  he     :  9   see     : 46   question   : 64                
##  people :  8   believe : 42                                  
##  we     :  6   realized: 19                                  
##  (Other):  0   (Other) : 29
nrow(pd)
## [1] 375

Understand your participant population

Get unique participant information.

pinfo = unique(pd[,c("language","enjoyment","asses","age","gender","comments")])
nrow(pinfo)
## [1] 15

Is everyone a native speaker of English?

table(pinfo$language)
## 
##           Dutch English  French  German Russian Slovene 
##       1       2       5       1       4       1       1

Did people love or hate the experiment?

table(pinfo$enjoyment)
## 
## -1  1  2 
##  5  5  5

Did people understand the experiment?

table(pinfo$asses)
## 
## Confused       No      Yes 
##        1        1       13

How old are our participants?

ggplot(pd, aes(x=as.numeric(as.character(age)))) +
  geom_histogram()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## Warning: Removed 25 rows containing non-finite values (stat_bin).

What is the gender of our participants?

table(pinfo$gender)
## 
## Female   Male 
##     10      5

What did participants have to say about the experiment?

pinfo$comments
##  [1]                                                               
##  [4]                                                               
##  [7]                                                               
## [10] This was so much fun                                          
## [13]                                                               
## Levels:  This was so much fun

Visualize your data

Plot the overall distribution of responses.

ggplot(pd, aes(x=response)) +
  geom_histogram()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

ggplot(pd, aes(x=response,fill=gender)) +
  geom_histogram(position="dodge")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

This histogram includes cognitive factives (know, realize), sense factives (see), and non-factives (believe, think). Let’s group histograms by these three groups.

ggplot(pd, aes(x=response)) +
  geom_histogram() +
  facet_wrap(~stim_type) 
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Do the distributions vary by whether the subject is 1st, 2nd, or 3rd person?

ggplot(pd, aes(x=response,fill=subjectperson)) +
  geom_histogram(position="dodge",binwidth=.1) +
  facet_wrap(~stim_type) 

Do the distributions vary by environment? Let’s re-order the environment factor levels so “main clause” is in the top row.

pd$env = factor(x=pd$environment, levels=c("main","negation","conditional","question"))

ggplot(pd, aes(x=response)) +
  geom_histogram(position="dodge") +
  facet_grid(environment~stim_type) 
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

ggplot(pd, aes(x=response)) +
  geom_histogram(position="dodge") +
  facet_grid(env~stim_type) 
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
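
The two plots differ only in the order of the facet rows, which follows the order of the factor levels. A minimal sketch on a toy vector (the values are illustrative) shows that factor() sorts levels alphabetically by default and that passing levels= changes the order without changing the data:

```r
# Toy environment labels; the default level order is alphabetical.
env_default <- factor(c("question", "main", "negation", "conditional", "main"))
print(levels(env_default))  # "conditional" "main" "negation" "question"

# Re-create the factor with an explicit level order, as done for pd$env above.
env_ordered <- factor(env_default,
                      levels = c("main", "negation", "conditional", "question"))
print(levels(env_ordered))  # "main" "negation" "conditional" "question"
```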

How much variability or agreement is there for each item?

cog = droplevels(pd[pd$stim_type == "factive",])
nrow(cog)
## [1] 210
ggplot(cog, aes(x=response,fill=verb)) +
  geom_histogram() +
  facet_wrap(~id)
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Examples of sentences that are judged very differently by different participants:

unique(pd[pd$id == "93863:97",]$sentence)
## [1] i think that's the basic point of it, is i'm not, i, i don't enjoy it if i know that it's exercise,
## 50 Levels: and finally they realized that they were, they were abusing them and weren't going to get out of the hole ...
unique(pd[pd$id == "755:168",]$sentence)
## [1] the, the difficulty with, with dancing with wolves is that when you make a movie like that  and you produce it  and then you star in it  uh, the question is, did he, did he really know it was going to be good
## 50 Levels: and finally they realized that they were, they were abusing them and weren't going to get out of the hole ...

Do the distributions vary both by environment and by person?

ggplot(pd, aes(x=response,fill=subjectperson)) +
  geom_histogram(position="dodge") +
  facet_grid(env~stim_type) 
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

This visualization is getting unwieldy. There are too many facets and too many variables being plotted, and the number of observations per cell is unbalanced. Let’s try to compress this information into a simpler form.

agr = pd %>%
  group_by(environment,stim_type) %>%
  summarise(mean_projection=mean(response))
ggplot(agr, aes(x=environment,y=mean_projection)) +
  geom_bar(stat="identity",position="dodge",color="black",fill="gray60") +
  facet_wrap(~stim_type)

This is easier to read, but we lost a lot of information about variability in each condition. Let’s instead plot condition means with bootstrapped 95% confidence intervals.

agr = pd %>%
  group_by(environment,stim_type) %>%
  summarise(mean_speakercommitment=mean(response),ymin=mean(response)-ci.low(response),ymax=mean(response)+ci.high(response))
ggplot(agr, aes(x=environment,y=mean_speakercommitment)) +
  geom_bar(stat="identity",position="dodge",color="black",fill="gray60") +
  geom_errorbar(aes(ymin=ymin,ymax=ymax),width=.25) +
  facet_wrap(~stim_type)
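
The ci.low and ci.high functions come from helpers.R, which is not shown here. As a rough illustration of the idea, here is a minimal sketch of a percentile bootstrap for the mean in base R. The names boot_means, ci_low_sketch, and ci_high_sketch are hypothetical stand-ins, the ratings are toy values, and the actual helpers may be implemented differently.

```r
set.seed(42)  # make the resampling reproducible

# Resample x with replacement n_boot times and record the mean each time.
boot_means <- function(x, n_boot = 1000) {
  replicate(n_boot, mean(x[sample.int(length(x), replace = TRUE)]))
}

# Distance from the sample mean down to the 2.5% quantile of bootstrap means.
ci_low_sketch <- function(x, n_boot = 1000) {
  mean(x) - quantile(boot_means(x, n_boot), 0.025, names = FALSE)
}

# Distance from the sample mean up to the 97.5% quantile of bootstrap means.
ci_high_sketch <- function(x, n_boot = 1000) {
  quantile(boot_means(x, n_boot), 0.975, names = FALSE) - mean(x)
}

# Toy slider ratings between 0 and 1.
x <- c(0.05, 0.40, 0.62, 0.79, 0.88, 0.97, 1.00, 0.93, 0.71, 0.35)
ymin <- mean(x) - ci_low_sketch(x)
ymax <- mean(x) + ci_high_sketch(x)
print(c(ymin = ymin, mean = mean(x), ymax = ymax))
```

Written this way, ymin and ymax are simply the 2.5% and 97.5% quantiles of the resampled means, which matches how the error bar limits are computed in the summarise() call above.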

Let’s add the subject information back in.

agr = pd %>%
  group_by(environment,subjectperson,stim_type) %>%
  summarise(mean_projection=mean(response),ymin=mean(response)-ci.low(response),ymax=mean(response)+ci.high(response))
dodge = position_dodge(.9)
ggplot(agr, aes(x=environment,y=mean_projection,fill=subjectperson)) +
  geom_bar(stat="identity",position=dodge,color="black") +
  geom_errorbar(aes(ymin=ymin,ymax=ymax),width=.25,position=dodge) +
  facet_wrap(~stim_type)

This looks ugly. Let’s try plotting points instead and make the x axis labels more readable.

agr = pd %>%
  group_by(environment,subjectperson,stim_type) %>%
  summarise(mean_projection=mean(response),ymin=mean(response)-ci.low(response),ymax=mean(response)+ci.high(response))
ggplot(agr, aes(x=environment,y=mean_projection,color=subjectperson)) +
  geom_point() +
  geom_errorbar(aes(ymin=ymin,ymax=ymax),width=.25) +
  facet_wrap(~stim_type) +
  theme(axis.text.x = element_text(angle=45,vjust=1,hjust=1))

Does complement length matter?

pd$complement_length = nchar(as.character(pd$complement))
ggplot(pd, aes(x=complement_length,y=response)) +
  geom_point() +
  geom_smooth(method="lm")

Does complement length matter differently for different verbs?

ggplot(pd, aes(x=complement_length,y=response,color=stim_type)) +
  geom_point() +
  geom_smooth(method="lm")

Model your data

We want to know whether each of a variety of factors predicts projection ratings. Because the slider scale is continuous, we use linear models to address this question. Because there is likely to be random variability by participants and by items, we use mixed effects models. Keep in mind that we don’t have nearly enough data to run this analysis properly, and for the sake of the example we are only investigating the effect of two factors (environment and subject) instead of the full slew of factors that we have reason to believe might matter (including, e.g., intonation, frequency of the verb, whether or not the complementizer was omitted, etc.).

First we only look at the factives. We set the contrasts of the environment predictor so “main clause” is the reference level that all other environments are compared to. For the person predictor, “1st person” is the default reference level, and we’ll leave it at that. We want to include random effects both by participants and by items. In this case, because every item necessarily occurs with one environment and with one subject, we can’t include by-item random slopes for environment and subject, so we only include random by-item intercepts. In contrast, each participant did see multiple environments and subjects, so we can include by-participant random slopes for both predictors in addition to random intercepts. Play with this and you will see, however, that the model does not converge if you include both random slopes, so the model here includes only the by-participant random slope for environment.

factives = droplevels(pd[pd$stim_type == "factive",])
contrasts(factives$environment) = cbind("conditional.v.main"=c(1,0,0,0),"negation.v.main"=c(0,0,1,0),"question.v.main"=c(0,0,0,1))
m = lmer(response ~ environment + subjectperson + (1|id) + (1+environment|participant), data=factives)
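
To see why this contrast matrix makes “main” the reference level, it helps to print it for a toy factor with the same four levels. The sketch below uses only base R. The relevel() call at the end is an assumed alternative that yields the same comparisons under R's default treatment coding, not what the model above uses.

```r
# Toy factor with the same four (alphabetically ordered) levels.
env <- factor(c("conditional", "main", "negation", "question"))

# Same contrast matrix as above: one column per comparison against "main".
contrasts(env) <- cbind("conditional.v.main" = c(1, 0, 0, 0),
                        "negation.v.main"    = c(0, 0, 1, 0),
                        "question.v.main"    = c(0, 0, 0, 1))
cm <- contrasts(env)
print(cm)  # the "main" row is all zeros, so "main" is the baseline

# Assumed alternative: make "main" the first level and keep default coding.
env_ref <- relevel(env, ref = "main")
cm_ref <- contrasts(env_ref)
print(cm_ref)
```

The rows of the matrix follow the level order, so the second entry of each column corresponds to “main”. Because that row is zero in every column, the intercept estimates the mean for main clauses.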

The model summary tells us: the projection mean at the reference level of both predictors (i.e. main clauses with 1st person subjects) is .91. Compared to that baseline, ratings are lower (i.e. there is less projection) in conditionals. No other fixed effects have t values with absolute values that are big enough to suggest the presence of an effect.

However, a glance at the correlation matrix tells us that there is a lot of collinearity between fixed effects (values > .4). This points to the effects being unreliable (remember, we have very little data!).

summary(m)
## Linear mixed model fit by REML ['lmerMod']
## Formula: 
## response ~ environment + subjectperson + (1 | id) + (1 + environment |  
##     participant)
##    Data: factives
## 
## REML criterion at convergence: -33.1
## 
## Scaled residuals: 
##      Min       1Q   Median       3Q      Max 
## -3.10814 -0.37398  0.07102  0.46483  2.55281 
## 
## Random effects:
##  Groups      Name                          Variance  Std.Dev. Corr             
##  id          (Intercept)                   0.0400578 0.20014                   
##  participant (Intercept)                   0.0050241 0.07088                   
##              environmentconditional.v.main 0.0127774 0.11304  -0.60            
##              environmentnegation.v.main    0.0002876 0.01696  -0.77  0.97      
##              environmentquestion.v.main    0.0055723 0.07465   0.75  0.08 -0.14
##  Residual                                  0.0305469 0.17478                   
## Number of obs: 210, groups:  id, 28; participant, 15
## 
## Fixed effects:
##                               Estimate Std. Error t value
## (Intercept)                    0.90983    0.09363   9.717
## environmentconditional.v.main -0.33012    0.11089  -2.977
## environmentnegation.v.main    -0.10177    0.11861  -0.858
## environmentquestion.v.main    -0.13706    0.15253  -0.899
## subjectperson2nd               0.06620    0.13470   0.491
## subjectperson3rd              -0.15920    0.11635  -1.368
## 
## Correlation of Fixed Effects:
##             (Intr) envrnmntc.. envrnmntn.. envrnmntq.. sbjct2
## envrnmntc.. -0.582                                           
## envrnmntn.. -0.765  0.445                                    
## envrnmntq.. -0.213  0.392       0.181                        
## sbjctprsn2n -0.386 -0.095       0.305      -0.509            
## sbjctprsn3r -0.546  0.060       0.432      -0.157       0.452
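
As a check on how to read the fixed effects table, the t values are simply each estimate divided by its standard error. The sketch below copies the estimates and standard errors from the summary above and recomputes them. The |t| > 2 cutoff is a common rule of thumb, assumed here for illustration, and not a formal significance test.

```r
# Estimates and standard errors copied from the fixed effects table above.
est <- c("(Intercept)"        = 0.90983,
         "conditional.v.main" = -0.33012,
         "negation.v.main"    = -0.10177,
         "question.v.main"    = -0.13706,
         "subjectperson2nd"   = 0.06620,
         "subjectperson3rd"   = -0.15920)
se <- c(0.09363, 0.11089, 0.11861, 0.15253, 0.13470, 0.11635)

# t value = estimate / standard error.
t_val <- est / se
print(round(t_val, 2))

# Coefficients passing the rule-of-thumb threshold |t| > 2.
large_t <- names(est)[abs(t_val) > 2]
print(large_t)  # "(Intercept)" "conditional.v.main"

# Estimated mean rating for conditionals with 1st person subjects:
# intercept plus the conditional coefficient.
cond_mean <- unname(est["(Intercept)"] + est["conditional.v.main"])
print(cond_mean)  # 0.57971
```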