Trials were excluded if the speaker did not provide any description before the listener clicked.
I may be missing some echoes, but the rate of echoing is high enough that we can't subset on it (and it isn't independent of child behavior), so we won't exclude trials on this basis; it's just a caveat for interpretation.
Performance on the practice trials looks good. Is there some improvement over time in the regular trials?
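For reference, a minimal sketch of the kind of call that could produce the accuracy model summarized below (the formula and family are taken from the printed output; the confidence-interval method is an assumption):

```r
library(lme4)

# Logistic mixed model: accuracy over trials, with by-game slopes for trial
# number and by-target random intercepts (formula as in the summary below).
acc_mod <- glmer(
  correct.num ~ trialNum + (trialNum | game) + (1 | target),
  data   = data_for_mods,
  family = binomial
)
summary(acc_mod)

# Confidence intervals for the fixed effects; the method behind the intervals
# printed below is an assumption.
confint(acc_mod, parm = "beta_", method = "Wald")
```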
## Generalized linear mixed model fit by maximum likelihood (Laplace
## Approximation) [glmerMod]
## Family: binomial ( logit )
## Formula: correct.num ~ trialNum + (trialNum | game) + (1 | target)
## Data: data_for_mods
##
## AIC BIC logLik deviance df.resid
## 203.7 224.3 -95.8 191.7 225
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -3.0088 0.2645 0.3421 0.3999 0.7981
##
## Random effects:
## Groups Name Variance Std.Dev. Corr
## game (Intercept) 1.34163 1.1583
## trialNum 0.02838 0.1685 -0.88
## target (Intercept) 0.04083 0.2021
## Number of obs: 231, groups: game, 20; target, 4
##
## Fixed effects:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 1.26009 0.50139 2.513 0.012 *
## trialNum 0.12613 0.08315 1.517 0.129
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr)
## trialNum -0.796
## 2.5 % 97.5 %
## (Intercept) 0.36827302 2.4933657
## trialNum -0.04301293 0.3080279
Accuracy is probably increasing, but the confidence interval on the trial-number effect overlaps 0.
I also looked at whether speakers initiated their descriptions faster in later trials. Some odd negative outliers suggest a timing glitch in at least one experiment, and beyond that there is no clear evidence of a speed-up (this measure also relies on more layers of timing accuracy and alignment).
TO DO: try to understand the negative outliers.
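One quick way to pull those trials out for inspection; this is only a sketch, and `init_time_sec` is a hypothetical column name for the description-initiation latency:

```r
library(dplyr)

# Trials with negative initiation times, to check for timing/alignment
# glitches (init_time_sec is a hypothetical column name).
data_for_mods |>
  filter(init_time_sec < 0) |>
  select(game, target, trialNum, init_time_sec) |>
  arrange(init_time_sec)
```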
How long do trials take?
Note that some high outliers are cut out of view.
Responses do get faster over time!
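A minimal sketch of the trial-duration model summarized below, assuming the same data frame and using the formula shown in the output (the interval method is an assumption):

```r
library(lme4)

# Linear mixed model: trial duration in seconds over trials.
time_mod <- lmer(
  time_sec ~ trialNum + (trialNum | game) + (1 | target),
  data = data_for_mods
)
summary(time_mod)
confint(time_mod, parm = "beta_", method = "Wald")  # interval method is an assumption
```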
## Linear mixed model fit by REML ['lmerMod']
## Formula: time_sec ~ trialNum + (trialNum | game) + (1 | target)
## Data: data_for_mods
##
## REML criterion at convergence: 1936.5
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -1.5336 -0.5150 -0.2638 0.1830 6.6696
##
## Random effects:
## Groups Name Variance Std.Dev. Corr
## game (Intercept) 63.8016 7.9876
## trialNum 0.2185 0.4675 -1.00
## target (Intercept) 2.3601 1.5363
## Residual 248.8829 15.7760
## Number of obs: 230, groups: game, 20; target, 4
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 27.5308 2.8056 9.813
## trialNum -1.2360 0.3246 -3.807
##
## Correlation of Fixed Effects:
## (Intr)
## trialNum -0.790
## optimizer (nloptwrap) convergence code: 0 (OK)
## boundary (singular) fit: see help('isSingular')
## 2.5 % 97.5 %
## (Intercept) 22.072032 32.9394076
## trialNum -1.878613 -0.5984693
Trials are getting faster over time, by about 1.2 seconds per trial.
The number of words is going up slightly over trials, if anything, not down. (Although for this task, I'm not sure I'd expect adults to go down rather than to start out efficient and stay there.) But it's still a different pattern!
We could also look at the total number of words that are at least vaguely game-related, although that count will include "it looks like", repetitions, and inconsistently tagged "yes" responses to "do you see it?", so I'm not sure how useful it is.
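A minimal sketch of the word-count model summarized below, using the formula shown in the output (`words_for_mod` holds one row per trial with the speaker's word count; the interval method is an assumption):

```r
library(lme4)

# Linear mixed model: number of words in the speaker's description over trials.
word_mod <- lmer(
  word_count ~ trialNum + (trialNum | game) + (1 | target),
  data = words_for_mod
)
summary(word_mod)
confint(word_mod, parm = "beta_", method = "Wald")  # interval method is an assumption
```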
## Linear mixed model fit by REML ['lmerMod']
## Formula: word_count ~ trialNum + (trialNum | game) + (1 | target)
## Data: words_for_mod
##
## REML criterion at convergence: 1145.9
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -2.8780 -0.4702 -0.2482 0.2676 4.5879
##
## Random effects:
## Groups Name Variance Std.Dev. Corr
## game (Intercept) 1.42184 1.1924
## trialNum 0.02697 0.1642 1.00
## target (Intercept) 0.00000 0.0000
## Residual 8.98571 2.9976
## Number of obs: 220, groups: game, 19; target, 4
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 3.72578 0.47570 7.832
## trialNum 0.10821 0.07038 1.538
##
## Correlation of Fixed Effects:
## (Intr)
## trialNum -0.283
## optimizer (nloptwrap) convergence code: 0 (OK)
## boundary (singular) fit: see help('isSingular')
## 2.5 % 97.5 %
## (Intercept) 2.77025637 4.6783598
## trialNum -0.03299796 0.2487775
If anything, there is a slight positive relationship between trial number and description length.
We expect high agreement when the target is the same and low agreement when the targets are different. This checks out.
Descriptions of the same item from different kids are more similar than descriptions of different items. A kid describes the same item more like their partner describes it than like a random other kid does. (Note that within a game we can only compare targets across blocks, so we use cross-block comparisons for everything.)
We're not seeing a noticeable change in similarity over time.
For the above, we could try subsetting to successful utterances or something similar; one way the pairwise similarities could be computed is sketched below.
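The similarity measure isn't spelled out in this section; as one concrete possibility, here is a sketch using cosine similarity over bag-of-words vectors of each trial's description, comparing same vs. different target and same vs. different game, restricted to cross-block pairs. The `description` and `block` columns are assumptions.

```r
library(dplyr)
library(tidyr)

# Split a description into lowercase word tokens.
tokenize <- function(x) strsplit(tolower(x), "[^a-z']+")[[1]]

# Cosine similarity between two bags of words.
cosine_sim <- function(a, b) {
  vocab <- union(a, b)
  va <- table(factor(a, levels = vocab))
  vb <- table(factor(b, levels = vocab))
  sum(va * vb) / (sqrt(sum(va^2)) * sqrt(sum(vb^2)))
}

# All cross-block pairs of descriptions, labeled by whether they come from the
# same game (dyad) and whether they describe the same target.
pairs <- crossing(
  a = seq_len(nrow(words_for_mod)),
  b = seq_len(nrow(words_for_mod))
) |>
  filter(a < b,
         words_for_mod$block[a] != words_for_mod$block[b]) |>
  mutate(
    same_game   = words_for_mod$game[a]   == words_for_mod$game[b],
    same_target = words_for_mod$target[a] == words_for_mod$target[b],
    sim = mapply(function(i, j) cosine_sim(tokenize(words_for_mod$description[i]),
                                           tokenize(words_for_mod$description[j])),
                 a, b)
  )

# Mean similarity by condition: same vs. different target, within vs. across dyads.
pairs |>
  group_by(same_game, same_target) |>
  summarise(mean_sim = mean(sim, na.rm = TRUE), .groups = "drop")
```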
What sorts of wacky descriptions do kids use successfully?
In addition to the graphical analyses above, we said we planned to run the model DV ~ trial_num + (trial_num | dyad) + (1 | target) for the DVs accuracy (logistic model), speed (linear model), and number of words in the speaker's description (linear model).