Trials were excluded if the speaker did not provide any description before the listener clicked.

Rate of echoing

I may be missing some echoes, but the rate of echoing is high enough that we can’t subset based on it (and it isn’t independent of child behavior anyway). So we’re not going to exclude trials on this basis; it will just be a caveat for interpretation.

Accuracy over time

Accuracy on the practice trials is great; there may also be some improvement over time in the regular trials.

## Generalized linear mixed model fit by maximum likelihood (Laplace
##   Approximation) [glmerMod]
##  Family: binomial  ( logit )
## Formula: correct.num ~ trialNum + (trialNum | game) + (1 | target)
##    Data: data_for_mods
## 
##      AIC      BIC   logLik deviance df.resid 
##    203.7    224.3    -95.8    191.7      225 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -3.0088  0.2645  0.3421  0.3999  0.7981 
## 
## Random effects:
##  Groups Name        Variance Std.Dev. Corr 
##  game   (Intercept) 1.34163  1.1583        
##         trialNum    0.02838  0.1685   -0.88
##  target (Intercept) 0.04083  0.2021        
## Number of obs: 231, groups:  game, 20; target, 4
## 
## Fixed effects:
##             Estimate Std. Error z value Pr(>|z|)  
## (Intercept)  1.26009    0.50139   2.513    0.012 *
## trialNum     0.12613    0.08315   1.517    0.129  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##          (Intr)
## trialNum -0.796
##                   2.5 %    97.5 %
## (Intercept)  0.36827302 2.4933657
## trialNum    -0.04301293 0.3080279

Accuracy is probably increasing, but the confidence interval on the trial-number effect overlaps 0.
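For reference, a minimal sketch of how this accuracy model could be fit with lme4, assuming `data_for_mods` has the columns shown in the output above (`correct.num`, `trialNum`, `game`, `target`); the exact method behind the printed intervals (profile vs. bootstrap) is an assumption.

```r
library(lme4)

# Logistic mixed model: accuracy as a function of trial number, with random
# slopes for trial number by game and random intercepts by target.
acc_mod <- glmer(
  correct.num ~ trialNum + (trialNum | game) + (1 | target),
  data = data_for_mods,
  family = binomial
)

summary(acc_mod)

# Confidence intervals for the fixed effects (the 2.5% / 97.5% rows above);
# confint() defaults to profile intervals for merMod objects.
confint(acc_mod, parm = "beta_")
```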

Speed to start of description

I was trying to look at whether speakers initiated their descriptions faster later in the game. Some weird negative outliers suggest a timing glitch in at least one experiment. Beyond that, it doesn’t look like speakers start faster over time (and this measure also relies on more layers of timing accuracy and alignment).

TODO: try to understand the negative outliers.
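One quick starting point would be to pull out the suspect trials directly; `desc_onset_sec` is a hypothetical column name for the time-to-first-description measure, and the data frame holding it may differ.

```r
library(dplyr)

# Inspect trials with a negative time-to-first-description, which should be
# impossible and points at a timing/alignment glitch.
# `desc_onset_sec` is a hypothetical column name.
data_for_mods %>%
  filter(desc_onset_sec < 0) %>%
  select(game, target, trialNum, desc_onset_sec) %>%
  arrange(desc_onset_sec)
```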

Speed to response

How long do trials take?

Note: some high outliers are cut out of view.

Speed to response does get faster over time!

## Linear mixed model fit by REML ['lmerMod']
## Formula: time_sec ~ trialNum + (trialNum | game) + (1 | target)
##    Data: data_for_mods
## 
## REML criterion at convergence: 1936.5
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -1.5336 -0.5150 -0.2638  0.1830  6.6696 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev. Corr 
##  game     (Intercept)  63.8016  7.9876       
##           trialNum      0.2185  0.4675  -1.00
##  target   (Intercept)   2.3601  1.5363       
##  Residual             248.8829 15.7760       
## Number of obs: 230, groups:  game, 20; target, 4
## 
## Fixed effects:
##             Estimate Std. Error t value
## (Intercept)  27.5308     2.8056   9.813
## trialNum     -1.2360     0.3246  -3.807
## 
## Correlation of Fixed Effects:
##          (Intr)
## trialNum -0.790
## optimizer (nloptwrap) convergence code: 0 (OK)
## boundary (singular) fit: see help('isSingular')
##                 2.5 %     97.5 %
## (Intercept) 22.072032 32.9394076
## trialNum    -1.878613 -0.5984693

Responses are getting faster over time, by about 1 second per trial.
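As a sketch, the model above could be fit as follows; note the singular-fit warning in the output (the game-level slope/intercept correlation is estimated at -1), so a zero-correlation version is shown as one possible follow-up, not necessarily what was done here.

```r
library(lme4)

# Linear mixed model: trial duration (seconds) as a function of trial number.
rt_mod <- lmer(
  time_sec ~ trialNum + (trialNum | game) + (1 | target),
  data = data_for_mods
)

isSingular(rt_mod)  # TRUE per the boundary warning in the output above

# One common follow-up is to drop the slope/intercept correlation:
rt_mod_zc <- lmer(
  time_sec ~ trialNum + (trialNum || game) + (1 | target),
  data = data_for_mods
)

# Fixed-effect confidence intervals (as in the 2.5% / 97.5% rows above).
confint(rt_mod, parm = "beta_")
```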

Length of description from speaker

Description length is going up slightly, if anything, not down. (Although for this task, I’m not sure I’d expect adults to get shorter over time rather than starting out short and staying there.) But it’s still a different pattern!

We could also look at the total number of words that are at least vaguely game-related, although that count would include “it looks like”, repetitions, and inconsistently tagged “Yes” responses to “do you see it?”, so I’m not sure how useful it would be.
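A rough sketch of one way to get per-description word counts with stringr (not necessarily how `word_count` in `words_for_mod` was computed); `transcripts` and `description` are hypothetical names for the table and column holding the transcribed speaker utterances.

```r
library(dplyr)
library(stringr)

# Count whitespace-delimited tokens in each transcribed description.
# `transcripts` and `description` are hypothetical names.
word_counts <- transcripts %>%
  mutate(word_count = str_count(str_squish(description), "\\S+"))
```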

## Linear mixed model fit by REML ['lmerMod']
## Formula: word_count ~ trialNum + (trialNum | game) + (1 | target)
##    Data: words_for_mod
## 
## REML criterion at convergence: 1145.9
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -2.8780 -0.4702 -0.2482  0.2676  4.5879 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev. Corr
##  game     (Intercept) 1.42184  1.1924       
##           trialNum    0.02697  0.1642   1.00
##  target   (Intercept) 0.00000  0.0000       
##  Residual             8.98571  2.9976       
## Number of obs: 220, groups:  game, 19; target, 4
## 
## Fixed effects:
##             Estimate Std. Error t value
## (Intercept)  3.72578    0.47570   7.832
## trialNum     0.10821    0.07038   1.538
## 
## Correlation of Fixed Effects:
##          (Intr)
## trialNum -0.283
## optimizer (nloptwrap) convergence code: 0 (OK)
## boundary (singular) fit: see help('isSingular')
##                   2.5 %    97.5 %
## (Intercept)  2.77025637 4.6783598
## trialNum    -0.03299796 0.2487775

If anything, a slight positive relationship.

SBERT

Check practice trials

We expect high similarity when the items are the same and low similarity when they’re different. This checks out.
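The similarity measure itself is just cosine similarity between SBERT sentence embeddings; below is a toy sketch of the computation in R, assuming the embeddings are computed elsewhere (e.g., with a sentence-transformers model) and loaded as a matrix with one row per description.

```r
# Toy stand-in for SBERT embeddings: one row per description, 384 dimensions
# (the real vectors would come from a sentence-transformers model).
emb <- matrix(rnorm(4 * 384), nrow = 4)

# Row-normalize, then the pairwise cosine similarity matrix is a crossproduct.
normed <- emb / sqrt(rowSums(emb^2))
sim_mat <- normed %*% t(normed)

sim_mat  # e.g., compare same-item vs. different-item pairs from this matrix
```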

Within a game

Descriptions of the same item from different kids are more similar than descriptions of different items. A child’s description of an item is more similar to how their partner describes it than to how a random other kid does. (Note that within a game we can only compare targets across blocks, so we use cross-block comparisons for everything.)

Do descriptions get more different?

We’re not seeing a noticeable change over time.

For the above, we could try subsetting to successful utterances or something similar.

The fun part

What sorts of wacky descriptions do kids use successfully?

Pre-reg

In addition to the graphical analyses (above), we said: “We plan to run the model: DV ~ trial_num + (trial_num|dyad) + (1|target)” for the DVs accuracy (logistic model), speed (linear model), and number of words in the speaker’s description (linear model).

Pretty for CAMP