Note

I have annotated the transcripts of the full games, marking which parts are referential (descriptive of a flower) and which flower (as best I can tell) each one refers to.

See https://docs.google.com/presentation/d/1qcDRZzHbLhE-fp6W9nKcbGouj4w01qCLCy-yt072iiY/edit?usp=sharing for the correspondence between flower numbers and images.

Cogsci notes

Key points:

Prep

How much data

## total utterances
## # A tibble: 1 × 1
##       n
##   <int>
## 1  3404
## Utterances / game
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    3.00   63.00   77.00   97.26  114.50  264.00
## Utterances / flower
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   32.00   52.75   67.50   70.92   84.50  134.00

Word count basics

Takeaway: across the condition manipulation, the number of utterances doesn’t decrease, but the length of each one does.

The graph could use finessing if it goes in the paper.

How many different flowers described / trial?

We can probably give this as summary stats rather than as a graph.

Per game

Per person

Model

## stan_lmer
##  family:       gaussian [identity]
##  formula:      numword ~ trialNum + (1 | gameId) + (1 | player) + (1 | flower)
##  observations: 3404
## ------
##             Median MAD_SD
## (Intercept)  5.7    0.2  
## trialNum    -0.2    0.0  
## 
## Auxiliary parameter(s):
##       Median MAD_SD
## sigma 2.1    0.0   
## 
## Error terms:
##  Groups   Name        Std.Dev.
##  player   (Intercept) 0.65    
##  flower   (Intercept) 0.63    
##  gameId   (Intercept) 0.96    
##  Residual             2.09    
## Num. levels: player 104, flower 48, gameId 35 
## 
## ------
## * For help interpreting the printed output see ?print.stanreg
## * For info on the priors used see ?prior_summary.stanreg
## # A tibble: 2 × 3
##   Term        Estimate `Credible Interval`
##   <chr>          <dbl> <chr>              
## 1 (Intercept)     5.67 [5.26, 6.09]       
## 2 trialNum       -0.15 [-0.16, -0.14]
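To make the slope concrete, here is a minimal sketch that plugs the two point estimates from the table above into the fixed-effects part of the model. It is a Python illustration rather than the R code that produced the output, the function name `expected_words` is made up, and it ignores the random intercepts for game, player, and flower, so it describes an "average" game, player, and flower only.

```python
# Point estimates copied from the summary table above.
INTERCEPT = 5.67   # expected words per utterance at trialNum = 0
SLOPE = -0.15      # change in expected words per additional trial


def expected_words(trial_num):
    """Fixed-effects prediction only; random intercepts are set to zero."""
    return INTERCEPT + SLOPE * trial_num


# Each additional 10 trials shortens the expected utterance by 1.5 words.
drop_per_ten_trials = expected_words(0) - expected_words(10)
```

The range of trialNum is not stated in these notes, so the sketch should not be extrapolated beyond the trials actually played.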

Sbert notes

Some pre-commentary on analytic choices:

  • It’s unclear whether we should look at referential statements individually (i.e. treating anything that occurred as a separate utterance, after a line break, as separate) or concatenate everything one player said about one flower (e.g. everything Laju said about flower yellow3) into one statement. I think the first is generally better, since sometimes a flower is described twice (perhaps in the same way), as in “the best values are 3 in a row and big fluffy” … “okay, I’ll take big fluffy”. But there are also times when someone has a line break before continuing or clarifying a description. For now we go with single utterances.
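The two options above can be sketched as follows. This is a Python illustration under assumptions, not the pipeline actually used: the row layout, the ids, and the function names `per_utterance` and `concatenated` are all hypothetical, and the texts just echo the “big fluffy” example.

```python
from collections import defaultdict

# Hypothetical annotated rows: (game_id, player, flower, referential_text).
# The ids are placeholders, not real data.
rows = [
    ("game1", "playerA", "flower1", "big fluffy"),
    ("game1", "playerA", "flower2", "3 in a row"),
    ("game1", "playerA", "flower1", "big fluffy"),
]


def per_utterance(rows):
    """Option 1: every referential utterance stays its own unit."""
    return list(rows)


def concatenated(rows, sep=" "):
    """Option 2: join everything one player said about one flower in a game."""
    grouped = defaultdict(list)
    for game, player, flower, text in rows:
        grouped[(game, player, flower)].append(text)
    return {key: sep.join(texts) for key, texts in grouped.items()}


singles = per_utterance(rows)   # 3 units, the repeat is kept separate
merged = concatenated(rows)     # 2 units, the repeat is folded together
```

With concatenation, a repeated description becomes a doubled string, which is one reason the single-utterance option is the default here.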

Sbert pictures

These have quadratic smooths fit to them to allow for curvature, but there is no particular reason to choose this curve over another.

During game

We compare utterances from rounds that are up to 2 apart, with each pair coded as the later round (so 10-12, 11-12, and 12-12 all contribute to “12”). The cutoff of 2 is somewhat arbitrary; I’m not sure what the right rolling window is.
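The windowing rule above can be sketched like this. It is a Python illustration with a made-up function name (`windowed_pairs`) and toy inputs, and it assumes utterances arrive as (round, text) tuples; the real analysis would then compare the embeddings of each pair.

```python
from itertools import combinations


def windowed_pairs(utterances, max_gap=2):
    """Pair up utterances whose rounds are at most max_gap apart.

    utterances: list of (round_number, text) tuples.
    Returns (coded_round, text_a, text_b), where coded_round is the later
    of the two rounds.
    """
    pairs = []
    for (round_a, text_a), (round_b, text_b) in combinations(utterances, 2):
        if abs(round_a - round_b) <= max_gap:
            pairs.append((max(round_a, round_b), text_a, text_b))
    return pairs


# Toy input: one utterance each in rounds 9, 10, 11 and two in round 12.
toy = [(9, "e"), (10, "a"), (11, "b"), (12, "c"), (12, "d")]
pairs = windowed_pairs(toy)
coded_as_12 = [p for p in pairs if p[0] == 12]
```

Changing `max_gap` is the only thing needed to try a different rolling window.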

Takeaways:

  • Descriptions of the same flower within a game (whether by the same person or not) become more similar.
  • Descriptions of the same flower across games become (slightly) less similar.
  • Descriptions of different flowers become less similar (regardless of between/within game).

Between game and end

We can treat what someone wrote at the end as the convention and then compare how similar this is to earlier utterances by them and their group mates.

The big question is whether the end utterances need cleaning (probably yes?).

We see that, for the same flower, similarity to the end utterance increases over blocks, both within an individual and between an individual and their other group mates (this is all within a game).
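A minimal sketch of this comparison, assuming each utterance has already been turned into an embedding vector and that similarity means cosine similarity. The toy 2-d vectors and the function names are made up for illustration, and this is Python rather than the code behind the plots.

```python
import math
from collections import defaultdict


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mean_sim_to_end_by_block(during, end_vec):
    """Average similarity to the end-of-game description, per block.

    during: list of (block, embedding) for in-game utterances about one flower.
    end_vec: embedding of the post-game description of that flower.
    """
    sims = defaultdict(list)
    for block, vec in during:
        sims[block].append(cosine(vec, end_vec))
    return {block: sum(vals) / len(vals) for block, vals in sorted(sims.items())}


# Toy embeddings, not real data.
end_vec = [1.0, 0.0]
during = [(1, [0.0, 1.0]), (1, [1.0, 1.0]), (2, [1.0, 0.0])]
by_block = mean_sim_to_end_by_block(during, end_vec)
```

Passing the end description of the same person gives the within-individual curve; passing a group mate's end description gives the individual-to-other curve.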

Comparing earlier utterances to post-game descriptions.

General question of how to display these things: what should be color vs. faceting, and what to do for the dots.

Within end-stuff

We can look at how converged people are (or are not) by comparing what they say at the end.

Each person was asked for names (roughly, “how would you label this to your group”) for the 12 images they had been seeing and then for 4 from a different color palette.

Own = color you saw all game

Other = the color you didn’t see

Mixed = own for one person, other for other person
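The three labels above can be assigned to a pair of end-of-game descriptions like this. It is a small Python sketch with a hypothetical function name, and it assumes each description is already tagged with whether its describer saw that color during the game.

```python
def pair_condition(first_saw_color, second_saw_color):
    """Label a pair of end-of-game descriptions as own, other, or mixed.

    Each argument is True when that describer saw the flower's color all
    game ("own") and False when they did not ("other").
    """
    if first_saw_color and second_saw_color:
        return "own"
    if not first_saw_color and not second_saw_color:
        return "other"
    return "mixed"


labels = [
    pair_condition(True, True),
    pair_condition(False, False),
    pair_condition(True, False),
]
```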

Not sure how to show spread here. Or how to do a viz that allows for comparisons.

Take aways:

  • People use more similar descriptions for two flowers that weren’t in their set than for two that were (checks out: in-set ones have been described and have diverged).
  • The same holds (with lower similarities overall) for group mates.
  • Group mates are more similar in how they describe flowers they’ve seen (vs. not), at least in shared utils.
  • Not sure how to interpret the different-game panel. Is this the baseline? Maybe it’s just that you’re more similar for the same flower than for different flowers regardless? (This is also the baseline for the other panels.)

Future things

Boring diligence

  • Double-check that non-talkers aren’t in this data at all (end-game joins). Consider coding and grouping the end-game descriptions by what color they usually saw during the game.
  • Do SBERT a different way and confirm (split vs. aggregate?).