Note
Cogsci notes
Prep
How much data
Word count basics
How many different flowers described / trial?
- Model
Sbert notes
Sbert pictures
Future things
- Boring diligence

Note

I have annotated the transcripts of the full games, marking which parts are referential (descriptive of a flower) and which flower (as best I can tell) each part refers to.

See https://docs.google.com/presentation/d/1qcDRZzHbLhE-fp6W9nKcbGouj4w01qCLCy-yt072iiY/edit?usp=sharing for the flower number - image correspondences.

Cogsci notes

Key points:

Many of the key reduction findings from tangrams generalize to this situation. Specifically, we see utterance reduction over time and w/i group convergence for each image and divergence between images. This situation is different in that we have different stimuli (more natural) and the set up is collaborative and more free-form in what is talked about. These patterns hold for both the individual and group payoff structures.
One difference we see is that groups don’t diverge (from each other). This may depend on stimulus properties (are there universal features of some of the images?) and group dynamics.
Conclusion: The key reference game findings have some generalizability. Settings like this one may be useful for encouraging discussion of a set of images and setting up partial knowledge situations.

Prep

How much data

## total utterances

## # A tibble: 1 × 1
##       n
##   <int>
## 1  3404

## Utterances / game

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    3.00   63.00   77.00   97.26  114.50  264.00

## Utterances / flower

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   32.00   52.75   67.50   70.92   84.50  134.00

Word count basics

Take away – across the condition manipulation (individual vs. group payoff), the number of utterances doesn’t decrease over the game, but the length of each one does.

Graph could use finessing if going in paper.

How many different flowers described / trial?

We can probably give this as summary stats and not as graph

Per game

Per person

Model

## stan_lmer
##  family:       gaussian [identity]
##  formula:      numword ~ trialNum + (1 | gameId) + (1 | player) + (1 | flower)
##  observations: 3404
## ------
##             Median MAD_SD
## (Intercept)  5.7    0.2  
## trialNum    -0.2    0.0  
## 
## Auxiliary parameter(s):
##       Median MAD_SD
## sigma 2.1    0.0   
## 
## Error terms:
##  Groups   Name        Std.Dev.
##  player   (Intercept) 0.65    
##  flower   (Intercept) 0.63    
##  gameId   (Intercept) 0.96    
##  Residual             2.09    
## Num. levels: player 104, flower 48, gameId 35 
## 
## ------
## * For help interpreting the printed output see ?print.stanreg
## * For info on the priors used see ?prior_summary.stanreg

## # A tibble: 2 × 3
##   Term        Estimate `Credible Interval`
##   <chr>          <dbl> <chr>              
## 1 (Intercept)     5.67 [5.26, 6.09]       
## 2 trialNum       -0.15 [-0.16, -0.14]
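
To make the slope concrete, here is a quick arithmetic sketch using the posterior medians from the table above. It assumes trialNum runs from 1 to 24 (four quarters of 6 rounds, per the Sbert notes; the indexing is a guess) and ignores the game, player and flower intercepts. `predicted_words` is a name made up for this illustration.

```python
# Posterior medians copied from the summary table above.
INTERCEPT = 5.67
SLOPE = -0.15  # change in words per utterance for each additional trial


def predicted_words(trial_num: int) -> float:
    """Population-level prediction, ignoring game/player/flower intercepts."""
    return INTERCEPT + SLOPE * trial_num


# Assumption: trials are numbered 1..24.
first = predicted_words(1)
last = predicted_words(24)
print(f"trial 1: {first:.2f} words, trial 24: {last:.2f} words")
# trial 1: 5.52 words, trial 24: 2.07 words
```

Under those assumptions, the typical referential utterance shrinks by about 3.5 words over the game.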

Sbert notes

Some pre-commentary on analytic choices: it’s unclear whether we should be looking at referential statements individually (i.e. treating anything that occurred as a separate utterance, after a line break, as separate) or concatenating everything a given player (e.g. Laju) said about a given flower (e.g. yellow3) into one statement. I think the first is generally better, since sometimes a flower is described twice (perhaps in the same way), as in “the best values are 3 in a row and big fluffy” … “okay, I’ll take big fluffy”. But there are also times when someone has a line break before continuing/clarifying a description. For now we go with single utterances.

Unlike tangrams, where there was a very clear-cut structure (every item gets described once every round by every group), here we have no such guarantees! So there’s a big question of what scale to use for “time block”. In some within-game cases it might make sense to compare utterances within a single round (for distinctiveness of different descriptions, or if multiple people describe the same flower differently). It also might make sense to treat round as the semi-continuous variable that it is for some modelling, where we can eventually see that farther away pairs do something. But for now, I’m going to divide the game into quarters (sets of 6 rounds) and pretend that we have 4 time points. This is arbitrary and preliminary, but it’s easy to do viz with.
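
The quarter coding can be sketched as below. This is only an illustration: it assumes rounds are numbered 1 to 24, and `time_block` is a hypothetical helper, not something from the analysis code.

```python
ROUNDS_PER_BLOCK = 6  # "sets of 6 rounds"


def time_block(round_num: int) -> int:
    """Map a 1-indexed round (assumed to run 1..24) to a quarter 1..4."""
    return (round_num - 1) // ROUNDS_PER_BLOCK + 1


# Rounds on either side of each block boundary.
blocks = {r: time_block(r) for r in (1, 6, 7, 12, 13, 18, 19, 24)}
print(blocks)
# {1: 1, 6: 1, 7: 2, 12: 2, 13: 3, 18: 3, 19: 4, 24: 4}
```

If rounds turn out to be 0-indexed, the `- 1` should be dropped.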

Sbert pictures

These have quadratic smooths fit onto them, to allow for curvature, but there is no principled reason to choose this curve and not another.

During game

Comparing utterances from rounds that were up to 2 apart, with each pair coded as the later round (so 10-12, 11-12 and 12-12 will all contribute to “12”). Up to 2 apart is somewhat arbitrary; not sure what the right rolling window is.
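
A minimal sketch of this pairing rule, using placeholder utterances rather than real data. `windowed_pairs` and `MAX_GAP` are names invented here, and whether same-speaker pairs were kept or dropped in the real analysis is not something this sketch settles.

```python
from itertools import combinations

MAX_GAP = 2  # "up to 2 apart"; arbitrary, as noted above


def windowed_pairs(utterances):
    """Pair up (round, text) utterances at most MAX_GAP rounds apart.

    Each pair is coded with the later of its two rounds.
    """
    pairs = []
    for (r1, t1), (r2, t2) in combinations(utterances, 2):
        if abs(r1 - r2) <= MAX_GAP:
            pairs.append((max(r1, r2), t1, t2))
    return pairs


# Placeholder descriptions of one flower: (round, text).
toy = [(9, "a"), (10, "b"), (11, "c"), (12, "d"), (12, "e")]
coded_12 = [p for p in windowed_pairs(toy) if p[0] == 12]
print(len(coded_12))  # 5: two 10-12 pairs, two 11-12 pairs, one 12-12 pair
```

The round-9 utterance never contributes to “12” because it is 3 rounds away.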

Take aways:

- descriptions to the same flower within game (either same person or not) become more similar
- descriptions to the same flower across games become (slightly) less similar
- descriptions to different flowers become less similar (regardless of between/within game)

Between game and end

We can treat what someone wrote at the end as the convention and then compare how similar this is to earlier utterances by them and their group mates.
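
As a rough illustration of the comparison, the sketch below scores earlier utterances against the end-of-game description. It assumes “similarity” means cosine similarity between sentence embeddings (the notes don’t say which measure was used) and uses made-up 3-d vectors in place of real SBERT embeddings.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Toy stand-ins for embeddings (real SBERT vectors have hundreds of dimensions).
end_vec = [1.0, 0.0, 0.0]  # the post-game description, treated as the convention
by_block = {
    1: [0.0, 1.0, 0.0],  # early utterance, unrelated to the end description
    2: [1.0, 1.0, 0.0],  # partway there
    4: [1.0, 0.0, 0.0],  # same direction as the end description
}
sims = {block: cosine(vec, end_vec) for block, vec in by_block.items()}
print({block: round(s, 3) for block, s in sims.items()})
# {1: 0.0, 2: 0.707, 4: 1.0}
```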

Big question is whether the end utts need cleaning (which probably yes?)

We see that for the same flower, similarity to the end utt increases over blocks, both within an individual and between an individual and their group mates (this is all within a game).

Comparing utterances earlier to post-game descriptions.

General question of how to display these things – what should be color v faceting and what to do for the dots.

Within end-stuff

We can look at how converged people are (or are not) by comparing what they say at the end.

Each person was asked for names ( ~ “how would you label this to your group”) for the 12 images they were seeing and then for 4 from a different color palette.

Own = color you saw all game

Other = the color you didn’t see

Mixed = own for one person, other for other person
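
One reading of these three labels, sketched as code. It assumes each comparison is between two people’s labels and that all we need to know is whether each person saw that color during the game. `pair_type` is a made-up helper, not part of the analysis.

```python
def pair_type(a_saw_in_game: bool, b_saw_in_game: bool) -> str:
    """Classify a pair of end-of-game labels by each describer's exposure."""
    if a_saw_in_game and b_saw_in_game:
        return "own"    # both describers saw this color all game
    if not a_saw_in_game and not b_saw_in_game:
        return "other"  # neither describer saw this color
    return "mixed"      # own for one person, other for the other


print(pair_type(True, True), pair_type(False, False), pair_type(True, False))
# own other mixed
```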

Not sure how to show spread here. Or how to do a viz that allows for comparisons.

Take aways:

People use more similar descriptions for two flowers that weren’t in their set than those that were (checks out – in set ones have been described and diverged)
This same thing holds (with overall lower sims) for group-mates
Group mates are more similar in how they describe flowers they’ve seen (v not) (at least in shared utils)
not sure how to interpret the diff game – this is the baseline? Maybe it’s just that you’re more similar for the same flower versus different regardless? (also this is baseline for other panels)

Future things

have we tried log space?
Could do more end game – earlier comparisons
tSNE??
Models of (some of ?) what’s shown in foregoing graphs
CLIP ????
clean up end utts?
are there interesting lang – performance connections ?

Boring diligence

Double check that non-talkers aren’t in this data at all (end game joins). Consider coding and grouping end stuff by what color they usually saw during the game
do sbert a different way and confirm (split v aggregate?)

Flowers text analysis

vboyce