I took the 4p-rotate data (~20 games), kept only the complete games, and ran them through a pipeline to detect shared phrases.
This included lemmatizing the text, removing stopwords, and extracting shared expressions with dialign.
The goal is that we sorta get at “concepts”, although I’ll note there are some imperfections (e.g. “skater” and “skating” don’t map to the same lemma).
But at least it gives us a list of semantic units that are easier to work with than the raw text (e.g. it’s feasible to tag them).
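For concreteness, here’s a minimal sketch of the kind of normalization that produces these lemma forms (assuming spaCy and its small English model; the actual pipeline may differ in details):

```python
import spacy

nlp = spacy.load("en_core_web_sm")

def normalize(phrase: str) -> str:
    """Lowercase, lemmatize, and drop stopwords/punctuation."""
    doc = nlp(phrase.lower())
    return " ".join(tok.lemma_ for tok in doc if not (tok.is_stop or tok.is_punct))

# Illustrates the imperfection noted above: "skater" stays "skater" while
# "skating" typically lemmatizes to "skate", so the two don't collapse together.
print(normalize("the skater"), "|", normalize("skating"))
```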
We have 954 phrases, with an average of 10 occurrences each.
Following previous attempts to tag utterances for granularity / level of description, we repeat that tagging here.
The abstract category was tagged by hand (although kinda sloppily); all others were tagged by regex. Not everything fell into a category, there are definitely some ambiguities (especially when looking at the stopword-removed lemma forms), and some phrases fall into multiple categories.
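As a rough illustration of the regex tagging (the patterns and exact category names below are placeholders, not the real tag set, and abstract was hand-tagged so it isn’t shown):

```python
import re

# Illustrative taggers only; real patterns and categories differ.
CATEGORY_PATTERNS = {
    "body part": re.compile(r"\b(arm|leg|head|foot|feet|hand|knee)\b"),
    "shape":     re.compile(r"\b(triangle|square|diamond|angle|point|edge)\b"),
}

def tag_phrase(phrase: str) -> set[str]:
    """Return every category whose pattern matches; a phrase can get
    multiple tags or none at all, matching the ambiguity noted above."""
    return {cat for cat, pat in CATEGORY_PATTERNS.items() if pat.search(phrase)}

print(tag_phrase("wavy arm"))        # {'body part'}
print(tag_phrase("triangle head"))   # {'shape', 'body part'}
```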
Overall (summed) rates:
We can also look, at a per-group level, at the relative frequency of these different types.
Note that a phrase can be counted multiple times, but we see the rate of abstract increasing and shape and body part slightly decreasing, broadly consistent with our previous findings.
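A sketch of the per-group rate computation, assuming a long-format table of tag occurrences with group / round / category columns (column names and rows are made up):

```python
import pandas as pd

# Hypothetical tag-occurrence table: one row per (phrase occurrence, category) pair.
tags = pd.DataFrame({
    "group":    ["g1", "g1", "g1", "g2", "g2", "g2"],
    "round":    [1, 1, 6, 1, 6, 6],
    "category": ["shape", "body part", "abstract", "shape", "abstract", "abstract"],
})

# Relative frequency of each category within each group x round cell. Because a
# phrase can carry multiple tags, these rates are over tag occurrences, not phrases.
rates = (tags.groupby(["group", "round"])["category"]
             .value_counts(normalize=True)
             .rename("rate")
             .reset_index())
print(rates)
```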
So not all of these are really things we might want to call “conventions”, since some are just words that get used a lot here, but in a cross-game, cross-target way.
By far, most of the data consists of descriptions that occur only a handful of times, always or almost always with only one tangram.
Many of these are specific to both a group and a tangram, but there are also some that are used across groups while still being fairly tangram-specific.
For completeness, we can look at how many groups use each description.
So a complication here is that while most phrases are used by only one group (and only for one tangram, etc.), the really frequent ones are used a lot, by everyone, for everything.
And that’s hard to weigh, especially when some of these highly frequent ones are substrings (“arm”) that can occur inside longer, meaningful expressions like “wavy arm”.
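The per-phrase breadth numbers behind this are easy to compute; a sketch, again with made-up column names and toy rows:

```python
import pandas as pd

# Hypothetical table of phrase occurrences (column names are assumptions).
occ = pd.DataFrame({
    "phrase":  ["wavy arm", "wavy arm", "arm", "arm", "arm"],
    "group":   ["g1", "g1", "g1", "g2", "g3"],
    "tangram": ["C", "C", "C", "A", "F"],
})

# For each phrase: total count, number of distinct groups, number of distinct tangrams.
breadth = occ.groupby("phrase").agg(
    n=("phrase", "size"),
    n_groups=("group", "nunique"),
    n_tangrams=("tangram", "nunique"),
)
print(breadth.sort_values("n", ascending=False))
```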
We’d really like to be able to test something like how distinctiveness impacts stickiness.
But we need to be careful not to confound things by using count data that includes rounds from after a phrase either stuck or didn’t.
We could exclude some of the most frequent phrases by fiat, but that means arbitrary cutoffs.
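One alternative to hard cutoffs (just a sketch of a possible operationalization, not something we’ve committed to): compute distinctiveness from early rounds only, and stickiness from whether the phrase reappears later, so the two measures never share data. Column names are assumptions.

```python
import pandas as pd

def split_measures(occ: pd.DataFrame, early_max_round: int = 2) -> pd.DataFrame:
    """occ: one row per phrase occurrence with 'phrase', 'tangram', 'round' columns."""
    early = occ[occ["round"] <= early_max_round]

    # Distinctiveness proxy (early rounds only): share of a phrase's early uses that
    # go to its most common tangram (1.0 = perfectly tangram-specific so far).
    distinct = (early.groupby("phrase")["tangram"]
                     .agg(lambda t: t.value_counts(normalize=True).max())
                     .rename("early_distinctiveness"))

    # Stickiness proxy (later rounds only): does the phrase show up again afterwards?
    stuck = (occ.groupby("phrase")["round"].max() > early_max_round).rename("stuck")

    # Keep only phrases that were actually seen in the early window.
    return pd.concat([distinct, stuck], axis=1).dropna(subset=["early_distinctiveness"])
```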
Maybe something with pointwise mutual information (between phrases and tangrams)?
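If we go the PMI route, the quantity would be something like PMI(phrase, tangram) = log2( P(phrase, tangram) / (P(phrase) P(tangram)) ), i.e. how much more often a phrase co-occurs with a tangram than its overall frequency would predict. A sketch over the same kind of occurrence table (column names are assumptions):

```python
import numpy as np
import pandas as pd

def phrase_tangram_pmi(occ: pd.DataFrame) -> pd.DataFrame:
    """occ: one row per phrase occurrence with 'phrase' and 'tangram' columns."""
    joint = occ.groupby(["phrase", "tangram"]).size().rename("n").reset_index()
    total = joint["n"].sum()
    p_phrase = occ["phrase"].value_counts(normalize=True)
    p_tangram = occ["tangram"].value_counts(normalize=True)
    # PMI(p, t) = log2( P(p, t) / (P(p) * P(t)) )
    joint["pmi"] = np.log2(
        (joint["n"] / total)
        / (joint["phrase"].map(p_phrase) * joint["tangram"].map(p_tangram))
    )
    return joint
```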
Could look at self-variance and self-repetition: we’d need to re-run dialign with each speaker/other assignment (one run per permutation) and average (these are metrics dialign returns).
CLIP similarity on phrases versus a random baseline, or versus the whole utterance.
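A sketch of the CLIP check using Hugging Face’s CLIP implementation (the model name, the tangram image path, and the baseline phrase are all placeholders):

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def clip_scores(phrases: list[str], image_path: str) -> torch.Tensor:
    """Image-text similarity scores (scaled cosine similarities) for each phrase."""
    image = Image.open(image_path)
    inputs = processor(text=phrases, images=image, return_tensors="pt", padding=True)
    with torch.no_grad():
        out = model(**inputs)
    return out.logits_per_image.squeeze(0)

# Shared phrase vs. a random-phrase baseline for the same tangram image.
print(clip_scores(["wavy arm", "completely unrelated words"], "tangrams/C.png"))
```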
Could look at the timing of convention emergence as a correlate of accuracy / reduction / condition (e.g. when do the phrases shared in the last round first emerge?).
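A sketch of the “when do last-round shared phrases emerge” part, assuming the same kind of per-occurrence table as above (column names are assumptions); the resulting emergence rounds could then be correlated with accuracy or reduction, or compared across conditions:

```python
import pandas as pd

def emergence_rounds(occ: pd.DataFrame) -> pd.DataFrame:
    """occ: one row per phrase occurrence with 'group', 'tangram', 'phrase', 'round'."""
    # Using the global last round for simplicity; per-group last rounds would also work.
    last_round = occ["round"].max()
    final_phrases = (occ[occ["round"] == last_round]
                        [["group", "tangram", "phrase"]]
                        .drop_duplicates())
    first_use = (occ.groupby(["group", "tangram", "phrase"])["round"]
                    .min()
                    .rename("first_round")
                    .reset_index())
    # For each phrase still in use in the final round: the round it first appeared.
    return final_phrases.merge(first_use, on=["group", "tangram", "phrase"])
```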