I took the 4p-rotate data (~20 games), kept only the complete games, and ran them through a pipeline to detect shared phrases.
This included lemmatizing the text, removing stopwords, and extracting shared expressions with dialign.
The goal is that we sorta get at “concepts”, although I’ll note there are some imperfections (e.g. “skater” and “skating” don’t map to the same lemma).
But at least it gives us a list of semantic units that are easier to work with than the raw text (e.g. it’s feasible to tag them).
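For concreteness, here’s a minimal sketch of the kind of normalization that produces these lemma forms (assuming spaCy and its small English model; the actual pipeline may differ in details):

```python
import spacy

nlp = spacy.load("en_core_web_sm")

def normalize(phrase: str) -> str:
    """Lowercase, lemmatize, and drop stopwords/punctuation."""
    doc = nlp(phrase.lower())
    return " ".join(tok.lemma_ for tok in doc if not (tok.is_stop or tok.is_punct))

# Illustrates the imperfection noted above: "skater" stays "skater" while
# "skating" typically lemmatizes to "skate", so the two don't collapse together.
print(normalize("the skater"), "|", normalize("skating"))
```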
We have 954 phrases, with an average of 10 occurrences each.
Following previous attempts to tag utterances for granularity / level of description, we repeat that tagging here.
The abstract category was tagged by hand (although kinda sloppily); all others were tagged by regex. Not everything fell into a category, there are definitely some ambiguities (especially when looking at the stopword-removed lemma forms), and some phrases fall into multiple categories.
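As a rough illustration of the regex tagging (the patterns and exact category names below are placeholders, not the real tag set, and abstract was hand-tagged so it isn’t shown):

```python
import re

# Illustrative taggers only; real patterns and categories differ.
CATEGORY_PATTERNS = {
    "body part": re.compile(r"\b(arm|leg|head|foot|feet|hand|knee)\b"),
    "shape":     re.compile(r"\b(triangle|square|diamond|angle|point|edge)\b"),
}

def tag_phrase(phrase: str) -> set[str]:
    """Return every category whose pattern matches; a phrase can get
    multiple tags or none at all, matching the ambiguity noted above."""
    return {cat for cat, pat in CATEGORY_PATTERNS.items() if pat.search(phrase)}

print(tag_phrase("wavy arm"))        # {'body part'}
print(tag_phrase("triangle head"))   # {'shape', 'body part'}
```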
Overall (summed) rates:
We can also look, at a per-group level, at the relative frequency of these different types.
Note that a phrase can be counted multiple times, but we see the rate of abstract increasing and shape and body part slightly decreasing, broadly consistent with our previous findings.
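A sketch of the per-group rate computation, assuming a long-format table of tag occurrences with group / round / category columns (column names and rows are made up):

```python
import pandas as pd

# Hypothetical tag-occurrence table: one row per (phrase occurrence, category) pair.
tags = pd.DataFrame({
    "group":    ["g1", "g1", "g1", "g2", "g2", "g2"],
    "round":    [1, 1, 6, 1, 6, 6],
    "category": ["shape", "body part", "abstract", "shape", "abstract", "abstract"],
})

# Relative frequency of each category within each group x round cell. Because a
# phrase can carry multiple tags, these rates are over tag occurrences, not phrases.
rates = (tags.groupby(["group", "round"])["category"]
             .value_counts(normalize=True)
             .rename("rate")
             .reset_index())
print(rates)
```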
So not all of these are really things we might want to call “conventions”, since some are just words that get used a lot here, but in a cross-game, cross-target way.
By far, most of the data consists of descriptions that occur only a handful of times, always or almost always with only one tangram.
Many of these are specific to both a group and a tangram, but there are also some that are used across groups while still being fairly tangram-specific.
For completeness, we can look at how many groups use each description.
So a complication here is that while most phrases are used by only one group (and only for one tangram, etc.), the really frequent ones are used a lot, by everyone, for everything.
And that’s hard to weigh, especially when some of these highly frequent ones are substrings (“arm”) that can occur inside longer, meaningful expressions like “wavy arm”.
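The per-phrase breadth numbers behind this are easy to compute; a sketch, again with made-up column names and toy rows:

```python
import pandas as pd

# Hypothetical table of phrase occurrences (column names are assumptions).
occ = pd.DataFrame({
    "phrase":  ["wavy arm", "wavy arm", "arm", "arm", "arm"],
    "group":   ["g1", "g1", "g1", "g2", "g3"],
    "tangram": ["C", "C", "C", "A", "F"],
})

# For each phrase: total count, number of distinct groups, number of distinct tangrams.
breadth = occ.groupby("phrase").agg(
    n=("phrase", "size"),
    n_groups=("group", "nunique"),
    n_tangrams=("tangram", "nunique"),
)
print(breadth.sort_values("n", ascending=False))
```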
We’d really like to be able to test something like how distinctiveness impacts stickiness.
But we need to be careful not to confound things by using count data that includes rounds from after a phrase either stuck or didn’t.
We could exclude some of the most frequent phrases by fiat, but that means arbitrary cutoffs.
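One alternative to hard cutoffs (just a sketch of a possible operationalization, not something we’ve committed to): compute distinctiveness from early rounds only, and stickiness from whether the phrase reappears later, so the two measures never share data. Column names are assumptions.

```python
import pandas as pd

def split_measures(occ: pd.DataFrame, early_max_round: int = 2) -> pd.DataFrame:
    """occ: one row per phrase occurrence with 'phrase', 'tangram', 'round' columns."""
    early = occ[occ["round"] <= early_max_round]

    # Distinctiveness proxy (early rounds only): share of a phrase's early uses that
    # go to its most common tangram (1.0 = perfectly tangram-specific so far).
    distinct = (early.groupby("phrase")["tangram"]
                     .agg(lambda t: t.value_counts(normalize=True).max())
                     .rename("early_distinctiveness"))

    # Stickiness proxy (later rounds only): does the phrase show up again afterwards?
    stuck = (occ.groupby("phrase")["round"].max() > early_max_round).rename("stuck")

    # Keep only phrases that were actually seen in the early window.
    return pd.concat([distinct, stuck], axis=1).dropna(subset=["early_distinctiveness"])
```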
Maybe something with pointwise mutual information (between phrases and tangrams)?
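If we go the PMI route, the quantity would be something like PMI(phrase, tangram) = log2( P(phrase, tangram) / (P(phrase) P(tangram)) ), i.e. how much more often a phrase co-occurs with a tangram than its overall frequency would predict. A sketch over the same kind of occurrence table (column names are assumptions):

```python
import numpy as np
import pandas as pd

def phrase_tangram_pmi(occ: pd.DataFrame) -> pd.DataFrame:
    """occ: one row per phrase occurrence with 'phrase' and 'tangram' columns."""
    joint = occ.groupby(["phrase", "tangram"]).size().rename("n").reset_index()
    total = joint["n"].sum()
    p_phrase = occ["phrase"].value_counts(normalize=True)
    p_tangram = occ["tangram"].value_counts(normalize=True)
    # PMI(p, t) = log2( P(p, t) / (P(p) * P(t)) )
    joint["pmi"] = np.log2(
        (joint["n"] / total)
        / (joint["phrase"].map(p_phrase) * joint["tangram"].map(p_tangram))
    )
    return joint
```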
Could look at self-variance and self-repetition: we’d need to re-run dialign with each speaker/other assignment (one run per permutation) and average (these are metrics dialign returns).
CLIP similarity on phrases versus a random baseline, or versus the whole utterance.
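A sketch of the CLIP check using Hugging Face’s CLIP implementation (the model name, the tangram image path, and the baseline phrase are all placeholders):

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def clip_scores(phrases: list[str], image_path: str) -> torch.Tensor:
    """Image-text similarity scores (scaled cosine similarities) for each phrase."""
    image = Image.open(image_path)
    inputs = processor(text=phrases, images=image, return_tensors="pt", padding=True)
    with torch.no_grad():
        out = model(**inputs)
    return out.logits_per_image.squeeze(0)

# Shared phrase vs. a random-phrase baseline for the same tangram image.
print(clip_scores(["wavy arm", "completely unrelated words"], "tangrams/C.png"))
```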
Could look at the timing of convention emergence as a correlate of accuracy / reduction / condition (e.g. when do the phrases shared in the last round first emerge?).
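A sketch of the “when do last-round shared phrases emerge” part, assuming the same kind of per-occurrence table as above (column names are assumptions); the resulting emergence rounds could then be correlated with accuracy or reduction, or compared across conditions:

```python
import pandas as pd

def emergence_rounds(occ: pd.DataFrame) -> pd.DataFrame:
    """occ: one row per phrase occurrence with 'group', 'tangram', 'phrase', 'round'."""
    # Using the global last round for simplicity; per-group last rounds would also work.
    last_round = occ["round"].max()
    final_phrases = (occ[occ["round"] == last_round]
                        [["group", "tangram", "phrase"]]
                        .drop_duplicates())
    first_use = (occ.groupby(["group", "tangram", "phrase"])["round"]
                    .min()
                    .rename("first_round")
                    .reset_index())
    # For each phrase still in use in the final round: the round it first appeared.
    return final_phrases.merge(first_use, on=["group", "tangram", "phrase"])
```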