Accuracy increases over repNum, which is … interesting?!
Above chance at everything! No longer hates the ice skater.
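(Rough sketch of the kind of per-repNum / per-tangram summary behind these observations; `model_preds` and its column names are placeholders, not the real ones used here.)

```r
library(dplyr)

# Sketch only: assumes a per-trial data frame `model_preds` with columns
# `repNum`, `tangram`, and a 0/1 `model_correct`; all names are guesses.

# model accuracy by repetition number
model_preds |>
  group_by(repNum) |>
  summarize(acc = mean(model_correct))

# model accuracy by tangram, to check everything sits above chance
model_preds |>
  group_by(tangram) |>
  summarize(acc = mean(model_correct)) |>
  arrange(acc)
```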
Compare the random forest with human results
## [1] "Correlation between random forest model and human 0.525"
Item-level correlation between correct and incorrect responses (model vs. per-human response)
Not sure how to look for correlated error patterns… (one rough option is sketched after the test output below)
##
## Pearson's product-moment correlation
##
## data: for_corr$human_correct and for_corr$model_correct
## t = 14.998, df = 7288, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.1506713 0.1952096
## sample estimates:
## cor
## 0.1730289
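For reference, the kind of call that produces the output above (variable names taken from its `data:` line), plus one rough way to look at shared errors via a cross-tab; the cross-tab part is just a sketch:

```r
# Pearson test matching the output above
cor.test(for_corr$human_correct, for_corr$model_correct)

# One rough look at correlated errors: cross-tab of human vs. model correctness
table(human = for_corr$human_correct, model = for_corr$model_correct)
```

For two 0/1 variables the Pearson r is the phi coefficient, so the ~0.17 above can already be read as above-chance agreement in who gets which items right.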
Want to know when the model vs. the in-game (tg) human is more accurate.
So humans are better in the first round, and models are sometimes better in the last round, which is interesting.
Lots of ways to carve this up since we have tangram, round, and conditions…
The model is consistently worse than the in-game human, but pretty correlated with it?
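A sketch of one way to carve it up, assuming a long per-item data frame `trials` with `condition`, `tangram`, `repNum`, and 0/1 correctness columns (all placeholder names):

```r
library(dplyr)

# Sketch only: `trials` and every column name here is a placeholder.
cell_acc <- trials |>
  group_by(condition, tangram, repNum) |>
  summarize(human_acc = mean(human_correct),
            model_acc = mean(model_correct),
            .groups = "drop") |>
  mutate(model_advantage = model_acc - human_acc)

# cells where the model beats the in-game human
cell_acc |>
  filter(model_advantage > 0) |>
  arrange(desc(model_advantage))

# round-level view of the first-round / last-round pattern noted above
cell_acc |>
  group_by(repNum) |>
  summarize(human_acc = mean(human_acc), model_acc = mean(model_acc))
```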
Shape Naming Divergence (SND): “A tangram’s SND quantifies the variability among whole-shape annotations. SND is an operationalization of nameability, […]”
Part Naming Divergence (PND): “PND is computed identically to SND, but with the concatenation of all part names of an annotation as the input text”
Part Segmentation Agreement (PSA): “PSA quantifies the agreement between part segmentations as the maximum number of pieces that does not need to be […]”
Not sure this is at all meaningful, but tg-matcher accuracies and model accuracies show about the same correlation with this codeability measure?
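A sketch of that comparison, assuming a per-tangram table `by_tangram` with tg-matcher accuracy, model accuracy, and an SND (or similar codability) column joined in from the KiloGram annotations; all names here are assumptions:

```r
# Sketch only: `by_tangram`, `matcher_acc`, `model_acc`, and `snd` are
# assumed names, not the actual objects used in this analysis.
cor.test(by_tangram$matcher_acc, by_tangram$snd)
cor.test(by_tangram$model_acc, by_tangram$snd)
```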