Reviewer 1 comment:

First, if this paper were written before the Piantadosi 2011 paper and the Mahowald 2013 paper, I could understand why the authors would feel that by controlling for frequency in Experiment 9, they could get away from the Zipfian assumption that the main predictor of word length is frequency. By now the common assumption (e.g. Seyfarth 2014, Cognition) is that predictability, rather than frequency, should be the main determiner of length. For the first two series of experiments the authors may have felt that they could evade that issue by constructing novel objects, or objects for which no prior predictability or frequency accounts would apply, but that’s a very narrow view of predictability-based accounts. For instance, in the artificial object experiment series, if number of parts predicts complexity, it is not far-fetched to assume that subjects do have some predictability assessment for the novel objects, based on a naïve P(object) = ∏P(parts), which would make objects with more parts less predictable, everything else being equal (and the authors seem to be aware of that on page 5, yet maintain the argument for complexity).

That is, to rule out a predictability-based explanation, the authors need to contrast their findings with a model that can assign probabilities to first-seen items. If that’s what the authors mean by “complex”, then it should be spelled out.

This is not only an issue for the artificial objects. For instance, body part names seem not to correspond to how difficult it is to understand their action, but rather to how likely we are to see them: ears, eyes and tongue are arguably more complex than intestine, muscle and cartilage.

For existing words in Experiment 9 the authors should control for mean predictability rather than frequency (preferably for both), and then use whatever residual effect there is in Experiment 10, not bare complexity. Otherwise they do not rule out the possibility that frequency and predictability account for the correlation, at least in the lower Pearson r languages. Without clearer controls for predictability and frequency, the experiments in this paper only show that subjects expect language to match their experience in matching longer words with less predictable and less frequent items, which is still interesting, but perhaps shouldn’t take 10 experiments to prove.



To address this, here we use the 2-gram surprisal measure calculated from the British National Corpus (BNC) and reported in Piantadosi et al. (2011). I’m not using the Google Books measure because I keep running into an error when I try to process the bigrams.
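For reference, the measure is a word’s average in-context surprisal: -(1/N) * sum, over the word’s N tokens, of log P(word | context). A minimal sketch of how it can be computed from a bigram count table (the bigrams data frame and its context/word/count columns are placeholder names, not the actual format of the BNC file):

# Sketch of the 2-gram surprisal measure: each word's token-weighted average
# of -log P(word | preceding word). `bigrams` and its columns (context, word,
# count) are placeholder names.
bigram.surprisal = bigrams %>%
  group_by(context) %>%
  mutate(p.given.context = count / sum(count)) %>% # P(word | context)
  group_by(word) %>%
  summarise(surprisal = -sum(count * log(p.given.context)) / sum(count))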

In a model predicting word length (in phonemes) from complexity, log frequency, and surprisal, complexity is a reliable predictor of word length, but neither log frequency nor surprisal is (see the regression table below).

I also added to the cross-linguistic figure the correlation between complexity and length, partialling out surprisal. The reviewer suggests using the residual effect in Experiment 10 rather than partialling out surprisal, but this seems unnecessarily complicated. The partial correlation is more consistent with the other analyses and is more straightforward. What do you think?
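One reason the partial correlation seems equivalent in spirit: for the Pearson case, the partial correlation is exactly the correlation between the two variables after each is residualized on surprisal, which is what the residual approach boils down to. A quick sketch (d and its columns length.lang, complexity, and surprisal are placeholder names, not objects in this analysis):

# Sketch: partial correlation vs. the residual approach; the two lines below
# give the same number. `d` and its columns are placeholders.
library(ppcor)

# partial correlation of complexity and length, controlling for surprisal
pcor.test(d$complexity, d$length.lang, d$surprisal)$estimate

# the same estimate, via residuals from two regressions on surprisal
cor(resid(lm(complexity ~ surprisal, data = d)),
    resid(lm(length.lang ~ surprisal, data = d)))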



Read in BNC data and complexity norms.

library(dplyr)   # data manipulation (%>%, mutate, summarise)
library(ggplot2) # histograms and the cross-linguistic figure
library(broom)   # tidy() for the regression table
library(psych)   # paired.r() for comparing correlations

bnc.2gram = read.table('../data/corpus/English-KNN-H-2.txt', header = T) %>%
  mutate(log.bnc.frequency = log(context.count)) %>%
  top_n(25000, abs(log.bnc.frequency)) # following Piantadosi, restrict to the 25,000 most frequent words

lf.data = read.csv("../data/corpus/english_complexity_norms.csv") %>%
  group_by(word) %>%
  select(-X, -workerid, -trial) %>% # drop the row index and per-trial columns
  summarise_each(funs(mean)) %>% # average the norms across workers
  left_join(bnc.2gram, by = "word")

Surprisals and frequencies for our 499 words are approximately normally distributed:

ggplot(lf.data, aes(x=surprisal)) + 
    geom_histogram(fill = "black", alpha = .6 , binwidth = 1,  origin = -0.5) +
  xlab("Surprisal") +
  ggtitle('BNC Surprisal') +
  themeML

ggplot(lf.data, aes(x=log.bnc.frequency)) + 
    geom_histogram(fill = "black", alpha = .6 , binwidth = 1,  origin = -0.5) +
  xlab("BNC log frequency") +
  ggtitle('BNC log frequency') +
  themeML

Surprisal and complexity are correlated with each other (r = .29), and both are correlated with length (surprisal: r = .42, complexity: r = .67).
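A sketch of how these pairwise correlations are computed (mrc.phon is length in phonemes, as in the regression below; complete.obs drops words without BNC coverage):

# Pairwise correlations on the merged norms
with(lf.data, cor(surprisal, complexity, use = "complete.obs"))
with(lf.data, cor(mrc.phon, surprisal, use = "complete.obs"))
with(lf.data, cor(mrc.phon, complexity, use = "complete.obs"))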

Do the Piantadosi analysis for the complexity words and for all words (UNIGRAM surprisals).

Do the Piantadosi analysis for the complexity words and for all words (BIGRAM surprisals).

Do the Piantadosi analysis for the complexity words and for all words (TRIGRAM surprisals).
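These chunks aren’t echoed; the core comparison for the bigram case looks like the sketch below. The trigram table (bnc.3gram, used in the stats below) follows the same pattern, as would a unigram table, which I’m assuming by analogy:

# Core Piantadosi comparison: is length predicted better by average surprisal
# than by log frequency? Run on the full lexicon and on the 499 complexity
# words.
cor(bnc.2gram$len, bnc.2gram$log.bnc.frequency, method = "spearman")
cor(bnc.2gram$len, bnc.2gram$surprisal, method = "spearman")
cor(lf.data$mrc.phon, lf.data$surprisal, method = "spearman", use = "complete.obs")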

Stats for the Piantadosi analysis: is the length-surprisal correlation reliably stronger than the length-frequency correlation? We compare the two with paired.r() from the psych package.

paired.r(abs(cor(bnc.2gram$len, bnc.2gram$log.bnc.frequency, method = "spearman")), 
         cor(bnc.2gram$len, bnc.2gram$surprisal, method = "spearman"),n=length(bnc.2gram$len))
## Call: paired.r(xy = abs(cor(bnc.2gram$len, bnc.2gram$log.bnc.frequency, 
##     method = "spearman")), xz = cor(bnc.2gram$len, bnc.2gram$surprisal, 
##     method = "spearman"), n = length(bnc.2gram$len))
## [1] "test of difference between two independent correlations"
## z = 3.6  With probability =  0
paired.r(abs(cor(bnc.3gram$len, bnc.3gram$log.bnc.frequency, method = "spearman")), 
         cor(bnc.3gram$len, bnc.3gram$surprisal, method = "spearman"),n=length(bnc.3gram$len))
## Call: paired.r(xy = abs(cor(bnc.3gram$len, bnc.3gram$log.bnc.frequency, 
##     method = "spearman")), xz = cor(bnc.3gram$len, bnc.3gram$surprisal, 
##     method = "spearman"), n = length(bnc.3gram$len))
## [1] "test of difference between two independent correlations"
## z = 4.19  With probability =  0

Stats for the surprisal and complexity analysis.

# partial Spearman correlation between length and log frequency, controlling
# for surprisal
pcor.test(bnc.2gram$len, bnc.2gram$log.bnc.frequency,bnc.2gram$surprisal,method = "spearman")
##       estimate   p.value  statistic     n gn   Method            Use
## 1 -0.005151654 0.4136878 -0.8174211 25179  1 Spearman Var-Cov matrix
# regression predicting length in phonemes from complexity, surprisal, and
# log frequency; tidy() is from the broom package
tidy(lm(mrc.phon ~ complexity + surprisal + log.bnc.frequency, lf.data))
##                term    estimate  std.error  statistic      p.value
## 1       (Intercept) -0.02933587 1.63366388 -0.0179571 9.856816e-01
## 2        complexity  1.07923234 0.06705291 16.0952349 8.640533e-46
## 3         surprisal  0.32585515 0.24551279  1.3272431 1.851459e-01
## 4 log.bnc.frequency -0.04332642 0.10223195 -0.4238050 6.719239e-01

Now, let’s look cross-linguistically.

Read in xling data and merge with English complexity norms
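This chunk isn’t echoed either; roughly, it does something like the sketch below (the file path and the language/word/len column names are assumptions for illustration, not the repo’s actual ones):

# Hypothetical sketch: read the cross-linguistic word lengths (one row per
# word x language) and merge in the English norms.
xling.data = read.csv("../data/corpus/xling_lengths.csv") %>% # hypothetical path
  left_join(lf.data %>% select(word, complexity, surprisal, log.bnc.frequency),
            by = "word")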

Get correlations between length and complexity for all 499 words, with bootstrapped CIs
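A sketch of one way to get per-language Spearman correlations with percentile bootstrap CIs, assuming the long format from the sketch above:

# Bootstrap the length-complexity correlation within each language.
library(boot)

boot.cor = function(d, i) cor(d$len[i], d$complexity[i], method = "spearman")

xling.cis = xling.data %>%
  group_by(language) %>%
  do({
    b = boot(., boot.cor, R = 1000)
    ci = boot.ci(b, type = "perc")$percent
    data.frame(r = b$t0, ci.lower = ci[4], ci.upper = ci[5])
  })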

Partial correlations (with frequency and surprisal)
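Per-language partial correlations (the filled and empty circles in the figure below) can be computed the same way, e.g. with pcor.test() from the ppcor package (column names as in the sketches above):

# Partial out English log frequency and surprisal, per language.
library(ppcor)

xling.partials = xling.data %>%
  group_by(language) %>%
  summarise(
    r.freq = pcor.test(len, complexity, log.bnc.frequency, method = "spearman")$estimate,
    r.surp = pcor.test(len, complexity, surprisal, method = "spearman")$estimate)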

Get correlations for the monomorphemic and open-class subsets
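And the subset correlations (triangles and squares in the figure), assuming hypothetical indicator columns is.monomorphemic and is.open.class on the merged data:

# Length-complexity correlations on the two word subsets, per language.
xling.subsets = xling.data %>%
  group_by(language) %>%
  summarise(
    r.mono = cor(len[is.monomorphemic], complexity[is.monomorphemic], method = "spearman"),
    r.open = cor(len[is.open.class], complexity[is.open.class], method = "spearman"))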

Prep for plotting

We counted the number of Unicode characters in each translation. Variability in word length within languages was positively correlated with complexity ratings. Below, the correlation coefficients are plotted for each language. Red bars indicate languages where the translations were checked by a native speaker; pink bars indicate unchecked languages. The dashed line indicates the grand mean correlation across languages. Filled circles show the correlation between complexity and length, partialling out log spoken frequency in English. Empty circles show the correlation between complexity and length, partialling out surprisal. Triangles show the correlation for the subset of words that are monomorphemic in English, and squares show it for the subset of open-class words.
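A stripped-down sketch of the figure (plot.data and its columns language, r, checked, r.freq, and r.surp are placeholders for the summary table built in the prep chunk; the real figure also adds the triangle and square points for the subsets):

ggplot(plot.data, aes(x = reorder(language, r), y = r, fill = checked)) +
  geom_bar(stat = "identity") + # one bar per language
  scale_fill_manual(values = c("pink", "red")) + # unchecked vs. checked
  geom_point(aes(y = r.freq), shape = 16) + # filled circles: partial out frequency
  geom_point(aes(y = r.surp), shape = 1) + # empty circles: partial out surprisal
  geom_hline(yintercept = mean(plot.data$r), linetype = "dashed") + # grand mean
  coord_flip() +
  xlab("Language") +
  ylab("Complexity-length correlation") +
  themeML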